Computer Vision Project 2 - Automatic Panoramic Mosaic Stitching

CSC5280 Project 2: Automatic Panoramic Mosaic Stitching Wei Zhang Dept. of Information Engineering, The Chinese University of Hong Kong
Fig. 1 A panorama of gymnasium.
[Basic Algorithm]	[More Issues]	[Experimental Results]
[Appendix] How to Run		[Reference]
Download: Test Data, Executable File, Code
A panorama (e.g. my result in Fig. 1), is some wide-angle view of a physical space in a photograph. In this project, I implement an algorithm for aligning photos and stitching them into a seamless panorama. I also implement some advanced issues, including Bundle Adjustment, Global Orientation Specification, Pyramid Blending and acceleration. This report is best viewed in Internet Exploror. 1. Basic Algorithm The algorithm consists of five steps: 1. Correct radial distortion in each image; 2. Extract distinctive features for each image; 3. Align input images pairwisely; 4. Automatically stitch images, i.e. compute the transformation matrix for each image to the final panorama; 5. Warp the images and stitch them into the final panorama. I will go througth them one by one. 1.1 Radial Distortion Correction (Related file: WarpSpherical.cpp; Related routine: WarpRDField.) The photo captured by cameras may have radial distortion. The simplified model of radial distortion is (1) where x'n and y'n are normalized image coordinates, i.e. , and . The distorted image coordinates are (2) I directly apply this model as I use inverse mapping to obtain undistorted images. 1.2 Distinctive Feature Extraction I use SIFT features [2] for image matching. Thanks to Lowe's implementation of SIFT. For details of feature detection and description, please refer to their paper [2]. 1.3 Pairwise Alignment (Related files: FeatureMatch.cpp, FeatureAlign.cpp; Related routines: FeatureMatchByDistRatio, alignPair, countInliers, and leastSquaresFit.) The part is to compute the transform from image i to image j for each pair (i, j). *Initial Matching* First of all, the intial matching of a feature point in image i is the corresponding point of its feature descriptor's nearest neighor in the descriptor pool of image j. Euclidean distance is used. If the ratio between the distance of the nearest neighbor and the second nearest neighbor is less than r, the feature point is regarded as distinctive; otherwise the feature point will not be used. Note that the distinctiveness is critical for images with repeated patterns such as textures. In my implementation, r is typically 0.7 and may need to be changed according to the matching result. *RANSAC* Then RANSAC algorithm [4] is used to find the transform. RANSAC is an iterative algorithm. In each iteration, it randomly selects two pairs of matching points and compute the transformation matrix as follows. (3) where and i = 0, 1. So the transformation matrix . The set of inliers is chosen as f \|pi’ - R pi\| < eps. Finally, the largest set of inliers is kept. The full homography is then estimated by these inliners using (3). 1.4 Stitching (Related file: GlobalAlign.cpp; Related routine: initGlobalAlign.) The automatic stitching involves how to transform all the images to a common plane. First, only the good matchings between images are kept. Then the matchings are sorted by their strength (i.e. the number of inliers). After that, the images are added according to the next strongest match (and other relevant matches) to current stitch and perform global alignment every time. 1.5 Final Panorama Blending (Related files: WarpSpherical.cpp, BlendImages.cpp; Related routines: WarpSphericalField, BlendImages, AccumulateBlend, NormalizeBlend.) The blending uses inverse mapping. The correspondence of each pixel in the panorama is searched for each image. The computation is: for a pixel in the panorama (xsph, ysph), (4) Fig. 2 The sphere coordinates. where (xc, yc) is the source image center and (x'c, y'c) is the panorama center. R is the transformation matrix obtained in the stitching. As a pixel may have multiple corresponding pixels from different images, the result pixel value is the weighted sum of all these pixel values. A simple linear feathering function is chosen as the weighting function [6].
		[Go to Next Section]