Large collections of visual data are created every day, and analyzing the content of such high-dimensional data is an important research problem. In many domains, we need efficient tools for tasks such as the classification, clustering and modeling of image data.
Meanwhile, in practice, images captured by arbitrary users in uncontrolled environments are very likely to undergo geometric transformations. This creates a challenge for their analysis and, in particular, for their classification. One solution for the classification of geometrically transformed image sets is to first align them and then apply traditional classification techniques. However, in this case the reference image model used for aligning the image set strongly affects the classification performance. Another solution is to train classification algorithms on many training images with known class labels. This, however, requires the availability of large amounts of manually labelled image examples, which may be difficult to supply.
Parametric Image Models
Such limitations lead to the need for effective image analysis techniques that can successfully handle geometric transformations. In our research, we focus on image analysis methods that are based on parametric image models. Parametric models are especially useful for coping with geometric transformations, since such transformations can often be described with a few parameters. In particular, we study the analysis of images with transformation manifolds, which are image sets generated by the parametrizable geometric transformations of a reference image model. Some examples of transformation manifolds are pattern transformation manifolds (images created by the 2-D geometric transformations of a reference image) and object observation manifolds (images capturing an object from varying viewpoints).
Manifold models provide low-dimensional representations that can successfully capture the main characteristics of image data sets in a concise way. In particular, manifold models provide a geometric interpretation of image analysis problems such as image registration and image classification.
Image registration with manifold models
The purpose of image registration is to find a mapping, i.e. a warping function, that gives the best approximation of a target image from a reference image. If the warping function between the two images is parametrizable, the registration problem can be regarded as the estimation of the point on the transformation manifold of the reference image that is closest to the target image. This corresponds to the computation of the projection of the target image onto the transformation manifold of the reference image.
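As a toy numerical sketch of this interpretation (the Gaussian blob pattern, the restriction to integer translations, and the exhaustive search are all simplifying assumptions made here for illustration, not an algorithm from our work), the projection onto a translation manifold can be computed as follows:

```python
import numpy as np

def make_blob(cx, cy, n=32, sigma=3.0):
    # synthetic "image": a Gaussian blob centered at column cx, row cy
    y, x = np.mgrid[0:n, 0:n]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

def project_onto_translation_manifold(ref, query, max_shift=8):
    # exhaustive search over the (integer) translation manifold of `ref`:
    # returns the parameters (dy, dx) of the manifold point closest to
    # `query`, together with the resulting manifold distance
    best_params, best_dist = None, np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(ref, (dy, dx), axis=(0, 1))
            d = np.linalg.norm(candidate - query)
            if d < best_dist:
                best_params, best_dist = (dy, dx), d
    return best_params, best_dist

ref = make_blob(16, 16)
query = make_blob(19, 14)   # the same pattern translated by dx=3, dy=-2
params, dist = project_onto_translation_manifold(ref, query)
print(params)               # → (-2, 3): the estimated translation parameters
```

The registration result is exactly the parameter vector of the projection point, and the residual `dist` is the manifold distance used below.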
Image classification with manifold models
Similarly, in a setting where different image classes are represented with different transformation manifolds, the class label of a query image can be estimated by comparing its distance to the class-representative transformation manifolds. The image is assigned the class label of the manifold that is closest to it.
In such a setting, transformation manifolds define a decision boundary in the high-dimensional image space that separates the regions closest to different manifolds. Since the manifolds are generated by geometric transformations of class-representative image models, the decision boundary naturally accounts for geometric transformations in classification. In other words, the decision boundary induced by the manifolds defines a transformation-invariant classification rule.
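A minimal sketch of this classification rule, under the simplifying assumptions that the images are 1-D signals and the geometric transformation is a circular shift (chosen here only so the manifold can be enumerated exactly):

```python
import numpy as np

def shift_manifold(ref, n):
    # sampled transformation manifold of a 1-D pattern: all circular shifts
    return np.stack([np.roll(ref, s) for s in range(n)])

def classify_nearest_manifold(query, class_refs):
    # assign the label of the class whose transformation manifold
    # is closest to the query
    dists = [np.min(np.linalg.norm(shift_manifold(r, len(query)) - query, axis=1))
             for r in class_refs]
    return int(np.argmin(dists))

t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
class_refs = [np.sin(t), np.sign(np.sin(t))]   # class 0: sine, class 1: square wave
query = np.roll(class_refs[1], 11)             # a square wave under an unknown shift
print(classify_nearest_manifold(query, class_refs))   # → 1, whatever the shift
```

Because the distance is measured to the whole manifold rather than to a single prototype, the predicted label does not depend on the shift applied to the query.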
Transformation-invariant image analysis with manifolds
Let us now look at some problems concerning the transformation-invariant analysis of data sets.
Manifold distance computation
An important problem in the analysis of image sets with manifold models is the estimation of the manifold distance, i.e., the distance between a query image and a manifold. As discussed above, image registration and classification applications require the computation of the projection of a query image onto a reference transformation manifold. However, this is a difficult optimization problem with no known globally optimal solution for arbitrary transformation models. We overview below some possible approaches for computing the manifold distance.
Manifold sampling: A simple way to compute the manifold distance is to take samples from the manifold, compare the distance of a query image q to the manifold samples, and approximate the manifold distance with the distance of q to the closest manifold sample. This is illustrated in the figure below.
Approximation of the manifold distance with manifold samples. The distance d is the true manifold distance and its approximation is estimated with samples taken from the manifold.
As more samples are taken from the manifold, the accuracy of the estimation improves. However, in order to achieve a reasonably fast estimation, one often has to limit the number of samples. In this case, the choice of the samples becomes important, as the accuracy of the manifold distance estimation depends highly on the sampling. Therefore, in an image registration or classification application where predetermined and fixed transformation manifolds are used for analyzing many different query images, an efficient solution is to find a good sampling of the manifolds offline, and then use these samples to estimate the distances of query images to the manifolds quickly.
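The effect of the sampling density is easy to see on a toy one-parameter manifold (the phase shifts of a sinusoid, an assumption made here purely for illustration):

```python
import numpy as np

def manifold_point(theta, t):
    # a one-parameter pattern transformation manifold: phase shifts of a sinusoid
    return np.sin(t + theta)

def sampled_manifold_distance(query, t, n_samples):
    # approximate the manifold distance by the distance to the nearest of
    # n_samples uniformly spaced manifold samples
    thetas = np.linspace(0, 2 * np.pi, n_samples, endpoint=False)
    return min(np.linalg.norm(manifold_point(th, t) - query) for th in thetas)

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
query = manifold_point(0.37, t) + 0.01          # a query lying close to the manifold
coarse = sampled_manifold_distance(query, t, 8)
fine = sampled_manifold_distance(query, t, 512)
print(coarse, fine)   # the denser sampling gives a much tighter estimate
```

With only 8 samples the nearest sample is far from the true projection point, so the distance is badly overestimated; the 512-sample estimate is close to the true manifold distance. Choosing the sample locations well, rather than simply adding more of them, is exactly the problem addressed in our study.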
In this study, we propose a solution for sampling parametrizable manifolds for accurate manifold distance estimation. We look at two aspects of the manifold sampling problem. We first propose an algorithm that selects samples from a given reference transformation manifold in order to attain a good registration performance. We then study the sampling of multiple manifolds representing different classes, such that the selected sample sets yield a good classification accuracy when used in the estimation of the distances to the manifolds.
Multiscale manifold representations: The image registration problem can be interpreted as a manifold distance estimation problem, if the motion between the reference image and the target image can be represented with a globally parametrizable model. In this case, the reference transformation manifold is nothing but the pattern transformation manifold generated by the geometric transformations of the reference image. Numerous methods have been proposed so far to solve the image registration problem. Many of these methods are coupled with a hierarchical coarse-to-fine alignment strategy. In hierarchical alignment, the transformation parameters that best align the reference image and the target image are estimated in a progressive way in several stages. The transformation parameters in each stage are computed using smoothed versions of the image pair, where the strength of smoothing (the size of the low-pass filter used in smoothing) is decreased throughout the stages. In this way, transformation parameters are roughly estimated at the coarse scales in the beginning; and then this estimation is refined gradually by moving to the fine scales. Coarse-to-fine image registration is illustrated in the figure below.
Hierarchical estimation of the transformation parameters λ* that best align an image pair. The estimate λ_k of each stage is used as an initial solution for λ_{k+1} in the next stage. The size of the low-pass filter is reduced gradually throughout the stages.
In order to get the best performance in hierarchical registration, one needs to design the registration algorithm properly. It is especially important to determine the sizes of the low-pass filters used to smooth the image pair in each stage of the alignment. Although the hierarchical alignment strategy is very commonly used in practical image registration algorithms, there are very few works that analyze the image registration problem in a multiscale setting to study the effect of low-pass filtering on the performance of registration. This has been the motivation of our recent study on hierarchical image registration, where we present a theoretical analysis of the multiscale performance of local (descent-type) optimizers that estimate the 2-D translation between a pair of visual patterns. The first part of our analysis shows why smoothing with large filters is necessary at the coarse scales of the pyramid: the region of translation vectors that are correctly computable with local optimizers expands at least quadratically with respect to the size of the low-pass filter. The second part of our analysis then justifies why, and to what extent, fine scales are needed: low-pass filtering amplifies the estimation errors stemming from image noise. This creates a source of error at coarse scales, and these errors should be compensated for at fine scales.
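The first effect, the enlargement of the basin of attraction by smoothing, can be demonstrated on a deliberately simplified 1-D version of the problem (narrow pulses, moving-average filters and integer shifts are assumptions made for this sketch; our analysis concerns 2-D translations and is not limited to this setting):

```python
import numpy as np

def smooth(x, width):
    # low-pass filtering with a simple moving-average filter of odd width
    if width <= 1:
        return x
    return np.convolve(x, np.ones(width) / width, mode="same")

def local_shift_descent(ref, tgt, start):
    # descent-type local optimizer over integer shifts: move to a
    # neighboring shift while the mismatch keeps decreasing
    cost = lambda s: np.linalg.norm(np.roll(ref, s) - tgt)
    s = start
    while True:
        best = min((s - 1, s + 1), key=cost)
        if cost(best) >= cost(s):
            return s
        s = best

def hierarchical_register(ref, tgt, widths=(31, 11, 1)):
    # coarse-to-fine: each stage refines the previous estimate
    # using a smaller low-pass filter
    s = 0
    for w in widths:
        s = local_shift_descent(smooth(ref, w), smooth(tgt, w), s)
    return s

n = 256
ref = np.zeros(n)
ref[100:106] = 1.0              # a narrow pulse
tgt = np.roll(ref, 20)          # true translation: 20 samples
print(local_shift_descent(ref, tgt, 0))   # → 0: without smoothing, descent stalls
print(hierarchical_register(ref, tgt))    # → 20: coarse-to-fine recovers the shift
```

Without smoothing, the pulses do not overlap, the cost is flat around the initial shift, and the local optimizer stalls; the wide filter of the first stage spreads the pulses so that a descent direction exists, and the later stages with smaller filters refine the estimate.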
Learning manifold representations from data
Another key problem regarding the analysis of images with manifold models is to learn good manifold models from available data samples. Recently, many dimensionality reduction methods, called manifold learning methods, have been proposed to learn a mapping of data samples to a lower-dimensional space. We discuss below some problems related to manifold learning.
Manifold learning with data priors: Classical manifold learning methods (such as ISOMAP, LLE, …) treat the data samples as an arbitrary point cloud in the ambient signal space and do not make use of any additional information about the data model that generates or explains these samples. However, in a real data analysis application, it is quite often possible to have some information about the data model. In this case, exploiting this information in the learning can substantially improve the accuracy of the computed parameterization.
In our study of manifold learning, we focus on a setting where the input data samples are images that have undergone geometric transformations. We then treat the manifold learning problem in an application-oriented manner: the type of the geometric transformation model (e.g., rotation, scaling) that explains the data is used in the learning. We first consider the problem of learning manifolds for image approximation, and propose an algorithm that learns a pattern transformation manifold (PTM) fitting a given set of input images well. Next, we consider the problem of learning PTMs for image classification, and present a method that jointly learns multiple PTMs, each of which represents an image class accurately. The PTMs can then be used to classify test images of unknown class labels by comparing their distances to the learned class-representative manifolds. This is illustrated in the figure below. From the classification perspective, the proposed manifold learning method is a supervised classification algorithm, where learning the manifolds amounts to learning a nonlinear classifier directly in the original image space.
Proposed approach for supervised manifold learning. Input images of known class labels are used to learn class-representative pattern transformation manifolds. Class labels of query images can then be estimated by computing their distances to the manifolds.
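To make the fitting part of this idea concrete, here is a deliberately simplified sketch (this is not the algorithm proposed in our study): for 1-D signals under unknown circular shifts, a PTM can be fit by alternating between registering the images to the current reference pattern and re-estimating the reference from the back-aligned images.

```python
import numpy as np

def best_shift(ref, im):
    # circular cross-correlation via FFT: the integer shift s
    # minimizing ||roll(ref, s) - im||
    c = np.fft.ifft(np.fft.fft(im) * np.conj(np.fft.fft(ref))).real
    return int(np.argmax(c))

def learn_shift_ptm(images, n_iter=5):
    # generic alternating scheme: (1) register every image to the current
    # reference, (2) re-estimate the reference as the mean of the
    # back-aligned images
    ref = images[0].copy()
    for _ in range(n_iter):
        shifts = [best_shift(ref, im) for im in images]
        aligned = [np.roll(im, -s) for im, s in zip(images, shifts)]
        ref = np.mean(aligned, axis=0)
    return ref, [best_shift(ref, im) for im in images]

x = np.arange(128)
pattern = np.exp(-((x - 64) ** 2) / 32.0)
images = [np.roll(pattern, s) for s in (3, 10, 50)]   # unlabelled transformed observations
ref, shifts = learn_shift_ptm(images)
# every input image now lies (numerically) on the shift manifold of `ref`
```

The learned reference is only determined up to a shift of the true pattern, which does not matter for approximation or classification: what counts is the manifold it generates.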
Performance analysis of manifold learning algorithms: Many manifold learning algorithms are based on the assumption that the data has an approximately locally linear structure. This is a reasonable assumption for many data models, as long as the data is sampled sufficiently densely from the underlying manifold. However, if the sampling is too sparse, the curvature of the manifold (instead of the tangential manifold component) becomes the main factor determining the deviation between neighboring data samples. In this case, the local linearity assumption fails. Therefore, in a dimensionality reduction application, it is important to determine whether the sampling of the data is dense enough for manifold learning algorithms to function properly.
This has been the purpose of our recent study, where we examine the sampling of smooth embeddings of Riemannian manifolds. We derive sampling conditions under which the local linearity assumption on the data samples holds. As the local linearity measure, we consider the deviation between the true tangent space of the manifold at a reference point and the estimate of the tangent space computed by applying PCA to neighboring manifold samples around the reference point (illustrated below). Our analysis shows that if the width of the sampling region in the ambient space is chosen as O(m^(-3/2) K^(-1)), where m is the intrinsic manifold dimension and K is the manifold curvature, the local linearity of the data samples is retained. Therefore, to compute the tangent space of a manifold from finitely many data samples in a reliable way, the distance between neighboring manifold samples should shrink as the curvature and the intrinsic manifold dimension increase.
Illustration of tangent space estimation with PCA. N_ϵ(P) is the ϵ-neighborhood of the reference manifold point P, from which the data samples are taken. The tangent space estimation error is then the angle between the true tangent space T_P S at P and its estimate computed from the neighboring manifold samples around P.
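The qualitative behavior is easy to reproduce on the simplest curved manifold (the parabola y = x², a toy assumption standing in for a general Riemannian manifold): as the sampling region widens, the curvature tilts the PCA estimate away from the true tangent.

```python
import numpy as np

def tangent_pca_error(width, n=200, seed=0):
    # sample the curve y = x^2 (a 1-D manifold in R^2) over [0, width] near
    # the reference point P = (0, 0), estimate the tangent space with PCA,
    # and return the angle (radians) to the true tangent direction (1, 0)
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0.0, width, n)
    pts = np.column_stack([xs, xs ** 2])
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    cosang = abs(vt[0] @ np.array([1.0, 0.0]))       # top principal direction
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

# shrinking the sampling region shrinks the curvature-induced error
print(tangent_pca_error(1.0), tangent_pca_error(0.1), tangent_pca_error(0.01))
```

For this curve the error decreases roughly in proportion to the width of the sampling region, in line with the general principle that denser sampling is needed where the curvature is larger.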