This paper presents a method for alignment of images acquired by sensors of different modalities (e.g., EO and IR). The paper has two main contributions: (i) It identifies an appropriate image representation for multi-sensor alignment, i.e., a representation which emphasizes the common information between the two multi-sensor images, suppresses the non-common information, and is adequate for coarse-to-fine processing. (ii) It presents a new alignment technique, which applies global estimation to any choice of a local similarity measure. In particular, it is shown that when this registration technique is applied to the chosen image representation with a local-normalized-correlation similarity measure, it provides a new multi-sensor alignment algorithm which is robust to outliers, and applies to a wide variety of globally complex brightness transformations between the two images. Our proposed image representation does not rely on sparse image features (e.g., edge, contour, or point features). It is continuous and does not eliminate the detailed variations within local image regions. Our method naturally extends to coarse-to-fine processing, and applies even in situations when the multi-sensor signals are globally characterized by low statistical correlation.