Multiple and heterogenous Earth observation (EO) platforms are broadly used for a wide array of applications, and the integration of these diverse modalities facilitates better extraction of information than using them individually. The detection capability of the multispectral unmanned aerial vehicle (UAV) and satellite imagery can be significantly improved by fusing with ground hyperspectral data. However, variability in spatial and spectral resolution can affect the efficiency of such dataset's fusion. In this study, to address the modality bias, the input data was projected to a shared latent space using cross-modal generative approaches or guided unsupervised transformation. The proposed adversarial networks and variational encoder-based strategies used bi-directional transformations to model the cross-domain correlation without using cross-domain correspondence. It may be noted that an interpolation-based convolution was adopted instead of the normal convolution for learning the features of the point spectral data (ground spectra). The proposed generative adversarial network-based approach employed dynamic time wrapping based layers along with a cyclic consistency constraint to use the minimal number of unlabeled samples, having cross-domain correlation, to compute a cross-modal generative latent space. The proposed variational encoder-based transformation also addressed the cross-modal resolution differences and limited availability of cross-domain samples by using a mixture of expert-based strategy, cross-domain constraints, and adversarial learning. In addition, the latent space was modelled to be composed of modality independent and modality dependent spaces, thereby further reducing the requirement of training samples and addressing the cross-modality biases. An unsupervised covariance guided transformation was also proposed to transform the labelled samples without using cross-domain correlation prior. The proposed latent space transformation approaches resolved the requirement of cross-domain samples which has been a critical issue with the fusion of multi-modal Earth observation data. This study also proposed a latent graph generation and graph convolutional approach to predict the labels resolving the domain discrepancy and cross-modality biases. Based on the experiments over different standard benchmark airborne datasets and real-world UAV datasets, the developed approaches outperformed the prominent hyperspectral panchromatic sharpening, image fusion, and domain adaptation approaches. By using specific constraints and regularizations, the network developed was less sensitive to network parameters, unlike in similar implementations. The proposed approach illustrated improved generalizability in comparison with the prominent existing approaches. In addition to the fusion-based classification of the multispectral and hyperspectral datasets, the proposed approach was extended to the classification of hyperspectral airborne datasets where the latent graph generation and convolution were employed to resolve the domain bias with a small number of training samples. Overall, the developed transformations and architectures will be useful for the semantic interpretation and analysis of multimodal data and are applicable to signal processing, manifold learning, video analysis, data mining, and time series analysis, to name a few.
Bibliographical notePublisher Copyright:
© 2021 Elsevier B.V.
- Convolutional neural networks
- Ground measured spectra
- Multispectral UAV