RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability

CVPR 2025
Minh Kha Do, Kang Han, Phu Lai, Khoa T. Phan, Wei Xiang
La Trobe University

Overview

We propose RobSense, a robust multi-modal foundation model designed for multi-spectral (MS) and Synthetic Aperture Radar (SAR) data. RobSense:

  • Supports diverse input types, ranging from static to temporal data, and from uni-modal to multi-modal formats
  • Handles incomplete data, including missing spectral bands and irregularities in temporal sequences (see the input sketch below)
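
RobSense's exact input interface is not shown on this page; the minimal Python sketch below illustrates one way such incomplete inputs could be represented, using explicit validity masks for missing bands and unobserved timesteps. All tensor shapes and field names here are illustrative assumptions, not the paper's API.

import torch

# Hypothetical shapes (not from the paper):
#   MS  : (T, 13, H, W)  Sentinel-2 sequence with 13 spectral bands
#   SAR : (T, 2, H, W)   Sentinel-1 sequence with VV/VH polarisations
T, H, W = 4, 128, 128
ms, sar = torch.randn(T, 13, H, W), torch.randn(T, 2, H, W)

# Explicit validity masks make incompleteness visible to the model
# instead of silently zero-filling missing observations.
band_mask = torch.ones(T, 13, dtype=torch.bool)  # which MS bands are present per timestep
time_mask = torch.ones(T, dtype=torch.bool)      # which timesteps were observed at all
band_mask[:, [10, 11]] = False                   # e.g. two bands missing in every acquisition
time_mask[2] = False                             # e.g. one acquisition missing from the sequence

sample = {
    "ms": ms, "sar": sar,
    "ms_band_mask": band_mask, "time_mask": time_mask,
    # Irregular temporal spacing is kept as raw timestamps (days since first acquisition).
    "timestamps": torch.tensor([0, 12, 31, 47]),
}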

How is RobSense pre-trained?

Pre-training dataset

RobSense is pre-trained on Satlas, a large-scale global dataset comprising approximately 12 million images from the Sentinel-1 and Sentinel-2 satellites. Figure 1 illustrates its worldwide geographic coverage.

Figure 1: Geographic coverage of RobSense's pre-training dataset, Satlas, which spans all continents except Antarctica (image adapted from the Satlas publication).

Pre-training workflow

Figure 2: Illustration of RobSense pre-training, which consists of two training phases: Temporal Multi-modal Learning, which trains the MS/SAR encoders, the multi-modal encoder, and the multi-modal decoder; and Latent Reconstruction Learning, which trains the latent MS/SAR reconstructors. TSD stands for Time-specific Distribution.
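
The precise objectives live in the paper; the sketch below only schematises the two-phase schedule under a masked-autoencoding reading of the figure. Every module is a toy nn.Linear stand-in, and the assumption that each latent reconstructor predicts one modality's latent from the other is ours, not a confirmed detail of the method.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the modules named in the figure; real architectures are the paper's.
ms_encoder, sar_encoder = nn.Linear(13, 64), nn.Linear(2, 64)
mm_encoder, mm_decoder = nn.Linear(128, 64), nn.Linear(64, 15)
ms_reconstructor, sar_reconstructor = nn.Linear(64, 64), nn.Linear(64, 64)

def phase1_step(ms_tokens, sar_tokens):
    """Temporal Multi-modal Learning: encode both modalities jointly and
    reconstruct the inputs through the multi-modal decoder
    (token masking omitted here for brevity)."""
    z = mm_encoder(torch.cat([ms_encoder(ms_tokens), sar_encoder(sar_tokens)], dim=-1))
    return F.mse_loss(mm_decoder(z), torch.cat([ms_tokens, sar_tokens], dim=-1))

def phase2_step(ms_tokens, sar_tokens):
    """Latent Reconstruction Learning (our reading): train each reconstructor to
    predict one modality's latent from the other, so a missing modality can be
    recovered in latent space downstream. Phase-1 encoders are frozen here."""
    with torch.no_grad():
        z_ms, z_sar = ms_encoder(ms_tokens), sar_encoder(sar_tokens)
    return F.mse_loss(ms_reconstructor(z_sar), z_ms) + F.mse_loss(sar_reconstructor(z_ms), z_sar)

# Example steps on random token batches of shape (batch, tokens, channels):
loss1 = phase1_step(torch.randn(2, 16, 13), torch.randn(2, 16, 2))
loss2 = phase2_step(torch.randn(2, 16, 13), torch.randn(2, 16, 2))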

Fine-tuning

Figure 3: By combining different modules, the model accepts either static (sta) or sequential (seq) inputs and supports both complete (solid lines) and incomplete (dashed lines) inputs.
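
As a concrete illustration of that routing, here is a hedged sketch reusing the toy stand-ins from the pre-training sketch above: when a modality is absent, its latent is recovered through the corresponding reconstructor (the dashed path). The dispatch logic is our guess at the mechanism, not the released code.

import torch
import torch.nn as nn

# Toy stand-ins, as in the pre-training sketch; the real modules come pre-trained.
ms_encoder, sar_encoder = nn.Linear(13, 64), nn.Linear(2, 64)
ms_reconstructor, sar_reconstructor = nn.Linear(64, 64), nn.Linear(64, 64)
mm_encoder = nn.Linear(128, 64)

def encode(sample: dict) -> torch.Tensor:
    """Route a sample through whichever modules its inputs require (illustrative)."""
    z_ms = ms_encoder(sample["ms"]) if sample.get("ms") is not None else None
    z_sar = sar_encoder(sample["sar"]) if sample.get("sar") is not None else None
    if z_ms is None:   # MS missing: recover its latent from SAR (dashed path)
        z_ms = ms_reconstructor(z_sar)
    if z_sar is None:  # SAR missing: recover its latent from MS (dashed path)
        z_sar = sar_reconstructor(z_ms)
    return mm_encoder(torch.cat([z_ms, z_sar], dim=-1))

# Complete multi-modal input vs. incomplete input with SAR absent:
feat_complete = encode({"ms": torch.randn(1, 16, 13), "sar": torch.randn(1, 16, 2)})
feat_partial = encode({"ms": torch.randn(1, 16, 13), "sar": None})

# Static vs. sequential inputs would feed single-date vs. temporal token sequences
# through the same path; temporal aggregation is omitted here for brevity.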

Qualitative Results

Segmentation results

Quantitative Results

Segmentation results (mIoU ↑) on the Satlas dataset

Classification results (mAP ↑) on the BigEarthNet dataset

Comparison of foundation models fine-tuned on datasets with diverse input types, from uni-modal to temporal (T-) multi-modal data, across varying missing rates. Rand. indicates a random missing rate applied to each sequence.
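
For reference, here is a small sketch of how a random per-sequence missing rate (the Rand. setting) could be simulated when building an evaluation set. Whether masking applies to timesteps, bands, or both in the paper's protocol is not stated on this page, so dropping whole timesteps here is an assumption.

import torch

def drop_timesteps(seq: torch.Tensor, rate: float) -> tuple[torch.Tensor, torch.Tensor]:
    """Zero out roughly `rate` of the timesteps in a (T, C, H, W) sequence and
    return the corrupted sequence plus its validity mask (illustrative protocol)."""
    T = seq.shape[0]
    keep = torch.rand(T) >= rate
    keep[torch.randint(T, (1,))] = True  # always keep at least one observation
    return seq * keep.view(T, 1, 1, 1), keep

# "Rand.": each sequence draws its own missing rate uniformly from [0, 1).
seq = torch.randn(6, 13, 64, 64)
corrupted, valid = drop_timesteps(seq, rate=torch.rand(()).item())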

Related Work

Favyen Bastani et al. SatlasPretrain: A large-scale dataset for remote sensing image understanding. ICCV 2023.

Anthony Fuller et al. CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders. NeurIPS 2023.

Mubashir Noman et al. Rethinking transformers pre-training for multi-spectral satellite imagery. CVPR 2024.

Danfeng Hong et al. SpectralGPT: Spectral remote sensing foundation model. IEEE TPAMI 2024.

BibTeX


@inproceedings{robsense2025,
  title={RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability},
  author={Minh Kha Do and Kang Han and Phu Lai and Khoa T. Phan and Wei Xiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
}