We propose SATtxt, a spectrum-aware vision-language foundation model (VLFM) for satellite imagery that leverages spectral priors while operating exclusively on RGB inputs at inference.
SATtxt is pre-trained on SSL4EO-S12 v1.1 with captions obtained from LLaMA3-SSL4EO-S12-v1.1-captions, a large-scale global dataset comprising approximately 1 million images from the Sentinel-2 satellites. Figure 1 illustrates its worldwide geographic coverage.
Clive Tinashe Marimo et al. Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation. ECML PKDD 2025
Johannes Jakubik et al. TerraMind: Large-Scale Generative Multimodality for Earth Observation. ICCV 2025
Danfeng Hong et al. SpectralGPT: Spectral Remote Sensing Foundation Model. IEEE TPAMI 2024
@article{sattxt2026,
title={Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery},
author={Minh Kha Do and Wei Xiang and Kang Han and Di Wu and Khoa Phan and Yi-Ping Phoebe Chen and Gaowen Liu and Ramana Rao Kompella},
journal={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026},
}