A benchmarking study comparing GAN, diffusion, and multi-temporal compositing approaches to cloud removal on 15,000 Sentinel-2 patches across five cloud coverage levels.
Cloud-free satellite imagery is essential for land-cover mapping, vegetation monitoring (NDVI), urban expansion analysis, and disaster response. Yet in humid regions — particularly southern China — the majority of Sentinel-2 acquisitions are partially or fully occluded by clouds.
This project asks: can cloud-covered image patches be reconstructed reliably enough to support downstream visual inspection and vegetation-sensitive analysis when a fully cloud-free scene isn't available?
We benchmark three approaches across five cloud coverage levels (5%, 10%, 30%, 50%, 70%), evaluating reconstruction quality via PSNR, SSIM, cloud-region L1 error, and NDVI MAE.
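The three simpler metrics can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's evaluation code: the band order (R, G, B, NIR on the last axis), the [0, 1] value range, and all function names are assumptions. SSIM is left out because it needs a windowed implementation that is better taken from an image-processing library.

```python
import numpy as np

# Assumed layout (hypothetical): patches are (H, W, 4) floats in [0, 1],
# with bands ordered Red, Green, Blue, NIR on the last axis.
RED, NIR = 0, 3

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB, computed over the whole patch."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def cloud_l1(pred, target, cloud_mask):
    """Mean absolute error restricted to cloud-covered pixels (mask == 1)."""
    m = cloud_mask.astype(bool)
    if not m.any():
        return 0.0
    return float(np.mean(np.abs(pred[m] - target[m])))

def ndvi(img, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by zero."""
    red, nir = img[..., RED], img[..., NIR]
    return (nir - red) / (nir + red + eps)

def ndvi_mae(pred, target):
    """Mean absolute difference between predicted and reference NDVI maps."""
    return float(np.mean(np.abs(ndvi(pred) - ndvi(target))))
```

PSNR here is computed over the full patch while the L1 error only looks at cloudy pixels, so the two metrics can respond differently to the same reconstruction.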
We pull COPERNICUS/S2_SR_HARMONIZED (Level-2A surface reflectance) from Google Earth Engine, covering Beijing and its urban-rural fringe during the growing season (April–October 2023). This window captures vegetated periods most sensitive to NDVI accuracy.
Each 128×128 pixel patch captures four spectral bands: Red (B4), Green (B3), Blue (B2), and NIR (B8) at 10m native resolution. Synthetic clouds are generated via Gaussian-filtered noise thresholded to exact coverage targets, ensuring ground-truth availability at all occlusion levels.
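One way to get a mask with an exact coverage target is to smooth white noise and then mark the top-k smoothed values as cloud. The sketch below follows that idea under stated assumptions: the smoothing width `sigma`, the FFT-based filter with wrap-around boundaries, and the function name are illustrative choices, not details taken from the project.

```python
import numpy as np

def synthetic_cloud_mask(size=128, coverage=0.3, sigma=8.0, seed=0):
    """Binary cloud mask (1 = cloud) with an exact fraction of cloudy pixels.

    sigma (in pixels) controls blob size and is an assumed value.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))

    # Gaussian low-pass filter applied in the frequency domain
    # (equivalent to Gaussian blurring with wrap-around boundaries).
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    smooth = np.fft.ifft2(np.fft.fft2(noise) * kernel).real

    # Mark the n_cloud largest smoothed values as cloud, so the coverage
    # is exact up to rounding to a whole number of pixels.
    n_cloud = int(round(coverage * size * size))
    mask = np.zeros(size * size, dtype=np.uint8)
    if n_cloud > 0:
        top = np.argpartition(smooth.ravel(), -n_cloud)[-n_cloud:]
        mask[top] = 1
    return mask.reshape(size, size)
```

Selecting by rank rather than by a fixed threshold value is what makes the coverage exact; a fixed threshold would only hit the target on average.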
Dataset split is done at the base-image level before cloud augmentation to prevent train-test leakage (70% train / 10% val / 20% test).
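A leakage-safe split assigns each base image to exactly one partition first and only then expands it into cloudy variants. The sketch below illustrates that order of operations; the number of base images, the seed, and the helper names are hypothetical.

```python
import numpy as np

COVERAGE_LEVELS = (0.05, 0.10, 0.30, 0.50, 0.70)

def split_base_images(n_base, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle base-image ids and cut them into train / val / test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_base)
    n_train = int(round(fractions[0] * n_base))
    n_val = int(round(fractions[1] * n_base))
    return {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],
    }

def expand_with_coverage(base_ids, levels=COVERAGE_LEVELS):
    """Pair every base image with every coverage level (done after the split)."""
    return [(int(i), c) for i in base_ids for c in levels]

# Hypothetical usage: 1,000 base images; all cloudy variants of an image
# stay in the partition that image was assigned to.
splits = split_base_images(1000)
train_samples = expand_with_coverage(splits["train"])
```

Splitting after augmentation instead would let variants of the same scene land in both train and test, which inflates test scores.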
Averages the cloud-free pixels across 4 temporal observations (the original plus 3 neighbors). Falls back to the cloudy value when a pixel is masked in all four frames.
Failure probability at 70% coverage: 0.7⁴ ≈ 24% of pixels have no clean alternative, assuming cloud masks are independent across frames.
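The compositing rule described above can be written as a masked mean with a fallback. This is a minimal sketch under assumptions: frame 0 is taken to be the target observation, masks use 1 for cloudy pixels, and the function name is illustrative.

```python
import numpy as np

def temporal_composite(frames, cloud_masks):
    """Per-pixel mean over cloud-free observations.

    frames:      float array of shape (T, H, W, C); frame 0 is the target.
    cloud_masks: array of shape (T, H, W), 1 = cloudy, 0 = clear.
    """
    clear = (cloud_masks == 0).astype(frames.dtype)[..., None]  # (T, H, W, 1)
    n_clear = clear.sum(axis=0)                                 # (H, W, 1)
    total = (frames * clear).sum(axis=0)                        # (H, W, C)
    mean_clear = total / np.maximum(n_clear, 1)                 # avoid 0/0
    # Where no frame is clear, keep the (cloudy) value of the target frame.
    return np.where(n_clear > 0, mean_clear, frames[0])
```

The fallback branch is where the method fails: those pixels simply keep their cloudy values, so the error there is as large as the cloud itself.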
U-Net generator (5-channel input: 4-band image + cloud mask) paired with a PatchGAN discriminator. The adversarial loss plus a 100×-weighted L1 loss trains a constrained image translator, not a free-form generator.
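To make the weighting concrete, the generator objective can be sketched numerically. The binary cross-entropy form of the adversarial term follows the common pix2pix recipe and is an assumption here, as are the function names; only the 5-channel input and the 100× L1 weight come from the description above.

```python
import numpy as np

def generator_input(cloudy, cloud_mask):
    """Stack the 4-band cloudy patch (H, W, 4) with its mask (H, W) -> (H, W, 5)."""
    return np.concatenate([cloudy, cloud_mask[..., None]], axis=-1)

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw discriminator logits."""
    loss = np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(loss))

def generator_loss(disc_logits_on_fake, fake, real, lambda_l1=100.0):
    """Adversarial term (fool the PatchGAN) plus lambda_l1 times pixel-wise L1."""
    adversarial = bce_with_logits(disc_logits_on_fake, 1.0)
    l1 = float(np.mean(np.abs(fake - real)))
    return adversarial + lambda_l1 * l1
```

With a weight of 100, even a small average pixel error contributes more to the loss than the adversarial term usually does, which is why the generator behaves like a supervised regressor with a realism penalty.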
DDPM-style 100-step denoising with a U-Net denoiser (6-channel input: conditioning image + mask + timestep). Known clean pixels are re-imposed at each reverse step to constrain generation.
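The re-imposition step can be illustrated with a generic DDPM reverse loop. Everything here other than the 100 steps and the per-step overwrite of known pixels is an assumption: the linear beta schedule values are placeholders, `predict_noise` stands in for the trained U-Net denoiser, the loop starts from plain Gaussian noise for brevity, and the clean pixels are written back without adding noise at the current step.

```python
import numpy as np

def make_schedule(n_steps=100, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (placeholder values) and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, n_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def reverse_sample(cloudy, cloud_mask, predict_noise, n_steps=100, seed=0):
    """Run the reverse process, overwriting known clean pixels after every step.

    cloudy:        (H, W, C) conditioning patch.
    cloud_mask:    (H, W), 1 = cloudy (to be generated), 0 = known clean.
    predict_noise: hypothetical callable (x, cloudy, cloud_mask, t) -> noise estimate.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(n_steps)
    m = cloud_mask[..., None].astype(cloudy.dtype)
    x = rng.standard_normal(cloudy.shape)
    for t in range(n_steps - 1, -1, -1):
        eps = predict_noise(x, cloudy, cloud_mask, t)
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
        # Re-impose known clean pixels so only the cloudy region is generated.
        x = m * x + (1.0 - m) * cloudy
    return x
```

Because the overwrite happens inside the loop, the denoiser always sees the true surroundings of the cloud at every step rather than only at the end.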
| Method | PSNR (dB) ↑ | SSIM ↑ | L1 Cloud ↓ | NDVI MAE ↓ | Avg. Coverage |
|---|---|---|---|---|---|
| Multi-temporal | 15.09 | 0.142 | 0.1453 | 0.0877 | 30–70% |
| GAN (ours) | 32.76 | 0.850 | 0.0243 | 0.1127 | 30–70% |
| Diffusion | 29.20 | 0.711 | 0.0375 | 0.1850 | 30–70% |
Metrics are averaged over the 30%, 50%, and 70% coverage levels. ↑ = higher is better · ↓ = lower is better.
The 30% coverage level marks a sharp transition. Below it, temporal compositing is near-perfect because the probability that a pixel is cloudy in all 4 frames stays under 0.3⁴ ≈ 0.8%. Above it, the probability of catastrophic failure spikes, and the GAN's context-based reconstruction pulls ahead.
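The arithmetic behind this transition is easy to reproduce. The sketch below assumes that cloud masks are independent across frames and that every pixel is cloudy with probability equal to the coverage level; both are simplifications, and the function names are illustrative.

```python
def all_cloudy_probability(coverage, n_frames=4):
    """Probability that a pixel is cloudy in every one of n_frames observations."""
    return coverage ** n_frames

def expected_unrecoverable_pixels(coverage, patch_size=128, n_frames=4):
    """Expected number of pixels per patch with no cloud-free observation."""
    return all_cloudy_probability(coverage, n_frames) * patch_size ** 2

for c in (0.05, 0.10, 0.30, 0.50, 0.70):
    p = all_cloudy_probability(c)
    n = expected_unrecoverable_pixels(c)
    print(f"coverage {c:.0%}: p(all cloudy) = {p:.4%}, about {n:.1f} pixels per patch")
```

Under these assumptions a 128×128 patch loses about 133 pixels at 30% coverage, 1,024 at 50%, and roughly 3,934 at 70%, which is the steep growth the finding refers to.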
The GAN wins because it is constrained, not because it is generative. The conditioned U-Net translates cloudy images to clean ones as a supervised regression, not as free-form generation. This makes it more reliable and more spectrally consistent than diffusion under limited training budgets.
PSNR and cloud-region L1 can tell different stories at 30%. For temporal compositing, the squared error in PSNR amplifies the ~133 pixels per patch (0.3⁴ × 128² ≈ 133) that have no cloud-free temporal observation, causing a large PSNR drop. Cloud-region L1 averages the same pixels over the whole cloudy area, so it shows a smaller change. The GAN avoids this spike entirely by learning from spatial context.
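A small idealized calculation shows how the two metrics react to the same handful of failed pixels. It assumes every other pixel is reconstructed perfectly and that each failed pixel carries the same absolute error; the error value of 0.5 on a [0, 1] scale is a hypothetical number chosen only for illustration.

```python
import numpy as np

def psnr_and_cloud_l1(n_bad, err, patch_size=128, coverage=0.3, data_range=1.0):
    """Whole-patch PSNR and cloud-region L1 when only n_bad pixels are wrong.

    Idealized: all other pixels are exact, and each bad pixel has absolute error err.
    """
    n_total = patch_size ** 2
    n_cloud = int(round(coverage * n_total))
    mse = n_bad * err ** 2 / n_total            # squared error, averaged over the patch
    psnr = float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))
    l1 = n_bad * err / n_cloud                  # absolute error, averaged over cloud pixels
    return psnr, l1

psnr_db, l1 = psnr_and_cloud_l1(n_bad=133, err=0.5)
print(f"PSNR = {psnr_db:.2f} dB, cloud-region L1 = {l1:.4f}")
```

In this toy case the PSNR falls from unbounded (a perfect patch) to roughly 27 dB, while the cloud-region L1 stays near 0.014, because the 133 bad pixels are averaged over about 4,915 cloudy ones.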
Diffusion initialization matters. Starting the diffusion process from the mean of clean-region pixels, rather than from the cloudy proxy, was critical. A linear final activation (not sigmoid) was also required to avoid output-range collapse in the denoised patches.
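The initialization can be sketched as filling the cloudy region with the mean of the clean pixels. Whether the mean is taken per band or over all bands is not stated, so the per-band choice below is an assumption, as is the function name.

```python
import numpy as np

def init_from_clean_mean(cloudy, cloud_mask):
    """Replace cloudy pixels with the per-band mean of the clean pixels.

    cloudy:     (H, W, C) patch; cloud_mask: (H, W), 1 = cloudy.
    Returns the patch unchanged if it has no cloudy or no clean pixels.
    """
    m = cloud_mask.astype(bool)
    out = cloudy.copy()
    if m.any() and (~m).any():
        out[m] = cloudy[~m].mean(axis=0)  # (C,) vector broadcast to every cloudy pixel
    return out
```

Starting from this fill keeps the initial values inside the radiometric range of the scene, whereas bright cloud values would pull the starting point away from the surface reflectance being reconstructed.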
Our synthetic masks use Gaussian-filtered noise, not real atmospheric scattering patterns. Results may not transfer directly to optically thin cloud types or cloud shadows.
All training data comes from Beijing (April–October 2023). The motivating application, cloud removal for humid southern China (e.g., Yunnan), has not been validated.
The diffusion result represents a limited-capacity, limited-training-budget baseline. With 100+ epochs, a larger backbone, and cloud-region-weighted objectives, diffusion may substantially close the gap.
Evaluation is image-centric. The benefit of cloud removal for actual land-cover classification or vegetation monitoring workflows has not been directly measured.
Expand to Yunnan province and other cloud-prone regions in southern China to test the geographic transferability of the GAN model.
Provide more data, an extended training budget (100+ epochs), a larger backbone, and cloud-region-weighted objectives to reveal diffusion's true potential.
Validate the effect of cloud removal directly on land-cover classification accuracy and vegetation index (NDVI/EVI) computation in real-world monitoring pipelines.
Incorporate optically derived cloud masks from Sentinel-2 QA bands and evaluate whether synthetic-trained models generalize to real cloud morphology.