
Binary reformulation for marine debris detection in Sentinel-2 imagery: an empirical study on extreme class imbalance using the first benchmarks on combined MARIDA and MADOS datasets

Nature Geoscience 2026
Soufyane Bouchelaghem, Lahcene Mamen, Marco Balsi, Ahmed Tibermacine, Monica Moroni, Imad Eddine Tibermacine

Summary

This study develops a binary reformulation method to improve the detection of marine debris in Sentinel-2 satellite imagery. By recasting the detection problem as a binary (debris versus non-debris) segmentation task, the approach improves the separation between debris and background ocean features. The technique offers a scalable remote-sensing solution for monitoring plastic and debris accumulation across large ocean areas.

Introduction

Marine debris detection from satellite imagery faces two major challenges: extreme class imbalance, with debris pixels accounting for less than 0.01% of image content, and the need for robust generalization across diverse geographic and temporal domains in operational deployment. Although existing methods often report strong within-dataset performance, cross-dataset generalization, in which models trained on one dataset are applied to entirely different geographic regions, remains insufficiently investigated.

Methods

To address this limitation, we conducted rigorous bidirectional cross-dataset validation experiments using the MARIDA and MADOS datasets. The problem was reformulated as a binary segmentation task and addressed using a standard U-Net architecture combined with a composite imbalance-aware loss and a rarity-aware sampling strategy. Two experimental settings were considered: training on MARIDA and testing on MADOS, and training on MADOS and testing on MARIDA.

Results

The experiments revealed asymmetric cross-dataset generalization. Models trained on the geographically diverse MADOS dataset achieved an F1-score of 0.890 when tested on MARIDA, corresponding to only a 1.25% decrease from the within-dataset baseline of 0.901. In contrast, models trained on MARIDA achieved an F1-score of 0.833 on MADOS, representing a 7.55% decrease. The average cross-dataset degradation was 4.38%, which is substantially lower than the typical 10 to 25% performance drops reported in remote sensing domain-shift scenarios. Despite comparable patch counts (2,529 for MADOS versus 2,173 for MARIDA), the superior transferability of MADOS-trained models indicates that geographic diversity across globally distributed tiles is more beneficial than exhaustive annotation within concentrated regions.
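The degradation figures above are relative drops in F1-score with respect to the within-dataset baseline. The short sketch below recomputes them from the rounded scores quoted in the text; it assumes that the same baseline of 0.901 applies to both transfer directions, which the text implies but does not state outright. The function and variable names are illustrative only. Computed from the rounded scores, the first figure comes out slightly below the quoted 1.25%, which may simply reflect rounding of the published values.

```python
def relative_drop(baseline: float, score: float) -> float:
    """Relative decrease of `score` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - score) / baseline


# Rounded F1-scores as quoted in the text.
BASELINE_F1 = 0.901  # within-dataset baseline (assumed shared by both directions)
CROSS_DATASET_F1 = {
    "MADOS->MARIDA": 0.890,  # trained on MADOS, tested on MARIDA
    "MARIDA->MADOS": 0.833,  # trained on MARIDA, tested on MADOS
}

drops = {
    name: relative_drop(BASELINE_F1, f1) for name, f1 in CROSS_DATASET_F1.items()
}
mean_drop = sum(drops.values()) / len(drops)

for name, drop in drops.items():
    print(f"{name}: {drop:.2f}% relative F1 decrease")
print(f"mean: {mean_drop:.2f}%")
# MADOS->MARIDA: 1.22% relative F1 decrease
# MARIDA->MADOS: 7.55% relative F1 decrease
# mean: 4.38%
```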
Moreover, the MADOS-to-MARIDA cross-dataset F1-score of 0.890 exceeded MAP-Mapper's within-dataset F1-score of 0.880 and closely approached MariNeXt's reported performance of 0.891.

Discussion

These findings show that careful data formulation and training design can enable standard architectures to achieve strong cross-domain performance under extreme class imbalance, approaching or even surpassing more specialized models in realistic deployment conditions. The results provide practical guidance for operational marine debris monitoring systems: spatially stratified sampling across diverse marine environments should be prioritized, F1-scores in the range of 0.86 to 0.89 can be expected when deploying on previously unseen regions without fine-tuning, and a two-stage strategy should be considered in which models are first trained on geographically diverse data and then optionally adapted for region-specific applications. To the best of our knowledge, this is the first systematic cross-dataset validation study involving both MARIDA and MADOS, demonstrating that binary reformulation supports generalization-preserving marine debris detection across geographic and temporal domain shifts.
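The text mentions a composite imbalance-aware loss but does not say which terms it combines. As a purely illustrative sketch, the block below assumes one common choice for heavily imbalanced binary segmentation: a weighted sum of a soft Dice loss and a focal loss over per-pixel debris probabilities. The function names, the equal weights, and the `gamma` and `alpha` values are all assumptions, not the authors' configuration.

```python
import math
from typing import Sequence


def dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), computed on probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, alpha: float = 0.75) -> float:
    """Mean binary focal loss; down-weights pixels that are already well classified."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, 1e-7), 1.0 - 1e-7)          # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p             # probability of the true class
        a_t = alpha if t == 1 else 1.0 - alpha     # class weight (assumed: favor debris)
        losses.append(-a_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


def composite_loss(probs: Sequence[float], targets: Sequence[int],
                   w_dice: float = 0.5, w_focal: float = 0.5) -> float:
    """Weighted sum of the two terms (weights are hypothetical)."""
    return w_dice * dice_loss(probs, targets) + w_focal * focal_loss(probs, targets)


# Toy example: one debris pixel among four.
targets = [1, 0, 0, 0]
good = composite_loss([0.9, 0.1, 0.1, 0.1], targets)
bad = composite_loss([0.1, 0.1, 0.1, 0.1], targets)
print(good < bad)  # True
```

In this assumed form, the Dice term is driven by overlap with the rare debris class rather than by the overwhelming number of background pixels, while the focal term keeps a per-pixel gradient signal; in practice such a loss would be implemented on tensors in a deep learning framework rather than on Python lists.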
