Evaluation-driven preprocessing and interpretable machine learning for large-scale FTIR polymer classification in microplastics research

2026
M. Maksuda Khanam, Saleena Younus, Laila Mousafi Alasal, M. Khabir Uddin, Julhash U. Kazi

Summary

Researchers developed xpectrass, an open-source software framework that automatically identifies the polymer type of plastic particles (microplastics) from their infrared spectra, measured with Fourier Transform Infrared (FTIR) spectroscopy. Using a combined dataset of 12,214 spectra, they compared ways of cleaning up the raw measurements and found a preprocessing workflow that performed stably across datasets and supported high-performing machine learning classification. This could help researchers track microplastic pollution in food, water, and the environment more consistently, since knowing which plastics are present is a key step toward protecting human health from microplastic exposure.

Fourier Transform Infrared (FTIR) spectroscopy is widely used for polymer identification in microplastics research. However, raw spectra are often affected by noise, baseline drift, and other acquisition-related artifacts that are not chemically meaningful. Since preprocessing steps can strongly influence multivariate analysis and machine learning (ML) classification, we developed xpectrass, an open-source framework designed for systematic evaluation of FTIR preprocessing, exploratory analysis, and interpretable ML. A combined dataset of 12,214 spectra covering multiple polymer classes was used to compare different denoising, baseline correction, and normalization strategies. A preprocessing workflow based on wavelet denoising, adaptive smoothness penalized least squares (asPLS) baseline correction, and either Standard Normal Variate (SNV) or Spectral Moments normalization showed stable performance across datasets and supported harmonization between studies. Unsupervised methods such as PCA, t-SNE, and UMAP separated most polymer classes, although partial overlap remained among chemically similar polyolefins. We further evaluated 41 ML model configurations and present detailed results for a representative high-performing model based on XGBoost. Model interpretation using SHapley Additive exPlanations (SHAP) associated predictions with specific wavenumbers, facilitating chemical interpretability. In summary, xpectrass provides a structured and scalable workflow that links heterogeneous FTIR spectra to reproducible polymer classification in microplastics research.
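To make the normalization step concrete, the following is a minimal sketch of Standard Normal Variate (SNV) scaling in plain Python. It is not taken from xpectrass: the function name `snv`, the toy spectra, and the use of the sample standard deviation (n - 1) are assumptions made for illustration only. SNV centres each spectrum on its own mean and divides by its own standard deviation, so two spectra that differ only by an additive offset and a multiplicative scale become identical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate for a single spectrum (hypothetical helper).

    Subtracts the spectrum's own mean and divides by its own standard
    deviation, removing additive offsets and multiplicative scaling.
    """
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(value - mu) / sigma for value in spectrum]


# Toy absorbance values, not real FTIR data.
a = [0.10, 0.20, 0.40, 0.20, 0.10]
# Same band shape with an added offset and a different overall intensity.
b = [1.0 + 3.0 * value for value in a]

snv_a = snv(a)
snv_b = snv(b)

# After SNV the two spectra agree to within floating-point error.
max_difference = max(abs(x - y) for x, y in zip(snv_a, snv_b))
```

In a workflow like the one described above, this step would follow denoising and baseline correction, and would normally be applied row-wise to an array of spectra rather than to Python lists.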
