
High Data Quality Enhances Microplastic Toxicity Prediction

2026
Ana Antonio Vital, Scott Coffin, Andrea Bonisoli-Alquati, Maaike Vercauteren, Luan de Souza Leite, Maximilian Pichler, Magdalena Mair

Summary

This study applied the Boosted Regression Tree machine learning algorithm to four datasets from the ToMEx 2.0 microplastic toxicity database to predict microplastic toxicity and evaluate how data quality influences model performance. The high-quality dataset achieved the best predictive accuracy (AUC = 0.93 for random cross-validation), with endpoints and concentration emerging as the primary predictors, highlighting the importance of standardized experimental design and detailed microplastic characterization for risk assessment.

Unlike chemicals, microplastics (MPs) lack standardized identifiers, limiting the applicability of traditional predictive ecotoxicology methods such as quantitative structure-activity relationship (QSAR) models. This study aimed to predict MP toxicity using MP properties, MP concentration, organismal traits, endpoints, and experimental design, and to evaluate how data pre-processing, dataset size, and data quality influence model performance. We applied the Boosted Regression Tree (BRT) machine learning algorithm to four datasets derived from the Toxicity of Microplastics Explorer database (ToMEx 2.0): (i) a dataset with imputed missing values, (ii) a complete-case dataset (missing values removed), (iii) a high-quality dataset, and (iv) a low-quality dataset. The high-quality dataset yielded the best final predictions for both random cross-validation (AUC = 0.93) and blocked cross-validation by particle identifier (AUC = 0.87). Explainable artificial intelligence (xAI) analyses showed that predictive performance was primarily determined by endpoints and concentration, with MP properties contributing despite limited reporting. Our findings demonstrate the feasibility of using machine learning to predict MP toxicity and identify its key drivers, and they highlight that high-quality data improve predictive performance while reducing data mining and computational costs. Standardized experiments, detailed MP characterization, and high reporting standards would better support risk assessment frameworks and inform the design of safer materials.
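The two validation schemes named in the abstract differ in how rows are assigned to folds, which may explain why the blocked AUC is lower than the random one: blocking by particle identifier keeps every record of a given particle in a single fold, so the model is always scored on particles it has not seen. The following is a minimal sketch of that distinction in plain Python, together with a rank-based AUC. It is not the authors' pipeline: the BRT fit itself is omitted, and the function names, the particle identifiers, and the toy labels and scores are hypothetical illustrations rather than ToMEx 2.0 data.

```python
import random
from collections import defaultdict


def random_folds(n_rows, k, seed=0):
    """Random cross-validation: shuffle rows, then deal them into k folds."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % k
    return folds


def blocked_folds(groups, k, seed=0):
    """Blocked cross-validation: rows sharing a group id share a fold."""
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    rng.shuffle(unique_groups)
    fold_of_group = {g: position % k for position, g in enumerate(unique_groups)}
    return [fold_of_group[g] for g in groups]


def groups_split_across_folds(groups, folds):
    """Count groups whose rows ended up in more than one fold."""
    folds_seen = defaultdict(set)
    for g, f in zip(groups, folds):
        folds_seen[g].add(f)
    return sum(1 for fs in folds_seen.values() if len(fs) > 1)


def auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical particle identifiers, one per toxicity record.
particle_ids = ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3", "P4", "P4"]

random_assignment = random_folds(len(particle_ids), k=2)
blocked_assignment = blocked_folds(particle_ids, k=2)

# Blocking guarantees that no particle is split between folds.
leaks = groups_split_across_folds(particle_ids, blocked_assignment)  # 0

# Toy labels (1 = effect observed) and made-up predicted probabilities.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```

Random assignment gives no such guarantee, so records of the same particle can land in both the training and the test fold. In a full analysis, a classifier would be fitted on the training folds and `auc` applied to its held-out predictions under each scheme.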
