0
Article ? AI-assigned paper type based on the abstract. Classification may not be perfect — flag errors using the feedback button. Sign in to save

Prediction of 35 Target Per- and Polyfluoroalkyl Substances (PFASs) in California Groundwater Using Multilabel Semisupervised Machine Learning

Frontiers in Plant Science 2023 22 citations ? Citation count from OpenAlex, updated daily. May differ slightly from the publisher's own count.
Jialin Dong, Gabriel Tsai, Christopher I. Olivares

Summary

Researchers built a machine learning pipeline trained on 25,000 groundwater observations across California to predict contamination levels of 35 individual PFAS compounds, achieving up to 100% accuracy and identifying contamination sources and soil characteristics as the strongest predictive factors.

Comprehensive monitoring of perfluoroalkyl and polyfluoroalkyl substances (PFASs) is challenging because of the high analytical cost and an increasing number of analytes. We developed a machine learning pipeline to understand environmental features influencing PFAS profiles in groundwater. By examining 23 public data sets (2016–2022) in California, we built a state-wide groundwater database (25,000 observations across 4200 wells) encompassing contamination sources, weather, air quality, soil, hydrology, and groundwater quality (PFASs and cocontaminants). We used supervised learning to prescreen total PFAS concentrations above 70 ng/L and multilabel semisupervised learning to predict 35 individual PFAS concentrations above 2 ng/L. Random forest with ADASYN oversampling performed the best for total PFASs (AUROC 99%). XGBoost with SMOTE oversampling achieved the AUROC of 73–100% for individual PFAS prediction. Contamination sources and soil variables contributed the most to accuracy. Individual PFASs were strongly correlated within each PFAS’s subfamily (i.e., short- vs long-chain PFCAs, sulfonamides). These associations improved prediction performance using classifier chains, which predicts a PFAS based on previously predicted species. We applied the model to reconstruct PFAS profiles in groundwater wells with missing data in previous years. Our approach can complement monitoring programs of environmental agencies to validate previous investigation results and prioritize sites for future PFAS sampling.

Share this paper