Microplastics and Trash Cleaning and Harmonization (MaTCH): Semantic Data Ingestion and Harmonization Using Artificial Intelligence
Summary
Researchers developed an AI-powered tool called MaTCH that can automatically harmonize microplastic pollution datasets that use different formats, naming conventions, and measurement methods. The algorithm achieved 71-94% accuracy when matching semantically different descriptors across research groups. This open-source tool addresses a critical need in the field, where thousands of inconsistent categorical descriptors have made it difficult to compare and integrate plastic pollution data across studies.
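The descriptor-matching step can be illustrated with a minimal sketch. It assumes a small hypothetical controlled vocabulary (`VOCABULARY`) and hypothetical helpers `normalize` and `match_descriptor`, none of which come from MaTCH. It also replaces the AI-based semantic matching described here with plain string similarity from Python's standard-library `difflib`. As a result, it only catches listed aliases and spelling variants, not true synonyms that share few characters.

```python
import difflib
import re
from typing import Optional

# Hypothetical controlled vocabulary (canonical term -> known aliases).
# Illustrative entries only; this is not MaTCH's vocabulary.
VOCABULARY = {
    "polyethylene": ["pe", "poly ethylene"],
    "polypropylene": ["pp", "poly propylene"],
    "polystyrene": ["ps", "poly styrene"],
    "fiber": ["fibre", "filament"],
    "fragment": ["frag", "hard fragment"],
}


def normalize(text: str) -> str:
    """Lowercase, replace runs of non-alphanumeric characters with one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Flat lookup from every normalized alias (and the canonical term itself) to its canonical term.
ALIAS_TO_TERM = {
    normalize(alias): term
    for term, aliases in VOCABULARY.items()
    for alias in [term, *aliases]
}


def match_descriptor(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw descriptor to a canonical term, or return None if nothing is close enough."""
    key = normalize(raw)
    if key in ALIAS_TO_TERM:  # exact alias hit after normalization
        return ALIAS_TO_TERM[key]
    # Fallback: closest alias by difflib's similarity ratio (lexical, not semantic).
    close = difflib.get_close_matches(key, list(ALIAS_TO_TERM), n=1, cutoff=cutoff)
    return ALIAS_TO_TERM[close[0]] if close else None


for raw in ["PE", "Poly-Ethylene", "Fibres", "fragments", "glass bead"]:
    print(f"{raw!r} -> {match_descriptor(raw)}")
```

Descriptors that score below the cutoff come back as `None`, so they could be routed to manual review rather than being forced onto the nearest term. The `cutoff` value trades recall for precision and is an illustrative choice.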
With the rapid expansion of microplastic research and its reliance on semantic descriptors, there is a growing need to harmonize plastic pollution data. Data standards have been developed but are seldom implemented across research sectors, geographic regions, environmental media, or plastic size classes. Harmonizing existing data is hindered by increasingly large datasets that use thousands of different categorical descriptors, by the variety of metrics used to report particle abundance, and by the differing size ranges studied across groups. In this study, we used manually developed relational databases to build an artificial intelligence-based algorithm that automatically curates harmonized, more usable datasets describing micro- to macroplastic pollution in the environment. The resulting algorithm, MaTCH (Microplastics and Trash Cleaning and Harmonization), harmonizes datasets that differ in format, nomenclature, methods, and measured particle characteristics, matching semantic descriptors with an accuracy of 71-94%. All other, non-semantic corrections are reported with a 95% confidence interval and model uncertainty. All steps of the algorithm are integrated into an open-source software tool, which benefits the scientific community and eases the integration of all plastic pollution data.
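One way to harmonize counts reported over different size ranges is to rescale them under an assumed power-law particle size distribution, a technique used in the microplastics literature. The abstract does not say whether MaTCH does this, so the sketch below is only an illustration under that assumption. `powerlaw_count`, `rescale_count`, and `rescale_with_interval` are hypothetical helpers. Sizes are assumed to be in micrometers, and the exponent `alpha` and its spread are illustrative inputs, not values from the study. The interval is a Monte Carlo 95% range over normally distributed draws of `alpha`, which is one simple way to attach model uncertainty to a non-semantic correction.

```python
import math
import random
import statistics


def powerlaw_count(lower_um: float, upper_um: float, alpha: float) -> float:
    """Relative particle count between two sizes when particles per unit size scale as x**(-alpha)."""
    if not 0 < lower_um < upper_um:
        raise ValueError("size bounds must satisfy 0 < lower < upper")
    if math.isclose(alpha, 1.0):
        # The integral of 1/x is a logarithm, so alpha = 1 gets its own branch.
        return math.log(upper_um / lower_um)
    return (upper_um ** (1 - alpha) - lower_um ** (1 - alpha)) / (1 - alpha)


def rescale_count(count, measured_um, target_um, alpha):
    """Hypothetical helper: rescale a count observed in `measured_um` to the `target_um` size range."""
    factor = powerlaw_count(*target_um, alpha) / powerlaw_count(*measured_um, alpha)
    return count * factor


def rescale_with_interval(count, measured_um, target_um, alpha_mean, alpha_sd,
                          n_draws=5000, seed=0):
    """Hypothetical helper: point estimate plus a Monte Carlo 95% interval over normally distributed alpha."""
    rng = random.Random(seed)
    draws = [
        rescale_count(count, measured_um, target_um, rng.gauss(alpha_mean, alpha_sd))
        for _ in range(n_draws)
    ]
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    point = rescale_count(count, measured_um, target_um, alpha_mean)
    return point, cuts[0], cuts[-1]


# Illustrative only: 120 particles counted between 300 and 5000 um, rescaled to
# 1-5000 um with an assumed alpha of 1.6 +/- 0.1 (not a value from the study).
point, low, high = rescale_with_interval(120, (300, 5000), (1, 5000), 1.6, 0.1)
print(f"{point:.0f} particles (95% interval {low:.0f} to {high:.0f})")
```

The correction factor is a ratio of two integrals of the same power law, so the normalizing constant cancels. Only the size bounds and `alpha` matter, which is why uncertainty in `alpha` dominates the width of the interval.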