Language-Agnostic Modeling of Source Reliability on Wikipedia

ACM Transactions on the Web, 2025
Jacopo D'Ignazi, Andreas Kaltenbrunner, Yelena Mejova, Michele Tizzani, Kyriaki Kalimeri, Mariano G. Beiró, Pablo Aragón

Summary

Researchers developed a language-agnostic machine learning model to assess the reliability of web domains used as references across multiple Wikipedia language editions, evaluating domain reliability within articles on topics of varying controversiality, such as Climate Change, COVID-19, History, Media, and Biology.

Over the last few years, verifying the credibility of information sources has become a fundamental need to combat disinformation. Here, we present a language-agnostic model designed to assess the reliability of web domains as sources in references across multiple language editions of Wikipedia. Using editing activity data, the model evaluates domain reliability within articles of varying controversiality, on topics such as Climate Change, COVID-19, History, Media, and Biology. By crafting features that express domain usage across articles, the model effectively predicts domain reliability, achieving a macro F1 score of approximately 0.80 for English and other high-resource languages. For mid-resource languages, we achieve 0.65, while the performance for low-resource languages varies. In all cases, the time the domain remains present in the articles (which we dub permanence) is one of the most predictive features. We highlight the challenge of maintaining consistent model performance across languages of varying resource levels and demonstrate that adapting models from higher-resource languages can improve performance. We believe these findings can assist Wikipedia editors in their ongoing efforts to verify citations and may offer useful insights for other user-generated content communities.
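
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score is computed in general: the per-class F1 scores are averaged with equal weight, so a rarely occurring class counts as much as a common one. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper; this is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up toy labels (assumption): 1 = reliable domain, 0 = unreliable domain
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally, a model that performs well only on the majority class is penalized, which is one reason the metric is a common choice when reliable and unreliable labels are imbalanced.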
