Data Coverage
Not every paper in Atlas has every field populated. This page explains what's included, what's missing, and why — so you know exactly what you're searching, browsing, and filtering.
See also: Methodology
Corpus at a glance
| Metric | Papers |
|---|---|
| Total papers | 111,495 |
| Visible (relevance ≥ 30) | 69,080 |
| Excluded by relevance | 42,415 |
Papers
Atlas ingests every paper matching microplastics-related queries from OpenAlex and PubMed. Not all of them are shown to users.
| Filter | Excluded | Why |
|---|---|---|
| Relevance < 30 | 42,415 | Materials science papers about "microplasticity" (plastic deformation in metals) and other off-topic matches. Kept in the database but hidden from browse and search. |
| Visibility: delisted | 0 | Manually or automatically delisted after review (duplicates, retractions, non-research items). |
69,080 papers pass all filters and are visible on Atlas.
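The visibility rule in the table above can be sketched as a simple predicate. This is a minimal illustration, not Atlas's actual code: the `Paper` record, its field names, and the constant name are hypothetical. Only the threshold of 30 and the counts come from this page.

```python
from dataclasses import dataclass

RELEVANCE_THRESHOLD = 30  # threshold stated on this page


@dataclass
class Paper:
    # Hypothetical record shape; the field names are assumptions.
    title: str
    relevance: int
    delisted: bool = False


def is_visible(paper: Paper) -> bool:
    """Shown only if the paper meets the relevance threshold and is not delisted."""
    return paper.relevance >= RELEVANCE_THRESHOLD and not paper.delisted


# Sanity check on the published counts: total minus both exclusions.
total, low_relevance, delisted = 111_495, 42_415, 0
visible = total - low_relevance - delisted
print(visible)  # 69080
```

The check at the end reproduces the visible count from the two exclusion rows.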
Abstracts
An abstract is required for almost every downstream enrichment step. Papers without abstracts are functionally title-only records.
| Status | Papers | Share of corpus |
|---|---|---|
| Have abstract | 97,294 | 87.3% |
| Missing abstract | 14,201 | 12.7% |
Why abstracts are missing
- Publisher withholds abstract from open metadata (common with Elsevier, Springer Nature paywalled content)
- Book chapters and conference proceedings often lack structured abstracts
- Editorials, letters, and news pieces have no formal abstract
Without an abstract, a paper cannot receive a summary, embedding, classification, or keyword annotation. It appears in keyword search and title browse only. We periodically scrape publisher pages via DOI to recover missing abstracts.
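The coverage percentages on this page appear to be each count divided by the total corpus size, rounded to one decimal place. A small sketch of that arithmetic, using the abstract counts as input (the helper name `coverage` is hypothetical):

```python
def coverage(have: int, total: int) -> float:
    """Share of the corpus with a field populated, as a percentage with one decimal."""
    return round(100 * have / total, 1)


TOTAL = 111_495          # total papers in the corpus
with_abstract = 97_294   # papers that have an abstract
missing_abstract = TOTAL - with_abstract

print(missing_abstract)                    # 14201
print(coverage(with_abstract, TOTAL))      # 87.3
print(coverage(missing_abstract, TOTAL))   # 12.7
```

The same calculation reproduces the summary, embedding, and annotation percentages from their respective counts.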
Summaries
Each paper's abstract is rewritten into a plain-language summary by AI. A paper gets a summary only if it has an abstract.
| Status | Papers | Share of corpus |
|---|---|---|
| Have summary | 79,684 | 71.5% |
| No summary | 31,811 | 28.5% |
What's excluded
- Papers without an abstract (no input text to summarize)
- Low-relevance papers (score < 30) are deprioritized for summary generation
Of the 69,080 visible papers, 9,740 lack summaries because they have no abstract. All visible papers with abstracts have summaries.
Embeddings (Semantic Search)
Each paper's title and summary are encoded into a 1024-dimensional vector by VoyageAI. These vectors power "More Papers Like This" and semantic search.
| Status | Papers | Share of corpus |
|---|---|---|
| Have embedding | 108,567 | 97.4% |
| No embedding | 2,928 | 2.6% |
What's excluded
- Papers without an abstract (no meaningful text to encode)
Papers without embeddings will not appear in semantic search results or "More Papers Like This" recommendations. They still appear in keyword and full-text search.
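To illustrate why a paper without an embedding cannot appear in similarity results, here is a minimal sketch of vector-based recommendation. It assumes cosine similarity as the metric, which this page does not specify, and uses toy 3-dimensional vectors in place of the 1024-dimensional ones. The function names and paper IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=3):
    """Rank the other papers by similarity to the query paper.

    Papers with no embedding are absent from `embeddings`, so they can
    never be returned here.
    """
    query = embeddings[query_id]
    scored = [
        (paper_id, cosine_similarity(query, vector))
        for paper_id, vector in embeddings.items()
        if paper_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only.
embeddings = {
    "paper-a": [1.0, 0.0, 0.0],
    "paper-b": [0.9, 0.1, 0.0],
    "paper-c": [0.0, 1.0, 0.0],
}
print([pid for pid, _ in most_similar("paper-a", embeddings, k=2)])
# ['paper-b', 'paper-c']
```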
Rankings
Rankings apply additional quality filters beyond the base relevance threshold. These filters prevent non-research items from inflating institution, author, and country metrics.
| Filter | What it removes |
|---|---|
| Relevance < 30 | Same base filter as paper browse — off-topic materials science papers |
| No recorded authors | Journal housekeeping records (table of contents, indexes) that OpenAlex sometimes classifies as research |
| > 50 co-authors | Conference proceedings, multi-consortium announcements, and bulk-indexed journal volumes. Including them would inflate every listed institution's count. |
| Housekeeping titles | Titles matching "Table of Contents", "Editorial Board", "Issue Information", "Front Matter", or "Contents List" — not research, regardless of other metadata |
These filters are applied on top of the base relevance threshold. A paper can be visible in browse/search but excluded from rankings. See Rankings Methodology for full details.
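The four ranking filters can be combined into one eligibility check, sketched below. The record shape, the field names, and the case-insensitive substring match on titles are assumptions made for illustration. The page says only that titles "match" the listed phrases, and "> 50 co-authors" is read here as more than 50 listed authors.

```python
from dataclasses import dataclass, field

RELEVANCE_THRESHOLD = 30
MAX_AUTHORS = 50  # interpretation of the "> 50 co-authors" rule
HOUSEKEEPING_PHRASES = (
    "table of contents",
    "editorial board",
    "issue information",
    "front matter",
    "contents list",
)


@dataclass
class Paper:
    # Hypothetical record shape; the field names are assumptions.
    title: str
    relevance: int
    authors: list = field(default_factory=list)


def is_housekeeping(title: str) -> bool:
    """Assumed matching rule: case-insensitive substring search."""
    lowered = title.lower()
    return any(phrase in lowered for phrase in HOUSEKEEPING_PHRASES)


def counts_toward_rankings(paper: Paper) -> bool:
    """Apply the four filters in the order they are listed in the table."""
    if paper.relevance < RELEVANCE_THRESHOLD:
        return False
    if not paper.authors:
        return False
    if len(paper.authors) > MAX_AUTHORS:
        return False
    if is_housekeeping(paper.title):
        return False
    return True
```

Under this sketch, a relevant paper with an empty author list would still pass the browse filter (relevance ≥ 30) while being excluded from rankings, which matches the distinction drawn above.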
Annotations
Papers are tagged with structured metadata (polymers, body systems, animal models, study type) using rule-based keyword matching against the title and abstract.
| Status | Papers | Share of corpus |
|---|---|---|
| Annotated | 86,257 | 77.4% |
| Not annotated | 25,238 | 22.6% |
What's excluded
- Papers without an abstract (no text to scan for keywords)
- Papers that use clinical terminology not in the keyword dictionary (e.g., "gonadal" instead of "reproductive")
Annotations are indicative, not exhaustive. A paper may study a polymer or body system without using Atlas's exact keywords. Filter results should be treated as a lower bound.
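A minimal sketch of rule-based keyword matching shows why filter results are a lower bound. The dictionary below is hypothetical and far smaller than anything Atlas would use. The whole-word, case-insensitive matching rule is also an assumption.

```python
import re

# Hypothetical keyword dictionary: category -> label -> trigger terms.
KEYWORDS = {
    "body_system": {
        "reproductive": ["reproductive"],
        "respiratory": ["lung", "respiratory"],
    },
    "polymer": {
        "polystyrene": ["polystyrene"],
        "polyethylene": ["polyethylene"],
    },
}


def annotate(title: str, abstract: str) -> dict:
    """Tag a paper with every label whose trigger term appears as a whole word."""
    text = f"{title} {abstract}".lower()
    tags = {}
    for category, labels in KEYWORDS.items():
        matched = sorted(
            label
            for label, terms in labels.items()
            if any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)
        )
        if matched:
            tags[category] = matched
    return tags


tags = annotate(
    "Polystyrene particles and gonadal toxicity in mice",
    "We observed gonadal damage after exposure.",
)
print(tags)  # {'polymer': ['polystyrene']}
```

The paper is tagged with its polymer but gets no body-system tag, because "gonadal" is not a trigger term in this dictionary. A filter on the reproductive system would miss it.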
How it flows together
Each step depends on the one before it:
A paper missing step 2 (abstract) will also be missing steps 3–6. This is the primary driver of incomplete coverage across all dimensions.
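The cascade can be modeled as a chain in which the first missing step blocks everything after it. This is a simplified sketch of the statement above, not the actual pipeline: the step names are borrowed from the enrichment list in the Abstracts section, and their order here is an assumption.

```python
# Assumed step names and order, for illustration only.
PIPELINE = ("abstract", "summary", "embedding", "classification", "annotation")


def completed_steps(populated: set) -> list:
    """Walk the chain in order and stop at the first step that is missing."""
    done = []
    for step in PIPELINE:
        if step not in populated:
            break
        done.append(step)
    return done


# A title-only record: nothing downstream of the abstract can run.
print(completed_steps(set()))                    # []
# An abstract and a summary, but no embedding yet.
print(completed_steps({"abstract", "summary"}))  # ['abstract', 'summary']
```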
These numbers update as the corpus grows. For how each enrichment step works, see Methodology.