
Assessment of Performance and Uncertainty in Qualitative Analytical Chemistry: A Metrological Approach based on the Eurachem/CITAC guide

Brazilian Journal of Analytical Chemistry 2026
Ariadne Rocha, Felipe Rebello Lourenço, Elcio de Oliveira

Summary

Not relevant to microplastics — this paper reviews the Eurachem/CITAC 2021 guide for assessing uncertainty and performance in qualitative analytical chemistry, with a brief illustrative mention of Raman spectroscopy applied to microplastic identification as one example of the methodology.

While the aim of "Quantifying Uncertainty in Analytical Measurement" is to provide a standardized framework and methodology for estimating and expressing uncertainty in quantitative analytical measurements, the Eurachem/CITAC guide for qualitative analysis extends the concept of uncertainty to categorical decisions, allowing qualitative analyses to be evaluated with the same metrological and statistical rigor. Published in 2021, the Eurachem/CITAC Guide on Assessment of Performance and Uncertainty in Qualitative Chemical Analysis is considered a key methodological reference for assessing the reliability of qualitative outcomes. Qualitative tests yield categorical results, such as the presence or absence of a substance or the identification of a compound. Although such results do not yield a directly measured numerical value, they are not exempt from uncertainty. The guide proposes quantifying the uncertainties associated with the probability of false-positive or false-negative results in order to inform users of the method's reliability limits.1 In this context, the document provides different methodological approaches that enable qualitative decisions to be treated with the same statistical rigor as quantitative analyses. It presents practical examples illustrating how these approaches can be implemented, recognizing that the outcome of a qualitative method, such as the identification of a compound or the confirmation of a substance's presence, is subject to errors and uncertainties that must be systematically evaluated and communicated.2 Defining the types of criteria used in qualitative analysis involves distinguishing between quantitative criteria, which entail converting a numerical value into a category (e.g., compliant or non-compliant based on a threshold), and qualitative criteria, such as color change, visual observation, or other indications of presence or absence.
Although the guide primarily focuses on binary nominal classification (e.g., yes or no, present or absent), it acknowledges that, in certain categorical cases, classification can be reduced to correct or incorrect to apply the same principles. By explicitly defining the decision criteria and the statistical or probabilistic thresholds that delineate the boundary between categories, the guide ensures metrological traceability and transparency in the interpretation of results. As a central tool for characterizing the performance of qualitative methods, the guide introduces metrics based on false results. Laboratories are advised to collect samples with known or reference results, apply the method under evaluation to these cases, and estimate the frequency with which the process fails to correctly classify an item. From these results, the proportions of true positives, true negatives, false positives, and false negatives are calculated. These proportions allow the estimation of error probabilities and the construction of confidence intervals, thereby expressing statistical uncertainty. Treated as direct measures of uncertainty, these error rates provide qualitative analyses with a quantitative dimension of reliability, which is important when analytical results affect technical, regulatory, or scientific decisions.3,4 A crucial aspect mentioned by the guide is the representativeness and diversity of the cases used in performance estimation. If the test cases are too homogeneous or not representative of real conditions, the estimated error rates may underestimate or overestimate the actual uncertainty. Therefore, it is recommended to use test samples that cover the expected range of conditions, including different matrices, varying levels of interferents, and operational variations, to ensure the reliability estimate is robust. 
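The workflow described above can be sketched in a few lines: count the four confusion outcomes on reference samples, convert them to error proportions, and attach confidence intervals. The confusion counts below are hypothetical illustrative values, not taken from the paper, and the Wilson score interval is used here only as one common choice of interval estimator.

```python
# Sketch: estimating false-positive/false-negative rates of a qualitative
# method from reference samples, with 95% confidence intervals.
# All counts are hypothetical, for illustration only.
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score ~95% confidence interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Hypothetical confusion counts from applying the method to reference samples
tp, fn = 46, 4    # 50 samples known to contain the analyte
tn, fp = 57, 3    # 60 samples known to be analyte-free

false_negative_rate = fn / (tp + fn)   # probability of missing the analyte
false_positive_rate = fp / (tn + fp)   # probability of a spurious detection

lo, hi = wilson_interval(fn, tp + fn)
print(f"FN rate: {false_negative_rate:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
lo, hi = wilson_interval(fp, tn + fp)
print(f"FP rate: {false_positive_rate:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Reporting the interval alongside the point estimate is what turns an error rate into an expression of statistical uncertainty, as the guide intends.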
This recommendation becomes especially relevant in methodologies based on spectroscopy and measurement instruments, where spectral variability and signal overlap among similar constituents pose practical challenges. By requiring sample sets that are representative of real analysis conditions, the guide promotes validation that reflects the natural variability of matrices and the presence of interferents, which is essential for realistically estimating the probability of incorrect classifications.5-7 In addition to representativeness, the number of samples used for performance estimation plays a critical role in the reliability and stability of validation results. Although no universally fixed minimum sample size can be defined, since this number depends on the analytical objective, system complexity, and data variability, previous validation studies conducted under simple or simulated scenarios have shown that insufficient sample numbers can lead to unstable or overly optimistic estimates of error rates, particularly in qualitative and classification problems. As a general indication, validation studies require sample sizes on the order of several tens per class to achieve stable and convergent performance estimates, with larger sample sizes becoming necessary as model complexity increases.8 Therefore, representativeness and sample size should be considered jointly to ensure that estimated probabilities of incorrect classification realistically reflect the uncertainty associated with real analytical conditions. When it comes to selectivity, the guide allows for a reinterpretation of this traditional concept. Instead of remaining a purely qualitative attribute, selectivity is viewed as a measurable parameter linked to the probability of correctly distinguishing between samples containing similar compounds or interferents.
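The sample-size point can be made concrete with a small simulation. Assuming, purely for illustration, a true false-positive probability of 5%, repeatedly estimating that rate from small batches of negative reference samples shows how unstable the estimate is at low n and how it converges as n grows.

```python
# Illustrative simulation (not from the paper): spread of an estimated
# false-positive rate as the number of negative reference samples grows.
# The true FP probability of 5% is an assumed value.
import random
import statistics

random.seed(42)
TRUE_FP = 0.05  # assumed true false-positive probability

def estimate_fp_rate(n_samples):
    """Estimate the FP rate from n negative reference samples."""
    false_positives = sum(random.random() < TRUE_FP for _ in range(n_samples))
    return false_positives / n_samples

for n in (10, 50, 200):
    estimates = [estimate_fp_rate(n) for _ in range(1000)]
    print(f"n={n:4d}: mean={statistics.mean(estimates):.3f}, "
          f"sd={statistics.pstdev(estimates):.3f}")
```

With n = 10, many runs estimate a 0% error rate outright (an overly optimistic result); by n = 200 the estimates cluster tightly around the true value, matching the "several tens per class" guidance cited above.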
Thus, evaluating selectivity becomes an exercise in pattern recognition and probabilistic classification, where performance is expressed by the method’s ability to assign each sample to its category correctly. This perspective broadens the understanding of selectivity, associating it with the statistical reliability of the response rather than solely the absence of visual or instrumental interference.9-13 The practical application of this methodology is especially relevant in non-targeted analyses, where the goal is to detect complex patterns and identify substances across diverse matrices without predefined targets. In such cases, the method must correctly differentiate signals corresponding to distinct compounds, even in the presence of instrumental noise and spectral overlap. Recent studies illustrate this by developing an automated approach for identifying microplastics using Raman spectroscopy, addressing the challenges of spectral variability and signal overlap among similar polymers. Instead of relying on an arbitrary similarity value, the method used a correlation distribution obtained via bootstrap sampling to determine the practical acceptance threshold, aligning with the guide's recommendation to base qualitative decisions on probabilistic metrics and explicit performance assessments. This statistically controlled approach has been shown to significantly reduce classification errors, providing known confidence levels for each decision and bringing the analytical process closer to a metrologically traceable system where each decision is supported by uncertainty and performance estimates.14 When Raman spectroscopy is applied to quantify species in reactive mixtures, as in studies of urea and thiourea, adherence to the principles outlined in the guide is essential. 
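The bootstrap-threshold idea mentioned above can be sketched as follows. Rather than fixing an arbitrary similarity cutoff, one resamples the observed correlation scores between replicate spectra of a known material and its library entry, and derives the acceptance threshold from the lower tail of that distribution. The correlation values and the specific quantile choice below are hypothetical, intended only to illustrate the principle, not to reproduce the cited method.

```python
# Sketch: data-driven acceptance threshold for spectral matching via
# bootstrap resampling. Correlation scores are hypothetical values for
# replicate spectra of one reference polymer vs. its library entry.
import random
import statistics

random.seed(0)

scores = [0.94, 0.96, 0.91, 0.97, 0.93, 0.95, 0.92, 0.96, 0.90, 0.95]

def bootstrap_threshold(scores, alpha=0.05, n_boot=5000):
    """Median of bootstrapped lower alpha-quantiles of the correlation
    distribution; matches scoring below it are treated as rejections."""
    quantiles = []
    for _ in range(n_boot):
        resample = sorted(random.choice(scores) for _ in scores)
        quantiles.append(resample[int(alpha * len(resample))])
    quantiles.sort()
    return quantiles[n_boot // 2]

threshold = bootstrap_threshold(scores)
print(f"acceptance threshold: {threshold:.3f}")
```

Because the threshold is tied to an empirical distribution rather than a convention, each accept/reject decision inherits a stated confidence level, which is the guide's core recommendation for qualitative decisions.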
Although the primary objective of such studies is to estimate concentrations from Raman signals quantitatively, a qualitative component remains inherent in spectral assignment, band identification, and the differentiation of interfering signals, steps that carry a risk of interpretative error. The guide emphasizes that any implicit qualitative decision, for instance, assigning a peak to a specific vibrational mode or determining whether a signal belongs to the analyte or to noise, is subject to uncertainty, and this uncertainty should be expressed in terms of error probabilities. In the urea/thiourea system, this involves evaluating the likelihood of misinterpretations—such as mistaking an interfering band for the analyte or overlooking weak peaks buried in noise—and designing validation experiments that account for varying compound ratios, noise levels, and instrumental conditions.15 Moreover, the guide recommends that qualitative conclusions, such as band assignments or confirmation of analyte presence, be accompanied by a confidence statement or an estimate of the local error probability derived from previously assessed spectral error rates, thereby enhancing transparency and discouraging absolute interpretations of spectroscopic signals. The systematic implementation of these guidelines also fosters a culture of continuous performance monitoring in laboratories. By acknowledging that even seemingly unambiguous spectral assignments may fail, researchers are encouraged to establish controls, retest protocols, and periodically review the criteria used to discriminate signals.16,17 Beyond individual method performance, interlaboratory comparability of qualitative decisions is equally critical to ensure reproducibility across different analytical contexts. 
In this regard, the joint IUPAC/CITAC (2025) guide enhances metrological capacity by proposing a statistical framework specifically designed for analyzing agreement in categorical results obtained across laboratories, operators, or instruments. Tools such as CATANOVA (Categorical Analysis of Variance) and ORDANOVA (Ordinal Analysis of Variance) enable the treatment of nominal and ordinal variables, respectively, quantifying the degree of agreement among classifications and identifying sources of systematic variability between laboratories. This statistical evaluation is crucial for validating automated decision systems in qualitative methods. Consequently, it ensures that a method not only performs reliably within a single laboratory but also yields comparable decisions across laboratories, operators, and instruments.
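The flavor of a CATANOVA-style analysis can be conveyed with a minimal sketch: for nominal data, variation is measured with the Gini diversity index, and the total variation across pooled results is decomposed into a within-laboratory and a between-laboratory component. The lab names and polymer classifications below are hypothetical, and this is a simplified Gini-based decomposition, not the full CATANOVA procedure from the IUPAC/CITAC guide.

```python
# Minimal sketch of a CATANOVA-style variation decomposition for nominal
# data (Gini-based). The classifications ("PE", "PP", "PS") assigned by
# three hypothetical labs to the same five items are invented examples.
from collections import Counter

def gini_variation(labels):
    """Gini diversity index: 1 minus the sum of squared category proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

labs = {
    "lab_A": ["PE", "PE", "PP", "PE", "PS"],
    "lab_B": ["PE", "PP", "PP", "PE", "PS"],
    "lab_C": ["PE", "PE", "PP", "PP", "PS"],
}

pooled = [label for results in labs.values() for label in results]
total = gini_variation(pooled)                                   # total variation
within = sum(gini_variation(r) for r in labs.values()) / len(labs)  # avg within-lab
between = total - within                                         # between-lab share

print(f"total={total:.3f} within={within:.3f} between={between:.3f}")
```

A small between-lab component relative to the total indicates that laboratories classify items consistently; a large one flags systematic disagreement worth investigating.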
