Common problems with the usage of F-measure and accuracy metrics in medical research

Lavazza, Luigi; Morasca, Sandro
2023-01-01

Abstract

Problem: Binary classifiers are widely used in medical research, especially for diagnoses. They are usually evaluated via performance metrics computed from confusion matrices. Accuracy and F-measure are among the most frequently used performance metrics, but they make implicit assumptions and do not take into account important characteristics of classifiers. As a consequence, evaluations based on Accuracy or F-measure may turn out to be incorrect, unreliable, and inadequate for the specific application context. The usage of Accuracy and F-measure is particularly critical in the medical domain, where selecting a sub-optimal classifier may lead to incorrect diagnoses, with potentially serious or even fatal consequences.

Aim: We investigated whether the improper or naive usage of Accuracy and F-measure can lead to partial or incorrect evaluations. If this is the case, we need a procedure to reinterpret the conclusions reported in research articles, whenever possible.

Method: After discussing a few important properties of Accuracy and F-measure, we examine a set of representative research articles to assess their conclusions, and illustrate a procedure to reinterpret those conclusions.

Results: The conclusions of the examined research articles appear to be largely affected by the performance metrics used, which in some cases lead to very misleading conclusions. Applying the proposed procedure allows the retrieval of confusion matrices and the derivation of reliable indications of classifiers’ performance.

Conclusion: F-measure and Accuracy should be used with care, with awareness of their characteristics and limits. We recommend that future evaluations of binary classifiers be provided with the complete confusion matrices, so that users can formulate evaluations based on specific contexts and priorities.
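As a minimal illustration of why the abstract asks for complete confusion matrices, the sketch below computes Accuracy and F-measure (here F1) directly from the four cells. The `ConfusionMatrix` class and all counts are hypothetical and chosen only for illustration; they are not taken from the article, and this is not the authors' reinterpretation procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts for a binary classifier (positive = disease present)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def f1(self) -> float:
        # F1 = 2TP / (2TP + FP + FN); note that TN does not appear.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


# Hypothetical cohort: 1000 patients, 50 of whom have the disease.
always_negative = ConfusionMatrix(tp=0, fp=0, fn=50, tn=950)
detector = ConfusionMatrix(tp=40, fp=60, fn=10, tn=890)
# Same positive-side counts as `detector`, but ten times as many true negatives.
detector_more_tn = ConfusionMatrix(tp=40, fp=60, fn=10, tn=8900)

for name, cm in [
    ("always_negative", always_negative),
    ("detector", detector),
    ("detector_more_tn", detector_more_tn),
]:
    print(f"{name}: accuracy={cm.accuracy():.3f} "
          f"recall={cm.recall():.3f} F1={cm.f1():.3f}")
```

With these assumed counts, the classifier that never reports the disease has the higher Accuracy (0.95 versus 0.93) although it detects no patients, and F1 is identical for `detector` and `detector_more_tn` because it ignores true negatives. Neither effect is visible from a single reported score, whereas both are visible once the four cells are given.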
https://doi.org/10.1109/access.2023.3278996
Keywords: Accuracy, binary classifiers, F-measure, F-score, performance metrics.
Files in this item:
File: IEEE_Access_2023.pdf
Access: open access
Description: Main article
Type: Publisher's version (PDF)
License: Creative Commons
Size: 928.17 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2155431