Considerations on the region of interest in the ROC space

IRIS - Institutional Research Information System
IRIS è il sistema di gestione integrata dei dati della ricerca (persone, progetti, pubblicazioni, attività) adottato dall'Università degli Studi dell’Insubria.

IRInSubria - Institutional Repository Insubria
IRInSubria raccoglie, conserva, documenta e dissemina le informazioni sulla produzione scientifica dell'Università degli Studi dell’Insubria anche ai fini della valutazione della ricerca.

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that (Formula presented.) (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.

Considerations on the region of interest in the ROC space

Lavazza L.;Morasca S.

2021-01-01

Abstract

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that (Formula presented.) (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2021
			
	Anno di pubblicazione online
	
				2021
			
	Rivista
	
				STATISTICAL METHODS IN MEDICAL RESEARCH
			
	DOI
	
				https://dx.doi.org/10.1177/09622802211060515
			
	Codice PUBMED
	
				34928729
			
	Codice Web of Science
	
				WOS:000736604200001
			
	Codice Scopus
	
				2-s2.0-85122148511
			
	Parole chiave
	
				Area under the curve; discrimination capability; false positive rate; Matthews Correlation Coefficient; partial area under the curve; ratio of relevant areas; ROC (Receiver operating characteristic) curve; true positive rate.
			
	Tutti gli autori
	
						Lavazza, L.; Morasca, S.
					
	Appare nelle tipologie:
	
				Articolo su Rivista

File in questo prodotto:

File	Dimensione	Formato
SMMR2021.pdf non disponibili Descrizione: Articolo principale Tipologia: Versione Editoriale (PDF) Licenza: DRM non definito Dimensione 2.44 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	2.44 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/2123784

Citazioni

1

4

4

social impact