
Unveiling the risks of ChatGPT in diagnostic surgical pathology

2024-01-01

Abstract

ChatGPT, an AI capable of processing and generating human-like language, has been studied in medical education and care, yet its potential in histopathological diagnosis remains unexplored. This study evaluates ChatGPT’s reliability in addressing pathology-related diagnostic questions across ten subspecialties and its ability to provide scientific references. We crafted five clinico-pathological scenarios per subspecialty, simulating a pathologist using ChatGPT to refine differential diagnoses. Each scenario, aligned with current diagnostic guidelines and validated by expert pathologists, was posed as open-ended or multiple-choice questions, either requesting scientific references or not. Outputs were assessed by six pathologists according to (1) usefulness in supporting the diagnosis and (2) absolute number of errors. We used directed acyclic graphs and structural causal models to determine the effect of each scenario type, field, question modality, and pathologist evaluation. This yielded 894 evaluations. ChatGPT provided useful answers in 62.2% of cases, and 32.1% of outputs contained no errors, while the remainder contained at least one error. ChatGPT provided 214 bibliographic references: 70.1% correct, 12.1% inaccurate, and 17.8% non-existing. Scenario variability had the greatest impact on ratings, and latent knowledge across fields showed minimal variation. Although ChatGPT provided useful responses in two-thirds of cases, the frequency of errors and variability underscore its inadequacy for routine diagnostic use and highlight the need for discretion when using it as a support tool. Imprecise referencing also suggests caution regarding its use as a self-learning tool. It is essential to recognize the irreplaceable role of human experts in synthesizing images, clinical data, and experience for the intricate task of histopathological diagnosis.
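As a minimal sanity check of the reference-accuracy figures in the abstract, the rounded percentages (70.1% correct, 12.1% inaccurate, 17.8% non-existing) are consistent with an integer split of the 214 references; the specific counts of 150/26/38 below are an assumption inferred from the rounding, not values stated in the record.

```python
# Assumed integer counts behind the reported percentages of the 214 references.
total = 214
counts = {"correct": 150, "inaccurate": 26, "non-existing": 38}

# The assumed counts must account for every reference...
assert sum(counts.values()) == total

# ...and reproduce the rounded percentages reported in the abstract.
for label, n in counts.items():
    print(f"{label}: {n}/{total} = {100 * n / total:.1f}%")
# correct: 150/214 = 70.1%
# inaccurate: 26/214 = 12.1%
# non-existing: 38/214 = 17.8%
```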
Keywords: Accuracy; ChatGPT; Large language model; Surgical pathology; Usefulness
Guastafierro, V; Corbitt, DN; Bressan, A; Fernandes, B; Mintemur, Ö; Magnoli, F; Ronchi, S; La Rosa, S; Uccella, S; Renne, SL

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2179653