
Investigating the impact of fault data completeness over time on predicting class fault-proneness

Morasca, Sandro
2017-01-01

Abstract

Context: The adequacy of fault-proneness prediction models in representing the relationship between the internal quality of classes and their fault-proneness relies on several factors. One of these factors is the completeness of the fault data. A fault-proneness prediction model built using fault data collected during testing, or within a relatively short period after release, may be inadequate and may not be reliable enough in predicting faulty classes. Objective: We empirically study the relationship between the time interval since a system's release and the performance of fault-proneness prediction models constructed from the fault data reported within that interval. Method: We construct prediction models using fault data collected at several time intervals since a system's release and study how well the models represent the relationship between the internal quality of classes and their fault-proneness. In addition, we empirically explore the relationship between the performance of a prediction model and the percentage of increase in the number of classes detected faulty (PIF) over time. Results: Our results show evidence in favor of the expectation that prediction models that use more complete fault data, to a certain extent, more adequately represent the expected relationship between the internal quality of classes and their fault-proneness and have better performance. A threshold based on the PIF value can be used as an indicator for deciding when to stop collecting fault data: once this threshold is reached, collecting additional fault data will not significantly improve the prediction ability of the constructed model. Conclusion: When constructing fault-proneness prediction models, researchers and software engineers are advised to rely on systems that have relatively long maintenance histories.
Researchers and software engineers can use the PIF value as an indicator for deciding when to stop collecting fault data.
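The abstract does not spell out the PIF formula; assuming PIF is the percentage increase in the number of classes detected faulty between consecutive observation intervals (the natural reading of "percentage of increase in the number of classes detected faulty"), the stopping criterion could be sketched as follows. The function names and the example threshold of 5% are illustrative, not taken from the paper.

```python
def pif(prev_faulty: int, curr_faulty: int) -> float:
    """Percentage of increase in the number of classes detected faulty
    between two consecutive fault-collection intervals (assumed definition)."""
    return 100.0 * (curr_faulty - prev_faulty) / prev_faulty

def stop_collecting(faulty_counts: list[int], threshold: float) -> bool:
    """Return True once the latest PIF falls at or below the threshold,
    i.e. further fault data is unlikely to change the model much."""
    if len(faulty_counts) < 2:
        return False
    return pif(faulty_counts[-2], faulty_counts[-1]) <= threshold

# Cumulative counts of classes detected faulty at successive intervals
# (hypothetical numbers for illustration):
counts = [100, 120, 126]
print(pif(100, 120))                  # 20.0% increase in the first step
print(stop_collecting(counts, 5.0))   # True: latest PIF is 5.0%
```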
http://www.elsevier.com/wps/find/journaldescription.cws_home/525444/description#description
Class fault-proneness; Fault data; Internal quality attributes; Object-oriented software; Quality measures; Software; Information Systems; Computer Science Applications; Computer Vision and Pattern Recognition
Al Dallal, Jehad; Morasca, Sandro
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2068911
Warning

The displayed data have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 8
  • Web of Science: 5