
Investigating the impact of fault data completeness over time on predicting class fault-proneness

Morasca, Sandro
2017-01-01

Abstract

Context: The adequacy of fault-proneness prediction models in representing the relationship between the internal quality of classes and their fault-proneness relies on several factors. One of these factors is the completeness of the fault data. A fault-proneness prediction model built using fault data collected during testing, or within a relatively short period after release, may be inadequate and may not be reliable enough in predicting faulty classes. Objective: We empirically study the relationship between the time interval since a system's release and the performance of fault-proneness prediction models constructed from the fault data reported within that interval. Method: We construct prediction models using fault data collected at several time intervals since a system's release and study how well the models represent the relationship between the internal quality of classes and their fault-proneness. In addition, we empirically explore the relationship between the performance of a prediction model and the percentage of increase in the number of classes detected faulty (PIF) over time. Results: Our results show evidence in favor of the expectation that prediction models that use more complete fault data, to a certain extent, more adequately represent the expected relationship between the internal quality of classes and their fault-proneness and have better performance. A threshold based on the PIF value can be used as an indicator for deciding when to stop collecting fault data: once this threshold is reached, collecting additional fault data will not significantly improve the prediction ability of the constructed model. Conclusion: When constructing fault-proneness prediction models, researchers and software engineers are advised to rely on systems that have relatively long maintenance histories.
Researchers and software engineers can use the PIF value as an indicator for deciding when to stop collecting fault data.
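The abstract does not spell out the PIF formula; assuming PIF is the percentage increase in the number of classes detected faulty between consecutive observation intervals (the natural reading of "percentage of increase in the number of classes detected faulty"), the stopping criterion could be sketched as follows. The function names and the example threshold of 5% are illustrative, not taken from the paper.

```python
def pif(prev_faulty: int, curr_faulty: int) -> float:
    """Percentage of increase in the number of classes detected faulty
    between two consecutive fault-collection intervals (assumed definition)."""
    return 100.0 * (curr_faulty - prev_faulty) / prev_faulty

def stop_collecting(faulty_counts: list[int], threshold: float) -> bool:
    """Return True once the latest PIF falls at or below the threshold,
    i.e. further fault data is unlikely to change the model much."""
    if len(faulty_counts) < 2:
        return False
    return pif(faulty_counts[-2], faulty_counts[-1]) <= threshold

# Cumulative counts of classes detected faulty at successive intervals
# (hypothetical numbers for illustration):
counts = [100, 120, 126]
print(pif(100, 120))                  # 20.0% increase in the first step
print(stop_collecting(counts, 5.0))   # True: latest PIF is 5.0%
```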
http://www.elsevier.com/wps/find/journaldescription.cws_home/525444/description#description
Class fault-proneness; Fault data; Internal quality attributes; Object-oriented software; Quality measures; Software; Information Systems; Computer Science Applications; Computer Vision and Pattern Recognition
Al Dallal, Jehad; Morasca, Sandro
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2068911
Warning

The displayed data have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 8
  • Web of Science: 5