A tool for the measurement, storage, and pre-elaboration of data supporting the release of public datasets
LAVAZZA, LUIGI ANTONIO
2006-01-01
Abstract
The large number of projects producing open source software gives researchers the opportunity to measure software artefacts, thus producing a huge amount of data that is available for analysis. To be efficient and reliable, the process of data retrieval and analysis needs to be adequately supported by tools. In particular, measurement tools should guarantee that large numbers of artefacts are measured in a coherent and efficient way. They should also guarantee that the delivered measures have a well-specified structure and meaning, agreed upon by the community of researchers interested in analysing the data. One problem such tools have to face is that all the elements involved are highly variable: the data source can be available in different versions; the measures to be carried out can be defined in (often only slightly) different ways; and the output required usually differs from one type of analysis to another. Another non-trivial problem is that measured data have to be stored persistently in a way that lets the user retrieve not only the data, but also the metadata describing the measurements themselves. In this paper we describe a tool that addresses the requirements described above, and present a first implementation that satisfies several of these requirements.