Text categorization of commercial web pages
BINAGHI, ELISABETTA; GALLO, IGNAZIO
2008-01-01
Abstract
In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both the representation and the classification tasks. Text documents are treated as images, and the regions of interest (RoI) that contain information meaningful for categorization are identified with the support of a supervised neural network. The text within each RoI is represented by a simple scheme that takes its first k words and encodes them in a suitable form. A Kohonen Self-Organizing Map (SOM) is then applied to cluster the documents, which are subsequently labelled by a simple majority voting mechanism. The adopted solutions were evaluated through experiments in the context of on-line price comparison services.
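The last two steps of the pipeline (first-k-word representation, SOM clustering with majority-vote labelling) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the word encoding, the map size or the training schedule, so the vocabulary-index encoding in `encode_first_k`, the linear decay in `train_som`, and the reading of majority voting as "each map unit takes the most frequent label among the training documents it wins" are all assumptions made for illustration. RoI detection is omitted, and every function name is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def encode_first_k(text, k, vocab):
    """Encode the first k words as scaled vocabulary indices (assumed scheme).

    Unknown words map to 0.0 and short texts are padded with 0.0.
    """
    words = text.lower().split()[:k]
    size = len(vocab) + 1
    codes = [(vocab.get(w, -1) + 1) / size for w in words]
    return codes + [0.0] * (k - len(codes))


def best_unit(weights, v):
    """Return the grid position of the unit whose weights are closest to v."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood.

    The learning rate and neighbourhood radius decay linearly (assumed).
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for v in vectors:
            bmu = best_unit(weights, v)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


def label_units(weights, vectors, labels):
    """Give each winning unit the majority label of the documents it wins."""
    votes = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        votes[best_unit(weights, v)][y] += 1
    return {u: c.most_common(1)[0][0] for u, c in votes.items()}


def classify(weights, unit_labels, v):
    """Label v via its best unit, or the nearest labelled unit on the grid."""
    u = best_unit(weights, v)
    if u in unit_labels:
        return unit_labels[u]
    nearest = min(unit_labels,
                  key=lambda m: (m[0] - u[0]) ** 2 + (m[1] - u[1]) ** 2)
    return unit_labels[nearest]
```

In use, one would encode each labelled training document with `encode_first_k`, pass the vectors to `train_som`, call `label_units` once, and then categorize a new page with `classify`. The fallback to the nearest labelled unit is a design choice of this sketch for units that win no training document.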