An online document clustering technique for short web contents

Binaghi, Elisabetta; Gallo, Ignazio
2009-01-01

Abstract

Document clustering techniques have been applied in several areas, with the web as one of the most recent and influential. Both general-purpose and text-oriented techniques exist and can be used to cluster a collection of documents in many ways. This work proposes a novel heuristic online document clustering model that can be specialized with a variety of text-oriented similarity measures. An experimental evaluation of the proposed model was conducted in the e-commerce domain. Performance was measured using a clustering-oriented metric based on the F-measure and compared with that of other well-known approaches. The results confirm the validity of the proposed method both for batch scenarios and for online scenarios in which document collections can grow over time.
Online clustering; Short documents analysis; Similarity measures
Carullo, M.; Binaghi, Elisabetta; Gallo, Ignazio

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/1708729
Citations
  • Scopus 37
  • ISI 27