Controlling for selection bias in social media indicators through official statistics: a proposal



Stefano Maria Iacus; Giuseppe Porro; Silvia Salini; Elena Siletti

2020-01-01

Abstract

With the increase in social media usage, a huge new source of data has become available. Despite the enthusiasm linked to this revolution, one of the main outstanding criticisms of using these data is selection bias: the reference population is unknown. Nevertheless, many studies show evidence that these data constitute a valuable source because they are more timely and have finer spatial granularity. We propose to adjust statistics based on Twitter data by anchoring them to reliable official statistics through a weighted, space-time, small area estimation model. As a by-product, the proposed method also stabilizes the social media indicators, a property required of official statistics. The method can be adapted whenever official statistics exist at the proper level of granularity and social media usage within the population is known. As an example, we adjust a subjective well-being indicator of “working conditions” in Italy and combine it with relevant official statistics. The weights depend on broadband coverage and the Twitter rate at the province level, while the analysis is performed at the regional level. The resulting statistics are then compared with survey statistics on the “quality of job” at the macro-economic regional level, showing evidence of similar trends.
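The idea of weighting province-level values before analysing them at the regional level can be illustrated with a minimal sketch. This is not the paper's space-time small area estimation model: the function name, the made-up input values, and in particular the weight (broadband coverage multiplied by Twitter rate) are assumptions chosen only to show how province-level weights could enter a regional aggregate.

```python
from collections import defaultdict


def regional_weighted_mean(records):
    """Aggregate province-level indicator values to regions.

    Each record is (region, indicator, broadband_coverage, twitter_rate).
    ASSUMPTION: the weight coverage * rate is purely illustrative and is
    not the weighting scheme defined in the paper.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for region, indicator, coverage, rate in records:
        weight = coverage * rate
        weighted_sum[region] += weight * indicator
        weight_total[region] += weight
    # Skip regions whose total weight is zero to avoid division by zero.
    return {
        region: weighted_sum[region] / weight_total[region]
        for region in weighted_sum
        if weight_total[region] > 0
    }


# Hypothetical provinces in two hypothetical regions "A" and "B".
provinces = [
    ("A", 60.0, 0.5, 0.02),
    ("A", 40.0, 0.5, 0.06),
    ("B", 50.0, 0.8, 0.05),
]
result = regional_weighted_mean(provinces)
print(result)
```

In this toy input the second province of region "A" carries three times the weight of the first, so the regional value is pulled towards its indicator rather than sitting at the unweighted midpoint.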


Year	2020
Journal	Journal of Official Statistics
URL	https://content.sciendo.com/view/journals/jos/jos-overview.xml?tab_body=latestIssueToc-78033
DOI	https://dx.doi.org/10.2478/JOS-2020-0017
Web of Science code	WOS:000542688100006
Scopus code	2-s2.0-85089075536
Keywords	well-being; big data; sentiment analysis; small area estimation; weighting
All authors	Iacus, Stefano Maria; Porro, Giuseppe; Salini, Silvia; Siletti, Elena
Appears in types	Journal article

Files in this item:

File	Size	Format
JOS_2020.pdf (open access; Description: Article; Type: Publisher's version (PDF); License: Creative Commons)	679.29 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2094865

