Learning vs earning trade-off with missing or censored observations: The two-armed bayesian nonparametric beta-stacy bandit problem

IRIS - Institutional Research Information System
IRIS è il sistema di gestione integrata dei dati della ricerca (persone, progetti, pubblicazioni, attività) adottato dall'Università degli Studi dell’Insubria.

IRInSubria - Institutional Repository Insubria
IRInSubria raccoglie, conserva, documenta e dissemina le informazioni sulla produzione scientifica dell'Università degli Studi dell’Insubria anche ai fini della valutazione della ricerca.

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial.We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined byWalker and Muliere (1997). We first show some properties of the expected advantage of choosing one arm over the other, namely the monotonicity in the arm response and, limited to the case of continuous state space, the continuity in the right-censored arm response. We partially characterize optimal strategies by proving the existence of stay-with-a-winner and stay-witha-winner/switch-on-a-loser break-even points, under non-restrictive conditions that include the special cases of the simple homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state space settings are presented to illustrate the performance and flexibility of our framework.

Learning vs earning trade-off with missing or censored observations: The two-armed bayesian nonparametric beta-stacy bandit problem

Peluso, Stefano;Mira, Antonietta;Muliere, Pietro

2017-01-01

Abstract

Existing Bayesian nonparametric methodologies for bandit problems focus on exact observations, leaving a gap in those bandit applications where censored observations are crucial.We address this gap by extending a Bayesian nonparametric two-armed bandit problem to right-censored data, where each arm is generated from a beta-Stacy process as defined byWalker and Muliere (1997). We first show some properties of the expected advantage of choosing one arm over the other, namely the monotonicity in the arm response and, limited to the case of continuous state space, the continuity in the right-censored arm response. We partially characterize optimal strategies by proving the existence of stay-with-a-winner and stay-witha-winner/switch-on-a-loser break-even points, under non-restrictive conditions that include the special cases of the simple homogeneous process and the Dirichlet process. Numerical estimations and simulations for a variety of discrete and continuous state space settings are presented to illustrate the performance and flexibility of our framework.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2017
			
	Rivista
	
				ELECTRONIC JOURNAL OF STATISTICS
			
	Url
	
				https://projecteuclid.org/download/pdfview_1/euclid.ejs/1507255609
			
	DOI
	
				https://dx.doi.org/10.1214/17-EJS1342
			
	Codice Web of Science
	
				WOS:000438838500021
			
	Codice Scopus
	
				2-s2.0-85030866307
			
	Parole chiave
	
				Statistics and Probability; Statistics, Probability and Uncertainty
			
	Tutti gli autori
	
						Peluso, Stefano; Mira, Antonietta; Muliere, Pietro
					
	Appare nelle tipologie:
	
				Articolo su Rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/2076236

Attenzione

L'Ateneo sottopone a validazione solo i file PDF allegati

Citazioni

ND

1

1

social impact