Nonstochastic multi-armed bandits with graph-structured feedback

Alon, Noga; Cesa-Bianchi, Nicolo; Gentile, Claudio; Mannor, Shie; Mansour, Yishay; Shamir, Ohad

2017-01-01

Abstract

We introduce and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions and observes some subset of the associated losses. This setting naturally models several situations where knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and interpolates between the well-studied full-information setting (where all losses are revealed) and the bandit setting (where only the loss of the action chosen by the player is revealed). We provide several algorithms addressing different variants of our setting and provide tight regret bounds depending on combinatorial properties of the information feedback structure.
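The feedback model in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a fixed observation structure in which playing action i reveals the losses of every action in neighbors[i] (with i included), and it uses a generic exponential-weights update with importance-weighted loss estimates. The function name, the parameters, and the learning rate eta are all hypothetical choices made for illustration.

```python
import math
import random


def run_graph_feedback_learner(losses, neighbors, eta, seed=0):
    """Exponential weights with importance-weighted loss estimates.

    losses[t][i]: loss in [0, 1] of action i at round t (illustrative input).
    neighbors[i]: set of actions whose losses are revealed when i is played;
                  assumed to contain i itself.
    eta:          learning rate (an assumed tuning parameter).
    Returns the total loss suffered by the learner.
    """
    rng = random.Random(seed)
    k = len(neighbors)
    weights = [1.0] * k
    total_loss = 0.0
    for round_losses in losses:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        played = rng.choices(range(k), weights=probs)[0]
        total_loss += round_losses[played]
        # Restart from the normalized weights to limit numerical underflow.
        weights = list(probs)
        for j in neighbors[played]:
            # Probability that the loss of j is observed in this round:
            # total mass of the actions whose play reveals j.
            q_j = sum(probs[i] for i in range(k) if j in neighbors[i])
            estimate = round_losses[j] / q_j  # unbiased estimate of the loss of j
            weights[j] *= math.exp(-eta * estimate)
    return total_loss


if __name__ == "__main__":
    horizon = 500
    demo_losses = [[1.0, 0.0, 0.5] for _ in range(horizon)]
    full_information = [{0, 1, 2} for _ in range(3)]  # every loss revealed
    bandit = [{i} for i in range(3)]                  # only the played loss revealed
    print(run_graph_feedback_learner(demo_losses, full_information, eta=0.5))
    print(run_graph_feedback_learner(demo_losses, bandit, eta=0.1))
```

The two extremes mentioned in the abstract correspond to the two structures in the usage example: when every neighbors[i] contains all actions, each observation probability equals 1 and the update reduces to the full-information case; when neighbors[i] contains only i, only the played action is updated, as in the bandit case. Intermediate structures reveal some, but not all, of the other losses.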

Year: 2017
Journal: SIAM Journal on Computing
URL: http://epubs.siam.org/doi/pdf/10.1137/140989455
DOI: https://dx.doi.org/10.1137/140989455
Web of Science ID: WOS:000418680200004
Scopus ID: 2-s2.0-85039930361
Keywords: Graph theory; Learning from experts; Learning with partial feedback; Multi-armed bandits; Online learning; Computer Science (all); Mathematics (all)
All authors: Alon, Noga; Cesa-Bianchi, Nicolo; Gentile, Claudio; Mannor, Shie; Mansour, Yishay; Shamir, Ohad
Type: Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2069063
