TY - GEN
T1 - Characterizing truthful multi-armed bandit mechanisms
AU - Babaioff, Moshe
AU - Sharma, Yogeshwer
AU - Slivkins, Aleksandrs
PY - 2009
Y1 - 2009
N2 - We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same "best" advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties - essentially, they must separate exploration from exploitation - and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
KW - Mechanism design
KW - Multi-armed bandits
KW - Online learning
KW - Single-parameter auctions
KW - Truthful mechanisms
UR - http://www.scopus.com/inward/record.url?scp=77950572152&partnerID=8YFLogxK
U2 - 10.1145/1566374.1566386
DO - 10.1145/1566374.1566386
M3 - Conference contribution
AN - SCOPUS:77950572152
SN - 9781605584584
T3 - Proceedings of the ACM Conference on Electronic Commerce
SP - 79
EP - 88
BT - EC'09 - Proceedings of the 2009 ACM Conference on Electronic Commerce
T2 - 2009 ACM Conference on Electronic Commerce, EC'09
Y2 - 6 July 2009 through 10 July 2009
ER -