TY - JOUR
T1 - Tractable near-optimal policies for crawling
AU - Azar, Yossi
AU - Horvitz, Eric
AU - Lubetzky, Eyal
AU - Peres, Yuval
AU - Shahaf, Dafna
N1 - Publisher Copyright: © 2018 National Academy of Sciences. All rights reserved.
PY - 2018/8/7
Y1 - 2018/8/7
N2 - The problem of maintaining a local cache of n constantly changing pages arises in multiple mechanisms such as web crawlers and proxy servers. In these, the resources for polling pages for possible updates are typically limited. The goal is to devise a polling and fetching policy that maximizes the utility of served pages that are up to date. Cho and Garcia-Molina [(2003) ACM Trans Database Syst 28:390–426] formulated this as an optimization problem, which can be solved numerically for small values of n, but appears intractable in general. Here, we show that the optimal randomized policy can be found exactly in O(n log n) operations. Moreover, using the optimal probabilities to define in linear time a deterministic schedule yields a tractable policy that in experiments attains 99% of the optimum.
KW - Caching policies
KW - Scheduling optimization
KW - Web crawling
UR - http://www.scopus.com/inward/record.url?scp=85054929704&partnerID=8YFLogxK
U2 - 10.1073/pnas.1801519115
DO - 10.1073/pnas.1801519115
M3 - Article
C2 - 30038026
AN - SCOPUS:85054929704
SN - 0027-8424
VL - 115
SP - 8099
EP - 8103
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 32
ER -