TY - GEN
T1 - L1 cache filtering through random selection of memory references
AU - Etsion, Yoav
AU - Feitelson, Dror G.
PY - 2007
Y1 - 2007
N2 - Distinguishing transient blocks from frequently used blocks enables servicing references to transient blocks from a small fully-associative auxiliary cache structure. By inserting only frequently used blocks into the main cache structure, we can reduce the number of conflict misses, thus achieving higher performance and allowing the use of direct-mapped caches, which offer lower power consumption and lower access latencies. We suggest using a simple probabilistic filtering mechanism based on random sampling to identify and select the frequently used blocks. Furthermore, by using a small direct-mapped lookup table to cache the most recently accessed blocks in the auxiliary cache, we eliminate the vast majority of the costly fully-associative lookups. Finally, we show that a 16K direct-mapped L1 cache, augmented with a fully-associative 2K filter, achieves on average over 10% more instructions per cycle than a regular 16K, 4-way set-associative cache, and even ∼5% more IPC than a 32K, 4-way cache, while consuming 70%-80% less dynamic power than either of them.
AB - Distinguishing transient blocks from frequently used blocks enables servicing references to transient blocks from a small fully-associative auxiliary cache structure. By inserting only frequently used blocks into the main cache structure, we can reduce the number of conflict misses, thus achieving higher performance and allowing the use of direct-mapped caches, which offer lower power consumption and lower access latencies. We suggest using a simple probabilistic filtering mechanism based on random sampling to identify and select the frequently used blocks. Furthermore, by using a small direct-mapped lookup table to cache the most recently accessed blocks in the auxiliary cache, we eliminate the vast majority of the costly fully-associative lookups. Finally, we show that a 16K direct-mapped L1 cache, augmented with a fully-associative 2K filter, achieves on average over 10% more instructions per cycle than a regular 16K, 4-way set-associative cache, and even ∼5% more IPC than a 32K, 4-way cache, while consuming 70%-80% less dynamic power than either of them.
UR - http://www.scopus.com/inward/record.url?scp=47849089439&partnerID=8YFLogxK
U2 - 10.1109/PACT.2007.20
DO - 10.1109/PACT.2007.20
M3 - Conference contribution
AN - SCOPUS:47849089439
SN - 0769529445
SN - 9780769529448
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 235
EP - 244
BT - 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007
T2 - 16th International Conference on Parallel Architecture and Compilation Techniques, PACT 2007
Y2 - 15 September 2007 through 19 September 2007
ER -