Abstract
Recent advancements in semiconductor process technologies have unveiled the susceptibility of hardware circuits to reliability issues, especially those related to transistor aging. Transistor aging gradually degrades gate performance, eventually causing hardware to behave incorrectly. Such misbehaving hardware can result in silent data corruptions (SDCs) in software: a type of failure that produces no logs or exceptions, yet manifests as miscomputed instructions, bit flips, and broken cache coherency. Alas, while design efforts can mitigate transistor aging, complete elimination of the problem during design and fabrication cannot be guaranteed. This emerging challenge calls for a mechanism that not only detects potentially aged hardware in the field, but also triggers software mitigations at application runtime.

We propose Vega, a novel workflow that allows efficient detection of aging-related failures at software runtime. Vega leverages the well-studied gate-level modeling of aging effects to identify susceptible signal propagation paths that could fail due to transistor aging. It then uses formal verification techniques to generate short test cases that activate these paths and detect any failure within them. Vega integrates the test cases into a user application either by fusing them directly into the application code or by packaging them into a library that the application can invoke. We demonstrate the proposed techniques on the arithmetic logic unit and floating-point unit of a RISC-V CPU. We show that Vega generates effective test cases and integrates them into applications with an average performance overhead of 0.8%.
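The library-style integration described in the abstract can be pictured with a minimal sketch. It is not Vega's implementation: the names `AgingTest`, `run_aging_tests`, and `guarded`, as well as the three test vectors, are hypothetical and chosen only for illustration. Real test cases would be short machine-level instruction sequences derived from gate-level aging analysis; here, plain 32-bit known-answer checks stand in for them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

MASK32 = 0xFFFFFFFF  # model 32-bit wrap-around arithmetic


@dataclass(frozen=True)
class AgingTest:
    """Hypothetical known-answer test: run an operation, compare to a golden value."""
    name: str
    run: Callable[[], int]
    expected: int


def add32(a: int, b: int) -> int:
    return (a + b) & MASK32


def sub32(a: int, b: int) -> int:
    return (a - b) & MASK32


# Illustrative vectors only (an assumption); they are not generated by
# formal verification and do not target any specific aged path.
TESTS: Sequence[AgingTest] = (
    AgingTest("add_carry_chain", lambda: add32(0xFFFFFFFF, 1), 0x00000000),
    AgingTest("sub_borrow_chain", lambda: sub32(0, 1), 0xFFFFFFFF),
    AgingTest("xor_all_bits", lambda: 0xAAAAAAAA ^ 0x55555555, 0xFFFFFFFF),
)


def run_aging_tests(tests: Sequence[AgingTest] = TESTS) -> List[str]:
    """Return the names of tests whose result differs from the golden value."""
    return [t.name for t in tests if t.run() != t.expected]


def guarded(work, on_failure, tests: Sequence[AgingTest] = TESTS):
    """Run the tests first; do the work only if all pass, else call the mitigation."""
    failed = run_aging_tests(tests)
    if failed:
        return on_failure(failed)
    return work()
```

An application would call `guarded` around a critical computation and supply its own mitigation callback (for example, retrying on another core). How often to run the tests is a trade-off between detection latency and overhead that this sketch does not model.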
Original language | English |
---|---|
Title of host publication | ASPLOS 2024 - Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems |
Publisher | Association for Computing Machinery |
Pages | 220-235 |
Number of pages | 16 |
ISBN (Electronic) | 9798400703911 |
State | Published - 10 Apr 2025 |
Event | 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 - San Diego, United States. Duration: 27 Apr 2024 → 1 May 2024 |
Publication series
Name | International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS |
---|---|
Volume | 4 |
Conference
Conference | 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 27/04/24 → 1/05/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.