Multicore Resource Isolation for Deterministic, Resilient and Secure Concurrent Execution of Safety-Critical Applications
Published In: IEEE Computer Architecture Letters, 2018.
Abstract: Multicores increasingly deploy spatial execution of safety-critical applications that demand a deterministic, resilient, and secure environment to meet safety standards. However, multicores aggressively share hardware resources, which leads to non-deterministic performance due to destructive interference from concurrent applications. Resource sharing not only hinders efficient resilient execution, but also introduces security vulnerabilities due to information leakage through side channels. This work proposes a novel multicore framework that constructs isolated clusters of cores for each concurrent application. It provides concurrent applications with deterministic performance, as well as an efficient execution environment for resiliency and security. Moreover, the framework allows dynamic resizing of clusters for load-balanced execution of concurrent applications. However, resizing diminishes isolation between clusters, which opens various performance–resilience and performance–security tradeoffs.
Declarative Resilience: A Holistic Soft-Error Resilient Multicore Architecture that Trades off Program Accuracy for Efficiency
Published In: ACM Transactions on Embedded Computing Systems, 2018.
Abstract: To protect multicores from soft-error perturbations, research has explored various resiliency schemes that provide high soft-error coverage. However, these schemes incur high performance and energy overheads. We observe that not all soft-error perturbations affect program correctness, and some soft errors only affect program accuracy, i.e., the program completes with certain acceptable deviations from the error-free outcome. Thus, it is practical to improve processor efficiency by trading off resiliency overheads against program accuracy. This article proposes the idea of declarative resilience, which selectively applies strong resiliency schemes to code regions that are crucial for program correctness (crucial code) and lightweight resiliency to code regions that are susceptible to program accuracy deviations as a result of soft errors (non-crucial code). At the application level, crucial and non-crucial code is identified based on its impact on the program outcome. A cross-layer architecture enables efficient resilience along with holistic soft-error coverage. Only program accuracy is compromised in the worst-case scenario of a soft-error strike during non-crucial code execution. For a set of machine-learning and graph analytic benchmarks, relative to a state-of-the-art system that applies strong resiliency to all program code regions, declarative resilience reduces performance overhead from ∼1.43× to ∼1.2×.
GraphTuner: An Input Dependence Aware Loop Perforation Scheme for Efficient Execution of Approximated Graph Algorithms
Published In: IEEE International Conference on Computer Design, 2017.
Abstract: Graph algorithms have gained popularity and are utilized in high-performance and mobile computing paradigms. Input dependence due to input graph changes leads to performance variations in such algorithms. The impact of input dependence for graph algorithms is not well studied in the context of approximate computing. This paper conducts such an analysis by applying loop perforation, a general approximation mechanism that transforms program loops to drop a subset of their total iterations. The analysis identifies the need to adapt the inner and outer loop perforation as a function of input graph characteristics, such as the density or size of the graph. A predictive model is proposed to learn the near-optimal loop perforation rates using synthetic input graphs. When the input-aware loop perforation model is applied to real-world graphs, the evaluated graph algorithms systematically degrade accuracy to achieve performance and power benefits. Results show ∼30% performance and ∼19% power utilization improvements on average at a program accuracy loss threshold of 10% for an NVIDIA GPU. The analysis is also conducted for two concurrent Intel CPU architectures, an 8-core Xeon and a 61-core Xeon Phi machine.
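To make the loop perforation idea concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes PageRank as the example algorithm and uses fixed perforation strides; the function names, the `outer_stride` and `inner_stride` parameters, and the toy graph are all hypothetical. In the paper, the perforation rates are chosen by a predictive model based on input graph characteristics rather than fixed by hand.

```python
def perforated_pagerank(out_edges, n_iters=20, damping=0.85,
                        outer_stride=1, inner_stride=1):
    """PageRank with loop perforation (illustrative assumption, not the paper's code).

    outer_stride=k runs only every k-th power iteration (outer loop perforation).
    inner_stride=k visits only every k-th out-neighbor (inner loop perforation).
    Returns the rank vector and the number of edge visits actually performed.
    """
    n = len(out_edges)
    rank = [1.0 / n] * n
    edge_visits = 0
    for _ in range(0, n_iters, outer_stride):      # perforated outer loop
        nxt = [(1.0 - damping) / n] * n
        for u in range(n):
            nbrs = out_edges[u][::inner_stride]    # perforated inner loop
            if not nbrs:
                continue
            share = damping * rank[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
                edge_visits += 1
        total = sum(nxt)                           # renormalize so ranks sum to 1
        rank = [r / total for r in nxt]
    return rank, edge_visits


def accuracy_loss(exact, approx):
    """L1 distance between the exact and the perforated rank vectors."""
    return sum(abs(a - b) for a, b in zip(exact, approx))


# Hypothetical 5-vertex directed graph given as out-neighbor lists.
GRAPH = [[1, 2, 3, 4], [2, 3], [3, 0], [0], [0, 1, 2]]

if __name__ == "__main__":
    exact, full_work = perforated_pagerank(GRAPH)
    approx, less_work = perforated_pagerank(GRAPH, outer_stride=2, inner_stride=2)
    print("edge visits:", full_work, "->", less_work)
    print("L1 accuracy loss:", accuracy_loss(exact, approx))
```

Comparing `edge_visits` against `accuracy_loss` for different strides gives a rough picture of the work-versus-accuracy tradeoff that an input-aware model would tune per graph.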
Breaking The Oblivious RAM Bandwidth Wall
Published In: IEEE International Conference on Computer Design, 2018.
Abstract: PathORAM is a popular security primitive for obfuscating memory access patterns from a secure processor to an insecure main memory. Emerging throughput multicore and GPU processors provide immense memory bandwidth via multiple on-chip memory controllers. PathORAM translates a single off-chip cache line access into ∼100 cache line accesses, thereby stressing the available memory bandwidth. However, the current PathORAM scheme shows degraded bandwidth utilization as the number of memory controllers increases. This degradation arises primarily because PathORAM does not distribute memory accesses proportionately among all available on-chip memory controllers. This paper presents a novel ORAM path distribution scheme that ensures balanced load distribution among parallel on-chip memory controllers, and consequently improves secure processor performance by ∼24% over the state-of-the-art PathORAM scheme.
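The imbalance described above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the tree depth of 24 and bucket size of 4 (chosen so that one path holds 100 blocks), the heap-order bucket layout, and both bucket-to-controller mappings. Neither mapping is claimed to be the baseline or the scheme proposed in the paper.

```python
def path_buckets(leaf, depth):
    """Heap-order indices of the buckets on the root-to-leaf path.

    The tree has depth + 1 levels; `leaf` is in [0, 2**depth).
    """
    return [(1 << level) - 1 + (leaf >> (depth - level))
            for level in range(depth + 1)]


def contiguous(level, bucket, depth, n_ctrl):
    """Hypothetical mapping: split the bucket array into n_ctrl contiguous ranges."""
    total_buckets = (1 << (depth + 1)) - 1
    return bucket * n_ctrl // total_buckets


def level_interleaved(level, bucket, depth, n_ctrl):
    """Hypothetical mapping: assign tree levels to controllers round-robin."""
    return level % n_ctrl


def controller_load(leaf, depth, n_ctrl, mapping):
    """Number of buckets each controller serves for one path access."""
    load = [0] * n_ctrl
    for level, bucket in enumerate(path_buckets(leaf, depth)):
        load[mapping(level, bucket, depth, n_ctrl)] += 1
    return load


if __name__ == "__main__":
    DEPTH, Z, N_CTRL = 24, 4, 4        # assumed parameters, not from the paper
    print("blocks per path:", Z * (DEPTH + 1))
    print("contiguous:       ", controller_load(0, DEPTH, N_CTRL, contiguous))
    print("level-interleaved:", controller_load(0, DEPTH, N_CTRL, level_interleaved))
```

Under these assumptions, the contiguous mapping sends almost every bucket of a path to one controller, because the upper tree levels occupy only a small prefix of the bucket array, while the level-interleaved mapping spreads the same path nearly evenly.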
Connecting the Dots: Privacy Leakage via Write-Access Patterns to the Main Memory
Published In: IEEE Transactions on Dependable and Secure Computing, 2017.
Abstract: Data-dependent access patterns of an application to an untrusted storage system are notorious for leaking sensitive information about the user’s data. Previous research has shown how an adversary capable of monitoring both read and write requests issued to the memory can correlate them with the application to learn its sensitive data. However, information leakage through only the write access patterns is less obvious and not well studied in the current literature. In this work, we demonstrate an actual attack on the power-side-channel-resistant, Montgomery-ladder-based modular exponentiation algorithm commonly used in public-key cryptography. We infer the complete 512-bit secret exponent in ∼3–5 minutes by virtue of just the write access patterns of the algorithm to the main memory. In order to learn the victim algorithm’s write access patterns under realistic settings, we exploit a compromised DMA device to take frequent snapshots of the application’s address space, and then run a simple differential analysis on these snapshots to find the write access sequence. The attack has been shown on a system based on an Intel Core(TM) i7-4790 3.60 GHz processor. We further discuss a possible attack on the McEliece public-key cryptosystem that also exploits the write-access patterns to learn the secret key.
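The leak can be illustrated with a simplified Python model, which is a sketch under stated assumptions rather than the attack from the paper. It assumes a textbook Montgomery ladder in which the order of the two register writes in each step depends on the key bit, and it records the write sequence directly in a list instead of recovering it from DMA snapshots. The names `montgomery_ladder`, `recover_exponent`, and the `"R0"`/`"R1"` labels are hypothetical.

```python
def montgomery_ladder(base, exponent, modulus, n_bits, trace=None):
    """Compute base**exponent mod modulus with a Montgomery ladder.

    Each step performs one multiply and one square regardless of the key bit,
    but which register is written first depends on the bit. If `trace` is a
    list, the name of every written register is appended to it (a stand-in
    for an observed write-access sequence).
    """
    r0, r1 = 1, base % modulus
    for i in reversed(range(n_bits)):
        if (exponent >> i) & 1 == 0:
            r1 = (r0 * r1) % modulus
            if trace is not None:
                trace.append("R1")
            r0 = (r0 * r0) % modulus
            if trace is not None:
                trace.append("R0")
        else:
            r0 = (r0 * r1) % modulus
            if trace is not None:
                trace.append("R0")
            r1 = (r1 * r1) % modulus
            if trace is not None:
                trace.append("R1")
    return r0


def recover_exponent(trace):
    """Rebuild the exponent from the write order alone (two writes per bit)."""
    exponent = 0
    for first_write in trace[0::2]:
        exponent = (exponent << 1) | (1 if first_write == "R0" else 0)
    return exponent


if __name__ == "__main__":
    secret, modulus = 0b1011001110001011, 1000003   # toy values for illustration
    writes = []
    result = montgomery_ladder(7, secret, modulus, 16, trace=writes)
    print(result == pow(7, secret, modulus), recover_exponent(writes) == secret)
```

In this model the values written never need to be observed; the order of write locations alone determines every key bit, which is the kind of dependence a write-only observer could exploit.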
Leveraging Hardware Isolation for Process Level Access Control & Authentication
Published In: ACM Symposium on Access Control Models and Technologies, 2017.
Abstract: Critical resource sharing among multiple entities in a processing system is inevitable, which in turn calls for the presence of appropriate authentication and access control mechanisms. Generally speaking, these mechanisms are implemented via trusted software “policy checkers” that apply certain high-level application-specific “rules” to enforce a policy. Whether implemented as operating system modules or embedded inside the application ad hoc, these policy checkers expose attack surface beyond that of the application logic. In order to protect application software from an adversary, modern secure processing platforms, such as Intel’s Software Guard Extensions (SGX), employ principled hardware isolation to offer secure software containers, or enclaves, to execute trusted sensitive code with some integrity and privacy guarantees against a privileged software adversary. We extend this model further and propose using these hardware isolation mechanisms to shield the authentication and access control logic essential to policy checker software. While relying on the fundamental features of modern secure processors, our framework introduces productive software design guidelines that enable a guarded environment to execute sensitive policy checking code (hence enforcing application control flow integrity) and afford flexibility to the application designer to construct appropriate high-level policies to customize policy checker software.