Overview

Confidential computing is an emerging technique that provides users and third-party developers with an isolated and transparent execution environment. To support this technique, Arm introduced the Confidential Computing Architecture (CCA), which creates multiple isolated address spaces, known as realms, to ensure data confidentiality and integrity in security-sensitive tasks. Arm has also recently proposed the concept of confidential computing on GPU hardware, which is widely used in general-purpose, high-performance, and artificial-intelligence computing scenarios. However, hardware and firmware supporting confidential GPU workloads remain unavailable. Existing studies leverage Trusted Execution Environments (TEEs) to secure GPU computing on Arm- or Intel-based platforms, but they do not fit CCA's realm-style architecture, for example because they rely on incompatible hardware or introduce a large TCB. There is therefore a need to complement the existing Arm CCA capabilities with GPU acceleration.

To address this challenge, we present CAGE, which supports confidential GPU computing for Arm CCA. By leveraging the existing security features of Arm CCA, CAGE ensures data security during confidential computing on unified-memory GPUs, the mainstream accelerators in Arm devices. To adapt the GPU workflow to CCA's realm-style architecture, CAGE proposes a novel shadow task mechanism that manages confidential GPU applications flexibly. Additionally, CAGE leverages the memory isolation mechanism in Arm CCA to protect data confidentiality and integrity against a strong adversary, and further optimizes the security operations involved in memory isolation to mitigate performance overhead. Without hardware changes, our approach relies on the generic hardware security primitives of Arm CCA to defend against a privileged adversary. We present two prototypes to verify CAGE's functionality and evaluate its performance, respectively. Results show that CAGE effectively provides GPU support for Arm CCA with an average performance overhead of 2.45%.
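To give an intuition for the memory isolation idea that CAGE builds on, the following is a minimal, self-contained C sketch. It is not CAGE's implementation: names such as gpt_assign(), ns_read(), and shadow_task_launch() are hypothetical, and the model is only a loose software simulation of Arm CCA's granule protection concept, in which physical memory granules are assigned to a world and accesses from other worlds are blocked. The sketch shows how a GPU buffer could be moved out of Non-secure ownership before a shadow task runs and returned afterwards.

/*
 * Conceptual sketch only; NOT CAGE's actual code. gpt_assign(),
 * ns_read(), and shadow_task_launch() are hypothetical names used to
 * model the idea: a GPU buffer's granules are protected before the
 * task runs, so untrusted host software (OS, hypervisor, GPU driver)
 * cannot read them, and are scrubbed before being released again.
 */
#include <stdio.h>
#include <string.h>

enum world { WORLD_NON_SECURE, WORLD_REALM };   /* simplified GPT states */

struct granule {
    enum world owner;
    unsigned char data[64];                     /* toy 64-byte granule   */
};

/* Hypothetical monitor call: change a granule's owner in the GPT. */
static void gpt_assign(struct granule *g, enum world w) { g->owner = w; }

/* Non-secure access is only legal for Non-secure granules (GPC check). */
static int ns_read(const struct granule *g, unsigned char *out)
{
    if (g->owner != WORLD_NON_SECURE)
        return -1;                              /* access would fault    */
    memcpy(out, g->data, sizeof g->data);
    return 0;
}

/* Hypothetical shadow-task lifecycle for one GPU buffer. */
static void shadow_task_launch(struct granule *buf)
{
    gpt_assign(buf, WORLD_REALM);               /* protect before launch */
    memset(buf->data, 0xAB, sizeof buf->data);  /* stand-in for GPU work */
    memset(buf->data, 0, sizeof buf->data);     /* scrub before release  */
    gpt_assign(buf, WORLD_NON_SECURE);          /* return granule to host*/
}

int main(void)
{
    struct granule buf = { WORLD_NON_SECURE, {0} };
    unsigned char tmp[64];

    gpt_assign(&buf, WORLD_REALM);
    printf("host read while protected: %s\n",
           ns_read(&buf, tmp) == 0 ? "allowed" : "blocked");

    gpt_assign(&buf, WORLD_NON_SECURE);
    shadow_task_launch(&buf);
    printf("host read after task:      %s\n",
           ns_read(&buf, tmp) == 0 ? "allowed" : "blocked");
    return 0;
}

In the real architecture, the ownership transitions are performed by trusted firmware at EL3 and enforced in hardware by the Granule Protection Check; the sketch merely mimics that check in software to illustrate the access-control effect.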


FAQ

(1) What is the threat model of CAGE?

CAGE follows the threat model for realms in Arm CCA. We assume the adversary can control the entire GPU software stack (including the GPU driver and runtime), the Host OS and hypervisor, and even the Secure World software components. We trust the Monitor and the Realm Management Monitor (RMM), which are responsible for realm isolation on the CPU side. We also discuss defense mechanisms against several physical attacks; please refer to our paper for details. We consider attacks against cryptographic primitives, side-channel attacks, and DoS attacks to be beyond the scope of CAGE.

(2) What are the application scenarios of CAGE?

We envision scenarios in which realm users transfer and store their sensitive data in realms and perform confidential GPU computing on that data. Following CCA's realm-style architecture, a confidential GPU task is first created through the Host GPU software and then handed over to the Monitor, which executes the GPU workload confidentially.

(3) What will we do in the future?

Currently, CAGE ensures confidential GPU computing for unified-memory GPUs. In the future, we will extend CAGE to support other accelerators such as FPGAs.


Prototype

We provide our functionality and performance prototypes on GitHub.

Prototype on GitHub


Publication

CAGE: Complementing Arm CCA with GPU Extensions.

Chenxu Wang, Fengwei Zhang, Yunjie Deng, Kevin Leach, Jiannong Cao, Zhenyu Ning, Shoumeng Yan and Zhengyu He.

In Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS'24).

Bibtex for citation:

@inproceedings{wang2024cage,
  title={CAGE: Complementing Arm CCA with GPU Extensions},
  author={Wang, Chenxu and Zhang, Fengwei and Deng, Yunjie and Leach, Kevin and Cao, Jiannong and Ning, Zhenyu and Yan, Shoumeng and He, Zhengyu},
  booktitle={Proceedings of the 31st Annual Network and Distributed System Security Symposium},
  year={2024}
}