Overview
The Linux kernel extensively uses the Berkeley
Packet Filter (BPF) to allow user-written BPF applications to
execute in kernel space. BPF employs a verifier to statically check
the safety of user-supplied BPF code. Recent attacks
show that BPF programs can evade security checks and gain
unauthorized access to kernel memory, indicating that the verification process is not flawless.
In this paper, we present Moat,
a novel hardware-assisted, cross-platform isolation framework
designed to protect the kernel from malicious BPF programs.
Moat
leverages two classes of hardware primitives: key-based hardware, such as Intel Memory Protection Keys (MPK)
and Arm Permission Overlay Extension (POE), and
virtualization-based hardware, such as Arm Stage-2 translation, AMD Rapid Virtualization Indexing (RVI), and
RISC-V H-mode. Given the widespread support for Intel
MPK and Arm Stage-2 translation on modern processors, we
select these two primitives, one from each class, as representative
examples to illustrate Moat's cross-platform design.
Moat introduces a two-layer memory isolation scheme built on
hardware features such as Intel MPK and Arm Stage-2
translation. Our design overcomes several key
challenges, including the limited scalability of available hardware
isolation mechanisms and the risk of helper function abuse. We
implement Moat for Intel x86 and Arm on Linux (ver. 6.1.38),
and our evaluation shows that Moat delivers low-cost isolation
of BPF programs under mainstream use cases, such as isolating
a BPF packet filter with only 3% throughput loss.
FAQ
(0) What is the difference between the USENIX Security paper and the TDSC paper?
The original Moat, described in the USENIX Security paper, was confined to the Intel x86 platform, where it used Intel MPK to isolate BPF programs. In the TDSC paper, we extend Moat into a cross-platform framework. We generalize the design of Moat to support multiple architectures and introduce a new approach, Moat-vir, for platforms that offer virtualization-based hardware primitives but lack key-based primitives such as Intel MPK.
(1) How do I run Moat?
Check out our repo.
It includes a detailed guide on how to set up and run Moat.
If you have questions about Moat, you can contact
@jwnhy and @Lijian Huang. We will try to help you (if you cite this paper; this is a joke).
Note that this is a highly experimental prototype.
DO NOT USE IT IN PRODUCTION.
(2) What challenges has Moat overcome?
Both key-based and virtualization-based hardware primitives offer
limited support for scalable, fine-grained isolation, making
it difficult to efficiently isolate a large number of concurrent
BPF programs. To address this hurdle, we propose a novel
two-layer isolation scheme that protects both the kernel and
benign BPF programs from malicious BPF programs. Layer-I
leverages the hardware isolation primitives to construct three
isolation domains, preventing unauthorized kernel access from
BPF programs. Layer-II enforces intra-BPF isolation within
the same domain by assigning each BPF program a dedicated
address space, while mitigating the Translation Lookaside
Buffer (TLB) flush overhead with emerging hardware features.
We also propose two schemes to regulate the behavior of BPF helper functions and prevent malicious BPF programs from abusing them.
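To make the two layers above more concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Moat's implementation: the domain labels (KERNEL, SHARED, BPF), the class names, and the choice to map kernel pages into every address space are all hypothetical and chosen for illustration. Layer-I is modeled as a per-page key checked against a set of currently allowed keys, and Layer-II as a separate page mapping per BPF program. TLB behavior and helper-function regulation are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three Layer-I domains (names are assumptions).
KERNEL, SHARED, BPF = 0, 1, 2


@dataclass
class Page:
    key: int       # domain key tagged on the page (Layer-I)
    data: int = 0


@dataclass
class AddressSpace:
    """Pages mapped for one context (one per BPF program in Layer-II)."""
    pages: dict = field(default_factory=dict)  # page number -> Page


class ToyCpu:
    def __init__(self, kernel_space):
        self.kernel_space = kernel_space
        self.space = kernel_space
        # Stand-in for a key-permission register: keys the context may access.
        self.allowed_keys = {KERNEL, SHARED, BPF}

    def enter_bpf(self, space):
        # Install the program's own address space and drop kernel access.
        self.space = space
        self.allowed_keys = {SHARED, BPF}

    def exit_bpf(self):
        # Return to the kernel context with full access.
        self.space = self.kernel_space
        self.allowed_keys = {KERNEL, SHARED, BPF}

    def load(self, page_no):
        page = self.space.pages.get(page_no)
        if page is None:
            # Layer-II: the page is not mapped for this program at all.
            raise LookupError("page not mapped in this address space")
        if page.key not in self.allowed_keys:
            # Layer-I: the page is mapped but its key is disabled.
            raise PermissionError("domain key denies access")
        return page.data


def outcome(cpu, page_no):
    """Return the loaded value, or the name of the fault that was raised."""
    try:
        return cpu.load(page_no)
    except (LookupError, PermissionError) as exc:
        return type(exc).__name__


kernel_page = Page(KERNEL, data=7)
shared_page = Page(SHARED, data=1)
kernel_space = AddressSpace({0: kernel_page, 1: shared_page})
# Two BPF programs, each with a private page the other cannot see.
prog_a = AddressSpace({0: kernel_page, 1: shared_page, 10: Page(BPF, data=42)})
prog_b = AddressSpace({0: kernel_page, 1: shared_page, 20: Page(BPF, data=99)})

cpu = ToyCpu(kernel_space)
cpu.enter_bpf(prog_a)
# Own page, shared page, kernel page, and the other program's page.
results = [outcome(cpu, n) for n in (10, 1, 0, 20)]
cpu.exit_bpf()
print(results)
```

In this model, program A reads its own page and the shared page, is stopped by the key check when it touches the kernel page, and cannot even name program B's page because that page is absent from its mapping. The number of keys stays fixed no matter how many programs exist, which is the scalability point the answer above makes.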
(3) What are the application scenarios for Moat?
If you want to allow unprivileged users to run BPF programs but do not want these programs
to break your system, you might consider porting Moat to your system.
Fully enabling unprivileged BPF requires other measures as well (e.g., access control); Moat only ensures
the memory and helper safety of your BPF programs.
(4) What will we do in the future?
We are actively working with a company to turn Moat into a production-level system.
Manuscript & Prototype
USENIX Security Manuscript
TDSC Manuscript
Prototype on GitHub
Publication
Moat: Towards Safe BPF Kernel Extension
Hongyi Lu, Shuai Wang, Yechang Wu, Wanning He, Fengwei Zhang
Published in the Proceedings of the 33rd USENIX Security Symposium
@inproceedings {moat,
author = {Hongyi Lu and Shuai Wang and Yechang Wu and Wanning He and Fengwei Zhang},
title = {{MOAT}: Towards Safe {BPF} Kernel Extension},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {1153--1170},
publisher = {USENIX Association},
}
Towards Secure BPF Kernel Extension with Hardware-enhanced Memory Isolation
Lijian Huang^, Hongyi Lu^, Shuai Wang*, Fengwei Zhang*
^ Equal contribution. * Corresponding authors.
To Appear in IEEE Transactions on Dependable and Secure Computing (TDSC), 2026
Early Access