A type confusion bug in nft_set_elem_init (leading to a buffer overflow)
Summary
An issue was discovered in the Linux kernel. A type confusion bug in nft_set_elem_init (leading to a buffer overflow) could be used by a local attacker to escalate privileges.
Description
An issue was discovered in the Linux kernel. A type confusion bug in nft_set_elem_init (leading to a buffer overflow) could be used by a local attacker to escalate privileges, a different vulnerability than CVE-2022-32250. (The attacker can obtain root access, but must start with an unprivileged user namespace to obtain CAP_NET_ADMIN access.) This can be fixed in nft_setelem_parse_data in net/netfilter/nf_tables_api.c.
What is NF_Tables?
NF_Tables is a packet-filtering framework in the Linux kernel that provides an efficient and flexible way to classify and manipulate network packets. It is designed to replace the older iptables and ip6tables tools for firewall and packet filtering tasks, offering improved performance, syntax, and capabilities.
NF_Tables allows you to define rulesets to control the flow of network packets through your system. It uses a rule-based syntax to match packets on various criteria and then applies actions to those packets, such as dropping, accepting, or modifying them. Rules are grouped into chains, and chains live inside tables, which gives packet filtering a hierarchical structure.
The main object in netfilter is the table. In the context of netfilter and nftables, a table is a container for organizing and storing rules. Consider the command below:
nft> add table ip my-table
This creates a new table named my-table in the ip (IPv4) address family.
+------------------------+
| my-table |
| (IP Filtering) |
+------------------------+
A table can contain different objects, such as sets, which are used to store data. Consider the command below:
nft> add set ip my-table my-set { type ipv4_addr; }
This command creates a new set named my-set associated with the table my-table, configured to store IPv4 addresses.
+------------------------+
| my-table |
| (IP Filtering) |
| +--------+ |
| | my-set | |
| +--------+ |
| (IPv4 Addresses) |
+------------------------+
Finally, chains of rules are created; each rule specifies an action to apply to received packets.
+------------------------+
| my-table |
| (IP Filtering) |
| +--------+ |
| | my-set | |
| +--------+ |
| (IPv4 Addresses) |
| +--------+ |
| | Chain |-----|--> Rule 1
| +--------+ |
| | Chain |-----|--> Rule 2
| +--------+ |
| | Chain |-----|--> Rule 3
+------------------------+
Let’s understand this with an example:
Consider a scenario where you want to control access to a web server. You could use netfilter to create a table named “web-filter” with a set named “allowed-ips” containing IP addresses allowed to access the server. You might create a chain of rules within this table to permit or deny access based on the source IP addresses. For example:
add table ip web-filter
add set ip web-filter allowed-ips { type ipv4_addr; }
add chain ip web-filter input { type filter hook input priority 0; }
add rule ip web-filter input ip saddr @allowed-ips accept
add rule ip web-filter input drop
Build the Lab
As this is a kernel module vulnerability, it is tricky to debug, so you need a little more patience than usual 🦐
- VirtualBox
- I used 2 Linux Virtual Machines.
Because we have to debug a kernel module, and the kernel is not a user-space process, GDB alone cannot be used for debugging; we need a client/server architecture. The kernel can be debugged remotely using a GDB server on the target machine and gdb on the host/development machine. The Linux kernel has a GDB server implementation called KGDB. It communicates with a GDB client over a network or serial port connection.
Host/Development Machine: Runs gdb against the vmlinux file which contains the symbols and performs debugging
Target Machine: Runs kgdb and is the machine to be debugged
------------------ --------------------
| Host | | Target |
| | | |
| ------------- | | ------------ |
| | gdb | |<--------------------------->| | kgdb | |
| | | | Serial or | | | |
| -------------- | Ethernet | ------------- |
| | | Connection | | |
| -------------- | | -------------- |
| | Kernel image || | |Linux Kernel | |
| | with debug || | |(zImage) | |
| | symbols || | | | |
| | (vmlinux) || | --------------- |
| ----------------| | |
------------------- --------------------
Hence two machines are required for using kgdb. A few notes on KGDB:
- KGDB is a GDB server implementation integrated into the Linux kernel. It supports serial port communication (available in the mainline kernel) and network communication (patch required).
- It has been available in the mainline Linux kernel since version 2.6.26 (x86 and sparc) and 2.6.27 (arm, MIPS, and PPC).
- It enables full control over kernel execution on the target, including memory reads and writes, step-by-step execution, and even breakpoints in interrupt handlers.
There might be other ways to do it, but this is the setup I generally use.
- I am using the Ubuntu 22.04 LTS AMD64 ISO: https://releases.ubuntu.com/
Connect and Create a Serial Port in VirtualBox
Assumption for this step: you have the ISO image downloaded locally and have already created 2 VMs with it.
For the demonstration I created 2 machines named target-server and dev.
Creating a serial port in VirtualBox and connecting the machines is easy:
1. Select your target machine in VirtualBox and go to the Settings options.
2. Once you are in the Settings tab of the target server, select Serial Ports and enter the below configuration. Don’t check Connect to existing pipe/socket, as we don't have any previous ones.
3. Once the serial port is configured, follow steps 1 and 2 for the dev machine as well, but on the dev machine you need to check Connect to existing pipe/socket and make sure you specify the same Path/Address.
Once that’s done, congratulations, the lab is ready.
WARNING! Do not start the dev machine first, otherwise you will see a serial port error: the dev machine connects to the existing pipe that the target machine creates, so the target has to be running first.
DEBUGGING KERNEL — nf_tables
As discussed already, the vulnerability lies in nf_tables, which is a kernel module, so to debug it we need to follow some steps. Let's do the initial settings first:
1. Verify that the machines (dev and target) are communicating over the serial port by sending a message across it.
I sent a message from the target machine to the dev machine and confirmed that they are communicating with each other on the serial port. The installed release is Ubuntu 22.04; if you downloaded it from the official Ubuntu website it will not ship an older kernel, so we have to downgrade the kernel. Let's continue with that step in the debugging stage.
2. Download an affected version of the kernel. To accomplish this step I downloaded v5.12 from the official kernel GitHub.
Once you have checked out the affected version of the kernel, you need to build and install this image and update GRUB, but before we do that we need some packages to be available:
1. build-essential
2. flex
3. bison
4. libnftnl-dev
5. libmnl-dev
6. nftables - (Installed by default but just in case missing)
7. libncurses-dev
8. dkms
9. libssl-dev
10. libelf-dev
Once the packages are installed, let’s enable KGDB in the config file to debug the kernel. To enable the KGDB settings, move inside the git repo where we downloaded the kernel source and run the make nconfig command.
This command opens the config file in a menu-based view, where we can verify that the KGDB option is enabled.
- Select Generic Kernel Debugging Instruments.
- Verify that the KGDB and magic SysRq options are selected.
- Once these settings are verified, we need to verify one more variable, DEBUG_INFO; it should be y as well. To look for the variable, press F8 and search for the value.
With everything verified, the libraries installed, and things in place, one item remains: as the flaw is in nf_tables, we need to make sure that this module is also enabled and installed, so let's verify that too.
To do that, go to Networking Support > Networking Options > Network Packet Filtering Framework > Core Netfilter Configuration.
To be on the safe side (building and installing modules or a kernel image takes a lot of time, and we should not miss any file while debugging), I enabled all netfilter modules for nf_tables so that we don't have to repeat this step.
Press F6 to save the changes, then run make -j8 to build the Linux kernel with 8 parallel jobs.
Go out and grab a coffee, as it is going to take long, believe me, very long.
Verify that the vmlinux file is present in the source directory.
After make -j8 succeeds, run the make modules_install command and wait for it to complete.
Once that’s completed, run make install; this installs the v5.12 kernel into /boot. Once that's done, run update-grub and then reboot to restart the machine. During the restart, the machine will display the option to select the kernel version; select v5.12 and boot the kernel.
Verify the kernel version by running uname -r.
Now we have downgraded to an affected version of the kernel.
Next, we want to enable the GDB scripts for the affected target machine. The GDB scripts are a collection of helper scripts that simplify kernel debugging.
To do that we have to perform 2 steps:
- On the target machine we have to enable CONFIG_GDB_SCRIPTS, which was already enabled in our target machine.
- On the dev machine we have to create a ~/.gdbinit file and write add-auto-load-safe-path <location-bin-file> in it.
To start debugging the target machine, we also have to copy the compiled kernel source folder (built with debug info) to the dev server. To make copying easy I installed OpenSSH on the target server and used the scp command on the dev server to copy the compiled linux folder from the target machine to dev.
On the target machine we made a tar.gz file, and on the dev server we used the scp command to copy linux.tar.gz from target to dev.
Make the tar.gz file with tar: tar -czvf linux.tar.gz linux
On the dev server I copied the folder to /home/target/Desktop/linux
scp <username>@<ip-address>:<file-to-copy> <dev-server-location-to-paste>
Then I extracted the archive on the dev server using the tar command: tar -xzvf linux.tar.gz
Open the copied vmlinux with gdb; make sure to open it as the root user on the dev machine.
Next, to debug the kernel we have to specify the serial port and baud rate to kgdboc so that we can debug the kernel from the dev machine.
Run the SysRq magic sequence on the target server:
echo g > /proc/sysrq-trigger
On the dev server, run target remote /dev/ttyS0 inside gdb.
We can see the kgdb breakpoint triggered; let's put breakpoints in our suspected functions.
As we have enabled the GDB scripts, let's load our beloved affected nf_tables.ko module. To list the available helpers we use apropos lx; then run lx-symbols to load nf_tables.ko and the other loaded modules from the kernel into GDB.
Once the module is loaded we can put a breakpoint in the suspected function, in our case nft_set_elem_init in nf_tables_api.c, and start our static analysis.
STATIC ANALYSIS
Following the many code paths that lead to the dlen field, my attention was drawn to one pivotal point: the call to memcpy inside nft_set_elem_init in /net/netfilter/nf_tables_api.c.
This call raises eyebrows because it uses two distinct objects in a rather peculiar manner. The receiving buffer lives inside an nft_set_ext object named ext, while the size of the copy is taken from an entirely different entity, an nft_set object. The ext object is dynamically allocated at line 5195 of the code as part of elem, with a size dictated by tmpl->len.
Let’s represent the relevant objects and their relationships in a diagram:
+---------------------+
| nft_set_ext (ext) |
|---------------------|
| Destination |
| Buffer |
| + |
| | |
| v |
| elem |
| | |
| | |
| v |
|---------------------|
| tmpl->len |
+---------------------+
+---------------------+
| nft_set |
|---------------------|
| Source |
| Size |
| + |
| | |
| v |
|---------------------|
| set->dlen |
+---------------------+
- The upper part of the diagram represents the nft_set_ext object (ext), where the destination buffer is stored. The buffer is dynamically allocated together with elem, and the size reserved for it is determined by tmpl->len.
- The lower part of the diagram represents the nft_set object, where the source size (set->dlen) for the memcpy operation is stored.
- The diagram illustrates the two objects, nft_set_ext and nft_set, and their interconnection through the memcpy operation.
- The question here is how the size of the destination buffer (tmpl->len) relates to the value stored in set->dlen. I am suspicious about potential inconsistencies between these two values.
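To make the concern concrete, here is a minimal user-space sketch of the size mismatch. The struct and function names are hypothetical stand-ins, not the kernel's real layouts; the only idea taken from the text is that the destination is sized from one object while the copy length comes from another.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the two kernel objects; not the real layouts. */
struct model_tmpl { size_t data_len; };  /* bytes reserved for the destination */
struct model_set  { size_t dlen; };      /* bytes the copy will actually write */

/* Returns how many bytes a copy of set->dlen bytes would write past a
 * destination that was sized from tmpl->data_len (0 means the copy fits). */
size_t overflow_bytes(const struct model_tmpl *tmpl, const struct model_set *set)
{
    assert(tmpl != NULL && set != NULL);
    return set->dlen > tmpl->data_len ? set->dlen - tmpl->data_len : 0;
}
```

Any non-zero result from overflow_bytes() means the two sizes disagree, which is exactly the condition worth hunting for in the callers.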
Let’s check where nft_set_elem_init is called to dig further.
It is referenced at lines 5560 and 5793.
This function is invoked from another function named nft_add_set_elem, which is located in the file /net/netfilter/nf_tables_api.c. The purpose of nft_add_set_elem is to add an element to a netfilter set.
+-----------------------------+
| |
| nft_add_set_elem |
| (/net/netfilter/ |
| nf_tables_api.c) |
| |
+--------+--------------------+
|
| calls
|
+--------v--------------------+
| |
| nft_set_elem_init |
| |
+-----------------------------+
The nft_set structure has a field named dlen, presumably indicating the length of data associated with the identifier NFT_SET_EXT_DATA. Within the nft_set_ext structure, there is a field named desc. The desc structure is where the space for data associated with NFT_SET_EXT_DATA is reserved. The desc structure has a field named len, which is used to determine the length of the space to be reserved for data associated with NFT_SET_EXT_DATA. Contrary to expectations, set->dlen is not used for the reservation; instead, the length is determined by desc.len. The desc structure is initialized within the function nft_setelem_parse_data in the /net/netfilter/nf_tables_api.c file. This function is where the length is set for NFT_SET_EXT_DATA.
+------------------------+ +---------------------+
| | | |
| nft_set | | nft_set_ext |
| | | |
|------------------------| |---------------------|
| ... | | ... |
|------------------------| |---------------------|
| dlen | | |
| | |---------------------|
| | | |
| | | desc |
| | |---------------------|
| | | len |
+------------------------+ +---------------------+
| ... |
+---------------------+
The nft_data_init function is responsible for initializing the data and desc structures based on user-provided data. This initialization occurs at (1) and involves processing user input to set values for the data and desc structures.
A critical check is performed at (2) between desc->len and set->dlen. This check is conditional and is triggered only when the data associated with the added element has a type different from NFT_DATA_VERDICT.
- The user has control over the variable set->dlen when creating a new set.
- The only restrictions are that set->dlen must be at most 64 bytes, and the data type should be different from NFT_DATA_VERDICT.
- When desc->type is equal to NFT_DATA_VERDICT, desc->len is set to 16 bytes.
- If an element of type NFT_DATA_VERDICT is added to a set with data type NFT_DATA_VALUE, it can lead to a situation where desc->len is different from set->dlen.
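The conditions listed above can be modelled in a few lines. This is a simplified sketch, not the kernel source: the enum, structs, and function names are hypothetical, and strict_check_accepts() is only one possible tightening, assumed here for contrast rather than taken from the upstream patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_data_type { MODEL_DATA_VALUE, MODEL_DATA_VERDICT };

struct model_desc { enum model_data_type type;  size_t len;  };
struct model_set  { enum model_data_type dtype; size_t dlen; };

/* Models the check as the article describes it: the length comparison is
 * skipped entirely when the element data is a verdict. */
bool flawed_check_accepts(const struct model_desc *d, const struct model_set *s)
{
    assert(d != NULL && s != NULL);
    if (d->type != MODEL_DATA_VERDICT && d->len != s->dlen)
        return false;
    return true;
}

/* One possible stricter check (an assumption, not the upstream fix):
 * both the type and the length must match what the set declares. */
bool strict_check_accepts(const struct model_desc *d, const struct model_set *s)
{
    assert(d != NULL && s != NULL);
    return d->type == s->dtype && d->len == s->dlen;
}
```

With a 16-byte verdict element and a set declared as 64 bytes of value data, the flawed model accepts the element while the strict model rejects it.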
The vulnerability arises in the nft_set_elem_init function, where a heap buffer overflow is possible. This overflow can extend up to 48 bytes (64 - 16), potentially leading to a security compromise.
In the code snippet, a local variable elem of type struct nft_set_elem is declared. This variable is used to store information about new elements during their creation. The elem variable is used in a call to nft_set_elem_init. This call initializes the element with data provided by the user.
The structure struct nft_set_elem is defined in include/net/netfilter/nf_tables.h. It contains unions for key, key_end, and data, each with a maximum size of 64 bytes.
Root Cause
The vulnerability arises because, even though 64 bytes are reserved in the data union, only 16 bytes are written into elem.data when the buffer overflow is triggered. As a result, the remaining bytes copied in the overflow are whatever was left uninitialized in that memory, not values chosen directly by the attacker. In essence, the overflow doesn't allow direct control of the data being copied, which adds a layer of complexity to exploitation: the impact of the overflow is less predictable and harder to exploit in a controlled manner.
Exploitation and Explanation:
A public exploit / PoC is available for the vulnerability:
- @merlinepedra25 : https://github.com/merlinepedra25/CVE-2022-34918-LPE-PoC
I used this exploit.
Requirements to run the exploit:
You need the libmnl-dev and libnftnl-dev packages installed on your machine.
Affected Version
- Linux: introduced in commit [fdb9c405e35bdc6e305b9b4e20ebc141ed14fc81](https://github.com/torvalds/linux/commit/fdb9c405e35bdc6e305b9b4e20ebc141ed14fc81); it affects the Linux kernel since version 5.8.
- Ubuntu <= 22.04 before security patch
Test Environment
- Platform: Ubuntu 22.04 amd64
- Version: Linux ubuntu 5.12.0 #2 SMP Aug 18 14:17:41 JST 2023 x86_64 x86_64 x86_64 GNU/Linux
Running Exploit
# Once the exploit is downloaded, go to the downloaded folder and run make
make    # builds the poc executable
./poc   # runs the exploit
Result
- Use the git tool to download the exploit from: https://github.com/merlinepedra25/CVE-2022-34918-LPE-PoC
- Run the make command to create the poc executable, then run ./poc
Exploitation — Strategy Explanation
As mentioned, multiple exploits may be available for the vulnerability; here we discuss the strategy used by @merlinepedra25 in the exploit at https://github.com/merlinepedra25/CVE-2022-34918-LPE-PoC.
Root Cause One More Time:
The issue is a heap overflow in the nft_set_elem_init() function, where the overflow length can be as much as 48 bytes (64 - 16). The objects affected by this vulnerability are allocated by the kernel memory allocator (kmalloc) in the caches for sizes of 64, 96, 128, or 192 bytes. The example focuses on the case where the vulnerable object is allocated with 64 bytes.
Example
Think of the nft_set_elem_init() function as a construction site where different-sized containers are allocated to store materials. Now, imagine a flaw in how these containers are handled, allowing for an overflow of materials.
In this construction analogy, the overflow length can be substantial: up to 48 extra units of material. The specific containers affected by this vulnerability are the ones designated as kmalloc-64, kmalloc-96, kmalloc-128, or kmalloc-192. For the sake of illustration, let's focus on the kmalloc-64 container.
+------------------------------------------------------+
| nft_set_elem_init() |
| Heap Overflow |
+------------------------------------------------------+
| 48 bytes |
| <---------------------------------------------------> |
| |
| +----------------------+ +-----------------+ |
| | kmalloc-64 object | | Unused Space | |
| +----------------------+ +-----------------+ |
|<--| Vulnerability Object |<---| Extra Overflow |<----|
| | | | (48 bytes) | |
| | | | | |
| +----------------------+ +-----------------+ |
| |
+------------------------------------------------------+
1. Construction Site (Heap):
- The heap is like a construction site where memory is dynamically allocated to store different-sized containers.
2. nft_set_elem_init() Function:
- This function represents a specific process in the construction site where materials are handled.
3. Heap Overflow:
- The vulnerability in nft_set_elem_init() allows for an overflow of up to 48 bytes beyond the allocated container.
4. Affected Containers (kmalloc):
- The vulnerability impacts containers designated as kmalloc-64, kmalloc-96, kmalloc-128, or kmalloc-192. In this example, we focus on the kmalloc-64 container.
5. Vulnerability Object (kmalloc-64):
- The specific object affected by the overflow is the kmalloc-64 container. This is where the vulnerability resides, and it's selected when exploiting the issue.
6. Extra Overflow:
- The overflow, amounting to up to 48 bytes, spills past the end of the kmalloc-64 container into the memory that follows it.
- In essence, the vulnerability is like a construction flaw allowing materials to spill over into an unintended area.
Exploit Development Strategy:
As already discussed, the root cause is a heap overflow in the nft_set_elem_init() function. The overflow length can reach 64-16=48 bytes, and the vulnerable object can be located in kmalloc-{64,96,128,192} (the kmalloc-64 vulnerable object is selected for exploitation in this article).
Imagine a scenario where you’ve identified a potential security vulnerability, a bit like finding an unguarded entrance in a fortress. However, the challenge lies in exploiting this vulnerability because you don’t have direct control over the data causing the security breach. It’s like trying to navigate through a maze blindfolded.
In the code, there’s a variable called elem.data that plays a crucial role in the overflow, but it starts uninitialized. This uninitialized variable could be a key to controlling the overflow, turning it into a powerful tool for a potential attacker.
Let’s dive into the caller function, nf_tables_newsetelem, which is like the gatekeeper managing entries into a secure area. It adds elements to a set by calling nft_add_set_elem for each element the user wants to include.
Now, imagine this process as a series of doors in a secure facility. The user, like a visitor with a key, can control the number of doors they want to pass through. The key insight is that the process of passing through doors (calls to nft_add_set_elem) can be chained together. This chaining is possible because the function iterates over the user-supplied attributes using nla_for_each_nested. It's akin to having a sequence of interconnected rooms.
Now, let’s bring in a real-life analogy: consider each element being added as a room in a building, and each room has its unique set of attributes. The user, acting as a designer, controls the number of rooms they want to design and the features within each room.
Here comes the clever part: as each room (element) is added, it contributes to the overall structure of the building (stack). The uninitialized elem.data is like a space in each room that the user can leverage.
- Random Data Stages: Initially, random data occupies the stack, much like furnishing an empty building with random items.
- Adding NFT_DATA_VALUE Element: Introducing an element with NFT_DATA_VALUE data is like designing a room with specific features. This user-controlled data now occupies a section of the stack.
- Adding NFT_DATA_VERDICT Element: Finally, adding a second element with NFT_DATA_VERDICT data triggers the buffer overflow. The residue of the previous element, which contains user-controlled data, is copied during the overflow. This is akin to a design flaw in the building, causing unintended consequences.
In essence, the exploit is like a designer strategically placing rooms in a building, utilizing uninitialized spaces to create a chain reaction that results in controlled data influencing the security of the entire structure. The ability to chain these design decisions allows for a unique and independent way to manipulate the overflow, making it less reliant on specific system configurations.
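The three stages can be simulated in user space. The sketch below is only an illustration under stated assumptions: a static buffer stands in for the stack slot that successive calls reuse (real C gives no such guarantee for stack memory), and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SLOT_SIZE    64  /* size of the data union described earlier */
#define VERDICT_SIZE 16  /* bytes written for a verdict element */

/* A static buffer stands in for the reused stack slot (illustration only). */
static unsigned char slot[SLOT_SIZE];

/* Stage 1: an element with value data fills up to the whole slot. */
void add_value_element(const unsigned char *user, size_t len)
{
    assert(len <= SLOT_SIZE);
    memcpy(slot, user, len);
}

/* Stage 2: a verdict element rewrites only the first 16 bytes. */
void add_verdict_element(const unsigned char verdict[VERDICT_SIZE])
{
    memcpy(slot, verdict, VERDICT_SIZE);
}

/* The oversized copy takes SLOT_SIZE bytes, so bytes 16..63 of the output
 * are residue left behind by stage 1. */
void copy_out(unsigned char out[SLOT_SIZE])
{
    memcpy(out, slot, SLOT_SIZE);
}
```

Calling add_value_element() with 64 attacker-chosen bytes, then add_verdict_element(), then copy_out() yields 16 verdict bytes followed by 48 bytes from the first call, which mirrors how the previous element's residue ends up in the overflow.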
CACHE: A Place where Overflow will Happen
Imagine you’re planning a construction project, and before getting into the details of how to exploit a vulnerability, you need to understand the terrain, specifically the cache where the overflow is going to happen. In our case, this is represented by the elem object allocated at (0). The size of this elem is dynamic and depends on choices made by the user, as seen in a previous excerpt from the nft_add_set_elem function. The size can be influenced by options like NFT_SET_ELEM_KEY and NFT_SET_ELEM_KEY_END, which allow the reservation of two buffers with a maximum length of 64 bytes each in elem. This implies that the overflow can potentially occur in multiple caches.
Let’s relate this to a real-life example:
Construction Site Analogy:
- Think of the construction project as building a structure, where different-sized containers are used to store materials.
- The elem object is like a container whose size can be influenced by choices made during the planning phase of the construction project.
Cache Sizes (Ubuntu 22.04 with GFP_KERNEL):
- In our project, we are working on Ubuntu 22.04 with the GFP_KERNEL flag. The relevant cache sizes are kmalloc-64, kmalloc-96, kmalloc-128, and kmalloc-192.
Now, all that’s left is to make sure our elem is aligned with the cache object size for the most effective overflow. The diagram below represents the construction of elem, aligning it on 64 bytes, considering the cache sizes.
+------------------------------------+
| Construction Site |
| +--------------------------+ |
| | elem | |
| | (User-Selected) | |
| |--------------------------| |
| | NFT_SET_ELEM_KEY, | |
| | NFT_SET_ELEM_KEY_END, | |
| | and other options | |
| +--------------------------+ |
| |
|(Cache Sizes: kmalloc-{64,96,128,192}) |
+------------------------------------+
- The construction site represents the memory space where the elem object is allocated.
- elem is dynamic and influenced by user-selected options, such as NFT_SET_ELEM_KEY and NFT_SET_ELEM_KEY_END.
- The diagram visually depicts the alignment of elem on a cache object size (64 bytes in this case) to optimize the overflow.
Exploit Construction strategy:
The construction involves allocating a certain amount of memory for the object header, adding padding through the use of NFT_SET_ELEM_KEY, and allocating space to store element data of type NFT_DATA_VERDICT. The goal is for the object to fill a kmalloc-64 slot exactly (20 + 28 + 16 = 64 bytes), so that any extra bytes copied land outside the slot.
1. Object Header (20 bytes):
- Like the labels, tags, or identifiers you might attach to boxes on a shelf, the construction allocates 20 bytes for the object header. This is the essential information needed to identify and manage the stored elements.
2. Padding via NFT_SET_ELEM_KEY (28 bytes):
- Just as you might strategically arrange smaller items around the edges of a box to fill space efficiently, the construction uses NFT_SET_ELEM_KEY to add 28 bytes of padding. This helps optimize the layout within the kmalloc-64 cache.
3. Element Data Storage for NFT_DATA_VERDICT (16 bytes):
- Similar to allocating specific compartments for certain types of products on a shelf, 16 bytes are reserved to store element data of type NFT_DATA_VERDICT. This could be likened to allocating space for a specific category of items.
+---------------------------------------+
| Shelf-64 (kmalloc-64) |
| +----------------------------+ |
| | Object Header | |
| | (Identification Tags) | |
| +-------------------------------+ |
| | Padding via NFT_SET_ELEM_KEY| |
| |-------------------------------| |
| | | |
| | | |
| +-------------------------------+ |
| | Element Data Storage | |
| | (NFT_DATA_VERDICT Type) | |
| +-------------------------------+ |
+---------------------------------------+
- Shelf-64 (kmalloc-64): Represents the specific cache size targeted by the construction strategy.
- Object Header: Serves as identification tags, labels, or headers attached to each storage unit.
- Padding via NFT_SET_ELEM_KEY: Analogous to strategically filling space with smaller items on a shelf to maximize efficiency.
- Element Data Storage: Reserved space for a specific type of data (in this case, NFT_DATA_VERDICT), comparable to allocating specific compartments for certain categories of products on a shelf.
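The sizing arithmetic behind this layout can be checked with a short sketch. The three lengths come from the text above; the list of cache sizes is an assumption based on the general-purpose caches commonly seen on x86_64, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the layout described above. */
#define HEADER_LEN   20
#define KEY_PAD_LEN  28
#define VERDICT_LEN  16

/* General-purpose cache sizes commonly seen on x86_64 (assumed list). */
static const size_t cache_sizes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Returns the smallest listed cache size that fits `size`, or 0 if none. */
size_t kmalloc_cache_for(size_t size)
{
    size_t n = sizeof(cache_sizes) / sizeof(cache_sizes[0]);

    assert(size > 0);
    for (size_t i = 0; i < n; i++)
        if (size <= cache_sizes[i])
            return cache_sizes[i];
    return 0;
}

/* Total allocation size of the element built by the strategy above. */
size_t elem_alloc_size(void)
{
    return HEADER_LEN + KEY_PAD_LEN + VERDICT_LEN;
}
```

Here kmalloc_cache_for(elem_alloc_size()) selects the 64-byte cache with no slack left in the slot; a larger key padding would push the same element into the 96, 128, or 192 byte caches instead.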
Since the overflow occurs in kmalloc-x caches and not in the kmalloc-cg-x caches where the classical msg_msg objects are allocated, an alternative information leak method is needed. The exploit therefore uses user_key_payload objects, typically used to store sensitive user information in the kernel.
Imagine you’re trying to gather information from labeled boxes in a storage facility, but you can’t directly access the boxes you need. However, you discover another set of special boxes that might hold the information you’re looking for: these are the user_key_payload boxes. Each box has a structure similar to the ones you've been trying to access before, containing a header indicating the size of the object and a buffer for user data.
Structure of the user_key_payload object:
struct user_key_payload {
    struct rcu_head rcu;        /* RCU destructor */
    unsigned short datalen;     /* length of this data */
    char data[] __aligned(__alignof__(u64)); /* actual data */
};
In the storage facility, these special boxes are allocated within the function user_preparse, in a way similar to how you might allocate space for certain items based on their size.
Allocation in the user_preparse function:
int user_preparse(struct key_preparsed_payload *prep) {
    struct user_key_payload *upayload;
    size_t datalen = prep->datalen;

    if (datalen <= 0 || datalen > 32767 || !prep->data)
        return -EINVAL;

    upayload = kmalloc(sizeof(*upayload) + datalen, GFP_KERNEL); // Allocation at (6)
    if (!upayload)
        return -ENOMEM;

    /* attach the data */
    prep->quotalen = datalen;
    prep->payload.data[0] = upayload;
    upayload->datalen = datalen;
    memcpy(upayload->data, prep->data, datalen); // Copying data at (7)
    return 0;
}
The allocation at (6) sizes the object from the length of the user-provided data. The data is then stored just after the header with a memcpy call at (7). The headers of user_key_payload objects are 24 bytes long, allowing them to be used to fill several caches, from kmalloc-32 to kmalloc-8k.
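The 24-byte header figure can be reproduced with a user-space mock. This is a sketch under assumptions: mock_rcu_head imitates struct rcu_head as two pointers, the aligned attribute is the GCC/Clang spelling of the kernel's __aligned macro, and the result holds for a typical 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mock of struct rcu_head: two pointers (assumed layout). */
struct mock_rcu_head {
    struct mock_rcu_head *next;
    void (*func)(struct mock_rcu_head *);
};

/* Mock of user_key_payload; data is aligned like a 64-bit integer. */
struct mock_user_key_payload {
    struct mock_rcu_head rcu;
    unsigned short datalen;
    char data[] __attribute__((aligned(__alignof__(uint64_t))));
};

/* Number of bytes that precede the user data in the mock object. */
size_t payload_header_len(void)
{
    return offsetof(struct mock_user_key_payload, data);
}

/* Largest data length that still fits in an object of `cache_size` bytes. */
size_t max_datalen_for_cache(size_t cache_size)
{
    size_t hdr = payload_header_len();

    assert(cache_size > hdr);
    return cache_size - hdr;
}
```

On such a build the header is 16 bytes of rcu plus 2 bytes of datalen, padded to 24, so a payload of up to 40 bytes of user data still lands in a 64-byte object.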
The goal is similar to the classical method with msg_msg objects: overwrite the datalen field with a larger value than the initial one. When the stored information is read back, the corrupted object returns more data than the user initially provided.
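The effect of a corrupted length field can be shown with a small simulation. Everything here is hypothetical: a flat byte array stands in for adjacent heap objects, and mock_key_read() models a read path that trusts datalen; it is not the kernel's key-reading code.

```c
#include <assert.h>
#include <string.h>

#define MOCK_HEAP_SIZE 32  /* hypothetical size of the flat mock memory */
#define KEY_DATA_LEN    8  /* bytes the user originally stored in the key */

/* Flat memory standing in for adjacent heap objects: bytes[0..7] hold the
 * key's own data, bytes[8..31] belong to a neighbouring object. */
struct mock_heap {
    unsigned short datalen;               /* length field of the mock key */
    unsigned char bytes[MOCK_HEAP_SIZE];
};

/* Models a read that trusts datalen: copies min(datalen, buflen) bytes
 * and returns how many bytes were copied. */
size_t mock_key_read(const struct mock_heap *h, unsigned char *buf, size_t buflen)
{
    size_t n = h->datalen < buflen ? h->datalen : buflen;

    assert(n <= MOCK_HEAP_SIZE);  /* keep the sketch itself memory-safe */
    memcpy(buf, h->bytes, n);
    return n;
}
```

With datalen left at 8 the read returns only the key's own bytes; once datalen is overwritten with 24, the same call also returns 16 bytes that belong to the neighbour.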
However, there’s a limitation to this approach. The number of allocated objects is restricted by sysctl variables, specifically kernel.keys.maxkeys (limit on the number of allowed keys) and kernel.keys.maxbytes (limit on the number of stored bytes). The default values for Ubuntu 22.04 are very low:
kernel.keys.maxbytes = 20000
kernel.keys.maxkeys = 200
+---------------------------------------------------+
| Storage Facility (Kernel) |
| +-----------------------------------------+ |
| | user_key_payload Box | |
| | (Header + Buffer for User Data) | |
| +-----------------------------------------+ |
| | Allocation (kmalloc) | |
| | Based on User Data Size | |
| +-----------------------------------------+ |
+---------------------------------------------------+
- The storage facility represents the kernel memory space.
- user_key_payload boxes are analogous to storage boxes containing a header and user data.
- Allocation is performed based on the size of user-provided data, and the goal is to corrupt the header (its datalen field) so that a read returns more than was stored.
In this analogy, think of the user_key_payload boxes as specially labeled storage containers that might hold the information you're looking for. The challenge is to use them efficiently to extract valuable details about the system.
The exploit focuses on the kmalloc-64 cache because of its small object size. The exploit developer targets percpu_ref_data objects, which are allocated in this cache and contain pointers to functions that are useful for computing the Kernel Address Space Layout Randomization (KASLR) base and module bases. These objects are allocated during the initialization of an io_ring_ctx object, specifically in the io_ring_ctx_alloc function, which is part of the Linux core. The io_uring_setup syscall is the simplest way to allocate them, and the close syscall is used to trigger their release.
To summarize, the memory leak phase proceeds as follows:
- Focus on the kmalloc-64 cache for efficient information leakage.
- The target objects within this cache are percpu_ref_data objects, which contain useful pointers.
- The percpu_ref_data structure includes pointers to functions (release and confirm_switch), useful for computing the KASLR base or module bases when leaked, and a pointer to a dynamically allocated object (ref), useful for computing the physmap base.
- Allocation of percpu_ref_data objects occurs during the initialization of an io_ring_ctx object using the io_uring_setup syscall.
- The io_ring_ctx_alloc function within fs/io_uring.c is responsible for this allocation.
- By leaking the addresses of the io_ring_ctx_ref_free and io_rsrc_node_ref_zero functions, we can compute the KASLR base.
- The unexpected discovery of percpu_ref_data objects with the address of the io_rsrc_node_ref_zero function in the release field, also originating from the io_uring_setup syscall, is a beneficial side effect that improves the exploit.
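The base computation itself is a single subtraction. The sketch below uses hypothetical helper names, and the 2 MiB alignment used as a sanity check is an assumption about a typical x86_64 build. The symbol offset has to come from the symbol table of the exact kernel build being analysed, so none is hard-coded here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_KASLR_ALIGN 0x200000ull /* assumed 2 MiB alignment of the base */

/* Base = leaked function pointer minus that function's offset in the image. */
static uint64_t model_kaslr_base(uint64_t leaked_ptr, uint64_t symbol_offset)
{
    assert(leaked_ptr >= symbol_offset);
    return leaked_ptr - symbol_offset;
}

/* A wrong symbol guess usually produces a misaligned "base". */
static int model_base_is_plausible(uint64_t base)
{
    return (base % MODEL_KASLR_ALIGN) == 0;
}

/* Once the base is known, any other symbol can be located. */
static uint64_t model_symbol_addr(uint64_t base, uint64_t symbol_offset)
{
    return base + symbol_offset;
}
```

Because two different functions can show up in the release field, the alignment check is a cheap way to tell which offset matches a given leak.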
Example Diagram for leaking steps:
+---------------------------------------------+
| Kernel Memory Space |
| |
| +-----------------------------------+ |
| | kmalloc-64 Cache | |
| | | |
| | +-----------------------------+ | |
| | | percpu_ref_data Object | | |
| | | | | |
| | | +-----------------------+ | | |
| | | | count | | | |
| | | | release | | | |
| | | | confirm_switch | | | |
| | | | force_atomic | | | |
| | | | allow_reinit | | | |
| | | | rcu | | | |
| | | | ref | | | |
| | | +-----------------------+ | | |
| | +-----------------------------+ | |
| | | |
| +-----------------------------------+ |
+---------------------------------------------+
- The diagram represents the kernel memory space with a focus on the kmalloc-64 cache.
- Within this cache, percpu_ref_data objects are allocated during the initialization of an io_ring_ctx object using the io_uring_setup syscall.
- These percpu_ref_data objects contain pointers to functions and a reference to dynamically allocated objects, making them valuable targets for information leakage.
- The goal is to leak the addresses of functions like io_ring_ctx_ref_free and io_rsrc_node_ref_zero to compute the KASLR base and improve the overall exploit.
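Once a 64-byte chunk has been leaked, the interesting fields have to be picked out of the raw bytes. The following sketch assumes a 64-bit build and derives the offsets from the field order shown in the diagram. The offsets, the struct model_leak, and the helper names are all assumptions and are not taken from kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 64-bit offsets, following the field order in the diagram. */
#define OFF_RELEASE         8u
#define OFF_CONFIRM_SWITCH 16u
#define OFF_REF            48u
#define MODEL_OBJ_SIZE     64u

struct model_leak {
    uint64_t release;        /* function pointer: kernel text */
    uint64_t confirm_switch; /* function pointer, may be NULL */
    uint64_t ref;            /* pointer to a dynamically allocated object */
};

/* Reads an unaligned 64-bit value from the leaked bytes. */
static uint64_t read_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Extracts the pointers of interest from one leaked 64-byte object. */
static int model_parse_leak(const unsigned char *chunk, size_t len,
                            struct model_leak *out)
{
    assert(chunk && out);
    if (len < MODEL_OBJ_SIZE)
        return -1;
    out->release = read_u64(chunk + OFF_RELEASE);
    out->confirm_switch = read_u64(chunk + OFF_CONFIRM_SWITCH);
    out->ref = read_u64(chunk + OFF_REF);
    return 0;
}

/* Kernel text on x86_64 is mapped in the top 2 GiB (assumption). */
static int model_looks_like_text_ptr(uint64_t v)
{
    return v >= 0xffffffff80000000ull;
}
```

A chunk whose release value fails model_looks_like_text_ptr is probably not a percpu_ref_data object and can be skipped while scanning the leaked data.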
High-Level Steps to Develop an Exploit
1. Heap Layout Construction:
- Construct the heap layout with the following components:
  - vul_obj: the vulnerable object
  - user_key_payload: payload containing user-controlled data
  - percpu_ref_data: per-CPU reference data
2. Overflow and Tamper (Leak Addresses):
- Trigger a heap overflow to tamper with user_key_payload.
- Modify user_key_payload->datalen to leak percpu_ref_data->release (kernel base address) and percpu_ref_data->ref (physmap base address).
- This step aims to obtain critical kernel addresses for later privilege escalation.
3. Heap Layout Reconstruction (Arbitrary Write):
- Construct a new heap layout with the following components:
  - vul_obj: the vulnerable object
  - simple_xattr: simple extended attribute
- Trigger another overflow to tamper with simple_xattr and manipulate its linked list.
4. Restricted Arbitrary Write:
- Leverage the restricted arbitrary write on simple_xattr to modify modprobe_path.
- This write is performed when the extended attribute (xattr) is removed from the linked list.
- The goal is to escalate privileges by changing modprobe_path from its default value, /sbin/modprobe, to an attacker-controlled path, so that arbitrary commands are executed with root privileges.
5. Prerequisites:
- The physmap address needs to be leaked.
- The root directory (“/”) contains both the kernel base address and the physmap address.
+-------------------------------------+
| vul_obj |
+-------------------------------------+
| user_key_payload |
| +--------------------------+ |
| | percpu_ref_data | |
| +--------------------------+ |
| |
| |
| (Heap Overflow) |
| |
+-------------------------------------+
| simple_xattr |
| +--------------------------+ |
| | Modified list | |
| +--------------------------+ |
| |
| (Arbitrary Write) |
| |
+-------------------------------------+
- The heap layout is arranged twice so that the overflow from vul_obj corrupts two different victim objects (user_key_payload and simple_xattr).
- The first overflow is used to leak kernel addresses (percpu_ref_data->release and percpu_ref_data->ref).
- The second overflow corrupts the list pointers of simple_xattr. When the extended attribute is later removed, the unlink operation performs a restricted arbitrary write that modifies modprobe_path.
- Successful exploitation of these vulnerabilities would lead to privilege escalation, allowing an attacker to execute arbitrary commands with elevated privileges.
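Why the write is called "restricted" becomes clearer with a user-space model of a doubly linked list unlink. This is a sketch with hypothetical names (model_list_head, model_list_del, model_corrupt); it ignores any list-integrity checks a kernel may be built with and never touches kernel memory.

```c
#include <assert.h>
#include <stddef.h>

struct model_list_head {
    struct model_list_head *next, *prev;
};

/* The same two stores as a classic doubly linked list unlink. */
static void model_list_del(struct model_list_head *entry)
{
    assert(entry && entry->next && entry->prev);
    entry->next->prev = entry->prev; /* store 1: writes into *next */
    entry->prev->next = entry->next; /* store 2: writes into *prev */
}

/* Stands in for the overflow: replaces both pointers of the victim. */
static void model_corrupt(struct model_list_head *victim,
                          struct model_list_head *where,
                          struct model_list_head *what)
{
    victim->prev = where; /* location that store 2 overwrites */
    victim->next = what;  /* value written there; store 1 also writes to it */
}
```

The unlink always performs both stores, so the value written to the target must itself be a writable address. That constraint is one plausible reading of why a physmap address has to be leaked first.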
Patch Diffing
A change was made to the code to fix the vulnerability. As discussed, the flaw is in the nftables framework, specifically in the nft_setelem_parse_data function, which is reached when handling NFT_MSG_NEWSETELEM, the message that adds elements to a set. The vulnerability stems from an oversight in the type-checking mechanism during element addition.
The nft_setelem_parse_data function initializes data and desc and then performs a legality check to confirm that the size of the incoming data matches what the set expects. The problem arises when a VERDICT element is supplied while the set stores VALUE data. In this scenario the check does not cover the VERDICT type, so a VERDICT element can be added to a VALUE set, leading to a potential heap overflow.
The remedy is a patch that handles the VERDICT type explicitly, ensuring that both the type and the length of the element conform to the set's expectations.
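The difference between the two checks can be modelled in a few lines of user-space C. This is my reading of the description above and of the linked commit, not a copy of the kernel source. The enum, both structs, and both functions are simplified, hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>

enum model_dtype { MODEL_DATA_VALUE, MODEL_DATA_VERDICT };

struct model_set  { enum model_dtype dtype; unsigned int dlen; };
struct model_desc { enum model_dtype type;  unsigned int len;  };

/* Assumed pre-patch logic: the length is compared only for non-VERDICT
 * elements, and the element type is never compared with the set type. */
static int model_check_before(const struct model_set *set,
                              const struct model_desc *desc)
{
    assert(set && desc);
    if (desc->type != MODEL_DATA_VERDICT && desc->len != set->dlen)
        return -EINVAL;
    return 0;
}

/* Assumed post-patch logic: type and length must both match the set. */
static int model_check_after(const struct model_set *set,
                             const struct model_desc *desc)
{
    assert(set && desc);
    enum model_dtype expected = (set->dtype == MODEL_DATA_VERDICT)
                                    ? MODEL_DATA_VERDICT
                                    : MODEL_DATA_VALUE;
    if (expected != desc->type || set->dlen != desc->len)
        return -EINVAL;
    return 0;
}
```

For a VALUE set with a 4-byte data length and a 16-byte VERDICT element (both lengths chosen only for illustration), model_check_before accepts the element while model_check_after rejects it with -EINVAL.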
+---------------------------------------+
| nft_setelem_parse_data |
| |
| +---------------------------+ |
| | Initialization | |
| | | |
| | - nft_data_init | |
| | - Size legality check | |
| +---------------------------+ |
| | |
| v |
| +---------------------------+ |
| | Type Check (Before) | |
| | - VERDICT vs. VALUE type | |
| | - Length verification | |
| +---------------------------+ |
| | |
| v |
| +---------------------------+ |
| | Heap Overflow Occurs | |
| | - Addition of VERDICT | |
| | to a VALUE set | |
| +---------------------------+ |
| | |
| v |
| +---------------------------+ |
| | Patched Check | |
| | - Specific VERDICT check | |
| | - Type and length match | |
| +---------------------------+ |
| |
+---------------------------------------+
- The diagram outlines the control flow within the nft_setelem_parse_data function.
- The vulnerability arises when the type check fails to appropriately handle the introduction of a VERDICT element into a VALUE set, potentially leading to a heap overflow.
- The patched version includes an additional check specifically addressing the VERDICT type, ensuring both type and length align with the set’s expectations, thereby preventing the heap overflow vulnerability.
Final Thoughts
Analyzing CVE-2022-34918 and addressing the security concern has been an illuminating experience. Delving into the heap overflow in restricted user data inputs, understanding its implications, and applying the necessary fixes has deepened my understanding of the nf_tables.ko module and of heap overflow exploitation.
Furthermore, I would like to acknowledge @Arthur Mongodin for his remarkable contribution in crafting an exploit for the vulnerability. The exploit not only provides a practical demonstration of the vulnerability but also enabled me to test and confirm that it exists.
I trust that reading this account was as delightful for you as it was for me to craft it.
Also, there can be multiple ways to exploit the vulnerability. The exploitation described here assumes that a particular address is consistently mapped in kernel space, which is not universally guaranteed. Consequently, the exploit is not fully reliable, although its success rate is high. Another challenge is the kernel panic that occurs once the exploit completes. To mitigate this, efforts are underway to identify objects that can persist in kernel memory after the exploitation process ends. This requires thorough experimentation with various placements, but it is a worthwhile task.
Resources:
https://elixir.bootlin.com/linux
https://randorisec.fr/crack-linux-firewall/
https://github.com/torvalds/linux/commit/7e6bc1f6cabcd30aba0b11219d8e01b952eacbb6
https://access.redhat.com/security/cve/cve-2022-34918