I recently attached an eBPF kprobe to a function that takes 8 parameters. The trouble is that the BPF_KPROBE macro only exposes the first 5 parameters, and I was interested in most of them. It took a bit of fiddling to get at all of them, so I wanted to document it in case someone else (or my future self) needs it.

There seems to be no portable way of doing this, but I was only interested in 64-bit x86 (x86-64), so this post covers only that architecture.

I was hooking __get_user_pages, which has the following prototype:

static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                unsigned long start, unsigned long nr_pages,
                unsigned int gup_flags, struct page **pages,
                struct vm_area_struct **vmas, int *nonblocking);

The kprobe code that grabs all the parameters looks like this:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("kprobe/__get_user_pages")
int BPF_KPROBE(kprobe____get_user_pages, struct task_struct *tsk, struct mm_struct *mm,
        unsigned long start, unsigned long nr_pages, unsigned int gup_flags
        /*, struct page **pages, struct vm_area_struct **vmas, int *nonblocking */)
{
    // 6th parameter: passed in r9
    struct page **pages = (struct page **)ctx->r9;

    // 7th parameter: first stack slot above the return address
    struct vm_area_struct **vmas;
    void *parm7 = (void *)(ctx->sp + 8);
    if (bpf_probe_read_kernel(&vmas, sizeof(vmas), parm7) != 0) {
        return 0; // shouldn't happen, but don't use vmas if the read failed
    }

    // 8th parameter: the next stack slot
    int *nonblocking;
    void *parm8 = (void *)(ctx->sp + 16);
    if (bpf_probe_read_kernel(&nonblocking, sizeof(nonblocking), parm8) != 0) {
        return 0; // same as above
    }
    // ...
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

Since BPF_KPROBE supports only up to five parameters, I commented out the remaining three and read them by hand.

The System V x86-64 ABI defines how function parameters are passed. The first 6 integer and pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9); floating-point arguments are passed in xmm registers. Anything beyond that is passed on the stack. The parameters are pushed onto the stack right to left, and since the stack grows towards lower addresses, the last parameter ends up at the highest address. Eli's post gives a good summary.

There are some differences, though, between that article and how things play out with eBPF kprobes. When a kprobe is set up, a trampoline is installed at the very beginning of the target function, and the trampoline ultimately invokes the eBPF handler. Typically this trampoline is installed via the ftrace mechanism. The kernel is compiled with the -pg -mfentry options, which make GCC emit a call to a special __fentry__ function as the very first instruction of a function. On boot, the kernel overwrites those calls with nops. When a kprobe is requested, the kernel overwrites those nops with a call to the trampoline.

If we dump /proc/kallsyms, we can find the addresses of the __fentry__ and __get_user_pages functions:

ffffffffb3801950 T __fentry__
...
ffffffffb2e4ad90 t __get_user_pages
...

Next, let’s disassemble the kernel and take a look at the __get_user_pages instructions.

ffffffffb2e4ad90:   e8 bb 6b 9b 00          callq  0xffffffffb3801950
ffffffffb2e4ad95:   55                      push   %rbp
ffffffffb2e4ad96:   48 89 c8                mov    %rcx,%rax
ffffffffb2e4ad99:   48 89 e5                mov    %rsp,%rbp
...

The very first instruction is a call to __fentry__ (remember, it gets replaced with a call to the trampoline when the kprobe is installed). It comes right before the function prologue, which saves the caller's frame pointer (rbp register) on the stack and sets the current frame pointer to the value of the stack pointer (rsp register).

The trampoline immediately saves all of the registers into a struct pt_regs, and a pointer to it is passed to the eBPF kprobe handler as the hidden ctx parameter. By the time the trampoline runs, there is more data on the stack (at a minimum, the callq instruction pushes the return address, 0xffffffffb2e4ad95). Fortunately, the trampoline does not save the raw rsp value in pt_regs; it adjusts it so that it looks as if the call to the trampoline never happened.

When our kprobe handler is invoked, ctx->sp points to a stack that looks like this:

         +=====================+
SP + 0:  | return address      |
         +---------------------+
SP + 8:  | parm7 (vmas)        |
         +---------------------+
SP + 16: | parm8 (nonblocking) |
         +---------------------+
         | ...                 |

With the parameter offsets known relative to the stack pointer, reading them just requires calling the bpf_probe_read_kernel helper.