Understanding ptrace

Background

ptrace is the interface the Linux kernel provides for debugging applications, and Linux debuggers such as gdb and lldb are built on it.

But there’s a problem: it isn’t easy to use.

At work, I was on a project that needed to check pretty much every syscall before allowing the program to execute it. This could have been scripted with gdb and Python, but Python made the debugged program (a.k.a. the tracee) run really slowly. So I started writing my own solution: a program that uses the ptrace API directly.

ptrace Overview

From the tracer’s point of view, a tracee is either Running or Stopped (in a ptrace-stop).

The following requests are valid while the tracee is Running:

  • PTRACE_ATTACH
  • PTRACE_SEIZE
  • PTRACE_INTERRUPT
  • PTRACE_KILL

All other ptrace requests are only valid for Stopped tracees (except PTRACE_TRACEME which we’re not going to look into).

Writing a process tracer

A quick overview of how a tracer should work:

  1. Attach to the target tracee(s).
  2. If you’re working with breakpoints, now’s a good time to set them (by writing int3 instructions, assuming x86).
  3. Tell the kernel when to stop (on syscalls, after single steps, you decide).
  4. Handle the situation when you receive a ptrace-stop.
  5. Repeat from step 3.

Attaching

ptrace provides two ways to attach to a process: PTRACE_ATTACH and PTRACE_SEIZE.

I originally used the attach method, but then moved to the seize method (explained later).

Once PTRACE_ATTACH has been requested (it sends the tracee a SIGSTOP), you must waitpid on the pid until the tracee is actually stopped, set tracer options with PTRACE_SETOPTIONS if needed, and then resume it with either PTRACE_CONT or PTRACE_SYSCALL.

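#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* attach: the kernel sends the tracee a SIGSTOP */
if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) {
    perror("PTRACE_ATTACH");
    exit(1);
}

/* wait for the attach request to complete (the tracee is now stopped) */
waitpid(pid, NULL, 0);

/* set ptrace options: report syscall-stops as SIGTRAP|0x80 */
ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

/* resume tracee execution until its next syscall-enter/exit */
ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

while (1) {
    int status = 0;
    pid_t stopped = waitpid(pid, &status, 0);
    if (stopped == -1 || WIFEXITED(status) || WIFSIGNALED(status))
        break; /* the tracee is gone */

    /* handle ptrace events */
    /*  ... */

    /* resume tracee execution */
    ptrace(PTRACE_SYSCALL, stopped, NULL, NULL);
}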

Setting ptrace options

Set ptrace options if you want to get ptrace-stops for forks, execves, clones, exits, etc. The option constants start with PTRACE_O_.

Setting ptrace options is done with the PTRACE_SETOPTIONS request, which has existed since Linux 2.4.6. On kernels older than that, you’ll need PTRACE_OLDSETOPTIONS instead.

Instead of dealing with ptrace’s constants madness, I went for what looked like a more stable request and changed my program to PTRACE_SEIZE. Note that ptrace’s behaviour changes in some cases under the seize method. For example, with PTRACE_SEIZE a group-stop can be detected from the waitpid status alone, whereas with PTRACE_ATTACH you need PTRACE_GETSIGINFO to tell it apart.

Seizing

Seizing a process makes it a tracee of the tracer, but unlike PTRACE_ATTACH the tracee isn’t stopped (no SIGSTOP is sent), and you can set ptrace options in the same call!

Once you’ve seized your process, you should PTRACE_INTERRUPT your tracee if you want to set breakpoints or tell ptrace when you want to stop (if the stops you want aren’t already available in ptrace’s options). Wait for the resulting stop with waitpid before issuing any other request.

After switching to the seize method, my code started with something like this:

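#include <sys/ptrace.h>
#include <sys/wait.h>

ptrace(PTRACE_SEIZE, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);
ptrace(PTRACE_INTERRUPT, pid, NULL, NULL);
waitpid(pid, NULL, 0); /* wait for the PTRACE_EVENT_STOP caused by the interrupt */
ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

while (1) {
    int status = 0;
    pid_t stopped = waitpid(pid, &status, 0);
    if (stopped == -1 || WIFEXITED(status) || WIFSIGNALED(status))
        break; /* the tracee is gone */
    /* ... */
    ptrace(PTRACE_SYSCALL, stopped, NULL, NULL);
}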

Getting ptrace stops

Now that you’ve got your process attached, it’s time to actually get work done. The tracer I wrote didn’t require any breakpoints, only syscalls. So I called ptrace with PTRACE_SYSCALL and waited for ptrace-stops using waitpid(2).

ptrace stop types

Possible types of stops:

  • syscall enter - just before the kernel executes a syscall.
  • syscall exit - just before the syscall returns to the tracee.
  • group stop - the tracee is stopped by a stop signal (SIGSTOP, SIGTSTP, etc.) as part of a whole thread-group stop.
  • signal stop - a signal is about to be delivered to the tracee.
  • ptrace event - events that you can choose to receive using the ptrace options.

You can tell the ptrace stop types apart as follows.

When a ptrace event occurs, take the status that waitpid stores through its second argument and compare it to the event type:

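int status = 0;
pid_t pid = waitpid(-1, &status, 0);

/* PTRACE_EVENT_EXEC is only an example: compare against whichever
   PTRACE_EVENT_* you enabled through the ptrace options */
if (WIFSTOPPED(status) && status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXEC << 8))) {
    /* handle ptrace event */
}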

A syscall stop can be distinguished by setting the PTRACE_O_TRACESYSGOOD option. Syscall-stops then report status >> 8 == (SIGTRAP | 0x80).

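On x86 platforms, syscall-enter stops have the rax register (eax on 32-bit) set to -ENOSYS. A syscall-exit stop will usually hold the syscall’s return value instead, but a syscall that genuinely fails with ENOSYS returns -ENOSYS too. Either way, it is always important to keep track of syscall-stop states per pid. PTRACE_GETSIGINFO can tell you whether a SIGTRAP stop is a syscall-stop at all (si_code is SIGTRAP or SIGTRAP|0x80), and on Linux 5.3 and later PTRACE_GET_SYSCALL_INFO also reports whether a syscall-stop is an entry or an exit.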
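Tracking per-pid state can be as simple as flipping a flag on every syscall-stop. A minimal sketch, assuming each tracee’s first observed syscall-stop is an enter (this can break if you attach while a tracee is in the middle of a syscall) and using a hypothetical fixed-size table where a real tracer would use a hash map. The names on_syscall_stop and forget_pid are illustrative.

```c
#include <stdbool.h>
#include <sys/types.h>

#define MAX_TRACKED 64 /* hypothetical limit for this sketch */

struct syscall_state {
    pid_t pid;
    bool in_syscall; /* true between syscall-enter-stop and syscall-exit-stop */
};

static struct syscall_state syscall_table[MAX_TRACKED];
static int syscall_table_len;

/* Call on every syscall-stop of pid. Returns true for a syscall-enter-stop
 * and false for a syscall-exit-stop. A pid seen for the first time is
 * assumed to be entering a syscall. */
bool on_syscall_stop(pid_t pid)
{
    for (int i = 0; i < syscall_table_len; i++) {
        if (syscall_table[i].pid == pid) {
            syscall_table[i].in_syscall = !syscall_table[i].in_syscall;
            return syscall_table[i].in_syscall;
        }
    }
    if (syscall_table_len < MAX_TRACKED) {
        syscall_table[syscall_table_len].pid = pid;
        syscall_table[syscall_table_len].in_syscall = true;
        syscall_table_len++;
    }
    return true;
}

/* Drop a pid once it exits, so a recycled pid starts with fresh state. */
void forget_pid(pid_t pid)
{
    for (int i = 0; i < syscall_table_len; i++) {
        if (syscall_table[i].pid == pid) {
            /* move the last entry into the freed slot */
            syscall_table[i] = syscall_table[--syscall_table_len];
            return;
        }
    }
}
```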

If the tracee was attached using the seize method, you can distinguish group-stops by comparing status>>16==PTRACE_EVENT_STOP.
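Putting these checks together, a sketch of a classifier for the status that waitpid returns. It assumes PTRACE_O_TRACESYSGOOD is set and that the tracee was seized, so status >> 16 == PTRACE_EVENT_STOP marks a group-stop (or the stop caused by PTRACE_INTERRUPT). The enum and function names are hypothetical.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_EXITED,       /* tracee exited or was killed: not a stop at all */
    STOP_SYSCALL,      /* syscall-enter or syscall-exit stop */
    STOP_GROUP,        /* group-stop or PTRACE_INTERRUPT stop (seize only) */
    STOP_PTRACE_EVENT, /* PTRACE_EVENT_FORK, _EXEC, ... enabled via options */
    STOP_SIGNAL,       /* signal-delivery-stop (includes a plain SIGTRAP) */
    STOP_UNKNOWN,
};

enum stop_kind classify_stop(int status)
{
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_EXITED;
    if (!WIFSTOPPED(status))
        return STOP_UNKNOWN;

    int sig = WSTOPSIG(status);           /* bits 8..15 of status */
    int event = (unsigned)status >> 16;   /* PTRACE_EVENT_* or 0 */

    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if (event == PTRACE_EVENT_STOP)
        return STOP_GROUP;
    if (event != 0)
        return STOP_PTRACE_EVENT;
    return STOP_SIGNAL;
}
```

The classification decides how to resume: for STOP_SIGNAL, pass WSTOPSIG(status) as the data argument of the next PTRACE_SYSCALL or PTRACE_CONT, otherwise the signal is suppressed.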

waitpid

waitpid is how we wait for ptrace to report an event. How can you do that on multiple processes?

I originally looked through waitpid’s flags and tried polling with WNOHANG. It kind of worked, but it kept one CPU core at 100%, which isn’t really optimal. I then tried creating multiple threads, which didn’t work since the thread that attaches a tracee is the only thread that can issue ptrace requests for that process (every thread has its own PID).

Apparently, tracees behave like child processes of the tracer as far as waitpid is concerned. I later found out that waitpid also accepts -1 as the pid, meaning “wait for any child” (tracees included).

waitpid with -1 is great because waitpid returns the pid of whichever tracee stopped, and more information about the stop can be determined from waitpid’s returned status.
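A sketch of a single-threaded loop over several tracees built on waitpid(-1, ...). It assumes every tracee already has PTRACE_O_TRACESYSGOOD set and has been resumed with PTRACE_SYSCALL; __WALL makes waitpid also report tracees that are threads rather than whole processes. The function name trace_until_all_exit and the choice to just count syscall-stops are illustrative.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* Drive all tracees until `live` of them have exited or been killed.
 * Returns the number of syscall-stops seen, or -1 if waitpid fails. */
long trace_until_all_exit(int live)
{
    long syscall_stops = 0;

    while (live > 0) {
        int status;
        /* -1: whichever tracee stops first; __WALL: include thread tracees */
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid == -1)
            return -1;

        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            live--;
            continue;
        }
        if (!WIFSTOPPED(status))
            continue;

        int sig = WSTOPSIG(status);
        int inject = 0;
        if (sig == (SIGTRAP | 0x80)) {
            syscall_stops++; /* syscall-enter or syscall-exit */
        } else if ((status >> 16) == 0 && sig != SIGTRAP) {
            /* Assumed to be a signal-delivery-stop: pass the signal on.
             * Under PTRACE_ATTACH a group-stop looks the same; telling
             * them apart needs PTRACE_GETSIGINFO, skipped in this sketch. */
            inject = sig;
        }

        ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)inject);
    }
    return syscall_stops;
}
```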

Reading process data

It is possible to read registers using PTRACE_GETREGS, which copies the register values into a struct user_regs_struct whose address you pass as ptrace’s data argument.
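A sketch of reading the syscall number and return value at a syscall-stop on x86-64, where orig_rax keeps the syscall number and rax holds -ENOSYS at syscall-enter-stop and the return value at syscall-exit-stop. It assumes PTRACE_O_TRACESYSGOOD is set; read_syscall_regs and next_syscall_stop are hypothetical helpers, and next_syscall_stop simply suppresses any signal it passes over.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h> /* SYS_* numbers to compare the syscall number against */
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this sketch reads x86-64 registers (orig_rax, rax)"
#endif

/* Resume pid until its next syscall-stop. Any other stop is resumed with
 * no signal (signals are suppressed), which is acceptable only in a sketch.
 * Returns 0 at a syscall-stop, -1 once the tracee is gone or a call fails. */
int next_syscall_stop(pid_t pid)
{
    for (;;) {
        int status;
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            return -1;
        if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
            return -1;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80))
            return 0;
    }
}

/* At a syscall-stop: orig_rax is the syscall number; rax is -ENOSYS at
 * syscall-enter-stop and the syscall's return value at syscall-exit-stop. */
int read_syscall_regs(pid_t pid, long *nr, long *rax)
{
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    *nr = (long)regs.orig_rax;
    *rax = (long)regs.rax;
    return 0;
}
```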

ptrace doesn’t provide a good way to read process memory. PTRACE_PEEKDATA reads a single word (sizeof(long) bytes, so 8 bytes on x86-64) from the tracee per call, which is too slow for any normal-sized buffer (too many syscalls). If you do decide to use PTRACE_PEEKDATA, clear errno before the call and check it whenever ptrace returns -1, since -1 might actually be the value read from the process.
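A sketch of the word-at-a-time loop, for when PTRACE_PEEKDATA is all you have. It assumes the tracee is in a ptrace-stop, copies only the requested bytes out of the last word, and uses the hypothetical helpers wait_for_stop and peek_bytes. The final word is always read in full, so a buffer that ends right before an unmapped page can make the last call fail.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* PTRACE_PEEKDATA only works on a tracee in a ptrace-stop:
 * block until pid reports one. Returns 0 if stopped, -1 otherwise. */
int wait_for_stop(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Copy len bytes starting at addr in the tracee into buf, one word
 * (sizeof(long) bytes) per PTRACE_PEEKDATA call.
 * Returns 0 on success, -1 with errno set otherwise. */
int peek_bytes(pid_t pid, uintptr_t addr, void *buf, size_t len)
{
    unsigned char *out = buf;
    size_t done = 0;

    while (done < len) {
        errno = 0; /* -1 is a valid word, so errno is the only error signal */
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;

        size_t chunk = len - done < sizeof word ? len - done : sizeof word;
        memcpy(out + done, &word, chunk);
        done += chunk;
    }
    return 0;
}
```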

Better ways to read data would include:

  1. The process_vm_readv syscall is the best option, in my opinion.
  2. Reading /proc/<pid>/mem might also be a good option, but it doesn’t exist on as many platforms and doesn’t seem as lightweight as process_vm_readv. WSL doesn’t support it (last time I tried, at least).
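For comparison, a sketch of a memory read with process_vm_readv (Linux 3.2 and later, declared in <sys/uio.h> when _GNU_SOURCE is defined). A single syscall copies the whole range; the caller needs ptrace-attach permission over the target, which a tracer already has. read_remote is a hypothetical name.

```c
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>

/* Read len bytes at remote_addr in pid's address space into buf.
 * Returns the number of bytes copied, or -1 with errno set. */
ssize_t read_remote(pid_t pid, const void *remote_addr, void *buf, size_t len)
{
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };

    /* one local and one remote iovec; flags must be 0 */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

Always compare the return value with len, since a read that touches unmapped memory in the target can come back short or fail.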

Notes/Tips

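  • If your tracee needs to execute a SUID binary, it won’t work under a non-root tracer: the kernel doesn’t honour the set-user-ID bit for a traced process unless the tracer is privileged.
  • Make as few syscalls as you can. They’ll slow you down.
  • Unexpected errors may occur (especially when tracing programs, e.g. a tracee can die between two requests), so try to handle all errors.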
  • In this post I only focused on retrieving data from tracees, similar requests exist to write data (PTRACE_POKEDATA, PTRACE_SETREGS, process_vm_writev, etc…)
  • Please leave comments!