# syscall_intercept

[![Build Status](https://travis-ci.org/pmem/syscall_intercept.svg)](https://travis-ci.org/pmem/syscall_intercept)
[![Coverage Status](https://codecov.io/github/pmem/syscall_intercept/coverage.svg)](https://codecov.io/gh/pmem/syscall_intercept)
[![Coverity Scan Build Status](https://scan.coverity.com/projects/12890/badge.svg)](https://scan.coverity.com/projects/syscall_intercept)

Userspace syscall intercepting library.

# Dependencies #

## Runtime dependencies ##

 * libcapstone -- the disassembly engine used under the hood

## Build dependencies ##

### Local build dependencies ###

 * C99 toolchain -- tested with recent versions of GCC and clang
 * cmake
 * perl -- for checking coding style
 * pandoc -- for generating the man page

### Travis CI build dependencies ###

The Travis builds use scripts to generate Docker images, in which syscall_intercept is built and tested.
These Docker images are pushed to Dockerhub, to be reused in later Travis builds.
The scripts expect four environment variables to be set in the Travis environment:
 * DOCKERHUB_REPO - where to store the docker images used for building
    e.g. in order to refer to a Dockerhub repository at https://hub.docker.com/r/pmem/syscall_intercept, this variable
    should contain the string "pmem/syscall_intercept"
 * DOCKERHUB_USER - used for logging into Dockerhub
 * DOCKERHUB_PASSWORD - used for logging into Dockerhub
 * GITHUB_REPO - where the repository is available on GitHub (e.g. "pmem/syscall_intercept")

### How to build ###

Building libsyscall_intercept requires cmake.
Example:
```sh
cmake path_to_syscall_intercept -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang
make
```
alternatively:
```sh
ccmake path_to_syscall_intercept
make
```

There is an install target. For now, all it does is copy the files (using cp).
```sh
make install
```

Coming soon:
```sh
make test
```

# Synopsis #

```c
#include <libsyscall_intercept_hook_point.h>
```
```sh
cc source.c -lsyscall_intercept -fpic -shared -o preloadlib.so

LD_PRELOAD=./preloadlib.so ./application
```

##### Description: #####

The system call intercepting library provides a low-level interface
for hooking Linux system calls in user space. This is achieved
by hotpatching the machine code of the standard C library in the
memory of a process. The user of this library can provide the
functionality of almost any syscall in user space, using the very
simple API specified in the `libsyscall_intercept_hook_point.h` header file:
```c
int (*intercept_hook_point)(long syscall_number,
			long arg0, long arg1,
			long arg2, long arg3,
			long arg4, long arg5,
			long *result);
```

The user of the library shall assign to the variable called
`intercept_hook_point` the address of a callback function.
A non-zero value returned by the callback function is used
to signal to the intercepting library that the specific system
call was ignored by the user and the original syscall should be
executed. A zero return value signals that the user takes over the
system call. In this case, the result of the system call
(the value stored in the RAX register after the system call)
can be set via the `*result` pointer. In order to use the library,
the intercepting code is expected to be loaded using the
LD_PRELOAD feature provided by the system loader.
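
To make the return-value convention concrete, here is a minimal, self-contained model of it. It does not use libsyscall_intercept and issues no real syscall: `hook_fn`, `dispatch_syscall` and `fake_kernel` are hypothetical names invented for this sketch, and the library's actual dispatching code is not shown here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Same signature as the function intercept_hook_point points to. */
typedef int (*hook_fn)(long syscall_number,
		long arg0, long arg1, long arg2,
		long arg3, long arg4, long arg5,
		long *result);

/*
 * Hypothetical stand-in for the kernel, so the sketch runs without
 * issuing a real syscall: it pretends every syscall returns 42.
 */
static long
fake_kernel(long syscall_number)
{
	(void) syscall_number;
	return 42;
}

/*
 * Hypothetical model of what happens after the callback returns:
 * zero means the hook handled the call and *result becomes the
 * syscall's return value; non-zero means the original syscall runs.
 */
static long
dispatch_syscall(hook_fn hook, long syscall_number,
		long arg0, long arg1, long arg2,
		long arg3, long arg4, long arg5)
{
	long result = 0;

	if (hook != NULL &&
	    hook(syscall_number, arg0, arg1, arg2, arg3, arg4, arg5,
		    &result) == 0)
		return result;

	return fake_kernel(syscall_number);
}

/* Example callback: fail getpid with EPERM, pass everything else on. */
static int
deny_getpid(long syscall_number,
		long arg0, long arg1, long arg2,
		long arg3, long arg4, long arg5,
		long *result)
{
	(void) arg0; (void) arg1; (void) arg2;
	(void) arg3; (void) arg4; (void) arg5;

	if (syscall_number == SYS_getpid) {
		*result = -EPERM;	/* kernel-style negative errno */
		return 0;		/* handled: do not run the syscall */
	}
	return 1;			/* not handled: run the original syscall */
}
```

With this model, `dispatch_syscall(deny_getpid, SYS_getpid, 0, 0, 0, 0, 0, 0)` yields `-EPERM`, while any other syscall number falls through to `fake_kernel` and yields 42.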

All syscalls issued by libc are intercepted. Syscalls made
by code outside libc are not intercepted. In order to
be able to issue syscalls that are not intercepted, a
convenience function is provided by the library:
```c
long syscall_no_intercept(long syscall_number, ...);
```

Three environment variables control the operation of the library:

*INTERCEPT_LOG* -- when set, the library logs each intercepted syscall
to a file. If the value ends with "-", the path of the file is formed by appending
the process id to the value provided in the environment variable.
E.g.: initializing the library in a process with pid 123 when the
INTERCEPT_LOG is set to "intercept.log-" will result in a log file named
intercept.log-123.
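
The naming rule can be expressed as a small helper. This is only a sketch of the rule as described above, not the library's code; `log_path_for` is a hypothetical name, and the caller is assumed to pass the value of `getenv("INTERCEPT_LOG")` and the result of `getpid()`.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build the log file name from the value of
 * INTERCEPT_LOG. If the value ends with '-', the process id is
 * appended; otherwise the value is used as is.
 * Returns 0 on success, -1 if the name does not fit into buf.
 */
static int
log_path_for(const char *env_value, pid_t pid, char *buf, size_t buf_size)
{
	size_t len = strlen(env_value);
	int n;

	if (len > 0 && env_value[len - 1] == '-')
		n = snprintf(buf, buf_size, "%s%ld", env_value, (long)pid);
	else
		n = snprintf(buf, buf_size, "%s", env_value);

	return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```

For the example above, `log_path_for("intercept.log-", 123, buf, sizeof(buf))` stores `intercept.log-123` in `buf`.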

*INTERCEPT_LOG_TRUNC* -- when set to 0, the log file from INTERCEPT_LOG
is not truncated.

*INTERCEPT_HOOK_CMDLINE_FILTER* -- when set, the library
checks the command line used to start the program.
Hotpatching and syscall intercepting are only done if the
last component of the command used to start the program
is the same as the string provided in the environment variable.
This can also be queried by the user of the library:
```c
int syscall_hook_in_process_allowed(void);
```
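
The filter check amounts to a string comparison on the last path component of the command. A hedged sketch follows; `cmdline_filter_allows` is a hypothetical name, and the sketch takes the command as a parameter because where the library reads it from is not described here.

```c
#include <string.h>

/*
 * Hypothetical sketch of the INTERCEPT_HOOK_CMDLINE_FILTER check:
 * hooking is allowed when the filter is unset, or when the last
 * path component of the command equals the filter string.
 */
static int
cmdline_filter_allows(const char *filter, const char *command)
{
	const char *last;

	if (filter == NULL)
		return 1;	/* variable not set: no filtering */

	last = strrchr(command, '/');
	last = (last != NULL) ? last + 1 : command;

	return strcmp(last, filter) == 0;
}
```

With `INTERCEPT_HOOK_CMDLINE_FILTER=ls`, the command `/bin/ls` passes this check, while `/usr/bin/gdb` does not.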

##### Example: #####

```c
#include <libsyscall_intercept_hook_point.h>
#include <syscall.h>
#include <errno.h>

static int
hook(long syscall_number,
			long arg0, long arg1,
			long arg2, long arg3,
			long arg4, long arg5,
			long *result)
{
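	if (syscall_number == SYS_getdents ||
	    syscall_number == SYS_getdents64) {
		/*
		 * Prevent the application from
		 * using the getdents syscalls (newer
		 * libc versions may use getdents64
		 * instead of getdents). From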
		 * the point of view of the calling
		 * process, it is as if the kernel
		 * would return the ENOTSUP error
		 * code from the syscall.
		 */
		*result = -ENOTSUP;
		return 0;
	} else {
		/*
		 * Ignore any other syscalls
		 * i.e.: pass them on to the kernel
		 * as would normally happen.
		 */
		return 1;
	}
}

static __attribute__((constructor)) void
init(void)
{
	// Set up the callback function
	intercept_hook_point = hook;
}
```

```sh
$ cc example.c -lsyscall_intercept -fpic -shared -o example.so
$ LD_LIBRARY_PATH=. LD_PRELOAD=example.so ls
ls: reading directory '.': Operation not supported
```

# Under the hood: #

##### Assumptions: #####
In order to handle syscalls in user space, the library relies
on the following assumptions:

- Each syscall made by the application is issued via libc
- No other facility attempts to hotpatch libc in the same process
- The libc implementation is already loaded in the process's
memory space when the intercepting library is being initialized
- The machine code in the libc implementation is suitable
for the methods listed in this section
- For some more basic assumptions, see the section on limitations.

##### Disassembly: #####
The library disassembles the text segment of the libc loaded
into the memory space of the process it is initialized in. It
locates all syscall instructions and replaces each of them
with a jump to a unique address. Since the syscall instruction
of the x86_64 ISA occupies only two bytes, which is not enough for
a jump to a distant address, the method involves locating other
bytes close to the syscall that are suitable for overwriting.
The destination of the jump (unique for each syscall) is a
small routine, which accomplishes the following tasks:

1. Optionally executes any instruction that originally
preceded the syscall instruction, and was overwritten to
make space for the jump instruction
2. Saves the current state of all registers to the stack
3. Translates the arguments (in the registers) from
the Linux x86_64 syscall calling convention to the C ABI's
calling convention used on x86_64
4. Calls a function written in C (which in turn calls
the callback supplied by the library user)
5. Loads the values from the stack back into the registers
6. Jumps back to libc, to the instruction following the
overwritten part
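
Steps 2 to 5 happen in assembly inside the library, but the argument translation of steps 3 and 4 can be modelled in C. On x86_64 Linux the syscall number is passed in RAX and the arguments in RDI, RSI, RDX, R10, R8 and R9; the C ABI uses RCX instead of R10 for the fourth integer argument (the syscall instruction itself overwrites RCX), and passes the seventh and eighth arguments of the hook on the stack. The sketch below assumes a hypothetical `struct saved_regs` holding the values saved in step 2; `call_hook_with_regs` and `echo_arg3_hook` are likewise invented for illustration.

```c
/* Same signature as the function intercept_hook_point points to. */
typedef int (*hook_fn)(long syscall_number,
		long arg0, long arg1, long arg2,
		long arg3, long arg4, long arg5,
		long *result);

/*
 * Hypothetical layout of the registers saved in step 2, in the
 * order used by the Linux x86_64 syscall convention.
 */
struct saved_regs {
	long rax;	/* syscall number on entry, result on exit */
	long rdi, rsi, rdx, r10, r8, r9;	/* syscall arguments 0..5 */
};

/*
 * Steps 3 and 4: pass the saved registers to the hook as ordinary
 * C arguments (argument 3 comes from R10, not RCX). If the hook
 * handles the syscall, its result is stored where RAX will be
 * reloaded from in step 5.
 * Returns non-zero if the original syscall should still be executed.
 */
static int
call_hook_with_regs(hook_fn hook, struct saved_regs *regs)
{
	long result = 0;

	if (hook(regs->rax, regs->rdi, regs->rsi, regs->rdx,
		    regs->r10, regs->r8, regs->r9, &result) != 0)
		return 1;

	regs->rax = result;
	return 0;
}

/*
 * Hypothetical hook used to check the argument order: it handles
 * only the made-up syscall number 1000 and returns its arg3.
 */
static int
echo_arg3_hook(long syscall_number,
		long arg0, long arg1, long arg2,
		long arg3, long arg4, long arg5,
		long *result)
{
	(void) arg0; (void) arg1; (void) arg2;
	(void) arg4; (void) arg5;

	if (syscall_number != 1000)
		return 1;

	*result = arg3;
	return 0;
}
```

Passing the saved fields as ordinary C parameters lets the compiler place them where the C ABI expects; hand-written assembly has to perform the same mapping explicitly.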

##### In action: #####

*Simple hotpatching:*
Replace a mov and a syscall instruction with a jmp instruction
```
Before:                         After:

db2a0 <__open>:                 db2a0 <__open>:
db2aa: mov $2, %eax           /-db2aa: jmp e0000
db2af: syscall                |
db2b1: cmp $-4095, %rax       | db2b1: cmp $-4095, %rax ---\
db2b7: jae db2ea              | db2b7: jae db2ea           |
db2b9: retq                   | db2b9: retq                |
                              | ...                        |
                              | ...                        |
                              \_...                        |
                                e0000: mov $2, %eax        |
                                ...                        |
                                e0100: call implementation /
                                ...                       /
                                e0200: jmp db2b1 ________/
```
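
The `jmp e0000` above replaces both the 5-byte `mov $2, %eax` and the 2-byte `syscall`, because a jump to a distant address does not fit in two bytes. As a sketch, assuming the common `E9 rel32` form of the near jmp (the exact encoding the library emits is not specified here), the five bytes can be computed as follows; `encode_jmp_rel32` is a hypothetical helper. For this example, a jump from db2aa to e0000 encodes as `E9 51 4D 00 00`, since the displacement is counted from the end of the jmp instruction (db2af).

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode a 5-byte near jmp located at address
 * `from` and jumping to `to`. Opcode 0xE9 is followed by a signed
 * 32-bit displacement in little-endian order, relative to the address
 * of the next instruction (from + 5).
 * Returns 0 on success, -1 if `to` is out of rel32 range.
 */
static int
encode_jmp_rel32(uint64_t from, uint64_t to, unsigned char out[5])
{
	int64_t disp = (int64_t)(to - (from + 5));
	uint32_t u;

	if (disp < INT32_MIN || disp > INT32_MAX)
		return -1;

	u = (uint32_t)disp;	/* two's complement bit pattern */
	out[0] = 0xE9;
	out[1] = (unsigned char)(u & 0xFF);
	out[2] = (unsigned char)((u >> 8) & 0xFF);
	out[3] = (unsigned char)((u >> 16) & 0xFF);
	out[4] = (unsigned char)((u >> 24) & 0xFF);
	return 0;
}
```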
*Hotpatching using a trampoline jump:*
Replace a syscall instruction with a short jmp instruction,
the destination of which is a regular jmp instruction.
The reason to use this is that a short jmp instruction
consumes only two bytes, and thus fits in the place of a syscall
instruction. Sometimes the instructions directly preceding
or following the syscall instruction cannot be overwritten,
leaving only the two bytes of the syscall instruction
for patching.
A short jmp can only reach targets within about 127 bytes of
itself, so the trampoline has to be placed close to the syscall.
The hotpatching library looks for a place for the trampoline jump
in the padding found at the end of each routine. Since the start
of every routine is aligned to 16 bytes, there is often padding
space between the end of one symbol and the start of the next.
In the example below, this padding is filled with a 7-byte-long
nop instruction (so the next symbol can start at the address 3f410).
```
Before:                         After:

3f3fe: mov %rdi, %rbx           3f3fe: mov %rdi, %rbx
3f401: syscall                /-3f401: jmp 3f408
3f403: jmp 3f415              | 3f403: jmp 3f415 ----------\
3f407: retq                   | 3f407: retq                |
                              \                            |
3f408: nopl 0x0(%rax,%rax,1)  /-3f408: jmp e1000           |
                              | ...                        |
                              | ...                        |
                              \_...                        |
                                e1000: nop                 |
                                ...                        |
                                e1100: call implementation /
                                ...                       /
                                e1200: jmp 3f403 ________/

```
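
Roughly, two things matter for the trampoline method: the free space in the padding after a routine, and whether that padding is within reach of a 2-byte short jmp (opcode `EB`, signed 8-bit displacement counted from the end of the jmp). The helpers below are a hypothetical sketch of both checks, not the library's code; `padding_after` assumes routines start at 16-byte aligned addresses, as described above. In the example, `padding_after(0x3f408)` returns 8, enough for a 5-byte jmp, and a short jmp from 3f401 to 3f408 encodes as `EB 05`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode a 2-byte short jmp located at `from`
 * and jumping to `to`. Opcode 0xEB is followed by a signed 8-bit
 * displacement relative to the next instruction (from + 2), so the
 * target must lie within -128..+127 bytes of it.
 * Returns 0 on success, -1 if `to` is out of reach.
 */
static int
encode_jmp_rel8(uint64_t from, uint64_t to, unsigned char out[2])
{
	int64_t disp = (int64_t)(to - (from + 2));

	if (disp < INT8_MIN || disp > INT8_MAX)
		return -1;

	out[0] = 0xEB;
	out[1] = (unsigned char)(uint8_t)disp;	/* two's complement byte */
	return 0;
}

/*
 * Hypothetical helper: number of padding bytes between the end of a
 * routine and the next 16-byte aligned address, where the next
 * routine is assumed to start.
 */
static uint64_t
padding_after(uint64_t routine_end)
{
	uint64_t next_start = (routine_end + 15) & ~(uint64_t)15;

	return next_start - routine_end;
}
```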

# Limitations: #
* Only Linux is supported
* Only x86\_64 is supported
* Only tested with glibc, although perhaps it works
with some other libc implementations as well
* There are known issues with the following syscalls:
  * clone
  * rt_sigreturn

# Debugging: #
Besides logging, the most important factor during debugging is to make
sure the syscalls in the debugger are not intercepted. To achieve this, use
the INTERCEPT_HOOK_CMDLINE_FILTER variable described above.

```sh
INTERCEPT_HOOK_CMDLINE_FILTER=ls \
	LD_PRELOAD=libsyscall_intercept.so \
	gdb ls
```

With this filtering, the intercepting library is not activated in the gdb
process itself.