Fuzzing the Linux kernel x86 instruction decoder and finding nothing

28 May 2024

Context

For specific uses, the Linux kernel can decode x86 instructions. One of these uses is handling #VC exceptions. The VMM Communication Exception (or #VC) was introduced with AMD’s confidential computing technologies (see 15.35.5 #VC Exception in the AMD manual). Some VM-exits (Non-Automatic Exits or NAE, cf. Table 7: List of Supported Non-Automatic Events) require the hypervisor to modify the guest’s registers (e.g. the CPUID result is stored in RAX, RBX, RCX and RDX). However, the hypervisor can’t do this, since the register state is encrypted (and integrity-protected for SEV-SNP). So when an NAE exit occurs in the guest, the CPU raises a #VC, which the guest kernel handles by setting up a communication channel with the hypervisor, but that is out of scope here. You can also refer to Tom’s article, which provides clear explanations.

However, to properly handle a #VC exception, the guest needs to find out which NAE event raised the #VC. The CPU pushes the NAE exit code as the error code on the kernel stack, so the kernel knows which NAE exit occurred. Even knowing the NAE exit code may not be enough to fully handle the VM-exit, though: for an IN/OUT instruction, for example, we need to know which port (and value) the instruction uses. Moreover, due to the recent AHOI attacks, we also need to double-check that the instruction that raised the NAE event matches the CPU-provided error code.

For all these reasons, the guest kernel needs to decode the instruction pointed to by RIP when the exception occurred (RIP is also pushed on the stack by the CPU). Once the guest has decoded the instruction that raised the exception (e.g. a CPUID), it can properly handle the #VC with the appropriate handler (e.g. by emulating the instruction, or by calling the hypervisor).
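To make this concrete, here is a minimal sketch of that check, assuming the instruction bytes have already been fetched from RIP with a user- or kernel-safe copy. This is not the kernel’s actual #VC code: the helper name is made up, and only the CPUID case is shown (CPUID always encodes as 0x0f 0xa2, so anything else for an SVM_EXIT_CPUID error code is suspicious):

/*
 * Illustrative only -- not the kernel's actual #VC handling. "buf" holds
 * the up-to-15 bytes fetched from the faulting RIP.
 */
#include <asm/insn.h>
#include <asm/svm.h>
#include <linux/types.h>

static bool vc_insn_matches_exit_code(unsigned char *buf, int len,
				      unsigned long exit_code)
{
	struct insn insn;

	/* Decode the raw bytes with the in-kernel x86 decoder. */
	if (insn_decode(&insn, buf, len, INSN_MODE_64) < 0)
		return false;

	switch (exit_code) {
	case SVM_EXIT_CPUID:
		/* CPUID always encodes as 0x0f 0xa2. */
		return insn.opcode.bytes[0] == 0x0f &&
		       insn.opcode.bytes[1] == 0xa2;
	default:
		/* Checks for the other NAE exit codes elided. */
		return true;
	}
}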

The funny part is that the instruction decoder is an attack surface. A guest user can trigger a #VC (e.g. CPUID can be executed from CPL-3), and user-controlled input will land in the kernel instruction decoder. Moreover, there is an intrinsic race between the moment the #VC exception is raised and the moment the exception handler fetches the instruction for decoding. That opens a tiny window for an attacker to write valid code that raises a #VC, and to replace it right afterwards with an arbitrary 15-byte buffer (the maximum x86 instruction length) at the address where the #VC exception was raised. Those 15 bytes will then be decoded by the kernel. Please note that this attack scenario is only valid in an AMD SEV guest.
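Very roughly, the race an attacker would try to win has the following shape (a sketch only, assuming an SEV-ES guest with an RWX mapping; it is nowhere near a working exploit, since winning the window needs precise timing and a fault handler for the times the garbage bytes get executed directly):

/*
 * Shape of the race only, not a working exploit: thread A keeps executing
 * a CPUID stub (which raises #VC from CPL-3 in an SEV-ES guest) while
 * thread B keeps swapping the same bytes for arbitrary ones. If a swap
 * lands between the exception and the handler's instruction fetch, the
 * kernel decoder sees attacker-chosen bytes.
 */
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>

static unsigned char *code;

static const unsigned char cpuid_stub[] = { 0x0f, 0xa2, 0xc3 };  /* cpuid; ret */
static const unsigned char garbage[15]  = { 0x62, 0xf1, 0x7d };  /* arbitrary bytes */

static void *executor(void *arg)
{
	(void)arg;
	for (;;)
		((void (*)(void))code)();	/* each CPUID raises a #VC */
	return NULL;
}

static void *flipper(void *arg)
{
	(void)arg;
	for (;;) {
		memcpy(code, garbage, sizeof(garbage));
		memcpy(code, cpuid_stub, sizeof(cpuid_stub));
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (code == MAP_FAILED)
		return 1;
	memcpy(code, cpuid_stub, sizeof(cpuid_stub));

	pthread_create(&a, NULL, executor, NULL);
	pthread_create(&b, NULL, flipper, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}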

The problem is that the decoder is only called from an exception-handler context, which is not ideal for simple fuzzing, so I patched the kernel.

Kernel patch

An easy way to fuzz the instruction decoder with Syzkaller is to expose the decoding code to userspace with a new syscall:

#include <asm/current.h>
#include <asm/insn.h>
#include <asm/vm86.h>
#include <linux/compiler_types.h>
#include <linux/mmu_context.h>
#include <linux/syscalls.h>

SYSCALL_DEFINE2(decode_insn, unsigned char __user *, user_insn_buf,
		struct insn __user *, decoded_insn)
{
	struct insn insn;
	unsigned char insn_buf[MAX_INSN_SIZE] = { 0 };
	int err;

	/* Fetch up to MAX_INSN_SIZE (15) bytes from user space. */
	if (copy_from_user(&insn_buf, user_insn_buf, sizeof(insn_buf)))
		goto err;

	/* Run the in-kernel x86 instruction decoder on the raw bytes. */
	err = insn_decode(&insn, insn_buf, sizeof(insn_buf), INSN_MODE_64);
	if (err < 0)
		goto err;

	/* Return the decoded struct insn to the caller. */
	if (copy_to_user(decoded_insn, &insn, sizeof(*decoded_insn)))
		goto err;

	return 0;

err:
	return -EFAULT;
}

This simply calls the insn_decode() API with a user-provided buffer and returns the resulting struct insn. You can find the full patch here, applied on c9c3395 (“Linux 6.2”) (as recommended for the Syzkaller setup). We now want to define this new syscall in Syzkaller.
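For a quick sanity check outside Syzkaller, the new syscall can be exercised from user space along these lines. The syscall number is whatever slot the patch assigns in syscall_64.tbl; 451 below is only a placeholder, and the output buffer is treated as an opaque blob sized comfortably larger than struct insn:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_decode_insn 451	/* placeholder: use the number assigned by the patch */

int main(void)
{
	/* mov rax, 0x1234 -> 48 c7 c0 34 12 00 00 */
	unsigned char insn_buf[15] = { 0x48, 0xc7, 0xc0, 0x34, 0x12, 0x00, 0x00 };
	unsigned char decoded[256];	/* opaque, larger than sizeof(struct insn) */

	if (syscall(__NR_decode_insn, insn_buf, decoded) != 0) {
		perror("decode_insn");
		return 1;
	}
	puts("decode_insn: OK");
	return 0;
}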

Syzkaller patch

Since we created a new syscall, we need to add a new syscall definition in syzkaller to enable fuzzing:

type insn_attr_t int32
type insn_byte_t int8
type insn_value_t int32

insn_field_union {
	value	insn_value_t
	bytes	array[insn_byte_t, 4]
}

insn_field {
	union	insn_field_union
	got	int8
	nbytes	int8
}

insn {
	prefixes		insn_field
	rex_prefix		insn_field
	vex_prefix		insn_field
	opcode			insn_field
	modrm			insn_field
	sib			insn_field
	displacement		insn_field
	u1			union1
	u2			union2
	emulate_prefix_size	int32
	attr			insn_attr_t
	opnd_bytes		int8
	addr_bytes		int8
	length			int8
	x86_64			int8

	kaddr			ptr[inout, insn_byte_t]
	end_kaddr		ptr[inout, insn_byte_t]
	next_byte		ptr[inout, insn_byte_t]
}

union1 {
	imm		insn_field
	moffset1	insn_field
	imm1		insn_field
}

union2 {
	moffset2	insn_field
	imm2		insn_field
}

decode_insn(buf buffer[in], i ptr[out, insn])

Please note that I’m not sure this is fully correct, and since we don’t care about the output insn, the whole syzlang struct insn definition could probably be skipped in favor of returning a raw buffer.

You can find the complete Syzkaller patch here. We can now follow the Syzkaller setup, enable only the new decode_insn syscall, and start fuzzing.

Results

The disappointing moment: I found nothing. There are two main reasons for this:

  • I ran the fuzzer on my laptop, with limited resources. I got code coverage where I wanted it (in arch/x86/lib/insn.c and arch/x86/lib/insn-eval.c), but not enough hits. So if you have more resources and time than me, please run this fuzzer again.
  • The x86 decoder code is robust. It uses only stack-allocated buffers, everything is static, with no runtime-decided sizes or indices, etc.

In the end, it’s good news for Linux that no bug was found (it would have been one more CVE), but it’s not that simple. In #VC exception handling, the critical part is not the decoding, it’s the emulation. Emulating a user MMIO request can be very dangerous. Bugs have previously been found in the emulation code (e.g. CVE-2023-46813, and Tom’s article once again). So this is where to search.