eBPF Memory Leak Detection — Q&A Preparation

Based on slides 27–33 of the CSITEP presentation. Organized by topic.

1. eBPF vs Traditional Tools (Slide 28)

Q1: What's the overhead of eBPF memleak in production?

A: Typically less than 5% CPU overhead. The uprobe/kprobe mechanism adds a few microseconds per hit. For functions called less than 10K times per second, the impact is negligible. For extremely hot paths (>100K calls/sec), the overhead can become noticeable.

中文: 通常 CPU 开销低于 5%。uprobe/kprobe 每次命中增加几微秒。每秒调用少于 1 万次的函数影响可忽略。极高频热路径（>10 万次/秒）开销可能变得明显。

Q2: Why not just use ASan in production? It's only 2-3x overhead.

A: 2-3x means you need 2-3x the hardware to maintain the same throughput. Also, ASan requires recompilation with -fsanitize=address, which changes the binary. You can't attach it to an already-running process. eBPF can attach and detach at any time without restarting.

中文: 2-3 倍意味着需要 2-3 倍硬件来维持同样的吞吐量。且 ASan 需要用 -fsanitize=address 重新编译。你无法挂到已在运行的进程上。eBPF 可以随时挂载和卸载，无需重启。

Q3: Can eBPF detect use-after-free at all?

A: Not directly. eBPF memleak tracks alloc/free pairing — it knows if memory was freed but not if it was accessed after being freed. For use-after-free, you still need Valgrind or ASan, which maintain shadow memory tracking the state of every byte.

中文: 不能直接检测。eBPF memleak 追踪 alloc/free 配对——它知道内存是否被释放，但不知道释放后是否被访问。要检测 use-after-free，仍需 Valgrind 或 ASan。

Q4: What about memory leak detection in Go or Java programs?

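A: Go and Java manage their heaps with garbage collectors, so classic malloc/free leaks in managed code don't apply, and eBPF memleak is the wrong tool for heap growth caused by retained references. It is aimed at C/C++/Rust programs that use manual memory management or system allocators. For Go, use pprof; for Java, use JVM heap dumps. Native memory allocated through cgo or JNI still goes through malloc and can be traced like any other C allocation.

中文: Go 和 Java 使用垃圾回收器管理堆，托管代码中不存在传统的 malloc/free 泄漏，因引用未释放导致的堆增长也不适合用 eBPF memleak 排查。eBPF memleak 主要针对使用手动内存管理或系统分配器的 C/C++/Rust 程序。Go 用 pprof；Java 用 JVM 堆转储。通过 cgo 或 JNI 分配的原生内存仍然经过 malloc，可以像其他 C 分配一样追踪。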

Q5: Does it work on containers/Kubernetes?

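A: Yes, when run from the host (or from a privileged container that shares the host PID namespace). eBPF runs at the kernel level, so it can trace any process on the host regardless of whether it's in a container. Specify the target PID as seen from the host; a binary or library path must resolve to the file the container actually maps, which may live in the container's own filesystem.

中文: 可以，前提是在主机上运行（或在共享主机 PID 命名空间的特权容器中运行）。eBPF 在内核层面运行，可以追踪主机上的任何进程，无论是否在容器中。目标 PID 需使用主机视角的 PID；二进制或库路径必须解析到容器实际映射的文件，该文件可能位于容器自己的文件系统中。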

Q6: What Linux kernel version do we actually need?

A: The minimum is 4.9 for basic BPF functionality. For stable memleak with stack traces, we recommend 4.14+. CAP_BPF (non-root usage) requires 5.8+.

中文: 基本 BPF 功能最低需要 4.9。稳定的 memleak 含调用栈建议 4.14+。CAP_BPF（非 root）需要 5.8+。

Q7: Valgrind vs eBPF memleak — when to use which?

A: Use Valgrind during development for comprehensive checking (use-after-free, overflow, double-free, leaks with byte-level precision). Use eBPF in production for leak hunting (no code change, low overhead, attach to a running process). They complement each other.

中文: 开发阶段用 Valgrind 做全面检查（use-after-free、溢出、double-free、字节级泄漏定位）。生产环境用 eBPF 排查泄漏（无需修改代码、开销低、可挂载到运行中的进程）。两者互补。

2. What We Capture (Slide 29)

Q8: Does memleak slow down malloc/free calls?

A: Each uprobe hit adds approximately 2-4 microseconds. For a typical application calling malloc a few thousand times per second, this is imperceptible. For millions of calls per second, you'd want to sample rather than trace every call.

中文: 每次 uprobe 命中增加约 2-4 微秒。每秒几千次 malloc 的典型应用不可感知。每秒百万次调用需要采样而不是全量追踪。
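
The per-hit cost quoted above turns into CPU overhead by simple multiplication. The sketch below is a rough estimate only: it assumes one probe hit per call and takes 3 microseconds as a midpoint of the quoted 2-4 microsecond range. Both values are assumptions, and a tool that probes function entry and return separately would roughly double the result.

```python
def probe_overhead_fraction(calls_per_sec: float,
                            cost_us_per_hit: float = 3.0,
                            hits_per_call: int = 1) -> float:
    """Estimate the fraction of one CPU core spent in probe handling.

    cost_us_per_hit and hits_per_call are illustrative assumptions,
    not measurements of any particular tool.
    """
    busy_us_per_sec = calls_per_sec * hits_per_call * cost_us_per_hit
    return busy_us_per_sec / 1_000_000


# Compare a few call rates against the thresholds discussed in Q1 and Q8.
for rate in (5_000, 10_000, 100_000, 1_000_000):
    share = probe_overhead_fraction(rate)
    print(f"{rate:>9} calls/s -> {share:.1%} of one core")
```

Under these assumptions, 10,000 calls per second costs about 3% of one core, while 1,000,000 calls per second would need more than a full core, which is why sampling is suggested for very hot paths.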

Q9: How do you get the stack trace? Does it require debug symbols?

A: We use the kernel's BPF stack trace helper, walking frame pointers. The binary needs -fno-omit-frame-pointer for reliable user-space stacks. Debug symbols (DWARF) are not required at collection time. Afterward, a symbol table is enough to translate addresses to function names; DWARF is needed for source files and line numbers.

中文: 使用内核的 BPF 调用栈辅助函数，通过帧指针回溯。二进制文件需要 -fno-omit-frame-pointer 才能获得可靠的用户态栈。收集时不需要调试符号（DWARF）。之后将地址解析为函数名只需符号表；解析到源文件和行号则需要 DWARF。

Q10: What if the program uses a custom allocator instead of malloc?

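A: It depends on how the allocator is wired in. Drop-in replacements such as jemalloc, tcmalloc, and mimalloc do not call glibc's malloc; they export their own malloc/free and obtain memory from the kernel via mmap or brk. Probes attached to libc's malloc will therefore miss them, so the uprobes must be attached to the library or binary that actually provides the allocator symbols. A thin wrapper that ultimately calls libc malloc is caught as usual. A fully custom pool allocator that carves objects out of mmap'd memory needs uprobes on its own allocation and release functions.

中文: 取决于分配器的接入方式。jemalloc、tcmalloc、mimalloc 这类替换型分配器并不调用 glibc 的 malloc，而是自行导出 malloc/free，并通过 mmap 或 brk 向内核申请内存。挂在 libc malloc 上的探针会漏掉它们，因此 uprobe 必须挂到实际提供分配器符号的库或二进制上。最终调用 libc malloc 的薄封装层照常可以捕获。完全自定义、直接基于 mmap 的池分配器，需要对其自身的分配和释放函数添加 uprobe。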

Q11: Does it work with C++ new/delete?

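A: Yes, in the common case. With libstdc++ and libc++, the default operator new calls malloc and the default operator delete calls free, so hooking malloc/free covers C++ allocations. The exception is a program that replaces the global operator new/delete or links a different allocator; then the probes must target that allocator's functions.

中文: 常见情况下可以。在 libstdc++ 和 libc++ 中，默认的 operator new 调用 malloc，默认的 operator delete 调用 free，因此 hook malloc/free 就覆盖了 C++ 分配。例外是程序替换了全局 operator new/delete 或链接了其他分配器，此时探针需要指向该分配器的函数。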

Q12: Can you filter by specific process or thread?

A: Yes. You can target a specific PID and the tool records TID per allocation, so you can filter results by thread in post-processing.

中文: 可以。可以指定特定 PID，工具记录每次分配的 TID，后处理中可按线程过滤。

Q13: How much memory does the eBPF memleak tool itself consume?

A: The BPF maps grow proportionally to outstanding allocations being tracked. For a typical application with thousands of active allocations, this is a few megabytes. Each unique stack trace is stored once and referenced by ID.

中文: BPF maps 与被追踪的未释放分配数量成正比。数千个活跃分配的典型应用约几 MB。每个唯一调用栈只存一次，通过 ID 引用。

Q14: How do you distinguish intentional long-lived allocations from real leaks?

A: Key indicators of a real leak: (1) continuously growing allocations over time, not one-time initialization; (2) multiple allocations from the same call stack accumulating; (3) memory growing proportionally to load/time. Our HTML report's "continuous leak detection" flags pattern #1 automatically.

中文: 真正泄漏的关键指标：(1) 随时间持续增长，非一次性初始化；(2) 同一调用栈的多次分配不断累积；(3) 内存与负载/时间成正比增长。HTML 报告的"持续泄漏检测"自动标记模式 #1。

3. Kernel-Space Tracing (Slide 30)

Q15: Is it safe to trace kernel allocations in production?

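A: Yes. kprobes are a well-established kernel tracing mechanism, and the BPF verifier ensures the probe program cannot crash the kernel or access invalid memory. Per-hit cost is generally lower than a uprobe's because no user-to-kernel trap is involved, but kernel allocation paths can be very hot, so total overhead still depends on the allocation rate.

中文: 安全。kprobe 是成熟的内核追踪机制，BPF 验证器确保探针程序不会崩溃内核或访问无效内存。由于不涉及用户态到内核态的陷入，单次命中开销通常低于 uprobe；但内核分配路径可能非常热，总开销仍取决于分配速率。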

Q16: What about SLUB/SLAB allocator tracking?

A: kmem_cache_alloc is the SLAB/SLUB interface and we hook it directly. This covers most kernel object allocations (inodes, dentries, sk_buffs, etc.). kmalloc is also backed by SLAB/SLUB internally.

中文: kmem_cache_alloc 是 SLAB/SLUB 接口，我们直接 hook。这覆盖了大多数内核对象分配（inodes、dentries、sk_buffs 等）。kmalloc 内部也基于 SLAB/SLUB。

Q17: Can you trace specific kernel modules?

A: The tool captures all kernel allocations. You can filter by kernel function name patterns in post-processing — look for stacks containing the specific module's functions.

中文: 工具捕获所有内核分配。可在后处理中按内核函数名模式过滤——查找包含特定模块函数的调用栈。

Q18: What if memleak itself crashes? Does it affect the traced process?

A: No. eBPF programs are verified by the kernel before loading — they cannot crash the kernel or the traced process. If the user-space component crashes, BPF probes are automatically cleaned up. The traced process continues unaffected.

中文: 不会。eBPF 程序在加载前经过内核验证。如果用户态组件崩溃，BPF 探针自动清理。被追踪进程继续不受影响。

4. HTML Visual Report (Slides 31–33)

Q19: How does the "continuous leak detection" algorithm work?

A: It compares each allocation source across all snapshots. If a source's total bytes increase in every consecutive snapshot (monotonically increasing), it's flagged as a suspected continuous leak. This eliminates one-time allocations and focuses on genuinely growing memory.

中文: 比较每个分配源在所有快照中的数据。如果某源的总字节数在每个连续快照中都增加（单调递增），则标记为疑似持续泄漏。排除一次性分配，聚焦真正增长的内存。
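
The rule described above can be sketched in a few lines. This is an illustration, not the report generator's actual code: the snapshot shape (a dict mapping an allocation-source label to outstanding bytes) and the function name are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical snapshot shape: allocation source label -> outstanding bytes.
Snapshot = Dict[str, int]


def continuous_leak_suspects(snapshots: List[Snapshot]) -> List[str]:
    """Return sources whose bytes strictly increase in every consecutive snapshot."""
    if len(snapshots) < 2:
        return []  # growth cannot be judged from fewer than two snapshots

    # Only a source present in every snapshot can grow in every interval.
    common = set(snapshots[0]).intersection(*snapshots[1:])

    suspects = []
    for source in sorted(common):
        series = [snap[source] for snap in snapshots]
        if all(later > earlier for earlier, later in zip(series, series[1:])):
            suspects.append(source)
    return suspects


snapshots = [
    {"parse_request": 1024, "init_config": 4096, "cache_fill": 2048},
    {"parse_request": 2048, "init_config": 4096, "cache_fill": 8192},
    {"parse_request": 3072, "init_config": 4096, "cache_fill": 8192},
]
print(continuous_leak_suspects(snapshots))  # ['parse_request']
```

Here init_config is a one-time allocation and cache_fill grows once and then plateaus, so only parse_request is flagged. A strict rule like this is sensitive to noise: a single flat or shrinking interval clears a source, so a real detector might tolerate a small number of non-increasing intervals.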

Q20: Can the report handle very large logs (hours of tracing)?

A: Yes. The report aggregates data by snapshot intervals. Even hours of tracing produces a manageable number of data points. We've tested with thousands of snapshots without issues.

中文: 可以。报告按快照间隔聚合数据。数小时追踪也产生可管理数量的数据点。测试过数千个快照没有问题。

Q21: Can we integrate this into CI/CD?

A: Yes. Run the service under memleak for N seconds, parse the output, fail the build if outstanding allocations exceed a threshold or continuous growth is detected. The HTML report generation can also be automated as a post-processing step.

中文: 可以。在 memleak 下运行服务 N 秒，解析输出，如果未释放分配超过阈值或检测到持续增长则构建失败。HTML 报告生成也可作为后处理步骤自动化。
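
A build gate along these lines could look like the sketch below. It assumes the memleak output has already been parsed into per-snapshot dicts of outstanding bytes; the parsing step, the function name, and the threshold value are all hypothetical and would depend on the actual log format and service.

```python
from typing import Dict, List


def leak_gate(snapshots: List[Dict[str, int]],
              max_outstanding_bytes: int) -> List[str]:
    """Return the reasons the build should fail; an empty list means pass."""
    if not snapshots:
        return ["no snapshots collected"]

    failures = []
    totals = [sum(snap.values()) for snap in snapshots]

    # Check 1: outstanding allocations at the end of the run exceed the budget.
    if totals[-1] > max_outstanding_bytes:
        failures.append(
            f"outstanding {totals[-1]} bytes exceeds limit {max_outstanding_bytes}"
        )

    # Check 2: total outstanding bytes grew in every interval.
    if len(totals) >= 2 and all(b > a for a, b in zip(totals, totals[1:])):
        failures.append("total outstanding bytes grew in every snapshot")

    return failures


run = [{"handler": 1_000}, {"handler": 5_000}, {"handler": 9_000}]
for reason in leak_gate(run, max_outstanding_bytes=8_000):
    print("FAIL:", reason)
```

A CI wrapper would exit non-zero when the returned list is non-empty. Keeping the gate as a pure function over parsed data makes it easy to unit-test separately from the tracing run itself.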

Q22: What's do_anonymous_page? Why does it dominate kernel allocations?

A: do_anonymous_page handles page faults for anonymous memory (heap, stack, mmap without file backing). It's normal for it to dominate because user-space memory obtained via malloc is backed lazily: the first touch of each new anonymous page faults and goes through this path. It typically represents the program's working set, not a kernel bug.

中文: do_anonymous_page 处理匿名内存（堆、栈、无文件映射的 mmap）的缺页中断。它占主导是正常的：通过 malloc 获得的用户态内存是惰性分配物理页的，每个新匿名页在首次被访问时触发缺页并经过这条路径。通常代表程序的工作集，不是内核 bug。

Q23: How do we handle false positives in reports?

A: Common false positives: (1) startup allocations — filter by growth pattern, should plateau; (2) cache/pool allocations — grow then stabilize, continuous-growth detector ignores them; (3) intentional leaks (daemon patterns) — create a baseline and compare. You can maintain a whitelist to suppress known non-issues.

中文: 常见误报：(1) 启动分配——通过增长模式过滤，应趋于平稳；(2) 缓存/池分配——增长后稳定，检测器忽略；(3) 故意泄漏（守护进程模式）——创建基线比较。可维护白名单抑制已知非问题。
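
The baseline-plus-whitelist idea can be sketched as a small post-processing filter. The data shape (source label mapped to outstanding bytes), the function name, and the min_growth_bytes knob are assumptions for illustration, not features of the report tool.

```python
from typing import Dict, Iterable


def new_growth(baseline: Dict[str, int],
               current: Dict[str, int],
               whitelist: Iterable[str] = (),
               min_growth_bytes: int = 0) -> Dict[str, int]:
    """Return per-source growth over the baseline, skipping whitelisted sources."""
    suppressed = set(whitelist)
    growth = {}
    for source, cur_bytes in current.items():
        if source in suppressed:
            continue  # known non-issue, e.g. an intentional daemon-lifetime buffer
        delta = cur_bytes - baseline.get(source, 0)
        if delta > min_growth_bytes:
            growth[source] = delta
    return growth


baseline = {"load_config": 4096, "session_new": 1024}
current = {"load_config": 4096, "session_new": 9216, "global_arena": 65536}
print(new_growth(baseline, current, whitelist=["global_arena"]))
# {'session_new': 8192}
```

Startup allocations such as load_config drop out because they match the baseline, and the whitelisted source is suppressed, leaving only the source that grew after the baseline was taken.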

5. General / Architecture

Q24: Does this work on ARM/aarch64?

A: Yes. eBPF is architecture-independent — it runs in the kernel's BPF virtual machine. Both x86_64 and aarch64 are fully supported. The only concern is stack unwinding may require different compilation flags on ARM.

中文: 可以。eBPF 架构无关，在内核 BPF 虚拟机中运行。x86_64 和 aarch64 都完全支持。唯一关注点是 ARM 上栈回溯可能需要不同编译标志。

Q25: Can we run memleak on multiple processes simultaneously?

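A: Yes. Run separate instances targeting different PIDs, or use system-wide mode to trace all processes. System-wide mode captures every malloc/free on the system, so expect noticeably higher overhead on a busy host; prefer per-PID tracing in production.

中文: 可以。运行多个实例分别指向不同 PID，或以系统级模式追踪所有进程。系统级模式会捕获系统上的每一次 malloc/free，在繁忙主机上开销会明显增大；生产环境建议按 PID 追踪。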

Q26: What privileges do we need? Can we avoid root?

A: Options: (1) Root — always works; (2) CAP_BPF + CAP_PERFMON (Linux 5.8+) — minimum capabilities without full root; (3) CAP_SYS_ADMIN — works but overly broad. In production, recommend a dedicated service account with CAP_BPF + CAP_PERFMON only.

中文: 选项：(1) Root——始终有效；(2) CAP_BPF + CAP_PERFMON（Linux 5.8+）——不需要完整 root 的最小权限；(3) CAP_SYS_ADMIN——有效但过宽。生产中建议只有 CAP_BPF + CAP_PERFMON 的专用账号。
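
Before attaching, it can help to confirm which of these options the current account actually has. The sketch below decodes the CapEff bitmask that Linux exposes in /proc/<pid>/status; the capability numbers come from the kernel's capability header, while the helper names are made up for this example. Holding the capabilities is necessary but may not be sufficient, since sysctl settings or security modules can still block tracing.

```python
# Capability bit numbers from the Linux capability header.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38
CAP_BPF = 39


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_cap(mask: int, cap: int) -> bool:
    return bool((mask >> cap) & 1)


def can_trace(mask: int) -> bool:
    """True if the mask matches one of the privilege options listed above."""
    return has_cap(mask, CAP_SYS_ADMIN) or (
        has_cap(mask, CAP_BPF) and has_cap(mask, CAP_PERFMON)
    )


# Sample status line for an account holding only CAP_BPF + CAP_PERFMON.
sample = "Name:\tmemleak\nCapEff:\t000000c000000000\n"
print(can_trace(parse_cap_eff(sample)))  # True
```

On a live system the same check can be run against the contents of /proc/self/status.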

Q27: What's the recommended workflow for investigating a suspected leak?

A:

1. Confirm the symptom: check RSS growth in monitoring
2. Attach memleak: run for 2-5 minutes with periodic snapshots
3. Generate HTML report: look for continuously growing sources
4. Identify top offenders: expand stack traces to find the code path
5. Analyze the code: determine if it's a real leak or intentional retention
6. Fix and verify: patch the code, run memleak again to confirm

中文:

1. 确认症状：监控中检查 RSS 增长
2. 挂载 memleak：运行 2-5 分钟，周期性快照
3. 生成 HTML 报告：查找持续增长的来源
4. 识别主要问题：展开调用栈找到代码路径
5. 分析代码：判断是真正泄漏还是故意保留
6. 修复并验证：修补代码，再次运行 memleak 确认泄漏消除

Q28: Can we save raw data and generate the report later?

A: Yes. memleak outputs text logs to stdout/file. Collect raw output, transfer to another machine, run the HTML report generator offline.

中文: 可以。memleak 输出文本日志到 stdout/文件。收集原始输出，传到另一台机器，离线生成 HTML 报告。

Q29: How does Rust leak memory if it has ownership?

A: Several ways: (1) Rc/Arc reference cycles, where the reference count never reaches zero; (2) mem::forget(), which skips the destructor; (3) Box::leak(), an intentional leak to obtain a 'static reference; (4) FFI boundaries, where memory is allocated in C code; (5) collections that grow without bound. Rust guarantees memory safety (no UB) in safe code; it does not guarantee freedom from leaks. Rust's default global allocator on Linux calls malloc/free underneath, so our uprobe approach works out of the box unless the program installs a custom global allocator.

中文: 几种方式：(1) Rc/Arc 引用循环，引用计数永远不会归零；(2) mem::forget()，跳过析构；(3) Box::leak()，为获得 'static 引用而有意泄漏；(4) FFI 边界，内存在 C 代码中分配；(5) 无限增长的集合。Rust 在安全代码中保证内存安全（无 UB），但不保证不发生泄漏。Rust 在 Linux 上的默认全局分配器底层调用 malloc/free，因此只要程序没有安装自定义全局分配器，我们的 uprobe 方法开箱即用。

Q30: What's the maximum duration you can run memleak?

A: No hard limit. The BPF hash map has a configurable max size (default ~10240 entries). If you hit the limit, new allocations are silently dropped from tracking (the allocation itself still succeeds). We've run 24+ hours in production without issues. Map size can be tuned at startup.

中文: 没有硬性限制。BPF 哈希表有可配置最大大小（默认约 10240 条目）。达到限制时新分配被静默丢弃追踪（分配本身仍成功）。生产中运行 24 小时以上没有问题。Map 大小可在启动时调整。