0x14 Extreme Optimization: Methodology
Phase V Keynote
- Codename: “Metal Mode”
- Philosophy: “If you can’t measure it, you can’t improve it.”
1. The Performance Ceiling
In the previous chapters, we built a highly reliable exchange core (Phase I-IV). We achieved 1.3M TPS on a single thread using the Ring Buffer architecture. This is “fast enough” for 99% of crypto exchanges.
But for top-tier HFT engines, “fast enough” is not enough. We want to hit the physical limits of the CPU and memory.
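To put the 1.3M TPS figure in per-order terms, the arithmetic can be sketched as below. The function names are hypothetical, and the clock speed is an assumption about the host CPU rather than a number from this chapter.

```rust
/// Average time budget per operation, in nanoseconds, at a given throughput.
fn ns_per_op(ops_per_sec: f64) -> f64 {
    1_000_000_000.0 / ops_per_sec
}

/// Approximate CPU cycles available per operation.
/// `clock_ghz` is an assumed clock speed, not a measured value.
fn cycles_per_op(ops_per_sec: f64, clock_ghz: f64) -> f64 {
    // 1 GHz = 1 cycle per nanosecond, so cycles = ns * GHz.
    ns_per_op(ops_per_sec) * clock_ghz
}
```

At 1.3M TPS, `ns_per_op(1_300_000.0)` gives roughly 769 ns per order. On an assumed 3 GHz core that is about 2,300 cycles, which is the budget any further optimization has to shrink.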
1.1 Why “Extreme Optimization”?
| Phase | Focus | Goal |
|---|---|---|
| I-III | Correctness | “Does it work?” |
| IV | Integration | “Does it work end-to-end?” |
| V | Speed | “How fast can it go?” |
In Phase V, we assume correctness is already proven. Our sole focus is performance.
1.2 Why “Metal Mode”?
“Metal Mode” is our internal codename. It means:
- Close to the Metal: We will bypass high-level abstractions and work directly with memory layouts, CPU caches, and SIMD instructions.
- Bare Metal Rust: No unnecessary `clone()`, no hidden `malloc()`, no runtime surprises.
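As a rough illustration of the “no hidden `malloc()`” point, the sketch below contrasts a function that allocates a new `Vec` on every call with one that reuses a scratch buffer owned by the caller. The `Order` type and both function names are hypothetical and do not come from the exchange codebase.

```rust
/// Hypothetical order type, for illustration only. Being `Copy`, it is a
/// plain fixed-size value: passing it around never touches the heap.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Order {
    id: u64,
    price: u64,
    qty: u64,
}

/// Allocating version: `collect()` builds a fresh Vec on every call,
/// which means a heap allocation hidden inside the hot path.
fn filter_large_alloc(orders: &[Order], min_qty: u64) -> Vec<Order> {
    orders.iter().filter(|o| o.qty >= min_qty).copied().collect()
}

/// Reusing version: the caller owns a scratch buffer that lives across calls.
/// `clear()` drops the elements but keeps the capacity, so no allocation
/// happens as long as the buffer is already large enough.
fn filter_large_into(orders: &[Order], min_qty: u64, out: &mut Vec<Order>) {
    out.clear();
    out.extend(orders.iter().filter(|o| o.qty >= min_qty).copied());
}
```

A caller would create the buffer once with `Vec::with_capacity(...)` outside the loop and pass `&mut` to it on each iteration. Whether this pays off in practice depends on call frequency and buffer size, so it is worth measuring rather than assuming.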
2. The Benchmarking Methodology (Tier 2)
To optimize, we must first measure. But what we measure matters.
2.1 The Problem with Naive Benchmarks
| Benchmark Type | What it Measures | Problem for Optimization |
|---|---|---|
| `wrk` / `curl` | HTTP round-trip | Includes OS, network, and kernel noise |
| Unit tests | Function correctness | No performance data |
These are useful for validation (Phase IV), but not for isolation (Phase V).
2.2 Tier 2: Pipeline Benchmarks
We introduce Tier 2 Pipeline Benchmarks:
| Feature | Description |
|---|---|
| No Network I/O | Data is pre-loaded in memory. |
| No Disk I/O | WAL is mocked or in-memory. |
| Pure CPU/Memory | Measures only the “Hot Path”: RingBuffer → UBSCore → ME → Settlement. |
| Deterministic | Same input → same output, with repeatable timing (low run-to-run variance). |
Goal: Establish the “Red Line” – the current baseline performance under ideal conditions. All future optimizations will be measured against this.
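A minimal sketch of what such a harness could look like, using only the standard library. Everything here is illustrative: `Order`, `preload_orders`, `run_pipeline`, and `bench` are hypothetical names, and `run_pipeline` is a stand-in checksum loop, not the real RingBuffer → UBSCore → ME → Settlement path.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical order type; the real pipeline types are not shown here.
#[derive(Clone, Copy)]
struct Order {
    id: u64,
    price: u64,
    qty: u64,
}

/// Pre-loads `n` orders into memory from a fixed-seed xorshift-style
/// generator, so every run replays exactly the same input.
fn preload_orders(n: usize, seed: u64) -> Vec<Order> {
    let mut state = seed.max(1); // the generator state must be non-zero
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    (0..n as u64)
        .map(|id| Order {
            id,
            price: 10_000 + next() % 1_000,
            qty: 1 + next() % 100,
        })
        .collect()
}

/// Stand-in for the hot path. It only folds the input into a checksum;
/// the real pipeline stages would be called here instead.
fn run_pipeline(orders: &[Order]) -> u64 {
    orders.iter().fold(0u64, |acc, o| {
        acc.wrapping_mul(31)
            .wrapping_add(o.id ^ o.price.wrapping_mul(o.qty))
    })
}

/// Result of one timed run over the pre-loaded input.
struct BenchResult {
    checksum: u64,
    elapsed: Duration,
    tps: f64,
}

/// Times a single pass: no network, no disk, only CPU and memory.
fn bench(orders: &[Order]) -> BenchResult {
    let start = Instant::now();
    // black_box discourages the compiler from optimizing the work away.
    let checksum = black_box(run_pipeline(black_box(orders)));
    let elapsed = start.elapsed();
    let tps = orders.len() as f64 / elapsed.as_secs_f64();
    BenchResult { checksum, elapsed, tps }
}
```

Calling `bench(&preload_orders(1_000_000, 42))` several times should return the same `checksum` on every run, while `elapsed` and `tps` will vary slightly. Taking the median of repeated runs as the “Red Line” is one reasonable choice, not something this chapter prescribes.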