0x14 Extreme Optimization: Methodology

Phase V Keynote
Codename: “Metal Mode”
Philosophy: “If you can’t measure it, you can’t improve it.”

1. The Performance Ceiling

In the previous chapters, we built a highly reliable exchange core (Phase I-IV). We achieved 1.3M TPS on a single thread using the Ring Buffer architecture. This is “fast enough” for 99% of crypto exchanges.

But for top-tier HFT engines, “fast enough” is not enough. We want to approach the physical limits of the CPU and memory.

1.1 Why “Extreme Optimization”?

| Phase | Focus       | Goal                         |
|-------|-------------|------------------------------|
| I-III | Correctness | “Does it work?”              |
| IV    | Integration | “Does it work end-to-end?”   |
| V     | Speed       | “How fast can it go?”        |

In Phase V, we assume correctness is already proven. Our sole focus is performance.

1.2 Why “Metal Mode”?

“Metal Mode” is our internal codename. It means:

  • Close to the Metal: We will bypass high-level abstractions and work directly with memory layouts, CPU caches, and SIMD instructions.
  • Bare Metal Rust: No unnecessary clone(), no hidden malloc(), no runtime surprises.
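As a rough illustration of the “no unnecessary clone(), no hidden malloc()” rule, the sketch below borrows data instead of copying it and reuses one preallocated buffer. The names `Order`, `OrderBatch`, and `total_qty` are hypothetical and are not taken from the exchange core; this is one possible shape for the idea, not the project's actual code.

```rust
/// Hypothetical order record (an assumption for this sketch):
/// plain `Copy` data with no heap-owned fields.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Order {
    pub id: u64,
    pub price: u64,
    pub qty: u64,
}

/// Borrows a slice, so callers never need to clone a `Vec` to call it.
pub fn total_qty(orders: &[Order]) -> u64 {
    orders.iter().map(|o| o.qty).sum()
}

/// A reusable batch buffer: allocated once up front, then cleared and refilled.
pub struct OrderBatch {
    buf: Vec<Order>,
}

impl OrderBatch {
    /// The only place this type allocates.
    pub fn with_capacity(cap: usize) -> Self {
        Self { buf: Vec::with_capacity(cap) }
    }

    /// Refuses the push when the buffer is full instead of growing it,
    /// so a push can never trigger a reallocation.
    pub fn try_push(&mut self, order: Order) -> bool {
        if self.buf.len() == self.buf.capacity() {
            return false;
        }
        self.buf.push(order);
        true
    }

    /// Drops the contents but keeps the allocation for the next batch.
    pub fn clear(&mut self) {
        self.buf.clear();
    }

    pub fn as_slice(&self) -> &[Order] {
        &self.buf
    }

    pub fn capacity(&self) -> usize {
        self.buf.capacity()
    }
}
```

The design choice here is to make growth an explicit failure (`try_push` returns `false`) rather than a silent reallocation, so any allocation in the hot path shows up as a visible event instead of a latency spike.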

2. The Benchmarking Methodology (Tier 2)

To optimize, we must first measure. But what we measure matters.

2.1 The Problem with Naive Benchmarks

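| Benchmark Type | What it Measures     | Problem for Optimization            |
|----------------|----------------------|-------------------------------------|
| wrk / curl     | HTTP round-trip      | Includes OS, Network, Kernel noise  |
| Unit tests     | Function correctness | No performance data                 |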

These are useful for validation (Phase IV), but not for isolation (Phase V): they cannot separate the cost of the hot path from everything around it.

2.2 Tier 2: Pipeline Benchmarks

We introduce Tier 2 Pipeline Benchmarks:

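| Feature         | Description                                                          |
|-----------------|----------------------------------------------------------------------|
| No Network I/O  | Data is pre-loaded in memory.                                        |
| No Disk I/O     | WAL is mocked or in-memory.                                          |
| Pure CPU/Memory | Measures only the “Hot Path”: RingBuffer → UBSCore → ME → Settlement. |
| Deterministic   | Same input → Same output → Same timing.                              |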
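The four properties above can be sketched as a small harness. Everything in it is an assumption made for illustration: `InputOrder`, `preload_orders`, `process`, and `run_pipeline_bench` are hypothetical names, and `process` is only a stand-in for the real RingBuffer → UBSCore → ME → Settlement path. Input comes from a fixed-seed xorshift generator, nothing touches the network or disk, and a checksum makes it possible to confirm that two runs produced the same output.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical input record for the benchmark (not the real order type).
#[derive(Clone, Copy)]
pub struct InputOrder {
    pub id: u64,
    pub price: u64,
    pub qty: u64,
}

/// Builds the whole input set in memory before timing starts.
/// A fixed-seed xorshift64 generator keeps the data identical across runs.
pub fn preload_orders(n: usize, seed: u64) -> Vec<InputOrder> {
    let mut state = seed.max(1); // xorshift must not start from zero
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    (0..n as u64)
        .map(|id| InputOrder {
            id,
            price: 10_000 + next() % 100,
            qty: 1 + next() % 10,
        })
        .collect()
}

/// Stand-in for the hot path: folds one order into a running checksum.
pub fn process(order: &InputOrder, checksum: u64) -> u64 {
    checksum
        .wrapping_mul(31)
        .wrapping_add(order.id ^ order.price.wrapping_mul(order.qty))
}

pub struct BenchResult {
    pub orders: usize,
    pub elapsed: Duration,
    pub checksum: u64,
}

impl BenchResult {
    /// Orders processed per second of wall-clock time.
    pub fn tps(&self) -> f64 {
        self.orders as f64 / self.elapsed.as_secs_f64()
    }
}

/// Times only the processing loop; input is already in memory.
pub fn run_pipeline_bench(orders: &[InputOrder]) -> BenchResult {
    let start = Instant::now();
    let mut checksum = 0u64;
    for order in orders {
        // black_box discourages the optimizer from eliding the work.
        checksum = process(black_box(order), checksum);
    }
    let elapsed = start.elapsed();
    BenchResult {
        orders: orders.len(),
        elapsed,
        checksum: black_box(checksum),
    }
}
```

In this sketch, the same seed yields the same checksum on every run, which covers “same input → same output”. Timing will still vary somewhat between runs on a real machine, so it is usually worth repeating the measurement and comparing a summary statistic rather than a single number.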

Goal: Establish the “Red Line” – the current baseline performance under ideal conditions. All future optimizations will be measured against this.
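One way to use such a baseline is to reduce several benchmark runs to a single figure and express each later result as a change relative to it. The helper names `median_tps` and `change_vs_red_line` below are hypothetical, and using the median is an assumption of this sketch rather than something the text prescribes.

```rust
/// Median of a set of throughput samples (orders per second).
/// Returns `None` when there are no samples.
pub fn median_tps(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    // Sorting a copy is fine here: this runs after the benchmark,
    // not inside the hot path.
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Percentage change of a candidate against the baseline.
/// Positive means faster than the Red Line, negative means a regression.
pub fn change_vs_red_line(red_line_tps: f64, candidate_tps: f64) -> f64 {
    (candidate_tps - red_line_tps) / red_line_tps * 100.0
}
```

For example, against a Red Line of 1.3M TPS, a hypothetical candidate measuring 1.43M TPS would be reported as roughly +10%.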

