0x00 Project Roadmap
Vision: Build a production-grade cryptocurrency exchange from Hello World to Microsecond Latency. Current Status: Phase V (Extreme Optimization) - Order Commands parity complete.
📊 Progress Overview
This project documents the complete journey of building a 1.3M orders/sec matching engine. Below is the current status of each phase.
✅ Phase I: Core Matching Engine
Status: Complete
| Chapter | Title | Description |
|---|---|---|
| 0x01 | Genesis | Basic OrderBook with Vec<Order> |
| 0x02 | Float Curse | Why floats fail → u64 refactoring |
| 0x03 | Decimal World | Precision configuration system |
| 0x04 | BTree OrderBook | BTreeMap-based order book |
| 0x05 | User Balance | Account & balance management |
| 0x06 | Enforced Balance | Type-safe fund locking |
| 0x07 | Testing Framework | 1M order batch testing |
| 0x08 | Trading Pipeline | LMAX-style Ring Buffer architecture |
| 0x09 | Gateway & Persistence | HTTP API, TDengine, WebSocket, K-Line |
✅ Phase II: Productization
Status: Complete
| Chapter | Title | Description |
|---|---|---|
| 0x0A | Account System | PostgreSQL user management |
| 0x0A-b | ID Specification | Identity addressing rules |
| 0x0A-c | API Authentication | Ed25519 cryptographic auth |
| 0x0B | Funding & Transfer | Internal transfer architecture |
| 0x0C | Trade Fee | Maker/Taker fees + VIP discount |
🔶 Phase III: Resilience & Funding
Status: Complete
| Chapter | Title | Description | Status |
|---|---|---|---|
| 0x0D | Snapshot & Recovery | State snapshot, crash recovery | ✅ Done |
| 0x0E | OpenAPI Integration | Swagger UI, SDK generation | ✅ Done |
| 0x0F | Admin Dashboard | Ops Panel, KYC, hot-reload | ✅ Done |
| 0x11 | Deposit & Withdraw | Mock Chain integration, Idempotency | ✅ Done |
| 0x11-a | Real Chain Integration | Sentinel Service (Pull Model) | ✅ MVP Done |
| 0x11-b | Sentinel Hardening | SegWit Fix (DEF-002) & ETH/ERC20 & ADR-005/006 | ✅ Done |
🔶 Phase IV: Trading Integration & Verification
Status: Pending Verification
Context: The Core Engine and Trading APIs are implemented but currently tested with Mocks. This phase bridges the gap between the Real Chain (0x11) and the Matching Engine (0x01).
| Chapter | Title | Description | Status |
|---|---|---|---|
| 0x12 | Real Trading Verification | End-to-End: Bitcoind -> Sentinel -> Order -> Trade | 🟡 Code Ready (Needs Real-Chain Test) |
| 0x13 | Market Data Experience | WebSocket Verification (Ticker, Trade, Depth) | 🟡 Code Ready (Needs E2E Test) |
⏳ Phase V: Extreme Optimization (Metal Mode)
Status: In Progress
Codename: “Metal Mode”. Goal: Push Rust to the physical limits of the hardware.
| Chapter | Title | Description |
|---|---|---|
| 0x14 | Extreme Optimization | Architecture Manifesto |
| 0x14-a | Benchmark Harness | ✅ 100% Bit-exact Parity (FILL) |
| 0x14-b | Order Commands | ✅ IOC, Move, Reduce (Feature Parity) |
| 0x15 | Zero-Copy | Planned |
| 0x16 | CPU Affinity | Planned |
| 0x17 | SIMD Matching | Planned |
🏆 Key Milestones
| Git Tag | Phase | Highlights |
|---|---|---|
| v0.09-f-integration-test | 0x09 | 1.3M orders/sec baseline achieved |
| v0.10-a-account-system | 0x0A | PostgreSQL account integration |
| v0.10-b-api-auth | 0x0A | Ed25519 authentication |
| v0.0C-trade-fee | 0x0C | Maker/Taker fee system |
| v0.0D-persistence | 0x0D | Universal WAL & Snapshot persistence |
| v0.0F-admin-dashboard | 0x0F | Admin Operations Dashboard |
| v0.11-a-funding-qa | 0x11-a | Real Chain Sentinel MVP (Deposit/Withdraw) |
| v0.11-b-sentinel-hardening | 0x11-b | DEF-002 Fix, ADR-005/006, Hot Listing |
| v0.14-b-order-commands | 0x14-b | ✅ IOC, Move, Reduce (Bit-exact Parity) |
🎯 What You’ll Learn
- Financial Precision - Why `f64` fails and how to use fixed-point `u64`
- High-Performance Data Structures - BTreeMap for O(log n) order matching
- Lock-Free Concurrency - LMAX Disruptor-style Ring Buffer
- Event Sourcing - WAL-based deterministic state reconstruction
- Real-World Blockchain Integration - Handling Re-orgs, Confirmations, and UTXO management
- Production Security - Watch-only wallets & Ed25519 authentication
Last Updated: 2025-12-31
0x01 Genesis: Basic Engine
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
This is the first version of 0xInfinity. In this stage, we have built a minimal prototype of a Central Limit Order Book (CLOB). Our goal is to intuitively demonstrate real-world trading logic using standard data structures to manage orders.
1. Visualizing the Orderbook
An Orderbook is essentially a list of orders arranged by price. We place Sells (Asks) at the top and Buys (Bids) at the bottom. The gap in the middle is called the “Spread”.
We maintain two lists in memory:
- Sells: Sorted by price Low to High (Buyers want the cheapest price).
- Buys: Sorted by price High to Low (Sellers want the most expensive price).
===========================================================
ORDER BOOK SNAPSHOT
===========================================================
Side | Price (f64) | Qty | Orders (FIFO)
-----------------------------------------------------------
SELL | 102.00 | 5.0 | [Order #2]
SELL | 101.00 | 5.0 | [Order #3] ^
| Best Ask (Lowest)
-----------------------------------------------------------
$$$ MARKET SPREAD $$$
-----------------------------------------------------------
| Best Bid (Highest)
BUY | 100.00 | 10.0 | [Order #1] v
BUY | 99.00 | 10.0 | [Order #5]
===========================================================
2. Program Output
After executing cargo run, we can observe the actual output of the engine:
--- 0xInfinity: Stage 1 (Genesis) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
🇨🇳 Chinese
📦 Code Changes: View Diff
This is the first version of 0xInfinity. In this stage, we built a minimal prototype of a **Central Limit Order Book (CLOB)**. Our goal is to intuitively demonstrate real-world trading logic, using standard data structures to manage orders.
1. Visualizing the Orderbook
An orderbook is essentially a list of orders arranged by price. We place **Sells** at the top and **Buys** at the bottom. The gap in the middle is called the “Spread”.
We maintain two lists in memory:
- Sells: Sorted by price Low to High (buyers want the cheapest price).
- Buys: Sorted by price High to Low (sellers want the most expensive price).
===========================================================
ORDER BOOK SNAPSHOT
===========================================================
Side | Price (f64) | Qty | Orders (FIFO)
-----------------------------------------------------------
SELL | 102.00 | 5.0 | [Order #2]
SELL | 101.00 | 5.0 | [Order #3] ^
| Best Ask (Lowest)
-----------------------------------------------------------
$$$ MARKET SPREAD $$$
-----------------------------------------------------------
| Best Bid (Highest)
BUY | 100.00 | 10.0 | [Order #1] v
BUY | 99.00 | 10.0 | [Order #5]
===========================================================
2. Program Output
After executing cargo run, we can see the engine's actual output:
--- 0xInfinity: Stage 1 (Genesis) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
0x02: The Curse of Float
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
1. The Rookie Mistake
Experienced developers might have noticed that the price type was f64. This is problematic. In models.rs, we had this line:
#![allow(unused)]
fn main() {
pub price: f64, // The root of all evil
}
In most general-purpose applications where absolute precision is not critical, using floating-point numbers is fine. If single precision isn’t enough, double precision usually suffices. However, in the financial domain, storing monetary values as floats is considered an engineering disaster.
If you use floats to store money, it is impossible to maintain a 100% accurate ledger over time. Even with frequent reconciliation, you often end up accepting a “close enough” result.
Moreover, using floats introduces accumulation errors. Over millions of transactions, these tiny errors add up. While various rounding modes can mitigate this if done correctly, the root cause remains.
The biggest issue isn’t just the error itself (which might be acceptable within a tolerance), but the fact that you cannot fundamentally verify the correctness of the settlement, potentially hiding real bugs.
2. The Precision Trap
Run this incredibly simple code (you can run it in this project via cargo run --example the_curse_of_float):
fn main() {
let a: f64 = 0.1;
let b: f64 = 0.2;
let sum = a + b;
// You expect this to pass, right?
if sum == 0.3 {
println!("Math works!");
} else {
println!("PANIC: Math is broken! Sum is {:.20}", sum);
}
}
The output might surprise you:
PANIC: Math is broken! Sum is 0.30000000000000004441
See that extra 0.00000000000000004441? What is that? Why does it happen?
The main issue isn’t just about floating-point precision being “insufficient,” but that computers simply cannot precisely represent certain numbers.
Computers use binary, while humans use decimal. Just as 1/3 = 0.3333... repeats infinitely in decimal, 0.1 is a repeating fraction in binary that cannot be represented exactly.
In a matching engine, if an Ask in your OrderBook is 0.3 and a user’s Bid is computed as 0.1 + 0.2, these two orders—which inherently should match—will never match due to floating-point errors.
3. Why Blockchain Hates Floats
If you’ve worked with Ethereum smart contracts, you know there are no floating-point numbers in Solidity. Many people wonder why.
There is only one reason: Blockchain cores require 100% deterministic outputs for the same input. Regardless of time, location, hardware, OS, or CPU architecture, running the same code must yield exactly the same result. Only with absolute consistency—down to the last bit—can we ensure that everyone shares the same ledger and the same “consensus.”
Specifically, while floating-point calculations follow the IEEE 754 standard, edge cases can cause minute differences across CPUs:
Node A (Intel) Result: 100.00000000000001
Node B (ARM) Result: 100.00000000000000
Once this happens, the storage Hash differs, consensus breaks, and the chain forks.
4. The Decimal Temptation
When people realize the issue with f64, they often look for a precise decimal type, such as rust_decimal.
However, even with Decimal, different hardware, programming languages, or even compiler versions can lead to subtle differences. Achieving the 100% determinism required by blockchain is difficult.
The only thing that guarantees 100% determinism is Integer arithmetic. If integer calculations are inconsistent, it is 100% a bug.
Problems with Decimal:
- Software Emulation: Decimal is a software struct, not a hardware primitive.
- Implementation Dependency: Consistency depends on the library implementation.
- “Dialects”: If your backend uses Rust (`rust_decimal`), your risk engine uses Python (`decimal`), and your frontend uses JS (`BigInt`), subtle differences in “Rounding Mode” or “Overflow Handling” can lead to ledger discrepancies over time.
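This “dialect” drift is reproducible without any Decimal library at all: languages already disagree on how to round ties. A small Rust sketch (the Python behavior is quoted in a comment for contrast, not executed here):

```rust
fn main() {
    // Rust's f64::round rounds ties away from zero: 2.5 -> 3.0.
    assert_eq!(2.5f64.round(), 3.0);
    // Python's built-in round() uses banker's rounding: round(2.5) == 2.
    // Two services settling the same half-tick therefore disagree by one unit.

    // With pre-scaled integers there is nothing left to round:
    let a: u64 = 25; // 2.5 scaled by 10
    let b: u64 = 25;
    assert_eq!(a + b, 50); // identical on every CPU, language, and library
    println!("tie-rounding dialects demonstrated");
}
```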
5. Need for Speed: f64 vs u64
Besides determinism, another core reason we avoid Decimal is Performance.
u64 (Native Integer):
- When executing `a + b`, the CPU has a dedicated ALU circuit for 64-bit integer addition.
- It completes in as little as 1 clock cycle.
Decimal (Software Struct):
- When executing addition, the CPU runs a complex piece of code: checking Scale, aligning decimals, handling overflow, and finally calculating.
- This takes hundreds to thousands of times more instruction cycles.
In most apps, CPU cycles are abundant, so this doesn’t matter. But we are writing an HFT (High-Frequency Trading) engine where every nanosecond counts.
Cache Efficiency:
- `u64` takes 8 bytes; `Decimal` typically takes 16 bytes (128-bit).
- Using `u64` means your CPU cache can store twice as much price data, effectively doubling your throughput.
We will discuss Cache mechanics in detail later.
Summary
Two reasons to ban floating-point numbers:
- No 100% Determinism — Fails to meet blockchain consensus and precise reconciliation requirements.
- Performance Issues — For HFT engines, Integer is the only choice.
Refactoring Results
We have refactored all f64 fields in models.rs to u64:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Use Integer for Price
pub qty: u64, // Use Integer for Quantity
pub side: Side,
}
}
Output after cargo run:
--- 0xInfinity: Stage 2 (Integer) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
Now all price comparisons are precise integer comparisons, free from floating-point errors.
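The refactoring can be sanity-checked with a minimal sketch contrasting the two representations (the 10^8 scale factor is chosen here for illustration):

```rust
fn main() {
    // Float orders that should cross, but never match:
    let ask: f64 = 0.3;
    let bid: f64 = 0.1 + 0.2;
    assert!(bid != ask); // stuck in the book forever

    // The same prices scaled to integers (8 decimals, satoshi-style):
    const SCALE: u64 = 100_000_000;
    let ask_int: u64 = 3 * SCALE / 10;              // 0.3
    let bid_int: u64 = SCALE / 10 + 2 * SCALE / 10; // 0.1 + 0.2
    assert_eq!(bid_int, ask_int); // exact match, trade executes
    println!("integer prices match; float prices do not");
}
```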
🇨🇳 Chinese
📦 Code Changes: View Diff
1. The Rookie Mistake
Experienced developers will immediately see that the type of price is f64, which is a problem, because we have this line in models.rs:
#![allow(unused)]
fn main() {
pub price: f64, // The root of all evil
}
In most scenarios that do not require absolutely precise results, floating-point numbers are fine; if single precision isn't enough, double precision usually is. But in finance, storing monetary amounts as floats is an engineering disaster.
With floats, the ledger cannot stay perfectly accurate for any length of time. Even with frequent reconciliation checks, you can only accept a “close enough” result in the end.
Storing money as floats also introduces accumulation errors: after years of trading, these tiny errors keep growing. Choosing the right rounding modes, done correctly, can reduce the accumulated error.
If a bounded accumulated error is acceptable, the error itself is usually not the problem. The biggest problem is: if you cannot fundamentally verify the correctness of settlement, real bugs may stay hidden.
2. The Precision Trap
Run this extremely simple piece of code (in this project: cargo run --example the_curse_of_float):
fn main() {
let a: f64 = 0.1;
let b: f64 = 0.2;
let sum = a + b;
// You expect this to pass, right?
if sum == 0.3 {
println!("Math works!");
} else {
println!("PANIC: Math is broken! Sum is {:.20}", sum);
}
}
The output is surprising:
PANIC: Math is broken! Sum is 0.30000000000000004441
See that extra 0.00000000000000004441? What is that? Why does this happen?
The main issue is not merely whether floating-point precision is sufficient, but that computers fundamentally cannot represent certain numbers exactly.
Computers are binary, while humans normally use decimal. Just as 1/3 = 0.3333... never terminates in decimal, 0.1 is a number that can never be expressed exactly in binary.
In a matching engine, if the Ask in your OrderBook is 0.3 and a user's Bid is computed as 0.1 + 0.2, these two orders—which should have matched—will never match because of floating-point error.
3. Zero Tolerance on the Blockchain (Why Blockchain Hates Floats)
If you have looked at Ethereum's smart contract language, you know contracts contain no floating-point numbers at all. Many people don't know why.
There is only one reason: the core of a blockchain requires that the same input produces a 100% deterministic output. No matter when or where, the same code must run on different hardware, different operating systems, and different CPU architectures and produce exactly the same result. Only with complete consistency—not a single bit of error—can we be sure that everyone in the world shares the same ledger and the same “Bitcoin”.
Specifically, floating-point arithmetic follows the IEEE 754 standard, but in extreme edge cases, different CPUs may handle floats with minuscule differences:
Node A (Intel) result: 100.00000000000001
Node B (ARM) result: 100.00000000000000
Once that happens, the hashes differ, consensus breaks, and the chain forks.
4. The Decimal Temptation
When people realize the problem with f64, they look for a precise decimal type, such as rust_decimal.
But even Decimal can differ subtly across hardware, programming languages, and even different versions and compiler implementations of the same language; it cannot achieve the 100% determinism blockchain requires.
Only integers can achieve 100% determinism. If pure integer calculations are still inconsistent, you can be 100% certain there is a bug.
Problems with Decimal:
- Decimal is emulated in software.
- Decimal's consistency depends on the library implementation.
- If your backend uses Rust (`rust_decimal`), your risk engine uses Python (`decimal`), and your frontend uses JS (`BigInt`), different libraries may have different “dialects” of rounding mode and overflow handling.
- These tiny differences cause the books to stop balancing over time.
5. Need for Speed: f64 vs u64
Besides 100% determinism, the other core reason we do not use Decimal is performance.
u64 (Native Integer):
- When you execute `a + b`, the CPU has a dedicated ALU circuit that handles 64-bit integer addition directly.
- At its fastest, it completes in 1 clock cycle.
Decimal (Software Struct):
- When you execute an addition, the CPU is actually running a complex piece of code: checking the Scale, aligning the decimals, handling overflow, and finally computing.
- This takes hundreds or even thousands of times more instruction cycles.
In most cases CPU cycles are abundant, so ordinary applications need not worry much; most modern CPUs also have floating-point units, so floats are fast too. But we are writing an HFT engine, where every nanosecond counts.
There is also Cache Efficiency:
- `u64` takes 8 bytes; `Decimal` typically takes 16 bytes (128-bit).
- Using `u64` means your CPU cache can hold twice as much price data, which directly doubles throughput.
We will discuss Cache mechanics in detail later.
Summary
Two reasons floating-point numbers are banned:
- No guarantee of 100% determinism — cannot satisfy blockchain consensus and precise reconciliation requirements.
- Decimal has performance problems — for an HFT engine, integers are the only choice.
Refactoring Results
We have refactored all f64 fields in models.rs to u64:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Price as an integer
pub qty: u64, // Quantity as an integer
pub side: Side,
}
}
Output after running cargo run:
--- 0xInfinity: Stage 2 (Integer) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
Now all price comparisons are exact integer comparisons, with no more floating-point error.
0x03: Decimal World
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapter, we refactored all f64 to u64, solving the floating-point precision issues. But this introduced a new problem: Clients use decimals, while we use integers internally. How do we convert between them?
1. The Decimal Conversion Problem
When a user places an order, the input price might be "100.50" and quantity "10.5". However, our engine uses u64 integers:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Integer representation
pub qty: u64, // Integer representation
pub side: Side,
}
}
Core Question: How to perform lossless conversion between decimal strings and u64?
The answer is the Fixed Decimal scheme:
#![allow(unused)]
fn main() {
/// Convert decimal string to u64
/// e.g., "100.50" with 2 decimals -> 10050
fn parse_decimal(s: &str, decimals: u32) -> u64 {
let multiplier = 10u64.pow(decimals);
// ... Parsing Logic
}
/// Convert u64 back to decimal string for display
/// e.g., 10050 with 2 decimals -> "100.50"
fn format_decimal(value: u64, decimals: u32) -> String {
let multiplier = 10u64.pow(decimals);
let int_part = value / multiplier;
let dec_part = value % multiplier;
format!("{}.{:0>width$}", int_part, dec_part, width = decimals as usize)
}
}
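To make the round-trip concrete, here is a self-contained sketch; the `parse_decimal` body below is a simplified stand-in (no validation or overflow handling), not the project's actual implementation:

```rust
// Simplified stand-in for the project's parse_decimal: assumes well-formed
// input and ignores overflow, purely to demonstrate the round-trip.
fn parse_decimal(s: &str, decimals: u32) -> u64 {
    let multiplier = 10u64.pow(decimals);
    let (int_s, frac_s) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };
    let int_part: u64 = int_s.parse().unwrap_or(0);
    // Right-pad the fraction to exactly `decimals` digits: "5" -> "50"
    let mut frac = format!("{:0<width$}", frac_s, width = decimals as usize);
    frac.truncate(decimals as usize);
    let dec_part: u64 = if frac.is_empty() { 0 } else { frac.parse().unwrap() };
    int_part * multiplier + dec_part
}

fn format_decimal(value: u64, decimals: u32) -> String {
    let multiplier = 10u64.pow(decimals);
    format!("{}.{:0>width$}", value / multiplier, value % multiplier,
            width = decimals as usize)
}

fn main() {
    assert_eq!(parse_decimal("100.50", 2), 10050);
    assert_eq!(format_decimal(10050, 2), "100.50");
    // The round-trip is lossless within the configured precision:
    assert_eq!(format_decimal(parse_decimal("0.001", 8), 8), "0.00100000");
    println!("round-trip ok");
}
```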
2. The u64 Max Value (Range Analysis)
The maximum value of u64 is:
u64::MAX = 18,446,744,073,709,551,615
If we use 8 decimal places (similar to Bitcoin’s satoshi), the maximum representable value is:
184,467,440,737.09551615
This means:
- For Price: We can represent up to ~184 Billion. (If Bitcoin hits this price, we’ll upgrade…)
- For Quantity: It can hold the entire total supply of BTC (21 million).
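The range arithmetic quoted above can be verified directly in Rust:

```rust
fn main() {
    const DECIMALS: u32 = 8;
    let multiplier = 10u64.pow(DECIMALS); // 100_000_000
    let max_units = u64::MAX / multiplier; // integer part
    let max_frac = u64::MAX % multiplier;  // fractional part
    assert_eq!(max_units, 184_467_440_737);
    assert_eq!(max_frac, 9_551_615);
    println!("max representable = {}.{:08}", max_units, max_frac);
    // -> 184467440737.09551615
}
```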
Decimals Configuration for Different Assets
Different blockchain assets have different native precisions:
| Asset | Native Decimals | Smallest Unit |
|---|---|---|
| BTC | 8 | 1 satoshi = 0.00000001 BTC |
| USDT (ERC20) | 6 | 0.000001 USDT |
| ETH | 18 | 1 wei = 0.000000000000000001 ETH |
The Question: ETH natively uses 18 decimals. Will we lose precision if we use only 8?
The answer is: It is sufficient for an Exchange. Because:
- With 8 decimals, the smallest supported unit is `0.00000001 ETH`.
- There's no real need to trade `0.000000000000000001 ETH` (value ≈ $0.000000000000003).
So we can choose a reasonable internal precision, not necessarily identical to the native chain.
Thus, we need a SymbolManager to manage:
- Internal precision (`decimals`) for each asset.
- User display precision (`display_decimals`).
- Price precision configuration for trading pairs.
- Conversion between on-chain and internal precision during Deposit/Withdrawal.
ETH Decimals Analysis: 8 vs 12 bits
Let’s analyze the maximum ETH amount representable by u64 under different decimal configs:
| Decimals | Multiplier | Max Value by u64 | Sufficient? |
|---|---|---|---|
| 8 | 10^8 | 184,467,440,737 ETH | ✅ Huge margin |
| 9 | 10^9 | 18,446,744,073 ETH | ✅ Huge margin |
| 10 | 10^10 | 1,844,674,407 ETH | ✅ > Total Supply |
| 11 | 10^11 | 184,467,440 ETH | ✅ Just enough (~120M) |
| 12 | 10^12 | 18,446,744 ETH | ❌ < Total Supply! |
| 18 | 10^18 | 18.44 ETH | ❌ Absolutely not enough |
ETH Total Supply ≈ 120 Million ETH
Why did we choose 8 decimals for ETH?
- `0.00000001 ETH` ≈ $0.00000003, far below any meaningful trade size.
- Max capacity of ~184 Billion ETH > Total Supply (~120M).
- Just convert precision during Deposit/Withdrawal.
Configuration Example:
#![allow(unused)]
fn main() {
// BTC: 8 decimals (Same as satoshi)
manager.add_asset(1, 8, 3, "BTC");
// USDT: 8 decimals (Native is 6, we align to 8 internally)
manager.add_asset(2, 8, 2, "USDT");
// ETH: 8 decimals (Safe range, sufficient precision)
manager.add_asset(3, 8, 4, "ETH");
}
3. Symbol Configuration
Different trading pairs have different precision requirements:
| Symbol | Price Decimals | Qty Display Decimals | Example |
|---|---|---|---|
| BTC_USDT | 2 | 3 | Buy 0.001 BTC @ $65000.00 |
| ETH_USDT | 2 | 4 | Buy 0.0001 ETH @ $3500.00 |
| DOGE_USDT | 6 | 0 | Buy 100 DOGE @ $0.123456 |
We use SymbolManager to manage these configs:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct SymbolInfo {
pub symbol: String,
pub symbol_id: u32,
pub base_asset_id: u32,
pub quote_asset_id: u32,
pub price_decimal: u32, // Decimals for Price
pub price_display_decimal: u32, // Display decimals for Price
}
#[derive(Debug, Clone)]
pub struct AssetInfo {
pub asset_id: u32,
pub decimals: u32, // Internal precision (usually 8)
pub display_decimals: u32, // Max decimals for input/display
pub name: String,
}
}
4. decimals vs display_decimals
Distinguishing these two concepts is crucial:
decimals (Internal Precision)
- Determines the scaling multiplier applied to `u64`.
- Usually 8 (like satoshi).
- This is the internal storage format, invisible to users.
display_decimals (Display Precision)
- Determines how many decimal places users can see/input.
- E.g., BTC displays 3 digits: `0.001 BTC`.
- USDT displays 2 digits: `100.00 USDT`.
Why separate them?
- UX: Users don’t need to see 8 decimal places.
- Validation: Limit user input precision.
- Cleanliness: Avoid trailing zeros.
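For example, input validation against `display_decimals` can be as simple as counting fractional digits; `validate_input_precision` is a hypothetical helper for illustration, not a function from the codebase:

```rust
// Hypothetical helper: reject user input carrying more fractional digits
// than display_decimals allows.
fn validate_input_precision(s: &str, display_decimals: u32) -> bool {
    match s.split_once('.') {
        None => true, // no fractional part at all
        Some((_, frac)) => frac.len() as u32 <= display_decimals,
    }
}

fn main() {
    // BTC qty display precision is 3 in the symbol table above
    assert!(validate_input_precision("0.001", 3));
    assert!(!validate_input_precision("0.0001", 3)); // too many digits
    assert!(validate_input_precision("100", 2));     // integer input is fine
    println!("validation ok");
}
```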
5. Program Output
Output after cargo run:
--- 0xInfinity: Stage 3 (Decimal World) ---
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00
Order 2: Sell 5.000 BTC @ $102.00
Order 3: Sell 5.000 BTC @ $101.00
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
MATCH: Buy 4 eats Sell 1 @ Price 10000 (Qty: 10000)
MATCH: Buy 4 eats Sell 3 @ Price 10100 (Qty: 2000)
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00
--- End of Simulation ---
--- u64 Range Demo ---
u64::MAX = 18446744073709551615
With 8 decimals, max representable value = 184467440737.09551615
Observation:
- User input is the decimal string `"100.00"`.
- Internal storage is the integer `10000`.
- Display converts back to `"100.00"`.
This is the core of Decimal World: Seamless lossless conversion between Decimal Strings and u64 Integers.
📖 True Story: JavaScript Number Overflow
During development, we encountered a bizarre bug:
Symptom: The backend returned raw ETH amount (in wei). During testing with small amounts (0.00x ETH), frontend worked fine. But once the amount hit ~0.009 ETH, the number started losing precision and became incorrect!
Root Cause: JavaScript’s Number type uses IEEE 754 double-precision floats. The maximum safe integer is 2^53 - 1:
> console.log(Number.MAX_SAFE_INTEGER);
9007199254740991 // ~ 9 * 10^15
// 1 ETH = 10^18 wei
> const oneEthInWei = 1000000000000000000;
// The Issue: When wei amount exceeds MAX_SAFE_INTEGER
> const smallAmount = 1000000000000000; // 0.001 ETH = 10^15 wei ✅ Safe
> const dangerAmount = 9007199254740992; // ~ 0.009 ETH ⚠️ Just exceeded limit!
> const tenEthInWei = 10000000000000000000; // 10 ETH = 10^19 wei ❌ Overflow!
// Verify Precision Loss: Adding 1 has no effect!
> console.log(tenEthInWei + 1);
10000000000000000000 // No +1!
> console.log(tenEthInWei === tenEthInWei + 1);
true // 😱 WHAT?!
Why ~0.009 ETH?
> console.log(Number.MAX_SAFE_INTEGER / 1e18);
0.009007199254740991 // 0.009 ETH is the safety limit!
Solution:
// ✅ Solution 1: Backend returns String, Frontend uses BigInt
> const weiString = "10000000000000000000"; // String from backend
> const weiBigInt = BigInt(weiString); // Convert to BigInt
> console.log((weiBigInt + 1n).toString());
10000000000000000001 // ✅ Correct!
// ✅ Solution 2: Use libraries like ethers.js
// import { formatEther, parseEther } from 'ethers';
// const eth = formatEther(weiBigInt); // "10.0"
Summary
This chapter solved:
- ✅ Decimal Conversion: `parse_decimal()` and `format_decimal()` for bidirectional lossless conversion.
- ✅ u64 Range: Max value ~184 Billion (at 8 decimals), sufficient for any financial scenario.
- ✅ Symbol Config: `SymbolManager` handles precision settings per pair.
- ✅ Precision Definitions: Distinct `decimals` (internal) vs `display_decimals` (UI).
🇨🇳 Chinese
📦 Code Changes: View Diff
In the previous chapter we refactored all f64 to u64, solving the floating-point precision problem. But this introduces a new question: clients use decimals while we use integers internally—how do we convert between them?
1. The Decimal Conversion Problem
When a user places an order, the input price is "100.50" and the quantity is "10.5". But our engine internally uses u64 integers:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Integer representation
pub qty: u64, // Integer representation
pub side: Side,
}
}
Core question: how do we losslessly convert between decimal strings and u64?
The answer is the Fixed Decimal scheme:
#![allow(unused)]
fn main() {
/// Convert a decimal string to u64
/// e.g., "100.50" with 2 decimals -> 10050
fn parse_decimal(s: &str, decimals: u32) -> u64 {
let multiplier = 10u64.pow(decimals);
// ... parsing logic
}
/// Convert u64 back to a decimal string for display
/// e.g., 10050 with 2 decimals -> "100.50"
fn format_decimal(value: u64, decimals: u32) -> String {
let multiplier = 10u64.pow(decimals);
let int_part = value / multiplier;
let dec_part = value % multiplier;
format!("{}.{:0>width$}", int_part, dec_part, width = decimals as usize)
}
}
2. The u64 Max Value
The maximum value of u64 is:
u64::MAX = 18,446,744,073,709,551,615
If we use 8 decimal places (like Bitcoin's satoshi), the maximum representable value is:
184,467,440,737.09551615
This means:
- For price: we can represent up to about 184.4 billion. If Bitcoin ever needs a price that large, we'll upgrade then...
- For quantity: it can hold the entire BTC supply (21 million total).
Decimals Configuration for Different Assets
Different blockchain assets have different native precisions:
| Asset | Native Decimals | Smallest Unit |
|---|---|---|
| BTC | 8 | 1 satoshi = 0.00000001 BTC |
| USDT (ERC20) | 6 | 0.000001 USDT |
| ETH | 18 | 1 wei = 0.000000000000000001 ETH |
The question: ETH natively uses 18 decimals—will we lose precision by using only 8?
The answer: it is sufficient for an exchange. Because:
- With 8 decimals, the smallest precision the exchange supports is `0.00000001 ETH`, which is enough.
- There is no need to support trading `0.000000000000000001 ETH` (worth about $0.000000000000003).
So we can choose a reasonable internal precision; it does not have to match the native chain.
Therefore we need a basic configuration manager for assets and trading pairs (SymbolManager) to:
- Manage each asset's internal precision (decimals).
- Manage the user-visible display precision (display_decimals).
- Manage price precision configuration for trading pairs.
- Convert between on-chain precision and internal precision on deposit/withdrawal.
ETH Decimals Analysis: Choosing Between 8 and 12
Let's analyze the maximum ETH amount u64 can represent under different decimals configurations:
| Decimals | Multiplier | Max Value in u64 | Sufficient? |
|---|---|---|---|
| 8 | 10^8 | 184,467,440,737 ETH | ✅ Far above total supply |
| 9 | 10^9 | 18,446,744,073 ETH | ✅ Far above total supply |
| 10 | 10^10 | 1,844,674,407 ETH | ✅ Above total supply |
| 11 | 10^11 | 184,467,440 ETH | ✅ Just above total supply (~120M) |
| 12 | 10^12 | 18,446,744 ETH | ❌ Below total supply! |
| 18 | 10^18 | 18.44 ETH | ❌ Not nearly enough |
ETH's current total supply is about 120 million ETH.
Analysis:
- 8 decimals: max ~184.4 billion ETH, huge margin; a precision of `0.00000001 ETH` is enough for an exchange.
- 10 decimals: max ~1.8 billion ETH, higher precision.
- 12 decimals: max ~18 million ETH, highest precision, ⚠️ but below total supply.
Why did we choose 8 decimals for ETH?
Although ETH natively uses 18 decimals (wei), for an exchange:
- `0.00000001 ETH` ≈ $0.00000003, far below any meaningful trade size.
- Max representable ~184.4 billion ETH, far above total supply (~120 million).
- Just convert the precision on deposit/withdrawal.
Configuration example:
#![allow(unused)]
fn main() {
// BTC: 8 decimals (matches on-chain satoshi)
manager.add_asset(1, 8, 3, "BTC");
// USDT: 8 decimals
manager.add_asset(2, 8, 2, "USDT");
// ETH: 8 decimals (sufficient precision, safe range)
manager.add_asset(3, 8, 4, "ETH");
}
3. Symbol Configuration
Different trading pairs may have different precision requirements:
| Symbol | Price Decimals | Qty Display Decimals | Example |
|---|---|---|---|
| BTC_USDT | 2 | 3 | Buy 0.001 BTC @ $65000.00 |
| ETH_USDT | 2 | 4 | Buy 0.0001 ETH @ $3500.00 |
| DOGE_USDT | 6 | 0 | Buy 100 DOGE @ $0.123456 |
We use SymbolManager to manage these configs:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct SymbolInfo {
pub symbol: String,
pub symbol_id: u32,
pub base_asset_id: u32,
pub quote_asset_id: u32,
pub price_decimal: u32, // Decimals for Price
pub price_display_decimal: u32, // Display decimals for Price
}
#[derive(Debug, Clone)]
pub struct AssetInfo {
pub asset_id: u32,
pub decimals: u32, // Internal precision (usually 8)
pub display_decimals: u32, // Max decimals for input/display
pub name: String,
}
}
4. decimals vs display_decimals
There are two concepts to distinguish here:
decimals (Internal Precision)
- Determines the scaling multiplier applied to `u64`.
- Usually 8 (like satoshi).
- This is the internal storage precision, invisible to users.
display_decimals (Display Precision)
- Determines how many decimal places users can input/see.
- E.g., BTC displays 3 digits: `0.001 BTC`.
- USDT displays 2 digits: `100.00 USDT`.
Why separate them?
- User experience: users do not need to see 8 decimal places of precision.
- Input validation: limit the number of decimal places a user can enter.
- Clean display: avoid showing meaningless trailing zeros.
5. Program Output
Output after running cargo run:
--- 0xInfinity: Stage 3 (Decimal World) ---
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00
Order 2: Sell 5.000 BTC @ $102.00
Order 3: Sell 5.000 BTC @ $101.00
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
MATCH: Buy 4 eats Sell 1 @ Price 10000 (Qty: 10000)
MATCH: Buy 4 eats Sell 3 @ Price 10100 (Qty: 2000)
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00
--- End of Simulation ---
--- u64 Range Demo ---
u64::MAX = 18446744073709551615
With 8 decimals, max representable value = 184467440737.09551615
Observation:
- User input is the decimal string `"100.00"`.
- Internal storage is the integer `10000`.
- Display converts back to `"100.00"`.
This is the core of Decimal World: seamless conversion between decimal strings and u64 integers.
📖 True Story: JavaScript Number Overflow
During development we once hit a very strange bug:
Symptom: The backend returned the raw ETH amount (in wei) to the frontend. During development and testing, amounts were tiny (on the order of 0.00x ETH), so the frontend displayed and handled them fine. But in production, as soon as an amount got slightly larger (in fact above about 0.009 ETH), the number started losing precision and became incorrect!
Root cause: JavaScript's Number type uses IEEE 754 double-precision floats; the maximum safe integer is 2^53 - 1:
> console.log(Number.MAX_SAFE_INTEGER);
9007199254740991 // ~ 9 * 10^15
// 1 ETH = 10^18 wei
> const oneEthInWei = 1000000000000000000;
// The issue: when the wei amount exceeds MAX_SAFE_INTEGER
> const smallAmount = 1000000000000000; // 0.001 ETH = 10^15 wei ✅ Safe
> const dangerAmount = 9007199254740992; // ~ 0.009 ETH ⚠️ Just past the safe range
> const tenEthInWei = 10000000000000000000; // 10 ETH = 10^19 wei ❌ Overflow!
// Verify the precision loss: adding 1 has no effect!
> console.log(tenEthInWei + 1);
10000000000000000000 // No +1!
> console.log(tenEthInWei + 2);
10000000000000000000 // Still the same!
> console.log(tenEthInWei + 1000);
10000000000000000000 // Even +1000 makes no difference!
> console.log(tenEthInWei === tenEthInWei + 1);
true // 😱 Unbelievable!
Why do problems start above about 0.009 ETH?
> console.log(Number.MAX_SAFE_INTEGER / 1e18);
0.009007199254740991 // ~0.009 ETH is exactly the safety boundary!
// The output may look correct, but precision is already lost. To verify:
> const nineEth = 9n * 10n ** 18n; // 9 ETH as a BigInt
> const nineEthNum = Number(nineEth); // convert to Number
> console.log(nineEthNum);
9000000000000000000 // Looks correct...
> console.log(nineEthNum + 1);
9000000000000000000 // But +1 has no effect!
> console.log(nineEthNum === nineEthNum + 1);
true // Proof that precision is already lost
The correct solutions:
// ✅ Solution 1: Backend returns a String; frontend uses BigInt
> const weiString = "10000000000000000000"; // String returned by the backend
> const weiBigInt = BigInt(weiString); // Convert to BigInt
> console.log(weiBigInt.toString());
10000000000000000000 // ✅ Exact!
// BigInt arithmetic works correctly
> console.log((weiBigInt + 1n).toString());
10000000000000000001 // ✅ +1 works!
// ✅ Solution 2: Use a dedicated library such as ethers.js
// import { formatEther, parseEther } from 'ethers';
// const eth = formatEther(weiBigInt); // "10.0"
Summary
This chapter solved the following problems:
- ✅ Decimal conversion: `parse_decimal()` and `format_decimal()` implement bidirectional lossless conversion.
- ✅ u64 range: max ~184.4 billion (at 8 decimals), enough for any financial scenario.
- ✅ Symbol config: `SymbolManager` manages each trading pair's precision settings.
- ✅ Two precision definitions: `decimals` (internal) vs `display_decimals` (display).
0x04 OrderBook Refactoring (BTreeMap)
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapters, we completed the transition from Float to Integer and established a precision configuration system. However, our OrderBook data structure was still a “toy” implementation—re-sorting on every match! This chapter upgrades it to a truly production-ready data structure.
1. The Problem with the Naive Implementation
Let’s review the original engine.rs:
#![allow(unused)]
fn main() {
pub struct OrderBook {
bids: Vec<PriceLevel>, // Was 'buys'
asks: Vec<PriceLevel>, // Was 'sells'
}
}
💡 Naming Convention: We renamed `buys`/`sells` to `bids`/`asks`. These are standard industry terms:
- Bid: Price buyers are willing to pay.
- Ask: Price sellers are demanding.
Using professional terminology aligns the code with industry docs and APIs.
#![allow(unused)]
fn main() {
fn match_buy(&mut self, buy_order: &mut Order) {
// Problem 1: Re-sort every time! O(n log n)
self.asks.sort_by_key(|l| l.price);
for level in self.asks.iter_mut() {
// ...matching logic...
}
// Problem 2: Removing empty levels shifts the whole array! O(n)
self.asks.retain(|l| !l.orders.is_empty());
}
fn rest_order(&mut self, order: Order) {
// Problem 3: Finding price level is a linear scan! O(n)
let level = self.asks.iter_mut().find(|l| l.price == order.price);
// ...
}
}
Time Complexity Analysis
| Operation | Vec Impl | Issue |
|---|---|---|
| Insert Order | O(n) | Linear scan for price level |
| Pre-match Sort | O(n log n) | Sort required before every match |
| Remove Empty Level | O(n) | Array element shifting |
In an active exchange with tens of thousands of orders per second, O(n) operations quickly become a performance bottleneck.
2. The BTreeMap Solution
Rust’s standard library provides BTreeMap, an ordered map backed by a self-balancing B-Tree:
#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
pub struct OrderBook {
/// Asks: price -> orders (Ascending, Lowest Price = Best Ask)
asks: BTreeMap<u64, VecDeque<Order>>,
/// Bids: (u64::MAX - price) -> orders (Trick: Highest Price First)
bids: BTreeMap<u64, VecDeque<Order>>,
}
}
Key Trick: Key Design for Bids
BTreeMap sorts keys in ascending order by default. This works perfectly for Asks (lowest price first). But for Bids, we need highest price first.
Solution: Use u64::MAX - price as the key.
#![allow(unused)]
fn main() {
// Insert Bid
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);
// Read Real Price
let price = u64::MAX - key;
}
Thus, Price 100 becomes Key u64::MAX - 100, and Price 99 becomes u64::MAX - 99. Since (u64::MAX - 100) < (u64::MAX - 99), Price 100 comes before Price 99!
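This ordering trick can be verified with a short standalone snippet (the helper names `bid_key`/`bid_price` are illustrative, not from the chapter's code):

```rust
use std::collections::{BTreeMap, VecDeque};

// bid_key/bid_price are illustrative helper names (not from the chapter).
fn bid_key(price: u64) -> u64 {
    u64::MAX - price // invert so that higher prices get smaller keys
}

fn bid_price(key: u64) -> u64 {
    u64::MAX - key // recover the real price from a stored key
}

fn main() {
    let mut bids: BTreeMap<u64, VecDeque<u64>> = BTreeMap::new();
    // Insert bids at 99, 100, 98 in arbitrary order.
    for price in [99u64, 100, 98] {
        bids.entry(bid_key(price)).or_default().push_back(price);
    }
    // Ascending key order now yields descending price order: 100, 99, 98.
    let prices: Vec<u64> = bids.keys().map(|&k| bid_price(k)).collect();
    assert_eq!(prices, vec![100, 99, 98]);
    println!("bid iteration order: {:?}", prices);
}
```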
Why not Reverse or Custom Comparator?
You might ask: Why not BTreeMap<Reverse<u64>, ...>?
Comparison:
| Approach | Issue |
|---|---|
BTreeMap<Reverse<u64>> | Reverse is a wrapper; unwrapping on every access adds complexity. |
Custom Ord | Requires a newtype wrapper, increasing boilerplate. |
u64::MAX - price | Zero-Cost Abstraction: Two subtraction ops, easily inlined by compiler. |
Key Advantages:
- Simple: Just two lines of code.
- Zero Overhead: Subtraction is a single-cycle CPU instruction.
- Type Safe: Key remains u64.
- No Overflow: Price is always < u64::MAX, so u64::MAX - price never underflows.
Time Complexity Comparison
| Operation | Vec Impl | BTreeMap Impl |
|---|---|---|
| Insert Order | O(n) | O(log n) |
| Match (No Sort) | - | O(log n) |
| Cancel Order | O(n) | O(n)* |
| Remove Empty Level | O(n) | O(log n) |
| Query Best Price | O(n) / O(n log n) | O(log n)** |
*Note: Cancelling requires a linear scan of the VecDeque at that price level (O(n)). O(1) cancellation requires an auxiliary HashMap index.
**Note: BTreeMap::first_key_value() descends to the leftmost leaf — O(log n), but with a tiny constant in practice; caching the best price would make lookups O(1).
3. New Data Models
Order
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
pub id: u64,
pub price: u64, // Internal Integer Price
pub qty: u64, // Original Qty
pub filled_qty: u64, // Filled Qty
pub side: Side,
pub order_type: OrderType,
pub status: OrderStatus,
}
}
Trade
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub price: u64,
pub qty: u64,
}
}
OrderResult
#![allow(unused)]
fn main() {
pub struct OrderResult {
pub order: Order, // Updated Order
pub trades: Vec<Trade>, // Generated Trades
}
}
4. Core API
#![allow(unused)]
fn main() {
impl OrderBook {
/// Add order, return match result
pub fn add_order(&mut self, order: Order) -> OrderResult;
/// Cancel order
pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
/// Get Best Bid
pub fn best_bid(&self) -> Option<u64>;
/// Get Best Ask
pub fn best_ask(&self) -> Option<u64>;
/// Get Spread
pub fn spread(&self) -> Option<u64>;
}
}
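A minimal sketch of how the three query methods can be written over the BTreeMap layout above. This simplified book stores plain quantities per price level rather than full Orders, and the match arms are assumptions about how an empty or crossed book should behave:

```rust
use std::collections::{BTreeMap, VecDeque};

struct OrderBook {
    asks: BTreeMap<u64, VecDeque<u64>>, // price -> qtys at that level
    bids: BTreeMap<u64, VecDeque<u64>>, // (u64::MAX - price) -> qtys
}

impl OrderBook {
    fn best_ask(&self) -> Option<u64> {
        self.asks.keys().next().copied() // smallest key = lowest ask
    }
    fn best_bid(&self) -> Option<u64> {
        self.bids.keys().next().map(|k| u64::MAX - k) // smallest key = highest bid
    }
    fn spread(&self) -> Option<u64> {
        match (self.best_bid(), self.best_ask()) {
            (Some(b), Some(a)) if a >= b => Some(a - b),
            _ => None, // empty or crossed book
        }
    }
}

fn main() {
    let mut book = OrderBook { asks: BTreeMap::new(), bids: BTreeMap::new() };
    book.asks.entry(101_00).or_default().push_back(5); // ask $101.00
    book.bids.entry(u64::MAX - 99_00).or_default().push_back(10); // bid $99.00
    assert_eq!(book.best_ask(), Some(101_00));
    assert_eq!(book.best_bid(), Some(99_00));
    assert_eq!(book.spread(), Some(2_00)); // matches the chapter's $2.00 spread
}
```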
5. Execution Results
=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00 -> New
Order 2: Sell 5.000 BTC @ $102.00 -> New
Order 3: Sell 5.000 BTC @ $101.00 -> New
Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
Trades:
- Trade #1: 10.000 @ $100.00
- Trade #2: 2.000 @ $101.00
Order Status: Filled, Filled: 12.000/12.000
Book State: Best Bid=None, Best Ask=Some("101.00")
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00 -> New
Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")
=== End of Simulation ===
Observations:
- Orders matched correctly by price priority (First $100, then $101).
- Every trade recorded in Trades.
- Real-time tracking of Best Bid/Ask and Spread.
6. Unit Tests
We added 8 unit tests covering core scenarios:
$ cargo test
running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok
test result: ok. 8 passed; 0 failed
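As one illustration of what these tests cover, FIFO time priority at a single price level falls out of VecDeque's front-in, front-out behavior. This is a standalone sketch, not the chapter's actual test code:

```rust
use std::collections::VecDeque;

fn main() {
    // One price level: order IDs queued in arrival order.
    let mut level: VecDeque<u64> = VecDeque::new();
    level.push_back(1); // first maker
    level.push_back(2); // second maker
    // A taker consumes the level front-first, giving FIFO time priority.
    assert_eq!(level.pop_front(), Some(1));
    assert_eq!(level.pop_front(), Some(2));
    assert!(level.is_empty()); // an empty level would then be removed from the map
}
```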
7. Is BTreeMap Enough?
For an exchange not chasing extreme performance, BTreeMap is perfectly adequate:
| Scenario | BTreeMap Performance |
|---|---|
| 1,000 TPS | Easy |
| 10,000 TPS | Manageable |
| 100,000+ TPS | Need specialized structures |
If you want to build a Ferrari-level matching engine (nanosecond latency, millions of TPS), you need:
- Lock-free data structures
- Memory pools (avoid heap allocation)
- CPU Cache optimization
- FPGA acceleration
But that’s for later. For now, we have a Correct and Efficient baseline implementation.
Summary
This chapter accomplished:
- ✅ Analyzed Problem: O(n) bottleneck in Vec implementation.
- ✅ Refactored to BTreeMap: O(log n) insert/search/delete.
- ✅ Defined Types: Standard Order/Trade/OrderResult models.
- ✅ Refined API: best_bid/ask, spread, cancel_order.
- ✅ Added Tests: 8 tests covering core logic.
🇨🇳 中文
📦 代码变更: 查看 Diff
在前三章中,我们完成了从浮点数到整数的转换,并建立了精度配置系统。但我们的 OrderBook 数据结构还是一个“玩具”实现——每次撮合都需要重新排序!本章我们将把它升级为一个真正生产可用的数据结构。
1. 原有实现的问题
让我们回顾一下原来的 engine.rs:
#![allow(unused)]
fn main() {
pub struct OrderBook {
bids: Vec<PriceLevel>, // 原来叫 buys
asks: Vec<PriceLevel>, // 原来叫 sells
}
}
💡 命名规范:我们把 buys/sells 改名为 bids/asks。这是金融行业的标准术语:
- Bid(买盘):买方愿意出的价格
- Ask(卖盘):卖方要求的价格
使用专业术语可以让代码更易于与行业文档、API 对接。
#![allow(unused)]
fn main() {
fn match_buy(&mut self, buy_order: &mut Order) {
// 问题 1: 每次都要重新排序!O(n log n)
self.asks.sort_by_key(|l| l.price);
for level in self.asks.iter_mut() {
// ...matching logic...
}
// 问题 2: 删除空档位需要移动整个数组!O(n)
self.asks.retain(|l| !l.orders.is_empty());
}
fn rest_order(&mut self, order: Order) {
// 问题 3: 查找价格档位是线性扫描!O(n)
let level = self.asks.iter_mut().find(|l| l.price == order.price);
// ...
}
}
时间复杂度分析
| 操作 | Vec 实现 | 问题 |
|---|---|---|
| 插入订单 | O(n) | 线性查找价格档位 |
| 撮合前排序 | O(n log n) | 每次撮合都要排序 |
| 删除空档位 | O(n) | 数组元素移动 |
在一个活跃的交易所,每秒可能有数万笔订单。如果每笔订单都要 O(n) 操作,这里很快就会成为性能瓶颈。
2. BTreeMap 解决方案
Rust 标准库提供了 BTreeMap,它是基于 B 树(自平衡多路搜索树)的有序映射:
#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
pub struct OrderBook {
/// 卖单: price -> orders (按价格升序,最低价 = 最优卖价)
asks: BTreeMap<u64, VecDeque<Order>>,
/// 买单: (u64::MAX - price) -> orders (技巧:让最高价排在前面)
bids: BTreeMap<u64, VecDeque<Order>>,
}
}
关键技巧:买单的 Key 设计
BTreeMap 默认按 key 升序排列。对于卖单,这正好是我们想要的(最低价优先)。但对于买单,我们需要最高价优先。
解决方案:使用 u64::MAX - price 作为 key:
#![allow(unused)]
fn main() {
// 插入买单
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);
// 读取真实价格
let price = u64::MAX - key;
}
这样,价格 100 对应 key u64::MAX - 100,价格 99 对应 key u64::MAX - 99。由于 (u64::MAX - 100) < (u64::MAX - 99),价格 100 会排在价格 99 前面!
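这个排序技巧可以用一小段独立代码直接验证:

```rust
fn main() {
    // u64::MAX - price 让高价获得更小的 key,从而在 BTreeMap 中排在前面
    let k100 = u64::MAX - 100u64;
    let k99 = u64::MAX - 99u64;
    assert!(k100 < k99);              // 价格 100 排在价格 99 前面
    assert_eq!(u64::MAX - k100, 100); // 再减一次即可还原真实价格
}
```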
为什么不用 Reverse 或自定义比较器?
你可能会问:为什么不用 BTreeMap<Reverse<u64>, ...> 或者自定义比较器?
方案对比:
| 方案 | 问题 |
|---|---|
BTreeMap<Reverse<u64>, ...> | Reverse 是一个 wrapper 类型,每次访问 key 都需要解包,增加代码复杂度 |
自定义 Ord trait | 需要创建 newtype wrapper,代码量大增 |
u64::MAX - price | 零成本抽象:两次减法操作,编译器可以内联优化 |
关键优势:
- 简单:只需要两行代码(插入时 u64::MAX - price,读取时再减回来)
- 零开销:减法操作在 CPU 上是单周期指令
- 类型安全:key 仍然是 u64,不需要额外的 wrapper 类型
- 无溢出风险:价格永远小于 u64::MAX,减法不会溢出
时间复杂度对比
| 操作 | Vec 实现 | BTreeMap 实现 |
|---|---|---|
| 插入订单 | O(n) | O(log n) |
| 撮合(不排序) | - | O(log n) |
| 取消订单 | O(n) | O(n)* |
| 删除空价格档 | O(n) | O(log n) |
| 查询最优价 | O(n) 或 O(n log n) | O(log n)** |
*注: 取消订单需要在 VecDeque 中线性查找订单 ID,这是 O(n)。如果需要 O(1) 取消,需要额外的 HashMap 索引。
**注: BTreeMap 的 first_key_value() 需要下降到最左叶子节点,是 O(log n),但常数极小;如果额外缓存最优价,查询可做到 O(1)。
3. 新的数据模型
Order(订单)
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
pub id: u64,
pub price: u64, // 价格(内部单位)
pub qty: u64, // 原始数量
pub filled_qty: u64, // 已成交数量
pub side: Side,
pub order_type: OrderType,
pub status: OrderStatus,
}
}
Trade(成交记录)
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub price: u64,
pub qty: u64,
}
}
OrderResult(下单结果)
#![allow(unused)]
fn main() {
pub struct OrderResult {
pub order: Order, // 更新后的订单
pub trades: Vec<Trade>, // 产生的成交
}
}
4. 核心 API
#![allow(unused)]
fn main() {
impl OrderBook {
/// 添加订单,返回成交结果
pub fn add_order(&mut self, order: Order) -> OrderResult;
/// 取消订单
pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
/// 获取最优买价
pub fn best_bid(&self) -> Option<u64>;
/// 获取最优卖价
pub fn best_ask(&self) -> Option<u64>;
/// 获取买卖价差
pub fn spread(&self) -> Option<u64>;
}
}
5. 运行结果
=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00 -> New
Order 2: Sell 5.000 BTC @ $102.00 -> New
Order 3: Sell 5.000 BTC @ $101.00 -> New
Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
Trades:
- Trade #1: 10.000 @ $100.00
- Trade #2: 2.000 @ $101.00
Order Status: Filled, Filled: 12.000/12.000
Book State: Best Bid=None, Best Ask=Some("101.00")
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00 -> New
Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")
=== End of Simulation ===
可以看到:
- 订单按价格优先级正确匹配(先 $100,再 $101)
- 每笔成交都记录在 Trade 中
- 实时追踪 Best Bid/Ask 和 Spread
6. 单元测试
我们添加了 8 个单元测试来验证核心功能:
$ cargo test
running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok
test result: ok. 8 passed; 0 failed
覆盖的场景包括:
- ✅ 订单挂单(无匹配)
- ✅ 完全成交
- ✅ 部分成交
- ✅ 价格优先级(Price Priority)
- ✅ 同价格 FIFO
- ✅ 取消订单
- ✅ 价差计算
- ✅ 一个大单吃掉多个小单
7. BTreeMap 够用吗?
对于一个不追求极致性能的交易所,BTreeMap 完全够用:
| 场景 | BTreeMap 表现 |
|---|---|
| 每秒 1000 单 | 轻松应对 |
| 每秒 10000 单 | 可以应对 |
| 每秒 100000+ 单 | 需要更专业的数据结构 |
如果你要打造一个法拉利级别的撮合引擎(纳秒级延迟、每秒百万单),需要考虑:
- 无锁数据结构
- 内存池(避免动态分配)
- CPU Cache 优化
- FPGA 硬件加速
但那是后话了。现在,我们有了一个正确且高效的基础实现。
Summary
本章完成了以下工作:
- ✅ 分析原有问题:Vec 实现的 O(n) 复杂度瓶颈
- ✅ 重构为 BTreeMap:O(log n) 的插入、查找、删除
- ✅ 定义规范类型:Order、Trade、OrderResult
- ✅ 完善 API:best_bid/ask、spread、cancel_order
- ✅ 添加单元测试:8 个测试覆盖核心场景
0x05 User Account & Balance Management
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
In previous chapters, our matching engine could match orders correctly. But there’s a key question: User Funds? In a real exchange, users must have sufficient funds before placing an order, and funds must be transferred upon matching.
This chapter implements the user account system, including:
- Balance Management (Avail / Frozen)
- Pre-trade Fund Validation
- Post-trade Settlement
1. Dual State of Balance: Avail vs Frozen
In an exchange, a balance has two states:
| State | Meaning | Usage |
|---|---|---|
| Avail | Can be used for trading or withdrawal | Daily operations |
| Frozen | Locked in open orders | Waiting for match or cancel |
Why do we need Frozen?
Suppose Alice has 10 BTC and she places two sell orders:
- Order A: Sell 8 BTC
- Order B: Sell 5 BTC
Without a freeze mechanism, these two orders require 13 BTC, but Alice only has 10! This is the Over-Selling problem.
Correct Flow:
1. Alice has 10 BTC (avail=10, frozen=0)
2. Place Order A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. Place Order B (5 BTC) → try freeze 5 BTC → Fail! avail only 2 ❌
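The three-step flow above can be sketched with a minimal bool-returning freeze helper (a trimmed-down Balance for illustration; the full struct is defined in the next section):

```rust
// Trimmed-down Balance for illustration only.
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    /// Move funds from avail to frozen; reject if avail is insufficient.
    fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut alice_btc = Balance { avail: 10, frozen: 0 };
    assert!(alice_btc.freeze(8)); // Order A: freeze 8 BTC
    assert_eq!((alice_btc.avail, alice_btc.frozen), (2, 8));
    assert!(!alice_btc.freeze(5)); // Order B: rejected, avail is only 2
}
```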
2. Balance Structure
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
pub avail: u64, // Available Balance
pub frozen: u64, // Frozen Balance
}
impl Balance {
/// Deposit (Increase avail)
/// Returns false on overflow - Financial systems must detect this!
pub fn deposit(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // Overflow! Alert and investigate.
}
}
}
Why checked_add?
| Method | Overflow Behavior (250u8 + 10u8) | Use Case |
|---|---|---|
| + (std) | Panic (Debug) or wrap to 4 (Release) | General logic; overflow is a bug |
| wrapping_add | 4 (wraps) | Hashing, graphics |
| saturating_add | 255 (caps) | Quotas, token buckets |
| checked_add | None ✅ | Finance — overflow must be an error! |
⚠️ In financial systems, “too much money causing overflow” is a severe bug. It must return an error for handling, not silently wrap or saturate.
#![allow(unused)]
fn main() {
/// Freeze (avail → frozen)
pub fn freeze(&mut self, amount: u64) -> bool {
if self.avail >= amount {
self.avail -= amount;
self.frozen += amount;
true
} else {
false
}
}
/// Unfreeze (frozen → avail), for cancellations
pub fn unfreeze(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
self.avail += amount;
true
} else {
false
}
}
/// Consume Frozen (Fund leaves account after match)
pub fn consume_frozen(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
true
} else {
false
}
}
/// Receive Funds (Fund enters account after match)
/// Returns false on overflow, mirroring deposit()
pub fn receive(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false,
}
}
}
}
3. User Account Structure
Each user holds balances for multiple assets:
#![allow(unused)]
fn main() {
/// Use FxHashMap for O(1) asset lookup
/// FxHashMap is faster for integer keys
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}
impl UserAccount {
pub fn deposit(&mut self, asset_id: u32, amount: u64) {
self.get_balance_mut(asset_id).deposit(amount);
}
pub fn avail(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
}
pub fn frozen(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
}
}
}
4. Order Placing: Freezing Funds
When placing an order, we freeze specific assets based on order side:
| Order Side | Asset to Freeze | Amount |
|---|---|---|
| Buy | Quote Asset (e.g. USDT) | price × quantity / qty_unit |
| Sell | Base Asset (e.g. BTC) | quantity |
Using SymbolManager for Precision
Each pair has its own precision config:
#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal; // 2
let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals; // 8
let qty_unit = 10u64.pow(qty_decimal); // 100_000_000
// price = 100 USDT (Internal: 100 * price_unit)
// qty = 10 BTC (Internal: 10 * qty_unit)
// cost = price * qty / qty_unit (note: the intermediate price * qty can still overflow u64 for extreme values; a u128 intermediate is safer)
let cost = price * qty / qty_unit;
if accounts.freeze(user_id, USDT, cost) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
println!("REJECTED: Insufficient balance");
}
// Sell Order: Freeze BTC
if accounts.freeze(user_id, BTC, qty) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}
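The cost formula can be isolated into a helper; widening the intermediate product to u128 is an extra precaution beyond the chapter's direct u64 expression, guarding against overflow at extreme prices and quantities (`buy_cost` is a hypothetical helper name):

```rust
// buy_cost is a hypothetical helper name; u128 widening is an extra
// precaution beyond the chapter's direct u64 expression.
fn buy_cost(price: u64, qty: u64, qty_unit: u64) -> u64 {
    ((price as u128 * qty as u128) / qty_unit as u128) as u64
}

fn main() {
    let qty_unit = 100_000_000u64; // 8 decimals for BTC
    // Buy 12 BTC @ $101.00 (price in cents): cost = 1212.00 USDT in cents.
    let cost = buy_cost(101_00, 12 * qty_unit, qty_unit);
    assert_eq!(cost, 1212_00);
}
```

This reproduces the 1212.00 USDT cost from the chapter's sample run.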
5. Settlement: Fund Transfer
When orders match, funds transfer between buyer and seller:
Trade: Alice sells 1 BTC to Bob @ $100
Before:
Alice: BTC(frozen=1), USDT(avail=0)
Bob: BTC(avail=0), USDT(frozen=100)
Settlement:
Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
Bob: consume_frozen(USDT, 100) + receive(BTC, 1)
After:
Alice: BTC(frozen=0), USDT(avail=100)
Bob: BTC(avail=1), USDT(frozen=0)
Code Implementation:
#![allow(unused)]
fn main() {
pub fn settle_trade(
&mut self,
buyer_id: u64,
seller_id: u64,
base_asset_id: u32,
quote_asset_id: u32,
base_amount: u64, // Trade Qty
quote_amount: u64, // Trade Amount (price × qty)
) {
// Buyer: Use USDT, Get BTC
self.get_account_mut(buyer_id)
.get_balance_mut(quote_asset_id)
.consume_frozen(quote_amount);
self.get_account_mut(buyer_id)
.get_balance_mut(base_asset_id)
.receive(base_amount);
// Seller: Use BTC, Get USDT
self.get_account_mut(seller_id)
.get_balance_mut(base_asset_id)
.consume_frozen(base_amount);
self.get_account_mut(seller_id)
.get_balance_mut(quote_asset_id)
.receive(quote_amount);
}
}
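The Alice/Bob ledger above can be replayed with a tiny standalone settle function over flat (avail, frozen) pairs (illustrative only, not the chapter's AccountManager):

```rust
// (avail, frozen) per asset, per user — a flat stand-in for Balance.
type Bal = (u64, u64);

// Mirrors settle_trade: seller spends frozen base and receives quote;
// buyer spends frozen quote and receives base.
fn settle(seller_base: &mut Bal, seller_quote: &mut Bal,
          buyer_base: &mut Bal, buyer_quote: &mut Bal,
          base_amount: u64, quote_amount: u64) {
    seller_base.1 -= base_amount;   // seller: consume_frozen(base)
    seller_quote.0 += quote_amount; // seller: receive(quote)
    buyer_quote.1 -= quote_amount;  // buyer: consume_frozen(quote)
    buyer_base.0 += base_amount;    // buyer: receive(base)
}

fn main() {
    // Alice sells 1 BTC to Bob @ $100.
    let (mut a_btc, mut a_usdt) = ((0, 1), (0, 0));
    let (mut b_btc, mut b_usdt) = ((0, 0), (0, 100));
    settle(&mut a_btc, &mut a_usdt, &mut b_btc, &mut b_usdt, 1, 100);
    assert_eq!((a_btc, a_usdt), ((0, 0), (100, 0)));
    assert_eq!((b_btc, b_usdt), ((1, 0), (0, 0)));
}
```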
6. Refined Trade Structure
To support settlement, Trade needs user IDs:
#![allow(unused)]
fn main() {
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub buyer_user_id: u64, // New
pub seller_user_id: u64, // New
pub price: u64,
pub qty: u64,
}
}
7. Execution Results
=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
Analysis:
- Alice initial 100 BTC. Sold 10+2=12. Remaining 85 avail + 3 frozen = 88 BTC ✓
- Alice got 10×100 + 2×101 = 1202 USDT. Initial 10000 + 1202 = 11202 USDT ✓
- Bob initial 5 BTC. Bought 12. Total 17 BTC ✓
- Bob spent 1202 USDT. Initial 200000 - 1202 = 198798 USDT ✓
Summary
This chapter accomplished:
- ✅ Implemented Balance: Dual-state (avail/frozen).
- ✅ Implemented UserAccount: Multi-asset support.
- ✅ Implemented AccountManager: Managing all users.
- ✅ Pre-trade Freeze: Prevent over-selling/buying.
- ✅ Post-trade Settlement: Correct fund transfer.
- ✅ Refined Trade: Included user_ids.
Now our engine not only matches orders but also ensures funding sufficiency and correct settlement!
🇨🇳 中文
📦 代码变更: 查看 Diff
在前几章中,我们的撮合引擎已经可以正确匹配订单并产生成交。但有一个关键问题:钱从哪里来? 在真实的交易所中,用户必须先有足够的资金才能下单,成交后资金才会转移。
本章我们将实现用户账户系统,包括:
- 余额管理(可用 / 冻结)
- 下单前资金校验
- 成交后资金结算
1. 余额的双重状态:Avail vs Frozen
在交易所中,用户的余额有两种状态:
| 状态 | 含义 | 使用场景 |
|---|---|---|
| Avail (可用) | 可以用于下单或提现 | 日常操作 |
| Frozen (冻结) | 已锁定在挂单中 | 等待成交或取消 |
为什么需要冻结?
假设 Alice 有 10 BTC,她同时挂了两个卖单:
- 卖单 A:卖 8 BTC
- 卖单 B:卖 5 BTC
如果没有冻结机制,这两个订单共需要 13 BTC,但 Alice 只有 10 BTC!这就是超卖问题。
正确的流程:
1. Alice 有 10 BTC (avail=10, frozen=0)
2. 下卖单 A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. 下卖单 B (5 BTC) → 尝试 freeze 5 BTC → 失败!avail 只有 2 ❌
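上面的三步流程可以用一个极简的 freeze 辅助函数演示(仅作示意,完整的 Balance 结构见下一节):

```rust
// 仅作演示的简化 Balance
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    /// avail → frozen;可用余额不足时拒绝
    fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut alice = Balance { avail: 10, frozen: 0 };
    assert!(alice.freeze(8));                        // 卖单 A:冻结 8 BTC
    assert_eq!((alice.avail, alice.frozen), (2, 8));
    assert!(!alice.freeze(5));                       // 卖单 B:avail 只剩 2,拒绝
}
```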
2. Balance 结构
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
pub avail: u64, // 可用余额 (简短命名,JSON 输出更高效)
pub frozen: u64, // 冻结余额
}
impl Balance {
/// 存款 (增加 avail)
/// 返回 false 表示溢出 - 金融系统必须检测此错误
pub fn deposit(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // 溢出!需要报警和调查
}
}
}
为什么要用 checked_add?
| 方法 | 溢出行为 (250u8 + 10u8) | 适用场景 |
|---|---|---|
| + (标准) | Panic (Debug) 或回绕为 4 (Release) | 常规逻辑,溢出是 Bug |
| wrapping_add | 4 (回绕) | 哈希计算、图形算法 |
| saturating_add | 255 (封顶) | 资源配额、令牌桶 |
| checked_add | None ✅ | 金融余额,溢出必须报错! |
⚠️ 金融系统中,“钱多到溢出”是严重的 Bug,必须返回错误让上层处理,而不是静默封顶或回绕。
#![allow(unused)]
fn main() {
/// 冻结 (avail → frozen)
pub fn freeze(&mut self, amount: u64) -> bool {
if self.avail >= amount {
self.avail -= amount;
self.frozen += amount;
true
} else {
false
}
}
/// 解冻 (frozen → avail),用于取消订单
pub fn unfreeze(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
self.avail += amount;
true
} else {
false
}
}
/// 消耗冻结资金 (成交后,资金离开账户)
pub fn consume_frozen(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
true
} else {
false
}
}
/// 接收资金 (成交后,资金进入账户)
pub fn receive(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // 溢出返回 false,与 deposit 保持一致
}
}
}
}
3. 用户账户结构
每个用户持有多种资产的余额:
#![allow(unused)]
fn main() {
/// 使用 FxHashMap 实现 O(1) 资产查找
/// FxHashMap 使用更简单、更快的哈希函数,特别适合整数键
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}
impl UserAccount {
pub fn deposit(&mut self, asset_id: u32, amount: u64) {
self.get_balance_mut(asset_id).deposit(amount);
}
pub fn avail(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
}
pub fn frozen(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
}
}
}
4. 下单流程:冻结资金
在下单时,我们需要根据订单类型冻结相应的资产:
| 订单类型 | 需要冻结的资产 | 冻结金额 |
|---|---|---|
| 买单 (Buy) | Quote 资产 (如 USDT) | price × quantity / qty_unit |
| 卖单 (Sell) | Base 资产 (如 BTC) | quantity |
从 SymbolManager 获取精度配置
每个交易对有独立的精度配置:
#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal; // 2 (价格精度)
let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals; // 8 (数量精度)
let qty_unit = 10u64.pow(qty_decimal); // 100_000_000
// price = 100 USDT (内部单位: 100 * price_unit)
// qty = 10 BTC (内部单位: 10 * qty_unit)
// cost = price * qty / qty_unit (注意:price * qty 的中间结果在极端取值下仍可能溢出 u64,生产环境可用 u128 中间值)
let cost = price * qty / qty_unit;
if accounts.freeze(user_id, USDT, cost) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
println!("REJECTED: Insufficient balance");
}
// 卖单:冻结 BTC
if accounts.freeze(user_id, BTC, qty) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}
这样,精度配置跟着 Symbol 走;在常规的价格与数量范围内,price * qty / qty_unit 的结果也保持在合理区间。
5. 成交结算:资金转移
当订单匹配成交后,需要在买卖双方之间转移资金:
Trade: Alice sells 1 BTC to Bob @ $100
Before:
Alice: BTC(frozen=1), USDT(avail=0)
Bob: BTC(avail=0), USDT(frozen=100)
Settlement:
Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
Bob: consume_frozen(USDT, 100) + receive(BTC, 1)
After:
Alice: BTC(frozen=0), USDT(avail=100)
Bob: BTC(avail=1), USDT(frozen=0)
代码实现:
#![allow(unused)]
fn main() {
pub fn settle_trade(
&mut self,
buyer_id: u64,
seller_id: u64,
base_asset_id: u32, // 如 BTC
quote_asset_id: u32, // 如 USDT
base_amount: u64, // 成交数量
quote_amount: u64, // 成交金额 (price × qty)
) {
// Buyer: 消耗 USDT,获得 BTC
self.get_account_mut(buyer_id)
.get_balance_mut(quote_asset_id)
.consume_frozen(quote_amount);
self.get_account_mut(buyer_id)
.get_balance_mut(base_asset_id)
.receive(base_amount);
// Seller: 消耗 BTC,获得 USDT
self.get_account_mut(seller_id)
.get_balance_mut(base_asset_id)
.consume_frozen(base_amount);
self.get_account_mut(seller_id)
.get_balance_mut(quote_asset_id)
.receive(quote_amount);
}
}
6. Trade 结构的完善
为了正确结算,Trade 结构需要包含买卖双方的用户 ID:
#![allow(unused)]
fn main() {
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub buyer_user_id: u64, // 新增
pub seller_user_id: u64, // 新增
pub price: u64,
pub qty: u64,
}
}
在撮合时,从 Order 中提取 user_id 并写入 Trade:
#![allow(unused)]
fn main() {
trades.push(Trade::new(
self.trade_id_counter,
buy_order.id,
sell_order.id,
buy_order.user_id, // 从订单获取用户 ID
sell_order.user_id,
price,
trade_qty,
));
}
7. 运行结果
=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
分析:
- Alice 初始有 100 BTC,卖出 10+2=12 BTC,还剩 85 + 3(frozen) = 88 BTC ✓
- Alice 收到 10×100 + 2×101 = 1202 USDT,加上初始 10000 = 11202 USDT ✓
- Bob 初始有 5 BTC,买入 12 BTC = 17 BTC ✓
- Bob 花费 1202 USDT,初始 200000 - 1202 = 198798 USDT ✓
Summary
本章完成了以下工作:
- ✅ 实现 Balance 结构:avail/frozen 双状态余额管理
- ✅ 实现 UserAccount:一个用户持有多种资产余额
- ✅ 实现 AccountManager:管理所有用户账户
- ✅ 下单前资金冻结:防止超卖/超买
- ✅ 成交后资金结算:在买卖双方间正确转移资金
- ✅ 完善 Trade 结构:包含买卖双方 user_id
- ✅ 添加单元测试:4 个新测试覆盖余额管理
现在我们的撮合引擎不仅能正确匹配订单,还能确保用户有足够的资金,并在成交后正确结算!
0x06 Enforced Balance Management
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapter, we implemented balance management. However, in financial systems, fund operations are the most critical part and must be foolproof. This chapter upgrades balance management to a Type-System Enforced version.
1. Why “Enforced”?
The previous implementation had flaws:
#![allow(unused)]
fn main() {
// ❌ Problem 1: Public fields, easily modified unintentionally
pub struct Balance {
pub avail: u64, // Dev might assign directly, bypassing logic
pub frozen: u64,
}
// ❌ Problem 2: Returns bool, unclear error
fn freeze(&mut self, amount: u64) -> bool {
// Failed? Why? Don't know.
}
// ❌ Problem 3: No Audit Trail
// Balance changed, but no versioning for tracing.
}
These issues can lead to:
- Developers accidentally bypassing checks: In complex logic, one might modify fields directly.
- Hard to debug: “Operation failed” doesn’t tell you why.
- Audit difficulty: No change tracking makes it hard to pinpoint when a bug occurred.
Note: This is not to prevent malicious attacks (it’s an internal system), but to prevent developer errors. Just like Rust’s ownership system—we use types to reduce the chance of shooting ourselves in the foot.
2. Enforced Balance Design
The new version enforces safety via Rust Type System:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Balance {
avail: u64, // ← Private! Only accessible via methods
frozen: u64, // ← Private!
version: u64, // ← Private! Auto-increment on change
}
}
Core Principles
| Principle | Implementation |
|---|---|
| Encapsulation | All fields private, read-only getters provided |
| Explicit Error | All mutations return Result<(), &'static str> |
| Audit Trail | version auto-increments on every mutation |
| Overflow Protection | Use checked_add/sub, overflow returns Error |
Method Renaming
| Old (v0.5) | New (v0.6) | Meaning |
|---|---|---|
freeze() | lock() | More accurate: lock funds for order |
unfreeze() | unlock() | Unlock (when cancelling) |
consume_frozen() | spend_frozen() | Spend frozen funds (after match) |
receive() | deposit() | Unified deposit semantics |
3. Balance API Details
Safe Getters
#![allow(unused)]
fn main() {
impl Balance {
/// Get Available (Read-only)
pub const fn avail(&self) -> u64 { self.avail }
/// Get Frozen (Read-only)
pub const fn frozen(&self) -> u64 { self.frozen }
/// Get Total (avail + frozen)
/// Returns None on overflow (data corruption)
pub const fn total(&self) -> Option<u64> {
self.avail.checked_add(self.frozen)
}
/// Get Version (Read-only)
pub const fn version(&self) -> u64 { self.version }
}
}
Why const fn? The compiler guarantees these getters never modify state, giving the strongest read-only guarantee.
Validated Mutations
Every mutation method:
- Validates preconditions
- Uses checked arithmetic
- Returns Result
- Auto-increments version
#![allow(unused)]
fn main() {
/// Deposit: Increase Available
pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
self.avail = self.avail.checked_add(amount)
.ok_or("Deposit overflow")?; // ← Return Error on Overflow
self.version = self.version.wrapping_add(1); // ← Auto Increment
Ok(())
}
/// Lock: Avail → Frozen
pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.avail < amount {
return Err("Insufficient funds to lock"); // ← Explicit Error
}
self.avail = self.avail.checked_sub(amount)
.ok_or("Lock avail underflow")?;
self.frozen = self.frozen.checked_add(amount)
.ok_or("Lock frozen overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// Unlock: Frozen → Avail
pub fn unlock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Unlock frozen underflow")?;
self.avail = self.avail.checked_add(amount)
.ok_or("Unlock avail overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// Spend Frozen: Funds leave account after match
pub fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Spend frozen underflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
}
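Condensed into a standalone, runnable form (deposit and lock only; the chapter's full type also implements unlock and spend_frozen), the enforced API behaves like this:

```rust
// Condensed enforced Balance: Result-returning mutations and an
// auto-incrementing version, as described above.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
    version: u64,
}

impl Balance {
    fn avail(&self) -> u64 { self.avail }
    fn version(&self) -> u64 { self.version }

    fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
        self.avail = self.avail.checked_add(amount).ok_or("Deposit overflow")?;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }

    fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.avail < amount {
            return Err("Insufficient funds to lock");
        }
        self.avail -= amount;
        self.frozen += amount;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }
}

fn main() {
    let mut b = Balance::default();
    b.deposit(100).unwrap();
    assert_eq!(b.lock(1_000), Err("Insufficient funds to lock")); // no version bump
    b.lock(60).unwrap();
    assert_eq!((b.avail(), b.version()), (40, 2));
    assert_eq!(b.deposit(u64::MAX), Err("Deposit overflow")); // reported, not wrapped
}
```

Note that a failed mutation leaves both the balance and the version untouched, so the audit trail only records successful changes.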
4. UserAccount Refactoring
UserAccount is also refactored:
Data Structure Change
#![allow(unused)]
fn main() {
// Old: FxHashMap
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>,
}
// New: O(1) Direct Array Indexing
pub struct UserAccount {
user_id: UserId, // Private
assets: Vec<Balance>, // Private, asset_id as index
}
}
O(1) Direct Array Indexing
#![allow(unused)]
fn main() {
// deposit() auto-creates slot
pub fn deposit(&mut self, asset_id: AssetId, amount: u64) -> Result<(), &'static str> {
let idx = asset_id as usize;
if idx >= self.assets.len() {
self.assets.resize(idx + 1, Balance::default());
}
self.assets[idx].deposit(amount)
}
// get_balance_mut() returns Result
pub fn get_balance_mut(&mut self, asset_id: AssetId) -> Result<&mut Balance, &'static str> {
self.assets.get_mut(asset_id as usize).ok_or("Asset not found")
}
}
🚀 Why Vec<Balance> offers the highest performance:
1. Cache-Friendly: Vec<Balance> is contiguous in memory, so loading one Balance pulls its neighbors into the same CPU cache line.
2. get_balance() is high frequency: each order triggers 5-10 balance checks, so O(1) indexing plus cache friendliness is critical at millions of TPS.
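A self-contained sketch of the Vec-indexed account (Balance trimmed to avail only for brevity; AssetId assumed to be u32 as in earlier chapters):

```rust
// Balance trimmed to avail only; AssetId assumed to be u32.
#[derive(Default, Clone)]
struct Balance {
    avail: u64,
}

struct UserAccount {
    assets: Vec<Balance>, // asset_id doubles as the Vec index
}

impl UserAccount {
    /// Grow the Vec on demand, then credit the slot.
    fn deposit(&mut self, asset_id: u32, amount: u64) -> Result<(), &'static str> {
        let idx = asset_id as usize;
        if idx >= self.assets.len() {
            self.assets.resize(idx + 1, Balance::default());
        }
        self.assets[idx].avail = self.assets[idx]
            .avail
            .checked_add(amount)
            .ok_or("Deposit overflow")?;
        Ok(())
    }
    /// Missing slots simply read as zero.
    fn avail(&self, asset_id: u32) -> u64 {
        self.assets.get(asset_id as usize).map(|b| b.avail).unwrap_or(0)
    }
}

fn main() {
    let mut acc = UserAccount { assets: Vec::new() };
    acc.deposit(2, 500).unwrap(); // auto-creates slots 0..=2
    assert_eq!(acc.assets.len(), 3);
    assert_eq!(acc.avail(2), 500);
    assert_eq!(acc.avail(7), 0);
}
```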
Settlement Methods
New methods dedicated to handling all settlement logic for buyer/seller in one go:
#![allow(unused)]
fn main() {
/// Buyer Settlement: Spend Quote, Gain Base, Refund unused Quote
pub fn settle_as_buyer(
&mut self,
quote_asset_id: AssetId,
base_asset_id: AssetId,
spend_quote: u64, // Consumed USDT
gain_base: u64, // Gained BTC
refund_quote: u64, // Refunded USDT
) -> Result<(), &'static str> {
// 1. Spend Quote (Frozen)
self.get_balance_mut(quote_asset_id)?.spend_frozen(spend_quote)?;
// 2. Gain Base (Available)
self.get_balance_mut(base_asset_id)?.deposit(gain_base)?;
// 3. Refund (Frozen → Available)
if refund_quote > 0 {
self.get_balance_mut(quote_asset_id)?.unlock(refund_quote)?;
}
Ok(())
}
}
5. Execution Results
=== 0xInfinity: Stage 6 (Enforced Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
=== End of Simulation ===
Results are consistent with the previous chapter, but now all operations are protected by the Type System!
6. Unit Tests
We added 8 new tests for enforced_balance. Total 16 tests passing.
test enforced_balance::tests::test_deposit ... ok
test enforced_balance::tests::test_deposit_overflow ... ok
test enforced_balance::tests::test_lock_unlock ... ok
...
test result: ok. 16 passed; 0 failed
7. Error Handling Example
With the new API, Result must be handled:
#![allow(unused)]
fn main() {
// ❌ Compile Error: Unhandled Result
balance.deposit(100);
// ✅ Correct: Propagate
balance.deposit(100)?;
// ✅ Correct: Unwrap (Only if sure)
balance.deposit(100).unwrap();
// ✅ Correct: Match
match balance.lock(1000) {
Ok(()) => println!("Locked successfully"),
Err(e) => println!("Failed to lock: {}", e),
}
}
Summary
This chapter accomplished:
- ✅ Encapsulation: Private fields prevent accidental modification.
- ✅ Result Return: All mutations return explicit errors.
- ✅ Versioning: Auto-incrementing version for auditing.
- ✅ Checked Arithmetic: Prevents overflow.
- ✅ Renaming: lock/unlock/spend_frozen are clearer.
- ✅ Settlement Helpers: settle_as_buyer/settle_as_seller.
- ✅ Asset ID: Constrained to enable the O(1) array indexing optimization.
Now our balance management is Type-Safe—the compiler prevents most balance-related bugs!
🇨🇳 中文
📦 代码变更: 查看 Diff
在上一章中,我们实现了用户账户的余额管理。但在金融系统中,资金操作是最核心、最关键的操作,必须确保万无一失。本章我们将余额管理升级为类型系统强制的安全版本。
1. 为什么需要“强制“版本?
上一章的实现存在几个隐患:
#![allow(unused)]
fn main() {
// ❌ 旧版问题1:字段是公开的,容易被无意修改
pub struct Balance {
pub avail: u64, // 开发者可能不小心直接赋值,绕过业务逻辑校验
pub frozen: u64,
}
// ❌ 旧版问题2:返回 bool,错误信息不明确
fn freeze(&mut self, amount: u64) -> bool {
// 失败了?为什么失败?不知道
}
// ❌ 旧版问题3:无审计追踪
// 余额变了,但没有版本号,无法追溯
}
这些问题可能导致:
- 开发者无意中绕过校验:在复杂的业务代码中,可能不小心直接修改公开字段
- 错误难以排查:只知道操作失败,不知道具体原因
- 审计困难:没有变更追踪,难以定位问题发生的时间点
注意:这不是防止恶意攻击(这是内部系统),而是防止开发者无意挖坑。 就像 Rust 的所有权系统一样——我们用类型系统来减少挖坑的机会。
2. 强制余额设计 (Enforced Balance)
新版本通过 Rust 类型系统 强制安全:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Balance {
avail: u64, // ← 私有!只能通过方法访问
frozen: u64, // ← 私有!
version: u64, // ← 私有!每次变更自动递增
}
}
核心原则
| 原则 | 实现方式 |
|---|---|
| 封装 | 所有字段私有,提供只读 getter |
| 显式错误 | 所有变更返回 Result<(), &'static str> |
| 审计追踪 | version 在每次变更时自动递增 |
| 溢出保护 | 使用 checked_add/sub,溢出返回错误 |
方法命名变更
| 旧版 (v0.5) | 新版 (v0.6) | 说明 |
|---|---|---|
| freeze() | lock() | 更准确:锁定资金用于订单 |
| unfreeze() | unlock() | 解锁(取消订单时) |
| consume_frozen() | spend_frozen() | 消费冻结资金(成交后) |
| receive() | deposit() | 统一为存款语义 |
3. Balance API 详解
只读方法 (Safe Getters)
#![allow(unused)]
fn main() {
impl Balance {
/// 获取可用余额 (只读)
pub const fn avail(&self) -> u64 { self.avail }
/// 获取冻结余额 (只读)
pub const fn frozen(&self) -> u64 { self.frozen }
/// 获取总余额 (avail + frozen)
/// 返回 None 表示溢出(数据损坏)
pub const fn total(&self) -> Option<u64> {
self.avail.checked_add(self.frozen)
}
/// 获取版本号 (只读)
pub const fn version(&self) -> u64 { self.version }
}
}
为什么用 const fn?这些只读方法接收 &self,本身就无法修改状态;标记为 const fn 后还能在常量上下文中调用,是最强的只读保证。
变更方法 (Validated Mutations)
每个变更方法都:
- 验证前置条件
- 使用 checked 算术
- 返回 Result
- 自动递增 version
#![allow(unused)]
fn main() {
/// 存款:增加可用余额
pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
self.avail = self.avail.checked_add(amount)
.ok_or("Deposit overflow")?; // ← 溢出返回错误
self.version = self.version.wrapping_add(1); // ← 自动递增
Ok(())
}
/// 锁定:可用 → 冻结
pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.avail < amount {
return Err("Insufficient funds to lock"); // ← 明确错误信息
}
self.avail = self.avail.checked_sub(amount)
.ok_or("Lock avail underflow")?;
self.frozen = self.frozen.checked_add(amount)
.ok_or("Lock frozen overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// 解锁:冻结 → 可用
pub fn unlock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Unlock frozen underflow")?;
self.avail = self.avail.checked_add(amount)
.ok_or("Unlock avail overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// 消费冻结资金:成交后资金离开账户
pub fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Spend frozen underflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
}
4. UserAccount 重构
新版 UserAccount 也进行了重构:
数据结构变更
#![allow(unused)]
fn main() {
// 旧版:使用 FxHashMap
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>,
}
// 新版:O(1) 直接数组索引
pub struct UserAccount {
user_id: UserId, // 私有
assets: Vec<Balance>, // 私有,asset_id 作为下标
}
}
O(1) 直接数组索引
#![allow(unused)]
fn main() {
// deposit() 自动创建资产槽位(唯一入口)
pub fn deposit(&mut self, asset_id: AssetId, amount: u64) -> Result<(), &'static str> {
    let idx = asset_id as usize;
    if idx >= self.assets.len() {
        self.assets.resize(idx + 1, Balance::default());
    }
    self.assets[idx].deposit(amount)
}
// get_balance_mut() 不创建槽位,返回 Result
pub fn get_balance_mut(&mut self, asset_id: AssetId) -> Result<&mut Balance, &'static str> {
    self.assets.get_mut(asset_id as usize).ok_or("Asset not found")
}
}
🚀 为什么 Vec<Balance> 直接索引是最高效选择?
1. 极佳的缓存友好性 (Cache-Friendly)
Vec<Balance> 是连续内存布局,相邻资产的 Balance 在内存中也相邻。当 CPU 读取一个 Balance 时,整个缓存行(通常 64 字节)会被加载,相邻的 Balance 数据也一并进入 L1/L2 缓存,后续访问几乎零延迟。
2. get_balance() 是高频调用函数
在撮合引擎中,每笔订单都需要多次调用 get_balance():
- 下单前检查余额
- 冻结资金
- 每笔成交结算(买方 + 卖方各 2-3 次)
- 退款未使用资金
一笔订单可能产生 5-10 次 get_balance() 调用。在高频交易场景(每秒万笔订单),这意味着每秒 5-10 万次调用。O(1) + 缓存友好对性能至关重要。
结算方法
新增专门的结算方法,一次性处理买方或卖方的所有结算:
#![allow(unused)]
fn main() {
/// 买方结算:消费 Quote,获得 Base,退款未使用的 Quote
pub fn settle_as_buyer(
&mut self,
quote_asset_id: AssetId,
base_asset_id: AssetId,
spend_quote: u64, // 消费的 USDT
gain_base: u64, // 获得的 BTC
refund_quote: u64, // 退款的 USDT
) -> Result<(), &'static str> {
// 1. 消费 Quote (Frozen)
self.get_balance_mut(quote_asset_id)?.spend_frozen(spend_quote)?;
// 2. 获得 Base (Available)
self.get_balance_mut(base_asset_id)?.deposit(gain_base)?;
// 3. 退款 (Frozen → Available)
if refund_quote > 0 {
self.get_balance_mut(quote_asset_id)?.unlock(refund_quote)?;
}
Ok(())
}
}
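文中只展示了 settle_as_buyer;总结中提到的 settle_as_seller 可以按对称逻辑草拟如下。这是一个可独立运行的最小示意(内联了简化版的 Balance 与 UserAccount),实际实现以仓库代码为准:

```rust
#[derive(Default, Clone, Copy)]
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
        self.avail = self.avail.checked_add(amount).ok_or("Deposit overflow")?;
        Ok(())
    }
    fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.frozen < amount {
            return Err("Insufficient frozen funds");
        }
        self.frozen -= amount;
        Ok(())
    }
}

type AssetId = u32;

struct UserAccount {
    assets: Vec<Balance>,
}

impl UserAccount {
    fn get_balance_mut(&mut self, id: AssetId) -> Result<&mut Balance, &'static str> {
        self.assets.get_mut(id as usize).ok_or("Asset not found")
    }

    /// 卖方结算:消费冻结的 Base,获得 Quote(与 settle_as_buyer 对称)
    fn settle_as_seller(
        &mut self,
        base_asset_id: AssetId,
        quote_asset_id: AssetId,
        spend_base: u64,  // 卖出的 BTC(已冻结)
        gain_quote: u64,  // 获得的 USDT
    ) -> Result<(), &'static str> {
        self.get_balance_mut(base_asset_id)?.spend_frozen(spend_base)?;
        self.get_balance_mut(quote_asset_id)?.deposit(gain_quote)?;
        Ok(())
    }
}

fn main() {
    // asset 0 = BTC(冻结 10),asset 1 = USDT
    let mut alice = UserAccount {
        assets: vec![Balance { avail: 0, frozen: 10 }, Balance::default()],
    };
    alice.settle_as_seller(0, 1, 10, 1_000).unwrap();
    assert_eq!(alice.assets[0].frozen, 0);
    assert_eq!(alice.assets[1].avail, 1_000);
    println!("settle_as_seller ok");
}
```

卖方没有"退款"一步:卖单冻结的是 Base 数量,成交多少消费多少,未成交部分在撤单时通过 unlock 返还。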
5. 运行结果
=== 0xInfinity: Stage 6 (Enforced Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
=== End of Simulation ===
结果与前一章一致,但现在所有余额操作都通过类型系统保护!
6. 单元测试
新增 8 个 enforced_balance 测试:
$ cargo test
test result: ok. 16 passed; 0 failed
7. 错误处理示例
使用新 API 时,必须处理 Result:
#![allow(unused)]
fn main() {
// ❌ 编译警告(unused_must_use):未处理的 Result,在 #![deny(unused_must_use)] 下为硬错误
balance.deposit(100);
// ✅ 正确:显式处理
balance.deposit(100)?; // 使用 ? 传播错误
// ✅ 正确:使用 unwrap(仅在确定不会失败时)
balance.deposit(100).unwrap();
// ✅ 正确:匹配处理
match balance.lock(1000) {
Ok(()) => println!("Locked successfully"),
Err(e) => println!("Failed to lock: {}", e),
}
}
Summary
本章完成了以下工作:
- ✅ 私有字段封装:所有余额字段私有化,防止无意修改
- ✅ Result 返回类型:所有变更操作返回明确的错误信息
- ✅ 版本追踪:每次变更自动递增 version,支持审计
- ✅ Checked 算术:所有运算使用 checked_add/sub,溢出返回错误
- ✅ 方法重命名:lock/unlock/spend_frozen 语义更清晰
- ✅ 结算方法:settle_as_buyer/settle_as_seller 一站式结算
- ✅ Asset ID 约束:为未来 O(1) 直接索引优化做准备
- ✅ 16 个测试通过:包括 8 个新的 enforced_balance 测试
现在我们的余额管理是类型安全的——编译器本身就能防止大部分余额操作错误!
0x07-a Testing Framework - Correctness
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To establish a verifiable, repeatable, and traceable testing infrastructure for the matching engine.
This chapter is not just about “how to test”. More importantly, it explains “why it is designed this way”: these design decisions stem directly from real-world exchange requirements.
1. Why a Testing Framework?
1.1 The Uniqueness of Matching Engines
A matching engine is not a generic CRUD app. A single bug can lead to:
- Fund Errors: Users’ funds disappearing or inflating.
- Order Loss: Orders executed but not recorded.
- Inconsistent States: Contradictions between balances, orders, and ledgers.
Therefore, we need:
- Deterministic Testing: Same input must yield same output.
- Complete Audit: Every penny movement must be traceable.
- Fast Verification: Quickly confirm correctness after every code change.
1.2 Golden File Testing Pattern
We adopt the Golden File Pattern:
fixtures/ # Input (Fixed)
├── orders.csv
└── balances_init.csv
baseline/ # Golden Baseline (Result of first correct run, committed to git)
├── t1_balances_deposited.csv
├── t2_balances_final.csv
├── t2_ledger.csv
└── t2_orderbook.csv
output/ # Current Run Result (gitignored)
└── ...
Why this pattern?
- Determinism: Fixed seeds ensure identical random sequences.
- Version Control: Baselines are committed; any change triggers a git diff.
- Fast Feedback: Just diff baseline/ output/.
- Auditable: Baseline is the “contract”; deviations require explanation.
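The “just diff” step can be made concrete. A minimal sketch of what the verify script might do (directory and file names follow the layout above; the real test_03_verify.sh may do more):

```shell
#!/bin/sh
# Compare every golden file in baseline/ against the current run in output/.
verify() {
    baseline_dir="$1"
    output_dir="$2"
    status=0
    for f in "$baseline_dir"/*.csv; do
        name=$(basename "$f")
        if diff -q "$f" "$output_dir/$name" >/dev/null 2>&1; then
            echo "$name: MATCH"
        else
            echo "$name: DIFFER"
            status=1   # keep checking the rest, but remember the failure
        fi
    done
    return $status
}
```

Running `verify baseline output && echo "All tests passed!"` reproduces the per-file MATCH report shown in section 9.2.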
2. Precision Design: decimals vs display_decimals
2.1 Why Two Precisions?
This is the most error-prone area in exchanges. Consider this real case:
User sees: Buy 0.01 BTC @ $85,000.00
Internal store: qty=1000000 (satoshi), price=85000000000 (micro-cents)
If we confuse these layers:
- User enters 0.01, and the system treats it as 0.01 satoshi (= 0.0000000001 BTC).
- Or a user's account shows 100 BTC but actually holds 0.000001 BTC.
Solution: Clearly distinguish two layers.
2.2 Precision Layers
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Client (display_decimals) │
│ - Numbers seen by users │
│ - Can be adjusted based on business needs │
│ - E.g.: BTC displays 6 decimals (0.000001 BTC) │
└─────────────────────────────────────────────────────────────┘
↓
Auto Convert (× 10^decimals)
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Internal (decimals) │
│ - Precision for internal storage and calculation │
│ - NEVER change once set │
│ - E.g.: BTC stored with 8 decimals (satoshi) │
└─────────────────────────────────────────────────────────────┘
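The “Auto Convert (× 10^decimals)” step amounts to a checked string-to-integer scaling. A minimal sketch (an illustrative helper, not the project's actual API; it rejects inputs with more fractional digits than the asset allows):

```rust
/// Convert a user-facing decimal string into internal integer units.
/// Returns None on overflow, on unparsable input, or when the input
/// carries more precision than `decimals` allows.
fn to_internal(display: &str, decimals: u32) -> Option<u64> {
    let (int_part, frac_part) = match display.split_once('.') {
        Some((i, f)) => (i, f),
        None => (display, ""),
    };
    if frac_part.len() as u32 > decimals {
        return None; // more precision than the asset supports
    }
    let scale = 10u64.checked_pow(decimals)?;
    let int_val: u64 = int_part.parse().ok()?;
    // Pad the fractional digits up to `decimals` places.
    let frac_scale = 10u64.checked_pow(decimals - frac_part.len() as u32)?;
    let frac_val: u64 = if frac_part.is_empty() { 0 } else { frac_part.parse().ok()? };
    int_val
        .checked_mul(scale)?
        .checked_add(frac_val.checked_mul(frac_scale)?)
}

fn main() {
    // 0.01 BTC with decimals=8 → 1,000,000 satoshis
    assert_eq!(to_internal("0.01", 8), Some(1_000_000));
    assert_eq!(to_internal("1", 8), Some(100_000_000));
    assert_eq!(to_internal("0.123456789", 8), None); // 9 digits > 8 decimals
    println!("conversion ok");
}
```

Doing this conversion once at the boundary keeps every internal computation in plain u64 arithmetic.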
2.3 Configuration Design
assets_config.csv (Asset Precision Config):
asset_id,asset,decimals,display_decimals
1,BTC,8,6 # Min unit 0.000001 BTC ≈ $0.085
2,USDT,6,4 # Min unit 0.0001 USDT
3,ETH,8,4 # Min unit 0.0001 ETH ≈ $0.40
| Field | Mutability | Explanation |
|---|---|---|
| decimals | ⚠️ Never Change | Defines min unit; changing breaks all existing data. |
| display_decimals | ✅ Dynamic | Client-side precision for Quantity (qty). |
symbols_config.csv (Trading Pair Config):
symbol_id,symbol,base_asset_id,quote_asset_id,price_decimal,price_display_decimal
0,BTC_USDT,1,2,6,2 # Price min unit $0.01
1,ETH_USDT,3,2,6,2
Key Design: Precision Source
| Order Field | Precision Source | Config File |
|---|---|---|
| qty | base_asset.display_decimals | assets_config.csv |
| price | symbol.price_display_decimal | symbols_config.csv |
⚠️ Note: Price precision comes from Symbol config, NOT Quote Asset! This is because the same quote asset (e.g., USDT) may have different price precisions in different pairs.
Why can decimals never change?
Suppose BTC decimals change from 8 to 6:
- Original balance 100,000,000 (= 1 BTC with 8 decimals).
- New interpretation 100,000,000 / 10^6 = 100 BTC.
- User gains 99 BTC out of thin air!
Why can display_decimals change?
This is just the display layer:
- Original display: 0.12345678 BTC.
- New display (6 decimals): 0.123456 BTC.
- Internal storage remains 12,345,678 satoshis.
3. Balance Format: Row vs Column
3.1 Problem: Storing Multi-Asset Balances
Option A: Columnar (One column per asset)
user_id,btc_avail,btc_frozen,usdt_avail,usdt_frozen
1,10000000000,0,10000000000000,0
Option B: Row-based (One row per asset)
user_id,asset_id,avail,frozen,version
1,1,10000000000,0,0
1,2,10000000000000,0,0
3.2 Why Row-based?
| Dimension | Columnar | Row-based |
|---|---|---|
| Extensibility | ❌ Alter table to add asset | ✅ Just add a row |
| Sparse Data | ❌ Many nulls/zeros | ✅ Store only non-zero assets |
| DB Compat | ❌ Non-standard | ✅ Standard normalization |
| Genericity | ❌ Asset names hardcoded | ✅ asset_id is generic |
Real Scenario: An exchange supports 500+ assets, but a user holds only 3-5 of them on average. The row-based design stores only non-zero rows, saving ~99% of the space.
4. Timeline Snapshot Design
4.1 Why Multiple Snapshots?
Matching is a multi-stage process:
T0: Initial State (fixtures/balances_init.csv)
↓ deposit()
T1: Deposit Done (baseline/t1_balances_deposited.csv)
↓ execute orders
T2: Trading Done (baseline/t2_balances_final.csv)
Errors can occur at any stage:
- T0→T1: Is deposit logic correct?
- T1→T2: Is trade settlement correct?
Snapshots pinpoint issues:
# Verify Deposit
diff balances_init.csv t1_balances_deposited.csv
# Verify Settlement
diff t1_balances_deposited.csv t2_balances_final.csv
4.2 Naming Convention
t1_balances_deposited.csv # t1 stage, balances type, deposited state
t2_balances_final.csv # t2 stage, balances type, final state
t2_ledger.csv # t2 stage, ledger type
t2_orderbook.csv # t2 stage, orderbook type
Principle: {Time}_{Type}_{State}.csv
Benefits:
- Natural sort order by time.
- Clear content identification.
- Avoids ambiguity.
5. Settlement Ledger Design
5.1 Why Ledger?
t2_ledger.csv is the system’s Audit Log. Every penny movement is recorded here.
Without Ledger:
- User complaint: “Where did my money go?”
- Support: “Your balance is X.”
- Unanswerable: “When did it change? Why?”
With Ledger:
trade_id,user_id,asset_id,op,delta,balance_after
1,96,2,debit,849700700,9999150299300
1,96,1,credit,1000000,10001000000
Traceability:
- Trade #1 caused User #96’s USDT to decrease by 849,700,700.
- Simultaneously BTC increased by 1,000,000.
- The balance after each change is recorded (balance_after).
5.2 Why delta + after instead of before + after?
Option A: before + after
delta,balance_before,balance_after
849700700,10000000000,9999150299300
Option B: delta + after
delta,balance_after
849700700,9999150299300
Why B?
- Less Redundancy: before = after - delta, so before is derivable.
- Usefulness: We mostly verify “Is the final state correct?”.
- Clarity: Delta directly explains the change.
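Because before = after - delta, the whole ledger can be replayed and checked from just these two columns. A hedged sketch of such a consistency check (`LedgerEntry` and `verify_ledger` are illustrative names; the sign convention follows the op column above):

```rust
/// One ledger row for a single (user, asset):
/// op is "debit" (balance decreases) or "credit" (balance increases).
struct LedgerEntry {
    op: &'static str,
    delta: u64,
    balance_after: u64,
}

/// Recompute each balance_after from the previous balance and verify it
/// matches. `start` is the balance before the first entry.
fn verify_ledger(start: u64, entries: &[LedgerEntry]) -> Result<(), String> {
    let mut bal = start;
    for (i, e) in entries.iter().enumerate() {
        bal = match e.op {
            "credit" => bal.checked_add(e.delta),
            "debit" => bal.checked_sub(e.delta),
            _ => None,
        }
        .ok_or_else(|| format!("entry {i}: overflow or unknown op"))?;
        if bal != e.balance_after {
            return Err(format!("entry {i}: expected {}, got {bal}", e.balance_after));
        }
    }
    Ok(())
}

fn main() {
    // User #96's USDT leg from the sample above: starts at 10_000_000_000_000,
    // Trade #1 debits 849_700_700.
    let entries = [LedgerEntry {
        op: "debit",
        delta: 849_700_700,
        balance_after: 9_999_150_299_300,
    }];
    assert!(verify_ledger(10_000_000_000_000, &entries).is_ok());
    println!("ledger consistent");
}
```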
6. ME Orderbook Snapshot
6.1 Why Orderbook Snapshot?
After trading, the Orderbook still holds unfilled orders. These orders:
- Reside in RAM.
- Are lost if system restarts.
t2_orderbook.csv is a Full Snapshot of ME State:
order_id,user_id,side,order_type,price,qty,filled_qty,status
6,907,sell,limit,85330350000,2000000,0,New
Uses:
- Recovery: Revert Orderbook state after restart.
- Verification: Compare against theoretical expectations.
- Debugging: Check stuck orders.
6.2 Why Record All Fields?
The goal is Full Recovery. Rebuilding Order struct requires:
#![allow(unused)]
fn main() {
struct Order {
    id: u64, user_id: u64, price: u64, qty: u64, filled_qty: u64,
    side: Side, order_type: OrderType, status: OrderStatus, // enums; types shown for illustration
}
}
Missing any field prevents recovery.
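Recovery is then a straight CSV-row-to-struct mapping. A sketch of rebuilding one order from a snapshot row (field order follows the header above; side/order_type/status are kept as strings here for brevity, while the real Order uses enums):

```rust
/// Snapshot row parsed into an order (illustrative, string-typed variant).
#[derive(Debug, PartialEq)]
struct Order {
    id: u64,
    user_id: u64,
    side: String,
    order_type: String,
    price: u64,
    qty: u64,
    filled_qty: u64,
    status: String,
}

fn parse_row(line: &str) -> Option<Order> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 8 {
        return None; // every field is required for full recovery
    }
    Some(Order {
        id: f[0].parse().ok()?,
        user_id: f[1].parse().ok()?,
        side: f[2].into(),
        order_type: f[3].into(),
        price: f[4].parse().ok()?,
        qty: f[5].parse().ok()?,
        filled_qty: f[6].parse().ok()?,
        status: f[7].into(),
    })
}

fn main() {
    let o = parse_row("6,907,sell,limit,85330350000,2000000,0,New").unwrap();
    assert_eq!(o.id, 6);
    assert_eq!(o.side, "sell");
    assert_eq!(o.qty, 2_000_000);
    // A row missing the status field cannot be recovered.
    assert!(parse_row("6,907,sell,limit,85330350000,2000000,0").is_none());
    println!("recovered order #{}", o.id);
}
```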
7. Test Script Design
7.1 Modular Scripts
scripts/
├── test_01_generate.sh # Step 1: Generate Data
├── test_02_baseline.sh # Step 2: Generate Baseline
├── test_03_verify.sh # Step 3: Run & Verify
└── test_e2e.sh # Combo: Full E2E Flow
Why Modular?
- Isolated Debugging: Run only relevant steps.
- Flexible Composition: CI can verify without regenerating.
- Readability: One script, one job.
7.2 Usage
# Daily Test (Use existing baseline)
./scripts/test_e2e.sh
# Regenerate Baseline & Test
./scripts/test_e2e.sh --regenerate
8. CLI Design: --baseline Switch
8.1 Why Switch?
Default behavior:
- Output to output/
- Never overwrite the baseline
Update baseline:
- Add the --baseline arg
- Output to baseline/
Why not auto-overwrite?
- Safety: Prevent accidental baseline corruption.
- Intent: Updating baseline is a conscious decision.
- Git Friendly: Changes trigger diff.
8.2 Implementation
#![allow(unused)]
fn main() {
fn get_output_dir() -> &'static str {
let args: Vec<String> = std::env::args().collect();
if args.iter().any(|a| a == "--baseline") {
"baseline"
} else {
"output"
}
}
}
9. Execution Example
9.1 Full Flow
# 1. Generate Data
python3 scripts/generate_orders.py --orders 100000 --seed 42
# 2. Generate Baseline (First run or update)
cargo run --release -- --baseline
# 3. Daily Test
./scripts/test_e2e.sh
9.2 Verification Output
╔════════════════════════════════════════════════════════════╗
║ 0xInfinity Testing Framework - E2E Test ║
╚════════════════════════════════════════════════════════════╝
t1_balances_deposited.csv: ✅ MATCH
t2_balances_final.csv: ✅ MATCH
t2_ledger.csv: ✅ MATCH
t2_orderbook.csv: ✅ MATCH
✅ All tests passed!
10. Summary
This chapter established a complete testing infrastructure:
| Design Point | Problem Solved | Solution |
|---|---|---|
| Precision Confusion | User vs Internal precision | decimals + display_decimals |
| Asset Extension | Support N assets | Row-based balance format |
| Traceability | Where failed? | Timeline Snapshots (T0→T1→T2) |
| Fund Audit | Where funds go? | Settlement Ledger |
| State Recovery | Restart recovery | Orderbook Snapshot |
| Regression | Breaking changes? | Golden File Pattern |
| Efficiency | Fast feedback | Modular scripts |
Core Philosophy:
Testing is not an afterthought, but part of the design. A good testing framework gives you confidence when changing code.
Next section (0x07-b) will add performance benchmarks on top of this.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:为撮合引擎建立可验证、可重复、可追溯的测试基础设施。
本章不仅是“如何测试“,更重要的是理解“为什么这样设计“——这些设计决策直接源于真实交易所的需求。
1. 为什么需要测试框架?
1.1 撮合引擎的特殊性
撮合引擎不是普通的 CRUD 应用。一个 bug 就可能导致:
- 资金错误:用户资金凭空消失或增加
- 订单丢失:订单被执行但没有记录
- 状态不一致:余额、订单、成交记录互相矛盾
因此,我们需要:
- 确定性测试:相同的输入必须产生相同的输出
- 完整审计:每一分钱的变动都可追溯
- 快速验证:每次修改代码后能快速确认没有破坏正确性
1.2 Golden File 测试模式
我们采用 Golden File 模式:
fixtures/ # 输入(固定)
├── orders.csv
└── balances_init.csv
baseline/ # 黄金基准(第一次正确运行的结果,git 提交)
├── t1_balances_deposited.csv
├── t2_balances_final.csv
├── t2_ledger.csv
└── t2_orderbook.csv
output/ # 当前运行结果(gitignored)
└── ...
为什么选择这种模式?
- 确定性:固定的 seed 保证相同的随机数序列
- 版本控制:baseline 提交到 git,任何变化都能被 diff 检测
- 快速反馈:只需
diff baseline/ output/ - 可审计:baseline 是“合约“,任何偏离都需要解释
2. 精度设计:decimals vs display_decimals
2.1 为什么需要两种精度?
这是交易所最容易出错的地方。看这个真实案例:
用户看到:买入 0.01 BTC @ $85,000.00
内部存储:qty=1000000 (satoshi), price=85000000000 (微美分)
如果混淆这两层,会发生什么?
- 用户输入 0.01,系统理解为 0.01 satoshi(实际 = 0.0000000001 BTC)
- 或者用户账户显示有 100 BTC,实际只有 0.000001 BTC
解决方案:明确区分两层精度
2.2 精度层次
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Client (display_decimals) │
│ - 用户看到的数字 │
│ - 可以根据业务需求调整 │
│ - 例如:BTC 数量显示 6 位小数 (0.000001 BTC) │
└─────────────────────────────────────────────────────────────┘
↓
自动转换 (× 10^decimals)
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Internal (decimals) │
│ - 内部存储和计算的精度 │
│ - 一旦设定永不改变 │
│ - 例如:BTC 存储 8 位精度 (satoshi) │
└─────────────────────────────────────────────────────────────┘
2.3 配置文件设计
assets_config.csv(资产精度配置):
asset_id,asset,decimals,display_decimals
1,BTC,8,6 # 最小单位 0.000001 BTC ≈ $0.085
2,USDT,6,4 # 最小单位 0.0001 USDT
3,ETH,8,4 # 最小单位 0.0001 ETH ≈ $0.40
| 字段 | 可变性 | 说明 |
|---|---|---|
| decimals | ⚠️ 永不改变 | 定义最小单位,改变会破坏所有现有数据 |
| display_decimals | ✅ 可动态调整 | 用于数量 (qty) 的客户端精度 |
symbols_config.csv(交易对配置):
symbol_id,symbol,base_asset_id,quote_asset_id,price_decimal,price_display_decimal
0,BTC_USDT,1,2,6,2 # 价格最小单位 $0.01
1,ETH_USDT,3,2,6,2
关键设计:精度来源
| 订单字段 | 精度来源 | 配置位置 |
|---|---|---|
| qty (数量) | base_asset.display_decimals | assets_config.csv |
| price (价格) | symbol.price_display_decimal | symbols_config.csv |
⚠️ 注意:price 精度来自 symbol 配置,不是 quote_asset! 这样设计是因为同一个 quote asset(如 USDT)在不同交易对中可能有不同的价格精度。
为什么 decimals 不能改变?
假设 BTC decimals 从 8 改为 6:
- 原来账户余额 100000000 (= 1 BTC)
- 现在变成 100000000 / 10^6 = 100 BTC
- 用户凭空获得 99 BTC!
为什么 display_decimals 可以改变?
这只是显示层,不影响存储:
- 原来显示 0.12345678 BTC
- 调整后显示 0.123456 BTC(6位)
- 内部存储仍然是 12345678 satoshi
3. 余额格式设计:行式 vs 列式
3.1 问题:如何存储多资产余额?
Option A:列式(每个资产一列)
user_id,btc_avail,btc_frozen,usdt_avail,usdt_frozen
1,10000000000,0,10000000000000,0
Option B:行式(每个资产一行)
user_id,asset_id,avail,frozen,version
1,1,10000000000,0,0
1,2,10000000000000,0,0
3.2 为什么选择行式?
| 对比维度 | 列式 | 行式 |
|---|---|---|
| 扩展性 | ❌ 添加资产需改表结构 | ✅ 直接添加新行 |
| 稀疏数据 | ❌ 大量空值 | ✅ 只存有余额的资产 |
| 数据库兼容 | ❌ 非标准化 | ✅ 标准化范式 |
| 通用性 | ❌ 资产名硬编码 | ✅ asset_id 通用 |
真实场景:交易所支持 500+ 种资产,但用户平均只持有 3-5 种。行式设计节省 99% 的存储空间。
4. 时间线快照设计
4.1 为什么需要多个快照?
撮合过程不是单一操作,而是多阶段流程:
T0: 初始状态 (fixtures/balances_init.csv)
↓ deposit()
T1: 充值完成 (baseline/t1_balances_deposited.csv)
↓ execute orders
T2: 交易完成 (baseline/t2_balances_final.csv)
每个阶段都可能出错:
- T0→T1:deposit 逻辑是否正确?
- T1→T2:交易结算是否正确?
有了快照,可以精确定位问题:
# 验证 deposit 正确性
diff balances_init.csv t1_balances_deposited.csv
# 验证交易结算正确性
diff t1_balances_deposited.csv t2_balances_final.csv
4.2 文件命名设计
t1_balances_deposited.csv # t1 阶段,balances 类型,deposited 状态
t2_balances_final.csv # t2 阶段,balances 类型,final 状态
t2_ledger.csv # t2 阶段,ledger 类型
t2_orderbook.csv # t2 阶段,orderbook 类型
命名原则:{时间点}_{数据类型}_{状态}.csv
这样的命名:
- 按时间排序时自然有序
- 一眼看出数据是什么
- 避免文件名歧义
5. Settlement Ledger 设计
5.1 为什么需要 Ledger?
t2_ledger.csv 是整个系统的审计日志。每一分钱的变动都记录在这里。
没有 Ledger 的问题:
- 用户投诉:我的钱去哪了?
- 只能说:交易后余额是 X
- 无法回答:什么时候变的?为什么变?
有了 Ledger:
trade_id,user_id,asset_id,op,delta,balance_after
1,96,2,debit,849700700,9999150299300
1,96,1,credit,1000000,10001000000
可以完整追溯:
- Trade #1 导致 User #96 的 USDT 减少 849700700
- 同时 BTC 增加 1000000
- 变化后余额是多少
5.2 为什么用 delta + after,而不是 before + after?
Option A:before + after
delta,balance_before,balance_after
849700700,10000000000,9999150299300
Option B:delta + after
delta,balance_after
849700700,9999150299300
选择 Option B 的原因:
- 冗余更少:before = after - delta,可计算得出
- after 更有用:通常我们想验证的是“最终状态对不对“
- delta 直接说明变化:不需要心算 before - after
6. ME Orderbook 快照
6.1 为什么需要 Orderbook 快照?
交易完成后,Orderbook 里仍然有未成交的挂单。这些订单:
- 在内存中
- 如果系统重启,会丢失
t2_orderbook.csv 是 ME 状态的完整快照:
order_id,user_id,side,order_type,price,qty,filled_qty,status
6,907,sell,limit,85330350000,2000000,0,New
用途:
- 状态恢复:重启后可以从快照恢复 Orderbook
- 正确性验证:与理论预期对比
- 调试:哪些订单还在挂着?
6.2 为什么记录所有字段?
快照目的是完整恢复。恢复时需要重建 Order 结构体:
#![allow(unused)]
fn main() {
struct Order {
    id: u64,
    user_id: u64,
    price: u64,
    qty: u64,
    filled_qty: u64,
    side: Side,            // 枚举;类型标注仅为示意
    order_type: OrderType,
    status: OrderStatus,
}
}
缺少任何字段都无法恢复。
7. 测试脚本设计
7.1 模块化脚本
scripts/
├── test_01_generate.sh # Step 1: 生成测试数据
├── test_02_baseline.sh # Step 2: 生成基准
├── test_03_verify.sh # Step 3: 运行并验证
└── test_e2e.sh # 组合:完整 E2E 流程
为什么模块化?
- 单独调试:出问题时只运行相关步骤
- 灵活组合:CI 可以只运行 verify,不重新生成数据
- 可读性:每个脚本做一件事
7.2 使用方式
# 日常测试(使用现有 baseline)
./scripts/test_e2e.sh
# 重新生成基准并测试
./scripts/test_e2e.sh --regenerate
8. 命令行设计:--baseline 开关
8.1 为什么需要开关?
默认行为:
- 输出到 output/
- 不会覆盖 baseline
需要更新基准时:
- 加 --baseline 参数
- 输出到 baseline/
为什么不自动覆盖?
- 安全:防止意外覆盖基准
- 意图明确:更新基准是有意识的决定
- Git 友好:baseline 变化会触发 git diff
代码实现:
#![allow(unused)]
fn main() {
fn get_output_dir() -> &'static str {
let args: Vec<String> = std::env::args().collect();
if args.iter().any(|a| a == "--baseline") {
"baseline"
} else {
"output"
}
}
}
9. 运行示例
9.1 完整流程
# 1. 生成测试数据
python3 scripts/generate_orders.py --orders 100000 --seed 42
# 2. 生成基准(首次或需要更新时)
cargo run --release -- --baseline
# 3. 日常测试
./scripts/test_e2e.sh
9.2 验证输出
╔════════════════════════════════════════════════════════════╗
║ 0xInfinity Testing Framework - E2E Test ║
╚════════════════════════════════════════════════════════════╝
t1_balances_deposited.csv: ✅ MATCH
t2_balances_final.csv: ✅ MATCH
t2_ledger.csv: ✅ MATCH
t2_orderbook.csv: ✅ MATCH
✅ All tests passed!
10. Summary
本章建立了完整的测试基础设施:
| 设计点 | 解决的问题 | 方案 |
|---|---|---|
| 精度混淆 | 用户精度 vs 内部精度 | decimals + display_decimals |
| 资产扩展 | 支持 N 种资产 | 行式余额格式 |
| 过程追溯 | 哪一步出错? | 时间线快照 (T0→T1→T2) |
| 资金审计 | 每分钱去向 | Settlement Ledger |
| 状态恢复 | 重启后恢复 | Orderbook 快照 |
| 回归测试 | 代码改动是否破坏正确性 | Golden File 模式 |
| 测试效率 | 快速反馈 | 模块化脚本 |
核心理念:
测试不是事后补的,而是设计的一部分。好的测试框架能让你在改动代码时有信心。
下一节 (0x07-b) 将在此基础上添加性能测试和优化基准。
0x07-b Performance Baseline - Initial Setup
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To establish a quantifiable, traceable, and comparable performance baseline.
Building on the testing framework from 0x07-a, this chapter adds detailed performance metric collection and analysis capabilities.
1. Why a Performance Baseline?
1.1 The Performance Trap
Optimization without a baseline is blind:
- Premature Optimization: Optimizing code that accounts for 1% of runtime.
- Delayed Regression Detection: A refactor drops performance by 50%, but it’s only discovered 3 months later.
- Unquantifiable Improvement: Claiming “it's much faster,” but exactly how much faster?
1.2 Value of a Baseline
With a baseline, you can:
- Verify before Commit: Ensure performance hasn’t degraded.
- Pinpoint Bottlenecks: Identify which component consumes the most time.
- Quantify Optimization: “Throughput increased from 30K ops/s to 100K ops/s.”
2. Metric Design
2.1 Throughput Metrics
| Metric | Explanation | Calculation |
|---|---|---|
| throughput_ops | Order Throughput | orders / exec_time |
| throughput_tps | Trade Throughput | trades / exec_time |
2.2 Time Breakdown
We decompose execution time into four components:
┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order) │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check │ Account lookup + balance validation │
│ - Account lookup │ FxHashMap O(1) │
│ - Fund locking │ Check avail >= required, then lock │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine │ book.add_order() │
│ - Price lookup │ BTreeMap O(log n) │
│ - Order matching │ iterate + partial fill │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement │ settle_as_buyer/seller │
│ - Balance update │ HashMap O(1) │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O │ write_entry() │
│ - File write │ Disk I/O │
└─────────────────────────────────────────────────────────────┘
2.3 Latency Percentiles
Sample total processing latency every N orders:
| Percentile | Meaning |
|---|---|
| P50 | Median, typical case |
| P99 | 99% of requests are faster than this |
| P99.9 | Tail latency, worst cases |
| Max | Maximum latency |
3. Initial Baseline Data
3.1 Test Environment
- Hardware: MacBook Pro M Series
- Data: 100,000 Orders, 47,886 Trades
- Mode: Release build (--release)
3.2 Throughput
Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s
3.3 Time Breakdown 🔥
=== Performance Breakdown ===
Balance Check: 17.68ms ( 0.5%) ← FxHashMap O(1)
Matching Engine: 36.04ms ( 1.0%) ← Extremely Fast!
Settlement: 4.77ms ( 0.1%) ← Negligible
Ledger I/O: 3678.68ms ( 98.4%) ← Bottleneck!
Key Findings:
- Ledger I/O consumes 98.4% of time.
- Balance Check + Matching + Settlement total only ~58ms.
- Theoretical Limit: ~1.7 Million orders/sec (without I/O).
3.4 Order Lifecycle Timeline 📊
Order Lifecycle
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Balance │ │ Matching │ │ Settlement │ │ Ledger │
│ Check │───▶│ Engine │───▶│ (Balance) │───▶│ I/O │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FxHashMap │ │ BTreeMap │ │Vec<Balance> │ │ File:: │
│ +Vec O(1) │ │ O(log n) │ │ O(1) │ │ write() │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Time: 17.68ms 36.04ms 4.77ms 3678.68ms
Percentage: 0.5% 1.0% 0.1% 98.4%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Per-Order: 0.18µs 0.36µs 0.05µs 36.79µs
Potential: 5.6M ops/s 2.8M ops/s 20M ops/s 27K ops/s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Business Logic ~58ms (1.6%) I/O ~3679ms (98.4%)
◀─────────────────────────▶ ◀───────────────────────▶
Fast ✅ Bottleneck 🔴
Analysis:
| Phase | Latency/Order | Theoretical OPS | Note |
|---|---|---|---|
| Balance Check | 0.18µs | 5.6M/s | FxHashMap Lookup + Vec O(1) |
| Matching Engine | 0.36µs | 2.8M/s | BTreeMap Price Matching |
| Settlement | 0.05µs | 20M/s | Vec<Balance> O(1) Indexing |
| Ledger I/O | 36.79µs | 27K/s | Unbuffered File Write = Bottleneck! |
E2E Result:
- Actual Throughput: ~29K orders/sec (I/O Bound)
- Theoretical Limit (No I/O): ~1.7M orders/sec (60x room for improvement!)
3.5 Latency Percentiles
=== Latency Percentiles (sampled) ===
Min: 125 ns
Avg: 34022 ns
P50: 583 ns ← Typical order < 1µs
P99: 391750 ns ← 99% of orders < 0.4ms
P99.9: 1243833 ns ← Tail latency ~1.2ms
Max: 3207875 ns ← Worst case ~3ms
4. Output Files
4.1 t2_perf.txt (Machine Readable)
# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875
4.2 t2_summary.txt (Human Readable)
Contains full execution summary and performance breakdown.
5. PerfMetrics Implementation
#![allow(unused)]
fn main() {
/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
// Timing breakdown (nanoseconds)
total_balance_check_ns: u64, // Account lookup + balance check + lock
total_matching_ns: u64, // OrderBook.add_order()
total_settlement_ns: u64, // Balance updates after trade
total_ledger_ns: u64, // Ledger file I/O
// Per-order latency samples
latency_samples: Vec<u64>,
sample_rate: usize,
}
impl PerfMetrics {
fn new(sample_rate: usize) -> Self { ... }
fn add_order_latency(&mut self, latency_ns: u64) { ... }
fn add_balance_check_time(&mut self, ns: u64) { ... }
fn add_matching_time(&mut self, ns: u64) { ... }
fn add_settlement_time(&mut self, ns: u64) { ... }
fn add_ledger_time(&mut self, ns: u64) { ... }
fn percentile(&self, p: f64) -> Option<u64> { ... }
fn min_latency(&self) -> Option<u64> { ... }
fn max_latency(&self) -> Option<u64> { ... }
fn avg_latency(&self) -> Option<u64> { ... }
}
}
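The elided percentile() can be implemented as a sort plus a nearest-rank lookup, which is a common approach for small sample sets (the project's exact rounding rule may differ):

```rust
/// Nearest-rank percentile over raw latency samples (nanoseconds).
/// Returns None when no samples were collected.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Index of the smallest sample that covers p% of the data.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1).min(sorted.len() - 1)])
}

fn main() {
    let samples: Vec<u64> = (1..=100).collect(); // fake latencies 1..100 ns
    assert_eq!(percentile(&samples, 50.0), Some(50));
    assert_eq!(percentile(&samples, 99.0), Some(99));
    assert_eq!(percentile(&samples, 100.0), Some(100));
    assert_eq!(percentile(&[], 50.0), None);
    println!("percentiles ok");
}
```

Sorting a copy on demand is fine here because percentiles are computed once at report time, not on the hot path.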
6. Optimization Roadmap
Based on baseline data, future directions:
6.1 Short Term (0x07-c)
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| Use BufWriter | 10-50x I/O | Low |
| Batch Write | 2-5x | Low |
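The BufWriter row in the table is, in spirit, a one-line change: wrap the ledger file so each write lands in a memory buffer instead of issuing a syscall per entry. A minimal sketch (the file path, buffer size, and entry format are illustrative):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("/tmp/ledger_demo.csv")?;
    // 256 KiB buffer: entries accumulate in memory and flush in large
    // chunks, instead of one write syscall per ledger line.
    let mut ledger = BufWriter::with_capacity(256 * 1024, file);
    for trade_id in 1..=1000u32 {
        writeln!(ledger, "{trade_id},96,2,debit,849700700,9999150299300")?;
    }
    ledger.flush()?; // push the tail of the buffer out before dropping
    Ok(())
}
```

The trade-off is durability: entries sitting in the buffer are lost on a crash, which is why later chapters pair buffering with explicit flush/snapshot points.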
6.2 Mid Term (0x08+)
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| Async I/O | Decouple Matching & Persistence | Medium |
| Memory Pool | Reduce Allocation | Medium |
6.3 Long Term
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| DPDK/io_uring | 10x+ | High |
| FPGA | 100x+ | Extreme |
7. Commands Reference
# Run and generate performance data
cargo run --release
# Update baseline (when code changes)
cargo run --release -- --baseline
# View performance data
cat output/t2_perf.txt
# Compare performance changes
python3 scripts/compare_perf.py
compare_perf.py Output Example
╔════════════════════════════════════════════════════════════════════════╗
║ Performance Comparison Report ║
╚════════════════════════════════════════════════════════════════════════╝
Metric Baseline Current Change
───────────────────────────────────────────────────────────────────────────
Orders 100000 100000 -
Trades 47886 47886 -
Exec Time 3753.87ms 3484.37ms -7.2%
Throughput (orders) 26639/s 28700/s +7.7%
Throughput (trades) 12756/s 13743/s +7.7%
───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):
Metric Baseline Current Change OPS
Balance Check 17.68ms 16.51ms -6.6% 6.1M
Matching Engine 36.04ms 35.01ms -2.8% 2.9M
Settlement 4.77ms 5.22ms +9.4% 19.2M
Ledger I/O 3678.68ms 3411.49ms -7.3% 29K
───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):
Metric Baseline Current Change
Latency MIN 125ns 125ns +0.0%
Latency AVG 37.9µs 34.8µs -8.2%
Latency P50 584ns 541ns -7.4%
Latency P99 420.2µs 398.9µs -5.1%
Latency P99.9 1.63ms 1.24ms -24.3%
Latency MAX 9.76ms 3.53ms -63.9%
───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected
Summary
This chapter accomplished:
- PerfMetrics Structure: Collecting time breakdown & latency samples.
- Time Breakdown: Balance Check / Matching / Settlement / Ledger I/O.
- Latency Percentiles: P50 / P99 / P99.9 / Max.
- t2_perf.txt: Machine-readable baseline file.
- compare_perf.py: Tool to detect regression.
- Key Finding: Ledger I/O takes 98.4%, major bottleneck.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:建立可量化、可追踪、可比较的性能基线。
本章在 0x07-a 测试框架基础上,添加详细的性能指标收集和分析能力。
1. 为什么需要性能基线?
1.1 性能陷阱
没有基线的优化是盲目的:
- 过早优化:优化了占 1% 时间的代码
- 回归发现延迟:某次重构导致性能下降 50%,但 3 个月后才发现
- 无法量化改进:说“快了很多“,但具体快了多少?
1.2 基线的价值
有了基线,你可以:
- 每次提交前验证:性能没有下降
- 精确定位瓶颈:哪个组件消耗最多时间
- 量化优化效果:从 30K ops/s 提升到 100K ops/s
2. 性能指标设计
2.1 吞吐量指标
| 指标 | 说明 | 计算方式 |
|---|---|---|
| throughput_ops | 订单吞吐量 | orders / exec_time |
| throughput_tps | 成交吞吐量 | trades / exec_time |
2.2 时间分解
我们将执行时间分解为四个组件:
┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order) │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check │ Account lookup + balance validation │
│ - Account lookup │ FxHashMap O(1) │
│ - Fund locking │ Check avail >= required, then lock │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine │ book.add_order() │
│ - Price lookup │ BTreeMap O(log n) │
│ - Order matching │ iterate + partial fill │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement │ settle_as_buyer/seller │
│ - Balance update │ HashMap O(1) │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O │ write_entry() │
│ - File write │ Disk I/O │
└─────────────────────────────────────────────────────────────┘
2.3 延迟百分位数
采样每 N 个订单的总处理延迟,计算:
| 百分位数 | 含义 |
|---|---|
| P50 | 中位数,典型情况 |
| P99 | 99% 的请求低于此值 |
| P99.9 | 尾延迟,最坏情况 |
| Max | 最大延迟 |
3. 初始基线数据
3.1 测试环境
- 硬件:MacBook Pro M 系列
- 数据:100,000 订单,47,886 成交
- 模式:Release build (--release)
3.2 吞吐量
Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s
3.3 时间分解 🔥
=== Performance Breakdown ===
Balance Check: 17.68ms ( 0.5%) ← FxHashMap O(1)
Matching Engine: 36.04ms ( 1.0%) ← 极快!
Settlement: 4.77ms ( 0.1%) ← 几乎可忽略
Ledger I/O: 3678.68ms ( 98.4%) ← 瓶颈!
关键发现:
- Ledger I/O 占用 98.4% 的时间
- Balance Check + Matching + Settlement 总共只需 ~58ms
- 理论上限:~170 万 orders/sec(如果没有 I/O)
3.4 订单生命周期性能时间线 📊
订单生命周期 (Order Lifecycle)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Balance │ │ Matching │ │ Settlement │ │ Ledger │
│ Check │───▶│ Engine │───▶│ (Balance) │───▶│ I/O │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FxHashMap │ │ BTreeMap │ │Vec<Balance> │ │ File:: │
│ +Vec O(1) │ │ O(log n) │ │ O(1) │ │ write() │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Time: 17.68ms 36.04ms 4.77ms 3678.68ms
Percentage: 0.5% 1.0% 0.1% 98.4%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Per-Order: 0.18µs 0.36µs 0.05µs 36.79µs
Potential: 5.6M ops/s 2.8M ops/s 20M ops/s 27K ops/s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
业务逻辑 ~58ms (1.6%) I/O ~3679ms (98.4%)
◀─────────────────────────▶ ◀───────────────────────▶
极快 ✅ 瓶颈 🔴
性能分析:
| 阶段 | 每订单延迟 | 理论 OPS | 说明 |
|---|---|---|---|
| Balance Check | 0.18µs | 5.6M/s | FxHashMap 账户查找 + Vec O(1) 余额索引 |
| Matching Engine | 0.36µs | 2.8M/s | BTreeMap 价格匹配 |
| Settlement | 0.05µs | 20M/s | Vec<Balance> O(1) 直接索引 |
| Ledger I/O | 36.79µs | 27K/s | unbuffered 文件写入 = 瓶颈! |
E2E 结果:
- 实际吞吐量: ~29K orders/sec (受限于 Ledger I/O)
- 理论上限 (无 I/O): ~1.7M orders/sec (60x 提升空间!)
3.5 延迟百分位数
=== Latency Percentiles (sampled) ===
Min: 125 ns
Avg: 34022 ns
P50: 583 ns ← 典型订单 < 1µs
P99: 391750 ns ← 99% 的订单 < 0.4ms
P99.9: 1243833 ns ← 尾延迟 ~1.2ms
Max: 3207875 ns ← 最坏 ~3ms
4. 输出文件
4.1 t2_perf.txt(机器可读)
# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875
4.2 t2_summary.txt(人类可读)
包含完整的执行摘要和性能分解。
5. PerfMetrics 实现
#![allow(unused)]
fn main() {
/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
// Timing breakdown (nanoseconds)
total_balance_check_ns: u64, // Account lookup + balance check + lock
total_matching_ns: u64, // OrderBook.add_order()
total_settlement_ns: u64, // Balance updates after trade
total_ledger_ns: u64, // Ledger file I/O
// Per-order latency samples
latency_samples: Vec<u64>,
sample_rate: usize,
}
impl PerfMetrics {
fn new(sample_rate: usize) -> Self { ... }
fn add_order_latency(&mut self, latency_ns: u64) { ... }
fn add_balance_check_time(&mut self, ns: u64) { ... }
fn add_matching_time(&mut self, ns: u64) { ... }
fn add_settlement_time(&mut self, ns: u64) { ... }
fn add_ledger_time(&mut self, ns: u64) { ... }
fn percentile(&self, p: f64) -> Option<u64> { ... }
fn min_latency(&self) -> Option<u64> { ... }
fn max_latency(&self) -> Option<u64> { ... }
fn avg_latency(&self) -> Option<u64> { ... }
}
}
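上面签名中的 `percentile` 省略了实现,可以按最近邻排名法(nearest-rank)来写,示意如下(仅为演示思路的草图,并非项目实际代码;入参对应 `latency_samples` 中的原始延迟样本):

```rust
/// Nearest-rank percentile over raw latency samples (sorts a copy).
/// `p` is in [0.0, 100.0]; returns None when there are no samples.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p/100 * n), clamped into [1, n]; the index is rank - 1
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.saturating_sub(1).min(sorted.len() - 1);
    Some(sorted[idx])
}
```

例如对 1..=100 ns 的样本,`percentile(&samples, 99.0)` 返回 `Some(99)`。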
6. 优化路线图
基于基线数据,后续优化方向:
6.1 短期(0x07-c)
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| 使用 BufWriter | 10-50x I/O | 低 |
| 批量写入 | 2-5x | 低 |
6.2 中期(0x08+)
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| 异步 I/O | 解耦撮合和持久化 | 中 |
| 内存池 | 减少分配 | 中 |
6.3 长期
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| DPDK/io_uring | 10x+ | 高 |
| FPGA | 100x+ | 极高 |
7. 命令参考
# 运行并生成性能数据
cargo run --release
# 更新基线(当代码变化时)
cargo run --release -- --baseline
# 查看性能数据
cat output/t2_perf.txt
# 对比性能变化
python3 scripts/compare_perf.py
compare_perf.py 输出示例
╔════════════════════════════════════════════════════════════════════════╗
║ Performance Comparison Report ║
╚════════════════════════════════════════════════════════════════════════╝
Metric Baseline Current Change
───────────────────────────────────────────────────────────────────────────
Orders 100000 100000 -
Trades 47886 47886 -
Exec Time 3753.87ms 3484.37ms -7.2%
Throughput (orders) 26639/s 28700/s +7.7%
Throughput (trades) 12756/s 13743/s +7.7%
───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):
Metric Baseline Current Change OPS
Balance Check 17.68ms 16.51ms -6.6% 6.1M
Matching Engine 36.04ms 35.01ms -2.8% 2.9M
Settlement 4.77ms 5.22ms +9.4% 19.2M
Ledger I/O 3678.68ms 3411.49ms -7.3% 29K
───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):
Metric Baseline Current Change
Latency MIN 125ns 125ns +0.0%
Latency AVG 37.9µs 34.8µs -8.2%
Latency P50 584ns 541ns -7.4%
Latency P99 420.2µs 398.9µs -5.1%
Latency P99.9 1.63ms 1.24ms -24.3%
Latency MAX 9.76ms 3.53ms -63.9%
───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected
Summary
本章完成了以下工作:
- PerfMetrics 结构:收集时间分解和延迟样本
- 时间分解:Balance Check / Matching / Settlement / Ledger I/O
- 延迟百分位数:P50 / P99 / P99.9 / Max
- t2_perf.txt:机器可读的性能基线文件
- compare_perf.py:对比工具,检测性能回归
- 关键发现:Ledger I/O 占 98.4%,是主要瓶颈
0x08-a Trading Pipeline Design
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To design a complete trading pipeline architecture that ensures order persistence, balance consistency, and system recoverability.
This chapter addresses the most critical design issues in a matching engine: Service Partitioning, Data Flow, and Atomicity Guarantees.
1. Why Persistence?
1.1 The Problem Scenario
Suppose the system crashes during matching:
User A sends Buy Order → ME receives & fills → System Crash
↓
User A's funds deducted
But no trade record
Order Lost!
Consequences of No Persistence:
- Order Loss: User orders vanish.
- Inconsistent State: Funds changed but no record exists.
- Unrecoverable: Upon restart, valid orders are unknown.
1.2 Solution: Persist First, Match Later
User A Buy Order → WAL Persist → ME Match → System Crash
↓ ↓
Order Saved Replay & Recover!
2. Unique Ordering
2.1 Why Unique Ordering?
In distributed systems, multiple nodes must agree on order sequence:
| Scenario | Problem |
|---|---|
| Node A receives Order 1 then Order 2 | |
| Node B receives Order 2 then Order 1 | Inconsistent Order! |
Result: Matching results differ between nodes!
2.2 Solution: Single Sequencer + Global Sequence ID
All Orders → Sequencer → Assign Global sequence_id → Persist → Dispatch to ME
↓
Unique Arrival Order
| Field | Description |
|---|---|
| sequence_id | Monotonically increasing global ID |
| timestamp | Nanosecond precision timestamp |
| order_id | Business-level Order ID |
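The sequencer above can be sketched as a single-threaded counter that stamps every incoming order on arrival (a minimal illustration; `Sequencer` and `SequencedOrder` are hypothetical names, not the project's actual types):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical envelope type for illustration only.
struct SequencedOrder {
    sequence_id: u64,  // monotonically increasing global ID
    timestamp_ns: u64, // nanosecond arrival timestamp
    order_id: u64,     // business-level order ID
}

struct Sequencer {
    next_seq: u64,
}

impl Sequencer {
    fn new() -> Self {
        Sequencer { next_seq: 1 }
    }

    /// Stamp an incoming order with the next global sequence_id.
    /// Single-threaded, so ordering is unique by construction.
    fn assign(&mut self, order_id: u64) -> SequencedOrder {
        let seq = self.next_seq;
        self.next_seq += 1;
        let ts = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_nanos() as u64;
        SequencedOrder { sequence_id: seq, timestamp_ns: ts, order_id }
    }
}
```

Because a single sequencer assigns every `sequence_id`, all downstream nodes replay orders in exactly the same sequence.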
3. Order Lifecycle
3.1 Persist First, Execute Later
┌─────────────────────────────────────────────────────────────────────────┐
│ Order Lifecycle │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Gateway │───▶│Pre-Check│───▶│ WAL │───▶│ ME │ │
│ │(Receiver)│ │(Balance) │ │(Persist)│ │ (Match) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Receive Order Insufficient? Disk Write Execute Match │
│ Early Reject Assign SeqID Guaranteed Exec │
│ │
└─────────────────────────────────────────────────────────────────────────┘
3.2 Pre-Check: Reducing Invalid Orders
Pre-Check queries UBSCore (User Balance Core Service) for balance info. Read-Only, No Side Effects.
#![allow(unused)]
fn main() {
async fn pre_check(order: Order) -> Result<Order, Reject> {
// 1. Query UBSCore for balance (Read-Only)
let balance = ubscore.query_balance(order.user_id, asset);
// 2. Calculate required amount
let required = match order.side {
Buy => order.price * order.qty / QTY_UNIT, // quote
Sell => order.qty, // base
};
// 3. Balance Check (Read-Only, No Lock)
if balance.avail < required {
return Err(Reject::InsufficientBalance);
}
// 4. Pass
Ok(order)
}
// Note: Balance might be consumed by others between Pre-Check and WAL.
// This is allowed; WAL's Balance Lock will handle it.
}
Why Pre-Check?
The Core Flow (WAL + Balance Lock + Matching) is expensive. We must filter garbage orders fast.
| No Pre-Check | With Pre-Check |
|---|---|
| Garbage enters core flow | Filters most invalid orders |
| Core wastes latency on invalid orders | Core processes mostly valid orders |
| Vulnerable to spam attacks | Reduces impact of malicious requests |
Pre-Check Items:
- ✅ Balance Check
- 📋 User Status (Banned?)
- 📋 Format Validation
- 📋 Rate Limiting
- 📋 Risk Rules
3.3 Must Execute Once Persisted
Once an order is persisted, it MUST end in one of these states:
┌─────────────────────┐
│ Order Persisted │
└─────────────────────┘
│
├──▶ Filled
├──▶ PartialFilled
├──▶ New (Booked)
├──▶ Cancelled
├──▶ Expired
└──▶ Rejected (Insufficient Balance) ← Valid Final State!
❌ Never: Logged but state unknown.
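This invariant can be captured as an exhaustive state type, so "state unknown" is simply unrepresentable once an order resolves (a sketch; the variant names mirror the list above, not necessarily the project's real enum):

```rust
/// Hypothetical final-state enum mirroring the list above.
#[derive(Debug, PartialEq)]
enum FinalState {
    Filled,
    PartialFilled,
    New,       // booked, resting on the order book
    Cancelled,
    Expired,
    Rejected,  // insufficient balance — a valid final state
}

/// Every persisted order must resolve to Some(final_state);
/// None ("logged but state unknown") is treated as an invariant violation.
fn resolve(state: Option<FinalState>) -> Result<FinalState, &'static str> {
    state.ok_or("invariant violated: persisted order with unknown state")
}
```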
4. WAL: Why Is It the Best Choice?
4.1 What is WAL (Write-Ahead Log)?
WAL is an Append-Only log structure:
┌─────────────────────────────────────────────────────────────────┐
│ WAL File │
├─────────────────────────────────────────────────────────────────┤
│ Entry 1 │ Entry 2 │ Entry 3 │ Entry 4 │ ... │ ← Append│
│ (seq=1) │ (seq=2) │ (seq=3) │ (seq=4) │ │ │
└─────────────────────────────────────────────────────────────────┘
↑
Append Only!
4.2 Why WAL for HFT?
| Method | Write Pattern | Latency | Throughput | HFT Suitability |
|---|---|---|---|---|
| DB (MySQL) | Random + Txn | ~1-10ms | ~1K ops/s | ❌ Too Slow |
| KV (Redis) | Random | ~0.1-1ms | ~10K ops/s | ⚠️ Average |
| WAL | Sequential | ~1-10µs | ~1M ops/s | ✅ Best |
Why is WAL fast?
- Sequential Write vs Random Write:
- HDD: No seek time (~10ms saved).
- SSD: Reduces Write Amplification.
- Result: 10-100x faster.
- No Transaction Overhead:
- DB: Txn start, lock, redo log, data page, binlog, commit…
- WAL: Serialize -> Append -> (Optional) Fsync.
- Group Commit:
  - Batch multiple writes into one fsync.
#![allow(unused)]
fn main() {
// Group Commit Logic
pub fn flush(&mut self) -> io::Result<()> {
self.file.write_all(&self.buffer)?;
self.file.sync_data()?; // fsync once for N orders
self.buffer.clear();
Ok(())
}
}
5. Single Thread + Lock-Free Architecture
5.1 Why Single Thread?
Intuition: Concurrency = Fast. Reality in HFT: Single Thread is Faster.
| Multi-Thread | Single Thread |
|---|---|
| Locks & Contention | Lock-Free |
| Cache Invalidation | Cache Friendly |
| Context Switch Overhead | No Context Switch |
| Hard Ordering | Naturally Ordered |
| Complex Sync Logic | Simple Code |
5.2 Mechanical Sympathy
CPU Cache Hierarchy:
- L1 Cache: ~1ns
- L2 Cache: ~4ns
- RAM: ~100ns
Single Thread Advantage: Data stays in L1/L2 (Hot). No cache line contention.
5.3 LMAX Disruptor Pattern
Originating from LMAX Exchange (6M TPS on single thread):
- Single Writer (Avoid write contention)
- Pre-allocated Memory (Avoid GC/malloc)
- Cache Padding (Avoid false sharing)
- Batch Consumption
6. Ring Buffer: Inter-Service Communication
6.1 Why Ring Buffer?
| Method | Latency | Throughput |
|---|---|---|
| HTTP/gRPC | ~1ms | ~10K/s |
| Kafka | ~1-10ms | ~1M/s |
| Shared Memory Ring Buffer | ~100ns | ~10M/s |
6.2 Ring Buffer Principle
write_idx read_idx
↓ ↓
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 8 │ 9 │10 │11 │12 │13 │14 │ 0 │ 1 │ 2 │ ...
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
↑ ↑
New Data Consumer
- Fixed size, circular.
- Zero allocation during runtime.
- SPSC (Single Producer Single Consumer) is lock-free.
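The index arithmetic above can be sketched with a power-of-two capacity and a bitmask (a simplified single-threaded illustration of the wrap-around math; a real SPSC queue such as crossbeam's `ArrayQueue` manages the two indices with atomics):

```rust
/// Minimal single-threaded ring buffer illustrating wrap-around indexing.
/// Capacity must be a power of two so `& mask` replaces the modulo.
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    mask: usize,
    write_idx: usize, // total items ever pushed
    read_idx: usize,  // total items ever popped
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        RingBuffer {
            slots: (0..capacity).map(|_| None).collect(),
            mask: capacity - 1,
            write_idx: 0,
            read_idx: 0,
        }
    }

    /// Non-blocking push; returns the item back when the buffer is full.
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.write_idx - self.read_idx == self.slots.len() {
            return Err(item); // full
        }
        let slot = self.write_idx & self.mask; // wrap around
        self.slots[slot] = Some(item);
        self.write_idx += 1;
        Ok(())
    }

    /// Non-blocking pop; None when empty.
    fn pop(&mut self) -> Option<T> {
        if self.read_idx == self.write_idx {
            return None; // empty
        }
        let slot = self.read_idx & self.mask;
        self.read_idx += 1;
        self.slots[slot].take()
    }
}
```

The indices only ever increase; masking them onto the fixed slot array produces the circular behaviour shown in the diagram, with zero allocation after construction.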
7. Overall Architecture
7.1 Core Services
| Service | Responsibility | State |
|---|---|---|
| Gateway | Receive Requests | Stateless |
| Pre-Check | Read-only Balance Check | Stateless |
| UBSCore | Balance Ops + Order WAL | Stateful (Balance) |
| ME | Matching, Generate Trades | Stateful (OrderBook) |
| Settlement | Persist Events | Stateless |
7.2 UBSCore Service (User Balance Core)
Single Entry Point for ALL Balance Operations.
Why UBSCore?
- Atomic: Single thread = No Double Spend.
- Audit: Complete trace of all changes.
- Recovery: Single WAL restores state.
Pipeline Role:
- Write Order WAL (Persist)
- Lock Balance
- Success → Forward to ME
- Fail → Rejected
- Handle Trade Events (Settlement)
- Update buyer/seller balances.
7.3 Matching Engine (ME)
ME is Pure Matching. It ignores Balances.
- Does: Maintain OrderBook, Match by Price/Time, Generate Trade Events.
- Does NOT: Check balance, lock funds, persist data.
Trade Events Drive Balance Updates:
TradeEvent contains {price, qty, user_ids} → sufficient to calculate balance changes.
7.4 Settlement Service
Settlement Persists, does not modify Balances.
- Persist Trade Events, Order Events.
- Write Audit Log (Ledger).
7.5 Architecture Diagram
┌──────────────────────────────────────────────────────────────────────────────────┐
│ 0xInfinity HFT Architecture │
├──────────────────────────────────────────────────────────────────────────────────┤
│ Client Orders │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Gateway │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ query balance │
│ │ Pre-Check │ ──────────────────────────────▶ UBSCore Service │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Order Buffer │ │ Balance State │ │
│ └──────┬───────┘ │ (RAM, Single Thd) │ │
│ │ Ring Buffer └────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ UBSCore: Order Processing │ Operations: │
│ │ 1. Write Order WAL (Persist) │ - lock / unlock │
│ │ 2. Lock Balance │ - spend_frozen │
│ │ - OK → forward to ME │ - deposit │
│ │ - Fail → Rejected │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (valid orders) │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Matching Engine (ME) │ │
│ │ │ │
│ │ Pure Matching, Ignore Balance │ │
│ │ Output: Trade Events │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (Trade Events) │
│ ┌───────┴────────┐ │
│ ▼ ▼ │
│ ┌───────────┐ ┌─────────────────────────┐ │
│ │ Settlement│ │ Balance Update Events │────▶ Execute Balance Update │
│ │ │ │ (from Trade Events) │ │
│ │ Persist: │ └─────────────────────────┘ │
│ │ - Trades │ │
│ │ - Ledger │ │
│ └───────────┘ │
└───────────────────────────────────────────────────────────────────────────────────┘
7.7 Event Sourcing + Pure State Machine
Order WAL = Single Source of Truth
State(t) = Replay(Order_WAL[0..t])
Any state (Balance, OrderBook) can be 100% reconstructed by replaying the Order WAL.
Pure State Machines:
- UBSCore: Order Events → Balance Events (Deterministic)
- ME: Valid Orders → Trade Events (Deterministic)
Recovery Flow:
- Load Checkpoint (Snapshot).
- Replay Order WAL from checkpoint.
- ME re-matches and generates events.
- UBSCore applies balance updates.
- System Restored.
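The formula `State(t) = Replay(Order_WAL[0..t])` amounts to a deterministic fold over the log. A toy sketch (the events and the balance map are stand-ins for UBSCore's real state machine, not the project's actual types):

```rust
use std::collections::HashMap;

// Toy events standing in for WAL entries (illustration only).
enum Event {
    Deposit { user: u64, amount: u64 },
    Lock { user: u64, amount: u64 },
}

/// Deterministic replay: the same WAL always yields the same state.
/// Values are (avail, frozen); toy code assumes sufficient avail on Lock.
fn replay(wal: &[Event]) -> HashMap<u64, (u64, u64)> {
    let mut balances: HashMap<u64, (u64, u64)> = HashMap::new();
    for event in wal {
        match event {
            Event::Deposit { user, amount } => {
                balances.entry(*user).or_insert((0, 0)).0 += amount;
            }
            Event::Lock { user, amount } => {
                let b = balances.entry(*user).or_insert((0, 0));
                b.0 -= amount; // move avail → frozen
                b.1 += amount;
            }
        }
    }
    balances
}
```

Because `replay` is a pure function of the log, replaying the same WAL twice always reconstructs the identical state, which is exactly what makes checkpoint + replay recovery work.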
8. Summary
Core Decisions:
- Persist First: WAL ensures recoverability.
- Pre-Check: Filters invalid orders early.
- Single Thread + Lock-Free: Avoids contention, maximizes throughput.
- UBSCore: Centralized, atomic balance management.
- Responsibility Segregation: UBSCore (Money), ME (Match), Settlement (Log).
Refactoring: For the upcoming implementation, we refactored the code structure:
lib.rs, main.rs, core_types.rs, config.rs, orderbook.rs, balance.rs, engine.rs, csv_io.rs, ledger.rs, perf.rs
Next: Detailed implementation of UBSCore and Ring Buffer.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:设计完整的交易流水线架构,确保订单持久化、余额一致性和系统可恢复性。
本章解决撮合引擎最关键的设计问题:服务划分、数据流和原子性保证。
1. 为什么需要持久化?
1.1 问题场景
假设系统在撮合过程中崩溃:
用户 A 发送买单 → ME 接收并成交 → 系统崩溃
↓
用户 A 的钱扣了
但没有成交记录
订单丢失!
没有持久化的后果:
- 订单丢失:用户下的单消失了
- 状态不一致:资金变动了但没有记录
- 无法恢复:重启后不知道有哪些订单
1.2 解决方案:先持久化,后撮合
用户 A 发送买单 → WAL 持久化 → ME 撮合 → 系统崩溃
↓ ↓
订单已保存 可以重放恢复!
2. 唯一排序 (Unique Ordering)
2.1 为什么需要唯一排序?
在分布式系统中,多个节点必须对订单顺序达成一致:
| 场景 | 问题 |
|---|---|
| 节点 A 先收到订单 1,再收到订单 2 | |
| 节点 B 先收到订单 2,再收到订单 1 | 顺序不一致! |
结果:两个节点的撮合结果可能不同!
2.2 解决方案:单点排序 + 全局序号
所有订单 → Sequencer → 分配全局 sequence_id → 持久化 → 分发到 ME
↓
唯一的到达顺序
| 字段 | 说明 |
|---|---|
| sequence_id | 单调递增的全局序号 |
| timestamp | 精确到纳秒的时间戳 |
| order_id | 业务层订单 ID |
3. 订单生命周期
3.1 先持久化,后执行
┌─────────────────────────────────────────────────────────────────────────┐
│ 订单生命周期 │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Gateway │───▶│Pre-Check│───▶│ WAL │───▶│ ME │ │
│ │(接收订单)│ │(余额校验)│ │ (持久化)│ │ (撮合) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ 接收订单 余额不足? 写入磁盘 执行撮合 │
│ 提前拒绝 分配seq_id 保证执行 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
3.2 Pre-Check:减少无效订单
Pre-Check 通过查询 UBSCore (User Balance Core Service,用户余额核心服务,详见第 7.2 节) 获取余额信息,只读,无副作用:
#![allow(unused)]
fn main() {
async fn pre_check(order: Order) -> Result<Order, Reject> {
// 1. 查询 UBSCore 获取余额 (只读查询)
let balance = ubscore.query_balance(order.user_id, asset);
// 2. 计算所需金额
let required = match order.side {
Buy => order.price * order.qty / QTY_UNIT, // quote
Sell => order.qty, // base
};
// 3. 余额检查 (只读,不锁定)
if balance.avail < required {
return Err(Reject::InsufficientBalance);
}
// 4. 检查通过,放行订单到下一阶段
Ok(order)
}
// 注意:Pre-Check 不锁定余额!
// 余额可能在 Pre-Check 和 WAL 之间被其他订单消耗
// 这是允许的,WAL 后的 Balance Lock 会处理这种情况
}
为什么需要 Pre-Check?
核心流程(WAL 持久化、Balance Lock、撮合)的延迟成本很高。 用户可能提交大量垃圾订单,我们需要最快速地预过滤,减少进入核心流程的订单量。
| 不 Pre-Check | 有 Pre-Check |
|---|---|
| 垃圾订单直接进入核心流程 | 快速过滤大部分无效订单 |
| 核心流程处理无效订单,浪费延迟 | 核心流程只处理可能有效的订单 |
| 系统容易被刷单攻击 | 减少恶意请求的影响 |
Pre-Check 可以包含多种快速检查:
- ✅ 余额检查(当前实现)
- 📋 用户状态检查(是否被禁用)
- 📋 订单格式校验
- 📋 频率限制 (Rate Limit)
- 📋 风控规则(未来扩展)
重要:Pre-Check 是"尽力而为"的过滤器,不保证 100% 准确。通过 Pre-Check 的订单,仍可能在 WAL + Balance Lock 阶段被拒绝。
3.3 一旦持久化,必须完整执行
订单被持久化后,无论发生什么,都必须有以下其中一个结果:
┌─────────────────────┐
│ 订单已持久化 │
└─────────────────────┘
│
├──▶ 成交 (Filled)
├──▶ 部分成交 (PartialFilled)
├──▶ 挂单中 (New)
├──▶ 用户取消 (Cancelled)
├──▶ 系统过期 (Expired)
└──▶ 余额不足被拒绝 (Rejected) ← 也是合法的终态!
❌ 绝对不能:订单消失 / 状态未知
4. WAL:为什么是最佳选择?
4.1 什么是 WAL (Write-Ahead Log)?
WAL 是一种追加写 (Append-Only) 的日志结构:
┌─────────────────────────────────────────────────────────────────┐
│ WAL File │
├─────────────────────────────────────────────────────────────────┤
│ Entry 1 │ Entry 2 │ Entry 3 │ Entry 4 │ ... │ ← 追加 │
│ (seq=1) │ (seq=2) │ (seq=3) │ (seq=4) │ │ │
└─────────────────────────────────────────────────────────────────┘
↑
只追加,不修改
4.2 为什么 WAL 是 HFT 最佳实践?
| 持久化方式 | 写入模式 | 延迟 | 吞吐量 | HFT 适用性 |
|---|---|---|---|---|
| 数据库 (MySQL/Postgres) | 随机写 + 事务 | ~1-10ms | ~1K ops/s | ❌ 太慢 |
| KV 存储 (Redis/RocksDB) | 随机写 | ~0.1-1ms | ~10K ops/s | ⚠️ 一般 |
| WAL 追加写 | 顺序写 | ~1-10µs | ~1M ops/s | ✅ 最佳 |
为什么 WAL 这么快?
- 顺序写 vs 随机写:
- 机械硬盘不用寻道。
- SSD 减少写放大。
- 结果:快 10-100 倍。
- 无事务开销:
- 无需锁、redo log、binlog 等数据库复杂机制。
- 批量刷盘 (Group Commit):
- 合并多次写入一次 fsync。
5. 单线程 + Lock-Free 架构
5.1 为什么选择单线程?
大多数人直觉认为:并发 = 快。但在 HFT 领域,单线程往往更快:
| 多线程 | 单线程 |
|---|---|
| 需要锁保护共享状态 | 无锁,无竞争 |
| 缓存失效 (cache invalidation) | 缓存友好 |
| 上下文切换开销 | 无切换开销 |
| 顺序难以保证 | 天然有序 |
| 复杂的同步逻辑 | 代码简单直观 |
5.2 Mechanical Sympathy
CPU Cache Hierarchy:
- L1 Cache: ~1ns
- L2 Cache: ~4ns
- RAM: ~100ns
单线程优势:数据始终在 L1/L2 缓存中(热数据),无 cache line 争用。
5.3 LMAX Disruptor 模式
这种单线程 + Ring Buffer 的架构源自 LMAX Exchange(伦敦多资产交易所),号称能在单线程上处理 600 万订单/秒:
- Single Writer (避免写竞争)
- Pre-allocated Memory (避免 GC/malloc)
- Cache Padding (避免 false sharing)
- Batch Consumption
6. Ring Buffer:服务间通信
6.1 为什么使用 Ring Buffer?
服务间通信的选择:
| 方式 | 延迟 | 吞吐量 |
|---|---|---|
| HTTP/gRPC | ~1ms | ~10K/s |
| Kafka | ~1-10ms | ~1M/s |
| Shared Memory Ring Buffer | ~100ns | ~10M/s |
6.2 Ring Buffer 原理
write_idx read_idx
↓ ↓
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 8 │ 9 │ 10│ 11│ 12│ 13│ 14│ 15│ 0 │ 1 │ ...
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
↑ ↑
新数据写入 消费者读取
- 固定大小,循环使用
- 无需动态分配
- Single Producer, Single Consumer (SPSC) 可完全无锁
7. 整体架构
7.1 核心服务
| 服务 | 职责 | 状态 |
|---|---|---|
| Gateway | 接收客户端请求 | 无状态 |
| Pre-Check | 只读查询余额,过滤无效订单 | 无状态 |
| UBSCore | 所有余额操作 + Order WAL | 有状态 (余额) |
| ME | 纯撮合,生成 Trade Events | 有状态 (OrderBook) |
| Settlement | 持久化 events,未来写 DB | 无状态 |
7.2 UBSCore Service (User Balance Core)
UBSCore 是所有账户余额操作的唯一入口,单线程执行保证原子性。
应用场景:
- Write Order WAL (持久化)
- Lock Balance (锁定)
- Handle Trade Events (成交后结算)
7.3 Matching Engine (ME)
ME 是纯撮合引擎,不关心余额。
- 负责:维护 OrderBook,撮合,生成 Trade Events。
- 不负责:检查余额,锁定资金,持久化。
Trade Event 驱动余额更新:
TradeEvent 包含 {price, qty, user_ids},足够计算出余额变化。
7.4 Settlement Service
Settlement 负责持久化,不修改余额。
- 持久化 Trade Events,Order Events。
- 写审计日志 (Ledger)。
7.5 完整架构图
┌──────────────────────────────────────────────────────────────────────────────────┐
│ 0xInfinity HFT Architecture │
├──────────────────────────────────────────────────────────────────────────────────┤
│ Client Orders │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Gateway │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ query balance │
│ │ Pre-Check │ ──────────────────────────────▶ UBSCore Service │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Order Buffer │ │ Balance State │ │
│ └──────┬───────┘ │ (RAM, Single Thd) │ │
│ │ Ring Buffer └────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ UBSCore: Order Processing │ Operations: │
│ │ 1. Write Order WAL (持久化) │ - lock / unlock │
│ │ 2. Lock Balance │ - spend_frozen │
│ │ - OK → forward to ME │ - deposit │
│ │ - Fail → Rejected │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (valid orders) │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Matching Engine (ME) │ │
│ │ │ │
│ │ 纯撮合,不关心 Balance │ │
│ │ 输出: Trade Events │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (Trade Events) │
│ ┌───────┴────────┐ │
│ ▼ ▼ │
│ ┌───────────┐ ┌─────────────────────────┐ │
│ │ Settlement│ │ Balance Update Events │────▶ 执行余额更新 │
│ │ │ │ (from Trade Events) │ │
│ │ 持久化: │ └─────────────────────────┘ │
│ │ - Trades │ │
│ │ - Ledger │ │
│ └───────────┘ │
└───────────────────────────────────────────────────────────────────────────────────┘
7.7 Event Sourcing + Pure State Machine
Order WAL = Single Source of Truth
State(t) = Replay(Order_WAL[0..t])
只要有 Order WAL,就能恢复整个系统状态!
Pure State Machines:
- UBSCore: Order Events → Balance Events (确定性)
- ME: Valid Orders → Trade Events (确定性)
恢复流程:
- 加载最近快照 Checkpoint。
- 重放 Order WAL。
- 系统恢复到崩溃前状态。
8. Summary
核心设计:
- 先持久化:WAL 保证可恢复性。
- Pre-Check:提前过滤无效订单。
- 单线程 + 无锁:避免锁竞争,最大化吞吐。
- UBSCore:集中式、原子的余额管理。
- 职责分离:UBSCore (钱),ME (撮合),Settlement (日志)。
代码重构:
为后续章节准备,我们重构了 src 目录结构,模块化了 main.rs, core_types.rs 等。
下一步:实现 UBSCore 和 Ring Buffer。
0x08-b UBSCore Implementation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Objective: From design to implementation: Building a Safety-First Balance Core Service.
In the previous chapter (0x08-a), we designed the full HFT pipeline architecture. Now, it’s time to implement the core components. This chapter covers:
- Ring Buffer - Lock-free inter-service communication.
- Write-Ahead Log (WAL) - Order persistence.
- UBSCore Service - The core balance service.
1. Technology Selection: Safety First
In financial systems, maturity and stability outweigh extreme performance.
1.1 Ring Buffer Selection
| Crate | Maturity | Security | Performance |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (3.3M+ DLs) | Heavily Audited | Very Low Latency |
| ringbuf | 🌟🌟🌟🌟 (600K+ DLs) | Community Verified | Lower Latency |
| rtrb | 🌟🌟🌟 (Newer) | Less Vetted | Lowest Latency |
Our Choice: crossbeam-queue
Reasons:
- Maintained by Rust core team members.
- Base dependency for tokio, actix, rayon.
- If it has a bug, half the Rust ecosystem collapses.
Financial System Selection Principle: Use what lets you sleep at night.
#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;
// Create fixed-size ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);
// Producer: Non-blocking push
queue.push(order_msg).unwrap();
// Consumer: Non-blocking pop
if let Some(msg) = queue.pop() {
process(msg);
}
}
2. Write-Ahead Log (WAL)
WAL is the system’s Single Source of Truth.
2.1 Design Principles
#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// Principles:
/// 1. Append-Only: Sequential I/O, max performance.
/// 2. Group Commit: Batch fsyncs.
/// 3. Monotonic sequence_id: Deterministic replay.
pub struct WalWriter {
writer: BufWriter<File>,
next_seq: SeqNum,
pending_count: usize,
config: WalConfig,
}
}
2.2 Group Commit Strategy
| Flush Strategy | Latency | Throughput | Safety |
|---|---|---|---|
| Every Entry | ~50µs | ~20K/s | Highest |
| Every 100 Entries | ~5µs (amortized) | ~200K/s | High |
| Every 1ms | ~1µs (amortized) | ~1M/s | Medium |
We choose Every 100 Entries to balance performance and safety:
#![allow(unused)]
fn main() {
pub struct WalConfig {
pub path: String,
pub flush_interval_entries: usize, // Flush every N entries
pub sync_on_flush: bool, // Whether to call fsync
}
}
2.3 WAL Entry Format
Currently CSV (readable for dev):
seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit
In production, switch to Binary (54 bytes/entry) for better performance.
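Tying 2.1–2.3 together, the append path with group commit might look like this (a sketch using the `WalWriter` fields shown above; serialization is reduced to writing a pre-formatted CSV line, and the config is inlined as a single threshold field):

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Simplified WalWriter sketch: append-only, group commit every N entries.
struct WalWriter {
    writer: BufWriter<File>,
    next_seq: u64,
    pending_count: usize,
    flush_interval_entries: usize, // group-commit threshold
}

impl WalWriter {
    /// Append one entry and return its sequence_id.
    /// fsync happens only once per `flush_interval_entries` appends,
    /// amortizing the disk-sync cost across the batch.
    fn append(&mut self, line: &str) -> io::Result<u64> {
        let seq = self.next_seq;
        self.next_seq += 1;
        writeln!(self.writer, "{},{}", seq, line)?;
        self.pending_count += 1;
        if self.pending_count >= self.flush_interval_entries {
            self.writer.flush()?;                // drain BufWriter to the OS
            self.writer.get_ref().sync_data()?;  // one fsync for N entries
            self.pending_count = 0;
        }
        Ok(seq)
    }
}
```

With `flush_interval_entries = 100`, 100 appends share a single `sync_data` call, which is the "Every 100 Entries" row in the table above.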
3. UBSCore Service
UBSCore is the Single Entry Point for all balance operations.
3.1 Responsibilities
- Balance State Management: In-memory balance state.
- Order WAL Writing: Persist orders.
- Balance Operations: lock/unlock/spend_frozen/deposit.
3.2 Core Structure
#![allow(unused)]
fn main() {
pub struct UBSCore {
/// User Accounts - Authoritative Balance State
accounts: FxHashMap<UserId, UserAccount>,
/// Write-Ahead Log
wal: WalWriter,
/// Configuration
config: TradingConfig,
/// Pending Orders (Locked but not filled)
pending_orders: FxHashMap<OrderId, PendingOrder>,
/// Statistics
stats: UBSCoreStats,
}
}
3.3 Order Processing Flow
process_order(order):
│
├─ 1. Write to WAL ──────────► Get seq_id
│
├─ 2. Validate order ────────► Check price/qty
│
├─ 3. Get user account ──────► Lookup user
│
├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
│ Sell: qty
│
└─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
Fail → Err(Rejected)
Implementation:
#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
// Step 1: Write to WAL FIRST (persist before any state change)
let seq_id = self.wal.append(&order)?;
// Step 2-4: Validate and calculate
// ...
// Step 5: Lock balance
let lock_result = account
.get_balance_mut(locked_asset_id)
.and_then(|balance| balance.lock(locked_amount));
match lock_result {
Ok(()) => {
// Track pending order
self.pending_orders.insert(order.id, PendingOrder { ... });
Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
}
Err(_) => Err(OrderEvent::Rejected { ... })
}
}
}
3.4 Settlement
#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
let trade = &event.trade;
let quote_amount = trade.price * trade.qty / self.config.qty_unit();
// Buyer: spend USDT, receive BTC
buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;
// Seller: spend BTC, receive USDT
seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;
Ok(())
}
}
4. Message Types
Services communicate via defined message types:
#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
pub seq_id: SeqNum,
pub order: Order,
// ...
}
// UBSCore → ME
pub struct ValidOrder {
pub seq_id: SeqNum,
pub order: Order,
pub locked_amount: u64,
// ...
}
// ME → UBSCore + Settlement
pub struct TradeEvent {
pub trade: Trade,
pub taker_order_id: OrderId,
pub maker_order_id: OrderId,
// ...
}
}
5. Integration & Usage
5.1 CLI Arguments
# Original Pipeline
cargo run --release
# UBSCore Pipeline (Enable WAL)
cargo run --release -- --ubscore
5.2 Performance Comparison
| Metric | Original | UBSCore | Change |
|---|---|---|---|
| Throughput | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL Entries | N/A | 100,000 | 6.67 MB |
| Balance Check | 0.3% | 1.3% | +1% |
| Matching | 45.5% | 45.5% | - |
| Settlement | 0.1% | 0.2% | - |
| Ledger I/O | 54.0% | 53.0% | -1% |
Analysis:
- WAL introduces ~5% overhead.
- Acceptable cost for safety.
- Main bottleneck remains Ledger I/O.
6. Tests
6.1 Unit Tests
cargo test
# 31 tests passing
6.2 E2E Tests
sh scripts/test_e2e.sh
# ✅ All tests passed!
7. New Files
| File | Lines | Description |
|---|---|---|
| src/messages.rs | 265 | Inter-service messages |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |
8. Key Learnings
8.1 Safety First
- Maturity > Performance
- Auditable > Rapid Dev
8.2 WAL is Single Source of Truth
All state = f(WAL). Foundation for Disaster Recovery and Audit.
8.3 Single Thread Advantage
UBSCore uses single thread for natural atomicity (no locking needed for balance ops) and predictable latency.
9. Critical Bug Fix: Cost Calculation Overflow
9.1 The Issue
Testing with --ubscore revealed 1032 rejected orders that were accepted in the legacy mode.
9.2 Root Cause
Overflow in price * qty (u64).
Example Order #21:
- Price: 84,956.01 USDT (6 decimals) -> 84,956,010,000
- Qty: 2.56 BTC (8 decimals) -> 256,284,400
- Product: 2.177 × 10^19 > u64::MAX
9.3 Why Legacy Mode Passed?
Release builds disable overflow checks, so arithmetic wraps silently:
In legacy mode, cost = price * qty wrapped around, producing a much smaller, incorrect value. Users were locked for ~33k USDT but bought ~217k USDT worth of BTC!
9.4 The Fix
#![allow(unused)]
fn main() {
// Use u128 for the intermediate calculation to avoid u64 overflow
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
Ok(cost_128 as u64)
}
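The numbers from 9.2 can be checked directly with a pair of helper functions (a sketch; `checked_cost` / `wrapped_cost` are illustrative names contrasting the fixed and the legacy paths):

```rust
/// Checked cost, mirroring the fix above: widen to u128,
/// then make sure the result still fits in u64.
fn checked_cost(price: u64, qty: u64, qty_unit: u64) -> Option<u64> {
    let cost = (price as u128) * (qty as u128) / (qty_unit as u128);
    if cost > u64::MAX as u128 { None } else { Some(cost as u64) }
}

/// The legacy path: u64 multiplication wraps in release builds.
fn wrapped_cost(price: u64, qty: u64, qty_unit: u64) -> u64 {
    price.wrapping_mul(qty) / qty_unit
}
```

With the order from 9.2 (`price = 84_956_010_000`, `qty = 256_284_400`, `qty_unit = 10^8`), `wrapped_cost` yields roughly 33k USDT while `checked_cost` recovers the true ~217.7k USDT cost.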
9.5 Configuration Issue
USDT with 6 decimals is risky. Recommended: 2 decimals. Binance uses 2 decimals for USDT price.
10. Improvement: Ledger Integrity & Determinism
10.1 Incomplete Ledger
Current Ledger lacks Deposit, Lock, Unlock, SpendFrozen. Only tracks Settlement.
10.2 Pipeline Non-Determinism
Pipeline concurrency means Lock and Settlement events interleave non-deterministically.
Snapshot comparison is impossible.
10.3 Solution: Version Space Separation
Separate version counters for Lock events and Settle events.
| Version Space | Increment On | Sort By | Determinism |
|---|---|---|---|
| lock_version | Lock/Unlock | order_seq_id | ✅ Deterministic |
| settle_version | Settle | trade_id | ✅ Deterministic |
Validation Strategy: Verify the Final Set of events, sorted by their respective versions/source IDs, rather than checking snapshot consistency at arbitrary times.
11. Design Discussion: Causal Chain
UBSCore has inputs from OrderQueue and TradeQueue. Interleaving is random.
Solution:
- OrderQueue strictly follows order_seq_id.
- TradeQueue strictly follows trade_id.
- Link every Balance Event to its source (order_seq_id or trade_id).
- This forms a Causal Chain for audit.
#![allow(unused)]
fn main() {
struct BalanceEvent {
// ...
source_type: SourceType, // Order | Trade
source_id: u64, // order_seq_id | trade_id
}
}
This allows offline verification:
Lock(source=Order N) must exist if Order N exists.
Settle(source=Trade M) must exist if Trade M exists.
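The offline verification rule can be sketched as a set-coverage check over the causally-linked events (hypothetical types extending the `BalanceEvent` sketch above; the real audit tool would stream events rather than hold them in memory):

```rust
use std::collections::HashSet;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
enum SourceType { Order, Trade }

struct BalanceEvent {
    source_type: SourceType,
    source_id: u64, // order_seq_id | trade_id
}

/// Offline audit: every order_seq_id must be covered by an Order-sourced
/// balance event (Lock), every trade_id by a Trade-sourced event (Settle).
fn verify_causal_chain(
    order_seq_ids: &[u64],
    trade_ids: &[u64],
    events: &[BalanceEvent],
) -> bool {
    let covered: HashSet<(SourceType, u64)> = events
        .iter()
        .map(|e| (e.source_type, e.source_id))
        .collect();
    order_seq_ids.iter().all(|id| covered.contains(&(SourceType::Order, *id)))
        && trade_ids.iter().all(|id| covered.contains(&(SourceType::Trade, *id)))
}
```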
12. Next Steps (0x08-c)
- Implement Version Space Separation.
- Expand BalanceEvent with causal links.
- Integrate the Ring Buffer.
- Develop Causal Chain audit tools.
🇨🇳 中文
📦 Code Changes: View Diff
From Design to Implementation: Building a Safety-First Balance Core
Overview
In the previous chapter (0x08-a) we designed the complete HFT trading pipeline architecture. Now it is time to implement the core components. In this chapter we build:
- Ring Buffer - lock-free inter-service communication
- Write-Ahead Log (WAL) - order persistence
- UBSCore Service - the balance core service
1. Technology Selection: Safety First
In a financial system, maturity and stability matter more than peak performance.
1.1 Ring Buffer Selection
| Library | Maturity | Safety | Performance |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (3.3M+ downloads) | Most heavily audited | Very low latency |
| ringbuf | 🌟🌟🌟🌟 (600k+ downloads) | Community-validated | Lower latency |
| rtrb | 🌟🌟🌟 (newer) | Less scrutiny | Lowest latency |
Our choice: crossbeam-queue
Reasons:
- Maintained with involvement from Rust core team members
- A foundational dependency of tokio, actix, and rayon
- If it had a bug, half the Rust ecosystem would break
Selection principle for financial systems: pick what lets you sleep at night.
#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;
// Create a fixed-capacity ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);
// Producer: non-blocking push
queue.push(order_msg).unwrap();
// Consumer: non-blocking pop
if let Some(msg) = queue.pop() {
    process(msg);
}
}
2. Write-Ahead Log (WAL)
The WAL is the system's Single Source of Truth.
2.1 Design Principles
#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// Design principles:
/// 1. Append-Only - sequential I/O for maximum performance
/// 2. Group Commit - batch flushes to reduce fsync calls
/// 3. Monotonically increasing sequence_id - guarantees deterministic replay
pub struct WalWriter {
    writer: BufWriter<File>,
    next_seq: SeqNum,
    pending_count: usize,
    config: WalConfig,
}
}
2.2 Group Commit Strategy
| Flush Strategy | Latency | Throughput | Data Safety |
|---|---|---|---|
| fsync per entry | ~50µs | ~20K/s | Highest |
| Every 100 entries | ~5µs (amortized) | ~200K/s | High |
| Every 1ms | ~1µs (amortized) | ~1M/s | Medium |
We flush every 100 entries, balancing performance against safety:
#![allow(unused)]
fn main() {
pub struct WalConfig {
    pub path: String,
    pub flush_interval_entries: usize, // flush every N entries
    pub sync_on_flush: bool,           // whether to call fsync
}
}
2.3 WAL Entry Format
Currently CSV (human-readable during development):
seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit
Production can switch to a binary format (54 bytes/entry) for higher performance.
3. UBSCore Service
UBSCore is the single entry point for all balance operations.
3.1 Responsibilities
- Balance State Management - in-memory balance state
- Order WAL Writing - order persistence
- Balance Operations - lock/unlock/spend_frozen/deposit
3.2 Core Structure
#![allow(unused)]
fn main() {
pub struct UBSCore {
    /// User accounts - the authoritative balance state
    accounts: FxHashMap<UserId, UserAccount>,
    /// Write-Ahead Log
    wal: WalWriter,
    /// Trading configuration
    config: TradingConfig,
    /// Pending orders (locked but not yet filled)
    pending_orders: FxHashMap<OrderId, PendingOrder>,
    /// Statistics
    stats: UBSCoreStats,
}
}
3.3 Order Processing Flow
process_order(order):
│
├─ 1. Write to WAL ──────────► obtain seq_id
│
├─ 2. Validate order ────────► price/qty checks
│
├─ 3. Get user account ──────► look up the user
│
├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
│                              Sell: qty
│
└─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
                               Fail → Err(Rejected)
Implementation:
#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
// Step 1: Write to WAL FIRST (persist before any state change)
let seq_id = self.wal.append(&order)?;
// Step 2-4: Validate and calculate
// ...
// Step 5: Lock balance
let lock_result = account
.get_balance_mut(locked_asset_id)
.and_then(|balance| balance.lock(locked_amount));
match lock_result {
Ok(()) => {
// Track pending order
self.pending_orders.insert(order.id, PendingOrder { ... });
Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
}
Err(_) => Err(OrderEvent::Rejected { ... })
}
}
}
3.4 Trade Settlement
#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
let trade = &event.trade;
let quote_amount = trade.price * trade.qty / self.config.qty_unit();
// Buyer: spend USDT, receive BTC
buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;
// Seller: spend BTC, receive USDT
seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;
Ok(())
}
}
4. Message Types
Services communicate through well-defined message types:
#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
pub seq_id: SeqNum,
pub order: Order,
// ...
}
// UBSCore → ME
pub struct ValidOrder {
pub seq_id: SeqNum,
pub order: Order,
pub locked_amount: u64,
// ...
}
// ME → UBSCore + Settlement
pub struct TradeEvent {
pub trade: Trade,
pub taker_order_id: OrderId,
pub maker_order_id: OrderId,
// ...
}
}
5. Integration & Usage
5.1 Command-Line Flags
# Original pipeline
cargo run --release
# UBSCore pipeline (WAL enabled)
cargo run --release -- --ubscore
5.2 Performance Comparison
| Metric | Original | UBSCore | Change |
|---|---|---|---|
| Throughput | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL entries | N/A | 100,000 | 6.67 MB |
| Balance checks | 0.3% | 1.3% | +1% |
| Matching engine | 45.5% | 45.5% | - |
| Settlement | 0.1% | 0.2% | - |
| Ledger I/O | 54.0% | 53.0% | -1% |
Analysis:
- WAL writing introduces roughly 5% overhead
- An acceptable price for data safety
- The main bottleneck remains Ledger I/O (the next chapter's optimization target)
6. Testing
6.1 Unit Tests
cargo test
# 31 tests passing
6.2 E2E Tests
sh scripts/test_e2e.sh
# ✅ All tests passed!
7. New Files
| File | Lines | Description |
|---|---|---|
| src/messages.rs | 265 | Inter-service message types |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |
8. Key Takeaways
8.1 Safety First
- Maturity and stability > peak performance
- Auditability > development speed
- "Can you sleep at night with it?" is the ultimate selection criterion
8.2 The WAL Is the Single Source of Truth
All state = f(WAL). At any moment the system state can be rebuilt 100% from the WAL. This is also the foundation of disaster recovery and audit compliance.
8.3 Single-Threaded Is a Strength
UBSCore is single-threaded not because it is simpler, but because it gives:
- Natural atomicity (no locks)
- No possibility of double-spending
- Predictable latency
9. Critical Bug Fix: Cost Calculation Overflow
9.1 Discovery
While testing the new --ubscore mode after implementing UBSCore, 1,032 orders were rejected that legacy mode accepted in full.
9.2 Root Cause
price * qty overflows u64 during cost calculation.
Order #21:
- price = 84,956,010,000 (84956.01 USDT, 6-decimal precision)
- qty = 256,284,400 (2.562844 BTC, 8-decimal precision)
- price * qty = 2.177 × 10^19 > u64::MAX
9.3 Why Didn't Legacy Mode Fail?
Wrapping arithmetic in release builds! In legacy mode the overflowed value wrapped to a much smaller number: the check passed, but the locked amount was far too small. A massive financial vulnerability.
9.4 The Fix
#![allow(unused)]
fn main() {
// Use u128 for the intermediate calculation
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
let cost = cost_128 as u64;
}
9.5 Configuration Issue: USDT Precision Too High
Quoting USDT at 6-decimal precision creates overflow risk. Recommended: 2 decimals (the Binance standard).
10. To Improve: Ledger Integrity & Determinism
10.1 The Current Ledger Is Incomplete
The current Ledger is missing Deposit, Lock, Unlock, SpendFrozen, and related operations.
10.2 Determinism in Pipeline Mode
Because the Ring Buffer stages run in parallel, the interleaving of Lock and Settle events is not fixed, so consistency cannot be verified by snapshot comparison.
10.3 Solution: Separate Version Spaces
Maintain an independent version per event type:
| Version Space | Increment On | Sort By | Determinism |
|---|---|---|---|
| lock_version | Lock/Unlock events | order_seq_id | ✅ Deterministic |
| settle_version | Settle events | trade_id | ✅ Deterministic |
Validation strategy: instead of checking snapshots at arbitrary times, verify the final event set after processing completes, sorted within each version space.
11. Full Design Discussion
11.1 Causal Chain Design
UBSCore has two input sources: OrderQueue and TradeQueue. For auditability we established a causal chain:
#![allow(unused)]
fn main() {
struct BalanceEvent {
    // ...
    source_type: SourceType, // Order | Trade
    source_id: u64,          // order_seq_id | trade_id
}
}
This not only solves the audit problem but also lets us trace issues to their source quickly: a Lock always corresponds to an Order, and a Settle always corresponds to a Trade.
12. Next Chapter (0x08-c)
- Implement separate version spaces - lock_version / settle_version
- Expand BalanceEvent - add event_type, version, source_id
- Ring Buffer integration
- Causal chain audit tools
0x08-c Complete Event Flow & Verification
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement a complete Event Sourcing architecture, verify equivalence with the legacy version, and upgrade the baseline.
Problems Identified
In the previous chapter (0x08-b), we implemented the UBSCore service but identified several issues:
1. Incomplete Ledger
The current Ledger only records settlement operations (Credit/Debit), missing other critical balance changes:
| Operation | Current Record | Production Req |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ❌ | ✅ |
2. Pipeline Determinism Issue
With a multi-stage Ring Buffer pipeline, the interleaving order of Lock and Settle events is non-deterministic:
Run 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
Run 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]
Result: Final state is identical, but the intermediate version sequence differs. Direct diff verification fails.
Objectives
1. Implement Separate Version Spaces
#![allow(unused)]
fn main() {
struct Balance {
avail: u64,
frozen: u64,
lock_version: u64, // Increments only on lock/unlock
settle_version: u64, // Increments only on settle
}
}
2. Expand BalanceEvent
#![allow(unused)]
fn main() {
struct BalanceEvent {
user_id: u64,
asset_id: u32,
event_type: EventType, // Deposit | Lock | Unlock | Settle
version: u64, // Increments within its own version space
source_type: SourceType, // Order | Trade | External
source_id: u64, // order_seq_id | trade_id | ref_id
delta: i64,
avail_after: u64,
frozen_after: u64,
}
}
3. Record ALL Balance Operations
Order(seq=5) ──Trigger──→ Lock(buyer USDT, lock_version=1)
│
└──→ Trade(id=3)
│
├──Trigger──→ Settle(buyer: -USDT, +BTC, settle_version=1)
└──Trigger──→ Settle(seller: -BTC, +USDT, settle_version=1)
4. Verify Equivalence & Upgrade Baseline
Ensure the refactored system produces the exact same final state as the pre-refactor version.
Implementation Progress
Phase 1: Separate Version Spaces ✅ Done
Goal: Solve Pipeline Determinism.
1.1 Modify Balance Struct
#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
avail: u64,
frozen: u64,
lock_version: u64, // lock/unlock/deposit/withdraw
settle_version: u64, // spend_frozen/deposit
}
}
1.2 Version Increment Logic
| Operation | Version Incremented |
|---|---|
| deposit() | lock_version AND settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |
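The increment table above can be sketched as a std-only Balance (names follow the chapter; the bookkeeping is simplified, with no overflow handling or event emission):

```rust
// Minimal sketch of the dual version counters.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,
    settle_version: u64,
}

impl Balance {
    fn deposit(&mut self, amt: u64) {
        self.avail += amt;
        self.lock_version += 1;   // deposit bumps BOTH spaces
        self.settle_version += 1;
    }
    fn lock(&mut self, amt: u64) -> Result<(), &'static str> {
        if self.avail < amt { return Err("insufficient avail"); }
        self.avail -= amt;
        self.frozen += amt;
        self.lock_version += 1;
        Ok(())
    }
    fn spend_frozen(&mut self, amt: u64) -> Result<(), &'static str> {
        if self.frozen < amt { return Err("insufficient frozen"); }
        self.frozen -= amt;
        self.settle_version += 1;
        Ok(())
    }
}

fn main() {
    let mut b = Balance::default();
    b.deposit(100);
    b.lock(40).unwrap();
    b.spend_frozen(40).unwrap();
    assert_eq!((b.avail, b.frozen), (60, 0));
    assert_eq!(b.lock_version, 2);   // deposit + lock
    assert_eq!(b.settle_version, 2); // deposit + spend_frozen
}
```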
1.3 Equivalence Verification ✅
Script: scripts/verify_baseline_equivalence.py
$ python3 scripts/verify_baseline_equivalence.py
╔════════════════════════════════════════════════════════════╗
║ Baseline Equivalence Verification ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL
Phase 2: Expand BalanceEvent ✅ Done
Goal: Full Event Sourcing.
2.1 Event Types & Structure
Implemented in src/messages.rs:
#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }
pub struct BalanceEvent {
pub user_id: u64,
pub asset_id: u32,
pub event_type: BalanceEventType,
pub version: u64,
pub source_type: SourceType,
pub source_id: u64,
pub delta: i64,
// ...
}
}
Phase 3: Record All Operations in Ledger ✅ Done
Goal: Every balance change is recorded.
3.1 Event Log File
UBSCore mode generates output/t2_events.csv:
user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0
3.2 Recorded Operations
| Operation | Status | Note |
|---|---|---|
| Deposit | ✅ | Recorded on init |
| Lock | ✅ | Recorded on order lock |
| Settle | ✅ | Recorded on trade settle |
| Unlock | ⏳ | (No cancel in current test) |
| Withdraw | ⏳ | (No withdraw in current test) |
3.3 Event Stats
Total events: 293,544
Deposit events: 2,000
Lock events: 100,000
Settle events: 191,544
Phase 4: Validation Tests ✅ Done
Goal: Verify Event Correctness.
4.1 Event Correctness Verification
scripts/verify_balance_events.py - 7 Checks:
| Check | Description | Status |
|---|---|---|
| Lock Count | = Accepted Orders | ✅ |
| Settle Count | = Trades × 4 | ✅ |
| Lock Version Continuity | Incremental per User-Asset | ✅ |
| Settle Version Continuity | Incremental per User-Asset | ✅ |
| Delta Conservation | Sum of deltas per trade = 0 | ✅ |
| Source Consistency | Lock→Order, Settle→Trade | ✅ |
| Deposit Correctness | Positive delta + source=external | ✅ |
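The real checks live in the Python scripts; as an illustration, the "Delta Conservation" rule can be sketched in Rust (hypothetical `SettleEvent` type reduced to the fields the check needs):

```rust
use std::collections::HashMap;

// One settle row from the event log.
struct SettleEvent { trade_id: u64, asset_id: u32, delta: i64 }

// For each (trade, asset) pair, the settle deltas must net to zero:
// buyer -quote / +base, seller -base / +quote.
fn deltas_conserved(events: &[SettleEvent]) -> bool {
    let mut sums: HashMap<(u64, u32), i64> = HashMap::new();
    for e in events {
        *sums.entry((e.trade_id, e.asset_id)).or_insert(0) += e.delta;
    }
    sums.values().all(|&s| s == 0)
}

fn main() {
    let trade = vec![
        SettleEvent { trade_id: 1, asset_id: 2, delta: -92_889 }, // buyer pays quote
        SettleEvent { trade_id: 1, asset_id: 2, delta: 92_889 },  // seller receives quote
        SettleEvent { trade_id: 1, asset_id: 1, delta: 500 },     // buyer receives base
        SettleEvent { trade_id: 1, asset_id: 1, delta: -500 },    // seller pays base
    ];
    assert!(deltas_conserved(&trade));
}
```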
4.2 Events Baseline Verification
scripts/verify_events_baseline.py:
$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
deposit: output=2000, baseline=2000 ✅
lock: output=100000, baseline=100000 ✅
settle: output=191544, baseline=191544 ✅
╔════════════════════════════════════════════════════════════╗
║ ✅ Events match baseline! ║
╚════════════════════════════════════════════════════════════╝
4.3 Full E2E Test
Run scripts/test_ubscore_e2e.sh:
$ bash scripts/test_ubscore_e2e.sh
=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
✅ All MATCH
=== Step 3: Verify balance events correctness ===
✅ All 7 checks passed!
=== Step 4: Verify events baseline ===
✅ Events match baseline!
Baseline Files
| File | Description |
|---|---|
| baseline/t2_balances_final.csv | Final Balance State |
| baseline/t2_orderbook.csv | Final OrderBook State |
| baseline/t2_events.csv | Event Log (293,544 events) |
Next Steps
- 0x08-d: Multi-threaded Pipeline: Implement Ring Buffer to connect services.
- 0x09: Multi-Symbol Support: Scale to multiple trading pairs.
References
- Event Sourcing - the event sourcing pattern
- LMAX Disruptor - the Ring Buffer architecture prototype
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: implement a complete Event Sourcing architecture, verify equivalence with the legacy version, and upgrade the baseline.
Problems in This Chapter
In the previous chapter (0x08-b) we implemented the UBSCore service but found several issues:
1. Incomplete Ledger
The current Ledger only records settlement operations (Credit/Debit), missing other balance changes:
| Operation | Currently Recorded | Production Requirement |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ❌ | ✅ |
2. Pipeline Determinism
With a multi-stage Ring Buffer pipeline, the interleaving of Lock and Settle is non-deterministic:
Run 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
Run 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]
The final state is identical, but the intermediate version sequence differs → direct diff verification is impossible.
Chapter Goals
1. Implement Separate Version Spaces
#![allow(unused)]
fn main() {
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,   // increments only on lock/unlock
    settle_version: u64, // increments only on settle
}
}
2. Expand BalanceEvent
#![allow(unused)]
fn main() {
struct BalanceEvent {
    user_id: u64,
    asset_id: u32,
    event_type: EventType,   // Deposit | Lock | Unlock | Settle
    version: u64,            // increments within its own version space
    source_type: SourceType, // Order | Trade | External
    source_id: u64,          // order_seq_id | trade_id | ref_id
    delta: i64,
    avail_after: u64,
    frozen_after: u64,
}
}
3. Record All Balance Operations
Order(seq=5) ──triggers──→ Lock(buyer USDT, lock_version=1)
│
└──→ Trade(id=3)
     │
     ├──triggers──→ Settle(buyer: -USDT, +BTC, settle_version=1)
     └──triggers──→ Settle(seller: -BTC, +USDT, settle_version=1)
4. Verify Equivalence and Upgrade the Baseline
Ensure the refactored system produces the same final state as the pre-refactor version.
Implementation Progress
Phase 1: Separate Version Spaces ✅ Done
Goal: solve the pipeline determinism problem.
1.1 Modify the Balance Struct
#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,   // incremented by lock/unlock/deposit/withdraw
    settle_version: u64, // incremented by spend_frozen/deposit
}
}
1.2 Version Increment Logic
| Operation | Version Incremented |
|---|---|
| deposit() | lock_version AND settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |
1.3 Equivalence Verification ✅
Verification script: scripts/verify_baseline_equivalence.py
$ python3 scripts/verify_baseline_equivalence.py
╔════════════════════════════════════════════════════════════╗
║           Baseline Equivalence Verification                ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL
Phase 2: Expand BalanceEvent ✅ Done
Goal: full event sourcing.
2.1 Event Types and Structure
Implemented in src/messages.rs:
#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }
pub struct BalanceEvent {
    pub user_id: u64,
    pub asset_id: u32,
    pub event_type: BalanceEventType,
    pub version: u64,
    pub source_type: SourceType,
    pub source_id: u64,
    pub delta: i64,
    // ...
}
}
Phase 3: Record All Operations in the Ledger ✅ Done
Goal: every balance change is recorded.
3.1 Event Log File
UBSCore mode generates output/t2_events.csv:
user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0
3.2 Operations Currently Recorded
| Operation | Status | Note |
|---|---|---|
| Deposit | ✅ | Recorded on initial deposit |
| Lock | ✅ | Recorded when an order locks funds |
| Settle | ✅ | Recorded on trade settlement |
| Unlock | ⏳ | Recorded on order cancel (no cancels in the current test) |
| Withdraw | ⏳ | Recorded on withdrawal (no withdrawals in the current test) |
3.3 Event Stats
Total events: 293,544
Deposit events: 2,000
Lock events: 100,000
Settle events: 191,544
Phase 4: Validation Tests ✅ Done
Goal: verify event correctness.
4.1 Event Correctness Verification
scripts/verify_balance_events.py - 7 checks:
| Check | Description | Status |
|---|---|---|
| Lock event count | = accepted orders | ✅ |
| Settle event count | = trades × 4 | ✅ |
| Lock version continuity | Incremental per user-asset pair | ✅ |
| Settle version continuity | Incremental per user-asset pair | ✅ |
| Delta conservation | Sum of deltas per trade = 0 | ✅ |
| Source consistency | Lock→Order, Settle→Trade | ✅ |
| Deposit events | Positive delta + source_type=external | ✅ |
4.2 Events Baseline Verification
scripts/verify_events_baseline.py:
$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
deposit: output=2000, baseline=2000 ✅
lock: output=100000, baseline=100000 ✅
settle: output=191544, baseline=191544 ✅
╔════════════════════════════════════════════════════════════╗
║              ✅ Events match baseline!                     ║
╚════════════════════════════════════════════════════════════╝
4.3 Full E2E Test
Run scripts/test_ubscore_e2e.sh:
$ bash scripts/test_ubscore_e2e.sh
=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
✅ All MATCH
=== Step 3: Verify balance events correctness ===
✅ All 7 checks passed!
=== Step 4: Verify events baseline ===
✅ Events match baseline!
Baseline Files
| File | Description |
|---|---|
| baseline/t2_balances_final.csv | Final balance state |
| baseline/t2_orderbook.csv | Final order book state |
| baseline/t2_events.csv | Event log (293,544 events) |
Next Steps
- 0x08-d: Multi-threaded Pipeline - connect the services with Ring Buffers
- 0x09: Multi-Symbol Support - scale to multiple trading pairs
References
- Event Sourcing - the event sourcing pattern
- LMAX Disruptor - the Ring Buffer architecture prototype
0x08-d Complete Order Lifecycle & Cancel Optimization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement full order lifecycle management (including Cancel and Refund), design a dual-track testing framework, and analyze performance bottlenecks.
1. Feature Implementation Overview
In this chapter, we completed the following core features to equip the trading engine with full order processing capabilities:
1.1 Order Events & State Management
Implemented complete OrderEvent enum and CSV logging.
OrderStatus (src/models.rs): follows Binance-style SCREAMING_SNAKE_CASE.
#![allow(unused)]
fn main() {
pub enum OrderStatus {
NEW, // Booked
PARTIALLY_FILLED,
FILLED,
CANCELED, // User Cancelled
REJECTED, // Risk Check Failed
EXPIRED, // System Expired
}
}
OrderEvent (src/messages.rs): Used for Event Sourcing and Audit Logs.
| Event Type | Trigger | Fund Operation |
|---|---|---|
| Accepted | Passed risk check | Lock |
| Rejected | Insufficient balance / bad params | None |
| Filled | Fully filled | Settle |
| PartialFilled | Partially filled | Settle |
| Cancelled | User cancel | Unlock (refund remaining) |
| Expired | System expired | Unlock |
CSV Log Format (output/t2_order_events.csv):
event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
accepted,1,100,101,,,,
rejected,3,102,103,,,,insufficient_balance
partial_filled,1,100,,5000,1000,,
filled,1,100,,0,,85000,
cancelled,5,100,,,2000,,
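As an illustration, two of these rows can be produced by a small serializer sketch (hypothetical helper, not the repository's actual logger; unused columns stay empty, matching the samples above):

```rust
// Serialize an event into the column order:
// event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
enum OrderEvent {
    Accepted { order_id: u64, user_id: u64, seq_id: u64 },
    Cancelled { order_id: u64, user_id: u64, remaining_qty: u64 },
}

fn to_csv_row(e: &OrderEvent) -> String {
    match e {
        OrderEvent::Accepted { order_id, user_id, seq_id } =>
            format!("accepted,{order_id},{user_id},{seq_id},,,,"),
        OrderEvent::Cancelled { order_id, user_id, remaining_qty } =>
            format!("cancelled,{order_id},{user_id},,,{remaining_qty},,"),
    }
}

fn main() {
    let e = OrderEvent::Cancelled { order_id: 5, user_id: 100, remaining_qty: 2000 };
    assert_eq!(to_csv_row(&e), "cancelled,5,100,,,2000,,");
}
```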
1.2 Cancel Workflow
- Parsing: scripts/csv_io.rs supports action=cancel.
- Removal: MatchingEngine calls OrderBook::remove_order_by_id.
- Unlock: UBSCore generates an Unlock event to refund frozen funds.
- Logging: record a Cancelled event.
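The unlock step can be sketched with simplified stand-ins for the chapter's types (a pending order remembers how much is still frozen; a balance is an `(avail, frozen)` pair):

```rust
use std::collections::HashMap;

struct PendingOrder { user_id: u64, locked_remaining: u64 }

// Cancel path sketch: drop the pending order, then unlock (frozen -> avail).
fn cancel(
    pending: &mut HashMap<u64, PendingOrder>,
    balances: &mut HashMap<u64, (u64, u64)>,
    order_id: u64,
) -> Result<u64, &'static str> {
    let p = pending.remove(&order_id).ok_or("unknown order")?;
    let bal = balances.get_mut(&p.user_id).ok_or("unknown user")?;
    bal.1 = bal.1.checked_sub(p.locked_remaining).ok_or("frozen underflow")?;
    bal.0 += p.locked_remaining;
    Ok(p.locked_remaining) // amount refunded, for the Cancelled event log
}

fn main() {
    let mut pending =
        HashMap::from([(5u64, PendingOrder { user_id: 100, locked_remaining: 2000 })]);
    let mut balances = HashMap::from([(100u64, (8000u64, 2000u64))]);
    assert_eq!(cancel(&mut pending, &mut balances, 5).unwrap(), 2000);
    assert_eq!(balances[&100], (10_000, 0)); // frozen fully returned to avail
}
```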
2. Dual-Track Testing Framework
To guarantee baseline stability while adding new features:
2.1 Regression Baseline
- Dataset: fixtures/orders.csv (100k orders, Place only).
- Script: scripts/test_e2e.sh
- Goal: ensure no performance regression for legacy flows.
2.2 Feature Testing
- Dataset: fixtures/test_with_cancel/orders.csv (1M orders, 30% Cancel).
- Script: scripts/test_cancel.sh
- Goal: verify lifecycle closure (Lock = Settle + Unlock).
3. Major Performance Issue
When scaling Cancel tests from 1,000 to 1,000,000 orders, we hit a severe performance wall.
3.1 Symptoms
- Baseline (100k Place): ~3 seconds.
- Cancel Test (1M Place+Cancel): > 7 minutes (430s).
- Bottleneck: the Matching Engine consumes 98% of CPU time.
3.2 Root Cause Analysis
The culprit is OrderBook::remove_order_by_id:
#![allow(unused)]
fn main() {
// src/orderbook.rs
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
// Scan ALL price levels -> Scan ALL orders in level
for (key, orders) in self.bids.iter_mut() {
if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
// ...
}
}
// Scan asks...
}
}
- Complexity: O(N).
- Worst Case: With 500k orders piled up in the book, executing 300k cancels means 150 billion comparisons.
3.3 Solution (Next Step)
Introduce Order Index:
- Structure: HashMap<OrderId, (Price, Side)>.
- Complexity: reduces Cancel lookup from O(N) to O(1).
4. Verification Scripts
- verify_balance_events.py:
  - Added Check 8: verify Frozen Balance history consistency.
  - Verify Unlock events correctly release funds.
- verify_order_events.py:
  - Verify every Accepted order has a final state.
  - Verify Cancelled orders correspond to existing Accepted orders.
5. Summary
We implemented full order lifecycle management and established a rigorous testing framework. Crucially, mass stress testing exposed a Big O algorithm defect in the cancel logic, setting the stage for the next optimization iteration.
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: implement full order lifecycle management (including cancel and refund), design a dual-track testing framework, and analyze the performance bottleneck this introduced.
1. Feature Implementation Overview
In this chapter we completed the following core features, giving the trading engine full order-processing capability:
1.1 Order Events & State Management
Implemented the complete OrderEvent enum and CSV logging.
OrderStatus (src/models.rs): note the Binance-style SCREAMING_SNAKE_CASE.
#![allow(unused)]
fn main() {
pub enum OrderStatus {
    NEW,              // Resting in the book
    PARTIALLY_FILLED, // Partially filled
    FILLED,           // Fully filled
    CANCELED,         // Cancelled by the user (note the spelling: CANCELED)
    REJECTED,         // Rejected by risk checks
    EXPIRED,          // Expired by the system
}
}
OrderEvent (src/messages.rs): used for Event Sourcing and audit logs.
| Event Type | Trigger | Fund Operation |
|---|---|---|
| Accepted | Order passed risk checks and entered matching | Lock (freeze) |
| Rejected | Insufficient balance or bad parameters | None |
| Filled | Fully filled | Settle |
| PartialFilled | Partially filled | Settle |
| Cancelled | User cancel (note the spelling: Cancelled) | Unlock (refund remaining frozen funds) |
| Expired | System expiry | Unlock (unfreeze) |
CSV log format (output/t2_order_events.csv): the column order implemented in code is:
event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
accepted,1,100,101,,,,
rejected,3,102,103,,,,insufficient_balance
partial_filled,1,100,,5000,1000,,
filled,1,100,,0,,85000,
cancelled,5,100,,,2000,,
1.2 Cancel Workflow
Implemented the processing flow for the cancel action:
- Input parsing: scripts/csv_io.rs supports both the old and new CSV formats.
  - New format: order_id,user_id,action,side,price,qty (supports action=cancel).
- Matching removal: MatchingEngine calls OrderBook::remove_order_by_id to remove the order.
- Fund unlock: UBSCore emits an Unlock event, refunding the frozen funds.
- Event logging: record a Cancelled event.
2. Dual-Track Testing Framework
To add new features without breaking the existing baseline, we designed a dual-track testing strategy:
2.1 Regression Baseline
- Dataset: fixtures/orders.csv (100k orders, Place only).
- Script: scripts/test_e2e.sh
- Purpose: ensure legacy matching performance does not regress; verify core correctness.
- Principle: keep the baseline stable (change it only for format upgrades or major adjustments).
2.2 Feature Testing
- Dataset: fixtures/test_with_cancel/orders.csv (1M orders, 30% Cancel).
- Script: scripts/test_cancel.sh
- Verification:
  - verify_balance_events.py: verify fund conservation (Lock = Settle + Unlock).
  - verify_order_events.py: verify order lifecycle closure.
3. Major Performance Issue
When scaling the cancel test from 1,000 to 1,000,000 orders, we hit a severe performance collapse.
3.1 Symptoms
- Baseline (100k Place): ~3 seconds.
- Cancel test (1M Place+Cancel): over 7 minutes (430s).
- Bottleneck: the Matching Engine accounts for 98% of the time.
3.2 Root Cause Analysis
Code review pinpointed the bottleneck in OrderBook::remove_order_by_id:
#![allow(unused)]
fn main() {
// src/orderbook.rs
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // Walk every price level of the bid book --> walk every order in each level
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Then walk the ask book...
}
}
- Complexity: O(N), where N is the total number of orders in the OrderBook.
- Degenerate data distribution: the test_with_cancel dataset lacks aggressive taker flow, so huge numbers of unfilled orders pile up in the book. Assume 500k resting orders.
- Work: 300k cancels, each scanning 500k entries = 150 billion CPU comparisons.
This explains why the system is extremely slow under large-scale cancellation.
3.3 Solution (Next Step)
To fix this we must introduce an Order Index:
- Structure: HashMap<OrderId, (Price, Side)>.
- New complexity: cancel lookup drops from O(N) to O(1).
4. Verification Scripts
Two Python scripts verify logical correctness:
- verify_balance_events.py:
  - Added Check 8: verify the historical consistency of frozen balances.
  - Verify that Unlock events actually released the funds.
- verify_order_events.py:
  - Verify that every Accepted order reaches a terminal state (Filled/Cancelled/Rejected).
  - Verify that every Cancelled order corresponds to a real Accepted event.
5. Summary
This chapter not only delivered the feature work but, more importantly, established a data-isolated testing system and used large-scale stress testing to expose an algorithmic complexity defect. This lays a solid foundation for the next iteration.
0x08-e Performance Profiling & Optimization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Background: After introducing Cancel, execution time exploded from ~30s to 7+ minutes. We need to identify and fix the issue.
Goal:
- Establish architecture-level profiling to pinpoint bottlenecks.
- Fix the identified O(N) issues.
- Verify improvements with data.
1. Symptoms
Performance collapsed after adding Cancel:
- Execution Time: ~30s → 7+ minutes
- Throughput: ~34k ops/s → ~3k ops/s
Hypothesis:
- Is it the O(N) Cancel scan?
- VecDeque removal overhead?
- Something else?
A hypothesis is only a guess; profiling provides facts.
2. Optimization 1: Order Index
2.1 The Problem
Cancelling requires looking up an order. The naive remove_order_by_id iterates the entire book:
#![allow(unused)]
fn main() {
// Before: O(N) full scan
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
for (key, orders) in self.bids.iter_mut() {
if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
// ...
}
}
// Scan asks...
}
}
2.2 The Solution
Introduce order_index: FxHashMap<OrderId, (Price, Side)> for O(1) lookup.
#![allow(unused)]
fn main() {
pub struct OrderBook {
asks: BTreeMap<u64, VecDeque<InternalOrder>>,
bids: BTreeMap<u64, VecDeque<InternalOrder>>,
order_index: FxHashMap<u64, (u64, Side)>, // New
trade_id_counter: u64,
}
}
2.3 Index Maintenance
| Operation | Action |
|---|---|
| rest_order() | Insert |
| cancel_order() | Remove |
| remove_order_by_id() | Remove |
| Match Fill | Remove |
2.4 Optimized Implementation
#![allow(unused)]
fn main() {
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
// O(1) Lookup
let (price, side) = self.order_index.remove(&order_id)?;
// O(log n) Find level
let (book, key) = match side {
Side::Buy => (&mut self.bids, u64::MAX - price),
Side::Sell => (&mut self.asks, price),
};
// O(k) Find in level (k is small)
let orders = book.get_mut(&key)?;
let pos = orders.iter().position(|o| o.order_id == order_id)?;
let order = orders.remove(pos)?;
if orders.is_empty() {
book.remove(&key);
}
Some(order)
}
}
2.5 Result 1
| Metric | Before | After |
|---|---|---|
| Time | 7+ min | 87s |
| Throughput | ~3k ops/s | 15k ops/s |
| Boost | - | 5x |
Huge improvement! But 87s for 1.3M orders is still slow (15k ops/s). Further analysis is needed.
3. Architecture Profiling
3.1 Design
Measure time at architectural stages:
Order Input
│
▼
┌─────────────────┐
│ 1. Pre-Trade │ ← UBSCore: WAL + Balance Lock
└────────┬────────┘
│
▼
┌─────────────────┐
│ 2. Matching │ ← Pure ME: process_order
└────────┬────────┘
│
▼
┌─────────────────┐
│ 3. Settlement │ ← UBSCore: settle_trade
└────────┬────────┘
│
▼
┌─────────────────┐
│ 4. Event Log │ ← Ledger writes
└─────────────────┘
3.2 PerfMetrics
#![allow(unused)]
fn main() {
pub struct PerfMetrics {
pub total_pretrade_ns: u64, // UBSCore WAL + Lock
pub total_matching_ns: u64, // Match processing
pub total_settlement_ns: u64, // Balance updates
pub total_event_log_ns: u64, // Ledger I/O
pub place_count: u64,
pub cancel_count: u64,
}
}
4. Optimization 2: Matching Engine
4.1 Bottleneck Identification
Profiling revealed Matching Engine used 96% of time.
Deep dive found:
#![allow(unused)]
fn main() {
// Problem: Copy ALL price keys on every match
let prices: Vec<u64> = book.asks().keys().copied().collect();
}
With 250k+ price levels in the Cancel test, copying every key (O(P) plus a Vec allocation) on every match is disastrous.
4.2 Solution
Use BTreeMap::range() to iterate only relevant prices.
#![allow(unused)]
fn main() {
// Solution: Iterate only valid price range
let max_price = if buy_order.order_type == OrderType::Limit {
buy_order.price
} else {
u64::MAX
};
let prices: Vec<u64> = book.asks().range(..=max_price).map(|(&k, _)| k).collect();
}
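A std-only sketch of why this works (the book here is simplified to price → qty, rather than a VecDeque of orders):

```rust
use std::collections::BTreeMap;

// range(..=max_price) walks only the levels a limit buy can cross,
// instead of copying every key in the book.
fn crossable_levels(asks: &BTreeMap<u64, u64>, max_price: u64) -> Vec<u64> {
    asks.range(..=max_price).map(|(&p, _)| p).collect()
}

fn main() {
    let mut asks = BTreeMap::new();
    for p in [100u64, 101, 105, 250_000] {
        asks.insert(p, 1u64);
    }
    // A limit buy at 101 touches two levels, not all four
    assert_eq!(crossable_levels(&asks, 101), vec![100, 101]);
    // A market buy (max_price = u64::MAX) still sees everything
    assert_eq!(crossable_levels(&asks, u64::MAX).len(), 4);
}
```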
5. Final Results
5.1 Environment
- Dataset: 1.3M Orders (1M Place + 300k Cancel)
- HW: MacBook Pro M1
5.2 Breakdown
=== Performance Breakdown ===
Orders: 1300000, Trades: 538487
1. Pre-Trade: 621.97ms ( 3.5%) [ 0.48 µs/order]
2. Matching: 15014.08ms ( 84.0%) [ 15.01 µs/order]
3. Settlement: 21.57ms ( 0.1%) [ 0.04 µs/trade]
4. Event Log: 2206.71ms ( 12.4%) [ 1.70 µs/order]
Total Tracked: 17864.33ms
5.3 Improvements
| Stage | Latency Before | Latency After | Gain |
|---|---|---|---|
| Matching | 83.53 µs/order | 15.01 µs/order | 5.6x |
| Cancel Lookup | O(N) | 0.29 µs | - |
6. Comparison Table
| Version | Time | Throughput | Gain |
|---|---|---|---|
| Before optimization | 7+ min | ~3k ops/s | - |
| Order Index | 87s | 15k ops/s | 5x |
| + BTreeMap range | 18s | 72k ops/s | 24x |
7. Summary
7.1 Achievements
| Optimization | Problem | Solution | Result |
|---|---|---|---|
| Order Index | O(N) Cancel | FxHashMap | 0.29 µs |
| Range Query | Full key copy | range() | 83→15 µs |
7.2 Final Design Pattern
┌─────────────────────────────────────────────────────────┐
│ OrderBook │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ order_index │◄───│ Sync on: rest, cancel, │ │
│ │ FxHashMap<id, │ │ match, remove │ │
│ │ (price,side)> │ └─────────────────────────────┘ │
│ └────────┬────────┘ │
│ │ O(1) lookup │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ bids │ │ asks │ │
│ │ BTreeMap<price, │ │ BTreeMap<price, │ │
│ │ VecDeque> │ │ VecDeque> │ │
│ │ + range() │ │ + range() │ │
│ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Optimization Conclusion: From 7 minutes to 18 seconds. 24x boost. 🚀
🇨🇳 中文
📦 Code Changes: View Diff
Background: after introducing Cancel, execution time exploded from ~30s to 7+ minutes. We need to locate and fix the problem.
Goals of this chapter:
- Establish a correct architecture-level profiling method
- Use profiling to pinpoint the bottleneck precisely
- Fix the identified problems
Key point: intuition can suggest directions, but profiling data must confirm them.
1. Symptoms
Performance dropped sharply after adding Cancel:
- Execution time: ~30s → 7+ minutes
- Throughput: ~34k ops/s → ~3k ops/s
Initial hypotheses:
- The O(N) Cancel lookup?
- VecDeque removal overhead?
- Some other unknown problem?
Before profiling, these are all just guesses.
2. Order Index (First Optimization)
2.1 The Problem
Cancelling requires finding the order in the OrderBook. The original remove_order_by_id scans the entire book:
#![allow(unused)]
fn main() {
// Before: O(N) full scan
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Then scan asks...
}
}
2.2 Solution
Introduce order_index: FxHashMap<OrderId, (Price, Side)> for O(1) lookup:
#![allow(unused)]
fn main() {
pub struct OrderBook {
    asks: BTreeMap<u64, VecDeque<InternalOrder>>,
    bids: BTreeMap<u64, VecDeque<InternalOrder>>,
    order_index: FxHashMap<u64, (u64, Side)>, // new
    trade_id_counter: u64,
}
}
2.3 Index Maintenance
| Operation | Index Action |
|---|---|
| rest_order() | Insert |
| cancel_order() | Remove |
| remove_order_by_id() | Remove |
| Match fill | Remove |
2.4 Optimized Implementation
#![allow(unused)]
fn main() {
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // O(1) lookup
    let (price, side) = self.order_index.remove(&order_id)?;
    // O(log n) locate the price level
    let (book, key) = match side {
        Side::Buy => (&mut self.bids, u64::MAX - price),
        Side::Sell => (&mut self.asks, price),
    };
    // O(k) search within the level (k is usually small)
    let orders = book.get_mut(&key)?;
    let pos = orders.iter().position(|o| o.order_id == order_id)?;
    let order = orders.remove(pos)?;
    if orders.is_empty() {
        book.remove(&key);
    }
    Some(order)
}
}
2.5 First Optimization Result
| Metric | Before | After |
|---|---|---|
| Execution time | 7+ min | 87s |
| Throughput | ~3k ops/s | 15k ops/s |
| Gain | - | 5x |
A huge improvement! But 87s for 1.3M orders is still slow. Further analysis is needed.
3. Architecture-Level Profiling (Finding the Real Bottleneck)
3.1 Profiling Design
Time each stage of the top-level order lifecycle:
Order Input
│
▼
┌─────────────────┐
│ 1. Pre-Trade │ ← UBSCore: WAL + Balance Lock
└────────┬────────┘
│
▼
┌─────────────────┐
│ 2. Matching │ ← Pure ME: process_order
└────────┬────────┘
│
▼
┌─────────────────┐
│ 3. Settlement │ ← UBSCore: settle_trade
└────────┬────────┘
│
▼
┌─────────────────┐
│ 4. Event Log │ ← Ledger writes
└─────────────────┘
3.2 PerfMetrics Design
#![allow(unused)]
fn main() {
pub struct PerfMetrics {
    // Top-level stage timers
    pub total_pretrade_ns: u64,   // UBSCore WAL + Lock
    pub total_matching_ns: u64,   // Pure ME
    pub total_settlement_ns: u64, // Balance updates
    pub total_event_log_ns: u64,  // Ledger writes
    // Operation counters
    pub place_count: u64,
    pub cancel_count: u64,
}
}
4. Matching Engine Optimization (Second Optimization)
4.1 Locating the Problem
Architecture-level profiling showed the Matching Engine consuming 96% of the time. A deeper look found:
#![allow(unused)]
fn main() {
// Problem: every match copies ALL price keys
let prices: Vec<u64> = book.asks().keys().copied().collect();
}
With 250k+ price levels in the book, every match must:
- Walk the entire BTreeMap to collect keys - O(P)
- Allocate a Vec to store them - allocation overhead
- Walk the Vec again to match
4.2 The Fix
Use BTreeMap::range() to collect only the keys within the matchable range:
#![allow(unused)]
fn main() {
// After: collect only keys within the matchable price range
let max_price = if buy_order.order_type == OrderType::Limit {
    buy_order.price
} else {
    u64::MAX
};
let prices: Vec<u64> = book.asks().range(..=max_price).map(|(&k, _)| k).collect();
}
5. 性能测试结果
5.1 测试环境
- 数据集:130万订单(100万 Place + 30万 Cancel)
- 机器:MacBook Pro M1
5.2 最终 Breakdown
=== Performance Breakdown ===
Orders: 1300000 (Place: 1000000, Cancel: 300000), Trades: 538487
1. Pre-Trade: 621.97ms ( 3.5%) [ 0.48 µs/order]
2. Matching: 15014.08ms ( 84.0%) [ 15.01 µs/order]
3. Settlement: 21.57ms ( 0.1%) [ 0.04 µs/trade]
4. Event Log: 2206.71ms ( 12.4%) [ 1.70 µs/order]
Total Tracked: 17864.33ms
5.3 优化效果
| 阶段 | 优化前 | 优化后 | 提升 |
|---|---|---|---|
| Matching | 83.53 µs/order | 15.01 µs/order | 5.6x |
| Cancel Lookup | O(N) 线性扫描 | O(1) 索引,0.29 µs | - |
6. 执行性能对比
| 版本 | 执行时间 | 吞吐量 | 改进 |
|---|---|---|---|
| 优化前 (O(N) 撤单 + 全量 keys) | 7+ 分钟 | ~3k ops/s | - |
| Order Index 优化 | 87s | 15k ops/s | 5x |
| + BTreeMap range query | 18s | 72k ops/s | 24x |
7. 总结
7.1 优化成果
| 优化 | 问题 | 解决方案 | 效果 |
|---|---|---|---|
| Order Index | O(N) 撤单查找 | FxHashMap 索引 | 0.29 µs/cancel |
| BTreeMap range | 全量 keys 复制 | range() 范围查询 | 83→15 µs/order |
7.2 最终设计模式
┌─────────────────────────────────────────────────────────┐
│ OrderBook │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ order_index │◄───│ Sync on: rest, cancel, │ │
│ │ FxHashMap<id, │ │ match, remove │ │
│ │ (price,side)> │ └─────────────────────────────┘ │
│ └────────┬────────┘ │
│ │ O(1) lookup │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ bids │ │ asks │ │
│ │ BTreeMap<price, │ │ BTreeMap<price, │ │
│ │ VecDeque> │ │ VecDeque> │ │
│ │ + range() │ │ + range() │ │
│ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
本次优化先到此为止!从 7 分钟到 18 秒,吞吐量提升 24 倍! 🚀
0x08-f Ring Buffer Pipeline Implementation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Goal: Connect services using Ring Buffers to implement a true Pipeline architecture.
Part 1: Single-Thread Pipeline
1.1 Background
Legacy Execution (Synchronous Serial):
for order in orders:
1. ubscore.process_order(order) # WAL + Lock
2. engine.process_order(order) # Match
3. ubscore.settle_trade(trade) # Settle
4. ledger.write(event) # Persist
Problem: No pipeline parallelism, latency accumulates.
1.2 Single-Thread Pipeline Architecture
Decouple services using Ring Buffers, but polling within a single thread loop:
┌─────────────────────────────────────────────────────────────────────────┐
│ Single-Thread Pipeline (Round-Robin) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Ingestion → order_queue │
│ Stage 2: UBSCore Pre-Trade → valid_order_queue │
│ Stage 3: Matching Engine → trade_queue │
│ Stage 4: Settlement → (Ledger) │
│ │
│ All Stages executed in a round-robin loop │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Core Data Structures:
#![allow(unused)]
fn main() {
pub struct PipelineQueues {
pub order_queue: Arc<ArrayQueue<SequencedOrder>>,
pub valid_order_queue: Arc<ArrayQueue<ValidOrder>>,
pub trade_queue: Arc<ArrayQueue<TradeEvent>>,
}
}
Execution Loop:
#![allow(unused)]
fn main() {
loop {
// UBSCore: order_queue → valid_order_queue
if let Some(order) = queues.order_queue.pop() {
// ...
}
// ME: valid_order_queue → trade_queue
if let Some(valid_order) = queues.valid_order_queue.pop() {
// ...
}
// Settlement: trade_queue → persist
if let Some(trade) = queues.trade_queue.pop() {
// ...
}
}
}
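The loop above can be run as a miniature end to end. A sketch with std VecDeques standing in for the lock-free ArrayQueues and trivial stage bodies (the real stages do WAL/locking, matching, and persistence):

```rust
use std::collections::VecDeque;

// Stand-ins for the ArrayQueue-backed PipelineQueues.
#[derive(Default)]
struct Queues {
    order_queue: VecDeque<u64>,       // SequencedOrder
    valid_order_queue: VecDeque<u64>, // ValidOrder
    trade_queue: VecDeque<u64>,       // TradeEvent
}

// One round-robin pass over all stages; returns true if any stage did work.
fn tick(q: &mut Queues, settled: &mut Vec<u64>) -> bool {
    let mut busy = false;
    if let Some(order) = q.order_queue.pop_front() {
        q.valid_order_queue.push_back(order); // UBSCore: validate + lock
        busy = true;
    }
    if let Some(valid) = q.valid_order_queue.pop_front() {
        q.trade_queue.push_back(valid); // ME: match (every order "trades" here)
        busy = true;
    }
    if let Some(trade) = q.trade_queue.pop_front() {
        settled.push(trade); // Settlement: persist
        busy = true;
    }
    busy
}
```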
Part 2: Multi-Thread Pipeline
2.1 Architecture
Full Multi-Threaded Pipeline based on 0x08-a design:
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline (Full) │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Thread 1: Ingestion Thread 2: UBSCore Thread 3: ME │
│ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────┐ │
│ │ Read orders │ │ PRE-TRADE: │ │ Match Order │ │
│ │ Assign SeqNum │──────▶│ - Write WAL │──────▶│ in OrderBook │ │
│ │ │ ① │ - process_order() │ ③ │ │ │
│ └─────────────────┘ │ - lock_balance() │ │ Generate │ │
│ │ │ │ TradeEvents │ │
│ └──────────┬───────────┘ └────────┬────────┘ │
│ ▲ │ │
│ │ │ │
│ │ ⑤ balance_update_queue │ ④ trade_queue │
│ └────────────────────────────┤ │
│ │ │
│ ┌──────────────────────┐ ▼ │
│ │ POST-TRADE: │ ┌─────────────────┐ │
│ │ - settle_trade() │ │ Thread 4: │ │
│ │ - spend_frozen() │──────▶│ Settlement │ │
│ │ - deposit() │ ⑥ │ │ │
│ │ - Generate Balance │ │ Persist: │ │
│ │ Update Events │ │ - Trade Events │ │
│ └──────────────────────┘ │ - Balance Events│ │
│ │ - Ledger │ │
│ └─────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────────────────┘
2.2 Key Design Points
- ME Fan-out: ME sends TradeEvent in parallel to:
  - trade_queue → Settlement (Persist)
  - balance_update_queue → UBSCore (Balance Settle)
- UBSCore as Single Balance Entry: Handles Pre-Trade Lock, Post-Trade Settle, and Refunds.
- Settlement Consolidation: Consumes both Trade Events and Balance Events.
2.3 Data Types
BalanceUpdateRequest (ME → UBSCore): Contains Trade Event and optional Price Improvement data.
BalanceEvent (UBSCore → Settlement): The unified channel for ALL balance changes (Lock, Settle, Credit, Refund).
#![allow(unused)]
fn main() {
pub enum BalanceEventType {
Lock, // Pre-Trade
SpendFrozen, // Post-Trade
Credit, // Post-Trade
RefundFrozen, // Price Improvement
// ...
}
}
2.4 Implementation Status
| Component | Status |
|---|---|
| All Queues | ✅ Implemented |
| UBSCore BalanceEvent Gen | ✅ Implemented |
| Settlement Persistence | ✅ Implemented |
Verification & Performance (2025-12-17)
Correctness
E2E tests pass for both pipeline modes.
Performance Comparison
1.3M Orders (with 300k Cancel):
| Mode | Time | Throughput | Trades |
|---|---|---|---|
| UBSCore (Baseline) | 23.5s | 55k ops/s | 538,487 |
| Single-Thread Pipeline | 22.1s | 59k ops/s | 538,487 |
| Multi-Thread Pipeline | 29.1s | 45k ops/s | 489,804 |
- Issue: Multi-Thread mode is currently slower (-30%) on large datasets and skips cancel orders.
100k Orders (Place only):
| Mode | Time | Throughput | vs Baseline |
|---|---|---|---|
| UBSCore | 755ms | 132k ops/s | - |
| Single-Thread | 519ms | 193k ops/s | +46% |
| Multi-Thread | 391ms | 256k ops/s | +93% |
- Observation: Multi-threading shines on smaller, simpler datasets (+93%).
Analysis
Multi-threaded pipeline overhead (context switching, queue contention, event generation) outweighs benefits when per-order processing time is very low (due to optimizations). Also, missing Cancel logic reduces correctness.
Key Design Decisions
- Backpressure: Spin Wait (prioritize low latency).
- Shutdown: Graceful drain using Atomic Signals.
- Error Handling: Logging and metric counting; critical paths must succeed.
🇨🇳 中文
📦 代码变更: 查看 Diff
目标:使用 Ring Buffer 串接不同服务,实现真正的 Pipeline 架构
Part 1: 单线程 Pipeline
1.1 背景
原始执行模式 (同步串行):
for order in orders:
1. ubscore.process_order(order) # WAL + Lock
2. engine.process_order(order) # Match
3. ubscore.settle_trade(trade) # Settle
4. ledger.write(event) # Persist
问题:没有 Pipeline 并行,延迟累加
1.2 单线程 Pipeline 架构
使用 Ring Buffer 解耦各服务,但仍在单线程中轮询执行:
┌─────────────────────────────────────────────────────────────────────────┐
│ Single-Thread Pipeline (Round-Robin) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Ingestion → order_queue │
│ Stage 2: UBSCore Pre-Trade → valid_order_queue │
│ Stage 3: Matching Engine → trade_queue │
│ Stage 4: Settlement → (Ledger) │
│ │
│ 所有 Stage 在同一个 while 循环中轮询执行 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
核心数据结构:
#![allow(unused)]
fn main() {
pub struct PipelineQueues {
pub order_queue: Arc<ArrayQueue<SequencedOrder>>,
pub valid_order_queue: Arc<ArrayQueue<ValidOrder>>,
pub trade_queue: Arc<ArrayQueue<TradeEvent>>,
}
}
执行流程:
#![allow(unused)]
fn main() {
loop {
// UBSCore: order_queue → valid_order_queue
if let Some(order) = queues.order_queue.pop() {
// ...
}
// ME: valid_order_queue → trade_queue
if let Some(valid_order) = queues.valid_order_queue.pop() {
// ...
}
// Settlement: trade_queue → persist
if let Some(trade) = queues.trade_queue.pop() {
// ...
}
}
}
Part 2: 多线程 Pipeline
2.1 架构
根据 0x08-a 原始设计,完整的多线程 Pipeline 数据流如下:
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline (完整版) │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Thread 1: Ingestion Thread 2: UBSCore Thread 3: ME │
│ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────┐ │
│ │ Read orders │ │ PRE-TRADE: │ │ Match Order │ │
│ │ Assign SeqNum │──────▶│ - Write WAL │──────▶│ in OrderBook │ │
│ │ │ ① │ - process_order() │ ③ │ │ │
│ └─────────────────┘ │ - lock_balance() │ │ Generate │ │
│ │ │ │ TradeEvents │ │
│ └──────────┬───────────┘ └────────┬────────┘ │
│ ▲ │ │
│ │ │ │
│ │ ⑤ balance_update_queue │ ④ trade_queue │
│ └────────────────────────────┤ │
│ │ │
│ ┌──────────────────────┐ ▼ │
│ │ POST-TRADE: │ ┌─────────────────┐ │
│ │ - settle_trade() │ │ Thread 4: │ │
│ │ - spend_frozen() │──────▶│ Settlement │ │
│ │ - deposit() │ ⑥ │ │ │
│ │ - Generate Balance │ │ Persist: │ │
│ │ Update Events │ │ - Trade Events │ │
│ └──────────────────────┘ │ - Balance Events│ │
│ │ - Ledger │ │
│ └─────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────────────────┘
2.2 关键设计点
- ME Fan-out: ME 将 TradeEvent 并行发送到:
  - trade_queue → Settlement (持久化交易记录)
  - balance_update_queue → UBSCore (余额结算)
- UBSCore 是余额操作的唯一入口: 处理 Pre-Trade 锁定、Post-Trade 结算和退款。
- Settlement 聚合: 同时消费交易事件和余额事件。
2.3 数据类型
BalanceUpdateRequest (ME → UBSCore): 包含成交事件和可能的价格改善(Price Improvement)数据。
BalanceEvent (UBSCore → Settlement): 所有余额变更的统一通道 (Lock, Settle, Credit, Refund)。
#![allow(unused)]
fn main() {
pub enum BalanceEventType {
Lock, // Pre-Trade
SpendFrozen, // Post-Trade
Credit, // Post-Trade
RefundFrozen, // Price Improvement
// ...
}
}
2.4 实现状态
| 组件 | 状态 |
|---|---|
| 所有队列 | ✅ 已实现 |
| UBSCore BalanceEvent 生成 | ✅ 已实现 |
| Settlement 持久化 | ✅ 已实现 |
验证与性能 (2025-12-17)
正确性
E2E 测试在两种模式下均通过。
性能对比
1.3M 订单 (含 30 万撤单):
| 模式 | 执行时间 | 吞吐量 | 成交数 |
|---|---|---|---|
| UBSCore (Baseline) | 23.5s | 55k ops/s | 538,487 |
| 单线程 Pipeline | 22.1s | 59k ops/s | 538,487 |
| 多线程 Pipeline | 29.1s | 45k ops/s | 489,804 |
- 问题: 多线程模式在大数据集上反而更慢 (-30%),且目前跳过了撤单处理。
100k 订单 (仅 Place):
| 模式 | 时间 | 吞吐量 | 提升 |
|---|---|---|---|
| UBSCore | 755ms | 132k ops/s | - |
| 单线程 | 519ms | 193k ops/s | +46% |
| 多线程 | 391ms | 256k ops/s | +93% |
- 观察: 多线程在简单的小数据集上表现出色 (+93%)。
分析
在单笔处理极快的情况下,多线程带来的开销(上下文切换、队列竞争、事件生成)超过了并行的收益。此外,缺失撤单逻辑降低了正确性。
关键设计决策
- 背压: 自旋等待 (Spin Wait),优先低延迟。
- 关闭: 使用原子信号优雅退出。
- 错误处理: 日志记录,核心路径必须成功。
0x08-g Multi-Thread Pipeline Design
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff | Key File: pipeline_mt.rs
Overview
The Multi-Thread Pipeline distributes processing logic across 4 independent threads, communicating via lock-free queues to achieve high throughput order processing.
Architecture
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ingestion │────▶│ UBSCore │────▶│ ME │────▶│ Settlement │
│ (Thread 1) │ │ (Thread 2) │ │ (Thread 3) │ │ (Thread 4) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ ▲ │ │
│ │ │ │ │
▼ ▼ │ ▼ ▼
order_queue ────▶ action_queue balance_update_queue trade_queue
│ balance_event_queue
└──────────────────────────────────────┘
Thread Responsibilities
| Thread | Responsibility | Input Queue | Output |
|---|---|---|---|
| Ingestion | Parse orders, assign SeqNum | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | Match, Cancel handling | action_queue | trade_queue, balance_update_queue |
| Settlement | Persist Events (Trade, Balance) | trade_queue, balance_event_queue | ledgers |
Queue Design
Using crossbeam-queue::ArrayQueue for lock-free MPSC queues:
#![allow(unused)]
fn main() {
pub struct MultiThreadQueues {
pub order_queue: Arc<ArrayQueue<OrderAction>>, // 64K
pub action_queue: Arc<ArrayQueue<ValidAction>>, // 64K
pub trade_queue: Arc<ArrayQueue<TradeEvent>>, // 64K
pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>, // 64K
pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>, // 64K
}
}
Cancel Handling
- Ingestion: Create OrderAction::Cancel.
- UBSCore: Pass through to action_queue (no balance lock needed).
- ME: Remove the order from the OrderBook, send BalanceUpdateRequest::Cancel.
- UBSCore: Process the unlock, generate BalanceEvent::Unlock.
- Settlement: Persist the BalanceEvent.
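The cancel round-trip above can be modeled in a few lines. A sketch with heavily simplified types (the real messages also carry user_id, asset ids, and sequence numbers):

```rust
use std::collections::HashMap;

// UBSCore → Settlement message produced when a cancel unlocks funds.
#[derive(Debug, PartialEq)]
enum BalanceEvent { Unlock { order_id: u64, amount: u64 } }

// ME stage stand-in: resting orders mapped to their locked amounts.
struct OrderBookStub { resting: HashMap<u64, u64> }

impl OrderBookStub {
    // Remove the order; report how much UBSCore must unlock (None if gone).
    fn cancel(&mut self, order_id: u64) -> Option<u64> {
        self.resting.remove(&order_id)
    }
}

// UBSCore post-trade stage: turn the ME's report into a BalanceEvent.
fn settle_cancel(order_id: u64, locked: u64) -> BalanceEvent {
    BalanceEvent::Unlock { order_id, amount: locked }
}
```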
Consistency Verification
Test Script
# Run full comparison test
./scripts/test_pipeline_compare.sh highbal
# Supported Datasets:
# 100k - 100k orders without cancel
# cancel - 1.3M orders with 30% cancel
# highbal - 1.3M orders with 30% cancel, high balance (Recommended)
Verification Results (1.3M orders, 30% cancel, high balance)
╔════════════════════════════════════════════════════════════════╗
║ ✅ ALL TESTS PASSED ║
║ Multi-thread pipeline matches single-thread exactly! ║
╚════════════════════════════════════════════════════════════════╝
Key Metrics
| Dataset | Total | Place | Cancel | Trades | Result |
|---|---|---|---|---|---|
| 100k | 100,000 | 100,000 | 0 | 47,886 | ✅ Match |
| 1.3M HighBal | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ Match |
Important Considerations
Balance Sufficiency
Insufficient balance may cause rejections. In concurrent environments, rejection timing can vary due to settlement latency, leading to non-deterministic results.
Solution: Use highbal dataset (1000 BTC + 100M USDT per user).
Shutdown Synchronization
Wait for queues to drain before signaling shutdown:
#![allow(unused)]
fn main() {
while !queues.all_empty() {
std::hint::spin_loop();
}
shutdown.request_shutdown();
}
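The drain-then-signal ordering matters: raising the flag first could strand items in the queue. A self-contained sketch of the pattern (a Mutex-wrapped VecDeque stands in for the lock-free ArrayQueue):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Produce n items, drain the queue, then request shutdown; the consumer
// exits only when shutdown is set AND the queue is empty, so nothing is lost.
fn run(n: u64) -> u64 {
    let queue = Arc::new(Mutex::new(VecDeque::new()));
    let shutdown = Arc::new(AtomicBool::new(false));
    let (q, s) = (queue.clone(), shutdown.clone());

    let consumer = thread::spawn(move || {
        let mut processed = 0u64;
        loop {
            let item = q.lock().unwrap().pop_front();
            match item {
                Some(_) => processed += 1,
                None if s.load(Ordering::Acquire) => break, // drained + signaled
                None => std::hint::spin_loop(),
            }
        }
        processed
    });

    for i in 0..n {
        queue.lock().unwrap().push_back(i);
    }
    // Wait for the queue to drain before signaling, as in the snippet above.
    while !queue.lock().unwrap().is_empty() {
        std::hint::spin_loop();
    }
    shutdown.store(true, Ordering::Release);
    consumer.join().unwrap()
}
```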
Performance
| Mode | 100k orders | 1.3M orders |
|---|---|---|
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |
Note: The multi-thread version carries extra overhead for BalanceEvent generation and persistence, yet still matches single-thread performance. Future optimizations: batch I/O, reduced queue contention.
Queue Priority Strategy (Future)
Current Implementation:
Prioritize draining balance_update_queue completely before processing order_queue.
Future: Weighted Round-Robin: Allow alternating processing to improve responsiveness.
#![allow(unused)]
fn main() {
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1
}
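A sketch of what that weighted loop could look like (this is the future design, not the current implementation; types are simplified):

```rust
use std::collections::VecDeque;

// Handle up to SETTLE_WEIGHT settle messages per new order, instead of
// draining the settle queue exhaustively before touching orders.
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1

fn drain_step(
    settle_q: &mut VecDeque<u64>,
    order_q: &mut VecDeque<u64>,
    handled: &mut Vec<(&'static str, u64)>,
) -> bool {
    let mut busy = false;
    for _ in 0..SETTLE_WEIGHT {
        if let Some(s) = settle_q.pop_front() {
            handled.push(("settle", s));
            busy = true;
        }
    }
    if let Some(o) = order_q.pop_front() {
        handled.push(("order", o));
        busy = true;
    }
    busy
}
```

With a backlog of settles, orders are still interleaved instead of starved, which is the responsiveness gain over the drain-completely strategy.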
File Structure
src/
├── pipeline.rs # Shared types
├── pipeline_mt.rs # Multi-thread impl
├── pipeline_runner.rs # Single-thread impl
└── main.rs
🇨🇳 中文
📦 代码变更: 查看 Diff | 关键文件: pipeline_mt.rs
概述
Multi-Thread Pipeline 将处理逻辑分布在 4 个独立线程中,通过无锁队列通信,实现高吞吐量的订单处理。
架构
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ingestion │────▶│ UBSCore │────▶│ ME │────▶│ Settlement │
│ (Thread 1) │ │ (Thread 2) │ │ (Thread 3) │ │ (Thread 4) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ ▲ │ │
│ │ │ │ │
▼ ▼ │ ▼ ▼
order_queue ────▶ action_queue balance_update_queue trade_queue
│ balance_event_queue
└──────────────────────────────────────┘
线程职责
| 线程 | 职责 | 输入队列 | 输出 |
|---|---|---|---|
| Ingestion | 订单解析、序列号分配 | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | 订单撮合、取消处理 | action_queue | trade_queue, balance_update_queue |
| Settlement | 事件持久化 (TradeEvent, BalanceEvent) | trade_queue, balance_event_queue | ledger files |
队列设计
使用 crossbeam-queue::ArrayQueue 实现无锁 MPSC 队列:
#![allow(unused)]
fn main() {
pub struct MultiThreadQueues {
pub order_queue: Arc<ArrayQueue<OrderAction>>, // 64K capacity
pub action_queue: Arc<ArrayQueue<ValidAction>>, // 64K capacity
pub trade_queue: Arc<ArrayQueue<TradeEvent>>, // 64K capacity
pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>, // 64K
pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>, // 64K
}
}
Cancel 订单处理
Cancel 订单流程:
- Ingestion: 创建 OrderAction::Cancel { order_id, user_id }
- UBSCore: 直接传递到 action_queue(无需 balance lock)
- ME: 从 OrderBook 移除订单,发送 BalanceUpdateRequest::Cancel
- UBSCore (Post-Trade): 处理 unlock,生成 BalanceEvent::Unlock
- Settlement: 持久化 BalanceEvent
一致性验证
测试脚本
# 运行完整对比测试
./scripts/test_pipeline_compare.sh highbal
# 支持的数据集:
# 100k - 100k orders without cancel
# cancel - 1.3M orders with 30% cancel
# highbal - 1.3M orders with 30% cancel, high balance (推荐)
验证结果 (1.3M orders, 30% cancel, high balance)
╔════════════════════════════════════════════════════════════════╗
║ ✅ ALL TESTS PASSED ║
║ Multi-thread pipeline matches single-thread exactly! ║
╚════════════════════════════════════════════════════════════════╝
关键指标
| 数据集 | 总订单 | Place | Cancel | Trades | 结果 |
|---|---|---|---|---|---|
| 100k (无 cancel) | 100,000 | 100,000 | 0 | 47,886 | ✅ 完全一致 |
| 1.3M + 30% cancel (高余额) | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ 完全一致 |
注意事项
余额充足性
如果测试数据中用户余额不足,可能导致部分订单被 reject。在并发环境中,由于 settle 时序不同,这些 reject 可能与单线程结果不同。
解决方案: 使用 highbal 数据集,确保每个用户有充足余额(1000 BTC + 100M USDT)。
Shutdown 同步
Multi-thread pipeline 在 shutdown 时需要确保所有队列都已 drain:
#![allow(unused)]
fn main() {
while !queues.all_empty() {
std::hint::spin_loop();
}
shutdown.request_shutdown();
}
性能
| 模式 | 100k orders | 1.3M orders |
|---|---|---|
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |
注:Multi-thread 当前版本包含 BalanceEvent 生成和持久化开销,性能与 Single-Thread 相当。未来优化方向包括批量 I/O 和减少队列竞争。
队列优先级策略 (未来)
当前实现:
完全优先 drain balance_update_queue,然后才处理新订单。
未来优化: 加权轮询 (Weighted Round-Robin): 允许交替处理,提高响应性。
#![allow(unused)]
fn main() {
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1
}
文件结构
src/
├── pipeline.rs # 共享类型: PipelineStats, MultiThreadQueues, ShutdownSignal
├── pipeline_mt.rs # Multi-thread 实现: run_pipeline_multi_thread()
├── pipeline_runner.rs # Single-thread 实现: run_pipeline()
└── main.rs # --pipeline / --pipeline-mt 模式选择
0x08-h Performance Monitoring & Observability
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff | Key File: pipeline_services.rs
“If you can’t measure it, you can’t improve it.” This chapter focuses on introducing production-grade performance monitoring and observability for our multi-threaded pipeline.
Monitoring Dimensions
1. Latency Metrics
In HFT, averages are misleading. We care about Tail Latency.
- P50 (Median): General performance.
- P99 / P99.9: Stability in extreme cases.
- Max: Jitter, GC, or system calls.
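Percentiles can be computed from a recorded sample of per-order latencies. A minimal nearest-rank sketch (production systems typically use HdrHistogram-style buckets rather than storing every sample):

```rust
// Nearest-rank percentile over raw latency samples (ns). Sorting is
// O(n log n), fine for offline analysis; streaming systems bucket instead.
fn percentile(samples: &mut Vec<u64>, p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}
```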
2. Throughput
- Orders/sec: Processing capacity.
- Trades/sec: Matching capacity.
3. Queue Depth & Backpressure
Monitoring Ring Buffer occupancy reveals downstream bottlenecks and jitter.
4. Architectural Breakdown
Knowing where time is spent (Pre-Trade vs Matching vs Settlement).
Test Execution
Dataset: 1.3M orders (30% cancel) from fixtures/test_with_cancel_highbal/.
Single-Thread Run:
cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal
Multi-Thread Run:
cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal
Compare Script:
./scripts/test_pipeline_compare.sh highbal
Analysis Results (1.3M Dataset)
1. Single-Thread Pipeline
- Throughput: 210,000 orders/sec (P50 Latency: 1.25 µs)
- Breakdown:
- Matching Engine: 91.5% (The bottleneck)
- UBSCore Lock: 5.6%
- Persistence: 2.7%
2. Multi-Thread Pipeline (After Service Refactor)
- Throughput: ~64,450 orders/sec
- E2E Latency (P50): ~113 ms
- E2E Latency (P99): ~188 ms
Conclusion
- Parallelism Works: Total task CPU time (~34s) > Wall time (17.5s).
- Bottleneck: Matching Engine remains the serial bottleneck (~52k ops/s limit).
- Latency Cost: Multi-threading introduces significant message passing latency (µs → ms).
Logging & Observability
We introduced a production-grade asynchronous logging system using tracing.
1. Non-blocking I/O
Using tracing-appender with a dedicated worker thread and memory buffer to prevent I/O blocking.
2. Environment-driven Config
- Dev: Detailed, human-readable output.
- Prod: JSON format, high-frequency tracing disabled (0XINFI=off).
3. Standardized Targets
All pipeline logs use the 0XINFI namespace (e.g., 0XINFI::ME, 0XINFI::UBSC) for precise filtering.
Intent-Based Design: From Functions to Services
“Good architecture is not designed upfront, but evolved through refactoring.”
We refactored tightly coupled spawn_* functions into decoupled Service Structs.
Problem: Coupled Functions
#![allow(unused)]
fn main() {
// ❌ Business logic buried in thread spawning
fn spawn_me_stage(...) -> JoinHandle<OrderBook> {
thread::spawn(move || {
// Logic locked inside closure
})
}
}
- Untestable: Cannot unit test logic without spawning threads.
- Not Reusable: Cannot be used in single-thread mode.
Solution: Service Structs
#![allow(unused)]
fn main() {
// ✅ Intent is clear and decoupled
pub struct MatchingService {
book: OrderBook,
// ...
}
impl MatchingService {
pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}
}
Benefits
- Testability: Services can be instantiated and tested in isolation.
- Reusability: Core logic is decoupled from threading model.
- Clarity: Code expresses “what” (Service), not just “how” (Thread).
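A compilable miniature of the pattern shows why the struct form unit-tests without any thread (names are simplified from the project's services):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// The service owns its state; its logic is independent of the threading model.
struct MatchingService { processed: u64 }

impl MatchingService {
    fn handle_one(&mut self) { self.processed += 1; } // stand-in for matching

    // Threaded deployment is just a loop around the same logic.
    fn run(&mut self, shutdown: &AtomicBool) {
        while !shutdown.load(Ordering::Acquire) {
            self.handle_one();
        }
    }
}
```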
🇨🇳 中文
📦 代码变更: 查看 Diff | 关键文件: pipeline_services.rs
在构建高性能低延迟交易系统时,“如果你无法测量它,你就无法优化它”。本章重点在于为我们的多线程 Pipeline 引入生产级的性能监控和延迟指标分析。
监控维度
1. 延迟指标 (Latency Metrics)
对于 HFT 系统,平均延迟往往是误导性的,我们更关心长尾延迟 (Tail Latency)。
- P50 (Median): 中位数延迟,反映平均水平。
- P99 / P99.9: 长尾延迟,反映系统在极端情况下的稳定性。
- Max: 峰值延迟,通常由系统抖动 (Jitter) 或 GC/系统调用引起。
2. 吞吐量 (Throughput)
- Orders/sec: 每秒处理订单数。
- Trades/sec: 每秒撮合成交数。
3. 队列深度与背压 (Queue Depth & Backpressure)
监控 Ring Buffer 的占用情况,识别下游瓶颈。
4. 架构内部阶段耗时 (Architectural Breakdown)
清晰地知道时间花在了哪里:Pre-Trade / Matching / Settlement / Logging。
测试执行方法
数据集: 130 万订单(含 30% 撤单) fixtures/test_with_cancel_highbal/。
运行单线程:
cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal
运行多线程:
cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal
对比脚本:
./scripts/test_pipeline_compare.sh highbal
执行结果与分析 (1.3M 数据集)
1. 单线程流水线
- 性能: 210,000 orders/sec (P50: 1.25 µs)
- 瓶颈: Matching Engine 耗时 91.5%,是最大瓶颈。
2. 多线程流水线 (重构后)
- 吞吐量: ~64,450 orders/sec
- 端到端延迟 (P50): ~113 ms
- 端到端延迟 (P99): ~188 ms
结论
- 并行有效: CPU 总耗时远大于执行时间。
- 瓶颈: Matching Engine 依然是最大的串行瓶颈 (吞吐上限 ~52k)。
- 延迟: 多线程引入的消息传递开销导致端到端延迟从微秒级退化到毫秒级。
日志与可观测性
引入基于 tracing 的生产级异步日志体系。
1. 异步非阻塞架构
使用 tracing-appender 独立线程写入日志,不阻塞业务线程。
2. 环境驱动配置
Dev 开启详细日志,Prod 使用 JSON 并关闭高频追踪。
3. 标准化日志目标
使用 0XINFI 命名空间 (如 0XINFI::ME) 实现精细过滤。
意图编码:从函数到服务
“好的架构不是一开始就设计出来的,而是通过不断重构演进出来的。”
我们将紧耦合的 spawn_* 函数重构为解耦的 Service 结构体。
问题:紧耦合
#![allow(unused)]
fn main() {
// ❌ 业务逻辑埋在线程创建中
fn spawn_me_stage(...) {
thread::spawn(move || { ... })
}
}
无法单元测试,无法复用。
解决方案:Service 结构体
#![allow(unused)]
fn main() {
// ✅ 意图清晰,解耦
pub struct MatchingService { ... }
impl MatchingService {
pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}
}
收益
- 可测试性: 服务可独立实例化测试。
- 可复用性: 核心逻辑与线程模型解耦。
- 清晰度: 代码表达“做什么” (Service),而非“怎么做” (Thread)。
0x09-a Gateway: Client Access Layer
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement a lightweight HTTP Gateway to connect clients with the trading core system.
Background: From Core to MVP
We have built a functional Trading Core:
- OrderBook (0x04)
- Balance Management (0x05-0x06)
- Matching Engine (0x08)
- Pipeline & Monitoring (0x08-f/g/h)
To become a usable MVP, we need auxiliary systems:
┌─────────────────────────────────────────────────────────────────────────┐
│ Complete Trading System MVP │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Client (Web/Mobile/API) │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ 0x09-a │ ← This Chapter: Accept orders, return response │
│ │ Gateway │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Trading Core (Completed) │ │
│ │ Ingestion → UBSCore → ME → Settlement │ │
│ └─────────────────────────────────────────────────────────────────┘ │
0x09 Series Plan
| Chapter | Topic | Core Function |
|---|---|---|
| 0x09-a | Gateway | HTTP/WS Entry, Pre-Check |
| 0x09-b | Settlement Persistence | DB Persistence for Balances/Trades |
| 0x09-c | K-Line Aggregation | Real-time Candles |
| 0x09-d | WebSocket Push | Real-time Market Data |
1. Gateway Design
1.1 Responsibilities
The Gateway is the sole entry point for clients.
- Protocol Conversion: HTTP/WebSocket → Internal Formats
- Authentication: API Key / JWT
- Pre-Check: Fast balance validation
- Rate Limiting: Anti-DDoS
- Response: Synchronous acknowledgment
1.2 Why Separate Gateway & Core?
- Decoupling: Network I/O doesn’t block matching.
- Scalability: Gateway can scale horizontally.
- Predictability: Async queues ensure predictable matching latency.
1.3 Tech Stack
- HTTP: axum (high performance, tokio-native)
- WebSocket: tokio-tungstenite
- Serialization: serde + JSON
- Rate Limiting: tower middleware
2. Core Data Flow
2.1 Order Submission
┌──────────┐ HTTP POST ┌──────────┐ Ring Buffer ┌──────────┐
│ Client │ ───────────────▶│ Gateway │ ─────────────────▶│ Ingestion│
│ │ │ │ │ Stage │
│ │◀─────────────── │ │ │ │
└──────────┘ 202 Accepted └──────────┘ └──────────┘
+ │
order_id ▼
seq_id Trading Core
2.2 Pre-Check Logic
#![allow(unused)]
fn main() {
async fn submit_order(order: OrderRequest) -> Result<OrderResponse, ApiError> {
// 1. Validation
validate_order(&order)?;
// 2. Auth
let user_id = authenticate(&headers)?;
// 3. Pre-Check: Balance (Read-Only)
let balance = ubscore.query_balance(user_id, order.asset_id).await?;
if balance.avail < required {
return Err(ApiError::InsufficientBalance);
}
// 4. Assign ID
let order_id = id_generator.next();
// 5. Push to Ring Buffer
order_queue.push(SequencedOrder { ... })?;
// 6. Return Accepted
Ok(OrderResponse { status: "PENDING", ... })
}
}
Key Points:
- Pre-Check is "best effort": it reads a possibly stale balance and does not lock funds.
- Final locking happens in UBSCore, the single source of truth for balances.
- Returns 202 Accepted to indicate asynchronous processing.
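The "best effort" check itself is fixed-point arithmetic on u64 amounts. A hedged sketch of the required-amount computation (the scale constant and function names are illustrative, not the project's actual API):

```rust
// Amounts are u64 fixed-point. A BUY needs the quote asset
// (price * qty / QTY_SCALE); a SELL needs the base asset (qty).
const QTY_SCALE: u64 = 100_000_000; // 1e8, illustrative

#[derive(Clone, Copy)]
enum Side { Buy, Sell }

fn required_amount(side: Side, price: u64, qty: u64) -> u64 {
    match side {
        // u128 intermediate avoids overflow of price * qty.
        Side::Buy => ((price as u128 * qty as u128) / QTY_SCALE as u128) as u64,
        Side::Sell => qty,
    }
}

fn pre_check(side: Side, price: u64, qty: u64, avail: u64) -> bool {
    avail >= required_amount(side, price, qty)
}
```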
3. API Design
3.1 RESTful Endpoints
- POST /api/v1/create_order: Submit order
- POST /api/v1/cancel_order: Cancel order
- GET /api/v1/order/{order_id}: Query status
3.2 Request/Response Format
Submit Order:
// POST /api/v1/create_order
{
"symbol": "BTC_USDT",
"side": "BUY",
"type": "LIMIT",
"price": "85000.00",
"qty": "0.001"
}
// Response (202 Accepted)
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"status": "ACCEPTED",
"accepted_at": 1734533784000
}
}
3.3 Unified Response Format
{
"code": 0, // 0 = Success, Non-0 = Error
"msg": "ok", // Short description
"data": {} // Payload or null
}
3.4 API Conventions
Important: Must follow API Conventions.
- SCREAMING_CASE enums: "BUY", "SELL", "LIMIT".
- Consistent short names: qty (not quantity), cid (not client_order_id).
- SCREAMING_SNAKE_CASE error codes: INVALID_PARAMETER.
4. WebSocket Push
4.1 Flow
Clients connect via WS, authenticate, and subscribe to channels.
4.2 Channels
- order_updates: Private order status changes.
- balance_updates: Private balance changes.
- trades: Public trade feed.
5. Security
| Level | Method | Scenario |
|---|---|---|
| MVP | Header X-User-ID | Internal / Reliability Testing |
| Prod | API Key (HMAC) | Programmatic Trading |
| Prod | JWT | Web/Mobile |
6. Communication Architecture
6.1 MVP Choice: Single Process Ring Buffer
Gateway and Trading Core run in the same process, communicating via Arc<ArrayQueue>.
Pros:
- ✅ Zero network overhead (~100ns latency).
- ✅ Reuse existing crossbeam queues.
- ✅ Simple deployment.
6.2 Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ Single Process (--gateway mode) │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ HTTP Server (tokio runtime) │ │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ order_queue │ (Shared Ring Buffer) │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Trading Core Threads │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
6.3 Evolution Path
- MVP: Single Process.
- Phase 2: Unix Domain Socket (Multi-process on same host).
- Phase 3: TCP / RPC (Distributed).
7. Implementation Guidelines
7.1 Startup Modes
# Gateway Mode
cargo run --release -- --gateway --port 8080
# Batch Mode (Original)
cargo run --release -- --pipeline-mt
7.2 Main Integration
#![allow(unused)]
fn main() {
if args.gateway {
    // Spawn the HTTP server on its own thread; clone the Arc-backed
    // queues so `queues` can still be moved into the trading core below.
    let gw_queues = queues.clone();
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(gw_queues));
    });
    // Run Trading Core on the main thread
    run_pipeline_multi_thread(queues, ...);
}
}
Summary
This chapter implements the Gateway as the client access layer.
Core Philosophy:
The Gateway is a speed guard, not a business processor. Accept fast, validate fast, forward fast.
🇨🇳 中文
📦 代码变更: 查看 Diff
本节核心目标:实现一个轻量级的 HTTP Gateway,连接客户端与交易核心系统。
背景:从核心到完整 MVP
在前面的章节中,我们已经构建了一个功能完整的交易核心系统:
- OrderBook (0x04)
- Balance Management (0x05-0x06)
- Matching Engine (0x08)
- Pipeline (0x08-f/g/h)
但要成为一个可用的 MVP (Minimum Viable Product),还需要以下辅助系统:
┌─────────────────────────────────────────────────────────────────────────┐
│ Complete Trading System MVP │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Client (Web/Mobile/API) │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ 0x09-a │ ← 本章:接收订单,返回响应 │
│ │ Gateway │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Trading Core (已完成) │ │
│ │ Ingestion → UBSCore → ME → Settlement │ │
│ └─────────────────────────────────────────────────────────────────┘ │
0x09 系列章节规划
| 章节 | 主题 | 核心功能 |
|---|---|---|
| 0x09-a | Gateway | HTTP/WS 订单接入、Pre-Check |
| 0x09-b | Settlement Persistence | 用户余额、订单、成交入库 |
| 0x09-c | K-Line Aggregation | 实时 K 线聚合 |
| 0x09-d | WebSocket Push | 实时行情推送 |
1. Gateway 设计
1.1 职责
Gateway 是客户端与交易系统的唯一入口:
- 协议转换:HTTP/WebSocket → 内部消息格式
- 身份验证:API Key / JWT
- Pre-Check:快速余额校验
- 限流:防止 DDoS
- 响应:同步返回接收确认
1.2 为什么 Gateway + Trading Core 分离?
- 解耦:网络 I/O 不阻塞撮合。
- 扩展性:Gateway 可水平扩展。
- 可预测性:异步队列确保撮合延迟可预测。
1.3 技术选型
- HTTP: axum(高性能、tokio 原生)
- WebSocket: tokio-tungstenite
- Serialization: serde + JSON
- Rate Limiting: tower middleware
2. 核心数据流
2.1 订单提交流程
┌──────────┐ HTTP POST ┌──────────┐ Ring Buffer ┌──────────┐
│ Client │ ───────────────▶│ Gateway │ ─────────────────▶│ Ingestion│
│ │ │ │ │ Stage │
│ │◀─────────────── │ │ │ │
└──────────┘ 202 Accepted └──────────┘ └──────────┘
+ │
order_id ▼
seq_id Trading Core
2.2 Pre-Check 流程
#![allow(unused)]
fn main() {
async fn submit_order(order: OrderRequest) -> Result<OrderResponse, ApiError> {
// 1. 参数校验
validate_order(&order)?;
// 2. 身份验证
let user_id = authenticate(&headers)?;
// 3. Pre-Check: 余额检查 (只读)
let balance = ubscore.query_balance(user_id, order.asset_id).await?;
if balance.avail < required {
return Err(ApiError::InsufficientBalance);
}
// 4. 分配 ID
let order_id = id_generator.next();
// 5. 推送到 Ring Buffer
order_queue.push(SequencedOrder { ... })?;
// 6. 返回接收确认
Ok(OrderResponse { status: "PENDING", ... })
}
}
关键点:
- Pre-Check 是“尽力而为”的检查。
- 最终锁定在 UBSCore 执行。
- 返回 202 Accepted 表示异步处理中。
3. API 设计
3.1 RESTful Endpoints
- POST /api/v1/create_order: 提交订单
- POST /api/v1/cancel_order: 取消订单
- GET /api/v1/order/{order_id}: 查询状态
3.2 请求/响应格式
提交订单:
// POST /api/v1/create_order
{
"symbol": "BTC_USDT",
"side": "BUY",
"type": "LIMIT",
"price": "85000.00",
"qty": "0.001"
}
// Response (202 Accepted)
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"status": "ACCEPTED",
"accepted_at": 1734533784000
}
}
3.3 统一响应格式
{
"code": 0, // 0 = 成功, 非0 = 错误码
"msg": "ok", // 简短描述
"data": {} // 数据或 null
}
3.4 API 规范
重要: 必须遵循 API Conventions 规范。
- 大写枚举: "BUY", "SELL", "LIMIT"。
- 命名一致: qty(而非 quantity),cid(而非 client_order_id)。
- 大写蛇形错误码: INVALID_PARAMETER。
4. WebSocket 实时推送
4.1 流程
客户端连接 WS,认证,并订阅频道。
4.2 频道
- order_updates: 私有订单状态变更。
- balance_updates: 私有余额变更。
- trades: 公共成交推送。
5. 安全设计
| 级别 | 方法 | 场景 |
|---|---|---|
| MVP | Header X-User-ID | 内部测试 |
| Prod | API Key (HMAC) | 程序化交易 |
| Prod | JWT | Web/移动端 |
6. 通信架构设计
6.1 MVP 选择:单进程 Ring Buffer
Gateway 和 Trading Core 运行在同一进程中,通过 Arc<ArrayQueue> 通信。
优势:
- ✅ 零网络开销 (~100ns 延迟)。
- ✅ 复用现有 crossbeam 队列。
- ✅ 部署简单。
6.2 架构图
┌─────────────────────────────────────────────────────────────────────────┐
│ Single Process (--gateway mode) │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ HTTP Server (tokio runtime) │ │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│         │       order_queue           │  (shared Ring Buffer)                 │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Trading Core Threads │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
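The single-process handoff above can be sketched with the standard library alone. This is a minimal, illustrative stand-in: a bounded `std::sync::mpsc::sync_channel` plays the role of the crossbeam `ArrayQueue`, and `SequencedOrder`'s fields are simplified placeholders, not the real struct.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

// Illustrative order message; the real SequencedOrder carries more fields.
struct SequencedOrder {
    order_id: u64,
    price: u64,
    qty: u64,
}

// Gateway thread pushes, Trading Core thread pops: same process, no network hop.
// Returns how many orders the core consumed.
fn run_handoff(orders: Vec<SequencedOrder>) -> usize {
    // The bounded channel plays the ring buffer's role (capacity = back-pressure).
    let (tx, rx) = sync_channel::<SequencedOrder>(1024);

    let core = thread::spawn(move || {
        let mut processed = 0usize;
        // Drain until the Gateway side hangs up.
        while rx.recv().is_ok() {
            processed += 1;
        }
        processed
    });

    for order in orders {
        // try_send mirrors ArrayQueue::push: fail fast instead of blocking.
        if let Err(TrySendError::Full(_)) = tx.try_send(order) {
            // In the real Gateway this would surface as a retryable error.
        }
    }
    drop(tx); // close the queue so the core thread exits
    core.join().unwrap()
}
```

The bounded capacity is what gives the Gateway natural back-pressure: a full queue is an explicit, observable event rather than unbounded memory growth.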
6.3 Evolution Path
- MVP: single process.
- Phase 2: Unix Domain Socket (multi-process, same host).
- Phase 3: TCP / RPC (distributed).
7. Implementation Guide
7.1 Launch Modes
# Gateway mode
cargo run --release -- --gateway --port 8080
# Batch mode (existing)
cargo run --release -- --pipeline-mt
7.2 Main Integration
if args.gateway {
    // Spawn the HTTP server on its own thread with a dedicated tokio runtime
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(queues));
    });
    // Run the Trading Core on the main thread
    run_pipeline_multi_thread(queues, ...);
}
Summary
This chapter implements the Gateway as the client-facing access layer.
Core Philosophy:
The Gateway is a speed gatekeeper, not a business processor: receive fast, validate fast, forward fast.
0x09-b Settlement Persistence: TDengine Integration
📦 Code Changes: View Diff
Core Objective: Persist trade data to TDengine and implement Order Query & History APIs.
Background: From Memory to Persistence
In Gateway Phase 1 (0x09-a), we completed:
- ✅ HTTP API (create_order, cancel_order)
- ✅ Order Validation
- ✅ Ring Buffer Integration
- ⏳ Data Persistence ← This Chapter
Current System Issue:
┌─────────────────────────────────────────────────────────────────┐
│ Trading Core (In-Memory) │
│ │
│ Orders → Match → Trades → Settle → Balance Update │
│ ↓ ↓ ↓ │
│ ❌ ❌ ❌ ← Data LOST on restart! │
└─────────────────────────────────────────────────────────────────┘
This Chapter’s Solution:
┌─────────────────────────────────────────────────────────────────┐
│ Trading Core │
│ │
│ Orders → Match → Trades → Settle → Balance Update │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ TDengine (Persistence) │ │
│ │ orders | trades | balances │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
1. Why TDengine?
Detailed comparison: Database Selection Analysis
Core Advantages
| Feature | TDengine | PostgreSQL |
|---|---|---|
| Write Speed | 1M/sec | 10k/sec |
| Time-Series | Native Support | Index Optimization Needed |
| Storage | 1/10 | 1x |
| Real-time Analytics | Built-in Stream | External Tools Needed |
| Rust Client | ✅ Official taos | ✅ tokio-postgres |
2. Schema Design
2.1 Super Table Architecture
TDengine uses the Super Table concept:
┌─────────────────────────────────────────────────────────┐
│ Super Table: orders │
│ (Unified schema, auto-create sub-table per symbol) │
├─────────────────┬─────────────────┬────────────────────┤
│ orders_1 │ orders_2 │ orders_N │
│ (BTC_USDT) │ (ETH_USDT) │ (...) │
└─────────────────┴─────────────────┴────────────────────┘
2.2 DDL Definitions
-- Database Setup
CREATE DATABASE IF NOT EXISTS trading
KEEP 365d -- Retain data for 1 year
DURATION 10d -- Partition every 10 days
BUFFER 256 -- 256MB Write Buffer
WAL_LEVEL 2 -- WAL Persistence Level
PRECISION 'us'; -- Microsecond Precision
USE trading;
-- Orders Super Table
CREATE STABLE IF NOT EXISTS orders (
ts TIMESTAMP, -- Timestamp (PK)
order_id BIGINT UNSIGNED,
user_id BIGINT UNSIGNED,
side TINYINT UNSIGNED, -- 0=BUY, 1=SELL
order_type TINYINT UNSIGNED,-- 0=LIMIT, 1=MARKET
price BIGINT UNSIGNED, -- Integer representation
qty BIGINT UNSIGNED,
filled_qty BIGINT UNSIGNED,
status TINYINT UNSIGNED,
cid NCHAR(64) -- Client Order ID
) TAGS (
symbol_id INT UNSIGNED -- Partition Key
);
-- Trades Super Table
CREATE STABLE IF NOT EXISTS trades (
ts TIMESTAMP,
trade_id BIGINT UNSIGNED,
order_id BIGINT UNSIGNED,
user_id BIGINT UNSIGNED,
side TINYINT UNSIGNED,
price BIGINT UNSIGNED,
qty BIGINT UNSIGNED,
fee BIGINT UNSIGNED,
role TINYINT UNSIGNED -- 0=MAKER, 1=TAKER
) TAGS (
symbol_id INT UNSIGNED
);
-- Balances Super Table
CREATE STABLE IF NOT EXISTS balances (
ts TIMESTAMP,
avail BIGINT UNSIGNED,
frozen BIGINT UNSIGNED,
lock_version BIGINT UNSIGNED,
settle_version BIGINT UNSIGNED
) TAGS (
user_id BIGINT UNSIGNED,
asset_id INT UNSIGNED
);
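Given the super-table DDL above, a writer targets a per-symbol sub-table and lets TDengine auto-create it via `USING ... TAGS`. The sketch below illustrates that SQL shape only; `TradeRow`, its fields, and the abbreviated column list are hypothetical (a real insert must cover all columns of the `trades` super table).

```rust
// Hypothetical, abbreviated trade record; names mirror the trades super table.
struct TradeRow {
    ts_us: u64, // microsecond timestamp (PRECISION 'us')
    trade_id: u64,
    price: u64,
    qty: u64,
}

// Build an INSERT that auto-creates the per-symbol sub-table via USING ... TAGS,
// with an explicit (abbreviated) column list for illustration.
fn trade_insert_sql(symbol_id: u32, t: &TradeRow) -> String {
    format!(
        "INSERT INTO trades_{sid} USING trades TAGS ({sid}) (ts, trade_id, price, qty) VALUES ({}, {}, {}, {})",
        t.ts_us, t.trade_id, t.price, t.qty,
        sid = symbol_id
    )
}
```

Because the sub-table name is derived from the tag (`symbol_id`), the writer never needs a separate "create table" step per trading pair.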
2.3 Status Enums
// New Enum
pub enum TradeRole {
    Maker = 0,
    Taker = 1,
}
3. API Design
3.1 Query Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/order/{order_id} | GET | Query single order |
| /api/v1/orders | GET | Query order list |
| /api/v1/trades | GET | Query trade history |
| /api/v1/balances | GET | Query user balances |
3.2 Request/Response Format
GET /api/v1/order/{order_id}:
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"symbol": "BTC_USDT",
"status": "PARTIALLY_FILLED",
"filled_qty": "0.0005",
"created_at": 1734533784000
}
}
GET /api/v1/balances:
{
"code": 0,
"msg": "ok",
"data": {
"balances": [
{ "asset": "BTC", "avail": "1.50000000", "frozen": "0.10000000" }
]
}
}
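The engine stores amounts as scaled `u64` integers, while the API responds with fixed-point strings like `"1.50000000"`. A minimal conversion helper might look like this (the function name and the per-asset `decimals` parameter are illustrative, not the project's actual API):

```rust
// Render an integer-scaled amount as the fixed-point string the API returns.
// `decimals` is the asset's precision, e.g. 8 for BTC (1 unit = 1e-8 BTC).
fn to_decimal_string(amount: u64, decimals: u32) -> String {
    let scale = 10u64.pow(decimals);
    // Integer part, then the remainder zero-padded to the asset's precision.
    format!("{}.{:0width$}", amount / scale, amount % scale, width = decimals as usize)
}
```

Serializing balances as strings rather than JSON numbers avoids silent precision loss in clients that parse numbers as IEEE-754 doubles.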
4. Implementation Architecture
4.1 Module Structure
src/
├── persistence/
│ ├── mod.rs // Entry
│ ├── tdengine.rs // Connection Manager
│ ├── orders.rs // Order Persistence
│ ├── trades.rs // Trade Persistence
│ └── balances.rs // Balance Persistence
4.2 Data Flow
┌─────────────────────────────────────────────────────────────────┐
│ Settlement Thread │
│ │
│ trade_queue.pop() ──┬── Update In-Memory Balance │
│ │ │
│ └── Write to TDengine │
│ ├── INSERT trades │
│ ├── INSERT order_events │
│ └── INSERT balances (Snapshot) │
└─────────────────────────────────────────────────────────────────┘
4.3 Batch Write Optimization
// Batch write to reduce I/O overhead
const BATCH_SIZE: usize = 1000;

async fn flush_trades(trades: Vec<Trade>) {
    let mut sql = String::from("INSERT INTO ");
    // Construct one multi-row insert statement...
    client.exec(&sql).await;
}
5. Implementation Plan
Phase 1: Basic Persistence (This Chapter)
- TDengine Connection
- Schema Initialization
- Trade/Order/Balance Writes
Phase 2: Query APIs
- Implement GET Endpoints
Phase 3: Optimization
- Batch Writes
- Connection Pool
- Redis Cache
6. Verification Plan
6.1 Integration Test
# 1. Start TDengine
docker run -d -p 6030:6030 -p 6041:6041 tdengine/tdengine:latest
# 2. Run Gateway
cargo run --release -- --gateway --port 8080
# 3. Submit Order
curl -X POST http://localhost:8080/api/v1/create_order ...
# 4. Query Order (Verify Persistence)
curl http://localhost:8080/api/v1/order/1
Summary
This chapter implements Settlement Persistence.
Core Philosophy:
Persistence is a side-channel operation, never blocking the main trading flow: the Settlement thread writes to TDengine asynchronously.
Next Chapter: 0x09-c WebSocket Push.
0x09-c WebSocket Push: Real-time Notification
📦 Code Changes: View Diff
Core Objective: Implement WebSocket real-time push so clients can receive order updates, trade notifications, and balance changes.
Background: From Polling to Push
Current Query Method (Polling):
Client Gateway
│ │
├─── GET /orders ─────────>│ (Poll)
│<──────────────────────────┤
│ ... seconds ... │
├─── GET /orders ─────────>│ (Poll again)
│<──────────────────────────┤
Issues:
- ❌ High Latency
- ❌ Wasted Resources
- ❌ Poor Real-time experience
This Chapter’s Solution (Push):
Client Gateway Trading Core
│ │ │
├── WS Connect ───────────>│ │
│<── Connected ────────────┤ │
│ │ │
│ │<── Order Filled ───────┤
│<── push: order.update ───┤ │
│ │ │
│ │<── Trade ──────────────┤
│<── push: trade ──────────┤ │
1. Push Event Types
1.1 Classification
| Event Type | Trigger | Recipient |
|---|---|---|
| order.update | Status change (NEW/FILLED/CANCELED) | Order Owner |
| trade | Trade execution | Buyer & Seller |
| balance.update | Balance change | Account Owner |
1.2 Message Format
// Order Update
{
"type": "order.update",
"data": {
"order_id": 1001,
"symbol": "BTC_USDT",
"status": "FILLED",
"filled_qty": "0.001",
"avg_price": "85000.00",
"updated_at": 1734533790000
}
}
// Trade Notification
{
"type": "trade",
"data": {
"trade_id": 5001,
"order_id": 1001,
"symbol": "BTC_USDT",
"side": "BUY",
"role": "TAKER",
"traded_at": 1734533790000
}
}
// Balance Update
{
"type": "balance.update",
"data": {
"asset": "BTC",
"avail": "1.501000",
"frozen": "0.000000"
}
}
2. Architecture Design
2.1 Design Principles
Important
Data Consistency First: When a user receives a push, the database MUST already be updated.
Correct Flow: ME Match → Settlement Persist → Push → User Query → Data Exists ✅
Incorrect Flow: ME Match → Push → User Query → Data Not Found ❌
2.2 System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Thread 3: ME ──▶ trade_queue ──▶ Thread 4: Settlement│
│ └──▶ balance_update_queue │
│ │
│ Thread 4: Settlement ──▶ push_event_queue ──▶ WsService │
│ │ │
│ └──▶ TDengine (persist) │
│ │
│ WsService (Gateway) ──▶ ConnectionManager ──▶ Clients │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Decisions:
- ✅ Settlement is the only push source.
- ✅ Push events generated ONLY after persistence success.
- ✅ WsService runs in the Gateway’s tokio runtime.
2.3 Connection Management
ConnectionManager uses DashMap to handle concurrent connections, supporting multiple connections per user.
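As a sketch of that manager, the following uses a `Mutex<HashMap<...>>` from the standard library as a stand-in for `DashMap`, and plain `u64` connection ids in place of real WebSocket sender handles; all names here are illustrative.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Std-library stand-in for the DashMap-based manager: user_id -> connection ids.
struct ConnectionManager {
    conns: Mutex<HashMap<u64, Vec<u64>>>,
}

impl ConnectionManager {
    fn new() -> Self {
        Self { conns: Mutex::new(HashMap::new()) }
    }

    // A user may hold several simultaneous connections (e.g. web + mobile).
    fn register(&self, user_id: u64, conn_id: u64) {
        self.conns.lock().unwrap().entry(user_id).or_default().push(conn_id);
    }

    fn unregister(&self, user_id: u64, conn_id: u64) {
        let mut map = self.conns.lock().unwrap();
        if let Some(list) = map.get_mut(&user_id) {
            list.retain(|&c| c != conn_id);
            if list.is_empty() {
                map.remove(&user_id); // no live connections left for this user
            }
        }
    }

    // Fan a push event out to every live connection of this user.
    fn targets(&self, user_id: u64) -> Vec<u64> {
        self.conns.lock().unwrap().get(&user_id).cloned().unwrap_or_default()
    }
}
```

`DashMap` removes the single global lock by sharding the map, which matters once many connections register and receive pushes concurrently.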
3. API Design
3.1 Endpoint
ws://host:port/ws
3.2 Connection Flow
- Connect.
- Send Auth: {"type": "auth", "token": "..."}.
- Receive Auth Success.
- Receive Push Events.
3.3 Heartbeat
Client sends {"type": "ping"} every 30s, Server responds {"type": "pong"}.
4. Implementation
4.1 Core Structures
PushEvent (Internal Queue):
pub enum PushEvent {
    OrderUpdate { ... },
    Trade { ... },
    BalanceUpdate { ... },
}
TradeEvent Extension:
Added taker_filled_qty, maker_filled_qty etc., to TradeEvent to allow Settlement to determine order status (FILLED vs PARTIAL) without querying generic order state.
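The status derivation this enables is simple: compare the filled quantity carried on the event against the order's total. A minimal sketch (the enum and function names here are illustrative, not the project's exact types):

```rust
#[derive(Debug, PartialEq)]
enum OrderStatus {
    New,
    PartiallyFilled,
    Filled,
}

// With taker_filled_qty / maker_filled_qty carried on the TradeEvent,
// Settlement can classify an order without a round-trip to the order store.
fn status_after_fill(filled_qty: u64, total_qty: u64) -> OrderStatus {
    match filled_qty {
        0 => OrderStatus::New,
        f if f < total_qty => OrderStatus::PartiallyFilled,
        _ => OrderStatus::Filled,
    }
}
```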
4.2 Implementation Plan
- Phase 1: Basic Connection (Manager, Handler, Gateway Integration).
- Phase 2: Push Integration (push_event_queue, WsService, Settlement logic).
- Phase 3: Refinement (Error handling, Performance tests).
5. Verification
5.1 Automated Tests
Run sh run_test.sh:
- Validates WS connection.
- Submits orders and verifies receipt of order_update, trade, and balance_update events.
5.2 Manual Test
websocat "ws://localhost:8080/ws?user_id=1001"
# Send {"type": "ping"} -> Receive {"type": "pong"}
Summary
This chapter implements WebSocket real-time push.
Key Design Decisions:
- Settlement-first: Ensuring consistency.
- Single Source: All events originate from Settlement.
- Extended TradeEvent: Carrying adequate state for downstream consumers.
Next Chapter: 0x09-d K-Line Aggregation.
0x09-d K-Line Aggregation Service
📦 Code Changes: View Diff
Core Objective: Implement real-time K-Line (Candlestick) aggregation service, supporting multiple intervals (1m, 5m, 15m, 30m, 1h, 1d).
Background: Market Data Aggregation
The exchange needs to provide standardized market data:
Trades K-Line (OHLCV)
│ │
├── Trade 1: price=30000, qty=0.1 │
├── Trade 2: price=30100, qty=0.2 ──▶ 1-Min K-Line:
├── Trade 3: price=29900, qty=0.1 │ Open: 30000
└── Trade 4: price=30050, qty=0.3 │ High: 30100
│ Low: 29900
│ Close: 30050
│ Volume: 0.7
1. K-Line Data Structure
1.1 OHLCV
pub struct KLine {
    pub symbol_id: u32,
    pub interval: KLineInterval,
    pub open_time: u64,    // Unix timestamp (ms)
    pub close_time: u64,
    pub open: u64,
    pub high: u64,
    pub low: u64,
    pub close: u64,
    pub volume: u64,       // Base asset volume
    pub quote_volume: u64, // Quote asset volume (price * qty)
    pub trade_count: u32,
}
Warning
quote_volume Overflow: price * qty might overflow u64. Correct SQL:
SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume
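On the Rust side, the same hazard can be shown (and avoided) with integer widening; this is an illustrative sketch, not code from the project:

```rust
// price and qty are integer-scaled u64 values; their product can exceed u64::MAX.
fn would_overflow_u64(price: u64, qty: u64) -> bool {
    price.checked_mul(qty).is_none()
}

// Widening to u128 keeps exact integer precision in Rust; the SQL fix
// (CAST ... AS DOUBLE) instead trades a little precision for TDengine support.
fn quote_volume(price: u64, qty: u64) -> u128 {
    price as u128 * qty as u128
}
```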
1.2 API Response Format
{
"symbol": "BTC_USDT",
"interval": "1m",
"open_time": 1734533760000,
"close_time": 1734533819999,
"open": "30000.00",
"high": "30100.00",
"low": "29900.00",
"close": "30050.00",
"volume": "0.700000",
"quote_volume": "21035.00",
"trade_count": 4
}
2. Architecture: TDengine Stream Computing
2.1 Core Concept
Leverage TDengine built-in Stream Computing for auto-aggregation. No manual aggregator implementation needed:
- Settlement writes to the trades table.
- TDengine automatically triggers stream computing.
- Results are written to the klines tables.
- The HTTP API queries the klines tables directly.
2.2 Data Flow
Settlement ──▶ trades table (TDengine)
│
│ TDengine Stream Computing (Auto)
│
├─── kline_1m_stream ──► klines_1m table
├─── kline_5m_stream ──► klines_5m table
└─── ...
│
┌────────────────────────┴───────────────────────┐
▼ ▼
HTTP API WebSocket Push
GET /api/v1/klines kline.update (Optional)
2.3 TDengine Stream Example
CREATE STREAM IF NOT EXISTS kline_1m_stream
INTO klines_1m SUBTABLE(CONCAT('kl_1m_', CAST(symbol_id AS NCHAR(10))))
AS SELECT
_wstart AS ts,
FIRST(price) AS open,
MAX(price) AS high,
MIN(price) AS low,
LAST(price) AS close,
SUM(qty) AS volume,
SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume,
COUNT(*) AS trade_count
FROM trades
PARTITION BY symbol_id
INTERVAL(1m);
3. API Design
3.1 HTTP Endpoint
GET /api/v1/klines?symbol=BTC_USDT&interval=1m&limit=100
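Handling this endpoint requires mapping the interval strings to window lengths and aligning timestamps to window boundaries (what TDengine's `_wstart` does). A small sketch, with hypothetical function names:

```rust
// Map the API's interval strings onto window lengths in seconds.
// The real code uses a KLineInterval enum; seconds suffice for a sketch.
fn interval_seconds(interval: &str) -> Option<u64> {
    match interval {
        "1m" => Some(60),
        "5m" => Some(5 * 60),
        "15m" => Some(15 * 60),
        "30m" => Some(30 * 60),
        "1h" => Some(3600),
        "1d" => Some(86_400),
        _ => None, // unknown interval -> INVALID_PARAMETER
    }
}

// Align a millisecond timestamp to its window's open_time (mirrors _wstart).
fn open_time_ms(ts_ms: u64, interval_secs: u64) -> u64 {
    let w = interval_secs * 1000;
    ts_ms - ts_ms % w
}
```

Rejecting unknown intervals at the Gateway keeps malformed queries from ever reaching the klines tables.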
3.2 WebSocket Push
{
"type": "kline.update",
"data": {
"symbol": "BTC_USDT",
"interval": "1m",
"open": "30000.00",
"close": "30050.00",
"is_final": false
}
}
4. Module Structure
src/
├── persistence/
│ ├── klines.rs # Create Streams, Query K-Lines
│ ├── schema.rs # Add klines Super Table
│ └── queries.rs # Add query_klines()
├── gateway/
│ ├── handlers.rs # Add get_klines
│ └── ...
Tip
No need for a src/kline/ aggregation module; TDengine's stream computing handles it.
5. Implementation Plan
- Phase 1: Schema: Add the klines super table.
- Phase 2: Stream Computing: Implement create_kline_streams().
- Phase 3: HTTP API: Implement query_klines() and the API endpoint.
- Phase 4: Verification: E2E test.
6. Verification
6.1 E2E Test Scenarios
Script: ./scripts/test_kline_e2e.sh
- Check API connectivity.
- Record initial K-Line count.
- Create matched orders.
- Wait for Stream processing (5s).
- Query K-Line API and verify data structure.
6.2 Binance Standard Alignment
Warning
P0 Fix: Ensure time fields align with Binance standard (Unix Milliseconds Number).
- open_time: 1734611580000 (was an ISO 8601 string)
- close_time: 1734611639999 (was missing)
Summary
This chapter implements K-Line aggregation service leveraging TDengine’s Stream Computing.
Key Concept:
K-Line is derived data. We calculate it from trades in real-time, rather than storing original raw data.
Next Chapter: 0x09-e OrderBook Depth.
0x09-e Order Book Depth
📦 Code Changes: View Diff
Core Objective: Implement Order Book Depth push, allowing users to view the current buy/sell order distribution in real-time.
Background: Depth Data
The Order Book Depth displays the current market’s distribution of limit orders:
Asks (Sells)
┌─────────────────────┐
│ 30100.00 0.3 BTC │ ← Lowest Ask
│ 30050.00 0.5 BTC │
│ 30020.00 1.2 BTC │
├─────────────────────┤
│ Current: 30000 │
├─────────────────────┤
│ 29980.00 0.8 BTC │
│ 29950.00 1.5 BTC │
│ 29900.00 2.0 BTC │ ← Highest Bid
└─────────────────────┘
Bids (Buys)
1. Data Structure
1.1 Depth Response Format
{
"symbol": "BTC_USDT",
"bids": [
["29980.00", "0.800000"],
["29950.00", "1.500000"],
["29900.00", "2.000000"]
],
"asks": [
["30020.00", "1.200000"],
["30050.00", "0.500000"],
["30100.00", "0.300000"]
],
"last_update_id": 12345
}
1.2 Binance Format Comparison
| Field | Us | Binance |
|---|---|---|
| bids | [["price", "qty"], ...] | ✅ Match |
| asks | [["price", "qty"], ...] | ✅ Match |
| last_update_id | 12345 | ✅ Match |
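Since the order book already keeps price levels in BTree maps, a depth snapshot is just two bounded walks: bids from the highest price down, asks from the lowest price up. A std-only sketch (the function name and `(price, qty)` tuples are illustrative):

```rust
use std::collections::BTreeMap;

// Price-level maps keyed by integer price (same u64 convention as the engine).
// Returns bids best (highest) first and asks best (lowest) first.
fn depth_snapshot(
    bids: &BTreeMap<u64, u64>,
    asks: &BTreeMap<u64, u64>,
    limit: usize,
) -> (Vec<(u64, u64)>, Vec<(u64, u64)>) {
    // BTreeMap iterates in ascending key order, so bids walk in reverse.
    let top_bids = bids.iter().rev().take(limit).map(|(&p, &q)| (p, q)).collect();
    let top_asks = asks.iter().take(limit).map(|(&p, &q)| (p, q)).collect();
    (top_bids, top_asks)
}
```

The API layer would then render each `(price, qty)` pair as the `["price", "qty"]` string arrays shown above.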
2. API Design
2.1 HTTP Endpoint
GET /api/v1/depth?symbol=BTC_USDT&limit=20
| Parameter | Type | Description |
|---|---|---|
| symbol | String | Trading Pair |
| limit | u32 | Depth levels (5, 10, 20, 50, 100) |
2.2 WebSocket Push
// Subscribe
{"type": "subscribe", "channel": "depth", "symbol": "BTC_USDT"}
// Push (Incremental)
{
"type": "depth.update",
"symbol": "BTC_USDT",
"bids": [["29980.00", "0.800000"]],
"asks": [["30020.00", "0.000000"]], // qty=0 means removal
"last_update_id": 12346
}
3. Architecture Design
3.1 Comparison with K-Line
| Data | Source | Latency | Method |
|---|---|---|---|
| K-Line | Historical Trades | Minute-level | TDengine Stream |
| Depth | Current Orders | Ms-level | In-Memory |
Depth is too real-time for DB storage. We use Ring Buffer + Independent Service.
3.2 Event-Driven Architecture
Following the pattern: Isolated service, Ring Buffer, Lock-Free.
┌────────────┐ ┌─────────────────────┐
│ ME │ ──(non-blocking)─► │ depth_event_queue │
│ │ drop if full │ (capacity: 1024) │
└────────────┘ └──────────┬──────────┘
│
▼
┌─────────────────────┐
│ DepthService │
│ (tokio async) │
├─────────────────────┤
│ ● HTTP Snapshot │
│ ● WS Incremental │
└─────────────────────┘
Important
Market Data Characteristic: Freshness is key. Dropping a few events is acceptable if the consumer is slow, as eventual consistency is restored by snapshots.
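The "drop if full" publish side can be sketched with a bounded std channel standing in for the ring buffer; `DepthEvent` and `publish` are illustrative names, not the project's actual types.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Hypothetical depth event; the real DepthEvent carries price/qty deltas.
struct DepthEvent {
    seq: u64,
}

// ME-side publish: never block the matching thread. Drop the event if the
// consumer lags, and count the drop so snapshots can resynchronize later.
fn publish(tx: &SyncSender<DepthEvent>, ev: DepthEvent, dropped: &mut u64) {
    if let Err(TrySendError::Full(_)) = tx.try_send(ev) {
        *dropped += 1; // freshness over completeness: stale depth is worthless
    }
}
```

This is the market-data trade-off stated above made concrete: the matching engine pays a constant, non-blocking cost per event, and correctness is restored by periodic snapshots rather than by back-pressure.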
4. Module Structure
src/
├── gateway/
│ ├── handlers.rs # Add get_depth
│ └── ...
├── engine.rs # Add get_depth() method
└── websocket/
└── messages.rs # Add DepthUpdate
5. Implementation Plan
- Phase 1: HTTP API: Add OrderBook::get_depth() and the API endpoint.
- Phase 2: WebSocket: depth.update message and subscription logic.
6. Verification
6.1 E2E Test Scenarios
Script: scripts/test_depth.sh
- Query empty depth.
- Submit Buy/Sell orders (creating depth).
- Wait for update (200ms).
- Query depth and verify bids/asks.
- Performance test (100 orders rapid fire).
Expected Result:
- Depth reflects order book state.
- Update latency ≤ 100ms.
- High frequency updates are batched/throttled correctly.
Summary
| Point | Implementation |
|---|---|
| Structure | Compatible with Binance (Array format) |
| API | GET /api/v1/depth |
| WebSocket | depth.update (Future: Incremental) |
| Architecture | Event-driven, Ring Buffer |
Core Concept:
Service Isolation: ME pushes via DepthEvent. DepthService maintains state. Lock-free.
Next Chapter: 0x09-f Integration Test.
0x09-f Integration Test: Full Acceptance
📦 Code Changes: View Diff
Core Objective: Perform comprehensive integration testing on all 0x09 features using historical datasets to establish a reproducible acceptance baseline.
Background
Phase 0x09 delivered multiple key features:
| Chapter | Feature | Status |
|---|---|---|
| 0x09-a | Gateway HTTP API | ✅ |
| 0x09-b | Settlement Persistence | ✅ |
| 0x09-c | WebSocket Push | ✅ |
| 0x09-d | K-Line Aggregation | ✅ |
| 0x09-e | Order Book Depth | ✅ |
We now need to integrate and verify these features to ensure end-to-end correctness.
Test Scope
1. Pipeline Correctness
| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output Identical |
| Single vs Multi-Thread | 1.3M | Output Identical |
2. Settlement Persistence
| Test | Verification Point |
|---|---|
| Orders Table | Status changes recorded correctly |
| Trades Table | Trade data integrity |
| Balances Table | Final balances match |
3. HTTP API
| Endpoint | Verification Point |
|---|---|
| POST /create_order | Success |
| POST /cancel_order | Correct execution |
| GET /orders | Correct list |
| GET /trades | Record integrity |
| GET /depth | Bids/Asks ordered |
Acceptance Criteria
1. Pipeline Correctness (Must Pass All)
- Output diff between Single-Thread and Multi-Thread is empty.
- Final balances match exactly.
- Trade counts match exactly.
2. Settlement Persistence (Must Pass All)
- Orders Row Count == Total Orders.
- Trades Row Count == Total Trades.
- Final Balances match precisely (100% consistency for avail/frozen).
Important
Consistency Requirement: Core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.
3. Performance Baseline
- Record 100K and 1.3M TPS.
- Record P99 Latency.
Test Artifacts & Baseline
Baseline Generation
After testing, organize the following for regression testing:
- 100K Output:
baseline/100k/ - 1.3M Output:
baseline/1.3m/ - Performance Metrics:
docs/src/perf-history/
Regression Testing
Use scripts to automatically compare against baseline:
./scripts/test_pipeline_compare.sh 100k
./scripts/test_integration_full.sh
Large Dataset Testing Notes
Important
Special attention needed for 1.3M dataset tests:
- Output Redirection: Must redirect output to file to avoid IDE freezing.
- Execution Time: Multi-thread mode is slower (~100s vs 16s) due to persistence overhead.
- Balance Events: “Lock events != Accepted orders” is expected (due to cancels).
- Push Queue Overflow: [PUSH] queue full warnings are expected under high load.
Test Report (2025-12-21)
Performance Baseline
| Version | Time | Rate | vs Baseline |
|---|---|---|---|
| Baseline (urllib) | 576s | 174/s | - |
| HTTP Keep-Alive | 117s | 857/s | +393% |
| Optimized (Current) | 69s | 1,435/s | +725% |
Pipeline Correctness (1.3M) ✅
- Core balances consistent.
- Trade count matches (667,567).
- Balance final state 100% MATCH.
Settlement Persistence (100K)
- Orders: 100% MATCH (filled_qty, status).
- Trades: 100% MATCH.
- Balances: 100% MATCH.
Conclusion: All 0x09 features (Persistence & Gateway) are production-ready.
🇨🇳 中文
📦 Code Changes: View Diff
Core objective of this section: run comprehensive integration tests of all 0x09 features against historical datasets and establish a repeatable acceptance baseline.
Background
Phase 0x09 delivered several key features:
| Chapter | Feature | Status |
|---|---|---|
| 0x09-a | Gateway HTTP API | ✅ |
| 0x09-b | Settlement Persistence | ✅ |
| 0x09-c | WebSocket Push | ✅ |
| 0x09-d | K-Line Aggregation | ✅ |
| 0x09-e | Order Book Depth | ✅ |
These features now need to be verified together to ensure end-to-end system correctness.
Test Scope
1. Pipeline Correctness
| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output identical |
| Single vs Multi-Thread | 1.3M | Output identical |
2. Settlement Persistence
| Test | Verification Point |
|---|---|
| Orders Table | Status changes recorded correctly |
| Trades Table | Trade data integrity |
| Balances Table | Final balances match |
3. HTTP API
Verify the create_order, cancel_order, orders, trades, and depth endpoints.
Acceptance Criteria
1. Pipeline Correctness (Must Pass All)
- 100K/1.3M output diffs are empty.
- Final balance state matches.
- Trade counts match.
2. Settlement Persistence (Must Pass All)
- Orders/Trades row counts match.
- Final balances match 100%.
Important
Consistency Requirement: Core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.
3. Performance Baseline
- Record 100K and 1.3M TPS.
- Record P99 latency.
Test Artifacts & Baseline
Baseline Generation & Regression
Store baseline data under the baseline/ directory and run test_pipeline_compare.sh for automated regression testing.
Large Dataset Testing Notes
Important
Special attention needed when running the 1.3M dataset tests:
- Output Redirection: Must redirect output to a file.
- Execution Time: Slower multi-thread mode is expected.
- Balance Events: Lock-event count not equaling order count is expected.
- Push Queue Overflow: Queue-full warnings under high load are expected.
Test Report (2025-12-21)
Performance Baseline
Current optimized TPS is 1,435/s, a 725% improvement over baseline.
Pipeline Correctness (1.3M) ✅
- Trade count matches (667,567).
- Balance final state 100% MATCH.
Settlement Persistence (100K)
- Orders, Trades, and Balances are all 100% MATCH.
Conclusion: All 0x09 persistence and gateway features are production-ready.
Part II: Productization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Upgrade the core matching engine into a complete trading system with Account System, Fund Transfer, and Security Authentication.
1. Review: Achievements of Part I
| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal Matching Prototype |
| 0x02-03 | Floats & Decimals | Financial Grade Precision |
| 0x04 | BTree OrderBook | O(log n) Matching |
| 0x05-06 | User Balance | Locking/Unlocking |
| 0x07 | Testing Framework | 100K Order Baseline |
| 0x08 | Multi-Thread Pipeline | 4-Thread Concurrency |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |
2. Gap Analysis: From Engine to System
| Dimension | Current State | Target State |
|---|---|---|
| Identity | Raw user_id | API Key Signature |
| Accounts | Single Balance | Funding + Spot Dual-Account |
| Funds | Manual deposit() | Deposit/Withdraw/Transfer |
| Economics | Zero Fee | Maker/Taker Fees |
3. Blueprint for Part II
0x0A ─── Account System & Security
├── 0x0A-a: Account System (exchange_info + DB)
├── 0x0A-b: ID Specification (Asset/Symbol Naming)
└── 0x0A-c: Authentication (API Key Middleware)
0x0B ─── Fund System & Transfers
├── Funding/Spot Dual-Account Structure
└── Deposit/Withdraw API
0x0C ─── Economic Model
└── Fee Calculation & Deduction
0x0D ─── Snapshot & Recovery
└── Graceful Shutdown & State Restoration
4. Tech Stack Choices
| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational Config Data |
| TDengine | Orders/Trades/K-Lines | Time-Series Trading Data |
| sqlx | Rust PG Driver | Async + Compile-time Check |
5. Design Principles
| Principle | Description |
|---|---|
| Minimal External Deps | Auth/Transfer logic is cohesive |
| Auditability | All fund changes must have event logs |
| Progressive | System remains runnable after each module |
| Backward Compatible | Reuse Core types from Part I |
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: Upgrade the matching-engine core into a complete trading system with an account system, fund transfers, and security authentication.
1. Review: Achievements of Part I
| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal matching prototype |
| 0x02-03 | Floats & Decimals | Financial-grade precision |
| 0x04 | BTree OrderBook | O(log n) matching |
| 0x05-06 | User Balance | Locking/unlocking mechanism |
| 0x07 | Testing Framework | 100K order baseline |
| 0x08 | Multi-Thread Pipeline | 4-thread concurrent architecture |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |
2. Gap Analysis: From Engine to System
| Dimension | Current State | Target State |
|---|---|---|
| Identity | Raw user_id | API Key signature verification |
| Accounts | Single balance structure | Funding + Spot dual-account |
| Funds | Manual deposit() | Full deposit/withdraw/transfer flow |
| Economics | Zero fee | Maker/Taker fee rates |
3. Blueprint for Part II
0x0A ─── Account System & Security
├── 0x0A-a: Account System (exchange_info + DB management)
├── 0x0A-b: ID Specification (Asset/Symbol naming)
└── 0x0A-c: Authentication (API Key middleware)
0x0B ─── Fund System & Transfers
├── Funding/Spot dual-account structure
└── Deposit/Withdraw API
0x0C ─── Economic Model
└── Fee calculation & deduction
0x0D ─── Snapshot & Recovery
└── Graceful shutdown & state restoration
4. Tech Stack Choices
| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational config data |
| TDengine | Orders/Trades/K-Lines | Time-series trading data |
| sqlx | Rust PG Driver | Async + compile-time checks |
5. Design Principles
| Principle | Description |
|---|---|
| Minimal External Deps | Auth/transfer logic stays cohesive |
| Auditability | Every fund change must leave a complete event trail |
| Progressive | System remains runnable after each sub-module |
| Backward Compatible | Reuse core types from Part I |
0x0A-a: Account System
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
This chapter establishes the account infrastructure for the trading system: exchange_info module, naming conventions, and database management.
1. Core Module: exchange_info
1.1 Module Structure
src/exchange_info/
├── mod.rs # Module entry
├── validation.rs # AssetName/SymbolName validation
├── asset/
│ ├── mod.rs
│ ├── models.rs # Asset struct + asset_flags
│ └── manager.rs # AssetManager
└── symbol/
├── mod.rs
├── models.rs # Symbol struct + symbol_flags
└── manager.rs # SymbolManager
1.2 Core Types
#![allow(unused)]
fn main() {
// Asset
pub struct Asset {
pub asset_id: i32,
pub asset: String, // "BTC", "USDT" (UPPERCASE)
pub name: String, // "Bitcoin", "Tether USD"
pub decimals: i16, // 8 for BTC, 6 for USDT
pub status: i16,
pub asset_flags: i32, // Permission bits
}
// Symbol
pub struct Symbol {
pub symbol_id: i32,
pub symbol: String, // "BTC_USDT" (UPPERCASE)
pub base_asset_id: i32,
pub quote_asset_id: i32,
pub price_decimals: i16,
pub qty_decimals: i16,
pub symbol_flags: i32,
}
}
2. Naming Convention
| Category | Standard | Example |
|---|---|---|
| Database Name | _db suffix | exchange_info_db |
| Table Name | _tb suffix | assets_tb, symbols_tb |
| Flags Module | Table name prefix | asset_flags::, symbol_flags:: |
| Codes | UPPERCASE | BTC, BTC_USDT |
See Naming Convention Document.
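The UPPERCASE convention enforced by the DB constraints in section 3.2 is also checked in `validation.rs`. The real module's rules may differ; this is an illustrative sketch of what such validators could look like:

```rust
/// Illustrative asset-name validator mirroring the UPPERCASE convention.
/// The 16-character cap is an assumption, not from the source.
fn is_valid_asset_name(s: &str) -> bool {
    !s.is_empty()
        && s.len() <= 16
        && s.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}

/// A symbol is BASE_QUOTE, each half a valid asset name.
fn is_valid_symbol_name(s: &str) -> bool {
    match s.split_once('_') {
        Some((base, quote)) => is_valid_asset_name(base) && is_valid_asset_name(quote),
        None => false,
    }
}

fn main() {
    assert!(is_valid_asset_name("BTC"));
    assert!(!is_valid_asset_name("btc")); // lowercase rejected, like the DB CHECK
    assert!(is_valid_symbol_name("BTC_USDT"));
    assert!(!is_valid_symbol_name("BTCUSDT")); // missing separator
    println!("validation ok");
}
```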
3. Database Management
3.1 Management Script
# Full Init (Reset + Seed)
python3 scripts/db/manage_db.py init
# Reset Schema Only
python3 scripts/db/manage_db.py reset
# Seed Data Only
python3 scripts/db/manage_db.py seed
# Check Status
python3 scripts/db/manage_db.py status
3.2 Database Constraints
-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))
-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))
4. API Endpoints
4.1 GET /api/v1/exchange_info
Returns full exchange information:
{
"code": 0,
"data": {
"assets": [
{
"asset_id": 1,
"asset": "BTC",
"name": "Bitcoin",
"decimals": 8,
"can_deposit": true,
"can_withdraw": true,
"can_trade": true
}
],
"symbols": [
{
"symbol_id": 1,
"symbol": "BTC_USDT",
"base_asset": "BTC",
"quote_asset": "USDT",
"price_decimals": 2,
"qty_decimals": 8,
"is_tradable": true,
"is_visible": true
}
],
"server_time": 1734897000000
}
}
4.2 Other Endpoints
| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |
5. Verification
5.1 Integration Test
./scripts/test_account_integration.sh
Scope:
- ✅ DB Initialization (Auto reset + seed)
- ✅ Assets/Symbols/ExchangeInfo API
- ✅ DB Constraints (Lowercase rejected)
- ✅ Idempotency
5.2 Unit Test
cargo test --lib
# 150 passed, 0 failed
6. Next Steps
🇨🇳 中文
📦 Code Changes: View Diff
This chapter establishes the account infrastructure for the trading system: the exchange_info module, naming conventions, and database management.
1. Core Module: exchange_info
1.1 Module Structure
src/exchange_info/
├── mod.rs # Module entry
├── validation.rs # AssetName/SymbolName validation
├── asset/
│ ├── mod.rs
│ ├── models.rs # Asset struct + asset_flags
│ └── manager.rs # AssetManager
└── symbol/
├── mod.rs
├── models.rs # Symbol struct + symbol_flags
└── manager.rs # SymbolManager
1.2 Core Types
#![allow(unused)]
fn main() {
// Asset
pub struct Asset {
pub asset_id: i32,
pub asset: String, // "BTC", "USDT" (UPPERCASE enforced)
pub name: String, // "Bitcoin", "Tether USD"
pub decimals: i16, // 8 for BTC, 6 for USDT
pub status: i16,
pub asset_flags: i32, // Permission bits
}
// Symbol
pub struct Symbol {
pub symbol_id: i32,
pub symbol: String, // "BTC_USDT" (UPPERCASE enforced)
pub base_asset_id: i32,
pub quote_asset_id: i32,
pub price_decimals: i16,
pub qty_decimals: i16,
pub symbol_flags: i32,
}
}
2. Naming Convention
| Category | Standard | Example |
|---|---|---|
| Database Name | _db suffix | exchange_info_db |
| Table Name | _tb suffix | assets_tb, symbols_tb |
| Flags Module | Table-name prefix | asset_flags::, symbol_flags:: |
| Asset/Symbol Codes | UPPERCASE | BTC, BTC_USDT |
See the Naming Convention Document.
3. Database Management
3.1 Python Management Script
# Full init (reset + seed data)
python3 scripts/db/manage_db.py init
# Reset schema only (no data)
python3 scripts/db/manage_db.py reset
# Seed data only
python3 scripts/db/manage_db.py seed
# Check current status
python3 scripts/db/manage_db.py status
3.2 Database Constraints
-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))
-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))
4. API Endpoints
4.1 GET /api/v1/exchange_info
Returns full exchange information:
{
"code": 0,
"data": {
"assets": [
{
"asset_id": 1,
"asset": "BTC",
"name": "Bitcoin",
"decimals": 8,
"can_deposit": true,
"can_withdraw": true,
"can_trade": true
}
],
"symbols": [
{
"symbol_id": 1,
"symbol": "BTC_USDT",
"..."
}
],
"server_time": 1734897000000
}
}
4.2 Other Endpoints
| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |
5. Verification
5.1 Integration Test
./scripts/test_account_integration.sh
5.2 Unit Test
cargo test --lib
6. Next Steps
0x0A-b: ID Specification & Account Structure
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: Design Phase | Core Objective: Define ID generation rules and account data structures.
1. ID Generation Rules
1.1 User ID (u64)
- Semantics: Global unique user identifier.
- Strategy: Auto-increment or Snowflake/ULID (for future distributed support).
- Initial Value: 1024 (0-1023 reserved for system accounts).
1.2 Asset ID (u32)
- Semantics: Asset identifier (e.g., BTC=1, USDT=2).
- Strategy: Sequential allocation starting from 1.
- Purpose: Maintain O(1) array indexing performance.
1.3 Symbol ID (u32)
- Semantics: Trading Pair identifier (e.g., BTC_USDT=1).
- Strategy: Sequential allocation starting from 1.
1.4 Account Identification
- Semantics: User’s sub-account (distinguishing Funding vs Spot).
- Strategy: Use a (user_id, account_type) tuple; no composite ID needed.
#![allow(unused)]
fn main() {
struct AccountKey {
    user_id: u64,
    account_type: AccountType, // Funding | Spot
}
}
- Account Types: Spot = 1, Funding = 2.
1.5 Order ID / Trade ID (u64)
- Semantics: Unique identifier for orders/trades within the Matching Engine.
- Strategy: Global atomic increment.
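The "global atomic increment" strategy for Order/Trade IDs can be sketched with an `AtomicU64`; the `NEXT_ORDER_ID` name is illustrative, not the engine's actual symbol:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Global, lock-free ID allocator: each call returns a unique, increasing ID,
/// safe to call from any thread without a mutex.
static NEXT_ORDER_ID: AtomicU64 = AtomicU64::new(1);

fn next_order_id() -> u64 {
    // fetch_add returns the previous value, so IDs start at 1.
    NEXT_ORDER_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let a = next_order_id();
    let b = next_order_id();
    assert!(b > a); // strictly increasing
    println!("ids: {a}, {b}");
}
```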
2. Core Data Structures
2.1 AccountType Enum
#![allow(unused)]
fn main() {
#[repr(u8)]
pub enum AccountType {
Spot = 0x01,
Funding = 0x02,
}
}
2.2 Account Struct (Conceptual)
#![allow(unused)]
fn main() {
pub struct Account {
pub user_id: u64,
pub account_type: AccountType,
pub balances: HashMap<AssetId, Balance>,
pub created_at: u64,
pub status: AccountStatus,
}
}
3. System Reserved Accounts
| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |
Once confirmed, this design will be reflected in src/core_types.rs and src/account/mod.rs.
💡 Future Consideration: Alternative System ID Range
Current: System IDs use 0-1023 (1024 total), users start at 1024.
Problem: Test data might accidentally use 1, 2, 3… which conflicts with system IDs.
Alternative: Use u64::MAX downward for system accounts:
#![allow(unused)]
fn main() {
const REVENUE_ID: u64 = u64::MAX; // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1; // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary
fn is_system_account(user_id: u64) -> bool {
user_id > SYSTEM_MIN
}
}
Benefits:
- Users can start from 1, more natural
- Test data never conflicts with system IDs
- Clear separation: low = users, high = system
🇨🇳 中文
📅 Status: In Design | Core Objective: Define the generation rules for all key IDs in the system and the basic account data structures.
1. ID Generation Rules
1.1 User ID (u64)
- Semantics: Globally unique user identifier.
- Strategy: Auto-increment sequence, or Snowflake/ULID (for future distributed support).
- Initial Value: 1024 (0-1023 reserved for system accounts).
1.2 Asset ID (u32)
- Semantics: Asset identifier (e.g., BTC=1, USDT=2).
- Strategy: Sequential allocation starting from 1.
- Purpose: Maintain O(1) array-indexing performance.
1.3 Symbol ID (u32)
- Semantics: Trading-pair identifier (e.g., BTC/USDT=1).
- Strategy: Sequential allocation starting from 1.
1.4 Account Identification
- Semantics: A user's sub-account (distinguishing Funding vs Spot).
- Strategy: Use a (user_id, account_type) tuple; no composite ID needed.
#![allow(unused)]
fn main() {
struct AccountKey {
    user_id: u64,
    account_type: AccountType, // Funding | Spot
}
}
- Account Types: Spot = 1, Funding = 2.
1.5 Order ID / Trade ID (u64)
- Semantics: Unique identifier for orders/trades within the matching engine.
- Strategy: Global atomic increment.
2. Core Data Structures
2.1 AccountType Enum
#![allow(unused)]
fn main() {
#[repr(u8)]
pub enum AccountType {
Spot = 0x01,
Funding = 0x02,
}
}
2.2 Account Struct (Conceptual)
#![allow(unused)]
fn main() {
pub struct Account {
pub user_id: u64,
pub account_type: AccountType,
pub balances: HashMap<AssetId, Balance>,
pub created_at: u64,
pub status: AccountStatus,
}
}
3. System Reserved Accounts
| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |
Once confirmed, this design will be reflected in src/core_types.rs and src/account/mod.rs.
💡 Future Consideration: Alternative System ID Range
Current: System IDs use 0-1023 (1024 total), users start at 1024.
Problem: Test data might use 1, 2, 3…, conflicting with system IDs.
Alternative: Allocate system accounts downward from u64::MAX:
#![allow(unused)]
fn main() {
const REVENUE_ID: u64 = u64::MAX; // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1; // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary
fn is_system_account(user_id: u64) -> bool {
user_id > SYSTEM_MIN
}
}
Benefits:
- Users can start from 1, which is more natural
- Test data never conflicts with system IDs
- Clear separation: low = users, high = system
0x0A-c: API Authentication
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: ✅ Implemented | Branch: 0x0A-b-api-auth | Date: 2025-12-23 | Code Changes: v0.0A-a-account-system…v0.0A-b-api-auth
Implementation Summary
| Metric | Result |
|---|---|
| Auth Module | 8 Files |
| Unit Tests | 35/35 ✅ |
| Total Tests | 188/188 ✅ |
| Commits | 31 commits |
1. Overview
Implement secure request authentication for Gateway API to protect trading endpoints from unauthorized access.
1.1 Design Goals
| Goal | Description |
|---|---|
| Security | Prevent forgery and replay attacks |
| Performance | Verification latency < 1ms |
| Scalability | Support multiple auth methods |
| Usability | Developer-friendly SDK integration |
1.2 Threat Model
- Request Forgery
- Replay Attack
- Man-in-the-Middle (MITM)
- API Key Leakage
- Brute Force
2. Authentication Scheme Comparison
2.1 Evaluation
| Scheme | Security | Performance | Complexity | Leak Risk |
|---|---|---|---|---|
| HMAC-SHA256 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Medium | 🔴 Secret on server |
| Ed25519 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Medium | 🟢 Public key only |
| JWT Token | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Low | 🔴 Token replayable |
| OAuth 2.0 | ⭐⭐⭐⭐ | ⭐⭐⭐ | High | 🟡 Dependency |
2.2 Decision: Ed25519
Selected Ed25519 Asymmetric Signature.
- No Server Secret: Only public key stored.
- Non-Repudiation: Only private key holder can sign.
- High Security: 128-bit security level (256-bit key).
- Fast Verification: ~100μs.
3. Ed25519 Signature Design
3.1 Key Pair
- Private Key: 32 bytes, stored on Client, NEVER transmitted.
- Public Key: 32 bytes, stored on Server.
- Signature: 64 bytes.
3.2 Request Signature Format
payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)
Header Format:
Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>
| Field | Description | Encoding |
|---|---|---|
| api_key | AK_ + 16 HEX (19 chars) | plain |
| ts_nonce | Monotonic Timestamp (ms) | numeric |
| signature | 64-byte signature | Base62 |
ts_nonce must be strictly monotonically increasing: new_ts = max(now_ms, last_ts + 1).
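The monotonic nonce rule can be sketched as a small client-side generator; `NonceGen` is an illustrative name, not the SDK's actual type:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Client-side ts_nonce generator enforcing new_ts = max(now_ms, last_ts + 1).
struct NonceGen {
    last: AtomicU64,
}

impl NonceGen {
    fn next(&self) -> u64 {
        let now_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as u64;
        let mut new_ts = 0;
        // CAS loop: even bursts within the same millisecond yield
        // strictly increasing nonces.
        self.last
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |last| {
                new_ts = now_ms.max(last + 1);
                Some(new_ts)
            })
            .unwrap();
        new_ts
    }
}

fn main() {
    let g = NonceGen { last: AtomicU64::new(0) };
    let a = g.next();
    let b = g.next();
    assert!(b > a); // strictly monotonic even within one millisecond
    println!("nonces: {a}, {b}");
}
```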
4. Database Design
4.1 api_keys_tb Table
CREATE TABLE api_keys_tb (
key_id SERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
api_key VARCHAR(35) UNIQUE NOT NULL,
key_type SMALLINT NOT NULL DEFAULT 1, -- 1=Ed25519
key_data BYTEA NOT NULL, -- Public Key (32 bytes)
permissions INT NOT NULL DEFAULT 1,
status SMALLINT NOT NULL DEFAULT 1,
...
);
4.2 Key Types
| key_type | Algorithm | key_data |
|---|---|---|
| 1 | Ed25519 | Public Key (32 bytes) |
| 2 | HMAC-SHA256 | SHA256(secret) |
| 3 | RSA | PEM Public Key |
5. Code Architecture
5.1 Module Structure
src/api_auth/
├── mod.rs
├── api_key.rs # Model + Repository
├── signature.rs # Ed25519 verification
├── middleware.rs # Axum Middleware
└── error.rs # Auth Errors
5.2 Request Flow
- Extract Headers.
- Verify Timestamp window.
- Query ApiKey (Cache/DB).
- Verify Ed25519 Signature.
- Check Permissions.
- Inject user_id into the request context.
6. Route Protection
6.1 Public Endpoints (No Auth)
- GET /api/v1/public/exchange_info
- GET /api/v1/public/depth
- GET /api/v1/public/klines
- GET /api/v1/public/ticker
6.2 Private Endpoints (Auth Required)
- GET /api/v1/private/account
- POST /api/v1/private/order (Trade Perm)
- POST /api/v1/private/withdraw (Withdraw Perm)
7. Performance
- Signature Verification: < 50μs (Ed25519).
- DB Query: < 1ms (Cached).
- Total Latency Overhead: < 2ms.
8. SDK Example (Python)
from nacl.signing import SigningKey
import time
# Base62 helper (alphabet order is an assumption; match the server's spec)
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
def base62_encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 62)
        out = BASE62[rem] + out
    return out or "0"
api_key = "AK_..."
private_key = bytes.fromhex("...")  # 32-byte Ed25519 seed
signing_key = SigningKey(private_key)
def sign_request(method, path, body=""):
    ts_nonce = str(int(time.time() * 1000))
    payload = f"{api_key}{ts_nonce}{method}{path}{body}"
    signature = signing_key.sign(payload.encode()).signature
    sig_b62 = base62_encode(signature)
    return f"v1.{api_key}.{ts_nonce}.{sig_b62}"
🇨🇳 中文
📅 Status: ✅ Implemented | Code Changes: View Diff
Implementation Summary
| Metric | Result |
|---|---|
| Auth Module | 8 files |
| Unit Tests | 35/35 ✅ |
| Total Tests | 188/188 ✅ |
1. Overview
Implement secure request authentication for the Gateway API to protect trading endpoints from unauthorized access.
1.1 Design Goals
Security, performance, scalability, and usability.
1.2 Threat Model
Request forgery, replay attacks, man-in-the-middle attacks, API key leakage, etc.
2. Authentication Scheme Comparison
2.2 Decision
Selected Ed25519 asymmetric signatures.
- No server-side secret: only the public key is stored.
- Non-repudiation.
- High security.
- Fast verification (~100μs).
3. Ed25519 Signature Design
3.1 Key Pair
The private key stays on the client; the server stores only the public key.
3.2 Request Signature Format
payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)
Header: Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>
4. Database Design
4.1 api_keys_tb Table
Supports key_type (1=Ed25519, 2=HMAC, 3=RSA). key_data stores the public key or secret hash.
5. Code Architecture
src/api_auth/ contains the api_key, signature, and middleware modules.
6. Route Protection
- Public: market-data endpoints, no auth required.
- Private: trading/account endpoints, signature auth required.
7. Performance
Ed25519's fast verification (< 50μs) plus in-memory caching keeps total latency overhead under 2ms.
8. SDK Example (Python)
Python/curl sample code shows how to generate a spec-compliant Authorization header.
0x0B Funding & Transfer: Fund System
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 📝 Draft | Branch: 0x0B-funding-transfer | Date: 2025-12-23
1. Overview
1.1 Objectives
Build a complete fund management system supporting:
- Deposit: External funds entering the exchange.
- Withdraw: Funds leaving the exchange.
- Transfer: Internal fund movement between accounts.
1.2 Design Principles
| Principle | Description |
|---|---|
| Integrity | Complete audit log for every change |
| Double Entry | Debits = Credits, funds conserved |
| Async | Deposits/Withdrawals are async, Transfers sync |
| Idempotency | No duplicate execution |
| Auditability | All actions traceable |
2. Account Model
2.1 Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ Account Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────┐ ┌───────────────────────────┐ │
│ │ Funding Account │ │ Spot Account │ │
│ │ (account_type = 2) │ │ (account_type = 1) │ │
│ ├───────────────────────────┤ ├───────────────────────────┤ │
│ │ Storage: PostgreSQL │ │ Storage: UBSCore (RAM) │ │
│ │ Table: balances_tb │ │ HashMap in memory │ │
│ │ │ │ │ │
│ │ Purpose: │ │ Purpose: │ │
│ │ - Deposit (充值) │ │ - Trading (撮合) │ │
│ │ - Withdraw (提现) │ │ - Order matching │ │
│ │ - Internal Transfer │ │ - Real-time balance │ │
│ └─────────────┬─────────────┘ └─────────────┬─────────────┘ │
│ │ │ │
│ └──────── Transfer (划转) ───────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
2.2 Storage Summary
| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |
Note:
balances_tbis currently used for Funding account only. Spot balances are managed in-memory by UBSCore and persisted to TDengine as events.
2.3 Schema (PostgreSQL)
Current Implementation: Single balances_tb for all user balances.
-- 001_init_schema.sql
CREATE TABLE balances_tb (
balance_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
available DECIMAL(30, 8) NOT NULL DEFAULT 0,
frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
version INT NOT NULL DEFAULT 1,
UNIQUE (user_id, asset_id)
);
Note: Current design uses single balance per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add
account_typecolumn.
3. Deposit Flow
- User gets address.
- User transfers funds to exchange address.
- Indexer monitors chain.
- Wait for Confirmations.
- Credit to Funding Account.
3.1 Deposit Table
CREATE TYPE deposit_status AS ENUM ('pending', 'confirming', 'completed', 'failed');
CREATE TABLE deposits_tb (
deposit_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
asset_id INTEGER NOT NULL REFERENCES assets_tb(asset_id),
amount BIGINT NOT NULL,
tx_hash VARCHAR(128) UNIQUE,
status deposit_status NOT NULL DEFAULT 'pending',
...
);
4. Withdrawal Flow
- User Request -> Review -> Sign -> Broadcast -> Complete.
4.1 Withdrawal Table
CREATE TYPE withdraw_status AS ENUM ('pending', 'risk_review', 'processing', 'completed', ...);
CREATE TABLE withdrawals_tb (
withdrawal_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
amount BIGINT NOT NULL,
fee BIGINT NOT NULL,
net_amount BIGINT NOT NULL,
status withdraw_status NOT NULL DEFAULT 'pending',
...
);
4.2 Risk Rules
- Small Amount: Auto-approve (< 500 USDT).
- Large Amount: Manual Review (>= 10000 USDT).
- New Address: 24h Delay.
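The three risk rules above can be sketched as a simple classifier. This is illustrative only: the `RiskAction` enum is a made-up name, and the behavior of the band between 500 and 10,000 USDT is not specified in the text, so the default below is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum RiskAction {
    AutoApprove,
    ManualReview,
    Delay24h,
}

/// amount_usdt in whole USDT for simplicity; real code would use scaled u64.
fn classify_withdrawal(amount_usdt: u64, is_new_address: bool) -> RiskAction {
    if is_new_address {
        RiskAction::Delay24h // new address: 24h delay
    } else if amount_usdt >= 10_000 {
        RiskAction::ManualReview // large amount
    } else if amount_usdt < 500 {
        RiskAction::AutoApprove // small amount
    } else {
        RiskAction::ManualReview // middle band: conservative default (assumption)
    }
}

fn main() {
    assert_eq!(classify_withdrawal(100, false), RiskAction::AutoApprove);
    assert_eq!(classify_withdrawal(20_000, false), RiskAction::ManualReview);
    assert_eq!(classify_withdrawal(100, true), RiskAction::Delay24h);
    println!("risk rules ok");
}
```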
5. Transfer
5.1 Types
- funding → spot: Available for trading.
- spot → funding: Available for withdrawal.
- user → user: Internal transfer.
5.2 API Design
POST /api/v1/private/transfer
{
"from_account": "funding",
"to_account": "spot",
"asset": "USDT",
"amount": "100.00"
}
6. Ledger
Complete record of all fund movements.
CREATE TYPE ledger_type AS ENUM ('deposit', 'withdraw', 'transfer_in', 'trade_buy', ...);
CREATE TABLE ledger_tb (
ledger_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
ledger_type ledger_type NOT NULL,
amount BIGINT NOT NULL,
balance_after BIGINT NOT NULL,
ref_id BIGINT,
...
);
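The double-entry principle behind the ledger (debits = credits, funds conserved) can be sketched in a few lines; the `LedgerEntry` shape here is simplified from the table above and the helper name is illustrative:

```rust
/// Double-entry sketch: every transfer posts a debit and a matching credit,
/// so the amounts for one ref_id always sum to zero (funds conserved).
struct LedgerEntry {
    user_id: u64,
    amount: i64, // negative = debit, positive = credit (smallest units)
    ref_id: u64, // groups the two legs of one transfer
}

fn post_transfer(ledger: &mut Vec<LedgerEntry>, from: u64, to: u64, amount: i64, ref_id: u64) {
    ledger.push(LedgerEntry { user_id: from, amount: -amount, ref_id });
    ledger.push(LedgerEntry { user_id: to, amount, ref_id });
}

fn main() {
    let mut ledger = Vec::new();
    // user 1024 pays a 25-unit fee to the REVENUE account (user 0)
    post_transfer(&mut ledger, 1024, 0, 25, 1);
    let sum: i64 = ledger.iter().map(|e| e.amount).sum();
    assert_eq!(sum, 0); // debits == credits
    assert_eq!(ledger.len(), 2); // two legs per transfer
    println!("ledger balanced");
}
```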
7. Implementation Plan
- Phase 1: DB: Migrations for sub_accounts, funding, ledger.
- Phase 2: Transfer: Model + API (Sync).
- Phase 3: Deposit: Model + Address logic.
- Phase 4: Withdraw: Model + Risk logic.
8. Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Account Model | Sub-accounts | Isolate trading risks |
| Storage | PostgreSQL | ACID Requirement |
| Transfer | Synchronous | User Experience |
| Deposit | Asynchronous | Chain dependency |
🇨🇳 中文
📅 Status: 📝 Draft | Branch: 0x0B-funding-transfer
1. Overview
Build a complete fund management system supporting deposits, withdrawals, and transfers.
1.2 Design Principles
Ledger integrity, double-entry accounting, async processing, idempotency, auditability.
2. Account Model
2.1 Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│                         Account Architecture                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────┐     ┌───────────────────────────┐        │
│  │     Funding Account       │     │      Spot Account         │        │
│  │   (account_type = 2)      │     │   (account_type = 1)      │        │
│  ├───────────────────────────┤     ├───────────────────────────┤        │
│  │ Storage: PostgreSQL       │     │ Storage: UBSCore (RAM)    │        │
│  │ Table: balances_tb        │     │ In-memory HashMap         │        │
│  │                           │     │                           │        │
│  │ Purpose:                  │     │ Purpose:                  │        │
│  │  - Deposit                │     │  - Trading (matching)     │        │
│  │  - Withdraw               │     │  - Order matching         │        │
│  │  - Internal Transfer      │     │  - Real-time balance      │        │
│  └─────────────┬─────────────┘     └─────────────┬─────────────┘        │
│                │                                 │                      │
│                └──────── Transfer ───────────────┘                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
2.2 Storage Summary
| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |
Note: balances_tb is currently used for the Funding account only. Spot balances are managed in-memory by UBSCore and persisted to TDengine as events.
2.3 Schema (PostgreSQL)
Current implementation: balances_tb holds Funding account balances.
-- 001_init_schema.sql
CREATE TABLE balances_tb (
balance_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
asset_id INT NOT NULL,
available DECIMAL(30, 8) NOT NULL DEFAULT 0,
frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
UNIQUE (user_id, asset_id)
);
Note: The current design keeps one balance row per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add an account_type column.
3. Deposit Flow
Monitor on-chain transactions -> wait for confirmations -> credit the Funding account.
3.3 Confirmation Rules
BTC: 3 confirmations (~30 min); ETH: 12 confirmations (~3 min).
4. Withdrawal Flow
User request -> risk review -> sign & broadcast -> complete.
4.3 Risk Rules
Small amounts are auto-approved, large amounts require manual review, and new addresses incur a withdrawal delay.
5. Transfer
5.1 Transfer Types
Supports funding <-> spot transfers and internal user-to-user transfers.
5.3 API Design
POST /api/v1/private/transfer, requires Ed25519 signature auth.
6. Ledger
Records every fund movement (deposit, withdraw, trade, fee, etc.) to ensure traceability.
7. Implementation Plan
- Phase 1: Database migrations
- Phase 2: Transfer (priority)
- Phase 3: Deposit (P2)
- Phase 4: Withdraw (P2)
8. Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Account Model | Sub-accounts | Isolate trading and deposit/withdrawal funds |
| Deposit/Withdraw Storage | PostgreSQL | Transactional ACID required |
| Transfer | Synchronous | Low latency, better UX |
| Deposit/Withdraw | Asynchronous | Depends on on-chain confirmations |
0x0B-a Internal Transfer Architecture (Strict FSM)
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Problem Statement
1.1 System Topology
| System | Role | Source of Truth | Persistence |
|---|---|---|---|
| PostgreSQL | Funding Account | balances_tb | ACID, Durable |
| UBSCore | Trading Account | RAM | WAL + Volatile |
1.2 The Core Constraint
These two systems cannot share a transaction. There is no XA/2PC database protocol. Therefore: We must build our own 2-Phase Commit using an external FSM Coordinator.
1.5 Security Pre-Validation (MANDATORY)
Caution
Defense-in-Depth: all checks below MUST be performed in every independent module, not just the API layer.
- API Layer: First line of defense, reject obviously invalid requests
- Coordinator: Re-validate, prevent internal calls bypassing API
- Adapters: Final defense, each adapter must independently validate parameters
- UBSCore: Last check before in-memory operations
Safety > Performance. The cost of redundant checks is acceptable; security vulnerabilities are not.
1.5.1 Identity & Authorization Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| User Authentication | Forged request | JWT/Session must be valid | UNAUTHORIZED |
| User ID Consistency | Cross-user transfer attack | request.user_id == auth.user_id | FORBIDDEN |
| Account Ownership | Steal others’ funds | Source/Target accounts belong to same user_id | FORBIDDEN |
1.5.2 Account Type Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| from != to | Infinite wash trading/resource waste | request.from != request.to | SAME_ACCOUNT |
| Account Type Valid | Inject invalid type | from, to ∈ {FUNDING, SPOT} | INVALID_ACCOUNT_TYPE |
| Account Type Supported | Request unlaunched feature | from, to both in supported list | UNSUPPORTED_ACCOUNT_TYPE |
1.5.3 Amount Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| amount > 0 | Zero/negative transfer | amount > 0 | INVALID_AMOUNT |
| Precision Check | Precision overflow | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| Minimum Amount | Dust attack | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| Maximum Single Amount | Risk control bypass | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| Integer Overflow | u64 overflow attack | amount <= u64::MAX / safety_factor | OVERFLOW |
1.5.4 Asset Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Asset Exists | Fake asset_id | asset_id exists in system | INVALID_ASSET |
| Asset Status | Delisted asset | asset.status == ACTIVE | ASSET_SUSPENDED |
| Transfer Permission | Some assets forbid internal transfer | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |
1.5.5 Account Status Checks
Account Initialization Rules (Overview)
| Account Type | Init Timing | Notes |
|---|---|---|
| FUNDING | Created on first deposit request | Triggered by external deposit flow |
| SPOT | Created on first internal transfer | Lazy Init |
| FUTURE | Created on first internal transfer [P2] | Lazy Init |
| MARGIN | Created on first internal transfer [P2] | Lazy Init |
Note
- Specific initialization behaviors and business rules for each account type are defined in their dedicated documents.
- Each account has its own state definitions (e.g., whether transfer is allowed); not detailed here.
- Default State: On account initialization, transfer is allowed by default.
Account Status Check Table
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Source Account Exists | Non-existent account | Source account record must exist | SOURCE_ACCOUNT_NOT_FOUND |
| Target Account Exists/Create | Non-existent target | FUNDING must exist; SPOT/FUTURE/MARGIN can create | TARGET_ACCOUNT_NOT_FOUND (FUNDING only) |
| Source Not Frozen | Frozen account transfer out | source.status != FROZEN | ACCOUNT_FROZEN |
| Source Not Disabled | Disabled account operation | source.status != DISABLED | ACCOUNT_DISABLED |
| Sufficient Balance | Insufficient balance direct reject | source.available >= amount | INSUFFICIENT_BALANCE |
1.5.6 Rate Limiting - [P2 Future Optimization]
Note
This is a V2 optimization. V1 may skip this.
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Requests Per Second | DoS attack | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| Daily Transfer Count | Abuse | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| Daily Transfer Amount | Large amount risk control | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |
1.5.7 Idempotency Check
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| cid Unique | Duplicate submission | If cid provided, check if exists | DUPLICATE_REQUEST (return original result) |
1.5.8 Check Order (Recommended)
1. Authentication (JWT valid?)
2. Authorization (user_id match?)
3. Request Format (from/to/amount valid?)
4. Account Type (from != to, type supported?)
5. Asset Check (exists? enabled? transferable?)
6. Amount Check (range? precision? overflow?)
7. Rate Limiting (exceeded?)
8. Idempotency (duplicate?)
9. Balance Check (sufficient?) ← Check last, avoid unnecessary queries
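The ordering above (cheap structural checks first, the balance query last) can be sketched as a short-circuiting validation chain. Names and error variants here are illustrative, not the real module's API:

```rust
/// Illustrative subset of the error codes from the tables above.
#[derive(Debug, PartialEq)]
enum TransferError {
    SameAccount,
    InvalidAmount,
    InsufficientBalance,
}

struct TransferReq {
    from: u8, // account_type of source
    to: u8,   // account_type of target
    amount: u64,
}

/// Checks run in the recommended order; the first failure wins, so the
/// (potentially expensive) balance lookup only happens for valid requests.
fn validate(req: &TransferReq, available: u64) -> Result<(), TransferError> {
    if req.from == req.to {
        return Err(TransferError::SameAccount); // step 4
    }
    if req.amount == 0 {
        return Err(TransferError::InvalidAmount); // step 6
    }
    if available < req.amount {
        return Err(TransferError::InsufficientBalance); // step 9, checked last
    }
    Ok(())
}

fn main() {
    let bad = TransferReq { from: 1, to: 1, amount: 0 };
    // SameAccount wins even though amount is also invalid: order matters.
    assert_eq!(validate(&bad, 0), Err(TransferError::SameAccount));
    let ok = TransferReq { from: 2, to: 1, amount: 50 };
    assert_eq!(validate(&ok, 100), Ok(()));
    println!("validation order ok");
}
```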
2. FSM Design (The State Machine)
2.0 Library Choice: rust-fsm
We use the rust-fsm library, providing:
- ✅ Compile-time validation - Illegal state transitions cause compile errors.
- ✅ Declarative DSL - Clearly defined states and transitions.
- ✅ Type Safety - Prevents missing match arms.
Cargo.toml:
[dependencies]
rust-fsm = "0.7"
DSL Definition:
#![allow(unused)]
fn main() {
use rust_fsm::*;
state_machine! {
derive(Debug, Clone, Copy, PartialEq, Eq)
TransferFsm(Init) // Initial State
// State Definitions
Init => {
SourceWithdrawOk => SourceDone,
SourceWithdrawFail => Failed,
},
SourceDone => {
TargetDepositOk => Committed,
TargetDepositFail => Compensating,
TargetDepositUnknown => SourceDone, // self-transition: stay and retry forever
},
Compensating => {
RefundOk => RolledBack,
RefundFail => Compensating, // self-transition: stay and retry forever
},
// Terminal states (no outgoing transitions):
// Committed, Failed, RolledBack
}
}
Note
The DSL above is used for compile-time validation of state transition validity. Actual runtime state is stored in PostgreSQL and updated via CAS.
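The CAS update mentioned in the note corresponds to a conditional `UPDATE … WHERE state = $expected` against PostgreSQL. The idea can be modeled in memory with `compare_exchange`; the numeric state IDs loosely follow the state table (SOURCE_DONE = 20; the COMMITTED value is an assumption, since the table is not fully shown here):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// State IDs mirror the FSM state table (illustrative; real state lives in PostgreSQL).
const SOURCE_DONE: u8 = 20;
const COMMITTED: u8 = 40; // assumed ID for the terminal state

/// CAS transition: succeeds only if the current state equals the expected one,
/// which is exactly what a `WHERE state = $expected` UPDATE gives you in SQL.
fn cas_transition(state: &AtomicU8, from: u8, to: u8) -> bool {
    state
        .compare_exchange(from, to, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

fn main() {
    let state = AtomicU8::new(SOURCE_DONE);
    assert!(cas_transition(&state, SOURCE_DONE, COMMITTED)); // first committer wins
    assert!(!cas_transition(&state, SOURCE_DONE, COMMITTED)); // a retry is a no-op
    assert_eq!(state.load(Ordering::SeqCst), COMMITTED);
    println!("cas ok");
}
```

This is why the commit step is atomic and non-interruptible: two concurrent workers can both attempt the transition, but only one CAS succeeds.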
2.0.1 Core State Flow (Top Level)
┌─────────────────────────────────────────────────────────┐
│ INTERNAL TRANSFER FSM │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────── Happy Path ────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ INIT │ Source Deduct ✓ │ SOURCE_DONE │ Target Credit ✓ │ │ │
│ │(Request)│ ─────────────────▶ │ (In-Flight) │ ─────────────────▶ │ COMMITTED │ │
│ └─────────┘ └─────────────┘ │ │ │
│ │ │ └───────────────┘ │
│ │ │ ✅ │
└─────────│───────────────────────────────│───────────────────────────────────────────────┘
│ │
│ │
│ ▼
│ ╔══════════════════════════════════════════════════╗
│ ║ 🔒 ATOMIC COMMIT ║
│ ║ ║
│ ║ IF AND ONLY IF: ║
│ ║ FROM.withdraw = SUCCESS ✓ ║
│ ║ TO.deposit = SUCCESS ✓ ║
│ ║ ║
│ ║ EXECUTE: CAS(SOURCE_DONE → COMMITTED) ║
│ ║ Must be atomic and non-interruptible. ║
│ ╚══════════════════════════════════════════════════╝
│ │
│ Source Deduction Fail │ Target Credit Fail (EXPLICIT_FAIL)
▼ ▼
┌──────────┐ ┌──────────────┐
│ FAILED │ │ COMPENSATING │◀───────────┐
│ (Source) │ │ (Refunding) │ │ Refund Fail (Infinite Retry)
└──────────┘ └──────────────┘────────────┘
❌ │ Refund Success
▼
┌─────────────┐
│ ROLLED_BACK │
│ (Restored) │
└─────────────┘
↩️
╔════════════════════════════════════════════════════════════════════════════════════════╗
║ ⚠️ Target Unknown (TIMEOUT/UNKNOWN) → Stay SOURCE_DONE, Infinite Retry, NEVER rollback. ║
╚════════════════════════════════════════════════════════════════════════════════════════╝
Core State Description:
| State | Fund Location | Description |
|---|---|---|
| INIT | Source Account | User request accepted, funds haven't moved yet. |
| SOURCE_DONE | In-Flight | CRITICAL! Funds have left source, haven't reached target. |
| COMMITTED | Target Account | Terminal state, transfer succeeded. |
| FAILED | Source Account | Terminal state, source deduction failed, no funds moved. |
| COMPENSATING | In-Flight | Target credit failed, refunding to source. |
| ROLLED_BACK | Source Account | Terminal state, refund succeeded. |
Important
SOURCE_DONE is the most critical state - funds have left the source account but have not yet reached the target. At this point, the state MUST NOT be lost; it must eventually reach COMMITTED or ROLLED_BACK.
2.1 States (Exhaustive)
| ID | State Name | Entry Condition | Terminal? | Funds Location |
|---|---|---|---|---|
| 0 | INIT | User request accepted. | No | Source |
| 10 | SOURCE_PENDING | CAS success, Adapter call initiated. | No | Source (Deducting) |
| 20 | SOURCE_DONE | Source Adapter returned OK. | No | In-Flight |
| 30 | TARGET_PENDING | CAS success, Target Adapter call initiated. | No | In-Flight (Crediting) |
| 40 | COMMITTED | Target Adapter returned OK. | YES | Target |
| -10 | FAILED | Source Adapter returned FAIL. | YES | Source (Unchanged) |
| -20 | COMPENSATING | Target Adapter FAIL AND Source is Reversible. | No | In-Flight (Refunding) |
| -30 | ROLLED_BACK | Source Refund OK. | YES | Source (Restored) |
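The state table above maps naturally onto a Rust enum; this is a sketch, where the `repr(i16)` mapping to the numeric state IDs is our assumption about how the FSM State ID column in `transfers_tb` is encoded:

```rust
/// Transfer FSM states with the numeric IDs from the table above.
/// Negative IDs are failure-side states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
enum TransferState {
    Init = 0,
    SourcePending = 10,
    SourceDone = 20,
    TargetPending = 30,
    Committed = 40,
    Failed = -10,
    Compensating = -20,
    RolledBack = -30,
}

impl TransferState {
    /// Terminal states: the Recovery Worker must never pick these up.
    fn is_terminal(self) -> bool {
        matches!(
            self,
            TransferState::Committed | TransferState::Failed | TransferState::RolledBack
        )
    }
}

fn main() {
    assert!(TransferState::Committed.is_terminal());
    assert!(!TransferState::SourceDone.is_terminal()); // in-flight, must make progress
    assert_eq!(TransferState::Compensating as i16, -20);
}
```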
2.2 State Transition Rules (Exhaustive)
┌───────────────────────────────────────────────────────────────────────────────┐
│ CANONICAL STATE TRANSITIONS │
├───────────────────────────────────────────────────────────────────────────────┤
│ │
│ INIT ──────[CAS OK]───────► SOURCE_PENDING │
│ │ │ │
│ │ ├──[Adapter OK]────► SOURCE_DONE │
│ │ │ │ │
│ │ └──[Adapter FAIL]──► FAILED (Terminal) │
│ │ │ │
│ │ │ │
│ │ SOURCE_DONE ──[CAS OK]──► TARGET_PENDING │
│ │ │ │
│ │ ┌────────────────────────────────────┤ │
│ │ │ │ │
│ │ [Adapter OK]│ [Adapter FAIL] │
│ │ │ │ │
│ │ ▼ ▼ │
│ │ COMMITTED ┌───────────────────┐ │
│ │ (Terminal) │ SOURCE REVERSIBLE?│ │
│ │ └─────────┬─────────┘ │
│ │ YES │ NO │
│ │ ▼ │ ▼ │
│ │ COMPENSATING │ INFINITE │
│ │ │ │ RETRY │
│ │ [Refund OK] │ │ (Stay in │
│ │ ▼ │ │ TARGET_ │
│ │ ROLLED_BACK │ │ PENDING) │
│ │ (Terminal) │ │ │
│ │ │ │ │
│ └─────────────────────────────────────────────────┴─────────┴──────────────┘
2.3 Reversibility Rule (CRITICAL)
Core Principle: Only when an Adapter returns an explicitly defined failure can we safely rollback.
| Response Type | Meaning | Can Safely Rollback? | Handling |
|---|---|---|---|
| SUCCESS | Operation succeeded | N/A | Continue to next step |
| EXPLICIT_FAIL | Explicit business failure (e.g., insufficient balance) | YES | Can enter COMPENSATING |
| TIMEOUT | Timeout, state unknown | NO | Infinite Retry |
| PENDING | Processing, state unknown | NO | Infinite Retry |
| NETWORK_ERROR | Network error, state unknown | NO | Infinite Retry |
| UNKNOWN | Any other situation | NO | Infinite Retry or Manual Intervention |
Caution
Only EXPLICIT_FAIL allows safe rollback. Any unknown state (Timeout, Pending, Network Error) means funds are In-Flight: we cannot know whether the counterparty has processed the request, and a rash rollback will cause Double Spend or Fund Loss. The only safe actions are Infinite Retry or Manual Intervention.
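A minimal Rust sketch of the Reversibility Rule, assuming a hypothetical `AdapterResponse` type that mirrors the table above:

```rust
/// Adapter response taxonomy from the Reversibility Rule table.
#[derive(Debug, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail(String), // explicit business failure, e.g. insufficient balance
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

impl AdapterResponse {
    /// ONLY an explicit failure permits compensation.
    /// Everything else means funds may be In-Flight -> Infinite Retry.
    fn can_safely_rollback(&self) -> bool {
        matches!(self, AdapterResponse::ExplicitFail(_))
    }
}

fn main() {
    let rejected = AdapterResponse::ExplicitFail("INSUFFICIENT_BALANCE".to_string());
    assert!(rejected.can_safely_rollback());
    assert!(!AdapterResponse::Timeout.can_safely_rollback());
    assert!(!AdapterResponse::Pending.can_safely_rollback());
    assert!(!AdapterResponse::Unknown.can_safely_rollback());
}
```

Concentrating the rule in one predicate keeps the Coordinator from ever branching into COMPENSATING on an ambiguous response.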
3. Transfer Scenarios (Step-by-Step)
3.1 Scenario A: Funding → Spot (Deposit to Trading)
Happy Path:
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, Create Record | - | INIT | Funding |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Funding |
| 3 | Coordinator | Call FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | Deducted |
| 5 | Coordinator | On OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | Credit RAM, Write WAL, Emit Event | - | - | Credited |
| 9 | Coordinator | On Event: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Trading |
Failure Path (Target Fails):
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7’ | Coordinator | Call TradingAdapter.deposit(req_id) → FAIL/Timeout | TARGET_PENDING | - | In-Flight |
| 8’ | Coordinator | Check: Source = Funding (Reversible) | - | - | - |
| 9’ | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10’ | Coordinator | Call FundingAdapter.refund(req_id) | - | - | - |
| 11’ | PG | UPDATE balances SET amount = amount + X | - | - | Refunded |
| 12’ | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Funding |
3.2 Scenario B: Spot → Funding (Withdraw from Trading)
Happy Path:
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, Create Record | - | INIT | Trading |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Trading |
| 3 | Coordinator | Call TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | Check Balance, Deduct RAM, Write WAL, Emit Event | - | - | Deducted |
| 5 | Coordinator | On Event: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | Credited |
| 9 | Coordinator | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |
Failure Path (Target Fails):
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7a | Coordinator | Call FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (e.g., constraint) | TARGET_PENDING | - | In-Flight |
| 8a | Coordinator | Check response type = EXPLICIT_FAIL (can safely rollback) | - | - | - |
| 9a | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10a | Coordinator | Call TradingAdapter.refund(req_id) (refund to UBSCore) | - | - | - |
| 11a | UBSCore | Credit RAM balance, write WAL | - | - | Refunded |
| 12a | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Trading |
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7b | Coordinator | Call FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | In-Flight |
| 8b | Coordinator | Check response type = UNKNOWN (cannot safely rollback) | - | - | - |
| 9b | Coordinator | DO NOT TRANSITION. Stay TARGET_PENDING. | TARGET_PENDING | TARGET_PENDING | In-Flight |
| 10b | Coordinator | Log CRITICAL. Alert Ops. Schedule Retry. | - | - | - |
| 11b | Recovery | Retry FundingAdapter.deposit(req_id) INFINITELY. | - | - | - |
| 12b | (Eventually) | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |
Warning
Only enter COMPENSATING when the Target returns EXPLICIT_FAIL. On Timeout or Unknown, funds are In-Flight; we must Infinite Retry or escalate to Manual Intervention.
4. Failure Mode and Effects Analysis (FMEA)
4.1 Phase 1 Failures (Source Operation)
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Adapter returns FAIL | Insufficient balance, DB constraint | SOURCE_PENDING | Source | Transition to FAILED. User sees error. |
| Adapter returns PENDING | Timeout, network issue | SOURCE_PENDING | Unknown | Retry. Adapter MUST be idempotent. |
| Coordinator crashes after CAS, before call | Process kill | SOURCE_PENDING | Source | Recovery Worker retries call. |
| Coordinator crashes after call, before result | Process kill | SOURCE_PENDING | Unknown | Recovery Worker retries (idempotent). |
4.2 Phase 2 Failures (Target Operation)
| Failure | Cause | Response Type | Current State | Funds | Resolution |
|---|---|---|---|---|---|
| Target explicit reject | Business rule | EXPLICIT_FAIL | TARGET_PENDING | In-Flight | COMPENSATING → Refund. |
| Timeout | Network delay | TIMEOUT | TARGET_PENDING | Unknown | Infinite Retry. |
| Network error | Connection lost | NETWORK_ERROR | TARGET_PENDING | Unknown | Infinite Retry. |
| Unknown error | System exception | UNKNOWN | TARGET_PENDING | Unknown | Infinite Retry or Manual Intervention. |
| Coordinator crashes | Process kill | N/A | TARGET_PENDING | In-Flight | Recovery Worker retries. |
4.3 Compensation Failures
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Refund FAIL | PG down, constraint | COMPENSATING | In-Flight | Infinite Retry. Funds stuck until PG up. |
| Refund PENDING | Timeout | COMPENSATING | Unknown | Retry. |
5. Idempotency Requirements (MANDATORY)
5.1 Why Idempotency?
Retries are the foundation of crash recovery. Without idempotency, a retry will cause double execution (double deduction, double credit).
5.2 Implementation (Funding Adapter)
Requirement: Given the same req_id, calling withdraw() or deposit() multiple times MUST have the same effect as calling it once.
Mechanism:
- transfers_tb has UNIQUE(req_id).
- Atomic Transaction:

BEGIN;

-- Check if already processed
SELECT state FROM transfers_tb WHERE req_id = $1;
IF state >= expected_post_state THEN RETURN 'AlreadyProcessed'; END IF;

-- Perform balance update
UPDATE balances_tb SET amount = amount - $2
WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
IF NOT FOUND THEN RETURN 'InsufficientBalance'; END IF;

-- Update state
UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;

COMMIT;
RETURN 'Success';
5.3 Implementation (Trading Adapter)
Requirement: Same as above. UBSCore MUST reject duplicate req_id.
Mechanism:
InternalOrderincludesreq_idfield (orcid).- UBSCore maintains a
ProcessedTransferSet(HashSet in RAM, rebuilt from WAL on restart). - On receiving Transfer Order:
IF req_id IN ProcessedTransferSet THEN RETURN 'AlreadyProcessed' (Success, no-op) ELSE ProcessTransfer() ProcessedTransferSet.insert(req_id) WriteWAL(TransferEvent) RETURN 'Success' END IF
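The mechanism above can be sketched in Rust; the `Ubs` struct and method names are illustrative, and WAL replay on restart is omitted:

```rust
use std::collections::HashSet;

/// Minimal sketch of UBSCore's duplicate-suppression set.
/// On restart the set would be rebuilt by replaying the WAL (not shown).
struct Ubs {
    processed: HashSet<String>,
    balance: u64,
}

impl Ubs {
    /// Deposit is idempotent per req_id: a retry after a crash or timeout
    /// credits the balance at most once.
    fn deposit(&mut self, req_id: &str, amount: u64) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // success, no-op
        }
        self.balance += amount; // ProcessTransfer()
        self.processed.insert(req_id.to_string());
        // WriteWAL(TransferEvent) would go here
        "Success"
    }
}

fn main() {
    let mut ubs = Ubs { processed: HashSet::new(), balance: 0 };
    assert_eq!(ubs.deposit("req-1", 100), "Success");
    assert_eq!(ubs.deposit("req-1", 100), "AlreadyProcessed"); // Recovery retry
    assert_eq!(ubs.balance, 100); // credited exactly once
}
```

Note that the duplicate call returns success, not an error: the Recovery Worker must be able to retry blindly and still converge.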
6. Recovery Worker (Zombie Handler)
6.1 Purpose
On Coordinator startup (or periodically), scan for “stuck” transfers and resume them.
6.2 Query
SELECT * FROM transfers_tb
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
AND updated_at < NOW() - INTERVAL '1 minute'; -- Stale threshold
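The same stale scan, sketched in Rust over in-memory rows (illustrative types; production runs the SQL query above):

```rust
use std::time::{Duration, Instant};

/// Illustrative stand-in for a transfers_tb row.
struct Row {
    state: i16,
    updated_at: Instant,
}

/// Pick transfers that are in a non-terminal state and have not been
/// touched within the stale threshold.
fn stuck(rows: &[Row], now: Instant, stale: Duration) -> Vec<&Row> {
    // INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
    const NON_TERMINAL: [i16; 5] = [0, 10, 20, 30, -20];
    rows.iter()
        .filter(|r| NON_TERMINAL.contains(&r.state))
        .filter(|r| now.duration_since(r.updated_at) > stale)
        .collect()
}

fn main() {
    let base = Instant::now();
    let now = base + Duration::from_secs(120); // pretend two minutes have passed
    let rows = vec![
        Row { state: 30, updated_at: base }, // stale TARGET_PENDING -> picked up
        Row { state: 40, updated_at: base }, // COMMITTED: terminal, skipped
        Row { state: 20, updated_at: now },  // SOURCE_DONE but fresh, skipped
    ];
    let picked = stuck(&rows, now, Duration::from_secs(60));
    assert_eq!(picked.len(), 1);
    assert_eq!(picked[0].state, 30);
}
```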
6.3 Recovery Logic
| Current State | Action |
|---|---|
| INIT | Call step() (will transition to SOURCE_PENDING). |
| SOURCE_PENDING | Retry Source.withdraw(). |
| SOURCE_DONE | Call step() (will transition to TARGET_PENDING). |
| TARGET_PENDING | Retry Target.deposit(). Apply Reversibility Rule. |
| COMPENSATING | Retry Source.refund(). |
7. Data Model
7.1 Table: transfers_tb
CREATE TABLE transfers_tb (
transfer_id BIGSERIAL PRIMARY KEY,
req_id VARCHAR(26) UNIQUE NOT NULL, -- Server-generated Unique ID (ULID)
cid VARCHAR(64) UNIQUE, -- Client Idempotency Key (Optional)
user_id BIGINT NOT NULL,
asset_id INTEGER NOT NULL,
amount DECIMAL(30, 8) NOT NULL,
transfer_type SMALLINT NOT NULL, -- 1 = Funding->Spot, 2 = Spot->Funding
source_type SMALLINT NOT NULL, -- 1 = Funding, 2 = Trading
state SMALLINT NOT NULL DEFAULT 0, -- FSM State ID
error_message TEXT, -- Last error (for debugging)
retry_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);
7.2 Invariant Check
Run periodically to detect data corruption:
-- Sum of Funding + Trading + In-Flight should be constant per user per asset
-- In-Flight = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
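A minimal sketch of the conservation invariant, assuming illustrative field names (not the production schema):

```rust
/// Fund-conservation check for one (user, asset): the sum of funding,
/// trading, and in-flight amounts must be constant across every transfer step.
struct Ledger {
    funding: i64,
    trading: i64,
    /// SUM(amount) of transfers in SOURCE_DONE / TARGET_PENDING / COMPENSATING
    in_flight: i64,
}

impl Ledger {
    fn total(&self) -> i64 {
        self.funding + self.trading + self.in_flight
    }
}

fn main() {
    let before = Ledger { funding: 1_000, trading: 0, in_flight: 0 };
    // Mid-transfer: 100 has left funding but not yet reached trading.
    let mid = Ledger { funding: 900, trading: 0, in_flight: 100 };
    // After commit: the 100 landed in the trading account.
    let after = Ledger { funding: 900, trading: 100, in_flight: 0 };
    assert_eq!(before.total(), mid.total());
    assert_eq!(mid.total(), after.total()); // invariant holds at every step
}
```

A violation of this equality at any point indicates double spend, fund loss, or money created from nothing, and should halt the service (see MON-003).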
8. API Contract
8.1 Endpoint: POST /api/v1/internal_transfer
Request:
{
"from": "SPOT", // Source account type
"to": "FUNDING", // Target account type
"asset": "USDT",
"amount": "100.00"
}
Account Type Enum (AccountType):
| Value | Meaning | Status |
|---|---|---|
| FUNDING | Funding Account (PostgreSQL) | Supported |
| SPOT | Spot Trading Account (UBSCore) | Supported |
| FUTURE | Futures Account | Future Extension |
| MARGIN | Margin Account | Future Extension |
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U", // Server-generated (ULID)
"from": "SPOT",
"to": "FUNDING",
"state": "COMMITTED", // or "PENDING" if async
"message": "Transfer successful"
}
8.2 Query Endpoint: GET /api/v1/internal_transfer/:req_id
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U",
"from": "SPOT",
"to": "FUNDING",
"asset": "USDT",
"amount": "100.00",
"state": "COMMITTED",
"created_at": "2024-12-23T14:00:00Z",
"updated_at": "2024-12-23T14:00:01Z"
}
Important
req_id is SERVER-GENERATED, not client-supplied. If the client needs idempotency, use the optional cid (client_order_id) field; the server will check for duplicates and return the existing result.
Error Codes:
| Code | Meaning |
|---|---|
| INSUFFICIENT_BALANCE | Source account balance < amount. |
| INVALID_ACCOUNT_TYPE | from or to account type is invalid or unsupported. |
| SAME_ACCOUNT | from and to are the same. |
| DUPLICATE_REQUEST | cid already processed. Return original result. |
| INVALID_AMOUNT | amount <= 0 or exceeds precision. |
| SYSTEM_ERROR | Internal failure. Advise retry. |
9. Implementation Pseudocode (Critical State Checks)
9.1 API Layer
function handle_transfer_request(request, auth_context):
// ========== Defense-in-Depth Layer 1: API Layer ==========
// 1. Identity Authentication
if !auth_context.is_valid():
return Error(UNAUTHORIZED)
// 2. User ID Consistency (Prevent cross-user attacks)
if request.user_id != auth_context.user_id:
return Error(FORBIDDEN, "User ID mismatch")
// 3. Account Type Check
if request.from == request.to:
return Error(SAME_ACCOUNT)
if request.from NOT IN [FUNDING, SPOT]:
return Error(INVALID_ACCOUNT_TYPE)
if request.to NOT IN [FUNDING, SPOT]:
return Error(INVALID_ACCOUNT_TYPE)
// 4. Amount Check
if request.amount <= 0:
return Error(INVALID_AMOUNT)
if decimal_places(request.amount) > asset.precision:
return Error(PRECISION_OVERFLOW)
// 5. Idempotency Check
if request.cid:
existing = db.find_by_cid(request.cid)
if existing:
return Success(existing) // Return existing result
// 6. Asset Check
asset = db.get_asset(request.asset_id)
if !asset or asset.status != ACTIVE:
return Error(INVALID_ASSET)
// 7. Call Coordinator
result = coordinator.create_and_execute(request)
return result
9.2 Coordinator Layer
function create_and_execute(request):
// ========== Defense-in-Depth Layer 2: Coordinator ==========
// Re-verify (Prevent internal calls bypassing API)
ASSERT request.from != request.to
ASSERT request.amount > 0
ASSERT request.user_id > 0
// Generate unique ID
req_id = ulid.new()
// Create transfer record (State = INIT)
transfer = TransferRecord {
req_id: req_id,
user_id: request.user_id,
from: request.from,
to: request.to,
asset_id: request.asset_id,
amount: request.amount,
state: INIT,
created_at: now()
}
db.insert(transfer)
log.info("Transfer created", req_id)
// Execute FSM
return execute_fsm(req_id)
function execute_fsm(req_id):
loop:
transfer = db.get(req_id)
if transfer.state.is_terminal():
return transfer
new_state = step(transfer)
if new_state == transfer.state:
// No progress, wait for retry
sleep(RETRY_INTERVAL)
continue
function step(transfer):
match transfer.state:
INIT:
return step_init(transfer)
SOURCE_PENDING:
return step_source_pending(transfer)
SOURCE_DONE:
return step_source_done(transfer)
TARGET_PENDING:
return step_target_pending(transfer)
COMPENSATING:
return step_compensating(transfer)
_:
return transfer.state // Terminal, no processing
function step_init(transfer):
// CAS: Persist state BEFORE calling adapter (Persist-Before-Call)
success = db.cas_update(
req_id = transfer.req_id,
old_state = INIT,
new_state = SOURCE_PENDING
)
if !success:
return db.get(transfer.req_id).state
// Get source adapter
source_adapter = get_adapter(transfer.from)
// ========== Defense-in-Depth Layer 3: Adapter ==========
result = source_adapter.withdraw(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
return SOURCE_DONE
EXPLICIT_FAIL(reason):
db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
return FAILED
TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
log.warn("Source withdraw unknown state", transfer.req_id)
return SOURCE_PENDING
function step_source_done(transfer):
// ========== Enter SOURCE_DONE: Funds In-Flight, must reach terminal state ==========
// CAS update to TARGET_PENDING
success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
if !success:
return db.get(transfer.req_id).state
// Get target adapter
target_adapter = get_adapter(transfer.to)
// ========== Defense-in-Depth Layer 4: Target Adapter ==========
result = target_adapter.deposit(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
// ╔════════════════════════════════════════════════════════════════╗
// ║ 🔒 ATOMIC COMMIT - CRITICAL STEP! ║
// ║ ║
// ║ At this point: ║
// ║ FROM.withdraw = SUCCESS ✓ (already confirmed) ║
// ║ TO.deposit = SUCCESS ✓ (just confirmed) ║
// ║ ║
// ║ Execute Atomic CAS Commit: ║
// ║ CAS(TARGET_PENDING → COMMITTED) ║
// ║ ║
// ║ Once this CAS succeeds, the transfer is irreversible! ║
// ╚════════════════════════════════════════════════════════════════╝
commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
if !commit_success:
return db.get(transfer.req_id).state
log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
return COMMITTED
EXPLICIT_FAIL(reason):
db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
return COMPENSATING
TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
// ========== CRITICAL: Unknown state, MUST NOT compensate! ==========
log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
return TARGET_PENDING // Stay and retry
function step_compensating(transfer):
source_adapter = get_adapter(transfer.from)
result = source_adapter.refund(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
log.info("Transfer rolled back", transfer.req_id)
return ROLLED_BACK
_:
log.critical("Refund failed - MUST RETRY", transfer.req_id)
return COMPENSATING
9.3 Adapter Layer (Example: Funding Adapter)
function withdraw(req_id, user_id, asset_id, amount):
// ========== Defense-in-Depth Layer 3: Adapter Internal Verification ==========
// Re-verify parameters (Do not trust caller)
ASSERT amount > 0
ASSERT user_id > 0
ASSERT asset_id > 0
// Idempotency Check
existing = db.find_transfer_operation(req_id, "WITHDRAW")
if existing:
return existing.result
// Begin transaction
tx = db.begin_transaction()
try:
// SELECT FOR UPDATE
account = tx.select_for_update(
"SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
)
if !account:
tx.rollback()
return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
if account.status == FROZEN:
tx.rollback()
return EXPLICIT_FAIL("ACCOUNT_FROZEN")
if account.available < amount:
tx.rollback()
return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
// Execute deduction
tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
// Record operation for idempotency
tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
tx.commit()
return SUCCESS
catch Exception as e:
tx.rollback()
log.error("Withdraw failed", req_id, e)
return UNKNOWN // Uncertainty requires retry
10. Acceptance Test Plan (Security Critical)
Caution
ALL tests below must pass before going to production. Any failure indicates potential fund theft, loss, or creation of funds from thin air.
10.1 Fund Conservation Tests
| Test ID | Scenario | Expected Result | Verification |
|---|---|---|---|
| INV-001 | After normal transfer | Total funds = Before | SUM(source) + SUM(target) = Constant |
| INV-002 | After failed transfer | Total funds = Before | Source balance unchanged |
| INV-003 | After rollback | Total funds = Before | Source balance fully restored |
| INV-004 | After crash recovery | Total funds = Before | Verify all account balances |
10.2 External Attack Tests
| Test ID | Attack Vector | Steps | Expected Result |
|---|---|---|---|
| ATK-001 | Cross-user transfer | Submits user B’s funds with user A’s token | FORBIDDEN |
| ATK-002 | user_id Tampering | Modify user_id in request body | FORBIDDEN |
| ATK-003 | Negative Amount | amount = -100 | INVALID_AMOUNT |
| ATK-004 | Zero Amount | amount = 0 | INVALID_AMOUNT |
| ATK-005 | Precision Overflow | amount = 0.000000001 (>8 decimals) | PRECISION_OVERFLOW |
| ATK-006 | Integer Overflow | amount = u64::MAX + 1 | OVERFLOW or parse error |
| ATK-007 | Same Account | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | Invalid Account Type | from = “INVALID” | INVALID_ACCOUNT_TYPE |
| ATK-009 | Non-existent Asset | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | Duplicate cid | Submit same ID twice | Second returns first result |
| ATK-011 | No Token | Missing Authorization header | UNAUTHORIZED |
| ATK-012 | Expired Token | Use expired JWT | UNAUTHORIZED |
| ATK-013 | Forged Token | Invalid signature JWT | UNAUTHORIZED |
10.3 Balance & Status Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| BAL-001 | amount > available | INSUFFICIENT_BALANCE, no change |
| BAL-002 | amount = available | Success, balance becomes 0 |
| BAL-003 | Concurrent: Total > balance | One success, one INSUFFICIENT_BALANCE |
| BAL-004 | Transfer from frozen account | ACCOUNT_FROZEN |
| BAL-005 | Transfer from disabled account | ACCOUNT_DISABLED |
10.4 FSM State Transition Tests
| Test ID | Scenario | Expected State Flow |
|---|---|---|
| FSM-001 | Normal Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | Normal Spot→Funding | Same as above |
| FSM-003 | Source Failure | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | Target Failure (Explicit) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | Target Timeout | … → TARGET_PENDING (Stay, infinite retry) |
| FSM-006 | Compensation Failure | COMPENSATING (Stay, infinite retry) |
10.5 Crash Recovery Tests
| Test ID | Crash Point | Expected Recovery Behavior |
|---|---|---|
| CRA-001 | After INIT, before SOURCE_PENDING | Recovery reads INIT, restarts step_init |
| CRA-002 | During SOURCE_PENDING, before call | Recovery retries withdraw (idempotent) |
| CRA-003 | During SOURCE_PENDING, after call | Recovery retries withdraw (idempotent, returns handled) |
| CRA-004 | After SOURCE_DONE, before TARGET_PENDING | Recovery executes step_source_done |
| CRA-005 | During TARGET_PENDING | Recovery retries deposit (idempotent) |
| CRA-006 | During COMPENSATING | Recovery retries refund (idempotent) |
10.6 Concurrency & Race Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| CON-001 | Multiple Workers on same req_id | Only one successful CAS, others skip |
| CON-002 | Concurrent Same-Amount Transfers | Two separate req_ids, both execute |
| CON-003 | Transfer + External Withdraw | Sum cannot exceed balance |
| CON-004 | No-lock balance read | No double deduction (SELECT FOR UPDATE) |
10.7 Idempotency Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| IDP-001 | Call withdraw twice | Second returns SUCCESS, balance deducted once |
| IDP-002 | Call deposit twice | Second returns SUCCESS, balance credited once |
| IDP-003 | Call refund twice | Second returns SUCCESS, balance credited once |
| IDP-004 | Recovery multiple retries | Final state consistent, balance correct |
10.8 Fund Anomaly Tests (Most Critical)
| Test ID | Threat | Method | Verification |
|---|---|---|---|
| FND-001 | Double Spend | Source deduct twice | Only deduct once (idempotent) |
| FND-002 | Fund Disappearance | Source success, target fail, no compensation | Must compensate or retry |
| FND-003 | Money from Nothing | Target credit twice | Only credit once (idempotent) |
| FND-004 | Lost in Transit | Crash at any point | Recovery restores integrity |
| FND-005 | State Inconsistency | SOURCE_DONE but DB not updated | WAL + Idempotency parity |
| FND-006 | Partial Commit | PG Transaction partial success | Atomic transaction (all or none) |
10.9 Monitoring & Alerting Tests
| Test ID | Scenario | Expected Alert |
|---|---|---|
| MON-001 | Stuck in TARGET_PENDING > 1m | CRITICAL Alert |
| MON-002 | Compensation fail 3 times | CRITICAL Alert |
| MON-003 | Fund conservation check fail | CRITICAL Alert + HALT Service |
| MON-004 | Abnormal freq per user | WARNING Alert [P2] |
🇨🇳 中文
📦 代码变更: 查看 Diff
1. 问题陈述
1.1 系统拓扑
| 系统 | 角色 | 数据源 | 持久化 |
|---|---|---|---|
| PostgreSQL | 资金账户 (Funding) | balances_tb | ACID, 持久化 |
| UBSCore | 交易账户 (Trading) | RAM | WAL + 易失性 |
1.2 核心约束
这两个系统 无法共享事务。没有 XA/2PC 数据库协议。 因此:我们必须使用外部 FSM 协调器构建自己的两阶段提交。
1.5 安全前置检查 (MANDATORY)
Caution
纵深防御 (Defense-in-Depth) 以下所有检查必须在 每一个独立模块 中执行,不仅仅是 API 层。
- API 层: 第一道防线,拒绝明显非法请求
- Coordinator: 再次验证,防止内部调用绕过 API
- Adapters: 最终防线,每个适配器必须独立验证参数
- UBSCore: 内存操作前最后一次检查
安全 > 性能。重复检查的开销可以接受,安全漏洞不可接受。
1.5.1 身份与授权检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 用户认证 | 伪造请求 | JWT/Session 必须有效 | UNAUTHORIZED |
| 用户 ID 一致性 | 跨用户转账攻击 | request.user_id == auth.user_id | FORBIDDEN |
| 账户归属 | 转走他人资金 | 源/目标账户都属于同一 user_id | FORBIDDEN |
1.5.2 账户类型检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| from != to | 无限刷单/浪费资源 | request.from != request.to | SAME_ACCOUNT |
| 账户类型有效 | 注入无效类型 | from, to ∈ {FUNDING, SPOT} | INVALID_ACCOUNT_TYPE |
| 账户类型支持 | 请求未上线功能 | from, to 都在支持列表中 | UNSUPPORTED_ACCOUNT_TYPE |
1.5.3 金额检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| amount > 0 | 零/负数转账 | amount > 0 | INVALID_AMOUNT |
| 精度检查 | 精度溢出 | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| 最小金额 | 微额攻击/粉尘攻击 | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| 最大单笔金额 | 风控绕过 | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| 整数溢出 | u64 溢出攻击 | amount <= u64::MAX / safety_factor | OVERFLOW |
1.5.4 资产检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 资产存在 | 伪造 asset_id | asset_id 在系统中存在 | INVALID_ASSET |
| 资产状态 | 已下架资产 | asset.status == ACTIVE | ASSET_SUSPENDED |
| 转账许可 | 某些资产禁止内部转账 | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |
1.5.5 账户状态检查
账户初始化规则(概述)
| 账户类型 | 初始化时机 | 备注 |
|---|---|---|
| FUNDING | 首次申请充值时创建 | 外部充值流程触发 |
| SPOT | 首次内部转账时创建 | 懒加载 (Lazy Init) |
| FUTURE | 首次内部转账时创建 [P2] | 懒加载 |
| MARGIN | 首次内部转账时创建 [P2] | 懒加载 |
Note
- 各账户类型的具体初始化行为和业务规则,请参见各账户类型的专用文档。
- 每个账户都有自己的状态定义(如是否允许划转),当前不详细定义。
- 默认状态:账户初始化时,默认允许划转。
账户状态检查表
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 源账户存在 | 不存在的账户 | 源账户记录必须存在 | SOURCE_ACCOUNT_NOT_FOUND |
| 目标账户存在/创建 | 不存在的目标 | FUNDING必须存在;SPOT/FUTURE/MARGIN可创建 | TARGET_ACCOUNT_NOT_FOUND (仅FUNDING) |
| 源账户未冻结 | 被冻结账户转出 | source.status != FROZEN | ACCOUNT_FROZEN |
| 源账户未禁用 | 被禁用账户操作 | source.status != DISABLED | ACCOUNT_DISABLED |
| 余额充足 | 余额不足直接拒绝 | source.available >= amount | INSUFFICIENT_BALANCE |
1.5.6 频率限制 (Rate Limiting) - [P2 未来优化]
Note
此部分为 V2 优化项,V1 可不实现。
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 每秒请求数 | DoS 攻击 | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| 每日转账次数 | 滥用 | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| 每日转账金额 | 大额风控 | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |
1.5.7 幂等性检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| cid 唯一 | 重复提交 | 如提供 cid,检查是否已存在 | DUPLICATE_REQUEST (返回原结果) |
1.5.8 检查顺序 (推荐)
1. 身份认证 (JWT 有效?)
2. 授权检查 (user_id 匹配?)
3. 请求格式 (from/to/amount 有效?)
4. 账户类型 (from != to, 类型支持?)
5. 资产检查 (存在? 启用? 可转账?)
6. 金额检查 (范围? 精度? 溢出?)
7. 频率限制 (超限?)
8. 幂等性 (重复?)
9. 余额检查 (充足?) ← 最后检查,避免无谓查询
2. FSM 设计 (状态机)
2.0 库选择: rust-fsm
使用 rust-fsm 库,提供:
- ✅ 编译时验证 - 非法状态转换在编译时报错
- ✅ 声明式 DSL - 清晰定义状态和转换
- ✅ 类型安全 - 防止遗漏分支
Cargo.toml:
[dependencies]
rust-fsm = "0.7"
DSL 定义:
#![allow(unused)]
fn main() {
use rust_fsm::*;
state_machine! {
derive(Debug, Clone, Copy, PartialEq, Eq)
TransferFsm(Init) // 初始状态
// 状态定义
Init => {
SourceWithdrawOk => SourceDone,
SourceWithdrawFail => Failed,
},
SourceDone => {
TargetDepositOk => Committed,
TargetDepositFail => Compensating,
TargetDepositUnknown => SourceDone [loop], // 保持,无限重试
},
Compensating => {
RefundOk => RolledBack,
RefundFail => Compensating [loop], // 保持,无限重试
},
// 终态
Committed,
Failed,
RolledBack,
}
}
Note
上述 DSL 用于编译时验证状态转换的合法性。 实际运行时状态存储在 PostgreSQL,使用 CAS 更新。
2.0.1 核心状态流程图 (Top Level)
┌─────────────────────────────────────────────────────────┐
│ INTERNAL TRANSFER FSM │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────── 正常路径 (Happy Path) ──────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ INIT │ 源扣减成功 ✓ │ SOURCE_DONE │ 目标入账成功 ✓ │ │ │
│ │(用户请求)│ ─────────────────▶ │ (资金在途) │ ─────────────────▶ │ COMMITTED │ │
│ └─────────┘ └─────────────┘ │ │ │
│ │ │ └───────────────┘ │
│ │ │ ✅ │
└────────│───────────────────────────────│───────────────────────────────────────────────┘
│ │
│ │
│ ▼
│ ╔══════════════════════════════════════════════════╗
│ ║ 🔒 ATOMIC COMMIT (原子提交) ║
│ ║ ║
│ ║ 当且仅当: ║
│ ║ FROM.withdraw = SUCCESS ✓ ║
│ ║ TO.deposit = SUCCESS ✓ ║
│ ║ ║
│ ║ 执行: CAS(SOURCE_DONE → COMMITTED) ║
│ ║ 此操作必须原子,不可中断 ║
│ ╚══════════════════════════════════════════════════╝
│ │
│ 源扣减失败 │ 目标入账失败 (明确 EXPLICIT_FAIL)
▼ ▼
┌──────────┐ ┌──────────────┐
│ FAILED │ │ COMPENSATING │◀───────────┐
│ (源失败) │ │ (退款中) │ │ 退款失败 (无限重试)
└──────────┘ └──────────────┘────────────┘
❌ │ 退款成功
▼
┌─────────────┐
│ ROLLED_BACK │
│ (已回滚) │
└─────────────┘
↩️
╔════════════════════════════════════════════════════════════════════════════════════════╗
║ ⚠️ 目标入账状态未知 (TIMEOUT/UNKNOWN) → 保持 SOURCE_DONE,无限重试,绝不进入 COMPENSATING║
╚════════════════════════════════════════════════════════════════════════════════════════╝
核心状态说明:
| 状态 | 资金位置 | 说明 |
|---|---|---|
INIT | 源账户 | 用户发起请求,资金尚未移动 |
SOURCE_DONE | 在途 | 关键点!资金已离开源,尚未到达目标 |
COMMITTED | 目标账户 | 终态,转账成功 |
FAILED | 源账户 | 终态,源扣减失败,无资金移动 |
COMPENSATING | 在途 | 目标入账失败,正在退款 |
ROLLED_BACK | 源账户 | 终态,退款成功 |
Important
SOURCE_DONE是最关键的状态 - 资金已离开源账户但尚未到达目标。 此时绝不能丢失状态,必须确保最终到达COMMITTED或ROLLED_BACK。
2.1 状态 (穷举)
| ID | 状态名 | 进入条件 | 终态? | 资金位置 |
|---|---|---|---|---|
| 0 | INIT | 用户请求已接受 | 否 | 源账户 |
| 10 | SOURCE_PENDING | CAS 成功,适配器调用已发起 | 否 | 源账户 (扣减中) |
| 20 | SOURCE_DONE | 源适配器返回 OK | 否 | 在途 |
| 30 | TARGET_PENDING | CAS 成功,目标适配器调用已发起 | 否 | 在途 (入账中) |
| 40 | COMMITTED | 目标适配器返回 OK | 是 | 目标账户 |
| -10 | FAILED | 源适配器返回 FAIL | 是 | 源账户 (未变) |
| -20 | COMPENSATING | 目标适配器 FAIL 且源可逆 | 否 | 在途 (退款中) |
| -30 | ROLLED_BACK | 源退款 OK | 是 | 源账户 (已恢复) |
2.2 State Transition Rules (Exhaustive)
┌───────────────────────────────────────────────────────────────────────────────┐
│                         Canonical State Transitions                           │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  INIT ──────[CAS OK]───────► SOURCE_PENDING                                   │
│                                    │                                          │
│                                    ├──[adapter OK]────► SOURCE_DONE           │
│                                    │                         │                │
│                                    └──[adapter FAIL]──► FAILED (terminal)     │
│                                                              │                │
│                     SOURCE_DONE ──[CAS OK]──► TARGET_PENDING                  │
│                                                              │                │
│                     ┌────────────────────────────────────────┤                │
│                     │                                        │                │
│               [adapter OK]                            [adapter FAIL]          │
│                     │                                        │                │
│                     ▼                                        ▼                │
│                COMMITTED                          ┌───────────────────┐       │
│                (terminal)                         │ Source reversible?│       │
│                                                   └─────────┬─────────┘       │
│                                                   Yes       │       No        │
│                                                    ▼        │        ▼        │
│                                              COMPENSATING   │  retry forever  │
│                                                    │        │  (stay in       │
│                                               [refund OK]   │  TARGET_        │
│                                                    ▼        │  PENDING)       │
│                                              ROLLED_BACK    │                 │
│                                               (terminal)    │                 │
│                                                             │                 │
└─────────────────────────────────────────────────────────────┴─────────────────┘
2.3 Reversibility Rules (Critical)
Core principle: an operation may only be safely undone when the adapter returned a well-defined failure.
| Response Type | Meaning | Safe to Undo? | Handling |
|---|---|---|---|
| SUCCESS | Operation succeeded | N/A | Proceed to the next step |
| EXPLICIT_FAIL | Definite business failure (e.g. insufficient balance) | Yes | May enter COMPENSATING |
| TIMEOUT | Timed out, state unknown | No | Retry forever |
| PENDING | In progress, state unknown | No | Retry forever |
| NETWORK_ERROR | Network error, state unknown | No | Retry forever |
| UNKNOWN | Anything else | No | Retry forever or manual intervention |
Caution
Only EXPLICIT_FAIL can be safely undone. In any unknown-state case (timeout, pending, network error) the funds are In-Flight and we cannot know whether the counterparty has processed the request. Undoing blindly leads to a double spend or lost funds. The only safe actions are infinite retry or manual intervention.
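The reversibility rule above reduces to a single predicate on the adapter's response type. A minimal sketch (the enum and method names are illustrative, not the project's actual API):

```rust
/// Sketch of an adapter response and the reversibility rule:
/// only a well-defined business failure may trigger compensation.
#[derive(Debug, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail(String), // definite business failure with a reason
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

impl AdapterResponse {
    /// Every unknown-state outcome must be retried, never compensated.
    fn can_safely_compensate(&self) -> bool {
        matches!(self, AdapterResponse::ExplicitFail(_))
    }
}

fn main() {
    assert!(AdapterResponse::ExplicitFail("INSUFFICIENT_BALANCE".into()).can_safely_compensate());
    assert!(!AdapterResponse::Timeout.can_safely_compensate());
    assert!(!AdapterResponse::Pending.can_safely_compensate());
    assert!(!AdapterResponse::Unknown.can_safely_compensate());
}
```

Funneling every "not sure" outcome through one conservative predicate is what keeps a timeout from ever being treated as a failure.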
3. Transfer Scenarios (Step by Step)
3.1 Scenario A: Funding → Trading (deposit into the trading account)
Happy path:
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Funding account |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Funding account |
| 3 | Coordinator | Call FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | Debited |
| 5 | Coordinator | On OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In flight |
| 7 | Coordinator | Call TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | Credit RAM balance, write WAL, emit event | - | - | Credited |
| 9 | Coordinator | On event: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Trading account |
Failure path (target returns EXPLICIT_FAIL):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7' | Coordinator | Call TradingAdapter.deposit(req_id) → EXPLICIT_FAIL | TARGET_PENDING | - | In flight |
| 8' | Coordinator | Check: source = Funding account (reversible) | - | - | - |
| 9' | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In flight |
| 10' | Coordinator | Call FundingAdapter.refund(req_id) | - | - | - |
| 11' | PG | UPDATE balances SET amount = amount + X | - | - | Refunded |
| 12' | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Funding account |
3.2 Scenario B: Trading → Funding (withdraw from the trading account)
Happy path:
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Trading account |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Trading account |
| 3 | Coordinator | Call TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | Check balance, debit RAM, write WAL, emit event | - | - | Debited |
| 5 | Coordinator | On event: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In flight |
| 7 | Coordinator | Call FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | Credited |
| 9 | Coordinator | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding account |
Failure path (target fails explicitly):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7a | Coordinator | Call FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (e.g. constraint violation) | TARGET_PENDING | - | In flight |
| 8a | Coordinator | Check response type = EXPLICIT_FAIL (safe to undo) | - | - | - |
| 9a | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In flight |
| 10a | Coordinator | Call TradingAdapter.refund(req_id) (refund into UBSCore) | - | - | - |
| 11a | UBSCore | Credit RAM balance, write WAL | - | - | Refunded |
| 12a | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Trading account |
Failure path (target state unknown):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7b | Coordinator | Call FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | In flight |
| 8b | Coordinator | Check response type = UNKNOWN (NOT safe to undo) | - | - | - |
| 9b | Coordinator | No state transition. Stay in TARGET_PENDING. | TARGET_PENDING | TARGET_PENDING | In flight |
| 10b | Coordinator | Log CRITICAL. Alert ops. Schedule a retry. | - | - | - |
| 11b | Recovery worker | Retry FundingAdapter.deposit(req_id) indefinitely. | - | - | - |
| 12b | (eventually) | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding account |
Warning
Only when the target returns EXPLICIT_FAIL may the transfer enter COMPENSATING. On a timeout or unknown state the funds are In-Flight; retry forever or escalate to manual intervention.
4. Failure Modes and Effects Analysis (FMEA)
4.1 Phase 1 Failures (source operation)
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Adapter returns FAIL | Insufficient balance, DB constraint | SOURCE_PENDING | Source account | Move to FAILED. User sees the error. |
| Adapter returns PENDING | Timeout, network issue | SOURCE_PENDING | Unknown | Retry. The adapter must be idempotent. |
| Coordinator crashes after CAS, before the call | Process killed | SOURCE_PENDING | Source account | Recovery worker retries the call. |
| Coordinator crashes after the call, before the result | Process killed | SOURCE_PENDING | Unknown | Recovery worker retries (idempotent). |
4.2 Phase 2 Failures (target operation)
| Failure | Cause | Response Type | Current State | Funds | Resolution |
|---|---|---|---|---|---|
| Target explicitly rejects | Business rule | EXPLICIT_FAIL | TARGET_PENDING | In flight | COMPENSATING → refund. |
| Timeout | Network latency | TIMEOUT | TARGET_PENDING | Unknown | Retry forever. |
| Network error | Connection dropped | NETWORK_ERROR | TARGET_PENDING | Unknown | Retry forever. |
| Unknown error | System exception | UNKNOWN | TARGET_PENDING | Unknown | Retry forever or manual intervention. |
| Coordinator crash | Process killed | N/A | TARGET_PENDING | In flight | Recovery worker retries. |
4.3 Compensation Failures
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Refund FAIL | PG down, constraint | COMPENSATING | In flight | Retry forever. Funds are stuck until PG recovers. |
| Refund PENDING | Timeout | COMPENSATING | Unknown | Retry. |
5. Idempotency Requirements (Mandatory)
5.1 Why Idempotency?
Retries are the foundation of crash recovery. Without idempotency, a retry causes double execution (double debit, double credit).
5.2 Implementation (Funding Adapter)
Requirement: for a given req_id, calling withdraw() or deposit() multiple times must have the same effect as calling it once.
Mechanism:
- transfers_tb has UNIQUE(req_id).
- Atomic transaction:
BEGIN;
-- Check whether this req_id was already processed
SELECT state FROM transfers_tb WHERE req_id = $1;
IF state >= expected_post_state THEN RETURN 'AlreadyProcessed'; END IF;
-- Apply the balance update
UPDATE balances_tb SET amount = amount - $2
    WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
IF NOT FOUND THEN RETURN 'InsufficientBalance'; END IF;
-- Update the state
UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;
COMMIT;
RETURN 'Success';
5.3 Implementation (Trading Adapter)
Requirement: same as above. UBSCore must reject duplicate req_ids.
Mechanism:
- InternalOrder carries a req_id field (or cid).
- UBSCore maintains a ProcessedTransferSet (an in-RAM HashSet, rebuilt from the WAL on restart).
- On receiving a transfer order:
IF req_id IN ProcessedTransferSet THEN
    RETURN 'AlreadyProcessed'  (success, no-op)
ELSE
    ProcessTransfer()
    ProcessedTransferSet.insert(req_id)
    WriteWAL(TransferEvent)
    RETURN 'Success'
END IF
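The dedup logic above can be sketched in a few lines of Rust. This is illustrative only: `TransferProcessor` and its fields are assumed names, and the balance mutation and WAL write are stubbed out.

```rust
use std::collections::HashSet;

/// Illustrative sketch of UBSCore's ProcessedTransferSet deduplication.
struct TransferProcessor {
    processed: HashSet<String>, // rebuilt from the WAL on restart
    applied: u32,               // stands in for the real balance mutation
}

impl TransferProcessor {
    fn new() -> Self {
        Self { processed: HashSet::new(), applied: 0 }
    }

    /// Returns "AlreadyProcessed" on a duplicate req_id; the balance
    /// mutation runs at most once no matter how often callers retry.
    fn handle(&mut self, req_id: &str) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // success, no-op
        }
        self.applied += 1; // ProcessTransfer()
        self.processed.insert(req_id.to_string());
        // WriteWAL(TransferEvent) would go here
        "Success"
    }
}

fn main() {
    let mut p = TransferProcessor::new();
    assert_eq!(p.handle("01JFVQ2X8Z0Y1M3N4P5R6S7T8U"), "Success");
    assert_eq!(p.handle("01JFVQ2X8Z0Y1M3N4P5R6S7T8U"), "AlreadyProcessed");
    assert_eq!(p.applied, 1); // applied exactly once despite the retry
}
```

The set insert must be persisted (via the WAL) in the same logical step as the balance change, otherwise a crash between the two reopens the double-execution window.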
6. Recovery Worker (Zombie Handler)
6.1 Purpose
On coordinator startup (and periodically), scan for "stuck" transfers and resume them.
6.2 Query
SELECT * FROM transfers_tb
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
AND updated_at < NOW() - INTERVAL '1 minute'; -- staleness threshold
6.3 Recovery Logic
| Current State | Action |
|---|---|
| INIT | Call step() (moves to SOURCE_PENDING). |
| SOURCE_PENDING | Retry Source.withdraw(). |
| SOURCE_DONE | Call step() (moves to TARGET_PENDING). |
| TARGET_PENDING | Retry Target.deposit(). Apply the reversibility rules. |
| COMPENSATING | Retry Source.refund(). |
7. Data Model
7.1 Table: transfers_tb
CREATE TABLE transfers_tb (
transfer_id BIGSERIAL PRIMARY KEY,
req_id VARCHAR(26) UNIQUE NOT NULL, -- server-generated unique ID (ULID)
cid VARCHAR(64) UNIQUE, -- client idempotency key (optional)
user_id BIGINT NOT NULL,
asset_id INTEGER NOT NULL,
amount DECIMAL(30, 8) NOT NULL,
transfer_type SMALLINT NOT NULL, -- 1 = Funding->Trading, 2 = Trading->Funding
source_type SMALLINT NOT NULL, -- 1 = Funding, 2 = Trading
state SMALLINT NOT NULL DEFAULT 0, -- FSM state ID
error_message TEXT, -- last error (for debugging)
retry_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);
7.2 Invariant Check
Run periodically to detect data corruption:
-- For each (user, asset): funding + trading + in-flight must be a constant
-- in-flight = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
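The invariant is simply that the three buckets always sum to the same total at every checkpoint of a transfer. A minimal sketch in scaled integer units (the function name is illustrative):

```rust
/// Conservation invariant: for each (user, asset),
/// funding + trading + in-flight must stay constant across a transfer.
fn total(funding: u64, trading: u64, in_flight: u64) -> u64 {
    funding + trading + in_flight
}

fn main() {
    // Before: 1000 units sitting in the funding account.
    let before = total(1000, 0, 0);
    // Mid-transfer (SOURCE_DONE): 50 left funding and is in flight.
    let mid = total(950, 0, 50);
    // After COMMITTED: the 50 arrived in the trading account.
    let after = total(950, 50, 0);
    assert_eq!(before, mid);
    assert_eq!(mid, after);
}
```

A checker that evaluates this sum per (user, asset) and alarms on any drift catches both lost funds and funds created out of thin air.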
8. API Contract
8.1 Endpoint: POST /api/v1/internal_transfer
Request:
{
"from": "SPOT", // source account type
"to": "FUNDING", // target account type
"asset": "USDT",
"amount": "100.00"
}
Account type enum (AccountType):
| Value | Meaning | Status |
|---|---|---|
| FUNDING | Funding account (PostgreSQL) | Supported |
| SPOT | Spot trading account (UBSCore) | Supported |
| FUTURE | Futures account | Future extension |
| MARGIN | Margin account | Future extension |
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U", // server-generated (ULID)
"from": "SPOT",
"to": "FUNDING",
"state": "COMMITTED", // or "PENDING" if asynchronous
"message": "Transfer succeeded"
}
8.2 Query Endpoint: GET /api/v1/internal_transfer/:req_id
Response:
{
"transfer_id": 12345,
"req_id": "sr-1734912345678901234",
"from": "SPOT",
"to": "FUNDING",
"asset": "USDT",
"amount": "100.00",
"state": "COMMITTED",
"created_at": "2024-12-23T14:00:00Z",
"updated_at": "2024-12-23T14:00:01Z"
}
Important
req_id is generated by the server, not the client. A client that needs idempotency should use the optional cid (client_order_id) field; the server detects duplicates and returns the existing result.
Error codes:
| Code | Meaning |
|---|---|
| INSUFFICIENT_BALANCE | Source account balance < amount. |
| INVALID_ACCOUNT_TYPE | from or to is an invalid or unsupported account type. |
| SAME_ACCOUNT | from and to are identical. |
| DUPLICATE_REQUEST | cid already processed. Returns the original result. |
| INVALID_AMOUNT | Amount <= 0 or exceeds the asset's precision. |
| SYSTEM_ERROR | Internal failure. Retry recommended. |
9. Implementation Pseudocode (Key State Checks)
9.1 API Layer
function handle_transfer_request(request, auth_context):
    // ========== Defense in Depth, Layer 1: API ==========
    // 1. Authentication
    if !auth_context.is_valid():
        return Error(UNAUTHORIZED)
    // 2. User ID consistency (prevents cross-user attacks)
    if request.user_id != auth_context.user_id:
        return Error(FORBIDDEN, "User ID mismatch")
    // 3. Account type checks
    if request.from == request.to:
        return Error(SAME_ACCOUNT)
    if request.from NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    if request.to NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    // 4. Amount checks
    if request.amount <= 0:
        return Error(INVALID_AMOUNT)
    if decimal_places(request.amount) > asset.precision:
        return Error(PRECISION_OVERFLOW)
    // 5. Idempotency check
    if request.cid:
        existing = db.find_by_cid(request.cid)
        if existing:
            return Success(existing)  // return the existing result
    // 6. Asset check
    asset = db.get_asset(request.asset_id)
    if !asset or asset.status != ACTIVE:
        return Error(INVALID_ASSET)
    // 7. Hand off to the Coordinator
    result = coordinator.create_and_execute(request)
    return result
9.2 Coordinator Layer
function create_and_execute(request):
    // ========== Defense in Depth, Layer 2: Coordinator ==========
    // Validate again (guards against internal calls bypassing the API)
    ASSERT request.from != request.to
    ASSERT request.amount > 0
    ASSERT request.user_id > 0
    // Generate a unique ID
    req_id = ulid.new()
    // Create the transfer record (state = INIT)
    transfer = TransferRecord {
        req_id: req_id,
        user_id: request.user_id,
        from: request.from,
        to: request.to,
        asset_id: request.asset_id,
        amount: request.amount,
        state: INIT,
        created_at: now()
    }
    db.insert(transfer)
    log.info("Transfer created", req_id)
    // Drive the FSM
    return execute_fsm(req_id)
function execute_fsm(req_id):
    loop:
        transfer = db.get(req_id)
        if transfer.state.is_terminal():
            return transfer
        new_state = step(transfer)
        if new_state == transfer.state:
            // no progress; wait, then retry
            sleep(RETRY_INTERVAL)
            continue
function step(transfer):
    match transfer.state:
        INIT:
            return step_init(transfer)
        SOURCE_PENDING:
            return step_source_pending(transfer)
        SOURCE_DONE:
            return step_source_done(transfer)
        TARGET_PENDING:
            return step_target_pending(transfer)
        COMPENSATING:
            return step_compensating(transfer)
        _:
            return transfer.state  // terminal; nothing to do
function step_init(transfer):
    // CAS: persist the state transition before calling the adapter (Persist-Before-Call)
    success = db.cas_update(
        req_id = transfer.req_id,
        old_state = INIT,
        new_state = SOURCE_PENDING
    )
    if !success:
        // concurrent conflict; re-read
        return db.get(transfer.req_id).state
    // Resolve the source adapter
    source_adapter = get_adapter(transfer.from)
    // ========== Defense in Depth, Layer 3: Adapter ==========
    result = source_adapter.withdraw(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
            return SOURCE_DONE
        EXPLICIT_FAIL(reason):
            // definite failure; safe to terminate
            db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
            return FAILED
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // state unknown; stay in SOURCE_PENDING and wait for a retry
            log.warn("Source withdraw unknown state", transfer.req_id)
            return SOURCE_PENDING
function step_source_done(transfer):
    // ========== Entering SOURCE_DONE: funds are in flight; a terminal state must be guaranteed ==========
    // CAS to TARGET_PENDING
    success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
    if !success:
        return db.get(transfer.req_id).state
    // Resolve the target adapter
    target_adapter = get_adapter(transfer.to)
    // ========== Defense in Depth, Layer 4: Target Adapter ==========
    result = target_adapter.deposit(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            // ╔════════════════════════════════════════════════════════════════╗
            // ║ 🔒 ATOMIC COMMIT - the most critical step!                     ║
            // ║                                                                ║
            // ║ At this point:                                                 ║
            // ║   FROM.withdraw = SUCCESS ✓ (confirmed earlier)                ║
            // ║   TO.deposit    = SUCCESS ✓ (just confirmed)                   ║
            // ║                                                                ║
            // ║ Perform the atomic CAS commit:                                 ║
            // ║   CAS(TARGET_PENDING → COMMITTED)                              ║
            // ║                                                                ║
            // ║ This CAS is the final confirmation; once it succeeds,          ║
            // ║ the transfer is irreversible!                                  ║
            // ╚════════════════════════════════════════════════════════════════╝
            commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
            if !commit_success:
                // extremely rare: another worker already committed; return the current state
                return db.get(transfer.req_id).state
            log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
            return COMMITTED
        EXPLICIT_FAIL(reason):
            // definite failure; compensation is allowed
            db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
            return COMPENSATING
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // ========== Critical: state unknown - must NOT compensate! ==========
            log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
            alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
            return TARGET_PENDING  // hold the state; wait for a retry
function step_compensating(transfer):
    source_adapter = get_adapter(transfer.from)
    result = source_adapter.refund(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
            log.info("Transfer rolled back", transfer.req_id)
            return ROLLED_BACK
        _:
            // refund failed; must retry forever
            log.critical("Refund failed - MUST RETRY", transfer.req_id)
            return COMPENSATING
9.3 Adapter Layer (example: Funding Adapter)
function withdraw(req_id, user_id, asset_id, amount):
    // ========== Defense in Depth, Layer 3: checks inside the Adapter ==========
    // Re-validate parameters (never trust the caller)
    ASSERT amount > 0
    ASSERT user_id > 0
    ASSERT asset_id > 0
    // Idempotency check
    existing = db.find_transfer_operation(req_id, "WITHDRAW")
    if existing:
        return existing.result  // return the previously processed result
    // Begin transaction
    tx = db.begin_transaction()
    try:
        // Fetch and lock the account row
        account = tx.select_for_update(
            "SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
        )
        if !account:
            tx.rollback()
            return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
        if account.status == FROZEN:
            tx.rollback()
            return EXPLICIT_FAIL("ACCOUNT_FROZEN")
        if account.available < amount:
            tx.rollback()
            return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
        // Apply the debit
        tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
        // Record the operation (for idempotency)
        tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
        tx.commit()
        return SUCCESS
    catch Exception as e:
        tx.rollback()
        log.error("Withdraw failed", req_id, e)
        return UNKNOWN  // unsure whether it executed; must retry
10. Acceptance Test Plan (Safety-Critical)
Caution
All of the following tests must pass before going live. Any failure can mean funds are stolen, vanish, or are created out of thin air.
10.1 Conservation of Funds
| Test ID | Scenario | Expected Result | Verification |
|---|---|---|---|
| INV-001 | After a normal transfer | Total funds = pre-transfer | SUM(source) + SUM(target) = constant |
| INV-002 | After a failed transfer | Total funds = pre-transfer | Source balance unchanged |
| INV-003 | After a rollback | Total funds = pre-transfer | Source balance fully restored |
| INV-004 | After crash recovery | Total funds = pre-crash | Walk all accounts and verify |
10.2 External Attack Tests
| Test ID | Attack Vector | Test Steps | Expected Result |
|---|---|---|---|
| ATK-001 | Cross-user transfer | Use user A's token to move user B's funds | FORBIDDEN |
| ATK-002 | user_id tampering | Modify user_id in the request body | FORBIDDEN |
| ATK-003 | Negative amount | amount = -100 | INVALID_AMOUNT |
| ATK-004 | Zero amount | amount = 0 | INVALID_AMOUNT |
| ATK-005 | Over-precision amount | amount = 0.000000001 (more than 8 decimals) | PRECISION_OVERFLOW |
| ATK-006 | Integer overflow | amount = u64::MAX + 1 | OVERFLOW or parse failure |
| ATK-007 | Same account | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | Invalid account type | from = "INVALID" | INVALID_ACCOUNT_TYPE |
| ATK-009 | Nonexistent asset | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | Duplicate cid | Send the same cid twice | Second call returns the first result |
| ATK-011 | No token | No Authorization header | UNAUTHORIZED |
| ATK-012 | Expired token | Use an expired JWT | UNAUTHORIZED |
| ATK-013 | Forged token | Use a JWT with an invalid signature | UNAUTHORIZED |
10.3 Insufficient Balance Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| BAL-001 | Amount > available balance | INSUFFICIENT_BALANCE, balance unchanged |
| BAL-002 | Amount = available balance | Success, balance becomes 0 |
| BAL-003 | Concurrent: two transfers summing > balance | One succeeds, one INSUFFICIENT_BALANCE |
| BAL-004 | Transfer out of a frozen account | ACCOUNT_FROZEN |
| BAL-005 | Transfer out of a disabled account | ACCOUNT_DISABLED |
10.4 FSM Transition Tests
| Test ID | Scenario | Expected State Flow |
|---|---|---|
| FSM-001 | Normal Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | Normal Spot→Funding | Same as above |
| FSM-003 | Source failure | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | Target failure (explicit) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | Target timeout | … → TARGET_PENDING (held; retry forever) |
| FSM-006 | Compensation failure | COMPENSATING (held; retry forever) |
10.5 Crash Recovery Tests
| Test ID | Crash Point | Expected Recovery Behavior |
|---|---|---|
| CRA-001 | After INIT, before SOURCE_PENDING | Recovery reads INIT, re-runs step_init |
| CRA-002 | In SOURCE_PENDING, before the adapter call | Recovery retries withdraw (idempotent) |
| CRA-003 | In SOURCE_PENDING, after the adapter call | Recovery retries withdraw (idempotent; returns already-processed) |
| CRA-004 | After SOURCE_DONE, before TARGET_PENDING | Recovery continues step_source_done |
| CRA-005 | In TARGET_PENDING | Recovery retries deposit (idempotent) |
| CRA-006 | In COMPENSATING | Recovery retries refund (idempotent) |
10.6 Concurrency / Race Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| CON-001 | Multiple workers handle the same req_id | Exactly one CAS succeeds; the rest skip |
| CON-002 | Two identical-amount transfers at once | Two independent req_ids, each executes |
| CON-003 | Transfer concurrent with an external withdrawal | Only operations with sufficient balance succeed |
| CON-004 | Lock-free balance reads | No double debit (SELECT FOR UPDATE) |
10.7 Idempotency Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| IDP-001 | Call withdraw twice with one req_id | Second returns SUCCESS, balance debited once |
| IDP-002 | Call deposit twice with one req_id | Second returns SUCCESS, balance credited once |
| IDP-003 | Call refund twice with one req_id | Second returns SUCCESS, balance credited once |
| IDP-004 | Recovery retries the same transfer repeatedly | Final state consistent, balances correct |
10.8 Fund Anomaly Tests (Most Critical)
| Test ID | Threat | Test Method | Verification |
|---|---|---|---|
| FND-001 | Double spend | Debit the source twice | Debited only once (idempotent) |
| FND-002 | Vanishing funds | Source debit OK, target fails, no compensation | Must compensate or retry forever |
| FND-003 | Funds out of thin air | Credit the target twice | Credited only once (idempotent) |
| FND-004 | Loss on a mid-flight crash | Crash at any point | Recovery restores integrity |
| FND-005 | Inconsistent state | SOURCE_DONE but DB not updated | WAL + idempotency keep it consistent |
| FND-006 | Partial commit | PG transaction partially succeeds | Atomic transaction: all or nothing |
10.9 Monitoring & Alerting Tests
| Test ID | Scenario | Expected Alert |
|---|---|---|
| MON-001 | Transfer stuck in TARGET_PENDING > 1 minute | CRITICAL alert |
| MON-002 | Compensation fails 3 times in a row | CRITICAL alert |
| MON-003 | Conservation-of-funds check fails | CRITICAL alert + halt service |
| MON-004 | Abnormal per-user transfer frequency | WARNING alert [P2] |
📋 Implementation & Verification | 实现与验证
本章的完整实现细节、API 说明、E2E 测试脚本和验证结果请参阅:
For complete implementation details, API documentation, E2E test scripts, and verification results:
👉 Phase 0x0B-a: Implementation & Testing Guide
包含 / Includes:
- 架构实现与核心模块 (Architecture & Core Modules)
- 新增 API 端点 (New API Endpoints)
- 可复用 E2E 测试脚本 (Reusable E2E Test Script)
- 数据库验证方法 (Database Verification)
- 已修复 Bug 清单 (Fixed Bugs)
Internal Transfer E2E Testing Guide
概述 / Overview
本文档描述了 Phase 0x0B-a 内部转账功能的完成工作、实现细节和端到端测试方法。
This document describes the completed work, implementation details, and end-to-end testing methodology for Phase 0x0B-a Internal Transfer feature.
本章完成工作 / Chapter Deliverables
架构实现 / Architecture Implementation
Implemented the 2-Phase Commit FSM for cross-system fund transfers:
┌─────────────────┐
│ TransferAPI │ Gateway 层
└────────┬────────┘
│
┌────────▼────────┐
│ TransferCoord. │ FSM 协调器
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌────────▼────────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ FundingAdapter │ │ TradingAdapter│ │ TransferDb │
│ (PostgreSQL) │ │ (UBSCore) │ │ (FSM State) │
└─────────────────┘ └───────────────┘ └───────────────┘
核心模块 / Core Modules
| 模块 / Module | 文件 / File | 功能 / Function |
|---|---|---|
| TransferCoordinator | src/transfer/coordinator.rs | FSM 状态机驱动 State machine driver |
| FundingAdapter | src/transfer/adapters/funding.rs | PostgreSQL 资金操作 PostgreSQL balance ops |
| TradingAdapter | src/transfer/adapters/trading.rs | UBSCore 通道通信 UBSCore channel comm |
| TransferDb | src/transfer/db.rs | FSM 状态持久化 FSM state persistence |
| TransferChannel | src/transfer/channel.rs | 跨线程通信 Cross-thread messaging |
新增 API / New APIs
| Endpoint | Method | 描述 / Description |
|---|---|---|
| /api/v1/private/transfer | POST | Create an internal transfer |
| /api/v1/private/transfer/{req_id} | GET | Query transfer status |
| /api/v1/private/balances/all | GET | Query balances across all accounts |
数据库表 / Database Tables
| 表 / Table | 用途 / Purpose |
|---|---|
| fsm_transfers_tb | FSM transfer state records |
| transfer_operations_tb | Idempotent operation tracking |
| balances_tb | Account balances (Funding/Spot) |
交付物 / Deliverables
- ✅ Complete FSM implementation (Init → SourcePending → SourceDone → TargetPending → Committed)
- ✅ Bidirectional transfer verification (Funding ↔ Spot)
- ✅ Reusable E2E test script
- ✅ `/balances/all` balance query API
- ✅ 232 unit tests passing
测试脚本 / Test Script
自动化 E2E 测试 / Automated E2E Test
# Run the full E2E test (starts the Gateway automatically)
./scripts/test_transfer_e2e.sh
Script location: scripts/test_transfer_e2e.sh
测试流程 / Test Flow
[1/6] Prerequisites Check
✓ PostgreSQL connected (port 5433)
✓ Release binary ready
[2/6] Setup Test Data
- Enable CAN_INTERNAL_TRANSFER for USDT
- Create 1000 USDT in Funding for user 1001
- Clear previous transfer records
[3/6] Start Gateway
- Stop existing Gateway (pgrep + kill)
- Start new Gateway with updated config
- Wait for health check
[4/6] Run Transfer Tests
- Funding → Spot (50 USDT)
- Spot → Funding (25 USDT)
- Verify both COMMITTED
[5/6] Verify Balance Changes
- Check Funding: 1000 → 975 (Δ-25)
- Use /balances/all API
[6/6] Cleanup
- Stop Gateway
API 测试 / API Testing
使用 Python 客户端 / Using Python Client
import sys
sys.path.append('scripts/lib')
from api_auth import get_test_client
USER_ID = 1001
client = get_test_client(user_id=USER_ID)
headers = {'X-User-ID': str(USER_ID)}
# 1. 查询余额 / Query balances
resp = client.get('/api/v1/private/balances/all', headers=headers)
print(resp.json())
# 2. 发起转账 / Create transfer
resp = client.post('/api/v1/private/transfer',
json_body={
'from': 'funding',
'to': 'spot',
'asset': 'USDT',
'amount': '50'
},
headers=headers)
print(resp.json())
# 3. 查询转账状态 / Query transfer status
req_id = resp.json()['data']['req_id']
resp = client.get(f'/api/v1/private/transfer/{req_id}', headers=headers)
print(resp.json())
使用 curl / Using curl
# Query balances (requires a valid signature)
curl http://localhost:8080/api/v1/private/balances/all \
-H "X-API-Key: AK_0000000000001001" \
-H "X-Signature: ..." \
-H "X-User-ID: 1001"
数据库验证 / Database Verification
检查余额 / Check Balances
PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT
CASE account_type WHEN 1 THEN 'Spot' WHEN 2 THEN 'Funding' END as account,
(available / 1000000)::text || ' USDT' as balance
FROM balances_tb
WHERE user_id = 1001 AND asset_id = 2
ORDER BY account_type;
"
检查 FSM 状态 / Check FSM State
PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT req_id, amount, state, created_at
FROM fsm_transfers_tb
WHERE user_id = 1001
ORDER BY created_at DESC LIMIT 5;
"
State Values:
- 0: INIT
- 10: SOURCE_PENDING
- 20: SOURCE_DONE
- 30: TARGET_PENDING
- 40: COMMITTED ✅
- -10: FAILED
- -20: COMPENSATING
- -30: ROLLED_BACK
已修复的 Bug / Fixed Bugs
1. FSM 未执行 / FSM Not Executing
Problem: create_transfer_fsm only called coordinator.create() and never called coordinator.execute()
Fix: add the execute() call
#![allow(unused)]
fn main() {
// src/transfer/api.rs
let req_id = coordinator.create(core_req).await?;
let state = coordinator.execute(req_id).await?; // ← Added
}
2. 金额解析为 0 / Amount Parsed as 0
Problem: Decimal.to_string().parse::<u64>() fails on "50000000.00000000"
Fix: use trunc().to_i64()
#![allow(unused)]
fn main() {
// src/transfer/db.rs
let amount_u64 = amount.trunc().to_i64().unwrap_or(0) as u64;
}
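The same failure mode can be reproduced and fixed with only the standard library: `str::parse::<u64>` rejects any string containing a decimal point, so truncating at the `.` first (what `trunc().to_i64()` does for `rust_decimal`) recovers the integer part. A hypothetical helper, for illustration only:

```rust
/// Illustrative reproduction of the parse bug: truncate at the decimal
/// point before parsing. (The real fix uses rust_decimal's trunc().to_i64().)
fn parse_scaled_amount(s: &str) -> u64 {
    let int_part = s.split('.').next().unwrap_or("0");
    int_part.parse::<u64>().unwrap_or(0)
}

fn main() {
    // The original bug: a fractional string fails to parse as u64.
    assert_eq!("50000000.00000000".parse::<u64>().ok(), None);
    // Truncating first recovers the scaled integer amount.
    assert_eq!(parse_scaled_amount("50000000.00000000"), 50_000_000);
}
```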
3. 类型不匹配 / Type Mismatch
- `status` column: INT4 (i32), not INT2
- `decimals` column: INT2 (i16), not i32
测试结果示例 / Sample Test Output
==============================================
Internal Transfer E2E Test (Phase 0x0B-a)
==============================================
[1/6] Checking prerequisites...
✓ PostgreSQL connected
✓ Release binary ready
[2/6] Setting up test data...
✓ Test data initialized (1000 USDT in Funding only for user 1001)
[3/6] Starting Gateway...
✓ Gateway ready
[4/6] Running transfer tests with balance verification...
[BEFORE] Getting initial balances...
USDT:funding: 1000.00
[TRANSFER 1] Funding → Spot (50 USDT)...
✓ COMMITTED
[TRANSFER 2] Spot → Funding (25 USDT)...
✓ COMMITTED
[AFTER] Getting final Funding balance...
USDT:funding: 975.00
[VERIFY] Checking Funding balance changes...
✓ Funding: 1000.00 → 975.00 (Δ-25.00)
Results: 3 passed, 0 failed
[5/6] Final database state...
Funding | 975.0000000000000000 USDT
[6/6] Cleanup...
==============================================
✅ All E2E Transfer Tests PASSED
==============================================
相关文件 / Related Files
| 文件 / File | 描述 / Description |
|---|---|
| scripts/test_transfer_e2e.sh | E2E test script |
| scripts/lib/api_auth.py | API auth library |
| src/transfer/api.rs | Transfer API handlers |
| src/transfer/coordinator.rs | FSM coordinator |
| src/transfer/adapters/funding.rs | Funding adapter |
| src/transfer/adapters/trading.rs | Trading adapter |
Build & Verification Guide | 编译与验证事项
0x0C Trade Fee System | 交易手续费系统
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Overview
1.1 Connecting the Dots: From Transfer to Trading
In 0x0B, we built the FSM mechanism for fund transfers between Funding and Spot accounts. Once funds enter the Spot account, the exchange needs a revenue source.
This is the topic of this chapter: Trade Fee.
Whenever buyers and sellers execute trades, the exchange collects a percentage fee. This is the core business model of exchanges and the foundation for sustainable operations.
Design Philosophy: Fee implementation seems simple (just deducting a percentage, right?), but involves multiple key decisions:
- Where to configure fee rates? (Symbol level vs Global)
- Which asset to deduct from? (Paid vs Received)
- When to deduct? (In ME vs In Settlement)
- How to ensure precision? (u64 * bps / 10000 overflow issues)
1.2 Goal
Implement the Maker/Taker fee model for trade execution. Fees are the primary revenue source for exchanges.
1.3 Key Concepts
| Term | Definition |
|---|---|
| Maker | Order that adds liquidity (resting on orderbook) |
| Taker | Order that removes liquidity (matches immediately) |
| Fee Rate | Percentage of trade value charged |
| bps | Basis points (1 bps = 0.01% = 0.0001) |
1.4 Architecture Overview
┌─────────── Fee Model ────────────┐
│ │
│ Final Rate = Symbol.base_fee │
│ × VipDiscount / 100 │
└──────────────────────────────────┘
┌─────────── Data Flow ─────────────────────────────────────────────────────┐
│ │
│ ME ────▶ Trade{role} ────▶ UBSCore ────▶ BalanceEventBatch ────▶ TDengine
│ │ │ │ │
│ │ Memory: VIP/Fees ├── buyer event │
│ │ O(1) fee calc ├── seller event │
│ │ └── revenue event ×2 │
│ │ │
└──────────────┴────────────────────────────────────────────────────────────┘
┌─────────── Core Design ───────────┐
│ ✅ Fee from Gain → No reservation │
│ ✅ UBSCore billing → Balance auth │
│ ✅ Per-User Event → Decoupled │
│ ✅ Event Sourcing → Conservation │
└───────────────────────────────────┘
2. Fee Model Design
2.1 Why Maker/Taker Model?
Traditional stock exchanges use fixed rates, but crypto exchanges universally adopt the Maker/Taker model. This is not arbitrary:
| Problem | How Maker/Taker Solves |
|---|---|
| Low liquidity | Low Maker fees encourage limit orders |
| Price discovery | Deeper orderbook, narrower spreads |
| Fairness | Liquidity takers pay more |
Industry Practice: Binance, OKX, Bybit all use this model.
2.2 Fee Rate Architecture
Two-Layer System: Symbol base rate × VIP discount coefficient
Final Rate = Symbol.base_fee × VipDiscountTable[user.vip_level] / 100
Layer 1: Symbol Base Rate
Each trading pair defines its own base rate:
| Field | Precision | Default | Description |
|---|---|---|---|
| base_maker_fee | 10^6 | 1000 | 0.10% |
| base_taker_fee | 10^6 | 2000 | 0.20% |
Layer 2: VIP Discount Coefficient
VIP levels and discounts are configured from database (not hardcoded).
VIP Level Table Design:
| Field | Type | Description |
|---|---|---|
| level | SMALLINT PK | VIP level (0, 1, 2, …) |
| discount_percent | SMALLINT | Discount % (100 = no discount, 50 = 50% off) |
| min_volume | DECIMAL | Trading volume required for upgrade (optional) |
| description | VARCHAR | Level description (optional) |
Example Data:
| level | discount_percent | description |
|---|---|---|
| 0 | 100 | Normal |
| 1 | 90 | VIP 1 |
| 2 | 80 | VIP 2 |
| 3 | 70 | VIP 3 |
| … | … | … |
Operations can configure any number of VIP levels; code loads from database.
Example Calculation:
BTC_USDT: base_taker_fee = 2000 (0.20%)
User VIP 5: discount = 50%
Final Rate = 2000 × 50 / 100 = 1000 (0.10%)
Why 10^6 Precision?
- 10^4 (bps) only represents down to 0.01%, not fine enough
- 10^6 can represent 0.0001%, sufficient for VIP discounts and rebates
- Safe with u128 intermediate:
(amount as u128 * rate as u128 / 10^6) as u64
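The precision argument above can be made concrete. This sketch (function names are illustrative) applies the two-layer formula and shows why the u128 intermediate keeps `amount × rate` from overflowing u64:

```rust
/// Fee math at 10^6 rate precision with a u128 intermediate.
const RATE_SCALE: u128 = 1_000_000;

/// fee = amount × rate / 10^6, computed in u128 to avoid overflow.
fn calc_fee(amount: u64, rate: u64) -> u64 {
    ((amount as u128 * rate as u128) / RATE_SCALE) as u64
}

/// Final Rate = Symbol.base_fee × VipDiscountTable[level] / 100
fn final_rate(base_fee: u64, discount_percent: u64) -> u64 {
    base_fee * discount_percent / 100
}

fn main() {
    // BTC_USDT taker 0.20% (2000) with a 50% VIP discount → 0.10% (1000).
    assert_eq!(final_rate(2000, 50), 1000);
    // 100,000 USDT (scaled 10^6) at 0.10% = 100 USDT.
    assert_eq!(calc_fee(100_000_000_000, 1000), 100_000_000);
    // Near u64::MAX the u128 intermediate prevents overflow.
    assert_eq!(calc_fee(u64::MAX, 1_000_000), u64::MAX);
}
```

Doing `amount * rate` directly in u64 would overflow for any amount above u64::MAX / rate, which is why the widening cast is not optional.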
2.3 Fee Collection Point
Trade: Alice (Taker, BUY) ← → Bob (Maker, SELL)
Alice buys 1 BTC @ 100,000 USDT
┌──────────────────────────────────────────────────────────┐
│ Before Fee: │
│ Alice: -100,000 USDT, +1 BTC │
│ Bob: +100,000 USDT, -1 BTC │
├──────────────────────────────────────────────────────────┤
│ After Fee (deducted from RECEIVED asset): │
│ Alice (Taker 0.20%): -100,000 USDT, +0.998 BTC │
│ Bob (Maker 0.10%): +99,900 USDT, -1 BTC │
│ │
│ Exchange collects: 0.002 BTC + 100 USDT │
└──────────────────────────────────────────────────────────┘
Rule: Fee is always deducted from what you receive, not what you pay.
Why deduct from received asset?
- Simplify user mental accounting: User pays 100 USDT, it’s exactly 100 USDT
- Avoid budget overrun: Buying 1 BTC won’t require 100,020 USDT due to fees
- Industry practice: Binance, Coinbase all do this
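The Alice/Bob numbers above can be checked mechanically. A sketch of "fee from gain" in scaled integer units (BTC at 10^8, USDT at 10^6; the helper name is illustrative):

```rust
/// "Fee from gain": the fee comes out of the received asset,
/// so the paying side's debit is always exact.
/// Returns (net_credit, fee) for a received amount at a 10^6-scaled rate.
fn net_credit(received: u64, rate: u64) -> (u64, u64) {
    let fee = ((received as u128 * rate as u128) / 1_000_000) as u64;
    (received - fee, fee)
}

fn main() {
    // Alice (taker, 0.20%) receives 1 BTC (10^8 units).
    let (alice_btc, alice_fee) = net_credit(100_000_000, 2000);
    assert_eq!(alice_btc, 99_800_000); // 0.998 BTC credited
    assert_eq!(alice_fee, 200_000);    // 0.002 BTC to the exchange
    // Bob (maker, 0.10%) receives 100,000 USDT (10^6 units).
    let (bob_usdt, bob_fee) = net_credit(100_000_000_000, 1000);
    assert_eq!(bob_usdt, 99_900_000_000); // 99,900 USDT credited
    assert_eq!(bob_fee, 100_000_000);     // 100 USDT to the exchange
}
```

Because the fee is subtracted from an amount the user is about to receive, the calculation can never fail for lack of balance, which is exactly why no fee reservation is needed.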
2.4 Why No Lock Reservation Needed
Since fees are deducted from received asset, no fee reservation needed:
┌─────────────────────────────────────────────────────────────────────┐
│ Benefits of Fee from Gain (Received Asset) │
├─────────────────────────────────────────────────────────────────────┤
│ User receives 1 BTC → Deduct 0.002 BTC fee → Net credit 0.998 BTC │
│ │
│ ✅ Never "insufficient balance for fee" │
│ ✅ Pay amount = Actual pay amount (exact) │
│ ✅ No complex reservation/refund logic │
└─────────────────────────────────────────────────────────────────────┘
Compare with deducting from paid asset:
| Approach | Lock Amount | Issue |
|---|---|---|
| From Gain | base_cost | No extra reservation ✅ |
| From Pay | base_cost + max_fee | May be insufficient; needs a reservation ❌ |
Design Decision: Use “fee from gain” mode, simplify lock logic.
- Buy order locks USDT, fee deducted from received BTC
- Sell order locks BTC, fee deducted from received USDT
2.5 Fee Responsibility: UBSCore (First Principles)
Core Question: Who is responsible for fee calculation?
Fee deduction = Balance change = Must be executed by UBSCore
| Question | Answer |
|---|---|
| Who knows trade occurred? | ME |
| Who manages balances? | UBSCore |
| Who can execute deductions? | UBSCore |
| Who is responsible for fees? | UBSCore |
Data Flow:
ME ──▶ Trade{role} ──▶ UBSCore ──▶ BalanceEvent{fee} ──▶ Settlement ──▶ TDengine
│
① Get VIP level (memory)
② Get Symbol fee rate (memory)
③ Calculate fee = received × rate
④ credit(net_amount)
2.6 High Performance Design
Key to efficiency: All config in UBSCore memory
UBSCore Memory Structure (loaded at startup):
├── user_vip_levels: HashMap<UserId, u8>
├── vip_discounts: HashMap<u8, u8> // level → discount%
└── symbol_fees: HashMap<SymbolId, (u64, u64)> // (maker, taker)
Fee calculation = Pure memory operation, O(1)
| Component | Responsibility | Blocking? |
|---|---|---|
| UBSCore | Calculate fee, update balance | ❌ Pure memory |
| BalanceEvent | Pass fee info | ❌ Async channel |
| Settlement | Write to TDengine | ❌ Separate thread |
Why efficient?
- No I/O on critical path
- All data in memory
- Output reuses existing BalanceEvent channel
2.7 Per-User BalanceEvent Design
Core Insight: One Trade produces two users’ balance changes → Two BalanceEvents
Trade ──▶ UBSCore ──┬──▶ BalanceEvent{user: buyer} ──▶ WS + TDengine
│
└──▶ BalanceEvent{user: seller} ──▶ WS + TDengine
Per-User Event Structure:
| Field | Type | Description |
|---|---|---|
trade_id | u64 | Links to original Trade |
user_id | u64 | Who this event belongs to |
debit_asset | u32 | Asset paid |
debit_amount | u64 | Amount paid |
credit_asset | u32 | Asset received |
credit_amount | u64 | Net amount (after fee) |
fee | u64 | Fee charged |
is_maker | bool | Is Maker role |
Example Code (Pseudocode, for reference only):
// ⚠️ Pseudocode - may change during implementation
enum BalanceEvent {
    TradeSettled {
        trade_id: u64,     // Links to original Trade
        user_id: u64,      // Who this event belongs to
        debit_asset: u32,  // Paid
        debit_amount: u64,
        credit_asset: u32, // Received (net)
        credit_amount: u64,
        fee: u64,          // Fee
        is_maker: bool,    // Role
    },
}
Why Per-User Design?
- Single responsibility: One event = One user’s balance change
- Decoupled: User doesn’t need to know counterparty
- WebSocket friendly: Route directly by user_id
- Query friendly: TDengine partitioned by user_id
- Privacy safe: User only sees own data
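The split into two events can be sketched as follows. A minimal Python sketch (helper and field names are illustrative; the authoritative shape is the Rust `BalanceEvent` pseudocode above):

```python
from dataclasses import dataclass

@dataclass
class BalanceEvent:
    trade_id: int
    user_id: int
    debit_asset: int     # asset paid
    debit_amount: int
    credit_asset: int    # asset received
    credit_amount: int   # net amount, after fee
    fee: int
    is_maker: bool

def settle_trade(trade_id: int, buyer_id: int, seller_id: int,
                 base: int, quote: int, qty: int, quote_amt: int,
                 buyer_fee: int, seller_fee: int, buyer_is_maker: bool):
    """One Trade -> two per-user events; each fee comes out of the received asset."""
    return [
        BalanceEvent(trade_id, buyer_id, quote, quote_amt,
                     base, qty - buyer_fee, buyer_fee, buyer_is_maker),
        BalanceEvent(trade_id, seller_id, base, qty,
                     quote, quote_amt - seller_fee, seller_fee, not buyer_is_maker),
    ]
```

Neither event mentions the counterparty, which is exactly what makes per-user WebSocket routing and privacy isolation trivial.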
3. Data Model
3.1 Symbol Base Fee Configuration
-- Symbol base fee (10^6 precision: 1000 = 0.10%)
ALTER TABLE symbols_tb ADD COLUMN base_maker_fee INTEGER NOT NULL DEFAULT 1000;
ALTER TABLE symbols_tb ADD COLUMN base_taker_fee INTEGER NOT NULL DEFAULT 2000;
3.2 User VIP Level
-- User VIP level (0-9, 0=normal user, 9=top tier)
ALTER TABLE users_tb ADD COLUMN vip_level SMALLINT NOT NULL DEFAULT 0;
3.3 Trade Record Enhancement
Existing Trade struct already has:
- fee: u64 - Amount of fee charged (in the received asset's scaled units)
- role: u8 - 0=Maker, 1=Taker
3.4 Fee Record Storage
Fee info is already included in Trade record:
| Storage | Content |
|---|---|
| trades_tb (TDengine) | fee, fee_asset, role fields |
| Trade Event | Real-time push to downstream (WS, Kafka) |
3.5 Event Sourcing: BalanceEventBatch (Full Traceability)
Core Design: One Trade produces a group of BalanceEvents as atomic unit
Trade ──▶ UBSCore ──▶ BalanceEventBatch{trade_id, events: [...]}
│
├── TradeSettled{user: buyer} // Buyer
├── TradeSettled{user: seller} // Seller
├── FeeReceived{account: REVENUE, from: buyer}
└── FeeReceived{account: REVENUE, from: seller}
Example Structure (Pseudocode):
// ⚠️ Pseudocode - may change during implementation
BalanceEventBatch {
    trade_id: u64,
    ts: Timestamp,
    events: [
        TradeSettled{user: buyer_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        TradeSettled{user: seller_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        FeeReceived{account: REVENUE_ID, asset: base_asset, amount: buyer_fee, from_user: buyer_id},
        FeeReceived{account: REVENUE_ID, asset: quote_asset, amount: seller_fee, from_user: seller_id},
    ]
}
Atomic Unit Properties:
| Property | Description |
|---|---|
| Generated together | Same trade_id |
| Persisted together | Single batch write to TDengine |
| Traced together | All events linked by trade_id |
Asset Conservation Verification:
buyer.debit(quote) + buyer.credit(base - fee) = 0 ✓
seller.debit(base) + seller.credit(quote - fee) = 0 ✓
revenue.credit(buyer_fee + seller_fee) = fee_total ✓
Σ changes = 0 (Asset conservation, auditable)
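The conservation check above can be mechanized: sum the signed amounts per asset across the whole batch (debits negative, credits positive, revenue credits included) and require every total to be zero. A small Python sketch with illustrative numbers:

```python
from collections import defaultdict

def verify_conservation(events) -> bool:
    """events: iterable of (asset_id, signed_amount).
    Debits are negative, credits positive; REVENUE fee credits are included.
    Returns True iff every asset's signed sum is exactly zero."""
    totals = defaultdict(int)
    for asset, amount in events:
        totals[asset] += amount
    return all(v == 0 for v in totals.values())

# Buyer pays 100_000 quote for 1_000 base; fees: 2 base (buyer), 200 quote (seller)
batch = [
    (2, -100_000),       # buyer debit quote
    (1, 1_000 - 2),      # buyer credit base, net of fee
    (1, -1_000),         # seller debit base
    (2, 100_000 - 200),  # seller credit quote, net of fee
    (1, 2),              # REVENUE credit buyer fee (base)
    (2, 200),            # REVENUE credit seller fee (quote)
]
assert verify_conservation(batch)
```

Running this invariant over every persisted batch gives a cheap, continuous audit of the ledger.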
TDengine Storage (Event Sourcing):
| Table | Content |
|---|---|
balance_events_tb | All BalanceEvents (TradeSettled + FeeReceived) |
Why Event Sourcing?
- Full traceability: Any fee can be traced to trade_id + user_id
- Asset conservation: Conservation verifiable within event batch
- Aggregation is derived: Balance = SUM(events), computed on demand
4. Implementation Architecture
4.1 Complete Data Flow
┌───────────┐ ┌───────────┐ ┌─────────────────────────────────────────┐
│ ME │───▶│ UBSCore │───▶│ BalanceEventBatch │
│ (Match) │ │ (Fee calc)│ │ ┌─ TradeSettled{buyer} │
└───────────┘ └───────────┘ │ ├─ TradeSettled{seller} │
│ │ ├─ FeeReceived{REVENUE, from:buyer} │
│ │ └─ FeeReceived{REVENUE, from:seller} │
Memory: VIP/Fee rates └───────────────┬─────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Settlement Service │
│ ① Batch write to TDengine │
│ ② WebSocket push (routed by user_id) │
│ ③ Kafka publish (optional) │
└──────────────────────────────────────────────┘
4.2 TDengine Schema Design
balance_events Super Table:
CREATE STABLE balance_events (
ts TIMESTAMP,
event_type TINYINT, -- 1=TradeSettled, 2=FeeReceived, 3=Deposit...
trade_id BIGINT,
debit_asset INT,
debit_amt BIGINT,
credit_asset INT,
credit_amt BIGINT,
fee BIGINT,
fee_asset INT,
is_maker BOOL,
from_user BIGINT -- FeeReceived: source user
) TAGS (
user_id BIGINT, -- User identifier (0=REVENUE)
account_type TINYINT -- 1=Spot, 2=Funding, 3=Futures...
);
-- Subtable per (user, account_type)
CREATE TABLE user_1001_spot USING balance_events TAGS (1001, 1);
CREATE TABLE user_1001_funding USING balance_events TAGS (1001, 2);
CREATE TABLE revenue_spot USING balance_events TAGS (0, 1); -- REVENUE
Design Points:
| Design | Rationale |
|---|---|
| Dual TAGs (user_id, account_type) | Future-proof for Futures, Margin… |
| Partition by user_id | User queries scan only their tables |
| Partition by account_type | Account-specific queries are O(1) |
| Timestamp index | TDengine native optimization |
4.3 Query Patterns
User query fee history:
SELECT ts, trade_id, fee, fee_asset, is_maker
FROM user_1001_events
WHERE event_type = 1 -- TradeSettled
AND ts > NOW() - 30d
ORDER BY ts DESC
LIMIT 100;
Platform fee income stats:
SELECT fee_asset, SUM(credit_amt) as total_fee
FROM revenue_events
WHERE ts > NOW() - 1d
GROUP BY fee_asset;
Trace all events for a trade:
SELECT * FROM balance_events
WHERE trade_id = 12345
ORDER BY ts;
4.4 Consumer Architecture
BalanceEventBatch
│
├──▶ TDengine Writer (batch write, high throughput)
│ └── Route to subtable by (user_id, account_type)
│
├──▶ WebSocket Router (real-time push)
│ └── Route to WS connection by user_id
│
└──▶ Kafka Publisher (optional, downstream subscription)
└── Topic: balance_events
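The fan-out above can be sketched with stub consumers standing in for the real TDengine writer, WebSocket router, and Kafka publisher (all names here are illustrative, not the actual service code):

```python
from collections import defaultdict

class TdWriter:
    """Stub standing in for the TDengine client: one buffered write per batch."""
    def __init__(self):
        self.rows = []
    def write_batch(self, batch):
        self.rows.extend(batch["events"])

class WsRouter:
    """Stub WebSocket router keyed by user_id."""
    def __init__(self):
        self.pushed = defaultdict(list)
    def push(self, user_id, event):
        self.pushed[user_id].append(event)

def dispatch_batch(batch, td_writer, ws_router, kafka_publish=None):
    """Fan one BalanceEventBatch out to the three consumers shown above."""
    td_writer.write_batch(batch)                 # ① batch write, high throughput
    for ev in batch["events"]:
        ws_router.push(ev["user_id"], ev)        # ② route by user_id
    if kafka_publish is not None:
        kafka_publish("balance_events", batch)   # ③ optional downstream topic

batch = {"trade_id": 12345, "events": [
    {"user_id": 1001, "fee": 200_000},
    {"user_id": 1002, "fee": 200_000_000},
]}
td, ws = TdWriter(), WsRouter()
dispatch_batch(batch, td, ws)
```

Because the batch is handed off whole, UBSCore never blocks on any of the three sinks.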
4.5 Performance Considerations
| Optimization | Strategy |
|---|---|
| Batch write | BalanceEventBatch writes at once |
| Partition strategy | Partition by user_id, avoid hotspots |
| Time partition | TDengine auto partitions by time |
| Async processing | UBSCore doesn’t wait after send |
5. API Changes
5.1 Trade Response
{
"trade_id": "12345",
"price": "100000.00",
"qty": "1.00000000",
"fee": "0.00200000", // NEW: Fee amount
"fee_asset": "BTC", // NEW: Fee asset
"role": "TAKER" // NEW: Maker/Taker
}
5.2 WebSocket Trade Update
{
"e": "trade.update",
"data": {
"trade_id": "12345",
"fee": "0.002",
"fee_asset": "BTC",
"is_maker": false
}
}
6. Edge Cases
| Case | Handling |
|---|---|
| Zero-fee symbol | Allow maker_fee = 0 |
| Insufficient for fee | N/A - fee always deducted from received asset |
7. Verification Plan
7.1 Unit Tests
- Fee calculation accuracy (multiple precisions)
- Maker vs Taker role assignment
7.2 Integration Tests
- E2E trade with fee deduction
- Fee ledger reconciliation
7.3 Acceptance Criteria
- Trades deduct correct fees
- Fee ledger matches Σ(trade.fee)
- API returns fee info
- WS pushes fee info
🇨🇳 中文
📦 Code Changes: View Diff
1. Overview
1.1 From Funds Transfer to Trading
In chapter 0x0B we built the internal funds transfer mechanism. This chapter covers trading fees, the exchange's core business model.
1.2 Goal
Implement the Maker/Taker fee model.
1.3 Key Concepts
| Term | Definition |
|---|---|
| Maker | Order that rests on the book waiting to be filled |
| Taker | Order that matches immediately on arrival |
| Fee rate | Percentage of the traded amount |
| bps | Basis point (1 bps = 0.01%) |
1.4 Architecture Overview
┌─────────── Fee Model ──────────┐
│ Final rate = Symbol.base_fee   │
│   × VipDiscount / 100          │
└────────────────────────────────┘
┌─────────── Data Flow ──────────────────────────────────────────────────┐
│ ME ────▶ Trade{role} ────▶ UBSCore ────▶ BalanceEventBatch ────▶ TDengine
│                              │                    │
│                    Memory: VIP/fee rates    ├── buyer event
│                    O(1) fee calculation     ├── seller event
│                                             └── revenue event ×2
└────────────────────────────────────────────────────────────────────────┘
┌─────────── Core Design ──────────────────────┐
│ ✅ Fee from gain → no reservation needed     │
│ ✅ UBSCore computes fees → balance authority │
│ ✅ Per-user events → decoupling & privacy    │
│ ✅ Event sourcing → asset conservation       │
└──────────────────────────────────────────────┘
2. Fee Model Design
2.1 Why Maker/Taker?
| Problem | Solution |
|---|---|
| Insufficient liquidity | Low maker fees encourage resting orders |
| Price discovery | Deeper books mean tighter spreads |
| Fairness | Those who consume liquidity pay more |
2.2 Two-Layer Fee Structure
Final rate = Symbol.base_fee × VipDiscount[vip_level] / 100
Layer 1: Symbol base fee
| Field | Precision | Default | Meaning |
|---|---|---|---|
| base_maker_fee | 10^6 | 1000 | 0.10% |
| base_taker_fee | 10^6 | 2000 | 0.20% |
Layer 2: VIP discount factor
| Field | Type | Meaning |
|---|---|---|
| level | SMALLINT PK | VIP level |
| discount_percent | SMALLINT | Discount percentage |
2.3 Fee Deduction Point
Rule: the fee is deducted from the asset received, not the asset paid.
Alice (Taker, BUY) buys 1 BTC with 100,000 USDT
Before: Alice -100,000 USDT, +1 BTC
After: Alice -100,000 USDT, +0.998 BTC (fee: 0.002 BTC)
2.4 No Fee Reservation Needed
Benefits of deducting the fee from the gain:
- ✅ "Insufficient balance for the fee" can never happen
- ✅ Amount paid = actual amount paid
- ✅ No complex reserve/refund logic
2.5 Fee Responsibility: UBSCore (First Principles)
Fee deduction = balance change = must be executed by UBSCore
| Question | Answer |
|---|---|
| Who manages balances? | UBSCore |
| Who can execute deductions? | UBSCore |
| Who is responsible for fees? | UBSCore |
2.6 High-Performance Design
UBSCore memory structure (loaded at startup):
├── user_vip_levels: HashMap<UserId, u8>
├── vip_discounts: HashMap<u8, u8>
└── symbol_fees: HashMap<SymbolId, (u64, u64)>
Fee calculation = pure in-memory operation, O(1)
2.7 Per-User BalanceEvent
One Trade → two per-user events
Trade ──▶ UBSCore ──┬──▶ BalanceEvent{user: buyer}
                    └──▶ BalanceEvent{user: seller}
3. Data Model
3.1 Symbol Fee Configuration
ALTER TABLE symbols_tb ADD COLUMN base_maker_fee INTEGER NOT NULL DEFAULT 1000;
ALTER TABLE symbols_tb ADD COLUMN base_taker_fee INTEGER NOT NULL DEFAULT 2000;
3.2 User VIP Level
ALTER TABLE users_tb ADD COLUMN vip_level SMALLINT NOT NULL DEFAULT 0;
3.3 Event Sourcing: BalanceEventBatch
One Trade produces a group of BalanceEvents as an atomic unit:
BalanceEventBatch{trade_id}
├── TradeSettled{user: buyer}
├── TradeSettled{user: seller}
├── FeeReceived{REVENUE, from: buyer}
└── FeeReceived{REVENUE, from: seller}
Asset Conservation Verification:
buyer.debit(quote) + buyer.credit(base - fee) = 0 ✓
seller.debit(base) + seller.credit(quote - fee) = 0 ✓
revenue.credit(buyer_fee + seller_fee) = fee_total ✓
Σ changes = 0 (auditable)
4. Implementation Architecture
4.1 TDengine Schema
CREATE STABLE balance_events (
ts TIMESTAMP,
event_type TINYINT,
trade_id BIGINT,
debit_asset INT,
debit_amt BIGINT,
credit_asset INT,
credit_amt BIGINT,
fee BIGINT,
fee_asset INT,
is_maker BOOL
) TAGS (
user_id BIGINT, -- user ID (0=REVENUE)
account_type TINYINT -- 1=Spot, 2=Funding, 3=Futures...
);
4.2 Query Patterns
-- User fee history
SELECT ts, trade_id, fee FROM user_1001_events WHERE event_type = 1;
-- Platform revenue stats
SELECT fee_asset, SUM(credit_amt) FROM revenue_events GROUP BY fee_asset;
4.3 Consumer Architecture
BalanceEventBatch
├──▶ TDengine Writer (batch write)
├──▶ WebSocket Router (push by user_id)
└──▶ Kafka Publisher (optional)
5. API Changes
5.1 Trade Response
{
"trade_id": "12345",
"fee": "0.002",
"fee_asset": "BTC",
"role": "TAKER"
}
5.2 WebSocket Push
{
"e": "trade.update",
"data": {"trade_id": "12345", "fee": "0.002", "is_maker": false}
}
6. Edge Cases
| Case | Handling |
|---|---|
| Zero-fee symbol | Allow maker_fee = 0 |
7. Verification Plan
- Fee calculation accuracy tests
- E2E trade fee deduction
- API/WS return fee info
- Asset conservation audit
0x0D Snapshot & Recovery: Robustness
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Under Construction Core Objective: Implement graceful shutdown and state recovery mechanisms.
1. Overview
- Snapshot: Periodically save the memory state (OrderBook, Balances) to disk.
- Recovery: Restore state from the latest snapshot + replay WAL (Write-Ahead Log) upon restart.
- Graceful Shutdown: Ensure all pending events are processed before stopping.
(Detailed content coming soon)
🇨🇳 中文
📅 Status: 🚧 Under Construction Core Objective: Implement graceful shutdown and state recovery mechanisms.
1. Overview
- Snapshot: Periodically save the in-memory state (OrderBook, Balances) to disk.
- Recovery: On restart, restore from the latest snapshot and replay the WAL (Write-Ahead Log).
- Graceful Shutdown: Ensure all pending events are processed before stopping.
(Detailed content coming soon)
0x0E OpenAPI Integration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Overview
1.1 Why OpenAPI?
Programmatic traders need API documentation to integrate with our exchange. Instead of maintaining separate docs that drift from code, we auto-generate OpenAPI 3.0 spec directly from Rust types.
1.2 Goal
- Serve interactive API docs at /docs (Swagger UI)
- Export openapi.json for SDK generation
- Keep docs in sync with code (single source of truth)
1.3 Key Concepts
| Term | Definition |
|---|---|
| OpenAPI | Industry-standard API specification format (formerly Swagger) |
| utoipa | Rust crate for compile-time OpenAPI generation |
| Swagger UI | Interactive API documentation interface |
| Code-First | Generate spec from code, not YAML files |
1.4 Architecture Overview
┌─────────── OpenAPI Integration Flow ────────────┐
│ │
│ Rust Handlers ──▶ #[utoipa::path] ──▶ OpenAPI │
│ │ │ │
│ │ ▼ │
│ │ Swagger UI │
│ │ (/docs) │
│ │ │ │
│ ▼ ▼ │
│ Type-Safe API ◀─────────────────▶ openapi.json │
│ │ │
│ ▼ │
│ SDK Clients │
│ (Python, TS) │
└─────────────────────────────────────────────────┘
2. Implementation
2.1 Adding Dependencies
Cargo.toml:
[dependencies]
+ utoipa = { version = "5.3", features = ["axum_extras", "chrono", "uuid"] }
+ utoipa-swagger-ui = { version = "8.0", features = ["axum"] }
2.2 Creating OpenAPI Module
Create src/gateway/openapi.rs:
use utoipa::OpenApi;

#[derive(OpenApi)]
#[openapi(
    info(
        title = "Zero X Infinity Exchange API",
        version = "1.0.0",
        description = "High-performance crypto exchange API (1.3M orders/sec)"
    ),
    paths(
        handlers::health_check,
        handlers::get_depth,
        handlers::get_klines,
        // ... all API handlers
    ),
    components(schemas(
        types::ApiResponse<()>,
        types::DepthApiData,
        // ... all response types
    ))
)]
pub struct ApiDoc;
2.3 Annotating Handlers
Add #[utoipa::path] to each handler:
+ #[utoipa::path(
+ get,
+ path = "/api/v1/public/depth",
+ params(
+ ("symbol" = String, Query, description = "Trading pair"),
+ ("limit" = Option<u32>, Query, description = "Depth levels")
+ ),
+ responses(
+ (status = 200, description = "Order book depth", body = ApiResponse<DepthApiData>)
+ ),
+ tag = "Market Data"
+ )]
pub async fn get_depth(
State(state): State<Arc<AppState>>,
Query(params): Query<HashMap<String, String>>,
) -> impl IntoResponse {
// ... existing implementation ...
}
2.4 Adding Schema Derivations
Add ToSchema to response types:
+ use utoipa::ToSchema;
- #[derive(Serialize, Deserialize)]
+ #[derive(Serialize, Deserialize, ToSchema)]
pub struct DepthApiData {
+ #[schema(example = "BTC_USDT")]
pub symbol: String,
+ #[schema(example = json!([["85000.00", "0.5"]]))]
pub bids: Vec<[String; 2]>,
+ #[schema(example = json!([["85001.00", "0.3"]]))]
pub asks: Vec<[String; 2]>,
}
2.5 Integrating Swagger UI
In src/gateway/mod.rs:
+ use utoipa_swagger_ui::SwaggerUi;
+ use crate::gateway::openapi::ApiDoc;
let app = Router::new()
.route("/api/v1/health", get(handlers::health_check))
.nest("/api/v1/public", public_routes)
.nest("/api/v1/private", private_routes)
+ .merge(
+ SwaggerUi::new("/docs")
+ .url("/api-docs/openapi.json", ApiDoc::openapi())
+ )
.with_state(state);
3. API Endpoints
3.1 Public Endpoints (No Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Health check |
| /api/v1/public/depth | GET | Order book depth |
| /api/v1/public/klines | GET | K-line data |
| /api/v1/public/assets | GET | Asset list |
| /api/v1/public/symbols | GET | Trading pairs |
| /api/v1/public/exchange_info | GET | Exchange metadata |
3.2 Private Endpoints (Ed25519 Auth)
| Endpoint | Method | Description |
|---|---|---|
/api/v1/private/order | POST | Create order |
/api/v1/private/cancel | POST | Cancel order |
/api/v1/private/orders | GET | Query orders |
/api/v1/private/trades | GET | Trade history |
/api/v1/private/balances | GET | Balance query |
/api/v1/private/balances/all | GET | All balances |
/api/v1/private/transfer | POST | Internal transfer |
/api/v1/private/transfer/{id} | GET | Transfer status |
4. SDK Generation
4.1 Python SDK
Auto-generated Python client with Ed25519 signing:
from zero_x_infinity_sdk import ZeroXInfinityClient
client = ZeroXInfinityClient(
api_key="your_api_key",
secret_key_bytes=secret_key # Ed25519 private key
)
# Create order
order = client.create_order(
symbol="BTC_USDT",
side="BUY",
price="85000.00",
qty="0.001"
)
4.2 TypeScript SDK
import { ZeroXInfinityClient } from './zero_x_infinity_sdk';
const client = new ZeroXInfinityClient(apiKey, secretKey);
const depth = await client.getDepth('BTC_USDT');
5. Verification
5.1 Access Swagger UI
cargo run --release -- --gateway --port 8080
# Open: http://localhost:8080/docs
5.2 Test Results
| Test Category | Tests | Result |
|---|---|---|
| Unit Tests | 293 | ✅ All pass |
| Public Endpoints | 6 | ✅ All pass |
| Private Endpoints | 9 | ✅ All pass |
| E2E Total | 17 | ✅ All pass |
6. Summary
In this chapter, we added OpenAPI documentation to our trading engine:
| Achievement | Result |
|---|---|
| Swagger UI | Available at /docs |
| OpenAPI Spec | 15 endpoints documented |
| Python SDK | Auto-generated with Ed25519 |
| TypeScript SDK | Type-safe client |
| Zero Breaking Changes | All existing tests pass |
Next Chapter: With resilience (0x0D) and documentation (0x0E) complete, the foundation is solid. The next logical step is 0x11: Deposit & Withdraw, connecting to the blockchain for real crypto funding.
🇨🇳 中文
📦 Code Changes: View Diff
1. Overview
1.1 Why OpenAPI?
Programmatic traders need API documentation. Rather than hand-writing YAML docs that easily drift out of sync with the code, we generate the OpenAPI 3.0 spec directly from the Rust types.
1.2 Goal
- Serve interactive docs at /docs (Swagger UI)
- Export openapi.json for SDK generation
- Keep docs and code in sync (single source of truth)
1.3 Key Concepts
| Term | Definition |
|---|---|
| OpenAPI | Industry-standard API specification format (formerly Swagger) |
| utoipa | Compile-time OpenAPI generation crate for Rust |
| Swagger UI | Interactive API documentation interface |
| Code-First | Generate the spec from code, not from YAML files |
1.4 Architecture Overview
┌─────────── OpenAPI Integration Flow ────────────┐
│                                                 │
│ Rust Handlers ──▶ #[utoipa::path] ──▶ OpenAPI   │
│       │                  │                      │
│       │                  ▼                      │
│       │             Swagger UI (/docs)          │
│       ▼                  ▼                      │
│ Type-Safe API ◀─────▶ openapi.json              │
│                          ▼                      │
│                    SDK Clients (Python, TS)     │
└─────────────────────────────────────────────────┘
2. Implementation
2.1 Adding Dependencies
Cargo.toml:
[dependencies]
+ utoipa = { version = "5.3", features = ["axum_extras", "chrono", "uuid"] }
+ utoipa-swagger-ui = { version = "8.0", features = ["axum"] }
2.2 Creating the OpenAPI Module
Create src/gateway/openapi.rs:
use utoipa::OpenApi;

#[derive(OpenApi)]
#[openapi(
    info(
        title = "Zero X Infinity Exchange API",
        version = "1.0.0",
        description = "High-performance crypto exchange API (1.3M orders/sec)"
    ),
    paths(
        handlers::health_check,
        handlers::get_depth,
        handlers::get_klines,
        // ... all API handlers
    ),
    components(schemas(
        types::ApiResponse<()>,
        types::DepthApiData,
        // ... all response types
    ))
)]
pub struct ApiDoc;
2.3 Annotating Handlers
Add #[utoipa::path] to each handler:
+ #[utoipa::path(
+     get,
+     path = "/api/v1/public/depth",
+     params(
+         ("symbol" = String, Query, description = "Trading pair"),
+         ("limit" = Option<u32>, Query, description = "Depth levels")
+     ),
+     responses(
+         (status = 200, description = "Order book depth", body = ApiResponse<DepthApiData>)
+     ),
+     tag = "Market Data"
+ )]
pub async fn get_depth(
    State(state): State<Arc<AppState>>,
    Query(params): Query<HashMap<String, String>>,
) -> impl IntoResponse {
    // ... existing implementation ...
}
2.4 Adding Schema Derivations
Add ToSchema to response types:
+ use utoipa::ToSchema;
- #[derive(Serialize, Deserialize)]
+ #[derive(Serialize, Deserialize, ToSchema)]
pub struct DepthApiData {
+     #[schema(example = "BTC_USDT")]
    pub symbol: String,
+     #[schema(example = json!([["85000.00", "0.5"]]))]
    pub bids: Vec<[String; 2]>,
+     #[schema(example = json!([["85001.00", "0.3"]]))]
    pub asks: Vec<[String; 2]>,
}
2.5 Integrating Swagger UI
In src/gateway/mod.rs:
+ use utoipa_swagger_ui::SwaggerUi;
+ use crate::gateway::openapi::ApiDoc;
let app = Router::new()
    .route("/api/v1/health", get(handlers::health_check))
    .nest("/api/v1/public", public_routes)
    .nest("/api/v1/private", private_routes)
+   .merge(
+       SwaggerUi::new("/docs")
+           .url("/api-docs/openapi.json", ApiDoc::openapi())
+   )
    .with_state(state);
3. API Endpoints
3.1 Public Endpoints (No Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Health check |
| /api/v1/public/depth | GET | Order book depth |
| /api/v1/public/klines | GET | K-line data |
| /api/v1/public/assets | GET | Asset list |
| /api/v1/public/symbols | GET | Trading pairs |
| /api/v1/public/exchange_info | GET | Exchange metadata |
3.2 Private Endpoints (Ed25519 Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/private/order | POST | Create order |
| /api/v1/private/cancel | POST | Cancel order |
| /api/v1/private/orders | GET | Query orders |
| /api/v1/private/trades | GET | Trade history |
| /api/v1/private/balances | GET | Balance query |
| /api/v1/private/balances/all | GET | All balances |
| /api/v1/private/transfer | POST | Internal transfer |
| /api/v1/private/transfer/{id} | GET | Transfer status |
4. SDK Generation
4.1 Python SDK
Auto-generated Python client (with Ed25519 signing):
from zero_x_infinity_sdk import ZeroXInfinityClient
client = ZeroXInfinityClient(
    api_key="your_api_key",
    secret_key_bytes=secret_key  # Ed25519 private key
)
# Create order
order = client.create_order(
    symbol="BTC_USDT",
    side="BUY",
    price="85000.00",
    qty="0.001"
)
4.2 TypeScript SDK
import { ZeroXInfinityClient } from './zero_x_infinity_sdk';
const client = new ZeroXInfinityClient(apiKey, secretKey);
const depth = await client.getDepth('BTC_USDT');
5. Verification
5.1 Access Swagger UI
cargo run --release -- --gateway --port 8080
# Open: http://localhost:8080/docs
5.2 Test Results
| Test Category | Tests | Result |
|---|---|---|
| Unit Tests | 293 | ✅ All pass |
| Public Endpoints | 6 | ✅ All pass |
| Private Endpoints | 9 | ✅ All pass |
| E2E Total | 17 | ✅ All pass |
6. Summary
In this chapter we added OpenAPI documentation to the trading engine:
| Achievement | Result |
|---|---|
| Swagger UI | Available at /docs |
| OpenAPI Spec | 15 endpoints documented |
| Python SDK | Auto-generated (with Ed25519) |
| TypeScript SDK | Type-safe client |
| Zero Breaking Changes | All existing tests pass |
Next Chapter: With resilience (0x0D) and documentation (0x0E) complete, the foundation is solid. The next logical step is 0x11: Deposit & Withdraw, connecting to the blockchain for real crypto funding.
0x0F Admin Dashboard Architecture
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: ✅ Verified (E2E 4/4 Pass) Branch: 0x0F-admin-dashboard Updated: 2024-12-27
📦 Code Changes: View Code
1. Overview
1.1 Goal
Build an admin dashboard for exchange operations using FastAPI Amis Admin + FastAPI-User-Auth.
1.2 Tech Stack
| Component | Technology |
|---|---|
| Backend | FastAPI + SQLAlchemy |
| Admin UI | FastAPI Amis Admin (Baidu Amis) |
| Auth | FastAPI-User-Auth (Casbin RBAC) |
| Database | PostgreSQL (existing) |
1.3 Design Highlights ✨
Why do these designs matter? The Admin Dashboard is a core operations system for the exchange. Incorrect operations can lead to fund loss or system failures. The following design principles are key lessons we’ve learned in practice:
| Design Principle | Why? |
|---|---|
| 🔒 ID Immutability | asset_id, symbol_id cannot be modified after creation. Historical orders and trade records depend on these IDs—changing them would break data relationships. |
| 🔢 DB-Generated IDs | asset_id, symbol_id use PostgreSQL SERIAL for auto-generation, preventing human input conflicts or errors. |
| 📝 Status as Strings | Users see Active/Disabled instead of 1/0, reducing cognitive load and avoiding misinterpretation. |
| 🚫 Base ≠ Quote | Prevent creation of invalid pairs like BTC_BTC—this is a logic bug, not a UX issue. |
| 🔍 Trace ID Evidence Chain | Fundamental financial compliance requirement. Each operation carries a ULID trace_id, forming a complete audit evidence chain. When issues arise: traceable, provable, reproducible. |
| 📜 Mandatory Audit Log | All operations record before/after states, meeting compliance requirements and supporting incident investigation. |
| 🔄 Gateway Hot-Reload | Config changes take effect within 5 seconds without service restart—critical for emergency delisting scenarios. |
| ⬇️ Default Descending Sort | Lists show newest items first—operators typically focus on recent activity. |
Tutorial Tip: These design principles didn’t emerge from nothing—they come from real operational pitfalls in exchange systems. Readers should carefully understand each “Why”.
1.4 Features
| Module | Functions |
|---|---|
| User Management | KYC review, VIP level, ban/unban |
| Asset Management | Deposit confirm, withdrawal review, freeze |
| Trading Monitor | Real-time orders, trades, anomaly alerts |
| Fee Config | Symbol fee rates, VIP discounts |
| System Monitor | Service health, queue depth, latency |
| Audit Log | All admin operations logged |
2. Architecture
┌─────────────────────────────────────────────────────────┐
│ Admin Dashboard │
├─────────────────────────────────────────────────────────┤
│ FastAPI Amis Admin (UI) │
│ ├── User Management │
│ ├── Asset Management │
│ ├── Trading Monitor │
│ ├── Fee Config │
│ └── System Monitor │
├─────────────────────────────────────────────────────────┤
│ FastAPI-User-Auth (RBAC) │
│ ├── Page Permissions │
│ ├── Action Permissions │
│ ├── Field Permissions │
│ └── Data Permissions │
├─────────────────────────────────────────────────────────┤
│ PostgreSQL (existing) │ TDengine (read-only) │
│ - users_tb │ - trades_tb │
│ - balances_tb │ - balance_events_tb │
│ - symbols_tb │ - klines_tb │
│ - transfers_tb │ │
└─────────────────────────────────────────────────────────┘
3. RBAC Roles
| Role | Permissions |
|---|---|
| Super Admin | All permissions |
| Risk Officer | Withdrawal review, user freeze |
| Operations | User management, VIP config |
| Support | View-only, no modifications |
| Auditor | View audit logs only |
4. Implementation Plan
Phase 1: MVP - Config Management
Scope: Basic login + config CRUD (Asset, Symbol, VIP)
Step 1: Project Setup
mkdir admin && cd admin
python -m venv venv && source venv/bin/activate
pip install fastapi-amis-admin fastapi-user-auth sqlalchemy asyncpg
Step 2: Database Connection
- Connect to existing PostgreSQL (zero_x_infinity database)
- Reuse existing tables: assets_tb, symbols_tb, users_tb
Step 3: Admin CRUD
| Model | Table | Operations |
|---|---|---|
| Asset | assets_tb | List, Create, Update, Enable/Disable |
| Symbol | symbols_tb | List, Create, Update, Trading/Halt |
| VIP Level | vip_levels_tb | List, Create, Update |
| Audit Log | admin_audit_log | List (read-only) |
Symbol Status
| Status | Description |
|---|---|
| trading | Normal trading |
| halt | Suspended (maintenance/emergency) |
Step 4: Admin Auth
- Default super admin account
- Login/Logout UI
Acceptance Criteria
| ID | Criteria | Verify |
|---|---|---|
| AC-01 | Admin can login at http://localhost:$ADMIN_PORT/admin | Browser access (dev:8002, ci:8001) |
| AC-02 | Can create Asset (name, symbol, decimals) | UI + DB |
| AC-03 | Can edit Asset | UI + DB |
| AC-04 | Gateway hot-reload Asset config | No restart needed |
| AC-05 | Can create Symbol (base, quote, fees) | UI + DB |
| AC-06 | Can edit Symbol | UI + DB |
| AC-07 | Gateway hot-reload Symbol config | No restart needed |
| AC-08 | Can create/edit VIP Level | UI + DB |
| AC-09 | Reject invalid input (decimals<0, fee>100%) | Boundary tests |
| AC-10 | VIP default Normal (level=0, 100% fee) | Seed data |
| AC-11 | Asset Enable/Disable | Gateway rejects disabled asset |
| AC-12 | Symbol Halt | Gateway rejects new orders |
| AC-13 | Audit log | All CRUD ops queryable |
Input Validation Rules
| Field | Rule |
|---|---|
| decimals | 0-18, must be integer |
| fee_rate | 0-100%, max 10000 bps |
| symbol | Unique, uppercase + underscore |
| base_asset / quote_asset | Must exist |
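The rules in this table can be enforced by a few plain validators before any request reaches the database. A sketch (function names are illustrative, not the actual admin code):

```python
import re

ASSET_CODE_RE = re.compile(r"[A-Z]+\Z")     # uppercase A-Z only, e.g. BTC
SYMBOL_RE = re.compile(r"[A-Z]+_[A-Z]+\Z")  # uppercase + underscore, e.g. BTC_USDT

def validate_asset(asset: str, decimals: int) -> None:
    if not ASSET_CODE_RE.match(asset):
        raise ValueError(f"asset must be uppercase A-Z only, got {asset!r}")
    if not isinstance(decimals, int) or not (0 <= decimals <= 18):
        raise ValueError(f"decimals must be an integer in 0-18, got {decimals!r}")

def validate_symbol(symbol: str, base_id: int, quote_id: int,
                    maker_bps: int, taker_bps: int, known_assets: set) -> None:
    if not SYMBOL_RE.match(symbol):
        raise ValueError("symbol must be uppercase + underscore, e.g. BTC_USDT")
    if base_id == quote_id:
        raise ValueError("Base and Quote assets must be different")
    if base_id not in known_assets or quote_id not in known_assets:
        raise ValueError("base/quote asset must already exist")
    for fee in (maker_bps, taker_bps):
        if not (0 <= fee <= 10_000):   # 10000 bps = 100%
            raise ValueError(f"fee must be 0-10000 bps, got {fee}")
```

In the real app these checks belong in Pydantic validators so Amis surfaces them as form errors, but keeping them as plain functions makes them trivially unit-testable.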
Future Enhancements (P2)
Chain Asset Management (Layer 2): Implementation of ADR-005
- Chain Config: Manage chains_tb (RPC, confirmations)
- Asset Binding: Manage chain_assets_tb (Contract Address, Decimals)
- Auto-Verify: Verify contracts on-chain before binding
- Asset Migration (P3): Unbind/Rebind for Token Swaps (e.g., LEND -> AAVE)
Dual-Confirmation Workflow:
1. Preview: config change preview
2. Second approval: another admin approves
3. Apply: takes effect after confirmation
For: Symbol delisting, Asset disable, and other irreversible ops
Multisig Withdrawal:
- Admin can only create “withdrawal proposal”, not execute directly
- Flow: Support submits → Finance reviews → Offline sign/MPC executes
- Private keys must NEVER touch admin server
5. Security Requirements (MVP Must-Have)
5.1 Mandatory Audit Logging (Middleware)
Every request must be logged:
# FastAPI middleware (assumes auth has already set request.state.admin_id)
from datetime import datetime
from fastapi import Request

@app.middleware("http")
async def audit_log_middleware(request: Request, call_next):
response = await call_next(request)
await AuditLog.create(
admin_id=request.state.admin_id,
ip=request.client.host,
timestamp=datetime.utcnow(),
action=f"{request.method} {request.url.path}",
old_value=...,
new_value=...,
)
return response
5.2 Decimal Precision (Required)
Prevent JSON float precision loss:
from pydantic import BaseModel, field_serializer
from decimal import Decimal
class FeeRateResponse(BaseModel):
rate: Decimal
@field_serializer('rate')
def serialize_rate(self, rate: Decimal, _info):
return str(rate) # Serialize as String
⚠️ All amounts and rates MUST use Decimal; output MUST be String
Naming Consistency (with existing code)
| Entity | Field | Values |
|---|---|---|
| Asset | status | 0=disabled, 1=active |
| Symbol | status | 0=offline, 1=online, 2=maintenance |
⚠️ Implementation MUST match
migrations/001_init_schema.sql
6. UX Requirements (Post-QA Review)
Based on QA feedback from 160+ test cases. These requirements enhance usability and prevent errors.
6.1 Asset/Symbol Display Enhancement
UX-01: Display Asset names in Symbol creation/edit forms
Base Asset: [BTC (ID: 1) ▼] ← Dropdown with asset code
Quote Asset: [USDT (ID: 2) ▼]
Implementation: Use SQLAlchemy relationship display in FastAPI Amis Admin.
6.2 Fee Display Format
UX-02: Show fees in both percentage and basis points
Maker Fee: 0.10% (10 bps)
Taker Fee: 0.20% (20 bps)
Implementation:
@field_serializer('base_maker_fee')
def serialize_fee(self, fee: int, _info):
    # fee is stored with 10^6 precision (1000 = 0.10% = 10 bps)
    pct = fee / 10_000   # 1000 -> 0.10 (percent)
    bps = fee // 100     # 1000 -> 10 bps
    return f"{pct:.2f}% ({bps} bps)"
6.3 Danger Confirmation Dialog
UX-03: Confirm dialog for critical operations (Symbol Halt, Asset Disable)
┌─────────────────────────────────┐
│ ⚠️ Halt Symbol: BTC_USDT │
├─────────────────────────────────┤
│ • Current orders: 1,234 │
│ • 24h volume: $12M │
│ │
│ This action is reversible │
│ │
│ [Confirm Halt] [Cancel] │
└─────────────────────────────────┘
Note: No “type to confirm” required (action is reversible).
6.4 Immutable Field Indicators
UX-04: Visually mark immutable fields in edit forms
Asset Edit:
┌──────────────────────────┐
│ Asset Code: BTC 🔒 │ ← Locked, disabled
│ Decimals: 8 🔒 │ ← Locked, disabled
│ Name: [Bitcoin ] ✏️ │ ← Editable
│ Status: [Active ▼] ✏️ │ ← Editable
└──────────────────────────┘
Implementation: Use readonly_fields in ModelAdmin.
6.5 Structured Error Messages
UX-05: Provide actionable error responses
{
"field": "asset",
"error": "Invalid format",
"got": "btc!",
"expected": "Uppercase letters A-Z only (e.g., BTC)",
"hint": "Remove special character '!'"
}
🚨 6.6 CRITICAL: Base ≠ Quote Validation
UX-06: Prevent creating symbols with same base and quote
This is a LOGIC BUG, not just UX.
@model_validator(mode='after')
def validate_base_quote_different(self):
if self.base_asset_id == self.quote_asset_id:
raise ValueError("Base and Quote assets must be different")
return self
Test Case: BTC_BTC must be rejected.
6.7 ID Auto-Generation (DB Responsibility)
Requirement: asset_id and symbol_id are auto-generated by database, NOT user input.
Create Asset Form:
┌──────────────────────────┐
│ Asset Code: [BTC ] │ ← User fills
│ Name: [Bitcoin ] │ ← User fills
│ Decimals: [8] │ ← User fills
│ │
│ asset_id: (auto) │ ← DB generates (SERIAL)
└──────────────────────────┘
Create Symbol Form:
┌──────────────────────────┐
│ Symbol: [BTC_USDT ] │ ← User fills
│ Base Asset: [BTC ▼] │ ← User selects
│ Quote Asset: [USDT ▼] │ ← User selects
│ │
│ symbol_id: (auto) │ ← DB generates (SERIAL)
└──────────────────────────┘
Implementation: Use PostgreSQL SERIAL or IDENTITY columns.
-- Already in migrations/001_init_schema.sql
CREATE TABLE assets_tb (
asset_id SERIAL PRIMARY KEY, -- Auto-increment
asset VARCHAR(16) NOT NULL UNIQUE,
...
);
6.8 Status/Flags String Display
Requirement: Display Status and Flags as human-readable strings, not raw numbers.
Asset Status Display:
| DB Value | Display String | Color |
|---|---|---|
| 0 | Disabled | 🔴 Red |
| 1 | Active | 🟢 Green |
Symbol Status Display:
| DB Value | Display String | Color |
|---|---|---|
| 0 | Offline | ⚫ Gray |
| 1 | Online | 🟢 Green |
| 2 | Close-Only | 🟡 Yellow |
Asset Flags Display (bitmask):
Flags: [Deposit ✓] [Withdraw ✓] [Trade ✓] [Internal Transfer ✓]
Instead of: asset_flags: 23
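Rendering the checkbox string from a raw bitmask is a small decode step. A sketch, assuming the bit layout below (illustrative only; the actual flag bits are defined by the schema):

```python
# Assumed bit layout (illustrative; bit 3 is left reserved in this sketch)
FLAG_BITS = [
    (1 << 0, "Deposit"),
    (1 << 1, "Withdraw"),
    (1 << 2, "Trade"),
    (1 << 4, "Internal Transfer"),
]

def render_flags(asset_flags: int) -> str:
    """Decode a raw bitmask like 23 into the checkbox string shown above."""
    return " ".join(
        f"[{name} {'✓' if asset_flags & bit else '✗'}]" for bit, name in FLAG_BITS
    )
```

Under this layout, `render_flags(23)` (23 = 0b10111) shows all four capabilities checked, which is far less error-prone for an operator than reading `asset_flags: 23`.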
Implementation (Final Design):
⚠️ API Design: Status accepts STRING INPUT ONLY. Integer input is rejected.
class AssetStatus(IntEnum):
DISABLED = 0
ACTIVE = 1
class SymbolStatus(IntEnum):
OFFLINE = 0
ONLINE = 1
CLOSE_ONLY = 2
# Pydantic schema validation (string-only input)
@field_validator('status', mode='before')
def validate_status(cls, v):
if not isinstance(v, str):
raise ValueError(f"Status must be a string, got: {type(v).__name__}")
return AssetStatus[v.upper()]
# Output serialization (always string)
@field_serializer('status')
def serialize_status(self, value: int) -> str:
return AssetStatus(value).name # "ACTIVE" or "DISABLED"
Test Count: 177 unit tests (5 for UX-08 specifically)
6.9 Default Descending Sorting (UX-09)
Requirement: All list views must default to descending order (newest items first).
Reason: Admins usually want to see recent activity or newly created entities.
Implementation: Set ordering = [Model.pk.desc()] in ModelAdmin classes.
🔒 6.10 Full Lifecycle Trace ID (UX-10) - CRITICAL
Requirement: Every admin operation MUST carry a unique trace_id (ULID) from entry to exit.
Why: Admin Dashboard is critical infrastructure. Full observability is mandatory for:
- Audit compliance
- Debugging production issues
- Security forensics
- Performance monitoring
Trace Lifecycle:
┌──────────────────────────────────────────────────────────────────┐
│ Request Entry │
│ trace_id: 01HRC5K8F1ABCDEFG... (ULID generated) │
├──────────────────────────────────────────────────────────────────┤
│ [LOG] trace_id=01HRC5K8F1... action=START endpoint=/asset │
│ [LOG] trace_id=01HRC5K8F1... action=VALIDATE input={...} │
│ [LOG] trace_id=01HRC5K8F1... action=DB_QUERY sql=SELECT... │
│ [LOG] trace_id=01HRC5K8F1... action=DB_UPDATE before={} after={}│
│ [LOG] trace_id=01HRC5K8F1... action=AUDIT_LOG written │
│ [LOG] trace_id=01HRC5K8F1... action=END status=200 duration=45ms│
├──────────────────────────────────────────────────────────────────┤
│ Response Exit │
│ X-Trace-ID: 01HRC5K8F1ABCDEFG... (returned in header) │
└──────────────────────────────────────────────────────────────────┘
Implementation:
import ulid
from fastapi import Request
from contextvars import ContextVar
# Context variable for trace_id
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")
@app.middleware("http")
async def trace_middleware(request: Request, call_next):
# Generate ULID for each request
trace_id = str(ulid.new())
trace_id_var.set(trace_id)
# Log entry
logger.info(f"trace_id={trace_id} action=START endpoint={request.url.path}")
response = await call_next(request)
# Log exit
logger.info(f"trace_id={trace_id} action=END status={response.status_code}")
# Return trace_id in response header
response.headers["X-Trace-ID"] = trace_id
return response
# Audit log includes trace_id
class AuditLog(Base):
trace_id = Column(String(26), nullable=False) # ULID is 26 chars
admin_id = Column(BigInteger, nullable=False)
action = Column(String(32), nullable=False)
...
Log Format (structured JSON):
{
"timestamp": "2025-12-27T10:25:00Z",
"trace_id": "01HRC5K8F1ABCDEFGHIJK",
"admin_id": 1001,
"action": "DB_UPDATE",
"entity": "Asset",
"entity_id": 5,
"before": {"status": 1},
"after": {"status": 0},
"duration_ms": 12
}
Verification:
- Every request generates unique ULID trace_id
- All log lines include trace_id
- Audit log table has trace_id column
- Response includes X-Trace-ID header
- Local log files are rotated and retained
7. Testing
Full Testing Guide: 0x0F-admin-testing.md
Quick Start:
./scripts/run_admin_full_suite.sh # Run all tests
Test Summary:
| Category | Count | Status |
|---|---|---|
| Rust unit tests | 5 | ✅ |
| Admin unit tests | 178+ | ✅ |
| Admin E2E tests | 4/4 | ✅ |
| UX-10 Trace ID | 16/16 | ✅ |
Ports: Dev 8002, CI 8001
8. Future Phases
| Phase | Content |
|---|---|
| Phase 2 | User management, balance viewer |
| Phase 3 | TDengine monitoring |
| Phase 4 | Full RBAC, advanced audit |
9. Directory Structure
admin/
├── main.py # FastAPI app entry
├── settings.py # Config
├── models/ # SQLAlchemy models (shared with main app)
├── admin/
│ ├── user.py # User admin
│ ├── asset.py # Asset admin
│ ├── trading.py # Trading admin
│ └── system.py # System admin
├── auth/
│ └── rbac.py # RBAC config
└── requirements.txt
🇨🇳 中文
📅 状态: ✅ 已验证 (E2E 4/4 通过) 分支:
0x0F-admin-dashboard
📦 代码变更: 查看代码
1. 概述
1.1 目标
使用 FastAPI Amis Admin + FastAPI-User-Auth 构建交易所后台管理系统。
1.2 技术栈
| 组件 | 技术 |
|---|---|
| 后端 | FastAPI + SQLAlchemy |
| 管理界面 | FastAPI Amis Admin (百度 Amis) |
| 认证 | FastAPI-User-Auth (Casbin RBAC) |
| 数据库 | PostgreSQL (现有) |
1.3 功能模块
| 模块 | 功能 |
|---|---|
| 用户管理 | KYC 审核、VIP 等级、封禁/解封 |
| 资产管理 | 充值确认、提现审核、资产冻结 |
| 交易监控 | 实时订单/成交、异常报警 |
| 费率配置 | Symbol 费率、VIP 折扣 |
| 系统监控 | 服务健康、队列积压、延迟 |
| 审计日志 | 所有管理操作可追溯 |
2. RBAC 角色
| 角色 | 权限 |
|---|---|
| 超级管理员 | 全部权限 |
| 风控专员 | 提现审核、用户冻结 |
| 运营人员 | 用户管理、VIP 配置 |
| 客服 | 只读,不可修改 |
| 审计员 | 只看审计日志 |
4. 配置与脚本统一 (2024-12-27)
4.1 配置单一源 (Single Source of Truth)
所有环境配置统一从 scripts/lib/db_env.sh 导出:
# 数据库
export PG_HOST, PG_PORT, PG_USER, PG_PASSWORD, PG_DB
export DATABASE_URL, DATABASE_URL_ASYNC
# 服务端口
export GATEWAY_PORT # 8080
export ADMIN_PORT # Dev: 8002, CI: 8001
export ADMIN_URL, GATEWAY_URL
端口约定:
| 环境 | Gateway | Admin |
|---|---|---|
| Dev (本地) | 8080 | 8002 |
| CI | 8080 | 8001 |
| QA | 8080 | 8001 |
4.2 测试脚本命名规范
| 脚本 | 用途 |
|---|---|
| run_admin_full_suite.sh | 统一入口(Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway 传播E2E测试 |
| run_admin_tests_standalone.sh | 一键完整测试(安装deps+启动server) |
命名规范:run_<scope>_<type>.sh
4.3 测试结构
admin/tests/
├── unit/ # pytest 单元测试
├── e2e/ # pytest E2E测试 (需service running)
└── integration/ # 独立脚本 (通过CI运行)
└── test_admin_gateway_e2e.py
运行方式:
# 运行全部
./scripts/run_admin_full_suite.sh
# 快速模式(跳过unit tests)
./scripts/run_admin_full_suite.sh --quick
0x0F Admin Dashboard - Testing Guide
This document contains detailed test cases and scripts for the Admin Dashboard. For architecture overview, see Admin Dashboard.
Test Scripts
One-Click Testing
# Run all tests (Rust + Admin Unit + E2E)
./scripts/run_admin_full_suite.sh
# Quick mode (skip Unit Tests)
./scripts/run_admin_full_suite.sh --quick
# Run only Admin → Gateway propagation E2E
./scripts/run_admin_gateway_e2e.sh
Script Reference
| Script | Purpose |
|---|---|
| run_admin_full_suite.sh | Unified entry (Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway propagation tests |
| run_admin_tests_standalone.sh | One-click full test (install deps + start server) |
Port Configuration
| Environment | Admin Port | Gateway Port |
|---|---|---|
| Dev (local) | 8002 | 8080 |
| CI | 8001 | 8080 |
Test Files
| Script | Function |
|---|---|
| verify_e2e.py | Admin login/logout, health check |
| test_admin_login.py | Authentication tests |
| test_constraints.py | Database constraint validation |
| test_core_flow.py | Asset/Symbol CRUD workflows |
| test_input_validation.py | Invalid input rejection |
| test_security.py | Security and authentication |
| tests/e2e/test_asset_lifecycle.py | Asset enable/disable lifecycle |
| tests/e2e/test_symbol_lifecycle.py | Symbol trading status management |
| tests/e2e/test_fee_update.py | Fee configuration updates |
| tests/e2e/test_audit_log.py | Audit trail verification |
| tests/test_ux10_trace_id.py | UX-10 Trace ID verification |
Running Individual Tests
cd admin && source venv/bin/activate
pytest tests/test_core_flow.py -v
pytest tests/e2e/test_asset_lifecycle.py -v
pytest tests/test_ux10_trace_id.py -v
Test Coverage
Total: 198+ tests
- Rust unit tests: 5 passed
- Admin unit tests: 178+ passed
- Admin E2E tests: 4/4 passed
- UX-10 Trace ID tests: 16/16 passed
UX Requirements Test Matrix
| UX ID | Requirement | Test File |
|---|---|---|
| UX-06 | Base ≠ Quote validation | test_constraints.py |
| UX-07 | ID Auto-Generation | test_id_mapping.py |
| UX-08 | Status String Display | test_ux08_status_strings.py |
| UX-09 | Default Descending Sort | test_core_flow.py |
| UX-10 | Trace ID Evidence Chain | test_ux10_trace_id.py |
Acceptance Criteria
| # | Deliverable | Verification |
|---|---|---|
| 1 | Admin UI accessible | Browser at localhost:$ADMIN_PORT |
| 2 | One-click E2E test | ./scripts/run_admin_full_suite.sh passes |
| 3 | All tests pass | 198+ tests green |
| 4 | Audit log queryable | Admin UI audit page |
| 5 | Gateway hot-reload | Config change without restart |
Standard Operating Procedure (SOP): Token Listing
Role: Operations / Listing Manager System: Admin Dashboard
1. Pre-requisites
Before listing, you need the following information:
| Item | Description | Example | Source |
|---|---|---|---|
| Logic Symbol | The unique ticker on the exchange | UNI | Project Team |
| Asset Name | Full display name | Uniswap | Project Team |
| Chain | The blockchain network | ETH | Project Team |
| Contract Address | The Token’s Smart Contract | 0x1f98... | Etherscan / Project |
| Decimals | Token precision | 18 | Auto-detected |
| Min Deposit | Minimum amount to credit | 0.1 | Ops Decision (Risk) |
| Withdraw Fee | Fee deducted per withdrawal | 5.0 | Ops Decision (Gas Cost) |
2. Workflow Steps
Phase 1: Create Logical Asset (Business Definition)
Define the asset for Trading and User Balances.
1. Navigate: Admin -> Assets -> Create New.
2. Input:
   - Symbol: UNI
   - Name: Uniswap
   - Decimals: 18 (System Internal Precision)
   - Initial Permissions:
     - [x] Can Allow Deposit
     - [ ] Can Allow Withdraw (Recommended: Disable initially)
     - [ ] Can Allow Trade (Recommended: Enable later)
   - Status: Active
3. Click: Save.
   - System Result: assets_tb row created. Asset ID generated (e.g., #10).
Phase 2: Bind Chain Asset (On-Chain Binding)
Tell Sentinel how to find this asset on-chain and set limits.
1. Navigate: Admin -> Assets -> Select UNI (#10) -> Chain Config tab.
2. Click: Add New Binding.
3. Input Configuration (Minimal):
   - Chain: Select ETH (Ethereum).
   - Contract Address: Paste 0x1f98...
   - (Leave other fields empty; the system will fetch them.)
4. Action: Click Auto-Detect from Chain.
   - System Action: queries RPC decimals(), symbol().
   - Result:
     - Decimals: auto-filled 18 (locked, read-only).
     - Symbol: auto-detected UNI (verified against the asset name).
   - Ops Action: verify the fetched data matches. Adjust Min Deposit / Fee only if defaults are unsuitable.
5. Risk Configuration (Review Defaults):
   - Min Deposit: 0.1 (prevent dust attacks).
   - Min Withdraw: 10.0 (must be > Fee).
   - Withdraw Fee: 5.0 (cover Gas + margin).
6. Confirm: check that the detected Decimals match project info.
7. Click: Bind (Saved as Inactive).
   - System Result: chain_assets_tb row created with is_active=false.
Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific.
Phase 3: Validation & Activation
Verify functionality before opening to the public.
Constraint: Sentinel only indexes deposits for chain bindings with is_active=true, so the chain binding must be activated before any deposit test can succeed.
Strategy:
1. Activate the chain binding (Bind & Activate, chain level) so Sentinel can see it.
   - Safety: ensure the Phase 1 logical flags (Deposit/Withdraw) remain UNCHECKED. Sentinel syncs, but users cannot operate.
2. Perform the "User Deposit Test" (see Section 3) with an internal test account.
3. Once verified, enable the logical flags (Phase 4).
Phase 4: Public Launch (Go Live)
1. Navigate: Admin -> Assets -> UNI.
2. Action: Check [x] Can Allow Deposit, [x] Can Allow Withdraw.
3. Click: Save.
   - Result: users can now see their deposit address and transact.
Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific. If you list USDT on both ETH and TRON, you must configure them separately for each chain (e.g., ETH Fee = 5.0, TRON Fee = 1.0).
3. Verification
Verification A: User Deposit (Hot Test)
1. Ask a test user to deposit UNI to their existing ETH address.
   - Note: the user does NOT need to generate a new address.
2. Wait 1-2 minutes (block confirmation).
3. Check Admin -> Deposits: a + UNI record should appear.
Verification B: System Log
1. Check the Sentinel logs: [ETH] New asset watched: UNI (0x1f98...).
4. FAQ
Q: Do users need to regenerate their deposit address?
A: No. Every asset on the ETH chain shares the user's single ETH deposit address. The system distinguishes UNI from USDT automatically by the contract address.
Q: What if the wrong contract address was entered?
A: The Verify On-Chain step will fail (the decimals fetch errors or returns 0). If a wrong address was force-saved, immediately set that binding to Disabled in Admin, then add the correct one.
0x10 Web Frontend Outsourcing Specification
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 📝 RFP / Requirements Spec Goal: Develop a production-grade cryptocurrency exchange frontend.
1. Project Overview
We are looking for a professional development team to build the web frontend for Zero X Infinity, a high-performance cryptocurrency exchange.
Core Requirement: The frontend must be fast, responsive, and visually premium (similar to Binance/Bybit Pro implementations).
Technology Stack: Open Choice (Developer proposes stack).
- Recommended: React, Vue 3, or Svelte.
- Requirement: Must produce static assets manageable by Nginx/Docker.
2. Scope of Work
2.1 Core Pages
| Page | Features | Backend Status |
|---|---|---|
| Home / Landing | Market overview, Tickers, “Start Trading” CTA. | ⚠️ Mock Data (Public API part ready) |
| Authentication | Login, Register, Forgot Password. | ✅ Ready (Phase 0x10.6 Implemented) |
| Trading Interface | (Core) K-Line Chart, OrderBook, Trade History, Order Form. | ✅ Ready (Full API Support) |
| Assets / Wallet | Balance overview, Deposit, Withdrawal, Asset History. | ⚠️ Partial (Read Only ready; Dep/Wdw Pending) |
| User Center | API Key management, Password reset, Activity log. | ✅ Backend Ready (UI Pending) |
2.2 Key Features & Requirements
A. Trading Interface (Critical)
- Layout: 3-column classic layout (Left: Orderbook, Mid: Chart, Right: Trade History/Forms).
- Chart: Integration with TradingView Charting Library (or Lightweight Charts).
- OrderBook: Visual depth representation, clickable price to fill order form.
- Responsiveness: Must work flawlessly on Desktop (1080p+) and Mobile.
B. Technical Constraints
- NO FLOATING POINT MATH: all precision handling must use String or BigInt arithmetic.
  - Backend sends: "123.45670000" (String).
  - Frontend displays: fixed precision per asset config.
- WebSocket Push: Market data is pushed via WebSocket. Frontend must handle reconnection and heartbeat.
- Ed25519 Authentication:
  - API requests require an X-Signature header.
  - Frontend must sign the payload using an Ed25519 private key (stored in memory/session).
  - Note: if a standard password login flow is used, the backend may handle session cookies, but client-side signing is required for high-security actions or "API-Key mode". (Clarification: the MVP will use an opaque Session Token returned by the API, as a standard HTTP-only Cookie or Bearer Token. Ed25519 is for API clients; the Web UI can use the session wrapper.)
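To make the no-float constraint concrete, here is a minimal sketch of float-free parsing and formatting. It is shown in Python for brevity; a JS/TS frontend would use BigInt or a decimal library, and the 8-decimal SCALE is an assumed precision config, not a project constant.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly:
assert 0.1 + 0.2 != 0.3

SCALE = 10 ** 8  # assumed display precision of 8 decimals

def to_units(s: str) -> int:
    """Parse an amount string like "123.45670000" into integer base
    units without ever touching binary floats."""
    return int(Decimal(s) * SCALE)

def to_display(units: int) -> str:
    """Format integer base units back to a fixed-precision string."""
    whole, frac = divmod(units, SCALE)
    return f"{whole}.{frac:08d}"

assert to_units("123.45670000") == 12_345_670_000
assert to_display(to_units("123.45670000")) == "123.45670000"
```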
3. Deliverables
- Source Code: Full git repository history.
- Docker Support: Dockerfile for a multi-stage build (Node build -> Nginx alpine).
- Documentation:
  - README.md: build & run instructions.
  - CONFIG.md: environment variable reference.
- Mock Server: simple mock logic or fixtures for UI testing without the full backend.
4. Resources provided
- API Documentation: Swagger UI / OpenAPI Spec (See Section 6.1)
- WebSocket Protocol: Docs
- UI/UX References: Binance, Kraken Pro.
5. API Inventory (Current Available)
The following APIs are implemented and available for frontend integration.
5.1 Public Market Data
Base URL: /api/v1/public
| Endpoint | Method | Description | Status |
|---|---|---|---|
| /exchange_info | GET | Server time, limits | ✅ Ready |
| /assets | GET | List supported assets | ✅ Ready |
| /symbols | GET | List trading pairs | ✅ Ready |
| /depth | GET | Order book depth | ✅ Ready |
| /klines | GET | OHLCV candles | ✅ Ready |
| /trades | GET | Public trade history | ✅ Ready |
5.2 Private Trading (Requires Signature)
Base URL: /api/v1/private
| Endpoint | Method | Description | Status |
|---|---|---|---|
| /order | POST | Place limit/market order | ✅ Ready |
| /cancel | POST | Cancel order | ✅ Ready |
| /orders | GET | List open/history orders | ✅ Ready |
| /order/{id} | GET | Get single order details | ✅ Ready |
| /trades | GET | User trade history | ✅ Ready |
| /balances | GET | Get specific asset balance | ✅ Ready |
| /balances/all | GET | Get all asset balances | ✅ Ready |
5.3 WebSocket Real-time Stream
Endpoint: ws://host:port/ws
| Channel | Type | Description | Status |
|---|---|---|---|
| order.update | Private | Order status change | ✅ Ready (Authenticated) |
| trade | Private | User trade execution | ✅ Ready (Authenticated) |
| balance.update | Private | Balance change | ✅ Ready (Authenticated) |
| market.depth | Public | Orderbook updates | ✅ Ready |
| market.ticker | Public | 24h Ticker updates | ✅ Ready |
| market.trade | Public | Public trade stream | ✅ Ready |
5.4 Authentication & User
| Feature | Description | Status |
|---|---|---|
| Sign-up/Login | User registration & JWT | ✅ Ready (Implemented) |
| User Profile | KYC, Password reset | ⚠️ Partial (Password Reset Ready) |
| API Keys | Manage API keys | ✅ Ready (Implemented) |
6. Development Resources
6.1 How to Access API Documentation
The backend provides auto-generated OpenAPI 3.0 documentation.
Step 1: Start the Backend (Mock Mode)
# Clone repository
git clone https://github.com/gjwang/zero_x_infinity
cd zero_x_infinity
# Run Gateway (requires Rust installed)
cargo run --release -- --gateway --port 8080
Step 2: Access Documentation
- Interactive Swagger UI: http://localhost:8080/docs
- Raw OpenAPI JSON: http://localhost:8080/api-docs/openapi.json
Step 3: Generate Client SDK
You can use openapi-generator-cli to generate a robust client:
npx @openapitools/openapi-generator-cli generate \
-i http://localhost:8080/api-docs/openapi.json \
-g typescript-axios \
-o ./src/api
🇨🇳 中文
📅 状态: 📝 外包需求文档 (RFP) 目标: 开发一套生产级的加密货币交易所 Web 前端。
1. 项目概览
我们需要一个专业团队为 Zero X Infinity 高性能交易所开发 Web 前端。
核心要求: 界面必须 快速、响应式且具备高级感(对标 Binance/Bybit 专业版体验)。
技术栈: 不限 (由开发方提案)。
- 推荐: React, Vue 3, 或 Svelte。
- 要求: 最终产物必须是静态文件,可由 Nginx/Docker 托管。
2. 工作范围
2.1 核心页面
| 页面 | 功能点 | 后端状态 |
|---|---|---|
| 首页 | 市场概览, 推荐币种, “开始交易”引导 | ⚠️ Mock 数据 (部分公有API就绪) |
| 认证模块 | 登录, 注册, 找回密码 | ✅ 后端就绪 (Phase 0x10.6 已完成) |
| 交易界面 | (核心) K线图, 盘口, 最新成交, 下单面板 | ✅ 完全就绪 (API 齐备) |
| 资产/钱包 | 资产总览, 充值, 提现, 资金流水 | ⚠️ 部分就绪 (仅只读余额; 充提待定) |
| 用户中心 | API Key 管理, 密码修改, 活动日志 | ✅ 后端就绪 (UI 待开发) |
2.2 关键特性与要求
A. 交易界面 (关键)
- 布局: 经典三栏布局 (左: 盘口, 中: K线, 右: 成交/下单)。
- 图表: 集成 TradingView Charting Library (或 Lightweight Charts)。
- 盘口: 带有视觉深度的买卖盘列表,点击价格可填入下单框。
- 响应式: 必须完美适配桌面端 (1080p+) 和移动端浏览器。
B. 技术限制
- 严禁浮点数运算: 所有金额/价格必须使用 String 或 BigInt 处理。
  - 后端下发: "123.45670000" (字符串)。
  - 前端显示: 根据配置的精度进行截断/补零。
- WebSocket 推送: 行情数据通过 WS 推送。前端需处理断线重连和心跳。
- Ed25519 签名 (如需):
- 注: Web 端通常使用 Session Cookie/Token 模式。如涉及客户端签名功能,需支持 Ed25519 算法。
3. 交付物
- 源代码: 完整的 Git 提交记录。
- Docker 支持: Dockerfile (多阶段构建: Node build -> Nginx alpine)。
- 文档:
  - README.md: 构建与运行指南。
  - CONFIG.md: 环境变量说明。
- Mock 服务: 用于 UI 独立开发的 Mock 数据或逻辑。
4. 提供资源
- API 文档: Swagger UI / OpenAPI Spec (见第 6.1 节)
- WebSocket 协议: 文档
- UI/UX 参考: Binance, Kraken Pro.
5. API 清单 (当前可用)
以下 API 已实现并可用于前端集成。
5.1 公开行情数据
基础 URL: /api/v1/public
| 端点 | 方法 | 描述 | 状态 |
|---|---|---|---|
| /exchange_info | GET | 服务器时间, 限制 | ✅ 就绪 |
| /assets | GET | 资产列表 | ✅ 就绪 |
| /symbols | GET | 交易对列表 | ✅ 就绪 |
| /depth | GET | 订单簿深度 | ✅ 就绪 |
| /klines | GET | K线数据 | ✅ 就绪 |
| /trades | GET | 公开成交历史 | ✅ 就绪 |
5.2 私有交易 (需签名)
基础 URL: /api/v1/private
| 端点 | 方法 | 描述 | 状态 |
|---|---|---|---|
| /order | POST | 下单 (限价/市价) | ✅ 就绪 |
| /cancel | POST | 撤单 | ✅ 就绪 |
| /orders | GET | 查询订单 (当前/历史) | ✅ 就绪 |
| /order/{id} | GET | 查询单条订单 | ✅ 就绪 |
| /trades | GET | 用户成交历史 | ✅ 就绪 |
| /balances | GET | 查询特定资产余额 | ✅ 就绪 |
| /balances/all | GET | 查询所有余额 | ✅ 就绪 |
5.3 WebSocket 实时流
端点: ws://host:port/ws
| 频道 | 类型 | 描述 | 状态 |
|---|---|---|---|
| order.update | 私有 | 订单状态变更 | ✅ 就绪 (需鉴权) |
| trade | 私有 | 用户成交通知 | ✅ 就绪 (需鉴权) |
| balance.update | 私有 | 余额变更 | ✅ 就绪 (需鉴权) |
| market.depth | 公开 | 盘口深度更新 | ✅ 就绪 |
| market.ticker | 公开 | 24h Ticker更新 | ✅ 就绪 |
| market.trade | 公开 | 公开成交流 | ✅ 就绪 |
5.4 认证与用户
| 功能 | 描述 | 状态 |
|---|---|---|
| 注册/登录 | 用户注册 & JWT | ✅ 就绪 (已实现) |
| 用户资料 | KYC, 密码重置 | ⚠️ 部分就绪 (支持改密) |
| API Key | 管理 API Key | ✅ 就绪 (已实现) |
6. 开发资源
6.1 如何获取 API 文档
后端提供自动生成的 OpenAPI 3.0 文档。
步骤 1: 启动后端 (Mock 模式)
# 克隆仓库
git clone https://github.com/gjwang/zero_x_infinity
cd zero_x_infinity
# 运行网关 (需要安装 Rust)
cargo run --release -- --gateway --port 8080
步骤 2: 访问文档
- 交互式 Swagger UI: http://localhost:8080/docs
- 原始 OpenAPI JSON: http://localhost:8080/api-docs/openapi.json
步骤 3: 生成客户端 SDK
你可以使用 openapi-generator-cli 生成健壮的客户端代码:
npx @openapitools/openapi-generator-cli generate \
-i http://localhost:8080/api-docs/openapi.json \
-g typescript-axios \
-o ./src/api
0x11 Deposit & Withdraw (Mock Chain)
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement the Funding Layer (Deposit & Withdraw) using a Mock Chain Architecture to validate asset flows without external blockchain dependencies.
1. Background & Architecture
We have a high-performance Matching Engine (Phase I) and a Product Layer (Accounts/Auth, Phase II). Now we add the Funding Layer to allow assets to enter and leave the system.
1.1 The “Mock Chain” Strategy
Instead of syncing 500GB of Bitcoin data, we implement a Simulator for Phase 0x11.
- Goal: Validate internal logic (Balance Credit, Risk Check, Idempotency).
- Method:
MockBtcChainandMockEvmChaintraits that simulate RPC calls.
graph LR
User[User] -->|API Request| Gateway
Gateway -->|Risk Check| FundingService
FundingService -->|Command| ME[Matching Engine]
FundingService -.->|Simulated RPC| MockChain[Mock Chain Adapter]
MockChain -.->|Callback| FundingService
1.2 Phase Plan
| Chapter | Topic | Status |
|---|---|---|
| 0x11 | Deposit & Withdraw (Mock) | ✅ Completed |
| 0x11-a | Real Chain Integration | 🚧 Under Construction |
2. Core Implementation
2.1 Funding Service (src/funding/service.rs)
The central orchestrator for all funding operations.
- Deposit: Receives “Mock Event”, checks idempotency, credits user balance via matching engine.
- Withdraw: Authenticates user, locks funds in engine, simulates broadcast, updates DB.
2.2 Chain Adapter Trait (src/funding/chain_adapter.rs)
We abstract blockchain specifics behind a trait:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ChainClient: Send + Sync {
async fn generate_address(&self, user_id: i64) -> Result<String, ChainError>;
async fn broadcast_withdraw(&self, to: &str, amount: &str) -> Result<String, ChainError>;
// ... validation methods
}
}
2.3 Database Schema (Migration)
Key tables added in migrations/010_deposit_withdraw.sql:
- deposit_history: tracks incoming transactions (key: tx_hash).
- withdraw_history: tracks outgoing requests (key: request_id).
- user_addresses: maps User <-> Asset <-> Address.
3. Data Flow
3.1 Deposit Flow (Mock)
1. Trigger: POST /internal/mock/deposit { user_id, asset, amount }
2. Idempotency: check whether tx_hash already exists in deposit_history.
3. Engine Execution: send OrderAction::Deposit to the Matching Engine.
4. Result: user balance increases.
#![allow(unused)]
fn main() {
// src/funding/deposit.rs
pub async fn process_deposit(...) {
if db.exists(tx_hash).await? { return Ok(()); }
// Command Engine
engine.execute(Deposit(user_id, asset, amount)).await?;
// Persist
db.insert_deposit(..., "SUCCESS").await?;
}
}
3.2 Withdraw Flow
1. Request: POST /api/v1/private/withdraw/apply
2. Risk Check: 2FA (future), whitelist, balance check.
3. Engine Lock: send OrderAction::WithdrawLock (instant deduction).
4. Broadcast: call mock_chain.broadcast().
5. Finalize: update withdraw_history with tx_hash.
4. Verification
We verified this phase using a comprehensive E2E script.
4.1 Verification Script
Run the master script to verify the full lifecycle:
./scripts/verify_funding_trading_flow.sh
Scenario Covered:
- Register User A & B.
- Deposit BTC to User A (Mock).
- Transfer internal funds.
- Trade (Buy/Sell) to change balances.
- Withdraw USDT from User B.
- Audit: Check DB consistency.
4.2 Security Validation
- Address Validation: strict regex for 0x... (ETH) and 1/3/bc1... (BTC) addresses.
- Internal Auth: mock endpoints protected by X-Internal-Secret.
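The address checks can be sketched as follows. The patterns here are illustrative simplifications (the production validators may be stricter, e.g. with checksum verification):

```python
import re

# Illustrative patterns, not the production validators.
ETH_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")
BTC_RE = re.compile(
    r"^(1|3)[1-9A-HJ-NP-Za-km-z]{25,34}$"   # Base58 legacy / P2SH
    r"|^bc1[02-9ac-hj-np-z]{11,71}$"        # Bech32 (SegWit)
)

def is_valid_address(chain: str, addr: str) -> bool:
    """Return True if the address matches the chain's format."""
    if chain == "ETH":
        return ETH_RE.fullmatch(addr) is not None
    if chain == "BTC":
        return BTC_RE.fullmatch(addr) is not None
    return False

assert is_valid_address("ETH", "0x" + "ab" * 20)
assert not is_valid_address("ETH", "0x123")
```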
Warning
SECURITY ADVISORY: The /internal/mock/deposit endpoint is a major security risk, as it allows direct balance manipulation. It is currently protected by a secret but MUST be removed entirely once the Phase 0x11-a Sentinel (blockchain scanner) is fully integrated and stable.
Summary
Phase 0x11 establishes the “Financial Highways” of the exchange. By using a Mock Chain, we isolated the complex internal logic (Accounting, Risk, Idempotency) from the external chaos of real blockchains.
Key Achievement:
A complete, idempotent Asset Inflow/Outflow system that is “Blockchain Agnostic”.
Next Step:
Phase 0x11-a: Replace the “Mock Adapter” with a “Real Node Sentinel” (Bitcoin Core / Anvil).
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目标:实现 资金层 (Funding Layer) (充值与提现),使用 模拟链架构 (Mock Chain) 来验证资金流转,而不依赖外部区块链环境。
1. 背景与架构
我们已经拥有了高性能的 撮合引擎 (Phase I) 和 产品层 (账户/鉴权, Phase II)。 现在我们需要添加 资金层,允许资产进入和离开系统。
1.1 “Mock Chain” 策略
在 Phase 0x11 中,我们实现一个 模拟器,而不是直接同步 500GB 的比特币数据。
- 目标: 验证内部逻辑 (余额入账、风控检查、幂等性)。
- 方法: MockBtcChain 和 MockEvmChain trait,模拟 RPC 调用。
graph LR
User[用户] -->|API 请求| Gateway
Gateway -->|风控检查| FundingService
FundingService -->|指令| ME[撮合引擎]
FundingService -.->|模拟 RPC| MockChain[Mock Chain 适配器]
MockChain -.->|回调| FundingService
1.2 阶段规划
| 章节 | 主题 | 状态 |
|---|---|---|
| 0x11 | 充值与提现 (Mock) | ✅ 已完成 |
| 0x11-a | 真实链集成 | 🚧 建设中 |
2. 核心实现
2.1 资金服务 (src/funding/service.rs)
资金操作的核心协调器。
- 充值 (Deposit): 接收 “模拟事件”,检查幂等性,通过撮合引擎增加用户余额。
- 提现 (Withdraw): 验证用户,锁定引擎中的资金,模拟广播,更新数据库。
2.2 链适配器接口 (src/funding/chain_adapter.rs)
我们将区块链细节抽象在 Trait 之后:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ChainClient: Send + Sync {
async fn generate_address(&self, user_id: i64) -> Result<String, ChainError>;
async fn broadcast_withdraw(&self, to: &str, amount: &str) -> Result<String, ChainError>;
// ... 验证方法
}
}
2.3 数据库 Schema (Migration)
migrations/010_deposit_withdraw.sql 新增的关键表:
- deposit_history: 追踪入金 (Key: tx_hash)。
- withdraw_history: 追踪出金 (Key: request_id)。
- user_addresses: 映射 User <-> Asset <-> Address。
3. 数据流
3.1 充值流程 (Mock)
1. 触发: POST /internal/mock/deposit { user_id, asset, amount }
2. 幂等性: 检查 deposit_history 中是否已存在 tx_hash。
3. 引擎执行: 发送 OrderAction::Deposit 给撮合引擎。
4. 结果: 用户余额增加。
#![allow(unused)]
fn main() {
// src/funding/deposit.rs
pub async fn process_deposit(...) {
if db.exists(tx_hash).await? { return Ok(()); }
// Command Engine
engine.execute(Deposit(user_id, asset, amount)).await?;
// Persist
db.insert_deposit(..., "SUCCESS").await?;
}
}
3.2 提现流程
1. 请求: POST /api/v1/private/withdraw/apply
2. 风控: 2FA (规划中), 白名单, 余额检查。
3. 引擎锁定: 发送 OrderAction::WithdrawLock (瞬间扣除)。
4. 广播: 调用 mock_chain.broadcast()。
5. 终结: 更新 withdraw_history 填充 tx_hash。
4. 验证与测试
我们使用全链路 E2E 脚本验证了本阶段功能。
4.1 验证脚本
运行主脚本以验证完整生命周期:
./scripts/verify_funding_trading_flow.sh
覆盖场景:
- 注册 用户 A & B。
- 充值 BTC 给用户 A (模拟)。
- 划转 资金 (Internal Transfer)。
- 交易 (买/卖) 改变余额。
- 提现 USDT (用户 B)。
- 审计: 检查数据库一致性。
4.2 安全性验证
- 地址验证: 针对 0x... (ETH) 和 1/3/bc1... (BTC) 的严格正则校验。
- 内部鉴权: Mock 端点受 X-Internal-Secret 保护。
Caution
安全警告: /internal/mock/deposit 接口存在重大安全隐患,因为它允许直接修改用户余额。虽然目前增加了 Secret 校验,但在 Phase 0x11-a Sentinel(区块链扫描器)完全集成并稳定后,必须彻底移除此接口。
总结
Phase 0x11 建立了交易所的 “资金高速公路”。 通过使用 Mock Chain,我们将复杂的内部逻辑(会计、风控、幂等性)与外部区块链的混乱隔离开来。
关键成就:
一套完整的、幂等的资产流入/流出系统,且做到 “Blockchain Agnostic” (与具体链解耦)。
下一步:
Phase 0x11-a: 将 “Mock Adapter” 替换为 “Real Node Sentinel” (Bitcoin Core / Anvil)。
0x11-a Real Chain Integration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ IMPLEMENTED / QA VERIFIED (Phase 0x11-a Complete) |
|---|---|
| Date | 2025-12-29 |
| Context | Phase 0x11 Extension: From Mock to Reality |
| Goal | Integrate real Blockchain Nodes (Regtest/Testnet) and handle distributed system failures (Re-orgs, Network Partition). |
1. Core Architecture Change: Pull vs Push
The “Mock” phase (0x11) relied on a Push Model (API Call -> Deposit). Real Chain Integration (0x11-a) requires a Pull Model (Sentinel -> DB).
1.1 The Sentinel (New Service)
A dedicated, independent service loop responsible for “watching” the blockchain.
- Block Scanning: polls getblockchaininfo / eth_blockNumber.
- Filter: indexes user_addresses in memory; scans every transaction in new blocks against this filter.
- State Tracking: updates confirmation counts for existing CONFIRMING deposits.
2. Critical Challenge: Re-org (Chain Reorganization)
In a real blockchain, the “latest” block is not final. It can be orphaned.
2.1 Confirmation State Machine
We must expand the Deposit Status flow to handle volatility.
| Status | Confirmations | Action | UI Display |
|---|---|---|---|
| DETECTED | 0 | Log Tx. Do NOT credit balance. | “Confirming (0/X)” |
| CONFIRMING | 1 to (X-1) | Update confirmation count. Check for Re-org (BlockHash mismatch). | “Confirming (N/X)” |
| FINALIZED | >= X | Action: Push OrderAction::Deposit to Pipeline. | “Success” |
Important
X represents the REQUIRED_CONFIRMATIONS parameter. Hardcoding is forbidden; it must be configured per chain.
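The state machine above reduces to a small pure function; `required` stands in for the per-chain REQUIRED_CONFIRMATIONS config value:

```python
def deposit_status(confirmations: int, required: int) -> str:
    """Map a confirmation count to the deposit state machine:
    DETECTED (0) -> CONFIRMING (1..X-1) -> FINALIZED (>= X)."""
    if confirmations <= 0:
        return "DETECTED"      # logged, balance NOT credited
    if confirmations < required:
        return "CONFIRMING"    # still subject to re-org
    return "FINALIZED"         # safe to push OrderAction::Deposit

assert deposit_status(0, 6) == "DETECTED"
assert deposit_status(3, 6) == "CONFIRMING"
assert deposit_status(6, 6) == "FINALIZED"
```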
2.2 Re-org Detection Logic
1. Sentinel remembers Block(Height H) = Hash A.
2. Sentinel scans Height H again later.
3. If Hash != A, a re-org happened.
4. Action: roll back the scan cursor, re-evaluate all affected deposits.
3. Supported Chains (Phase I)
3.1 Bitcoin (The UTXO Archetype)
- Node: bitcoind (Regtest mode).
- Key Challenge: UTXO management. A deposit is not a "balance update"; it is a new Unspent Output.
- Docker: ruimarinho/bitcoin-core:24
3.2 Ethereum (The Account/EVM Archetype) - 🚧 PENDING
- Status: Design Complete, Implementation Pending (Phase 0x11-b).
- Node:
anvil(from Foundry-rs). - Key Challenge: Event Log Parsing. ERC20 deposits are
Transferevents in receipt logs. - Docker:
ghcr.io/foundry-rs/foundry:latest
4. Sentinel Architecture (Detailed)
4.1 BtcSentinel (Implemented)
- getblockhash -> getblock (verbosity 2).
- Iterate outputs (vout): match scriptPubKey against user_addresses.
- Re-org Check: keep a rolling window; if previousblockhash mismatches, trigger a rollback.
4.2 EthSentinel (Planned for 0x11-b)
- eth_getLogs (Topic0 = Transfer).
- Re-org Check: verify the blockHash of confirmed logs.
5. Reconciliation & Safety (The Financial Firewall)
5.1 The “Truncation Protocol”
- Ingress Logic: Deposit_Credited = Truncate(Deposit_Raw, Configured_Precision)
- Residue: the remainder stays in the wallet as "System Dust".
5.2 The Triangular Reconciliation
We verify solvency using three independent data sources:
| Source | Alias | Data Point |
|---|---|---|
| Blockchain RPC | Proof of Assets (PoA) | getbalance() or sum of UTXOs |
| Internal Ledger | Proof of Liabilities (PoL) | SUM(user.available + user.frozen) |
| Transaction History | Proof of Flow (PoF) | SUM(deposits) - SUM(withdrawals) - SUM(fees) |
The Equation: PoA == PoL + SystemProfit
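In integer base units, the solvency check is a single comparison; the amounts below are illustrative, not real reconciliation data:

```python
def reconcile(poa: int, pol: int, system_profit: int) -> bool:
    """Solvency holds when on-chain assets (PoA) equal user
    liabilities (PoL) plus accumulated system profit (fees, dust),
    all expressed in integer base units."""
    return poa == pol + system_profit

# Illustrative figures only.
assert reconcile(poa=1_000_000, pol=950_000, system_profit=50_000)
```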
5.3 Re-org Recovery Protocol
- Shallow Re-org: Sentinel rolls back cursor.
- Deep Re-org (> Max Depth): Manual intervention (Freeze + Clawback).
6. Database Schema Extensions
CREATE TABLE chain_cursor (
chain_id VARCHAR(16) PRIMARY KEY,
last_scanned_height BIGINT NOT NULL,
last_scanned_hash VARCHAR(128) NOT NULL,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
ALTER TABLE deposit_history
ADD COLUMN chain_id VARCHAR(16),
ADD COLUMN block_height BIGINT,
ADD COLUMN block_hash VARCHAR(128),
ADD COLUMN tx_index INT,
ADD COLUMN confirmations INT DEFAULT 0;
7. Configuration: No Hardcoding
All chain-specific parameters (confirmations, reorg depth, dust threshold) must be loaded from YAML.
8. Security: HD Wallet Architecture
8.1 Key Storage
- Cold Storage: Private Key (Mnemonic) offline.
- Hot Server: XPUB only.
8.2 Address Derivation
- BTC: BIP84 (m/84'/0'/0'/0/{index})
- ETH: BIP44 (m/44'/60'/0'/0/{index})
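The two derivation templates can be captured in one helper. This is a sketch of the path construction only, not the actual wallet code (real derivation happens offline from the mnemonic; the hot server holds only the XPUB):

```python
def derivation_path(chain: str, index: int) -> str:
    """Build the per-chain BIP32 derivation path for a user's
    deposit address at the given index."""
    if chain == "BTC":
        return f"m/84'/0'/0'/0/{index}"   # BIP84 (native SegWit)
    if chain == "ETH":
        return f"m/44'/60'/0'/0/{index}"  # BIP44 (coin type 60)
    raise ValueError(f"unsupported chain: {chain}")

assert derivation_path("BTC", 5) == "m/84'/0'/0'/0/5"
```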
8.3 The “Gap Limit” Solution
- Solution: Full Index Scanning. Sentinel loads ALL active allocated addresses from the user_addresses table into a HashSet (memory) or Bloom Filter (future optimization).
- Scanning: scan every block transaction output against this set, ignoring standard gap limits.
9. Future Work (Out of Scope for 0x11-a)
- Bloom Filters: For million-user address matching (Phase 0x12).
- Automated Clawback: For deep re-orgs.
- Multi-Source Validation: Anti-RPC-spoofing.
10. Summary
Phase 0x11-a transitions the Funding System to production-ready blockchain integration.
11. Implementation Status (2025-12-29)
11.1 Completed Features
- Core Funding: DepositService and WithdrawService fully implemented with Integer-Only Persistence (BigInt/i64).
- Sentinel (BTC): basic BtcScanner implemented (polling getblock, HashSet address matching).
- API Layer: deposit/withdraw history APIs fixed (QA-01) and internal auth secured (QA-03).
- Address Validation: strict regex for BTC/ETH addresses (DEF-001).
11.2 Verification & Testing Guide
Run the verified QA suite covering Core, Chaos, and Security scenarios:
bash scripts/run_0x11a_verification.sh
Results:
- Agent B (Core): Address Persistence, Deposit/Withdraw Lifecycle ✅
- Agent A (Chaos): Idempotency, Race Condition Resilience ✅
- Agent C (Security): Address Isolation, Internal Auth ✅
11.3 Known Limitations (Deferred to 0x11-b)
- ETH / ERC20 Support: Real chain integration for Ethereum is pending. `EthScanner` is currently a stub.
- DEF-002 (Sentinel SegWit): The current `bitcoincore-rpc` integration has issues parsing P2WPKH addresses in `regtest`. Sentinel runs but may miss specific SegWit deposits.
- Bloom Filters: Currently using `HashSet` for address matching. Bloom Filters deferred to Phase 0x12 optimizations.
🇨🇳 中文
| 状态 | ✅ 已实施 / QA 验证通过 (Phase 0x11-a 完成) |
|---|---|
| 日期 | 2025-12-29 |
| 上下文 | Phase 0x11 扩展: 从模拟到现实 |
| 目标 | 集成真实区块链节点 (Regtest/Testnet) 并处理分布式系统容错 (链重组、网络分区)。 |
1. 核心架构升级:推 (Push) vs 拉 (Pull)
模拟阶段 (0x11) 依赖 推模式 (API 调用 -> 触发充值)。 真实链集成 (0x11-a) 必须采用 拉模式 (哨兵 -> 被动轮询数据库)。
1.1 哨兵服务 (Sentinel - 新增组件)
一个独立运行的守护进程,负责持续“注视”区块链。
- 区块扫描 (Block Scanning): 轮询 `getblockchaininfo` (BTC) 或 `eth_blockNumber` (ETH)。
- 过滤器 (Filter): 在内存中索引所有 `user_addresses` (HashSet),扫描新块交易时进行快速匹配。
- 状态追踪 (State Tracking): 持续跟进 `CONFIRMING` 状态存款的确认数变化。
2. 核心挑战:链重组 (Chain Re-org)
真实区块链中,“最新” 区块并非最终态。它随时可能被孤立 (Orphaned)。
2.1 确认数状态机 (Confirmation State Machine)
必须扩展存款状态流以处理链的不确定性。
| 状态 | 确认数 | 动作 | UI 显示 |
|---|---|---|---|
| DETECTED (已检测) | 0 | 记录交易,但 绝对不 增加用户余额。 | “确认中 (0/X)” |
| CONFIRMING (确认中) | 1 ~ (X-1) | 更新确认数。检查父哈希以防重组。 | “确认中 (N/X)” |
| FINALIZED (已完成) | >= X | 动作: 向撮合引擎提交 OrderAction::Deposit。 | “成功” |
Important
X 代表 `REQUIRED_CONFIRMATIONS` (所需确认数) 参数。禁止硬编码,必须按链配置。
2.2 重组检测逻辑
- 哨兵记录 `Block(Height H) = Hash A`。
- 哨兵稍后再次扫描 `Height H`。
- 如果 `Hash != A`,说明发生了重组。
- 动作: 回滚扫描游标 (Cursor),重新评估所有受影响的存款。
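上述检测步骤可以用一个极简的内存模型示意(`Cursor` 结构与字段均为假设,并非项目真实实现;真实哨兵维护的是持久化的滚动窗口):

```rust
/// 重组检测示意:哨兵记录每个已扫描高度的区块哈希,
/// 重扫同一高度时若哈希不一致,则回滚游标到前一高度。
struct Cursor {
    scanned: Vec<(u64, String)>, // (height, block_hash) 滚动窗口
}

impl Cursor {
    /// 返回重组后应回滚到的高度;None 表示该高度哈希未变化(无重组)。
    fn check(&self, height: u64, hash_now: &str) -> Option<u64> {
        self.scanned
            .iter()
            .find(|(h, _)| *h == height)
            .and_then(|(h, old)| if old != hash_now { Some(h - 1) } else { None })
    }
}

fn main() {
    let c = Cursor { scanned: vec![(100, "hashA".into()), (101, "hashB".into())] };
    assert_eq!(c.check(101, "hashB"), None);      // 哈希一致,无重组
    assert_eq!(c.check(101, "hashC"), Some(100)); // 哈希变化 → 回滚到 100
}
```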
3. 支持的链 (第一阶段)
3.1 Bitcoin (UTXO 原型)
- 节点: `bitcoind` (Regtest 模式)。
- 挑战: UTXO 管理。比特币存款是新的未花费输出 (UTXO),而非简单的余额变动。
- Docker: `ruimarinho/bitcoin-core:24`
3.2 Ethereum (账户/EVM 原型) - 🚧 待实现
- 状态: 设计完成,等待实现 (Phase 0x11-b)。
- 节点: `anvil` (Foundry-rs)。
- 挑战: Event Log 解析。ERC20 存款体现为 Receipt Log 中的 `Transfer` 事件。
- Docker: `ghcr.io/foundry-rs/foundry:latest`
4. 哨兵架构详解
4.1 BtcSentinel (已实现 - 比特币哨兵)
- `getblockhash` -> `getblock` (Verbosity 2,获取完整交易细节)。
- 遍历输出 `vout`: 将 `scriptPubKey` 与 `user_addresses` 匹配。
- 重组检查: 维护一个滚动窗口。如果 `previousblockhash` 不匹配,触发回滚 (Rollback)。
4.2 EthSentinel (计划中 - 0x11-b)
- `eth_getLogs` (Topic0 = Transfer 事件签名)。
- 重组检查: 检查已确认日志的 `blockHash` 是否变更。
5. 对账与安全 (金融防火墙)
5.1 “截断协议” (The Truncation Protocol)
解决链上浮点数/大整数与系统精度不匹配的问题:
- 入金逻辑: `入账金额 = Truncate(链上原始金额, 系统配置精度)`。
- 系统粉尘 (System Dust): 截断后的余数留在热钱包中,归系统所有,不归属用户。
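“截断协议”本质上是对最小单位金额做整数向下取整。以下为示意实现(函数与参数名为假设,精度值仅作演示):

```rust
/// 将链上原始最小单位金额截断到系统精度。
/// chain_decimals: 链上精度;system_decimals: 系统配置精度。
/// 返回 (入账金额, 系统粉尘),均以链上最小单位表示。
fn truncate_amount(raw: u128, chain_decimals: u32, system_decimals: u32) -> (u128, u128) {
    assert!(system_decimals <= chain_decimals);
    let factor = 10u128.pow(chain_decimals - system_decimals);
    let credited = raw / factor * factor; // 入账金额(向下截断)
    let dust = raw - credited;            // 截断余数 = 系统粉尘
    (credited, dust)
}

fn main() {
    // 链上 18 位精度收到 1.23456789... ETH,系统只记 8 位
    let raw: u128 = 1_234_567_891_234_567_890;
    let (credited, dust) = truncate_amount(raw, 18, 8);
    assert_eq!(credited, 1_234_567_890_000_000_000);
    assert_eq!(dust, 1_234_567_890);
}
```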
5.2 三角对账策略 (Triangular Reconciliation)
使用三个独立数据源验证系统偿付能力:
| 来源 | 别名 | 数据点 |
|---|---|---|
| 区块链 RPC | 资产证明 (PoA) | getbalance() 或 UTXO 总和 |
| 内部账本 | 负债证明 (PoL) | SUM(user.available + user.frozen) |
| 流水历史 | 流水证明 (PoF) | SUM(充值) - SUM(提现) - SUM(手续费) |
核心对账公式: PoA == PoL + 系统利润
5.3 重组恢复协议
- 浅层重组: 哨兵自动回滚游标。
- 深层重组 (> 最大深度): 触发熔断,需人工介入 (冻结提现 + 资金冲正)。
6. 数据库模式扩展
CREATE TABLE chain_cursor (
chain_id VARCHAR(16) PRIMARY KEY, -- 'BTC', 'ETH'
last_scanned_height BIGINT NOT NULL,
last_scanned_hash VARCHAR(128) NOT NULL,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
ALTER TABLE deposit_history
ADD COLUMN chain_id VARCHAR(16),
ADD COLUMN confirmations INT DEFAULT 0;
-- (其他字段省略)
7. 配置:拒绝硬编码
所有特定于链的参数(确认数、重组深度、最小入金阈值)必须从 YAML 配置文件加载。
8. 安全:HD 钱包架构
8.1 密钥存储
- 冷存储 (离线): 私钥/助记词永远离线保存。
- 热服务 (在线): 仅部署 扩展公钥 (XPUB)。
8.2 地址派生
- BTC: BIP84 (原生 SegWit)
m/84'/0'/0'/0/{index} - ETH: BIP44
m/44'/60'/0'/0/{index}
8.3 “Gap Limit” 解决方案
- 问题: 标准钱包在连续 20 个空地址后停止扫描。
- 方案: 全索引扫描。哨兵将 `user_addresses` 表中所有活跃地址加载到 HashSet (当前实现) 或 Bloom Filter (未来优化),无视 Gap Limit。
9. 未来工作 (本次范围之外)
- Bloom Filters: 百万级用户地址匹配优化。
- 自动冲正 (Automated Clawback): 针对深层重组的自动化处理。
- 多源验证: 对抗单一 RPC 节点被劫持的风险。
10. 总结
Phase 0x11-a 将资金系统从模拟环境升级为生产就绪的区块链集成架构。
11. 实施状态报告 (2025-12-29)
11.1 已完成功能
- 核心资金流: `DepositService` / `WithdrawService` 实现,并严格遵守整型持久化 (BigInt/i64)。
- 哨兵 (BTC): 基础 `BtcScanner` 已上线 (轮询 `getblock`,`HashSet` 地址匹配)。
- API 层: 充提历史接口已修复 (QA-01),内部 mock 接口已加固 (QA-03)。
- 地址校验: 实现 BTC/ETH 下的严格格式正则校验 (DEF-001)。
11.2 验证与测试指南
运行全量验证套件 (包含 Core/Chaos/Security 测试):
bash scripts/run_0x11a_verification.sh
验证结果:
- Agent B (Core): 地址持久化, 充提生命周期 ✅
- Agent A (Chaos): 幂等性, 竞态条件鲁棒性 ✅
- Agent C (Security): 地址隔离, 内部接口鉴权 ✅
11.3 已知限制 (推迟至 0x11-b)
- ETH / ERC20 支持: Ethereum 的真实链集成尚未实现 (Pending)。`EthScanner` 目前仅为 Stub。
- DEF-002 (Sentinel SegWit): 当前 `bitcoincore-rpc` 集成在 `regtest` 环境下解析 P2WPKH 地址存在问题,可能会漏掉隔离见证存款。
- Bloom Filter: 当前版本使用 `HashSet` 进行地址匹配,Bloom Filter 优化推迟至 Phase 0x12。
0x11-b Sentinel Hardening
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ COMPLETE (Core) |
|---|---|
| Date | 2025-12-30 |
| Context | Phase 0x11-a Extension: Hardening Sentinel for Production |
| Goal | Fix SegWit blindness (DEF-002), implement ETH/ERC20 & ADR-005/006. |
| Branch | 0x11-b-sentinel-hardening |
| Latest Commit | d307e12 |
1. Objectives
This phase addresses the critical gaps identified during Phase 0x11-a QA:
| Priority | Issue | Description |
|---|---|---|
| P0 | DEF-002 | Sentinel fails to detect P2WPKH (SegWit) deposits on BTC. |
| P1 | ETH Gap | EthScanner is a stub; no real ERC20 event parsing. |
2. Deposit Flow Architecture
Important
🚨 Production Risk Control Requirements
Before crediting user balance on finalization, deposits SHOULD pass through:
- Source Verification - Check if sender address is on sanctions/blacklist
- Amount Thresholds - Large deposits may require enhanced verification
- Pattern Analysis - Detect unusual deposit patterns (structuring, layering)
- AML Compliance - Regulatory reporting for threshold amounts
- Address Attribution - Verify expected vs actual funding sources
The current implementation credits balance automatically on finalization.
2.1 Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ Sentinel Deposit Flow │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ BTC/ETH │───▶│ ChainScanner │───▶│ Confirmation │───▶│ Deposit │ │
│ │ Node │ │ │ │ Monitor │ │ Pipeline │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ ▲ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ ScannedBlock│ │ deposit_ │ │ balances_tb │ │
│ │ │ + Deposits │ │ history │ │ (Balance) │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ DB DB │
└───────┴─────────────────────────────────────────────────────────────────────┘
2.2 State Machine
DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
│ │
└───────── ORPHANED ◀──────────┘
(Re-org detected)
| Status | Meaning | Balance Impact |
|---|---|---|
| DETECTED | On-chain detected, awaiting confirmation | ❌ |
| CONFIRMING | 1+ confirmations, not yet finalized | ❌ |
| FINALIZED | Required confirmations reached | 🔄 Processing |
| SUCCESS | Balance credited | ✅ |
| ORPHANED | Block re-orged, tx invalidated | ❌ |
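The transitions above can be modeled as a pure function over the deposit status. This is an illustrative sketch, not the project's actual `ConfirmationMonitor` logic; the required confirmation count and the re-org signal are assumed inputs:

```rust
/// Deposit states from the table above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DepositStatus { Detected, Confirming, Finalized, Success, Orphaned }

/// One monitoring step: re-orgs always win, then confirmation-count
/// thresholds drive forward progress. Simplified (no DB, no re-org window).
fn advance(status: DepositStatus, confirmations: u32, required: u32, reorged: bool) -> DepositStatus {
    use DepositStatus::*;
    if reorged {
        return Orphaned; // block hash changed under us
    }
    match status {
        Detected | Confirming if confirmations >= required => Finalized,
        Detected if confirmations >= 1 => Confirming,
        Finalized => Success, // pipeline has credited the balance
        s => s,
    }
}

fn main() {
    use DepositStatus::*;
    assert_eq!(advance(Detected, 0, 6, false), Detected);
    assert_eq!(advance(Detected, 1, 6, false), Confirming);
    assert_eq!(advance(Confirming, 6, 6, false), Finalized);
    assert_eq!(advance(Finalized, 6, 6, false), Success);
    assert_eq!(advance(Confirming, 3, 6, true), Orphaned);
}
```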
2.3 Key Components
| Component | File | Responsibility |
|---|---|---|
| `BtcScanner` | `src/sentinel/btc.rs` | Scan BTC blocks, extract P2PKH/P2WPKH addresses |
| `EthScanner` | `src/sentinel/eth.rs` | Scan ETH blocks via JSON-RPC |
| `ConfirmationMonitor` | `src/sentinel/confirmation.rs` | Track confirmations, detect re-orgs |
| `DepositPipeline` | `src/sentinel/pipeline.rs` | Credit balance on finalization |
2.4 Database Schema
deposit_history (Deposit Records):
tx_hash VARCHAR PRIMARY KEY -- Transaction hash
user_id BIGINT -- User ID
asset VARCHAR -- Asset (BTC/ETH)
amount DECIMAL -- Amount
chain_id VARCHAR -- Chain ID
block_height BIGINT -- Block height
block_hash VARCHAR -- Block hash (for re-org detection)
status VARCHAR -- Status (see state machine)
confirmations INT -- Current confirmation count
3. Withdraw Flow Architecture
Caution
⛔ Production Risk Control Requirements ⛔
The current implementation is for MVP/Testing only. Before production deployment, withdrawals MUST pass through:
- Comprehensive Risk Engine - Real-time fraud detection, velocity limits, address blacklist
- Manual Review - Large amounts require human approval
- Multi-signature Approval - Hot wallet threshold triggers cold wallet multi-sig
- AML/KYC Verification - Regulatory compliance checks
- Delay Mechanism - Suspicious transactions held for review period
Never deploy the current auto-approval flow to production!
3.1 Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ Withdraw Flow (Push Model) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ User │───▶│ WithdrawServ │───▶│ Balance │───▶│ Chain │ │
│ │ Request │ │ ice │ │ Deduct │ │ Broadcast │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ Validate │ │ withdraw_ │ │ TX Hash │ │
│ │ │ Address │ │ history │ │ or Fail │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ DB ▼ │
│ │ ┌─────────────────────────────────┐ │
│ │ │ On Fail: AUTO REFUND to balance │ │
│ │ └─────────────────────────────────┘ │
└───────┴─────────────────────────────────────────────────────────────────────┘
3.2 Flow Steps
1. Validate Request
└─▶ Address format ✓, Amount > 0 ✓
2. Lock & Check Balance (FOR UPDATE)
└─▶ available >= amount ? Continue : Error
3. Deduct Balance (Immediate)
└─▶ available -= amount
4. Create Record (PROCESSING)
└─▶ INSERT INTO withdraw_history
5. COMMIT Transaction
└─▶ Balance deducted, record created
6. Broadcast to Chain
├─▶ Success: UPDATE status = 'SUCCESS', tx_hash = ?
└─▶ Failure: AUTO REFUND + status = 'FAILED'
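The deduct-then-refund ordering of steps 3–6 can be sketched in isolation. This is a simplified in-memory model; the `Balance` struct and broadcast closure are stand-ins, not the real `WithdrawService` or `ChainClient`:

```rust
/// In-memory stand-in for the user's available balance row.
struct Balance { available: i64 }

/// Deduct first, then broadcast; on broadcast failure the deduction
/// is automatically refunded (steps 3 and 6 of the flow above).
fn withdraw<F>(bal: &mut Balance, amount: i64, broadcast: F) -> Result<String, String>
where
    F: Fn() -> Result<String, String>,
{
    if amount <= 0 || bal.available < amount {
        return Err("insufficient balance".into());
    }
    bal.available -= amount; // step 3: immediate deduction
    match broadcast() {
        Ok(tx_hash) => Ok(tx_hash), // status = SUCCESS
        Err(e) => {
            bal.available += amount; // step 6: AUTO REFUND, status = FAILED
            Err(e)
        }
    }
}

fn main() {
    let mut bal = Balance { available: 100 };
    assert!(withdraw(&mut bal, 40, || Ok("0xabc".into())).is_ok());
    assert_eq!(bal.available, 60);
    // a failed broadcast leaves the balance unchanged overall
    assert!(withdraw(&mut bal, 60, || Err("node down".into())).is_err());
    assert_eq!(bal.available, 60);
}
```

In the real flow the deduction and the `PROCESSING` record are committed in one DB transaction before the broadcast attempt, so a crash between commit and broadcast leaves a reconcilable record rather than lost funds.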
3.3 State Machine
┌──────────────┐
│ PROCESSING │
└──────┬───────┘
│
┌───────────┼───────────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ SUCCESS │ │ FAILED │
│ (✅ TX) │ │(Refunded)│
└──────────┘ └──────────┘
| Status | Meaning | Balance Impact |
|---|---|---|
| PROCESSING | Request submitted, awaiting broadcast | 💰 Deducted |
| SUCCESS | TX broadcast successful | ✅ Completed |
| FAILED | Broadcast failed, auto-refunded | 🔄 Refunded |
3.4 Key Components
| Component | File | Responsibility |
|---|---|---|
| `WithdrawService` | `src/funding/withdraw.rs` | Validate, deduct, broadcast, refund |
| `ChainClient` | `src/funding/chain_adapter.rs` | Blockchain TX broadcast interface |
| `handlers::apply_withdraw` | `src/funding/handlers.rs` | HTTP API endpoint |
3.5 Database Schema
withdraw_history (Withdraw Records):
request_id VARCHAR PRIMARY KEY -- Request UUID
user_id BIGINT -- User ID
asset VARCHAR -- Asset (BTC/ETH)
amount BIGINT -- Amount (scaled integer)
fee BIGINT -- Network fee (scaled integer)
to_address VARCHAR -- Destination address
status VARCHAR -- PROCESSING/SUCCESS/FAILED
tx_hash VARCHAR -- Blockchain TX hash (on success)
created_at TIMESTAMP -- Created time
updated_at TIMESTAMP -- Updated time
3.6 Amount Calculation
User Balance Delta = -Request Amount
Network Receive = Request Amount - Fee
Example:
- User requests withdraw 1.0 BTC with 0.0001 BTC fee
- Balance deducted: 1.0 BTC
- Network receives: 0.9999 BTC
4. 🛡️ Tiered Risk Control Framework (Defense in Depth)
4.1 Defense Layers
┌─────────────────────────────────────────────────────────────────────────────┐
│ Defense in Depth Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: 🟢 AUTOMATED │
│ ├─▶ Address blacklist/sanctions check │
│ ├─▶ Velocity limits (per hour/day/week) │
│ └─▶ Basic fraud pattern detection │
│ │
│ Layer 2: 🟡 THRESHOLD-BASED │
│ ├─▶ Amount > $1K: Enhanced verification │
│ ├─▶ Amount > $10K: 24-hour delay + notification │
│ └─▶ Amount > $50K: Requires Layer 3 │
│ │
│ Layer 3: 🔴 MANUAL REVIEW │
│ ├─▶ Human analyst verification │
│ ├─▶ Source of funds documentation │
│ └─▶ Multi-party approval (2-of-3) │
│ │
│ Layer 4: ⚫ COLD WALLET MULTI-SIG │
│ ├─▶ Amount > $100K: Cold wallet release │
│ ├─▶ Hardware key requirement │
│ └─▶ Geographic distribution of signers │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Risk Tiers by Amount
| Tier | Amount | Delay | Approval | Wallet |
|---|---|---|---|---|
| 🟢 T1 | < $1,000 | None | Auto | Hot |
| 🟡 T2 | $1K - $10K | 1 hour | Auto + Alert | Hot |
| 🟠 T3 | $10K - $50K | 24 hours | 1-of-2 Manual | Hot |
| 🔴 T4 | $50K - $100K | 48 hours | 2-of-3 Manual | Warm |
| ⚫ T5 | > $100K | 72 hours | 3-of-5 + HSM | Cold |
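Tier selection itself is a simple threshold lookup. A sketch, assuming half-open USD boundaries (which tier owns an exact boundary amount is a policy decision the table above does not pin down; thresholds would live in config, not code):

```rust
/// Map a withdrawal amount (whole USD) to a risk tier from the table.
/// Boundaries are assumed half-open: $10,000 lands in T3, not T2.
fn risk_tier(amount_usd: u64) -> &'static str {
    match amount_usd {
        0..=999 => "T1",          // auto-approved, hot wallet
        1_000..=9_999 => "T2",    // auto + alert, 1h delay
        10_000..=49_999 => "T3",  // 1-of-2 manual, 24h delay
        50_000..=99_999 => "T4",  // 2-of-3 manual, warm wallet
        _ => "T5",                // 3-of-5 + HSM, cold wallet
    }
}

fn main() {
    assert_eq!(risk_tier(500), "T1");
    assert_eq!(risk_tier(10_000), "T3");
    assert_eq!(risk_tier(250_000), "T5");
}
```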
4.3 Automated Checks (All Tiers)
| Check | Block | Alert |
|---|---|---|
| OFAC/Sanctions list | ✅ | ✅ |
| Address blacklist | ✅ | ✅ |
| Velocity limit exceeded | ✅ | ✅ |
| New address (< 24h) | ⚠️ T2+ | ✅ |
| Unusual amount pattern | ⚠️ Delay | ✅ |
| Geographic anomaly | ⚠️ Delay | ✅ |
4.4 Deposit-Specific Checks
┌────────────────────────────────────────────────────────────────┐
│ Deposit Risk Assessment │
├────────────────────────────────────────────────────────────────┤
│ ✓ Source address attribution (known exchange? mixer? unknown?) │
│ ✓ Transaction graph analysis (1-hop, 2-hop connections) │
│ ✓ Timing pattern (structuring detection) │
│ ✓ Historical behavior baseline │
│ ✓ Cross-chain correlation (same entity on ETH/BTC?) │
└────────────────────────────────────────────────────────────────┘
4.5 Withdraw-Specific Checks
┌────────────────────────────────────────────────────────────────┐
│ Withdraw Risk Assessment │
├────────────────────────────────────────────────────────────────┤
│ ✓ Destination address reputation │
│ ✓ First-time address penalty │
│ ✓ Account age vs amount ratio │
│ ✓ Recent password/2FA changes (48h cooldown) │
│ ✓ Device fingerprint verification │
│ ✓ API key usage pattern │
└────────────────────────────────────────────────────────────────┘
5. Problem Analysis: DEF-002 (BTC SegWit Blindness)
5.1 Root Cause
The extract_address function in src/sentinel/btc.rs uses Address::from_script(script, network).
While the rust-bitcoin crate should support P2WPKH scripts (OP_0 <20-byte-hash>), the current implementation may fail due to:
- Network mismatch between the script encoding and the `Network` enum passed.
- Missing feature flags in the `bitcoincore-rpc` dependency.
5.2 Solution
- Verify: Add unit test with raw P2WPKH script construction.
- Fix: If `Address::from_script` fails, manually detect witness v0 scripts:

if script.is_p2wpkh() {
    // Extract 20-byte hash from script[2..22]
    // Construct Address::p2wpkh(...)
}
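If even the crate-level check is unavailable, the raw script bytes can be tested directly: a P2WPKH scriptPubKey is exactly `OP_0` (0x00) followed by a push of 20 bytes (0x14) and the key hash. A self-contained sketch of that fallback (pure byte checks; encoding the extracted hash to a bech32 address would still need the bitcoin crate):

```rust
/// Detect a witness-v0 P2WPKH scriptPubKey and extract its 20-byte
/// key hash. Returns None for any other script shape.
fn p2wpkh_hash(script: &[u8]) -> Option<[u8; 20]> {
    // OP_0 (0x00) + push-20 (0x14) + 20-byte hash = 22 bytes total
    if script.len() == 22 && script[0] == 0x00 && script[1] == 0x14 {
        let mut h = [0u8; 20];
        h.copy_from_slice(&script[2..22]);
        Some(h)
    } else {
        None
    }
}

fn main() {
    let mut script = vec![0x00, 0x14];
    script.extend_from_slice(&[0xab; 20]);
    assert_eq!(p2wpkh_hash(&script), Some([0xab; 20]));
    assert_eq!(p2wpkh_hash(&[0x76, 0xa9]), None); // P2PKH prefix, not witness v0
}
```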
6. Feature Specification: ETH/ERC20 Sentinel
6.1 Architecture
┌─────────────────────────────────────────────────────────────────┐
│ EthScanner │
├─────────────────────────────────────────────────────────────────┤
│ 1. Poll eth_blockNumber (Tip Tracking) │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer]) │
│ 3. Filter: Match log.address (Contract) + topic[2] (To) │
│ 4. Parse: Decode log.data as uint256 amount │
│ 5. Emit: DetectedDeposit { tx_hash, to_address, amount, ... } │
└─────────────────────────────────────────────────────────────────┘
6.2 Key Implementation Details
- Topic0 (Transfer): `keccak256("Transfer(address,address,uint256)")` = `0xddf252ad...`
- Topic1: Sender (indexed)
- Topic2: Recipient (indexed) - match against `user_addresses`
- Data: Amount (uint256, left-padded)
6.3 Precision Handling
| Token | Decimals | Scaling |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |
Important
Token decimals MUST be loaded from `assets_tb`, not hardcoded.
7. Database Schema Extensions
-- EthScanner requires contract address tracking
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- e.g., '0xdAC17F958D2ee523a2206206994597C13D831ec7'
-- Index for fast lookup by contract
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);
8. Configuration: config/sentinel.yaml
eth:
chain_id: "ETH"
network: "anvil" # or "mainnet", "goerli"
rpc:
url: "http://127.0.0.1:8545"
scanning:
required_confirmations: 12
max_reorg_depth: 20
start_height: 0
contracts:
- name: "USDT"
address: "0x..."
decimals: 6
- name: "USDC"
address: "0x..."
decimals: 6
9. Acceptance Criteria
- BTC: Unit test `test_p2wpkh_extraction` passes. ✅ (test_segwit_p2wpkh_extraction_def_002)
- BTC: E2E deposit to `bcrt1...` address is detected and credited. ✅ (Verified via greybox test)
- ETH: Unit test `test_erc20_transfer_parsing` passes. ✅ (7 ETH tests pass)
- ETH: E2E deposit via MockUSDT contract is detected. ⏳ (Pending: ERC20 `eth_getLogs` not yet implemented)
- Regression: All existing Phase 0x11-a tests still pass. ✅ (322 tests)
10. Implementation Status
| Component | Status | Notes |
|---|---|---|
| `BtcScanner` P2WPKH Fix | ✅ Complete | Test test_segwit_p2wpkh_extraction_def_002 passes |
| `EthScanner` Implementation | ✅ Complete | Full JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| Unit Tests | ✅ 22 Pass | All Sentinel tests passing |
| E2E Verification | ⚠️ Partial | Nodes not running during test; scripts ready |
| ERC20 Token Support | 🚧 In Progress | eth_getLogs for Transfer events (Phase 0x11-b scope) |
11. Testing Instructions
Quick Test (Rust Unit Tests)
# Run all Sentinel tests
cargo test --package zero_x_infinity --lib sentinel -- --nocapture
# Run DEF-002 verification test only
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture
# Run ETH Scanner tests only
cargo test sentinel::eth -- --nocapture
Full Test Suite
# Run test script (no nodes required)
./scripts/tests/0x11b_sentinel/run_tests.sh
# Run with node startup (requires docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes
🇨🇳 中文
| 状态 | ✅ 核心功能已完成 |
|---|---|
| 日期 | 2025-12-29 |
| 上下文 | Phase 0x11-a 延续: 强化哨兵服务 |
| 目标 | 修复 SegWit 盲区 (DEF-002) 并实现 ETH/ERC20 支持。 |
| 分支 | 0x11-b-sentinel-hardening |
| 最新提交 | d383b6c |
1. 目标
本阶段解决 Phase 0x11-a QA 中识别的关键缺陷:
| 优先级 | 问题 | 描述 |
|---|---|---|
| P0 | DEF-002 | 哨兵无法检测 BTC P2WPKH (SegWit) 充值。 |
| P1 | ETH 缺口 | EthScanner 只是空壳;无法解析 ERC20 事件。 |
2. 充值流程架构
Important
🚨 生产环境风控要求
在确认完成后为用户入账之前,充值 应该 经过:
- 来源验证 - 检查发送地址是否在制裁/黑名单上
- 金额阈值 - 大额充值可能需要加强验证
- 模式分析 - 检测异常充值模式 (拆分、分层)
- AML 合规 - 超过阈值金额的监管报告
- 地址归属 - 验证预期 vs 实际资金来源
当前实现在确认完成后自动入账。
2.1 概览
┌─────────────────────────────────────────────────────────────────────────────┐
│ Sentinel 充值流程 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ BTC/ETH │───▶│ ChainScanner │───▶│ Confirmation │───▶│ Deposit │ │
│ │ 节点 │ │ 区块扫描器 │ │ Monitor │ │ Pipeline │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ ▲ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ ScannedBlock│ │ deposit_ │ │ balances_tb │ │
│ │ │ 扫描区块 │ │ history │ │ 余额表 │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ 数据库 数据库 │
└───────┴─────────────────────────────────────────────────────────────────────┘
2.2 状态机
DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
已检测 确认中 已完成 成功
│ │
└───────── ORPHANED ◀──────────┘
已孤立 (区块重组)
| 状态 | 含义 | 余额影响 |
|---|---|---|
| DETECTED | 链上检测到,等待确认 | ❌ |
| CONFIRMING | 有 1+ 确认,尚未达标 | ❌ |
| FINALIZED | 达到所需确认数 | 🔄 处理中 |
| SUCCESS | 已入账到余额 | ✅ |
| ORPHANED | 区块被重组,交易失效 | ❌ |
2.3 关键组件
| 组件 | 文件 | 职责 |
|---|---|---|
| `BtcScanner` | `src/sentinel/btc.rs` | 扫描 BTC 区块,提取 P2PKH/P2WPKH 地址 |
| `EthScanner` | `src/sentinel/eth.rs` | 通过 JSON-RPC 扫描 ETH 区块 |
| `ConfirmationMonitor` | `src/sentinel/confirmation.rs` | 追踪确认数,检测重组 |
| `DepositPipeline` | `src/sentinel/pipeline.rs` | 完成后入账余额 |
2.4 数据库结构
deposit_history (充值记录表):
tx_hash VARCHAR PRIMARY KEY -- 交易哈希
user_id BIGINT -- 用户 ID
asset VARCHAR -- 资产 (BTC/ETH)
amount DECIMAL -- 金额
chain_id VARCHAR -- 链 ID
block_height BIGINT -- 区块高度
block_hash VARCHAR -- 区块哈希 (用于重组检测)
status VARCHAR -- 状态 (见状态机)
confirmations INT -- 当前确认数
3. 提现流程架构
Caution
⛔ 生产环境风控要求 ⛔
当前实现仅用于 MVP/测试。生产部署前,提现请求 必须 经过:
- 完整风控引擎 - 实时欺诈检测、频率限制、地址黑名单
- 人工审核 - 大额提现需人工批准
- 多签审批 - 热钱包阈值触发冷钱包多签
- AML/KYC 验证 - 合规性检查
- 延迟机制 - 可疑交易进入审核等待期
绝对不要将当前自动审批流程部署到生产环境!
3.1 概览
┌─────────────────────────────────────────────────────────────────────────────┐
│ 提现流程 (推送模式) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ 用户 │───▶│ WithdrawServ │───▶│ 余额扣减 │───▶│ 链上广播 │ │
│ │ 请求 │ │ 提现服务 │ │ (立即) │ │ │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ 地址验证 │ │ withdraw_ │ │ TX Hash 或 │ │
│ │ │ │ │ history │ │ 失败 │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ 数据库 ▼ │
│ │ ┌─────────────────────────────────┐ │
│ │ │ 失败时: 自动退款到余额 │ │
│ │ └─────────────────────────────────┘ │
└───────┴─────────────────────────────────────────────────────────────────────┘
3.2 流程步骤
1. 验证请求
└─▶ 地址格式 ✓, 金额 > 0 ✓
2. 锁定并检查余额 (FOR UPDATE)
└─▶ 可用余额 >= 金额 ? 继续 : 错误
3. 扣减余额 (立即)
└─▶ 可用余额 -= 金额
4. 创建记录 (PROCESSING)
└─▶ INSERT INTO withdraw_history
5. 提交事务
└─▶ 余额已扣减,记录已创建
6. 广播到链
├─▶ 成功: UPDATE status = 'SUCCESS', tx_hash = ?
└─▶ 失败: 自动退款 + status = 'FAILED'
3.3 状态机
┌──────────────┐
│ PROCESSING │
│ 处理中 │
└──────┬───────┘
│
┌───────────┼───────────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ SUCCESS │ │ FAILED │
│ 成功 │ │ 失败 │
│ (✅ TX) │ │(已退款) │
└──────────┘ └──────────┘
| 状态 | 含义 | 余额影响 |
|---|---|---|
| PROCESSING | 请求已提交,等待广播 | 💰 已扣减 |
| SUCCESS | 交易广播成功 | ✅ 完成 |
| FAILED | 广播失败,已自动退款 | 🔄 已退款 |
3.4 关键组件
| 组件 | 文件 | 职责 |
|---|---|---|
| `WithdrawService` | `src/funding/withdraw.rs` | 验证、扣减、广播、退款 |
| `ChainClient` | `src/funding/chain_adapter.rs` | 区块链交易广播接口 |
| `handlers::apply_withdraw` | `src/funding/handlers.rs` | HTTP API 端点 |
3.5 数据库结构
withdraw_history (提现记录表):
request_id VARCHAR PRIMARY KEY -- 请求 UUID
user_id BIGINT -- 用户 ID
asset VARCHAR -- 资产 (BTC/ETH)
amount BIGINT -- 金额 (整数缩放)
fee BIGINT -- 网络手续费 (整数缩放)
to_address VARCHAR -- 目标地址
status VARCHAR -- PROCESSING/SUCCESS/FAILED
tx_hash VARCHAR -- 区块链交易哈希 (成功时)
created_at TIMESTAMP -- 创建时间
updated_at TIMESTAMP -- 更新时间
3.6 金额计算
用户余额变化 = -请求金额
链上到账金额 = 请求金额 - 手续费
示例:
- 用户请求提现 1.0 BTC,手续费 0.0001 BTC
- 余额扣减: 1.0 BTC
- 链上到账: 0.9999 BTC
4. 🛡️ 分级纵深防御风控框架
4.1 防御层级
┌─────────────────────────────────────────────────────────────────────────────┐
│ 纵深防御架构 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 第一层: 🟢 自动化检查 │
│ ├─▶ 地址黑名单/制裁名单检查 │
│ ├─▶ 频率限制 (每小时/每天/每周) │
│ └─▶ 基础欺诈模式检测 │
│ │
│ 第二层: 🟡 阈值触发 │
│ ├─▶ 金额 > ¥7K: 加强验证 │
│ ├─▶ 金额 > ¥70K: 24小时延迟 + 通知 │
│ └─▶ 金额 > ¥350K: 进入第三层 │
│ │
│ 第三层: 🔴 人工审核 │
│ ├─▶ 人工分析师验证 │
│ ├─▶ 资金来源证明文件 │
│ └─▶ 多方审批 (2-of-3) │
│ │
│ 第四层: ⚫ 冷钱包多签 │
│ ├─▶ 金额 > ¥700K: 冷钱包释放 │
│ ├─▶ 硬件密钥要求 │
│ └─▶ 签名者地理分布 │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 风险分级 (按金额)
| 层级 | 金额 | 延迟 | 审批 | 钱包 |
|---|---|---|---|---|
| 🟢 T1 | < ¥7,000 | 无 | 自动 | 热 |
| 🟡 T2 | ¥7K - ¥70K | 1小时 | 自动 + 告警 | 热 |
| 🟠 T3 | ¥70K - ¥350K | 24小时 | 1-of-2 人工 | 热 |
| 🔴 T4 | ¥350K - ¥700K | 48小时 | 2-of-3 人工 | 温 |
| ⚫ T5 | > ¥700K | 72小时 | 3-of-5 + HSM | 冷 |
4.3 自动化检查 (所有层级)
| 检查项 | 阻止 | 告警 |
|---|---|---|
| OFAC/制裁名单 | ✅ | ✅ |
| 地址黑名单 | ✅ | ✅ |
| 超过频率限制 | ✅ | ✅ |
| 新地址 (< 24h) | ⚠️ T2+ | ✅ |
| 异常金额模式 | ⚠️ 延迟 | ✅ |
| 地理位置异常 | ⚠️ 延迟 | ✅ |
4.4 充值专项检查
┌────────────────────────────────────────────────────────────────┐
│ 充值风险评估 │
├────────────────────────────────────────────────────────────────┤
│ ✓ 来源地址归属 (已知交易所? 混币器? 未知?) │
│ ✓ 交易图谱分析 (1跳、2跳关联) │
│ ✓ 时序模式 (拆分检测) │
│ ✓ 历史行为基线 │
│ ✓ 跨链关联 (同一实体在 ETH/BTC?) │
└────────────────────────────────────────────────────────────────┘
4.5 提现专项检查
┌────────────────────────────────────────────────────────────────┐
│ 提现风险评估 │
├────────────────────────────────────────────────────────────────┤
│ ✓ 目标地址信誉 │
│ ✓ 首次使用地址惩罚 │
│ ✓ 账户年龄 vs 金额比率 │
│ ✓ 近期密码/2FA变更 (48h冷却) │
│ ✓ 设备指纹验证 │
│ ✓ API密钥使用模式 │
└────────────────────────────────────────────────────────────────┘
5. 问题分析: DEF-002 (BTC SegWit 盲区)
5.1 根因
src/sentinel/btc.rs 中的 extract_address 函数使用 Address::from_script(script, network)。
虽然 rust-bitcoin 库 理论上 支持 P2WPKH 脚本 (OP_0 <20-byte-hash>),但当前实现可能因以下原因失败:
- 脚本编码与传入的 `Network` 枚举不匹配。
- `bitcoincore-rpc` 依赖缺少必要的 feature flags。
5.2 解决方案
- 验证: 添加单元测试,手动构造原始 P2WPKH 脚本。
- 修复: 如果 `Address::from_script` 失败,手动检测 witness v0 脚本:

if script.is_p2wpkh() {
    // 从 script[2..22] 提取 20 字节哈希
    // 构造 Address::p2wpkh(...)
}
6. 功能规格: ETH/ERC20 哨兵
6.1 架构
┌─────────────────────────────────────────────────────────────────┐
│ EthScanner │
├─────────────────────────────────────────────────────────────────┤
│ 1. 轮询 eth_blockNumber (区块高度追踪) │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer]) │
│ 3. 过滤: 匹配 log.address (合约) + topic[2] (收款人) │
│ 4. 解析: 将 log.data 解码为 uint256 金额 │
│ 5. 产出: DetectedDeposit { tx_hash, to_address, amount, ... } │
└─────────────────────────────────────────────────────────────────┘
6.2 关键实现细节
- Topic0 (Transfer): `keccak256("Transfer(address,address,uint256)")` = `0xddf252ad...`
- Topic1: 发送方 (indexed)
- Topic2: 接收方 (indexed) - 与 `user_addresses` 匹配
- Data: 金额 (uint256, 左填充)
6.3 精度处理
| 代币 | 小数位 | 缩放比例 |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |
Important
代币精度必须从 `assets_tb` 加载,禁止硬编码。
7. 数据库模式扩展
-- EthScanner 需要追踪合约地址
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- 例: '0xdAC17F958D2ee523a2206206994597C13D831ec7'
-- 按合约快速查询的索引
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);
8. 配置: config/sentinel.yaml
eth:
chain_id: "ETH"
network: "anvil" # 或 "mainnet", "goerli"
rpc:
url: "http://127.0.0.1:8545"
scanning:
required_confirmations: 12
max_reorg_depth: 20
start_height: 0
contracts:
- name: "USDT"
address: "0x..."
decimals: 6
- name: "USDC"
address: "0x..."
decimals: 6
9. 验收标准
- BTC: 单元测试 `test_p2wpkh_extraction` 通过。 ✅ (test_segwit_p2wpkh_extraction_def_002)
- BTC: E2E 测试中充值到 `bcrt1...` 地址被检测并入账。 ✅ (通过 greybox 测试验证)
- ETH: 单元测试 `test_erc20_transfer_parsing` 通过。 ✅ (7 个 ETH 测试通过)
- ETH: E2E 测试中通过 MockUSDT 合约充值被检测。 ⏳ (待完成: ERC20 `eth_getLogs` 尚未实现)
- 回归: 所有 Phase 0x11-a 现有测试仍然通过。 ✅ (322 个测试)
10. 实施状态
| 组件 | 状态 | 备注 |
|---|---|---|
| `BtcScanner` P2WPKH 修复 | ✅ 已完成 | 测试 test_segwit_p2wpkh_extraction_def_002 通过 |
| `EthScanner` 实现 | ✅ 已完成 | 完整 JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| 单元测试 | ✅ 22 通过 | 所有 Sentinel 测试通过 |
| E2E 验证 | ⚠️ 部分 | 测试时节点未运行;脚本已就绪 |
| ERC20 代币支持 | 🚧 进行中 | eth_getLogs for Transfer events (Phase 0x11-b 范围) |
11. 测试方法
快速测试 (Rust 单元测试)
# 运行所有 Sentinel 测试
cargo test --package zero_x_infinity --lib sentinel -- --nocapture
# 仅运行 DEF-002 验证测试
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture
# 仅运行 ETH Scanner 测试
cargo test sentinel::eth -- --nocapture
完整测试套件
# 运行测试脚本 (无需节点)
./scripts/tests/0x11b_sentinel/run_tests.sh
# 运行测试脚本 (自动启动节点, 需要 docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes
Appendix A: Industry Standards Reference
Full Design: See Chains Schema Design for complete schema and industry standards.
Naming Conventions
| Concept | Industry Term | Our Column | Type |
|---|---|---|---|
| Business ID | shortName | chain_slug | VARCHAR |
| EIP-155 ID | chainId | chain_id | INTEGER |
| Native Token | nativeCurrency.symbol | native_currency | VARCHAR |
References
- EIP-155 - Ethereum Chain ID
- ethereum-lists/chains - Chain Registry
- SLIP-0044 - BIP-44 Coin Types
Phase 0x11-b Schema
-- Minimum viable: uses chain_slug only
CREATE TABLE user_addresses (
user_id BIGINT,
asset VARCHAR(32),
chain_slug VARCHAR(32), -- "eth", "btc"
address VARCHAR(255),
PRIMARY KEY (user_id, asset, chain_slug)
);
0x12 Real Trading Verification
🚧 Documentation In Progress
0x13 Market Data Experience
🚧 Documentation In Progress
0x14 Extreme Optimization: Methodology
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
Phase V Keynote Codename: “Metal Mode” Philosophy: “If you can’t measure it, you can’t improve it.”
1. The Performance Ceiling
In the previous chapters, we built a highly reliable exchange core (Phase I-IV). We achieved 1.3M TPS on a single thread using the Ring Buffer architecture. This is “fast enough” for 99% of crypto exchanges.
But for top-tier HFT engines, “Fast Enough” is not enough. We want to hit the physical limits of the CPU and Memory.
1.1 Why “Extreme Optimization”?
| Phase | Focus | Goal |
|---|---|---|
| I-III | Correctness | “Does it work?” |
| IV | Integration | “Does it work end-to-end?” |
| V | Speed | “How fast can it go?” |
In Phase V, we assume correctness is already proven. Our sole focus is performance.
1.2 Why “Metal Mode”?
“Metal Mode” is our internal codename. It means:
- Close to the Metal: We will bypass high-level abstractions and work directly with memory layouts, CPU caches, and SIMD instructions.
- Bare Metal Rust: No unnecessary `clone()`, no hidden `malloc()`, no runtime surprises.
2. The Benchmarking Methodology (Tier 2)
To optimize, we must first measure. But what we measure matters.
2.1 The Problem with Naive Benchmarks
| Benchmark Type | What it Measures | Problem for Optimization |
|---|---|---|
| wrk / curl | HTTP round-trip | Includes OS, network, and kernel noise |
| Unit tests | Function correctness | No performance data |
These are useful for validation (Phase IV), but not for isolation (Phase V).
2.2 Tier 2: Pipeline Benchmarks
We introduce Tier 2 Pipeline Benchmarks:
| Feature | Description |
|---|---|
| No Network I/O | Data is pre-loaded in memory. |
| No Disk I/O | WAL is mocked or in-memory. |
| Pure CPU/Memory | Measures only the “Hot Path”: RingBuffer → UBSCore → ME → Settlement. |
| Deterministic | Same input → Same output → Same timing. |
Goal: Establish the “Red Line” – the current baseline performance under ideal conditions. All future optimizations will be measured against this.
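The Tier 2 idea can be illustrated with a trivial harness: pre-load all input in memory, time only the hot loop, and assert the deterministic result. The summation below is a stand-in for the real RingBuffer → UBSCore → ME → Settlement path, not the project's benchmark code:

```rust
use std::time::Instant;

fn main() {
    // Pre-loaded input: no network or disk I/O inside the timed region
    let orders: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let mut acc: u64 = 0;
    for &o in &orders {
        acc = acc.wrapping_add(o); // hot-path stand-in
    }
    let elapsed = start.elapsed();

    // Deterministic: same input => same output, every run
    assert_eq!(acc, 499_999_500_000);

    let tps = orders.len() as f64 / elapsed.as_secs_f64();
    println!("processed {} ops in {:?} ({:.0} ops/sec)", orders.len(), elapsed, tps);
}
```

A real Tier 2 harness would also pin the thread to a core and discard warm-up iterations so the "Red Line" baseline is reproducible across runs.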
🇨🇳 中文
Phase V 基调 内部代号: “Metal Mode” 核心哲学: “无法测量,就无法优化。”
1. 性能天花板
在前几个阶段(Phase I-IV),我们构建了一个高可靠的交易所核心。利用 Ring Buffer 架构,我们在单线程上实现了 130万 TPS。对于 99% 的加密货币交易所来说,这已经“足够快“了。
但对于顶级的 HFT 引擎,“足够快“是不够的。我们要触达 CPU 和内存的物理极限。
1.1 为什么叫 “Extreme Optimization”?
| 阶段 | 关注点 | 目标 |
|---|---|---|
| I-III | 正确性 | “能跑吗?” |
| IV | 集成 | “端到端能跑通吗?” |
| V | 速度 | “能跑多快?” |
在 Phase V,我们假设正确性已经被验证。唯一的焦点是性能。
1.2 为什么叫 “Metal Mode”?
“Metal Mode” 是我们的内部代号,意为:
- 贴近金属 (Close to the Metal):我们将绕过高层抽象,直接操作内存布局、CPU 缓存和 SIMD 指令。
- Bare Metal Rust:没有不必要的 `clone()`,没有隐藏的 `malloc()`,没有运行时惊喜。
2. 基准测试方法论 (Tier 2)
要优化,必须先测量。但测什么至关重要。
2.1 朴素基准测试的问题
| 基准测试类型 | 测量内容 | 优化的问题 |
|---|---|---|
| wrk / curl | HTTP 往返 | 包含操作系统、网络、内核噪声 |
| 单元测试 | 函数正确性 | 没有性能数据 |
这些对于验证 (Phase IV) 有用,但不适合隔离测试 (Phase V)。
2.2 Tier 2: 流水线基准测试 (Pipeline Benchmarks)
我们引入 Tier 2 流水线基准测试:
| 特性 | 描述 |
|---|---|
| 无网络 I/O | 数据预加载在内存中。 |
| 无磁盘 I/O | WAL 被 Mock 或在内存中。 |
| 纯 CPU/内存 | 只测量“热路径“:RingBuffer → UBSCore → ME → Settlement。 |
| 确定性 | 相同输入 → 相同输出 → 相同耗时。 |
目标:建立 “Red Line (红线)” – 理想条件下的当前基线性能。所有后续优化都将以此为基准进行衡量。
0x14-a Benchmark Harness: Test Data Generation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ IMPLEMENTED / QA VERIFIED (Phase 0x14-a Complete) |
|---|---|
| Date | 2025-12-30 |
| Context | Phase V: Extreme Optimization (Step 1) |
| Goal | Re-implement Exchange-Core test data generation algorithm in Rust and verify correctness against golden data. |
1. Chapter Objectives
| # | Goal | Deliverable |
|---|---|---|
| 1 | Implement LCG PRNG | src/bench/java_random.rs - Java-compatible random generator |
| 2 | Implement Order Generator | src/bench/order_generator.rs - Deterministic order sequence |
| 3 | Verify Correctness | Unit tests that compare generated data with golden_*.csv |
Success Criteria: Generated data matches golden CSV byte-for-byte (same order_id, price, size, uid for each row).
2. Reference Algorithm: LCG PRNG
The Exchange-Core project uses Java’s java.util.Random as its PRNG. We must implement a bit-exact replica.
2.1 Java Random Implementation
#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
seed: u64,
}
impl JavaRandom {
const MULTIPLIER: u64 = 0x5DEECE66D;
const ADDEND: u64 = 0xB;
const MASK: u64 = (1 << 48) - 1;
pub fn new(seed: i64) -> Self {
Self {
seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
}
}
fn next(&mut self, bits: u32) -> i32 {
self.seed = self.seed
.wrapping_mul(Self::MULTIPLIER)
.wrapping_add(Self::ADDEND) & Self::MASK;
(self.seed >> (48 - bits)) as i32
}
pub fn next_int(&mut self, bound: i32) -> i32 {
assert!(bound > 0);
if (bound & (bound - 1)) == 0 {
// Power of two: take the high-order bits directly
return ((bound as i64 * self.next(31) as i64) >> 31) as i32;
}
// Rejection sampling; reproduce Java's signed 32-bit overflow check
// (bits - val + (bound-1) < 0 means overflow -> retry) so results
// stay bit-exact with java.util.Random
loop {
let bits = self.next(31);
let val = bits % bound;
if bits.wrapping_sub(val).wrapping_add(bound - 1) >= 0 {
return val;
}
}
}
pub fn next_long(&mut self) -> i64 {
((self.next(32) as i64) << 32) + self.next(32) as i64
}
pub fn next_double(&mut self) -> f64 {
let a = (self.next(26) as u64) << 27;
let b = self.next(27) as u64;
(a + b) as f64 / ((1u64 << 53) as f64)
}
}
}
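Before comparing against golden CSVs, the LCG itself can be sanity-checked against a couple of widely cited `java.util.Random` outputs for seed 0. A self-contained sketch (the expected constants are the commonly documented Java outputs; treat them as an assumption and verify against a real JVM if in doubt):

```rust
// Standalone sanity check: a compact copy of the LCG compared against
// known `java.util.Random` outputs for seed 0.
struct JavaRandom {
    seed: u64,
}

impl JavaRandom {
    const MULTIPLIER: u64 = 0x5DEECE66D;
    const ADDEND: u64 = 0xB;
    const MASK: u64 = (1 << 48) - 1;

    fn new(seed: i64) -> Self {
        Self { seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK }
    }

    fn next(&mut self, bits: u32) -> i32 {
        self.seed = self
            .seed
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::ADDEND)
            & Self::MASK;
        (self.seed >> (48 - bits)) as i32
    }

    fn next_long(&mut self) -> i64 {
        ((self.next(32) as i64) << 32).wrapping_add(self.next(32) as i64)
    }
}

fn main() {
    // `new java.util.Random(0).nextInt()` is widely cited as -1155484576
    let mut r = JavaRandom::new(0);
    assert_eq!(r.next(32), -1155484576);

    // `new java.util.Random(0).nextLong()` is widely cited as -4962768465676381896
    let mut r = JavaRandom::new(0);
    assert_eq!(r.next_long(), -4962768465676381896);

    println!("LCG matches known Java outputs");
}
```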
2.2 Seed Derivation
Each test session derives its seed from symbol_id and benchmark_seed:
#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
let mut hash: i64 = 1;
hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
hash
}
}
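A quick property check of the formula (a sketch; wrapping arithmetic is used here so that large seeds cannot overflow-panic in debug builds):

```rust
// Property check: the derivation must be deterministic, and distinct
// symbols should yield distinct session seeds (for these small inputs).
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
    let mut hash: i64 = 1;
    hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
    hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
    hash
}

fn main() {
    // Deterministic: identical inputs, identical output.
    assert_eq!(derive_session_seed(1, 1), derive_session_seed(1, 1));
    // Different symbols diverge.
    assert_ne!(derive_session_seed(1, 1), derive_session_seed(2, 1));
    println!("seed derivation is deterministic");
}
```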
3. Golden Data Reference
Location: docs/exchange_core_verification_kit/golden_data/
| File | Records | Seed | Description |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | Margin (futures) contract |
| golden_single_pair_exchange.csv | 11,000 | 1 | Spot exchange |
CSV Format:
phase,command,order_id,symbol,price,size,action,order_type,uid
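A minimal sketch of loading one golden row for comparison. The struct and helper names are illustrative, not the project's actual types; the sample line reuses values from the expected output in Section 6, with hypothetical phase/command/symbol fields:

```rust
// Parse one golden CSV line (header: phase,command,order_id,symbol,
// price,size,action,order_type,uid) into a comparable record.
#[derive(Debug, PartialEq)]
struct GoldenRow {
    phase: String,
    command: String,
    order_id: u64,
    symbol: i32,
    price: u64,
    size: u64,
    action: String,
    order_type: String,
    uid: u64,
}

fn parse_row(line: &str) -> Option<GoldenRow> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 9 {
        return None; // malformed row
    }
    Some(GoldenRow {
        phase: f[0].to_string(),
        command: f[1].to_string(),
        order_id: f[2].parse().ok()?,
        symbol: f[3].parse().ok()?,
        price: f[4].parse().ok()?,
        size: f[5].parse().ok()?,
        action: f[6].to_string(),
        order_type: f[7].to_string(),
        uid: f[8].parse().ok()?,
    })
}

fn main() {
    // Values taken from the expected output in Section 6; the phase,
    // command, and symbol fields here are hypothetical placeholders.
    let row = parse_row("FILL,PLACE,1,241,34386,1,BID,GTC,377").unwrap();
    assert_eq!(row.order_id, 1);
    assert_eq!(row.price, 34386);
    assert_eq!(row.uid, 377);
    assert!(parse_row("bad,row").is_none());
    println!("parsed: {:?}", row);
}
```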
4. Implementation Checklist
- Step 1: Create `src/bench/mod.rs`
- Step 2: Implement `JavaRandom` in `src/bench/java_random.rs`
  - Unit test: verify first 100 random numbers match Java output
- Step 3: Implement `TestOrdersGenerator` in `src/bench/order_generator.rs`
  - Pareto distribution for symbol/user weights
  - Order generation logic (GTC orders for FILL phase)
  - Seed derivation using `Objects.hash` formula
- Step 4: Load and compare with golden CSV
  - `#[test] fn test_golden_single_pair_margin()`
  - `#[test] fn test_golden_single_pair_exchange()`
5. Implementation Results
Note
✅ FILL PHASE: 100% BIT-EXACT MATCH (1,000 orders) ⚠️ BENCHMARK PHASE: Requires matching engine (10,000 orders)
5.1 FILL Phase (Rows 1-1000)
| Field | Match Status | Formula |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4-value averaging |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto user account generation |
5.2 BENCHMARK Phase Analysis
| Component | Status | Notes |
|---|---|---|
| RNG Sequence | ✅ Aligned | nextInt(4) for action FIRST, then nextInt(q_range) |
| Order Selection | ✅ Aligned | Uses orderUids iterator (BTreeMap deterministic) |
| IOC Simulation | ✅ Implemented | Shadow order book with simulate_ioc_match |
| Order Book Feedback | ❌ Gap | Java uses real matcher feedback for lackOfOrders |
Important
BENCHMARK Phase Gap: Java’s `generateRandomOrder` uses `lastOrderBookOrdersSizeAsk/Bid` from the real matching engine (updated in `updateOrderBookSizeStat`). Without a full Rust matching engine, the shadow book diverges from Java’s state.
5.3 Golden Data Scale
| Dataset | FILL | BENCHMARK | Total |
|---|---|---|---|
| golden_single_pair_margin.csv | 1,000 | 10,000 | 11,000 |
| golden_single_pair_exchange.csv | 1,000 | 10,000 | 11,000 |
5.4 Key Implementation Details
- JavaRandom: bit-exact `java.util.Random` LCG
- Seed derivation: `Objects.hash(symbol*-177277, seed*10037+198267)`
- User accounts: `1 + (int)paretoSample` formula
- Currency order: `[978, 840]` based on HashMap bucket index
- CENTRAL_MOVE_ALPHA: `0.01` (not 0.1)
- Shadow Order Book: `ask_orders`/`bid_orders` Vec with O(1) swap_remove
6. Verification Commands
One-Click Verification:
# Run all golden data verification tests
cargo test golden_ -- --nocapture
Detailed Comparison Test:
# Compare first 20 orders against golden CSV with full output
cargo test test_generator_vs_golden_detailed -- --nocapture
All Benchmark Tests:
# Run all tests in the bench module
cargo test bench:: -- --nocapture
Expected Output:
[ 1] ✅ | Golden: id=1, price=34386, size= 1, action=BID, uid=377
[ 2] ✅ | Golden: id=2, price=34135, size= 1, action=BID, uid=110
[ 3] ✅ | Golden: id=3, price=34347, size= 2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size= 1, action=BID, uid=491
7. Fair Benchmark Procedure
Important
Key to Fairness: Generation and Execution must be separated. Java pre-generates all commands into memory before testing.
7.1 Four Phase Separation
Phase 1: Data Pre-generation ───────── ⏸️ Not Timed
Phase 2: FILL (Pre-fill) ───────────── ⏸️ Not Timed
Phase 3: BENCHMARK (Stress) ────────── ⏱️ Timed Phase
Phase 4: Verification ──────────────── ⏸️ Not Timed
7.2 Rust Implementation Spec
#![allow(unused)]
fn main() {
// ✅ Correct: Pre-generate -> Then Execute
let (fill_commands, benchmark_commands) = generator.pre_generate_all();
// Phase 2: FILL (Not Timed)
for cmd in &fill_commands {
exchange.execute(cmd);
}
// Phase 3: BENCHMARK (Timed Only)
let start = Instant::now();
for cmd in &benchmark_commands {
exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}
7.3 Pre-generation Interface
#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
/// Pre-generate all commands for fair benchmarking
pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
let fill_count = self.config.target_orders_per_side * 2;
let benchmark_count = self.config.symbol_messages;
let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
(fill, benchmark)
}
}
}
7.4 Current Status vs ME Requirements
| Task | Current | Needs ME |
|---|---|---|
| Pre-gen Method pre_generate_all() | ✅ | - |
| Generate 3M orders to memory | ✅ | - |
| Export CSV for verification | ✅ | - |
| Execute FILL Phase | - | ✅ |
| Execute BENCHMARK Phase | - | ✅ |
| Global Balance Verification | - | ✅ |
8. Phase 0x14-a Summary
8.1 Completed Components
| Component | Status | Verification |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | Bit-exact with Java |
| Seed Derivation | ✅ | Objects.hash reproduction |
| TestOrdersGenerator | ✅ | FILL 1000 rows 100% matched |
| Shadow OrderBook | ✅ | IOC Simulation implemented |
| Pre-gen Interface | ✅ | pre_generate_all(), pre_generate_3m() |
| Fair Test Procedure Docs | ✅ | Section 7, Appendix B |
8.2 BENCHMARK Phase Gap Analysis
| Cause | Description |
|---|---|
| Matching Engine Feedback | Java uses lastOrderBookOrdersSizeAsk/Bid to decide growOrders. |
| Impact | Command type distribution (GTC vs IOC) differs slightly. |
| Solution | Phase 0x14-b introduces full ME to reach 100% parity. |
8.3 Next Steps
| Priority | Task | Dependency |
|---|---|---|
| P0 | Implement Rust Matching Engine (Phase 0x14-b) | - |
| P1 | 3M Orders Stress Test Verification | Matching Engine |
| P2 | Latency Stats (HdrHistogram) | Matching Engine |
🇨🇳 中文
| 状态 | ✅ 已实施 / QA 验证通过 (Phase 0x14-a 完成) |
|---|---|
| 日期 | 2025-12-30 |
| 上下文 | Phase V: 极致优化 (Step 1) |
| 目标 | 用 Rust 重新实现 Exchange-Core 测试数据生成算法,并对比黄金数据验证正确性。 |
1. 章节目标
| # | 目标 | 交付物 |
|---|---|---|
| 1 | 实现 LCG PRNG | src/bench/java_random.rs - Java 兼容随机数生成器 |
| 2 | 实现订单生成器 | src/bench/order_generator.rs - 确定性订单序列 |
| 3 | 验证正确性 | 单元测试对比生成数据与 golden_*.csv |
成功标准: 生成的数据与黄金 CSV 逐字节匹配(每行的 order_id, price, size, uid 完全一致)。
2. 参考算法: LCG PRNG
Exchange-Core 项目使用 Java 的 java.util.Random 作为 PRNG。我们必须实现一个比特级精确的副本。
2.1 Java Random Implementation
#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
seed: u64,
}
impl JavaRandom {
const MULTIPLIER: u64 = 0x5DEECE66D;
const ADDEND: u64 = 0xB;
const MASK: u64 = (1 << 48) - 1;
pub fn new(seed: i64) -> Self {
Self {
seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
}
}
fn next(&mut self, bits: u32) -> i32 {
self.seed = self.seed
.wrapping_mul(Self::MULTIPLIER)
.wrapping_add(Self::ADDEND) & Self::MASK;
(self.seed >> (48 - bits)) as i32
}
pub fn next_int(&mut self, bound: i32) -> i32 {
    assert!(bound > 0);
    if (bound & bound.wrapping_neg()) == bound {
        // bound is a power of two: take the high bits directly
        return ((bound as i64 * self.next(31) as i64) >> 31) as i32;
    }
    loop {
        let bits = self.next(31);
        let val = bits % bound;
        // Reject samples that would bias the modulo, mirroring
        // Java's signed-overflow retry: `bits - val + (bound-1) < 0`
        if bits.wrapping_sub(val).wrapping_add(bound - 1) >= 0 {
            return val;
        }
    }
}
pub fn next_long(&mut self) -> i64 {
    // wrapping_add matches Java's overflowing 64-bit addition
    ((self.next(32) as i64) << 32).wrapping_add(self.next(32) as i64)
}
pub fn next_double(&mut self) -> f64 {
let a = (self.next(26) as u64) << 27;
let b = self.next(27) as u64;
(a + b) as f64 / ((1u64 << 53) as f64)
}
}
}
2.2 Seed Derivation
Each test session derives its seed from symbol_id and benchmark_seed:
#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
let mut hash: i64 = 1;
hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
hash
}
}
3. 黄金数据参考
位置: docs/exchange_core_verification_kit/golden_data/
| 文件 | 记录数 | Seed | 描述 |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | 保证金(期货)合约 |
| golden_single_pair_exchange.csv | 11,000 | 1 | 现货交易 |
4. 实施清单
- 步骤 1: 创建 `src/bench/mod.rs`
- 步骤 2: 在 `src/bench/java_random.rs` 中实现 `JavaRandom`
  - 单元测试: 验证前 100 个随机数与 Java 输出匹配
- 步骤 3: 在 `src/bench/order_generator.rs` 中实现 `TestOrdersGenerator`
  - Pareto 分布用于用户权重
  - 订单生成逻辑 (FILL 阶段的 GTC 订单)
  - 使用 `Objects.hash` 公式进行种子派生
- 步骤 4: 加载并对比黄金 CSV
  - `#[test] fn test_golden_single_pair_margin()`
  - `#[test] fn test_golden_single_pair_exchange()`
5. 实现结果
Note
✅ FILL 阶段: 100% 比特精确匹配 (1,000 订单) ⚠️ BENCHMARK 阶段: 需要匹配引擎 (10,000 订单)
5.1 FILL 阶段 (行 1-1000)
| 字段 | 匹配状态 | 公式 |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4 值平均 |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto 用户账户生成 |
5.2 BENCHMARK 阶段分析
| 组件 | 状态 | 说明 |
|---|---|---|
| RNG 序列 | ✅ 已对齐 | nextInt(4) action 优先,然后 nextInt(q_range) |
| 订单选择 | ✅ 已对齐 | 使用 orderUids 迭代器 (BTreeMap 确定性) |
| IOC 模拟 | ✅ 已实现 | 影子订单簿 simulate_ioc_match |
| 订单簿反馈 | ❌ 缺口 | Java 使用真实匹配引擎反馈 lackOfOrders |
Important
BENCHMARK 阶段缺口: Java 的 `generateRandomOrder` 使用真实匹配引擎的 `lastOrderBookOrdersSizeAsk/Bid`(在 `updateOrderBookSizeStat` 中更新)。没有完整的 Rust 匹配引擎,影子订单簿会与 Java 状态分歧。
5.3 关键实现细节
- JavaRandom: 比特级精确的 `java.util.Random` LCG
- 种子派生: `Objects.hash(symbol*-177277, seed*10037+198267)`
- 用户账户: `1 + (int)paretoSample` 公式
- 货币顺序: `[978, 840]` 基于 HashMap bucket 索引
- CENTRAL_MOVE_ALPHA: `0.01` (不是 0.1)
- 影子订单簿: `ask_orders`/`bid_orders` Vec 支持 O(1) swap_remove
6. 验证命令
一键验证:
# 运行所有黄金数据验证测试
cargo test golden_ -- --nocapture
详细对比测试:
# 逐行对比前 20 个订单与黄金 CSV
cargo test test_generator_vs_golden_detailed -- --nocapture
所有 Benchmark 测试:
# 运行 bench 模块的所有测试
cargo test bench:: -- --nocapture
预期输出:
[ 1] ✅ | Golden: id=1, price=34386, size= 1, action=BID, uid=377
[ 2] ✅ | Golden: id=2, price=34135, size= 1, action=BID, uid=110
[ 3] ✅ | Golden: id=3, price=34347, size= 2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size= 1, action=BID, uid=491
7. 公平压测流程 (Fair Benchmark Procedure)
Important
公平比较的关键: 数据生成与执行必须分离。Java 在测试前预生成所有命令到内存。
7.1 四阶段分离
Phase 1: 数据预生成 ───────────── ⏸️ 不计时
Phase 2: FILL (预填充) ──────────── ⏸️ 不计时
Phase 3: BENCHMARK (压测) ──────── ⏱️ 仅此阶段计时
Phase 4: 验证 ────────────────── ⏸️ 不计时
7.2 Rust 实现规范
#![allow(unused)]
fn main() {
// ✅ 正确: 预生成 → 再执行
let (fill_commands, benchmark_commands) = generator.pre_generate_all();
// Phase 2: FILL (不计时)
for cmd in &fill_commands {
exchange.execute(cmd);
}
// Phase 3: BENCHMARK (仅此阶段计时)
let start = Instant::now();
for cmd in &benchmark_commands {
exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}
7.3 预生成接口
#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
/// Pre-generate all commands for fair benchmarking
pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
let fill_count = self.config.target_orders_per_side * 2;
let benchmark_count = self.config.symbol_messages;
let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
(fill, benchmark)
}
}
}
7.4 现阶段可完成 vs 需要 ME 集成
| 任务 | 现阶段 | 需 ME |
|---|---|---|
| 预生成接口 pre_generate_all() | ✅ | - |
| 生成 3M 订单到内存 | ✅ | - |
| 导出 CSV 供验证 | ✅ | - |
| 执行 FILL 阶段 | - | ✅ |
| 执行 BENCHMARK 计时 | - | ✅ |
| 全局余额验证 | - | ✅ |
8. Phase 0x14-a 总结
8.1 已完成组件
| 组件 | 状态 | 验证 |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | 与 Java 比特精确 |
| 种子派生算法 | ✅ | Objects.hash 复现 |
| TestOrdersGenerator | ✅ | FILL 1000 行 100% 匹配 |
| 影子订单簿 | ✅ | IOC 模拟实现 |
| 预生成接口 | ✅ | pre_generate_all(), pre_generate_3m() |
| 公平测试流程文档 | ✅ | Section 7, Appendix B |
8.2 BENCHMARK 阶段差异分析
| 原因 | 说明 |
|---|---|
| 匹配引擎反馈 | Java 使用 lastOrderBookOrdersSizeAsk/Bid 决定 growOrders |
| 影响 | 命令类型分布略有不同(GTC vs IOC 比例) |
| 解决方案 | Phase 0x14-b 实现完整匹配引擎后可达 100% |
8.3 下一步
| 优先级 | 任务 | 依赖 |
|---|---|---|
| P0 | 实现 Rust 匹配引擎 (Phase 0x14-b) | - |
| P1 | 3M 订单压测验证 | 匹配引擎 |
| P2 | 延迟统计 (HdrHistogram) | 匹配引擎 |
0x14-b Order Commands: Feature Completion
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ COMPLETED |
|---|---|
| Context | Phase V: Extreme Optimization (Step 2) |
| Goal | Achieve feature parity with Exchange-Core’s Spot Matching Engine to support the Benchmark harness. |
| Scope | Spot Only. Margin/Futures deferred to 0x14-c. |
1. Gap Analysis
Based on code review of src/engine.rs, src/models.rs, src/orderbook.rs:
✅ Already Implemented
| Feature | Location | Notes |
|---|---|---|
| MatchingEngine | src/engine.rs | process_order(), match_buy(), match_sell() |
| Price-Time Priority | engine.rs:80-165 | Lowest ask first (buy), highest bid first (sell), FIFO |
| Limit Orders | engine.rs:61-68 | Unfilled remainder rests in book |
| Market Orders | engine.rs:90-94 | u64::MAX price for buy, matches all |
| Order Status | models.rs:57-68 | NEW, PARTIALLY_FILLED, FILLED, CANCELED, REJECTED, EXPIRED |
| OrderBook | orderbook.rs | BTreeMap storage, cancel_order() by ID+price+side |
❌ Missing (Required for 0x14-b)
Based on exchange_core_verification_kit/test_datasets_and_steps.md L162-171 (Command Distribution):
| Feature | Benchmark % | Current Status | Priority |
|---|---|---|---|
| IOC (Immediate-or-Cancel) | ~35% | ❌ Not Implemented | P0 |
| MoveOrder | ~8% | ❌ Not Implemented | P0 |
| ReduceOrder | ~3% | ❌ Not Implemented | P1 |
| FOK_BUDGET | ~1% | ❌ Not Implemented | P2 |
Note: FOK_BUDGET (Fill-or-Kill by Quote Budget) is ~1% of benchmark commands. Required for full S-to-Huge parity.
2. Architectural Requirements
2.1 Data Model Extensions (Schema)
We must extend InternalOrder to support varied execution strategies without polluting the core OrderType.
New Enum: TimeInForce
#![allow(unused)]
fn main() {
pub enum TimeInForce {
GTC, // Good Till Cancel (Default)
IOC, // Immediate or Cancel (Taker only, cancel remainder)
FOK, // Fill or Kill (All or Nothing) - Optional for now
}
}
Updated InternalOrder:
- Add `pub time_in_force: TimeInForce`
- Add `pub post_only: bool` (future-proofing; the generator doesn’t strictly use it yet, but it is good practice)
2.2 Matching Engine Logic
The Matching Engine must process orders sequentially based on seq_id.
Execution Flow:
- Incoming Order: Parse `TimeInForce` and `OrderType`.
- Matching:
  - Limit GTC: Match against opposite book. Remainder -> add to book.
  - Limit IOC: Match against opposite book. Remainder -> expire (do not add to book).
  - Market: Match against opposite book at any price. Remainder -> expire (or defined slippage protection).
- Command Handling:
  - MoveOrder: Atomic “cancel old ID + place new ID”. Priority loss is acceptable (and expected).
  - ReduceOrder: Reduce qty in place. Priority preservation is required if implemented efficiently, else re-insert. Exchange-Core typically preserves priority on reduce.
2.3 FokBudget Handling (Spot)
- Does the generator produce `FokBudget`? -> Checks show mostly `Gtc`/`Ioc`.
- Correction: `CommandType::FokBudget` exists in the generator enum, but its usage is rare in the Spot benchmark. We prioritize IOC and GTC.
3. Developer Specification
3.1 Task List
- Model Update:
  - Modify `src/models.rs`: add `TimeInForce` enum.
  - Update the `InternalOrder` struct.
- Engine Implementation (`src/engine/matching.rs`):
  - Implement `process_order(&mut self, order: InternalOrder) -> OrderResult`.
  - Implement `match_market_order`.
  - Implement `match_limit_order`.
- Command Logic:
  - Implement `reduce_order(price, old_qty, new_qty)`.
  - Implement `move_order` (atomic cancel + place).
3.2 Acceptance Criteria
- Unit Tests:
  - `test_ioc_partial_fill`: 100 qty order vs 60 qty book -> 60 filled, 40 expired.
  - `test_gtc_maker`: 100 qty order vs empty book -> 100 rests in book.
  - `test_market_sweep`: Market order consumes multiple price levels.
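The first two acceptance tests can be illustrated with a toy model of remainder handling. All types here are illustrative sketches, not the project's actual engine API:

```rust
// Toy sketch of the IOC-vs-GTC remainder semantics: an IOC taker for
// qty 100 against 60 resting fills 60 and expires the 40 remainder,
// while a GTC order against an empty book rests in full.
#[derive(PartialEq, Debug)]
enum TimeInForce {
    Gtc,
    Ioc,
}

struct Book {
    resting_qty: u64,
}

/// Returns (filled, rested, expired).
fn take(book: &mut Book, qty: u64, tif: TimeInForce) -> (u64, u64, u64) {
    let filled = qty.min(book.resting_qty);
    book.resting_qty -= filled;
    let remainder = qty - filled;
    match tif {
        TimeInForce::Gtc => (filled, remainder, 0), // remainder rests in book
        TimeInForce::Ioc => (filled, 0, remainder), // remainder expires
    }
}

fn main() {
    // test_ioc_partial_fill scenario
    let mut book = Book { resting_qty: 60 };
    assert_eq!(take(&mut book, 100, TimeInForce::Ioc), (60, 0, 40));

    // test_gtc_maker scenario
    let mut empty = Book { resting_qty: 0 };
    assert_eq!(take(&mut empty, 100, TimeInForce::Gtc), (0, 100, 0));
    println!("IOC/GTC remainder semantics hold");
}
```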
4. QA Verification Plan
- Property: `Ioc` orders must never appear in `all_orders()` (the book) after processing.
- Property: `Gtc` orders must appear in the book if not fully matched.
- Latency: measured `process_order` time is ✅ < 5µs (verified).
5. Implementation Status & Results
Note
✅ Phase 0x14-b: 100% Feature Parity Achieved
5.1 Verification Matrix
| Module | Purpose | Tests | Status |
|---|---|---|---|
| IOC Logic | Immediate-or-Cancel (Taker) | 9/9 | ✅ |
| MoveOrder | Price modification (Atomic) | 7/7 | ✅ |
| ReduceOrder | Qty reduction (Priority Preserved) | 5/5 | ✅ |
| Persistence | Settlement & DB Sync | 5/5 | ✅ |
| Edge Cases | Robustness & Error Handling | 17/17 | ✅ |
| Total | | 43/43 | ✅ 100% |
5.2 Key Technical Findings
- Asynchronous Consistency: Fixed a critical bug where Cancel/Reduce actions bypassed the `MEResult` persistence queue.
- Priority Preservation: Verified that `ReduceOrder` maintains temporal priority, while `MoveOrder` (price change) correctly resets it.
- Reactive Loop: Optimized the matching engine to handle state transitions without synchronous blocking on I/O.
6. Validation Commands
Automated QA Suite:
# Run all 0x14-b specific QA tests
./scripts/test_0x14b_qa.sh --with-gateway
Unit Verification:
cargo test test_ioc_ test_mov_ test_reduce_
🇨🇳 中文
| 状态 | ✅ 已完成 |
|---|---|
| 上下文 | Phase V: 极致优化 (Step 2) |
| 目标 | 实现与 Exchange-Core 现货撮合引擎的功能对齐,以支持基准测试工具。 |
| 范围 | 仅现货。杠杆/期货推迟至 0x14-c。 |
1. 差距分析 (基于 Verification Kit)
基于 exchange_core_verification_kit/test_datasets_and_steps.md L162-171 命令分布:
✅ 已实现
| 功能 | 基准占比 | 说明 |
|---|---|---|
| GTC 限价单 | ~45% | engine.rs::process_order() |
| Cancel 取消 | ~9% | 完整链路: Gateway → Pipeline → OrderBook → WAL |
❌ 需新增
| 功能 | 基准占比 | 优先级 |
|---|---|---|
| IOC 即时单 | ~35% | P0 |
| Move 移动 | ~8% | P0 |
| Reduce 减量 | ~3% | P1 |
| FOK_BUDGET | ~1% | P2 |
说明: FOK_BUDGET (按报价币金额买入) 占比 ~1%,完成 S-to-Huge 全量测试需实现。
2. 架构需求
2.1 数据模型扩展 (Schema)
必须扩展 InternalOrder 以支持多种执行策略。
新枚举: TimeInForce
#![allow(unused)]
fn main() {
pub enum TimeInForce {
GTC, // Good Till Cancel (默认: 一直有效直到取消)
IOC, // Immediate or Cancel (Taker 专用: 剩余未成交部分立即过期)
FOK, // Fill or Kill (全部成交或全部取消) - 暂可选
}
}
更新 InternalOrder:
- 新增 `pub time_in_force: TimeInForce`
- 新增 `pub post_only: bool`(为未来准备,虽然生成器暂时未严格使用)
2.2 撮合引擎逻辑
撮合引擎必须基于 seq_id 顺序处理订单。
执行流:
- 新订单接入: 解析 `TimeInForce` 和 `OrderType`。
- 撮合过程:
  - Limit GTC: 与对手盘撮合。剩余部分 -> 加入订单簿。
  - Limit IOC: 与对手盘撮合。剩余部分 -> 立即过期 (Expire),不入簿。
  - Market: 与对手盘在任意价格撮合。剩余部分 -> 过期 (或滑点保护)。
- 指令处理:
  - MoveOrder: 原子化“取消旧 ID + 下单新 ID”。优先级丢失是可接受的 (且预期的)。
  - ReduceOrder: 原地减少数量。如果实现得当,应保留优先级。Exchange-Core 通常在减量时保留优先级。
2.3 FokBudget 处理 (现货)
- 生成器会产生 `FokBudget` 吗? -> 代码显示主要是 `Gtc`/`Ioc`。
- 修正: `CommandType::FokBudget` 存在于枚举中,但在现货 Benchmark 中极少使用。我们优先保证 IOC 和 GTC 的正确性。
3. 开发规范 (Developer Specification)
3.1 任务清单
- 模型更新:
  - 修改 `src/models.rs`: 增加 `TimeInForce` 枚举。
  - 更新 `InternalOrder` 结构体。
- 引擎实现 (`src/engine/matching.rs`):
  - 实现 `process_order(&mut self, order: InternalOrder) -> OrderResult`。
  - 实现 `match_market_order`(市价撮合)。
  - 实现 `match_limit_order`(限价撮合)。
- 指令逻辑:
  - 实现 `reduce_order(price, old_qty, new_qty)`。
  - 实现 `move_order`(atomic cancel + place)。
3.2 验收标准
- 单元测试:
  - `test_ioc_partial_fill`: 100 qty 订单 vs 60 qty 深度 -> 成交 60, 过期 40。
  - `test_gtc_maker`: 100 qty 订单 vs 空订单簿 -> 100 进入 OrderBook。
  - `test_market_sweep`: 市价单吃掉多个价格档位。
4. QA 验证计划
- 属性: `Ioc` 订单处理后,绝不应出现在 `all_orders()`(订单簿)中。
- 属性: `Gtc` 订单若未完全成交,必须出现在订单簿中。
- 延迟: 测量 `process_order` 处理时间,✅ < 5µs (已验证)。
5. 实施结果与验证
Note
✅ Phase 0x14-b: 100% 功能对齐已完成
5.1 验证矩阵
| 模块 | 目的 | 测试项 | 状态 |
|---|---|---|---|
| IOC 逻辑 | 立即成交或取消 (Taker) | 9/9 | ✅ |
| MoveOrder | 改价指令 (原子化) | 7/7 | ✅ |
| ReduceOrder | 减量指令 (保留优先级) | 5/5 | ✅ |
| 持久化 | 结算与数据库同步 | 5/5 | ✅ |
| 边界测试 | 鲁棒性与错误处理 | 17/17 | ✅ |
| 合计 | | 43/43 | ✅ 100% |
5.2 关键技术点总结
- 异步一致性: 修复了 Cancel/Reduce 操作绕过 `MEResult` 持久化队列的 Bug,确保数据库状态与内存一致。
- 优先级保留: 通过单元测试验证了 `ReduceOrder` 成功保留时间优先级,而 `MoveOrder` (改价) 正确重置了优先级。
- 响应式架构: 优化了撮合引擎的反应循环,确保所有指令都在微秒级完成且具备确定性的副作用路径。
6. 验证命令
一键回归测试:
# 运行所有 0x14-b QA 自动化测试
./scripts/test_0x14b_qa.sh --with-gateway
单元逻辑验证:
cargo test test_ioc_ test_mov_ test_reduce_
0x13 CPU Affinity & Cache
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Planned
Core Objective: Pin threads to CPU cores and optimize data layout for cache locality.
1. Overview
- CPU Affinity: Bind matching threads to isolated cores to reduce context switching.
- Cache Locality: Optimize `OrderBook` node layout to fit L1/L2 cache lines.
- False Sharing: Pad atomic variables to prevent cache-line contention.
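As a preview of the false-sharing fix, the usual pattern is to align each hot counter to its own cache line (64 bytes is an assumption here; some CPUs prefetch lines in 128-byte pairs):

```rust
// Sketch of cache-line padding to avoid false sharing between two
// counters updated by different threads.
use std::sync::atomic::AtomicU64;

#[allow(dead_code)]
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

#[allow(dead_code)]
struct Stats {
    produced: PaddedCounter, // writer thread's counter: its own cache line
    consumed: PaddedCounter, // reader thread's counter: its own cache line
}

fn main() {
    // Each padded counter occupies (at least) a full 64-byte line, so
    // updates to one never invalidate the other's line.
    assert_eq!(std::mem::align_of::<PaddedCounter>(), 64);
    assert!(std::mem::size_of::<Stats>() >= 128);
    println!("counters sit on separate cache lines");
}
```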
(Detailed content coming soon in Phase III)
🇨🇳 中文
📅 状态: 🚧 计划中
核心目标: 主要线程绑核与缓存友好性优化。
1. 概述
- CPU 亲和性 (Affinity): 将撮合线程绑定到隔离核心,减少上下文切换。
- 缓存局部性 (Locality): 优化 `OrderBook` 节点布局以适应 L1/L2 缓存行。
- 伪共享 (False Sharing): 通过 Padding 避免多线程竞争同一缓存行。
(第三阶段详细内容敬请期待)
0x14 SIMD Matching Acceleration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Planned
Core Objective: Use SIMD (AVX2/AVX-512) instructions to accelerate order matching.
1. Overview
- Vectorization: Process multiple price levels in parallel.
- Intrinsics: Direct use of Rust `std::arch` intrinsics.
- Benchmark: Aiming for > 5M TPS.
(Detailed content coming soon in Phase III)
🇨🇳 中文
📅 状态: 🚧 计划中
核心目标: 使用 SIMD (AVX2/AVX-512) 指令集加速订单撮合。
1. 概述
- 向量化 (Vectorization): 并行处理多个价格档位。
- Intrinsics: 直接使用 Rust `std::arch` 内联函数。
- 基准目标: 目标吞吐量 > 500 万 TPS。
(第三阶段详细内容敬请期待)
0x17 SIMD Matching Acceleration
Status: Planned
This chapter will cover SIMD (Single Instruction Multiple Data) vectorized matching using AVX-512 or ARM NEON instructions.
Coming soon…
Performance Report
Generated: 2025-12-31 04:48:47
Summary
| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 0 | 100,000 | - |
| Trades | 0 | 0 | - |
| Exec Time | 0.00ms | 119.10ms | +0.0% |
| Throughput | 0/s | 839,618/s | +0.0% |
Timing Breakdown
| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 0ns | 0K | 0.0% |
| Matching Engine | 23.42ms | 4.3M | 100.0% |
| Settlement | 0ns | 0K | 0.0% |
| Ledger I/O | 0ns | 0K | 0.0% |
Latency Percentiles
| Percentile | Value |
|---|---|
| MIN | 150ns |
| AVG | 1.1µs |
| P50 | 1.6µs |
| P99 | 4.2µs |
| P99.9 | 11.1µs |
| MAX | 960.5µs |
Verdict
✅ No significant regressions detected
Performance History
性能报告历史存档。每个重要章节完成后生成一份报告。
报告列表
| 日期 | 章节 | 关键变化 |
|---|---|---|
| 2025-12-18 | 0x08h | 服务化重构,ME占76.6%,1.3M数据集 |
| 2025-12-16 | 0x07b | 性能基线建立,Ledger I/O 占 98.5% |
命名规范
YYYY-MM-DD-章节.md
例如:2025-12-16-0x07b.md
如何生成报告
# 1. 运行性能测试
cargo run --release
# 2. 生成报告
python3 scripts/generate_perf_report.py > docs/src/perf-report.md
# 3. 存档历史
cp docs/src/perf-report.md docs/src/perf-history/$(date +%Y-%m-%d)-章节.md
# 4. 更新此索引文件,添加新条目
# 5. 提交
git add docs/src/perf-report.md docs/src/perf-history/
git commit -m "docs: Update perf report"
Performance Report - 2025-12-18 0x08-h
Branch: 0x08-h-performance-monitoring
Dataset: 1.3M orders (30% cancels, high-balance mode)
Changes: Service-oriented refactoring (IngestionService, UBSCoreService, MatchingService, SettlementService)
Summary
| Metric | Single-Thread | Multi-Thread |
|---|---|---|
| Orders | 1,300,000 | 1,300,000 |
| Trades | 667,567 | 667,567 |
| Exec Time | 14.18s | 20.17s |
| Throughput | 91,710/s | 64,450/s |
| P50 Latency | 2.5 µs | 113 ms |
Multi-Thread Breakdown
| Component | Time | % | Latency/op |
|---|---|---|---|
| Matching Engine | 19.23s | 76.6% | 19.23 µs |
| Persistence | 5.35s | 21.3% | 4.12 µs |
| Settlement | 0.51s | 2.0% | 0.76 µs |
Key Changes
- Extracted 4 service structs from spawn functions
- Reduced `pipeline_mt.rs` from 720 to ~250 lines
- Added `pipeline_services.rs` (~640 lines)
- All tests pass with exact trade count match
Verdict
✅ Correctness Verified: 667,567 trades, 0 balance differences
Performance Report
Generated: 2025-12-16 18:16:36
Summary
| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 100,000 | 100,000 | - |
| Trades | 47,886 | 47,886 | - |
| Exec Time | 3753.87ms | 3956.64ms | +5.4% |
| Throughput | 26,639/s | 25,274/s | -5.1% |
Timing Breakdown
| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 17.64ms | 5.7M | 0.4% |
| Matching Engine | 36.37ms | 2.7M | 0.9% |
| Settlement | 4.71ms | 21.2M | 0.1% |
| Ledger I/O | 3.88s | 26K | 98.5% |
Latency Percentiles
| Percentile | Value |
|---|---|
| MIN | 125ns |
| AVG | 38.6µs |
| P50 | 625ns |
| P99 | 429.7µs |
| P99.9 | 1.37ms |
| MAX | 7.25ms |
Verdict
❌ 2 regression(s) detected
- Exec Time: +5.4%
- Throughput: -5.1%
开发规范 (Development Guidelines)
Core Principle: Standardize environments to eliminate “works on my machine” issues.
🐍 Python Environment
We use uv for strict dependency management and execution speed.
1. The Golden Rule
NEVER use system python3 or pip directly for project scripts.
ALWAYS use uv run to execute scripts.
2. Standard Workflow
# 1. Sync dependencies (like npm install)
uv sync
# 2. Run script (like npm run)
uv run python3 scripts/my_script.py
3. Adding Dependencies
# Add new package
uv add requests
🦀 Rust Environment
- Format: `cargo fmt` must pass.
- Lint: `cargo clippy` must pass (no warnings).
- Tests: `cargo test` must pass.
API 规范 (API Conventions)
ID 规范 (ID Specification)
命名规范 (Naming Convention)
Money Type Safety Standard | 资金类型安全规范
Version: 1.3 | Last Updated: 2025-12-31
本文件定义了本项目处理资金(余额、订单金额、成交价格)的治理方案。 重点是:如何在代码层面禁止不符合规范的操作。 任何违反本规范的代码不得合并。
Part I: 背景与设计决策
1.1 核心风险
金额是领域概念,不是原始类型。
在任何金融系统中,“钱”都不应被视为一个裸露的整数。它是一个携带精度语义的领域对象——1 BTC 内部表示为 100_000_000 聪,这个 10^8 的缩放因子是资产的内在属性,而非程序员的临时决定。
当开发者在代码中随意写下 amount * 10u64.pow(8) 时,他实际上在破坏这层抽象,将领域逻辑泄漏到业务代码的每一个角落。这会导致:
| 风险类型 | 后果 |
|---|---|
| 账本无法对齐 | 任何微小误差都会破坏“资金恒等定理”,导致无法 100% 精确对账。我们无法区分“正常误差”还是“真正的 Bug”。 |
| 语义错误 | 错误地将 BTC 金额与 USDT 金额直接相加。 |
| 溢出攻击 | 恶意构造的超大数值导致系统崩溃或资金错算。 |
| 维护噩梦 | 转换逻辑复杂,到处重复写必然到处犯错。 |
1.2 为什么选择 u64 + 内部缩放?
前置阅读: 关于浮点数的问题,请参阅 0x02 浮点数的诅咒,此处不再重复。
核心结论:
- `f64` 无法满足跨平台确定性(不同 CPU/编译器结果可能不同)。
- `Decimal` 无法满足极致性能(比 `u64` 慢 10x+)。
- `u64` 是唯一能同时满足“区块链级验证强度”和“高频撮合性能”的方案。
但 u64 需要内部缩放,这引入了复杂性。因此我们必须:
- 将缩放算法封装在 `money.rs` 中。
- 严禁在其他地方手工进行缩放运算。
1.3 内部缩放方案:如何实现大额处理?
核心机制:我们为每种资产定义系统精度(通常 8 位),而非使用链上原生精度(如 ETH 的 18 位)。
| 资产 | 链上精度 | 系统精度 | u64 最大可处理金额 |
|---|---|---|---|
| BTC | 8 位 | 8 位 | 1844 亿 BTC (远超总供应量) |
| ETH | 18 位 | 8 位 | 1844 亿 ETH ✅ |
| USDT | 6 位 | 6 位 | 18.4 万亿 USDT ✅ |
Important
精度权衡:使用 8 位系统精度意味着 ETH 最小单位是 0.00000001 ETH(10 gwei),而非链上的 1 wei。对于交易所场景,这完全足够——没有人会交易 1 wei 的 ETH。
这就是为什么“缩放”必须封装:
- 不同资产有不同的链上精度和系统精度。
- 入金时:链上精度 → 系统精度(可能截断极小尾数)。
- 出金时:系统精度 → 链上精度(补零)。
- 这套转换逻辑复杂,必须集中管理,严禁各处手写。
Tip
u128 的替代方案:如果不追求极致性能,使用 u128 可以直接采用统一的 18 位精度,避免不同资产间的精度转换问题。但这会牺牲约 10-20% 的撮合性能。
Part II: 解决方案与决策 (Solutions & Decisions)
2.1 类型安全:Newtype 守卫 (The Newtype Guardian)
问题: u64 是原生类型,开发者可以轻易写出 amount * 10u64.pow(8)。
方案: 引入不透明的包装类型 ScaledAmount(u64):
- 内部字段 `u64` 是 private 的,无法直接访问。
- 所有构造必须通过 `money.rs` 提供的经过审计的 Constructor。
- 如果有人想“私自计算”,必须先解包 (`to_raw()`),这种“不自然”的操作在 Code Review 中一眼可见。
#![allow(unused)]
fn main() {
// 🛡️ 核心类型定义
pub struct ScaledAmount(u64); // 无符号:余额、订单数量
pub struct ScaledAmountSigned(i64); // 有符号:盈亏、差额
}
已实现:
- `ScaledAmount` / `ScaledAmountSigned` 定义
- `checked_add` / `checked_sub` 安全算术
- `Deref<Target = u64>` 允许比较,但禁止直接算术
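上述守卫可以用一个自包含的小例子演示(示意代码,字段与方法名以项目 `money.rs` 的实际实现为准):

```rust
// 示意: 私有 u64 字段 + checked 算术 + 通过 Deref 允许比较。
use std::ops::Deref;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ScaledAmount(u64); // 内部字段 private,外部无法直接构造

impl ScaledAmount {
    pub(crate) fn from_raw(v: u64) -> Self { Self(v) } // 仅 crate 内部可构造
    pub fn to_raw(self) -> u64 { self.0 }              // 显式“逃逸”,审查信号
    pub fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(Self)
    }
    pub fn checked_sub(self, rhs: Self) -> Option<Self> {
        self.0.checked_sub(rhs.0).map(Self)
    }
}

impl Deref for ScaledAmount {
    type Target = u64;
    fn deref(&self) -> &u64 { &self.0 }
}

fn main() {
    let a = ScaledAmount::from_raw(150_000_000); // 8 位精度下的 1.5
    let b = ScaledAmount::from_raw(50_000_000);  // 0.5
    assert_eq!(a.checked_add(b), Some(ScaledAmount::from_raw(200_000_000)));
    assert_eq!(b.checked_sub(a), None); // 下溢被显式暴露,绝不静默回绕
    assert!(*a > 0);                    // 通过 Deref 比较是允许的
    assert_eq!(a.to_raw(), 150_000_000);
    println!("newtype guard holds");
}
```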
2.2 访问控制:入口收缩 (Visibility Chokepoint)
问题: 如果底层函数 parse_amount(str, decimals) 是 pub 的,开发者会倾向于直接使用它。
方案: 将 Layer 1 工具函数收缩为 pub(crate):
| 可见性 | 函数 | 用途 |
|---|---|---|
| pub(crate) | parse_amount, format_amount | 仅限 money.rs 和核心模块内部使用 |
| pub | SymbolManager::parse_qty(), SymbolManager::format_price() | 外部唯一入口 |
效果: 在代码自动补全时,开发者首先看到的是 SymbolManager 上的高层方法。
已实现:
- `parse_amount` / `format_amount` 改为 `pub(crate)`
2.3 分层架构 (Layered Architecture)
| 层级 | 组件 | 职责 | 可见性 |
|---|---|---|---|
| Layer 1 (Core) | money.rs | 原子类型定义与底层缩放 | pub(crate) |
| Layer 2 (Domain) | Asset / AssetInfo | 感知资产精度,提供意图封装 API | pub |
| Layer 3 (Integration) | SymbolManager / MoneyFormatter | 交易对级别的转换与批量格式化 | pub |
Tip
扩展性: `MoneyFormatter` 目前服务于深度图。随着 Kline/Ticker 复杂化,此模式可推广至所有行情展示。
2.4 铁律:意图封装 API (Intent-based API)
Caution
业务代码禁止直接调用 `money::` 函数。必须使用 `Asset` / `AssetInfo` 提供的意图封装 API。
问题:直接调用底层函数暴露实现细节
#![allow(unused)]
fn main() {
// ❌ 错误:暴露了 decimals 参数,调用者需要知道内部实现
let amount_scaled = *money::parse_decimal(amount, asset.decimals as u32)?;
}
解决方案:在 Asset / AssetInfo 上提供意图封装方法
#![allow(unused)]
fn main() {
// ✅ 正确:调用者只需表达意图,不需要知道 decimals
let amount_scaled = asset.parse_amount(amount)?;
let fee_scaled = asset.parse_amount_allow_zero(fee)?;
}
设计架构:
┌─────────────────────────────────────────────────────────────────┐
│ 业务代码 (deposit.rs, withdraw.rs, order.rs...) │
│ ✅ asset.parse_amount(decimal) │
│ ✅ asset.parse_amount_allow_zero(decimal) │
│ ✅ asset.format_amount(scaled_amount) │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 意图封装层 (Asset / AssetInfo) │
│ 封装 decimals 参数,提供"类型 → 类型"的简洁 API │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 核心转换层 (money.rs) │
│ parse_decimal() / parse_decimal_allow_zero() / format_amount() │
│ ⚠️ pub(crate) - 仅供意图封装层调用 │
└─────────────────────────────────────────────────────────────────┘
关键收益:
| 收益 | 说明 |
|---|---|
| 简洁性 | asset.parse_amount(d) vs money::parse_decimal(d, asset.decimals as u32) |
| 封装性 | 调用者不需要知道 decimals、display_decimals 等内部参数 |
| 一致性 | 所有业务代码使用相同的 API 模式 |
| 可审计性 | 直接 money:: 调用是需要审查的红旗 |
Part III: 内外边界与显示策略 (Internal/External Boundary & Display)
3.0 核心规范:内部实现绝不暴露
Caution
内部的 `u64` 表示是实现细节,绝对不能暴露给客户端。
强制规范:
- 统一转换层:内部系统与外部 Client 之间,必须经过统一的转换层。
- API 层使用 Decimal:DTO 中的金额字段使用 `StrictDecimal`(自定义类型),利用 `rust_decimal` 的格式验证能力。
- 分层验证:
  - Serde 层:格式验证(拒绝 `.5`、非数字等)→ 得到 `Decimal`
  - SymbolManager 层:精度/范围验证 → 得到 `ScaledAmount`
- 精度来源唯一:资产精度从 `SymbolManager` 获取,严禁硬编码。
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐
│ Client │ ──→ │ Serde 层 │ ──→ │SymbolManager │ ──→ │ Internal │
│ (String) │ │ (Decimal) │ │ (验证精度) │ │ (u64) │
└─────────────┘ └──────────────┘ └──────────────┘ └─────────────┘
"1.5" 格式验证 Decimal(1.5) 精度验证 ScaledAmount(150_000_000)
设计优势:
- 利用库能力:`rust_decimal` 提供成熟的数字解析
- 早期失败:格式错误在反序列化阶段就拦截
- 关注点分离:格式验证与精度验证分开处理
- 业务代码简化:Handler 拿到的 `Decimal` 已是合法数字,只需验证范围
3.1 截断是唯一合法的舍入策略
决策:所有转换、计算过程中的精度损失,一律使用截断(Truncation),不允许四舍五入。
原因:
- 一致性:与整数除法的行为一致(向零截断)。
- 可预测性:任何人在任何平台重算,结果完全一致。
- 安全性:宁愿少显示,也不能让用户认为自己拥有实际不存在的余额。
| 场景 | 策略 | 示例 |
|---|---|---|
| 入金转换 | 截断 | 链上 1.23456789012345678 ETH → 系统 1.23456789 ETH |
| 余额显示 | 截断 | 内部 123456789 → 显示 "1.2345" (4位显示精度) |
| 成交计算 | 截断 | 避免凭空产生资金 |
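入金时从链上精度截断到系统精度,本质上是一次向零的整数除法。下面是一个示意(`scale_down` 为假想的辅助函数,实际应封装在 `money.rs` 内):

```rust
// 入金缩放示意: 链上 18 位精度 → 系统 8 位精度,整数除法截断。
fn scale_down(on_chain: u128, from_decimals: u32, to_decimals: u32) -> u64 {
    assert!(from_decimals >= to_decimals);
    let factor = 10u128.pow(from_decimals - to_decimals);
    (on_chain / factor) as u64 // 向零截断,绝不四舍五入
}

fn main() {
    // 链上 1.23456789012345678 ETH (18 位) → 系统 1.23456789 ETH (8 位)
    let wei: u128 = 1_234_567_890_123_456_780;
    assert_eq!(scale_down(wei, 18, 8), 123_456_789);
    println!("truncated to 8-decimal system precision");
}
```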
3.2 严格解析:拒绝模糊输入
决策: 拒绝 .5 和 5. 等简写,强制要求 0.5 和 5.0。
原因:处理金额数据,严谨和安全是第一位的。模糊的输入格式可能导致:
- 手抖或脚本错误输入不完整数字
- 不同解析器对歧义格式有不同解读
- 隐蔽的精度丢失
行动项:
- 在 OpenAPI 文档和错误信息中明确提示此规范
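格式层的严格校验可以用一个纯函数示意(仅演示拒绝 `.5` / `5.` 的规则;函数名为假想,实际项目由 `StrictDecimal` 的反序列化实现,精度/范围验证仍由 SymbolManager 负责):

```rust
// 严格解析示意: 拒绝 ".5"、"5."、空串与多余小数点,
// 只接受 "0.5"、"5.0"、"5" 这类完整写法。
fn is_strict_decimal(s: &str) -> bool {
    let mut parts = s.splitn(2, '.');
    let int_part = parts.next().unwrap_or("");
    let frac_part = parts.next(); // None 表示没有小数点
    let all_digits = |p: &str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
    match frac_part {
        None => all_digits(int_part),                           // "5" 合法
        Some(frac) => all_digits(int_part) && all_digits(frac), // "0.5" 合法
    }
}

fn main() {
    assert!(is_strict_decimal("0.5"));
    assert!(is_strict_decimal("5.0"));
    assert!(is_strict_decimal("50000"));
    assert!(!is_strict_decimal(".5"));    // 缺少整数部分,拒绝
    assert!(!is_strict_decimal("5."));    // 缺少小数部分,拒绝
    assert!(!is_strict_decimal("1.2.3")); // 多余小数点,拒绝
    println!("strict decimal format checks pass");
}
```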
3.3 零值处理:默认严格 + 显式入口
问题:零值在某些场景是非法的(订单数量),在另一些场景是合法的(手续费)。
反模式:到处写 workaround
#![allow(unused)]
fn main() {
// ❌ 散落各处,维护噩梦
let fee = if fee_str == "0" {
ScaledAmount::ZERO
} else {
parse_amount(&fee_str, decimals)?
};
}
推荐模式:显式入口
#![allow(unused)]
fn main() {
// ===== 默认入口:严格,拒绝零 =====
/// 用于订单数量、价格等必须非零的场景
pub fn parse_amount(s: &str, decimals: u32) -> Result<ScaledAmount>
// ===== 显式入口:允许零 =====
/// 用于手续费等可能为零的场景
/// 调用者应该知道自己在做什么
pub fn parse_amount_allow_zero(s: &str, decimals: u32) -> Result<ScaledAmount>
}
使用示例:
#![allow(unused)]
fn main() {
// 订单数量:必须非零(使用默认严格版本)
let qty = symbol_mgr.parse_qty(symbol, &req.quantity)?;
// 提现手续费:可以为零(显式表达意图)
let fee = symbol_mgr.parse_fee_allow_zero(symbol, &req.fee)?;
}
设计原则:
| 原则 | 说明 |
|---|---|
| 成功之坑 (Pit of Success) | 默认行为是安全的,需要绕过时必须显式声明 |
| 意图可见 | 代码中看到 _allow_zero 就知道这里允许零 |
| Code Review 信号 | _allow_zero 调用是需要审查的信号 |
| 解析层不做业务判断 | 是否允许零由调用方通过选择入口决定 |
Part IV: How Do We Enforce This in Code?
Core question: how do we stop developers from converting ad hoc all over the codebase?
4.1 First Line of Defense: The Type System (Compile Time)
Newtype encapsulation: the inner field of `ScaledAmount(u64)` is private.

```rust
pub struct ScaledAmount(u64); // the u64 is not directly accessible

impl ScaledAmount {
    pub(crate) fn from_raw(v: u64) -> Self { Self(v) } // constructible only inside the crate
    pub fn to_raw(self) -> u64 { self.0 }              // the explicit "escape hatch"
}
```
Effects:
- ❌ `ScaledAmount::from_raw(100)` — cannot be called from external modules
- ❌ `amount.0` — the inner field is inaccessible
- ❌ `amount + 100u64` — type mismatch, fails to compile
- ✅ `*amount > 0` — comparison allowed via `Deref`
4.2 Second Line of Defense: Visibility Control (Shrinking the API Surface)
Layered isolation:
| Function | Visibility | Who can call it |
|---|---|---|
| `parse_amount()` | `pub(crate)` | Only `money.rs` and core modules |
| `format_amount()` | `pub(crate)` | Only `money.rs` and core modules |
| `SymbolManager::parse_qty()` | `pub` | Any module (the only legal entry point) |
| `SymbolManager::format_price()` | `pub` | Any module (the only legal entry point) |
Effects:
- In a Gateway handler, code completion only surfaces `SymbolManager` methods.
- A developer who tries to call the low-level `parse_amount()` will find it is not in scope.
4.3 Third Line of Defense: API-Layer Data Types (DTO Design)
Hard rule: amount fields in API requests and responses must use the `String` type.

```rust
// ✅ Correct: use String; the handler converts via SymbolManager
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: String, // "1.5"
    pub price: String,    // "50000.00"
}

// ❌ Wrong: raw u64 leaks the internal representation
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: u64, // how would a client know to send 150_000_000?
}
```
Serde does not convert automatically: if a client sends `"quantity": 1.5` (a JSON number), deserialization into `String` fails, forcing the client to send `"1.5"` (a JSON string).
4.4 Fourth Line of Defense: Automated CI Audit
Audit script: `scripts/audit_money_safety.sh`

```bash
#!/bin/bash
set -e
echo "🔍 Auditing money safety..."

# 1. Manual scaling outside money.rs
if grep -rn "10u64.pow" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found 10u64.pow outside money.rs"
    exit 1
fi

# 2. Manual Decimal power operations
if grep -rn "Decimal::from(10).powi" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found Decimal power operation outside money.rs"
    exit 1
fi

# 3. Hard-coded precision (optional; needs finer-grained rules)
# grep -rn "decimals.*=.*8" --include="*.rs" src/ | grep -v "symbol_manager.rs"

echo "✅ Money safety audit passed!"
```
Integration:
- `.github/workflows/ci.yml` — runs automatically on every PR
- `.git/hooks/pre-commit` — blocks locally before commit
4.5 Fifth Line of Defense: Code Review Signals
High-risk operation checklist (focus areas for PR review):
| Code pattern | Risk | Handling |
|---|---|---|
| `.to_raw()` | ⚠️ High | Must carry a comment explaining why |
| `10u64.pow` outside `money.rs` | 🚫 Forbidden | Reject the merge |
| Hard-coded `decimals: u32` | ⚠️ High | Should come from `SymbolManager` |
| `u64` amount fields in API DTOs | 🚫 Forbidden | Must be `String` |
| Raw arithmetic after `Deref` (`*a + *b`) | ⚠️ High | Use `checked_add` instead |
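The last row of the table can be demonstrated directly: raw `u64` addition wraps in release builds (and panics in debug), while `checked_add` surfaces overflow as a recoverable value. The `ScaledAmount` below is a minimal stand-in for the real newtype, not the project's implementation.

```rust
// Why `*a + *b` on Deref'd amounts is a review flag: raw u64 addition hides
// overflow, while checked_add makes it an explicit, handleable case.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ScaledAmount(u64);

impl ScaledAmount {
    fn checked_add(self, other: ScaledAmount) -> Option<ScaledAmount> {
        self.0.checked_add(other.0).map(ScaledAmount)
    }
}

fn main() {
    let a = ScaledAmount(u64::MAX - 1);
    // Overflow is detected, not silently wrapped:
    assert_eq!(a.checked_add(ScaledAmount(10)), None);
    // Non-overflowing addition works as expected:
    assert_eq!(a.checked_add(ScaledAmount(1)), Some(ScaledAmount(u64::MAX)));
    println!("ok");
}
```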
4.6 Sixth Line of Defense: Agent Memory (AGENTS.md)
In effect: this standard is on the AGENTS.md required-reading list. Every AI agent must read it before starting work, ensuring generated code complies.
Part V: Future Upgrade Path
| Phase | Goal | Status |
|---|---|---|
| Phase 0 | Newtype definitions, API surface shrink, documentation governance | ✅ Done |
| Phase 1 | Integrate `audit_money_safety.sh` into CI | ⏳ Pending |
| Phase 1.5 | API Money Enforcement: forced conversion via Extractor + IntoResponse | ⏳ Pending |
| Phase 2 | Full scan and migration of legacy code | ⏳ Pending |
| Phase 2.5 | Migrate legacy code to the intent-based API (details below) | ⏳ Pending |
Phase 2.5 Details: Legacy Code Migration
Goal: migrate all code that calls `money::` functions directly to the `Asset` / `AssetInfo` intent-based API.
Migration map:
| Old code | New code |
|---|---|
| `money::parse_decimal(d, asset.decimals as u32)` | `asset.parse_amount(d)` |
| `money::parse_decimal_allow_zero(d, asset.decimals as u32)` | `asset.parse_amount_allow_zero(d)` |
| `money::format_amount(amt, decimals, display)` | `asset.format_amount(amt)` |
Done:
- `src/funding/deposit.rs`
- `src/funding/withdraw.rs`
To migrate (scan the whole codebase for `money::parse` and `money::format` calls):
- Full sweep of the remaining business modules
- Add a CI check forbidding business code from calling `money::` functions directly
Summary: Why So Heavy?
Core principle 1: the ledger must reconcile 100%
If any precision error is tolerated, the ledger can never be fully reconciled. We lose the "conservation of funds" invariant (total deposits = total balances + total withdrawals) as an exact check. Once the ledger cannot align 100%, we can no longer tell whether a discrepancy is an "acceptable normal error" or a hidden bug. Real problems can hide behind "error" until they cause irreversible losses.
Core principle 2: conversion logic must converge to a single location
Amount conversion is genuinely complex (precision, rounding, overflow checks). If it may be re-implemented anywhere in the codebase, every site can make a different mistake. By converging conversion into one thoroughly audited and tested location (`money.rs` + `SymbolManager`), we can:
- Test that one location exhaustively (boundary values, overflow, negatives, and so on).
- Guarantee every caller the same safety.
- Fix a bug once and have the fix apply globally.
The Rules
- NO `10u64.pow()` outside `money.rs`.
- NO raw `u64` arithmetic for amounts.
- NO implicit scaling.
- YES `SymbolManager` for all intent-based conversions.
Quick Reference
| Scenario | ✅ Correct | ❌ Wrong |
|---|---|---|
| API DTO field | `quantity: StrictDecimal` | `quantity: u64` or `quantity: String` |
| Decimal → ScaledAmount | `symbol_mgr.decimal_to_scaled(symbol, decimal)` | hand-rolled `decimal * 10^8` |
| ScaledAmount → String | `symbol_mgr.format_price(symbol, amount)` | `format!("{}", amount)` |
| Getting precision | `symbol_mgr.get_decimals(asset)` | `let decimals = 8;` |
| Arithmetic | `amount.checked_add(other)?` | `*amount + *other` |
| Comparison | `*amount > 0` | ✅ allowed (via `Deref`) |
API Money Enforcement | Mandatory Money Types at the API Layer
Goal: ensure every API handler processes amounts through a single conversion layer; no ad-hoc conversion anywhere.
Scope: both directions — Request (in) and Response (out).
1. Problem Statement
The Gateway has many API handlers, and each must:
- Inbound: accept amount strings from JSON (e.g. `"1.5"`) and convert them to the internal `ScaledAmount`
- Outbound: format internal `ScaledAmount` values back into JSON strings for clients
Core challenge: how do we guarantee that every handler converts via `SymbolManager`, instead of rolling its own conversion logic?
2. Option Comparison
Option A: DTO + Explicit Validation Layer
Mechanism: the handler receives a raw DTO and calls a validation function manually.

```rust
// Request
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    // every handler must remember to call validate()
    let validated = symbol_mgr.validate_order(&req)?;
    // ...
}

// Response
async fn get_balance(...) -> Json<BalanceResponse> {
    let raw = service.get_balance(...)?;
    // every handler must remember to call format()
    Json(symbol_mgr.format_balance_response(&raw))
}
```
| Pros | Cons |
|---|---|
| Simple and direct | Relies on developer discipline; easy to miss |
| No extra types | Conversion logic scattered across handlers |
Option B: Service-Layer Encapsulation
Mechanism: handlers may only call service methods; the service converts internally.

```rust
// The handler only forwards the raw DTO
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    order_service.place(req).await // the service calls SymbolManager internally
}

async fn get_balance(...) -> Result<Json<BalanceResponse>> {
    Ok(Json(balance_service.get_formatted(...).await?)) // the service returns formatted data
}
```
| Pros | Cons |
|---|---|
| Business logic centralized | The service must still remember to call SymbolManager |
| Handlers stay thin | If the service forgets, the problem still occurs |
Option C: Axum Extractor + IntoResponse Pattern ⭐ Recommended
Mechanism: enforce conversion at the Axum framework layer.
Request side: a custom extractor

```rust
/// A validated order request; the handler receives ScaledAmount directly
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;

    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<RawOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        Ok(ValidatedOrder {
            symbol_id: raw.symbol_id,
            quantity: symbol_mgr.parse_qty(raw.symbol_id, &raw.quantity)?,
            price: symbol_mgr.parse_price(raw.symbol_id, &raw.price)?,
        })
    }
}

// The handler receives the validated type; there is no way around it
async fn place_order(order: ValidatedOrder) -> Result<impl IntoResponse> {
    // order.quantity is already a ScaledAmount; it cannot be an unconverted String
}
```
Response side: a custom IntoResponse

```rust
/// A formatted balance response; SymbolManager formatting happens automatically
pub struct FormattedBalanceResponse {
    pub balances: Vec<(AssetId, ScaledAmount)>,
    pub symbol_mgr: Arc<SymbolManager>,
}

impl IntoResponse for FormattedBalanceResponse {
    fn into_response(self) -> Response {
        let formatted: Vec<BalanceDto> = self.balances.iter()
            .map(|(asset, amount)| BalanceDto {
                asset: asset.to_string(),
                amount: self.symbol_mgr.format_asset_amount(*asset, *amount),
            })
            .collect();
        Json(formatted).into_response()
    }
}

// The handler returns the internal type; formatting happens in IntoResponse
async fn get_balances(State(state): State<AppState>) -> FormattedBalanceResponse {
    let balances = state.service.get_balances().await;
    FormattedBalanceResponse { balances, symbol_mgr: state.symbol_mgr.clone() }
}
```
| Pros | Cons |
|---|---|
| Enforced by the framework; handlers never see the raw String | Requires an extractor per request category |
| Compile-time guarantee | The extractor needs access to SymbolManager |
| Conversion logic fully centralized | Slightly higher upfront cost |
Option D: Type-Driven Design (Strictest)
Mechanism: define an "unvalidated" amount type that can only be consumed by `SymbolManager`.

```rust
/// An unvalidated amount; cannot be used directly
pub struct UnvalidatedAmount(String);

impl UnvalidatedAmount {
    // no .parse() method
    // no Deref<Target = String>
    // the only way out is to hand it to SymbolManager
}

impl SymbolManager {
    pub fn parse(&self, asset: AssetId, amount: UnvalidatedAmount) -> Result<ScaledAmount>;
}

// DTOs use the unvalidated type
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: UnvalidatedAmount, // .parse() is simply not available
}
```
| Pros | Cons |
|---|---|
| Fully locked down by the type system | Introduces more types |
| Won't compile even if you forget the call | Custom Serde deserialization adds some complexity |
3. Recommended Approach: StrictDecimal + Extractor
3.1 Core Design: Layered Validation
Client (JSON string "1.5")
  ↓ Serde: StrictDecimal custom deserialization
API DTO (StrictDecimal)   ← format validated
  ↓ Extractor: SymbolManager.decimal_to_scaled()
Handler (ScaledAmount)    ← precision validated
Key insights:
- The Serde layer owns format validation: leverage `rust_decimal`'s parsing to reject malformed input
- `SymbolManager` owns precision validation: check the number of decimal places against the asset's precision
- Business code only validates ranges: format and precision are already guaranteed
3.2 StrictDecimal Implementation

```rust
use std::str::FromStr;

use rust_decimal::Decimal;
use serde::{Deserialize, Deserializer};

/// A strictly formatted Decimal, validated during deserialization
#[derive(Debug, Clone, Copy)]
pub struct StrictDecimal(Decimal);

impl StrictDecimal {
    pub fn inner(&self) -> Decimal {
        self.0
    }
}

impl<'de> Deserialize<'de> for StrictDecimal {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;

        // Strict format checks: reject .5, 5., the empty string, etc.
        if s.is_empty() {
            return Err(serde::de::Error::custom("Amount cannot be empty"));
        }
        if s.starts_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 0.5 not .5"));
        }
        if s.ends_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 5.0 not 5."));
        }

        // Parse with the Decimal library
        let d = Decimal::from_str(&s)
            .map_err(|e| serde::de::Error::custom(format!("Invalid decimal: {}", e)))?;

        // Reject negatives (amounts must be non-negative)
        if d.is_sign_negative() {
            return Err(serde::de::Error::custom("Amount cannot be negative"));
        }

        Ok(StrictDecimal(d))
    }
}
```
3.3 DTO Usage Example

```rust
#[derive(Debug, Deserialize)]
pub struct PlaceOrderRequest {
    pub symbol: String,
    pub quantity: StrictDecimal, // format already validated
    pub price: StrictDecimal,    // format already validated
}
```
3.4 SymbolManager Extension

```rust
use rust_decimal::prelude::ToPrimitive; // for Decimal::to_u64

impl SymbolManager {
    /// Convert an already-validated Decimal into a ScaledAmount.
    /// Only precision needs checking; format was validated at the Serde layer.
    pub fn decimal_to_scaled(
        &self,
        symbol: SymbolId,
        decimal: Decimal,
    ) -> Result<ScaledAmount, MoneyError> {
        let decimals = self.get_symbol_decimals(symbol)?;

        // Reject input with more decimal places than the symbol allows
        if decimal.scale() > decimals {
            return Err(MoneyError::PrecisionExceeded {
                provided: decimal.scale(),
                max: decimals,
            });
        }

        // Scale up into the u64 representation
        let scaled = decimal * Decimal::from(10u64.pow(decimals));
        let raw = scaled.to_u64()
            .ok_or(MoneyError::Overflow)?;

        Ok(ScaledAmount::from_raw(raw))
    }
}
```
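The precision check and scaling above can be sketched with the standard library alone: count the fractional digits of an already-format-validated decimal string, reject if they exceed the asset's precision, then scale by `10^decimals`. `to_scaled` is an illustrative stand-in for `decimal_to_scaled`, not the real implementation.

```rust
/// Std-only sketch of the precision check: reject strings with more
/// fractional digits than `decimals`, then scale into a u64.
fn to_scaled(s: &str, decimals: u32) -> Result<u64, String> {
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };
    if frac_part.len() as u32 > decimals {
        return Err(format!("precision exceeded: {} > {}", frac_part.len(), decimals));
    }
    let int_val: u64 = int_part.parse().map_err(|e| format!("{e}"))?;
    let frac_val: u64 = if frac_part.is_empty() {
        0
    } else {
        frac_part.parse().map_err(|e| format!("{e}"))?
    };
    // Pad the fractional part up to the full precision
    let frac_scale = 10u64.pow(decimals - frac_part.len() as u32);
    int_val
        .checked_mul(10u64.pow(decimals))
        .and_then(|v| v.checked_add(frac_val * frac_scale))
        .ok_or_else(|| "overflow".to_string())
}

fn main() {
    assert_eq!(to_scaled("1.5", 8), Ok(150_000_000));
    assert_eq!(to_scaled("50000.00", 8), Ok(5_000_000_000_000));
    assert!(to_scaled("1.123456789", 8).is_err()); // 9 fractional digits > 8
    println!("ok");
}
```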
3.5 Extractor Integration

```rust
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;

    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<PlaceOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        let symbol_id = symbol_mgr.get_symbol_id(&raw.symbol)?;
        Ok(ValidatedOrder {
            symbol_id,
            // StrictDecimal already validated the format; only precision is checked here
            quantity: symbol_mgr.decimal_to_scaled(symbol_id, raw.quantity.inner())?,
            price: symbol_mgr.decimal_to_scaled(symbol_id, raw.price.inner())?,
        })
    }
}
```
3.6 Design Advantages Summary
| Layer | Responsibility | Validates |
|---|---|---|
| Serde (`StrictDecimal`) | Format validation | Rejects `.5`, `5.`, negatives, non-numbers |
| `SymbolManager` | Precision validation | Checks decimal places against the limit |
| Business code | Range validation | Checks the amount is within a sensible range |
Key benefits:
- Leverage library capabilities: `rust_decimal` provides mature numeric parsing
- Fail early: format errors are intercepted at the deserialization stage
- Separation of concerns: each layer performs exactly one kind of validation
- Compile-time guarantee: the handler receives a `ScaledAmount` and cannot get it wrong
4. Automated CI Checks: Enforce by Mechanism, Not Discipline
Core principle: enforce the standard through mechanism and process, not by relying on developers' "self-discipline".
4.1 Audit Script: `scripts/audit_api_types.sh`

```bash
#!/bin/bash
set -e
echo "🔍 Auditing API type safety..."

# 1. u64/i64 amount fields in DTOs
# Amount field names typically contain: amount, quantity, price, balance, volume
AMOUNT_PATTERNS="amount|quantity|price|balance|volume|size|qty"
if grep -rnE "pub +(${AMOUNT_PATTERNS}) *: *u64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found u64 amount field in API DTO"
    echo " → Should use String type instead"
    exit 1
fi
if grep -rnE "pub +(${AMOUNT_PATTERNS}) *: *i64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found i64 amount field in API DTO"
    echo " → Should use String type instead"
    exit 1
fi

# 2. Direct amount parsing in handlers
if grep -rn "\.parse::<u64>()" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found direct u64 parsing in gateway"
    echo " → Should use SymbolManager.parse_qty() instead"
    exit 1
fi

# 3. Direct format!() on amounts
if grep -rn 'format!\s*(\s*"{}"\s*,\s*\w*amount' --include="*.rs" src/gateway/; then
    echo "⚠️ WARNING: Possible direct amount formatting found"
    echo " → Consider using SymbolManager.format_*() instead"
fi

# 4. Decimal parsing that bypasses SymbolManager
if grep -rn "Decimal::from_str" --include="*.rs" src/gateway/ | grep -v "// safe:"; then
    echo "⚠️ WARNING: Direct Decimal parsing found in gateway"
    echo " → Should use SymbolManager for conversions"
fi

echo "✅ API type safety audit passed!"
```
Note: the alternation in `AMOUNT_PATTERNS` requires extended regular expressions, hence `grep -E`.
4.2 Rule Details
| Check | Goal | Pattern |
|---|---|---|
| DTO field types | Amount fields must be `String` | `pub (amount\|quantity\|...): u64` in `src/gateway/` |
| Direct parsing | No `.parse::<u64>()` in handlers | `.parse::<u64>()` in `src/gateway/` |
| Direct formatting | No `format!("{}", amount)` | `format!(...amount...)` in `src/gateway/` |
| Bypassing the conversion layer | No direct `Decimal::from_str` | `Decimal::from_str` in `src/gateway/` |
4.3 CI Integration
GitHub Actions configuration:

```yaml
# .github/workflows/ci.yml
- name: Audit API Type Safety
  run: |
    chmod +x scripts/audit_api_types.sh
    ./scripts/audit_api_types.sh
```
Local pre-commit hook:

```bash
#!/bin/bash
# .git/hooks/pre-commit
./scripts/audit_api_types.sh || exit 1
```
4.4 Exemption Mechanism
For legitimate exceptions (test code, internal tools), mark the line with a comment:

```rust
// safe: test code, direct parsing allowed
let amount = "100".parse::<u64>().unwrap();
```
The audit script should skip lines carrying a `// safe:` comment.
5. Implementation Roadmap
| Phase | Task | Status |
|---|---|---|
| Phase 1 | Implement the `ValidatedOrder` extractor for the core order APIs | ⏳ Pending |
| Phase 2 | Implement `FormattedBalanceResponse` for the balance/asset APIs | ⏳ Pending |
| Phase 3 | Unify all amount-related APIs | ⏳ Pending |
| Phase 4 | Implement `audit_api_types.sh` and wire it into CI | ⏳ Pending |
| Phase 5 | Add a pre-commit hook for local enforcement | 📋 Planned |
6. References
- Money Type Safety Standard — the money type safety specification
- 0x02 The Float Curse — a deep dive into floating-point problems
Common CI Pitfalls and Solutions
This document collects typical problems encountered in GitHub Actions CI and their solutions.
🚨 0. Critical Warning: Never Use pkill -f
Problem
Running `pkill -f "zero_x_infinity"` inside the Antigravity IDE crashes the IDE: the IDE's language_server process has the project path in its arguments, so `pkill -f` kills it too.
Correct approach
Always use the PID or an exact name match:

```bash
# ✅ Option 1: record the PID at startup (recommended)
./target/release/zero_x_infinity --gateway &
GW_PID=$!
# ...
kill "$GW_PID"

# ✅ Option 2: exact process-name match
pkill "^zero_x_infinity$"
```
1. Service Containers
1.1 docker exec Is Not Available
Problem
GitHub Actions `services:` are managed service containers, not local Docker containers you can exec into.

```yaml
services:
  tdengine:
    image: tdengine/tdengine:latest
    ports:
      - 6041:6041
```
Typical error

```bash
docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"
# Error: No such container: tdengine
```
Solution
Connect via REST APIs or network protocols instead of `docker exec`:

```bash
# ❌ Wrong
docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"

# ✅ TDengine REST API
curl -sf -u root:taosdata -d "DROP DATABASE IF EXISTS trading" http://localhost:6041/rest/sql

# ✅ PostgreSQL psql
PGPASSWORD=trading123 psql -h localhost -U trading -d exchange_info_db -c "..."
```
1.2 Always Connect to Services via localhost

```bash
# In CI:
PG_HOST=localhost   # ✅ correct
PG_HOST=postgres    # ❌ only valid in Docker Compose
```
2. Environment Variables
2.1 Test Scripts Must Load db_env.sh
Problem
A test script that does not set `DATABASE_URL` and related variables will hit PostgreSQL connection timeouts.
Typical error

```
❌ Failed to connect to PostgreSQL: pool timed out while waiting for an open connection
```
Solution
`source db_env.sh` at the top of the script:

```bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/db_env.sh"
```
2.2 Detecting the CI Environment

```bash
if [ -n "$CI" ]; then
    # CI-only logic
else
    # local logic
fi
```
3. Workflow Step Conditions
3.1 The Correct Log-Dump Pattern
Problem
Misusing `continue-on-error: true` marks the job green even when tests fail, hiding the error.
❌ Wrong:

```yaml
- name: Run Test
  run: ./test.sh
  continue-on-error: true # failures are silently ignored

- name: Dump Logs
  run: cat logs/*.log
# Result: the job turns green and the error is hidden!
```
✅ Correct:
Do not use `continue-on-error`. Use an `if: failure()` condition to run the log-dump step only on failure.

```yaml
- name: Run Test
  run: ./test.sh
  # Default behavior: a failure stops all subsequent steps without if: failure()

- name: Dump Logs
  if: failure() # runs only if an earlier step failed
  run: cat logs/*.log
# Note: this step itself succeeds, but the job status is still
# determined by Run Test (red)
```
3.2 Log File Path Consistency
Make sure the path the script writes matches the path the workflow reads:

```bash
# In the script
nohup ./gateway > /tmp/gateway_fee_e2e.log 2>&1 &

# The workflow must match
cat /tmp/gateway_fee_e2e.log   # ✅ same path
cat /tmp/gw_test.log           # ❌ different path
```
4. Database Initialization
4.1 PostgreSQL Health Check
Problem: the default health check uses the `root` user, which does not exist in the database.

```yaml
services:
  postgres:
    options: >-
      --health-cmd "pg_isready -U trading -d exchange_info_db" # specify the user
```
4.2 TDengine Precision
You must create the database with `PRECISION 'us'`:

```sql
CREATE DATABASE IF NOT EXISTS trading PRECISION 'us';
```
With the wrong precision, microsecond timestamps fail with "Timestamp data out of range".
4.3 Service Settle Time

```yaml
- name: Initialize TDengine
  run: ./scripts/db/init.sh td && sleep 5 # wait for metadata initialization
```
5. Binaries and Startup
5.1 Binary Freshness
Before testing locally, make sure the release binary is up to date:

```bash
cargo build --release
```
CI always does a fresh build, but local development may run a stale binary.
5.2 Waiting for Gateway Startup

```bash
for i in $(seq 1 60); do
    if curl -sf "http://localhost:8080/api/v1/health" > /dev/null 2>&1; then
        break
    fi
    sleep 1
done
```
Note: the health check path is `/api/v1/health`, not `/health`.
6. Config & Port Parity
6.1 The 5433 vs 5432 Port Trap
- Local (dev): default port 5433 (`config/dev.yaml`).
- CI: standard port 5432 (`config/ci.yaml`).
- Solution: test scripts must detect `CI=true` and pass `--env ci`.

```bash
if [ "$CI" = "true" ]; then
    GATEWAY_ARGS="--gateway --env ci"
fi
```
6.2 Standardized Script Template
Reuse the standard template: `scripts/templates/test_integration_template.sh`.
7. Python Environment Conventions (uv)
7.1 Never Run Python Bare
Running `python3` directly in CI may fail to find dependencies.
7.2 Solution
Use `uv run` to manage dependencies explicitly; a HEREDOC keeps the environment isolated:

```bash
#!/bin/bash
# Unified entry point (wrapper script) example
export SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Declare dependencies explicitly with --with, and forward all arguments via "$@"
uv run --with requests --with pynacl python3 - "$@" << 'EOF'
import sys
import os
# ... python code ...
EOF
```
8. Quick Reference
| Scenario | Local | CI |
|---|---|---|
| TDengine operations | `docker exec tdengine taos` | `curl localhost:6041/rest/sql` |
| PostgreSQL connection | container name or localhost | localhost only |
| Environment variables | set manually or `.env` | `source db_env.sh` |
| Log output | terminal | file + artifact upload |
9. Race Conditions and Resource Cleanup
9.1 Port in Use ("Address already in use")
Problem: when several test scripts run back to back in one job (e.g. QA Suite + POC), the previous script may not have fully released its port, so the next Gateway fails to start.
Solution
Explicitly kill stale processes before starting the Gateway. In CI (not in a local IDE), `pkill` is acceptable:

```bash
# Ensure clean slate
echo "Cleaning up any existing Gateway processes..."
pkill -9 -f "zero_x_infinity" || true
sleep 2 # give the kernel time to release the port
```
Key point: `kill -9` guarantees immediate termination and prevents zombie processes.
10. Error-Handling Conventions
10.1 Config Loading Must Not Panic
Forbidden:

```rust
File::open("config.yaml").unwrap(); // ❌ crashes with no useful log
```
Required:
Use `anyhow::Result` and add context:

```rust
File::open("config.yaml").with_context(|| "Failed to open config")?; // ✅
```
10.2 Database Unique Constraints (Duplicate Key)
Problem: re-registering a user caused a 500 with a stack trace in the logs, polluting triage.
Solution: catch the "duplicate key" error, log it as a warning, and return 409 Conflict.

```rust
if err.to_string().contains("duplicate key") {
    tracing::warn!("User already exists: {}", err);
    return Err(StatusCode::CONFLICT);
}
```
11. Test Data Parity
11.1 Manual SQL Injection vs API Initialization
Problem
Local development usually relies on `run_poc.sh` (full API-driven verification), while CI may run the lower-level `test_e2e.sh` (fast SQL-injection-based verification). If the two diverge, tests pass locally but fail in CI.
Typical case:
- The API deposit path scales units automatically.
- A manual SQL insert wrongly assumed the database stores scaled integers (10^6) and inserted `1000000`.
- Result: the database actually held 1,000,000 USDT (not 1 USDT), breaking every subsequent balance check.
Solution:
- Prefer API initialization: prepare data through APIs such as `POST /api/v1/private/deposit` wherever possible, so business logic stays consistent.
- Double-check the schema: if SQL injection is unavoidable, consult `migrations/` or `schema.rs` to confirm column types (Decimal vs BigInt).
- Share helpers: use a common Python/Bash library for data injection so scripts don't reinvent it inconsistently.
Last updated: 2025-12-30
12. Bash Script Pitfalls
12.1 Arithmetic Expansion Silently Exits the Script
Problem
Under `set -e`, an arithmetic command that evaluates to 0 is treated as a failure, and the script exits immediately.
Typical scenario

```bash
set -e
TOTAL_TESTS=0
# ...
((TOTAL_TESTS++)) # post-increment yields the old value 0 → exit status 1 → script exits!
```
Consequence
The CI job stops with no error output at all (a silent failure), which is extremely hard to diagnose.
Solution
Use the standard POSIX arithmetic-expansion form, or make sure the expression is never executed as a bare command:

```bash
# ✅ Recommended: assignment form, unaffected by the result value
TOTAL_TESTS=$((TOTAL_TESTS + 1))

# ✅ Alternative: force success (less elegant)
((TOTAL_TESTS++)) || true
```
Pre-Merge to Main Checklist
All items must be completed before merging into the main branch.
1. Code Quality ✓
- `cargo fmt --check` passes
- `cargo test` passes
- Clippy (must use the same configuration as CI):

```bash
cargo clippy -- -D warnings -A clippy::too_many_arguments -A clippy::collapsible_if -A clippy::unwrap_or_default -A clippy::doc_lazy_continuation -A clippy::manual_is_multiple_of -A clippy::implicit_saturating_sub -A clippy::redundant_pattern_matching -A clippy::derivable_impls
```
2. Documentation ✓
- Relevant chapters under `docs/src/*.md` are updated
- The `docs/src/SUMMARY.md` table of contents is correct
- `mdbook build` succeeds
- README.md links the new chapter
- Read agent-testing-notes.md (avoid common pitfalls)
3. CI/CD ✓
3.1 Local verification (required)
- `./scripts/test_ci.sh --quick` passes
- Simulate CI runs individually (important — running everything locally can mask problems):

```bash
CI=true ./scripts/test_ci.sh --test-gateway-e2e
CI=true ./scripts/test_ci.sh --test-kline
CI=true ./scripts/test_ci.sh --test-depth
CI=true ./scripts/test_ci.sh --test-account
```
3.2 CI environment checks
- No `docker exec` (CI service containers do not support it)
- Database connections use `localhost`, not container names
- All helper functions are defined at global scope (not inside `if` blocks)
3.3 When CI fails
- Download logs immediately: `gh run view <run-id> --log-failed`
- Search for errors: `grep -i "error\|fail\|fatal" logs/*.txt`
- Fix based on the logs; don't guess
4. Git ✓
- All changes committed
- `git status` shows clean
- Branch rebased/merged onto the latest main (no conflicts)
5. Release ✓
- After merging, create a Git tag: `git tag v{version}`
- Push the tag: `git push origin --tags`
Caution
⚠️ Never delete feature branches! Branches are part of the project's history and must be kept permanently.
Commands

```bash
# 1. Final checks
cargo check && cargo test && cargo clippy && cargo fmt --check

# 2. Build the docs
cd docs && mdbook build && cd ..

# 3. Merge
git checkout main
git merge <feature-branch> --no-ff -m "Merge branch '<feature-branch>'"

# 4. Tag
git tag v0.10-a-account-system
git push origin main --tags

# 5. Done
echo "✅ Merge complete!"
```
Build & Verification Guide
This document summarizes common local pitfalls around "changes not taking effect" and "port conflicts" during Gateway development and E2E testing.
1. The Stale Binary Trap
When you ran `cargo build --release` but tests still execute the old logic:
Common causes
- Stale fingerprint: Cargo wrongly considers the binary up to date and skips recompiling or relinking.
- Corrupted incremental cache: stale state under `target/release/incremental` keeps old logic alive.
- Timestamp resolution: the source modification time is too close to the previous build (APFS granularity).
Fixes (lightest to heaviest)
- Most common (force relink): `touch src/main.rs && cargo build --release`
- Clear the incremental cache (not a full clean): `rm -rf target/release/incremental`
- Force-rebuild the core crate: `cargo clean -p zero_x_infinity && cargo build --release`
2. Port Conflicts and Zombie Processes
Symptom
Gateway fails to start with: `❌ FATAL: Failed to bind to 0.0.0.0:8080: Address already in use`.
This usually means a stale Gateway process is still running in the background.
Diagnosis and fix
- Kill the stale process (safely): do not use `pkill -f` (it kills the IDE). Instead:

```bash
# Find and kill whatever holds port 8080
lsof -i :8080
kill -9 <PID>
```
- Check for script conflicts: make sure you aren't running the Gateway manually in one terminal while `test_transfer_e2e.sh` runs in another.
3. E2E Testing Best Practices
Confirm binary freshness
Before running tests, compare timestamps manually or watch for the E2E script's warning:

```bash
ls -lh src/funding/service.rs target/release/zero_x_infinity
```
Database consistency
If the logic looks right but the API reports Missing column:
- Confirm the PostgreSQL migrations were applied manually (in case `init.sh` skipped them due to existing data).
- Confirm `account_type` and `status` in `balances_tb` are `SMALLINT`, which must map to `i16` in Rust.
Always pass the environment when running the Gateway
When debugging manually, always include the environment flag:

```bash
./target/release/zero_x_infinity --gateway --env dev
```
Last updated: 2025-12-24
Database Selection: TDengine vs Others
Scenario: Settlement Persistence - Storing orders, trades, and balances.
📊 Comparison
Candidates
| Database | Type | Use Case |
|---|---|---|
| TDengine | Time-Series | IoT, Financial Data, High-Frequency Write |
| PostgreSQL | Relational | General OLTP |
| TimescaleDB | PG Extension | Time-Series (PG based) |
| ClickHouse | Columnar | OLAP, Analytics |
🎯 Why TDengine?
1. Performance (Based on TSBS)
| Metric | TDengine vs TimescaleDB | TDengine vs PostgreSQL |
|---|---|---|
| Write Speed | 1.5-6.7x Faster | 10x+ Faster |
| Query Speed | 1.2-24.6x Faster | 10x+ Faster |
| Storage | 1/12 - 1/27 Space | Huge Saving |
2. Matching Exchange Requirements
| Requirement | TDengine Solution |
|---|---|
| High Frequency Write | Million/sec write capacity |
| Timestamp Index | Native time-series design |
| High Cardinality | Billions of data points, Super Tables |
| Real-time Stream | Built-in Stream Computing |
| Data Subscription | Kafka-like real-time push |
| Auto Partitioning | Auto-sharding by time |
3. Simplified Architecture
TDengine Solution:
┌─────────────────────────────────────────────┐
│ TDengine │
│ Persistence + Stream + Subscription │
└─────────────────────────────────────────────┘
Fewer Components = Lower Ops Complexity + Lower Latency
4. Rust Ecosystem
- ✅ Official Rust client `taos`
- ✅ Async (tokio)
- ✅ Connection pool (r2d2)
- ✅ WebSocket (cloud friendly)
❌ Why Not Others?
PostgreSQL
- ❌ Poor time-series performance.
- ❌ High-frequency write bottleneck.
- ❌ Large storage consumption.
TimescaleDB
- ⚠️ Slower than TDengine.
- ⚠️ Much larger storage footprint.
ClickHouse
- ✅ Fast analytics.
- ❌ Real-time row-by-row write is weak (prefers batch).
- ❌ High Ops complexity.
📋 Data Model
TDengine Super Table
```sql
-- Orders Super Table
CREATE STABLE orders (
    ts TIMESTAMP,        -- PK
    order_id BIGINT,
    user_id BIGINT,
    side TINYINT,
    order_type TINYINT,
    price BIGINT,
    qty BIGINT,
    filled_qty BIGINT,
    status TINYINT
) TAGS (
    symbol_id INT        -- Partition Tag
);

-- Trades
CREATE STABLE trades (...) TAGS (symbol_id INT);

-- Balances
CREATE STABLE balances (...) TAGS (user_id BIGINT, asset_id INT);
```
Advantages
- ✅ Auto-partition by TAG.
- ✅ Auto-aggregation query.
- ✅ Unified Schema.
🏗️ Architecture Integration
Gateway -> Order Queue -> Trading Core -> Events -> TDengine
✅ Final Recommendation
Primary Storage: TDengine
- Orders, Trades, Balances History.
- High performance write/read.
📊 Expected Performance
- Write Latency: < 1ms
- Query Latency: < 5ms
- Storage Compression: 10:1
- Supported TPS: 100,000+
ADR-001: WebSocket Security - Strict Auth Enforcement
| Status | Accepted |
|---|---|
| Date | 2025-12-27 |
| Author | QA / Security Remediation Agent |
| Context | Phase 0x10.5 Backend Gaps |
Context
During the QA Audit of Phase 0x10.5, a critical security vulnerability (Identity Spoofing) was identified in the WebSocket Gateway.
The implementation allowed clients to assert any user_id via query parameter (ws://...?user_id=123) without cryptographic verification (Token/Signature).
Decision
To immediately mitigate this P0 vulnerability while preserving functionality for the “Public Market Data” milestone:
- Strict Anonymous Mode: The Gateway MUST reject any connection attempt where `user_id` is provided and is NOT `0` (Anonymous).
- HTTP 401: Rejection must return `401 Unauthorized`.
- Future Auth: Authenticated access (for Private Channels) is deferred to the Authentication Phase (0x0A-b). Until then, NO private user connections are allowed.
Consequences
- Positive: Eliminates identity spoofing risk. System is secure for public data consumption.
- Negative: Private channel testing (e.g., `private.order`) is temporarily blocked until proper Auth is implemented.
Verification
- `scripts/test_qa_adversarial.py` was created to verify this constraint.
ADR-005: Unified Chain-Asset Schema & Admin Integration
Date: 2025-12-30
Status: Accepted
Supersedes: ADR-004 (Partial), Design Doc 0x11-c (Draft)
Context: Reconciling conflict between Admin’s Logical Assets (assets_tb) and Sentinel’s Physical Chains (chain_assets).
1. Problem Statement
The system currently has ambiguity regarding where “Asset Definition” lives:
- Admin (
assets_tb): Defines “USDT” (Logical), Symbol, Decimal (Internal). - Sentinel (
chain_assets): Needs “USDT” on ETH (Physical), Contract, Decimal (Chain). - Conflict: Potential data duplication (redundancy) and unclear ownership.
2. Architectural Decision: Layered Asset Model
We explicitly separate the domain model into two strictly defined layers.
Layer 1: Logical Asset (Master) -> assets_tb
- Owner: Admin Dashboard (Existing).
- Scope: Business logic, User Balances, Trading Pairs.
- Key Fields: `asset_id` (PK), `asset` (unique identifier, e.g., "USDT"), `decimals` (system precision, e.g., 8), `status` (global switch)
Layer 2: Physical Binding (Extension) -> chain_assets_tb
- Owner: Operations (via Admin Extension).
- Scope: Blockchain adapters, Deposit/Withdrawal addresses, Sentinel config.
- Key Fields: `chain_slug` (FK to `chains_tb`), `asset_id` (FK to `assets_tb`), `contract_address` (physical ID), `decimals` (physical precision)
- Constraint: No re-definition of Logical fields (Symbol, Name).
3. Schema Specification (Finalized)
```sql
-- 1. Chains (Infrastructure)
CREATE TABLE chains_tb (
    chain_slug VARCHAR(32) PRIMARY KEY,  -- 'ETH', 'BTC' (renamed from chain_id)
    chain_name VARCHAR(64) NOT NULL,
    network_id VARCHAR(32),              -- '1', 'regtest'
    rpc_urls TEXT[] NOT NULL,
    confirmation_blocks INT DEFAULT 1,
    is_active BOOLEAN DEFAULT TRUE
);

-- 2. Chain Assets (Physical Extension)
CREATE TABLE chain_assets_tb (
    id SERIAL PRIMARY KEY,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
    -- Physical Properties Only
    contract_address VARCHAR(128),       -- Mutually exclusive unique ID per chain
    decimals SMALLINT NOT NULL,          -- The mapping factor (Chain -> System)
    -- Chain-Specific Overrides
    min_deposit DECIMAL(30, 8) DEFAULT 0,
    min_withdraw DECIMAL(30, 8) DEFAULT 0,
    withdraw_fee DECIMAL(30, 8) DEFAULT 0,
    is_active BOOLEAN DEFAULT FALSE,     -- Safety: inactive by default until verified
    -- Constraints
    UNIQUE(chain_slug, asset_id),        -- 1 asset per chain (for now; bridging later)
    UNIQUE(chain_slug, contract_address) -- 1 contract = 1 asset
);
```
4. Admin Integration Scope
Admin code currently does not support Layer 2. To strictly follow this architecture, Admin must be updated in a future iteration:
- New Model: `ChainAsset` mapping to `chain_assets_tb`.
- New View: "Chain Configurations" tab under Asset details.
- Logic: When viewing “USDT”, allow adding “ETH Binding” (Contract: 0x…, Decimals: 6).
5. Migration Strategy (Immediate)
For Phase 0x11-b (Sentinel Hardening), we implement the Schema and Manual Seeding (Migration 012). Admin UI updates are deferred, but the Schema is future-proofed to support them perfectly.
ADR-006: User Address Decoupling for Account-Based Chains
Date: 2025-12-30
Status: Accepted
Context: Replaces user_addresses definition in migrations/010_deposit_withdraw.sql to enable “Hot Listing”.
1. Problem Statement
The current schema for user addresses matches Assets, not Chains:
```sql
-- OLD (Flawed)
PRIMARY KEY (user_id, asset, chain_slug)
```
The Loophole:
- User A has ETH address `0x123`. DB record: `(UserA, 'ETH', 'ETH', '0x123')`.
- Ops lists `UNI` (ERC20).
- User A deposits `UNI` to `0x123`.
- Sentinel parses `Transfer(0x123, val)`.
- Sentinel looks up: `SELECT user_id FROM user_addresses WHERE address='0x123' AND asset='UNI'`.
- Result: NULL. Deposit ignored.
Impact: Users must manually “Generate UNI Address” (redundant action) before depositing, or funds are lost/stuck. This breaks the “Ops List -> Immediate Deposit” workflow.
2. Solution: Chain-Centric Address Model
We must recognize that for Account-Based Chains (ETH, TRON, BSC, SOL), an address belongs to the Chain Account, not the individual Asset.
2.1 Schema Change
We split the concept into “Account Bindings”.
```sql
-- migration/012_user_chain_addresses.sql
CREATE TABLE user_chain_addresses (
    user_id BIGINT NOT NULL,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    address VARCHAR(255) NOT NULL,
    -- Metadata
    memo_tag VARCHAR(64),               -- For XRP/EOS destination tags
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Constraint: one address per user per chain (simplified model)
    -- Or multiple? For now, 1:1 is sufficient for MVP.
    PRIMARY KEY (user_id, chain_slug),
    UNIQUE (chain_slug, address)        -- Reverse lookup must be unique
);
```
2.2 Sentinel Lookup Logic
When `EthScanner` detects a `Transfer(to, value, contract)`:
- Identify Asset: Match `contract` -> `asset_id` (via `chain_assets_tb`).
- Identify User: Match `to` -> `user_id` (via `user_chain_addresses` WHERE `chain_slug = 'ETH'`).
- Insert Deposit: `deposit_history(user_id, asset_id, amount)`.
Outcome: The asset_id comes from the Contract, the user_id comes from the Address. They are decoupled.
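The lookup steps above can be sketched as a single resolution function (a simplified illustration; all names are hypothetical stand-ins for the real tables): the asset comes from the contract map, the user from the chain-address map, and only when both resolve is a deposit row produced.

```rust
use std::collections::HashMap;

/// Minimal deposit row, standing in for deposit_history(user_id, asset_id, amount).
#[derive(Debug, PartialEq)]
struct Deposit {
    user_id: u64,
    asset_id: u32,
    amount: u64,
}

/// Decoupled resolution: contract -> asset (chain_assets_tb),
/// recipient address -> user (user_chain_addresses).
fn resolve_transfer(
    contract_to_asset: &HashMap<String, u32>,
    address_to_user: &HashMap<String, u64>,
    contract: &str,
    to: &str,
    value: u64,
) -> Option<Deposit> {
    let asset_id = *contract_to_asset.get(contract)?; // unknown contract -> ignore
    let user_id = *address_to_user.get(to)?;          // unknown address -> ignore
    Some(Deposit { user_id, asset_id, amount: value })
}
```

Listing a new ERC20 only adds an entry to the contract map; every existing chain address immediately starts resolving deposits for it.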
3. Handling UTXO (BTC)
BTC addresses are generally single-use or per-intent. However, for an Exchange Deposit model, we typically generate one “Deposit Address” per User per Chain (or rotate them).
Currently, we can treat BTC the same: “User’s BTC Deposit Address”.
If we need asset-specific addresses (e.g. OMNI USDT vs native BTC), that is a legacy edge case we can ignore for the MVP; modeling it via `chain_slug` variants (e.g. `btc-omni` vs `btc-native`) is possible but usually unnecessary, since both assets live on the same chain.
Decision: The schema `user_chain_addresses(user_id, chain_slug)` works for BTC too, whether we assume “one checkable address per user” or later extend it to a list of addresses.
Refinement: Conceptually, `(chain_slug, address)` is the real physical key: an address maps to exactly one user, and (for the MVP) a user maps to exactly one address per chain.
4. Operational Workflow (Final)
- Listing: Ops lists `UNI` (contract `0x...`) -> `chain_assets_tb`.
- Sentinel: Refreshes map `0x...` -> `UNI`.
- User: Sends `UNI` to their existing ETH address.
- Sentinel:
  - Sees `0x...` -> knows it's `UNI`.
  - Sees `Receiver` -> knows it's User A.
  - Success.
5. Status
Accepted
AR-001: Architecture Request - WebSocket Authentication
| Status | REQUESTED |
|---|---|
| Date | 2025-12-27 |
| Requester | QA / Remediation Agent |
| Driver | Identity Spoofing Remediation |
Problem Statement
The current WebSocket implementation relies on a “Strict Anonymous Mode” (ADR-001), which rejects any connection claiming `user_id != 0`.
While this mitigates immediate identity spoofing, it creates a functional gap: legitimate users cannot prove their identity or access private channels.
The user explicitly rejected ADR-001 as a complete solution (“security is not fixed ... requires further design”), necessitating a robust authentication design.
Requirements
The Architect must provide a design (e.g., ADR-002) that:
- Authentication mechanism: Defines how a WebSocket client proves its identity (e.g., JWT in Query Param vs Header vs Handshake Message).
- Integration: How this integrates with `src/api_auth/` (Ed25519) or standard session management.
- State Management: How `ConnectionManager` stores and validates the authenticated session.
- Migration: Specific steps to replace the temporary “Strict Anonymous Mode” in `handler.rs` with the new mechanism.
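One possible shape for the handshake-message option, as a purely illustrative Rust sketch (the actual mechanism is what ADR-002 must decide; `ConnState`, `TokenVerifier`, and `handle_auth_message` are hypothetical names): connections start anonymous and may upgrade exactly once, with the verifier abstracted so it could later be backed by `src/api_auth/` (Ed25519) or a session store.

```rust
/// Per-connection auth state: sockets start anonymous (public streams only)
/// and may upgrade exactly once via an auth handshake message.
#[derive(Debug, PartialEq)]
enum ConnState {
    Anonymous,
    Authenticated { user_id: u64 },
}

/// Verifier abstraction so the transport layer stays agnostic: production
/// could back this with Ed25519 signature checks or server-side sessions.
trait TokenVerifier {
    /// Returns the user_id the token proves, or None if invalid.
    fn verify(&self, token: &str) -> Option<u64>;
}

/// Handle an auth handshake message on an open connection.
fn handle_auth_message<V: TokenVerifier>(
    state: &mut ConnState,
    verifier: &V,
    token: &str,
) -> Result<(), &'static str> {
    if let ConnState::Authenticated { .. } = state {
        return Err("already authenticated");
    }
    match verifier.verify(token) {
        Some(user_id) => {
            *state = ConnState::Authenticated { user_id };
            Ok(())
        }
        None => Err("invalid token"),
    }
}
```

Because anonymous connections are the default state rather than an error, public trade streams keep working unchanged, addressing the backwards-compatibility constraint below.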
Constraints
- Low Latency: The auth check must not significantly delay connection establishment.
- Backwards Compatibility: Anonymous public trade streams must continue to work alongside authenticated connections.