0x00 Project Roadmap

Vision: Build a production-grade cryptocurrency exchange from Hello World to Microsecond Latency. Current Status: Phase V (Extreme Optimization) - Order Commands parity complete.


📊 Progress Overview

This project documents the complete journey of building a 1.3M orders/sec matching engine. Below is the current status of each phase.


✅ Phase I: Core Matching Engine

Status: Complete

| Chapter | Title | Description |
|---------|-------|-------------|
| 0x01 | Genesis | Basic OrderBook with Vec<Order> |
| 0x02 | Float Curse | Why floats fail → u64 refactoring |
| 0x03 | Decimal World | Precision configuration system |
| 0x04 | BTree OrderBook | BTreeMap-based order book |
| 0x05 | User Balance | Account & balance management |
| 0x06 | Enforced Balance | Type-safe fund locking |
| 0x07 | Testing Framework | 1M order batch testing |
| 0x08 | Trading Pipeline | LMAX-style Ring Buffer architecture |
| 0x09 | Gateway & Persistence | HTTP API, TDengine, WebSocket, K-Line |

✅ Phase II: Productization

Status: Complete

| Chapter | Title | Description |
|---------|-------|-------------|
| 0x0A | Account System | PostgreSQL user management |
| 0x0A-b | ID Specification | Identity addressing rules |
| 0x0A-c | API Authentication | Ed25519 cryptographic auth |
| 0x0B | Funding & Transfer | Internal transfer architecture |
| 0x0C | Trade Fee | Maker/Taker fees + VIP discount |

🔶 Phase III: Resilience & Funding

Status: Complete

| Chapter | Title | Description | Status |
|---------|-------|-------------|--------|
| 0x0D | Snapshot & Recovery | State snapshot, crash recovery | ✅ Done |
| 0x0E | OpenAPI Integration | Swagger UI, SDK generation | ✅ Done |
| 0x0F | Admin Dashboard | Ops Panel, KYC, hot-reload | ✅ Done |
| 0x11 | Deposit & Withdraw | Mock Chain integration, Idempotency | ✅ Done |
| 0x11-a | Real Chain Integration | Sentinel Service (Pull Model) | ✅ MVP Done |
| 0x11-b | Sentinel Hardening | SegWit Fix (DEF-002) & ETH/ERC20 & ADR-005/006 | ✅ Done |

🔶 Phase IV: Trading Integration & Verification

Status: Pending Verification

Context: The Core Engine and Trading APIs are implemented but currently tested with Mocks. This phase bridges the gap between the Real Chain (0x11) and the Matching Engine (0x01).

| Chapter | Title | Description | Status |
|---------|-------|-------------|--------|
| 0x12 | Real Trading Verification | End-to-End: Bitcoind -> Sentinel -> Order -> Trade | Code Ready (Needs Real-Chain Test) |
| 0x13 | Market Data Experience | WebSocket Verification (Ticker, Trade, Depth) | Code Ready (Needs E2E Test) |

⏳ Phase V: Extreme Optimization (Metal Mode)

Status: In Progress

Codename: “Metal Mode”. Goal: push Rust to the physical limits of the hardware.

| Chapter | Title | Description |
|---------|-------|-------------|
| 0x14 | Extreme Optimization | Architecture Manifesto |
| 0x14-a | Benchmark Harness | ✅ 100% Bit-exact Parity (FILL) |
| 0x14-b | Order Commands | ✅ IOC, Move, Reduce (Feature Parity) |
| 0x15 | Zero-Copy | Planned |
| 0x16 | CPU Affinity | Planned |
| 0x17 | SIMD Matching | Planned |

🏆 Key Milestones

| Git Tag | Phase | Highlights |
|---------|-------|------------|
| v0.09-f-integration-test | 0x09 | 1.3M orders/sec baseline achieved |
| v0.10-a-account-system | 0x0A | PostgreSQL account integration |
| v0.10-b-api-auth | 0x0A | Ed25519 authentication |
| v0.0C-trade-fee | 0x0C | Maker/Taker fee system |
| v0.0D-persistence | 0x0D | Universal WAL & Snapshot persistence |
| v0.0F-admin-dashboard | 0x0F | Admin Operations Dashboard |
| v0.11-a-funding-qa | 0x11-a | Real Chain Sentinel MVP (Deposit/Withdraw) |
| v0.11-b-sentinel-hardening | 0x11-b | DEF-002 Fix, ADR-005/006, Hot Listing |
| v0.14-b-order-commands | 0x14-b | ✅ IOC, Move, Reduce (Bit-exact Parity) |

🎯 What You’ll Learn

  1. Financial Precision - Why f64 fails and how to use fixed-point u64
  2. High-Performance Data Structures - BTreeMap for O(log n) order matching
  3. Lock-Free Concurrency - LMAX Disruptor-style Ring Buffer
  4. Event Sourcing - WAL-based deterministic state reconstruction
  5. Real-World Blockchain Integration - Handling Re-orgs, Confirmations, and UTXO management
  6. Production Security - Watch-only wallets & Ed25519 authentication

Last Updated: 2025-12-31

0x01 Genesis: Basic Engine


📦 Code Changes: View Diff

This is the first version of 0xInfinity. In this stage, we have built a minimal prototype of a Central Limit Order Book (CLOB). Our goal is to intuitively demonstrate real-world trading logic using standard data structures to manage orders.

1. Visualizing the Orderbook

An Orderbook is essentially a list of orders arranged by price. We place Sells (Asks) at the top and Buys (Bids) at the bottom. The gap in the middle is called the “Spread”.

We maintain two lists in memory:

  • Sells (Asks): sorted by price Low to High, so an incoming buyer matches the cheapest ask first.
  • Buys (Bids): sorted by price High to Low, so an incoming seller matches the highest bid first.
===========================================================
               ORDER BOOK SNAPSHOT
===========================================================

    Side   |   Price (f64)   |   Qty    |   Orders (FIFO)
-----------------------------------------------------------
    SELL   |     102.00      |   5.0    |   [Order #2]
    SELL   |     101.00      |   5.0    |   [Order #3]     ^
                                                           | Best Ask (Lowest)
-----------------------------------------------------------
             $$$  MARKET SPREAD  $$$
-----------------------------------------------------------
                                                           | Best Bid (Highest)
    BUY    |     100.00      |   10.0   |   [Order #1]     v
    BUY    |      99.00      |   10.0   |   [Order #5]

===========================================================
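The two sorted lists above can be sketched in a few lines of Rust. This is a minimal illustration of the Stage 1 idea, not the project's actual engine.rs (names are simplified, and f64 is kept deliberately, since the next chapter explains why it is a mistake):

```rust
// Minimal sketch of the Stage 1 order book: two Vecs kept sorted.
struct Order {
    id: u64,
    price: f64, // f64 on purpose here; Stage 2 explains why this is wrong
}

fn main() {
    // Asks sorted low -> high: the best (lowest) ask is matched first.
    let mut asks = vec![
        Order { id: 2, price: 102.0 },
        Order { id: 1, price: 100.0 },
        Order { id: 3, price: 101.0 },
    ];
    asks.sort_by(|a, b| a.price.partial_cmp(&b.price).unwrap());

    // Bids sorted high -> low: the best (highest) bid is matched first.
    let mut bids = vec![
        Order { id: 5, price: 99.0 },
        Order { id: 4, price: 100.0 },
    ];
    bids.sort_by(|a, b| b.price.partial_cmp(&a.price).unwrap());

    assert_eq!(asks[0].id, 1); // best ask = 100.0
    assert_eq!(bids[0].id, 4); // best bid = 100.0
    println!("best ask = {}, best bid = {}", asks[0].price, bids[0].price);
}
```

This already shows the core cost of the naive design: every insertion or match has to keep the Vec sorted, which chapter 0x04 revisits.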

2. Program Output

After executing cargo run, we can observe the actual output of the engine:

--- 0xInfinity: Stage 1 (Genesis) ---

[1] Makers coming in...

[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)

[3] More makers...

--- End of Simulation ---




0x02: The Curse of Float


📦 Code Changes: View Diff

1. The Rookie Mistake

Experienced developers might have noticed that the price type was f64. This is problematic. In models.rs, we had this line:

pub price: f64, // The root of all evil

In most general-purpose applications where absolute precision is not critical, using floating-point numbers is fine. If single precision isn’t enough, double precision usually suffices. However, in the financial domain, storing monetary values as floats is considered an engineering disaster.

If you use floats to store money, it is impossible to maintain a 100% accurate ledger over time. Even with frequent reconciliation, you often end up accepting a “close enough” result.

Moreover, using floats introduces accumulation errors. Over millions of transactions, these tiny errors add up. While various rounding modes can mitigate this if done correctly, the root cause remains.

The biggest issue isn’t just the error itself (which might be acceptable within a tolerance), but the fact that you cannot fundamentally verify the correctness of the settlement, potentially hiding real bugs.
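The accumulation effect is easy to reproduce. The following standalone sketch (not part of the project code) adds 0.1 a million times and compares the float result against a fixed-point integer version, where 0.1 is stored as one unit at scale 10:

```rust
fn main() {
    // Add one "dime" a million times: the exact answer is 100_000.
    let mut float_sum: f64 = 0.0;
    for _ in 0..1_000_000 {
        float_sum += 0.1;
    }

    // Fixed-point version: 0.1 is stored as integer 1 (scale = 10).
    let int_sum: u64 = (0..1_000_000).map(|_| 1u64).sum();

    println!("f64 sum = {float_sum:.10}"); // drifts away from 100000
    println!("u64 sum = {}", int_sum as f64 / 10.0);

    assert_ne!(float_sum, 100_000.0); // the error has accumulated
    assert_eq!(int_sum, 1_000_000);   // the integer sum is exact
}
```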

2. The Precision Trap

Run this incredibly simple code (you can run it in this project via cargo run --example the_curse_of_float):

fn main() {
    let a: f64 = 0.1;
    let b: f64 = 0.2;
    let sum = a + b;

    // You expect this to pass, right?
    if sum == 0.3 {
        println!("Math works!");
    } else {
        println!("PANIC: Math is broken! Sum is {:.20}", sum);
    }
}

The output might surprise you:

PANIC: Math is broken! Sum is 0.30000000000000004441

See that extra 0.00000000000000004441? What is that? Why does it happen?

The main issue isn’t just about floating-point precision being “insufficient,” but that computers simply cannot precisely represent certain numbers.

Computers use binary, while humans use decimal. Just as 1/3 = 0.3333... repeats infinitely in decimal, 0.1 is a repeating fraction in binary that cannot be represented exactly.

In a matching engine, if an Ask in your OrderBook is 0.3 and a user’s Bid is computed as 0.1 + 0.2, these two orders—which inherently should match—will never match due to floating-point errors.

3. Why Blockchain Hates Floats

If you’ve worked with Ethereum smart contracts, you know there are no floating-point numbers in Solidity. Many people wonder why.

There is only one reason: Blockchain cores require 100% deterministic outputs for the same input. Regardless of time, location, hardware, OS, or CPU architecture, running the same code must yield exactly the same result. Only with absolute consistency—down to the last bit—can we ensure that everyone shares the same ledger and the same “consensus.”

Specifically, while floating-point calculations follow the IEEE 754 standard, edge cases can cause minute differences across CPUs:

Node A (Intel) Result: 100.00000000000001
Node B (ARM)   Result: 100.00000000000000

Once this happens, the storage Hash differs, consensus breaks, and the chain forks.

4. The Decimal Temptation

When people realize the issue with f64, they often look for a precise decimal type, such as rust_decimal.

However, even with Decimal, different hardware, programming languages, or even compiler versions can lead to subtle differences. Achieving the 100% determinism required by blockchain is difficult.

The only thing that guarantees 100% determinism is Integer arithmetic. If integer calculations are inconsistent, it is 100% a bug.

Problems with Decimal:

  • Software Emulation: Decimal is a software struct, not a hardware primitive.
  • Implementation Dependency: Consistency depends on the library implementation.
  • “Dialects”: If your backend uses Rust (rust_decimal), risk engine uses Python (decimal), and frontend uses JS (BigInt), subtle differences in “Rounding Mode” or “Overflow Handling” can lead to ledger discrepancies over time.

5. Need for Speed: f64 vs u64

Besides determinism, another core reason we avoid Decimal is Performance.

u64 (Native Integer):

  • When executing a + b, the CPU has a dedicated ALU circuit for 64-bit integer addition.
  • It completes in as little as 1 clock cycle.

Decimal (Software Struct):

  • When executing addition, the CPU runs a complex piece of code: checking Scale, aligning decimals, handling overflow, and finally calculating.
  • This takes hundreds to thousands of times more instruction cycles.

In most apps, CPU cycles are abundant, so this doesn’t matter. But we are writing an HFT (High-Frequency Trading) engine where every nanosecond counts.

Cache Efficiency:

  • u64 takes 8 bytes.
  • Decimal typically takes 16 bytes (128-bit).
  • Using u64 means your CPU cache can store twice as much price data, effectively doubling your throughput.
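The footprint claim can be checked with std::mem::size_of. In this sketch, i128 serves as a dependency-free stand-in for a 128-bit Decimal struct (rust_decimal::Decimal also occupies 16 bytes, but we avoid the external crate here):

```rust
use std::mem::size_of;

fn main() {
    // A native price takes 8 bytes; a 128-bit decimal takes 16.
    // (i128 stands in for a 128-bit Decimal struct.)
    assert_eq!(size_of::<u64>(), 8);
    assert_eq!(size_of::<i128>(), 16);

    // On a typical 64-byte cache line:
    println!("u64 prices per cache line:  {}", 64 / size_of::<u64>());  // 8
    println!("i128 values per cache line: {}", 64 / size_of::<i128>()); // 4
}
```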

We will discuss Cache mechanics in detail later.

Summary

Two reasons to ban floating-point numbers:

  1. No 100% Determinism — Fails to meet blockchain consensus and precise reconciliation requirements.
  2. Performance Issues — For HFT engines, Integer is the only choice.

Refactoring Results

We have refactored all f64 fields in models.rs to u64:

pub struct Order {
    pub id: u64,
    pub price: u64,  // Use Integer for Price
    pub qty: u64,    // Use Integer for Quantity
    pub side: Side,
}

Output after cargo run:

--- 0xInfinity: Stage 2 (Integer) ---

[1] Makers coming in...

[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)

[3] More makers...

--- End of Simulation ---

Now all price comparisons are precise integer comparisons, free from floating-point errors.





0x03: Decimal World


📦 Code Changes: View Diff

In the previous chapter, we refactored all f64 to u64, solving the floating-point precision issues. But this introduced a new problem: Clients use decimals, while we use integers internally. How do we convert between them?

1. The Decimal Conversion Problem

When a user places an order, the input price might be "100.50" and quantity "10.5". However, our engine uses u64 integers:

pub struct Order {
    pub id: u64,
    pub price: u64,   // Integer representation
    pub qty: u64,     // Integer representation
    pub side: Side,
}

Core Question: how do we convert losslessly between decimal strings and u64?

The answer is the Fixed Decimal scheme:

/// Convert decimal string to u64
/// e.g., "100.50" with 2 decimals -> 10050
fn parse_decimal(s: &str, decimals: u32) -> u64 {
    let multiplier = 10u64.pow(decimals);
    // ... Parsing Logic
}

/// Convert u64 back to decimal string for display
/// e.g., 10050 with 2 decimals -> "100.50"
fn format_decimal(value: u64, decimals: u32) -> String {
    let multiplier = 10u64.pow(decimals);
    let int_part = value / multiplier;
    let dec_part = value % multiplier;
    format!("{}.{:0>width$}", int_part, dec_part, width = decimals as usize)
}
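The parsing logic itself is elided above. Below is one possible lossless implementation, shown purely as an illustration; it returns Option<u64> so malformed input is surfaced, whereas the project's actual signature may differ:

```rust
/// One possible implementation of the elided parsing logic.
/// Returns None on malformed input or more decimal places than allowed.
fn parse_decimal(s: &str, decimals: u32) -> Option<u64> {
    let multiplier = 10u64.pow(decimals);
    let (int_str, dec_str) = match s.split_once('.') {
        Some((i, d)) => (i, d),
        None => (s, ""),
    };
    if dec_str.len() > decimals as usize {
        return None; // more precision than this asset supports
    }
    let int_part: u64 = if int_str.is_empty() { 0 } else { int_str.parse().ok()? };
    // Right-pad the fraction: "5" with 2 decimals means 50, not 5.
    let dec_part: u64 = if dec_str.is_empty() {
        0
    } else {
        dec_str.parse::<u64>().ok()? * 10u64.pow(decimals - dec_str.len() as u32)
    };
    int_part.checked_mul(multiplier)?.checked_add(dec_part)
}

fn main() {
    assert_eq!(parse_decimal("100.50", 2), Some(10050));
    assert_eq!(parse_decimal("100.5", 2), Some(10050));
    assert_eq!(parse_decimal("100", 2), Some(10000));
    assert_eq!(parse_decimal("100.505", 2), None); // too precise
    println!("round-trip ok");
}
```

Note the checked_mul/checked_add: overflow near u64::MAX is rejected instead of silently wrapping.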

2. The u64 Max Value (Range Analysis)

The maximum value of u64 is:

u64::MAX = 18,446,744,073,709,551,615

If we use 8 decimal places (similar to Bitcoin’s satoshi), the maximum representable value is:

184,467,440,737.09551615

This means:

  • For Price: We can represent up to ~184 Billion. (If Bitcoin hits this price, we’ll upgrade…)
  • For Quantity: It can hold the entire total supply of BTC (21 million).
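These numbers can be verified with a few lines of integer arithmetic:

```rust
fn main() {
    // Split u64::MAX into integer and fractional parts at 8 decimals.
    let multiplier: u64 = 10u64.pow(8);
    let max_int = u64::MAX / multiplier;  // integer part
    let max_frac = u64::MAX % multiplier; // fractional part

    println!("max = {}.{:08}", max_int, max_frac); // max = 184467440737.09551615
    assert_eq!(max_int, 184_467_440_737);
    assert_eq!(max_frac, 9_551_615);
}
```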

Decimals Configuration for Different Assets

Different blockchain assets have different native precisions:

| Asset | Native Decimals | Smallest Unit |
|-------|-----------------|---------------|
| BTC | 8 | 1 satoshi = 0.00000001 BTC |
| USDT (ERC20) | 6 | 0.000001 USDT |
| ETH | 18 | 1 wei = 0.000000000000000001 ETH |

The Question: ETH natively uses 18 decimals. Will we lose precision if we use only 8?

The answer is: It is sufficient for an Exchange. Because:

  • With 8 decimals, the smallest supported unit is 0.00000001 ETH.
  • There’s no real need to trade 0.000000000000000001 ETH (value ≈ $0.000000000000003).

So we can choose a reasonable internal precision, not necessarily identical to the native chain.

Thus, we need a SymbolManager to manage:

  • Internal precision (decimals) for each asset.
  • User display precision (display_decimals).
  • Price precision configuration for trading pairs.
  • Conversion between on-chain and internal precision during Deposit/Withdrawal.

ETH Decimals Analysis: 8 vs 12 Decimal Places

Let’s analyze the maximum ETH amount representable by u64 under different decimal configs:

| Decimals | Multiplier | Max Value in u64 | Sufficient? |
|----------|------------|------------------|-------------|
| 8 | 10^8 | 184,467,440,737 ETH | ✅ Huge margin |
| 9 | 10^9 | 18,446,744,073 ETH | ✅ Huge margin |
| 10 | 10^10 | 1,844,674,407 ETH | ✅ > Total Supply |
| 11 | 10^11 | 184,467,440 ETH | ✅ Just enough (~120M supply) |
| 12 | 10^12 | 18,446,744 ETH | ❌ < Total Supply! |
| 18 | 10^18 | 18.44 ETH | ❌ Absolutely not enough |

ETH Total Supply ≈ 120 Million ETH

Why we chose 8 decimals for ETH?

  • 0.00000001 ETH ≈ $0.00000003, far below any meaningful trade size.
  • Max capacity 184 Billion ETH > Total Supply (120M).
  • Just convert precision during Deposit/Withdrawal.

Configuration Example:

// BTC: 8 decimals (Same as satoshi)
manager.add_asset(1, 8, 3, "BTC");

// USDT: 8 decimals (Native is 6, we align to 8 internally)
manager.add_asset(2, 8, 2, "USDT");

// ETH: 8 decimals (Safe range, sufficient precision)
manager.add_asset(3, 8, 4, "ETH");

3. Symbol Configuration

Different trading pairs have different precision requirements:

| Symbol | Price Decimals | Qty Display Decimals | Example |
|--------|----------------|----------------------|---------|
| BTC_USDT | 2 | 3 | Buy 0.001 BTC @ $65000.00 |
| ETH_USDT | 2 | 4 | Buy 0.0001 ETH @ $3500.00 |
| DOGE_USDT | 6 | 0 | Buy 100 DOGE @ $0.123456 |

We use SymbolManager to manage these configs:

#[derive(Debug, Clone)]
pub struct SymbolInfo {
    pub symbol: String,
    pub symbol_id: u32,
    pub base_asset_id: u32,
    pub quote_asset_id: u32,
    pub price_decimal: u32,         // Decimals for Price
    pub price_display_decimal: u32, // Display decimals for Price
}

#[derive(Debug, Clone)]
pub struct AssetInfo {
    pub asset_id: u32,
    pub decimals: u32,         // Internal precision (usually 8)
    pub display_decimals: u32, // Max decimals for input/display
    pub name: String,
}

4. decimals vs display_decimals

Distinguishing these two concepts is crucial:

decimals (Internal Precision)

  • Determines the multiplier for u64.
  • Usually 8 (like satoshi).
  • This is internal storage format, invisible to users.

display_decimals (Display Precision)

  • Determines how many decimal places users can see/input.
  • E.g., BTC displays 3 digits: 0.001 BTC.
  • USDT displays 2 digits: 100.00 USDT.

Why separate them?

  1. UX: Users don’t need to see 8 decimal places.
  2. Validation: Limit user input precision.
  3. Cleanliness: Avoid trailing zeros.
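At the input boundary, display_decimals becomes a simple validation rule. A minimal sketch (validate_input is a hypothetical helper, not the project's actual API):

```rust
/// Hypothetical input check: reject user input with more decimal
/// places than the asset's display_decimals allows.
fn validate_input(s: &str, display_decimals: u32) -> bool {
    match s.split_once('.') {
        Some((_, frac)) => frac.len() <= display_decimals as usize,
        None => true, // integers are always acceptable
    }
}

fn main() {
    // BTC configured with display_decimals = 3
    assert!(validate_input("0.001", 3));   // ok: 3 places
    assert!(!validate_input("0.0001", 3)); // rejected: 4 places
    assert!(validate_input("10", 3));      // integers always ok
    println!("validation ok");
}
```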

5. Program Output

Output after cargo run:

--- 0xInfinity: Stage 3 (Decimal World) ---
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3

[1] Makers coming in...
    Order 1: Sell 10.000 BTC @ $100.00
    Order 2: Sell 5.000 BTC @ $102.00
    Order 3: Sell 5.000 BTC @ $101.00

[2] Taker eats liquidity...
    Order 4: Buy 12.000 BTC @ $101.50
MATCH: Buy 4 eats Sell 1 @ Price 10000 (Qty: 10000)
MATCH: Buy 4 eats Sell 3 @ Price 10100 (Qty: 2000)

[3] More makers...
    Order 5: Buy 10.000 BTC @ $99.00

--- End of Simulation ---

--- u64 Range Demo ---
u64::MAX = 18446744073709551615
With 8 decimals, max representable value = 184467440737.09551615

Observation:

  • User input is decimal string "100.00".
  • Internal storage is integer 10000.
  • Display converts back to "100.00".

This is the core of Decimal World: Seamless lossless conversion between Decimal Strings and u64 Integers.

📖 True Story: JavaScript Number Overflow

During development, we encountered a bizarre bug:

Symptom: The backend returned raw ETH amount (in wei). During testing with small amounts (0.00x ETH), frontend worked fine. But once the amount hit ~0.009 ETH, the number started losing precision and became incorrect!

Root Cause: JavaScript’s Number type uses IEEE 754 double-precision floats. The maximum safe integer is 2^53 - 1:

> console.log(Number.MAX_SAFE_INTEGER);
9007199254740991                          // ~ 9 * 10^15

// 1 ETH = 10^18 wei
> const oneEthInWei = 1000000000000000000;

// The Issue: When wei amount exceeds MAX_SAFE_INTEGER
> const smallAmount = 1000000000000000;     // 0.001 ETH = 10^15 wei ✅ Safe
> const dangerAmount = 9007199254740992;    // ~ 0.009 ETH ⚠️ Just exceeded limit!
> const tenEthInWei = 10000000000000000000; // 10 ETH = 10^19 wei ❌ Overflow!

// Verify Precision Loss: Adding 1 has no effect!
> console.log(tenEthInWei + 1);
10000000000000000000                       // No +1!
> console.log(tenEthInWei === tenEthInWei + 1);
true                                       // 😱 WHAT?!

Why ~0.009 ETH?

> console.log(Number.MAX_SAFE_INTEGER / 1e18);
0.009007199254740991                       // 0.009 ETH is the safety limit!

Solution:

// ✅ Solution 1: Backend returns String, Frontend uses BigInt
> const weiString = "10000000000000000000";  // String from backend
> const weiBigInt = BigInt(weiString);       // Convert to BigInt
> console.log((weiBigInt + 1n).toString());
10000000000000000001                       // ✅ Correct!

// ✅ Solution 2: Use libraries like ethers.js
// import { formatEther, parseEther } from 'ethers';
// const eth = formatEther(weiBigInt);  // "10.0"

Summary

This chapter solved:

  1. Decimal Conversion: parse_decimal() and format_decimal() for bidirectional lossless conversion.
  2. u64 Range: Max value 184 Billion (at 8 decimals), sufficient for any financial scenario.
  3. Symbol Config: SymbolManager handles precision settings per pair.
  4. Precision Definitions: Distinct decimals (internal) vs display_decimals (UI).




0x04 OrderBook Refactoring (BTreeMap)


📦 Code Changes: View Diff

In the previous chapters, we completed the transition from Float to Integer and established a precision configuration system. However, our OrderBook data structure was still a “toy” implementation—re-sorting on every match! This chapter upgrades it to a truly production-ready data structure.

1. The Problem with the Naive Implementation

Let’s review the original engine.rs:

pub struct OrderBook {
    bids: Vec<PriceLevel>,  // Was 'buys'
    asks: Vec<PriceLevel>,  // Was 'sells'
}

💡 Naming Convention: We renamed buys/sells to bids/asks. These are the standard trading-industry terms:

  • Bid: Price buyers are willing to pay.
  • Ask: Price sellers are demanding.

Using professional terminology aligns the code with industry docs and APIs.

fn match_buy(&mut self, buy_order: &mut Order) {
    // Problem 1: Re-sort every time! O(n log n)
    self.asks.sort_by_key(|l| l.price);

    for level in self.asks.iter_mut() {
        // ...matching logic...
    }

    // Problem 2: Removing empty levels shifts the whole array! O(n)
    self.asks.retain(|l| !l.orders.is_empty());
}

fn rest_order(&mut self, order: Order) {
    // Problem 3: Finding price level is a linear scan! O(n)
    let level = self.asks.iter_mut().find(|l| l.price == order.price);
    // ...
}

Time Complexity Analysis

| Operation | Vec Impl | Issue |
|---|---|---|
| Insert Order | O(n) | Linear scan for price level |
| Pre-match Sort | O(n log n) | Sort required before every match |
| Remove Empty Level | O(n) | Array element shifting |

In an active exchange with tens of thousands of orders per second, O(n) operations quickly become a performance bottleneck.

2. The BTreeMap Solution

Rust’s standard library provides BTreeMap, a Self-Balancing Binary Search Tree:

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

pub struct OrderBook {
    /// Asks: price -> orders (Ascending, Lowest Price = Best Ask)
    asks: BTreeMap<u64, VecDeque<Order>>,
    
    /// Bids: (u64::MAX - price) -> orders (Trick: Highest Price First)
    bids: BTreeMap<u64, VecDeque<Order>>,
}
}

Key Trick: Key Design for Bids

BTreeMap sorts keys in ascending order by default. This works perfectly for Asks (lowest price first). But for Bids, we need highest price first.

Solution: Use u64::MAX - price as the key.

#![allow(unused)]
fn main() {
// Insert Bid
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);

// Read Real Price
let price = u64::MAX - key;
}

Thus, Price 100 becomes Key u64::MAX - 100, and Price 99 becomes u64::MAX - 99. Since (u64::MAX - 100) < (u64::MAX - 99), Price 100 comes before Price 99!
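This ordering can be checked directly. Below is a minimal standalone sketch (separate from the book's OrderBook; the helper names `bid_key`/`bid_price` are illustrative) showing that a bids map keyed by `u64::MAX - price` iterates highest price first:

```rust
use std::collections::BTreeMap;

/// Encode a bid price so that higher prices sort first in a BTreeMap.
fn bid_key(price: u64) -> u64 {
    u64::MAX - price
}

/// Decode the key back to the real price.
fn bid_price(key: u64) -> u64 {
    u64::MAX - key
}

/// Return bid prices in matching order (best = highest first).
fn bid_prices_in_order(prices: &[u64]) -> Vec<u64> {
    let mut bids: BTreeMap<u64, ()> = BTreeMap::new();
    for &p in prices {
        bids.insert(bid_key(p), ());
    }
    // BTreeMap iterates keys in ascending order,
    // so the decoded prices come out descending.
    bids.keys().map(|&k| bid_price(k)).collect()
}

fn main() {
    let order = bid_prices_in_order(&[99, 101, 100]);
    println!("{:?}", order); // highest price first
    assert_eq!(order, vec![101, 100, 99]);
}
```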

Why not Reverse or Custom Comparator?

You might ask: Why not BTreeMap<Reverse<u64>, ...>?

Comparison:

| Approach | Issue |
|---|---|
| BTreeMap<Reverse<u64>> | Reverse is a wrapper; unwrapping on every access adds complexity. |
| Custom Ord | Requires a newtype wrapper, increasing boilerplate. |
| u64::MAX - price | Zero-Cost Abstraction: two subtraction ops, easily inlined by the compiler. |

Key Advantages:

  • Simple: Just two lines of code.
  • Zero Overhead: Subtraction is a single-cycle CPU instruction.
  • Type Safe: Key remains u64.
  • No Overflow: Price is always < u64::MAX.

Time Complexity Comparison

| Operation | Vec Impl | BTreeMap Impl |
|---|---|---|
| Insert Order | O(n) | O(log n) |
| Match (No Sort) | - | O(log n) |
| Cancel Order | O(n) | O(n)* |
| Remove Empty Level | O(n) | O(log n) |
| Query Best Price | O(n) or O(n log n) | O(1)** |

*Note: Cancelling requires a linear scan of the VecDeque (O(n)); O(1) cancellation requires an auxiliary HashMap index.

**Note: BTreeMap::first_key_value() is amortized O(1).
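The auxiliary index mentioned in the note can be sketched as follows. This is a minimal illustration, not the book's implementation; the `AskBook` type and its methods are invented for this example. The index maps order ID to price so the cancel path finds the right level in O(1), though removing from the level's VecDeque remains linear in the level's length:

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

/// Minimal ask-side book with an order_id -> price index for fast cancels.
struct AskBook {
    levels: BTreeMap<u64, VecDeque<u64>>, // price -> order ids (FIFO)
    index: HashMap<u64, u64>,             // order_id -> price
}

impl AskBook {
    fn new() -> Self {
        Self { levels: BTreeMap::new(), index: HashMap::new() }
    }

    fn insert(&mut self, order_id: u64, price: u64) {
        self.levels.entry(price).or_default().push_back(order_id);
        self.index.insert(order_id, price);
    }

    /// Cancel without the caller supplying the price: the index locates the
    /// level in O(1); removing from the VecDeque is still O(level length).
    fn cancel(&mut self, order_id: u64) -> bool {
        let price = match self.index.remove(&order_id) {
            Some(p) => p,
            None => return false, // unknown or already cancelled
        };
        if let Some(queue) = self.levels.get_mut(&price) {
            queue.retain(|&id| id != order_id);
            if queue.is_empty() {
                self.levels.remove(&price); // drop empty price level
            }
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut book = AskBook::new();
    book.insert(1, 100);
    book.insert(2, 100);
    assert!(book.cancel(1));  // found via the index, no price needed
    assert!(!book.cancel(1)); // second cancel fails: id already removed
    assert_eq!(book.levels[&100].len(), 1);
    println!("cancel index ok");
}
```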

3. New Data Models

Order

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
    pub id: u64,
    pub price: u64,          // Internal Integer Price
    pub qty: u64,            // Original Qty
    pub filled_qty: u64,     // Filled Qty
    pub side: Side,
    pub order_type: OrderType,
    pub status: OrderStatus,
}
}

Trade

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
    pub id: u64,
    pub buyer_order_id: u64,
    pub seller_order_id: u64,
    pub price: u64,
    pub qty: u64,
}
}

OrderResult

#![allow(unused)]
fn main() {
pub struct OrderResult {
    pub order: Order,       // Updated Order
    pub trades: Vec<Trade>, // Generated Trades
}
}

4. Core API

#![allow(unused)]
fn main() {
impl OrderBook {
    /// Add order, return match result
    pub fn add_order(&mut self, order: Order) -> OrderResult;
    
    /// Cancel order
    pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
    
    /// Get Best Bid
    pub fn best_bid(&self) -> Option<u64>;
    
    /// Get Best Ask
    pub fn best_ask(&self) -> Option<u64>;
    
    /// Get Spread
    pub fn spread(&self) -> Option<u64>;
}
}
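The spread follows directly from the two best prices. A minimal standalone sketch of that arithmetic (the real method would read the prices from the book's BTreeMaps; `checked_sub` also surfaces a crossed book as None rather than underflowing):

```rust
/// Spread = best ask - best bid; None if either side of the book is empty.
fn spread(best_bid: Option<u64>, best_ask: Option<u64>) -> Option<u64> {
    match (best_bid, best_ask) {
        (Some(bid), Some(ask)) => ask.checked_sub(bid),
        _ => None, // one side empty -> no spread
    }
}

fn main() {
    // Internal units with 2 price decimals: 99.00 bid, 101.00 ask.
    assert_eq!(spread(Some(9900), Some(10100)), Some(200));
    assert_eq!(spread(None, Some(10100)), None); // no bids yet
    println!("spread ok");
}
```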

5. Execution Results

=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3

[1] Makers coming in...
    Order 1: Sell 10.000 BTC @ $100.00 -> New
    Order 2: Sell 5.000 BTC @ $102.00 -> New
    Order 3: Sell 5.000 BTC @ $101.00 -> New

    Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None

[2] Taker eats liquidity...
    Order 4: Buy 12.000 BTC @ $101.50
    Trades:
      - Trade #1: 10.000 @ $100.00
      - Trade #2: 2.000 @ $101.00
    Order Status: Filled, Filled: 12.000/12.000

    Book State: Best Bid=None, Best Ask=Some("101.00")

[3] More makers...
    Order 5: Buy 10.000 BTC @ $99.00 -> New

    Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")

=== End of Simulation ===

Observations:

  • Orders matched correctly by price priority (First $100, then $101).
  • Every trade recorded in Trades.
  • Real-time tracking of Best Bid/Ask and Spread.

6. Unit Tests

We added 8 unit tests covering core scenarios:

$ cargo test

running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok

test result: ok. 8 passed; 0 failed

7. Is BTreeMap Enough?

For an exchange not chasing extreme performance, BTreeMap is perfectly adequate:

| Scenario | BTreeMap Performance |
|---|---|
| 1,000 TPS | Easy |
| 10,000 TPS | Manageable |
| 100,000+ TPS | Needs specialized structures |

If you want to build a Ferrari-level matching engine (nanosecond latency, millions of TPS), you need:

  • Lock-free data structures
  • Memory pools (avoid heap allocation)
  • CPU Cache optimization
  • FPGA acceleration

But that’s for later. For now, we have a Correct and Efficient baseline implementation.

Summary

This chapter accomplished:

  1. Analyzed Problem: O(n) bottleneck in Vec implementation.
  2. Refactored to BTreeMap: O(log n) insert/search/delete.
  3. Defined Types: Standard Order/Trade/OrderResult models.
  4. Refined API: best_bid/ask, spread, cancel_order.
  5. Added Tests: 8 tests covering core logic.



🇨🇳 中文

📦 代码变更: 查看 Diff

在前三章中,我们完成了从浮点数到整数的转换,并建立了精度配置系统。但我们的 OrderBook 数据结构还是一个“玩具”实现——每次撮合都需要重新排序!本章我们将把它升级为一个真正生产可用的数据结构。

1. 原有实现的问题

让我们回顾一下原来的 engine.rs

#![allow(unused)]
fn main() {
pub struct OrderBook {
    bids: Vec<PriceLevel>,  // 原来叫 buys
    asks: Vec<PriceLevel>,  // 原来叫 sells
}
}

💡 命名规范:我们把 buys/sells 改名为 bids/asks。这是金融行业的标准术语:

  • Bid(买盘):买方愿意出的价格
  • Ask(卖盘):卖方要求的价格

使用专业术语可以让代码更易于与行业文档、API 对接。

#![allow(unused)]
fn main() {
fn match_buy(&mut self, buy_order: &mut Order) {
    // 问题 1: 每次都要重新排序!O(n log n)
    self.asks.sort_by_key(|l| l.price);
    
    for level in self.asks.iter_mut() {
        // ...matching logic...
    }
    
    // 问题 2: 删除空档位需要移动整个数组!O(n)
    self.asks.retain(|l| !l.orders.is_empty());
}

fn rest_order(&mut self, order: Order) {
    // 问题 3: 查找价格档位是线性扫描!O(n)
    let level = self.asks.iter_mut().find(|l| l.price == order.price);
    // ...
}
}

时间复杂度分析

| 操作 | Vec 实现 | 问题 |
|---|---|---|
| 插入订单 | O(n) | 线性查找价格档位 |
| 撮合前排序 | O(n log n) | 每次撮合都要排序 |
| 删除空档位 | O(n) | 数组元素移动 |

在一个活跃的交易所,每秒可能有数万笔订单。如果每笔订单都要 O(n) 操作,这里很快就会成为性能瓶颈。

2. BTreeMap 解决方案

Rust 标准库提供了 BTreeMap,它是一个自平衡二叉搜索树

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;

pub struct OrderBook {
    /// 卖单: price -> orders (按价格升序,最低价 = 最优卖价)
    asks: BTreeMap<u64, VecDeque<Order>>,
    
    /// 买单: (u64::MAX - price) -> orders (技巧:让最高价排在前面)
    bids: BTreeMap<u64, VecDeque<Order>>,
}
}

关键技巧:买单的 Key 设计

BTreeMap 默认按 key 升序排列。对于卖单,这正好是我们想要的(最低价优先)。但对于买单,我们需要最高价优先。

解决方案:使用 u64::MAX - price 作为 key:

#![allow(unused)]
fn main() {
// 插入买单
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);

// 读取真实价格
let price = u64::MAX - key;
}

这样,价格 100 对应 key u64::MAX - 100,价格 99 对应 key u64::MAX - 99。由于 (u64::MAX - 100) < (u64::MAX - 99),价格 100 会排在价格 99 前面!

为什么不用 Reverse 或自定义比较器?

你可能会问:为什么不用 BTreeMap<Reverse<u64>, ...> 或者自定义比较器?

方案对比

| 方案 | 问题 |
|---|---|
| BTreeMap<Reverse<u64>, ...> | Reverse 是一个 wrapper 类型,每次访问 key 都需要解包,增加代码复杂度 |
| 自定义 Ord trait | 需要创建 newtype wrapper,代码量大增 |
| u64::MAX - price | 零成本抽象:两次减法操作,编译器可以内联优化 |

关键优势

  • 简单:只需要两行代码(插入时 u64::MAX - price,读取时再减回来)
  • 零开销:减法操作在 CPU 上是单周期指令
  • 类型安全:key 仍然是 u64,不需要额外的 wrapper 类型
  • 无溢出风险:价格永远小于 u64::MAX,减法不会溢出

时间复杂度对比

| 操作 | Vec 实现 | BTreeMap 实现 |
|---|---|---|
| 插入订单 | O(n) | O(log n) |
| 撮合(不排序) | - | O(log n) |
| 取消订单 | O(n) | O(n)* |
| 删除空价格档 | O(n) | O(log n) |
| 查询最优价 | O(n) 或 O(n log n) | O(1)** |

*注: 取消订单需要在 VecDeque 中线性查找订单 ID,这是 O(n)。如果需要 O(1) 取消,需要额外的 HashMap 索引。

**注: BTreeMap 的 first_key_value() 是 O(1) 摊销复杂度。

3. 新的数据模型

Order(订单)

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
    pub id: u64,
    pub price: u64,          // 价格(内部单位)
    pub qty: u64,            // 原始数量
    pub filled_qty: u64,     // 已成交数量
    pub side: Side,
    pub order_type: OrderType,
    pub status: OrderStatus,
}
}

Trade(成交记录)

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
    pub id: u64,
    pub buyer_order_id: u64,
    pub seller_order_id: u64,
    pub price: u64,
    pub qty: u64,
}
}

OrderResult(下单结果)

#![allow(unused)]
fn main() {
pub struct OrderResult {
    pub order: Order,      // 更新后的订单
    pub trades: Vec<Trade>, // 产生的成交
}
}

4. 核心 API

#![allow(unused)]
fn main() {
impl OrderBook {
    /// 添加订单,返回成交结果
    pub fn add_order(&mut self, order: Order) -> OrderResult;
    
    /// 取消订单
    pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
    
    /// 获取最优买价
    pub fn best_bid(&self) -> Option<u64>;
    
    /// 获取最优卖价
    pub fn best_ask(&self) -> Option<u64>;
    
    /// 获取买卖价差
    pub fn spread(&self) -> Option<u64>;
}
}

5. 运行结果

=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3

[1] Makers coming in...
    Order 1: Sell 10.000 BTC @ $100.00 -> New
    Order 2: Sell 5.000 BTC @ $102.00 -> New
    Order 3: Sell 5.000 BTC @ $101.00 -> New

    Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None

[2] Taker eats liquidity...
    Order 4: Buy 12.000 BTC @ $101.50
    Trades:
      - Trade #1: 10.000 @ $100.00
      - Trade #2: 2.000 @ $101.00
    Order Status: Filled, Filled: 12.000/12.000

    Book State: Best Bid=None, Best Ask=Some("101.00")

[3] More makers...
    Order 5: Buy 10.000 BTC @ $99.00 -> New

    Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")

=== End of Simulation ===

可以看到:

  • 订单按价格优先级正确匹配(先 $100,再 $101)
  • 每笔成交都记录在 Trade
  • 实时追踪 Best Bid/Ask 和 Spread

6. 单元测试

我们添加了 8 个单元测试来验证核心功能:

$ cargo test

running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok

test result: ok. 8 passed; 0 failed

覆盖的场景包括:

  • ✅ 订单挂单(无匹配)
  • ✅ 完全成交
  • ✅ 部分成交
  • ✅ 价格优先级(Price Priority)
  • ✅ 同价格 FIFO
  • ✅ 取消订单
  • ✅ 价差计算
  • ✅ 一个大单吃掉多个小单

7. BTreeMap 够用吗?

对于一个不追求极致性能的交易所,BTreeMap 完全够用:

| 场景 | BTreeMap 表现 |
|---|---|
| 每秒 1000 单 | 轻松应对 |
| 每秒 10000 单 | 可以应对 |
| 每秒 100000+ 单 | 需要更专业的数据结构 |

如果你要打造一个法拉利级别的撮合引擎(纳秒级延迟、每秒百万单),需要考虑:

  • 无锁数据结构
  • 内存池(避免动态分配)
  • CPU Cache 优化
  • FPGA 硬件加速

但那是后话了。现在,我们有了一个正确且高效的基础实现。

Summary

本章完成了以下工作:

  1. 分析原有问题:Vec 实现的 O(n) 复杂度瓶颈
  2. 重构为 BTreeMap:O(log n) 的插入、查找、删除
  3. 定义规范类型:Order、Trade、OrderResult
  4. 完善 API:best_bid/ask、spread、cancel_order
  5. 添加单元测试:8 个测试覆盖核心场景

0x05 User Account & Balance Management

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

In previous chapters, our matching engine could match orders correctly. But there's a key question: where do the funds come from? In a real exchange, users must have sufficient funds before placing an order, and funds must be transferred upon matching.

This chapter implements the user account system, including:

  • Balance Management (Avail / Frozen)
  • Pre-trade Fund Validation
  • Post-trade Settlement

1. Dual State of Balance: Avail vs Frozen

In an exchange, a balance has two states:

| State | Meaning | Usage |
|---|---|---|
| Avail | Can be used for trading or withdrawal | Daily operations |
| Frozen | Locked in open orders | Waiting for match or cancel |

Why do we need Frozen?

Suppose Alice has 10 BTC and she places two sell orders:

  • Order A: Sell 8 BTC
  • Order B: Sell 5 BTC

Without a freeze mechanism, these two orders require 13 BTC, but Alice only has 10! This is the Over-Selling problem.

Correct Flow:

1. Alice has 10 BTC (avail=10, frozen=0)
2. Place Order A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. Place Order B (5 BTC) → try freeze 5 BTC → Fail! avail only 2 ❌

2. Balance Structure

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
    pub avail: u64,  // Available Balance
    pub frozen: u64, // Frozen Balance
}

impl Balance {
    /// Deposit (Increase avail)
    /// Returns false on overflow - Financial systems must detect this!
    pub fn deposit(&mut self, amount: u64) -> bool {
        match self.avail.checked_add(amount) {
            Some(new_avail) => {
                self.avail = new_avail;
                true
            }
            None => false, // Overflow! Alert and investigate.
        }
    }
}

Why checked_add?

| Method | Overflow Behavior (250u8 + 10u8) | Use Case |
|---|---|---|
| + (Std) | Panic (Debug) or Wrap (Release) | General logic, overflow is a bug |
| wrapping_add | 4 (Wrap) | Hashing, graphics |
| saturating_add | 255 (Cap) | Quotas, token buckets |
| checked_add | None | Finance; overflow must be an error! |

⚠️ In financial systems, “too much money causing overflow” is a severe bug. It must return an error for handling, not silently wrap or saturate.
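The table's 250u8 + 10u8 row can be verified directly; a short sketch exercising all three explicit-overflow methods from the standard library:

```rust
fn main() {
    let a: u8 = 250;
    let b: u8 = 10;

    assert_eq!(a.wrapping_add(b), 4);     // 260 mod 256 -> wraps around to 4
    assert_eq!(a.saturating_add(b), 255); // capped at u8::MAX
    assert_eq!(a.checked_add(b), None);   // overflow reported explicitly

    // `a + b` would panic in debug builds and wrap in release builds,
    // which is exactly why financial code avoids the bare operator.
    println!("overflow semantics verified");
}
```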

#![allow(unused)]
fn main() {
    /// Freeze (avail → frozen)
    pub fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }

    /// Unfreeze (frozen → avail), for cancellations
    pub fn unfreeze(&mut self, amount: u64) -> bool {
        if self.frozen >= amount {
            self.frozen -= amount;
            self.avail += amount;
            true
        } else {
            false
        }
    }

    /// Consume Frozen (Fund leaves account after match)
    pub fn consume_frozen(&mut self, amount: u64) -> bool {
        if self.frozen >= amount {
            self.frozen -= amount;
            true
        } else {
            false
        }
    }

    /// Receive Funds (Fund enters account after match)
    /// Returns false on overflow, consistent with deposit()
    pub fn receive(&mut self, amount: u64) -> bool {
        match self.avail.checked_add(amount) {
            Some(new_avail) => {
                self.avail = new_avail;
                true
            }
            None => false, // Overflow! Alert and investigate.
        }
    }
}
}
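Alice's over-selling scenario from section 1 can be replayed against this logic. The sketch below restates just the freeze path in a self-contained form (abstract units rather than internal 8-decimal integers, for brevity):

```rust
/// Minimal restatement of the chapter's freeze logic, enough to show
/// how the frozen state blocks over-selling.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    /// Move funds from avail to frozen; false if avail is insufficient.
    fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Alice holds 10 BTC (abstract units for brevity).
    let mut btc = Balance { avail: 10, frozen: 0 };

    assert!(btc.freeze(8));  // Order A: sell 8 BTC -> locked
    assert!(!btc.freeze(5)); // Order B: sell 5 BTC -> rejected, avail is only 2
    assert_eq!((btc.avail, btc.frozen), (2, 8));
    println!("over-sell prevented");
}
```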

3. User Account Structure

Each user holds balances for multiple assets:

#![allow(unused)]
fn main() {
/// Use FxHashMap for O(1) asset lookup
/// FxHashMap is faster for integer keys
pub struct UserAccount {
    pub user_id: u64,
    balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}

impl UserAccount {
    pub fn deposit(&mut self, asset_id: u32, amount: u64) {
        self.get_balance_mut(asset_id).deposit(amount);
    }

    pub fn avail(&self, asset_id: u32) -> u64 {
        self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
    }

    pub fn frozen(&self, asset_id: u32) -> u64 {
        self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
    }
}
}

4. Order Placing: Freezing Funds

When placing an order, we freeze specific assets based on order side:

| Order Side | Asset to Freeze | Amount |
|---|---|---|
| Buy | Quote Asset (e.g. USDT) | price × quantity / qty_unit |
| Sell | Base Asset (e.g. BTC) | quantity |

Using SymbolManager for Precision

Each pair has its own precision config:

#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal;  // 2
let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals;  // 8
let qty_unit = 10u64.pow(qty_decimal);  // 100_000_000

// price = 100 USDT (Internal: 100 * price_unit)
// qty = 10 BTC (Internal: 10 * qty_unit)
// cost = price * qty / qty_unit (Prevent overflow)
let cost = price * qty / qty_unit;

if accounts.freeze(user_id, USDT, cost) {
    let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
    println!("REJECTED: Insufficient balance");
}

// Sell Order: Freeze BTC
if accounts.freeze(user_id, BTC, qty) {
    let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}
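The division by qty_unit keeps the result in internal quote units, but the intermediate price * qty product can itself overflow u64 for large values. A defensive sketch (an assumption layered on top of the chapter's code, not part of it) widens to u128 before multiplying:

```rust
/// Buy-order cost in internal quote units: price * qty / qty_unit,
/// computed in u128 so the intermediate product cannot overflow.
fn order_cost(price: u64, qty: u64, qty_unit: u64) -> Option<u64> {
    let wide = (price as u128) * (qty as u128) / (qty_unit as u128);
    u64::try_from(wide).ok() // None if the result doesn't fit back in u64
}

fn main() {
    // BTC_USDT: price_decimal = 2, qty_decimal = 8.
    let price = 10_000;         // 100.00 USDT in internal units
    let qty = 1_000_000_000;    // 10.00000000 BTC in internal units
    let qty_unit = 100_000_000; // 10^8

    // 10 BTC @ 100 USDT = 1000 USDT = 100_000 internal quote units.
    assert_eq!(order_cost(price, qty, qty_unit), Some(100_000));
    println!("cost ok");
}
```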

5. Settlement: Fund Transfer

When orders match, funds transfer between buyer and seller:

Trade: Alice sells 1 BTC to Bob @ $100

Before:
  Alice: BTC(frozen=1), USDT(avail=0)
  Bob:   BTC(avail=0), USDT(frozen=100)

Settlement:
  Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
  Bob:   consume_frozen(USDT, 100) + receive(BTC, 1)

After:
  Alice: BTC(frozen=0), USDT(avail=100)
  Bob:   BTC(avail=1), USDT(frozen=0)

Code Implementation:

#![allow(unused)]
fn main() {
pub fn settle_trade(
    &mut self,
    buyer_id: u64,
    seller_id: u64,
    base_asset_id: u32,
    quote_asset_id: u32,
    base_amount: u64,    // Trade Qty
    quote_amount: u64,   // Trade Amount (price × qty)
) {
    // Buyer: Use USDT, Get BTC
    self.get_account_mut(buyer_id)
        .get_balance_mut(quote_asset_id)
        .consume_frozen(quote_amount);
    self.get_account_mut(buyer_id)
        .get_balance_mut(base_asset_id)
        .receive(base_amount);

    // Seller: Use BTC, Get USDT
    self.get_account_mut(seller_id)
        .get_balance_mut(base_asset_id)
        .consume_frozen(base_amount);
    self.get_account_mut(seller_id)
        .get_balance_mut(quote_asset_id)
        .receive(quote_amount);
}
}

6. Refined Trade Structure

To support settlement, Trade needs user IDs:

#![allow(unused)]
fn main() {
pub struct Trade {
    pub id: u64,
    pub buyer_order_id: u64,
    pub seller_order_id: u64,
    pub buyer_user_id: u64,   // New
    pub seller_user_id: u64,  // New
    pub price: u64,
    pub qty: u64,
}
}

7. Execution Results

=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000

[0] Initial deposits...
    Alice: 100.00000000 BTC, 10000.00 USDT
    Bob:   5.00000000 BTC, 200000.00 USDT

[1] Alice places sell orders...
    Order 1: Sell 10.00000000 BTC @ $100.00 -> New
    Order 2: Sell 5.00000000 BTC @ $101.00 -> New
    Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC

[2] Bob places buy order (taker)...
    Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
    Trades:
      - Trade #1: 10.00000000 BTC @ $100.00
      - Trade #2: 2.00000000 BTC @ $101.00
    Order status: Filled

[3] Final balances:
    Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
    Bob:   17.00000000 BTC, 198798.00 USDT (frozen: 0.00)

    Book: Best Bid=None, Best Ask=Some("101.00")

Analysis:

  • Alice initial 100 BTC. Sold 10+2=12. Remaining 85 avail + 3 frozen = 88 BTC ✓
  • Alice got 10×100 + 2×101 = 1202 USDT. Initial 10000 + 1202 = 11202 USDT ✓
  • Bob initial 5 BTC. Bought 12. Total 17 BTC ✓
  • Bob spent 1202 USDT. Initial 200000 - 1202 = 198798 USDT ✓

Summary

This chapter accomplished:

  1. Implemented Balance: Dual-state (avail/frozen).
  2. Implemented UserAccount: Multi-asset support.
  3. Implemented AccountManager: Managing all users.
  4. Pre-trade Freeze: Prevent over-selling/buying.
  5. Post-trade Settlement: Correct fund transfer.
  6. Refined Trade: Included user_ids.

Now our engine not only matches orders but also ensures funding sufficiency and correct settlement!




🇨🇳 中文

📦 代码变更: 查看 Diff

在前几章中,我们的撮合引擎已经可以正确匹配订单并产生成交。但有一个关键问题:钱从哪里来? 在真实的交易所中,用户必须先有足够的资金才能下单,成交后资金才会转移。

本章我们将实现用户账户系统,包括:

  • 余额管理(可用 / 冻结)
  • 下单前资金校验
  • 成交后资金结算

1. 余额的双重状态:Avail vs Frozen

在交易所中,用户的余额有两种状态:

| 状态 | 含义 | 使用场景 |
|---|---|---|
| Avail (可用) | 可以用于下单或提现 | 日常操作 |
| Frozen (冻结) | 已锁定在挂单中 | 等待成交或取消 |

为什么需要冻结?

假设 Alice 有 10 BTC,她同时挂了两个卖单:

  • 卖单 A:卖 8 BTC
  • 卖单 B:卖 5 BTC

如果没有冻结机制,这两个订单共需要 13 BTC,但 Alice 只有 10 BTC!这就是超卖问题。

正确的流程

1. Alice 有 10 BTC (avail=10, frozen=0)
2. 下卖单 A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. 下卖单 B (5 BTC) → 尝试 freeze 5 BTC → 失败!avail 只有 2 ❌

2. Balance 结构

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
    pub avail: u64,  // 可用余额 (简短命名,JSON 输出更高效)
    pub frozen: u64, // 冻结余额
}

impl Balance {
    /// 存款 (增加 avail)
    /// 返回 false 表示溢出 - 金融系统必须检测此错误
    pub fn deposit(&mut self, amount: u64) -> bool {
        match self.avail.checked_add(amount) {
            Some(new_avail) => {
                self.avail = new_avail;
                true
            }
            None => false, // 溢出!需要报警和调查
        }
    }
}

为什么要用 checked_add

| 方法 | 溢出行为 (250u8 + 10u8) | 适用场景 |
|---|---|---|
| + (标准) | Panic (Debug) 或 4 (Release 回绕) | 常规逻辑,溢出是 Bug |
| wrapping_add | 4 (回绕) | 哈希计算、图形算法 |
| saturating_add | 255 (封顶) | 资源配额、令牌桶 |
| checked_add | None | 金融余额,溢出必须报错! |

⚠️ 金融系统中,“钱多到溢出”是严重的 Bug,必须返回错误让上层处理,而不是静默封顶或回绕。

#![allow(unused)]
fn main() {

    /// 冻结 (avail → frozen)
    pub fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }

    /// 解冻 (frozen → avail),用于取消订单
    pub fn unfreeze(&mut self, amount: u64) -> bool {
        if self.frozen >= amount {
            self.frozen -= amount;
            self.avail += amount;
            true
        } else {
            false
        }
    }

    /// 消耗冻结资金 (成交后,资金离开账户)
    pub fn consume_frozen(&mut self, amount: u64) -> bool {
        if self.frozen >= amount {
            self.frozen -= amount;
            true
        } else {
            false
        }
    }

    /// 接收资金 (成交后,资金进入账户)
    /// 返回 false 表示溢出,与 deposit() 保持一致
    pub fn receive(&mut self, amount: u64) -> bool {
        match self.avail.checked_add(amount) {
            Some(new_avail) => {
                self.avail = new_avail;
                true
            }
            None => false, // 溢出!需要报警和调查
        }
    }
}
}

3. 用户账户结构

每个用户持有多种资产的余额:

#![allow(unused)]
fn main() {
/// 使用 FxHashMap 实现 O(1) 资产查找
/// FxHashMap 使用更简单、更快的哈希函数,特别适合整数键
pub struct UserAccount {
    pub user_id: u64,
    balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}

impl UserAccount {
    pub fn deposit(&mut self, asset_id: u32, amount: u64) {
        self.get_balance_mut(asset_id).deposit(amount);
    }

    pub fn avail(&self, asset_id: u32) -> u64 {
        self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
    }

    pub fn frozen(&self, asset_id: u32) -> u64 {
        self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
    }
}
}

4. 下单流程:冻结资金

在下单时,我们需要根据订单类型冻结相应的资产:

| 订单类型 | 需要冻结的资产 | 冻结金额 |
|---|---|---|
| 买单 (Buy) | Quote 资产 (如 USDT) | price × quantity / qty_unit |
| 卖单 (Sell) | Base 资产 (如 BTC) | quantity |

从 SymbolManager 获取精度配置

每个交易对有独立的精度配置:

#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal;  // 2 (价格精度)

let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals;  // 8 (数量精度)
let qty_unit = 10u64.pow(qty_decimal);  // 100_000_000

// price = 100 USDT (内部单位: 100 * price_unit)
// qty = 10 BTC (内部单位: 10 * qty_unit)
// cost = price * qty / qty_unit (确保不会溢出)
let cost = price * qty / qty_unit;

if accounts.freeze(user_id, USDT, cost) {
    let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
    println!("REJECTED: Insufficient balance");
}

// 卖单:冻结 BTC
if accounts.freeze(user_id, BTC, qty) {
    let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}

这样,精度配置跟着 Symbol 走,price * qty / qty_unit 保证结果在合理范围内。

5. 成交结算:资金转移

当订单匹配成交后,需要在买卖双方之间转移资金:

Trade: Alice sells 1 BTC to Bob @ $100

Before:
  Alice: BTC(frozen=1), USDT(avail=0)
  Bob:   BTC(avail=0), USDT(frozen=100)

Settlement:
  Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
  Bob:   consume_frozen(USDT, 100) + receive(BTC, 1)

After:
  Alice: BTC(frozen=0), USDT(avail=100)
  Bob:   BTC(avail=1), USDT(frozen=0)

代码实现:

#![allow(unused)]
fn main() {
pub fn settle_trade(
    &mut self,
    buyer_id: u64,
    seller_id: u64,
    base_asset_id: u32,  // 如 BTC
    quote_asset_id: u32, // 如 USDT
    base_amount: u64,    // 成交数量
    quote_amount: u64,   // 成交金额 (price × qty)
) {
    // Buyer: 消耗 USDT,获得 BTC
    self.get_account_mut(buyer_id)
        .get_balance_mut(quote_asset_id)
        .consume_frozen(quote_amount);
    self.get_account_mut(buyer_id)
        .get_balance_mut(base_asset_id)
        .receive(base_amount);

    // Seller: 消耗 BTC,获得 USDT
    self.get_account_mut(seller_id)
        .get_balance_mut(base_asset_id)
        .consume_frozen(base_amount);
    self.get_account_mut(seller_id)
        .get_balance_mut(quote_asset_id)
        .receive(quote_amount);
}
}

6. Trade 结构的完善

为了正确结算,Trade 结构需要包含买卖双方的用户 ID:

#![allow(unused)]
fn main() {
pub struct Trade {
    pub id: u64,
    pub buyer_order_id: u64,
    pub seller_order_id: u64,
    pub buyer_user_id: u64,   // 新增
    pub seller_user_id: u64,  // 新增
    pub price: u64,
    pub qty: u64,
}
}

在撮合时,从 Order 中提取 user_id 并写入 Trade:

#![allow(unused)]
fn main() {
trades.push(Trade::new(
    self.trade_id_counter,
    buy_order.id,
    sell_order.id,
    buy_order.user_id,   // 从订单获取用户 ID
    sell_order.user_id,
    price,
    trade_qty,
));
}

7. 运行结果

=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000

[0] Initial deposits...
    Alice: 100.00000000 BTC, 10000.00 USDT
    Bob:   5.00000000 BTC, 200000.00 USDT

[1] Alice places sell orders...
    Order 1: Sell 10.00000000 BTC @ $100.00 -> New
    Order 2: Sell 5.00000000 BTC @ $101.00 -> New
    Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC

[2] Bob places buy order (taker)...
    Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
    Trades:
      - Trade #1: 10.00000000 BTC @ $100.00
      - Trade #2: 2.00000000 BTC @ $101.00
    Order status: Filled

[3] Final balances:
    Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
    Bob:   17.00000000 BTC, 198798.00 USDT (frozen: 0.00)

    Book: Best Bid=None, Best Ask=Some("101.00")

分析

  • Alice 初始有 100 BTC,卖出 10+2=12 BTC,还剩 85 + 3(frozen) = 88 BTC ✓
  • Alice 收到 10×100 + 2×101 = 1202 USDT,加上初始 10000 = 11202 USDT ✓
  • Bob 初始有 5 BTC,买入 12 BTC = 17 BTC ✓
  • Bob 花费 1202 USDT,初始 200000 - 1202 = 198798 USDT ✓

Summary

本章完成了以下工作:

  1. 实现 Balance 结构:avail/frozen 双状态余额管理
  2. 实现 UserAccount:一个用户持有多种资产余额
  3. 实现 AccountManager:管理所有用户账户
  4. 下单前资金冻结:防止超卖/超买
  5. 成交后资金结算:在买卖双方间正确转移资金
  6. 完善 Trade 结构:包含买卖双方 user_id
  7. 添加单元测试:4 个新测试覆盖余额管理

现在我们的撮合引擎不仅能正确匹配订单,还能确保用户有足够的资金,并在成交后正确结算!

0x06 Enforced Balance Management

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

In the previous chapter, we implemented balance management. However, in financial systems, fund operations are the most critical part and must be foolproof. This chapter upgrades balance management to a Type-System Enforced version.

1. Why “Enforced”?

The previous implementation had flaws:

#![allow(unused)]
fn main() {
// ❌ Problem 1: Public fields, easily modified unintentionally
pub struct Balance {
    pub avail: u64,   // Dev might assign directly, bypassing logic
    pub frozen: u64,
}

// ❌ Problem 2: Returns bool, unclear error
fn freeze(&mut self, amount: u64) -> bool {
    // Failed? Why? Don't know.
}

// ❌ Problem 3: No Audit Trail
// Balance changed, but no versioning for tracing.
}

These issues can lead to:

  • Developers accidentally bypassing checks: In complex logic, one might modify fields directly.
  • Hard to debug: “Operation failed” doesn’t tell you why.
  • Audit difficulty: No change tracking makes it hard to pinpoint when a bug occurred.

Note: This is not to prevent malicious attacks (it’s an internal system), but to prevent developer errors. Just like Rust’s ownership system—we use types to reduce the chance of shooting ourselves in the foot.

2. Enforced Balance Design

The new version enforces safety via Rust Type System:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Balance {
    avail: u64,      // ← Private! Only accessible via methods
    frozen: u64,     // ← Private!
    version: u64,    // ← Private! Auto-increment on change
}
}

Core Principles

| Principle | Implementation |
|---|---|
| Encapsulation | All fields private; read-only getters provided |
| Explicit Error | All mutations return Result<(), &'static str> |
| Audit Trail | version auto-increments on every mutation |
| Overflow Protection | Use checked_add/sub; overflow returns an error |

Method Renaming

| Old (v0.5) | New (v0.6) | Meaning |
|---|---|---|
| freeze() | lock() | More accurate: lock funds for an order |
| unfreeze() | unlock() | Unlock (when cancelling) |
| consume_frozen() | spend_frozen() | Spend frozen funds (after match) |
| receive() | deposit() | Unified deposit semantics |

3. Balance API Details

Safe Getters

#![allow(unused)]
fn main() {
impl Balance {
    /// Get Available (Read-only)
    pub const fn avail(&self) -> u64 { self.avail }
    
    /// Get Frozen (Read-only)
    pub const fn frozen(&self) -> u64 { self.frozen }
    
    /// Get Total (avail + frozen)
    /// Returns None on overflow (data corruption)
    pub const fn total(&self) -> Option<u64> {
        self.avail.checked_add(self.frozen)
    }
    
    /// Get Version (Read-only)
    pub const fn version(&self) -> u64 { self.version }
}
}

Why const fn? The compiler guarantees these getters can never modify state, providing the strongest possible safety.

Validated Mutations

Every mutation method:

  1. Validates preconditions
  2. Uses checked arithmetic
  3. Returns Result
  4. Auto-increments version
#![allow(unused)]
fn main() {
/// Deposit: Increase Available
pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
    self.avail = self.avail.checked_add(amount)
        .ok_or("Deposit overflow")?;  // ← Return Error on Overflow
    self.version = self.version.wrapping_add(1);  // ← Auto Increment
    Ok(())
}

/// Lock: Avail → Frozen
pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
    if self.avail < amount {
        return Err("Insufficient funds to lock");  // ← Explicit Error
    }
    self.avail = self.avail.checked_sub(amount)
        .ok_or("Lock avail underflow")?;
    self.frozen = self.frozen.checked_add(amount)
        .ok_or("Lock frozen overflow")?;
    self.version = self.version.wrapping_add(1);
    Ok(())
}

/// Unlock: Frozen → Avail
pub fn unlock(&mut self, amount: u64) -> Result<(), &'static str> {
    if self.frozen < amount {
        return Err("Insufficient frozen funds");
    }
    self.frozen = self.frozen.checked_sub(amount)
        .ok_or("Unlock frozen underflow")?;
    self.avail = self.avail.checked_add(amount)
        .ok_or("Unlock avail overflow")?;
    self.version = self.version.wrapping_add(1);
    Ok(())
}

/// Spend Frozen: Funds leave account after match
pub fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
    if self.frozen < amount {
        return Err("Insufficient frozen funds");
    }
    self.frozen = self.frozen.checked_sub(amount)
        .ok_or("Spend frozen underflow")?;
    self.version = self.version.wrapping_add(1);
    Ok(())
}
}
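The three guarantees (private fields, Result errors, version ticks) can be exercised together. Below is a condensed, self-contained sketch of the enforced Balance with just deposit and lock, plus the getters it needs; the full type has more methods:

```rust
/// Condensed sketch of the enforced Balance: private fields, Result
/// errors, and a version counter that ticks on every successful mutation.
#[derive(Default)]
pub struct Balance {
    avail: u64,
    frozen: u64,
    version: u64,
}

impl Balance {
    pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
        self.avail = self.avail.checked_add(amount).ok_or("Deposit overflow")?;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }

    pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.avail < amount {
            return Err("Insufficient funds to lock"); // explicit error, no bool
        }
        self.avail -= amount;
        self.frozen += amount;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }

    pub const fn avail(&self) -> u64 { self.avail }
    pub const fn version(&self) -> u64 { self.version }
}

fn main() {
    let mut b = Balance::default();
    b.deposit(100).unwrap();
    b.lock(60).unwrap();
    assert_eq!(b.avail(), 40);
    assert_eq!(b.version(), 2); // two successful mutations
    assert_eq!(b.lock(50), Err("Insufficient funds to lock"));
    assert_eq!(b.version(), 2); // failed operations don't tick the version
    println!("enforced balance ok");
}
```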

4. UserAccount Refactoring

UserAccount is also refactored:

Data Structure Change

#![allow(unused)]
fn main() {
// Old: FxHashMap
pub struct UserAccount {
    pub user_id: u64,
    balances: FxHashMap<u32, Balance>,
}

// New: O(1) Direct Array Indexing
pub struct UserAccount {
    user_id: UserId,      // Private
    assets: Vec<Balance>, // Private, asset_id as index
}
}

O(1) Direct Array Indexing

#![allow(unused)]
fn main() {
// deposit() auto-creates slot
pub fn deposit(&mut self, asset_id: AssetId, amount: u64) -> Result<(), &'static str> {
    let idx = asset_id as usize;
    if idx >= self.assets.len() {
        self.assets.resize(idx + 1, Balance::default());
    }
    self.assets[idx].deposit(amount)
}

// get_balance_mut() returns Result
pub fn get_balance_mut(&mut self, asset_id: AssetId) -> Result<&mut Balance, &'static str> {
    self.assets.get_mut(asset_id as usize).ok_or("Asset not found")
}
}

🚀 Why is Vec<Balance> the fastest option?

1. Cache-Friendly: Vec<Balance> is contiguous in memory, so loading one Balance pulls its neighbors into the same CPU cache line.

2. get_balance() is on the hot path: every order triggers 5-10 balance checks, so O(1) indexing plus cache friendliness is critical at millions of TPS.

Settlement Methods

New methods dedicated to handling all settlement logic for buyer/seller in one go:

#![allow(unused)]
fn main() {
/// Buyer Settlement: Spend Quote, Gain Base, Refund unused Quote
pub fn settle_as_buyer(
    &mut self,
    quote_asset_id: AssetId,
    base_asset_id: AssetId,
    spend_quote: u64,   // Consumed USDT
    gain_base: u64,     // Gained BTC
    refund_quote: u64,  // Refunded USDT
) -> Result<(), &'static str> {
    // 1. Spend Quote (Frozen); `?` unwraps the Result from get_balance_mut()
    self.get_balance_mut(quote_asset_id)?.spend_frozen(spend_quote)?;
    
    // 2. Gain Base (Available)
    self.get_balance_mut(base_asset_id)?.deposit(gain_base)?;
    
    // 3. Refund (Frozen → Available)
    if refund_quote > 0 {
        self.get_balance_mut(quote_asset_id)?.unlock(refund_quote)?;
    }
    Ok(())
}
}

5. Execution Results

=== 0xInfinity: Stage 6 (Enforced Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000

[0] Initial deposits...
    Alice: 100.00000000 BTC, 10000.00 USDT
    Bob:   5.00000000 BTC, 200000.00 USDT

[1] Alice places sell orders...
    Order 1: Sell 10.00000000 BTC @ $100.00 -> New
    Order 2: Sell 5.00000000 BTC @ $101.00 -> New
    Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC

[2] Bob places buy order (taker)...
    Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
    Trades:
      - Trade #1: 10.00000000 BTC @ $100.00
      - Trade #2: 2.00000000 BTC @ $101.00
    Order status: Filled

[3] Final balances:
    Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
    Bob:   17.00000000 BTC, 198798.00 USDT (frozen: 0.00)

    Book: Best Bid=None, Best Ask=Some("101.00")

=== End of Simulation ===

Results are consistent with the previous chapter, but now all operations are protected by the Type System!

6. Unit Tests

We added 8 new tests for enforced_balance. Total 16 tests passing.

test enforced_balance::tests::test_deposit ... ok
test enforced_balance::tests::test_deposit_overflow ... ok
test enforced_balance::tests::test_lock_unlock ... ok
...
test result: ok. 16 passed; 0 failed

7. Error Handling Example

With the new API, Result must be handled:

#![allow(unused)]
fn main() {
// ❌ Compile Error: Unhandled Result
balance.deposit(100);

// ✅ Correct: Propagate
balance.deposit(100)?;

// ✅ Correct: Unwrap (Only if sure)
balance.deposit(100).unwrap();

// ✅ Correct: Match
match balance.lock(1000) {
    Ok(()) => println!("Locked successfully"),
    Err(e) => println!("Failed to lock: {}", e),
}
}

Summary

This chapter accomplished:

  1. Encapsulation: Private fields prevent accidental modification.
  2. Result Return: All mutations return explicit errors.
  3. Versioning: Auto-increment version for audit.
  4. Checked Arithmetic: Prevents overflow.
  5. Renaming: lock/unlock/spend_frozen are clearer.
  6. Settlement Helper: settle_as_buyer/seller.
  7. Asset ID: Constraint for future O(1) array optimization.

Now our balance management is Type-Safe—the compiler prevents most balance-related bugs!





0x07-a Testing Framework - Correctness


📦 Code Changes: View Diff

Core Objective: To establish a verifiable, repeatable, and traceable testing infrastructure for the matching engine.

This chapter is not just about “how to test”; more importantly, it explains why the framework is designed this way—these design decisions stem directly from real-world exchange requirements.

1. Why a Testing Framework?

1.1 The Uniqueness of Matching Engines

A matching engine is not a generic CRUD app. A single bug can lead to:

  • Fund Errors: Users’ funds disappearing or inflating.
  • Order Loss: Orders executed but not recorded.
  • Inconsistent States: Contradictions between balances, orders, and ledgers.

Therefore, we need:

  1. Deterministic Testing: Same input must yield same output.
  2. Complete Audit: Every penny movement must be traceable.
  3. Fast Verification: Quickly confirm correctness after every code change.

1.2 Golden File Testing Pattern

We adopt the Golden File Pattern:

fixtures/         # Input (Fixed)
    ├── orders.csv
    └── balances_init.csv

baseline/         # Golden Baseline (Result of first correct run, committed to git)
    ├── t1_balances_deposited.csv
    ├── t2_balances_final.csv
    ├── t2_ledger.csv
    └── t2_orderbook.csv

output/           # Current Run Result (gitignored)
    └── ...

Why this pattern?

  1. Determinism: Fixed seeds ensure identical random sequences.
  2. Version Control: Baselines are committed; any change triggers a git diff.
  3. Fast Feedback: Just diff baseline/ output/.
  4. Auditable: Baseline is the “contract”; deviations require explanation.

2. Precision Design: decimals vs display_decimals

2.1 Why Two Precisions?

This is the most error-prone area in exchanges. Consider this real case:

User sees:      Buy 0.01 BTC @ $85,000.00
Internal store: qty=1000000 (satoshis), price=85000000000 (micro-dollars, price_decimal=6)

If we confuse these layers:

  • User enters 0.01 BTC, but the system stores it without the × 10^8 scaling, making the order 10^8 times smaller than intended.
  • Or user account shows 100 BTC, but actually has 0.000001 BTC.

Solution: Clearly distinguish two layers.

2.2 Precision Layers

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Client (display_decimals)                          │
│   - Numbers seen by users                                   │
│   - Can be adjusted based on business needs                 │
│   - E.g.: BTC displays 6 decimals (0.000001 BTC)            │
└─────────────────────────────────────────────────────────────┘
                              ↓
                    Auto Convert (× 10^decimals)
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Internal (decimals)                                │
│   - Precision for internal storage and calculation          │
│   - NEVER change once set                                   │
│   - E.g.: BTC stored with 8 decimals (satoshi)              │
└─────────────────────────────────────────────────────────────┘

2.3 Configuration Design

assets_config.csv (Asset Precision Config):

asset_id,asset,decimals,display_decimals
1,BTC,8,6     # Min unit 0.000001 BTC ≈ $0.085
2,USDT,6,4    # Min unit 0.0001 USDT
3,ETH,8,4     # Min unit 0.0001 ETH ≈ $0.40
Field              Mutability        Explanation
decimals           ⚠️ Never Change   Defines the min unit; changing it breaks all existing data.
display_decimals   ✅ Dynamic        Client-side precision for Quantity (qty).

symbols_config.csv (Trading Pair Config):

symbol_id,symbol,base_asset_id,quote_asset_id,price_decimal,price_display_decimal
0,BTC_USDT,1,2,6,2    # Price min unit $0.01
1,ETH_USDT,3,2,6,2

Key Design: Precision Source

Order Field   Precision Source               Config File
qty           base_asset.display_decimals    assets_config.csv
price         symbol.price_display_decimal   symbols_config.csv

⚠️ Note: Price precision comes from Symbol config, NOT Quote Asset! This is because the same quote asset (e.g., USDT) may have different price precisions in different pairs.

Why decimals cannot change?

Suppose BTC decimals change from 8 to 6:

  • Original balance 100,000,000 (= 1 BTC with 8 decimals).
  • New interpretation 100,000,000 / 10^6 = 100 BTC.
  • User gains 99 BTC out of thin air!

Why display_decimals can change?

This is just the display layer:

  • Original display: 0.12345678 BTC.
  • New display (6 decimals): 0.123456 BTC.
  • Internal storage remains 12,345,678 satoshis.
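
The “Auto Convert (× 10^decimals)” arrow in the layer diagram can be sketched as a small parser from the display-layer string to internal integer units. The function name to_internal is illustrative, not from the codebase:

```rust
// Hedged sketch of the display→internal conversion (× 10^decimals).
// Returns None on overflow or when the input has more precision than the asset.
fn to_internal(display: &str, decimals: u32) -> Option<u64> {
    let (int_part, frac_part) = match display.split_once('.') {
        Some((i, f)) => (i, f),
        None => (display, ""),
    };
    if frac_part.len() > decimals as usize {
        return None; // more fractional digits than the asset supports
    }
    let scale = 10u64.checked_pow(decimals)?;
    let int_units = int_part.parse::<u64>().ok()?.checked_mul(scale)?;
    // Right-pad the fraction to exactly `decimals` digits: "01" → 01000000
    let frac_units = if frac_part.is_empty() {
        0
    } else {
        frac_part.parse::<u64>().ok()? * 10u64.pow(decimals - frac_part.len() as u32)
    };
    int_units.checked_add(frac_units)
}

fn main() {
    // 0.01 BTC with decimals=8 → 1,000,000 satoshis
    assert_eq!(to_internal("0.01", 8), Some(1_000_000));
    // $85,000.00 with price_decimal=6 → 85,000,000,000 micro-dollars
    assert_eq!(to_internal("85000.00", 6), Some(85_000_000_000));
    println!("ok");
}
```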

3. Balance Format: Row vs Column

3.1 Problem: Storing Multi-Asset Balances

Option A: Columnar (One column per asset)

user_id,btc_avail,btc_frozen,usdt_avail,usdt_frozen
1,10000000000,0,10000000000000,0

Option B: Row-based (One row per asset)

user_id,asset_id,avail,frozen,version
1,1,10000000000,0,0
1,2,10000000000000,0,0

3.2 Why Row-based?

Dimension       Columnar                      Row-based
Extensibility   ❌ Alter table to add asset   ✅ Just add a row
Sparse Data     ❌ Many nulls/zeros           ✅ Store only non-zero assets
DB Compat       ❌ Non-standard               ✅ Standard normalization
Genericity      ❌ Asset names hardcoded      ✅ asset_id is generic

Real Scenario: An exchange supports 500+ assets, but users avg 3-5 holdings. Row-based design saves 99% storage space.

4. Timeline Snapshot Design

4.1 Why Multiple Snapshots?

Matching is a multi-stage process:

T0: Initial State (fixtures/balances_init.csv)
    ↓ deposit()
T1: Deposit Done (baseline/t1_balances_deposited.csv)
    ↓ execute orders
T2: Trading Done (baseline/t2_balances_final.csv)

Errors can occur at any stage:

  • T0→T1: Is deposit logic correct?
  • T1→T2: Is trade settlement correct?

Snapshots pinpoint issues:

# Verify Deposit
diff balances_init.csv t1_balances_deposited.csv

# Verify Settlement
diff t1_balances_deposited.csv t2_balances_final.csv

4.2 Naming Convention

t1_balances_deposited.csv   # t1 stage, balances type, deposited state
t2_balances_final.csv       # t2 stage, balances type, final state
t2_ledger.csv               # t2 stage, ledger type
t2_orderbook.csv            # t2 stage, orderbook type

Principle: {Time}_{Type}_{State}.csv

Benefits:

  1. Natural sort order by time.
  2. Clear content identification.
  3. Avoids ambiguity.

5. Settlement Ledger Design

5.1 Why Ledger?

t2_ledger.csv is the system’s Audit Log. Every penny movement is recorded here.

Without Ledger:

  • User complaint: “Where did my money go?”
  • Support: “Your balance is X.”
  • Unanswerable: “When did it change? Why?”

With Ledger:

trade_id,user_id,asset_id,op,delta,balance_after
1,96,2,debit,849700700,9999150299300
1,96,1,credit,1000000,10001000000

Traceability:

  • Trade #1 caused User #96’s USDT to decrease by 849,700,700.
  • Simultaneously BTC increased by 1,000,000.
  • The resulting balance after each change is recorded alongside it.

5.2 Why delta + after instead of before + after?

Option A: before + after

delta,balance_before,balance_after
849700700,10000000000000,9999150299300

Option B: delta + after

delta,balance_after
849700700,9999150299300

Why B?

  1. Less Redundancy: before is derivable from after, delta, and the op direction.
  2. Usefulness: we mostly verify whether the final state is correct, which is exactly after.
  3. Clarity: delta states the change directly, with no mental arithmetic.

6. ME Orderbook Snapshot

6.1 Why Orderbook Snapshot?

After trading, the Orderbook still holds unfilled orders. These orders:

  • Reside in RAM.
  • Are lost if system restarts.

t2_orderbook.csv is a Full Snapshot of ME State:

order_id,user_id,side,order_type,price,qty,filled_qty,status
6,907,sell,limit,85330350000,2000000,0,New

Uses:

  1. Recovery: Revert Orderbook state after restart.
  2. Verification: Compare against theoretical expectations.
  3. Debugging: Check stuck orders.

6.2 Why Record All Fields?

The goal is Full Recovery. Rebuilding Order struct requires:

#![allow(unused)]
fn main() {
struct Order {
    id, user_id, price, qty, filled_qty, side, order_type, status
}
}

Missing any field prevents recovery.

7. Test Script Design

7.1 Modular Scripts

scripts/
├── test_01_generate.sh     # Step 1: Generate Data
├── test_02_baseline.sh     # Step 2: Generate Baseline
├── test_03_verify.sh       # Step 3: Run & Verify
└── test_e2e.sh             # Combo: Full E2E Flow

Why Modular?

  1. Isolated Debugging: Run only relevant steps.
  2. Flexible Composition: CI can verify without regenerating.
  3. Readability: One script, one job.

7.2 Usage

# Daily Test (Use existing baseline)
./scripts/test_e2e.sh

# Regenerate Baseline & Test
./scripts/test_e2e.sh --regenerate

8. CLI Design: --baseline Switch

8.1 Why Switch?

Default behavior:

  • Output to output/
  • Never overwrite baseline

Update baseline:

  • Add --baseline arg
  • Output to baseline/

Why not auto-overwrite?

  1. Safety: Prevent accidental baseline corruption.
  2. Intent: Updating baseline is a conscious decision.
  3. Git Friendly: Changes trigger diff.

8.2 Implementation

#![allow(unused)]
fn main() {
fn get_output_dir() -> &'static str {
    let args: Vec<String> = std::env::args().collect();
    if args.iter().any(|a| a == "--baseline") {
        "baseline"
    } else {
        "output"
    }
}
}

9. Execution Example

9.1 Full Flow

# 1. Generate Data
python3 scripts/generate_orders.py --orders 100000 --seed 42

# 2. Generate Baseline (First run or update)
cargo run --release -- --baseline

# 3. Daily Test
./scripts/test_e2e.sh

9.2 Verification Output

╔════════════════════════════════════════════════════════════╗
║     0xInfinity Testing Framework - E2E Test                ║
╚════════════════════════════════════════════════════════════╝

  t1_balances_deposited.csv: ✅ MATCH
  t2_balances_final.csv: ✅ MATCH
  t2_ledger.csv: ✅ MATCH
  t2_orderbook.csv: ✅ MATCH

✅ All tests passed!

10. Summary

This chapter established a complete testing infrastructure:

Design Point          Problem Solved               Solution
Precision Confusion   User vs internal precision   decimals + display_decimals
Asset Extension       Support N assets             Row-based balance format
Traceability          Which step failed?           Timeline Snapshots (T0→T1→T2)
Fund Audit            Where did funds go?          Settlement Ledger
State Recovery        Restart recovery             Orderbook Snapshot
Regression            Breaking changes?            Golden File Pattern
Efficiency            Fast feedback                Modular scripts

Core Philosophy:

Testing is not an afterthought, but part of the design. A good testing framework gives you confidence when changing code.

Next section (0x07-b) will add performance benchmarks on top of this.





0x07-b Performance Baseline - Initial Setup


📦 Code Changes: View Diff

Core Objective: To establish a quantifiable, traceable, and comparable performance baseline.

Building on the testing framework from 0x07-a, this chapter adds detailed performance metric collection and analysis capabilities.

1. Why a Performance Baseline?

1.1 The Performance Trap

Optimization without a baseline is blind:

  • Premature Optimization: Optimizing code that accounts for 1% of runtime.
  • Delayed Regression Detection: A refactor drops performance by 50%, but it’s only discovered 3 months later.
  • Unquantifiable Improvement: claiming “it's much faster” without being able to say how much.

1.2 Value of a Baseline

With a baseline, you can:

  1. Verify before Commit: Ensure performance hasn’t degraded.
  2. Pinpoint Bottlenecks: Identify which component consumes the most time.
  3. Quantify Optimization: “Throughput increased from 30K ops/s to 100K ops/s.”

2. Metric Design

2.1 Throughput Metrics

Metric           Explanation        Calculation
throughput_ops   Order throughput   orders / exec_time
throughput_tps   Trade throughput   trades / exec_time

2.2 Time Breakdown

We decompose execution time into four components:

┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order)                                │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check     │ Account lookup + balance validation  │
│    - Account lookup  │ FxHashMap O(1)                       │
│    - Fund locking    │ Check avail >= required, then lock   │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine   │ book.add_order()                     │
│    - Price lookup    │ BTreeMap O(log n)                    │
│    - Order matching  │ iterate + partial fill               │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement        │ settle_as_buyer/seller               │
│    - Balance update  │ HashMap O(1)                         │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O        │ write_entry()                        │
│    - File write      │ Disk I/O                             │
└─────────────────────────────────────────────────────────────┘

2.3 Latency Percentiles

Sample total processing latency every N orders:

Percentile   Meaning
P50          Median, the typical case
P99          99% of requests are faster than this
P99.9        Tail latency, worst cases
Max          Maximum latency

3. Initial Baseline Data

3.1 Test Environment

  • Hardware: MacBook Pro M Series
  • Data: 100,000 Orders, 47,886 Trades
  • Mode: Release build (--release)

3.2 Throughput

Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s

3.3 Time Breakdown 🔥

=== Performance Breakdown ===
Balance Check:       17.68ms (  0.5%)  ← FxHashMap O(1)
Matching Engine:     36.04ms (  1.0%)  ← Extremely Fast!
Settlement:           4.77ms (  0.1%)  ← Negligible
Ledger I/O:        3678.68ms ( 98.4%) ← Bottleneck!

Key Findings:

  • Ledger I/O consumes 98.4% of time.
  • Balance Check + Matching + Settlement total only ~58ms.
  • Theoretical Limit: ~1.7 Million orders/sec (without I/O).

3.4 Order Lifecycle Timeline 📊

                           Order Lifecycle
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Balance   │    │  Matching   │    │ Settlement  │    │  Ledger     │
    │   Check     │───▶│   Engine    │───▶│  (Balance)  │───▶│   I/O       │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
          │                  │                  │                  │
          ▼                  ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │ FxHashMap   │    │  BTreeMap   │    │Vec<Balance> │    │  File::     │
    │   +Vec O(1) │    │  O(log n)   │    │    O(1)     │    │  write()    │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Total Time:   17.68ms            36.04ms            4.77ms          3678.68ms
    Percentage:    0.5%               1.0%              0.1%             98.4%
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Per-Order:    0.18µs             0.36µs            0.05µs           36.79µs
    Potential:   5.6M ops/s         2.8M ops/s       20M ops/s         27K ops/s
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

                        Business Logic ~58ms (1.6%)        I/O ~3679ms (98.4%)
                    ◀─────────────────────────▶      ◀───────────────────────▶
                             Fast ✅                        Bottleneck 🔴

Analysis:

Phase              Latency/Order   Theoretical OPS   Note
Balance Check      0.18µs          5.6M/s            FxHashMap Lookup + Vec O(1)
Matching Engine    0.36µs          2.8M/s            BTreeMap Price Matching
Settlement         0.05µs          20M/s             Vec<Balance> O(1) Indexing
Ledger I/O         36.79µs         27K/s             Unbuffered File Write = Bottleneck!

E2E Result:

  • Actual Throughput: ~29K orders/sec (I/O Bound)
  • Theoretical Limit (No I/O): ~1.7M orders/sec (60x room for improvement!)
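The ~1.7M figure follows directly from the measured breakdown; a quick back-of-envelope check (the helper name is illustrative):

```rust
/// Theoretical orders/sec if Ledger I/O were removed: total orders divided by
/// business-logic time only (Balance Check + Matching + Settlement).
fn theoretical_limit_ops(orders: f64, balance_ms: f64, matching_ms: f64, settlement_ms: f64) -> f64 {
    orders / ((balance_ms + matching_ms + settlement_ms) / 1000.0)
}
```

With the measured numbers, `theoretical_limit_ops(100_000.0, 17.68, 36.04, 4.77)` gives roughly 1.71M orders/sec, matching the ~1.7M limit above.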

3.5 Latency Percentiles

=== Latency Percentiles (sampled) ===
  Min:        125 ns
  Avg:      34022 ns
  P50:        583 ns   ← Typical order < 1µs
  P99:     391750 ns   ← 99% of orders < 0.4ms
  P99.9:  1243833 ns   ← Tail latency ~1.2ms
  Max:    3207875 ns   ← Worst case ~3ms

4. Output Files

4.1 t2_perf.txt (Machine Readable)

# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875
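Because the file is a flat key=value list, any downstream tool only needs a trivial parser. compare_perf.py does this in Python; a Rust sketch of the same idea (function name is illustrative):

```rust
use std::collections::HashMap;

/// Minimal parser for the t2_perf.txt format above:
/// lines starting with '#' are comments, everything else is `key=value`.
fn parse_perf(text: &str) -> HashMap<String, f64> {
    text.lines()
        .filter(|line| !line.trim_start().starts_with('#'))
        .filter_map(|line| {
            let (key, value) = line.split_once('=')?;
            Some((key.trim().to_string(), value.trim().parse::<f64>().ok()?))
        })
        .collect()
}
```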

4.2 t2_summary.txt (Human Readable)

Contains full execution summary and performance breakdown.

5. PerfMetrics Implementation

/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
    // Timing breakdown (nanoseconds)
    total_balance_check_ns: u64,  // Account lookup + balance check + lock
    total_matching_ns: u64,       // OrderBook.add_order()
    total_settlement_ns: u64,     // Balance updates after trade
    total_ledger_ns: u64,         // Ledger file I/O

    // Per-order latency samples
    latency_samples: Vec<u64>,
    sample_rate: usize,
}

impl PerfMetrics {
    fn new(sample_rate: usize) -> Self { ... }

    fn add_order_latency(&mut self, latency_ns: u64) { ... }
    fn add_balance_check_time(&mut self, ns: u64) { ... }
    fn add_matching_time(&mut self, ns: u64) { ... }
    fn add_settlement_time(&mut self, ns: u64) { ... }
    fn add_ledger_time(&mut self, ns: u64) { ... }

    fn percentile(&self, p: f64) -> Option<u64> { ... }
    fn min_latency(&self) -> Option<u64> { ... }
    fn max_latency(&self) -> Option<u64> { ... }
    fn avg_latency(&self) -> Option<u64> { ... }
}
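The elided percentile body can be a simple nearest-rank computation over the sorted samples; a free-function sketch of one reasonable implementation (the real method takes `&self` and would sort a copy of `latency_samples`):

```rust
/// Nearest-rank percentile over raw latency samples.
/// p is in [0, 100]; returns None when no samples were collected.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Map p onto the index range [0, len - 1].
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted.get(idx).copied()
}
```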

6. Optimization Roadmap

Based on baseline data, future directions:

6.1 Short Term (0x07-c)

Optimization       Expected Gain   Difficulty
Use BufWriter      10-50x I/O      Low
Batch Write        2-5x            Low
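The BufWriter row is the low-hanging fruit: today every ledger entry issues its own write syscall, while a BufWriter coalesces entries in userspace. A minimal sketch of the direction (path and entry format are illustrative, not the real ledger format):

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Sketch of the 0x07-c direction: entries accumulate in BufWriter's
/// userspace buffer (8 KiB by default) and reach the kernel in large
/// chunks instead of one write syscall per ledger entry.
fn write_ledger(path: &str, entries: u64) -> io::Result<()> {
    let mut ledger = BufWriter::new(File::create(path)?);
    for seq in 0..entries {
        writeln!(ledger, "seq={seq}")?; // buffered: no syscall per entry
    }
    ledger.flush()?; // push the final partial buffer to the OS
    Ok(())
}
```

Durability note: `flush` only hands data to the OS page cache; crash-safety still requires `sync_data` (fsync), which is where group commit (0x08) comes in.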

6.2 Mid Term (0x08+)

Optimization       Expected Gain                     Difficulty
Async I/O          Decouple Matching & Persistence   Medium
Memory Pool        Reduce Allocation                 Medium

6.3 Long Term

Optimization       Expected Gain   Difficulty
DPDK/io_uring      10x+            High
FPGA               100x+           Extreme

7. Commands Reference

# Run and generate performance data
cargo run --release

# Update baseline (when code changes)
cargo run --release -- --baseline

# View performance data
cat output/t2_perf.txt

# Compare performance changes
python3 scripts/compare_perf.py

compare_perf.py Output Example

╔════════════════════════════════════════════════════════════════════════╗
║                    Performance Comparison Report                       ║
╚════════════════════════════════════════════════════════════════════════╝

Metric                           Baseline         Current       Change
───────────────────────────────────────────────────────────────────────────
Orders                             100000          100000            -
Trades                              47886           47886            -

Exec Time                       3753.87ms       3484.37ms        -7.2%
Throughput (orders)               26639/s         28700/s        +7.7%
Throughput (trades)               12756/s         13743/s        +7.7%

───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):

Metric                           Baseline         Current     Change        OPS
Balance Check                     17.68ms         16.51ms      -6.6%       6.1M
Matching Engine                   36.04ms         35.01ms      -2.8%       2.9M
Settlement                         4.77ms          5.22ms      +9.4%      19.2M
Ledger I/O                      3678.68ms       3411.49ms      -7.3%        29K

───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):

Metric                           Baseline         Current       Change
Latency MIN                         125ns           125ns        +0.0%
Latency AVG                        37.9µs          34.8µs        -8.2%
Latency P50                         584ns           541ns        -7.4%
Latency P99                       420.2µs         398.9µs        -5.1%
Latency P99.9                      1.63ms          1.24ms       -24.3%
Latency MAX                        9.76ms          3.53ms       -63.9%

───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected

Summary

This chapter accomplished:

  1. PerfMetrics Structure: Collecting time breakdown & latency samples.
  2. Time Breakdown: Balance Check / Matching / Settlement / Ledger I/O.
  3. Latency Percentiles: P50 / P99 / P99.9 / Max.
  4. t2_perf.txt: Machine-readable baseline file.
  5. compare_perf.py: Tool to detect regression.
  6. Key Finding: Ledger I/O consumes 98.4% of execution time and is the dominant bottleneck.



🇨🇳 中文

📦 代码变更: 查看 Diff

核心目的:建立可量化、可追踪、可比较的性能基线。

本章在 0x07-a 测试框架基础上,添加详细的性能指标收集和分析能力。

1. 为什么需要性能基线?

1.1 性能陷阱

没有基线的优化是盲目的:

  • 过早优化:优化了占 1% 时间的代码
  • 回归发现延迟:某次重构导致性能下降 50%,但 3 个月后才发现
  • 无法量化改进:说“快了很多”,但具体快了多少?

1.2 基线的价值

有了基线,你可以:

  1. 每次提交前验证:性能没有下降
  2. 精确定位瓶颈:哪个组件消耗最多时间
  3. 量化优化效果:从 30K ops/s 提升到 100K ops/s

2. 性能指标设计

2.1 吞吐量指标

指标             说明         计算方式
throughput_ops   订单吞吐量   orders / exec_time
throughput_tps   成交吞吐量   trades / exec_time

2.2 时间分解

我们将执行时间分解为四个组件:

┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order)                                │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check     │ Account lookup + balance validation  │
│    - Account lookup  │ FxHashMap O(1)                       │
│    - Fund locking    │ Check avail >= required, then lock   │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine   │ book.add_order()                     │
│    - Price lookup    │ BTreeMap O(log n)                    │
│    - Order matching  │ iterate + partial fill               │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement        │ settle_as_buyer/seller               │
│    - Balance update  │ HashMap O(1)                         │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O        │ write_entry()                        │
│    - File write      │ Disk I/O                             │
└─────────────────────────────────────────────────────────────┘

2.3 延迟百分位数

采样每 N 个订单的总处理延迟,计算:

百分位数   含义
P50        中位数,典型情况
P99        99% 的请求低于此值
P99.9      尾延迟,最坏情况
Max        最大延迟

3. 初始基线数据

3.1 测试环境

  • 硬件:MacBook Pro M 系列
  • 数据:100,000 订单,47,886 成交
  • 模式:Release build (--release)

3.2 吞吐量

Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s

3.3 时间分解 🔥

=== Performance Breakdown ===
Balance Check:       17.68ms (  0.5%)  ← FxHashMap O(1)
Matching Engine:     36.04ms (  1.0%)  ← 极快!
Settlement:           4.77ms (  0.1%)  ← 几乎可忽略
Ledger I/O:        3678.68ms ( 98.4%) ← 瓶颈!

关键发现

  • Ledger I/O 占用 98.4% 的时间
  • Balance Check + Matching + Settlement 总共只需 ~58ms
  • 理论上限:~170 万 orders/sec(如果没有 I/O)

3.4 订单生命周期性能时间线 📊

                           订单生命周期 (Order Lifecycle)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │   Balance   │    │  Matching   │    │ Settlement  │    │  Ledger     │
    │   Check     │───▶│   Engine    │───▶│  (Balance)  │───▶│   I/O       │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
          │                  │                  │                  │
          ▼                  ▼                  ▼                  ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │ FxHashMap   │    │  BTreeMap   │    │Vec<Balance> │    │  File::     │
    │   +Vec O(1) │    │  O(log n)   │    │    O(1)     │    │  write()    │
    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Total Time:   17.68ms            36.04ms            4.77ms          3678.68ms
    Percentage:    0.5%               1.0%              0.1%             98.4%
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Per-Order:    0.18µs             0.36µs            0.05µs           36.79µs
    Potential:   5.6M ops/s         2.8M ops/s       20M ops/s         27K ops/s
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

                        业务逻辑 ~58ms (1.6%)              I/O ~3679ms (98.4%)
                    ◀─────────────────────────▶      ◀───────────────────────▶
                             极快 ✅                        瓶颈 🔴

性能分析:

阶段              每订单延迟   理论 OPS   说明
Balance Check     0.18µs       5.6M/s     FxHashMap 账户查找 + Vec O(1) 余额索引
Matching Engine   0.36µs       2.8M/s     BTreeMap 价格匹配
Settlement        0.05µs       20M/s      Vec<Balance> O(1) 直接索引
Ledger I/O        36.79µs      27K/s      unbuffered 文件写入 = 瓶颈!

E2E 结果:

  • 实际吞吐量: ~29K orders/sec (受限于 Ledger I/O)
  • 理论上限 (无 I/O): ~1.7M orders/sec (60x 提升空间!)

3.5 延迟百分位数

=== Latency Percentiles (sampled) ===
  Min:        125 ns
  Avg:      34022 ns
  P50:        583 ns   ← 典型订单 < 1µs
  P99:     391750 ns   ← 99% 的订单 < 0.4ms
  P99.9:  1243833 ns   ← 尾延迟 ~1.2ms
  Max:    3207875 ns   ← 最坏 ~3ms

4. 输出文件

4.1 t2_perf.txt(机器可读)

# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875

4.2 t2_summary.txt(人类可读)

包含完整的执行摘要和性能分解。

5. PerfMetrics 实现

/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
    // Timing breakdown (nanoseconds)
    total_balance_check_ns: u64,  // Account lookup + balance check + lock
    total_matching_ns: u64,       // OrderBook.add_order()
    total_settlement_ns: u64,     // Balance updates after trade
    total_ledger_ns: u64,         // Ledger file I/O

    // Per-order latency samples
    latency_samples: Vec<u64>,
    sample_rate: usize,
}

impl PerfMetrics {
    fn new(sample_rate: usize) -> Self { ... }

    fn add_order_latency(&mut self, latency_ns: u64) { ... }
    fn add_balance_check_time(&mut self, ns: u64) { ... }
    fn add_matching_time(&mut self, ns: u64) { ... }
    fn add_settlement_time(&mut self, ns: u64) { ... }
    fn add_ledger_time(&mut self, ns: u64) { ... }

    fn percentile(&self, p: f64) -> Option<u64> { ... }
    fn min_latency(&self) -> Option<u64> { ... }
    fn max_latency(&self) -> Option<u64> { ... }
    fn avg_latency(&self) -> Option<u64> { ... }
}

6. 优化路线图

基于基线数据,后续优化方向:

6.1 短期(0x07-c)

优化点           预期提升     难度
使用 BufWriter   10-50x I/O   低
批量写入         2-5x         低

6.2 中期(0x08+)

优化点     预期提升           难度
异步 I/O   解耦撮合和持久化   中
内存池     减少分配           中

6.3 长期

优化点          预期提升   难度
DPDK/io_uring   10x+       高
FPGA            100x+      极高

7. 命令参考

# 运行并生成性能数据
cargo run --release

# 更新基线(当代码变化时)
cargo run --release -- --baseline

# 查看性能数据
cat output/t2_perf.txt

# 对比性能变化
python3 scripts/compare_perf.py

compare_perf.py 输出示例

╔════════════════════════════════════════════════════════════════════════╗
║                    Performance Comparison Report                       ║
╚════════════════════════════════════════════════════════════════════════╝

Metric                           Baseline         Current       Change
───────────────────────────────────────────────────────────────────────────
Orders                             100000          100000            -
Trades                              47886           47886            -

Exec Time                       3753.87ms       3484.37ms        -7.2%
Throughput (orders)               26639/s         28700/s        +7.7%
Throughput (trades)               12756/s         13743/s        +7.7%

───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):

Metric                           Baseline         Current     Change        OPS
Balance Check                     17.68ms         16.51ms      -6.6%       6.1M
Matching Engine                   36.04ms         35.01ms      -2.8%       2.9M
Settlement                         4.77ms          5.22ms      +9.4%      19.2M
Ledger I/O                      3678.68ms       3411.49ms      -7.3%        29K

───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):

Metric                           Baseline         Current       Change
Latency MIN                         125ns           125ns        +0.0%
Latency AVG                        37.9µs          34.8µs        -8.2%
Latency P50                         584ns           541ns        -7.4%
Latency P99                       420.2µs         398.9µs        -5.1%
Latency P99.9                      1.63ms          1.24ms       -24.3%
Latency MAX                        9.76ms          3.53ms       -63.9%

───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected

Summary

本章完成了以下工作:

  1. PerfMetrics 结构:收集时间分解和延迟样本
  2. 时间分解:Balance Check / Matching / Settlement / Ledger I/O
  3. 延迟百分位数:P50 / P99 / P99.9 / Max
  4. t2_perf.txt:机器可读的性能基线文件
  5. compare_perf.py:对比工具,检测性能回归
  6. 关键发现:Ledger I/O 占 98.4%,是主要瓶颈

0x08-a Trading Pipeline Design

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: To design a complete trading pipeline architecture that ensures order persistence, balance consistency, and system recoverability.

This chapter addresses the most critical design issues in a matching engine: Service Partitioning, Data Flow, and Atomicity Guarantees.

1. Why Persistence?

1.1 The Problem Scenario

Suppose the system crashes during matching:

User A sends Buy Order → ME receives & fills → System Crash
                                               ↓
                                        User A's funds deducted
                                        But no trade record
                                        Order Lost!

Consequences of No Persistence:

  • Order Loss: User orders vanish.
  • Inconsistent State: Funds changed but no record exists.
  • Unrecoverable: Upon restart, valid orders are unknown.

1.2 Solution: Persist First, Match Later

User A Buy Order → WAL Persist → ME Match → System Crash
                     ↓             ↓
                Order Saved    Replay & Recover!

2. Unique Ordering

2.1 Why Unique Ordering?

In distributed systems, multiple nodes must agree on order sequence:

Scenario                                Problem
Node A receives Order 1, then Order 2
Node B receives Order 2, then Order 1   Inconsistent Order!

Result: Matching results differ between nodes!

2.2 Solution: Single Sequencer + Global Sequence ID

All Orders → Sequencer → Assign Global sequence_id → Persist → Dispatch to ME
              ↓
         Unique Arrival Order

Field         Description
sequence_id   Monotonically increasing global ID
timestamp     Nanosecond precision timestamp
order_id      Business level Order ID
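A single-threaded sequencer that stamps these three fields is only a few lines; a sketch (the struct and field names mirror the table, but are illustrative):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// The three fields from the table above (names are illustrative).
struct SequencedOrder {
    sequence_id: u64, // monotonically increasing global ID
    timestamp: u64,   // nanoseconds since the Unix epoch
    order_id: u64,    // business-level Order ID
}

/// Single Sequencer: because exactly one instance assigns IDs, every
/// node replaying its output observes the same unique arrival order.
struct Sequencer {
    next_seq: u64,
}

impl Sequencer {
    fn assign(&mut self, order_id: u64) -> SequencedOrder {
        let sequence_id = self.next_seq;
        self.next_seq += 1;
        let timestamp = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_nanos() as u64;
        SequencedOrder { sequence_id, timestamp, order_id }
    }
}
```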

3. Order Lifecycle

3.1 Persist First, Execute Later

┌─────────────────────────────────────────────────────────────────────────┐
│                          Order Lifecycle                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐             │
│   │ Gateway │───▶│Pre-Check│───▶│   WAL   │───▶│   ME    │             │
│   │(Receiver)│    │(Balance) │    │(Persist)│    │ (Match) │             │
│   └─────────┘    └─────────┘    └─────────┘    └─────────┘             │
│        │              │              │              │                   │
│        ▼              ▼              ▼              ▼                   │
│   Receive Order   Insufficient?   Disk Write     Execute Match           │
│                   Early Reject    Assign SeqID   Guaranteed Exec         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

3.2 Pre-Check: Reducing Invalid Orders

Pre-Check queries UBSCore (User Balance Core Service) for balance info. Read-Only, No Side Effects.

async fn pre_check(order: Order) -> Result<Order, Reject> {
    // 1. Query UBSCore for balance (Read-Only)
    let balance = ubscore.query_balance(order.user_id, asset);

    // 2. Calculate required amount
    let required = match order.side {
        Buy  => order.price * order.qty / QTY_UNIT,  // quote
        Sell => order.qty,                            // base
    };

    // 3. Balance Check (Read-Only, No Lock)
    if balance.avail < required {
        return Err(Reject::InsufficientBalance);
    }

    // 4. Pass
    Ok(order)
}
// Note: Balance might be consumed by others between Pre-Check and WAL.
// This is allowed; WAL's Balance Lock will handle it.

Why Pre-Check?

The Core Flow (WAL + Balance Lock + Matching) is expensive. We must filter garbage orders fast.

No Pre-Check                            With Pre-Check
Garbage enters core flow                Filters most invalid orders
Core wastes latency on invalid orders   Core processes mostly valid orders
Vulnerable to spam attacks              Reduces impact of malicious requests

Pre-Check Items:

  • ✅ Balance Check
  • 📋 User Status (Banned?)
  • 📋 Format Validation
  • 📋 Rate Limiting
  • 📋 Risk Rules

3.3 Must Execute Once Persisted

Once an order is persisted, it MUST end in one of these states:

┌─────────────────────┐
│   Order Persisted   │
└─────────────────────┘
           │
           ├──▶ Filled
           ├──▶ PartialFilled
           ├──▶ New (Booked)
           ├──▶ Cancelled
           ├──▶ Expired
           └──▶ Rejected (Insufficient Balance) ← Valid Final State!

❌ Never: Logged but state unknown.

4. WAL: Why Is It the Best Choice?

4.1 What is WAL (Write-Ahead Log)?

WAL is an Append-Only log structure:

┌─────────────────────────────────────────────────────────────────┐
│                          WAL File                               │
├─────────────────────────────────────────────────────────────────┤
│  Entry 1  │  Entry 2  │  Entry 3  │  Entry 4  │  ...  │ ← Append│
│ (seq=1)   │ (seq=2)   │ (seq=3)   │ (seq=4)   │       │         │
└─────────────────────────────────────────────────────────────────┘
                                                          ↑
                                                     Append Only!

4.2 Why WAL for HFT?

Method       Write Pattern   Latency    Throughput   HFT Suitability
DB (MySQL)   Random + Txn    ~1-10ms    ~1K ops/s    ❌ Too Slow
KV (Redis)   Random          ~0.1-1ms   ~10K ops/s   ⚠️ Average
WAL          Sequential      ~1-10µs    ~1M ops/s    ✅ Best

Why is WAL fast?

  1. Sequential Write vs Random Write:
    • HDD: No seek time (~10ms saved).
    • SSD: Reduces Write Amplification.
    • Result: 10-100x faster.
  2. No Transaction Overhead:
    • DB: Txn start, lock, redo log, data page, binlog, commit…
    • WAL: Serialize -> Append -> (Optional) Fsync.
  3. Group Commit:
    • Batch multiple writes into one fsync.

// Group Commit Logic: one fsync covers every order buffered since last flush
pub fn flush(&mut self) -> io::Result<()> {
    self.file.write_all(&self.buffer)?;
    self.file.sync_data()?;  // fsync once for N orders
    self.buffer.clear();
    Ok(())
}

5. Single Thread + Lock-Free Architecture

5.1 Why Single Thread?

Intuition: Concurrency = Fast. Reality in HFT: Single Thread is Faster.

Multi-Thread              Single Thread
Locks & Contention        Lock-Free
Cache Invalidation        Cache Friendly
Context Switch Overhead   No Context Switch
Hard Ordering             Naturally Ordered
Complex Sync Logic        Simple Code

5.2 Mechanical Sympathy

CPU Cache Hierarchy:

  • L1 Cache: ~1ns
  • L2 Cache: ~4ns
  • RAM: ~100ns

Single Thread Advantage: Data stays in L1/L2 (Hot). No cache line contention.

5.3 LMAX Disruptor Pattern

Originating from LMAX Exchange (6M TPS on single thread):

  1. Single Writer (Avoid write contention)
  2. Pre-allocated Memory (Avoid GC/malloc)
  3. Cache Padding (Avoid false sharing)
  4. Batch Consumption

6. Ring Buffer: Inter-Service Communication

6.1 Why Ring Buffer?

Method                      Latency   Throughput
HTTP/gRPC                   ~1ms      ~10K/s
Kafka                       ~1-10ms   ~1M/s
Shared Memory Ring Buffer   ~100ns    ~10M/s

6.2 Ring Buffer Principle

      write_idx                       read_idx
          ↓                               ↓
   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
   │ 8 │ 9 │ 10│ 11│ 12│ 13│ 14│ 15│ 0 │ 1 │ ...
   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
         ↑                               ↑
     New Data                        Consumer

  • Fixed size, circular.
  • Zero allocation during runtime.
  • SPSC (Single Producer Single Consumer) is lock-free.
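The index arithmetic is easy to see in a toy single-threaded model (a real SPSC queue keeps the same wrap-around logic but puts `write_idx`/`read_idx` in atomics on separate cache lines; names here are illustrative):

```rust
/// Toy single-threaded model of a fixed-size ring buffer:
/// fixed capacity, wrapping indices, zero allocation after construction.
struct Ring<T> {
    slots: Vec<Option<T>>,
    write_idx: usize,
    read_idx: usize,
}

impl<T> Ring<T> {
    fn new(capacity: usize) -> Self {
        Ring {
            slots: (0..capacity).map(|_| None).collect(),
            write_idx: 0,
            read_idx: 0,
        }
    }

    /// Producer side: fails (instead of allocating) when the buffer is full.
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.write_idx - self.read_idx == self.slots.len() {
            return Err(item); // full: producer must back off
        }
        let cap = self.slots.len();
        self.slots[self.write_idx % cap] = Some(item);
        self.write_idx += 1;
        Ok(())
    }

    /// Consumer side: None when empty.
    fn pop(&mut self) -> Option<T> {
        if self.read_idx == self.write_idx {
            return None; // empty
        }
        let cap = self.slots.len();
        let item = self.slots[self.read_idx % cap].take();
        self.read_idx += 1;
        item
    }
}
```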

7. Overall Architecture

7.1 Core Services

Service      Responsibility              State
Gateway      Receive Requests            Stateless
Pre-Check    Read-only Balance Check     Stateless
UBSCore      Balance Ops + Order WAL     Stateful (Balance)
ME           Matching, Generate Trades   Stateful (OrderBook)
Settlement   Persist Events              Stateless

7.2 UBSCore Service (User Balance Core)

Single Entry Point for ALL Balance Operations.

Why UBSCore?

  • Atomic: Single thread = No Double Spend.
  • Audit: Complete trace of all changes.
  • Recovery: Single WAL restores state.

Pipeline Role:

  1. Write Order WAL (Persist)
  2. Lock Balance
    • Success → Forward to ME
    • Fail → Rejected
  3. Handle Trade Events (Settlement)
    • Update buyer/seller balances.

7.3 Matching Engine (ME)

ME is Pure Matching. It ignores Balances.

  • Does: Maintain OrderBook, Match by Price/Time, Generate Trade Events.
  • Does NOT: Check balance, lock funds, persist data.

Trade Events Drive Balance Updates: a TradeEvent contains {price, qty, user_ids}, which is sufficient to calculate the resulting balance changes.
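To make that concrete, here is a sketch of deriving the four balance movements from a single trade event (field names and the fixed-point QTY_UNIT convention follow earlier chapters, but this helper is illustrative, not the real Settlement code):

```rust
const QTY_UNIT: u64 = 1_000_000; // fixed-point qty scale (illustrative value)

/// One trade event carries everything balance updates need.
struct TradeEvent {
    price: u64,
    qty: u64,
    buyer_id: u64,
    seller_id: u64,
}

/// Signed per-user balance changes implied by one trade.
struct BalanceDelta {
    user_id: u64,
    base_delta: i64,  // change in base asset
    quote_delta: i64, // change in quote asset
}

/// Buyer receives `qty` base and pays `quote`; the seller is the mirror image.
fn settle(t: &TradeEvent) -> [BalanceDelta; 2] {
    let quote = (t.price * t.qty / QTY_UNIT) as i64;
    let qty = t.qty as i64;
    [
        BalanceDelta { user_id: t.buyer_id,  base_delta:  qty, quote_delta: -quote },
        BalanceDelta { user_id: t.seller_id, base_delta: -qty, quote_delta:  quote },
    ]
}
```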

7.4 Settlement Service

Settlement Persists, does not modify Balances.

  • Persist Trade Events, Order Events.
  • Write Audit Log (Ledger).

7.5 Architecture Diagram

┌──────────────────────────────────────────────────────────────────────────────────┐
│                         0xInfinity HFT Architecture                               │
├──────────────────────────────────────────────────────────────────────────────────┤
│   Client Orders                                                                   │
│        │                                                                          │
│        ▼                                                                          │
│   ┌──────────────┐                                                                │
│   │   Gateway    │                                                                │
│   └──────┬───────┘                                                                │
│          ▼                                                                        │
│   ┌──────────────┐         query balance                                          │
│   │  Pre-Check   │ ──────────────────────────────▶   UBSCore Service              │
│   └──────┬───────┘                                                                │
│          ▼                                                                        │
│   ┌──────────────┐                                   ┌────────────────────┐       │
│   │ Order Buffer │                                   │  Balance State     │       │
│   └──────┬───────┘                                   │  (RAM, Single Thd) │       │
│          │ Ring Buffer                               └────────────────────┘       │
│          ▼                                                                        │
│   ┌──────────────────────────────────────────┐                                    │
│   │  UBSCore: Order Processing               │       Operations:                  │
│   │  1. Write Order WAL (Persist)            │       - lock / unlock              │
│   │  2. Lock Balance                         │       - spend_frozen               │
│   │     - OK → forward to ME                 │       - deposit                    │
│   │     - Fail → Rejected                    │                                    │
│   └──────────────┬───────────────────────────┘                                    │
│                  │ Ring Buffer (valid orders)                                     │
│                  ▼                                                                │
│   ┌──────────────────────────────────────────┐                                    │
│   │         Matching Engine (ME)             │                                    │
│   │                                          │                                    │
│   │  Pure Matching, Ignore Balance           │                                    │
│   │  Output: Trade Events                    │                                    │
│   └──────────────┬───────────────────────────┘                                    │
│                  │ Ring Buffer (Trade Events)                                     │
│         ┌───────┴────────┐                                                        │
│         ▼                ▼                                                        │
│   ┌───────────┐   ┌─────────────────────────┐                                     │
│   │ Settlement│   │ Balance Update Events   │────▶   Execute Balance Update       │
│   │           │   │ (from Trade Events)     │                                     │
│   │ Persist:  │   └─────────────────────────┘                                     │
│   │ - Trades  │                                                                   │
│   │ - Ledger  │                                                                   │
│   └───────────┘                                                                   │
└───────────────────────────────────────────────────────────────────────────────────┘

7.7 Event Sourcing + Pure State Machine

Order WAL = Single Source of Truth

State(t) = Replay(Order_WAL[0..t])

Any state (Balance, OrderBook) can be 100% reconstructed by replaying the Order WAL.

Pure State Machines:

  • UBSCore: Order Events → Balance Events (Deterministic)
  • ME: Valid Orders → Trade Events (Deterministic)

Recovery Flow:

  1. Load Checkpoint (Snapshot).
  2. Replay Order WAL from checkpoint.
  3. ME re-matches and generates events.
  4. UBSCore applies balance updates.
  5. System Restored.
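The State(t) = Replay(Order_WAL[0..t]) property is easy to demonstrate with a miniature pure state machine (the event set and fold below are illustrative, not the real UBSCore types):

```rust
/// A miniature deterministic state machine: same WAL in, same state out.
#[derive(Clone, Copy)]
enum Event {
    Deposit(u64),
    Withdraw(u64),
}

/// State(t) = Replay(WAL[0..t]): a pure fold over the log, so replaying
/// any prefix always reconstructs the same balance.
fn replay(wal: &[Event]) -> u64 {
    wal.iter().fold(0u64, |balance, ev| match ev {
        Event::Deposit(x) => balance + x,
        Event::Withdraw(x) => balance.saturating_sub(*x),
    })
}
```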

8. Summary

Core Decisions:

  • Persist First: WAL ensures recoverability.
  • Pre-Check: Filters invalid orders early.
  • Single Thread + Lock-Free: Avoids contention, maximizes throughput.
  • UBSCore: Centralized, atomic balance management.
  • Responsibility Segregation: UBSCore (Money), ME (Match), Settlement (Log).

Refactoring: For the upcoming implementation, we refactored the code structure:

  • lib.rs, main.rs, core_types.rs, config.rs
  • orderbook.rs, balance.rs, engine.rs
  • csv_io.rs, ledger.rs, perf.rs

Next: Detailed implementation of UBSCore and Ring Buffer.




🇨🇳 中文

📦 代码变更: 查看 Diff

核心目的:设计完整的交易流水线架构,确保订单持久化、余额一致性和系统可恢复性。

本章解决撮合引擎最关键的设计问题:服务划分、数据流和原子性保证

1. 为什么需要持久化?

1.1 问题场景

假设系统在撮合过程中崩溃:

用户 A 发送买单 → ME 接收并成交 → 系统崩溃
                                    ↓
                            用户 A 的钱扣了
                            但没有成交记录
                            订单丢失!

没有持久化的后果

  • 订单丢失:用户下的单消失了
  • 状态不一致:资金变动了但没有记录
  • 无法恢复:重启后不知道有哪些订单

1.2 解决方案:先持久化,后撮合

用户 A 发送买单 → WAL 持久化 → ME 撮合 → 系统崩溃
                    ↓              ↓
               订单已保存      可以重放恢复!

2. 唯一排序 (Unique Ordering)

2.1 为什么需要唯一排序?

在分布式系统中,多个节点必须对订单顺序达成一致:

场景                                问题
节点 A 先收到订单 1,再收到订单 2
节点 B 先收到订单 2,再收到订单 1   顺序不一致!

结果:两个节点的撮合结果可能不同!

2.2 解决方案:单点排序 + 全局序号

所有订单 → Sequencer → 分配全局 sequence_id → 持久化 → 分发到 ME
              ↓
         唯一的到达顺序

字段          说明
sequence_id   单调递增的全局序号
timestamp     精确到纳秒的时间戳
order_id      业务层订单 ID

3. 订单生命周期

3.1 先持久化,后执行

┌─────────────────────────────────────────────────────────────────────────┐
│                         订单生命周期                                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐             │
│   │ Gateway │───▶│Pre-Check│───▶│   WAL   │───▶│   ME    │             │
│   │(接收订单)│    │(余额校验)│    │ (持久化)│    │ (撮合) │             │
│   └─────────┘    └─────────┘    └─────────┘    └─────────┘             │
│        │              │              │              │                   │
│        ▼              ▼              ▼              ▼                   │
│    接收订单      余额不足?        写入磁盘        执行撮合               │
│                  提前拒绝        分配seq_id      保证执行               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

3.2 Pre-Check:减少无效订单

Pre-Check 通过查询 UBSCore (User Balance Core Service,用户余额核心服务,详见第 7.2 节) 获取余额信息,只读,无副作用

async fn pre_check(order: Order) -> Result<Order, Reject> {
    // 1. 查询 UBSCore 获取余额 (只读查询)
    let balance = ubscore.query_balance(order.user_id, asset);

    // 2. 计算所需金额
    let required = match order.side {
        Buy  => order.price * order.qty / QTY_UNIT,  // quote
        Sell => order.qty,                            // base
    };

    // 3. 余额检查 (只读,不锁定)
    if balance.avail < required {
        return Err(Reject::InsufficientBalance);
    }

    // 4. 检查通过,放行订单到下一阶段
    Ok(order)
}
// 注意:Pre-Check 不锁定余额!
// 余额可能在 Pre-Check 和 WAL 之间被其他订单消耗
// 这是允许的,WAL 后的 Balance Lock 会处理这种情况

为什么需要 Pre-Check?

核心流程(WAL 持久化、Balance Lock、撮合)的延迟成本很高。 用户可能提交大量垃圾订单,我们需要最快速地预过滤,减少进入核心流程的订单量。

不 Pre-Check                     有 Pre-Check
垃圾订单直接进入核心流程         快速过滤大部分无效订单
核心流程处理无效订单,浪费延迟   核心流程只处理可能有效的订单
系统容易被刷单攻击               减少恶意请求的影响

Pre-Check 可以包含多种快速检查

  • ✅ 余额检查(当前实现)
  • 📋 用户状态检查(是否被禁用)
  • 📋 订单格式校验
  • 📋 频率限制 (Rate Limit)
  • 📋 风控规则(未来扩展)

重要:Pre-Check 是“尽力而为”的过滤器,不保证 100% 准确。通过 Pre-Check 的订单,仍可能在 WAL + Balance Lock 阶段被拒绝。

3.3 一旦持久化,必须完整执行

订单被持久化后,无论发生什么,都必须有以下其中一个结果:

┌─────────────────────┐
│ 订单已持久化         │
└─────────────────────┘
           │
           ├──▶ 成交 (Filled)
           ├──▶ 部分成交 (PartialFilled)
           ├──▶ 挂单中 (New)
           ├──▶ 用户取消 (Cancelled)
           ├──▶ 系统过期 (Expired)
           └──▶ 余额不足被拒绝 (Rejected)  ← 也是合法的终态!

❌ 绝对不能:订单消失 / 状态未知

4. WAL:为什么是最佳选择?

4.1 什么是 WAL (Write-Ahead Log)?

WAL 是一种追加写 (Append-Only) 的日志结构:

┌─────────────────────────────────────────────────────────────────┐
│                          WAL File                               │
├─────────────────────────────────────────────────────────────────┤
│  Entry 1  │  Entry 2  │  Entry 3  │  Entry 4  │  ...  │ ← 追加  │
│ (seq=1)   │ (seq=2)   │ (seq=3)   │ (seq=4)   │       │         │
└─────────────────────────────────────────────────────────────────┘
                                                          ↑
                                                     只追加,不修改

4.2 为什么 WAL 是 HFT 最佳实践?

持久化方式                写入模式        延迟       吞吐量       HFT 适用性
数据库 (MySQL/Postgres)   随机写 + 事务   ~1-10ms    ~1K ops/s    ❌ 太慢
KV 存储 (Redis/RocksDB)   随机写          ~0.1-1ms   ~10K ops/s   ⚠️ 一般
WAL 追加写                顺序写          ~1-10µs    ~1M ops/s    ✅ 最佳

为什么 WAL 这么快?

  1. 顺序写 vs 随机写
    • 机械硬盘不用寻道。
    • SSD 减少写放大。
    • 结果:快 10-100 倍。
  2. 无事务开销
    • 无需锁、redo log、binlog 等数据库复杂机制。
  3. 批量刷盘 (Group Commit)
    • 合并多次写入一次 fsync。

5. 单线程 + Lock-Free 架构

5.1 为什么选择单线程?

大多数人直觉认为:并发 = 快。但在 HFT 领域,单线程往往更快。

| 多线程 | 单线程 |
|---|---|
| 需要锁保护共享状态 | 无锁,无竞争 |
| 缓存失效 (cache invalidation) | 缓存友好 |
| 上下文切换开销 | 无切换开销 |
| 顺序难以保证 | 天然有序 |
| 复杂的同步逻辑 | 代码简单直观 |

5.2 Mechanical Sympathy

CPU Cache Hierarchy:

  • L1 Cache: ~1ns
  • L2 Cache: ~4ns
  • RAM: ~100ns

单线程优势:数据始终在 L1/L2 缓存中(热数据),无 cache line 争用。

5.3 LMAX Disruptor 模式

这种单线程 + Ring Buffer 的架构源自 LMAX Exchange(伦敦多资产交易所),号称能在单线程上处理 600 万订单/秒。

  1. Single Writer (避免写竞争)
  2. Pre-allocated Memory (避免 GC/malloc)
  3. Cache Padding (避免 false sharing)
  4. Batch Consumption

6. Ring Buffer:服务间通信

6.1 为什么使用 Ring Buffer?

服务间通信的选择:

| 方式 | 延迟 | 吞吐量 |
|---|---|---|
| HTTP/gRPC | ~1ms | ~10K/s |
| Kafka | ~1-10ms | ~1M/s |
| Shared Memory Ring Buffer | ~100ns | ~10M/s |

6.2 Ring Buffer 原理

      write_idx                       read_idx
          ↓                               ↓
   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
   │ 8 │ 9 │ 10│ 11│ 12│ 13│ 14│ 15│ 0 │ 1 │ ...
   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
         ↑                               ↑
     新数据写入                        消费者读取
  • 固定大小,循环使用
  • 无需动态分配
  • Single Producer, Single Consumer (SPSC) 可完全无锁
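下面用一个单线程的最小示例演示 write_idx / read_idx 的回绕逻辑。注意这只是示意:真实的 SPSC 无锁实现还需要原子索引和内存序控制,这里一概省略:

```rust
/// 固定容量环形缓冲的索引回绕示意(单线程演示,非并发实现)
struct RingBuffer<T> {
    buf: Vec<Option<T>>,
    write_idx: usize,
    read_idx: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { buf: (0..capacity).map(|_| None).collect(), write_idx: 0, read_idx: 0 }
    }

    /// 写入一个元素;缓冲已满时把元素原样退回
    fn push(&mut self, item: T) -> Result<(), T> {
        let slot = self.write_idx % self.buf.len(); // 索引对容量取模实现回绕
        if self.buf[slot].is_some() {
            return Err(item); // 满:必须等消费者读走才能覆盖
        }
        self.buf[slot] = Some(item);
        self.write_idx += 1;
        Ok(())
    }

    /// 读出一个元素;空时返回 None
    fn pop(&mut self) -> Option<T> {
        let slot = self.read_idx % self.buf.len();
        let item = self.buf[slot].take();
        if item.is_some() {
            self.read_idx += 1;
        }
        item
    }
}
```

固定容量意味着内存一次分配、反复复用,这正是上面“无需动态分配”一条的含义。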

7. 整体架构

7.1 核心服务

| 服务 | 职责 | 状态 |
|---|---|---|
| Gateway | 接收客户端请求 | 无状态 |
| Pre-Check | 只读查询余额,过滤无效订单 | 无状态 |
| UBSCore | 所有余额操作 + Order WAL | 有状态 (余额) |
| ME | 纯撮合,生成 Trade Events | 有状态 (OrderBook) |
| Settlement | 持久化 events,未来写 DB | 无状态 |

7.2 UBSCore Service (User Balance Core)

UBSCore 是所有账户余额操作的唯一入口,单线程执行保证原子性。

应用场景

  1. Write Order WAL (持久化)
  2. Lock Balance (锁定)
  3. Handle Trade Events (成交后结算)

7.3 Matching Engine (ME)

ME 是纯撮合引擎,不关心余额

  • 负责:维护 OrderBook,撮合,生成 Trade Events。
  • 不负责:检查余额,锁定资金,持久化。

Trade Event 驱动余额更新:TradeEvent 包含 {price, qty, user_ids},足够计算出余额变化。

7.4 Settlement Service

Settlement 负责持久化,不修改余额

  • 持久化 Trade Events,Order Events。
  • 写审计日志 (Ledger)。

7.5 完整架构图

┌──────────────────────────────────────────────────────────────────────────────────┐
│                         0xInfinity HFT Architecture                               │
├──────────────────────────────────────────────────────────────────────────────────┤
│   Client Orders                                                                   │
│        │                                                                          │
│        ▼                                                                          │
│   ┌──────────────┐                                                                │
│   │   Gateway    │                                                                │
│   └──────┬───────┘                                                                │
│          ▼                                                                        │
│   ┌──────────────┐         query balance                                          │
│   │  Pre-Check   │ ──────────────────────────────▶   UBSCore Service              │
│   └──────┬───────┘                                                                │
│          ▼                                                                        │
│   ┌──────────────┐                                   ┌────────────────────┐       │
│   │ Order Buffer │                                   │  Balance State     │       │
│   └──────┬───────┘                                   │  (RAM, Single Thd) │       │
│          │ Ring Buffer                               └────────────────────┘       │
│          ▼                                                                        │
│   ┌──────────────────────────────────────────┐                                    │
│   │  UBSCore: Order Processing               │       Operations:                  │
│   │  1. Write Order WAL (持久化)              │       - lock / unlock              │
│   │  2. Lock Balance                         │       - spend_frozen               │
│   │     - OK → forward to ME                 │       - deposit                    │
│   │     - Fail → Rejected                    │                                    │
│   └──────────────┬───────────────────────────┘                                    │
│                  │ Ring Buffer (valid orders)                                     │
│                  ▼                                                                │
│   ┌──────────────────────────────────────────┐                                    │
│   │         Matching Engine (ME)             │                                    │
│   │                                          │                                    │
│   │  纯撮合,不关心 Balance                   │                                    │
│   │  输出: Trade Events                      │                                    │
│   └──────────────┬───────────────────────────┘                                    │
│                  │ Ring Buffer (Trade Events)                                     │
│         ┌───────┴────────┐                                                        │
│         ▼                ▼                                                        │
│   ┌───────────┐   ┌─────────────────────────┐                                     │
│   │ Settlement│   │ Balance Update Events   │────▶   执行余额更新                 │
│   │           │   │ (from Trade Events)     │                                     │
│   │ 持久化:    │   └─────────────────────────┘                                     │
│   │ - Trades  │                                                                   │
│   │ - Ledger  │                                                                   │
│   └───────────┘                                                                   │
└───────────────────────────────────────────────────────────────────────────────────┘

7.6 Event Sourcing + Pure State Machine

Order WAL = Single Source of Truth

State(t) = Replay(Order_WAL[0..t])

只要有 Order WAL,就能恢复整个系统状态!

Pure State Machines:

  • UBSCore: Order Events → Balance Events (确定性)
  • ME: Valid Orders → Trade Events (确定性)

恢复流程:

  1. 加载最近快照 Checkpoint。
  2. 重放 Order WAL。
  3. 系统恢复到崩溃前状态。
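恢复流程的核心是“确定性折叠”:同一段 WAL 重放出的状态必然相同。下面是一个极简示意,其中 `BalanceState`、`WalEvent` 都是为演示假设的简化类型,并非本书的真实定义:

```rust
/// 事件溯源恢复示意:State(t) = Replay(WAL[0..t])
#[derive(Debug, PartialEq, Default, Clone)]
struct BalanceState {
    avail: u64,
    frozen: u64,
}

enum WalEvent {
    Deposit(u64),     // 充值
    Lock(u64),        // 下单锁定
    SpendFrozen(u64), // 成交扣减冻结
}

/// 纯状态机:从快照出发,按序折叠事件,结果完全由输入决定
fn replay(snapshot: BalanceState, wal: &[WalEvent]) -> BalanceState {
    wal.iter().fold(snapshot, |mut s, e| {
        match e {
            WalEvent::Deposit(amt) => s.avail += amt,
            WalEvent::Lock(amt) => {
                s.avail -= amt;
                s.frozen += amt;
            }
            WalEvent::SpendFrozen(amt) => s.frozen -= amt,
        }
        s
    })
}
```

只要 UBSCore 和 ME 保持这种纯函数性质(无随机、无时钟依赖),崩溃恢复就退化为“读快照 + 重放”两步。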

8. Summary

核心设计

  • 先持久化:WAL 保证可恢复性。
  • Pre-Check:提前过滤无效订单。
  • 单线程 + 无锁:避免锁竞争,最大化吞吐。
  • UBSCore:集中式、原子的余额管理。
  • 职责分离:UBSCore (钱),ME (撮合),Settlement (日志)。

代码重构:为后续章节做准备,我们重构了 src 目录结构,模块化了 main.rs、core_types.rs 等。

下一步:实现 UBSCore 和 Ring Buffer。

0x08-b UBSCore Implementation

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Objective: From design to implementation: Building a Safety-First Balance Core Service.

In the previous chapter (0x08-a), we designed the full HFT pipeline architecture. Now, it’s time to implement the core components. This chapter covers:

  1. Ring Buffer - Lock-free inter-service communication.
  2. Write-Ahead Log (WAL) - Order persistence.
  3. UBSCore Service - The core balance service.

1. Technology Selection: Safety First

In financial systems, maturity and stability outweigh extreme performance.

1.1 Ring Buffer Selection

| Crate | Maturity | Security | Performance |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (3.3M+ DLs) | Heavily Audited | Very Low Latency |
| ringbuf | 🌟🌟🌟🌟 (600K+ DLs) | Community Verified | Lower Latency |
| rtrb | 🌟🌟🌟 (Newer) | Less Vetted | Lowest Latency |

Our Choice: crossbeam-queue

Reasons:

  • Maintained by Rust core team members.
  • Base dependency for tokio, actix, rayon.
  • If it has a bug, half the Rust ecosystem collapses.

Financial System Selection Principle: Use what lets you sleep at night.

#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;

// Create fixed-size ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);

// Producer: Non-blocking push
queue.push(order_msg).unwrap();

// Consumer: Non-blocking pop
if let Some(msg) = queue.pop() {
    process(msg);
}
}

2. Write-Ahead Log (WAL)

WAL is the system’s Single Source of Truth.

2.1 Design Principles

#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// Principles:
/// 1. Append-Only: Sequential I/O, max performance.
/// 2. Group Commit: Batch fsyncs.
/// 3. Monotonic sequence_id: Deterministic replay.
pub struct WalWriter {
    writer: BufWriter<File>,
    next_seq: SeqNum,
    pending_count: usize,
    config: WalConfig,
}
}

2.2 Group Commit Strategy

| Flush Strategy | Latency | Throughput | Safety |
|---|---|---|---|
| Every Entry | ~50µs | ~20K/s | Highest |
| Every 100 Entries | ~5µs (amortized) | ~200K/s | High |
| Every 1ms | ~1µs (amortized) | ~1M/s | Medium |

We choose Every 100 Entries to balance performance and safety:

#![allow(unused)]
fn main() {
pub struct WalConfig {
    pub path: String,
    pub flush_interval_entries: usize,  // Flush every N entries
    pub sync_on_flush: bool,            // Whether to call fsync
}
}

2.3 WAL Entry Format

Currently CSV (readable for dev):

seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit

In production, switch to Binary (54 bytes/entry) for better performance.
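As an illustration, encoding one entry into this CSV line can be sketched as follows. The `WalEntry` struct here is a hypothetical stand-in mirroring the header columns, not the book's actual `WalWriter` internals:

```rust
/// One WAL record, field-for-field matching the CSV header:
/// seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
struct WalEntry {
    seq_id: u64,
    timestamp_ns: u64,
    order_id: u64,
    user_id: u64,
    price: u64,            // fixed-point integer (quote decimals)
    qty: u64,              // fixed-point integer (base decimals)
    side: &'static str,       // "Buy" | "Sell"
    order_type: &'static str, // "Limit" | "Market"
}

/// Serialize one entry into a single CSV line (no quoting needed:
/// every field is numeric or a fixed keyword).
fn encode_csv(e: &WalEntry) -> String {
    format!(
        "{},{},{},{},{},{},{},{}",
        e.seq_id, e.timestamp_ns, e.order_id, e.user_id, e.price, e.qty, e.side, e.order_type
    )
}
```

A fixed-width binary layout would replace `format!` with direct little-endian writes of the same fields, which is where the ~54 bytes/entry figure comes from.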

3. UBSCore Service

UBSCore is the Single Entry Point for all balance operations.

3.1 Responsibilities

  1. Balance State Management: In-memory balance state.
  2. Order WAL Writing: Persist orders.
  3. Balance Operations: lock/unlock/spend_frozen/deposit.

3.2 Core Structure

#![allow(unused)]
fn main() {
pub struct UBSCore {
    /// User Accounts - Authoritative Balance State
    accounts: FxHashMap<UserId, UserAccount>,
    /// Write-Ahead Log
    wal: WalWriter,
    /// Configuration
    config: TradingConfig,
    /// Pending Orders (Locked but not filled)
    pending_orders: FxHashMap<OrderId, PendingOrder>,
    /// Statistics
    stats: UBSCoreStats,
}
}

3.3 Order Processing Flow

process_order(order):
  │
  ├─ 1. Write to WAL ──────────► Get seq_id
  │
  ├─ 2. Validate order ────────► Check price/qty
  │
  ├─ 3. Get user account ──────► Lookup user
  │
  ├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
  │                              Sell: qty
  │
  └─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
                                 Fail    → Err(Rejected)

Implementation:

#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
    // Step 1: Write to WAL FIRST (persist before any state change)
    let seq_id = self.wal.append(&order)?;

    // Step 2-4: Validate and calculate
    // ...

    // Step 5: Lock balance
    let lock_result = account
        .get_balance_mut(locked_asset_id)
        .and_then(|balance| balance.lock(locked_amount));

    match lock_result {
        Ok(()) => {
            // Track pending order
            self.pending_orders.insert(order.id, PendingOrder { ... });
            Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
        }
        Err(_) => Err(OrderEvent::Rejected { ... })
    }
}
}

3.4 Settlement

#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
    let trade = &event.trade;
    let quote_amount = trade.price * trade.qty / self.config.qty_unit();

    // Buyer: spend USDT, receive BTC
    buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
    buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;

    // Seller: spend BTC, receive USDT
    seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
    seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;

    Ok(())
}
}

4. Message Types

Services communicate via defined message types:

#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
    pub seq_id: SeqNum,
    pub order: Order,
    // ...
}

// UBSCore → ME
pub struct ValidOrder {
    pub seq_id: SeqNum,
    pub order: Order,
    pub locked_amount: u64,
    // ...
}

// ME → UBSCore + Settlement
pub struct TradeEvent {
    pub trade: Trade,
    pub taker_order_id: OrderId,
    pub maker_order_id: OrderId,
    // ...
}
}

5. Integration & Usage

5.1 CLI Arguments

# Original Pipeline
cargo run --release

# UBSCore Pipeline (Enable WAL)
cargo run --release -- --ubscore

5.2 Performance Comparison

| Metric | Original | UBSCore | Change |
|---|---|---|---|
| Throughput | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL Entries | N/A | 100,000 | 6.67 MB |
| Balance Check | 0.3% | 1.3% | +1% |
| Matching | 45.5% | 45.5% | - |
| Settlement | 0.1% | 0.2% | - |
| Ledger I/O | 54.0% | 53.0% | -1% |

Analysis:

  • WAL introduces ~5% overhead.
  • Acceptable cost for safety.
  • Main bottleneck remains Ledger I/O.

6. Tests

6.1 Unit Tests

cargo test
# 31 tests passing

6.2 E2E Tests

sh scripts/test_e2e.sh
# ✅ All tests passed!

7. New Files

| File | Lines | Description |
|---|---|---|
| src/messages.rs | 265 | Inter-service messages |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |

8. Key Learnings

8.1 Safety First

  • Maturity > Performance
  • Auditable > Rapid Dev

8.2 WAL is Single Source of Truth

All state = f(WAL). Foundation for Disaster Recovery and Audit.

8.3 Single Thread Advantage

UBSCore uses single thread for natural atomicity (no locking needed for balance ops) and predictable latency.

9. Critical Bug Fix: Cost Calculation Overflow

9.1 The Issue

Testing with --ubscore revealed 1032 rejected orders that were accepted in the legacy mode.

9.2 Root Cause

Overflow in price * qty (u64).

Example Order #21:

  • Price: 84,956.01 USDT (6 decimals) -> 84,956,010,000
  • Qty: 2.562844 BTC (8 decimals) -> 256,284,400
  • Product: 2.177 × 10^19 > u64::MAX
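As a sanity check on these numbers, the raw u64 multiply really does overflow, and `checked_mul` makes the trigger condition explicit (an illustrative snippet, not the engine's code):

```rust
/// True if `price * qty` would overflow u64 — the exact condition
/// that silently wrapped in the legacy release build.
fn would_overflow(price: u64, qty: u64) -> bool {
    price.checked_mul(qty).is_none()
}
```

With Order #21's values (84,956,010,000 × 256,284,400 ≈ 2.177 × 10^19 > u64::MAX ≈ 1.845 × 10^19), `would_overflow` returns `true`.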

9.3 Why Legacy Mode Passed?

Release-Mode Wrapping Arithmetic: the legacy code's cost = price * qty wrapped around on overflow, producing a much smaller, incorrect value that still passed the balance check. Users were locked for only ~33k USDT yet bought ~217k USDT worth of BTC!

9.4 The Fix

#![allow(unused)]
fn main() {
// Use u128 for the intermediate calculation so the product cannot wrap
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
Ok(cost_128 as u64)
}

9.5 Configuration Issue

USDT with 6 decimals is risky. Recommended: 2 decimals. Binance uses 2 decimals for USDT price.

10. Improvement: Ledger Integrity & Determinism

10.1 Incomplete Ledger

Current Ledger lacks Deposit, Lock, Unlock, SpendFrozen. Only tracks Settlement.

10.2 Pipeline Non-Determinism

Pipeline concurrency means Lock and Settlement events interleave non-deterministically. Snapshot comparison is impossible.

10.3 Solution: Version Space Separation

Separate version counters for Lock events and Settle events.

| Version Space | Increment On | Sort By | Determinism |
|---|---|---|---|
| lock_version | Lock/Unlock | order_seq_id | ✅ Deterministic |
| settle_version | Settle | trade_id | ✅ Deterministic |

Validation Strategy: Verify the Final Set of events, sorted by their respective versions/source IDs, rather than checking snapshot consistency at arbitrary times.

11. Design Discussion: Causal Chain

UBSCore has inputs from OrderQueue and TradeQueue. Interleaving is random.

Solution:

  1. OrderQueue strictly follows order_seq_id.
  2. TradeQueue strictly follows trade_id.
  3. Link every Balance Event to its source (order_seq_id or trade_id).
  4. This forms a Causal Chain for audit.
#![allow(unused)]
fn main() {
struct BalanceEvent {
    // ...
    source_type: SourceType, // Order | Trade
    source_id: u64,          // order_seq_id | trade_id
}
}

This allows offline verification: Lock(source=Order N) must exist if Order N exists. Settle(source=Trade M) must exist if Trade M exists.
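A minimal offline checker for the Order side of this rule might look as follows. The types are simplified stand-ins for the `BalanceEvent` above, and the input is assumed to be the list of accepted order seq_ids (rejected orders produce no Lock):

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq)]
enum SourceType {
    Order,
    Trade,
}

struct BalanceEvent {
    source_type: SourceType,
    source_id: u64, // order_seq_id | trade_id
}

/// Causal-chain audit: every accepted order seq_id must appear as the
/// source of at least one Order-sourced balance event (its Lock).
fn verify_causal_chain(accepted_seq_ids: &[u64], events: &[BalanceEvent]) -> bool {
    let seen: HashSet<u64> = events
        .iter()
        .filter(|e| e.source_type == SourceType::Order)
        .map(|e| e.source_id)
        .collect();
    accepted_seq_ids.iter().all(|id| seen.contains(id))
}
```

The Trade side is symmetric: collect Trade-sourced event ids and check every trade_id appears.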

12. Next Steps (0x08-c)

  1. Implement Version Space Separation.
  2. Expand BalanceEvent with causal links.
  3. Integrate Ring Buffer.
  4. Develop Causal Chain Audit Tools.



🇨🇳 中文

📦 代码变更: 查看 Diff

从设计到实现:构建安全第一的余额核心服务

概述

在上一章(0x08-a)中,我们设计了完整的 HFT 交易流水线架构。现在,是时候实现核心组件了。本章我们将构建:

  1. Ring Buffer - 服务间无锁通信
  2. Write-Ahead Log (WAL) - 订单持久化
  3. UBSCore Service - 余额核心服务

1. 技术选型:安全第一

在金融系统中,成熟稳定比极致性能更重要。

1.1 Ring Buffer 选型

| Crate | 成熟度 | 安全性 | 性能 |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (330万+下载) | 最严苛审计 | 极低延迟 |
| ringbuf | 🌟🌟🌟🌟 (60万+下载) | 社区验证 | 更低延迟 |
| rtrb | 🌟🌟🌟 (较新) | 较少审查 | 最低延迟 |

我们的选择:crossbeam-queue

理由:

  • Rust 核心团队成员参与维护
  • 被 tokio, actix, rayon 作为底层依赖
  • 如果它有 Bug,半个 Rust 生态都会崩

金融系统选型原则:用它睡得着觉。

#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;

// 创建固定容量的 ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);

// 生产者:非阻塞 push
queue.push(order_msg).unwrap();

// 消费者:非阻塞 pop
if let Some(msg) = queue.pop() {
    process(msg);
}
}

2. Write-Ahead Log (WAL)

WAL 是系统的唯一事实来源 (Single Source of Truth)

2.1 设计原则

#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// 设计原则:
/// 1. 追加写 (Append-Only) - 顺序 I/O,最大化性能
/// 2. Group Commit - 批量刷盘,减少 fsync 次数
/// 3. 单调递增 sequence_id - 保证确定性重放
pub struct WalWriter {
    writer: BufWriter<File>,
    next_seq: SeqNum,
    pending_count: usize,
    config: WalConfig,
}
}

2.2 Group Commit 策略

| 刷盘策略 | 延迟 | 吞吐量 | 数据安全 |
|---|---|---|---|
| 每条 fsync | ~50µs | ~20K/s | 最高 |
| 每 100 条 | ~5µs (均摊) | ~200K/s | 高 |
| 每 1ms | ~1µs (均摊) | ~1M/s | 中 |

我们选择 每 100 条刷盘,在性能和安全间取得平衡:

#![allow(unused)]
fn main() {
pub struct WalConfig {
    pub path: String,
    pub flush_interval_entries: usize,  // 每 N 条刷盘
    pub sync_on_flush: bool,            // 是否调用 fsync
}
}

2.3 WAL 条目格式

当前使用 CSV 格式(开发阶段可读性好):

seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit

生产环境可切换为二进制格式(54 bytes/entry)以提升性能。

3. UBSCore Service

UBSCore 是所有余额操作的唯一入口

3.1 职责

  1. Balance State Management - 内存中的余额状态
  2. Order WAL Writing - 持久化订单
  3. Balance Operations - lock/unlock/spend_frozen/deposit

3.2 核心结构

#![allow(unused)]
fn main() {
pub struct UBSCore {
    /// 用户账户 - 权威余额状态
    accounts: FxHashMap<UserId, UserAccount>,
    /// Write-Ahead Log
    wal: WalWriter,
    /// 交易配置
    config: TradingConfig,
    /// 待处理订单(已锁定但未成交)
    pending_orders: FxHashMap<OrderId, PendingOrder>,
    /// 统计信息
    stats: UBSCoreStats,
}
}

3.3 订单处理流程

process_order(order):
  │
  ├─ 1. Write to WAL ──────────► 获得 seq_id
  │
  ├─ 2. Validate order ────────► 价格/数量检查
  │
  ├─ 3. Get user account ──────► 查找用户
  │
  ├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
  │                              Sell: qty
  │
  └─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
                                 Fail    → Err(Rejected)

代码实现:

#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
    // Step 1: Write to WAL FIRST (persist before any state change)
    let seq_id = self.wal.append(&order)?;

    // Step 2-4: Validate and calculate
    // ...

    // Step 5: Lock balance
    let lock_result = account
        .get_balance_mut(locked_asset_id)
        .and_then(|balance| balance.lock(locked_amount));

    match lock_result {
        Ok(()) => {
            // Track pending order
            self.pending_orders.insert(order.id, PendingOrder { ... });
            Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
        }
        Err(_) => Err(OrderEvent::Rejected { ... })
    }
}
}

3.4 成交结算

#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
    let trade = &event.trade;
    let quote_amount = trade.price * trade.qty / self.config.qty_unit();

    // Buyer: spend USDT, receive BTC
    buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
    buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;

    // Seller: spend BTC, receive USDT
    seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
    seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;

    Ok(())
}
}

4. 消息类型

服务间通过明确定义的消息类型通信:

#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
    pub seq_id: SeqNum,
    pub order: Order,
    // ...
}

// UBSCore → ME
pub struct ValidOrder {
    pub seq_id: SeqNum,
    pub order: Order,
    pub locked_amount: u64,
    // ...
}

// ME → UBSCore + Settlement
pub struct TradeEvent {
    pub trade: Trade,
    pub taker_order_id: OrderId,
    pub maker_order_id: OrderId,
    // ...
}
}

5. 集成与使用

5.1 命令行参数

# 原始流水线
cargo run --release

# UBSCore 流水线(启用 WAL)
cargo run --release -- --ubscore

5.2 性能对比

| 指标 | 原始 | UBSCore | 变化 |
|---|---|---|---|
| 吞吐量 | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL 条目 | N/A | 100,000 | 6.67 MB |
| 余额检查 | 0.3% | 1.3% | +1% |
| 匹配引擎 | 45.5% | 45.5% | - |
| 结算 | 0.1% | 0.2% | - |
| 账本 I/O | 54.0% | 53.0% | -1% |

分析

  • WAL 写入引入约 5% 的开销
  • 这是可接受的代价,换取了数据安全性
  • 主要瓶颈仍是 Ledger I/O(下一章优化目标)

6. 测试

6.1 单元测试

cargo test
# 31 tests passing

6.2 E2E 测试

sh scripts/test_e2e.sh
# ✅ All tests passed!

7. 新增文件

| 文件 | 行数 | 描述 |
|---|---|---|
| src/messages.rs | 265 | 服务间消息类型 |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |

8. 关键学习

8.1 安全第一

  • 成熟稳定 > 极致性能
  • 可审计 > 快速开发
  • 用它睡得着觉 是选型的最高标准

8.2 WAL 是唯一事实来源

All state = f(WAL)。任何时刻,系统状态都可以从 WAL 100% 重建。这也是灾难恢复和审计合规的基础。

8.3 单线程是优势

UBSCore 选择单线程不是因为简单,而是因为:

  • 自然的原子性(无锁)
  • 不可能双重支付
  • 可预测的延迟

9. 重要 Bug 修复:Cost 计算溢出

9.1 问题发现

在实现 UBSCore 并运行 --ubscore 模式测试时,发现了 1032 个订单被拒绝,而传统模式全部接受。

9.2 根本原因

Cost 计算时 price * qty 溢出 u64

订单 #21:

  • price = 84,956,010,000 (84956.01 USDT,6位精度)
  • qty = 256,284,400 (2.562844 BTC,8位精度)
  • price * qty = 2.177 × 10^19 > u64::MAX

9.3 传统模式为什么没报错?

Release 模式的 wrapping arithmetic! 传统模式下,溢出后值变小,虽然通过了检查,但是锁定的金额严重不足!这是一个巨大的金融漏洞。

9.4 修复方案

#![allow(unused)]
fn main() {
// 使用 u128 进行中间计算,乘积不会回绕
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
Ok(cost_128 as u64)
}

9.5 配置问题:USDT 精度过高

USDT 使用 6 位精度导致溢出风险。建议使用 2 位精度(Binance 标准)。

10. 待改进:Ledger 完整性与确定性

10.1 当前 Ledger 不完整

当前 Ledger 缺失 Deposit, Lock, Unlock, SpendFrozen 等操作。

10.2 Pipeline 模式的确定性问题

由于 Ring Buffer 并行处理,Lock 和 Settle 事件的交错顺序不固定,导致无法通过快照对比来验证一致性。

10.3 解决方案:分离 Version 空间

为每种事件类型维护独立的 version:

| Version 空间 | 递增条件 | 排序依据 | 确定性 |
|---|---|---|---|
| lock_version | Lock/Unlock 事件 | order_seq_id | ✅ 确定 |
| settle_version | Settle 事件 | trade_id | ✅ 确定 |

验证策略: 不再验证任意时刻的快照,而是验证处理完成后的最终事件集合(按各自 Version 排序)。

11. 设计讨论全记录

11.1 因果链设计

UBSCore 有两个输入源:OrderQueue 和 TradeQueue。 为了审计,我们建立了因果链:

#![allow(unused)]
fn main() {
struct BalanceEvent {
    // ...
    source_type: SourceType, // Order | Trade
    source_id: u64,          // order_seq_id | trade_id
}
}

这不仅解决了审计问题,还让我们可以快速定位问题源头:Lock 必定对应一个 Order,Settle 必定对应一个 Trade。

12. 下一章任务 (0x08-c)

  1. 实现分离 Version 空间 - lock_version / settle_version
  2. 扩展 BalanceEvent - 添加 event_type, version, source_id
  3. Ring Buffer 集成
  4. 因果链审计工具

0x08-c Complete Event Flow & Verification

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement a complete Event Sourcing architecture, verify equivalence with the legacy version, and upgrade the baseline.


Problems Identified

In the previous chapter (0x08-b), we implemented the UBSCore service but identified several issues:

1. Incomplete Ledger

The current Ledger only records settlement operations (Credit/Debit), missing other critical balance changes:

| Operation | Current Record | Production Req |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ✅ | ✅ |

2. Pipeline Determinism Issue

With a multi-stage Ring Buffer pipeline, the interleaving order of Lock and Settle events is non-deterministic:

Run 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
Run 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]

Result: Final state is identical, but the intermediate version sequence differs. Direct diff verification fails.


Objectives

1. Implement Separate Version Spaces

#![allow(unused)]
fn main() {
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,    // Increments only on lock/unlock
    settle_version: u64,  // Increments only on settle
}
}

2. Expand BalanceEvent

#![allow(unused)]
fn main() {
struct BalanceEvent {
    user_id: u64,
    asset_id: u32,
    event_type: EventType,  // Deposit | Lock | Unlock | Settle
    version: u64,           // Increments within strict version space
    source_type: SourceType,// Order | Trade | External
    source_id: u64,         // order_seq_id | trade_id | ref_id
    delta: i64,
    avail_after: u64,
    frozen_after: u64,
}
}

3. Record ALL Balance Operations

Order(seq=5) ──Trigger──→ Lock(buyer USDT, lock_version=1)
     │
     └──→ Trade(id=3)
              │
              ├──Trigger──→ Settle(buyer: -USDT, +BTC, settle_version=1)
              └──Trigger──→ Settle(seller: -BTC, +USDT, settle_version=1)

4. Verify Equivalence & Upgrade Baseline

Ensure the refactored system produces the exact same final state as the pre-refactor version.


Implementation Progress

Phase 1: Separate Version Spaces ✅ Done

Goal: Solve Pipeline Determinism.

1.1 Modify Balance Struct

#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,    // lock/unlock/deposit/withdraw
    settle_version: u64,  // spend_frozen/deposit
}
}

1.2 Version Increment Logic

| Operation | Version Incremented |
|---|---|
| deposit() | lock_version AND settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |
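These increment rules can be sketched as a minimal `Balance` type. This is a simplified stand-in for `src/balance.rs`, assuming only the rules in the table above:

```rust
/// Balance with separate version spaces: lock/unlock bump lock_version,
/// spend_frozen bumps settle_version, deposit bumps both.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,
    settle_version: u64,
}

impl Balance {
    fn deposit(&mut self, amt: u64) {
        self.avail += amt;
        self.lock_version += 1;   // visible in the lock event stream
        self.settle_version += 1; // and in the settle event stream
    }

    fn lock(&mut self, amt: u64) -> Result<(), ()> {
        if self.avail < amt {
            return Err(()); // insufficient balance: no state change, no version bump
        }
        self.avail -= amt;
        self.frozen += amt;
        self.lock_version += 1;
        Ok(())
    }

    fn spend_frozen(&mut self, amt: u64) -> Result<(), ()> {
        if self.frozen < amt {
            return Err(());
        }
        self.frozen -= amt;
        self.settle_version += 1;
        Ok(())
    }
}
```

Because lock events sort by order_seq_id and settle events by trade_id, each version sequence is deterministic on its own even when the two streams interleave differently across runs.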

1.3 Equivalence Verification ✅

Script: scripts/verify_baseline_equivalence.py

$ python3 scripts/verify_baseline_equivalence.py

╔════════════════════════════════════════════════════════════╗
║     Baseline Equivalence Verification                      ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL

Phase 2: Expand BalanceEvent ✅ Done

Goal: Full Event Sourcing.

2.1 Event Types & Structure

Implemented in src/messages.rs:

#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }

pub struct BalanceEvent {
    pub user_id: u64,
    pub asset_id: u32,
    pub event_type: BalanceEventType,
    pub version: u64,
    pub source_type: SourceType,
    pub source_id: u64,
    pub delta: i64,
    // ...
}
}

Phase 3: Record All Operations in Ledger ✅ Done

Goal: Every balance change is recorded.

3.1 Event Log File

UBSCore mode generates output/t2_events.csv:

user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0

3.2 Recorded Operations

| Operation | Status | Note |
|---|---|---|
| Deposit | ✅ | Recorded on init |
| Lock | ✅ | Recorded on order lock |
| Settle | ✅ | Recorded on trade settle |
| Unlock | 📋 | No cancel in current test |
| Withdraw | 📋 | No withdraw in current test |

3.3 Event Stats

Total events: 293,544
  Deposit events: 2,000
  Lock events: 100,000
  Settle events: 191,544

Phase 4: Validation Tests ✅ Done

Goal: Verify Event Correctness.

4.1 Event Correctness Verification

scripts/verify_balance_events.py - 7 Checks:

| Check | Description | Status |
|---|---|---|
| Lock Count | = Accepted Orders | ✅ |
| Settle Count | = Trades × 4 | ✅ |
| Lock Version Continuity | Incremental per User-Asset | ✅ |
| Settle Version Continuity | Incremental per User-Asset | ✅ |
| Delta Conservation | Sum of deltas per trade = 0 | ✅ |
| Source Consistency | Lock→Order, Settle→Trade | ✅ |
| Deposit Correctness | Positive delta + source=external | ✅ |
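The Delta Conservation check, for instance, can be sketched in a few lines. Field names here are assumptions mirroring the event CSV columns; since each trade settles in two assets (base and quote), the sum is taken per (trade_id, asset_id):

```rust
use std::collections::HashMap;

/// One settle event row, simplified: the trade that caused it,
/// the asset touched, and the signed balance change.
struct SettleEvent {
    trade_id: u64,
    asset_id: u32,
    delta: i64,
}

/// Delta conservation: within each (trade_id, asset_id) group,
/// what one side pays, the other receives — deltas must sum to zero.
fn deltas_conserved(events: &[SettleEvent]) -> bool {
    let mut sums: HashMap<(u64, u32), i64> = HashMap::new();
    for e in events {
        *sums.entry((e.trade_id, e.asset_id)).or_insert(0) += e.delta;
    }
    sums.values().all(|&s| s == 0)
}
```

Any nonzero group pinpoints a trade where funds were created or destroyed, which is exactly the class of bug the wrapping-arithmetic incident in 0x08-b produced.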

4.2 Events Baseline Verification

scripts/verify_events_baseline.py:

$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
  deposit: output=2000, baseline=2000 ✅
  lock: output=100000, baseline=100000 ✅
  settle: output=191544, baseline=191544 ✅

╔════════════════════════════════════════════════════════════╗
║     ✅ Events match baseline!                             ║
╚════════════════════════════════════════════════════════════╝

4.3 Full E2E Test

Run scripts/test_ubscore_e2e.sh:

$ bash scripts/test_ubscore_e2e.sh

=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
  ✅ All MATCH

=== Step 3: Verify balance events correctness ===
  ✅ All 7 checks passed!

=== Step 4: Verify events baseline ===
  ✅ Events match baseline!

Baseline Files

| File | Description |
|---|---|
| baseline/t2_balances_final.csv | Final Balance State |
| baseline/t2_orderbook.csv | Final OrderBook State |
| baseline/t2_events.csv | Event Log (293,544 events) |

Next Steps

  • 0x08-d: Multi-threaded Pipeline: Implement Ring Buffer to connect services.
  • 0x09: Multi-Symbol Support: Scale to multiple trading pairs.





🇨🇳 中文

📦 代码变更: 查看 Diff

核心目标:实现完整的事件溯源架构,验证与旧版本的等效性,升级 baseline。


本章问题

上一章(0x08-b)我们实现了 UBSCore 服务,但发现了几个问题:

1. Ledger 不完整

当前 Ledger 只记录结算操作(Credit/Debit),缺失其他余额变更:

| 操作 | 当前记录 | 生产要求 |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ✅ | ✅ |

2. Pipeline 确定性问题

当采用 Ring Buffer 多阶段 Pipeline 时,Lock 和 Settle 的交错顺序不确定:

运行 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
运行 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]

最终状态相同,但中间 version 序列不同 → 无法直接 diff 验证。


本章目标

1. 实现分离 Version 空间

#![allow(unused)]
fn main() {
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,    // 只在 lock/unlock 时递增
    settle_version: u64,  // 只在 settle 时递增
}
}

2. 扩展 BalanceEvent

#![allow(unused)]
fn main() {
struct BalanceEvent {
    user_id: u64,
    asset_id: u32,
    event_type: EventType,  // Deposit | Lock | Unlock | Settle
    version: u64,           // 在对应 version 空间内递增
    source_type: SourceType,// Order | Trade | External
    source_id: u64,         // order_seq_id | trade_id | ref_id
    delta: i64,
    avail_after: u64,
    frozen_after: u64,
}
}

3. 记录所有余额操作

Order(seq=5) ──触发──→ Lock(buyer USDT, lock_version=1)
     │
     └──→ Trade(id=3)
              │
              ├──触发──→ Settle(buyer: -USDT, +BTC, settle_version=1)
              └──触发──→ Settle(seller: -BTC, +USDT, settle_version=1)

4. 验证等效性并升级 Baseline

确保重构后的系统与重构前产生相同的最终状态。


实现进度

Phase 1: 分离 Version 空间 ✅ 已完成

目标:解决 Pipeline 确定性问题

1.1 修改 Balance 结构

#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,    // lock/unlock/deposit/withdraw 操作递增
    settle_version: u64,  // spend_frozen/deposit 操作递增
}
}

1.2 Version 递增逻辑

| 操作 | 递增的 Version |
|---|---|
| deposit() | lock_version 和 settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |

1.3 等效性验证 ✅

验证脚本scripts/verify_baseline_equivalence.py

$ python3 scripts/verify_baseline_equivalence.py

╔════════════════════════════════════════════════════════════╗
║     Baseline Equivalence Verification                      ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL

Phase 2: 扩展 BalanceEvent ✅ 已完成

目标:完整的事件溯源

2.1 事件类型和结构

已在 src/messages.rs 中实现:

#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }

pub struct BalanceEvent {
    pub user_id: u64,
    pub asset_id: u32,
    pub event_type: BalanceEventType,
    pub version: u64,
    pub source_type: SourceType,
    pub source_id: u64,
    pub delta: i64,
    // ...
}
}

Phase 3: Ledger 记录所有操作 ✅ 已完成

目标:每个余额变更都有记录

3.1 事件日志文件

UBSCore 模式下生成 output/t2_events.csv

user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0

3.2 当前记录的操作

| 操作 | 状态 | 说明 |
|---|---|---|
| Deposit | ✅ | 初始充值时记录 |
| Lock | ✅ | 下单锁定后记录 |
| Settle | ✅ | 成交结算后记录 |
| Unlock | 📋 | 取消订单时记录(当前测试无取消) |
| Withdraw | 📋 | 提现时记录(当前测试无提现) |

3.3 事件统计

Total events: 293,544
  Deposit events: 2,000
  Lock events: 100,000
  Settle events: 191,544

Phase 4: 验证测试 ✅ 已完成

目标:验证事件正确性

4.1 事件正确性验证

scripts/verify_balance_events.py - 7 项检查:

| 检查项 | 说明 | 状态 |
|---|---|---|
| Lock 事件数量 | = 接受的订单数 | ✅ |
| Settle 事件数量 | = 成交数 × 4 | ✅ |
| Lock 版本连续性 | 每个用户-资产对内递增 | ✅ |
| Settle 版本连续性 | 每个用户-资产对内递增 | ✅ |
| Delta 守恒 | 每笔成交的 delta 总和 = 0 | ✅ |
| Source 类型一致性 | Lock→Order, Settle→Trade | ✅ |
| Deposit 事件 | 正 delta + source_type=external | ✅ |

4.2 Events Baseline 验证

scripts/verify_events_baseline.py:

$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
  deposit: output=2000, baseline=2000 ✅
  lock: output=100000, baseline=100000 ✅
  settle: output=191544, baseline=191544 ✅

╔════════════════════════════════════════════════════════════╗
║     ✅ Events match baseline!                             ║
╚════════════════════════════════════════════════════════════╝

4.3 完整 E2E 测试

运行 scripts/test_ubscore_e2e.sh

$ bash scripts/test_ubscore_e2e.sh

=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
  ✅ All MATCH

=== Step 3: Verify balance events correctness ===
  ✅ All 7 checks passed!

=== Step 4: Verify events baseline ===
  ✅ Events match baseline!

Baseline 文件

| 文件 | 说明 |
|---|---|
| baseline/t2_balances_final.csv | 最终余额状态 |
| baseline/t2_orderbook.csv | 最终订单簿状态 |
| baseline/t2_events.csv | 事件日志 (293,544 事件) |

Next Steps

  • 0x08-d: Multi-Thread Pipeline - connect the services with Ring Buffers
  • 0x09: Multi-Symbol Support - extend to multiple trading pairs

0x08-d Complete Order Lifecycle & Cancel Optimization


📦 Code Changes: View Diff

Core Objective: Implement full order lifecycle management (including Cancel and Refund), design a dual-track testing framework, and analyze performance bottlenecks.


1. Feature Implementation Overview

In this chapter, we completed the following core features to equip the trading engine with full order processing capabilities:

1.1 Order Events & State Management

Implemented complete OrderEvent enum and CSV logging.

OrderStatus (src/models.rs): Follows Binance-style Screaming Snake Case.

#![allow(unused)]
fn main() {
pub enum OrderStatus {
    NEW,              // Booked
    PARTIALLY_FILLED, 
    FILLED,           
    CANCELED,         // User Cancelled
    REJECTED,         // Risk Check Failed
    EXPIRED,          // System Expired
}
}

OrderEvent (src/messages.rs): Used for Event Sourcing and Audit Logs.

| Event Type | Trigger | Fund Operation |
|---|---|---|
| Accepted | Passed risk check | Lock |
| Rejected | Insufficient balance / bad params | None |
| Filled | Fully filled | Settle |
| PartialFilled | Partially filled | Settle |
| Cancelled | User cancel | Unlock (refund remaining) |
| Expired | System expired | Unlock |

CSV Log Format (output/t2_order_events.csv):

event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
accepted,1,100,101,,,,
rejected,3,102,103,,,,insufficient_balance
partial_filled,1,100,,5000,1000,,
filled,1,100,,0,,85000,
cancelled,5,100,,,2000,,

1.2 Cancel Workflow

  1. Parsing: scripts/csv_io.rs accepts both the legacy CSV format and the new one (order_id,user_id,action,side,price,qty), where action=cancel marks a cancellation.
  2. Removal: MatchingEngine calls OrderBook::remove_order_by_id.
  3. Unlock: UBSCore generates Unlock event to refund frozen funds.
  4. Logging: Record Cancelled event.
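A minimal sketch of steps 2-4 under simplified types: the book here is a flat map, and the refund math assumes a limit buy whose quote funds (remaining_qty × price) are still frozen. The real structs in src/ differ:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum BalanceEvent {
    Unlock { user_id: u64, amount: u64 },
}

struct RestingOrder {
    user_id: u64,
    price: u64,
    remaining_qty: u64,
}

/// Cancel an order: remove it from the book, unlock the still-frozen
/// funds, and return the Unlock event for the ledger.
fn cancel_order(
    book: &mut HashMap<u64, RestingOrder>, // order_id -> resting order
    frozen: &mut HashMap<u64, u64>,        // user_id -> frozen quote balance
    order_id: u64,
) -> Option<BalanceEvent> {
    let order = book.remove(&order_id)?;            // step 2: removal
    let refund = order.remaining_qty * order.price; // funds never spent
    *frozen.get_mut(&order.user_id)? -= refund;     // step 3: unlock
    Some(BalanceEvent::Unlock { user_id: order.user_id, amount: refund }) // step 4: log
}

fn main() {
    let mut book = HashMap::from([(5, RestingOrder { user_id: 100, price: 10, remaining_qty: 200 })]);
    let mut frozen = HashMap::from([(100u64, 2_000u64)]);
    let event = cancel_order(&mut book, &mut frozen, 5);
    assert_eq!(event, Some(BalanceEvent::Unlock { user_id: 100, amount: 2_000 }));
    assert_eq!(frozen[&100], 0);
    println!("cancelled and refunded");
}
```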

2. Dual-Track Testing Framework

To guarantee baseline stability while adding new features:

2.1 Regression Baseline

  • Dataset: fixtures/orders.csv (100k orders, Place only).
  • Script: scripts/test_e2e.sh
  • Goal: Ensure no performance regression for legacy flows. Principle: keep the baseline stable; change it only for format upgrades or major revisions.

2.2 Feature Testing

  • Dataset: fixtures/test_with_cancel/orders.csv (1M orders, 30% Cancel).
  • Script: scripts/test_cancel.sh
  • Goal: Verify lifecycle closure (Lock = Settle + Unlock).

3. Major Performance Issue

When scaling Cancel tests from 1,000 to 1,000,000 orders, we hit a severe performance wall.

3.1 Symptoms

  • Baseline (100k Place): ~3 seconds.
  • Cancel Test (1M Place+Cancel): > 7 minutes (430s).
  • Bottleneck: Matching Engine consumes 98% CPU.

3.2 Root Cause Analysis

The culprit is OrderBook::remove_order_by_id:

#![allow(unused)]
fn main() {
// src/orderbook.rs
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // Scan ALL price levels -> Scan ALL orders in level
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Scan asks...
}
}
  • Complexity: O(N), where N is the total number of orders resting in the book.
  • Worst Case: the cancel dataset lacks aggressive taker flow, so unfilled orders pile up. With 500k orders resting in the book, executing 300k cancels means up to 150 billion comparisons.

3.3 Solution (Next Step)

Introduce Order Index:

  • Structure: HashMap<OrderId, (Price, Side)>.
  • Complexity: Reduces Cancel from O(N) to O(1).
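The idea in miniature, using std HashMap in place of FxHashMap and a single book side for brevity; this is a sketch of the plan, not the eventual implementation:

```rust
use std::collections::{BTreeMap, HashMap};

#[derive(Default)]
struct Book {
    levels: BTreeMap<u64, Vec<u64>>, // price -> queued order ids (one side only)
    index: HashMap<u64, u64>,        // order_id -> price (the new index)
}

impl Book {
    fn rest(&mut self, order_id: u64, price: u64) {
        self.levels.entry(price).or_default().push(order_id);
        self.index.insert(order_id, price); // keep the index in sync
    }

    fn cancel(&mut self, order_id: u64) -> bool {
        // O(1): jump straight to the right price level instead of scanning.
        let Some(price) = self.index.remove(&order_id) else { return false };
        let level = self.levels.get_mut(&price).expect("index out of sync");
        level.retain(|&id| id != order_id); // O(k), k = orders at this price
        if level.is_empty() {
            self.levels.remove(&price);
        }
        true
    }
}

fn main() {
    let mut book = Book::default();
    book.rest(1, 85_000);
    book.rest(2, 85_000);
    assert!(book.cancel(1));  // no full-book scan needed
    assert!(!book.cancel(1)); // already gone
    assert_eq!(book.levels[&85_000], vec![2]);
    println!("indexed cancel works");
}
```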

4. Verification Scripts

  1. verify_balance_events.py:

    • Added Check 8: Verify Frozen Balance history consistency.
    • Verify Unlock events correctly release funds.
  2. verify_order_events.py:

    • Verify every Accepted order has a final state.
    • Verify Cancelled orders correspond to existing Accepted orders.

5. Summary

We implemented full order lifecycle management and established a rigorous testing framework. Crucially, large-scale stress testing exposed an O(N) complexity defect in the cancel path, setting the stage for the next optimization iteration.





0x08-e Performance Profiling & Optimization


📦 Code Changes: View Diff

Background: After introducing Cancel, execution time exploded from ~30s to 7+ minutes. We need to identify and fix the issue.

Goal:

  1. Establish architecture-level profiling to pinpoint bottlenecks.
  2. Fix the identified O(N) issues.
  3. Verify improvements with data.

1. Symptoms

Performance collapsed after adding Cancel:

  • Execution Time: ~30s → 7+ minutes
  • Throughput: ~34k ops/s → ~3k ops/s

Hypothesis:

  • Is it the O(N) Cancel scan?
  • VecDeque removal overhead?
  • Something else?

Hypotheses are only guesses; profiling provides facts.


2. Optimization 1: Order Index

2.1 The Problem

Cancelling requires looking up an order. The naive remove_order_by_id iterates the entire book:

#![allow(unused)]
fn main() {
// Before: O(N) full scan
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Scan asks...
}
}

2.2 The Solution

Introduce order_index: FxHashMap<OrderId, (Price, Side)> for O(1) lookup.

#![allow(unused)]
fn main() {
pub struct OrderBook {
    asks: BTreeMap<u64, VecDeque<InternalOrder>>,
    bids: BTreeMap<u64, VecDeque<InternalOrder>>,
    order_index: FxHashMap<u64, (u64, Side)>,  // New
    trade_id_counter: u64,
}
}

2.3 Index Maintenance

| Operation | Index Action |
|---|---|
| rest_order() | Insert |
| cancel_order() | Remove |
| remove_order_by_id() | Remove |
| Match fill | Remove |

2.4 Optimized Implementation

#![allow(unused)]
fn main() {
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // O(1) Lookup
    let (price, side) = self.order_index.remove(&order_id)?;
    
    // O(log n) Find level
    let (book, key) = match side {
        Side::Buy => (&mut self.bids, u64::MAX - price),
        Side::Sell => (&mut self.asks, price),
    };
    
    // O(k) Find in level (k is small)
    let orders = book.get_mut(&key)?;
    let pos = orders.iter().position(|o| o.order_id == order_id)?;
    let order = orders.remove(pos)?;
    
    if orders.is_empty() {
        book.remove(&key);
    }
    
    Some(order)
}
}

2.5 Result 1

| Metric | Before | After |
|---|---|---|
| Time | 7+ min | 87s |
| Throughput | ~3k ops/s | 15k ops/s |
| Boost | - | 5x |

Huge improvement! But 87s for 1.3M orders is still slow (15k ops/s). Further analysis is needed.


3. Architecture Profiling

3.1 Design

Measure time at architectural stages:

Order Input
    │
    ▼
┌─────────────────┐
│  1. Pre-Trade   │  ← UBSCore: WAL + Balance Lock
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  2. Matching    │  ← Pure ME: process_order
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  3. Settlement  │  ← UBSCore: settle_trade
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  4. Event Log   │  ← Ledger writes
└─────────────────┘

3.2 PerfMetrics

#![allow(unused)]
fn main() {
pub struct PerfMetrics {
    pub total_pretrade_ns: u64,    // UBSCore WAL + Lock
    pub total_matching_ns: u64,    // Match processing
    pub total_settlement_ns: u64,  // Balance updates
    pub total_event_log_ns: u64,   // Ledger I/O
    
    pub place_count: u64,
    pub cancel_count: u64,
}
}
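Populating these counters is just taking wall-clock deltas around each stage. A minimal std-only sketch; the timed helper is illustrative, not from the codebase:

```rust
use std::time::Instant;

#[derive(Default)]
struct PerfMetrics {
    total_matching_ns: u64,
    place_count: u64,
}

/// Run one pipeline stage and accumulate its elapsed nanoseconds.
fn timed<T>(bucket_ns: &mut u64, stage: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = stage();
    *bucket_ns += start.elapsed().as_nanos() as u64;
    out
}

fn main() {
    let mut metrics = PerfMetrics::default();
    for order in 0..10_000u64 {
        // Stand-in for engine.process_order(order).
        timed(&mut metrics.total_matching_ns, || std::hint::black_box(order * 2));
        metrics.place_count += 1;
    }
    assert_eq!(metrics.place_count, 10_000);
    println!(
        "matching: {:.2} ms total, {:.1} ns/order",
        metrics.total_matching_ns as f64 / 1e6,
        metrics.total_matching_ns as f64 / metrics.place_count as f64
    );
}
```

At report time, each bucket is divided by its operation count to produce the per-order/per-trade figures shown in the breakdown below.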

4. Optimization 2: Matching Engine

4.1 Bottleneck Identification

Profiling revealed that the Matching Engine consumed 96% of total time. A deeper look found the culprit:

#![allow(unused)]
fn main() {
// Problem: Copy ALL price keys on every match
let prices: Vec<u64> = book.asks().keys().copied().collect();
}

With 250k+ price levels in the Cancel test, every match must (1) walk the entire BTreeMap to collect its keys - O(P), (2) allocate a Vec to hold them, and (3) walk that Vec again to match. Doing this on every order is disastrous.

4.2 Solution

Use BTreeMap::range() to iterate only relevant prices.

#![allow(unused)]
fn main() {
// Solution: Iterate only valid price range
let max_price = if buy_order.order_type == OrderType::Limit {
    buy_order.price
} else {
    u64::MAX
};
let prices: Vec<u64> = book.asks().range(..=max_price).map(|(&k, _)| k).collect();
}
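The difference is easy to demonstrate on a toy ask book (price -> resting qty): a limit buy at 105 only ever needs the crossable levels.

```rust
use std::collections::BTreeMap;

fn main() {
    let asks: BTreeMap<u64, u64> = BTreeMap::from([
        (100, 10), (103, 5), (105, 7), (110, 2), (250_000, 1),
    ]);
    let limit_price = 105u64;

    // Before: copy every price key, O(P) in the number of levels.
    let all: Vec<u64> = asks.keys().copied().collect();

    // After: visit only the levels a buy at 105 can trade against.
    let crossable: Vec<u64> = asks.range(..=limit_price).map(|(&p, _)| p).collect();

    assert_eq!(all.len(), 5);
    assert_eq!(crossable, vec![100, 103, 105]);
    println!("visited {} of {} levels", crossable.len(), all.len());
}
```

Because BTreeMap keys are sorted, range(..=max_price) walks the tree directly to the first key and stops at the bound, never touching the far side of the book.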

5. Final Results

5.1 Environment

  • Dataset: 1.3M Orders (1M Place + 300k Cancel)
  • HW: MacBook Pro M1

5.2 Breakdown

=== Performance Breakdown ===
Orders: 1300000, Trades: 538487

1. Pre-Trade:        621.97ms (  3.5%)  [  0.48 µs/order]
2. Matching:       15014.08ms ( 84.0%)  [ 15.01 µs/order]
3. Settlement:        21.57ms (  0.1%)  [  0.04 µs/trade]
4. Event Log:       2206.71ms ( 12.4%)  [  1.70 µs/order]

Total Tracked:     17864.33ms

5.3 Improvements

| Stage | Latency Before | Latency After | Gain |
|---|---|---|---|
| Matching | 83.53 µs/order | 15.01 µs/order | 5.6x |
| Cancel Lookup | O(N) | 0.29 µs | - |

6. Comparison Table

| Version | Time | Throughput | Gain |
|---|---|---|---|
| Before optimization | 7+ min | ~3k ops/s | - |
| Order Index | 87s | 15k ops/s | 5x |
| + BTreeMap range | 18s | 72k ops/s | 24x |

7. Summary

7.1 Achievements

| Optimization | Problem | Solution | Result |
|---|---|---|---|
| Order Index | O(N) cancel lookup | FxHashMap | 0.29 µs |
| Range Query | Full key copy | range() | 83 → 15 µs |

7.2 Final Design Pattern

┌─────────────────────────────────────────────────────────┐
│                     OrderBook                           │
│  ┌─────────────────┐    ┌─────────────────────────────┐ │
│  │   order_index   │◄───│  Sync on: rest, cancel,     │ │
│  │ FxHashMap<id,   │    │           match, remove     │ │
│  │   (price,side)> │    └─────────────────────────────┘ │
│  └────────┬────────┘                                    │
│           │ O(1) lookup                                 │
│           ▼                                             │
│  ┌─────────────────┐    ┌─────────────────────────────┐ │
│  │      bids       │    │          asks               │ │
│  │ BTreeMap<price, │    │  BTreeMap<price,            │ │
│  │   VecDeque>     │    │    VecDeque>                │ │
│  │  + range()      │    │    + range()                │ │
│  └─────────────────┘    └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Optimization Conclusion: From 7 minutes to 18 seconds. 24x boost. 🚀





0x08-f Ring Buffer Pipeline Implementation


📦 Code Changes: View Diff

Goal: Connect services using Ring Buffers to implement a true Pipeline architecture.


Part 1: Single-Thread Pipeline

1.1 Background

Legacy Execution (Synchronous Serial):

for order in orders:
    1. ubscore.process_order(order)     # WAL + Lock
    2. engine.process_order(order)       # Match
    3. ubscore.settle_trade(trade)       # Settle
    4. ledger.write(event)               # Persist

Problem: No pipeline parallelism, latency accumulates.

1.2 Single-Thread Pipeline Architecture

Decouple services using Ring Buffers, but polling within a single thread loop:

┌─────────────────────────────────────────────────────────────────────────┐
│                    Single-Thread Pipeline (Round-Robin)                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   Stage 1: Ingestion          →  order_queue                            │
│   Stage 2: UBSCore Pre-Trade  →  valid_order_queue                      │
│   Stage 3: Matching Engine    →  trade_queue                            │
│   Stage 4: Settlement         →  (Ledger)                               │
│                                                                          │
│   All Stages executed in a round-robin loop                              │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Core Data Structures:

#![allow(unused)]
fn main() {
pub struct PipelineQueues {
    pub order_queue: Arc<ArrayQueue<SequencedOrder>>,
    pub valid_order_queue: Arc<ArrayQueue<ValidOrder>>,
    pub trade_queue: Arc<ArrayQueue<TradeEvent>>,
}
}

Execution Loop:

#![allow(unused)]
fn main() {
loop {
    // UBSCore: order_queue → valid_order_queue
    if let Some(order) = queues.order_queue.pop() {
        // ...
    }
    
    // ME: valid_order_queue → trade_queue
    if let Some(valid_order) = queues.valid_order_queue.pop() {
        // ...
    }
    
    // Settlement: trade_queue → persist
    if let Some(trade) = queues.trade_queue.pop() {
        // ...
    }
}
}
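The same loop in runnable miniature, with std VecDeques standing in for the lock-free ring buffers and trivial stand-ins for the three services:

```rust
use std::collections::VecDeque;

/// Push every order through the three stages in a round-robin loop and
/// return the settled order ids in completion order.
fn run_pipeline(orders: Vec<u64>) -> Vec<u64> {
    let mut order_queue: VecDeque<u64> = orders.into();
    let mut valid_order_queue: VecDeque<u64> = VecDeque::new();
    let mut trade_queue: VecDeque<u64> = VecDeque::new();
    let mut settled = Vec::new();

    loop {
        let mut progressed = false;
        // UBSCore: order_queue -> valid_order_queue (risk check + lock)
        if let Some(order) = order_queue.pop_front() {
            valid_order_queue.push_back(order);
            progressed = true;
        }
        // ME: valid_order_queue -> trade_queue (here: every order "trades")
        if let Some(valid) = valid_order_queue.pop_front() {
            trade_queue.push_back(valid);
            progressed = true;
        }
        // Settlement: trade_queue -> persist
        if let Some(trade) = trade_queue.pop_front() {
            settled.push(trade);
            progressed = true;
        }
        if !progressed {
            break; // all queues drained
        }
    }
    settled
}

fn main() {
    assert_eq!(run_pipeline(vec![1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
    println!("pipeline preserves order");
}
```

Each pass advances every stage by at most one message, so after a short warm-up all three stages are busy on different orders within the same loop iteration.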

Part 2: Multi-Thread Pipeline

2.1 Architecture

Full Multi-Threaded Pipeline based on 0x08-a design:

┌───────────────────────────────────────────────────────────────────────────────────────┐
│                          Multi-Thread Pipeline (Full)                                  │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                        │
│  Thread 1: Ingestion       Thread 2: UBSCore              Thread 3: ME                │
│  ┌─────────────────┐       ┌──────────────────────┐       ┌─────────────────┐         │
│  │ Read orders     │       │  PRE-TRADE:          │       │ Match Order     │         │
│  │ Assign SeqNum   │──────▶│  - Write WAL         │──────▶│ in OrderBook    │         │
│  │                 │   ①   │  - process_order()   │  ③    │                 │         │
│  └─────────────────┘       │  - lock_balance()    │       │ Generate        │         │
│                            │                      │       │ TradeEvents     │         │
│                            └──────────┬───────────┘       └────────┬────────┘         │
│                                       ▲                            │                  │
│                                       │                            │                  │
│                                       │ ⑤ balance_update_queue     │ ④ trade_queue   │
│                                       └────────────────────────────┤                  │
│                                                                    │                  │
│                            ┌──────────────────────┐                ▼                  │
│                            │  POST-TRADE:         │       ┌─────────────────┐         │
│                            │  - settle_trade()    │       │ Thread 4:       │         │
│                            │  - spend_frozen()    │──────▶│ Settlement      │         │
│                            │  - deposit()         │  ⑥    │                 │         │
│                            │  - Generate Balance  │       │ Persist:        │         │
│                            │    Update Events     │       │ - Trade Events  │         │
│                            └──────────────────────┘       │ - Balance Events│         │
│                                                           │ - Ledger        │         │
│                                                           └─────────────────┘         │
│                                                                                        │
└───────────────────────────────────────────────────────────────────────────────────────┘

2.2 Key Design Points

  1. ME Fan-out: ME sends TradeEvent in parallel to:
    • trade_queue → Settlement (Persist)
    • balance_update_queue → UBSCore (Balance Settle)
  2. UBSCore as Single Balance Entry: Handles Pre-Trade Lock, Post-Trade Settle, and Refunds.
  3. Settlement Consolidation: Consumes both Trade Events and Balance Events.

2.3 Data Types

BalanceUpdateRequest (ME → UBSCore): Contains Trade Event and optional Price Improvement data.

BalanceEvent (UBSCore → Settlement): The unified channel for ALL balance changes (Lock, Settle, Credit, Refund).

#![allow(unused)]
fn main() {
pub enum BalanceEventType {
    Lock,           // Pre-Trade
    SpendFrozen,    // Post-Trade
    Credit,         // Post-Trade
    RefundFrozen,   // Price Improvement
    // ...
}
}

2.4 Implementation Status

| Component | Status |
|---|---|
| All queues | ✅ Implemented |
| UBSCore BalanceEvent generation | ✅ Implemented |
| Settlement persistence | ✅ Implemented |

Verification & Performance (2025-12-17)

Correctness

E2E tests pass for both pipeline modes.

Performance Comparison

1.3M Orders (with 300k Cancel):

| Mode | Time | Throughput | Trades |
|---|---|---|---|
| UBSCore (Baseline) | 23.5s | 55k ops/s | 538,487 |
| Single-Thread Pipeline | 22.1s | 59k ops/s | 538,487 |
| Multi-Thread Pipeline | 29.1s | 45k ops/s | 489,804 |
  • Issue: Multi-Thread mode is currently slower (-30%) on large datasets and skips cancel orders.

100k Orders (Place only):

| Mode | Time | Throughput | vs Baseline |
|---|---|---|---|
| UBSCore | 755ms | 132k ops/s | - |
| Single-Thread | 519ms | 193k ops/s | +46% |
| Multi-Thread | 391ms | 256k ops/s | +93% |
  • Observation: Multi-threading shines on smaller, simpler datasets (+93%).

Analysis

When per-order processing is already very fast (thanks to the earlier optimizations), the multi-threaded pipeline's overhead (context switching, queue contention, event generation) outweighs the gains from parallelism. The missing cancel handling also compromises correctness.


Key Design Decisions

  • Backpressure: Spin Wait (prioritize low latency).
  • Shutdown: Graceful drain using Atomic Signals.
  • Error Handling: Logging and metric counting; critical paths must succeed.




0x08-g Multi-Thread Pipeline Design

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff | Key File: pipeline_mt.rs

Overview

The Multi-Thread Pipeline distributes processing logic across 4 independent threads, communicating via lock-free queues to achieve high throughput order processing.

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Ingestion  │────▶│   UBSCore   │────▶│     ME      │────▶│ Settlement  │
│  (Thread 1) │     │  (Thread 2) │     │  (Thread 3) │     │  (Thread 4) │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
      │                   │ ▲                 │                   │
      │                   │ │                 │                   │
      ▼                   ▼ │                 ▼                   ▼
  order_queue ────▶ action_queue      balance_update_queue   trade_queue
                           │                                balance_event_queue
                           └──────────────────────────────────────┘

Thread Responsibilities

| Thread | Responsibility | Input Queue | Output |
|---|---|---|---|
| Ingestion | Parse orders, assign SeqNum | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | Match, Cancel handling | action_queue | trade_queue, balance_update_queue |
| Settlement | Persist events (Trade, Balance) | trade_queue, balance_event_queue | ledgers |

Queue Design

Using crossbeam-queue::ArrayQueue — a bounded, lock-free MPMC queue (each instance here has a single consumer, so it is used in MPSC fashion):

pub struct MultiThreadQueues {
    pub order_queue: Arc<ArrayQueue<OrderAction>>,     // 64K
    pub action_queue: Arc<ArrayQueue<ValidAction>>,    // 64K
    pub trade_queue: Arc<ArrayQueue<TradeEvent>>,      // 64K
    pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>,  // 64K
    pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>,           // 64K
}
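The property the pipeline relies on is that `ArrayQueue::push` is non-blocking and hands the rejected value back when the queue is full — that is what makes spin-wait backpressure possible. A minimal std-only stand-in (a toy illustration of the API shape, not crossbeam's lock-free implementation) looks like this:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

// Toy bounded queue: push fails when full, mirroring ArrayQueue::push's
// Result<(), T>. (Uses a Mutex purely for brevity; crossbeam is lock-free.)
struct BoundedQueue<T> {
    inner: Mutex<VecDeque<T>>,
    cap: usize,
}

impl<T> BoundedQueue<T> {
    fn new(cap: usize) -> Self {
        Self { inner: Mutex::new(VecDeque::with_capacity(cap)), cap }
    }

    /// Non-blocking push: returns the value back if the queue is full,
    /// so the producer can spin and retry (backpressure).
    fn push(&self, v: T) -> Result<(), T> {
        let mut q = self.inner.lock().unwrap();
        if q.len() == self.cap {
            return Err(v);
        }
        q.push_back(v);
        Ok(())
    }

    fn pop(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }
}

fn main() {
    let q = BoundedQueue::new(2);
    assert!(q.push(1).is_ok());
    assert!(q.push(2).is_ok());
    // Queue full: the producer sees backpressure and must retry.
    assert_eq!(q.push(3), Err(3));
    assert_eq!(q.pop(), Some(1));
    assert!(q.push(3).is_ok());
}
```

In the real pipeline a failed push is followed by `std::hint::spin_loop()` and a retry, which is the spin-wait backpressure strategy listed under the key design decisions.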

Cancel Handling

  1. Ingestion: Create OrderAction::Cancel.
  2. UBSCore: Pass to action_queue (No lock needed).
  3. ME: Remove from OrderBook, send BalanceUpdateRequest::Cancel.
  4. UBSCore: Process unlock, generate BalanceEvent::Unlock.
  5. Settlement: Persist BalanceEvent.

Consistency Verification

Test Script

# Run full comparison test
./scripts/test_pipeline_compare.sh highbal

# Supported Datasets:
#   100k    - 100k orders without cancel
#   cancel  - 1.3M orders with 30% cancel
#   highbal - 1.3M orders with 30% cancel, high balance (Recommended)

Verification Results (1.3M orders, 30% cancel, high balance)

╔════════════════════════════════════════════════════════════════╗
║                    ✅ ALL TESTS PASSED                         ║
║  Multi-thread pipeline matches single-thread exactly!          ║
╚════════════════════════════════════════════════════════════════╝

Key Metrics

| Dataset | Total | Place | Cancel | Trades | Result |
| --- | --- | --- | --- | --- | --- |
| 100k | 100,000 | 100,000 | 0 | 47,886 | ✅ Match |
| 1.3M HighBal | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ Match |

Important Considerations

Balance Sufficiency

Insufficient balance may cause rejections. In concurrent environments, rejection timing can vary due to settlement latency, leading to non-deterministic results. Solution: Use highbal dataset (1000 BTC + 100M USDT per user).

Shutdown Synchronization

Wait for queues to drain before signaling shutdown:

while !queues.all_empty() {
    std::hint::spin_loop();
}
shutdown.request_shutdown();

Performance

| Mode | 100k orders | 1.3M orders |
| --- | --- | --- |
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |

Note: the multi-thread version carries extra overhead for BalanceEvent generation/persistence, yet still matches single-thread performance. Future optimizations: batch I/O, reduced queue contention.

Queue Priority Strategy (Future)

Current Implementation: balance_update_queue is drained completely before any new order is processed.

Future: Weighted Round-Robin — alternate between settlement and order processing to improve responsiveness.

const SETTLE_WEIGHT: u32 = 3;  // settle : order = 3 : 1
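The weighted round-robin idea can be sketched with plain `VecDeque`s (a hypothetical `drain_weighted` helper; the real implementation would operate on the lock-free queues above):

```rust
use std::collections::VecDeque;

// Process up to SETTLE_WEIGHT settlement messages for every new order,
// instead of fully draining one queue before touching the other.
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1

fn drain_weighted(
    settle_q: &mut VecDeque<u64>,
    order_q: &mut VecDeque<u64>,
    out: &mut Vec<(&'static str, u64)>,
) {
    while !settle_q.is_empty() || !order_q.is_empty() {
        // Up to SETTLE_WEIGHT settlement items first...
        for _ in 0..SETTLE_WEIGHT {
            if let Some(s) = settle_q.pop_front() {
                out.push(("settle", s));
            }
        }
        // ...then one new order, so orders are never starved.
        if let Some(o) = order_q.pop_front() {
            out.push(("order", o));
        }
    }
}

fn main() {
    let mut settle: VecDeque<u64> = (0..6).collect();
    let mut orders: VecDeque<u64> = (100..103).collect();
    let mut log = Vec::new();
    drain_weighted(&mut settle, &mut orders, &mut log);
    // The first order is handled after only 3 settles, not after all 6.
    assert_eq!(log[3], ("order", 100));
}
```

Compared with drain-everything-first, this bounds how long a new order can wait behind a settlement backlog at the cost of slightly delayed settlement.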

File Structure

src/
├── pipeline.rs       # Shared types
├── pipeline_mt.rs    # Multi-thread impl
├── pipeline_runner.rs # Single-thread impl
└── main.rs



🇨🇳 中文

📦 代码变更: 查看 Diff | 关键文件: pipeline_mt.rs

概述

Multi-Thread Pipeline 将处理逻辑分布在 4 个独立线程中,通过无锁队列通信,实现高吞吐量的订单处理。

架构

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Ingestion  │────▶│   UBSCore   │────▶│     ME      │────▶│ Settlement  │
│  (Thread 1) │     │  (Thread 2) │     │  (Thread 3) │     │  (Thread 4) │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
      │                   │ ▲                 │                   │
      │                   │ │                 │                   │
      ▼                   ▼ │                 ▼                   ▼
  order_queue ────▶ action_queue      balance_update_queue   trade_queue
                           │                                balance_event_queue
                           └──────────────────────────────────────┘

线程职责

| 线程 | 职责 | 输入队列 | 输出 |
| --- | --- | --- | --- |
| Ingestion | 订单解析、序列号分配 | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | 订单撮合、取消处理 | action_queue | trade_queue, balance_update_queue |
| Settlement | 事件持久化 (TradeEvent, BalanceEvent) | trade_queue, balance_event_queue | ledger files |

队列设计

使用 crossbeam-queue::ArrayQueue(有界无锁 MPMC 队列,此处每个队列只有单一消费者,按 MPSC 方式使用):

pub struct MultiThreadQueues {
    pub order_queue: Arc<ArrayQueue<OrderAction>>,     // 64K capacity
    pub action_queue: Arc<ArrayQueue<ValidAction>>,    // 64K capacity
    pub trade_queue: Arc<ArrayQueue<TradeEvent>>,      // 64K capacity
    pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>,  // 64K
    pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>,           // 64K
}

Cancel 订单处理

Cancel 订单流程:

  1. Ingestion: 创建 OrderAction::Cancel { order_id, user_id }
  2. UBSCore: 直接传递到 action_queue(无需 balance lock)
  3. ME: 从 OrderBook 移除订单,发送 BalanceUpdateRequest::Cancel
  4. UBSCore (Post-Trade): 处理 unlock,生成 BalanceEvent::Unlock
  5. Settlement: 持久化 BalanceEvent

一致性验证

测试脚本

# 运行完整对比测试
./scripts/test_pipeline_compare.sh highbal

# 支持的数据集:
#   100k    - 100k orders without cancel
#   cancel  - 1.3M orders with 30% cancel
#   highbal - 1.3M orders with 30% cancel, high balance (推荐)

验证结果 (1.3M orders, 30% cancel, high balance)

╔════════════════════════════════════════════════════════════════╗
║                    ✅ ALL TESTS PASSED                         ║
║  Multi-thread pipeline matches single-thread exactly!          ║
╚════════════════════════════════════════════════════════════════╝

关键指标

| 数据集 | 总订单 | Place | Cancel | Trades | 结果 |
| --- | --- | --- | --- | --- | --- |
| 100k (无 cancel) | 100,000 | 100,000 | 0 | 47,886 | ✅ 完全一致 |
| 1.3M + 30% cancel (高余额) | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ 完全一致 |

注意事项

余额充足性

如果测试数据中用户余额不足,可能导致部分订单被 reject。在并发环境中,由于 settle 时序不同,这些 reject 可能与单线程结果不同。

解决方案: 使用 highbal 数据集,确保每个用户有充足余额(1000 BTC + 100M USDT)。

Shutdown 同步

Multi-thread pipeline 在 shutdown 时需要确保所有队列都已 drain:

while !queues.all_empty() {
    std::hint::spin_loop();
}
shutdown.request_shutdown();

性能

| 模式 | 100k orders | 1.3M orders |
| --- | --- | --- |
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |

注:Multi-thread 当前版本包含 BalanceEvent 生成和持久化开销,性能与 Single-Thread 相当。未来优化方向包括批量 I/O 和减少队列竞争。

队列优先级策略 (未来)

当前实现: 完全优先 drain balance_update_queue,然后才处理新订单。

未来优化: 加权轮询 (Weighted Round-Robin): 允许交替处理,提高响应性。

const SETTLE_WEIGHT: u32 = 3;  // settle : order = 3 : 1

文件结构

src/
├── pipeline.rs       # 共享类型: PipelineStats, MultiThreadQueues, ShutdownSignal
├── pipeline_mt.rs    # Multi-thread 实现: run_pipeline_multi_thread()
├── pipeline_runner.rs # Single-thread 实现: run_pipeline()
└── main.rs           # --pipeline / --pipeline-mt 模式选择

0x08-h Performance Monitoring & Observability

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff | Key File: pipeline_services.rs

“If you can’t measure it, you can’t improve it.” This chapter introduces production-grade performance monitoring and observability for our multi-threaded pipeline.

Monitoring Dimensions

1. Latency Metrics

In HFT, averages are misleading. We care about Tail Latency.

  • P50 (Median): General performance.
  • P99 / P99.9: Stability in extreme cases.
  • Max: Jitter, GC, or system calls.
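P50/P99 can be computed from a sorted batch of samples with the nearest-rank method. This is a sketch; production monitoring typically uses HDR histograms so that every sample need not be stored:

```rust
// Percentiles from a batch of latency samples (e.g. nanoseconds),
// using the nearest-rank method on a sorted slice.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    assert!(!sorted.is_empty());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    let mut samples: Vec<u64> = (1..=1000).collect(); // 1..=1000 ns
    samples.sort_unstable();
    assert_eq!(percentile(&samples, 50.0), 500);  // P50 (median)
    assert_eq!(percentile(&samples, 99.0), 990);  // P99
    assert_eq!(percentile(&samples, 99.9), 999);  // P99.9 (tail)
    assert_eq!(*samples.last().unwrap(), 1000);   // Max (jitter)
}
```

Note how P99.9 and Max can sit far above P50 on real systems — which is exactly why averages are misleading for HFT latency.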

2. Throughput

  • Orders/sec: Processing capacity.
  • Trades/sec: Matching capacity.

3. Queue Depth & Backpressure

Monitoring Ring Buffer occupancy reveals downstream bottlenecks and jitter.

4. Architectural Breakdown

Knowing where time is spent (Pre-Trade vs Matching vs Settlement).


Test Execution

Dataset: 1.3M orders (30% cancel) from fixtures/test_with_cancel_highbal/.

Single-Thread Run:

cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal

Multi-Thread Run:

cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal

Compare Script:

./scripts/test_pipeline_compare.sh highbal

Analysis Results (1.3M Dataset)

1. Single-Thread Pipeline

  • Throughput: 210,000 orders/sec (P50 Latency: 1.25 µs)
  • Breakdown:
    • Matching Engine: 91.5% (The bottleneck)
    • UBSCore Lock: 5.6%
    • Persistence: 2.7%

2. Multi-Thread Pipeline (After Service Refactor)

  • Throughput: ~64,450 orders/sec
  • E2E Latency (P50): ~113 ms
  • E2E Latency (P99): ~188 ms

Conclusion

  1. Parallelism Works: Total task CPU time (~34s) > Wall time (17.5s).
  2. Bottleneck: Matching Engine remains the serial bottleneck (~52k ops/s limit).
  3. Latency Cost: Multi-threading introduces significant message passing latency (µs → ms).

Logging & Observability

We introduced a production-grade asynchronous logging system using tracing.

1. Non-blocking I/O

Using tracing-appender with a dedicated worker thread and memory buffer to prevent I/O blocking.

2. Environment-driven Config

  • Dev: Detailed, human-readable.
  • Prod: JSON format, high-frequency tracing disabled (0XINFI=off).

3. Standardized Targets

All pipeline logs use the 0XINFI namespace (e.g., 0XINFI::ME, 0XINFI::UBSC) for precise filtering.
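The effect of a directive like `0XINFI=off` can be shown with a toy target matcher in the spirit of tracing's EnvFilter (e.g. `RUST_LOG="info,0XINFI=off"`). The real EnvFilter additionally understands levels, spans, and field filters; this sketch only handles `<target>=off`:

```rust
// Toy target-based filter: a directive "<target>=off" disables every
// log whose target starts with <target>. Illustration only — use
// tracing_subscriber::EnvFilter in real code.
fn enabled(directives: &str, target: &str) -> bool {
    for d in directives.split(',') {
        if let Some((t, lvl)) = d.split_once('=') {
            if target.starts_with(t) && lvl == "off" {
                return false;
            }
        }
    }
    true
}

fn main() {
    let prod = "info,0XINFI=off";
    // High-frequency pipeline tracing is silenced in prod...
    assert!(!enabled(prod, "0XINFI::ME"));
    assert!(!enabled(prod, "0XINFI::UBSC"));
    // ...while ordinary application logs still pass.
    assert!(enabled(prod, "gateway::http"));
}
```

Namespacing all pipeline targets under `0XINFI` is what makes this one-directive kill switch possible.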


Intent-Based Design: From Functions to Services

“Good architecture is not designed upfront, but evolved through refactoring.”

We refactored tightly coupled spawn_* functions into decoupled Service Structs.

Problem: Coupled Functions

// ❌ Business logic buried in thread spawning
fn spawn_me_stage(...) -> JoinHandle<OrderBook> {
    thread::spawn(move || {
        // Logic locked inside closure
    })
}
  • Untestable: Cannot unit test logic without spawning threads.
  • Not Reusable: Cannot be used in single-thread mode.

Solution: Service Structs

// ✅ Intent is clear and decoupled
pub struct MatchingService {
    book: OrderBook,
    // ...
}

impl MatchingService {
    pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}

Benefits

  • Testability: Services can be instantiated and tested in isolation.
  • Reusability: Core logic is decoupled from threading model.
  • Clarity: Code expresses “what” (Service), not just “how” (Thread).
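The testability and reusability claims can be demonstrated with a hypothetical `CounterService` (not the real MatchingService): the logic lives in a plain method, so the same struct can be called directly in tests or moved into a thread:

```rust
// Hypothetical service in the style described above: state plus a plain
// method, with no knowledge of how (or whether) it runs on a thread.
struct CounterService {
    processed: u64,
}

impl CounterService {
    fn new() -> Self {
        Self { processed: 0 }
    }

    // One "tick" of the service loop; run() would call this until shutdown.
    fn process(&mut self, batch: &[u64]) -> u64 {
        self.processed += batch.len() as u64;
        self.processed
    }
}

fn main() {
    // Single-thread reuse: call the logic directly, no thread needed.
    let mut svc = CounterService::new();
    assert_eq!(svc.process(&[1, 2, 3]), 3);

    // Multi-thread reuse: the very same struct moved into a thread.
    let handle = std::thread::spawn(move || svc.process(&[4, 5]));
    assert_eq!(handle.join().unwrap(), 5);
}
```

With the old `spawn_*` functions, the equivalent of `process` was trapped inside the closure and could only be exercised by spawning a real thread.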



🇨🇳 中文

📦 代码变更: 查看 Diff | 关键文件: pipeline_services.rs

在构建高性能低延迟交易系统时,“如果你无法测量它,你就无法优化它”。本章重点在于为我们的多线程 Pipeline 引入生产级的性能监控和延迟指标分析。

监控维度

1. 延迟指标 (Latency Metrics)

对于 HFT 系统,平均延迟往往是误导性的,我们更关心长尾延迟 (Tail Latency)。

  • P50 (Median): 中位数延迟,反映平均水平。
  • P99 / P99.9: 长尾延迟,反映系统在极端情况下的稳定性。
  • Max: 峰值延迟,通常由系统抖动 (Jitter) 或 GC/系统调用引起。

2. 吞吐量 (Throughput)

  • Orders/sec: 每秒处理订单数。
  • Trades/sec: 每秒撮合成交数。

3. 队列深度与背压 (Queue Depth & Backpressure)

监控 Ring Buffer 的占用情况,识别下游瓶颈。

4. 架构内部阶段耗时 (Architectural Breakdown)

清晰地知道时间花在了哪里:Pre-Trade / Matching / Settlement / Logging。

测试执行方法

数据集: 130 万订单(含 30% 撤单) fixtures/test_with_cancel_highbal/

运行单线程:

cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal

运行多线程:

cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal

对比脚本:

./scripts/test_pipeline_compare.sh highbal

执行结果与分析 (1.3M 数据集)

1. 单线程流水线

  • 性能: 210,000 orders/sec (P50: 1.25 µs)
  • 瓶颈: Matching Engine 耗时 91.5%,是最大瓶颈。

2. 多线程流水线 (重构后)

  • 吞吐量: ~64,450 orders/sec
  • 端到端延迟 (P50): ~113 ms
  • 端到端延迟 (P99): ~188 ms

结论

  1. 并行有效: CPU 总耗时远大于执行时间。
  2. 瓶颈: Matching Engine 依然是最大的串行瓶颈 (吞吐上限 ~52k)。
  3. 延迟: 多线程引入的消息传递开销导致端到端延迟从微秒级退化到毫秒级。

日志与可观测性

引入基于 tracing 的生产级异步日志体系。

1. 异步非阻塞架构

使用 tracing-appender 独立线程写入日志,不阻塞业务线程。

2. 环境驱动配置

Dev 开启详细日志,Prod 使用 JSON 并关闭高频追踪。

3. 标准化日志目标

使用 0XINFI 命名空间 (如 0XINFI::ME) 实现精细过滤。

意图编码:从函数到服务

“好的架构不是一开始就设计出来的,而是通过不断重构演进出来的。”

我们将紧耦合的 spawn_* 函数重构为解耦的 Service 结构体

问题:紧耦合

// ❌ 业务逻辑埋在线程创建中
fn spawn_me_stage(...) {
    thread::spawn(move || { ... })
}

无法单元测试,无法复用。

解决方案:Service 结构体

// ✅ 意图清晰,解耦
pub struct MatchingService { ... }

impl MatchingService {
    pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}

收益

  • 可测试性: 服务可独立实例化测试。
  • 可复用性: 核心逻辑与线程模型解耦。
  • 清晰度: 代码表达“做什么”(Service),而非“怎么做”(Thread)。

0x09-a Gateway: Client Access Layer

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement a lightweight HTTP Gateway to connect clients with the trading core system.


Background: From Core to MVP

We have built a functional Trading Core:

  • OrderBook (0x04)
  • Balance Management (0x05-0x06)
  • Matching Engine (0x08)
  • Pipeline & Monitoring (0x08-f/g/h)

To become a usable MVP, we need auxiliary systems:

┌─────────────────────────────────────────────────────────────────────────┐
│                        Complete Trading System MVP                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Client (Web/Mobile/API)                                                 │
│       │                                                                  │
│       ▼                                                                  │
│  ┌─────────────────┐                                                     │
│  │   0x09-a        │  ← This Chapter: Accept orders, return response     │
│  │   Gateway       │                                                     │
│  └────────┬────────┘                                                     │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │              Trading Core (Completed)                            │     │
│  │  Ingestion → UBSCore → ME → Settlement                          │     │
│  └─────────────────────────────────────────────────────────────────┘     │

0x09 Series Plan

| Chapter | Topic | Core Function |
| --- | --- | --- |
| 0x09-a | Gateway | HTTP/WS Entry, Pre-Check |
| 0x09-b | Settlement Persistence | DB Persistence for Balances/Trades |
| 0x09-c | K-Line Aggregation | Real-time Candles |
| 0x09-d | WebSocket Push | Real-time Market Data |

1. Gateway Design

1.1 Responsibilities

The Gateway is the sole entry point for clients.

  • Protocol Conversion: HTTP/WebSocket → Internal Formats
  • Authentication: API Key / JWT
  • Pre-Check: Fast balance validation
  • Rate Limiting: Anti-DDoS
  • Response: Synchronous acknowledgment

1.2 Why Separate Gateway & Core?

  • Decoupling: Network I/O doesn’t block matching.
  • Scalability: Gateway can scale horizontally.
  • Predictability: Async queues ensure predictable matching latency.

1.3 Tech Stack

  • HTTP: axum (High performance, tokio-native)
  • WebSocket: tokio-tungstenite
  • Serialization: serde + JSON
  • Rate Limiting: tower middleware

2. Core Data Flow

2.1 Order Submission

┌──────────┐    HTTP POST    ┌──────────┐    Ring Buffer   ┌──────────┐
│  Client  │ ───────────────▶│ Gateway  │ ─────────────────▶│ Ingestion│
│          │                 │          │                   │  Stage   │
│          │◀─────────────── │          │                   │          │
└──────────┘  202 Accepted   └──────────┘                   └──────────┘
                   +                                              │
              order_id                                            ▼
              seq_id                                        Trading Core

2.2 Pre-Check Logic

async fn submit_order(
    headers: HeaderMap,
    order: OrderRequest,
) -> Result<OrderResponse, ApiError> {
    // 1. Validation
    validate_order(&order)?;

    // 2. Auth
    let user_id = authenticate(&headers)?;

    // 3. Pre-Check: Balance (Read-Only)
    let balance = ubscore.query_balance(user_id, order.asset_id).await?;
    if balance.avail < required {
        return Err(ApiError::InsufficientBalance);
    }

    // 4. Assign ID
    let order_id = id_generator.next();

    // 5. Push to Ring Buffer
    order_queue.push(SequencedOrder { ... })?;

    // 6. Return Accepted
    Ok(OrderResponse { status: "PENDING", ... })
}

Key Points:

  • Pre-Check is “best effort”.
  • Final locking happens in UBSCore.
  • Returns 202 Accepted to indicate async processing.

3. API Design

3.1 RESTful Endpoints

  • POST /api/v1/create_order: Submit order
  • POST /api/v1/cancel_order: Cancel order
  • GET /api/v1/order/{order_id}: Query status

3.2 Request/Response Format

Submit Order:

// POST /api/v1/create_order
{
    "symbol": "BTC_USDT",
    "side": "BUY",
    "type": "LIMIT",
    "price": "85000.00",
    "qty": "0.001"
}

// Response (202 Accepted)
{
    "code": 0,
    "msg": "ok",
    "data": {
        "order_id": 1001,
        "status": "ACCEPTED",
        "accepted_at": 1734533784000
    }
}

3.3 Unified Response Format

{
    "code": 0,          // 0 = Success, Non-0 = Error
    "msg": "ok",        // Short description
    "data": {}          // Payload or null
}
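A hypothetical `ApiResponse` type matching this envelope can be sketched without dependencies. The real gateway would derive `serde::Serialize` on a generic payload; this std-only version renders JSON by hand just to make the shape concrete:

```rust
// Hypothetical unified-response envelope: code / msg / data.
struct ApiResponse {
    code: i32,
    msg: String,
    data: Option<String>, // pre-serialized JSON payload, or None -> null
}

impl ApiResponse {
    fn ok(data: &str) -> Self {
        Self { code: 0, msg: "ok".into(), data: Some(data.into()) }
    }

    fn err(code: i32, msg: &str) -> Self {
        Self { code, msg: msg.into(), data: None }
    }

    fn to_json(&self) -> String {
        let data = self.data.as_deref().unwrap_or("null");
        format!(r#"{{"code":{},"msg":"{}","data":{}}}"#, self.code, self.msg, data)
    }
}

fn main() {
    assert_eq!(
        ApiResponse::ok(r#"{"order_id":1001}"#).to_json(),
        r#"{"code":0,"msg":"ok","data":{"order_id":1001}}"#
    );
    assert_eq!(
        ApiResponse::err(1001, "INVALID_PARAMETER").to_json(),
        r#"{"code":1001,"msg":"INVALID_PARAMETER","data":null}"#
    );
}
```

Keeping one envelope for success and error paths means clients branch on `code` alone, never on HTTP status quirks.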

3.4 API Conventions

Important: Must follow API Conventions.

  1. SCREAMING_CASE Enums: "BUY", "SELL", "LIMIT".
  2. Naming: qty (not quantity), cid (client_order_id).
  3. SCREAMING_SNAKE_CASE Error Codes: INVALID_PARAMETER.

4. WebSocket Push

4.1 Flow

Clients connect via WS, authenticate, and subscribe to channels.

4.2 Channels

  • order_updates: Private order status changes.
  • balance_updates: Private balance changes.
  • trades: Public trade feed.

5. Security

| Level | Method | Scenario |
| --- | --- | --- |
| MVP | Header X-User-ID | Internal / Reliability Testing |
| Prod | API Key (HMAC) | Programmatic Trading |
| Prod | JWT | Web/Mobile |

6. Communication Architecture

6.1 MVP Choice: Single Process Ring Buffer

Gateway and Trading Core run in the same process, communicating via Arc<ArrayQueue>.

Pros:

  • ✅ Zero network overhead (~100ns latency).
  • ✅ Reuse existing crossbeam queues.
  • ✅ Simple deployment.

6.2 Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                     Single Process (--gateway mode)                      │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────┐                                         │
│  │ HTTP Server (tokio runtime) │                                         │
│  └──────────────┬──────────────┘                                         │
│                 │                                                        │
│                 ▼                                                        │
│  ┌─────────────────────────────┐                                         │
│  │         order_queue         │ (Shared Ring Buffer)                    │
│  └──────────────┬──────────────┘                                         │
│                 │                                                        │
│                 ▼                                                        │
│  ┌─────────────────────────────┐                                         │
│  │      Trading Core Threads   │                                         │
│  └─────────────────────────────┘                                         │
└─────────────────────────────────────────────────────────────────────────┘

6.3 Evolution Path

  1. MVP: Single Process.
  2. Phase 2: Unix Domain Socket (Multi-process on same host).
  3. Phase 3: TCP / RPC (Distributed).

7. Implementation Guidelines

7.1 Startup Modes

# Gateway Mode
cargo run --release -- --gateway --port 8080

# Batch Mode (Original)
cargo run --release -- --pipeline-mt

7.2 Main Integration

if args.gateway {
    // Spawn HTTP Server on a dedicated thread with its own tokio runtime
    let http_queues = queues.clone(); // Arc handles; the core keeps its own copy
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(http_queues));
    });
    // Run Trading Core on the current thread
    run_pipeline_multi_thread(queues, ...);
}

Summary

This chapter implements the Gateway as the client access layer.

Core Philosophy:

The Gateway is a gatekeeper built for speed, not a business processor. Accept fast, validate fast, forward fast.




🇨🇳 中文

📦 代码变更: 查看 Diff

本节核心目标:实现一个轻量级的 HTTP Gateway,连接客户端与交易核心系统。


背景:从核心到完整 MVP

在前面的章节中,我们已经构建了一个功能完整的交易核心系统

  • OrderBook (0x04)
  • Balance Management (0x05-0x06)
  • Matching Engine (0x08)
  • Pipeline (0x08-f/g/h)

但要成为一个可用的 MVP (Minimum Viable Product),还需要以下辅助系统:

┌─────────────────────────────────────────────────────────────────────────┐
│                        Complete Trading System MVP                       │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Client (Web/Mobile/API)                                                 │
│       │                                                                  │
│       ▼                                                                  │
│  ┌─────────────────┐                                                     │
│  │   0x09-a        │  ← 本章:接收订单,返回响应                           │
│  │   Gateway       │                                                     │
│  └────────┬────────┘                                                     │
│           │                                                             │
│           ▼                                                             │
│  ┌─────────────────────────────────────────────────────────────────┐     │
│  │              Trading Core (已完成)                               │     │
│  │  Ingestion → UBSCore → ME → Settlement                          │     │
│  └─────────────────────────────────────────────────────────────────┘     │

0x09 系列章节规划

| 章节 | 主题 | 核心功能 |
| --- | --- | --- |
| 0x09-a | Gateway | HTTP/WS 订单接入、Pre-Check |
| 0x09-b | Settlement Persistence | 用户余额、订单、成交入库 |
| 0x09-c | K-Line Aggregation | 实时 K 线聚合 |
| 0x09-d | WebSocket Push | 实时行情推送 |

1. Gateway 设计

1.1 职责

Gateway 是客户端与交易系统的唯一入口

  • 协议转换:HTTP/WebSocket → 内部消息格式
  • 身份验证:API Key / JWT
  • Pre-Check:快速余额校验
  • 限流:防止 DDoS
  • 响应:同步返回接收确认

1.2 为什么 Gateway + Trading Core 分离?

  • 解耦:网络 I/O 不阻塞撮合。
  • 扩展性:Gateway 可水平扩展。
  • 可预测性:异步队列确保撮合延迟可预测。

1.3 技术选型

  • HTTP: axum (高性能、tokio 原生)
  • WebSocket: tokio-tungstenite
  • Serialization: serde + JSON
  • Rate Limiting: tower middleware

2. 核心数据流

2.1 订单提交流程

┌──────────┐    HTTP POST    ┌──────────┐    Ring Buffer   ┌──────────┐
│  Client  │ ───────────────▶│ Gateway  │ ─────────────────▶│ Ingestion│
│          │                 │          │                   │  Stage   │
│          │◀─────────────── │          │                   │          │
└──────────┘  202 Accepted   └──────────┘                   └──────────┘
                   +                                              │
              order_id                                            ▼
              seq_id                                        Trading Core

2.2 Pre-Check 流程

async fn submit_order(
    headers: HeaderMap,
    order: OrderRequest,
) -> Result<OrderResponse, ApiError> {
    // 1. 参数校验
    validate_order(&order)?;

    // 2. 身份验证
    let user_id = authenticate(&headers)?;

    // 3. Pre-Check: 余额检查 (只读)
    let balance = ubscore.query_balance(user_id, order.asset_id).await?;
    if balance.avail < required {
        return Err(ApiError::InsufficientBalance);
    }

    // 4. 分配 ID
    let order_id = id_generator.next();

    // 5. 推送到 Ring Buffer
    order_queue.push(SequencedOrder { ... })?;

    // 6. 返回接收确认
    Ok(OrderResponse { status: "PENDING", ... })
}

关键点

  • Pre-Check 是“尽力而为”的检查。
  • 最终锁定在 UBSCore 执行。
  • 返回 202 Accepted 表示异步处理中。

3. API 设计

3.1 RESTful Endpoints

  • POST /api/v1/create_order: 提交订单
  • POST /api/v1/cancel_order: 取消订单
  • GET /api/v1/order/{order_id}: 查询状态

3.2 请求/响应格式

提交订单:

// POST /api/v1/create_order
{
    "symbol": "BTC_USDT",
    "side": "BUY",
    "type": "LIMIT",
    "price": "85000.00",
    "qty": "0.001"
}

// Response (202 Accepted)
{
    "code": 0,
    "msg": "ok",
    "data": {
        "order_id": 1001,
        "status": "ACCEPTED",
        "accepted_at": 1734533784000
    }
}

3.3 统一响应格式

{
    "code": 0,          // 0 = 成功, 非0 = 错误码
    "msg": "ok",        // 简短描述
    "data": {}          // 数据或 null
}

3.4 API 规范

重要: 必须遵循 API Conventions 规范。

  1. 大写枚举: "BUY", "SELL", "LIMIT"
  2. 命名一致: qty (而非 quantity), cid (client_order_id)。
  3. 大写蛇形错误码: INVALID_PARAMETER

4. WebSocket 实时推送

4.1 流程

客户端连接 WS,认证,并订阅频道。

4.2 频道

  • order_updates: 私有订单状态变更。
  • balance_updates: 私有余额变更。
  • trades: 公共成交推送。

5. 安全设计

| 级别 | 方法 | 场景 |
| --- | --- | --- |
| MVP | Header X-User-ID | 内部测试 |
| Prod | API Key (HMAC) | 程序化交易 |
| Prod | JWT | Web/移动端 |

6. 通信架构设计

6.1 MVP 选择:单进程 Ring Buffer

Gateway 和 Trading Core 运行在同一进程中,通过 Arc<ArrayQueue> 通信。

优势

  • ✅ 零网络开销 (~100ns 延迟)。
  • ✅ 复用现有 crossbeam 队列。
  • ✅ 部署简单。

6.2 架构图

┌─────────────────────────────────────────────────────────────────────────┐
│                     Single Process (--gateway mode)                      │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────┐                                         │
│  │ HTTP Server (tokio runtime) │                                         │
│  └──────────────┬──────────────┘                                         │
│                 │                                                        │
│                 ▼                                                        │
│  ┌─────────────────────────────┐                                         │
│  │         order_queue         │ (共享 Ring Buffer)                      │
│  └──────────────┬──────────────┘                                         │
│                 │                                                        │
│                 ▼                                                        │
│  ┌─────────────────────────────┐                                         │
│  │      Trading Core Threads   │                                         │
│  └─────────────────────────────┘                                         │
└─────────────────────────────────────────────────────────────────────────┘

6.3 演进路径

  1. MVP: 单进程。
  2. Phase 2: Unix Domain Socket (同机多进程)。
  3. Phase 3: TCP / RPC (分布式)。

7. 实现指引

7.1 启动模式

# Gateway 模式
cargo run --release -- --gateway --port 8080

# 批量模式 (原有)
cargo run --release -- --pipeline-mt

7.2 Main 集成

if args.gateway {
    // 启动 HTTP Server 线程(独立 tokio runtime)
    let http_queues = queues.clone(); // Arc 克隆,Trading Core 保留自己的句柄
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(http_queues));
    });
    // 在当前线程运行 Trading Core
    run_pipeline_multi_thread(queues, ...);
}

总结

本章实现了 Gateway 作为客户端接入层。

核心理念

Gateway 是速度门卫而不是业务处理器。快速接收、快速校验、快速转发。

0x09-b Settlement Persistence: TDengine Integration

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Persist trade data to TDengine and implement Order Query & History APIs.


Background: From Memory to Persistence

In Gateway Phase 1 (0x09-a), we completed:

  • ✅ HTTP API (create_order, cancel_order)
  • ✅ Order Validation
  • ✅ Ring Buffer Integration
  • Data Persistence ← This Chapter

Current System Issue:

┌─────────────────────────────────────────────────────────────────┐
│                    Trading Core (In-Memory)                      │
│                                                                  │
│    Orders → Match → Trades → Settle → Balance Update             │
│       ↓         ↓           ↓                                   │
│      ❌         ❌           ❌    ← Data LOST on restart!       │
└─────────────────────────────────────────────────────────────────┘

This Chapter’s Solution:

┌─────────────────────────────────────────────────────────────────┐
│                    Trading Core                                  │
│                                                                  │
│    Orders → Match → Trades → Settle → Balance Update             │
│       ↓         ↓           ↓                                   │
│    ┌─────────────────────────────────────────────────┐          │
│    │              TDengine (Persistence)              │          │
│    │    orders | trades | balances                   │          │
│    └─────────────────────────────────────────────────┘          │
└─────────────────────────────────────────────────────────────────┘

1. Why TDengine?

Detailed comparison: Database Selection Analysis

Core Advantages

| Feature | TDengine | PostgreSQL |
| --- | --- | --- |
| Write Speed | 1M/sec | 10k/sec |
| Time-Series | Native Support | Index Optimization Needed |
| Storage | 1/10 | 1x |
| Real-time Analytics | Built-in Stream | External Tools Needed |
| Rust Client | ✅ Official taos | tokio-postgres |

2. Schema Design

2.1 Super Table Architecture

TDengine uses the Super Table concept:

┌─────────────────────────────────────────────────────────┐
│              Super Table: orders                         │
│    (Unified schema, auto-create sub-table per symbol)    │
├─────────────────┬─────────────────┬────────────────────┤
│ orders_1        │ orders_2        │ orders_N           │
│ (BTC_USDT)      │ (ETH_USDT)      │ (...)              │
└─────────────────┴─────────────────┴────────────────────┘

2.2 DDL Definitions

-- Database Setup
CREATE DATABASE IF NOT EXISTS trading 
    KEEP 365d              -- Retain data for 1 year
    DURATION 10d           -- Partition every 10 days
    BUFFER 256             -- 256MB Write Buffer
    WAL_LEVEL 2            -- WAL Persistence Level
    PRECISION 'us';        -- Microsecond Precision

USE trading;

-- Orders Super Table
CREATE STABLE IF NOT EXISTS orders (
    ts TIMESTAMP,               -- Timestamp (PK)
    order_id BIGINT UNSIGNED,
    user_id BIGINT UNSIGNED,
    side TINYINT UNSIGNED,      -- 0=BUY, 1=SELL
    order_type TINYINT UNSIGNED,-- 0=LIMIT, 1=MARKET
    price BIGINT UNSIGNED,      -- Integer representation
    qty BIGINT UNSIGNED,
    filled_qty BIGINT UNSIGNED,
    status TINYINT UNSIGNED,
    cid NCHAR(64)               -- Client Order ID
) TAGS (
    symbol_id INT UNSIGNED      -- Partition Key
);

-- Trades Super Table
CREATE STABLE IF NOT EXISTS trades (
    ts TIMESTAMP,
    trade_id BIGINT UNSIGNED,
    order_id BIGINT UNSIGNED,
    user_id BIGINT UNSIGNED,
    side TINYINT UNSIGNED,
    price BIGINT UNSIGNED,
    qty BIGINT UNSIGNED,
    fee BIGINT UNSIGNED,
    role TINYINT UNSIGNED       -- 0=MAKER, 1=TAKER
) TAGS (
    symbol_id INT UNSIGNED
);

-- Balances Super Table
CREATE STABLE IF NOT EXISTS balances (
    ts TIMESTAMP,
    avail BIGINT UNSIGNED,
    frozen BIGINT UNSIGNED,
    lock_version BIGINT UNSIGNED,
    settle_version BIGINT UNSIGNED
) TAGS (
    user_id BIGINT UNSIGNED,
    asset_id INT UNSIGNED
);
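The `price`/`qty` columns hold scaled integers rather than decimals (per the u64 refactoring in 0x02/0x03). A sketch of the conversion from a decimal string to the stored `BIGINT UNSIGNED` value — a scale of 1e8 is assumed here for illustration; the real system reads the scale from the per-asset precision config:

```rust
// Convert a decimal string to the scaled u64 stored in BIGINT UNSIGNED
// columns. SCALE = 1e8 is an assumption for this sketch.
const SCALE: u64 = 100_000_000;

fn to_scaled(s: &str) -> Option<u64> {
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };
    if frac_part.len() > 8 {
        return None; // more precision than the column can represent
    }
    let int: u64 = int_part.parse().ok()?;
    // Right-pad the fraction to 8 digits: "001" -> "00100000"
    let frac: u64 = format!("{:0<8}", frac_part).parse().ok()?;
    int.checked_mul(SCALE)?.checked_add(frac)
}

fn main() {
    assert_eq!(to_scaled("85000.00"), Some(8_500_000_000_000));
    assert_eq!(to_scaled("0.001"), Some(100_000));
    assert_eq!(to_scaled("0.000000001"), None); // exceeds 8 decimals
}
```

Storing integers keeps the DB values bit-exact with the matching engine's u64 arithmetic and avoids float drift in queries.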

2.3 Status Enums

// New Enum
pub enum TradeRole {
    Maker = 0,
    Taker = 1,
}

3. API Design

3.1 Query Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/order/{order_id} | GET | Query single order |
| /api/v1/orders | GET | Query order list |
| /api/v1/trades | GET | Query trade history |
| /api/v1/balances | GET | Query user balances |

3.2 Request/Response Format

GET /api/v1/order/{order_id}:

{
    "code": 0,
    "msg": "ok",
    "data": {
        "order_id": 1001,
        "symbol": "BTC_USDT",
        "status": "PARTIALLY_FILLED",
        "filled_qty": "0.0005",
        "created_at": 1734533784000
    }
}

GET /api/v1/balances:

{
    "code": 0,
    "msg": "ok",
    "data": {
        "balances": [
             { "asset": "BTC", "avail": "1.50000000", "frozen": "0.10000000" }
        ]
    }
}

4. Implementation Architecture

4.1 Module Structure

src/
├── persistence/
│   ├── mod.rs              // Entry
│   ├── tdengine.rs         // Connection Manager
│   ├── orders.rs           // Order Persistence
│   ├── trades.rs           // Trade Persistence
│   └── balances.rs         // Balance Persistence

4.2 Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                      Settlement Thread                           │
│                                                                  │
│    trade_queue.pop() ──┬── Update In-Memory Balance              │
│                        │                                         │
│                        └── Write to TDengine                     │
│                             ├── INSERT trades                    │
│                             ├── INSERT order_events              │
│                             └── INSERT balances (Snapshot)       │
└─────────────────────────────────────────────────────────────────┘

4.3 Batch Write Optimization

#![allow(unused)]
fn main() {
// Batch write to reduce I/O overhead
const BATCH_SIZE: usize = 1000;

async fn flush_trades(trades: Vec<Trade>) {
    let mut sql = String::from("INSERT INTO ");
    // Construct bulk insert SQL...
    client.exec(&sql).await;
}
}
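The bulk-SQL construction elided above can be sketched as follows. This is a simplification: the `Trade` struct here is a stand-in and `trades_1` an assumed sub-table name, not the project's actual types.

```rust
// Hypothetical sketch of the elided bulk-insert construction: one INSERT
// statement covering the whole batch, instead of one round trip per trade.
struct Trade { trade_id: u64, price: u64, qty: u64 }

fn build_bulk_insert(trades: &[Trade]) -> String {
    let mut sql = String::from("INSERT INTO trades_1 VALUES");
    for t in trades {
        // NOW lets the server assign the row timestamp; real code would bind ts.
        sql.push_str(&format!(" (NOW, {}, {}, {})", t.trade_id, t.price, t.qty));
    }
    sql
}

fn main() {
    let batch = vec![
        Trade { trade_id: 1, price: 30000, qty: 10 },
        Trade { trade_id: 2, price: 30100, qty: 20 },
    ];
    println!("{}", build_bulk_insert(&batch));
}
```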

5. Implementation Plan

Phase 1: Basic Persistence (This Chapter)

  • TDengine Connection
  • Schema Initialization
  • Trade/Order/Balance Writes

Phase 2: Query APIs

  • Implement GET Endpoints

Phase 3: Optimization

  • Batch Writes
  • Connection Pool
  • Redis Cache

6. Verification Plan

6.1 Integration Test

# 1. Start TDengine
docker run -d -p 6030:6030 -p 6041:6041 tdengine/tdengine:latest

# 2. Run Gateway
cargo run --release -- --gateway --port 8080

# 3. Submit Order
curl -X POST http://localhost:8080/api/v1/create_order ...

# 4. Query Order (Verify Persistence)
curl http://localhost:8080/api/v1/order/1

Summary

This chapter implements Settlement Persistence.

Core Philosophy:

Persistence is a side-channel operation; it never blocks the main trading flow. The Settlement thread writes to TDengine asynchronously while the Trading Core stays hot.




🇨🇳 中文

📦 Code Changes: View Diff

Core Objective: Persist trade data to TDengine, enabling order queries and history APIs.


Background: From Memory to Persistence

In Gateway Phase 1 (0x09-a) we completed:

  • ✅ HTTP API (create_order, cancel_order)
  • ✅ Order validation and conversion
  • ✅ Ring Buffer queue integration
  • Data persistence ← this chapter

The current system's problem:

┌─────────────────────────────────────────────────────────────────┐
│                    Trading Core (in memory)                      │
│                                                                  │
│    Orders → Match → Trades → Settle → Balance Update             │
│       ↓         ↓           ↓                                    │
│      ❌         ❌           ❌    ← data lost after restart!     │
└─────────────────────────────────────────────────────────────────┘

This chapter's solution:

┌─────────────────────────────────────────────────────────────────┐
│                    Trading Core                                  │
│                                                                  │
│    Orders → Match → Trades → Settle → Balance Update             │
│       ↓         ↓           ↓                                    │
│    ┌─────────────────────────────────────────────────┐          │
│    │              TDengine (persistent)              │          │
│    │    orders | trades | balances                   │          │
│    └─────────────────────────────────────────────────┘          │
└─────────────────────────────────────────────────────────────────┘

1. Why TDengine

Detailed comparison: see the database selection analysis.

Core Advantages

| Feature | TDengine | PostgreSQL |
|---|---|---|
| Write speed | 1M rows/s | 10K rows/s |
| Time-series queries | Native | Needs index tuning |
| Storage footprint | 1/10 | 1x |
| Real-time analytics | Built-in stream computing | Extra tooling |
| Rust client | ✅ official taos | tokio-postgres |

2. Schema Design

2.1 Super Table Architecture

TDengine uses the Super Table concept:

┌─────────────────────────────────────────────────────────┐
│              Super Table: orders                         │
│   (shared schema; sub-tables auto-created per symbol_id)│
├─────────────────┬─────────────────┬────────────────────┤
│ orders_1        │ orders_2        │ orders_N           │
│ (BTC_USDT)      │ (ETH_USDT)      │ (...)              │
└─────────────────┴─────────────────┴────────────────────┘

2.2 DDL Definition

-- Database Setup
CREATE DATABASE IF NOT EXISTS trading 
    KEEP 365d              -- Retain data for 1 year
    DURATION 10d           -- One partition per 10 days
    BUFFER 256             -- 256MB write buffer
    WAL_LEVEL 2            -- WAL durability level
    PRECISION 'us';        -- Microsecond precision

USE trading;

-- Orders Super Table
CREATE STABLE IF NOT EXISTS orders (
    ts TIMESTAMP,               -- Order timestamp (primary key)
    order_id BIGINT UNSIGNED,   -- Order ID
    user_id BIGINT UNSIGNED,    -- User ID
    side TINYINT UNSIGNED,      -- 0=BUY, 1=SELL
    order_type TINYINT UNSIGNED,-- 0=LIMIT, 1=MARKET
    price BIGINT UNSIGNED,      -- Price (integer)
    qty BIGINT UNSIGNED,        -- Original quantity
    filled_qty BIGINT UNSIGNED, -- Filled quantity
    status TINYINT UNSIGNED,    -- Order status
    cid NCHAR(64)               -- Client order ID
) TAGS (
    symbol_id INT UNSIGNED      -- Trading pair ID (partition key)
);

-- Trades Super Table
CREATE STABLE IF NOT EXISTS trades (
    ts TIMESTAMP,               -- Trade timestamp
    trade_id BIGINT UNSIGNED,   -- Trade ID
    order_id BIGINT UNSIGNED,   -- Order ID
    user_id BIGINT UNSIGNED,    -- User ID
    side TINYINT UNSIGNED,      -- 0=BUY, 1=SELL
    price BIGINT UNSIGNED,      -- Trade price
    qty BIGINT UNSIGNED,        -- Trade quantity
    fee BIGINT UNSIGNED,        -- Fee
    role TINYINT UNSIGNED       -- 0=MAKER, 1=TAKER
) TAGS (
    symbol_id INT UNSIGNED
);

-- Balances Super Table
CREATE STABLE IF NOT EXISTS balances (
    ts TIMESTAMP,               -- Snapshot time
    avail BIGINT UNSIGNED,      -- Available balance
    frozen BIGINT UNSIGNED,     -- Frozen balance
    lock_version BIGINT UNSIGNED,   -- Lock version
    settle_version BIGINT UNSIGNED  -- Settle version
) TAGS (
    user_id BIGINT UNSIGNED,    -- User ID
    asset_id INT UNSIGNED       -- Asset ID
);

2.3 Status Enums

#![allow(unused)]
fn main() {
// New enum
pub enum TradeRole {
    Maker = 0,
    Taker = 1,
}
}

3. API Design

3.1 Query Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/order/{order_id} | GET | Query single order |
| /api/v1/orders | GET | Query order list |
| /api/v1/trades | GET | Query trade history |
| /api/v1/balances | GET | Query user balances |

3.2 Request/Response Format

GET /api/v1/order/{order_id}:

{
    "code": 0,
    "msg": "ok",
    "data": {
        "order_id": 1001,
        "symbol": "BTC_USDT",
        "status": "PARTIALLY_FILLED",
        "filled_qty": "0.0005",
        "created_at": 1734533784000
    }
}

4. Implementation Architecture

4.1 Module Structure

src/
├── persistence/
│   ├── mod.rs              // Module entry
│   ├── tdengine.rs         // TDengine connection manager
│   ├── orders.rs           // Order persistence
│   ├── trades.rs           // Trade persistence
│   └── balances.rs         // Balance persistence

4.2 Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                      Settlement Thread                           │
│                                                                  │
│    trade_queue.pop() ──┬── Update in-memory balance              │
│                        │                                         │
│                        └── Write to TDengine                     │
│                             ├── INSERT trades                    │
│                             ├── INSERT order_events              │
│                             └── INSERT balances (snapshot)       │
└─────────────────────────────────────────────────────────────────┘

4.3 Batch Write Optimization

#![allow(unused)]
fn main() {
// Batch writes to reduce I/O overhead
const BATCH_SIZE: usize = 1000;

async fn flush_trades(trades: Vec<Trade>) {
    let mut sql = String::from("INSERT INTO ");
    // ... construct the bulk insert SQL
    client.exec(&sql).await;
}
}

5. Implementation Plan

Phase 1: Basic Persistence (this chapter)

  • TDengine connection management
  • Schema initialization
  • Trade/Order/Balance writes

Phase 2: Query APIs

  • Implement GET endpoints

Phase 3: Optimization

  • Batch writes
  • Connection pool
  • Redis cache

6. Verification Plan

6.1 Integration Test

# 1. Start TDengine
docker run -d -p 6030:6030 -p 6041:6041 tdengine/tdengine:latest

# 2. Run Gateway
cargo run --release -- --gateway --port 8080

# 3. Submit an order
curl -X POST http://localhost:8080/api/v1/create_order ...

# 4. Query the order (verify persistence)
curl http://localhost:8080/api/v1/order/1

Summary

This chapter implements Settlement Persistence:

Core Philosophy

Persistence is a side-channel operation that does not block the main trading flow. The Trading Core stays fast; the Settlement thread writes to TDengine asynchronously.

The next chapter (0x09-c) implements WebSocket real-time push.

0x09-c WebSocket Push: Real-time Notification

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement WebSocket real-time push so clients can receive order updates, trade notifications, and balance changes.


Background: From Polling to Push

Current Query Method (Polling):

Client                    Gateway
  │                          │
  ├─── GET /orders ─────────>│  (Poll)
  │<──────────────────────────┤
  │       ... seconds ...      │
  ├─── GET /orders ─────────>│  (Poll again)
  │<──────────────────────────┤

Issues:

  • ❌ High Latency
  • ❌ Wasted Resources
  • ❌ Poor Real-time experience

This Chapter’s Solution (Push):

Client                    Gateway                Trading Core
  │                          │                        │
  ├── WS Connect ───────────>│                        │
  │<── Connected ────────────┤                        │
  │                          │                        │
  │                          │<── Order Filled ───────┤
  │<── push: order.update ───┤                        │
  │                          │                        │
  │                          │<── Trade ──────────────┤
  │<── push: trade ──────────┤                        │

1. Push Event Types

1.1 Classification

| Event Type | Trigger | Recipient |
|---|---|---|
| order.update | Status change (NEW/FILLED/CANCELED) | Order Owner |
| trade | Trade execution | Buyer & Seller |
| balance.update | Balance change | Account Owner |

1.2 Message Format

// Order Update
{
    "type": "order.update",
    "data": {
        "order_id": 1001,
        "symbol": "BTC_USDT",
        "status": "FILLED",
        "filled_qty": "0.001",
        "avg_price": "85000.00",
        "updated_at": 1734533790000
    }
}

// Trade Notification
{
    "type": "trade",
    "data": {
        "trade_id": 5001,
        "order_id": 1001,
        "symbol": "BTC_USDT",
        "side": "BUY",
        "role": "TAKER",
        "traded_at": 1734533790000
    }
}

// Balance Update
{
    "type": "balance.update",
    "data": {
        "asset": "BTC",
        "avail": "1.501000",
        "frozen": "0.000000"
    }
}

2. Architecture Design

2.1 Design Principles

Important

Data Consistency First: When a user receives a push, the database MUST already be updated.

Correct Flow: ME Match → Settlement Persist → Push → User Query → Data Exists ✅

Incorrect Flow: ME Match → Push → User Query → Data Not Found ❌

2.2 System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Thread Pipeline                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Thread 3: ME         ──▶  trade_queue  ──▶  Thread 4: Settlement│
│                       └──▶  balance_update_queue                │
│                                                                  │
│  Thread 4: Settlement ──▶  push_event_queue  ──▶  WsService     │
│                       │                                          │
│                       └──▶  TDengine (persist)                   │
│                                                                  │
│  WsService (Gateway)  ──▶  ConnectionManager  ──▶  Clients      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key Decisions:

  • ✅ Settlement is the only push source.
  • ✅ Push events generated ONLY after persistence success.
  • ✅ WsService runs in the Gateway’s tokio runtime.

2.3 Connection Management

ConnectionManager uses DashMap to handle concurrent connections, supporting multiple connections per user.
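A sketch of that shape, using a Mutex<HashMap> in place of DashMap so the example stays dependency-free (the method names are assumptions, and plain connection ids stand in for WebSocket sender handles):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Sketch of the per-user connection registry. The real ConnectionManager uses
// DashMap for concurrent access; a Mutex<HashMap> shows the same shape.
struct ConnectionManager {
    conns: Mutex<HashMap<u64, Vec<u64>>>, // user_id -> connection ids
}

impl ConnectionManager {
    fn new() -> Self { Self { conns: Mutex::new(HashMap::new()) } }
    // A user may hold several connections (e.g. web + mobile) simultaneously.
    fn register(&self, user_id: u64, conn_id: u64) {
        self.conns.lock().unwrap().entry(user_id).or_default().push(conn_id);
    }
    fn connections(&self, user_id: u64) -> usize {
        self.conns.lock().unwrap().get(&user_id).map_or(0, |v| v.len())
    }
}

fn main() {
    let mgr = ConnectionManager::new();
    mgr.register(1001, 1);
    mgr.register(1001, 2); // second device, same user
    assert_eq!(mgr.connections(1001), 2);
}
```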


3. API Design

3.1 Endpoint

ws://host:port/ws

3.2 Connection Flow

  1. Connect.
  2. Send Auth: {"type": "auth", "token": "..."}.
  3. Receive Auth Success.
  4. Receive Push Events.

3.3 Heartbeat

Client sends {"type": "ping"} every 30s, Server responds {"type": "pong"}.


4. Implementation

4.1 Core Structures

PushEvent (Internal Queue):

#![allow(unused)]
fn main() {
pub enum PushEvent {
    OrderUpdate { ... },
    Trade { ... },
    BalanceUpdate { ... },
}
}

TradeEvent Extension: Added taker_filled_qty, maker_filled_qty, and related fields to TradeEvent so Settlement can determine order status (FILLED vs PARTIALLY_FILLED) without querying order state elsewhere.
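The status derivation this enables can be sketched as follows (the function name and two-state enum are illustrative assumptions):

```rust
// Sketch: with filled quantities carried on the TradeEvent itself, Settlement
// derives the status to push locally, without a lookup into the order book.
#[derive(Debug, PartialEq)]
enum OrderStatus { PartiallyFilled, Filled }

fn status_after_fill(filled_qty: u64, total_qty: u64) -> OrderStatus {
    if filled_qty >= total_qty { OrderStatus::Filled } else { OrderStatus::PartiallyFilled }
}

fn main() {
    assert_eq!(status_after_fill(5, 10), OrderStatus::PartiallyFilled);
    assert_eq!(status_after_fill(10, 10), OrderStatus::Filled);
}
```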

4.2 Implementation Plan

  • Phase 1: Basic Connection (Manager, Handler, Gateway Integration).
  • Phase 2: Push Integration (push_event_queue, WsService, Settlement logic).
  • Phase 3: Refinement (Error handling, Performance tests).

5. Verification

5.1 Automated Tests

Run sh run_test.sh:

  • Validates WS connection.
  • Submits orders and verifies receiving order_update, trade, and balance_update events.

5.2 Manual Test

websocat "ws://localhost:8080/ws?user_id=1001"
# Send {"type": "ping"} -> Receive {"type": "pong"}

Summary

This chapter implements WebSocket real-time push.

Key Design Decisions:

  1. Settlement-first: Ensuring consistency.
  2. Single Source: All events originate from Settlement.
  3. Extended TradeEvent: Carrying adequate state for downstream consumers.

Next Chapter: 0x09-d K-Line Aggregation.




🇨🇳 中文

📦 Code Changes: View Diff

Core Objective: Implement WebSocket real-time push so clients receive order status updates, trade notifications, and balance changes.


Background: From Polling to Push

Current query method (polling):

Client                    Gateway
  │                          │
  ├─── GET /orders ─────────>│  (Poll)
  │<──────────────────────────┤
  │     ... seconds later ...  │
  ├─── GET /orders ─────────>│  (Poll again)
  │<──────────────────────────┤

Issues:

  • ❌ High latency
  • ❌ Wasted resources
  • ❌ Poor real-time experience

This chapter's solution (push):

Client                    Gateway                Trading Core
  │                          │                        │
  ├── WS Connect ───────────>│                        │
  │<── Connected ────────────┤                        │
  │                          │                        │
  │                          │<── Order Filled ───────┤
  │<── push: order.update ───┤                        │
  │                          │                        │
  │                          │<── Trade ──────────────┤
  │<── push: trade ──────────┤                        │

1. Push Event Types

1.1 Classification

| Event Type | Trigger | Recipient |
|---|---|---|
| order.update | Order status change | Order owner |
| trade | Trade execution | Both counterparties |
| balance.update | Balance change | Account owner |

1.2 Message Format

// Order update
{
    "type": "order.update",
    "data": {
        "order_id": 1001,
        "symbol": "BTC_USDT",
        "status": "FILLED",
        "filled_qty": "0.001",
        "avg_price": "85000.00",
        "updated_at": 1734533790000
    }
}

2. Architecture Design

2.1 Design Principles

Important

Data consistency first: by the time a user receives a push, the database must already be updated.

Correct flow: ME match → Settlement persist → Push → User query → Data exists ✅

2.2 System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Thread Pipeline                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Thread 3: ME         ──▶  trade_queue  ──▶  Thread 4: Settlement│
│                       └──▶  balance_update_queue                │
│                                                                  │
│  Thread 4: Settlement ──▶  push_event_queue  ──▶  WsService     │
│                       │                                          │
│                       └──▶  TDengine (persist)                   │
│                                                                  │
│  WsService (Gateway)  ──▶  ConnectionManager  ──▶  Clients      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Key decisions:

  • ✅ Settlement is the only push source
  • ✅ PushEvents are generated only after persistence succeeds
  • ✅ WsService runs in the Gateway's tokio runtime

3. API Design

3.1 Endpoint

ws://host:port/ws

3.2 Connection Flow

  1. Client connects
  2. Sends auth: {"type": "auth", "token": "..."}
  3. Receives push events

3.3 Heartbeat

Client sends {"type": "ping"} every 30s; Server replies {"type": "pong"}


4. Implementation Details

4.1 Core Structures

PushEvent (internal queue): defines the three core event structures.

TradeEvent extension: added taker_filled_qty and related fields so Settlement can determine the final order status.

4.2 Implementation Plan

  • Phase 1: Basic connection management
  • Phase 2: Push integration (Settlement -> WsService)
  • Phase 3: Refinement and verification

5. Verification

5.1 Automated Tests

Run sh run_test.sh, covering the full flow: connect, place orders, receive each type of push.

5.2 Manual Test

websocat "ws://localhost:8080/ws?user_id=1001"

Summary

This chapter implements WebSocket real-time push.

Key design decisions:

  1. Settlement-first: ensures consistency.
  2. Single push source: simplifies the architecture.
  3. TradeEvent extension: carries sufficient state.

The next chapter (0x09-d) implements the K-Line aggregation service.

0x09-d K-Line Aggregation Service

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement real-time K-Line (Candlestick) aggregation service, supporting multiple intervals (1m, 5m, 15m, 30m, 1h, 1d).


Background: Market Data Aggregation

The exchange needs to provide standardized market data:

Trades                            K-Line (OHLCV)
  │                                    │
  ├── Trade 1: price=30000, qty=0.1    │
  ├── Trade 2: price=30100, qty=0.2  ──▶ 1-Min K-Line:
  ├── Trade 3: price=29900, qty=0.1    │   Open:  30000
  └── Trade 4: price=30050, qty=0.3    │   High:  30100
                                       │   Low:   29900
                                       │   Close: 30050
                                       │   Volume: 0.7

1. K-Line Data Structure

1.1 OHLCV

#![allow(unused)]
fn main() {
pub struct KLine {
    pub symbol_id: u32,
    pub interval: KLineInterval,
    pub open_time: u64,      // Unix timestamp (ms)
    pub close_time: u64,
    pub open: u64,
    pub high: u64,
    pub low: u64,
    pub close: u64,
    pub volume: u64,         // Base asset volume
    pub quote_volume: u64,   // Quote asset volume (price * qty)
    pub trade_count: u32,
}
}
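The aggregation that fills this struct is an OHLCV fold over the trades in a window. TDengine's stream performs it server-side (Section 2); the sketch below only shows the math, with trades as illustrative (price, qty) pairs in integer units:

```rust
// Sketch of the OHLCV fold that the stream computation performs server-side.
#[derive(Debug, PartialEq)]
struct Candle { open: u64, high: u64, low: u64, close: u64, volume: u64 }

fn aggregate(trades: &[(u64, u64)]) -> Option<Candle> {
    Some(Candle {
        open: trades.first()?.0,                       // FIRST(price)
        high: trades.iter().map(|&(p, _)| p).max()?,   // MAX(price)
        low: trades.iter().map(|&(p, _)| p).min()?,    // MIN(price)
        close: trades.last()?.0,                       // LAST(price)
        volume: trades.iter().map(|&(_, q)| q).sum(),  // SUM(qty)
    })
}

fn main() {
    // Mirrors the background example (volume in 0.1 BTC units).
    let trades = [(30000, 1), (30100, 2), (29900, 1), (30050, 3)];
    assert_eq!(
        aggregate(&trades),
        Some(Candle { open: 30000, high: 30100, low: 29900, close: 30050, volume: 7 })
    );
}
```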

Warning

quote_volume Overflow: price * qty might overflow u64.

Correct SQL: SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume
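A quick demonstration of why the DOUBLE cast matters, assuming an 8-decimal fixed-point scale for both price and qty:

```rust
// With 8-decimal fixed point, 30000.0 USDT is 3_000_000_000_000 raw units;
// multiplied by a 1000 BTC raw qty the product exceeds u64::MAX (~1.8e19),
// which is why the SQL aggregates in DOUBLE instead of integer arithmetic.
fn quote_volume(price: u64, qty: u64) -> Option<u64> {
    price.checked_mul(qty) // None on overflow, instead of a silent wrap
}

fn main() {
    let price: u64 = 3_000_000_000_000; // 30000.00000000
    let qty: u64 = 100_000_000_000;     // 1000.00000000
    assert!(quote_volume(price, qty).is_none()); // u64 overflow
    let as_double = price as f64 * qty as f64;   // DOUBLE-style aggregation
    assert!(as_double > u64::MAX as f64);
}
```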

1.2 API Response Format

{
    "symbol": "BTC_USDT",
    "interval": "1m",
    "open_time": 1734533760000,
    "close_time": 1734533819999,
    "open": "30000.00",
    "high": "30100.00",
    "low": "29900.00",
    "close": "30050.00",
    "volume": "0.700000",
    "quote_volume": "21035.00",
    "trade_count": 4
}

2. Architecture: TDengine Stream Computing

2.1 Core Concept

Leverage TDengine's built-in Stream Computing for automatic aggregation; no hand-written aggregator is needed:

  1. Settlement writes to trades table.
  2. TDengine automatically triggers stream computing.
  3. Results are written to klines tables.
  4. HTTP API queries klines tables directly.

2.2 Data Flow

   Settlement ──▶ trades table (TDengine)
                      │
                      │ TDengine Stream Computing (Auto)
                      │
                      ├─── kline_1m_stream  ──► klines_1m table
                      ├─── kline_5m_stream  ──► klines_5m table
                      └─── ...
                                                    │
                           ┌────────────────────────┴───────────────────────┐
                           ▼                                                ▼
                    HTTP API                                        WebSocket Push
               GET /api/v1/klines                                kline.update (Optional)

2.3 TDengine Stream Example

CREATE STREAM IF NOT EXISTS kline_1m_stream
INTO klines_1m SUBTABLE(CONCAT('kl_1m_', CAST(symbol_id AS NCHAR(10))))
AS SELECT
    _wstart AS ts,
    FIRST(price) AS open,
    MAX(price) AS high,
    MIN(price) AS low,
    LAST(price) AS close,
    SUM(qty) AS volume,
    SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume,
    COUNT(*) AS trade_count
FROM trades
PARTITION BY symbol_id
INTERVAL(1m);

3. API Design

3.1 HTTP Endpoint

GET /api/v1/klines?symbol=BTC_USDT&interval=1m&limit=100

3.2 WebSocket Push

{
    "type": "kline.update",
    "data": {
        "symbol": "BTC_USDT",
        "interval": "1m",
        "open": "30000.00",
        "close": "30050.00",
        "is_final": false
    }
}

4. Module Structure

src/
├── persistence/
│   ├── klines.rs           # Create Streams, Query K-Lines
│   ├── schema.rs           # Add klines Super Table
│   └── queries.rs          # Add query_klines()
├── gateway/
│   ├── handlers.rs         # Add get_klines
│   └── ...

Tip

No need for src/kline/ logic directory, TDengine handles it.


5. Implementation Plan

  • Phase 1: Schema: Add klines super table.
  • Phase 2: Stream Computing: Implement create_kline_streams().
  • Phase 3: HTTP API: Implement query_klines() and API endpoint.
  • Phase 4: Verification: E2E test.

6. Verification

6.1 E2E Test Scenarios

Script: ./scripts/test_kline_e2e.sh

  1. Check API connectivity.
  2. Record initial K-Line count.
  3. Create matched orders.
  4. Wait for Stream processing (5s).
  5. Query K-Line API and verify data structure.

6.2 Binance Standard Alignment

Warning

P0 Fix: Ensure time fields align with Binance standard (Unix Milliseconds Number).

  • open_time: 1734611580000 (was ISO 8601 string)
  • close_time: 1734611639999 (was missing)

Summary

This chapter implements K-Line aggregation service leveraging TDengine’s Stream Computing.

Key Concept:

K-Line is derived data. We calculate it from trades in real-time, rather than storing original raw data.

Next Chapter: 0x09-e OrderBook Depth.




🇨🇳 中文

📦 Code Changes: View Diff

Core Objective: Implement a real-time K-Line (candlestick) aggregation service supporting multiple intervals (1m, 5m, 15m, 30m, 1h, 1d).


Background: Market Data Aggregation

The exchange needs to provide standardized market data:

Trades                            K-Line (OHLCV)
  │                                    │
  ├── Trade 1: price=30000, qty=0.1    │
  ├── Trade 2: price=30100, qty=0.2  ──▶ 1-Min K-Line:
  ├── Trade 3: price=29900, qty=0.1    │   Open:  30000
  └── Trade 4: price=30050, qty=0.3    │   High:  30100
                                       │   Low:   29900
                                       │   Close: 30050
                                       │   Volume: 0.7

1. K-Line Data Structure

1.1 OHLCV

#![allow(unused)]
fn main() {
pub struct KLine {
    pub symbol_id: u32,
    pub interval: KLineInterval,
    pub open_time: u64,      // Unix timestamp (ms)
    pub close_time: u64,
    pub open: u64,           // Open price
    pub high: u64,           // High price
    pub low: u64,            // Low price
    pub close: u64,          // Close price
    pub volume: u64,         // Volume (base asset)
    pub quote_volume: u64,   // Turnover (quote asset)
    pub trade_count: u32,    // Number of trades
}
}

Warning

quote_volume precision: price * qty may overflow u64; compute with DOUBLE instead.

1.2 API Response Format

{
    "symbol": "BTC_USDT",
    "interval": "1m",
    "open_time": 1734533760000,
    "close_time": 1734533819999,
    "open": "30000.00",
    "high": "30100.00",
    "low": "29900.00",
    "close": "30050.00",
    "volume": "0.700000",
    "quote_volume": "21035.00",
    "trade_count": 4
}

2. Architecture: TDengine Stream Computing

2.1 Core Concept

Leverage TDengine's built-in stream computing to aggregate K-Lines automatically, with no hand-written aggregator:

  • After Settlement writes to the trades table, TDengine triggers the stream computation automatically
  • The stream results are written into the klines tables
  • The HTTP API simply queries the klines tables

2.2 Data Flow

   Settlement ──▶ trades table (TDengine)
                      │
                      │ TDengine Stream Computing (automatic)
                      │
                      ├─── kline_1m_stream  ──► klines_1m table
                      ├─── kline_5m_stream  ──► klines_5m table
                      └─── ...

2.3 TDengine Stream Example

CREATE STREAM IF NOT EXISTS kline_1m_stream
INTO klines_1m SUBTABLE(...)
AS SELECT
    _wstart AS ts,
    FIRST(price) AS open,
    MAX(price) AS high,
    MIN(price) AS low,
    LAST(price) AS close,
    SUM(qty) AS volume,
    SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume,
    COUNT(*) AS trade_count
FROM trades
PARTITION BY symbol_id
INTERVAL(1m);

3. API Design

HTTP endpoint: GET /api/v1/klines?symbol=BTC_USDT&interval=1m&limit=100


4. Module Structure

src/
├── persistence/
│   ├── klines.rs           # Create streams, query K-Lines
│   ├── schema.rs           # Add klines table
│   └── queries.rs          # Add query_klines()
├── gateway/
│   ├── handlers.rs         # Add get_klines

Tip

No src/kline/ directory needed; TDengine stream computing replaces manual aggregation logic


5. Implementation Plan

  • Phase 1: Schema: add the klines super table.
  • Phase 2: Stream Computing: implement create_kline_streams()
  • Phase 3: HTTP API: implement the query function and API endpoint.
  • Phase 4: Verification: E2E tests.

6. Verification Plan

Run ./scripts/test_kline_e2e.sh to verify:

  1. API connectivity
  2. K-Line data generation (stream processing)
  3. Response structure correctness (aligned with the Binance standard)

Summary

This chapter implements the K-Line aggregation service.

Core Philosophy

K-Line is derived data: computed in real time from trade events rather than stored as raw data.

The next chapter (0x09-e) implements Order Book depth aggregation.

0x09-e Order Book Depth

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement Order Book Depth push, allowing users to view the current buy/sell order distribution in real-time.


Background: Depth Data

The Order Book Depth displays the current market’s distribution of limit orders:

         Asks (Sells)                   
  ┌─────────────────────┐              
  │ 30100.00   0.3 BTC  │ ← Lowest Ask
  │ 30050.00   0.5 BTC  │              
  │ 30020.00   1.2 BTC  │              
  ├─────────────────────┤              
  │    Current: 30000   │              
  ├─────────────────────┤              
  │ 29980.00   0.8 BTC  │              
  │ 29950.00   1.5 BTC  │              
  │ 29900.00   2.0 BTC  │ ← Highest Bid
  └─────────────────────┘              
         Bids (Buys)                   

1. Data Structure

1.1 Depth Response Format

{
    "symbol": "BTC_USDT",
    "bids": [
        ["29980.00", "0.800000"],
        ["29950.00", "1.500000"],
        ["29900.00", "2.000000"]
    ],
    "asks": [
        ["30020.00", "1.200000"],
        ["30050.00", "0.500000"],
        ["30100.00", "0.300000"]
    ],
    "last_update_id": 12345
}

1.2 Binance Format Comparison

| Field | Us | Binance |
|---|---|---|
| bids | [["price", "qty"], ...] | ✅ Match |
| asks | [["price", "qty"], ...] | ✅ Match |
| last_update_id | 12345 | ✅ Match |

2. API Design

2.1 HTTP Endpoint

GET /api/v1/depth?symbol=BTC_USDT&limit=20

| Parameter | Type | Description |
|---|---|---|
| symbol | String | Trading Pair |
| limit | u32 | Depth levels (5, 10, 20, 50, 100) |
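Since the book's order book is BTreeMap-based, the snapshot behind this endpoint can be sketched as below. This is a simplification in which each price level maps directly to its aggregate resting quantity:

```rust
use std::collections::BTreeMap;

// Sketch of OrderBook::get_depth over BTreeMap price levels (price -> qty).
// Bids come back best (highest) first, asks best (lowest) first, truncated
// to `limit` levels, matching the response format above.
fn get_depth(
    bids: &BTreeMap<u64, u64>,
    asks: &BTreeMap<u64, u64>,
    limit: usize,
) -> (Vec<(u64, u64)>, Vec<(u64, u64)>) {
    let top_bids: Vec<(u64, u64)> =
        bids.iter().rev().take(limit).map(|(&p, &q)| (p, q)).collect();
    let top_asks: Vec<(u64, u64)> =
        asks.iter().take(limit).map(|(&p, &q)| (p, q)).collect();
    (top_bids, top_asks)
}

fn main() {
    let bids = BTreeMap::from([(29900, 200), (29950, 150), (29980, 80)]);
    let asks = BTreeMap::from([(30020, 120), (30100, 30)]);
    let (b, a) = get_depth(&bids, &asks, 2);
    assert_eq!(b, vec![(29980, 80), (29950, 150)]); // best bids first
    assert_eq!(a, vec![(30020, 120), (30100, 30)]); // best asks first
}
```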

2.2 WebSocket Push

// Subscribe
{"type": "subscribe", "channel": "depth", "symbol": "BTC_USDT"}

// Push (Incremental)
{
    "type": "depth.update",
    "symbol": "BTC_USDT",
    "bids": [["29980.00", "0.800000"]],
    "asks": [["30020.00", "0.000000"]],  // qty=0 means removal
    "last_update_id": 12346
}

3. Architecture Design

3.1 Comparison with K-Line

| Data | Source | Latency | Method |
|---|---|---|---|
| K-Line | Historical Trades | Minute-level | TDengine Stream |
| Depth | Current Orders | Ms-level | In-Memory |

Depth is too real-time for DB storage. We use Ring Buffer + Independent Service.

3.2 Event-Driven Architecture

Following the pattern: Isolated service, Ring Buffer, Lock-Free.

┌────────────┐                    ┌─────────────────────┐
│     ME     │ ──(non-blocking)─► │ depth_event_queue   │
│            │    drop if full    │ (capacity: 1024)    │
└────────────┘                    └──────────┬──────────┘
                                             │
                                             ▼
                                  ┌─────────────────────┐
                                  │   DepthService      │
                                  │   (tokio async)     │
                                  ├─────────────────────┤
                                  │ ● HTTP Snapshot     │
                                  │ ● WS Incremental    │
                                  └─────────────────────┘

Important

Market Data Characteristic: Freshness is key. Dropping a few events is acceptable if the consumer is slow, as eventual consistency is restored by snapshots.
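The drop-if-full publish can be sketched with a bounded channel; here std::sync::mpsc::sync_channel stands in for the actual ring buffer, and a small capacity for the real 1024:

```rust
use std::sync::mpsc::sync_channel;

// Sketch of the ME-side non-blocking publish: try_send on a bounded queue,
// dropping events instead of blocking the hot path when the consumer stalls.
fn publish_all(events: &[u64], capacity: usize) -> usize {
    let (tx, _rx) = sync_channel::<u64>(capacity); // receiver never drains here
    // Count how many events were dropped because the queue was full.
    events.iter().filter(|&&e| tx.try_send(e).is_err()).count()
}

fn main() {
    // 5 events into a capacity-2 queue with a stalled consumer:
    // 2 are buffered, 3 are dropped; snapshots restore consistency later.
    assert_eq!(publish_all(&[0, 1, 2, 3, 4], 2), 3);
}
```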


4. Module Structure

src/
├── gateway/
│   ├── handlers.rs     # Add get_depth
│   └── ...
├── engine.rs           # Add get_depth() method
└── websocket/
    └── messages.rs     # Add DepthUpdate

5. Implementation Plan

  • Phase 1: HTTP API: Add OrderBook::get_depth(), API endpoint.
  • Phase 2: WebSocket: depth.update message, subscription Logic.

6. Verification

6.1 E2E Test Scenarios

Script: scripts/test_depth.sh

  1. Query empty depth.
  2. Submit Buy/Sell orders (creating depth).
  3. Wait for update (200ms).
  4. Query depth and verify bids/asks.
  5. Performance test (100 orders rapid fire).

Expected Result:

  • Depth reflects order book state.
  • Update latency ≤ 100ms.
  • High frequency updates are batched/throttled correctly.

Summary

| Point | Implementation |
|---|---|
| Structure | Compatible with Binance (Array format) |
| API | GET /api/v1/depth |
| WebSocket | depth.update (Future: Incremental) |
| Architecture | Event-driven, Ring Buffer |

Core Concept:

Service Isolation: ME pushes via DepthEvent. DepthService maintains state. Lock-free.

Next Chapter: 0x09-f Integration Test.




🇨🇳 中文

📦 Code Changes: View Diff

Core Objective: Implement Order Book depth push so users can see the live distribution of buy/sell orders.


Background: Depth Data

The order book displays the current market's distribution of resting orders:

         Asks (Sells)                  
  ┌─────────────────────┐              
  │ 30100.00   0.3 BTC  │ ← Lowest ask 
  │ 30050.00   0.5 BTC  │              
  │ 30020.00   1.2 BTC  │              
  ├─────────────────────┤              
  │   Current: 30000    │              
  ├─────────────────────┤              
  │ 29980.00   0.8 BTC  │              
  │ 29950.00   1.5 BTC  │              
  │ 29900.00   2.0 BTC  │ ← Highest bid
  └─────────────────────┘              
         Bids (Buys)                   

1. Data Structure

1.1 Depth Response Format

{
    "symbol": "BTC_USDT",
    "bids": [
        ["29980.00", "0.800000"],
        ["29950.00", "1.500000"],
        ["29900.00", "2.000000"]
    ],
    "asks": [
        ["30020.00", "1.200000"],
        ["30050.00", "0.500000"],
        ["30100.00", "0.300000"]
    ],
    "last_update_id": 12345
}

1.2 Binance Format Comparison

| Field | Us | Binance |
|---|---|---|
| bids | [["price", "qty"], ...] | ✅ Match |
| asks | [["price", "qty"], ...] | ✅ Match |
| last_update_id | 12345 | ✅ Match |

2. API Design

2.1 HTTP Endpoint

GET /api/v1/depth?symbol=BTC_USDT&limit=20

| Parameter | Type | Description |
|---|---|---|
| symbol | String | Trading pair |
| limit | u32 | Depth levels (5, 10, 20, 50, 100) |

2.2 WebSocket Push

depth.update (incremental); qty=0 means removal.


3. Architecture Design

3.1 Comparison with K-Line

| Data | Source | Latency | Method |
|---|---|---|---|
| K-Line | Historical trades | Minute-level | TDengine stream computing |
| Depth | Current orders | Ms-level | In-memory state |

Depth is too real-time to store in a database; we use the ring buffer + independent service pattern.

3.2 Event-Driven Architecture

Following the project's usual design: isolated services communicating over ring buffers, lock-free

┌────────────┐                    ┌─────────────────────┐
│     ME     │ ──(non-blocking)─► │ depth_event_queue   │
│            │    drop if full    │ (capacity: 1024)    │
└────────────┘                    └──────────┬──────────┘

4. Module Structure

src/
├── gateway/
│   ├── handlers.rs     # Add get_depth
├── engine.rs           # Add get_depth()
└── websocket/
    └── messages.rs     # Add DepthUpdate

5. Implementation Plan

  • Phase 1: HTTP API: implement OrderBook::get_depth and the API endpoint.
  • Phase 2: WebSocket: incremental push (optional).

6. Verification Plan

Run scripts/test_depth.sh:

  1. Query the empty book
  2. Submit buy/sell orders
  3. Verify the depth data updates
  4. Performance check (100ms update cadence)

Summary

| Design Point | Approach |
|---|---|
| Data structure | bids/asks arrays, Binance-compatible |
| HTTP API | GET /api/v1/depth |
| WebSocket | depth.update (incremental) |
| Architecture | Event-driven, ring buffer communication |

Core Philosophy

Service isolation: ME pushes DepthEvents; DepthService maintains its own state; lock-free.

The next chapter (0x09-f) runs integration tests.

0x09-f Integration Test: Full Acceptance

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Perform comprehensive integration testing on all 0x09 features using historical datasets to establish a reproducible acceptance baseline.


Background

Phase 0x09 delivered multiple key features:

| Chapter | Feature | Status |
|---|---|---|
| 0x09-a | Gateway HTTP API | ✅ |
| 0x09-b | Settlement Persistence | ✅ |
| 0x09-c | WebSocket Push | ✅ |
| 0x09-d | K-Line Aggregation | ✅ |
| 0x09-e | Order Book Depth | ✅ |

We now need to integrate and verify these features to ensure end-to-end correctness.


Test Scope

1. Pipeline Correctness

| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output Identical |
| Single vs Multi-Thread | 1.3M | Output Identical |

2. Settlement Persistence

| Test | Verification Point |
|---|---|
| Orders Table | Status changes recorded correctly |
| Trades Table | Trade data integrity |
| Balances Table | Final balances match |

3. HTTP API

| Endpoint | Verification Point |
|---|---|
| POST /create_order | Success |
| POST /cancel_order | Correct execution |
| GET /orders | Correct list |
| GET /trades | Record integrity |
| GET /depth | Bids/Asks ordered |

Acceptance Criteria

1. Pipeline Correctness (Must Pass All)

  • Output diff between Single-Thread and Multi-Thread is empty.
  • Final balances match exactly.
  • Trade counts match exactly.

2. Settlement Persistence (Must Pass All)

  • Orders Row Count == Total Orders.
  • Trades Row Count == Total Trades.
  • Final Balances match precisely (100% consistency for avail/frozen).

Important

Consistency Requirement: Core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.

3. Performance Baseline

  • Record 100K and 1.3M TPS.
  • Record P99 Latency.

Test Artifacts & Baseline

Baseline Generation

After testing, organize the following for regression testing:

  • 100K Output: baseline/100k/
  • 1.3M Output: baseline/1.3m/
  • Performance Metrics: docs/src/perf-history/

Regression Testing

Use scripts to automatically compare against baseline:

./scripts/test_pipeline_compare.sh 100k
./scripts/test_integration_full.sh

Large Dataset Testing Notes

Important

Special attention needed for 1.3M dataset tests:

  1. Output Redirection: Must redirect output to file to avoid IDE freezing.
  2. Execution Time: Multi-thread mode is slower (~100s vs 16s) due to persistence overhead.
  3. Balance Events: “Lock events != Accepted orders” is expected (due to cancels).
  4. Push Queue Overflow: [PUSH] queue full warnings are expected under high load.

Test Report (2025-12-21)

Performance Baseline

| Version | Time | Rate | vs Baseline |
|---|---|---|---|
| Baseline (urllib) | 576s | 174/s | - |
| HTTP Keep-Alive | 117s | 857/s | +393% |
| Optimized (Current) | 69s | 1,435/s | +725% |

Pipeline Correctness (1.3M) ✅

  • Core balances consistent.
  • Trade count matches (667,567).
  • Balance final state 100% MATCH.

Settlement Persistence (100K)

  • Orders: 100% MATCH (filled_qty, status).
  • Trades: 100% MATCH.
  • Balances: 100% MATCH.

Conclusion: All 0x09 features (Persistence & Gateway) are production-ready.




🇨🇳 中文

📦 Code Changes: View Diff

Core objective of this section: run comprehensive integration tests of all 0x09 features against historical datasets and establish a repeatable acceptance baseline.


Background

Phase 0x09 implemented several key features:

| Chapter | Feature |
|---|---|
| 0x09-a | Gateway HTTP API |
| 0x09-b | Settlement Persistence |
| 0x09-c | WebSocket Push |
| 0x09-d | K-Line Aggregation |
| 0x09-e | Order Book Depth |

We now need to integrate and verify these features to ensure end-to-end correctness.


Test Scope

1. Pipeline Correctness

| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output identical |
| Single vs Multi-Thread | 1.3M | Output identical |

2. Settlement Persistence

| Test | Verification Point |
|---|---|
| Orders table | Status changes recorded correctly |
| Trades table | Trade data integrity |
| Balances table | Final balances match |

3. HTTP API

Verify the create_order, cancel_order, orders, trades, and depth endpoints.


Acceptance Criteria

1. Pipeline Correctness (Must Pass All)

  • The 100K/1.3M output diffs are empty.
  • Final balances match exactly.
  • Trade counts match exactly.

2. Settlement Persistence (Must Pass All)

  • Orders/Trades row counts match.
  • Final balances match 100%.

Important

Consistency requirement: core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.

3. Performance Baseline

  • Record 100K and 1.3M TPS.
  • Record P99 latency.

Test Artifacts & Baseline

Baseline Generation & Regression

Store baseline data under the baseline/ directory and run automated regression tests with test_pipeline_compare.sh.


Large Dataset Testing Notes

Important

Special attention is needed when running the 1.3M dataset tests:

  1. Output redirection: output must be redirected to a file.
  2. Execution time: slower multi-thread mode is expected.
  3. Balance events: a lock-event count that differs from the order count is expected.
  4. Push queue overflow: queue-full warnings under high load are expected.

Test Report (2025-12-21)

Performance Baseline

Current optimized TPS is 1,435/s, a +725% improvement over the baseline.

Pipeline Correctness (1.3M) ✅

  • Trade count matches (667,567).
  • Final balance state: 100% MATCH.

Settlement Persistence (100K)

  • Orders, Trades, and Balances are all 100% MATCH.

Conclusion: all 0x09 persistence and gateway features are production-ready.

Part II: Productization

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Upgrade the core matching engine into a complete trading system with Account System, Fund Transfer, and Security Authentication.


1. Review: Achievements of Part I

| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal matching prototype |
| 0x02-03 | Floats & Decimals | Financial-grade precision |
| 0x04 | BTree OrderBook | O(log n) matching |
| 0x05-06 | User Balance | Locking/Unlocking |
| 0x07 | Testing Framework | 100K order baseline |
| 0x08 | Multi-Thread Pipeline | 4-thread concurrency |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |

2. Gap Analysis: From Engine to System

| Dimension | Current State | Target State |
|---|---|---|
| Identity | Raw user_id | API Key signature |
| Accounts | Single balance | Funding + Spot dual-account |
| Funds | Manual deposit() | Deposit/Withdraw/Transfer |
| Economics | Zero fee | Maker/Taker fees |

3. Blueprint for Part II

0x0A ─── Account System & Security
        ├── 0x0A-a: Account System (exchange_info + DB)
        ├── 0x0A-b: ID Specification (Asset/Symbol Naming)
        └── 0x0A-c: Authentication (API Key Middleware)

0x0B ─── Fund System & Transfers
        ├── Funding/Spot Dual-Account Structure
        └── Deposit/Withdraw API

0x0C ─── Economic Model
        └── Fee Calculation & Deduction

0x0D ─── Snapshot & Recovery
        └── Graceful Shutdown & State Restoration

4. Tech Stack Choices

| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational config data |
| TDengine | Orders/Trades/K-Lines | Time-series trading data |
| sqlx | Rust PG driver | Async + compile-time checks |

5. Design Principles

| Principle | Description |
|---|---|
| Minimal external deps | Auth/Transfer logic is cohesive |
| Auditability | All fund changes must have event logs |
| Progressive | System remains runnable after each module |
| Backward compatible | Reuse Core types from Part I |



🇨🇳 中文

📦 Code Changes: View Diff

Core objective: upgrade the matching-engine core into a complete trading system with an account system, fund transfers, and security authentication.


1. Review: Achievements of Part I

| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal matching prototype |
| 0x02-03 | Floats & Decimals | Financial-grade precision |
| 0x04 | BTree OrderBook | O(log n) matching |
| 0x05-06 | User Balance | Locking/unlocking mechanism |
| 0x07 | Testing Framework | 100K order baseline |
| 0x08 | Multi-Thread Pipeline | 4-thread concurrency |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |

2. Gap Analysis: From Engine to System

| Dimension | Current State | Target State |
|---|---|---|
| Identity | Bare user_id | API Key signature verification |
| Accounts | Single balance structure | Funding + Spot dual-account |
| Funds | Manual deposit() | Full deposit/withdraw/transfer flow |
| Economics | Zero fees | Maker/Taker fee rates |

3. Blueprint for Part II

0x0A ─── Account System & Security
        ├── 0x0A-a: Account System (exchange_info + DB management)
        ├── 0x0A-b: ID Specification (Asset/Symbol naming)
        └── 0x0A-c: Authentication (API Key middleware)

0x0B ─── Fund System & Transfers
        ├── Funding/Spot dual-account structure
        └── Deposit/Withdraw API

0x0C ─── Economic Model
        └── Fee calculation & deduction

0x0D ─── Snapshot & Recovery
        └── Graceful shutdown & state restoration

4. Tech Stack Choices

| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational config data |
| TDengine | Orders/Trades/K-Lines | Time-series trading data |
| sqlx | Rust PG driver | Async + compile-time checks |

5. Design Principles

| Principle | Description |
|---|---|
| Minimal external deps | Auth/transfer logic stays cohesive |
| Auditability | Every fund change has a complete event trail |
| Progressive | System remains runnable after each module |
| Backward compatible | Reuse core types from Part I |

0x0A-a: Account System

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

This chapter establishes the account infrastructure for the trading system: exchange_info module, naming conventions, and database management.


1. Core Module: exchange_info

1.1 Module Structure

src/exchange_info/
├── mod.rs           # Module entry
├── validation.rs    # AssetName/SymbolName validation
├── asset/
│   ├── mod.rs
│   ├── models.rs    # Asset struct + asset_flags
│   └── manager.rs   # AssetManager
└── symbol/
    ├── mod.rs
    ├── models.rs    # Symbol struct + symbol_flags
    └── manager.rs   # SymbolManager

1.2 Core Types

// Asset
pub struct Asset {
    pub asset_id: i32,
    pub asset: String,     // "BTC", "USDT" (UPPERCASE)
    pub name: String,      // "Bitcoin", "Tether USD"
    pub decimals: i16,     // 8 for BTC, 6 for USDT
    pub status: i16,
    pub asset_flags: i32,  // Permission bits
}

// Symbol
pub struct Symbol {
    pub symbol_id: i32,
    pub symbol: String,    // "BTC_USDT" (UPPERCASE)
    pub base_asset_id: i32,
    pub quote_asset_id: i32,
    pub price_decimals: i16,
    pub qty_decimals: i16,
    pub symbol_flags: i32,
}

2. Naming Convention

| Category | Standard | Example |
|---|---|---|
| Database name | _db suffix | exchange_info_db |
| Table name | _tb suffix | assets_tb, symbols_tb |
| Flags module | Table-name prefix | asset_flags::, symbol_flags:: |
| Codes | UPPERCASE | BTC, BTC_USDT |

See Naming Convention Document.


3. Database Management

3.1 Management Script

# Full Init (Reset + Seed)
python3 scripts/db/manage_db.py init

# Reset Schema Only
python3 scripts/db/manage_db.py reset

# Seed Data Only
python3 scripts/db/manage_db.py seed

# Check Status
python3 scripts/db/manage_db.py status

3.2 Database Constraints

-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))

-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))

4. API Endpoints

4.1 GET /api/v1/exchange_info

Returns full exchange information:

{
  "code": 0,
  "data": {
    "assets": [
      {
        "asset_id": 1,
        "asset": "BTC",
        "name": "Bitcoin",
        "decimals": 8,
        "can_deposit": true,
        "can_withdraw": true,
        "can_trade": true
      }
    ],
    "symbols": [
      {
        "symbol_id": 1,
        "symbol": "BTC_USDT",
        "base_asset": "BTC",
        "quote_asset": "USDT",
        "price_decimals": 2,
        "qty_decimals": 8,
        "is_tradable": true,
        "is_visible": true
      }
    ],
    "server_time": 1734897000000
  }
}

4.2 Other Endpoints

| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |

5. Verification

5.1 Integration Test

./scripts/test_account_integration.sh

Scope:

  • ✅ DB Initialization (Auto reset + seed)
  • ✅ Assets/Symbols/ExchangeInfo API
  • ✅ DB Constraints (Lowercase rejected)
  • ✅ Idempotency

5.2 Unit Test

cargo test --lib
# 150 passed, 0 failed

6. Next Steps




🇨🇳 中文

📦 Code Changes: View Diff

This chapter establishes the account infrastructure for the trading system: the exchange_info module, naming conventions, and database management.


1. Core Module: exchange_info

1.1 Module Structure

src/exchange_info/
├── mod.rs           # Module entry
├── validation.rs    # AssetName/SymbolName validation
├── asset/
│   ├── mod.rs
│   ├── models.rs    # Asset struct + asset_flags
│   └── manager.rs   # AssetManager
└── symbol/
    ├── mod.rs
    ├── models.rs    # Symbol struct + symbol_flags
    └── manager.rs   # SymbolManager

1.2 Core Types

// Asset
pub struct Asset {
    pub asset_id: i32,
    pub asset: String,     // "BTC", "USDT" (UPPERCASE enforced)
    pub name: String,      // "Bitcoin", "Tether USD"
    pub decimals: i16,     // 8 for BTC, 6 for USDT
    pub status: i16,
    pub asset_flags: i32,  // Permission bits
}

// Symbol
pub struct Symbol {
    pub symbol_id: i32,
    pub symbol: String,    // "BTC_USDT" (UPPERCASE enforced)
    pub base_asset_id: i32,
    pub quote_asset_id: i32,
    pub price_decimals: i16,
    pub qty_decimals: i16,
    pub symbol_flags: i32,
}

2. Naming Convention

| Category | Standard | Example |
|---|---|---|
| Database name | _db suffix | exchange_info_db |
| Table name | _tb suffix | assets_tb, symbols_tb |
| Flags module | Table-name prefix | asset_flags::, symbol_flags:: |
| Asset/Symbol codes | UPPERCASE enforced | BTC, BTC_USDT |

See the Naming Convention Document.


3. Database Management

3.1 Python Management Script

# Full init (reset + seed data)
python3 scripts/db/manage_db.py init

# Reset schema only (no data)
python3 scripts/db/manage_db.py reset

# Seed data only
python3 scripts/db/manage_db.py seed

# Check current status
python3 scripts/db/manage_db.py status

3.2 Database Constraints

-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))

-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))

4. API Endpoints

4.1 GET /api/v1/exchange_info

Returns full exchange information:

{
  "code": 0,
  "data": {
    "assets": [
      {
        "asset_id": 1,
        "asset": "BTC",
        "name": "Bitcoin",
        "decimals": 8,
        "can_deposit": true,
        "can_withdraw": true,
        "can_trade": true
      }
    ],
    "symbols": [
      {
        "symbol_id": 1,
        "symbol": "BTC_USDT",
        "..."
      }
    ],
    "server_time": 1734897000000
  }
}

4.2 Other Endpoints

| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |

5. Verification

5.1 Integration Test

./scripts/test_account_integration.sh

5.2 Unit Test

cargo test --lib

6. Next Steps

0x0A-b: ID Specification & Account Structure

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📅 Status: Design Phase. Core Objective: Define ID generation rules and account data structures.


1. ID Generation Rules

1.1 User ID (u64)

  • Semantics: Global unique user identifier.
  • Strategy: Auto-increment or Snowflake/ULID (for future distributed support).
  • Initial Value: 1024 (0-1023 reserved for system accounts).

1.2 Asset ID (u32)

  • Semantics: Asset identifier (e.g., BTC=1, USDT=2).
  • Strategy: Sequential allocation starting from 1.
  • Purpose: Maintain O(1) array indexing performance.
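The point of sequential allocation from 1 is that asset IDs stay dense, so a plain array indexed by `asset_id` gives O(1) lookups with no hashing. A minimal sketch of the idea (the `AssetTable` name and its fields are illustrative, not from the codebase):

```python
# Sequential IDs from 1 let a dense list serve as the lookup table.
# Index 0 is a placeholder so asset_id maps directly to its slot.
class AssetTable:
    def __init__(self):
        self.assets = [None]  # index 0 unused; IDs start at 1

    def register(self, name, decimals):
        asset_id = len(self.assets)  # next sequential ID
        self.assets.append({"asset_id": asset_id, "asset": name, "decimals": decimals})
        return asset_id

    def get(self, asset_id):
        return self.assets[asset_id]  # O(1) index, no hashing

table = AssetTable()
btc = table.register("BTC", 8)    # -> 1
usdt = table.register("USDT", 6)  # -> 2
```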

1.3 Symbol ID (u32)

  • Semantics: Trading Pair identifier (e.g., BTC_USDT=1).
  • Strategy: Sequential allocation starting from 1.

1.4 Account Identification

  • Semantics: User’s sub-account (distinguishing Funding vs Spot).
  • Strategy: Use (user_id, account_type) tuple, no composite ID needed.
    struct AccountKey {
        user_id: u64,
        account_type: AccountType,  // Funding | Spot
    }
  • Account Types:
    • Spot = 1
    • Funding = 2

1.5 Order ID / Trade ID (u64)

  • Semantics: Unique identifier for orders/trades within the Matching Engine.
  • Strategy: Global atomic increment.

2. Core Data Structures

2.1 AccountType Enum

#[repr(u8)]
pub enum AccountType {
    Spot    = 0x01,
    Funding = 0x02,
}

2.2 Account Struct (Conceptual)

pub struct Account {
    pub user_id: u64,
    pub account_type: AccountType,
    pub balances: HashMap<AssetId, Balance>,
    pub created_at: u64,
    pub status: AccountStatus,
}
}

3. System Reserved Accounts

| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |

This design will be updated to src/core_types.rs and src/account/mod.rs upon confirmation.

💡 Future Consideration: Alternative System ID Range

Current: System IDs use 0-1023 (1024 total), users start at 1024.

Problem: Test data might accidentally use 1, 2, 3… which conflicts with system IDs.

Alternative: Use u64::MAX downward for system accounts:

const REVENUE_ID: u64 = u64::MAX;        // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1;  // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary

fn is_system_account(user_id: u64) -> bool {
    user_id > SYSTEM_MIN
}

Benefits:

  • Users can start from 1, more natural
  • Test data never conflicts with system IDs
  • Clear separation: low = users, high = system



🇨🇳 中文

📅 Status: Design phase. Core objective: define the generation rules for all key IDs in the system and the basic account data structures.


1. ID Generation Rules

1.1 User ID (u64)

  • Semantics: globally unique user identifier.
  • Strategy: auto-increment sequence, or Snowflake/ULID (for future distributed support).
  • Initial value: 1024 (0-1023 reserved for system accounts).

1.2 Asset ID (u32)

  • Semantics: asset identifier (e.g., BTC=1, USDT=2).
  • Strategy: sequential allocation starting from 1.
  • Purpose: maintain O(1) array-indexing performance.

1.3 Symbol ID (u32)

  • Semantics: trading-pair identifier (e.g., BTC/USDT=1).
  • Strategy: sequential allocation starting from 1.

1.4 Account Identification

  • Semantics: user's sub-account (distinguishing Funding vs Spot).
  • Strategy: use the (user_id, account_type) tuple; no composite ID needed.
    struct AccountKey {
        user_id: u64,
        account_type: AccountType,  // Funding | Spot
    }
  • Account types:
    • Spot = 1
    • Funding = 2

1.5 Order ID / Trade ID (u64)

  • Semantics: unique identifier for orders/trades within the matching engine.
  • Strategy: global atomic increment.

2. Core Data Structures

2.1 AccountType Enum

#[repr(u8)]
pub enum AccountType {
    Spot    = 0x01,
    Funding = 0x02,
}

2.2 Account Struct (Conceptual)

pub struct Account {
    pub user_id: u64,
    pub account_type: AccountType,
    pub balances: HashMap<AssetId, Balance>,
    pub created_at: u64,
    pub status: AccountStatus,
}

3. System Reserved Accounts

| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |

Once confirmed, this design will be synced to src/core_types.rs and src/account/mod.rs.

💡 Future Consideration: Alternative System ID Range

Current: system IDs use 0-1023 (1024 total); users start at 1024.

Problem: test data may use 1, 2, 3…, conflicting with system IDs.

Alternative: allocate system accounts downward from u64::MAX:

const REVENUE_ID: u64 = u64::MAX;        // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1;  // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary

fn is_system_account(user_id: u64) -> bool {
    user_id > SYSTEM_MIN
}

Benefits:

  • Users can start from 1, which is more natural.
  • Test data never conflicts with system IDs.
  • Clear separation: low = users, high = system.

0x0A-c: API Authentication

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📅 Status: ✅ Implemented Branch: 0x0A-b-api-auth Date: 2025-12-23 Code Changes: v0.0A-a-account-system…v0.0A-b-api-auth

Implementation Summary

| Metric | Result |
|---|---|
| Auth module | 8 files |
| Unit tests | 35/35 ✅ |
| Total tests | 188/188 ✅ |
| Commits | 31 commits |

1. Overview

Implement secure request authentication for Gateway API to protect trading endpoints from unauthorized access.

1.1 Design Goals

| Goal | Description |
|---|---|
| Security | Prevent forgery and replay attacks |
| Performance | Verification latency < 1ms |
| Scalability | Support multiple auth methods |
| Usability | Developer-friendly SDK integration |

1.2 Threat Model

  • Request Forgery
  • Replay Attack
  • Man-in-the-Middle (MITM)
  • API Key Leakage
  • Brute Force

2. Authentication Scheme Comparison

2.1 Evaluation

| Scheme | Security | Performance | Complexity | Leak Risk |
|---|---|---|---|---|
| HMAC-SHA256 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Medium | 🔴 Secret on server |
| Ed25519 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Medium | 🟢 Public key only |
| JWT Token | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Low | 🔴 Token replayable |
| OAuth 2.0 | ⭐⭐⭐⭐ | ⭐⭐⭐ | High | 🟡 Dependency |

2.2 Decision: Ed25519

Selected Ed25519 Asymmetric Signature.

  • No Server Secret: Only public key stored.
  • Non-Repudiation: Only private key holder can sign.
  • High Security: 128-bit security level (256-bit key).
  • Fast Verification: ~100μs.

3. Ed25519 Signature Design

3.1 Key Pair

  • Private Key: 32 bytes, stored on Client, NEVER transmitted.
  • Public Key: 32 bytes, stored on Server.
  • Signature: 64 bytes.

3.2 Request Signature Format

payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)

Header Format:

Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>
| Field | Description | Encoding |
|---|---|---|
| api_key | AK_ + 16 HEX (19 chars) | plain |
| ts_nonce | Monotonic timestamp (ms) | numeric |
| signature | 64-byte signature | Base62 |

ts_nonce: Must be strictly monotonically increasing. new_ts = max(now_ms, last_ts + 1).
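The monotonic rule above can be sketched as a tiny nonce generator: even if the clock stalls or steps backwards, each new nonce is strictly greater than the last (the `NonceGen` class name is illustrative):

```python
import time

class NonceGen:
    """Strictly monotonic ts_nonce: new_ts = max(now_ms, last_ts + 1)."""
    def __init__(self):
        self.last_ts = 0

    def next(self, now_ms=None):
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        # Guarantees strict increase even if the clock stalls or goes back.
        self.last_ts = max(now_ms, self.last_ts + 1)
        return self.last_ts

g = NonceGen()
a = g.next(1000)  # clock at 1000 -> 1000
b = g.next(1000)  # clock stalled -> 1001
c = g.next(999)   # clock went backwards -> 1002
```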


4. Database Design

4.1 api_keys_tb Table

CREATE TABLE api_keys_tb (
    key_id         SERIAL PRIMARY KEY,
    user_id        BIGINT NOT NULL REFERENCES users_tb(user_id),
    api_key        VARCHAR(35) UNIQUE NOT NULL,
    key_type       SMALLINT NOT NULL DEFAULT 1,  -- 1=Ed25519
    key_data       BYTEA NOT NULL,               -- Public Key (32 bytes)
    permissions    INT NOT NULL DEFAULT 1,
    status         SMALLINT NOT NULL DEFAULT 1,
    ...
);

4.2 Key Types

| key_type | Algorithm | key_data |
|---|---|---|
| 1 | Ed25519 | Public key (32 bytes) |
| 2 | HMAC-SHA256 | SHA256(secret) |
| 3 | RSA | PEM public key |

5. Code Architecture

5.1 Module Structure

src/api_auth/
├── mod.rs
├── api_key.rs          # Model + Repository
├── signature.rs        # Ed25519 verification
├── middleware.rs       # Axum Middleware
└── error.rs            # Auth Errors

5.2 Request Flow

  1. Extract Headers.
  2. Verify Timestamp window.
  3. Query ApiKey (Cache/DB).
  4. Verify Ed25519 Signature.
  5. Check Permissions.
  6. Inject user_id into context.

6. Route Protection

6.1 Public Endpoints (No Auth)

  • GET /api/v1/public/exchange_info
  • GET /api/v1/public/depth
  • GET /api/v1/public/klines
  • GET /api/v1/public/ticker

6.2 Private Endpoints (Auth Required)

  • GET /api/v1/private/account
  • POST /api/v1/private/order (Trade Perm)
  • POST /api/v1/private/withdraw (Withdraw Perm)

7. Performance

  • Signature Verification: < 50μs (Ed25519).
  • DB Query: < 1ms (Cached).
  • Total Latency Overhead: < 2ms.

8. SDK Example (Python)

from nacl.signing import SigningKey
import time

api_key = "AK_..."
private_key = bytes.fromhex("...")
signing_key = SigningKey(private_key)

def sign_request(method, path, body=""):
    ts_nonce = str(int(time.time() * 1000))
    payload = f"{api_key}{ts_nonce}{method}{path}{body}"
    signature = signing_key.sign(payload.encode()).signature
    sig_b62 = base62_encode(signature)
    return f"v1.{api_key}.{ts_nonce}.{sig_b62}"
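The `base62_encode` helper used above is not shown in the source. A minimal big-integer implementation might look like the following; the alphabet order is an assumption and must match whatever the server's decoder expects:

```python
# 0-9, A-Z, a-z: the conventional base-62 alphabet (ordering is an assumption).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(data: bytes) -> str:
    # Interpret the signature bytes as one big-endian integer,
    # then emit base-62 digits most-significant first.
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))
```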



🇨🇳 中文

📅 Status: ✅ Implemented. Code Changes: View Diff

Implementation Summary

| Metric | Result |
|---|---|
| Auth module | 8 files |
| Unit tests | 35/35 ✅ |
| Total tests | 188/188 ✅ |

1. Overview

Implement secure request authentication for the Gateway API to protect trading endpoints from unauthorized access.

1.1 Design Goals

Secure, high-performance, scalable, and easy to use.

1.2 Threat Model

Request forgery, replay attacks, man-in-the-middle attacks, key leakage, and more.


2. Authentication Scheme Comparison

2.2 Decision

Selected Ed25519 asymmetric signatures:

  • No server secret: only the public key is stored.
  • Non-repudiation.
  • High security.
  • Fast verification (~100μs).

3. Ed25519 Signature Design

3.1 Key Pair

The private key stays on the client; the server stores only the public key.

3.2 Request Signature Format

payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)

Header: Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>


4. Database Design

4.1 api_keys_tb Table

Supports key_type (1=Ed25519, 2=HMAC, 3=RSA). key_data stores the public key or the secret hash.


5. Code Architecture

src/api_auth/ contains the api_key, signature, middleware, and related modules.


6. Route Protection

  • Public: market-data endpoints, no authentication required.
  • Private: trading/account endpoints, signature authentication required.

7. Performance

Ed25519's fast verification (< 50μs) plus in-memory caching keeps total latency overhead < 2ms.


8. SDK Example (Python)

Python/curl sample code shows how to produce a spec-compliant Authorization header.

0x0B Funding & Transfer: Fund System

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📅 Status: 📝 Draft Branch: 0x0B-funding-transfer Date: 2025-12-23


1. Overview

1.1 Objectives

Build a complete fund management system supporting:

  • Deposit: External funds entering the exchange.
  • Withdraw: Funds leaving the exchange.
  • Transfer: Internal fund movement between accounts.

1.2 Design Principles

| Principle | Description |
|---|---|
| Integrity | Complete audit log for every change |
| Double entry | Debits = credits; funds conserved |
| Async | Deposits/withdrawals async, transfers sync |
| Idempotency | No duplicate execution |
| Auditability | All actions traceable |

2. Account Model

2.1 Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                         Account Architecture                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────┐    ┌───────────────────────────┐       │
│   │    Funding Account        │    │     Spot Account          │       │
│   │    (account_type = 2)     │    │     (account_type = 1)    │       │
│   ├───────────────────────────┤    ├───────────────────────────┤       │
│   │  Storage: PostgreSQL      │    │  Storage: UBSCore (RAM)   │       │
│   │  Table: balances_tb       │    │  HashMap in memory        │       │
│   │                           │    │                           │       │
│   │  Purpose:                 │    │  Purpose:                 │       │
│   │  - Deposit (充值)          │    │  - Trading (撮合)          │       │
│   │  - Withdraw (提现)         │    │  - Order matching         │       │
│   │  - Internal Transfer      │    │  - Real-time balance      │       │
│   └─────────────┬─────────────┘    └─────────────┬─────────────┘       │
│                 │                                │                     │
│                 └──────── Transfer (划转) ───────┘                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

2.2 Storage Summary

| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |

Note: balances_tb is currently used for Funding account only. Spot balances are managed in-memory by UBSCore and persisted to TDengine as events.

2.3 Schema (PostgreSQL)

Current Implementation: Single balances_tb for all user balances.

-- 001_init_schema.sql
CREATE TABLE balances_tb (
    balance_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
    asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
    available DECIMAL(30, 8) NOT NULL DEFAULT 0,
    frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
    version INT NOT NULL DEFAULT 1,
    UNIQUE (user_id, asset_id)
);

Note: Current design uses single balance per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add account_type column.


3. Deposit Flow

  1. User gets address.
  2. User transfers funds to exchange address.
  3. Indexer monitors chain.
  4. Wait for Confirmations.
  5. Credit to Funding Account.

3.1 Deposit Table

CREATE TYPE deposit_status AS ENUM ('pending', 'confirming', 'completed', 'failed');

CREATE TABLE deposits_tb (
    deposit_id      BIGSERIAL PRIMARY KEY,
    user_id         BIGINT NOT NULL REFERENCES users_tb(user_id),
    asset_id        INTEGER NOT NULL REFERENCES assets_tb(asset_id),
    amount          BIGINT NOT NULL,
    tx_hash         VARCHAR(128) UNIQUE,
    status          deposit_status NOT NULL DEFAULT 'pending',
    ...
);

4. Withdrawal Flow

User Request -> Review -> Sign -> Broadcast -> Complete.

4.1 Withdrawal Table

CREATE TYPE withdraw_status AS ENUM ('pending', 'risk_review', 'processing', 'completed', ...);

CREATE TABLE withdrawals_tb (
    withdrawal_id   BIGSERIAL PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    amount          BIGINT NOT NULL,
    fee             BIGINT NOT NULL,
    net_amount      BIGINT NOT NULL,
    status          withdraw_status NOT NULL DEFAULT 'pending',
    ...
);

4.2 Risk Rules

  • Small Amount: Auto-approve (< 500 USDT).
  • Large Amount: Manual Review (>= 10000 USDT).
  • New Address: 24h Delay.

5. Transfer

5.1 Types

  • funding → spot: Available for trading.
  • spot → funding: Available for withdrawal.
  • user → user: Internal transfer.

5.2 API Design

POST /api/v1/private/transfer

{
    "from_account": "funding",
    "to_account": "spot",
    "asset": "USDT",
    "amount": "100.00"
}

6. Ledger

Complete record of all fund movements.

CREATE TYPE ledger_type AS ENUM ('deposit', 'withdraw', 'transfer_in', 'trade_buy', ...);

CREATE TABLE ledger_tb (
    ledger_id       BIGSERIAL PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    ledger_type     ledger_type NOT NULL,
    amount          BIGINT NOT NULL,
    balance_after   BIGINT NOT NULL,
    ref_id          BIGINT,
    ...
);
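Under the double-entry principle from section 1.2, every transfer writes two balanced ledger rows: a debit and a credit whose amounts sum to zero. A minimal in-memory sketch (field names mirror `ledger_tb`, but the helper itself is illustrative, and `transfer_out` is assumed to be among the elided `ledger_type` variants):

```python
ledger = []

def record_transfer(ref_id, from_user, to_user, amount, balances):
    """Append a balanced debit/credit pair; debits + credits must equal 0."""
    balances[from_user] -= amount
    balances[to_user] += amount
    # Debit leg (negative amount) for the sender.
    ledger.append({"ref_id": ref_id, "user_id": from_user,
                   "ledger_type": "transfer_out",  # assumed enum variant
                   "amount": -amount, "balance_after": balances[from_user]})
    # Credit leg (positive amount) for the receiver.
    ledger.append({"ref_id": ref_id, "user_id": to_user,
                   "ledger_type": "transfer_in",
                   "amount": amount, "balance_after": balances[to_user]})

balances = {1: 1000, 2: 0}
record_transfer(ref_id=42, from_user=1, to_user=2, amount=300, balances=balances)
# Conservation check: all entries for one ref_id sum to zero.
assert sum(e["amount"] for e in ledger if e["ref_id"] == 42) == 0
```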

7. Implementation Plan

  • Phase 1: DB: Migrations for sub_accounts, funding, ledger.
  • Phase 2: Transfer: Model + API (Sync).
  • Phase 3: Deposit: Model + Address logic.
  • Phase 4: Withdraw: Model + Risk logic.

8. Design Decisions

| Decision | Choice | Reason |
|---|---|---|
| Account model | Sub-accounts | Isolate trading risks |
| Storage | PostgreSQL | ACID requirement |
| Transfer | Synchronous | User experience |
| Deposit | Asynchronous | Chain dependency |



🇨🇳 中文

📅 Status: 📝 Draft. Branch: 0x0B-funding-transfer


1. Overview

Build a complete fund-management system supporting deposits, withdrawals, and transfers.

1.2 Design Principles

Ledger integrity, double-entry accounting, async processing, idempotency, auditability.


2. Account Model

2.1 Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                         Account Architecture                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────┐    ┌───────────────────────────┐       │
│   │    Funding Account        │    │     Spot Account          │       │
│   │    (account_type = 2)     │    │     (account_type = 1)    │       │
│   ├───────────────────────────┤    ├───────────────────────────┤       │
│   │  Storage: PostgreSQL      │    │  Storage: UBSCore (RAM)   │       │
│   │  Table: balances_tb       │    │  HashMap in memory        │       │
│   │                           │    │                           │       │
│   │  Purpose:                 │    │  Purpose:                 │       │
│   │  - Deposit                │    │  - Trading                │       │
│   │  - Withdraw               │    │  - Order matching         │       │
│   │  - Internal Transfer      │    │  - Real-time balance      │       │
│   └─────────────┬─────────────┘    └─────────────┬─────────────┘       │
│                 │                                │                     │
│                 └──────────── Transfer ──────────┘                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

2.2 Storage Summary

| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |

Note: balances_tb is currently used for the Funding account only. Spot balances are managed in memory by UBSCore, with events persisted to TDengine.

2.3 Schema (PostgreSQL)

Current implementation: balances_tb holds Funding account balances.

-- 001_init_schema.sql
CREATE TABLE balances_tb (
    balance_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    asset_id INT NOT NULL,
    available DECIMAL(30, 8) NOT NULL DEFAULT 0,
    frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
    UNIQUE (user_id, asset_id)
);

Note: the current design keeps one balance row per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add an account_type column.


3. Deposit Flow

Monitor on-chain transactions -> wait for confirmations -> credit the Funding account.

3.3 Confirmation Rules

BTC: 3 confirmations (~30 min); ETH: 12 confirmations (~3 min).


4. Withdrawal Flow

User request -> risk review -> sign & broadcast -> complete.

4.3 Risk Rules

Small amounts auto-approved, large amounts manually reviewed, new addresses delayed.


5. Transfer

5.1 Transfer Types

Supports funding <-> spot transfers and internal user-to-user transfers.

5.3 API Design

POST /api/v1/private/transfer, requires Ed25519 signature authentication.


6. Ledger

Records every fund movement (deposit, withdraw, trade, fee, etc.) for full traceability.


7. Implementation Plan

  • Phase 1: database migrations
  • Phase 2: Transfer (priority)
  • Phase 3: Deposit (P2)
  • Phase 4: Withdraw (P2)

8. Design Decisions

| Decision | Choice | Reason |
|---|---|---|
| Account model | Sub-accounts | Isolate trading and deposit/withdraw funds |
| Deposit/withdraw storage | PostgreSQL | Transactional ACID required |
| Transfer | Synchronous | Low latency, better UX |
| Deposit/withdraw | Asynchronous | Depends on on-chain confirmations |

0x0B-a Internal Transfer Architecture (Strict FSM)

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff


1. Problem Statement

1.1 System Topology

| System | Role | Source of Truth | Persistence |
|---|---|---|---|
| PostgreSQL | Funding account | balances_tb | ACID, durable |
| UBSCore | Trading account | RAM | WAL + volatile |

1.2 The Core Constraint

These two systems cannot share a transaction. There is no XA/2PC database protocol. Therefore: We must build our own 2-Phase Commit using an external FSM Coordinator.


1.5 Security Pre-Validation (MANDATORY)

Caution

Defense-in-Depth All checks below MUST be performed at every independent module, not just API layer.

  • API Layer: First line of defense, reject obviously invalid requests
  • Coordinator: Re-validate, prevent internal calls bypassing API
  • Adapters: Final defense, each adapter must independently validate parameters
  • UBSCore: Last check before in-memory operations

Safety > Performance. The cost of redundant checks is acceptable; security vulnerabilities are not.

1.5.1 Identity & Authorization Checks

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| User authentication | Forged request | JWT/Session must be valid | UNAUTHORIZED |
| User ID consistency | Cross-user transfer attack | request.user_id == auth.user_id | FORBIDDEN |
| Account ownership | Steal others' funds | Source/target accounts belong to the same user_id | FORBIDDEN |

1.5.2 Account Type Checks

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| from != to | Infinite wash trading / resource waste | request.from != request.to | SAME_ACCOUNT |
| Account type valid | Inject invalid type | from, to ∈ {FUNDING, SPOT} | INVALID_ACCOUNT_TYPE |
| Account type supported | Request unlaunched feature | from, to both in supported list | UNSUPPORTED_ACCOUNT_TYPE |

1.5.3 Amount Checks

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| amount > 0 | Zero/negative transfer | amount > 0 | INVALID_AMOUNT |
| Precision check | Precision overflow | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| Minimum amount | Dust attack | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| Maximum single amount | Risk-control bypass | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| Integer overflow | u64 overflow attack | amount <= u64::MAX / safety_factor | OVERFLOW |

1.5.4 Asset Checks

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Asset exists | Fake asset_id | asset_id exists in system | INVALID_ASSET |
| Asset status | Delisted asset | asset.status == ACTIVE | ASSET_SUSPENDED |
| Transfer permission | Some assets forbid internal transfer | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |

1.5.5 Account Status Checks

Account Initialization Rules (Overview)

| Account Type | Init Timing | Notes |
|---|---|---|
| FUNDING | Created on first deposit request | Triggered by external deposit flow |
| SPOT | Created on first internal transfer | Lazy init |
| FUTURE | Created on first internal transfer [P2] | Lazy init |
| MARGIN | Created on first internal transfer [P2] | Lazy init |

Note

  • Specific initialization behaviors and business rules for each account type are defined in their dedicated documents.
  • Each account has its own state definitions (e.g., whether transfer is allowed); not detailed here.
  • Default State: On account initialization, transfer is allowed by default.

Account Status Check Table

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Source account exists | Non-existent account | Source account record must exist | SOURCE_ACCOUNT_NOT_FOUND |
| Target account exists/create | Non-existent target | FUNDING must exist; SPOT/FUTURE/MARGIN can be created | TARGET_ACCOUNT_NOT_FOUND (FUNDING only) |
| Source not frozen | Frozen-account transfer out | source.status != FROZEN | ACCOUNT_FROZEN |
| Source not disabled | Disabled-account operation | source.status != DISABLED | ACCOUNT_DISABLED |
| Sufficient balance | Insufficient balance, direct reject | source.available >= amount | INSUFFICIENT_BALANCE |

1.5.6 Rate Limiting - [P2 Future Optimization]

Note

This is a V2 optimization. V1 may skip this.

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Requests per second | DoS attack | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| Daily transfer count | Abuse | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| Daily transfer amount | Large-amount risk control | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |

1.5.7 Idempotency Check

| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| cid unique | Duplicate submission | If cid provided, check whether it exists | DUPLICATE_REQUEST (return original result) |
Recommended validation order:

1. Authentication (JWT valid?)
2. Authorization (user_id match?)
3. Request Format (from/to/amount valid?)
4. Account Type (from != to, type supported?)
5. Asset Check (exists? enabled? transferable?)
6. Amount Check (range? precision? overflow?)
7. Rate Limiting (exceeded?)
8. Idempotency (duplicate?)
9. Balance Check (sufficient?) ← checked last to avoid unnecessary storage queries
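The ordering above can be expressed as a short-circuiting chain of checks that returns the first failing error code, with the balance query deliberately last. A sketch under simplified types (the error codes mirror the tables above; the request/asset/balance shapes are illustrative and cover only a subset of the checks):

```python
def validate_transfer(req, auth_user_id, assets, balances):
    """Return the first failing error code, or None if pre-validation passes."""
    # 2. Authorization: request must be for the authenticated user.
    if req.get("user_id") != auth_user_id:
        return "FORBIDDEN"
    # 4. Account type checks.
    if req["from"] == req["to"]:
        return "SAME_ACCOUNT"
    if req["from"] not in ("FUNDING", "SPOT") or req["to"] not in ("FUNDING", "SPOT"):
        return "INVALID_ACCOUNT_TYPE"
    # 5. Asset check.
    if req["asset_id"] not in assets:
        return "INVALID_ASSET"
    # 6. Amount check.
    if req["amount"] <= 0:
        return "INVALID_AMOUNT"
    # 9. Balance check last: skips a storage query for obviously bad requests.
    if balances.get((auth_user_id, req["from"], req["asset_id"]), 0) < req["amount"]:
        return "INSUFFICIENT_BALANCE"
    return None

req = {"user_id": 7, "from": "FUNDING", "to": "SPOT", "asset_id": 1, "amount": 50}
assets = {1: {"status": "ACTIVE"}}
balances = {(7, "FUNDING", 1): 100}
err = validate_transfer(req, 7, assets, balances)
```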

2. FSM Design (The State Machine)

2.0 Library Choice: rust-fsm

We use the rust-fsm library, providing:

  • Compile-time validation - Illegal state transitions cause compile errors.
  • Declarative DSL - Clearly defined states and transitions.
  • Type Safety - Prevents missing match arms.

Cargo.toml:

[dependencies]
rust-fsm = "0.7"

DSL Definition:

#![allow(unused)]
fn main() {
use rust_fsm::*;

state_machine! {
    derive(Debug, Clone, Copy, PartialEq, Eq)
    
    TransferFsm(Init)  // Initial State
    
    // State Definitions
    Init => {
        SourceWithdrawOk => SourceDone,
        SourceWithdrawFail => Failed,
    },
    SourceDone => {
        TargetDepositOk => Committed,
        TargetDepositFail => Compensating,
        TargetDepositUnknown => SourceDone,  // Stay put: infinite retry
    },
    Compensating => {
        RefundOk => RolledBack,
        RefundFail => Compensating,  // Stay put: infinite retry
    },
    // Terminal states (Committed, Failed, RolledBack) appear only as
    // transition targets: they have no outgoing transitions.
}
}

Note

The DSL above is used for compile-time validation of state transition validity. Actual runtime state is stored in PostgreSQL and updated via CAS.
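
The CAS update the note refers to can be sketched with an in-memory stand-in for the database row. This is illustrative only: the real store is PostgreSQL (`UPDATE transfers_tb SET state = $new WHERE req_id = $1 AND state = $old`), and all names below are hypothetical.

```rust
use std::collections::HashMap;

// Minimal in-memory stand-in for the PostgreSQL CAS update. Returning
// true mirrors "1 row updated": the expected old state was present and
// was atomically replaced.

#[derive(Debug, Clone, Copy, PartialEq)]
enum State { Init, SourcePending, SourceDone }

struct Store {
    rows: HashMap<String, State>,
}

impl Store {
    fn cas_update(&mut self, req_id: &str, old: State, new: State) -> bool {
        match self.rows.get_mut(req_id) {
            // Only transition if the row is still in the expected state.
            Some(s) if *s == old => { *s = new; true }
            _ => false,
        }
    }
}

fn main() {
    let mut db = Store { rows: HashMap::from([("r1".to_string(), State::Init)]) };
    assert!(db.cas_update("r1", State::Init, State::SourcePending));  // first worker wins
    assert!(!db.cas_update("r1", State::Init, State::SourcePending)); // second worker loses
    assert!(db.cas_update("r1", State::SourcePending, State::SourceDone));
}
```

Because two recovery workers may race on the same transfer, only the worker whose CAS succeeds proceeds to call the adapter; the loser re-reads the row and moves on.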

2.0.1 Core State Flow (Top Level)

                               ┌─────────────────────────────────────────────────────────┐
                               │              INTERNAL TRANSFER FSM                       │
                               └─────────────────────────────────────────────────────────┘

    ┌─────────────────────────────── Happy Path ────────────────────────────────────────────┐
    │                                                                                       │
    │    ┌─────────┐                    ┌─────────────┐                    ┌───────────────┐  │
    │    │  INIT   │   Source Deduct ✓  │ SOURCE_DONE │   Target Credit ✓  │               │  │
    │    │(Request)│ ─────────────────▶ │ (In-Flight) │ ─────────────────▶ │   COMMITTED   │  │
    │    └─────────┘                    └─────────────┘                    │               │  │
    │         │                               │                            └───────────────┘  │
    │         │                               │                                   ✅          │
    └─────────│───────────────────────────────│───────────────────────────────────────────────┘
              │                               │
              │                               │
              │                               ▼
              │                     ╔══════════════════════════════════════════════════╗
              │                     ║  🔒 ATOMIC COMMIT                               ║
              │                     ║                                                  ║
              │                     ║  IF AND ONLY IF:                                 ║
              │                     ║    FROM.withdraw = SUCCESS  ✓                   ║
              │                     ║    TO.deposit    = SUCCESS  ✓                   ║
              │                     ║                                                  ║
              │                     ║  EXECUTE: CAS(SOURCE_DONE → COMMITTED)           ║
              │                     ║  Must be atomic and non-interruptible.           ║
              │                     ╚══════════════════════════════════════════════════╝
              │                               │
              │ Source Deduction Fail         │ Target Credit Fail (EXPLICIT_FAIL)
              ▼                               ▼
        ┌──────────┐                   ┌──────────────┐
        │  FAILED  │                   │ COMPENSATING │◀───────────┐
        │ (Source) │                   │  (Refunding) │            │ Refund Fail (Infinite Retry)
        └──────────┘                   └──────────────┘────────────┘
             ❌                               │ Refund Success
                                              ▼
                                       ┌─────────────┐
                                       │ ROLLED_BACK │
                                       │ (Restored)  │
                                       └─────────────┘
                                             ↩️

    ╔════════════════════════════════════════════════════════════════════════════════════════╗
    ║  ⚠️ Target Unknown (TIMEOUT/UNKNOWN) → Stay SOURCE_DONE, Infinite Retry, NEVER rollback. ║
    ╚════════════════════════════════════════════════════════════════════════════════════════╝

Core State Description:

| State | Fund Location | Description |
|---|---|---|
| INIT | Source Account | User request accepted, funds haven't moved yet. |
| SOURCE_DONE | In-Flight | CRITICAL! Funds have left source, haven't reached target. |
| COMMITTED | Target Account | Terminal state, transfer succeeded. |
| FAILED | Source Account | Terminal state, source deduction failed, no funds moved. |
| COMPENSATING | In-Flight | Target credit failed, refunding to source. |
| ROLLED_BACK | Source Account | Terminal state, refund succeeded. |

Important

SOURCE_DONE is the most critical state - funds have left the source account but have not yet reached the target. At this point, the state MUST NOT be lost; it must eventually reach COMMITTED or ROLLED_BACK.

2.1 States (Exhaustive)

| ID | State Name | Entry Condition | Terminal? | Funds Location |
|---|---|---|---|---|
| 0 | INIT | User request accepted. | No | Source |
| 10 | SOURCE_PENDING | CAS success, Adapter call initiated. | No | Source (Deducting) |
| 20 | SOURCE_DONE | Source Adapter returned OK. | No | In-Flight |
| 30 | TARGET_PENDING | CAS success, Target Adapter call initiated. | No | In-Flight (Crediting) |
| 40 | COMMITTED | Target Adapter returned OK. | YES | Target |
| -10 | FAILED | Source Adapter returned FAIL. | YES | Source (Unchanged) |
| -20 | COMPENSATING | Target Adapter FAIL AND Source is Reversible. | No | In-Flight (Refunding) |
| -30 | ROLLED_BACK | Source Refund OK. | YES | Source (Restored) |
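
The table maps naturally onto a Rust enum whose discriminants match the numeric IDs stored in `transfers_tb.state`. A sketch under that assumption; the enum shape itself is illustrative, not the project's actual type.

```rust
// State IDs as stored in transfers_tb.state. Negative IDs are the
// failure/compensation branch; 40/-10/-30 are the terminal states.

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(i16)]
enum TransferState {
    Init = 0,
    SourcePending = 10,
    SourceDone = 20,
    TargetPending = 30,
    Committed = 40,
    Failed = -10,
    Compensating = -20,
    RolledBack = -30,
}

impl TransferState {
    // Only COMMITTED, FAILED and ROLLED_BACK are terminal: every other
    // state must eventually progress (or be resumed by the Recovery Worker).
    fn is_terminal(self) -> bool {
        matches!(
            self,
            TransferState::Committed | TransferState::Failed | TransferState::RolledBack
        )
    }
}

fn main() {
    assert!(TransferState::Committed.is_terminal());
    assert!(!TransferState::SourceDone.is_terminal());
    assert_eq!(TransferState::Compensating as i16, -20);
}
```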

2.2 State Transition Rules (Exhaustive)

┌───────────────────────────────────────────────────────────────────────────────┐
│                         CANONICAL STATE TRANSITIONS                           │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  INIT ──────[CAS OK]───────► SOURCE_PENDING                                   │
│    │                              │                                           │
│    │                              ├──[Adapter OK]────► SOURCE_DONE            │
│    │                              │                         │                 │
│    │                              └──[Adapter FAIL]──► FAILED (Terminal)      │
│    │                                                        │                 │
│    │                                                        │                 │
│    │                              SOURCE_DONE ──[CAS OK]──► TARGET_PENDING    │
│    │                                                             │            │
│    │                        ┌────────────────────────────────────┤            │
│    │                        │                                    │            │
│    │            [Adapter OK]│                       [Adapter FAIL]            │
│    │                        │                                    │            │
│    │                        ▼                                    ▼            │
│    │                   COMMITTED                     ┌───────────────────┐    │
│    │                   (Terminal)                    │ SOURCE REVERSIBLE?│    │
│    │                                                 └─────────┬─────────┘    │
│    │                                                   YES     │     NO       │
│    │                                                   ▼       │     ▼        │
│    │                                           COMPENSATING    │  INFINITE    │
│    │                                                 │         │   RETRY      │
│    │                                    [Refund OK]  │         │ (Stay in     │
│    │                                         ▼       │         │  TARGET_     │
│    │                                    ROLLED_BACK  │         │  PENDING)    │
│    │                                    (Terminal)   │         │              │
│    │                                                 │         │              │
│    └─────────────────────────────────────────────────┴─────────┴──────────────┘

2.3 Reversibility Rule (CRITICAL)

Core Principle: Only when an Adapter returns an explicitly defined failure can we safely rollback.

| Response Type | Meaning | Can Safely Rollback? | Handling |
|---|---|---|---|
| SUCCESS | Operation succeeded | N/A | Continue to next step |
| EXPLICIT_FAIL | Explicit business failure (e.g., insufficient balance) | YES | Can enter COMPENSATING |
| TIMEOUT | Timeout, state unknown | NO | Infinite Retry |
| PENDING | Processing, state unknown | NO | Infinite Retry |
| NETWORK_ERROR | Network error, state unknown | NO | Infinite Retry |
| UNKNOWN | Any other situation | NO | Infinite Retry or Manual Intervention |

Caution

Only EXPLICIT_FAIL allows safe rollback. Any unknown state (Timeout, Pending, Network Error) means funds are In-Flight. We cannot know whether the counterparty has processed the request. Rash rollback will cause Double Spend or Fund Loss. Only safe actions: Infinite Retry or Manual Intervention.
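
The whole rule collapses to a single predicate: compensation is allowed on exactly one response type. A minimal sketch, assuming the response enum mirrors the table above:

```rust
// The reversibility rule as one predicate: only an explicit business
// rejection proves the counterparty did NOT process the request.

#[derive(Debug, Clone, Copy, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail,
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

fn can_safely_rollback(r: AdapterResponse) -> bool {
    // Anything else (timeout, pending, network error, unknown) means
    // funds may already have moved: retry, never compensate.
    r == AdapterResponse::ExplicitFail
}

fn main() {
    assert!(can_safely_rollback(AdapterResponse::ExplicitFail));
    for r in [
        AdapterResponse::Timeout,
        AdapterResponse::Pending,
        AdapterResponse::NetworkError,
        AdapterResponse::Unknown,
    ] {
        assert!(!can_safely_rollback(r)); // unknown outcome: infinite retry
    }
}
```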


3. Transfer Scenarios (Step-by-Step)

3.1 Scenario A: Funding → Spot (Deposit to Trading)

Happy Path:

| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Funding |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Funding |
| 3 | Coordinator | Call FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | Deducted |
| 5 | Coordinator | On OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | Credit RAM, write WAL, emit event | - | - | Credited |
| 9 | Coordinator | On event: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Trading |

Failure Path (Target Fails):

| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7' | Coordinator | Call TradingAdapter.deposit(req_id) → EXPLICIT_FAIL | TARGET_PENDING | - | In-Flight |
| 8' | Coordinator | Check: source = Funding (reversible) | - | - | - |
| 9' | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10' | Coordinator | Call FundingAdapter.refund(req_id) | - | - | - |
| 11' | PG | UPDATE balances SET amount = amount + X | - | - | Refunded |
| 12' | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Funding |

3.2 Scenario B: Spot → Funding (Withdraw from Trading)

Happy Path:

| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Trading |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Trading |
| 3 | Coordinator | Call TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | Check balance, deduct RAM, write WAL, emit event | - | - | Deducted |
| 5 | Coordinator | On event: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | Credited |
| 9 | Coordinator | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |

Failure Path (Target Fails):

| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7a | Coordinator | Call FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (e.g., constraint) | TARGET_PENDING | - | In-Flight |
| 8a | Coordinator | Check response type = EXPLICIT_FAIL (can safely rollback) | - | - | - |
| 9a | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10a | Coordinator | Call TradingAdapter.refund(req_id) (refund to UBSCore) | - | - | - |
| 11a | UBSCore | Credit RAM balance, write WAL | - | - | Refunded |
| 12a | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Trading |

| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7b | Coordinator | Call FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | In-Flight |
| 8b | Coordinator | Check response type = UNKNOWN (cannot safely rollback) | - | - | - |
| 9b | Coordinator | DO NOT TRANSITION. Stay TARGET_PENDING. | TARGET_PENDING | TARGET_PENDING | In-Flight |
| 10b | Coordinator | Log CRITICAL. Alert Ops. Schedule retry. | - | - | - |
| 11b | Recovery | Retry FundingAdapter.deposit(req_id) INFINITELY. | - | - | - |
| 12b | (Eventually) | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |

Warning

Only enter COMPENSATING when Target returns EXPLICIT_FAIL. If Timeout or Unknown, funds are In-Flight. Must Infinite Retry or Manual Intervention.


4. Failure Mode and Effects Analysis (FMEA)

4.1 Phase 1 Failures (Source Operation)

| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Adapter returns FAIL | Insufficient balance, DB constraint | SOURCE_PENDING | Source | Transition to FAILED. User sees error. |
| Adapter returns PENDING | Timeout, network issue | SOURCE_PENDING | Unknown | Retry. Adapter MUST be idempotent. |
| Coordinator crashes after CAS, before call | Process kill | SOURCE_PENDING | Source | Recovery Worker retries call. |
| Coordinator crashes after call, before result | Process kill | SOURCE_PENDING | Unknown | Recovery Worker retries (idempotent). |

4.2 Phase 2 Failures (Target Operation)

| Failure | Cause | Response Type | Current State | Funds | Resolution |
|---|---|---|---|---|---|
| Target explicit reject | Business rule | EXPLICIT_FAIL | TARGET_PENDING | In-Flight | COMPENSATING → Refund. |
| Timeout | Network delay | TIMEOUT | TARGET_PENDING | Unknown | Infinite Retry. |
| Network error | Connection lost | NETWORK_ERROR | TARGET_PENDING | Unknown | Infinite Retry. |
| Unknown error | System exception | UNKNOWN | TARGET_PENDING | Unknown | Infinite Retry or Manual Intervention. |
| Coordinator crashes | Process kill | N/A | TARGET_PENDING | In-Flight | Recovery Worker retries. |

4.3 Compensation Failures

| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Refund FAIL | PG down, constraint | COMPENSATING | In-Flight | Infinite Retry. Funds stuck until PG is up. |
| Refund PENDING | Timeout | COMPENSATING | Unknown | Retry. |

5. Idempotency Requirements (MANDATORY)

5.1 Why Idempotency?

Retries are the foundation of crash recovery. Without idempotency, a retry will cause double execution (double deduction, double credit).

5.2 Implementation (Funding Adapter)

Requirement: Given the same req_id, calling withdraw() or deposit() multiple times MUST have the same effect as calling it once.

Mechanism:

  1. transfers_tb has UNIQUE(req_id).
  2. Atomic Transaction:
    BEGIN;
    -- Check if already processed
    SELECT state FROM transfers_tb WHERE req_id = $1;
    IF state >= expected_post_state THEN
        RETURN 'AlreadyProcessed';
    END IF;
    
    -- Perform balance update
    UPDATE balances_tb SET amount = amount - $2 WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
    IF NOT FOUND THEN
        RETURN 'InsufficientBalance';
    END IF;
    
    -- Update state
    UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;
    COMMIT;
    RETURN 'Success';
    

5.3 Implementation (Trading Adapter)

Requirement: Same as above. UBSCore MUST reject duplicate req_id.

Mechanism:

  1. InternalOrder includes req_id field (or cid).
  2. UBSCore maintains a ProcessedTransferSet (HashSet in RAM, rebuilt from WAL on restart).
  3. On receiving Transfer Order:
    IF req_id IN ProcessedTransferSet THEN
        RETURN 'AlreadyProcessed' (Success, no-op)
    ELSE
        ProcessTransfer()
        ProcessedTransferSet.insert(req_id)
        WriteWAL(TransferEvent)
        RETURN 'Success'
    END IF
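
The mechanism above can be sketched in Rust with a plain `HashSet` standing in for the WAL-backed set. This is an illustrative sketch only: `UbsCore`, `apply_deposit` and the string results are hypothetical names, and the WAL write is reduced to a comment.

```rust
use std::collections::HashSet;

// Sketch of UBSCore's in-RAM idempotency gate: a set of processed
// req_ids, rebuilt from the WAL on restart. A replayed req_id is a
// success no-op, so retries never credit twice.

struct UbsCore {
    processed: HashSet<String>,
    balance: i64,
}

impl UbsCore {
    fn apply_deposit(&mut self, req_id: &str, amount: i64) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // success, no-op
        }
        self.balance += amount; // process the transfer
        self.processed.insert(req_id.to_string());
        // real system: WriteWAL(TransferEvent) here
        "Success"
    }
}

fn main() {
    let mut core = UbsCore { processed: HashSet::new(), balance: 0 };
    assert_eq!(core.apply_deposit("r1", 100), "Success");
    assert_eq!(core.apply_deposit("r1", 100), "AlreadyProcessed");
    assert_eq!(core.balance, 100); // credited exactly once
}
```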
    

6. Recovery Worker (Zombie Handler)

6.1 Purpose

On Coordinator startup (or periodically), scan for “stuck” transfers and resume them.

6.2 Query

SELECT * FROM transfers_tb 
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
  AND updated_at < NOW() - INTERVAL '1 minute'; -- Stale threshold

6.3 Recovery Logic

| Current State | Action |
|---|---|
| INIT | Call step() (will transition to SOURCE_PENDING). |
| SOURCE_PENDING | Retry Source.withdraw(). |
| SOURCE_DONE | Call step() (will transition to TARGET_PENDING). |
| TARGET_PENDING | Retry Target.deposit(). Apply Reversibility Rule. |
| COMPENSATING | Retry Source.refund(). |
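
This dispatch is a single exhaustive match over the non-terminal states; the compiler then guarantees no stuck state is forgotten. A sketch with illustrative names (the action strings stand in for the real adapter calls):

```rust
// Recovery Worker dispatch: every non-terminal state maps to exactly
// one retryable action. An exhaustive match means adding a new state
// without a recovery action is a compile error.

#[derive(Debug, Clone, Copy, PartialEq)]
enum StuckState { Init, SourcePending, SourceDone, TargetPending, Compensating }

fn recovery_action(s: StuckState) -> &'static str {
    match s {
        StuckState::Init => "step() -> SOURCE_PENDING",
        StuckState::SourcePending => "retry Source.withdraw()",
        StuckState::SourceDone => "step() -> TARGET_PENDING",
        StuckState::TargetPending => "retry Target.deposit() + reversibility rule",
        StuckState::Compensating => "retry Source.refund()",
    }
}

fn main() {
    assert_eq!(recovery_action(StuckState::Compensating), "retry Source.refund()");
    assert_eq!(recovery_action(StuckState::Init), "step() -> SOURCE_PENDING");
}
```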

7. Data Model

7.1 Table: transfers_tb

CREATE TABLE transfers_tb (
    transfer_id   BIGSERIAL PRIMARY KEY,
    req_id        VARCHAR(26) UNIQUE NOT NULL,  -- Server-generated Unique ID (ULID)
    cid           VARCHAR(64) UNIQUE,           -- Client Idempotency Key (Optional)
    user_id       BIGINT NOT NULL,
    asset_id      INTEGER NOT NULL,
    amount        DECIMAL(30, 8) NOT NULL,
    transfer_type SMALLINT NOT NULL,            -- 1 = Funding->Spot, 2 = Spot->Funding
    source_type   SMALLINT NOT NULL,            -- 1 = Funding, 2 = Trading
    state         SMALLINT NOT NULL DEFAULT 0,  -- FSM State ID
    error_message TEXT,                         -- Last error (for debugging)
    retry_count   INTEGER NOT NULL DEFAULT 0,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);

7.2 Invariant Check

Run periodically to detect data corruption:

-- Sum of Funding + Trading + In-Flight should be constant per user per asset
-- In-Flight = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
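
The invariant reduces to a pure function over one (user, asset) pair: funding + trading + in-flight must equal the pre-transfer total at every stage of a transfer's life. A minimal sketch:

```rust
// Fund conservation check: at no point may money appear or disappear.
// In-flight amounts are transfers in SOURCE_DONE, TARGET_PENDING or
// COMPENSATING (funds have left one ledger but not entered the other).

fn conserved(funding: i64, trading: i64, in_flight: i64, initial_total: i64) -> bool {
    funding + trading + in_flight == initial_total
}

fn main() {
    let total = 1_000;
    assert!(conserved(1_000, 0, 0, total));   // before transfer: all in funding
    assert!(conserved(900, 0, 100, total));   // SOURCE_DONE: 100 in flight
    assert!(conserved(900, 100, 0, total));   // COMMITTED: 100 arrived
    assert!(!conserved(900, 0, 0, total));    // violation: funds vanished
}
```

Per section 10.9, a failed conservation check should raise a CRITICAL alert and halt the service.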

8. API Contract

8.1 Endpoint: POST /api/v1/internal_transfer

Request:

{
  "from": "SPOT",       // Source account type
  "to": "FUNDING",     // Target account type
  "asset": "USDT",
  "amount": "100.00"
}

Account Type Enum (AccountType):

| Value | Meaning | Status |
|---|---|---|
| FUNDING | Funding Account (PostgreSQL) | Supported |
| SPOT | Spot Trading Account (UBSCore) | Supported |
| FUTURE | Futures Account | Future Extension |
| MARGIN | Margin Account | Future Extension |

Response:

{
  "transfer_id": 12345,
  "req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U",  // Server-generated (ULID)
  "from": "SPOT",
  "to": "FUNDING",
  "state": "COMMITTED",  // or "PENDING" if async
  "message": "Transfer successful"
}

8.2 Query Endpoint: GET /api/v1/internal_transfer/:req_id

Response:

{
  "transfer_id": 12345,
  "req_id": "sr-1734912345678901234",
  "from": "SPOT",
  "to": "FUNDING",
  "asset": "USDT",
  "amount": "100.00",
  "state": "COMMITTED",
  "created_at": "2024-12-23T14:00:00Z",
  "updated_at": "2024-12-23T14:00:01Z"
}

Important

req_id is SERVER-GENERATED, not client-supplied. If the client needs idempotency, it can use the optional cid (client_order_id) field. The server checks for duplicates and returns the existing result.

Error Codes:

| Code | Meaning |
|---|---|
| INSUFFICIENT_BALANCE | Source account balance < amount. |
| INVALID_ACCOUNT_TYPE | from or to account type is invalid or unsupported. |
| SAME_ACCOUNT | from and to are the same. |
| DUPLICATE_REQUEST | cid already processed. Return original result. |
| INVALID_AMOUNT | amount <= 0 or exceeds precision. |
| SYSTEM_ERROR | Internal failure. Advise retry. |

9. Implementation Pseudocode (Critical State Checks)

9.1 API Layer

function handle_transfer_request(request, auth_context):
    // ========== Defense-in-Depth Layer 1: API Layer ==========
    
    // 1. Identity Authentication
    if !auth_context.is_valid():
        return Error(UNAUTHORIZED)
    
    // 2. User ID Consistency (Prevent cross-user attacks)
    if request.user_id != auth_context.user_id:
        return Error(FORBIDDEN, "User ID mismatch")
    
    // 3. Account Type Check
    if request.from == request.to:
        return Error(SAME_ACCOUNT)
    
    if request.from NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    
    if request.to NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    
    // 4. Amount Check
    if request.amount <= 0:
        return Error(INVALID_AMOUNT)
    
    if decimal_places(request.amount) > asset.precision:
        return Error(PRECISION_OVERFLOW)
    
    // 5. Idempotency Check
    if request.cid:
        existing = db.find_by_cid(request.cid)
        if existing:
            return Success(existing)  // Return existing result
    
    // 6. Asset Check
    asset = db.get_asset(request.asset_id)
    if !asset or asset.status != ACTIVE:
        return Error(INVALID_ASSET)
    
    // 7. Call Coordinator
    result = coordinator.create_and_execute(request)
    return result

9.2 Coordinator Layer

function create_and_execute(request):
    // ========== Defense-in-Depth Layer 2: Coordinator ==========
    
    // Re-verify (Prevent internal calls bypassing API)
    ASSERT request.from != request.to
    ASSERT request.amount > 0
    ASSERT request.user_id > 0
    
    // Generate unique ID
    req_id = ulid.new()
    
    // Create transfer record (State = INIT)
    transfer = TransferRecord {
        req_id: req_id,
        user_id: request.user_id,
        from: request.from,
        to: request.to,
        asset_id: request.asset_id,
        amount: request.amount,
        state: INIT,
        created_at: now()
    }
    
    db.insert(transfer)
    log.info("Transfer created", req_id)
    
    // Execute FSM
    return execute_fsm(req_id)

function execute_fsm(req_id):
    loop:
        transfer = db.get(req_id)
        
        if transfer.state.is_terminal():
            return transfer
        
        new_state = step(transfer)
        
        if new_state == transfer.state:
            // No progress, wait for retry
            sleep(RETRY_INTERVAL)
            continue
    
function step(transfer):
    match transfer.state:
        INIT:
            return step_init(transfer)
        SOURCE_PENDING:
            return step_source_pending(transfer)
        SOURCE_DONE:
            return step_source_done(transfer)
        TARGET_PENDING:
            return step_target_pending(transfer)
        COMPENSATING:
            return step_compensating(transfer)
        _:
            return transfer.state  // Terminal, no processing
    
function step_init(transfer):
    // CAS: Persist state BEFORE calling adapter (Persist-Before-Call)
    success = db.cas_update(
        req_id = transfer.req_id,
        old_state = INIT,
        new_state = SOURCE_PENDING
    )
    
    if !success:
        return db.get(transfer.req_id).state
    
    // Get source adapter
    source_adapter = get_adapter(transfer.from)
    
    // ========== Defense-in-Depth Layer 3: Adapter ==========
    result = source_adapter.withdraw(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
            return SOURCE_DONE
        
        EXPLICIT_FAIL(reason):
            db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
            return FAILED
        
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            log.warn("Source withdraw unknown state", transfer.req_id)
            return SOURCE_PENDING

function step_source_done(transfer):
    // ========== Enter SOURCE_DONE: Funds In-Flight, must reach terminal state ==========
    
    // CAS update to TARGET_PENDING
    success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
    if !success:
        return db.get(transfer.req_id).state
    
    // Get target adapter
    target_adapter = get_adapter(transfer.to)
    
    // ========== Defense-in-Depth Layer 4: Target Adapter ==========
    result = target_adapter.deposit(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            // ╔════════════════════════════════════════════════════════════════╗
            // ║  🔒 ATOMIC COMMIT - CRITICAL STEP!                             ║
            // ║                                                                ║
            // ║  At this point:                                                ║
            // ║    FROM.withdraw = SUCCESS ✓ (already confirmed)               ║
            // ║    TO.deposit    = SUCCESS ✓ (just confirmed)                  ║
            // ║                                                                ║
            // ║  Execute Atomic CAS Commit:                                    ║
            // ║    CAS(TARGET_PENDING → COMMITTED)                            ║
            // ║                                                                ║
            // ║  Once this CAS succeeds, the transfer is irreversible!         ║
            // ╚════════════════════════════════════════════════════════════════╝
            
            commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
            
            if !commit_success:
                return db.get(transfer.req_id).state
            
            log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
            return COMMITTED
        
        EXPLICIT_FAIL(reason):
            db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
            return COMPENSATING
        
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // ========== CRITICAL: Unknown state, MUST NOT compensate! ==========
            log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
            alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
            return TARGET_PENDING  // Stay and retry

function step_compensating(transfer):
    source_adapter = get_adapter(transfer.from)
    
    result = source_adapter.refund(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
            log.info("Transfer rolled back", transfer.req_id)
            return ROLLED_BACK
        
        _:
            log.critical("Refund failed - MUST RETRY", transfer.req_id)
            return COMPENSATING

9.3 Adapter Layer (Example: Funding Adapter)

function withdraw(req_id, user_id, asset_id, amount):
    // ========== Defense-in-Depth Layer 3: Adapter Internal Verification ==========
    
    // Re-verify parameters (Do not trust caller)
    ASSERT amount > 0
    ASSERT user_id > 0
    ASSERT asset_id > 0
    
    // Idempotency Check
    existing = db.find_transfer_operation(req_id, "WITHDRAW")
    if existing:
        return existing.result
    
    // Begin transaction
    tx = db.begin_transaction()
    try:
        // SELECT FOR UPDATE
        account = tx.select_for_update(
            "SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
        )
        
        if !account:
            tx.rollback()
            return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
        
        if account.status == FROZEN:
            tx.rollback()
            return EXPLICIT_FAIL("ACCOUNT_FROZEN")
        
        if account.available < amount:
            tx.rollback()
            return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
        
        // Execute deduction
        tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
        
        // Record operation for idempotency
        tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
        
        tx.commit()
        return SUCCESS
        
    catch Exception as e:
        tx.rollback()
        log.error("Withdraw failed", req_id, e)
        return UNKNOWN  // Uncertainty requires retry

10. Acceptance Test Plan (Security Critical)

Caution

ALL tests below must pass before going production. Any failure indicates potential fund theft, loss, or creation from thin air.

10.1 Fund Conservation Tests

| Test ID | Scenario | Expected Result | Verification |
|---|---|---|---|
| INV-001 | After normal transfer | Total funds = Before | SUM(source) + SUM(target) = Constant |
| INV-002 | After failed transfer | Total funds = Before | Source balance unchanged |
| INV-003 | After rollback | Total funds = Before | Source balance fully restored |
| INV-004 | After crash recovery | Total funds = Before | Verify all account balances |

10.2 External Attack Tests

| Test ID | Attack Vector | Steps | Expected Result |
|---|---|---|---|
| ATK-001 | Cross-user transfer | Submit transfer of user B's funds with user A's token | FORBIDDEN |
| ATK-002 | user_id Tampering | Modify user_id in request body | FORBIDDEN |
| ATK-003 | Negative Amount | amount = -100 | INVALID_AMOUNT |
| ATK-004 | Zero Amount | amount = 0 | INVALID_AMOUNT |
| ATK-005 | Precision Overflow | amount = 0.000000001 (>8 decimals) | PRECISION_OVERFLOW |
| ATK-006 | Integer Overflow | amount = u64::MAX + 1 | OVERFLOW or parse error |
| ATK-007 | Same Account | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | Invalid Account Type | from = "INVALID" | INVALID_ACCOUNT_TYPE |
| ATK-009 | Non-existent Asset | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | Duplicate cid | Submit same ID twice | Second returns first result |
| ATK-011 | No Token | Missing Authorization header | UNAUTHORIZED |
| ATK-012 | Expired Token | Use expired JWT | UNAUTHORIZED |
| ATK-013 | Forged Token | JWT with invalid signature | UNAUTHORIZED |

10.3 Balance & Status Tests

| Test ID | Scenario | Expected Result |
|---|---|---|
| BAL-001 | amount > available | INSUFFICIENT_BALANCE, no change |
| BAL-002 | amount = available | Success, balance becomes 0 |
| BAL-003 | Concurrent: total > balance | One success, one INSUFFICIENT_BALANCE |
| BAL-004 | Transfer from frozen account | ACCOUNT_FROZEN |
| BAL-005 | Transfer from disabled account | ACCOUNT_DISABLED |

10.4 FSM State Transition Tests

| Test ID | Scenario | Expected State Flow |
|---|---|---|
| FSM-001 | Normal Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | Normal Spot→Funding | Same as above |
| FSM-003 | Source Failure | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | Target Failure (Explicit) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | Target Timeout | … → TARGET_PENDING (stay, infinite retry) |
| FSM-006 | Compensation Failure | COMPENSATING (stay, infinite retry) |

10.5 Crash Recovery Tests

| Test ID | Crash Point | Expected Recovery Behavior |
|---|---|---|
| CRA-001 | After INIT, before SOURCE_PENDING | Recovery reads INIT, restarts step_init |
| CRA-002 | During SOURCE_PENDING, before call | Recovery retries withdraw (idempotent) |
| CRA-003 | During SOURCE_PENDING, after call | Recovery retries withdraw (idempotent, returns handled) |
| CRA-004 | After SOURCE_DONE, before TARGET_PENDING | Recovery executes step_source_done |
| CRA-005 | During TARGET_PENDING | Recovery retries deposit (idempotent) |
| CRA-006 | During COMPENSATING | Recovery retries refund (idempotent) |

10.6 Concurrency & Race Tests

| Test ID | Scenario | Expected Result |
|---|---|---|
| CON-001 | Multiple workers on same req_id | Only one successful CAS, others skip |
| CON-002 | Concurrent same-amount transfers | Two separate req_ids, both execute |
| CON-003 | Transfer + external withdraw | Sum cannot exceed balance |
| CON-004 | No-lock balance read | No double deduction (SELECT FOR UPDATE) |

10.7 Idempotency Tests

| Test ID | Scenario | Expected Result |
|---|---|---|
| IDP-001 | Call withdraw twice | Second returns SUCCESS, balance deducted once |
| IDP-002 | Call deposit twice | Second returns SUCCESS, balance credited once |
| IDP-003 | Call refund twice | Second returns SUCCESS, balance credited once |
| IDP-004 | Recovery retries multiple times | Final state consistent, balance correct |

10.8 Fund Anomaly Tests (Most Critical)

| Test ID | Threat | Method | Verification |
|---|---|---|---|
| FND-001 | Double Spend | Source deduct twice | Only deduct once (idempotent) |
| FND-002 | Fund Disappearance | Source success, target fail, no compensation | Must compensate or retry |
| FND-003 | Money from Nothing | Target credit twice | Only credit once (idempotent) |
| FND-004 | Lost in Transit | Crash at any point | Recovery restores integrity |
| FND-005 | State Inconsistency | SOURCE_DONE but DB not updated | WAL + idempotency parity |
| FND-006 | Partial Commit | PG transaction partial success | Atomic transaction (all or none) |

10.9 Monitoring & Alerting Tests

| Test ID | Scenario | Expected Alert |
|---|---|---|
| MON-001 | Stuck in TARGET_PENDING > 1m | CRITICAL Alert |
| MON-002 | Compensation fails 3 times | CRITICAL Alert |
| MON-003 | Fund conservation check fails | CRITICAL Alert + HALT service |
| MON-004 | Abnormal per-user frequency | WARNING Alert [P2] |

🇨🇳 中文

📦 代码变更: 查看 Diff


1. 问题陈述

1.1 系统拓扑

系统角色数据源持久化
PostgreSQL资金账户 (Funding)balances_tbACID, 持久化
UBSCore交易账户 (Trading)RAMWAL + 易失性

1.2 核心约束

这两个系统 无法共享事务。没有 XA/2PC 数据库协议。 因此:我们必须使用外部 FSM 协调器构建自己的两阶段提交。


1.5 安全前置检查 (MANDATORY)

Caution

纵深防御 (Defense-in-Depth) 以下所有检查必须在 每一个独立模块 中执行,不仅仅是 API 层。

  • API 层: 第一道防线,拒绝明显非法请求
  • Coordinator: 再次验证,防止内部调用绕过 API
  • Adapters: 最终防线,每个适配器必须独立验证参数
  • UBSCore: 内存操作前最后一次检查

安全 > 性能。重复检查的开销可以接受,安全漏洞不可接受。

1.5.1 身份与授权检查

检查项攻击向量验证逻辑错误码
用户认证伪造请求JWT/Session 必须有效UNAUTHORIZED
用户 ID 一致性跨用户转账攻击request.user_id == auth.user_idFORBIDDEN
账户归属转走他人资金源/目标账户都属于同一 user_idFORBIDDEN

1.5.2 账户类型检查

检查项攻击向量验证逻辑错误码
from != to无限刷单/浪费资源request.from != request.toSAME_ACCOUNT
账户类型有效注入无效类型from, to ∈ {FUNDING, SPOT}INVALID_ACCOUNT_TYPE
账户类型支持请求未上线功能from, to 都在支持列表中UNSUPPORTED_ACCOUNT_TYPE

1.5.3 金额检查

| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| amount > 0 | 零/负数转账 | amount > 0 | INVALID_AMOUNT |
| 精度检查 | 精度溢出 | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| 最小金额 | 微额攻击/粉尘攻击 | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| 最大单笔金额 | 风控绕过 | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| 整数溢出 | u64 溢出攻击 | amount <= u64::MAX / safety_factor | OVERFLOW |

1.5.4 资产检查

| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 资产存在 | 伪造 asset_id | asset_id 在系统中存在 | INVALID_ASSET |
| 资产状态 | 已下架资产 | asset.status == ACTIVE | ASSET_SUSPENDED |
| 转账许可 | 某些资产禁止内部转账 | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |

1.5.5 账户状态检查

账户初始化规则(概述)

| 账户类型 | 初始化时机 | 备注 |
|---|---|---|
| FUNDING | 首次申请充值时创建 | 外部充值流程触发 |
| SPOT | 首次内部转账时创建 | 懒加载 (Lazy Init) |
| FUTURE | 首次内部转账时创建 [P2] | 懒加载 |
| MARGIN | 首次内部转账时创建 [P2] | 懒加载 |

Note

  • 各账户类型的具体初始化行为和业务规则,请参见各账户类型的专用文档。
  • 每个账户都有自己的状态定义(如是否允许划转),当前不详细定义。
  • 默认状态:账户初始化时,默认允许划转。

账户状态检查表

| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 源账户存在 | 不存在的账户 | 源账户记录必须存在 | SOURCE_ACCOUNT_NOT_FOUND |
| 目标账户存在/创建 | 不存在的目标 | FUNDING 必须存在;SPOT/FUTURE/MARGIN 可创建 | TARGET_ACCOUNT_NOT_FOUND (仅 FUNDING) |
| 源账户未冻结 | 被冻结账户转出 | source.status != FROZEN | ACCOUNT_FROZEN |
| 源账户未禁用 | 被禁用账户操作 | source.status != DISABLED | ACCOUNT_DISABLED |
| 余额充足 | 余额不足直接拒绝 | source.available >= amount | INSUFFICIENT_BALANCE |

1.5.6 频率限制 (Rate Limiting) - [P2 未来优化]

Note

此部分为 V2 优化项,V1 可不实现。

| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 每秒请求数 | DoS 攻击 | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| 每日转账次数 | 滥用 | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| 每日转账金额 | 大额风控 | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |

1.5.7 幂等性检查

| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| cid 唯一 | 重复提交 | 如提供 cid,检查是否已存在 | DUPLICATE_REQUEST (返回原结果) |

1.5.8 检查顺序 (推荐)

1. 身份认证 (JWT 有效?)
2. 授权检查 (user_id 匹配?)
3. 请求格式 (from/to/amount 有效?)
4. 账户类型 (from != to, 类型支持?)
5. 资产检查 (存在? 启用? 可转账?)
6. 金额检查 (范围? 精度? 溢出?)
7. 频率限制 (超限?)
8. 幂等性 (重复?)
9. 余额检查 (充足?) ← 最后检查,避免无谓查询
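上述检查顺序可以压缩成如下 Rust 草图(仅为示意:类型、字段与错误枚举均为本文假设,省略了精度、资产与频率检查,余额也直接内嵌在请求中以便演示):

```rust
// 示意草图:按推荐顺序执行前置检查,任何一步失败立即返回。
#[derive(Debug, PartialEq)]
enum CheckError {
    Unauthorized,
    Forbidden,
    SameAccount,
    InvalidAmount,
    InsufficientBalance,
}

struct TransferRequest {
    user_id: u64,
    from: &'static str,
    to: &'static str,
    amount: u64,
    available: u64, // 为简化示例,可用余额直接内嵌在请求中
}

fn precheck(auth_user_id: u64, req: &TransferRequest) -> Result<(), CheckError> {
    // 1. 身份认证(此处以非零 ID 代表有效会话)
    if auth_user_id == 0 {
        return Err(CheckError::Unauthorized);
    }
    // 2. 授权:防止跨用户转账攻击
    if req.user_id != auth_user_id {
        return Err(CheckError::Forbidden);
    }
    // 3-4. 请求格式与账户类型
    if req.from == req.to {
        return Err(CheckError::SameAccount);
    }
    // 6. 金额检查
    if req.amount == 0 {
        return Err(CheckError::InvalidAmount);
    }
    // 9. 余额检查放在最后,避免无谓查询
    if req.available < req.amount {
        return Err(CheckError::InsufficientBalance);
    }
    Ok(())
}
```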

2. FSM 设计 (状态机)

2.0 库选择: rust-fsm

使用 rust-fsm,提供:

  • 编译时验证 - 非法状态转换在编译时报错
  • 声明式 DSL - 清晰定义状态和转换
  • 类型安全 - 防止遗漏分支

Cargo.toml:

[dependencies]
rust-fsm = "0.7"

DSL 定义:

use rust_fsm::*;

state_machine! {
    derive(Debug, Clone, Copy, PartialEq, Eq)

    TransferFsm(Init)  // 初始状态

    // 状态转换定义
    Init => {
        SourceWithdrawOk => SourceDone,
        SourceWithdrawFail => Failed,
    },
    SourceDone => {
        TargetDepositOk => Committed,
        TargetDepositFail => Compensating,
        TargetDepositUnknown => SourceDone,  // 自环:保持状态,无限重试
    },
    Compensating => {
        RefundOk => RolledBack,
        RefundFail => Compensating,  // 自环:保持状态,无限重试
    },
    // Committed / Failed / RolledBack 没有出边,即为终态,
    // 在 DSL 中无需单独声明
}

Note

上述 DSL 用于编译时验证状态转换的合法性。 实际运行时状态存储在 PostgreSQL,使用 CAS 更新。
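运行时的 CAS 状态推进语义可以用一个内存版草图示意(结构与函数名为本文假设;生产中对应带 `WHERE state = old` 条件的 `UPDATE`,受影响行数为 1 即 CAS 成功):

```rust
use std::collections::HashMap;

// 示意草图:用内存 HashMap 演示 CAS(比较并交换)状态推进的语义。
// 对应的 SQL(伪):
//   UPDATE transfers_tb SET state = $new WHERE req_id = $id AND state = $old;
// 受影响行数为 1 即 CAS 成功,为 0 即并发冲突。
struct TransferStore {
    states: HashMap<String, i16>, // req_id -> FSM 状态 ID
}

impl TransferStore {
    fn cas_update(&mut self, req_id: &str, old: i16, new: i16) -> bool {
        match self.states.get_mut(req_id) {
            Some(s) if *s == old => {
                *s = new;
                true // 恰好一条记录从 old 推进到 new
            }
            _ => false, // 状态已被其他 Worker 推进,本次 CAS 失败
        }
    }
}
```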

2.0.1 核心状态流程图 (Top Level)

                              ┌─────────────────────────────────────────────────────────┐
                              │              INTERNAL TRANSFER FSM                       │
                              └─────────────────────────────────────────────────────────┘

   ┌─────────────────────────────── 正常路径 (Happy Path) ──────────────────────────────────┐
   │                                                                                        │
   │   ┌─────────┐                    ┌─────────────┐                    ┌───────────────┐  │
   │   │  INIT   │   源扣减成功 ✓     │ SOURCE_DONE │   目标入账成功 ✓   │               │  │
   │   │(用户请求)│ ─────────────────▶ │ (资金在途)  │ ─────────────────▶ │   COMMITTED   │  │
   │   └─────────┘                    └─────────────┘                    │               │  │
   │        │                               │                            └───────────────┘  │
   │        │                               │                                   ✅          │
   └────────│───────────────────────────────│───────────────────────────────────────────────┘
            │                               │
            │                               │
            │                               ▼
            │                     ╔══════════════════════════════════════════════════╗
            │                     ║  🔒 ATOMIC COMMIT (原子提交)                     ║
            │                     ║                                                  ║
            │                     ║  当且仅当:                                       ║
            │                     ║    FROM.withdraw = SUCCESS  ✓                   ║
            │                     ║    TO.deposit    = SUCCESS  ✓                   ║
            │                     ║                                                  ║
            │                     ║  执行: CAS(SOURCE_DONE → COMMITTED)             ║
            │                     ║  此操作必须原子,不可中断                         ║
            │                     ╚══════════════════════════════════════════════════╝
            │                               │
            │ 源扣减失败                     │ 目标入账失败 (明确 EXPLICIT_FAIL)
            ▼                               ▼
      ┌──────────┐                   ┌──────────────┐
      │  FAILED  │                   │ COMPENSATING │◀───────────┐
      │ (源失败)  │                   │  (退款中)    │            │ 退款失败 (无限重试)
      └──────────┘                   └──────────────┘────────────┘
           ❌                               │ 退款成功
                                            ▼
                                     ┌─────────────┐
                                     │ ROLLED_BACK │
                                     │  (已回滚)    │
                                     └─────────────┘
                                           ↩️

   ╔════════════════════════════════════════════════════════════════════════════════════════╗
   ║  ⚠️ 目标入账状态未知 (TIMEOUT/UNKNOWN) → 保持 SOURCE_DONE,无限重试,绝不进入 COMPENSATING║
   ╚════════════════════════════════════════════════════════════════════════════════════════╝

核心状态说明:

| 状态 | 资金位置 | 说明 |
|---|---|---|
| INIT | 源账户 | 用户发起请求,资金尚未移动 |
| SOURCE_DONE | 在途 | 关键点!资金已离开源,尚未到达目标 |
| COMMITTED | 目标账户 | 终态,转账成功 |
| FAILED | 源账户 | 终态,源扣减失败,无资金移动 |
| COMPENSATING | 在途 | 目标入账失败,正在退款 |
| ROLLED_BACK | 源账户 | 终态,退款成功 |

Important

SOURCE_DONE 是最关键的状态 - 资金已离开源账户但尚未到达目标。 此时绝不能丢失状态,必须确保最终到达 COMMITTED 或 ROLLED_BACK。


2.1 状态 (穷举)

| ID | 状态名 | 进入条件 | 终态? | 资金位置 |
|---|---|---|---|---|
| 0 | INIT | 用户请求已接受 | 否 | 源账户 |
| 10 | SOURCE_PENDING | CAS 成功,适配器调用已发起 | 否 | 源账户 (扣减中) |
| 20 | SOURCE_DONE | 源适配器返回 OK | 否 | 在途 |
| 30 | TARGET_PENDING | CAS 成功,目标适配器调用已发起 | 否 | 在途 (入账中) |
| 40 | COMMITTED | 目标适配器返回 OK | 是 | 目标账户 |
| -10 | FAILED | 源适配器返回 FAIL | 是 | 源账户 (未变) |
| -20 | COMPENSATING | 目标适配器 FAIL 且源可逆 | 否 | 在途 (退款中) |
| -30 | ROLLED_BACK | 源退款 OK | 是 | 源账户 (已恢复) |

2.2 状态转换规则 (穷举)

┌───────────────────────────────────────────────────────────────────────────────┐
│                              规范状态转换                                       │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  INIT ──────[CAS成功]───────► SOURCE_PENDING                                  │
│    │                              │                                           │
│    │                              ├──[适配器OK]────► SOURCE_DONE              │
│    │                              │                         │                 │
│    │                              └──[适配器FAIL]──► FAILED (终态)            │
│    │                                                        │                 │
│    │                                                        │                 │
│    │                              SOURCE_DONE ──[CAS成功]──► TARGET_PENDING   │
│    │                                                             │            │
│    │                        ┌────────────────────────────────────┤            │
│    │                        │                                    │            │
│    │            [适配器OK]  │                       [适配器FAIL]              │
│    │                        │                                    │            │
│    │                        ▼                                    ▼            │
│    │                   COMMITTED                     ┌───────────────────┐    │
│    │                   (终态)                        │   源可逆?          │    │
│    │                                                 └─────────┬─────────┘    │
│    │                                                   是      │     否       │
│    │                                                   ▼       │     ▼        │
│    │                                           COMPENSATING    │  无限重试    │
│    │                                                 │         │ (保持在      │
│    │                                    [退款OK]     │         │  TARGET_     │
│    │                                         ▼       │         │  PENDING)    │
│    │                                    ROLLED_BACK  │         │              │
│    │                                    (终态)       │         │              │
│    │                                                 │         │              │
│    └─────────────────────────────────────────────────┴─────────┴──────────────┘

2.3 可逆性规则 (关键)

核心原则: 只有当适配器返回 明确定义的失败 时,才能安全撤销。

| 响应类型 | 含义 | 可安全撤销? | 处理方式 |
|---|---|---|---|
| SUCCESS | 操作成功 | N/A | 继续下一步 |
| EXPLICIT_FAIL | 明确业务失败 (如余额不足) | 是 | 可进入 COMPENSATING |
| TIMEOUT | 超时,状态未知 | 否 | 无限重试 |
| PENDING | 处理中,状态未知 | 否 | 无限重试 |
| NETWORK_ERROR | 网络错误,状态未知 | 否 | 无限重试 |
| UNKNOWN | 任何其他情况 | 否 | 无限重试或人工介入 |

Caution

只有 EXPLICIT_FAIL 可以安全撤销。 任何状态未知的情况(超时、Pending、网络错误),资金都处于 In-Flight 中。 我们无法知道对方是否已处理,贸然撤销将导致 双花 或 资金丢失。 唯一安全操作:无限重试 或 人工介入。
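该可逆性规则可以固化为一个判定函数草图(枚举与函数名为本文假设):

```rust
// 示意草图:只有 EXPLICIT_FAIL 允许进入 COMPENSATING。
#[derive(Debug, Clone, Copy, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail,  // 明确业务失败,如余额不足
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

/// 目标入账失败后,是否可以安全地撤销源扣减?
fn safe_to_compensate(resp: AdapterResponse) -> bool {
    // 任何"状态未知"的响应都不可撤销,只能无限重试或人工介入
    matches!(resp, AdapterResponse::ExplicitFail)
}
```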


3. 转账场景 (逐步)

3.1 场景 A: 资金 → 交易 (充值到交易账户)

正常路径:

| 步骤 | 执行者 | 操作 | 前状态 | 后状态 | 资金 |
|---|---|---|---|---|---|
| 1 | API | 验证,创建记录 | - | INIT | 资金账户 |
| 2 | 协调器 | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | 资金账户 |
| 3 | 协调器 | 调用 FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | 已扣减 |
| 5 | 协调器 | 收到 OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | 在途 |
| 6 | 协调器 | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | 在途 |
| 7 | 协调器 | 调用 TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | 增加 RAM 余额,写 WAL,发出事件 | - | - | 已入账 |
| 9 | 协调器 | 收到事件: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | 交易账户 |

失败路径 (目标失败):

| 步骤 | 执行者 | 操作 | 前状态 | 后状态 | 资金 |
|---|---|---|---|---|---|
| 7’ | 协调器 | 调用 TradingAdapter.deposit(req_id) → FAIL/超时 | TARGET_PENDING | - | 在途 |
| 8’ | 协调器 | 检查: 源 = 资金账户 (可逆) | - | - | - |
| 9’ | 协调器 | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | 在途 |
| 10’ | 协调器 | 调用 FundingAdapter.refund(req_id) | - | - | - |
| 11’ | PG | UPDATE balances SET amount = amount + X | - | - | 已退款 |
| 12’ | 协调器 | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | 资金账户 |

3.2 场景 B: 交易 → 资金 (从交易账户提现)

正常路径:

| 步骤 | 执行者 | 操作 | 前状态 | 后状态 | 资金 |
|---|---|---|---|---|---|
| 1 | API | 验证,创建记录 | - | INIT | 交易账户 |
| 2 | 协调器 | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | 交易账户 |
| 3 | 协调器 | 调用 TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | 检查余额,扣减 RAM,写 WAL,发出事件 | - | - | 已扣减 |
| 5 | 协调器 | 收到事件: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | 在途 |
| 6 | 协调器 | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | 在途 |
| 7 | 协调器 | 调用 FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | 已入账 |
| 9 | 协调器 | 收到 OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | 资金账户 |

失败路径 (目标失败):

| 步骤 | 执行者 | 操作 | 前状态 | 后状态 | 资金 |
|---|---|---|---|---|---|
| 7a | 协调器 | 调用 FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (如约束违反) | TARGET_PENDING | - | 在途 |
| 8a | 协调器 | 检查响应类型 = EXPLICIT_FAIL (可安全撤销) | - | - | - |
| 9a | 协调器 | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | 在途 |
| 10a | 协调器 | 调用 TradingAdapter.refund(req_id) (向 UBSCore 退款) | - | - | - |
| 11a | UBSCore | 增加 RAM 余额,写 WAL | - | - | 已退款 |
| 12a | 协调器 | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | 交易账户 |

| 步骤 | 执行者 | 操作 | 前状态 | 后状态 | 资金 |
|---|---|---|---|---|---|
| 7b | 协调器 | 调用 FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | 在途 |
| 8b | 协调器 | 检查响应类型 = UNKNOWN (不可安全撤销) | - | - | - |
| 9b | 协调器 | 不转换状态。保持 TARGET_PENDING | TARGET_PENDING | TARGET_PENDING | 在途 |
| 10b | 协调器 | 记录 CRITICAL 日志。告警运维。安排重试。 | - | - | - |
| 11b | 恢复器 | 无限重试 FundingAdapter.deposit(req_id) | - | - | - |
| 12b | (最终) | 收到 OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | 资金账户 |

Warning

只有当目标返回 EXPLICIT_FAIL 时才能进入 COMPENSATING 如果是超时或未知状态,资金处于 In-Flight,必须无限重试或人工介入。


4. 失效模式与影响分析 (FMEA)

4.1 阶段1失败 (源操作)

| 失败 | 原因 | 当前状态 | 资金 | 解决方案 |
|---|---|---|---|---|
| 适配器返回 FAIL | 余额不足,DB 约束 | SOURCE_PENDING | 源账户 | 转到 FAILED。用户看到错误。 |
| 适配器返回 PENDING | 超时,网络问题 | SOURCE_PENDING | 未知 | 重试。适配器必须幂等。 |
| 协调器在 CAS 后、调用前崩溃 | 进程终止 | SOURCE_PENDING | 源账户 | 恢复工作器重试调用。 |
| 协调器在调用后、结果前崩溃 | 进程终止 | SOURCE_PENDING | 未知 | 恢复工作器重试(幂等)。 |

4.2 阶段2失败 (目标操作)

| 失败 | 原因 | 响应类型 | 当前状态 | 资金 | 解决方案 |
|---|---|---|---|---|---|
| 目标明确拒绝 | 业务规则 | EXPLICIT_FAIL | TARGET_PENDING | 在途 | 转 COMPENSATING → 退款。 |
| 超时 | 网络延迟 | TIMEOUT | TARGET_PENDING | 未知 | 无限重试。 |
| 网络错误 | 连接断开 | NETWORK_ERROR | TARGET_PENDING | 未知 | 无限重试。 |
| 未知错误 | 系统异常 | UNKNOWN | TARGET_PENDING | 未知 | 无限重试 或 人工介入。 |
| 协调器崩溃 | 进程终止 | N/A | TARGET_PENDING | 在途 | 恢复工作器重试。 |

4.3 补偿失败

| 失败 | 原因 | 当前状态 | 资金 | 解决方案 |
|---|---|---|---|---|
| 退款 FAIL | PG 宕机,约束 | COMPENSATING | 在途 | 无限重试。资金卡住直到 PG 恢复。 |
| 退款 PENDING | 超时 | COMPENSATING | 未知 | 重试。 |

5. 幂等性要求 (强制)

5.1 为什么需要幂等性?

重试是崩溃恢复的基础。没有幂等性,重试将导致 双重执行(双重扣减、双重入账)。

5.2 实现 (资金适配器)

要求: 给定相同的 req_id,多次调用 withdraw()deposit() 必须与调用一次效果相同。

机制:

  1. transfers_tb 上有 UNIQUE(req_id) 约束
  2. 原子事务:
    BEGIN;
    -- 检查是否已处理
    SELECT state FROM transfers_tb WHERE req_id = $1;
    IF state >= expected_post_state THEN
        RETURN 'AlreadyProcessed';
    END IF;
    
    -- 执行余额更新
    UPDATE balances_tb SET amount = amount - $2 WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
    IF NOT FOUND THEN
        RETURN 'InsufficientBalance';
    END IF;
    
    -- 更新状态
    UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;
    COMMIT;
    RETURN 'Success';
    

5.3 实现 (交易适配器)

要求: 同上。UBSCore 必须拒绝重复的 req_id

机制:

  1. InternalOrder 包含 req_id 字段(或 cid)。
  2. UBSCore 维护一个 ProcessedTransferSet(RAM中的HashSet,重启时从WAL重建)。
  3. 收到转账订单时:
    IF req_id IN ProcessedTransferSet THEN
        RETURN 'AlreadyProcessed' (成功,无操作)
    ELSE
        ProcessTransfer()
        ProcessedTransferSet.insert(req_id)
        WriteWAL(TransferEvent)
        RETURN 'Success'
    END IF
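上述幂等入账逻辑的内存版草图如下(结构与字段名为本文假设,省略了 WAL 写入):

```rust
use std::collections::{HashMap, HashSet};

// 示意草图:UBSCore 侧幂等入账。重复 req_id 直接返回成功且不再加钱。
struct UbsCore {
    balances: HashMap<u64, u64>, // user_id -> 余额
    processed: HashSet<String>,  // ProcessedTransferSet,重启时从 WAL 重建
}

impl UbsCore {
    fn deposit(&mut self, req_id: &str, user_id: u64, amount: u64) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // 成功,无操作
        }
        *self.balances.entry(user_id).or_insert(0) += amount;
        self.processed.insert(req_id.to_string());
        // WriteWAL(TransferEvent) 此处省略
        "Success"
    }
}
```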
    

6. 恢复工作器 (僵尸处理器)

6.1 目的

在协调器启动时(或定期),扫描“卡住”的转账并恢复它们。

6.2 查询

SELECT * FROM transfers_tb 
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
  AND updated_at < NOW() - INTERVAL '1 minute'; -- 过期阈值

6.3 恢复逻辑

| 当前状态 | 操作 |
|---|---|
| INIT | 调用 step()(将转到 SOURCE_PENDING)。 |
| SOURCE_PENDING | 重试 Source.withdraw()。 |
| SOURCE_DONE | 调用 step()(将转到 TARGET_PENDING)。 |
| TARGET_PENDING | 重试 Target.deposit()。应用可逆性规则。 |
| COMPENSATING | 重试 Source.refund()。 |
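上述恢复逻辑可以压缩成一个按状态 ID 分派的纯函数草图(函数名与返回的动作字符串均为本文假设,状态 ID 与 7.1 表一致):

```rust
// 示意草图:恢复工作器按当前状态分派动作。
fn recovery_action(state: i16) -> &'static str {
    match state {
        0 => "step_init",               // INIT:重新执行 step()
        10 => "retry_source_withdraw",  // SOURCE_PENDING:重试源扣减(幂等)
        20 => "step_source_done",       // SOURCE_DONE:推进到 TARGET_PENDING
        30 => "retry_target_deposit",   // TARGET_PENDING:重试入账,应用可逆性规则
        -20 => "retry_source_refund",   // COMPENSATING:重试退款
        _ => "terminal_noop",           // 40 / -10 / -30 为终态,不处理
    }
}
```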

7. 数据模型

7.1 表: transfers_tb

CREATE TABLE transfers_tb (
    transfer_id   BIGSERIAL PRIMARY KEY,
    req_id        VARCHAR(26) UNIQUE NOT NULL,  -- 服务端生成的唯一 ID (ULID)
    cid           VARCHAR(64) UNIQUE,           -- 客户端幂等键 (可选)
    user_id       BIGINT NOT NULL,
    asset_id      INTEGER NOT NULL,
    amount        DECIMAL(30, 8) NOT NULL,
    transfer_type SMALLINT NOT NULL,            -- 1 = 资金->交易, 2 = 交易->资金
    source_type   SMALLINT NOT NULL,            -- 1 = 资金, 2 = 交易
    state         SMALLINT NOT NULL DEFAULT 0,  -- FSM 状态 ID
    error_message TEXT,                         -- 最后错误(用于调试)
    retry_count   INTEGER NOT NULL DEFAULT 0,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);

7.2 不变量检查

定期运行以检测数据损坏:

-- 每个用户每个资产的 资金 + 交易 + 在途 之和应该是常数
-- 在途 = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
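不变量检查的核心算式可以写成如下草图(函数名为本文假设;生产中应针对每个 (user_id, asset_id) 在 SQL 中聚合计算):

```rust
// 示意草图:资金守恒不变量。对每个 (user, asset):
//   funding + trading + in_flight == 期望常数
// 其中 in_flight 为状态在 {SOURCE_DONE, TARGET_PENDING, COMPENSATING}
// 的转账金额之和。
fn conservation_holds(funding: u64, trading: u64, in_flight: u64, expected_total: u64) -> bool {
    funding + trading + in_flight == expected_total
}
```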

8. API 契约

8.1 端点: POST /api/v1/internal_transfer

请求:

{
  "from": "SPOT",       // 源账户类型
  "to": "FUNDING",     // 目标账户类型
  "asset": "USDT",
  "amount": "100.00"
}

账户类型枚举 (AccountType):

| 类型 | 含义 | 状态 |
|---|---|---|
| FUNDING | 资金账户 (PostgreSQL) | 已支持 |
| SPOT | 现货交易账户 (UBSCore) | 已支持 |
| FUTURE | 合约账户 | 未来扩展 |
| MARGIN | 杠杆账户 | 未来扩展 |

响应:

{
  "transfer_id": 12345,
  "req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U",  // 服务端生成 (ULID)
  "from": "SPOT",
  "to": "FUNDING",
  "state": "COMMITTED",  // 或 "PENDING" 如果异步
  "message": "转账成功"
}

8.2 查询端点: GET /api/v1/internal_transfer/:req_id

响应:

{
  "transfer_id": 12345,
  "req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U",
  "from": "SPOT",
  "to": "FUNDING",
  "asset": "USDT",
  "amount": "100.00",
  "state": "COMMITTED",
  "created_at": "2024-12-23T14:00:00Z",
  "updated_at": "2024-12-23T14:00:01Z"
}

Important

req_id 由服务端生成,不是客户端。 客户端如果需要幂等性,应使用 cid (client_order_id) 字段(可选),服务端会检查重复并返回已有结果。

错误码:

| 代码 | 含义 |
|---|---|
| INSUFFICIENT_BALANCE | 源账户余额 < 金额。 |
| INVALID_ACCOUNT_TYPE | from 或 to 的账户类型无效或不支持。 |
| SAME_ACCOUNT | from 与 to 相同。 |
| DUPLICATE_REQUEST | cid 已处理。返回原始结果。 |
| INVALID_AMOUNT | 金额 <= 0 或超过精度。 |
| SYSTEM_ERROR | 内部失败。建议重试。 |

9. 实现伪代码 (关键状态检查)

9.1 API 层

function handle_transfer_request(request, auth_context):
    // ========== 纵深防御 Layer 1: API 层 ==========
    
    // 1. 身份认证
    if !auth_context.is_valid():
        return Error(UNAUTHORIZED)
    
    // 2. 用户 ID 一致性(防止跨用户攻击)
    if request.user_id != auth_context.user_id:
        return Error(FORBIDDEN, "User ID mismatch")
    
    // 3. 账户类型检查
    if request.from == request.to:
        return Error(SAME_ACCOUNT)
    
    if request.from NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    
    if request.to NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    
    // 4. 金额检查
    if request.amount <= 0:
        return Error(INVALID_AMOUNT)
    
    if decimal_places(request.amount) > asset.precision:
        return Error(PRECISION_OVERFLOW)
    
    // 5. 幂等性检查
    if request.cid:
        existing = db.find_by_cid(request.cid)
        if existing:
            return Success(existing)  // 返回已存在的结果
    
    // 6. 资产检查
    asset = db.get_asset(request.asset_id)
    if !asset or asset.status != ACTIVE:
        return Error(INVALID_ASSET)
    
    // 7. 调用 Coordinator
    result = coordinator.create_and_execute(request)
    return result

9.2 Coordinator 层

function create_and_execute(request):
    // ========== 纵深防御 Layer 2: Coordinator ==========
    
    // 再次验证(防止内部调用绕过 API)
    ASSERT request.from != request.to
    ASSERT request.amount > 0
    ASSERT request.user_id > 0
    
    // 生成唯一 ID
    req_id = ulid.new()
    
    // 创建转账记录 (State = INIT)
    transfer = TransferRecord {
        req_id: req_id,
        user_id: request.user_id,
        from: request.from,
        to: request.to,
        asset_id: request.asset_id,
        amount: request.amount,
        state: INIT,
        created_at: now()
    }
    
    db.insert(transfer)
    log.info("Transfer created", req_id)
    
    // 执行 FSM
    return execute_fsm(req_id)

function execute_fsm(req_id):
    loop:
        transfer = db.get(req_id)
        
        if transfer.state.is_terminal():
            return transfer
        
        new_state = step(transfer)
        
        if new_state == transfer.state:
            // 未进展,等待重试
            sleep(RETRY_INTERVAL)
            continue
    
function step(transfer):
    match transfer.state:
        INIT:
            return step_init(transfer)
        SOURCE_PENDING:
            return step_source_pending(transfer)
        SOURCE_DONE:
            return step_source_done(transfer)
        TARGET_PENDING:
            return step_target_pending(transfer)
        COMPENSATING:
            return step_compensating(transfer)
        _:
            return transfer.state  // 终态,不处理

function step_init(transfer):
    // CAS: 先更新状态,再调用适配器(Persist-Before-Call)
    success = db.cas_update(
        req_id = transfer.req_id,
        old_state = INIT,
        new_state = SOURCE_PENDING
    )
    
    if !success:
        // 并发冲突,重新读取
        return db.get(transfer.req_id).state
    
    // 获取源适配器
    source_adapter = get_adapter(transfer.from)
    
    // ========== 纵深防御 Layer 3: Adapter ==========
    result = source_adapter.withdraw(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
            return SOURCE_DONE
        
        EXPLICIT_FAIL(reason):
            // 明确失败,可以安全终止
            db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
            return FAILED
        
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // 状态未知,保持 SOURCE_PENDING,等待重试
            log.warn("Source withdraw unknown state", transfer.req_id)
            return SOURCE_PENDING

function step_source_done(transfer):
    // ========== 进入 SOURCE_DONE: 资金已在途,必须确保最终到达终态 ==========
    
    // CAS 更新到 TARGET_PENDING
    success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
    if !success:
        return db.get(transfer.req_id).state
    
    // 获取目标适配器
    target_adapter = get_adapter(transfer.to)
    
    // ========== 纵深防御 Layer 4: Target Adapter ==========
    result = target_adapter.deposit(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            // ╔════════════════════════════════════════════════════════════════╗
            // ║  🔒 ATOMIC COMMIT - 最关键的一步!                             ║
            // ║                                                                ║
            // ║  此时:                                                         ║
            // ║    FROM.withdraw = SUCCESS ✓ (已确认)                         ║
            // ║    TO.deposit    = SUCCESS ✓ (刚确认)                         ║
            // ║                                                                ║
            // ║  执行原子 CAS 提交:                                            ║
            // ║    CAS(TARGET_PENDING → COMMITTED)                            ║
            // ║                                                                ║
            // ║  此 CAS 是最终确认,一旦成功,转账不可逆转!                    ║
            // ╚════════════════════════════════════════════════════════════════╝
            
            commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
            
            if !commit_success:
                // 极少发生:另一个 Worker 已经提交,返回当前状态
                return db.get(transfer.req_id).state
            
            log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
            return COMMITTED
        
        EXPLICIT_FAIL(reason):
            // 明确失败,可以进入补偿
            db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
            return COMPENSATING
        
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // ========== 关键:状态未知,不能补偿!==========
            log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
            alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
            return TARGET_PENDING  // 保持状态,等待重试


function step_compensating(transfer):
    source_adapter = get_adapter(transfer.from)
    
    result = source_adapter.refund(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
            log.info("Transfer rolled back", transfer.req_id)
            return ROLLED_BACK
        
        _:
            // 退款失败,必须无限重试
            log.critical("Refund failed - MUST RETRY", transfer.req_id)
            return COMPENSATING

9.3 Adapter 层 (示例: Funding Adapter)

function withdraw(req_id, user_id, asset_id, amount):
    // ========== 纵深防御 Layer 3: Adapter 内部检查 ==========
    
    // 再次验证参数(不信任调用者)
    ASSERT amount > 0
    ASSERT user_id > 0
    ASSERT asset_id > 0
    
    // 幂等性检查
    existing = db.find_transfer_operation(req_id, "WITHDRAW")
    if existing:
        return existing.result  // 返回已处理的结果
    
    // 开始事务
    tx = db.begin_transaction()
    try:
        // 获取账户并锁定
        account = tx.select_for_update(
            "SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
        )
        
        if !account:
            tx.rollback()
            return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
        
        if account.status == FROZEN:
            tx.rollback()
            return EXPLICIT_FAIL("ACCOUNT_FROZEN")
        
        if account.available < amount:
            tx.rollback()
            return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
        
        // 执行扣减
        tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
        
        // 记录操作(用于幂等性)
        tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
        
        tx.commit()
        return SUCCESS
        
    catch Exception as e:
        tx.rollback()
        log.error("Withdraw failed", req_id, e)
        return UNKNOWN  // 不确定是否执行,必须重试

10. 验收测试计划 (安全关键)

Caution

以下测试必须全部通过才能上线。 任何失败都可能导致资金被盗、消失或无中生有。

10.1 资金守恒测试

| 测试 ID | 场景 | 预期结果 | 验证方法 |
|---|---|---|---|
| INV-001 | 正常转账后 | 总资金 = 转账前 | SUM(source) + SUM(target) = 常数 |
| INV-002 | 失败转账后 | 总资金 = 转账前 | 源账户余额无变化 |
| INV-003 | 回滚后 | 总资金 = 转账前 | 源账户余额完全恢复 |
| INV-004 | 系统崩溃恢复后 | 总资金 = 崩溃前 | 遍历所有账户验证 |

10.2 外部攻击测试

| 测试 ID | 攻击向量 | 测试步骤 | 预期结果 |
|---|---|---|---|
| ATK-001 | 跨用户转账 | 用 user_id=A 的 token 请求转 user_id=B 的资金 | FORBIDDEN |
| ATK-002 | user_id 篡改 | 修改请求体中的 user_id | FORBIDDEN |
| ATK-003 | 负数金额 | amount = -100 | INVALID_AMOUNT |
| ATK-004 | 零金额 | amount = 0 | INVALID_AMOUNT |
| ATK-005 | 超精度金额 | amount = 0.000000001 (超过 8 位) | PRECISION_OVERFLOW |
| ATK-006 | 整数溢出 | amount = u64::MAX + 1 | OVERFLOW 或解析失败 |
| ATK-007 | 相同账户 | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | 无效账户类型 | from = "INVALID" | INVALID_ACCOUNT_TYPE |
| ATK-009 | 不存在的资产 | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | 重复 cid | 同一 cid 发两次 | 第二次返回第一次结果 |
| ATK-011 | 无 Token | 不带 Authorization header | UNAUTHORIZED |
| ATK-012 | 过期 Token | 使用过期的 JWT | UNAUTHORIZED |
| ATK-013 | 伪造 Token | 使用无效签名的 JWT | UNAUTHORIZED |

10.3 余额不足测试

| 测试 ID | 场景 | 预期结果 |
|---|---|---|
| BAL-001 | 转账金额 > 可用余额 | INSUFFICIENT_BALANCE,余额无变化 |
| BAL-002 | 转账金额 = 可用余额 | 成功,余额变为 0 |
| BAL-003 | 并发: 两次转账总额 > 余额 | 一个成功,一个 INSUFFICIENT_BALANCE |
| BAL-004 | 冻结账户转出 | ACCOUNT_FROZEN |
| BAL-005 | 禁用账户转出 | ACCOUNT_DISABLED |

10.4 FSM 状态转换测试

| 测试 ID | 场景 | 预期状态流 |
|---|---|---|
| FSM-001 | 正常 Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | 正常 Spot→Funding | 同上 |
| FSM-003 | 源失败 | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | 目标失败 (明确) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | 目标超时 | … → TARGET_PENDING (保持,无限重试) |
| FSM-006 | 补偿失败 | COMPENSATING (保持,无限重试) |

10.5 崩溃恢复测试

| 测试 ID | 崩溃点 | 预期恢复行为 |
|---|---|---|
| CRA-001 | INIT 后,SOURCE_PENDING 前 | Recovery 读取 INIT,重新执行 step_init |
| CRA-002 | SOURCE_PENDING 中,适配器调用前 | Recovery 重试 withdraw (幂等) |
| CRA-003 | SOURCE_PENDING 中,适配器调用后 | Recovery 重试 withdraw (幂等,返回已处理) |
| CRA-004 | SOURCE_DONE 后,TARGET_PENDING 前 | Recovery 继续执行 step_source_done |
| CRA-005 | TARGET_PENDING 中 | Recovery 重试 deposit (幂等) |
| CRA-006 | COMPENSATING 中 | Recovery 重试 refund (幂等) |

10.6 并发/竞态测试

| 测试 ID | 场景 | 预期结果 |
|---|---|---|
| CON-001 | 多个 Worker 处理同一 req_id | 只有一个成功 CAS,其他跳过 |
| CON-002 | 同时两次相同金额转账 | 两个独立 req_id,各自执行 |
| CON-003 | 转账 + 外部提现并发 | 只有余额足够的操作成功 |
| CON-004 | 读取余额时无锁 | 无重复扣减(SELECT FOR UPDATE) |

10.7 幂等性测试

| 测试 ID | 场景 | 预期结果 |
|---|---|---|
| IDP-001 | 同一 req_id 调用 withdraw 两次 | 第二次返回 SUCCESS,余额只扣一次 |
| IDP-002 | 同一 req_id 调用 deposit 两次 | 第二次返回 SUCCESS,余额只加一次 |
| IDP-003 | 同一 req_id 调用 refund 两次 | 第二次返回 SUCCESS,余额只加一次 |
| IDP-004 | Recovery 多次重试同一 transfer | 最终状态一致,余额正确 |

10.8 资金异常测试 (最关键)

| 测试 ID | 威胁 | 测试方法 | 验证 |
|---|---|---|---|
| FND-001 | 双花 (Double Spend) | 源扣减两次 | 只扣一次(幂等) |
| FND-002 | 资金消失 | 源扣减成功,目标失败,不补偿 | 必须补偿或无限重试 |
| FND-003 | 资金无中生有 | 目标入账两次 | 只入一次(幂等) |
| FND-004 | 中途崩溃丢失 | 任意点崩溃 | Recovery 恢复完整性 |
| FND-005 | 状态不一致 | SOURCE_DONE 但 DB 未更新 | WAL + 幂等保证一致 |
| FND-006 | 部分提交 | PG 事务部分成功 | 原子事务,全成功或全失败 |

10.9 监控告警测试

| 测试 ID | 场景 | 预期告警 |
|---|---|---|
| MON-001 | 转账卡在 TARGET_PENDING > 1 分钟 | CRITICAL 告警 |
| MON-002 | 补偿连续失败 3 次 | CRITICAL 告警 |
| MON-003 | 资金守恒检查失败 | CRITICAL 告警 + 暂停服务 |
| MON-004 | 单用户转账频率异常 | WARNING 告警 [P2] |




📋 Implementation & Verification | 实现与验证

本章的完整实现细节、API 说明、E2E 测试脚本和验证结果请参阅:

For complete implementation details, API documentation, E2E test scripts, and verification results:

👉 Phase 0x0B-a: Implementation & Testing Guide

包含 / Includes:

  • 架构实现与核心模块 (Architecture & Core Modules)
  • 新增 API 端点 (New API Endpoints)
  • 可复用 E2E 测试脚本 (Reusable E2E Test Script)
  • 数据库验证方法 (Database Verification)
  • 已修复 Bug 清单 (Fixed Bugs)

Internal Transfer E2E Testing Guide

概述 / Overview

本文档描述了 Phase 0x0B-a 内部转账功能的完成工作、实现细节和端到端测试方法。

This document describes the completed work, implementation details, and end-to-end testing methodology for Phase 0x0B-a Internal Transfer feature.


本章完成工作 / Chapter Deliverables

架构实现 / Architecture Implementation

实现了跨系统资金划转的 2-Phase Commit FSM:

                     ┌─────────────────┐
                     │  TransferAPI    │  Gateway 层
                     └────────┬────────┘
                              │
                     ┌────────▼────────┐
                     │ TransferCoord.  │  FSM 协调器
                     └────────┬────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
  ┌────────▼────────┐ ┌───────▼───────┐ ┌───────▼───────┐
  │ FundingAdapter  │ │ TradingAdapter│ │  TransferDb   │
  │   (PostgreSQL)  │ │  (UBSCore)    │ │  (FSM State)  │
  └─────────────────┘ └───────────────┘ └───────────────┘

核心模块 / Core Modules

| 模块 / Module | 文件 / File | 功能 / Function |
|---|---|---|
| TransferCoordinator | src/transfer/coordinator.rs | FSM 状态机驱动 / State machine driver |
| FundingAdapter | src/transfer/adapters/funding.rs | PostgreSQL 资金操作 / PostgreSQL balance ops |
| TradingAdapter | src/transfer/adapters/trading.rs | UBSCore 通道通信 / UBSCore channel comm |
| TransferDb | src/transfer/db.rs | FSM 状态持久化 / FSM state persistence |
| TransferChannel | src/transfer/channel.rs | 跨线程通信 / Cross-thread messaging |

新增 API / New APIs

| Endpoint | Method | 描述 / Description |
|---|---|---|
| /api/v1/private/transfer | POST | 创建内部转账 |
| /api/v1/private/transfer/{req_id} | GET | 查询转账状态 |
| /api/v1/private/balances/all | GET | 查询所有账户余额 |

数据库表 / Database Tables

| 表 / Table | 用途 / Purpose |
|---|---|
| fsm_transfers_tb | FSM 转账状态记录 |
| transfer_operations_tb | 幂等操作追踪 |
| balances_tb | 账户余额 (Funding/Spot) |

交付物 / Deliverables

  • ✅ 完整的 FSM 实现 (Init → SourcePending → SourceDone → TargetPending → Committed)
  • ✅ 双向转账验证 (Funding ↔ Spot)
  • ✅ 可复用 E2E 测试脚本
  • ✅ /balances/all 余额查询 API
  • ✅ 232 个单元测试通过

测试脚本 / Test Script

自动化 E2E 测试 / Automated E2E Test

# 运行完整 E2E 测试 (自动启动 Gateway)
./scripts/test_transfer_e2e.sh

脚本位置: scripts/test_transfer_e2e.sh

测试流程 / Test Flow

[1/6] Prerequisites Check
    ✓ PostgreSQL connected (port 5433)
    ✓ Release binary ready

[2/6] Setup Test Data
    - Enable CAN_INTERNAL_TRANSFER for USDT
    - Create 1000 USDT in Funding for user 1001
    - Clear previous transfer records

[3/6] Start Gateway
    - Stop existing Gateway (pgrep + kill)
    - Start new Gateway with updated config
    - Wait for health check

[4/6] Run Transfer Tests
    - Funding → Spot (50 USDT)
    - Spot → Funding (25 USDT)
    - Verify both COMMITTED

[5/6] Verify Balance Changes
    - Check Funding: 1000 → 975 (Δ-25)
    - Use /balances/all API

[6/6] Cleanup
    - Stop Gateway

API 测试 / API Testing

使用 Python 客户端 / Using Python Client

import sys
sys.path.append('scripts/lib')
from api_auth import get_test_client

USER_ID = 1001
client = get_test_client(user_id=USER_ID)
headers = {'X-User-ID': str(USER_ID)}

# 1. 查询余额 / Query balances
resp = client.get('/api/v1/private/balances/all', headers=headers)
print(resp.json())

# 2. 发起转账 / Create transfer
resp = client.post('/api/v1/private/transfer',
    json_body={
        'from': 'funding',
        'to': 'spot',
        'asset': 'USDT',
        'amount': '50'
    },
    headers=headers)
print(resp.json())

# 3. 查询转账状态 / Query transfer status
req_id = resp.json()['data']['req_id']
resp = client.get(f'/api/v1/private/transfer/{req_id}', headers=headers)
print(resp.json())

使用 curl / Using curl

# 查询余额 (需要正确签名)
curl http://localhost:8080/api/v1/private/balances/all \
  -H "X-API-Key: AK_0000000000001001" \
  -H "X-Signature: ..." \
  -H "X-User-ID: 1001"

数据库验证 / Database Verification

检查余额 / Check Balances

PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT 
    CASE account_type WHEN 1 THEN 'Spot' WHEN 2 THEN 'Funding' END as account,
    (available / 1000000)::text || ' USDT' as balance
FROM balances_tb 
WHERE user_id = 1001 AND asset_id = 2
ORDER BY account_type;
"

检查 FSM 状态 / Check FSM State

PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT req_id, amount, state, created_at 
FROM fsm_transfers_tb 
WHERE user_id = 1001
ORDER BY created_at DESC LIMIT 5;
"

State 值含义 / State Values:

  • 0: INIT
  • 10: SOURCE_PENDING
  • 20: SOURCE_DONE
  • 30: TARGET_PENDING
  • 40: COMMITTED ✅
  • -10: FAILED
  • -20: COMPENSATING
  • -30: ROLLED_BACK

已修复的 Bug / Fixed Bugs

1. FSM 未执行 / FSM Not Executing

问题: create_transfer_fsm 只调用 coordinator.create(),没有调用 coordinator.execute()

修复: 添加 execute() 调用

// src/transfer/api.rs
let req_id = coordinator.create(core_req).await?;
let state = coordinator.execute(req_id).await?; // ← Added

2. 金额解析为 0 / Amount Parsed as 0

问题: Decimal.to_string().parse::<u64>() 对 "50000000.00000000" 解析失败(字符串含小数点)

修复: 使用 trunc().to_i64()

// src/transfer/db.rs
let amount_u64 = amount.trunc().to_i64().unwrap_or(0) as u64;
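这个 Bug 的根因可以只用标准库复现:带小数点的字符串无法直接 parse 为 u64。下面的 trunc_to_u64 为本文假设的辅助函数,仅示意 trunc() 取整的效果:

```rust
// 示意草图:复现该 Bug 的根因,并用字符串截断模拟 trunc().to_i64() 的行为。
fn trunc_to_u64(s: &str) -> Option<u64> {
    // 只取小数点前的整数部分再解析
    s.split('.').next()?.parse::<u64>().ok()
}
```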

3. 类型不匹配 / Type Mismatch

  • status 列: INT4 (i32), 不是 INT2
  • decimals 列: INT2 (i16), 不是 i32

测试结果示例 / Sample Test Output

==============================================
Internal Transfer E2E Test (Phase 0x0B-a)
==============================================

[1/6] Checking prerequisites...
  ✓ PostgreSQL connected
  ✓ Release binary ready
[2/6] Setting up test data...
  ✓ Test data initialized (1000 USDT in Funding only for user 1001)
[3/6] Starting Gateway...
  ✓ Gateway ready
[4/6] Running transfer tests with balance verification...
  [BEFORE] Getting initial balances...
    USDT:funding: 1000.00

  [TRANSFER 1] Funding → Spot (50 USDT)...
    ✓ COMMITTED
  [TRANSFER 2] Spot → Funding (25 USDT)...
    ✓ COMMITTED

  [AFTER] Getting final Funding balance...
    USDT:funding: 975.00

  [VERIFY] Checking Funding balance changes...
    ✓ Funding: 1000.00 → 975.00 (Δ-25.00)

  Results: 3 passed, 0 failed
[5/6] Final database state...
 Funding | 975.0000000000000000 USDT

[6/6] Cleanup...

==============================================
✅ All E2E Transfer Tests PASSED
==============================================

| File | Description |
|------|-------------|
| scripts/test_transfer_e2e.sh | E2E test script |
| scripts/lib/api_auth.py | API authentication library |
| src/transfer/api.rs | Transfer API handlers |
| src/transfer/coordinator.rs | FSM coordinator |
| src/transfer/adapters/funding.rs | Funding adapter |
| src/transfer/adapters/trading.rs | Trading adapter |

Build & Verification Guide

0x0C Trade Fee System


📦 Code Changes: View Diff


1. Overview

1.1 Connecting the Dots: From Transfer to Trading

In 0x0B, we built the FSM mechanism for fund transfers between Funding and Spot accounts. Once funds enter the Spot account, the exchange needs a revenue source.

This is the topic of this chapter: Trade Fee.

Whenever buyers and sellers execute trades, the exchange collects a percentage fee. This is the core business model of exchanges and the foundation for sustainable operations.

Design Philosophy: Fee implementation seems simple (just deducting a percentage, right?), but involves multiple key decisions:

  • Where to configure fee rates? (Symbol level vs Global)
  • Which asset to deduct from? (Paid vs Received)
  • When to deduct? (In ME vs In Settlement)
  • How to ensure precision? (u64 * bps / 10000 overflow issues)

1.2 Goal

Implement the Maker/Taker fee model for trade execution. Fees are the primary revenue source for exchanges.

1.3 Key Concepts

| Term | Definition |
|------|------------|
| Maker | Order that adds liquidity (resting on the orderbook) |
| Taker | Order that removes liquidity (matches immediately) |
| Fee Rate | Percentage of trade value charged |
| bps | Basis points (1 bps = 0.01% = 0.0001) |

1.4 Architecture Overview

┌─────────── Fee Model ────────────┐
│                                  │
│  Final Rate = Symbol.base_fee    │
│             × VipDiscount / 100  │
└──────────────────────────────────┘

┌─────────── Data Flow ─────────────────────────────────────────────────────┐
│                                                                           │
│  ME ────▶ Trade{role} ────▶ UBSCore ────▶ BalanceEventBatch ────▶ TDengine
│              │                  │              │                          │
│              │           Memory: VIP/Fees      ├── buyer event            │
│              │           O(1) fee calc         ├── seller event           │
│              │                                 └── revenue event ×2       │
│              │                                                            │
└──────────────┴────────────────────────────────────────────────────────────┘

┌─────────── Core Design ───────────┐
│ ✅ Fee from Gain → No reservation │
│ ✅ UBSCore billing → Balance auth │
│ ✅ Per-User Event → Decoupled     │
│ ✅ Event Sourcing → Conservation  │
└───────────────────────────────────┘

2. Fee Model Design

2.1 Why Maker/Taker Model?

Traditional stock exchanges use fixed rates, but crypto exchanges universally adopt the Maker/Taker model. This is not arbitrary:

| Problem | How Maker/Taker Solves It |
|---------|---------------------------|
| Low liquidity | Low Maker fees encourage limit orders |
| Price discovery | Deeper orderbook, narrower spreads |
| Fairness | Liquidity takers pay more |

Industry Practice: Binance, OKX, Bybit all use this model.

2.2 Fee Rate Architecture

Two-Layer System: Symbol base rate × VIP discount coefficient

Final Rate = Symbol.base_fee × VipDiscountTable[user.vip_level] / 100

Layer 1: Symbol Base Rate

Each trading pair defines its own base rate:

| Field | Precision | Default | Description |
|-------|-----------|---------|-------------|
| base_maker_fee | 10^6 | 1000 | 0.10% |
| base_taker_fee | 10^6 | 2000 | 0.20% |

Layer 2: VIP Discount Coefficient

VIP levels and discounts are configured from database (not hardcoded).

VIP Level Table Design:

| Field | Type | Description |
|-------|------|-------------|
| level | SMALLINT PK | VIP level (0, 1, 2, …) |
| discount_percent | SMALLINT | Discount % (100 = no discount, 50 = 50% off) |
| min_volume | DECIMAL | Trading volume for upgrade (optional) |
| description | VARCHAR | Level description (optional) |

Example Data:

| level | discount_percent | description |
|-------|------------------|-------------|
| 0 | 100 | Normal |
| 1 | 90 | VIP 1 |
| 2 | 80 | VIP 2 |
| 3 | 70 | VIP 3 |

Operations can configure any number of VIP levels; code loads from database.

Example Calculation:

BTC_USDT: base_taker_fee = 2000 (0.20%)
User VIP 5: discount = 50%
Final Rate = 2000 × 50 / 100 = 1000 (0.10%)

Why 10^6 Precision?

  • 10^4 (bps) only represents down to 0.01%, not fine enough
  • 10^6 can represent 0.0001%, sufficient for VIP discounts and rebates
  • Safe with u128 intermediate: (amount as u128 * rate as u128 / 10^6) as u64
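A minimal sketch of the overflow-safe calculation (the function name is illustrative):

```rust
/// Fee rates are expressed in units of 10^-6 (10^6 precision).
const RATE_SCALE: u128 = 1_000_000;

/// Compute a fee from a scaled amount and a 10^6-precision rate,
/// widening to u128 so the intermediate product cannot overflow u64.
fn calc_fee(amount: u64, rate: u64) -> u64 {
    ((amount as u128 * rate as u128) / RATE_SCALE) as u64
}

fn main() {
    // 1 BTC at 10^8 scale, taker rate 2000 (= 0.20%).
    let fee = calc_fee(100_000_000, 2_000);
    assert_eq!(fee, 200_000); // 0.002 BTC
    // The widening makes even the extreme case safe.
    assert_eq!(calc_fee(u64::MAX, 1_000_000), u64::MAX);
}
```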

2.3 Fee Collection Point

Trade: Alice (Taker, BUY) ← → Bob (Maker, SELL)
       Alice buys 1 BTC @ 100,000 USDT

┌──────────────────────────────────────────────────────────┐
│ Before Fee:                                              │
│   Alice: -100,000 USDT, +1 BTC                          │
│   Bob:   +100,000 USDT, -1 BTC                          │
├──────────────────────────────────────────────────────────┤
│ After Fee (deducted from RECEIVED asset):               │
│   Alice (Taker 0.20%): -100,000 USDT, +0.998 BTC        │
│   Bob (Maker 0.10%):   +99,900 USDT,  -1 BTC            │
│                                                          │
│   Exchange collects: 0.002 BTC + 100 USDT               │
└──────────────────────────────────────────────────────────┘

Rule: Fee is always deducted from what you receive, not what you pay.

Why deduct from the received asset?

  1. Simpler mental accounting: if the user pays 100 USDT, exactly 100 USDT is debited
  2. No budget overrun: buying 1 BTC never requires 100,020 USDT just to cover fees
  3. Industry practice: Binance and Coinbase both do this
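A sketch of fee-from-gain settlement under the two-layer rates above (function names are hypothetical):

```rust
/// Fee rates are expressed in units of 10^-6 (10^6 precision).
const RATE_SCALE: u128 = 1_000_000;

/// Apply the VIP discount to a base rate (discount_percent: 100 = no discount).
fn effective_rate(base_fee: u64, discount_percent: u8) -> u64 {
    base_fee * discount_percent as u64 / 100
}

/// Deduct the fee from the RECEIVED asset: returns (net_credit, fee).
/// Since fee <= received for rates below 100%, this can never underflow
/// and can never fail with "insufficient balance for fee".
fn settle_gain(received: u64, rate: u64) -> (u64, u64) {
    let fee = ((received as u128 * rate as u128) / RATE_SCALE) as u64;
    (received - fee, fee)
}

fn main() {
    // Taker buys 1 BTC (10^8 scale), base taker rate 2000 (0.20%), no discount.
    let rate = effective_rate(2_000, 100);
    let (net, fee) = settle_gain(100_000_000, rate);
    assert_eq!((net, fee), (99_800_000, 200_000)); // +0.998 BTC, 0.002 BTC fee
}
```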

2.4 Why No Lock Reservation Needed

Since fees are deducted from received asset, no fee reservation needed:

┌─────────────────────────────────────────────────────────────────────┐
│ Benefits of Fee from Gain (Received Asset)                         │
├─────────────────────────────────────────────────────────────────────┤
│ User receives 1 BTC → Deduct 0.002 BTC fee → Net credit 0.998 BTC │
│                                                                     │
│ ✅ Never "insufficient balance for fee"                            │
│ ✅ Pay amount = Actual pay amount (exact)                          │
│ ✅ No complex reservation/refund logic                             │
└─────────────────────────────────────────────────────────────────────┘

Compare with deducting from paid asset:

| Approach | Lock Amount | Issue |
|----------|-------------|-------|
| From Gain | base_cost | No extra reservation ✅ |
| From Pay | base_cost + max_fee | May be insufficient; needs reservation ❌ |

Design Decision: use the "fee from gain" mode to simplify lock logic.

  • Buy order locks USDT, fee deducted from received BTC
  • Sell order locks BTC, fee deducted from received USDT

2.5 Fee Responsibility: UBSCore (First Principles)

Core Question: Who is responsible for fee calculation?

Fee deduction = Balance change = Must be executed by UBSCore

| Question | Answer |
|----------|--------|
| Who knows a trade occurred? | ME |
| Who manages balances? | UBSCore |
| Who can execute deductions? | UBSCore |
| Who is responsible for fees? | UBSCore |

Data Flow:

ME ──▶ Trade{role} ──▶ UBSCore ──▶ BalanceEvent{fee} ──▶ Settlement ──▶ TDengine
                          │
                     ① Get VIP level (memory)
                     ② Get Symbol fee rate (memory)
                     ③ Calculate fee = received × rate
                     ④ credit(net_amount)

2.6 High Performance Design

Key to efficiency: All config in UBSCore memory

UBSCore Memory Structure (loaded at startup):
├── user_vip_levels: HashMap<UserId, u8>
├── vip_discounts: HashMap<u8, u8>  // level → discount%
└── symbol_fees: HashMap<SymbolId, (u64, u64)>  // (maker, taker)

Fee calculation = Pure memory operation, O(1)

| Component | Responsibility | Blocking? |
|-----------|----------------|-----------|
| UBSCore | Calculate fee, update balance | ❌ Pure memory |
| BalanceEvent | Pass fee info | ❌ Async channel |
| Settlement | Write to TDengine | ❌ Separate thread |

Why efficient?

  • No I/O on critical path
  • All data in memory
  • Output reuses existing BalanceEvent channel
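A sketch of the in-memory lookup, following the memory structure diagrammed above (the struct shape, defaults, and method name are illustrative):

```rust
use std::collections::HashMap;

/// In-memory fee configuration, loaded once at startup.
/// Hypothetical shape following the structures named above.
struct FeeConfig {
    user_vip_levels: HashMap<u64, u8>,      // user_id -> VIP level
    vip_discounts: HashMap<u8, u8>,         // level -> discount %
    symbol_fees: HashMap<u32, (u64, u64)>,  // symbol_id -> (maker, taker)
}

impl FeeConfig {
    /// O(1): three HashMap reads, no I/O on the critical path.
    fn taker_rate(&self, user_id: u64, symbol_id: u32) -> u64 {
        let level = *self.user_vip_levels.get(&user_id).unwrap_or(&0);
        let discount = *self.vip_discounts.get(&level).unwrap_or(&100) as u64;
        let (_, base_taker) = *self.symbol_fees.get(&symbol_id).unwrap_or(&(1_000, 2_000));
        base_taker * discount / 100
    }
}

fn main() {
    let cfg = FeeConfig {
        user_vip_levels: HashMap::from([(1001, 5)]),
        vip_discounts: HashMap::from([(0, 100), (5, 50)]),
        symbol_fees: HashMap::from([(1, (1_000, 2_000))]),
    };
    assert_eq!(cfg.taker_rate(1001, 1), 1_000); // VIP 5: 2000 × 50 / 100
    assert_eq!(cfg.taker_rate(9999, 1), 2_000); // unknown user: level 0, no discount
}
```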

2.7 Per-User BalanceEvent Design

Core Insight: One Trade produces two users’ balance changes → Two BalanceEvents

Trade ──▶ UBSCore ──┬──▶ BalanceEvent{user: buyer}  ──▶ WS + TDengine
                    │
                    └──▶ BalanceEvent{user: seller} ──▶ WS + TDengine

Per-User Event Structure:

| Field | Type | Description |
|-------|------|-------------|
| trade_id | u64 | Links to the original Trade |
| user_id | u64 | Who this event belongs to |
| debit_asset | u32 | Asset paid |
| debit_amount | u64 | Amount paid |
| credit_asset | u32 | Asset received |
| credit_amount | u64 | Net amount (after fee) |
| fee | u64 | Fee charged |
| is_maker | bool | Maker role flag |

Example Code (Pseudocode, for reference only):

// ⚠️ Pseudocode - may change during implementation
BalanceEvent::TradeSettled {
    trade_id: u64,         // Links to original Trade
    user_id: u64,          // Who this event belongs to

    debit_asset: u32,      // Paid
    debit_amount: u64,
    credit_asset: u32,     // Received (net)
    credit_amount: u64,

    fee: u64,              // Fee
    is_maker: bool,        // Role
}

Why Per-User Design?

  • Single responsibility: One event = One user’s balance change
  • Decoupled: User doesn’t need to know counterparty
  • WebSocket friendly: Route directly by user_id
  • Query friendly: TDengine partitioned by user_id
  • Privacy safe: User only sees own data

3. Data Model

3.1 Symbol Base Fee Configuration

-- Symbol base fee (10^6 precision: 1000 = 0.10%)
ALTER TABLE symbols_tb ADD COLUMN base_maker_fee INTEGER NOT NULL DEFAULT 1000;
ALTER TABLE symbols_tb ADD COLUMN base_taker_fee INTEGER NOT NULL DEFAULT 2000;

3.2 User VIP Level

-- User VIP level (0-9, 0=normal user, 9=top tier)
ALTER TABLE users_tb ADD COLUMN vip_level SMALLINT NOT NULL DEFAULT 0;

3.3 Trade Record Enhancement

Existing Trade struct already has:

  • fee: u64 - Amount of fee charged (in received asset’s scaled units)
  • role: u8 - 0=Maker, 1=Taker

3.4 Fee Record Storage

Fee info is already included in Trade record:

| Storage | Content |
|---------|---------|
| trades_tb (TDengine) | fee, fee_asset, role fields |
| Trade Event | Real-time push to downstream (WS, Kafka) |

3.5 Event Sourcing: BalanceEventBatch (Full Traceability)

Core Design: One Trade produces a group of BalanceEvents as an atomic unit

Trade ──▶ UBSCore ──▶ BalanceEventBatch{trade_id, events: [...]}
                              │
                              ├── TradeSettled{user: buyer}   // Buyer
                              ├── TradeSettled{user: seller}  // Seller
                              ├── FeeReceived{account: REVENUE, from: buyer}
                              └── FeeReceived{account: REVENUE, from: seller}

Example Structure (Pseudocode):

// ⚠️ Pseudocode - may change during implementation
BalanceEventBatch {
    trade_id: u64,
    ts: Timestamp,
    events: [
        TradeSettled{user: buyer_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        TradeSettled{user: seller_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        FeeReceived{account: REVENUE_ID, asset: base_asset, amount: buyer_fee, from_user: buyer_id},
        FeeReceived{account: REVENUE_ID, asset: quote_asset, amount: seller_fee, from_user: seller_id},
    ]
}

Atomic Unit Properties:

| Property | Description |
|----------|-------------|
| Generated together | Same trade_id |
| Persisted together | Single batch write to TDengine |
| Traced together | All events linked by trade_id |

Asset Conservation Verification:

buyer.debit(quote)  + buyer.credit(base - fee)   = 0  ✓
seller.debit(base)  + seller.credit(quote - fee) = 0  ✓
revenue.credit(buyer_fee + seller_fee)           = fee_total ✓

Σ changes = 0 (Asset conservation, auditable)
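The conservation identity can be checked mechanically. A toy sketch using the example trade's numbers (USDT scaled by 10^6, BTC by 10^8; values are hypothetical):

```rust
/// Toy conservation check over the event batch described above:
/// the signed deltas per asset must sum to zero.
fn net_deltas() -> (i128, i128) {
    // Alice (taker, BUY): -100,000 USDT, +1 BTC minus 0.002 BTC fee
    // Bob (maker, SELL):  -1 BTC, +100,000 USDT minus 100 USDT fee
    // REVENUE: +0.002 BTC, +100 USDT
    let usdt = -100_000_000_000i128 // Alice debit
        + 99_900_000_000            // Bob net credit
        + 100_000_000;              // revenue fee (100 USDT)
    let btc = -100_000_000i128      // Bob debit
        + 99_800_000                // Alice net credit
        + 200_000;                  // revenue fee (0.002 BTC)
    (usdt, btc)
}

fn main() {
    assert_eq!(net_deltas(), (0, 0)); // Σ changes = 0 per asset
    println!("asset conservation holds");
}
```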

TDengine Storage (Event Sourcing):

| Table | Content |
|-------|---------|
| balance_events_tb | All BalanceEvents (TradeSettled + FeeReceived) |

Why Event Sourcing?

  • Full traceability: Any fee can be traced to trade_id + user_id
  • Asset conservation: Conservation verifiable within event batch
  • Aggregation is derived: Balance = SUM(events), computed on demand

4. Implementation Architecture

4.1 Complete Data Flow

┌───────────┐    ┌───────────┐    ┌─────────────────────────────────────────┐
│    ME     │───▶│  UBSCore  │───▶│         BalanceEventBatch               │
│  (Match)  │    │ (Fee calc)│    │  ┌─ TradeSettled{buyer}                 │
└───────────┘    └───────────┘    │  ├─ TradeSettled{seller}                │
                      │           │  ├─ FeeReceived{REVENUE, from:buyer}    │
                      │           │  └─ FeeReceived{REVENUE, from:seller}   │
          Memory: VIP/Fee rates   └───────────────┬─────────────────────────┘
                                                  │
                                                  ▼
                              ┌──────────────────────────────────────────────┐
                              │              Settlement Service              │
                              │  ① Batch write to TDengine                   │
                              │  ② WebSocket push (routed by user_id)       │
                              │  ③ Kafka publish (optional)                 │
                              └──────────────────────────────────────────────┘

4.2 TDengine Schema Design

balance_events Super Table:

CREATE STABLE balance_events (
    ts          TIMESTAMP,
    event_type  TINYINT,       -- 1=TradeSettled, 2=FeeReceived, 3=Deposit...
    trade_id    BIGINT,
    debit_asset INT,
    debit_amt   BIGINT,
    credit_asset INT,
    credit_amt  BIGINT,
    fee         BIGINT,
    fee_asset   INT,
    is_maker    BOOL,
    from_user   BIGINT         -- FeeReceived: source user
) TAGS (
    user_id       BIGINT,      -- User identifier (0=REVENUE)
    account_type  TINYINT      -- 1=Spot, 2=Funding, 3=Futures...
);

-- Subtable per (user, account_type)
CREATE TABLE user_1001_spot USING balance_events TAGS (1001, 1);
CREATE TABLE user_1001_funding USING balance_events TAGS (1001, 2);
CREATE TABLE revenue_spot USING balance_events TAGS (0, 1);  -- REVENUE

Design Points:

| Design | Rationale |
|--------|-----------|
| Dual TAGs (user_id, account_type) | Future-proof for Futures, Margin… |
| Partition by user_id | User queries scan only their own tables |
| Partition by account_type | Account-specific queries are O(1) |
| Timestamp index | TDengine native optimization |

4.3 Query Patterns

User query fee history:

SELECT ts, trade_id, fee, fee_asset, is_maker
FROM user_1001_events
WHERE event_type = 1  -- TradeSettled
  AND ts > NOW() - 30d
ORDER BY ts DESC
LIMIT 100;

Platform fee income stats:

SELECT fee_asset, SUM(credit_amt) as total_fee
FROM revenue_events
WHERE ts > NOW() - 1d
GROUP BY fee_asset;

Trace all events for a trade:

SELECT * FROM balance_events
WHERE trade_id = 12345
ORDER BY ts;

4.4 Consumer Architecture

BalanceEventBatch
       │
       ├──▶ TDengine Writer (batch write, high throughput)
       │       └── Route to subtable by (user_id, account_type)
       │
       ├──▶ WebSocket Router (real-time push)
       │       └── Route to WS connection by user_id
       │
       └──▶ Kafka Publisher (optional, downstream subscription)
               └── Topic: balance_events

4.5 Performance Considerations

| Optimization | Strategy |
|--------------|----------|
| Batch write | BalanceEventBatch is written in one operation |
| Partition strategy | Partition by user_id to avoid hotspots |
| Time partition | TDengine auto-partitions by time |
| Async processing | UBSCore doesn't wait after send |

5. API Changes

5.1 Trade Response

{
  "trade_id": "12345",
  "price": "100000.00",
  "qty": "1.00000000",
  "fee": "0.00200000",       // NEW: Fee amount
  "fee_asset": "BTC",        // NEW: Fee asset
  "role": "TAKER"            // NEW: Maker/Taker
}

5.2 WebSocket Trade Update

{
  "e": "trade.update",
  "data": {
    "trade_id": "12345",
    "fee": "0.002",
    "fee_asset": "BTC",
    "is_maker": false
  }
}

6. Edge Cases

| Case | Handling |
|------|----------|
| Zero-fee symbol | Allow maker_fee = 0 |
| Insufficient balance for fee | N/A - fee is always deducted from the received asset |

7. Verification Plan

7.1 Unit Tests

  • Fee calculation accuracy (multiple precisions)
  • Maker vs Taker role assignment

7.2 Integration Tests

  • E2E trade with fee deduction
  • Fee ledger reconciliation

7.3 Acceptance Criteria

  • Trades deduct correct fees
  • Fee ledger matches Σ(trade.fee)
  • API returns fee info
  • WS pushes fee info







0x0D Snapshot & Recovery: Robustness


📅 Status: 🚧 Under Construction. Core Objective: Implement graceful shutdown and state recovery mechanisms.


1. Overview

  • Snapshot: Periodically save the memory state (OrderBook, Balances) to disk.
  • Recovery: Restore state from the latest snapshot + replay WAL (Write-Ahead Log) upon restart.
  • Graceful Shutdown: Ensure all pending events are processed before stopping.

(Detailed content coming soon)





0x0E OpenAPI Integration


📦 Code Changes: View Diff


1. Overview

1.1 Why OpenAPI?

Programmatic traders need API documentation to integrate with our exchange. Instead of maintaining separate docs that drift from code, we auto-generate OpenAPI 3.0 spec directly from Rust types.

1.2 Goal

  1. Serve interactive API docs at /docs (Swagger UI)
  2. Export openapi.json for SDK generation
  3. Keep docs in sync with code (single source of truth)

1.3 Key Concepts

| Term | Definition |
|------|------------|
| OpenAPI | Industry-standard API specification format (formerly Swagger) |
| utoipa | Rust crate for compile-time OpenAPI generation |
| Swagger UI | Interactive API documentation interface |
| Code-First | Generate the spec from code, not YAML files |

1.4 Architecture Overview

┌─────────── OpenAPI Integration Flow ────────────┐
│                                                  │
│  Rust Handlers ──▶ #[utoipa::path] ──▶ OpenAPI   │
│       │                                   │      │
│       │                                   ▼      │
│       │                            Swagger UI    │
│       │                            (/docs)       │
│       │                                   │      │
│       ▼                                   ▼      │
│  Type-Safe API ◀─────────────────▶ openapi.json │
│                                          │      │
│                                          ▼      │
│                                    SDK Clients  │
│                                  (Python, TS)   │
└─────────────────────────────────────────────────┘

2. Implementation

2.1 Adding Dependencies

Cargo.toml:

[dependencies]
+ utoipa = { version = "5.3", features = ["axum_extras", "chrono", "uuid"] }
+ utoipa-swagger-ui = { version = "8.0", features = ["axum"] }

2.2 Creating OpenAPI Module

Create src/gateway/openapi.rs:

use utoipa::OpenApi;

#[derive(OpenApi)]
#[openapi(
    info(
        title = "Zero X Infinity Exchange API",
        version = "1.0.0",
        description = "High-performance crypto exchange API (1.3M orders/sec)"
    ),
    paths(
        handlers::health_check,
        handlers::get_depth,
        handlers::get_klines,
        // ... all API handlers
    ),
    components(schemas(
        types::ApiResponse<()>,
        types::DepthApiData,
        // ... all response types
    ))
)]
pub struct ApiDoc;

2.3 Annotating Handlers

Add #[utoipa::path] to each handler:

+ #[utoipa::path(
+     get,
+     path = "/api/v1/public/depth",
+     params(
+         ("symbol" = String, Query, description = "Trading pair"),
+         ("limit" = Option<u32>, Query, description = "Depth levels")
+     ),
+     responses(
+         (status = 200, description = "Order book depth", body = ApiResponse<DepthApiData>)
+     ),
+     tag = "Market Data"
+ )]
  pub async fn get_depth(
      State(state): State<Arc<AppState>>,
      Query(params): Query<HashMap<String, String>>,
  ) -> impl IntoResponse {
      // ... existing implementation ...
  }

2.4 Adding Schema Derivations

Add ToSchema to response types:

+ use utoipa::ToSchema;

- #[derive(Serialize, Deserialize)]
+ #[derive(Serialize, Deserialize, ToSchema)]
  pub struct DepthApiData {
+     #[schema(example = "BTC_USDT")]
      pub symbol: String,
+     #[schema(example = json!([["85000.00", "0.5"]]))]
      pub bids: Vec<[String; 2]>,
+     #[schema(example = json!([["85001.00", "0.3"]]))]
      pub asks: Vec<[String; 2]>,
  }

2.5 Integrating Swagger UI

In src/gateway/mod.rs:

+ use utoipa_swagger_ui::SwaggerUi;
+ use crate::gateway::openapi::ApiDoc;

  let app = Router::new()
      .route("/api/v1/health", get(handlers::health_check))
      .nest("/api/v1/public", public_routes)
      .nest("/api/v1/private", private_routes)
+     .merge(
+         SwaggerUi::new("/docs")
+             .url("/api-docs/openapi.json", ApiDoc::openapi())
+     )
      .with_state(state);

3. API Endpoints

3.1 Public Endpoints (No Auth)

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/health | GET | Health check |
| /api/v1/public/depth | GET | Order book depth |
| /api/v1/public/klines | GET | K-line data |
| /api/v1/public/assets | GET | Asset list |
| /api/v1/public/symbols | GET | Trading pairs |
| /api/v1/public/exchange_info | GET | Exchange metadata |

3.2 Private Endpoints (Ed25519 Auth)

| Endpoint | Method | Description |
|----------|--------|-------------|
| /api/v1/private/order | POST | Create order |
| /api/v1/private/cancel | POST | Cancel order |
| /api/v1/private/orders | GET | Query orders |
| /api/v1/private/trades | GET | Trade history |
| /api/v1/private/balances | GET | Balance query |
| /api/v1/private/balances/all | GET | All balances |
| /api/v1/private/transfer | POST | Internal transfer |
| /api/v1/private/transfer/{id} | GET | Transfer status |

4. SDK Generation

4.1 Python SDK

Auto-generated Python client with Ed25519 signing:

from zero_x_infinity_sdk import ZeroXInfinityClient

client = ZeroXInfinityClient(
    api_key="your_api_key",
    secret_key_bytes=secret_key  # Ed25519 private key
)

# Create order
order = client.create_order(
    symbol="BTC_USDT",
    side="BUY",
    price="85000.00",
    qty="0.001"
)

4.2 TypeScript SDK

import { ZeroXInfinityClient } from './zero_x_infinity_sdk';

const client = new ZeroXInfinityClient(apiKey, secretKey);
const depth = await client.getDepth('BTC_USDT');

5. Verification

5.1 Access Swagger UI

cargo run --release -- --gateway --port 8080
# Open: http://localhost:8080/docs

5.2 Test Results

| Test Category | Tests | Result |
|---------------|-------|--------|
| Unit Tests | 293 | ✅ All pass |
| Public Endpoints | 6 | ✅ All pass |
| Private Endpoints | 9 | ✅ All pass |
| E2E Total | 17 | ✅ All pass |

6. Summary

In this chapter, we added OpenAPI documentation to our trading engine:

| Achievement | Result |
|-------------|--------|
| Swagger UI | Available at /docs |
| OpenAPI Spec | 15 endpoints documented |
| Python SDK | Auto-generated with Ed25519 signing |
| TypeScript SDK | Type-safe client |
| Zero Breaking Changes | All existing tests pass |

Next Chapter: With resilience (0x0D) and documentation (0x0E) complete, the foundation is solid. The next logical step is 0x11: Deposit & Withdraw, connecting to the blockchain for real crypto funding.







0x0F Admin Dashboard Architecture


📅 Status: ✅ Verified (E2E 4/4 Pass) · Branch: 0x0F-admin-dashboard · Updated: 2024-12-27

📦 Code Changes: View Code


1. Overview

1.1 Goal

Build an admin dashboard for exchange operations using FastAPI Amis Admin + FastAPI-User-Auth.

1.2 Tech Stack

| Component | Technology |
|-----------|------------|
| Backend | FastAPI + SQLAlchemy |
| Admin UI | FastAPI Amis Admin (Baidu Amis) |
| Auth | FastAPI-User-Auth (Casbin RBAC) |
| Database | PostgreSQL (existing) |

1.3 Design Highlights ✨

Why do these designs matter? The Admin Dashboard is a core operations system for the exchange. Incorrect operations can lead to fund loss or system failures. The following design principles are key lessons we’ve learned in practice:

| Design Principle | Why? |
|---|---|
| 🔒 ID Immutability | asset_id, symbol_id cannot be modified after creation. Historical orders and trade records depend on these IDs; changing them would break data relationships. |
| 🔢 DB-Generated IDs | asset_id, symbol_id use PostgreSQL SERIAL for auto-generation, preventing human input conflicts or errors. |
| 📝 Status as Strings | Users see Active/Disabled instead of 1/0, reducing cognitive load and avoiding misinterpretation. |
| 🚫 Base ≠ Quote | Prevent creation of invalid pairs like BTC_BTC; this is a logic bug, not a UX issue. |
| 🔍 Trace ID Evidence Chain | Fundamental financial compliance requirement. Each operation carries a ULID trace_id, forming a complete audit evidence chain. When issues arise: traceable, provable, reproducible. |
| 📜 Mandatory Audit Log | All operations record before/after states, meeting compliance requirements and supporting incident investigation. |
| 🔄 Gateway Hot-Reload | Config changes take effect within 5 seconds without service restart; critical for emergency delisting scenarios. |
| ⬇️ Default Descending Sort | Lists show newest items first; operators typically focus on recent activity. |

Tutorial Tip: These design principles didn’t emerge from nothing—they come from real operational pitfalls in exchange systems. Readers should carefully understand each “Why”.

1.4 Features

| Module | Functions |
|---|---|
| User Management | KYC review, VIP level, ban/unban |
| Asset Management | Deposit confirm, withdrawal review, freeze |
| Trading Monitor | Real-time orders, trades, anomaly alerts |
| Fee Config | Symbol fee rates, VIP discounts |
| System Monitor | Service health, queue depth, latency |
| Audit Log | All admin operations logged |

2. Architecture

┌─────────────────────────────────────────────────────────┐
│                   Admin Dashboard                        │
├─────────────────────────────────────────────────────────┤
│  FastAPI Amis Admin (UI)                                │
│  ├── User Management                                    │
│  ├── Asset Management                                   │
│  ├── Trading Monitor                                    │
│  ├── Fee Config                                         │
│  └── System Monitor                                     │
├─────────────────────────────────────────────────────────┤
│  FastAPI-User-Auth (RBAC)                               │
│  ├── Page Permissions                                   │
│  ├── Action Permissions                                 │
│  ├── Field Permissions                                  │
│  └── Data Permissions                                   │
├─────────────────────────────────────────────────────────┤
│  PostgreSQL (existing)     │     TDengine (read-only)  │
│  - users_tb                │     - trades_tb           │
│  - balances_tb             │     - balance_events_tb   │
│  - symbols_tb              │     - klines_tb           │
│  - transfers_tb            │                           │
└─────────────────────────────────────────────────────────┘

3. RBAC Roles

| Role | Permissions |
|---|---|
| Super Admin | All permissions |
| Risk Officer | Withdrawal review, user freeze |
| Operations | User management, VIP config |
| Support | View-only, no modifications |
| Auditor | View audit logs only |

4. Implementation Plan

Phase 1: MVP - Config Management

Scope: Basic login + config CRUD (Asset, Symbol, VIP)

Step 1: Project Setup

mkdir admin && cd admin
python -m venv venv && source venv/bin/activate
pip install fastapi-amis-admin fastapi-user-auth sqlalchemy asyncpg

Step 2: Database Connection

  • Connect to existing PostgreSQL (zero_x_infinity database)
  • Reuse existing tables: assets_tb, symbols_tb, users_tb

Step 3: Admin CRUD

| Model | Table | Operations |
|---|---|---|
| Asset | assets_tb | List, Create, Update, Enable/Disable |
| Symbol | symbols_tb | List, Create, Update, Trading/Halt |
| VIP Level | vip_levels_tb | List, Create, Update |
| Audit Log | admin_audit_log | List (read-only) |

Symbol Status

| Status | Description |
|---|---|
| trading | Normal trading |
| halt | Suspended (maintenance/emergency) |

Step 4: Admin Auth

  • Default super admin account
  • Login/Logout UI

Acceptance Criteria

| ID | Criteria | Verify |
|---|---|---|
| AC-01 | Admin can login at http://localhost:$ADMIN_PORT/admin | Browser access (dev: 8002, ci: 8001) |
| AC-02 | Can create Asset (name, symbol, decimals) | UI + DB |
| AC-03 | Can edit Asset | UI + DB |
| AC-04 | Gateway hot-reload Asset config | No restart needed |
| AC-05 | Can create Symbol (base, quote, fees) | UI + DB |
| AC-06 | Can edit Symbol | UI + DB |
| AC-07 | Gateway hot-reload Symbol config | No restart needed |
| AC-08 | Can create/edit VIP Level | UI + DB |
| AC-09 | Reject invalid input (decimals < 0, fee > 100%) | Boundary tests |
| AC-10 | VIP default Normal (level=0, 100% fee) | Seed data |
| AC-11 | Asset Enable/Disable | Gateway rejects disabled asset |
| AC-12 | Symbol Halt | Gateway rejects new orders |
| AC-13 | Audit log | All CRUD ops queryable |

Input Validation Rules

| Field | Rule |
|---|---|
| decimals | 0-18, must be integer |
| fee_rate | 0-100%, max 10000 bps |
| symbol | Unique, uppercase + underscore |
| base_asset / quote_asset | Must exist |
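These rules can be expressed as a small framework-agnostic checker. A hedged sketch (the function name and error texts are illustrative; the real admin enforces these via Pydantic schemas):

```python
import re

def validate_asset_input(asset: str, decimals: int, fee_rate_bps: int) -> list[str]:
    """Return a list of validation errors (empty list = valid)."""
    errors = []
    # Unique-ness is a DB constraint; here we only check the format rule.
    if not re.fullmatch(r"[A-Z]+(_[A-Z]+)*", asset):
        errors.append(f"asset: must be uppercase + underscore, got {asset!r}")
    if not isinstance(decimals, int) or not (0 <= decimals <= 18):
        errors.append(f"decimals: must be an integer in 0-18, got {decimals}")
    if not (0 <= fee_rate_bps <= 10_000):
        errors.append(f"fee_rate: must be 0-10000 bps (0-100%), got {fee_rate_bps}")
    return errors

assert validate_asset_input("BTC", 8, 10) == []
assert len(validate_asset_input("btc!", -1, 20_000)) == 3
```

Base ≠ quote is deliberately not checked here; that cross-field rule is covered separately (UX-06).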

Future Enhancements (P2)

Chain Asset Management (Layer 2): Implementation of ADR-005

  1. Chain Config: Manage chains_tb (RPC, confirmations)
  2. Asset Binding: Manage chain_assets_tb (Contract Address, Decimals)
  3. Auto-Verify: Verify contracts on-chain before binding
  4. Asset Migration (P3): Unbind/Rebind for Token Swaps (e.g., LEND -> AAVE)

Dual-Confirmation Workflow:

  1. Preview - Config change preview
  2. Second approval - Another admin approves
  3. Apply - Takes effect after confirmation

For: Symbol delisting, Asset disable, and other irreversible ops
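The preview → second approval → apply flow can be sketched as a tiny state machine (class, state names, and checks are illustrative, not the planned implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConfigChange:
    proposer: str
    payload: dict
    state: str = "PREVIEW"            # PREVIEW -> APPROVED -> APPLIED
    approver: Optional[str] = None

    def approve(self, admin: str) -> None:
        if self.state != "PREVIEW":
            raise ValueError(f"cannot approve from state {self.state}")
        if admin == self.proposer:
            # The whole point of dual confirmation: a different admin must sign off.
            raise PermissionError("second approval must come from another admin")
        self.approver, self.state = admin, "APPROVED"

    def apply(self) -> None:
        if self.state != "APPROVED":
            raise ValueError("change must be approved before apply")
        self.state = "APPLIED"
```

Usage: `ConfigChange(proposer="alice", payload={...})`, then `approve("bob")`, then `apply()`; self-approval raises.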

Multisig Withdrawal:

  • Admin can only create “withdrawal proposal”, not execute directly
  • Flow: Support submits → Finance reviews → Offline sign/MPC executes
  • Private keys must NEVER touch admin server

5. Security Requirements (MVP Must-Have)

5.1 Mandatory Audit Logging (Middleware)

Every request must be logged:

# FastAPI Middleware (imports shown for completeness)
from datetime import datetime, timezone
from fastapi import Request

@app.middleware("http")
async def audit_log_middleware(request: Request, call_next):
    response = await call_next(request)
    await AuditLog.create(
        admin_id=request.state.admin_id,   # set by the auth layer
        ip=request.client.host,
        timestamp=datetime.now(timezone.utc),
        action=f"{request.method} {request.url.path}",
        old_value=...,                     # captured per-endpoint
        new_value=...,
    )
    return response

5.2 Decimal Precision (Required)

Prevent JSON float precision loss:

from pydantic import BaseModel, field_serializer
from decimal import Decimal

class FeeRateResponse(BaseModel):
    rate: Decimal

    @field_serializer('rate')
    def serialize_rate(self, rate: Decimal, _info):
        return str(rate)  # Serialize as String

⚠️ All amounts and rates MUST use Decimal, output MUST be String

Naming Consistency (with existing code)

| Entity | Field | Values |
|---|---|---|
| Asset | status | 0 = disabled, 1 = active |
| Symbol | status | 0 = offline, 1 = online, 2 = maintenance |

⚠️ Implementation MUST match migrations/001_init_schema.sql


6. UX Requirements (Post-QA Review)

Based on QA feedback from 160+ test cases. These requirements enhance usability and prevent errors.

6.1 Asset/Symbol Display Enhancement

UX-01: Display Asset names in Symbol creation/edit forms

Base Asset: [BTC (ID: 1) ▼]  ← Dropdown with asset code
Quote Asset: [USDT (ID: 2) ▼]

Implementation: Use SQLAlchemy relationship display in FastAPI Amis Admin.


6.2 Fee Display Format

UX-02: Show fees in both percentage and basis points

Maker Fee: 0.10% (10 bps)
Taker Fee: 0.20% (20 bps)

Implementation:

@field_serializer('base_maker_fee')
def serialize_fee(self, fee: int, _info):
    pct = fee / 10000
    return f"{pct:.2f}% ({fee} bps)"

6.3 Danger Confirmation Dialog

UX-03: Confirm dialog for critical operations (Symbol Halt, Asset Disable)

┌─────────────────────────────────┐
│  ⚠️ Halt Symbol: BTC_USDT        │
├─────────────────────────────────┤
│  • Current orders: 1,234        │
│  • 24h volume: $12M             │
│                                 │
│  This action is reversible      │
│                                 │
│    [Confirm Halt]    [Cancel]   │
└─────────────────────────────────┘

Note: No “type to confirm” required (action is reversible).


6.4 Immutable Field Indicators

UX-04: Visually mark immutable fields in edit forms

Asset Edit:
┌──────────────────────────┐
│ Asset Code: BTC 🔒       │  ← Locked, disabled
│ Decimals: 8 🔒           │  ← Locked, disabled
│ Name: [Bitcoin      ] ✏️  │  ← Editable
│ Status: [Active ▼] ✏️     │  ← Editable
└──────────────────────────┘

Implementation: Use readonly_fields in ModelAdmin.


6.5 Structured Error Messages

UX-05: Provide actionable error responses

{
  "field": "asset",
  "error": "Invalid format",
  "got": "btc!",
  "expected": "Uppercase letters A-Z only (e.g., BTC)",
  "hint": "Remove special character '!'"
}

🚨 6.6 CRITICAL: Base ≠ Quote Validation

UX-06: Prevent creating symbols with same base and quote

This is a LOGIC BUG, not just UX.

from pydantic import model_validator

@model_validator(mode='after')
def validate_base_quote_different(self):
    if self.base_asset_id == self.quote_asset_id:
        raise ValueError("Base and Quote assets must be different")
    return self

Test Case: BTC_BTC must be rejected.


6.7 ID Auto-Generation (DB Responsibility)

Requirement: asset_id and symbol_id are auto-generated by database, NOT user input.

Create Asset Form:

┌──────────────────────────┐
│ Asset Code: [BTC     ]   │  ← User fills
│ Name: [Bitcoin       ]   │  ← User fills
│ Decimals: [8]            │  ← User fills
│                          │
│ asset_id: (auto)         │  ← DB generates (SERIAL)
└──────────────────────────┘

Create Symbol Form:

┌──────────────────────────┐
│ Symbol: [BTC_USDT    ]   │  ← User fills
│ Base Asset: [BTC ▼]      │  ← User selects
│ Quote Asset: [USDT ▼]    │  ← User selects
│                          │
│ symbol_id: (auto)        │  ← DB generates (SERIAL)
└──────────────────────────┘

Implementation: Use PostgreSQL SERIAL or IDENTITY columns.

-- Already in migrations/001_init_schema.sql
CREATE TABLE assets_tb (
    asset_id SERIAL PRIMARY KEY,  -- Auto-increment
    asset VARCHAR(16) NOT NULL UNIQUE,
    ...
);

6.8 Status/Flags String Display

Requirement: Display Status and Flags as human-readable strings, not raw numbers.

Asset Status Display:

| DB Value | Display String | Color |
|---|---|---|
| 0 | Disabled | 🔴 Red |
| 1 | Active | 🟢 Green |

Symbol Status Display:

| DB Value | Display String | Color |
|---|---|---|
| 0 | Offline | ⚫ Gray |
| 1 | Online | 🟢 Green |
| 2 | Close-Only | 🟡 Yellow |

Asset Flags Display (bitmask):

Flags: [Deposit ✓] [Withdraw ✓] [Trade ✓] [Internal Transfer ✓]

Instead of: asset_flags: 23
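As a sketch of how such a bitmask might be decoded for display (the bit positions below are illustrative, chosen so that 23 lights up exactly the four flags shown; the authoritative mapping lives in the schema and the Rust engine):

```python
# Illustrative bit assignments -- NOT the canonical mapping.
ASSET_FLAGS = {
    1 << 0: "Deposit",
    1 << 1: "Withdraw",
    1 << 2: "Trade",
    1 << 4: "Internal Transfer",
}

def render_flags(flags: int) -> str:
    """Turn a raw bitmask into the human-readable checkbox string."""
    labels = [name for bit, name in ASSET_FLAGS.items() if flags & bit]
    return " ".join(f"[{name} ✓]" for name in labels)

# 23 = 0b10111: bits 0, 1, 2 and 4 set
print(render_flags(23))  # [Deposit ✓] [Withdraw ✓] [Trade ✓] [Internal Transfer ✓]
```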

Implementation (Final Design):

⚠️ API Design: Status accepts STRING INPUT ONLY. Integer input is rejected.

from enum import IntEnum
from pydantic import field_validator, field_serializer

class AssetStatus(IntEnum):
    DISABLED = 0
    ACTIVE = 1

class SymbolStatus(IntEnum):
    OFFLINE = 0
    ONLINE = 1
    CLOSE_ONLY = 2

# Pydantic schema validation (string-only input)
@field_validator('status', mode='before')
def validate_status(cls, v):
    if not isinstance(v, str):
        raise ValueError(f"Status must be a string, got: {type(v).__name__}")
    try:
        return AssetStatus[v.upper()]
    except KeyError:
        raise ValueError(f"Unknown status: {v}")

# Output serialization (always string)
@field_serializer('status')
def serialize_status(self, value: int) -> str:
    return AssetStatus(value).name  # "ACTIVE" or "DISABLED"

Test Count: 177 unit tests (5 for UX-08 specifically)


6.9 Default Descending Sorting (UX-09)

Requirement: All list views must default to descending order (newest items first).
Reason: Admins usually want to see recent activity or newly created entities.
Implementation: Set ordering = [Model.pk.desc()] in ModelAdmin classes.


🔒 6.10 Full Lifecycle Trace ID (UX-10) - CRITICAL

Requirement: Every admin operation MUST carry a unique trace_id (ULID) from entry to exit.

Why: Admin Dashboard is critical infrastructure. Full observability is mandatory for:

  • Audit compliance
  • Debugging production issues
  • Security forensics
  • Performance monitoring

Trace Lifecycle:

┌──────────────────────────────────────────────────────────────────┐
│  Request Entry                                                   │
│  trace_id: 01HRC5K8F1ABCDEFG... (ULID generated)                 │
├──────────────────────────────────────────────────────────────────┤
│  [LOG] trace_id=01HRC5K8F1... action=START endpoint=/asset       │
│  [LOG] trace_id=01HRC5K8F1... action=VALIDATE input={...}        │
│  [LOG] trace_id=01HRC5K8F1... action=DB_QUERY sql=SELECT...      │
│  [LOG] trace_id=01HRC5K8F1... action=DB_UPDATE before={} after={}│
│  [LOG] trace_id=01HRC5K8F1... action=AUDIT_LOG written           │
│  [LOG] trace_id=01HRC5K8F1... action=END status=200 duration=45ms│
├──────────────────────────────────────────────────────────────────┤
│  Response Exit                                                   │
│  X-Trace-ID: 01HRC5K8F1ABCDEFG... (returned in header)           │
└──────────────────────────────────────────────────────────────────┘

Implementation:

import logging
import ulid  # ulid-py
from fastapi import Request
from contextvars import ContextVar

logger = logging.getLogger(__name__)

# Context variable for trace_id
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")

@app.middleware("http")
async def trace_middleware(request: Request, call_next):
    # Generate ULID for each request
    trace_id = str(ulid.new())
    trace_id_var.set(trace_id)
    
    # Log entry
    logger.info(f"trace_id={trace_id} action=START endpoint={request.url.path}")
    
    response = await call_next(request)
    
    # Log exit
    logger.info(f"trace_id={trace_id} action=END status={response.status_code}")
    
    # Return trace_id in response header
    response.headers["X-Trace-ID"] = trace_id
    return response

# Audit log includes trace_id
class AuditLog(Base):
    trace_id = Column(String(26), nullable=False)  # ULID is 26 chars
    admin_id = Column(BigInteger, nullable=False)
    action = Column(String(32), nullable=False)
    ...

Log Format (structured JSON):

{
  "timestamp": "2024-12-27T10:25:00Z",
  "trace_id": "01HRC5K8F1ABCDEFGHIJK",
  "admin_id": 1001,
  "action": "DB_UPDATE",
  "entity": "Asset",
  "entity_id": 5,
  "before": {"status": 1},
  "after": {"status": 0},
  "duration_ms": 12
}

Verification:

  • Every request generates unique ULID trace_id
  • All log lines include trace_id
  • Audit log table has trace_id column
  • Response includes X-Trace-ID header
  • Local log files are rotated and retained

7. Testing

📖 Full Testing Guide: 0x0F-admin-testing.md

Quick Start:

./scripts/run_admin_full_suite.sh   # Run all tests

Test Summary:

| Category | Count | Status |
|---|---|---|
| Rust unit tests | 5 | ✅ Pass |
| Admin unit tests | 178+ | ✅ Pass |
| Admin E2E tests | 4/4 | ✅ Pass |
| UX-10 Trace ID | 16/16 | ✅ Pass |

Ports: Dev 8002, CI 8001


8. Future Phases

| Phase | Content |
|---|---|
| Phase 2 | User management, balance viewer |
| Phase 3 | TDengine monitoring |
| Phase 4 | Full RBAC, advanced audit |

9. Directory Structure

admin/
├── main.py                 # FastAPI app entry
├── settings.py             # Config
├── models/                 # SQLAlchemy models (shared with main app)
├── admin/
│   ├── user.py            # User admin
│   ├── asset.py           # Asset admin
│   ├── trading.py         # Trading admin
│   └── system.py          # System admin
├── auth/
│   └── rbac.py            # RBAC config
└── requirements.txt




10. Configuration & Script Unification (2024-12-27)

10.1 Single Source of Truth

All environment configuration is exported from scripts/lib/db_env.sh:

# Database
export PG_HOST, PG_PORT, PG_USER, PG_PASSWORD, PG_DB
export DATABASE_URL, DATABASE_URL_ASYNC

# Service ports
export GATEWAY_PORT  # 8080
export ADMIN_PORT    # Dev: 8002, CI: 8001
export ADMIN_URL, GATEWAY_URL

Port conventions:

| Environment | Gateway | Admin |
|---|---|---|
| Dev (local) | 8080 | 8002 |
| CI | 8080 | 8001 |
| QA | 8080 | 8001 |

10.2 Test Script Naming Convention

| Script | Purpose |
|---|---|
| run_admin_full_suite.sh | Unified entry (Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway propagation E2E tests |
| run_admin_tests_standalone.sh | One-click full test (install deps + start server) |

Naming convention: run_<scope>_<type>.sh

10.3 Test Structure

admin/tests/
├── unit/           # pytest unit tests
├── e2e/            # pytest E2E tests (require a running service)
└── integration/    # standalone scripts (run via CI)
    └── test_admin_gateway_e2e.py

Running the tests:

# Run everything
./scripts/run_admin_full_suite.sh

# Quick mode (skip unit tests)
./scripts/run_admin_full_suite.sh --quick
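A consumer script might resolve the Admin port like this (helper name is illustrative; scripts/lib/db_env.sh remains the single source of truth, and 8002 is only the Dev-convention fallback):

```python
import os

def admin_port() -> int:
    """Read ADMIN_PORT exported by db_env.sh; fall back to the Dev default."""
    return int(os.environ.get("ADMIN_PORT", "8002"))
```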



0x0F Admin Dashboard - Testing Guide

This document contains detailed test cases and scripts for the Admin Dashboard. For architecture overview, see Admin Dashboard.


Test Scripts

One-Click Testing

# Run all tests (Rust + Admin Unit + E2E)
./scripts/run_admin_full_suite.sh

# Quick mode (skip Unit Tests)
./scripts/run_admin_full_suite.sh --quick

# Run only Admin → Gateway propagation E2E
./scripts/run_admin_gateway_e2e.sh

Script Reference

| Script | Purpose |
|---|---|
| run_admin_full_suite.sh | Unified entry (Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway propagation tests |
| run_admin_tests_standalone.sh | One-click full test (install deps + start server) |

Port Configuration

| Environment | Admin Port | Gateway Port |
|---|---|---|
| Dev (local) | 8002 | 8080 |
| CI | 8001 | 8080 |

Test Files

| Script | Function |
|---|---|
| verify_e2e.py | Admin login/logout, health check |
| test_admin_login.py | Authentication tests |
| test_constraints.py | Database constraint validation |
| test_core_flow.py | Asset/Symbol CRUD workflows |
| test_input_validation.py | Invalid input rejection |
| test_security.py | Security and authentication |
| tests/e2e/test_asset_lifecycle.py | Asset enable/disable lifecycle |
| tests/e2e/test_symbol_lifecycle.py | Symbol trading status management |
| tests/e2e/test_fee_update.py | Fee configuration updates |
| tests/e2e/test_audit_log.py | Audit trail verification |
| tests/test_ux10_trace_id.py | UX-10 Trace ID verification |

Running Individual Tests

cd admin && source venv/bin/activate
pytest tests/test_core_flow.py -v
pytest tests/e2e/test_asset_lifecycle.py -v
pytest tests/test_ux10_trace_id.py -v

Test Coverage

Total: 198+ tests

  • Rust unit tests: 5 passed
  • Admin unit tests: 178+ passed
  • Admin E2E tests: 4/4 passed
  • UX-10 Trace ID tests: 16/16 passed

UX Requirements Test Matrix

| UX ID | Requirement | Test File |
|---|---|---|
| UX-06 | Base ≠ Quote validation | test_constraints.py |
| UX-07 | ID Auto-Generation | test_id_mapping.py |
| UX-08 | Status String Display | test_ux08_status_strings.py |
| UX-09 | Default Descending Sort | test_core_flow.py |
| UX-10 | Trace ID Evidence Chain | test_ux10_trace_id.py |

Acceptance Criteria

| # | Deliverable | Verification |
|---|---|---|
| 1 | Admin UI accessible | Browser at localhost:$ADMIN_PORT |
| 2 | One-click E2E test | ./scripts/run_admin_full_suite.sh passes |
| 3 | All tests pass | 198+ tests green |
| 4 | Audit log queryable | Admin UI audit page |
| 5 | Gateway hot-reload | Config change without restart |
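The hot-reload criterion can be checked with a small polling helper against the Gateway's public symbols endpoint. A hedged sketch, assuming the endpoint returns a JSON list of objects with `symbol` and `status` fields (the 5-second budget follows the hot-reload requirement; function names are illustrative):

```python
import json
import time
import urllib.request

def symbol_has_status(symbols: list, symbol: str, expected: int) -> bool:
    """Pure check reused by the polling loop below."""
    return any(s.get("symbol") == symbol and s.get("status") == expected
               for s in symbols)

def wait_for_symbol_status(base_url: str, symbol: str, expected: int,
                           timeout: float = 5.0) -> bool:
    """Poll until the Admin-side config change is visible via the Gateway."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/api/v1/public/symbols") as resp:
            if symbol_has_status(json.load(resp), symbol, expected):
                return True
        time.sleep(0.5)
    return False
```

Usage: change a symbol's status in the Admin UI, then assert `wait_for_symbol_status("http://localhost:8080", "BTC_USDT", 2)`.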

Standard Operating Procedure (SOP): Token Listing

Role: Operations / Listing Manager. System: Admin Dashboard.

1. Pre-requisites

Before listing, you need the following information:

| Item | Description | Example | Source |
|---|---|---|---|
| Logic Symbol | The unique ticker on the exchange | UNI | Project Team |
| Asset Name | Full display name | Uniswap | Project Team |
| Chain | The blockchain network | ETH | Project Team |
| Contract Address | The Token's Smart Contract | 0x1f98... | Etherscan / Project |
| Decimals | Token precision | 18 | Auto-detected |
| Min Deposit | Minimum amount to credit | 0.1 | Ops Decision (Risk) |
| Withdraw Fee | Fee deducted per withdrawal | 5.0 | Ops Decision (Gas Cost) |

2. Workflow Steps

Phase 1: Create Logical Asset (Business Definition)

Define the asset for Trading and User Balances.

  1. Navigate: Admin -> Assets -> Create New.
  2. Input:
    • Symbol: UNI
    • Name: Uniswap
    • Decimals: 18 (System Internal Precision)
    • Initial Permissions:
      • [x] Can Allow Deposit
      • [ ] Can Allow Withdraw (Recommended: Disable initially)
      • [ ] Can Allow Trade (Recommended: Enable later)
    • Status: Active
  3. Click: Save.
    • System Result: assets_tb created. Asset ID generated (e.g., #10).

Phase 2: Bind Chain Asset (On-Chain Binding)

Tell Sentinel how to find this asset on-chain and set limits.

  1. Navigate: Admin -> Assets -> Select UNI (#10) -> Chain Config Tab.

  2. Click: Add New Binding.

  3. Input Configuration (Minimal):

    • Chain: Select ETH (Ethereum).
    • Contract Address: Paste 0x1f98...
    • (Leave other fields empty - System will fetch them)
  4. Action: Click Auto-Detect from Chain.

    • System Action: Queries RPC decimals(), symbol().
    • Result:
      • Decimals: Auto-filled 18. (Locked, Read-only)
      • Symbol: Auto-detected UNI. (Verifies against Asset name)
    • Ops Action: Verify the fetched data matches. Adjust Min Deposit / Fee only if defaults are unsuitable.
  5. Risk Configuration: (Review Defaults)

    • Min Deposit: 0.1 (Prevent dust attacks).
    • Min Withdraw: 10.0 (Must be > Fee).
    • Withdraw Fee: 5.0 (Cover Gas + Margin).
  6. Confirm: Check detected Decimals match project info.

  7. Click: Bind (Saved as Inactive).

    • System Result: chain_assets_tb created with is_active=false.

Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific.

Phase 3: Validation & Activation

Verify functionality before opening to the public.

Constraint: Sentinel only indexes on-chain logs for bindings with is_active=true, so the chain binding must be Active during testing. Safety comes from the logical asset flags (Phase 1), not from the chain binding.

Refined strategy:

  1. Activate the chain binding (is_active=true) so Sentinel can watch the contract.
  2. Keep the logical asset's Deposit/Withdraw flags (Phase 1) DISABLED: Sentinel can sync, but users cannot operate.
  3. Perform the "User Deposit Test" (see Section 3) with a test account.
  4. Once verified, enable the logical flags (Phase 4).

Revised Phase 2, Step 7: Click Bind & Activate (chain level), and double-check that the Phase 1 flags (Deposit/Withdraw) remain UNCHECKED.

Phase 4: Public Launch (Go Live)

  1. Navigate: Admin -> Assets -> UNI.
  2. Action: Check [x] Can Allow Deposit, [x] Can Allow Withdraw.
  3. Click: Save.
    • Result: Users can now see deposit address and transact.

Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific. If you list USDT on both ETH and TRON, you must configure them separately for each chain (e.g., ETH Fee = 5.0, TRON Fee = 1.0).


3. Verification

Verification A: User Deposit (Hot Test)

  1. Ask a test user to deposit UNI to their Existing ETH Address.
    • Note: User does NOT need to generate a new address.
  2. Wait 1-2 minutes (Block Confirmation).
  3. Check Admin -> Deposits: Should see + UNI record.

Verification B: System Log

  1. Check Sentinel Logs: [ETH] New asset watched: UNI (0x1f98...).

4. FAQ

Q: Do users need to generate a new deposit address? A: No. All assets on the ETH chain share the user's single ETH deposit address; the system identifies UNI vs. USDT by the contract address.

Q: What if the wrong contract address was entered? A: The Verify On-Chain step will fail (the decimals fetch errors or returns 0). If a wrong address was saved anyway, immediately set that binding to Disabled in Admin, then add the correct one.

0x10 Web Frontend Outsourcing Specification


📅 Status: 📝 RFP / Requirements Spec Goal: Develop a production-grade cryptocurrency exchange frontend.


1. Project Overview

We are looking for a professional development team to build the web frontend for Zero X Infinity, a high-performance cryptocurrency exchange.

Core Requirement: The frontend must be fast, responsive, and visually premium (similar to Binance/Bybit Pro implementations).

Technology Stack: Open Choice (Developer proposes stack).

  • Recommended: React, Vue 3, or Svelte.
  • Requirement: Must produce static assets manageable by Nginx/Docker.

2. Scope of Work

2.1 Core Pages

| Page | Features | Backend Status |
|---|---|---|
| Home / Landing | Market overview, Tickers, "Start Trading" CTA | ⚠️ Mock Data (Public API part ready) |
| Authentication | Login, Register, Forgot Password | ✅ Ready (Phase 0x10.6 Implemented) |
| Trading Interface | (Core) K-Line Chart, OrderBook, Trade History, Order Form | ✅ Ready (Full API Support) |
| Assets / Wallet | Balance overview, Deposit, Withdrawal, Asset History | ⚠️ Partial (Read Only ready; Dep/Wdw Pending) |
| User Center | API Key management, Password reset, Activity log | ✅ Backend Ready (UI Pending) |

2.2 Key Features & Requirements

A. Trading Interface (Critical)

  • Layout: 3-column classic layout (Left: Orderbook, Mid: Chart, Right: Trade History/Forms).
  • Chart: Integration with TradingView Charting Library (or Lightweight Charts).
  • OrderBook: Visual depth representation, clickable price to fill order form.
  • Responsiveness: Must work flawlessly on Desktop (1080p+) and Mobile.

B. Technical Constraints

  1. NO FLOATING POINT MATH: All precision must use String or BigInt arithmetic.
    • Backend sends: "123.45670000" (String).
    • Frontend displays: Fixed precision per asset config.
  2. WebSocket Push: Market data is pushed via WebSocket. Frontend must handle reconnection and heartbeat.
  3. Ed25519 Authentication:
    • API requests require an X-Signature header, signed with the user's Ed25519 private key (held in memory/session).
    • Clarification: the web MVP uses an opaque session token returned by the API (a standard HTTP-only cookie or Bearer token). Ed25519 signing is for API clients; the Web UI can use the session wrapper, with client-side signing reserved for high-security actions or "API-Key mode".
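Constraint 1 above is language-agnostic; here is a quick Python illustration of why binary floats are banned and how a truncating display formatter might look (`format_amount` is an illustrative helper, not part of the codebase):

```python
from decimal import Decimal, ROUND_DOWN

# Binary floats drift; exact decimal arithmetic does not.
assert 0.1 + 0.2 != 0.3
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

def format_amount(raw: str, decimals: int) -> str:
    """Truncate (never round up) a string amount to the asset's display precision."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. decimals=4 -> Decimal('1E-4')
    return str(Decimal(raw).quantize(quantum, rounding=ROUND_DOWN))

assert format_amount("123.45670000", 4) == "123.4567"
```

The frontend would apply the same idea with string/BigInt arithmetic in the browser.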

3. Deliverables

  1. Source Code: Full git repository history.
  2. Docker Support: Dockerfile for multi-stage build (Node build -> Nginx alpine).
  3. Documentation:
    • README.md: Build & Run instructions.
    • CONFIG.md: Environment variable reference.
  4. Mock Server: Simple mock logic or fixtures for UI testing without full backend.

4. Resources provided


5. API Inventory (Current Available)

The following APIs are implemented and available for frontend integration.

5.1 Public Market Data

Base URL: /api/v1/public

| Endpoint | Method | Description | Status |
|---|---|---|---|
| /exchange_info | GET | Server time, limits | ✅ Ready |
| /assets | GET | List supported assets | ✅ Ready |
| /symbols | GET | List trading pairs | ✅ Ready |
| /depth | GET | Order book depth | ✅ Ready |
| /klines | GET | OHLCV candles | ✅ Ready |
| /trades | GET | Public trade history | ✅ Ready |

5.2 Private Trading (Requires Signature)

Base URL: /api/v1/private

| Endpoint | Method | Description | Status |
|---|---|---|---|
| /order | POST | Place limit/market order | ✅ Ready |
| /cancel | POST | Cancel order | ✅ Ready |
| /orders | GET | List open/history orders | ✅ Ready |
| /order/{id} | GET | Get single order details | ✅ Ready |
| /trades | GET | User trade history | ✅ Ready |
| /balances | GET | Get specific asset balance | ✅ Ready |
| /balances/all | GET | Get all asset balances | ✅ Ready |

5.3 WebSocket Real-time Stream

Endpoint: ws://host:port/ws

| Channel | Type | Description | Status |
|---|---|---|---|
| order.update | Private | Order status change | ✅ Ready (Authenticated) |
| trade | Private | User trade execution | ✅ Ready (Authenticated) |
| balance.update | Private | Balance change | ✅ Ready (Authenticated) |
| market.depth | Public | Orderbook updates | ✅ Ready |
| market.ticker | Public | 24h Ticker updates | ✅ Ready |
| market.trade | Public | Public trade stream | ✅ Ready |

5.4 Authentication & User

| Feature | Description | Status |
|---|---|---|
| Sign-up/Login | User registration & JWT | ✅ Ready (Implemented) |
| User Profile | KYC, Password reset | ⚠️ Partial (Password Reset Ready) |
| API Keys | Manage API keys | ✅ Ready (Implemented) |

6. Development Resources

6.1 How to Access API Documentation

The backend provides auto-generated OpenAPI 3.0 documentation.

Step 1: Start the Backend (Mock Mode)

# Clone repository
git clone https://github.com/gjwang/zero_x_infinity
cd zero_x_infinity

# Run Gateway (requires Rust installed)
cargo run --release -- --gateway --port 8080

Step 2: Access Documentation

Open the Swagger UI at http://localhost:8080/docs; the raw OpenAPI spec is served at /api-docs/openapi.json.

Step 3: Generate Client SDK You can use openapi-generator-cli to generate a robust client:

npx @openapitools/openapi-generator-cli generate \
  -i http://localhost:8080/api-docs/openapi.json \
  -g typescript-axios \
  -o ./src/api





0x11 Deposit & Withdraw (Mock Chain)

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📦 Code Changes: View Diff

Core Objective: Implement the Funding Layer (Deposit & Withdraw) using a Mock Chain Architecture to validate asset flows without external blockchain dependencies.


1. Background & Architecture

We have a high-performance Matching Engine (Phase I) and a Product Layer (Accounts/Auth, Phase II). Now we add the Funding Layer to allow assets to enter and leave the system.

1.1 The “Mock Chain” Strategy

Instead of syncing 500GB of Bitcoin data, we implement a Simulator for Phase 0x11.

  • Goal: Validate internal logic (Balance Credit, Risk Check, Idempotency).
  • Method: MockBtcChain and MockEvmChain traits that simulate RPC calls.
graph LR
    User[User] -->|API Request| Gateway
    Gateway -->|Risk Check| FundingService
    FundingService -->|Command| ME[Matching Engine]
    FundingService -.->|Simulated RPC| MockChain[Mock Chain Adapter]
    MockChain -.->|Callback| FundingService

1.2 Phase Plan

| Chapter | Topic | Status |
|---|---|---|
| 0x11 | Deposit & Withdraw (Mock) | ✅ Completed |
| 0x11-a | Real Chain Integration | 🚧 Under Construction |

2. Core Implementation

2.1 Funding Service (src/funding/service.rs)

The central orchestrator for all funding operations.

  • Deposit: Receives “Mock Event”, checks idempotency, credits user balance via matching engine.
  • Withdraw: Authenticates user, locks funds in engine, simulates broadcast, updates DB.

2.2 Chain Adapter Trait (src/funding/chain_adapter.rs)

We abstract blockchain specifics behind a trait:

#[async_trait]
pub trait ChainClient: Send + Sync {
    async fn generate_address(&self, user_id: i64) -> Result<String, ChainError>;
    async fn broadcast_withdraw(&self, to: &str, amount: &str) -> Result<String, ChainError>;
    // ... validation methods
}

2.3 Database Schema (Migration)

Key tables added in migrations/010_deposit_withdraw.sql:

  • deposit_history: Tracks incoming transactions (Key: tx_hash).
  • withdraw_history: Tracks outgoing requests (Key: request_id).
  • user_addresses: Maps User <-> Asset <-> Address.

3. Data Flow

3.1 Deposit Flow (Mock)

  1. Trigger: POST /internal/mock/deposit { user_id, asset, amount }
  2. Idempotency: Check if tx_hash exists in deposit_history.
  3. Engine Execution: Send OrderAction::Deposit to Match Engine.
  4. Result: User Balance increases.

// src/funding/deposit.rs
pub async fn process_deposit(...) {
    if db.exists(tx_hash).await? { return Ok(()); }

    // Command Engine
    engine.execute(Deposit(user_id, asset, amount)).await?;

    // Persist
    db.insert_deposit(..., "SUCCESS").await?;
}

3.2 Withdraw Flow

  1. Request: POST /api/v1/private/withdraw/apply
  2. Risk Check: 2FA (Future), Whitelist, Balance Check.
  3. Engine Lock: Send OrderAction::WithdrawLock (Instant deduction).
  4. Broadcast: Call mock_chain.broadcast().
  5. Finalize: Update withdraw_history with tx_hash.
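The five steps above can be sketched end-to-end. This is a minimal, self-contained sketch: the in-memory `Engine` and the boolean-driven `broadcast` stub are hypothetical stand-ins for the real matching engine and mock chain adapter, and the risk checks are elided.

```rust
#[derive(Debug, PartialEq)]
enum WithdrawStatus {
    Success { tx_hash: String },
    Failed,
}

struct Engine {
    available: u64,
}

impl Engine {
    // Step 3: WithdrawLock (instant deduction, so the funds cannot be spent twice).
    fn withdraw_lock(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.available < amount {
            return Err("insufficient balance"); // Step 2: balance check failed
        }
        self.available -= amount;
        Ok(())
    }

    // Compensating action when the broadcast fails.
    fn refund(&mut self, amount: u64) {
        self.available += amount;
    }
}

// Step 4: simulated broadcast; `ok` stands in for the mock chain's answer.
fn broadcast(ok: bool) -> Result<String, &'static str> {
    if ok { Ok("mock_tx_hash".to_string()) } else { Err("broadcast failed") }
}

fn process_withdraw(engine: &mut Engine, amount: u64, chain_ok: bool) -> WithdrawStatus {
    if engine.withdraw_lock(amount).is_err() {
        return WithdrawStatus::Failed;
    }
    match broadcast(chain_ok) {
        // Step 5: persist the tx_hash into withdraw_history.
        Ok(tx_hash) => WithdrawStatus::Success { tx_hash },
        // Broadcast failure: auto-refund the locked amount.
        Err(_) => {
            engine.refund(amount);
            WithdrawStatus::Failed
        }
    }
}

fn main() {
    let mut engine = Engine { available: 100 };
    assert_eq!(
        process_withdraw(&mut engine, 40, true),
        WithdrawStatus::Success { tx_hash: "mock_tx_hash".to_string() }
    );
    assert_eq!(engine.available, 60);
    // A failed broadcast refunds the deducted amount.
    assert_eq!(process_withdraw(&mut engine, 30, false), WithdrawStatus::Failed);
    assert_eq!(engine.available, 60);
    println!("ok");
}
```

The key design point this mirrors is that the deduction happens before the broadcast, so a crash mid-flow can never leave the user with both the balance and the on-chain funds.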

4. Verification

We verified this phase using a comprehensive E2E script.

4.1 Verification Script

Run the master script to verify the full lifecycle:

./scripts/verify_funding_trading_flow.sh

Scenario Covered:

  1. Register User A & B.
  2. Deposit BTC to User A (Mock).
  3. Transfer internal funds.
  4. Trade (Buy/Sell) to change balances.
  5. Withdraw USDT from User B.
  6. Audit: Check DB consistency.

4.2 Security Validation

  • Address Validation: Strict Regex for 0x... (ETH) and 1/3/bc1... (BTC).
  • Internal Auth: Mock endpoints protected by X-Internal-Secret.

Warning

SECURITY ADVISORY: The /internal/mock/deposit endpoint is a major security risk as it allows direct balance manipulation. It is currently protected by a secret but MUST be removed entirely once the Phase 0x11-a Sentinel (blockchain scanner) is fully integrated and stable.


Summary

Phase 0x11 establishes the “Financial Highways” of the exchange. By using a Mock Chain, we isolated the complex internal logic (Accounting, Risk, Idempotency) from the external chaos of real blockchains.

Key Achievement:

A complete, idempotent Asset Inflow/Outflow system that is “Blockchain Agnostic”.

Next Step:

Phase 0x11-a: Replace the “Mock Adapter” with a “Real Node Sentinel” (Bitcoin Core / Anvil).





0x11-a Real Chain Integration


Status: IMPLEMENTED / QA VERIFIED (Phase 0x11-a Complete)
Date: 2025-12-29
Context: Phase 0x11 Extension: From Mock to Reality
Goal: Integrate real Blockchain Nodes (Regtest/Testnet) and handle distributed-system failures (Re-orgs, Network Partition).

1. Core Architecture Change: Pull vs Push

The “Mock” phase (0x11) relied on a Push Model (API Call -> Deposit). Real Chain Integration (0x11-a) requires a Pull Model (Sentinel -> DB).

1.1 The Sentinel (New Service)

A dedicated, independent service loop responsible for “watching” the blockchain.

  • Block Scanning: Polls getblockchaininfo / eth_blockNumber.
  • Filter: Index user_addresses in memory. Scan every transaction in new blocks against this filter.
  • State Tracking: Updates confirmation counts for existing CONFIRMING deposits.
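A minimal sketch of that pull loop. The `tip_height` and `block_outputs` functions are hypothetical stand-ins for the node RPC; the real Sentinel also persists its cursor and hands detections to the confirmation tracker.

```rust
use std::collections::HashSet;

struct Cursor {
    height: u64,
}

// Stand-in for the node RPC tip query (getblockchaininfo / eth_blockNumber).
fn tip_height() -> u64 {
    105
}

// Stand-in for fetching a block and flattening its outputs to (address, amount).
fn block_outputs(_height: u64) -> Vec<(String, u64)> {
    vec![("bcrt1qexample".to_string(), 50_000)]
}

// One scan pass: walk every block behind the tip, match outputs against
// the in-memory address filter, and advance the cursor.
fn scan_once(cursor: &mut Cursor, watch: &HashSet<String>) -> Vec<(String, u64)> {
    let mut detected = Vec::new();
    while cursor.height < tip_height() {
        cursor.height += 1;
        for (addr, amount) in block_outputs(cursor.height) {
            if watch.contains(&addr) {
                // Record as DETECTED; balance is NOT credited here.
                detected.push((addr, amount));
            }
        }
    }
    detected
}

fn main() {
    let watch: HashSet<String> = ["bcrt1qexample".to_string()].into();
    let mut cursor = Cursor { height: 100 };
    let hits = scan_once(&mut cursor, &watch);
    assert_eq!(hits.len(), 5); // one matching output per scanned block
    assert_eq!(cursor.height, 105);
    println!("ok");
}
```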

2. Critical Challenge: Re-org (Chain Reorganization)

In a real blockchain, the “latest” block is not final. It can be orphaned.

2.1 Confirmation State Machine

We must expand the Deposit Status flow to handle volatility.

| Status | Confirmations | Action | UI Display |
|---|---|---|---|
| DETECTED | 0 | Log Tx. Do NOT credit balance. | “Confirming (0/X)” |
| CONFIRMING | 1 to (X-1) | Update confirmation count. Check for Re-org (BlockHash mismatch). | “Confirming (N/X)” |
| FINALIZED | >= X | Push OrderAction::Deposit to Pipeline. | “Success” |

Important

X represents the REQUIRED_CONFIRMATIONS parameter. Hardcoding is forbidden; the value must be configured per chain.

2.2 Re-org Detection Logic

  1. Sentinel remembers Block(Height H) = Hash A.
  2. Sentinel scans Height H again later.
  3. If Hash != A, a Re-org happened.
  4. Action: Rollback scan cursor, re-evaluate all affected deposits.
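The detection-plus-rollback decision reduces to a small pure function. A sketch, where the rollback depth is an assumed configuration value (per the no-hardcoding rule in section 7):

```rust
// Compare the remembered hash at height H with the node's current answer.
// On mismatch, the block was orphaned: roll the scan cursor back so all
// affected deposits are re-evaluated on the next pass.
fn next_cursor_height(
    remembered_hash: &str,
    current_hash: &str,
    cursor_height: u64,
    reorg_rollback: u64,
) -> u64 {
    if remembered_hash == current_hash {
        cursor_height // chain unchanged, keep scanning forward
    } else {
        cursor_height.saturating_sub(reorg_rollback)
    }
}

fn main() {
    // Same hash: no re-org, cursor stays.
    assert_eq!(next_cursor_height("hashA", "hashA", 100, 6), 100);
    // Hash changed at the same height: re-org detected, rewind 6 blocks.
    assert_eq!(next_cursor_height("hashA", "hashB", 100, 6), 94);
    println!("ok");
}
```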

3. Supported Chains (Phase I)

3.1 Bitcoin (The UTXO Archetype)

  • Node: bitcoind (Regtest Mode).
  • Key Challenge: UTXO Management. A deposit is not a “balance update”, it’s a new Unspent Output.
  • Docker: ruimarinho/bitcoin-core:24

3.2 Ethereum (The Account/EVM Archetype) - 🚧 PENDING

  • Status: Design Complete, Implementation Pending (Phase 0x11-b).
  • Node: anvil (from Foundry-rs).
  • Key Challenge: Event Log Parsing. ERC20 deposits are Transfer events in receipt logs.
  • Docker: ghcr.io/foundry-rs/foundry:latest

4. Sentinel Architecture (Detailed)

4.1 BtcSentinel (Implemented)

  1. getblockhash -> getblock (Verbosity 2).
  2. Iterate outputs vout: Match scriptPubKey against user_addresses.
  3. Re-org Check: Keep a rolling window. If previousblockhash mismatch, trigger Rollback.

4.2 EthSentinel (Planned for 0x11-b)

  1. eth_getLogs (Topic0 = Transfer).
  2. Re-org Check: Check blockHash of confirmed logs.

5. Reconciliation & Safety (The Financial Firewall)

5.1 The “Truncation Protocol”

  • Ingress Logic: Deposit_Credited = Truncate(Deposit_Raw, Configured_Precision)
  • Residue: Remainder stays in wallet as “System Dust”.
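A sketch of the truncation rule, assuming amounts are handled as scaled integers (as elsewhere in this project); `truncate_deposit` is an illustrative name, not the actual function:

```rust
// Credit only up to the configured system precision; the remainder stays
// in the hot wallet as "System Dust". `raw` is an integer at the chain's
// native precision (e.g. wei-like units with 18 decimals).
fn truncate_deposit(raw: u128, chain_decimals: u32, system_decimals: u32) -> (u128, u128) {
    assert!(system_decimals <= chain_decimals);
    let factor = 10u128.pow(chain_decimals - system_decimals);
    let credited = raw / factor * factor; // drop sub-precision digits
    (credited, raw - credited)            // (credited amount, system dust)
}

fn main() {
    // Raw on-chain amount 1.234567891234567890 (18 decimals),
    // system configured to keep 8 decimals:
    let (credited, dust) = truncate_deposit(1_234_567_891_234_567_890, 18, 8);
    assert_eq!(credited, 1_234_567_890_000_000_000); // 1.23456789 credited
    assert_eq!(dust, 1_234_567_890);                 // residue kept as dust
    println!("ok");
}
```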

5.2 The Triangular Reconciliation

We verify solvency using three independent data sources:

| Source | Alias | Data Point |
|---|---|---|
| Blockchain RPC | Proof of Assets (PoA) | getbalance() or sum of UTXOs |
| Internal Ledger | Proof of Liabilities (PoL) | SUM(user.available + user.frozen) |
| Transaction History | Proof of Flow (PoF) | SUM(deposits) - SUM(withdrawals) - SUM(fees) |

The Equation: PoA == PoL + SystemProfit
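The equation is cheap to check mechanically. A sketch, with all three figures as scaled integers gathered from the independent sources above:

```rust
// Solvency invariant: on-chain assets must equal user liabilities plus
// accumulated system profit (fees, dust). Any discrepancy is an incident.
fn is_solvent(proof_of_assets: i128, proof_of_liabilities: i128, system_profit: i128) -> bool {
    proof_of_assets == proof_of_liabilities + system_profit
}

fn main() {
    // Chain holds 1_000_500 units; users are owed 1_000_000; fees earned 500.
    assert!(is_solvent(1_000_500, 1_000_000, 500));
    // A single-unit discrepancy anywhere must trip the alarm.
    assert!(!is_solvent(1_000_499, 1_000_000, 500));
    println!("ok");
}
```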

5.3 Re-org Recovery Protocol

  • Shallow Re-org: Sentinel rolls back cursor.
  • Deep Re-org (> Max Depth): Manual intervention (Freeze + Clawback).

6. Database Schema Extensions

CREATE TABLE chain_cursor (
    chain_id VARCHAR(16) PRIMARY KEY,
    last_scanned_height BIGINT NOT NULL,
    last_scanned_hash VARCHAR(128) NOT NULL,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

ALTER TABLE deposit_history 
ADD COLUMN chain_id VARCHAR(16),
ADD COLUMN block_height BIGINT,
ADD COLUMN block_hash VARCHAR(128),
ADD COLUMN tx_index INT,
ADD COLUMN confirmations INT DEFAULT 0;

7. Configuration: No Hardcoding

All chain-specific parameters (confirmations, reorg depth, dust threshold) must be loaded from YAML.

8. Security: HD Wallet Architecture

8.1 Key Storage

  • Cold Storage: Private Key (Mnemonic) offline.
  • Hot Server: XPUB only.

8.2 Address Derivation

  • BTC: BIP84 (m/84'/0'/0'/0/{index})
  • ETH: BIP44 (m/44'/60'/0'/0/{index})

8.3 The “Gap Limit” Solution

  • Problem: A standard HD wallet stops scanning after 20 consecutive unused addresses (the Gap Limit).
  • Solution: Full Index Scanning. Sentinel loads ALL active allocated addresses from the user_addresses table into a HashSet (Memory) or Bloom Filter (Future optimization).
  • Scanning: Scan every block transaction output against this set, ignoring standard Gap Limits.
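A sketch of the matching step, assuming addresses arrive as plain strings (the real scanner first extracts them from each output's scriptPubKey):

```rust
use std::collections::HashSet;

// Match block outputs against the full allocated-address set,
// with no Gap Limit: every row from user_addresses is loaded.
fn match_outputs<'a>(
    outputs: &'a [(String, u64)],
    allocated: &HashSet<String>,
) -> Vec<&'a (String, u64)> {
    outputs.iter().filter(|o| allocated.contains(&o.0)).collect()
}

fn main() {
    // Simulate 1000 allocated deposit addresses.
    let allocated: HashSet<String> = (0..1000).map(|i| format!("addr{i}")).collect();
    let outputs = vec![
        ("addr7".to_string(), 42),
        ("unknown".to_string(), 9), // not ours, ignored
        ("addr999".to_string(), 1),
    ];
    let hits = match_outputs(&outputs, &allocated);
    assert_eq!(hits.len(), 2);
    println!("ok");
}
```

A HashSet gives O(1) lookups per output; the Bloom Filter mentioned above trades a small false-positive rate for far lower memory at million-address scale.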

9. Future Work (Out of Scope for 0x11-a)

  1. Bloom Filters: For million-user address matching (Phase 0x12).
  2. Automated Clawback: For deep re-orgs.
  3. Multi-Source Validation: Anti-RPC-spoofing.

10. Summary

Phase 0x11-a transitions the Funding System to production-ready blockchain integration.

11. Implementation Status (2025-12-29)

11.1 Completed Features

  • Core Funding: DepositService and WithdrawService fully implemented with Integer-Only Persistence (BigInt/i64).
  • Sentinel (BTC): Basic BtcScanner implemented (Polling getblock, HashSet address matching).
  • API Layer: Deposit/Withdraw history APIs fixed (QA-01) and internal auth secured (QA-03).
  • Address Validation: Strict Regex for BTC/ETH addresses (DEF-001).

11.2 Verification & Testing Guide

Run the verified QA suite covering Core, Chaos, and Security scenarios:

bash scripts/run_0x11a_verification.sh

Results:

  • Agent B (Core): Address Persistence, Deposit/Withdraw Lifecycle ✅
  • Agent A (Chaos): Idempotency, Race Condition Resilience ✅
  • Agent C (Security): Address Isolation, Internal Auth ✅

11.3 Known Limitations (Deferred to 0x11-b)

  • ETH / ERC20 Support: Real chain integration for Ethereum is Pending. EthScanner is currently a stub.
  • DEF-002 (Sentinel SegWit): The current bitcoincore-rpc integration has issues parsing P2WPKH addresses in regtest. Sentinel runs but may miss specific SegWit deposits.
  • Bloom Filters: Currently using HashSet for address matching. Bloom Filters deferred to Phase 0x12 optimizations.





0x11-b Sentinel Hardening


Status: COMPLETE (Core)
Date: 2025-12-30
Context: Phase 0x11-a Extension: Hardening Sentinel for Production
Goal: Fix SegWit blindness (DEF-002); implement ETH/ERC20 & ADR-005/006.
Branch: 0x11-b-sentinel-hardening
Latest Commit: d307e12

1. Objectives

This phase addresses the critical gaps identified during Phase 0x11-a QA:

| Priority | Issue | Description |
|---|---|---|
| P0 | DEF-002 | Sentinel fails to detect P2WPKH (SegWit) deposits on BTC. |
| P1 | ETH Gap | EthScanner is a stub; no real ERC20 event parsing. |

2. Deposit Flow Architecture

Important

🚨 Production Risk Control Requirements

Before crediting user balance on finalization, deposits SHOULD pass through:

  1. Source Verification - Check if sender address is on sanctions/blacklist
  2. Amount Thresholds - Large deposits may require enhanced verification
  3. Pattern Analysis - Detect unusual deposit patterns (structuring, layering)
  4. AML Compliance - Regulatory reporting for threshold amounts
  5. Address Attribution - Verify expected vs actual funding sources

The current implementation credits balance automatically on finalization.

2.1 Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Sentinel Deposit Flow                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌─────────────┐ │
│  │ BTC/ETH  │───▶│ ChainScanner │───▶│ Confirmation   │───▶│ Deposit     │ │
│  │  Node    │    │              │    │    Monitor     │    │  Pipeline   │ │
│  └──────────┘    └──────────────┘    └────────────────┘    └─────────────┘ │
│       ▲                 │                    │                    │        │
│       │                 ▼                    ▼                    ▼        │
│       │          ┌─────────────┐      ┌───────────┐      ┌─────────────┐   │
│       │          │ ScannedBlock│      │ deposit_  │      │ balances_tb │   │
│       │          │ + Deposits  │      │ history   │      │ (Balance)   │   │
│       │          └─────────────┘      └───────────┘      └─────────────┘   │
│       │                                    DB                   DB         │
└───────┴─────────────────────────────────────────────────────────────────────┘

2.2 State Machine

DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
              │                              │
              └───────── ORPHANED ◀──────────┘
                      (Re-org detected)

| Status | Meaning | Balance Impact |
|---|---|---|
| DETECTED | On-chain detected, awaiting confirmation | |
| CONFIRMING | 1+ confirmations, not yet finalized | |
| FINALIZED | Required confirmations reached | 🔄 Processing |
| SUCCESS | Balance credited | |
| ORPHANED | Block re-orged, tx invalidated | |
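The transitions above can be sketched as a small state function, parameterized by the per-chain required confirmations (the names here are illustrative, not the actual src/sentinel types):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DepositStatus {
    Detected,
    Confirming(u32),
    Finalized, // only here may the pipeline credit the balance
    Orphaned,
}

// Recompute a deposit's status from its current confirmation count.
// `required` comes from per-chain config (REQUIRED_CONFIRMATIONS).
fn advance(_current: DepositStatus, confirmations: u32, required: u32, reorged: bool) -> DepositStatus {
    if reorged {
        // Block hash mismatch: the containing block was orphaned.
        return DepositStatus::Orphaned;
    }
    match confirmations {
        0 => DepositStatus::Detected,
        n if n < required => DepositStatus::Confirming(n),
        _ => DepositStatus::Finalized,
    }
}

fn main() {
    let req = 6;
    assert_eq!(advance(DepositStatus::Detected, 0, req, false), DepositStatus::Detected);
    assert_eq!(advance(DepositStatus::Detected, 3, req, false), DepositStatus::Confirming(3));
    assert_eq!(advance(DepositStatus::Confirming(5), 6, req, false), DepositStatus::Finalized);
    assert_eq!(advance(DepositStatus::Confirming(2), 2, req, true), DepositStatus::Orphaned);
    println!("ok");
}
```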

2.3 Key Components

| Component | File | Responsibility |
|---|---|---|
| BtcScanner | src/sentinel/btc.rs | Scan BTC blocks, extract P2PKH/P2WPKH addresses |
| EthScanner | src/sentinel/eth.rs | Scan ETH blocks via JSON-RPC |
| ConfirmationMonitor | src/sentinel/confirmation.rs | Track confirmations, detect re-orgs |
| DepositPipeline | src/sentinel/pipeline.rs | Credit balance on finalization |

2.4 Database Schema

deposit_history (Deposit Records):

tx_hash       VARCHAR PRIMARY KEY  -- Transaction hash
user_id       BIGINT               -- User ID
asset         VARCHAR              -- Asset (BTC/ETH)
amount        DECIMAL              -- Amount
chain_id      VARCHAR              -- Chain ID
block_height  BIGINT               -- Block height
block_hash    VARCHAR              -- Block hash (for re-org detection)
status        VARCHAR              -- Status (see state machine)
confirmations INT                  -- Current confirmation count

3. Withdraw Flow Architecture

Caution

⛔ Production Risk Control Requirements ⛔

The current implementation is for MVP/Testing only. Before production deployment, withdrawals MUST pass through:

  1. Comprehensive Risk Engine - Real-time fraud detection, velocity limits, address blacklist
  2. Manual Review - Large amounts require human approval
  3. Multi-signature Approval - Hot wallet threshold triggers cold wallet multi-sig
  4. AML/KYC Verification - Regulatory compliance checks
  5. Delay Mechanism - Suspicious transactions held for review period

Never deploy the current auto-approval flow to production!

3.1 Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                         Withdraw Flow (Push Model)                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌─────────────┐ │
│  │   User   │───▶│ WithdrawServ │───▶│   Balance      │───▶│   Chain     │ │
│  │  Request │    │     ice      │    │    Deduct      │    │  Broadcast  │ │
│  └──────────┘    └──────────────┘    └────────────────┘    └─────────────┘ │
│       │                 │                    │                    │        │
│       │                 ▼                    ▼                    ▼        │
│       │          ┌─────────────┐      ┌───────────┐      ┌─────────────┐   │
│       │          │  Validate   │      │ withdraw_ │      │   TX Hash   │   │
│       │          │  Address    │      │  history  │      │   or Fail   │   │
│       │          └─────────────┘      └───────────┘      └─────────────┘   │
│       │                                    DB                   ▼          │
│       │                              ┌─────────────────────────────────┐   │
│       │                              │ On Fail: AUTO REFUND to balance │   │
│       │                              └─────────────────────────────────┘   │
└───────┴─────────────────────────────────────────────────────────────────────┘

3.2 Flow Steps

1. Validate Request
   └─▶ Address format ✓, Amount > 0 ✓

2. Lock & Check Balance (FOR UPDATE)
   └─▶ available >= amount ? Continue : Error

3. Deduct Balance (Immediate)
   └─▶ available -= amount

4. Create Record (PROCESSING)
   └─▶ INSERT INTO withdraw_history

5. COMMIT Transaction
   └─▶ Balance deducted, record created

6. Broadcast to Chain
   ├─▶ Success: UPDATE status = 'SUCCESS', tx_hash = ?
   └─▶ Failure: AUTO REFUND + status = 'FAILED'

3.3 State Machine

           ┌──────────────┐
           │  PROCESSING  │
           └──────┬───────┘
                  │
      ┌───────────┼───────────┐
      ▼                       ▼
┌──────────┐           ┌──────────┐
│  SUCCESS │           │  FAILED  │
│  (✅ TX) │           │(Refunded)│
└──────────┘           └──────────┘

| Status | Meaning | Balance Impact |
|---|---|---|
| PROCESSING | Request submitted, awaiting broadcast | 💰 Deducted |
| SUCCESS | TX broadcast successful | ✅ Completed |
| FAILED | Broadcast failed, auto-refunded | 🔄 Refunded |

3.4 Key Components

| Component | File | Responsibility |
|---|---|---|
| WithdrawService | src/funding/withdraw.rs | Validate, deduct, broadcast, refund |
| ChainClient | src/funding/chain_adapter.rs | Blockchain TX broadcast interface |
| handlers::apply_withdraw | src/funding/handlers.rs | HTTP API endpoint |

3.5 Database Schema

withdraw_history (Withdraw Records):

request_id    VARCHAR PRIMARY KEY  -- Request UUID
user_id       BIGINT               -- User ID
asset         VARCHAR              -- Asset (BTC/ETH)
amount        BIGINT               -- Amount (scaled integer)
fee           BIGINT               -- Network fee (scaled integer)
to_address    VARCHAR              -- Destination address
status        VARCHAR              -- PROCESSING/SUCCESS/FAILED
tx_hash       VARCHAR              -- Blockchain TX hash (on success)
created_at    TIMESTAMP            -- Created time
updated_at    TIMESTAMP            -- Updated time

3.6 Amount Calculation

User Balance Delta = -Request Amount
Network Receive    = Request Amount - Fee

Example:

  • User requests withdraw 1.0 BTC with 0.0001 BTC fee
  • Balance deducted: 1.0 BTC
  • Network receives: 0.9999 BTC
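In scaled-integer terms (satoshi, as withdraw_history stores amounts), the arithmetic above is:

```rust
// Returns (balance deducted from the user, amount the destination receives).
// Both values are scaled integers, matching the BIGINT columns.
fn withdraw_amounts(request_sat: u64, fee_sat: u64) -> (u64, u64) {
    let balance_delta = request_sat;             // user pays the full request
    let network_receive = request_sat - fee_sat; // destination gets request minus fee
    (balance_delta, network_receive)
}

fn main() {
    // 1.0 BTC request (100_000_000 sat) with a 0.0001 BTC fee (10_000 sat):
    let (deducted, received) = withdraw_amounts(100_000_000, 10_000);
    assert_eq!(deducted, 100_000_000);
    assert_eq!(received, 99_990_000); // 0.9999 BTC
    println!("ok");
}
```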

4. 🛡️ Tiered Risk Control Framework (Defense in Depth)

4.1 Defense Layers

┌─────────────────────────────────────────────────────────────────────────────┐
│                       Defense in Depth Architecture                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Layer 1: 🟢 AUTOMATED                                                      │
│  ├─▶ Address blacklist/sanctions check                                      │
│  ├─▶ Velocity limits (per hour/day/week)                                    │
│  └─▶ Basic fraud pattern detection                                          │
│                                                                             │
│  Layer 2: 🟡 THRESHOLD-BASED                                                │
│  ├─▶ Amount > $1K: Enhanced verification                                    │
│  ├─▶ Amount > $10K: 24-hour delay + notification                            │
│  └─▶ Amount > $50K: Requires Layer 3                                        │
│                                                                             │
│  Layer 3: 🔴 MANUAL REVIEW                                                  │
│  ├─▶ Human analyst verification                                             │
│  ├─▶ Source of funds documentation                                          │
│  └─▶ Multi-party approval (2-of-3)                                          │
│                                                                             │
│  Layer 4: ⚫ COLD WALLET MULTI-SIG                                          │
│  ├─▶ Amount > $100K: Cold wallet release                                    │
│  ├─▶ Hardware key requirement                                               │
│  └─▶ Geographic distribution of signers                                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Risk Tiers by Amount

| Tier | Amount | Delay | Approval | Wallet |
|---|---|---|---|---|
| 🟢 T1 | < $1,000 | None | Auto | Hot |
| 🟡 T2 | $1K - $10K | 1 hour | Auto + Alert | Hot |
| 🟠 T3 | $10K - $50K | 24 hours | 1-of-2 Manual | Hot |
| 🔴 T4 | $50K - $100K | 48 hours | 2-of-3 Manual | Warm |
| T5 | > $100K | 72 hours | 3-of-5 + HSM | Cold |

4.3 Automated Checks (All Tiers)

| Check | Block | Alert |
|---|---|---|
| OFAC/Sanctions list | | |
| Address blacklist | | |
| Velocity limit exceeded | | |
| New address (< 24h) | | ⚠️ T2+ |
| Unusual amount pattern | | ⚠️ Delay |
| Geographic anomaly | | ⚠️ Delay |

4.4 Deposit-Specific Checks

┌────────────────────────────────────────────────────────────────┐
│                    Deposit Risk Assessment                      │
├────────────────────────────────────────────────────────────────┤
│ ✓ Source address attribution (known exchange? mixer? unknown?) │
│ ✓ Transaction graph analysis (1-hop, 2-hop connections)        │
│ ✓ Timing pattern (structuring detection)                       │
│ ✓ Historical behavior baseline                                  │
│ ✓ Cross-chain correlation (same entity on ETH/BTC?)            │
└────────────────────────────────────────────────────────────────┘

4.5 Withdraw-Specific Checks

┌────────────────────────────────────────────────────────────────┐
│                   Withdraw Risk Assessment                      │
├────────────────────────────────────────────────────────────────┤
│ ✓ Destination address reputation                                │
│ ✓ First-time address penalty                                    │
│ ✓ Account age vs amount ratio                                   │
│ ✓ Recent password/2FA changes (48h cooldown)                   │
│ ✓ Device fingerprint verification                               │
│ ✓ API key usage pattern                                         │
└────────────────────────────────────────────────────────────────┘

5. Problem Analysis: DEF-002 (BTC SegWit Blindness)

5.1 Root Cause

The extract_address function in src/sentinel/btc.rs uses Address::from_script(script, network).

While the rust-bitcoin crate should support P2WPKH scripts (OP_0 <20-byte-hash>), the current implementation may fail due to:

  1. Network mismatch between the script encoding and the Network enum passed.
  2. Missing feature flags in the bitcoincore-rpc dependency.

5.2 Solution

  1. Verify: Add unit test with raw P2WPKH script construction.
  2. Fix: If Address::from_script fails, manually detect witness v0 scripts:
    if script.is_p2wpkh() {
        // Extract 20-byte hash from script[2..22]
        // Construct Address::p2wpkh(...)
    }

6. Feature Specification: ETH/ERC20 Sentinel

6.1 Architecture

┌─────────────────────────────────────────────────────────────────┐
│                       EthScanner                                │
├─────────────────────────────────────────────────────────────────┤
│ 1. Poll eth_blockNumber (Tip Tracking)                          │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer])           │
│ 3. Filter: Match log.address (Contract) + topic[2] (To)         │
│ 4. Parse: Decode log.data as uint256 amount                     │
│ 5. Emit: DetectedDeposit { tx_hash, to_address, amount, ... }   │
└─────────────────────────────────────────────────────────────────┘

6.2 Key Implementation Details

  • Topic0 (Transfer): keccak256("Transfer(address,address,uint256)") = 0xddf252ad...
  • Topic1: Sender (indexed)
  • Topic2: Recipient (indexed) - Match against user_addresses
  • Data: Amount (uint256, left-padded)
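A sketch of decoding one such log without external crates; topics and data are hex strings as returned by eth_getLogs, and the helper names are illustrative:

```rust
// keccak256("Transfer(address,address,uint256)") — the canonical Topic0.
const TRANSFER_TOPIC0: &str =
    "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// An indexed address topic is a 32-byte word with the 20-byte address
// right-aligned; take the last 40 hex characters.
fn topic_to_address(topic: &str) -> String {
    format!("0x{}", &topic[topic.len() - 40..])
}

// `data` is a left-padded uint256; u128 is enough for realistic token amounts.
fn data_to_amount(data: &str) -> u128 {
    u128::from_str_radix(data.trim_start_matches("0x"), 16).expect("hex amount")
}

fn main() {
    // topic2: recipient address, to be matched against user_addresses.
    let topic2 = format!("0x{:0>64}", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
    assert_eq!(
        topic_to_address(&topic2),
        "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
    );
    // data: 1 USDT at 6 decimals (1_000_000 = 0xf4240), left-padded to 32 bytes.
    let data = format!("0x{:0>64}", "f4240");
    assert_eq!(data_to_amount(&data), 1_000_000);
    assert!(TRANSFER_TOPIC0.starts_with("0xddf252ad"));
    println!("ok");
}
```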

6.3 Precision Handling

| Token | Decimals | Scaling |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |

Important

Token decimals MUST be loaded from assets_tb, not hardcoded.


7. Database Schema Extensions

-- EthScanner requires contract address tracking
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- e.g., '0xdAC17F958D2ee523a2206206994597C13D831ec7'

-- Index for fast lookup by contract
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);

8. Configuration: config/sentinel.yaml

eth:
  chain_id: "ETH"
  network: "anvil"  # or "mainnet", "goerli"
  rpc:
    url: "http://127.0.0.1:8545"
  scanning:
    required_confirmations: 12
    max_reorg_depth: 20
    start_height: 0
  contracts:
    - name: "USDT"
      address: "0x..."
      decimals: 6
    - name: "USDC"
      address: "0x..."
      decimals: 6

9. Acceptance Criteria

  • BTC: Unit test test_p2wpkh_extraction passes. ✅ (test_segwit_p2wpkh_extraction_def_002)
  • BTC: E2E deposit to bcrt1... address is detected and credited. ✅ (Verified via greybox test)
  • ETH: Unit test test_erc20_transfer_parsing passes. ✅ (7 ETH tests pass)
  • ETH: E2E deposit via MockUSDT contract is detected. ⏳ (Pending: ERC20 eth_getLogs not yet implemented)
  • Regression: All existing Phase 0x11-a tests still pass. ✅ (322 tests)

10. Implementation Status

| Component | Status | Notes |
|---|---|---|
| BtcScanner P2WPKH Fix | Complete | Test test_segwit_p2wpkh_extraction_def_002 passes |
| EthScanner Implementation | Complete | Full JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| Unit Tests | 22 Pass | All Sentinel tests passing |
| E2E Verification | ⚠️ Partial | Nodes not running during test; scripts ready |
| ERC20 Token Support | 🚧 In Progress | eth_getLogs for Transfer events (Phase 0x11-b scope) |

11. Testing Instructions

Quick Test (Rust Unit Tests)

# Run all Sentinel tests
cargo test --package zero_x_infinity --lib sentinel -- --nocapture

# Run DEF-002 verification test only
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture

# Run ETH Scanner tests only
cargo test sentinel::eth -- --nocapture

Full Test Suite

# Run test script (no nodes required)
./scripts/tests/0x11b_sentinel/run_tests.sh

# Run with node startup (requires docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes




🇨🇳 中文

状态核心功能已完成
日期2025-12-29
上下文Phase 0x11-a 延续: 强化哨兵服务
目标修复 SegWit 盲区 (DEF-002) 并实现 ETH/ERC20 支持。
分支0x11-b-sentinel-hardening
最新提交d383b6c

1. 目标

本阶段解决 Phase 0x11-a QA 中识别的关键缺陷:

优先级问题描述
P0DEF-002哨兵无法检测 BTC P2WPKH (SegWit) 充值。
P1ETH 缺口EthScanner 只是空壳;无法解析 ERC20 事件。

2. 充值流程架构

Important

🚨 生产环境风控要求

在确认完成后为用户入账之前,充值 应该 经过:

  1. 来源验证 - 检查发送地址是否在制裁/黑名单上
  2. 金额阈值 - 大额充值可能需要加强验证
  3. 模式分析 - 检测异常充值模式 (拆分、分层)
  4. AML 合规 - 超过阈值金额的监管报告
  5. 地址归属 - 验证预期 vs 实际资金来源

当前实现在确认完成后自动入账。

2.1 概览

┌─────────────────────────────────────────────────────────────────────────────┐
│                           Sentinel 充值流程                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌─────────────┐ │
│  │ BTC/ETH  │───▶│ ChainScanner │───▶│ Confirmation   │───▶│ Deposit     │ │
│  │   节点   │    │  区块扫描器  │    │    Monitor     │    │  Pipeline   │ │
│  └──────────┘    └──────────────┘    └────────────────┘    └─────────────┘ │
│       ▲                 │                    │                    │        │
│       │                 ▼                    ▼                    ▼        │
│       │          ┌─────────────┐      ┌───────────┐      ┌─────────────┐   │
│       │          │ ScannedBlock│      │ deposit_  │      │ balances_tb │   │
│       │          │  扫描区块   │      │  history  │      │   余额表    │   │
│       │          └─────────────┘      └───────────┘      └─────────────┘   │
│       │                                   数据库                数据库      │
└───────┴─────────────────────────────────────────────────────────────────────┘

2.2 状态机

DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
    已检测       确认中          已完成       成功
              │                              │
              └───────── ORPHANED ◀──────────┘
                        已孤立 (区块重组)
| 状态 | 含义 | 余额影响 |
|---|---|---|
| DETECTED | 链上检测到,等待确认 | |
| CONFIRMING | 有 1+ 确认,尚未达标 | |
| FINALIZED | 达到所需确认数 | 🔄 处理中 |
| SUCCESS | 已入账到余额 | |
| ORPHANED | 区块被重组,交易失效 | |

2.3 关键组件

| 组件 | 文件 | 职责 |
|---|---|---|
| BtcScanner | src/sentinel/btc.rs | 扫描 BTC 区块,提取 P2PKH/P2WPKH 地址 |
| EthScanner | src/sentinel/eth.rs | 通过 JSON-RPC 扫描 ETH 区块 |
| ConfirmationMonitor | src/sentinel/confirmation.rs | 追踪确认数,检测重组 |
| DepositPipeline | src/sentinel/pipeline.rs | 完成后入账余额 |

2.4 数据库结构

deposit_history (充值记录表):

tx_hash       VARCHAR PRIMARY KEY  -- 交易哈希
user_id       BIGINT               -- 用户 ID
asset         VARCHAR              -- 资产 (BTC/ETH)
amount        DECIMAL              -- 金额
chain_id      VARCHAR              -- 链 ID
block_height  BIGINT               -- 区块高度
block_hash    VARCHAR              -- 区块哈希 (用于重组检测)
status        VARCHAR              -- 状态 (见状态机)
confirmations INT                  -- 当前确认数

3. 提现流程架构

Caution

⛔ 生产环境风控要求 ⛔

当前实现仅用于 MVP/测试。生产部署前,提现请求 必须 经过:

  1. 完整风控引擎 - 实时欺诈检测、频率限制、地址黑名单
  2. 人工审核 - 大额提现需人工批准
  3. 多签审批 - 热钱包阈值触发冷钱包多签
  4. AML/KYC 验证 - 合规性检查
  5. 延迟机制 - 可疑交易进入审核等待期

绝对不要将当前自动审批流程部署到生产环境!

3.1 概览

┌─────────────────────────────────────────────────────────────────────────────┐
│                         提现流程 (推送模式)                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌────────────────┐    ┌─────────────┐ │
│  │   用户   │───▶│ WithdrawServ │───▶│   余额扣减     │───▶│   链上广播  │ │
│  │   请求   │    │   提现服务   │    │   (立即)       │    │             │ │
│  └──────────┘    └──────────────┘    └────────────────┘    └─────────────┘ │
│       │                 │                    │                    │        │
│       │                 ▼                    ▼                    ▼        │
│       │          ┌─────────────┐      ┌───────────┐      ┌─────────────┐   │
│       │          │   地址验证  │      │ withdraw_ │      │ TX Hash 或  │   │
│       │          │             │      │  history  │      │   失败      │   │
│       │          └─────────────┘      └───────────┘      └─────────────┘   │
│       │                                   数据库                 ▼         │
│       │                              ┌─────────────────────────────────┐   │
│       │                              │ 失败时: 自动退款到余额          │   │
│       │                              └─────────────────────────────────┘   │
└───────┴─────────────────────────────────────────────────────────────────────┘

3.2 流程步骤

1. 验证请求
   └─▶ 地址格式 ✓, 金额 > 0 ✓

2. 锁定并检查余额 (FOR UPDATE)
   └─▶ 可用余额 >= 金额 ? 继续 : 错误

3. 扣减余额 (立即)
   └─▶ 可用余额 -= 金额

4. 创建记录 (PROCESSING)
   └─▶ INSERT INTO withdraw_history

5. 提交事务
   └─▶ 余额已扣减,记录已创建

6. 广播到链
   ├─▶ 成功: UPDATE status = 'SUCCESS', tx_hash = ?
   └─▶ 失败: 自动退款 + status = 'FAILED'

3.3 状态机

           ┌──────────────┐
           │  PROCESSING  │
           │    处理中    │
           └──────┬───────┘
                  │
      ┌───────────┼───────────┐
      ▼                       ▼
┌──────────┐           ┌──────────┐
│  SUCCESS │           │  FAILED  │
│   成功   │           │  失败    │
│  (✅ TX) │           │(已退款)  │
└──────────┘           └──────────┘
| 状态 | 含义 | 余额影响 |
|---|---|---|
| PROCESSING | 请求已提交,等待广播 | 💰 已扣减 |
| SUCCESS | 交易广播成功 | ✅ 完成 |
| FAILED | 广播失败,已自动退款 | 🔄 已退款 |

3.4 关键组件

| 组件 | 文件 | 职责 |
|---|---|---|
| WithdrawService | src/funding/withdraw.rs | 验证、扣减、广播、退款 |
| ChainClient | src/funding/chain_adapter.rs | 区块链交易广播接口 |
| handlers::apply_withdraw | src/funding/handlers.rs | HTTP API 端点 |

3.5 数据库结构

withdraw_history (提现记录表):

request_id    VARCHAR PRIMARY KEY  -- 请求 UUID
user_id       BIGINT               -- 用户 ID
asset         VARCHAR              -- 资产 (BTC/ETH)
amount        BIGINT               -- 金额 (整数缩放)
fee           BIGINT               -- 网络手续费 (整数缩放)
to_address    VARCHAR              -- 目标地址
status        VARCHAR              -- PROCESSING/SUCCESS/FAILED
tx_hash       VARCHAR              -- 区块链交易哈希 (成功时)
created_at    TIMESTAMP            -- 创建时间
updated_at    TIMESTAMP            -- 更新时间

3.6 金额计算

用户余额变化 = -请求金额
链上到账金额 = 请求金额 - 手续费

示例:

  • 用户请求提现 1.0 BTC,手续费 0.0001 BTC
  • 余额扣减: 1.0 BTC
  • 链上到账: 0.9999 BTC

4. 🛡️ 分级纵深防御风控框架

4.1 防御层级

┌─────────────────────────────────────────────────────────────────────────────┐
│                          纵深防御架构                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  第一层: 🟢 自动化检查                                                       │
│  ├─▶ 地址黑名单/制裁名单检查                                                 │
│  ├─▶ 频率限制 (每小时/每天/每周)                                              │
│  └─▶ 基础欺诈模式检测                                                        │
│                                                                             │
│  第二层: 🟡 阈值触发                                                         │
│  ├─▶ 金额 > ¥7K: 加强验证                                                    │
│  ├─▶ 金额 > ¥70K: 24小时延迟 + 通知                                          │
│  └─▶ 金额 > ¥350K: 进入第三层                                                │
│                                                                             │
│  第三层: 🔴 人工审核                                                         │
│  ├─▶ 人工分析师验证                                                          │
│  ├─▶ 资金来源证明文件                                                        │
│  └─▶ 多方审批 (2-of-3)                                                       │
│                                                                             │
│  第四层: ⚫ 冷钱包多签                                                        │
│  ├─▶ 金额 > ¥700K: 冷钱包释放                                                │
│  ├─▶ 硬件密钥要求                                                            │
│  └─▶ 签名者地理分布                                                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 风险分级 (按金额)

| 层级 | 金额 | 延迟 | 审批 | 钱包 |
|---|---|---|---|---|
| 🟢 T1 | < ¥7,000 | | 自动 | |
| 🟡 T2 | ¥7K - ¥70K | 1小时 | 自动 + 告警 | |
| 🟠 T3 | ¥70K - ¥350K | 24小时 | 1-of-2 人工 | |
| 🔴 T4 | ¥350K - ¥700K | 48小时 | 2-of-3 人工 | |
| T5 | > ¥700K | 72小时 | 3-of-5 + HSM | |

4.3 自动化检查 (所有层级)

| 检查项 | 阻止 | 告警 |
|---|---|---|
| OFAC/制裁名单 | ✅ | |
| 地址黑名单 | ✅ | |
| 超过频率限制 | ✅ | |
| 新地址 (< 24h) | | ⚠️ T2+ |
| 异常金额模式 | | ⚠️ 延迟 |
| 地理位置异常 | | ⚠️ 延迟 |

4.4 充值专项检查

┌────────────────────────────────────────────────────────────────┐
│                       充值风险评估                              │
├────────────────────────────────────────────────────────────────┤
│ ✓ 来源地址归属 (已知交易所? 混币器? 未知?)                      │
│ ✓ 交易图谱分析 (1跳、2跳关联)                                   │
│ ✓ 时序模式 (拆分检测)                                          │
│ ✓ 历史行为基线                                                  │
│ ✓ 跨链关联 (同一实体在 ETH/BTC?)                                │
└────────────────────────────────────────────────────────────────┘

4.5 提现专项检查

┌────────────────────────────────────────────────────────────────┐
│                       提现风险评估                              │
├────────────────────────────────────────────────────────────────┤
│ ✓ 目标地址信誉                                                  │
│ ✓ 首次使用地址惩罚                                              │
│ ✓ 账户年龄 vs 金额比率                                          │
│ ✓ 近期密码/2FA变更 (48h冷却)                                    │
│ ✓ 设备指纹验证                                                  │
│ ✓ API密钥使用模式                                               │
└────────────────────────────────────────────────────────────────┘

5. 问题分析: DEF-002 (BTC SegWit 盲区)

5.1 根因

src/sentinel/btc.rs 中的 extract_address 函数使用 Address::from_script(script, network)

虽然 rust-bitcoin 理论上支持 P2WPKH 脚本 (OP_0 <20-byte-hash>),但当前实现可能因以下原因失败:

  1. 脚本编码与传入的 Network 枚举不匹配。
  2. bitcoincore-rpc 依赖缺少必要的 feature flags。

5.2 解决方案

  1. 验证: 添加单元测试,手动构造原始 P2WPKH 脚本。
  2. 修复: 如果 Address::from_script 失败,手动检测 witness v0 脚本:
    #![allow(unused)]
    fn main() {
    if script.is_p2wpkh() {
        // 从 script[2..22] 提取 20 字节哈希
        // 构造 Address::p2wpkh(...)
    }
    }

6. 功能规格: ETH/ERC20 哨兵

6.1 架构

┌─────────────────────────────────────────────────────────────────┐
│                       EthScanner                                │
├─────────────────────────────────────────────────────────────────┤
│ 1. 轮询 eth_blockNumber (区块高度追踪)                           │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer])           │
│ 3. 过滤: 匹配 log.address (合约) + topic[2] (收款人)             │
│ 4. 解析: 将 log.data 解码为 uint256 金额                         │
│ 5. 产出: DetectedDeposit { tx_hash, to_address, amount, ... }   │
└─────────────────────────────────────────────────────────────────┘

6.2 关键实现细节

  • Topic0 (Transfer): keccak256("Transfer(address,address,uint256)") = 0xddf252ad...
  • Topic1: 发送方 (indexed)
  • Topic2: 接收方 (indexed) - user_addresses 匹配
  • Data: 金额 (uint256, 左填充)

6.3 精度处理

| 代币 | 小数位 | 缩放比例 |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |

Important

代币精度必须从 assets_tb 加载,禁止硬编码。


7. 数据库模式扩展

-- EthScanner 需要追踪合约地址
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- 例: '0xdAC17F958D2ee523a2206206994597C13D831ec7'

-- 按合约快速查询的索引
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);

8. 配置: config/sentinel.yaml

eth:
  chain_id: "ETH"
  network: "anvil"  # 或 "mainnet", "goerli"
  rpc:
    url: "http://127.0.0.1:8545"
  scanning:
    required_confirmations: 12
    max_reorg_depth: 20
    start_height: 0
  contracts:
    - name: "USDT"
      address: "0x..."
      decimals: 6
    - name: "USDC"
      address: "0x..."
      decimals: 6

9. 验收标准

  • BTC: 单元测试 test_p2wpkh_extraction 通过。 ✅ (test_segwit_p2wpkh_extraction_def_002)
  • BTC: E2E 测试中充值到 bcrt1... 地址被检测并入账。 ✅ (通过 greybox 测试验证)
  • ETH: 单元测试 test_erc20_transfer_parsing 通过。 ✅ (7 个 ETH 测试通过)
  • ETH: E2E 测试中通过 MockUSDT 合约充值被检测。 ⏳ (待完成: ERC20 eth_getLogs 尚未实现)
  • 回归: 所有 Phase 0x11-a 现有测试仍然通过。 ✅ (322 个测试)

10. 实施状态

| 组件 | 状态 | 备注 |
|---|---|---|
| BtcScanner P2WPKH 修复 | 已完成 | 测试 test_segwit_p2wpkh_extraction_def_002 通过 |
| EthScanner 实现 | 已完成 | 完整 JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| 单元测试 | 22 通过 | 所有 Sentinel 测试通过 |
| E2E 验证 | ⚠️ 部分 | 测试时节点未运行;脚本已就绪 |
| ERC20 代币支持 | 🚧 进行中 | eth_getLogs for Transfer events (Phase 0x11-b 范围) |

11. 测试方法

快速测试 (Rust 单元测试)

# 运行所有 Sentinel 测试
cargo test --package zero_x_infinity --lib sentinel -- --nocapture

# 仅运行 DEF-002 验证测试
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture

# 仅运行 ETH Scanner 测试
cargo test sentinel::eth -- --nocapture

完整测试套件

# 运行测试脚本 (无需节点)
./scripts/tests/0x11b_sentinel/run_tests.sh

# 运行测试脚本 (自动启动节点, 需要 docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes




Appendix A: Industry Standards Reference

Full Design: See Chains Schema Design for complete schema and industry standards.

Naming Conventions

| Concept | Industry Term | Our Column | Type |
|---|---|---|---|
| Business ID | shortName | chain_slug | VARCHAR |
| EIP-155 ID | chainId | chain_id | INTEGER |
| Native Token | nativeCurrency.symbol | native_currency | VARCHAR |

References

Phase 0x11-b Schema

-- Minimum viable: uses chain_slug only
CREATE TABLE user_addresses (
    user_id BIGINT,
    asset VARCHAR(32),
    chain_slug VARCHAR(32),  -- "eth", "btc"
    address VARCHAR(255),
    PRIMARY KEY (user_id, asset, chain_slug)
);

0x12 Real Trading Verification

🚧 Documentation In Progress

0x13 Market Data Experience

🚧 Documentation In Progress

0x14 Extreme Optimization: Methodology

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

Phase V Keynote
Codename: “Metal Mode”
Philosophy: “If you can’t measure it, you can’t improve it.”

1. The Performance Ceiling

In the previous chapters, we built a highly reliable exchange core (Phase I-IV). We achieved 1.3M TPS on a single thread using the Ring Buffer architecture. This is “fast enough” for 99% of crypto exchanges.

But for top-tier HFT engines, “Fast Enough” is not enough. We want to hit the physical limits of the CPU and Memory.

1.1 Why “Extreme Optimization”?

| Phase | Focus | Goal |
|---|---|---|
| I-III | Correctness | “Does it work?” |
| IV | Integration | “Does it work end-to-end?” |
| V | Speed | “How fast can it go?” |

In Phase V, we assume correctness is already proven. Our sole focus is performance.

1.2 Why “Metal Mode”?

“Metal Mode” is our internal codename. It means:

  • Close to the Metal: We will bypass high-level abstractions and work directly with memory layouts, CPU caches, and SIMD instructions.
  • Bare Metal Rust: No unnecessary clone(), no hidden malloc(), no runtime surprises.

2. The Benchmarking Methodology (Tier 2)

To optimize, we must first measure. But what we measure matters.

2.1 The Problem with Naive Benchmarks

| Benchmark Type | What it Measures | Problem for Optimization |
|---|---|---|
| wrk / curl | HTTP round-trip | Includes OS, Network, Kernel noise |
| Unit tests | Function correctness | No performance data |

These are useful for validation (Phase IV), but not for isolation (Phase V).

2.2 Tier 2: Pipeline Benchmarks

We introduce Tier 2 Pipeline Benchmarks:

| Feature | Description |
|---|---|
| No Network I/O | Data is pre-loaded in memory. |
| No Disk I/O | WAL is mocked or in-memory. |
| Pure CPU/Memory | Measures only the “Hot Path”: RingBuffer → UBSCore → ME → Settlement. |
| Deterministic | Same input → Same output → Same timing. |

Goal: Establish the “Red Line” – the current baseline performance under ideal conditions. All future optimizations will be measured against this.
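The Tier 2 idea can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual harness: `execute_hot_path` is a hypothetical stand-in for the real RingBuffer → UBSCore → ME → Settlement pipeline, and only the in-memory execution loop is timed.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the real hot path; in the actual harness this
/// would drive RingBuffer -> UBSCore -> ME -> Settlement.
fn execute_hot_path(cmd: u64, acc: &mut u64) {
    *acc = acc.wrapping_add(cmd); // placeholder for matching work
}

/// Tier 2 measurement: commands are pre-generated in memory (no network,
/// no disk), so the timer covers pure CPU/memory work only.
pub fn run_tier2_benchmark(commands: &[u64]) -> f64 {
    let mut acc = 0u64;
    let start = Instant::now();
    for &cmd in commands {
        execute_hot_path(cmd, &mut acc);
    }
    let secs = start.elapsed().as_secs_f64();
    std::hint::black_box(acc); // keep the loop from being optimized away
    commands.len() as f64 / secs / 1_000_000.0 // throughput in MTPS
}
```

Because the input is deterministic and no I/O is involved, repeated runs of this loop give a stable "Red Line" baseline to compare optimizations against.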


🇨🇳 中文

Phase V 基调
内部代号: “Metal Mode”
核心哲学: “无法测量,就无法优化。”

1. 性能天花板

在前几个阶段(Phase I-IV),我们构建了一个高可靠的交易所核心。利用 Ring Buffer 架构,我们在单线程上实现了 130万 TPS。对于 99% 的加密货币交易所来说,这已经“足够快”了。

但对于顶级的 HFT 引擎,“足够快”是不够的。我们要触达 CPU 和内存的物理极限。

1.1 为什么叫 “Extreme Optimization”?

| 阶段 | 关注点 | 目标 |
|---|---|---|
| I-III | 正确性 | “能跑吗?” |
| IV | 集成 | “端到端能跑通吗?” |
| V | 速度 | “能跑多快?” |

在 Phase V,我们假设正确性已经被验证。唯一的焦点是性能

1.2 为什么叫 “Metal Mode”?

“Metal Mode” 是我们的内部代号,意为:

  • 贴近金属 (Close to the Metal):我们将绕过高层抽象,直接操作内存布局、CPU 缓存和 SIMD 指令。
  • Bare Metal Rust:没有不必要的 clone(),没有隐藏的 malloc(),没有运行时惊喜。

2. 基准测试方法论 (Tier 2)

要优化,必须先测量。但测什么至关重要。

2.1 朴素基准测试的问题

| 基准测试类型 | 测量内容 | 优化的问题 |
|---|---|---|
| wrk / curl | HTTP 往返 | 包含操作系统、网络、内核噪声 |
| 单元测试 | 函数正确性 | 没有性能数据 |

这些对于验证 (Phase IV) 有用,但不适合隔离测试 (Phase V)

2.2 Tier 2: 流水线基准测试 (Pipeline Benchmarks)

我们引入 Tier 2 流水线基准测试

| 特性 | 描述 |
|---|---|
| 无网络 I/O | 数据预加载在内存中。 |
| 无磁盘 I/O | WAL 被 Mock 或在内存中。 |
| 纯 CPU/内存 | 只测量“热路径”:RingBuffer → UBSCore → ME → Settlement。 |
| 确定性 | 相同输入 → 相同输出 → 相同耗时。 |

目标:建立 “Red Line (红线)” – 理想条件下的当前基线性能。所有后续优化都将以此为基准进行衡量。

0x14-a Benchmark Harness: Test Data Generation

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

Status: IMPLEMENTED / QA VERIFIED (Phase 0x14-a Complete)
Date: 2025-12-30
Context: Phase V: Extreme Optimization (Step 1)
Goal: Re-implement Exchange-Core test data generation algorithm in Rust and verify correctness against golden data.

1. Chapter Objectives

| # | Goal | Deliverable |
|---|---|---|
| 1 | Implement LCG PRNG | src/bench/java_random.rs - Java-compatible random generator |
| 2 | Implement Order Generator | src/bench/order_generator.rs - Deterministic order sequence |
| 3 | Verify Correctness | Unit tests that compare generated data with golden_*.csv |

Success Criteria: Generated data matches golden CSV byte-for-byte (same order_id, price, size, uid for each row).


2. Reference Algorithm: LCG PRNG

The Exchange-Core project uses Java’s java.util.Random as its PRNG. We must implement a bit-exact replica.

2.1 Java Random Implementation

#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
    seed: u64,
}

impl JavaRandom {
    const MULTIPLIER: u64 = 0x5DEECE66D;
    const ADDEND: u64 = 0xB;
    const MASK: u64 = (1 << 48) - 1;

    pub fn new(seed: i64) -> Self {
        Self {
            seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
        }
    }

    fn next(&mut self, bits: u32) -> i32 {
        self.seed = self.seed
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::ADDEND) & Self::MASK;
        (self.seed >> (48 - bits)) as i32
    }

    pub fn next_int(&mut self, bound: i32) -> i32 {
        assert!(bound > 0);
        let bound = bound as u32;
        if (bound & bound.wrapping_sub(1)) == 0 {
            // Power of two
            return ((bound as u64 * self.next(31) as u64) >> 31) as i32;
        }
        loop {
            let bits = self.next(31) as u32;
            let val = bits % bound;
            if bits.wrapping_sub(val).wrapping_add(bound.wrapping_sub(1)) >= bits {
                return val as i32;
            }
        }
    }

    pub fn next_long(&mut self) -> i64 {
        ((self.next(32) as i64) << 32) + self.next(32) as i64
    }

    pub fn next_double(&mut self) -> f64 {
        let a = (self.next(26) as u64) << 27;
        let b = self.next(27) as u64;
        (a + b) as f64 / ((1u64 << 53) as f64)
    }
}
}

2.2 Seed Derivation

Each test session derives its seed from symbol_id and benchmark_seed:

#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
    let mut hash: i64 = 1;
    hash = 31 * hash + (symbol_id as i64 * -177277);
    hash = 31 * hash + (benchmark_seed * 10037 + 198267);
    hash
}
}

3. Golden Data Reference

Location: docs/exchange_core_verification_kit/golden_data/

| File | Records | Seed | Description |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | Margin (futures) contract |
| golden_single_pair_exchange.csv | 11,000 | 1 | Spot exchange |

CSV Format:

phase,command,order_id,symbol,price,size,action,order_type,uid
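A minimal Rust sketch of loading one golden CSV row into a struct, following the column order above. The field types and the `GoldenRow` name are assumptions for illustration; the project's real loader may differ, and the example row below uses hypothetical `phase`/`command`/`symbol` values around the documented id=1, price=34386, uid=377 output.

```rust
/// One row of the golden CSV (columns per the documented header:
/// phase,command,order_id,symbol,price,size,action,order_type,uid).
#[derive(Debug, PartialEq)]
pub struct GoldenRow {
    pub phase: String,
    pub command: String,
    pub order_id: u64,
    pub symbol: i32,
    pub price: u64,
    pub size: u64,
    pub action: String,     // "BID" / "ASK"
    pub order_type: String, // e.g. "GTC", "IOC"
    pub uid: u64,
}

/// Parse a single non-header CSV line; returns None on malformed input.
pub fn parse_golden_row(line: &str) -> Option<GoldenRow> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 9 {
        return None;
    }
    Some(GoldenRow {
        phase: f[0].to_string(),
        command: f[1].to_string(),
        order_id: f[2].parse().ok()?,
        symbol: f[3].parse().ok()?,
        price: f[4].parse().ok()?,
        size: f[5].parse().ok()?,
        action: f[6].to_string(),
        order_type: f[7].to_string(),
        uid: f[8].parse().ok()?,
    })
}
```

A byte-for-byte comparison test then walks the generated sequence and the parsed golden rows in lockstep, asserting equality on every field.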

4. Implementation Checklist

  • Step 1: Create src/bench/mod.rs
  • Step 2: Implement JavaRandom in src/bench/java_random.rs
    • Unit test: verify first 100 random numbers match Java output
  • Step 3: Implement TestOrdersGenerator in src/bench/order_generator.rs
    • Pareto distribution for symbol/user weights
    • Order generation logic (GTC orders for FILL phase)
    • Seed derivation using Objects.hash formula
  • Step 4: Load and compare with golden CSV
    • #[test] fn test_golden_single_pair_margin()
    • #[test] fn test_golden_single_pair_exchange()

5. Implementation Results

Note

✅ FILL PHASE: 100% bit-exact match (1,000 orders)
⚠️ BENCHMARK PHASE: requires matching engine (10,000 orders)

5.1 FILL Phase (Rows 1-1000)

| Field | Match Status | Formula |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4-value averaging |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto user account generation |
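To make the size formula concrete, here is a self-contained sketch: a minimal copy of the Java-compatible LCG from §2.1 plus the `1 + rand(6)*rand(6)*rand(6)` rule. This demonstrates the shape of the distribution (values in 1..=126, heavily skewed toward small sizes); it does not by itself prove Java parity, which the golden-data tests cover.

```rust
/// Minimal Java-compatible LCG (same constants as §2.1).
pub struct JavaRandom {
    seed: u64,
}

impl JavaRandom {
    const MULTIPLIER: u64 = 0x5DEECE66D;
    const ADDEND: u64 = 0xB;
    const MASK: u64 = (1 << 48) - 1;

    pub fn new(seed: i64) -> Self {
        Self { seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK }
    }

    fn next(&mut self, bits: u32) -> i32 {
        self.seed = self.seed
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::ADDEND) & Self::MASK;
        (self.seed >> (48 - bits)) as i32
    }

    pub fn next_int(&mut self, bound: i32) -> i32 {
        assert!(bound > 0);
        let bound = bound as u32;
        if (bound & bound.wrapping_sub(1)) == 0 {
            return ((bound as u64 * self.next(31) as u64) >> 31) as i32;
        }
        loop {
            let bits = self.next(31) as u32;
            let val = bits % bound;
            if bits.wrapping_sub(val).wrapping_add(bound.wrapping_sub(1)) >= bits {
                return val as i32;
            }
        }
    }
}

/// Size formula from the table above: 1 + rand(6)*rand(6)*rand(6).
/// Each factor is in 0..=5, so the result is in 1..=126.
pub fn gen_size(rng: &mut JavaRandom) -> u64 {
    1 + (rng.next_int(6) * rng.next_int(6) * rng.next_int(6)) as u64
}
```

Multiplying three small uniform draws is what gives the golden data its realistic long tail: most orders are size 1 (any zero factor), while large sizes are rare.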

5.2 BENCHMARK Phase Analysis

| Component | Status | Notes |
|---|---|---|
| RNG Sequence | ✅ Aligned | nextInt(4) for action FIRST, then nextInt(q_range) |
| Order Selection | ✅ Aligned | Uses orderUids iterator (BTreeMap deterministic) |
| IOC Simulation | ✅ Implemented | Shadow order book with simulate_ioc_match |
| Order Book Feedback | ❌ Gap | Java uses real matcher feedback for lackOfOrders |

Important

BENCHMARK Phase Gap: Java’s generateRandomOrder uses lastOrderBookOrdersSizeAsk/Bid from the real matching engine (updated in updateOrderBookSizeStat). Without a full Rust matching engine, the shadow book diverges from Java’s state.

5.3 Golden Data Scale

| Dataset | FILL | BENCHMARK | Total |
|---|---|---|---|
| golden_single_pair_margin.csv | 1,000 | 10,000 | 11,000 |
| golden_single_pair_exchange.csv | 1,000 | 10,000 | 11,000 |

5.4 Key Implementation Details

  1. JavaRandom - Bit-exact java.util.Random LCG
  2. Seed derivation: Objects.hash(symbol*-177277, seed*10037+198267)
  3. User accounts: 1 + (int)paretoSample formula
  4. Currency order: [978, 840] based on HashMap bucket index
  5. CENTRAL_MOVE_ALPHA: 0.01 (not 0.1)
  6. Shadow Order Book: ask_orders/bid_orders Vec with O(1) swap_remove
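Point 3 above relies on Pareto sampling. The following is an illustrative inverse-CDF Pareto sampler showing the `1 + (int)paretoSample` shape; the `scale` and `alpha` values here are assumptions for demonstration, not Exchange-Core's actual distribution parameters (which come from its commons-math configuration).

```rust
/// Inverse-CDF Pareto sampler: for u ~ Uniform(0,1),
/// x = scale / u^(1/alpha) follows a Pareto(scale, alpha) distribution.
pub fn pareto_sample(u: f64, scale: f64, alpha: f64) -> f64 {
    assert!(u > 0.0 && u < 1.0);
    scale / u.powf(1.0 / alpha)
}

/// User-account id per the `1 + (int)paretoSample` formula.
/// alpha = 1.5 and scale = 1.0 are illustrative values only.
pub fn user_account(u: f64) -> i32 {
    1 + pareto_sample(u, 1.0, 1.5) as i32
}
```

Small `u` (rare draws) maps to very large account ids, so a few "whale" users dominate order flow while most ids cluster near 1, mirroring real exchange activity.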

6. Verification Commands

One-Click Verification:

# Run all golden data verification tests
cargo test golden_ -- --nocapture

Detailed Comparison Test:

# Compare first 20 orders against golden CSV with full output
cargo test test_generator_vs_golden_detailed -- --nocapture

All Benchmark Tests:

# Run all tests in the bench module
cargo test bench:: -- --nocapture

Expected Output:

[  1] ✅ | Golden: id=1, price=34386, size=  1, action=BID, uid=377
[  2] ✅ | Golden: id=2, price=34135, size=  1, action=BID, uid=110
[  3] ✅ | Golden: id=3, price=34347, size=  2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size=  1, action=BID, uid=491

7. Fair Benchmark Procedure

Important

Key to Fairness: Generation and Execution must be separated. Java pre-generates all commands into memory before testing.

7.1 Four Phase Separation

Phase 1: Data Pre-generation ───────── ⏸️ Not Timed
Phase 2: FILL (Pre-fill) ───────────── ⏸️ Not Timed  
Phase 3: BENCHMARK (Stress) ────────── ⏱️ Timed Phase
Phase 4: Verification ──────────────── ⏸️ Not Timed

7.2 Rust Implementation Spec

#![allow(unused)]
fn main() {
// ✅ Correct: Pre-generate -> Then Execute
let (fill_commands, benchmark_commands) = generator.pre_generate_all();

// Phase 2: FILL (Not Timed)
for cmd in &fill_commands {
    exchange.execute(cmd);
}

// Phase 3: BENCHMARK (Timed Only)
let start = Instant::now();
for cmd in &benchmark_commands {
    exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}

7.3 Pre-generation Interface

#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
    /// Pre-generate all commands for fair benchmarking
    pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
        let fill_count = self.config.target_orders_per_side * 2;
        let benchmark_count = self.config.symbol_messages;
        
        let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
        let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
        
        (fill, benchmark)
    }
}
}

7.4 Current Status vs ME Requirements

| Task | Current | Needs ME |
|---|---|---|
| Pre-gen method pre_generate_all() | ✅ | - |
| Generate 3M orders to memory | ✅ | - |
| Export CSV for verification | ✅ | - |
| Execute FILL Phase | - | ✅ |
| Execute BENCHMARK Phase | - | ✅ |
| Global Balance Verification | - | ✅ |

8. Phase 0x14-a Summary

8.1 Completed Components

| Component | Status | Verification |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | Bit-exact with Java |
| Seed Derivation | ✅ | Objects.hash reproduction |
| TestOrdersGenerator | ✅ | FILL 1000 rows 100% matched |
| Shadow OrderBook | ✅ | IOC Simulation implemented |
| Pre-gen Interface | ✅ | pre_generate_all(), pre_generate_3m() |
| Fair Test Procedure Docs | ✅ | Section 7, Appendix B |

8.2 BENCHMARK Phase Gap Analysis

| Cause | Description |
|---|---|
| Matching Engine Feedback | Java uses lastOrderBookOrdersSizeAsk/Bid to decide growOrders. |
| Impact | Command type distribution (GTC vs IOC) differs slightly. |
| Solution | Phase 0x14-b introduces full ME to reach 100% parity. |

8.3 Next Steps

| Priority | Task | Dependency |
|---|---|---|
| P0 | Implement Rust Matching Engine (Phase 0x14-b) | - |
| P1 | 3M Orders Stress Test Verification | Matching Engine |
| P2 | Latency Stats (HdrHistogram) | Matching Engine |



🇨🇳 中文

状态: 已实施 / QA 验证通过 (Phase 0x14-a 完成)
日期: 2025-12-30
上下文: Phase V: 极致优化 (Step 1)
目标: 用 Rust 重新实现 Exchange-Core 测试数据生成算法,并对比黄金数据验证正确性。

1. 章节目标

| # | 目标 | 交付物 |
|---|---|---|
| 1 | 实现 LCG PRNG | src/bench/java_random.rs - Java 兼容随机数生成器 |
| 2 | 实现订单生成器 | src/bench/order_generator.rs - 确定性订单序列 |
| 3 | 验证正确性 | 单元测试对比生成数据与 golden_*.csv |

成功标准: 生成的数据与黄金 CSV 逐字节匹配(每行的 order_id, price, size, uid 完全一致)。


2. 参考算法: LCG PRNG

Exchange-Core 项目使用 Java 的 java.util.Random 作为 PRNG。我们必须实现一个比特级精确的副本。

2.1 Java Random Implementation

#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
    seed: u64,
}

impl JavaRandom {
    const MULTIPLIER: u64 = 0x5DEECE66D;
    const ADDEND: u64 = 0xB;
    const MASK: u64 = (1 << 48) - 1;

    pub fn new(seed: i64) -> Self {
        Self {
            seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
        }
    }

    fn next(&mut self, bits: u32) -> i32 {
        self.seed = self.seed
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::ADDEND) & Self::MASK;
        (self.seed >> (48 - bits)) as i32
    }

    pub fn next_int(&mut self, bound: i32) -> i32 {
        assert!(bound > 0);
        let bound = bound as u32;
        if (bound & bound.wrapping_sub(1)) == 0 {
            // Power of two
            return ((bound as u64 * self.next(31) as u64) >> 31) as i32;
        }
        loop {
            let bits = self.next(31) as u32;
            let val = bits % bound;
            if bits.wrapping_sub(val).wrapping_add(bound.wrapping_sub(1)) >= bits {
                return val as i32;
            }
        }
    }

    pub fn next_long(&mut self) -> i64 {
        ((self.next(32) as i64) << 32) + self.next(32) as i64
    }

    pub fn next_double(&mut self) -> f64 {
        let a = (self.next(26) as u64) << 27;
        let b = self.next(27) as u64;
        (a + b) as f64 / ((1u64 << 53) as f64)
    }
}
}

2.2 Seed Derivation

Each test session derives its seed from symbol_id and benchmark_seed:

#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
    let mut hash: i64 = 1;
    hash = 31 * hash + (symbol_id as i64 * -177277);
    hash = 31 * hash + (benchmark_seed * 10037 + 198267);
    hash
}
}

3. 黄金数据参考

位置: docs/exchange_core_verification_kit/golden_data/

| 文件 | 记录数 | Seed | 描述 |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | 保证金(期货)合约 |
| golden_single_pair_exchange.csv | 11,000 | 1 | 现货交易 |

4. 实施清单

  • 步骤 1: 创建 src/bench/mod.rs
  • 步骤 2: 在 src/bench/java_random.rs 中实现 JavaRandom
    • 单元测试: 验证前 100 个随机数与 Java 输出匹配
  • 步骤 3: 在 src/bench/order_generator.rs 中实现 TestOrdersGenerator
    • Pareto 分布用于用户权重
    • 订单生成逻辑 (GTC 阶段)
    • 使用 Objects.hash 公式进行种子派生
  • 步骤 4: 加载并对比黄金 CSV
    • #[test] fn test_golden_single_pair_margin()
    • #[test] fn test_golden_single_pair_exchange()

5. 实现结果

Note

✅ FILL 阶段: 100% 比特精确匹配 (1,000 订单)
⚠️ BENCHMARK 阶段: 需要匹配引擎 (10,000 订单)

5.1 FILL 阶段 (行 1-1000)

| 字段 | 匹配状态 | 公式 |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4 值平均 |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto 用户账户生成 |

5.2 BENCHMARK 阶段分析

| 组件 | 状态 | 说明 |
|---|---|---|
| RNG 序列 | ✅ 已对齐 | nextInt(4) action 优先,然后 nextInt(q_range) |
| 订单选择 | ✅ 已对齐 | 使用 orderUids 迭代器 (BTreeMap 确定性) |
| IOC 模拟 | ✅ 已实现 | 影子订单簿 simulate_ioc_match |
| 订单簿反馈 | ❌ 缺口 | Java 使用真实匹配引擎反馈 lackOfOrders |

Important

BENCHMARK 阶段缺口: Java 的 generateRandomOrder 使用真实匹配引擎的 lastOrderBookOrdersSizeAsk/Bid(在 updateOrderBookSizeStat 中更新)。没有完整的 Rust 匹配引擎,影子订单簿会与 Java 状态产生分歧。

5.3 关键实现细节

  1. JavaRandom - 比特级精确的 java.util.Random LCG
  2. 种子派生: Objects.hash(symbol*-177277, seed*10037+198267)
  3. 用户账户: 1 + (int)paretoSample 公式
  4. 货币顺序: [978, 840] 基于 HashMap bucket 索引
  5. CENTRAL_MOVE_ALPHA: 0.01 (不是 0.1)
  6. 影子订单簿: ask_orders/bid_orders Vec 支持 O(1) swap_remove

6. 验证命令

一键验证:

# 运行所有黄金数据验证测试
cargo test golden_ -- --nocapture

详细对比测试:

# 逐行对比前 20 个订单与黄金 CSV
cargo test test_generator_vs_golden_detailed -- --nocapture

所有 Benchmark 测试:

# 运行 bench 模块的所有测试
cargo test bench:: -- --nocapture

预期输出:

[  1] ✅ | Golden: id=1, price=34386, size=  1, action=BID, uid=377
[  2] ✅ | Golden: id=2, price=34135, size=  1, action=BID, uid=110
[  3] ✅ | Golden: id=3, price=34347, size=  2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size=  1, action=BID, uid=491

7. 公平压测流程 (Fair Benchmark Procedure)

Important

公平比较的关键: 数据生成与执行必须分离。Java 在测试前预生成所有命令到内存。

7.1 四阶段分离

Phase 1: 数据预生成 ───────────── ⏸️ 不计时
Phase 2: FILL (预填充) ──────────── ⏸️ 不计时  
Phase 3: BENCHMARK (压测) ──────── ⏱️ 仅此阶段计时
Phase 4: 验证 ────────────────── ⏸️ 不计时

7.2 Rust 实现规范

#![allow(unused)]
fn main() {
// ✅ 正确: 预生成 → 再执行
let (fill_commands, benchmark_commands) = generator.pre_generate_all();

// Phase 2: FILL (不计时)
for cmd in &fill_commands {
    exchange.execute(cmd);
}

// Phase 3: BENCHMARK (仅此阶段计时)
let start = Instant::now();
for cmd in &benchmark_commands {
    exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}

7.3 预生成接口

#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
    /// Pre-generate all commands for fair benchmarking
    pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
        let fill_count = self.config.target_orders_per_side * 2;
        let benchmark_count = self.config.symbol_messages;
        
        let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
        let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
        
        (fill, benchmark)
    }
}
}

7.4 现阶段可完成 vs 需要 ME 集成

| 任务 | 现阶段 | 需 ME |
|---|---|---|
| 预生成接口 pre_generate_all() | ✅ | - |
| 生成 3M 订单到内存 | ✅ | - |
| 导出 CSV 供验证 | ✅ | - |
| 执行 FILL 阶段 | - | ✅ |
| 执行 BENCHMARK 计时 | - | ✅ |
| 全局余额验证 | - | ✅ |

8. Phase 0x14-a 总结

8.1 已完成组件

| 组件 | 状态 | 验证 |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | 与 Java 比特精确 |
| 种子派生算法 | ✅ | Objects.hash 复现 |
| TestOrdersGenerator | ✅ | FILL 1000 行 100% 匹配 |
| 影子订单簿 | ✅ | IOC 模拟实现 |
| 预生成接口 | ✅ | pre_generate_all(), pre_generate_3m() |
| 公平测试流程文档 | ✅ | Section 7, Appendix B |

8.2 BENCHMARK 阶段差异分析

| 原因 | 说明 |
|---|---|
| 匹配引擎反馈 | Java 使用 lastOrderBookOrdersSizeAsk/Bid 决定 growOrders |
| 影响 | 命令类型分布略有不同(GTC vs IOC 比例) |
| 解决方案 | Phase 0x14-b 实现完整匹配引擎后可达 100% |

8.3 下一步

| 优先级 | 任务 | 依赖 |
|---|---|---|
| P0 | 实现 Rust 匹配引擎 (Phase 0x14-b) | - |
| P1 | 3M 订单压测验证 | 匹配引擎 |
| P2 | 延迟统计 (HdrHistogram) | 匹配引擎 |


0x14-b Order Commands: Feature Completion

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

Status: COMPLETED
Context: Phase V: Extreme Optimization (Step 2)
Goal: Achieve feature parity with Exchange-Core’s Spot Matching Engine to support the Benchmark harness.
Scope: Spot Only. Margin/Futures deferred to 0x14-c.

1. Gap Analysis

Based on code review of src/engine.rs, src/models.rs, src/orderbook.rs:

✅ Already Implemented

| Feature | Location | Notes |
|---|---|---|
| MatchingEngine | src/engine.rs | process_order(), match_buy(), match_sell() |
| Price-Time Priority | engine.rs:80-165 | Lowest ask first (buy), highest bid first (sell), FIFO |
| Limit Orders | engine.rs:61-68 | Unfilled remainder rests in book |
| Market Orders | engine.rs:90-94 | u64::MAX price for buy, matches all |
| Order Status | models.rs:57-68 | NEW, PARTIALLY_FILLED, FILLED, CANCELED, REJECTED, EXPIRED |
| OrderBook | orderbook.rs | BTreeMap storage, cancel_order() by ID+price+side |

❌ Missing (Required for 0x14-b)

Based on exchange_core_verification_kit/test_datasets_and_steps.md L162-171 (Command Distribution):

| Feature | Benchmark % | Current Status | Priority |
|---|---|---|---|
| IOC (Immediate-or-Cancel) | ~35% | ❌ Not Implemented | P0 |
| MoveOrder | ~8% | ❌ Not Implemented | P0 |
| ReduceOrder | ~3% | ❌ Not Implemented | P1 |
| FOK_BUDGET | ~1% | ❌ Not Implemented | P2 |
Note: FOK_BUDGET (Fill-or-Kill by Quote Budget) is ~1% of benchmark commands. Required for full S-to-Huge parity.


2. Architectural Requirements

2.1 Data Model Extensions (Schema)

We must extend InternalOrder to support varied execution strategies without polluting the core OrderType.

New Enum: TimeInForce

#![allow(unused)]
fn main() {
pub enum TimeInForce {
    GTC, // Good Till Cancel (Default)
    IOC, // Immediate or Cancel (Taker only, cancel remainder)
    FOK, // Fill or Kill (All or Nothing) - Optional for now
}
}

Updated InternalOrder:

  • Add pub time_in_force: TimeInForce
  • Add pub post_only: bool (Future proofing, Generator doesn’t strictly use it yet but good practice)
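A sketch of the extended model could look like the following. The `TimeInForce` enum matches the spec above; the surrounding `InternalOrder` fields are illustrative assumptions, not the project's actual struct definition.

```rust
/// Execution-strategy flag, kept separate from the core OrderType.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum TimeInForce {
    Gtc, // Good Till Cancel: rest the unfilled remainder in the book
    Ioc, // Immediate or Cancel: expire the remainder immediately
    Fok, // Fill or Kill: all-or-nothing (deferred for now)
}

impl Default for TimeInForce {
    fn default() -> Self {
        TimeInForce::Gtc // GTC is the default per the spec above
    }
}

/// Illustrative extended order; fields other than the two new ones
/// are placeholders, not the real `InternalOrder` layout.
#[derive(Debug)]
pub struct InternalOrder {
    pub order_id: u64,
    pub price: u64, // integer-scaled, per the u64 refactoring in 0x02
    pub qty: u64,
    pub time_in_force: TimeInForce, // new in 0x14-b
    pub post_only: bool,            // new: reserved; generator doesn't use it yet
}
```

Keeping the flag as its own enum means the matching loop can branch on remainder disposition without touching the limit/market distinction.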

2.2 Matching Engine Logic

The Matching Engine must process orders sequentially based on seq_id.

Execution Flow:

  1. Incoming Order: Parse TimeInForce and OrderType.
  2. Matching:
    • Limit GTC: Match against opposite book. Remaining -> Add to Book.
    • Limit IOC: Match against opposite book. Remaining -> Expire (do not add to book).
    • Market: Match against opposite book at any price. Remaining -> Expire (or defined slippage protection).
  3. Command Handling:
    • MoveOrder: Atomic “Cancel old ID + Place new ID”. Priority Loss is acceptable (and expected).
    • ReduceOrder: Reduce qty in-place. Priority Preservation required if implemented efficiently, else re-insert. Exchange-Core typically preserves priority on reduce.
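The GTC/IOC distinction in step 2 reduces to how the unfilled remainder is disposed of after matching. A minimal, self-contained sketch (names are illustrative, not the engine's real API):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum TimeInForce {
    Gtc,
    Ioc,
}

#[derive(Debug, PartialEq)]
pub enum Remainder {
    Rested(u64),  // unfilled qty added to the book (GTC)
    Expired(u64), // unfilled qty discarded (IOC / exhausted Market)
    None,         // fully filled
}

/// After matching fills `filled` out of `qty`, dispose of the remainder
/// according to the order's TimeInForce. Illustrative only.
pub fn dispose_remainder(qty: u64, filled: u64, tif: TimeInForce) -> Remainder {
    let rem = qty - filled;
    if rem == 0 {
        Remainder::None
    } else {
        match tif {
            TimeInForce::Gtc => Remainder::Rested(rem),
            TimeInForce::Ioc => Remainder::Expired(rem),
        }
    }
}
```

This is exactly the shape of the `test_ioc_partial_fill` acceptance case later in this chapter: a 100-qty IOC order against 60 qty of book liquidity fills 60 and expires 40.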

2.3 FokBudget Handling (Spot)

  • Does the Generator produce FokBudget? -> Inspection shows the stream is mostly Gtc/Ioc.
  • Correction: CommandType::FokBudget exists in the Generator enum, but its usage is rare in the Spot Benchmark. We prioritize IOC and GTC.

3. Developer Specification

3.1 Task List

  1. Model Update:
    • Modify src/models.rs: Add TimeInForce enum.
    • Update InternalOrder struct.
  2. Engine Implementation (src/engine/matching.rs):
    • Implement process_order(&mut self, order: InternalOrder) -> OrderResult.
    • Implement match_market_order.
    • Implement match_limit_order.
  3. Command Logic:
    • Implement reduce_order(price, old_qty, new_qty).
    • Implement move_order (atomic cancel + place).

3.2 Acceptance Criteria

  • Unit Tests:
    • test_ioc_partial_fill: 100 qty order vs 60 qty book -> 60 filled, 40 expired.
    • test_gtc_maker: 100 qty order vs empty book -> 100 rests in book.
    • test_market_sweep: Market order consumes multiple price levels.
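The arithmetic behind test_ioc_partial_fill can be checked in isolation: an IOC order for 100 against 60 of resting liquidity fills 60 and expires 40. `fill_ioc` is an illustrative stand-in, not the engine's real test harness:

```rust
/// Returns (filled, expired) for an IOC order against available book depth.
fn fill_ioc(order_qty: u64, book_qty: u64) -> (u64, u64) {
    let filled = order_qty.min(book_qty); // take what the book offers
    let expired = order_qty - filled;     // IOC: remainder never rests
    (filled, expired)
}

fn main() {
    assert_eq!(fill_ioc(100, 60), (60, 40)); // the partial-fill acceptance case
    assert_eq!(fill_ioc(100, 0), (0, 100));  // empty book: everything expires
    assert_eq!(fill_ioc(50, 80), (50, 0));   // fully filled: nothing expires
}
```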

4. QA Verification Plan

  • Property: Ioc orders must never appear in all_orders() (the book) after processing.
  • Property: Gtc orders must appear in book if not fully matched.
  • Latency: process_order time measured at < 5µs ✅ (Verified).

5. Implementation Status & Results

Note

✅ Phase 0x14-b: 100% Feature Parity Achieved

5.1 Verification Matrix

| Module | Purpose | Tests | Status |
|---|---|---|---|
| IOC Logic | Immediate-or-Cancel (Taker) | 9/9 | ✅ |
| MoveOrder | Price modification (Atomic) | 7/7 | ✅ |
| ReduceOrder | Qty reduction (Priority Preserved) | 5/5 | ✅ |
| Persistence | Settlement & DB Sync | 5/5 | ✅ |
| Edge Cases | Robustness & Error Handling | 17/17 | ✅ |
| Total | | 43/43 | 100% |

5.2 Key Technical Findings

  1. Asynchronous Consistency: Fixed a critical bug where Cancel/Reduce actions bypassed the MEResult persistence queue.
  2. Priority Preservation: Verified that ReduceOrder maintains temporal priority, while MoveOrder (Price change) correctly resets it.
  3. Reactive Loop: Optimized the matching engine to handle state transitions without synchronous blocking on I/O.

6. Validation Commands

Automated QA Suite:

# Run all 0x14-b specific QA tests
./scripts/test_0x14b_qa.sh --with-gateway

Unit Verification:

cargo test test_ioc_ test_mov_ test_reduce_



🇨🇳 中文

状态已完成
上下文Phase V: 极致优化 (Step 2)
目标实现与 Exchange-Core 现货撮合引擎的功能对齐,以支持基准测试工具。
范围仅现货。杠杆/期货推迟至 0x14-c。

1. 差距分析 (基于 Verification Kit)

基于 exchange_core_verification_kit/test_datasets_and_steps.md L162-171 命令分布:

✅ 已实现

功能基准占比说明
GTC 限价单~45%engine.rs::process_order()
Cancel 取消~9%完整链路: Gateway → Pipeline → OrderBook → WAL

❌ 需新增

功能基准占比优先级
IOC 即时单~35%P0
Move 移动~8%P0
Reduce 减量~3%P1
FOK_BUDGET~1%P2

说明: FOK_BUDGET (按报价币金额买入) 占比 ~1%,完成 S-to-Huge 全量测试需实现。


2. 架构需求

2.1 数据模型扩展 (Schema)

必须扩展 InternalOrder 以支持多种执行策略。

新枚举: TimeInForce

#![allow(unused)]
fn main() {
pub enum TimeInForce {
    GTC, // Good Till Cancel (默认: 一直有效直到取消)
    IOC, // Immediate or Cancel (Taker 专用: 剩余未成交部分立即过期)
    FOK, // Fill or Kill (全部成交或全部取消) - 暂可选
}
}

更新 InternalOrder:

  • 新增 pub time_in_force: TimeInForce
  • 新增 pub post_only: bool (为未来准备,虽然生成器暂时未严格使用)

2.2 撮合引擎逻辑

撮合引擎必须基于 seq_id 顺序处理 订单。

执行流:

  1. 新订单接入: 解析 TimeInForceOrderType
  2. 撮合过程:
    • Limit GTC: 与对手盘撮合。剩余部分 -> 加入订单簿
    • Limit IOC: 与对手盘撮合。剩余部分 -> 立即过期 (Expire) (不入簿)。
    • Market: 与对手盘在任意价格撮合。剩余部分 -> 过期 (或滑点保护)。
  3. 指令处理:
    • MoveOrder: 原子化 “取消旧ID + 下单新ID”。优先级丢失 是可接受的 (且预期的)。
    • ReduceOrder: 原地减少数量。如果实现得当,应保留优先级。Exchange-Core 通常在减量时保留优先级。

2.3 FokBudget 处理 (现货)

  • 生成器会产生 FokBudget 吗? -> 代码显示主要是 Gtc/Ioc
  • 修正: CommandType::FokBudget 存在于枚举中,但在现货 Benchmark 中极少使用。我们优先保证 IOCGTC 的正确性。

3. 开发规范 (Developer Specification)

3.1 任务清单

  1. 模型更新:
    • 修改 src/models.rs: 增加 TimeInForce 枚举。
    • 更新 InternalOrder 结构体。
  2. 引擎实现 (src/engine/matching.rs):
    • 实现 process_order(&mut self, order: InternalOrder) -> OrderResult
    • 实现 match_market_order (市价撮合)。
    • 实现 match_limit_order (限价撮合)。
  3. 指令逻辑:
    • 实现 reduce_order(price, old_qty, new_qty)
    • 实现 move_order (atomic cancel + place)。

3.2 验收标准

  • 单元测试:
    • test_ioc_partial_fill: 100 qty 订单 vs 60 qty 深度 -> 成交 60, 过期 40。
    • test_gtc_maker: 100 qty 订单 vs 空订单簿 -> 100 进入 OrderBook。
    • test_market_sweep: 市价单吃掉多个价格档位。

4. QA 验证计划

  • 属性: Ioc 订单处理后,绝不 应出现在 all_orders() (订单簿) 中。
  • 属性: Gtc 订单若未完全成交,必须 出现在订单簿中。
  • | 延迟 | 测量 process_order 处理时间 | ✅ < 5µs (已验证) |

5. 实施结果与验证

Note

✅ Phase 0x14-b: 100% 功能对齐已完成

5.1 验证矩阵

模块目的测试项状态
IOC 逻辑立即成交或取消 (Taker)9/9
MoveOrder改价指令 (原子化)7/7
ReduceOrder减量指令 (保留优先级)5/5
持久化结算与数据库同步5/5
边界测试鲁棒性与错误处理17/17
合计43/43100%

5.2 关键技术点总结

  1. 异步一致性: 修复了 Cancel/Reduce 操作绕过 MEResult 持久化队列的 Bug,确保数据库状态与内存一致。
  2. 优先级保留: 通过单元测试验证了 ReduceOrder 成功保留时间优先级,而 MoveOrder (改价) 正确重置了优先级。
  3. 响应式架构: 优化了撮合引擎的反应循环,确保所有指令都在微秒级完成且具备确定性的副作用路径。

6. 验证命令

一键回归测试:

# 运行所有 0x14-b QA 自动化测试
./scripts/test_0x14b_qa.sh --with-gateway

单元逻辑验证:

cargo test test_ioc_ test_mov_ test_reduce_


0x13 CPU Affinity & Cache

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📅 Status: 🚧 Planned Core Objective: Pin threads to CPU cores and optimize data layout for cache locality.


1. Overview

  • CPU Affinity: Bind matching threads to isolated cores to reduce context switching.
  • Cache Locality: Optimize OrderBook node layout to fit L1/L2 cache lines.
  • False Sharing: Padding atomic variables to prevent cache line contention.
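The false-sharing fix above can be sketched with an alignment attribute: forcing each hot counter onto its own 64-byte cache line guarantees two cores never contend on the same line. A minimal sketch, assuming a 64-byte line size (true on most x86_64 parts):

```rust
use std::sync::atomic::AtomicU64;

// One counter per cache line: align(64) pads each value out to a full line.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

// Because each field is 64-byte aligned, `orders` and `trades`
// can never share a cache line.
struct Stats {
    orders: PaddedCounter,
    trades: PaddedCounter,
}

fn main() {
    assert_eq!(std::mem::align_of::<PaddedCounter>(), 64);
    // Two line-aligned fields occupy at least two full cache lines.
    assert!(std::mem::size_of::<Stats>() >= 128);
}
```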

(Detailed content coming soon in Phase III)




🇨🇳 中文

📅 状态: 🚧 计划中 核心目标: 主要线程绑核与缓存友好性优化。


1. 概述

  • CPU 亲和性 (Affinity): 将撮合线程绑定到隔离核心,减少上下文切换。
  • 缓存局部性 (Locality): 优化 OrderBook 节点布局以适应 L1/L2 缓存行。
  • 伪共享 (False Sharing): 通过 Padding 避免多线程竞争同一缓存行。

(第三阶段详细内容敬请期待)

0x14 SIMD Matching Acceleration

🇺🇸 English    |    🇨🇳 中文

🇺🇸 English

📅 Status: 🚧 Planned Core Objective: Use SIMD (AVX2/AVX-512) instructions to accelerate order matching.


1. Overview

  • Vectorization: Process multiple price levels in parallel.
  • Intrinsics: Direct use of Rust std::arch intrinsics.
  • Benchmark: Aiming for > 5M TPS.

(Detailed content coming soon in Phase III)




🇨🇳 中文

📅 状态: 🚧 计划中 核心目标: 使用 SIMD (AVX2/AVX-512) 指令集加速订单撮合。


1. 概述

  • 向量化 (Vectorization): 并行处理多个价格档位。
  • Intrinsics: 直接使用 Rust std::arch 内联汇编/指令。
  • 基准目标: 目标吞吐量 > 500万 TPS。

(第三阶段详细内容敬请期待)

0x17 SIMD Matching Acceleration

Status: Planned

This chapter will cover SIMD (Single Instruction Multiple Data) vectorized matching using AVX-512 or ARM NEON instructions.

Coming soon…

Performance Report

Generated: 2025-12-31 04:48:47

Summary

| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 0 | 100,000 | - |
| Trades | 0 | 0 | - |
| Exec Time | 0.00ms | 119.10ms | +0.0% |
| Throughput | 0/s | 839,618/s | +0.0% |

Timing Breakdown

| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 0ns | 0K | 0.0% |
| Matching Engine | 23.42ms | 4.3M | 100.0% |
| Settlement | 0ns | 0K | 0.0% |
| Ledger I/O | 0ns | 0K | 0.0% |

Latency Percentiles

| Percentile | Value |
|---|---|
| MIN | 150ns |
| AVG | 1.1µs |
| P50 | 1.6µs |
| P99 | 4.2µs |
| P99.9 | 11.1µs |
| MAX | 960.5µs |

Verdict

No significant regressions detected

Performance History

Archive of historical performance reports. A report is generated after each significant chapter is completed.

Report List

| Date | Chapter | Key Changes |
|---|---|---|
| 2025-12-18 | 0x08h | Service-oriented refactoring; ME at 76.6%; 1.3M dataset |
| 2025-12-16 | 0x07b | Performance baseline established; Ledger I/O at 98.5% |

Naming Convention

YYYY-MM-DD-<chapter>.md

Example: 2025-12-16-0x07b.md

How to Generate a Report

# 1. Run the performance test
cargo run --release

# 2. Generate the report
python3 scripts/generate_perf_report.py > docs/src/perf-report.md

# 3. Archive the history
cp docs/src/perf-report.md docs/src/perf-history/$(date +%Y-%m-%d)-<chapter>.md

# 4. Update this index file with the new entry

# 5. Commit
git add docs/src/perf-report.md docs/src/perf-history/
git commit -m "docs: Update perf report"

Performance Report - 2025-12-18 0x08-h

Branch: 0x08-h-performance-monitoring Dataset: 1.3M orders (30% cancels, high-balance mode) Changes: Service-oriented refactoring (IngestionService, UBSCoreService, MatchingService, SettlementService)

Summary

| Metric | Single-Thread | Multi-Thread |
|---|---|---|
| Orders | 1,300,000 | 1,300,000 |
| Trades | 667,567 | 667,567 |
| Exec Time | 14.18s | 20.17s |
| Throughput | 91,710/s | 64,450/s |
| P50 Latency | 2.5 µs | 113 ms |

Multi-Thread Breakdown

| Component | Time | % | Latency/op |
|---|---|---|---|
| Matching Engine | 19.23s | 76.6% | 19.23 µs |
| Persistence | 5.35s | 21.3% | 4.12 µs |
| Settlement | 0.51s | 2.0% | 0.76 µs |

Key Changes

  • Extracted 4 service structs from spawn functions
  • Reduced pipeline_mt.rs from 720 to ~250 lines
  • Added pipeline_services.rs (~640 lines)
  • All tests pass with exact trade count match

Verdict

Correctness Verified: 667,567 trades, 0 balance differences

Performance Report

Generated: 2025-12-16 18:16:36

Summary

| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 100,000 | 100,000 | - |
| Trades | 47,886 | 47,886 | - |
| Exec Time | 3753.87ms | 3956.64ms | +5.4% |
| Throughput | 26,639/s | 25,274/s | -5.1% |

Timing Breakdown

| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 17.64ms | 5.7M | 0.4% |
| Matching Engine | 36.37ms | 2.7M | 0.9% |
| Settlement | 4.71ms | 21.2M | 0.1% |
| Ledger I/O | 3.88s | 26K | 98.5% |

Latency Percentiles

| Percentile | Value |
|---|---|
| MIN | 125ns |
| AVG | 38.6µs |
| P50 | 625ns |
| P99 | 429.7µs |
| P99.9 | 1.37ms |
| MAX | 7.25ms |

Verdict

2 regression(s) detected

  • Exec Time: +5.4%
  • Throughput: -5.1%

开发规范 (Development Guidelines)

Core Principle: Standardize environments to eliminate “works on my machine” issues.


🐍 Python Environment

We use uv for strict dependency management and execution speed.

1. The Golden Rule

NEVER use system python3 or pip directly for project scripts. ALWAYS use uv run to execute scripts.

2. Standard Workflow

# 1. Sync dependencies (like npm install)
uv sync

# 2. Run script (like npm run)
uv run python3 scripts/my_script.py

3. Adding Dependencies

# Add new package
uv add requests

🦀 Rust Environment

  • Format: cargo fmt must pass.
  • Lint: cargo clippy must pass (no warnings).
  • Tests: cargo test must pass.

API 规范 (API Conventions)

ID 规范 (ID Specification)

命名规范 (Naming Convention)

Money Type Safety Standard | 资金类型安全规范

Version: 1.3 | Last Updated: 2025-12-31

本文件定义了本项目处理资金(余额、订单金额、成交价格)的治理方案。 重点是:如何在代码层面禁止不符合规范的操作。 任何违反本规范的代码不得合并


Part I: 背景与设计决策

1.1 核心风险

金额是领域概念,不是原始类型。

在任何金融系统中,“钱“都不应被视为一个裸露的整数。它是一个携带精度语义的领域对象——1 BTC 内部表示为 100_000_000 聪,这个 10^8 的缩放因子是资产的内在属性,而非程序员的临时决定。

当开发者在代码中随意写下 amount * 10u64.pow(8) 时,他实际上在破坏这层抽象,将领域逻辑泄漏到业务代码的每一个角落。这会导致:

| Risk Type | Consequence |
|---|---|
| Ledger cannot reconcile | Any tiny error breaks the "funds conservation invariant", making 100% precise reconciliation impossible; we can no longer tell a "normal error" from a real bug. |
| Semantic errors | Accidentally adding a BTC amount directly to a USDT amount. |
| Overflow attacks | Maliciously crafted huge values crash the system or miscalculate funds. |
| Maintenance nightmare | Conversion logic is complex; duplicating it everywhere guarantees mistakes everywhere. |

1.2 为什么选择 u64 + 内部缩放?

前置阅读: 关于浮点数的问题,请参阅 0x02 浮点数的诅咒,此处不再重复。

核心结论

  • f64 无法满足跨平台确定性(不同 CPU/编译器结果可能不同)。
  • Decimal 无法满足极致性能(比 u64 慢 10x+)。
  • u64 是唯一能同时满足“区块链级验证强度“和“高频撮合性能“的方案。

u64 需要内部缩放,这引入了复杂性。因此我们必须:

  1. 将缩放算法封装在 money.rs 中。
  2. 严禁在其他地方手工进行缩放运算。

1.3 内部缩放方案:如何实现大额处理?

核心机制:我们为每种资产定义系统精度(通常 8 位),而非使用链上原生精度(如 ETH 的 18 位)。

| Asset | On-chain Decimals | System Decimals | Max Amount Representable in u64 |
|---|---|---|---|
| BTC | 8 | 8 | 184.4 billion BTC (far beyond total supply) |
| ETH | 18 | 8 | 184.4 billion ETH |
| USDT | 6 | 6 | 18.4 trillion USDT |

Important

精度权衡:使用 8 位系统精度意味着 ETH 最小单位是 0.00000001 ETH (10 gwei),而非链上的 1 wei。 对于交易所场景,这完全足够——没有人会交易 1 wei 的 ETH。

这就是为什么“缩放“必须封装

  • 不同资产有不同的链上精度系统精度
  • 入金时:链上精度 → 系统精度(可能截断极小尾数)。
  • 出金时:系统精度 → 链上精度(补零)。
  • 这套转换逻辑复杂,必须集中管理,严禁各处手写。
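The deposit-side conversion described above (on-chain precision truncated down to system precision) can be sketched in a few lines. `chain_to_system` is an illustrative helper, not the actual money.rs API, and it assumes on-chain decimals ≥ system decimals:

```rust
/// Convert an on-chain integer amount (e.g. wei, 18 decimals) to the
/// system's internal u64 representation (e.g. 8 decimals), truncating
/// the tail. Integer division is truncation toward zero by definition.
fn chain_to_system(chain_amount: u128, chain_decimals: u32, system_decimals: u32) -> u64 {
    let drop = chain_decimals - system_decimals; // digits to discard
    (chain_amount / 10u128.pow(drop)) as u64
}

fn main() {
    // 1.23456789012345678 ETH in wei -> 1.23456789 ETH in system units
    let wei: u128 = 1_234_567_890_123_456_780;
    assert_eq!(chain_to_system(wei, 18, 8), 123_456_789);
    // Same-precision assets (e.g. BTC) pass through unchanged.
    assert_eq!(chain_to_system(100_000_000, 8, 8), 100_000_000);
}
```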

Tip

u128 的替代方案:如果不追求极致性能,使用 u128 可以直接采用统一的 18 位精度,避免不同资产间的精度转换问题。但这会牺牲约 10-20% 的撮合性能。


Part II: 解决方案与决策 (Solutions & Decisions)

2.1 类型安全:Newtype 守卫 (The Newtype Guardian)

问题: u64 是原生类型,开发者可以轻易写出 amount * 10u64.pow(8)

方案: 引入不透明的包装类型 ScaledAmount(u64):

  • 内部字段 u64private 的,无法直接访问。
  • 所有构造必须通过 money.rs 提供的审计过的 Constructor。
  • 如果有人想“私自计算“,他必须先解包(to_raw()),这种“不自然“的操作在 Code Review 中一眼可见。
#![allow(unused)]
fn main() {
// 🛡️ 核心类型定义
pub struct ScaledAmount(u64);        // 无符号:余额、订单数量
pub struct ScaledAmountSigned(i64);  // 有符号:盈亏、差额
}

已实现:

  • ScaledAmount / ScaledAmountSigned 定义
  • checked_add / checked_sub 安全算术
  • Deref<Target = u64> 允许比较,但禁止直接算术
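A minimal self-contained sketch of the guardian pattern listed above, with simplified signatures (the real money.rs may differ): the inner u64 stays private, arithmetic goes through checked methods, and Deref permits comparison but not addition:

```rust
use std::ops::Deref;

pub struct ScaledAmount(u64); // inner u64 is private to this module

impl ScaledAmount {
    /// Overflow returns None instead of wrapping silently.
    pub fn checked_add(&self, rhs: &Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(Self)
    }
}

impl Deref for ScaledAmount {
    type Target = u64;
    fn deref(&self) -> &u64 { &self.0 } // allows *a > *b, but not a + b
}

fn main() {
    let a = ScaledAmount(150_000_000);
    let b = ScaledAmount(50_000_000);
    assert_eq!(*a.checked_add(&b).unwrap(), 200_000_000);
    assert!(ScaledAmount(u64::MAX).checked_add(&b).is_none()); // overflow caught
    assert!(*a > *b); // comparison via Deref is allowed
}
```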

2.2 访问控制:入口收缩 (Visibility Chokepoint)

问题: 如果底层函数 parse_amount(str, decimals)pub 的,开发者会倾向于直接使用它。

方案: 将 Layer 1 工具函数收缩为 pub(crate)

| Visibility | Functions | Purpose |
|---|---|---|
| pub(crate) | parse_amount, format_amount | Internal to money.rs and core modules only |
| pub | SymbolManager::parse_qty(), SymbolManager::format_price() | The only external entry points |

效果: 在代码自动补全时,开发者首先看到的是 SymbolManager 上的高层方法。

已实现:

  • parse_amount / format_amount 改为 pub(crate)

2.3 分层架构 (Layered Architecture)

| Layer | Component | Responsibility | Visibility |
|---|---|---|---|
| Layer 1 (Core) | money.rs | Atomic type definitions and low-level scaling | pub(crate) |
| Layer 2 (Domain) | Asset / AssetInfo | Asset-precision aware, intent-based API | pub |
| Layer 3 (Integration) | SymbolManager / MoneyFormatter | Symbol-level conversion and batch formatting | pub |

Tip

扩展性: MoneyFormatter 目前服务于深度图。随着 Kline/Ticker 复杂化,此模式可推广至所有行情展示。


2.4 铁律:意图封装 API (Intent-based API)

Caution

业务代码禁止直接调用 money:: 函数。必须使用 Asset / AssetInfo 提供的意图封装 API。

问题:直接调用底层函数暴露实现细节

#![allow(unused)]
fn main() {
// ❌ 错误:暴露了 decimals 参数,调用者需要知道内部实现
let amount_scaled = *money::parse_decimal(amount, asset.decimals as u32)?;
}

解决方案:在 Asset / AssetInfo 上提供意图封装方法

#![allow(unused)]
fn main() {
// ✅ 正确:调用者只需表达意图,不需要知道 decimals
let amount_scaled = asset.parse_amount(amount)?;
let fee_scaled = asset.parse_amount_allow_zero(fee)?;
}

设计架构

┌─────────────────────────────────────────────────────────────────┐
│  业务代码 (deposit.rs, withdraw.rs, order.rs...)                 │
│  ✅ asset.parse_amount(decimal)                                  │
│  ✅ asset.parse_amount_allow_zero(decimal)                       │
│  ✅ asset.format_amount(scaled_amount)                           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  意图封装层 (Asset / AssetInfo)                                  │
│  封装 decimals 参数,提供"类型 → 类型"的简洁 API                 │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  核心转换层 (money.rs)                                           │
│  parse_decimal() / parse_decimal_allow_zero() / format_amount() │
│  ⚠️ pub(crate) - 仅供意图封装层调用                              │
└─────────────────────────────────────────────────────────────────┘

关键收益

| Benefit | Description |
|---|---|
| Brevity | asset.parse_amount(d) vs money::parse_decimal(d, asset.decimals as u32) |
| Encapsulation | Callers need not know decimals, display_decimals, or other internals |
| Consistency | All business code uses the same API pattern |
| Auditability | A direct money:: call is a red flag requiring review |

Part III: 内外边界与显示策略 (Internal/External Boundary & Display)

3.0 核心规范:内部实现绝不暴露

Caution

内部的 u64 表示是实现细节,绝对不能暴露给客户端。

强制规范

  1. 统一转换层:内部系统与外部 Client 之间,必须经过统一的转换层
  2. API 层使用 Decimal:DTO 中的金额字段使用 StrictDecimal(自定义类型),利用 rust_decimal 的格式验证能力。
  3. 分层验证
    • Serde 层:格式验证(拒绝 .5、非数字等)→ 得到 Decimal
    • SymbolManager 层:精度/范围验证 → 得到 ScaledAmount
  4. 精度来源唯一:资产精度从 SymbolManager 获取,严禁硬编码。
┌─────────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │ ──→ │  Serde 层    │ ──→ │SymbolManager │ ──→ │  Internal   │
│  (String)   │     │ (Decimal)    │     │ (验证精度)   │     │   (u64)     │
└─────────────┘     └──────────────┘     └──────────────┘     └─────────────┘
     "1.5"       格式验证     Decimal(1.5)   精度验证    ScaledAmount(150_000_000)

设计优势

  • 利用库能力rust_decimal 提供成熟的数字解析
  • 早期失败:格式错误在反序列化阶段就拦截
  • 关注点分离:格式验证 vs 精度验证 分开处理
  • 业务代码简化:Handler 拿到的 Decimal 已是合法数字,只需验证范围

3.1 截断是唯一合法的舍入策略

决策:所有转换、计算过程中的精度损失,一律使用截断(Truncation),不允许四舍五入。

原因

  • 一致性:与整数除法的行为一致(向零截断)。
  • 可预测性:任何人在任何平台重算,结果完全一致。
  • 安全性:宁愿少显示,也不能让用户认为自己拥有实际不存在的余额。
| Scenario | Policy | Example |
|---|---|---|
| Deposit conversion | Truncate | On-chain 1.23456789012345678 ETH → system 1.23456789 ETH |
| Balance display | Truncate | Internal 123456789 → displayed "1.2345" (4 display decimals) |
| Trade settlement math | Truncate | Never creates funds out of thin air |
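The display row above can be reproduced with pure integer math, so the result is identical on every platform. `format_truncated` is a hypothetical helper for illustration, not the real format_amount:

```rust
/// Render an internal scaled amount with fewer display decimals,
/// truncating (never rounding) the discarded tail.
fn format_truncated(raw: u64, decimals: u32, display_decimals: u32) -> String {
    let truncated = raw / 10u64.pow(decimals - display_decimals); // drop tail
    let scale = 10u64.pow(display_decimals);
    format!("{}.{:0width$}", truncated / scale, truncated % scale,
            width = display_decimals as usize)
}

fn main() {
    // Internal 123_456_789 at 8 decimals -> "1.2345", never "1.2346"
    assert_eq!(format_truncated(123_456_789, 8, 4), "1.2345");
}
```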

3.2 严格解析:拒绝模糊输入

决策: 拒绝 .55. 等简写,强制要求 0.55.0

原因:处理金额数据,严谨和安全是第一位的。模糊的输入格式可能导致:

  • 手抖或脚本错误输入不完整数字
  • 不同解析器对歧义格式有不同解读
  • 隐蔽的精度丢失

行动项:

  • 在 OpenAPI 文档和错误信息中明确提示此规范

3.3 零值处理:默认严格 + 显式入口

问题:零值在某些场景是非法的(订单数量),在另一些场景是合法的(手续费)。

反模式:到处写 workaround

#![allow(unused)]
fn main() {
// ❌ 散落各处,维护噩梦
let fee = if fee_str == "0" {
    ScaledAmount::ZERO
} else {
    parse_amount(&fee_str, decimals)?
};
}

推荐模式:显式入口

#![allow(unused)]
fn main() {
// ===== 默认入口:严格,拒绝零 =====
/// 用于订单数量、价格等必须非零的场景
pub fn parse_amount(s: &str, decimals: u32) -> Result<ScaledAmount>

// ===== 显式入口:允许零 =====
/// 用于手续费等可能为零的场景
/// 调用者应该知道自己在做什么
pub fn parse_amount_allow_zero(s: &str, decimals: u32) -> Result<ScaledAmount>
}

使用示例

#![allow(unused)]
fn main() {
// 订单数量:必须非零(使用默认严格版本)
let qty = symbol_mgr.parse_qty(symbol, &req.quantity)?;

// 提现手续费:可以为零(显式表达意图)
let fee = symbol_mgr.parse_fee_allow_zero(symbol, &req.fee)?;
}

设计原则

| Principle | Description |
|---|---|
| Pit of Success | The default behavior is safe; bypassing it requires an explicit declaration |
| Visible intent | Seeing _allow_zero in code tells you zero is allowed here |
| Code Review signal | An _allow_zero call is a signal that warrants review |
| No business logic in the parser | Whether zero is allowed is decided by the caller's choice of entry point |

Part IV: 如何在代码层面强制执行?

核心问题:如何禁止开发者到处私自转换?

4.1 第一道防线:类型系统 (编译期)

Newtype 封装ScaledAmount(u64) 的内部字段是 private 的。

#![allow(unused)]
fn main() {
pub struct ScaledAmount(u64);  // u64 不可直接访问

impl ScaledAmount {
    pub(crate) fn from_raw(v: u64) -> Self { Self(v) }  // 仅 crate 内部可构造
    pub fn to_raw(self) -> u64 { self.0 }                // 显式"逃逸"
}
}

效果

  • ScaledAmount::from_raw(100) — 外部模块无法调用
  • amount.0 — 无法直接访问内部字段
  • amount + 100u64 — 类型不匹配,编译失败
  • *amount > 0 — 通过 Deref 允许比较

4.2 第二道防线:可见性控制 (API 入口收缩)

层级隔离

| Function | Visibility | Who can call it |
|---|---|---|
| parse_amount() | pub(crate) | money.rs and core modules |
| format_amount() | pub(crate) | money.rs and core modules |
| SymbolManager::parse_qty() | pub | Any module (the only legal entry) |
| SymbolManager::format_price() | pub | Any module (the only legal entry) |

效果

  • Gateway Handler 的代码自动补全中,只能看到 SymbolManager 的方法。
  • 如果开发者想调用底层 parse_amount(),会发现它不在作用域内

4.3 第三道防线:API 层数据类型 (DTO 设计)

强制规范:API 请求/响应中的金额字段,必须使用 String 类型

#![allow(unused)]
fn main() {
// ✅ 正确: 使用 String,由 Handler 调用 SymbolManager 转换
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: String,  // "1.5"
    pub price: String,     // "50000.00"
}

// ❌ 错误: 直接使用 u64,暴露内部实现
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: u64,     // 客户端如何知道要传 150_000_000?
}
}

Serde 不会自动转换:如果客户端传 "quantity": 1.5(JSON number),String 类型会反序列化失败,强制客户端传 "1.5"(JSON string)。


4.4 第四道防线:CI 自动化审计

审计脚本: scripts/audit_money_safety.sh

#!/bin/bash
set -e

echo "🔍 Auditing money safety..."

# 1. 检查非 money.rs 中的手动缩放
if grep -rn "10u64.pow" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found 10u64.pow outside money.rs"
    exit 1
fi

# 2. 检查 Decimal 手动幂运算
if grep -rn "Decimal::from(10).powi" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found Decimal power operation outside money.rs"
    exit 1
fi

# 3. 检查硬编码精度 (可选,需要更精细的规则)
# grep -rn "decimals.*=.*8" --include="*.rs" src/ | grep -v "symbol_manager.rs"

echo "✅ Money safety audit passed!"

集成

  • .github/workflows/ci.yml — 每次 PR 自动运行
  • .git/hooks/pre-commit — 本地提交前拦截

4.5 第五道防线:Code Review 信号

高危操作清单 (PR 审查时重点关注):

| Code Pattern | Risk Level | Handling |
|---|---|---|
| .to_raw() | ⚠️ High | Must be commented with a justification |
| 10u64.pow outside money.rs | 🚫 Forbidden | Reject the merge |
| Hard-coded decimals: u32 | ⚠️ High | Should come from SymbolManager |
| u64 amount field in an API DTO | 🚫 Forbidden | Must use String |
| Raw arithmetic after Deref (*a + *b) | ⚠️ High | Use checked_add |

4.6 第六道防线:Agent 记忆 (AGENTS.md)

已生效: AGENTS.md 必读列表中包含本规范。所有 AI Agent 在开始工作前必须阅读,确保生成的代码符合规范。


Part V: 未来升级路径 (Future Upgrade Path)

| Phase | Goal | Status |
|---|---|---|
| Phase 0 | Newtype definitions, API narrowing, documentation governance | ✅ Done |
| Phase 1 | Integrate audit_money_safety.sh into CI | ⏳ Pending |
| Phase 1.5 | API Money Enforcement: Extractor + IntoResponse enforced conversion | ⏳ Pending |
| Phase 2 | Full scan and migration of legacy code | ⏳ Pending |
| Phase 2.5 | Migrate legacy code to the intent-based API (details below) | ⏳ Pending |

Phase 2.5 详情:Legacy 代码迁移

目标:将所有直接调用 money:: 函数的代码迁移到 Asset / AssetInfo 意图封装 API。

迁移内容

| Old Code | New Code |
|---|---|
| money::parse_decimal(d, asset.decimals as u32) | asset.parse_amount(d) |
| money::parse_decimal_allow_zero(d, asset.decimals as u32) | asset.parse_amount_allow_zero(d) |
| money::format_amount(amt, decimals, display) | asset.format_amount(amt) |

已完成

  • src/funding/deposit.rs
  • src/funding/withdraw.rs

待迁移(扫描整个代码库中的 money::parsemoney::format 调用):

  • 其他业务模块全面扫描
  • 添加 CI 检查禁止业务代码直接调用 money:: 函数

总结:为什么如此严苛 (Why So Heavy?)

核心原则 1:账本必须 100% 可对账

如果允许任何精度误差的存在,系统的账本就无法做到 100% 对齐。 我们无法利用“资金恒等定理“(总入金 = 总余额 + 总出金)来进行精确对账。 一旦账本不能 100% 对齐,我们就无法分辨一个差异是“可接受的正常误差“还是一个隐藏的 Bug。 真正的问题可能被“误差“掩盖,直到造成无法挽回的损失。

核心原则 2:转换逻辑必须收敛到唯一位置

金额转换逻辑非常复杂(精度、舍入、溢出检查)。 如果允许在代码库各处重复编写,每个地方都可能犯不同的错误。 将转换收敛到唯一的、经过充分审计和测试的代码位置 (money.rs + SymbolManager),我们可以:

  • 对这一处进行穷尽式测试(边界值、溢出、负数等)。
  • 确保所有调用者都享受同等的安全保障。
  • 在发现 Bug 时,只需修复一处,全局生效。

简单总结 (The Rules)

  • NO 10u64.pow() outside money.rs.
  • NO raw u64 arithmetic for amounts.
  • NO implicit scaling.
  • YES SymbolManager for all intent-based conversions.

速查表 (Quick Reference)

| Scenario | ✅ Correct | ❌ Wrong |
|---|---|---|
| API DTO field | quantity: StrictDecimal | quantity: u64, quantity: String |
| Decimal → ScaledAmount | symbol_mgr.decimal_to_scaled(symbol, decimal) | manual decimal * 10^8 |
| ScaledAmount → String | symbol_mgr.format_price(symbol, amount) | format!("{}", amount) |
| Get decimals | symbol_mgr.get_decimals(asset) | let decimals = 8; |
| Arithmetic | amount.checked_add(other)? | *amount + *other |
| Comparison | *amount > 0 (allowed via Deref) | |

API Money Enforcement | API 层资金类型强制规范

目标:确保所有 API Handler 都通过统一的转换层处理金额数据,禁止各处私自转换。

适用范围:Request(入)和 Response(出)双向。


1. 问题陈述

Gateway 有多个 API Handler,每个都需要:

  • 入向:接收 JSON 中的金额字符串(如 "1.5"),转换为内部 ScaledAmount
  • 出向:将内部 ScaledAmount 格式化为 JSON 字符串返回给客户端

核心挑战:如何确保所有 Handler 都通过 SymbolManager 转换,而不是各自写一套转换逻辑?


2. 方案对比

方案 A:DTO + 显式验证层

机制:Handler 接收原始 DTO,手动调用验证函数。

#![allow(unused)]
fn main() {
// Request
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    // 每个 Handler 都要记得调用 validate()
    let validated = symbol_mgr.validate_order(&req)?;
    // ...
}

// Response
async fn get_balance(...) -> Json<BalanceResponse> {
    let raw = service.get_balance(...)?;
    // 每个 Handler 都要记得调用 format()
    Json(symbol_mgr.format_balance_response(&raw))
}
}
| Pros | Cons |
|---|---|
| Simple and direct | Relies on developer discipline; easy to forget |
| No extra types needed | Conversion logic scattered across handlers |

方案 B:Service 层封装

机制:Handler 只能调用 Service 方法,Service 内部做转换。

#![allow(unused)]
fn main() {
// Handler 只传递原始 DTO
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    order_service.place(req).await  // Service 内部调用 SymbolManager
}

async fn get_balance(...) -> Result<Json<BalanceResponse>> {
    Ok(Json(balance_service.get_formatted(...).await?))  // Service 返回已格式化数据
}
}
| Pros | Cons |
|---|---|
| Business logic centralized | The Service must still remember to call SymbolManager |
| Handlers stay thin | If the Service forgets, the problem still occurs |

方案 C:Axum Extractor + IntoResponse 模式 ⭐ 推荐

机制:在 Axum 框架层强制转换。

Request 端:自定义 Extractor

#![allow(unused)]
fn main() {
/// 已验证的订单请求,Handler 直接拿到 ScaledAmount
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;
    
    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<RawOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        
        Ok(ValidatedOrder {
            symbol_id: raw.symbol_id,
            quantity: symbol_mgr.parse_qty(raw.symbol_id, &raw.quantity)?,
            price: symbol_mgr.parse_price(raw.symbol_id, &raw.price)?,
        })
    }
}

// Handler 直接拿到已验证的类型,无法绕过
async fn place_order(order: ValidatedOrder) -> Result<impl IntoResponse> {
    // order.quantity 已经是 ScaledAmount,不可能是未转换的 String
}
}

Response 端:自定义 IntoResponse

#![allow(unused)]
fn main() {
/// 已格式化的余额响应,自动调用 SymbolManager 格式化
pub struct FormattedBalanceResponse {
    pub balances: Vec<(AssetId, ScaledAmount)>,
    pub symbol_mgr: Arc<SymbolManager>,
}

impl IntoResponse for FormattedBalanceResponse {
    fn into_response(self) -> Response {
        let formatted: Vec<BalanceDto> = self.balances.iter()
            .map(|(asset, amount)| BalanceDto {
                asset: asset.to_string(),
                amount: self.symbol_mgr.format_asset_amount(*asset, *amount),
            })
            .collect();
        Json(formatted).into_response()
    }
}

// Handler 返回内部类型,格式化在 IntoResponse 中自动完成
async fn get_balances(State(state): State<AppState>) -> FormattedBalanceResponse {
    let balances = state.service.get_balances().await;
    FormattedBalanceResponse { balances, symbol_mgr: state.symbol_mgr.clone() }
}
}
| Pros | Cons |
|---|---|
| Enforced at the framework layer; handlers never see the raw String | An Extractor must be defined per request type |
| Compile-time guarantee | The Extractor needs access to SymbolManager |
| Conversion logic fully centralized | Slightly higher upfront implementation cost |

方案 D:类型驱动设计(最严格)

机制:定义“未验证“的金额类型,只能通过 SymbolManager 转换。

#![allow(unused)]
fn main() {
/// 未验证的金额,不能直接使用
pub struct UnvalidatedAmount(String);

impl UnvalidatedAmount {
    // 没有 .parse() 方法
    // 没有 Deref<Target=String>
    // 唯一的出路是传给 SymbolManager
}

impl SymbolManager {
    pub fn parse(&self, asset: AssetId, amount: UnvalidatedAmount) -> Result<ScaledAmount>;
}

// DTO 使用未验证类型
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: UnvalidatedAmount,  // 无法直接 .parse()
}
}
| Pros | Cons |
|---|---|
| Completely locked down by the type system | Introduces more types |
| Will not compile even if the call is forgotten | Custom Serde deserialization is slightly more complex |

3. 推荐方案:StrictDecimal + Extractor

3.1 核心设计:分层验证

Client (JSON String "1.5")
    ↓ Serde: StrictDecimal 自定义反序列化
API DTO (StrictDecimal) ← 格式已验证
    ↓ Extractor: SymbolManager.decimal_to_scaled()
Handler (ScaledAmount) ← 精度已验证

关键洞察

  • Serde 层负责格式验证:利用 rust_decimal 的解析能力,拒绝非法格式
  • SymbolManager 负责精度验证:检查小数位是否符合资产精度
  • 业务代码只需验证范围:数字格式和精度都已保证

3.2 StrictDecimal 实现

#![allow(unused)]
fn main() {
use rust_decimal::Decimal;
use serde::{Deserialize, Deserializer};

/// 严格格式的 Decimal,在反序列化时进行格式验证
#[derive(Debug, Clone, Copy)]
pub struct StrictDecimal(Decimal);

impl StrictDecimal {
    pub fn inner(&self) -> Decimal {
        self.0
    }
}

impl<'de> Deserialize<'de> for StrictDecimal {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;
        
        // 严格格式检查:拒绝 .5, 5., 空字符串等
        if s.is_empty() {
            return Err(serde::de::Error::custom("Amount cannot be empty"));
        }
        if s.starts_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 0.5 not .5"));
        }
        if s.ends_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 5.0 not 5."));
        }
        
        // 使用 Decimal 库解析
        let d = Decimal::from_str(&s)
            .map_err(|e| serde::de::Error::custom(format!("Invalid decimal: {}", e)))?;
        
        // 拒绝负数(金额必须非负)
        if d.is_sign_negative() {
            return Err(serde::de::Error::custom("Amount cannot be negative"));
        }
        
        Ok(StrictDecimal(d))
    }
}
}

3.3 DTO 使用示例

#![allow(unused)]
fn main() {
#[derive(Debug, Deserialize)]
pub struct PlaceOrderRequest {
    pub symbol: String,
    pub quantity: StrictDecimal,  // 格式已验证
    pub price: StrictDecimal,     // 格式已验证
}
}

3.4 SymbolManager 扩展

#![allow(unused)]
fn main() {
impl SymbolManager {
    /// 将已验证的 Decimal 转换为 ScaledAmount
    /// 只需验证精度,格式已在 Serde 层验证
    pub fn decimal_to_scaled(
        &self,
        symbol: SymbolId,
        decimal: Decimal,
    ) -> Result<ScaledAmount, MoneyError> {
        let decimals = self.get_symbol_decimals(symbol)?;
        
        // 检查精度是否超限
        if decimal.scale() > decimals {
            return Err(MoneyError::PrecisionExceeded {
                provided: decimal.scale(),
                max: decimals,
            });
        }
        
        // 转换为 u64
        let scaled = decimal * Decimal::from(10u64.pow(decimals));
        let raw = scaled.to_u64()
            .ok_or(MoneyError::Overflow)?;
        
        Ok(ScaledAmount::from_raw(raw))
    }
}
}

3.5 Extractor 整合

#![allow(unused)]
fn main() {
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;
    
    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<PlaceOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        let symbol_id = symbol_mgr.get_symbol_id(&raw.symbol)?;
        
        Ok(ValidatedOrder {
            symbol_id,
            // StrictDecimal 已验证格式,这里只验证精度
            quantity: symbol_mgr.decimal_to_scaled(symbol_id, raw.quantity.inner())?,
            price: symbol_mgr.decimal_to_scaled(symbol_id, raw.price.inner())?,
        })
    }
}
}

3.6 设计优势总结

| Layer | Responsibility | What It Validates |
|---|---|---|
| Serde (StrictDecimal) | Format validation | Rejects .5, 5., negatives, non-numbers |
| SymbolManager | Precision validation | Checks decimal places against the asset's limit |
| Business code | Range validation | Checks the amount is within a reasonable range |

关键收益

  1. 利用库能力rust_decimal 提供成熟的数字解析
  2. 早期失败:格式错误在反序列化阶段就拦截
  3. 关注点分离:每层只负责一种验证
  4. 编译期保证:Handler 拿到的是 ScaledAmount,无法出错

4. CI 自动化检查:机制强制,不靠自觉

核心原则:我们要从机制和流程上规范,而不是依赖开发者的“自觉“。

4.1 审计脚本:scripts/audit_api_types.sh

#!/bin/bash
set -e

echo "🔍 Auditing API type safety..."

# 1. 检查 DTO 中是否存在 u64/i64 金额字段
# 金额字段名通常包含: amount, quantity, price, balance, volume
AMOUNT_PATTERNS="amount|quantity|price|balance|volume|size|qty"

if grep -rn "pub\s\+\(${AMOUNT_PATTERNS}\)\s*:\s*u64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found u64 amount field in API DTO"
    echo "   → Should use String type instead"
    exit 1
fi

if grep -rn "pub\s\+\(${AMOUNT_PATTERNS}\)\s*:\s*i64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found i64 amount field in API DTO"
    echo "   → Should use String type instead"
    exit 1
fi

# 2. 检查 Handler 中是否直接 parse 金额
if grep -rn "\.parse::<u64>\(\)" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found direct u64 parsing in gateway"
    echo "   → Should use SymbolManager.parse_qty() instead"
    exit 1
fi

# 3. 检查是否直接使用 format!() 格式化金额
if grep -rn 'format!\s*(\s*"{}"\s*,\s*\w*amount' --include="*.rs" src/gateway/; then
    echo "⚠️ WARNING: Possible direct amount formatting found"
    echo "   → Consider using SymbolManager.format_*() instead"
fi

# 4. 检查 Decimal 是否绕过 SymbolManager
if grep -rn "Decimal::from_str" --include="*.rs" src/gateway/ | grep -v "// safe:"; then
    echo "⚠️ WARNING: Direct Decimal parsing found in gateway"
    echo "   → Should use SymbolManager for conversions"
fi

echo "✅ API type safety audit passed!"

4.2 检查规则详解

| Check | Goal | Detection Pattern |
|---|---|---|
| DTO field types | Amount fields must be String | pub (amount/quantity/price/...): u64 in src/gateway/ |
| Direct parsing | No .parse::<u64>() in handlers | .parse::<u64>() in src/gateway/ |
| Direct formatting | No format!("{}", amount) | format!(...amount...) in src/gateway/ |
| Bypassing the conversion layer | No direct Decimal::from_str | Decimal::from_str in src/gateway/ |

4.3 CI 集成

GitHub Actions 配置

# .github/workflows/ci.yml
- name: Audit API Type Safety
  run: |
    chmod +x scripts/audit_api_types.sh
    ./scripts/audit_api_types.sh

本地 Pre-commit Hook

# .git/hooks/pre-commit
#!/bin/bash
./scripts/audit_api_types.sh || exit 1

4.4 豁免机制

对于确实需要绕过检查的特殊场景(如测试代码、内部工具),可以使用注释标记:

#![allow(unused)]
fn main() {
// safe: 这是测试代码,允许直接解析
let amount = "100".parse::<u64>().unwrap();
}

审计脚本应排除带有 // safe: 注释的行。


5. 实施路线图

阶段任务状态
Phase 1为核心订单 API 实现 ValidatedOrder Extractor⏳ 待实现
Phase 2为余额/资产 API 实现 FormattedBalanceResponse⏳ 待实现
Phase 3为所有金额相关 API 统一改造⏳ 待实现
Phase 4实现 audit_api_types.sh 并集成 CI⏳ 待实现
Phase 5添加 pre-commit hook 本地拦截📋 规划中

6. 参考

CI 常见坑与解决方案

本文档汇总 GitHub Actions CI 中遇到的典型问题及解决方案。


🚨 0. 关键警告:禁止使用 pkill -f

问题描述

在 Antigravity IDE 中执行 pkill -f "zero_x_infinity"导致 IDE 崩溃。 因为 IDE 的 language_server 进程参数中包含项目路径,会被 pkill -f 误杀。

正确做法

永远使用 PID 或精确匹配:

# ✅ 方法 1: 启动时记录 PID (推荐)
./target/release/zero_x_infinity --gateway &
GW_PID=$!
# ...
kill "$GW_PID"

# ✅ 方法 2: 精确匹配进程名
pkill "^zero_x_infinity$"

1. 服务容器 (Service Containers)

1.1 禁止使用 docker exec

问题描述

GitHub Actions 的 services: 是托管服务容器,不是本地 Docker 容器。

services:
  tdengine:
    image: tdengine/tdengine:latest
    ports:
      - 6041:6041

典型报错

docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"
# Error: No such container: tdengine

解决方案

使用 REST API 或网络协议连接,不用 docker exec

# ❌ 错误
docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"

# ✅ TDengine REST API
curl -sf -u root:taosdata -d "DROP DATABASE IF EXISTS trading" http://localhost:6041/rest/sql

# ✅ PostgreSQL psql
PGPASSWORD=trading123 psql -h localhost -U trading -d exchange_info_db -c "..."

1.2 服务连接必须用 localhost

# CI 中:
PG_HOST=localhost    # ✅ 正确
PG_HOST=postgres     # ❌ 只在 Docker Compose 中有效

2. 环境变量

2.1 测试脚本必须加载 db_env.sh

问题描述

测试脚本没有设置 DATABASE_URL 等环境变量,导致 PostgreSQL 连接超时。

典型报错

❌ Failed to connect to PostgreSQL: pool timed out while waiting for an open connection

解决方案

在脚本开头 source db_env.sh:

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/db_env.sh"

2.2 CI 环境检测

if [ -n "$CI" ]; then
    # CI 专用逻辑
else
    # 本地环境逻辑
fi

3. Workflow Step Conditions

3.1 The correct log-dump pattern

Problem: Misusing continue-on-error: true marks the Job as successful (green) even when the tests failed, hiding the error.

❌ Wrong

- name: Run Test
  run: ./test.sh
  continue-on-error: true  # test failures are silently ignored

- name: Dump Logs
  run: cat logs/*.log
  # Result: the Job turns green and the error is hidden!

✅ Correct: do not use continue-on-error. Use an if: failure() condition so the log-dump step runs only on failure.

- name: Run Test
  run: ./test.sh
  # Default behavior: on failure, all subsequent steps are skipped except those guarded by if: failure()

- name: Dump Logs
  if: failure()  # runs only if a previous step failed
  run: cat logs/*.log
  # Note: this step itself succeeds, but the Job status is still determined by Run Test (red)

3.2 Log file path consistency

Make sure the log path the script writes matches the path the workflow reads:

# In the script
nohup ./gateway > /tmp/gateway_fee_e2e.log 2>&1 &

# The workflow must match
cat /tmp/gateway_fee_e2e.log   # ✅ paths match
cat /tmp/gw_test.log           # ❌ paths differ

4. Database Initialization

4.1 PostgreSQL health check

Problem: the default health check uses the root user, but the database has no root role.

services:
  postgres:
    options: >-
      --health-cmd "pg_isready -U trading -d exchange_info_db"  # specify the user

4.2 TDengine precision

You must use PRECISION 'us':

CREATE DATABASE IF NOT EXISTS trading PRECISION 'us';

With the wrong precision, microsecond timestamps fail with "Timestamp data out of range".

4.3 Service settle time

- name: Initialize TDengine
  run: ./scripts/db/init.sh td && sleep 5  # wait for metadata initialization

5. Binaries & Startup

5.1 Binary freshness

Before testing locally, make sure the release binary is up to date:

cargo build --release

CI always does a fresh build, but local development may be running a stale binary.
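The freshness check can be automated. Below is a hedged sketch — the binary and source paths in the final call match this project, but the function itself is invented — that warns when any source file is newer than the release binary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn when a binary is older than any file in a source tree.
warn_if_stale() {
    local bin="$1" srcdir="$2"
    [ -e "$bin" ] || { echo "missing: $bin (run cargo build --release)"; return 1; }
    local newer
    # find -newer lists files modified after the binary; one hit means "stale"
    newer=$(find "$srcdir" -type f -newer "$bin" 2>/dev/null | head -n 1)
    if [ -n "$newer" ]; then
        echo "stale: $newer is newer than $bin"
        return 1
    fi
    echo "fresh"
}

warn_if_stale target/release/zero_x_infinity src || true
```

Calling this at the top of an E2E script turns the "stale binary trap" into a visible warning instead of a confusing test failure.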

5.2 Waiting for Gateway startup

GW_READY=0
for i in $(seq 1 60); do
    if curl -sf "http://localhost:8080/api/v1/health" > /dev/null 2>&1; then
        GW_READY=1
        break
    fi
    sleep 1
done
if [ "$GW_READY" -ne 1 ]; then
    echo "❌ Gateway did not become healthy within 60s" >&2
    exit 1
fi

Note: the health-check path is /api/v1/health, not /health.



6. Config & Port Parity

6.1 The 5433 vs 5432 port trap

  • Local (Dev): default port 5433 (config/dev.yaml).
  • CI: standard port 5432 (config/ci.yaml).
  • Solution: test scripts must detect CI=true and pass --env ci:
if [ "$CI" = "true" ]; then
    GATEWAY_ARGS="--gateway --env ci"
fi

6.2 Standardized script template

Reuse the standard template: scripts/templates/test_integration_template.sh
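For reference, the CI/local switch such a template presumably centralizes can be sketched as follows; the flag values come from section 6.1, while the function name and structure are illustrative:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the env-selection logic a shared test template can own.
pick_gateway_args() {
    if [ "${CI:-}" = "true" ]; then
        echo "--gateway --env ci"    # CI: PostgreSQL on 5432 via config/ci.yaml
    else
        echo "--gateway --env dev"   # local: PostgreSQL on 5433 via config/dev.yaml
    fi
}

GATEWAY_ARGS=$(pick_gateway_args)
echo "GATEWAY_ARGS=$GATEWAY_ARGS"
```

Keeping this decision in one place means individual test scripts can't drift out of sync with the 5432/5433 split.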


7. Python Environment Conventions (uv)

7.1 Never run Python bare

Running python3 directly in CI may fail to find dependencies.

7.2 Solution

Use uv run to manage dependencies explicitly; the HEREDOC pattern is recommended for environment isolation:

#!/bin/bash
# Unified entry point (wrapper script) example
export SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Declare dependencies explicitly with --with and forward all arguments via "$@"
uv run --with requests --with pynacl python3 - "$@" << 'EOF'
import sys
import os
# ... python code ...
EOF

8. Quick Reference

Scenario | Local | CI
TDengine operations | docker exec tdengine taos | curl localhost:6041/rest/sql
PostgreSQL connection | container name or localhost | localhost only
Environment variables | set manually or via .env | source db_env.sh
Log output | terminal | file + artifact upload
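A small wrapper can make the TDengine row above uniform across environments. This is a sketch using the credentials and port from this guide; the TD_DRY_RUN switch is invented here purely so the command construction can be checked without a live server:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around TDengine's REST API (credentials/port are the
# defaults used throughout this guide). TD_DRY_RUN prints instead of executing.
td_sql() {
    local sql="$1"
    local url="http://${TD_HOST:-localhost}:${TD_PORT:-6041}/rest/sql"
    if [ -n "${TD_DRY_RUN:-}" ]; then
        echo "curl -sf -u root:taosdata -d '$sql' $url"
        return 0
    fi
    curl -sf -u root:taosdata -d "$sql" "$url"
}

TD_DRY_RUN=1 td_sql "DROP DATABASE IF EXISTS trading"
```

Scripts then call td_sql everywhere and never need to know whether they are in CI or local.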

9. Race Conditions & Resource Cleanup

9.1 Port in use ("Address already in use")

Problem: When several test scripts run back-to-back in one Job (e.g. QA Suite + POC), the previous script may not have fully released its port, so the next script fails to start the Gateway.

Solution: Explicitly clean up old processes before starting the Gateway. In CI (not in a local IDE), pkill is acceptable:

# Ensure clean slate
echo "Cleaning up any existing Gateway processes..."
pkill -9 -f "zero_x_infinity" || true
sleep 2 # give the kernel time to release the port

Key point: -9 (SIGKILL) guarantees immediate termination and prevents zombie processes.


10. Error-Handling Conventions

10.1 Config-load panics

Forbidden:

File::open("config.yaml").unwrap(); // ❌ crashes with no useful log

Required: use anyhow::Result and attach context:

File::open("config.yaml").with_context(|| "Failed to open config")?; // ✅

10.2 Database unique constraints (duplicate key)

Problem: re-registering a user caused a 500 panic that dumped a stack trace into the logs, polluting diagnosis.

Solution: catch the "duplicate key" error, log it as a warning, and return 409 Conflict.

if err.to_string().contains("duplicate key") {
    tracing::warn!("User already exists: {}", err);
    return Err(StatusCode::CONFLICT);
}


11. Test Data & Environment Parity

11.1 Manual SQL injection vs API initialization

Problem

Local development typically relies on run_poc.sh (full API-driven verification), while CI may run the lower-level test_e2e.sh (fast SQL-injection verification). If the two diverge, tests pass locally but fail in CI.

Typical case

  • API deposit logic: handles unit scaling automatically.
  • Manual SQL injection: wrongly assumed the database stores scaled integers (10^6) and inserted 1000000 directly.
  • Result: the database actually held 1,000,000 USDT (not 1 USDT), breaking every subsequent balance check.

Solution

  1. Prefer API initialization: wherever possible, prepare data through APIs such as POST /api/v1/private/deposit, so business logic stays consistent.
  2. Double-check the schema: if SQL injection is unavoidable, consult migrations/schema.rs to confirm the column type (Decimal vs BigInt).
  3. Shared helpers: use a common Python/Bash library for data injection instead of re-implementing it, inconsistently, in every script.

Last updated: 2025-12-30


12. Bash Script Pitfalls

12.1 Arithmetic expansion silently exits the script

Problem

Under set -e, if an arithmetic command evaluates to 0, Bash treats it as "false" (exit status 1) and the script exits immediately.

Typical scenario

set -e
TOTAL_TESTS=0
# ...
((TOTAL_TESTS++)) # when TOTAL_TESTS is 0, the expression evaluates to 0, status 1 -> script exits!

Consequence

The CI job stops with no error output at all (a silent failure), which is extremely hard to diagnose.

Solution

Always use the standard POSIX arithmetic-expansion form, or make sure the arithmetic command cannot fail on its own:

# ✅ Recommended: standard form, exit status unaffected by the value
TOTAL_TESTS=$((TOTAL_TESTS + 1))

# ✅ Alternative: force success (less elegant)
((TOTAL_TESTS++)) || true
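The failure mode is easy to reproduce in a throwaway child shell; running the buggy pattern in a separate bash process keeps set -e active inside it while letting us observe the exit status from outside:

```shell
#!/usr/bin/env bash
# Reproduce the pitfall in a child bash so the silent exit is observable.
rc=0
out=$(bash -c 'set -e; n=0; ((n++)); echo unreachable') || rc=$?
echo "child exit=$rc output='${out}'"   # the echo inside the child never ran

# The recommended form never trips set -e, even when the counter starts at 0:
set -e
TOTAL_TESTS=0
TOTAL_TESTS=$((TOTAL_TESTS + 1))
echo "TOTAL_TESTS=$TOTAL_TESTS"
```

Note that wrapping the child in `|| rc=$?` is essential: invoking a function or subshell directly on the left of `||` would suppress set -e inside it and hide the bug.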

Pre-Merge to Main Checklist

All checklist items must be completed before merging into the main branch.


1. Code Quality ✓

  • cargo fmt --check passes
  • cargo test passes
  • Clippy (must use the same configuration as CI):
    cargo clippy -- -D warnings -A clippy::too_many_arguments -A clippy::collapsible_if -A clippy::unwrap_or_default -A clippy::doc_lazy_continuation -A clippy::manual_is_multiple_of -A clippy::implicit_saturating_sub -A clippy::redundant_pattern_matching -A clippy::derivable_impls
    

2. Documentation ✓

  • Relevant chapters under docs/src/*.md updated
  • docs/src/SUMMARY.md table of contents correct
  • mdbook build succeeds
  • README.md links the new chapter
  • agent-testing-notes.md read (to avoid common pitfalls)

3. CI/CD ✓

3.1 Local verification (mandatory)

  • ./scripts/test_ci.sh --quick passes
  • Run each CI test individually (critical! running everything locally can mask problems):
    CI=true ./scripts/test_ci.sh --test-gateway-e2e
    CI=true ./scripts/test_ci.sh --test-kline
    CI=true ./scripts/test_ci.sh --test-depth
    CI=true ./scripts/test_ci.sh --test-account
    

3.2 CI environment checks

  • No docker exec (CI service containers don't support it)
  • Database connections use localhost, not container names
  • All helper functions defined at global scope (not inside if blocks)

3.3 When CI fails

  1. Download the logs immediately: gh run view <run-id> --log-failed
  2. Search for errors: grep -i "error\|fail\|fatal" logs/*.txt
  3. Fix based on the logs; don't guess.

4. Git ✓

  • All changes committed
  • git status shows clean
  • Branch rebased/merged onto the latest main (no conflicts)

5. Release ✓

  • After merging, create a Git tag: git tag v{version}
  • Push the tag: git push origin --tags

Caution

⚠️ Never delete feature branches! Branches are an essential part of the project history and must be kept permanently.


Commands

# 1. Final checks
cargo check && cargo test && cargo clippy && cargo fmt --check

# 2. Build docs
cd docs && mdbook build && cd ..

# 3. Merge
git checkout main
git merge <feature-branch> --no-ff -m "Merge branch '<feature-branch>'"

# 4. Tag
git tag v0.10-a-account-system
git push origin main --tags

# 5. Done
echo "✅ Merge complete!"

Build & Verification Guide

This document summarizes the common pitfalls around "changes not taking effect" and "port conflicts" when developing the Gateway locally and running E2E tests.


1. Source Changes Not Taking Effect (The Stale Binary Trap)

When cargo build --release completes but tests still run the old logic:

Common causes

  • Stale fingerprint: Cargo wrongly believes the binary is up to date and skips recompiling or relinking.
  • Corrupted incremental cache: the cache under target/release/incremental keeps the old logic alive.
  • Timestamp resolution: the source edit time is too close to the previous build time (an APFS precision issue).

Solutions (lightest first)

  1. Most common (force relink):
    touch src/main.rs && cargo build --release
    
  2. Clear the incremental cache (not a full clean):
    rm -rf target/release/incremental
    
  3. Force a rebuild of the core crate:
    cargo clean -p zero_x_infinity && cargo build --release
    

2. Port Conflicts & Zombie Processes

Symptom

The Gateway fails to start with: ❌ FATAL: Failed to bind to 0.0.0.0:8080: Address already in use. This usually means a stale Gateway process is still running in the background.

Diagnosis & fix

  1. Kill the stale process (the safe way): do not use pkill -f (it can kill the IDE). Instead:
    # Find and kill whatever holds port 8080
    lsof -i :8080
    kill -9 <PID>
    
  2. Check for script conflicts: make sure you aren't running the Gateway manually in one terminal while test_transfer_e2e.sh runs in another.
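The lsof-then-kill sequence can be wrapped into a reusable helper that targets the port by PID rather than by command-line pattern. The function name and the FP_DRY_RUN switch (added so the helper can be exercised without killing anything) are invented:

```shell
#!/usr/bin/env bash
# Hedged helper: free a TCP port by PID, never via pkill -f.
free_port() {
    local port="$1" pids
    # lsof -ti prints only PIDs, one per line; it exits non-zero when none found
    pids=$(lsof -ti tcp:"$port" 2>/dev/null || true)
    if [ -z "$pids" ]; then
        echo "port $port already free"
        return 0
    fi
    echo "killing holders of :$port -> $pids"
    # shellcheck disable=SC2086  # word-splitting over the PID list is intended
    [ -n "${FP_DRY_RUN:-}" ] || kill -9 $pids
}

FP_DRY_RUN=1 free_port 8080
```

Because the match is by port, the IDE's language_server is never at risk, unlike the pkill -f pattern warned against above.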

3. E2E Testing Best Practices

Confirm binary freshness

Before running tests, compare timestamps manually or watch for the E2E script's warning:

ls -lh src/funding/service.rs target/release/zero_x_infinity

Database consistency

If the logic looks right but the API reports Missing column:

  1. Confirm the PostgreSQL migrations were applied manually (init.sh may have skipped them because old data already existed).
  2. Confirm that account_type and status in balances_tb are SMALLINT, which must map to i16 in Rust.

Standard Gateway flags

When debugging manually, always pass the environment flag:

./target/release/zero_x_infinity --gateway --env dev

Last updated: 2025-12-24

Database Selection: TDengine vs Others


Scenario: Settlement Persistence - Storing orders, trades, and balances.


📊 Comparison

Candidates

Database | Type | Use Case
TDengine | Time-Series | IoT, financial data, high-frequency writes
PostgreSQL | Relational | General OLTP
TimescaleDB | PG Extension | Time-series (PG-based)
ClickHouse | Columnar | OLAP, analytics

🎯 Why TDengine?

1. Performance (Based on TSBS)

Metric | TDengine vs TimescaleDB | TDengine vs PostgreSQL
Write Speed | 1.5-6.7x faster | 10x+ faster
Query Speed | 1.2-24.6x faster | 10x+ faster
Storage | 1/12 - 1/27 the space | Huge saving

2. Matching Exchange Requirements

Requirement | TDengine Solution
High-frequency writes | Million/sec write capacity
Timestamp index | Native time-series design
High cardinality | Hundreds of millions of data points, Super Tables
Real-time stream | Built-in stream computing
Data subscription | Kafka-like real-time push
Auto partitioning | Auto-sharding by time

3. Simplified Architecture

TDengine Solution:
┌─────────────────────────────────────────────┐
│                  TDengine                    │
│      Persistence + Stream + Subscription     │
└─────────────────────────────────────────────┘

Fewer Components = Lower Ops Complexity + Lower Latency

4. Rust Ecosystem

  • ✅ Official Rust Client taos
  • ✅ Async (tokio)
  • ✅ Connection Pool (r2d2)
  • ✅ WebSocket (Cloud friendly)

❌ Why Not Others?

PostgreSQL

  • ❌ Poor time-series performance.
  • ❌ High-frequency write bottleneck.
  • ❌ Large storage consumption.

TimescaleDB

  • ⚠️ Slower than TDengine.
  • ⚠️ Much larger storage footprint.

ClickHouse

  • ✅ Fast analytics.
  • ❌ Real-time row-by-row write is weak (prefers batch).
  • ❌ High Ops complexity.

📋 Data Model

TDengine Super Table

-- Orders Super Table
CREATE STABLE orders (
    ts TIMESTAMP,           -- PK
    order_id BIGINT,
    user_id BIGINT,
    side TINYINT,
    order_type TINYINT,
    price BIGINT,
    qty BIGINT,
    filled_qty BIGINT,
    status TINYINT
) TAGS (
    symbol_id INT           -- Partition Tag
);

-- Trades
CREATE STABLE trades (...) TAGS (symbol_id INT);

-- Balances
CREATE STABLE balances (...) TAGS (user_id BIGINT, asset_id INT);

Advantages

  • ✅ Auto-partition by TAG.
  • ✅ Auto-aggregation query.
  • ✅ Unified Schema.

🏗️ Architecture Integration

Gateway -> Order Queue -> Trading Core -> Events -> TDengine

✅ Final Recommendation

Primary Storage: TDengine

  • Orders, Trades, Balances History.
  • High performance write/read.

📊 Expected Performance

  • Write Latency: < 1ms
  • Query Latency: < 5ms
  • Storage Compression: 10:1
  • Supported TPS: 100,000+




ADR-001: WebSocket Security - Strict Auth Enforcement

Status: Accepted
Date: 2025-12-27
Author: QA / Security Remediation Agent
Context: Phase 0x10.5 Backend Gaps

Context

During the QA Audit of Phase 0x10.5, a critical security vulnerability (Identity Spoofing) was identified in the WebSocket Gateway. The implementation allowed clients to assert any user_id via query parameter (ws://...?user_id=123) without cryptographic verification (Token/Signature).

Decision

To immediately mitigate this P0 vulnerability while preserving functionality for the “Public Market Data” milestone:

  1. Strict Anonymous Mode: The Gateway MUST reject any connection attempt where user_id is provided and is NOT 0 (Anonymous).
  2. HTTP 401: Rejection must return 401 Unauthorized.
  3. Future Auth: Authenticated access (for Private Channels) is deferred to the Authentication Phase (0x0A-b). Until then, NO private user connections are allowed.
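The rule is small enough to state as a toy gate. Shell stands in for the Gateway's actual WebSocket handler here, and the status strings are illustrative of the HTTP responses the decision mandates:

```shell
#!/usr/bin/env bash
# Toy model of ADR-001's rule: only user_id = 0 (anonymous) may connect.
ws_auth_gate() {
    local user_id="${1:-0}"
    if [ "$user_id" != "0" ]; then
        echo "401 Unauthorized"       # any asserted identity is rejected
        return 1
    fi
    echo "101 Switching Protocols"    # anonymous handshake proceeds
}

ws_auth_gate 0            # anonymous -> accepted
ws_auth_gate 123 || true  # spoofed identity -> rejected
```

Until the Authentication Phase lands, every non-zero user_id takes the 401 branch, which is exactly what scripts/test_qa_adversarial.py verifies.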

Consequences

  • Positive: Eliminates identity spoofing risk. System is secure for public data consumption.
  • Negative: Private channel testing (e.g., private.order) is temporarily blocked until proper Auth is implemented.

Verification

  • scripts/test_qa_adversarial.py was created to verify this constraint.

ADR-005: Unified Chain-Asset Schema & Admin Integration

Date: 2025-12-30
Status: Accepted
Supersedes: ADR-004 (Partial), Design Doc 0x11-c (Draft)
Context: Reconciling the conflict between Admin's Logical Assets (assets_tb) and Sentinel's Physical Chains (chain_assets).

1. Problem Statement

The system currently has ambiguity regarding where “Asset Definition” lives:

  1. Admin (assets_tb): Defines “USDT” (Logical), Symbol, Decimal (Internal).
  2. Sentinel (chain_assets): Needs “USDT” on ETH (Physical), Contract, Decimal (Chain).
  3. Conflict: Potential data duplication (redundancy) and unclear ownership.

2. Architectural Decision: Layered Asset Model

We explicitly separate the domain model into two strictly defined layers.

Layer 1: Logical Asset (Master) -> assets_tb

  • Owner: Admin Dashboard (Existing).
  • Scope: Business logic, User Balances, Trading Pairs.
  • Key Fields:
    • asset_id (PK)
    • asset (Unique Identifier, e.g., “USDT”)
    • decimals (System Precision, e.g., 8)
    • status (Global Switch)

Layer 2: Physical Binding (Extension) -> chain_assets_tb

  • Owner: Operations (via Admin Extension).
  • Scope: Blockchain adapters, Deposit/Withdrawal addresses, Sentinel config.
  • Key Fields:
    • chain_slug (FK to chains_tb)
    • asset_id (FK to assets_tb)
    • contract_address (Physical ID)
    • decimals (Physical Precision)
  • Constraint: No re-definition of Logical fields (Symbol, Name).

3. Schema Specification (Finalized)

-- 1. Chains (Infrastructure)
CREATE TABLE chains_tb (
    chain_slug VARCHAR(32) PRIMARY KEY,  -- 'ETH', 'BTC' (Renamed from chain_id)
    chain_name VARCHAR(64) NOT NULL,
    network_id VARCHAR(32),              -- '1', 'regtest'
    rpc_urls TEXT[] NOT NULL,
    confirmation_blocks INT DEFAULT 1,
    is_active BOOLEAN DEFAULT TRUE
);

-- 2. Chain Assets (Physical Extension)
CREATE TABLE chain_assets_tb (
    id SERIAL PRIMARY KEY,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
    
    -- Physical Properties Only
    contract_address VARCHAR(128),  -- Mutually Exclusive Unique ID per chain
    decimals SMALLINT NOT NULL,     -- The mapping factor (Chain -> System)
    
    -- Chain-Specific Overrides
    min_deposit DECIMAL(30, 8) DEFAULT 0,
    min_withdraw DECIMAL(30, 8) DEFAULT 0,
    withdraw_fee DECIMAL(30, 8) DEFAULT 0,
    is_active BOOLEAN DEFAULT FALSE, -- Safety: Inactive by default until verified
    
    -- Constraints
    UNIQUE(chain_slug, asset_id),        -- 1 Asset per Chain (for now, can look into bridging later)
    UNIQUE(chain_slug, contract_address) -- 1 Contract = 1 Asset
);

4. Admin Integration Scope

Admin code currently does not support Layer 2. To strictly follow this architecture, Admin must be updated in a future iteration:

  1. New Model: ChainAsset mapping to chain_assets_tb.
  2. New View: “Chain Configurations” tab under Asset details.
  3. Logic: When viewing “USDT”, allow adding “ETH Binding” (Contract: 0x…, Decimals: 6).

5. Migration Strategy (Immediate)

For Phase 0x11-b (Sentinel Hardening), we implement the Schema and Manual Seeding (Migration 012). Admin UI updates are deferred, but the Schema is future-proofed to support them perfectly.

ADR-006: User Address Decoupling for Account-Based Chains

Date: 2025-12-30
Status: Accepted
Context: Replaces the user_addresses definition in migrations/010_deposit_withdraw.sql to enable "Hot Listing".

1. Problem Statement

The current schema for user addresses matches Assets, not Chains:

-- OLD (Flawed)
PRIMARY KEY (user_id, asset, chain_slug)

The Loophole:

  1. User A has ETH Address 0x123. DB Record: (UserA, 'ETH', 'ETH', '0x123').
  2. Ops lists UNI (ERC20).
  3. User A deposits UNI to 0x123.
  4. Sentinel parses Transfer(0x123, val).
  5. Sentinel looks up: SELECT user_id FROM user_addresses WHERE address='0x123' AND asset='UNI'.
  6. Result: NULL. Deposit Ignored.

Impact: Users must manually “Generate UNI Address” (redundant action) before depositing, or funds are lost/stuck. This breaks the “Ops List -> Immediate Deposit” workflow.

2. Solution: Chain-Centric Address Model

We must recognize that for Account-Based Chains (ETH, TRON, BSC, SOL), an address belongs to the Chain Account, not the individual Asset.

2.1 Schema Change

We split the concept into “Account Bindings”.

-- migration/012_user_chain_addresses.sql

CREATE TABLE user_chain_addresses (
    user_id BIGINT NOT NULL,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    address VARCHAR(255) NOT NULL,
    
    -- Metadata
    memo_tag VARCHAR(64), -- For XRP/EOS destination tags
    created_at TIMESTAMPTZ DEFAULT NOW(),
    
    -- Constraint: One address per user per chain (simplified model)
    -- Or Multiple? For now, 1:1 is sufficient for MVP.
    PRIMARY KEY (user_id, chain_slug),
    UNIQUE (chain_slug, address) -- Reverse lookup must be unique
);

2.2 Sentinel Lookup Logic

When EthScanner detects a Transfer(to, value, contract):

  1. Identify Asset: Match contract -> asset_id (via chain_assets_tb).
  2. Identify User: Match to -> user_id (via user_chain_addresses WHERE chain_slug='ETH').
  3. Insert Deposit: deposit_history(user_id, asset_id, amount).

Outcome: The asset_id comes from the Contract, the user_id comes from the Address. They are decoupled.
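The three lookup steps can be simulated with two maps. The associative arrays below stand in for chain_assets_tb (contract -> asset) and user_chain_addresses (address -> user); every identifier and value is invented for illustration:

```shell
#!/usr/bin/env bash
# Toy simulation of the decoupled Sentinel lookup (requires bash 4+).
declare -A contract_to_asset=( ["0xUNI"]="UNI" )
declare -A address_to_user=( ["0x123"]="UserA" )

resolve_deposit() {
    local contract="$1" to="$2" value="$3"
    local asset="${contract_to_asset[$contract]:-}"   # asset comes from the contract
    local user="${address_to_user[$to]:-}"            # user comes from the address
    if [ -z "$asset" ] || [ -z "$user" ]; then
        echo "IGNORED"    # unlisted contract or unknown receiver
        return 0
    fi
    echo "DEPOSIT user=$user asset=$asset amount=$value"
}

resolve_deposit "0xUNI" "0x123" "50"   # known contract + known address
resolve_deposit "0xUNK" "0x123" "50"   # unlisted contract -> ignored
```

Listing a new ERC20 only adds an entry to the first map; existing addresses in the second map start receiving it immediately, which is the "Hot Listing" property this ADR is after.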

3. Handling UTXO (BTC)

BTC addresses are generally single-use or per-intent. For an exchange deposit model, however, we typically generate one deposit address per user per chain (or rotate them), so BTC can be treated the same way: "the user's BTC deposit address". Asset-specific addresses on one chain (e.g. OMNI USDT vs native BTC) are a legacy edge case we ignore for the MVP rather than modeling as chain_slug variants.

Decision: the user_chain_addresses(user_id, chain_slug) schema works for BTC too, under the assumption of one checkable address per user (or a list of addresses). Refinement: physically, (chain_slug, address) is the real identity — a user maps to an address, and an address maps back to exactly one user.

4. Operational Workflow (Final)

  1. Listing: Ops lists UNI (Contract 0x...) -> chain_assets_tb.
  2. Sentinel: Refreshes map 0x... -> UNI.
  3. User: Sends UNI to their existing ETH address.
  4. Sentinel:
    • Sees 0x... -> Knows it’s UNI.
    • Sees Receiver -> Knows it’s User A.
    • Success.

5. Status

Accepted

AR-001: Architecture Request - WebSocket Authentication

Status: REQUESTED
Date: 2025-12-27
Requester: QA / Remediation Agent
Driver: Identity Spoofing Remediation

Problem Statement

The current WebSocket implementation relies on a “Strict Anonymous Mode” (ADR-001) which rejects any user_id != 0. While this mitigates immediate identity spoofing, it creates a functional gap: Authentic users cannot verify their identity or access private channels.

The user explicitly rejected ADR-001 as a complete solution ("security is not fixed ... require forthar design"), necessitating a robust authentication design.

Requirements

The Architect must provide a design (e.g., ADR-002) that:

  1. Authentication mechanism: Defines how a WebSocket client proves its identity (e.g., JWT in Query Param vs Header vs Handshake Message).
  2. Integration: How this integrates with src/api_auth/ (Ed25519) or standard Session Management.
  3. State Management: How ConnectionManager stores and validates the authenticated session.
  4. Migration: Specific steps to replace the temporary “Strict Anonymous Mode” in handler.rs with the new mechanism.

Constraints

  • Low Latency: Auth check must not significantly delay connection establishment.
  • Backwards Compatibility: Must support Anonymous public trade streams simultaneously.