0x00 Project Roadmap
Vision: Build a production-grade cryptocurrency exchange from Hello World to Microsecond Latency. Current Status: Phase V (Extreme Optimization) - Order Commands parity complete.
📊 Progress Overview
This project documents the complete journey of building a 1.3M orders/sec matching engine. Below is the current status of each phase.
✅ Phase I: Core Matching Engine
Status: Complete
| Chapter | Title | Description |
|---|---|---|
| 0x01 | Genesis | Basic OrderBook with Vec<Order> |
| 0x02 | Float Curse | Why floats fail → u64 refactoring |
| 0x03 | Decimal World | Precision configuration system |
| 0x04 | BTree OrderBook | BTreeMap-based order book |
| 0x05 | User Balance | Account & balance management |
| 0x06 | Enforced Balance | Type-safe fund locking |
| 0x07 | Testing Framework | 1M order batch testing |
| 0x08 | Trading Pipeline | LMAX-style Ring Buffer architecture |
| 0x09 | Gateway & Persistence | HTTP API, TDengine, WebSocket, K-Line |
✅ Phase II: Productization
Status: Complete
| Chapter | Title | Description |
|---|---|---|
| 0x0A | Account System | PostgreSQL user management |
| 0x0A-b | ID Specification | Identity addressing rules |
| 0x0A-c | API Authentication | Ed25519 cryptographic auth |
| 0x0B | Funding & Transfer | Internal transfer architecture |
| 0x0C | Trade Fee | Maker/Taker fees + VIP discount |
🔶 Phase III: Resilience & Funding
Status: Complete
| Chapter | Title | Description | Status |
|---|---|---|---|
| 0x0D | Snapshot & Recovery | State snapshot, crash recovery | ✅ Done |
| 0x0E | OpenAPI Integration | Swagger UI, SDK generation | ✅ Done |
| 0x0F | Admin Dashboard | Ops Panel, KYC, hot-reload | ✅ Done |
| 0x11 | Deposit & Withdraw | Mock Chain integration, Idempotency | ✅ Done |
| 0x11-a | Real Chain Integration | Sentinel Service (Pull Model) | ✅ MVP Done |
| 0x11-b | Sentinel Hardening | SegWit Fix (DEF-002) & ETH/ERC20 & ADR-005/006 | ✅ Done |
🔶 Phase IV: Trading Integration & Verification
Status: Pending Verification
Context: The Core Engine and Trading APIs are implemented but currently tested with Mocks. This phase bridges the gap between the Real Chain (0x11) and the Matching Engine (0x01).
| Chapter | Title | Description | Status |
|---|---|---|---|
| 0x12 | Real Trading Verification | End-to-End: Bitcoind -> Sentinel -> Order -> Trade | 🟡 Code Ready (Needs Real-Chain Test) |
| 0x13 | Market Data Experience | WebSocket Verification (Ticker, Trade, Depth) | 🟡 Code Ready (Needs E2E Test) |
⏳ Phase V: Extreme Optimization (Metal Mode)
Status: In Progress
Codename: “Metal Mode”. Goal: Push Rust to the physical limits of the hardware.
| Chapter | Title | Description |
|---|---|---|
| 0x14 | Extreme Optimization | Architecture Manifesto |
| 0x14-a | Benchmark Harness | ✅ 100% Bit-exact Parity (FILL) |
| 0x14-b | Order Commands | ✅ IOC, Move, Reduce (Feature Parity) |
| 0x15 | Zero-Copy | Planned |
| 0x16 | CPU Affinity | Planned |
| 0x17 | SIMD Matching | Planned |
🏆 Key Milestones
| Git Tag | Phase | Highlights |
|---|---|---|
| v0.09-f-integration-test | 0x09 | 1.3M orders/sec baseline achieved |
| v0.10-a-account-system | 0x0A | PostgreSQL account integration |
| v0.10-b-api-auth | 0x0A | Ed25519 authentication |
| v0.0C-trade-fee | 0x0C | Maker/Taker fee system |
| v0.0D-persistence | 0x0D | Universal WAL & Snapshot persistence |
| v0.0F-admin-dashboard | 0x0F | Admin Operations Dashboard |
| v0.11-a-funding-qa | 0x11-a | Real Chain Sentinel MVP (Deposit/Withdraw) |
| v0.11-b-sentinel-hardening | 0x11-b | DEF-002 Fix, ADR-005/006, Hot Listing |
| v0.14-b-order-commands | 0x14-b | ✅ IOC, Move, Reduce (Bit-exact Parity) |
🎯 What You’ll Learn
- Financial Precision - Why `f64` fails and how to use fixed-point `u64`
- High-Performance Data Structures - BTreeMap for O(log n) order matching
- Lock-Free Concurrency - LMAX Disruptor-style Ring Buffer
- Event Sourcing - WAL-based deterministic state reconstruction
- Real-World Blockchain Integration - Handling Re-orgs, Confirmations, and UTXO management
- Production Security - Watch-only wallets & Ed25519 authentication
Last Updated: 2025-12-31
0x01 Genesis: Basic Engine
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
This is the first version of 0xInfinity. In this stage, we have built a minimal prototype of a Central Limit Order Book (CLOB). Our goal is to intuitively demonstrate real-world trading logic using standard data structures to manage orders.
1. Visualizing the Orderbook
An Orderbook is essentially a list of orders arranged by price. We place Sells (Asks) at the top and Buys (Bids) at the bottom. The gap in the middle is called the “Spread”.
We maintain two lists in memory:
- Sells: Sorted by price Low to High (Buyers want the cheapest price).
- Buys: Sorted by price High to Low (Sellers want the most expensive price).
===========================================================
ORDER BOOK SNAPSHOT
===========================================================
Side | Price (f64) | Qty | Orders (FIFO)
-----------------------------------------------------------
SELL | 102.00 | 5.0 | [Order #2]
SELL | 101.00 | 5.0 | [Order #3] ^
| Best Ask (Lowest)
-----------------------------------------------------------
$$$ MARKET SPREAD $$$
-----------------------------------------------------------
| Best Bid (Highest)
BUY | 100.00 | 10.0 | [Order #1] v
BUY | 99.00 | 10.0 | [Order #5]
===========================================================
2. Program Output
After executing cargo run, we can observe the actual output of the engine:
--- 0xInfinity: Stage 1 (Genesis) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
🇨🇳 Chinese
📦 Code Changes: View Diff
This is the first version of 0xInfinity. In this stage, we built a minimal prototype of a **Central Limit Order Book (CLOB)**. Our goal is to intuitively demonstrate real-world trading logic, using standard data structures to manage orders.
1. Visualizing the Orderbook
An orderbook is essentially a list of orders arranged by price. We place **Sells** at the top and **Buys** at the bottom. The gap in the middle is called the “Spread”.
We maintain two lists in memory:
- Sells: Sorted by price Low to High (buyers want the cheapest price).
- Buys: Sorted by price High to Low (sellers want the most expensive price).
===========================================================
ORDER BOOK SNAPSHOT
===========================================================
Side | Price (f64) | Qty | Orders (FIFO)
-----------------------------------------------------------
SELL | 102.00 | 5.0 | [Order #2]
SELL | 101.00 | 5.0 | [Order #3] ^
| Best Ask (Lowest)
-----------------------------------------------------------
$$$ MARKET SPREAD $$$
-----------------------------------------------------------
| Best Bid (Highest)
BUY | 100.00 | 10.0 | [Order #1] v
BUY | 99.00 | 10.0 | [Order #5]
===========================================================
2. Program Output
After executing cargo run, we can see the engine's actual output:
--- 0xInfinity: Stage 1 (Genesis) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
0x02: The Curse of Float
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
1. The Rookie Mistake
Experienced developers might have noticed that the price type was f64. This is problematic. In models.rs, we had this line:
#![allow(unused)]
fn main() {
pub price: f64, // The root of all evil
}
In most general-purpose applications where absolute precision is not critical, using floating-point numbers is fine. If single precision isn’t enough, double precision usually suffices. However, in the financial domain, storing monetary values as floats is considered an engineering disaster.
If you use floats to store money, it is impossible to maintain a 100% accurate ledger over time. Even with frequent reconciliation, you often end up accepting a “close enough” result.
Moreover, using floats introduces accumulation errors. Over millions of transactions, these tiny errors add up. While various rounding modes can mitigate this if done correctly, the root cause remains.
The biggest issue isn’t just the error itself (which might be acceptable within a tolerance), but the fact that you cannot fundamentally verify the correctness of the settlement, potentially hiding real bugs.
2. The Precision Trap
Run this incredibly simple code (you can run it in this project via cargo run --example the_curse_of_float):
fn main() {
let a: f64 = 0.1;
let b: f64 = 0.2;
let sum = a + b;
// You expect this to pass, right?
if sum == 0.3 {
println!("Math works!");
} else {
println!("PANIC: Math is broken! Sum is {:.20}", sum);
}
}
The output might surprise you:
PANIC: Math is broken! Sum is 0.30000000000000004441
See that extra 0.00000000000000004441? What is that? Why does it happen?
The main issue isn’t just about floating-point precision being “insufficient,” but that computers simply cannot precisely represent certain numbers.
Computers use binary, while humans use decimal. Just as 1/3 = 0.3333... repeats infinitely in decimal, 0.1 is a repeating fraction in binary that cannot be represented exactly.
In a matching engine, if an Ask in your OrderBook is 0.3 and a user’s Bid is computed as 0.1 + 0.2, these two orders—which inherently should match—will never match due to floating-point errors.
3. Why Blockchain Hates Floats
If you’ve worked with Ethereum smart contracts, you know there are no floating-point numbers in Solidity. Many people wonder why.
There is only one reason: Blockchain cores require 100% deterministic outputs for the same input. Regardless of time, location, hardware, OS, or CPU architecture, running the same code must yield exactly the same result. Only with absolute consistency—down to the last bit—can we ensure that everyone shares the same ledger and the same “consensus.”
Specifically, while floating-point calculations follow the IEEE 754 standard, edge cases can cause minute differences across CPUs:
Node A (Intel) Result: 100.00000000000001
Node B (ARM) Result: 100.00000000000000
Once this happens, the storage Hash differs, consensus breaks, and the chain forks.
4. The Decimal Temptation
When people realize the issue with f64, they often look for a precise decimal type, such as rust_decimal.
However, even with Decimal, different hardware, programming languages, or even compiler versions can lead to subtle differences. Achieving the 100% determinism required by blockchain is difficult.
The only thing that guarantees 100% determinism is Integer arithmetic. If integer calculations are inconsistent, it is 100% a bug.
Problems with Decimal:
- Software Emulation: Decimal is a software struct, not a hardware primitive.
- Implementation Dependency: Consistency depends on the library implementation.
- “Dialects”: If your backend uses Rust (`rust_decimal`), your risk engine uses Python (`decimal`), and your frontend uses JS (`BigInt`), subtle differences in “Rounding Mode” or “Overflow Handling” can lead to ledger discrepancies over time.
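This “dialect” drift is reproducible without any Decimal library at all: languages already disagree on how to round ties. A small Rust sketch (the Python behavior is quoted in a comment for contrast, not executed here):

```rust
fn main() {
    // Rust's f64::round rounds ties away from zero: 2.5 -> 3.0.
    assert_eq!(2.5f64.round(), 3.0);
    // Python's built-in round() uses banker's rounding: round(2.5) == 2.
    // Two services settling the same half-tick therefore disagree by one unit.

    // With pre-scaled integers there is nothing left to round:
    let a: u64 = 25; // 2.5 scaled by 10
    let b: u64 = 25;
    assert_eq!(a + b, 50); // identical on every CPU, language, and library
    println!("tie-rounding dialects demonstrated");
}
```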
5. Need for Speed: f64 vs u64
Besides determinism, another core reason we avoid Decimal is Performance.
u64 (Native Integer):
- When executing `a + b`, the CPU has a dedicated ALU circuit for 64-bit integer addition.
- It completes in as little as 1 clock cycle.
Decimal (Software Struct):
- When executing addition, the CPU runs a complex piece of code: checking Scale, aligning decimals, handling overflow, and finally calculating.
- This takes hundreds to thousands of times more instruction cycles.
In most apps, CPU cycles are abundant, so this doesn’t matter. But we are writing an HFT (High-Frequency Trading) engine where every nanosecond counts.
Cache Efficiency:
- `u64` takes 8 bytes; `Decimal` typically takes 16 bytes (128-bit).
- Using `u64` means your CPU cache can store twice as much price data, effectively doubling your throughput.
We will discuss Cache mechanics in detail later.
Summary
Two reasons to ban floating-point numbers:
- No 100% Determinism — Fails to meet blockchain consensus and precise reconciliation requirements.
- Performance Issues — For HFT engines, Integer is the only choice.
Refactoring Results
We have refactored all f64 fields in models.rs to u64:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Use Integer for Price
pub qty: u64, // Use Integer for Quantity
pub side: Side,
}
}
Output after cargo run:
--- 0xInfinity: Stage 2 (Integer) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
Now all price comparisons are precise integer comparisons, free from floating-point errors.
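The refactoring can be sanity-checked with a minimal sketch contrasting the two representations (the 10^8 scale factor is chosen here for illustration):

```rust
fn main() {
    // Float orders that should cross, but never match:
    let ask: f64 = 0.3;
    let bid: f64 = 0.1 + 0.2;
    assert!(bid != ask); // stuck in the book forever

    // The same prices scaled to integers (8 decimals, satoshi-style):
    const SCALE: u64 = 100_000_000;
    let ask_int: u64 = 3 * SCALE / 10;              // 0.3
    let bid_int: u64 = SCALE / 10 + 2 * SCALE / 10; // 0.1 + 0.2
    assert_eq!(bid_int, ask_int); // exact match, trade executes
    println!("integer prices match; float prices do not");
}
```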
🇨🇳 Chinese
📦 Code Changes: View Diff
1. The Rookie Mistake
Experienced developers will immediately see that the type of price is f64, which is a problem, because we have this line in models.rs:
#![allow(unused)]
fn main() {
pub price: f64, // The root of all evil
}
In most scenarios that do not require absolutely precise results, floating-point numbers are fine; if single precision isn't enough, double precision usually is. But in finance, storing monetary amounts as floats is an engineering disaster.
With floats, the ledger cannot stay perfectly accurate for any length of time. Even with frequent reconciliation checks, you can only accept a “close enough” result in the end.
Storing money as floats also introduces accumulation errors: after years of trading, these tiny errors keep growing. Choosing the right rounding modes, done correctly, can reduce the accumulated error.
If a bounded accumulated error is acceptable, the error itself is usually not the problem. The biggest problem is: if you cannot fundamentally verify the correctness of settlement, real bugs may stay hidden.
2. The Precision Trap
Run this extremely simple piece of code (in this project: cargo run --example the_curse_of_float):
fn main() {
let a: f64 = 0.1;
let b: f64 = 0.2;
let sum = a + b;
// You expect this to pass, right?
if sum == 0.3 {
println!("Math works!");
} else {
println!("PANIC: Math is broken! Sum is {:.20}", sum);
}
}
The output is surprising:
PANIC: Math is broken! Sum is 0.30000000000000004441
See that extra 0.00000000000000004441? What is that? Why does this happen?
The main issue is not merely whether floating-point precision is sufficient, but that computers fundamentally cannot represent certain numbers exactly.
Computers are binary, while humans normally use decimal. Just as 1/3 = 0.3333... never terminates in decimal, 0.1 is a number that can never be expressed exactly in binary.
In a matching engine, if the Ask in your OrderBook is 0.3 and a user's Bid is computed as 0.1 + 0.2, these two orders—which should have matched—will never match because of floating-point error.
3. Zero Tolerance on the Blockchain (Why Blockchain Hates Floats)
If you have looked at Ethereum's smart contract language, you know contracts contain no floating-point numbers at all. Many people don't know why.
There is only one reason: the core of a blockchain requires that the same input produces a 100% deterministic output. No matter when or where, the same code must run on different hardware, different operating systems, and different CPU architectures and produce exactly the same result. Only with complete consistency—not a single bit of error—can we be sure that everyone in the world shares the same ledger and the same “Bitcoin”.
Specifically, floating-point arithmetic follows the IEEE 754 standard, but in extreme edge cases, different CPUs may handle floats with minuscule differences:
Node A (Intel) result: 100.00000000000001
Node B (ARM) result: 100.00000000000000
Once that happens, the hashes differ, consensus breaks, and the chain forks.
4. The Decimal Temptation
When people realize the problem with f64, they look for a precise decimal type, such as rust_decimal.
But even Decimal can differ subtly across hardware, programming languages, and even different versions and compiler implementations of the same language; it cannot achieve the 100% determinism blockchain requires.
Only integers can achieve 100% determinism. If pure integer calculations are still inconsistent, you can be 100% certain there is a bug.
Problems with Decimal:
- Decimal is emulated in software.
- Decimal's consistency depends on the library implementation.
- If your backend uses Rust (`rust_decimal`), your risk engine uses Python (`decimal`), and your frontend uses JS (`BigInt`), different libraries may have different “dialects” of rounding mode and overflow handling.
- These tiny differences cause the books to stop balancing over time.
5. Need for Speed: f64 vs u64
Besides 100% determinism, the other core reason we do not use Decimal is performance.
u64 (Native Integer):
- When you execute `a + b`, the CPU has a dedicated ALU circuit that handles 64-bit integer addition directly.
- At its fastest, it completes in 1 clock cycle.
Decimal (Software Struct):
- When you execute an addition, the CPU is actually running a complex piece of code: checking the Scale, aligning the decimals, handling overflow, and finally computing.
- This takes hundreds or even thousands of times more instruction cycles.
In most cases CPU cycles are abundant, so ordinary applications need not worry much; most modern CPUs also have floating-point units, so floats are fast too. But we are writing an HFT engine, where every nanosecond counts.
There is also Cache Efficiency:
- `u64` takes 8 bytes; `Decimal` typically takes 16 bytes (128-bit).
- Using `u64` means your CPU cache can hold twice as much price data, which directly doubles throughput.
We will discuss Cache mechanics in detail later.
Summary
Two reasons floating-point numbers are banned:
- No guarantee of 100% determinism — cannot satisfy blockchain consensus and precise reconciliation requirements.
- Decimal has performance problems — for an HFT engine, integers are the only choice.
Refactoring Results
We have refactored all f64 fields in models.rs to u64:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Price as an integer
pub qty: u64, // Quantity as an integer
pub side: Side,
}
}
Output after running cargo run:
--- 0xInfinity: Stage 2 (Integer) ---
[1] Makers coming in...
[2] Taker eats liquidity...
MATCH: Buy 4 eats Sell 1 @ Price 100 (Qty: 10)
MATCH: Buy 4 eats Sell 3 @ Price 101 (Qty: 2)
[3] More makers...
--- End of Simulation ---
Now all price comparisons are exact integer comparisons, with no more floating-point error.
0x03: Decimal World
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapter, we refactored all f64 to u64, solving the floating-point precision issues. But this introduced a new problem: Clients use decimals, while we use integers internally. How do we convert between them?
1. The Decimal Conversion Problem
When a user places an order, the input price might be "100.50" and quantity "10.5". However, our engine uses u64 integers:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Integer representation
pub qty: u64, // Integer representation
pub side: Side,
}
}
Core Question: How to perform lossless conversion between decimal strings and u64?
The answer is the Fixed Decimal scheme:
#![allow(unused)]
fn main() {
/// Convert decimal string to u64
/// e.g., "100.50" with 2 decimals -> 10050
fn parse_decimal(s: &str, decimals: u32) -> u64 {
let multiplier = 10u64.pow(decimals);
// ... Parsing Logic
}
/// Convert u64 back to decimal string for display
/// e.g., 10050 with 2 decimals -> "100.50"
fn format_decimal(value: u64, decimals: u32) -> String {
let multiplier = 10u64.pow(decimals);
let int_part = value / multiplier;
let dec_part = value % multiplier;
format!("{}.{:0>width$}", int_part, dec_part, width = decimals as usize)
}
}
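To make the round-trip concrete, here is a self-contained sketch; the `parse_decimal` body below is a simplified stand-in (no validation or overflow handling), not the project's actual implementation:

```rust
// Simplified stand-in for the project's parse_decimal: assumes well-formed
// input and ignores overflow, purely to demonstrate the round-trip.
fn parse_decimal(s: &str, decimals: u32) -> u64 {
    let multiplier = 10u64.pow(decimals);
    let (int_s, frac_s) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };
    let int_part: u64 = int_s.parse().unwrap_or(0);
    // Right-pad the fraction to exactly `decimals` digits: "5" -> "50"
    let mut frac = format!("{:0<width$}", frac_s, width = decimals as usize);
    frac.truncate(decimals as usize);
    let dec_part: u64 = if frac.is_empty() { 0 } else { frac.parse().unwrap() };
    int_part * multiplier + dec_part
}

fn format_decimal(value: u64, decimals: u32) -> String {
    let multiplier = 10u64.pow(decimals);
    format!("{}.{:0>width$}", value / multiplier, value % multiplier,
            width = decimals as usize)
}

fn main() {
    assert_eq!(parse_decimal("100.50", 2), 10050);
    assert_eq!(format_decimal(10050, 2), "100.50");
    // The round-trip is lossless within the configured precision:
    assert_eq!(format_decimal(parse_decimal("0.001", 8), 8), "0.00100000");
    println!("round-trip ok");
}
```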
2. The u64 Max Value (Range Analysis)
The maximum value of u64 is:
u64::MAX = 18,446,744,073,709,551,615
If we use 8 decimal places (similar to Bitcoin’s satoshi), the maximum representable value is:
184,467,440,737.09551615
This means:
- For Price: We can represent up to ~184 Billion. (If Bitcoin hits this price, we’ll upgrade…)
- For Quantity: It can hold the entire total supply of BTC (21 million).
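The range arithmetic quoted above can be verified directly in Rust:

```rust
fn main() {
    const DECIMALS: u32 = 8;
    let multiplier = 10u64.pow(DECIMALS); // 100_000_000
    let max_units = u64::MAX / multiplier; // integer part
    let max_frac = u64::MAX % multiplier;  // fractional part
    assert_eq!(max_units, 184_467_440_737);
    assert_eq!(max_frac, 9_551_615);
    println!("max representable = {}.{:08}", max_units, max_frac);
    // -> 184467440737.09551615
}
```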
Decimals Configuration for Different Assets
Different blockchain assets have different native precisions:
| Asset | Native Decimals | Smallest Unit |
|---|---|---|
| BTC | 8 | 1 satoshi = 0.00000001 BTC |
| USDT (ERC20) | 6 | 0.000001 USDT |
| ETH | 18 | 1 wei = 0.000000000000000001 ETH |
The Question: ETH natively uses 18 decimals. Will we lose precision if we use only 8?
The answer is: It is sufficient for an Exchange. Because:
- With 8 decimals, the smallest supported unit is `0.00000001 ETH`.
- There's no real need to trade `0.000000000000000001 ETH` (value ≈ $0.000000000000003).
So we can choose a reasonable internal precision, not necessarily identical to the native chain.
Thus, we need a SymbolManager to manage:
- Internal precision (`decimals`) for each asset.
- User display precision (`display_decimals`).
- Price precision configuration for trading pairs.
- Conversion between on-chain and internal precision during Deposit/Withdrawal.
ETH Decimals Analysis: 8 vs 12 bits
Let’s analyze the maximum ETH amount representable by u64 under different decimal configs:
| Decimals | Multiplier | Max Value by u64 | Sufficient? |
|---|---|---|---|
| 8 | 10^8 | 184,467,440,737 ETH | ✅ Huge margin |
| 9 | 10^9 | 18,446,744,073 ETH | ✅ Huge margin |
| 10 | 10^10 | 1,844,674,407 ETH | ✅ > Total Supply |
| 11 | 10^11 | 184,467,440 ETH | ✅ Just enough (~120M) |
| 12 | 10^12 | 18,446,744 ETH | ❌ < Total Supply! |
| 18 | 10^18 | 18.44 ETH | ❌ Absolutely not enough |
ETH Total Supply ≈ 120 Million ETH
Why did we choose 8 decimals for ETH?
- `0.00000001 ETH` ≈ $0.00000003, far below any meaningful trade size.
- Max capacity of ~184 Billion ETH > Total Supply (~120M).
- Just convert precision during Deposit/Withdrawal.
Configuration Example:
#![allow(unused)]
fn main() {
// BTC: 8 decimals (Same as satoshi)
manager.add_asset(1, 8, 3, "BTC");
// USDT: 8 decimals (Native is 6, we align to 8 internally)
manager.add_asset(2, 8, 2, "USDT");
// ETH: 8 decimals (Safe range, sufficient precision)
manager.add_asset(3, 8, 4, "ETH");
}
3. Symbol Configuration
Different trading pairs have different precision requirements:
| Symbol | Price Decimals | Qty Display Decimals | Example |
|---|---|---|---|
| BTC_USDT | 2 | 3 | Buy 0.001 BTC @ $65000.00 |
| ETH_USDT | 2 | 4 | Buy 0.0001 ETH @ $3500.00 |
| DOGE_USDT | 6 | 0 | Buy 100 DOGE @ $0.123456 |
We use SymbolManager to manage these configs:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct SymbolInfo {
pub symbol: String,
pub symbol_id: u32,
pub base_asset_id: u32,
pub quote_asset_id: u32,
pub price_decimal: u32, // Decimals for Price
pub price_display_decimal: u32, // Display decimals for Price
}
#[derive(Debug, Clone)]
pub struct AssetInfo {
pub asset_id: u32,
pub decimals: u32, // Internal precision (usually 8)
pub display_decimals: u32, // Max decimals for input/display
pub name: String,
}
}
4. decimals vs display_decimals
Distinguishing these two concepts is crucial:
decimals (Internal Precision)
- Determines the scaling multiplier applied to `u64`.
- Usually 8 (like satoshi).
- This is the internal storage format, invisible to users.
display_decimals (Display Precision)
- Determines how many decimal places users can see/input.
- E.g., BTC displays 3 digits: `0.001 BTC`.
- USDT displays 2 digits: `100.00 USDT`.
Why separate them?
- UX: Users don’t need to see 8 decimal places.
- Validation: Limit user input precision.
- Cleanliness: Avoid trailing zeros.
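For example, input validation against `display_decimals` can be as simple as counting fractional digits; `validate_input_precision` is a hypothetical helper for illustration, not a function from the codebase:

```rust
// Hypothetical helper: reject user input carrying more fractional digits
// than display_decimals allows.
fn validate_input_precision(s: &str, display_decimals: u32) -> bool {
    match s.split_once('.') {
        None => true, // no fractional part at all
        Some((_, frac)) => frac.len() as u32 <= display_decimals,
    }
}

fn main() {
    // BTC qty display precision is 3 in the symbol table above
    assert!(validate_input_precision("0.001", 3));
    assert!(!validate_input_precision("0.0001", 3)); // too many digits
    assert!(validate_input_precision("100", 2));     // integer input is fine
    println!("validation ok");
}
```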
5. Program Output
Output after cargo run:
--- 0xInfinity: Stage 3 (Decimal World) ---
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00
Order 2: Sell 5.000 BTC @ $102.00
Order 3: Sell 5.000 BTC @ $101.00
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
MATCH: Buy 4 eats Sell 1 @ Price 10000 (Qty: 10000)
MATCH: Buy 4 eats Sell 3 @ Price 10100 (Qty: 2000)
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00
--- End of Simulation ---
--- u64 Range Demo ---
u64::MAX = 18446744073709551615
With 8 decimals, max representable value = 184467440737.09551615
Observation:
- User input is the decimal string `"100.00"`.
- Internal storage is the integer `10000`.
- Display converts back to `"100.00"`.
This is the core of Decimal World: Seamless lossless conversion between Decimal Strings and u64 Integers.
📖 True Story: JavaScript Number Overflow
During development, we encountered a bizarre bug:
Symptom: The backend returned raw ETH amount (in wei). During testing with small amounts (0.00x ETH), frontend worked fine. But once the amount hit ~0.009 ETH, the number started losing precision and became incorrect!
Root Cause: JavaScript’s Number type uses IEEE 754 double-precision floats. The maximum safe integer is 2^53 - 1:
> console.log(Number.MAX_SAFE_INTEGER);
9007199254740991 // ~ 9 * 10^15
// 1 ETH = 10^18 wei
> const oneEthInWei = 1000000000000000000;
// The Issue: When wei amount exceeds MAX_SAFE_INTEGER
> const smallAmount = 1000000000000000; // 0.001 ETH = 10^15 wei ✅ Safe
> const dangerAmount = 9007199254740992; // ~ 0.009 ETH ⚠️ Just exceeded limit!
> const tenEthInWei = 10000000000000000000; // 10 ETH = 10^19 wei ❌ Overflow!
// Verify Precision Loss: Adding 1 has no effect!
> console.log(tenEthInWei + 1);
10000000000000000000 // No +1!
> console.log(tenEthInWei === tenEthInWei + 1);
true // 😱 WHAT?!
Why ~0.009 ETH?
> console.log(Number.MAX_SAFE_INTEGER / 1e18);
0.009007199254740991 // 0.009 ETH is the safety limit!
Solution:
// ✅ Solution 1: Backend returns String, Frontend uses BigInt
> const weiString = "10000000000000000000"; // String from backend
> const weiBigInt = BigInt(weiString); // Convert to BigInt
> console.log((weiBigInt + 1n).toString());
10000000000000000001 // ✅ Correct!
// ✅ Solution 2: Use libraries like ethers.js
// import { formatEther, parseEther } from 'ethers';
// const eth = formatEther(weiBigInt); // "10.0"
Summary
This chapter solved:
- ✅ Decimal Conversion: `parse_decimal()` and `format_decimal()` for bidirectional lossless conversion.
- ✅ u64 Range: Max value ~184 Billion (at 8 decimals), sufficient for any financial scenario.
- ✅ Symbol Config: `SymbolManager` handles precision settings per pair.
- ✅ Precision Definitions: Distinct `decimals` (internal) vs `display_decimals` (UI).
🇨🇳 Chinese
📦 Code Changes: View Diff
In the previous chapter we refactored all f64 to u64, solving the floating-point precision problem. But this introduces a new question: clients use decimals while we use integers internally—how do we convert between them?
1. The Decimal Conversion Problem
When a user places an order, the input price is "100.50" and the quantity is "10.5". But our engine internally uses u64 integers:
#![allow(unused)]
fn main() {
pub struct Order {
pub id: u64,
pub price: u64, // Integer representation
pub qty: u64, // Integer representation
pub side: Side,
}
}
Core question: how do we losslessly convert between decimal strings and u64?
The answer is the Fixed Decimal scheme:
#![allow(unused)]
fn main() {
/// Convert a decimal string to u64
/// e.g., "100.50" with 2 decimals -> 10050
fn parse_decimal(s: &str, decimals: u32) -> u64 {
let multiplier = 10u64.pow(decimals);
// ... parsing logic
}
/// Convert u64 back to a decimal string for display
/// e.g., 10050 with 2 decimals -> "100.50"
fn format_decimal(value: u64, decimals: u32) -> String {
let multiplier = 10u64.pow(decimals);
let int_part = value / multiplier;
let dec_part = value % multiplier;
format!("{}.{:0>width$}", int_part, dec_part, width = decimals as usize)
}
}
2. The u64 Max Value
The maximum value of u64 is:
u64::MAX = 18,446,744,073,709,551,615
If we use 8 decimal places (like Bitcoin's satoshi), the maximum representable value is:
184,467,440,737.09551615
This means:
- For price: we can represent up to about 184.4 billion. If Bitcoin ever needs a price that large, we'll upgrade then...
- For quantity: it can hold the entire BTC supply (21 million total).
Decimals Configuration for Different Assets
Different blockchain assets have different native precisions:
| Asset | Native Decimals | Smallest Unit |
|---|---|---|
| BTC | 8 | 1 satoshi = 0.00000001 BTC |
| USDT (ERC20) | 6 | 0.000001 USDT |
| ETH | 18 | 1 wei = 0.000000000000000001 ETH |
The question: ETH natively uses 18 decimals—will we lose precision by using only 8?
The answer: it is sufficient for an exchange. Because:
- With 8 decimals, the smallest precision the exchange supports is `0.00000001 ETH`, which is enough.
- There is no need to support trading `0.000000000000000001 ETH` (worth about $0.000000000000003).
So we can choose a reasonable internal precision; it does not have to match the native chain.
Therefore we need a basic configuration manager for assets and trading pairs (SymbolManager) to:
- Manage each asset's internal precision (decimals).
- Manage the user-visible display precision (display_decimals).
- Manage price precision configuration for trading pairs.
- Convert between on-chain precision and internal precision on deposit/withdrawal.
ETH Decimals Analysis: Choosing Between 8 and 12
Let's analyze the maximum ETH amount u64 can represent under different decimals configurations:
| Decimals | Multiplier | Max Value in u64 | Sufficient? |
|---|---|---|---|
| 8 | 10^8 | 184,467,440,737 ETH | ✅ Far above total supply |
| 9 | 10^9 | 18,446,744,073 ETH | ✅ Far above total supply |
| 10 | 10^10 | 1,844,674,407 ETH | ✅ Above total supply |
| 11 | 10^11 | 184,467,440 ETH | ✅ Just above total supply (~120M) |
| 12 | 10^12 | 18,446,744 ETH | ❌ Below total supply! |
| 18 | 10^18 | 18.44 ETH | ❌ Not nearly enough |
ETH's current total supply is about 120 million ETH.
Analysis:
- 8 decimals: max ~184.4 billion ETH, huge margin; a precision of `0.00000001 ETH` is enough for an exchange.
- 10 decimals: max ~1.8 billion ETH, higher precision.
- 12 decimals: max ~18 million ETH, highest precision, ⚠️ but below total supply.
Why did we choose 8 decimals for ETH?
Although ETH natively uses 18 decimals (wei), for an exchange:
- `0.00000001 ETH` ≈ $0.00000003, far below any meaningful trade size.
- Max representable ~184.4 billion ETH, far above total supply (~120 million).
- Just convert the precision on deposit/withdrawal.
Configuration example:
#![allow(unused)]
fn main() {
// BTC: 8 decimals (matches on-chain satoshi)
manager.add_asset(1, 8, 3, "BTC");
// USDT: 8 decimals
manager.add_asset(2, 8, 2, "USDT");
// ETH: 8 decimals (sufficient precision, safe range)
manager.add_asset(3, 8, 4, "ETH");
}
3. Symbol Configuration
Different trading pairs may have different precision requirements:
| Symbol | Price Decimals | Qty Display Decimals | Example |
|---|---|---|---|
| BTC_USDT | 2 | 3 | Buy 0.001 BTC @ $65000.00 |
| ETH_USDT | 2 | 4 | Buy 0.0001 ETH @ $3500.00 |
| DOGE_USDT | 6 | 0 | Buy 100 DOGE @ $0.123456 |
We use SymbolManager to manage these configs:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct SymbolInfo {
pub symbol: String,
pub symbol_id: u32,
pub base_asset_id: u32,
pub quote_asset_id: u32,
pub price_decimal: u32, // Decimals for Price
pub price_display_decimal: u32, // Display decimals for Price
}
#[derive(Debug, Clone)]
pub struct AssetInfo {
pub asset_id: u32,
pub decimals: u32, // Internal precision (usually 8)
pub display_decimals: u32, // Max decimals for input/display
pub name: String,
}
}
4. decimals vs display_decimals
There are two concepts to distinguish here:
decimals (Internal Precision)
- Determines the scaling multiplier applied to `u64`.
- Usually 8 (like satoshi).
- This is the internal storage precision, invisible to users.
display_decimals (Display Precision)
- Determines how many decimal places users can input/see.
- E.g., BTC displays 3 digits: `0.001 BTC`.
- USDT displays 2 digits: `100.00 USDT`.
Why separate them?
- User experience: users do not need to see 8 decimal places of precision.
- Input validation: limit the number of decimal places a user can enter.
- Clean display: avoid showing meaningless trailing zeros.
5. Program Output
Output after running cargo run:
--- 0xInfinity: Stage 3 (Decimal World) ---
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00
Order 2: Sell 5.000 BTC @ $102.00
Order 3: Sell 5.000 BTC @ $101.00
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
MATCH: Buy 4 eats Sell 1 @ Price 10000 (Qty: 10000)
MATCH: Buy 4 eats Sell 3 @ Price 10100 (Qty: 2000)
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00
--- End of Simulation ---
--- u64 Range Demo ---
u64::MAX = 18446744073709551615
With 8 decimals, max representable value = 184467440737.09551615
Observation:
- User input is the decimal string `"100.00"`.
- Internal storage is the integer `10000`.
- Display converts back to `"100.00"`.
This is the core of Decimal World: seamless conversion between decimal strings and u64 integers.
📖 True Story: JavaScript Number Overflow
During development we once hit a very strange bug:
Symptom: The backend returned the raw ETH amount (in wei) to the frontend. During development and testing, amounts were tiny (on the order of 0.00x ETH), so the frontend displayed and handled them fine. But in production, as soon as an amount got slightly larger (in fact above about 0.009 ETH), the number started losing precision and became incorrect!
Root cause: JavaScript's Number type uses IEEE 754 double-precision floats; the maximum safe integer is 2^53 - 1:
> console.log(Number.MAX_SAFE_INTEGER);
9007199254740991 // ~ 9 * 10^15
// 1 ETH = 10^18 wei
> const oneEthInWei = 1000000000000000000;
// The issue: when the wei amount exceeds MAX_SAFE_INTEGER
> const smallAmount = 1000000000000000; // 0.001 ETH = 10^15 wei ✅ Safe
> const dangerAmount = 9007199254740992; // ~ 0.009 ETH ⚠️ Just past the safe range
> const tenEthInWei = 10000000000000000000; // 10 ETH = 10^19 wei ❌ Overflow!
// Verify the precision loss: adding 1 has no effect!
> console.log(tenEthInWei + 1);
10000000000000000000 // No +1!
> console.log(tenEthInWei + 2);
10000000000000000000 // Still the same!
> console.log(tenEthInWei + 1000);
10000000000000000000 // Even +1000 makes no difference!
> console.log(tenEthInWei === tenEthInWei + 1);
true // 😱 Unbelievable!
Why do problems start above about 0.009 ETH?
> console.log(Number.MAX_SAFE_INTEGER / 1e18);
0.009007199254740991 // ~0.009 ETH is exactly the safety boundary!
// The output may look correct, but precision is already lost. To verify:
> const nineEth = 9n * 10n ** 18n; // 9 ETH as a BigInt
> const nineEthNum = Number(nineEth); // convert to Number
> console.log(nineEthNum);
9000000000000000000 // Looks correct...
> console.log(nineEthNum + 1);
9000000000000000000 // But +1 has no effect!
> console.log(nineEthNum === nineEthNum + 1);
true // Proof that precision is already lost
The correct solutions:
// ✅ Solution 1: Backend returns a String; frontend uses BigInt
> const weiString = "10000000000000000000"; // String returned by the backend
> const weiBigInt = BigInt(weiString); // Convert to BigInt
> console.log(weiBigInt.toString());
10000000000000000000 // ✅ Exact!
// BigInt arithmetic works correctly
> console.log((weiBigInt + 1n).toString());
10000000000000000001 // ✅ +1 works!
// ✅ Solution 2: Use a dedicated library such as ethers.js
// import { formatEther, parseEther } from 'ethers';
// const eth = formatEther(weiBigInt); // "10.0"
Summary
This chapter solved the following problems:
- ✅ Decimal conversion: `parse_decimal()` and `format_decimal()` implement bidirectional lossless conversion.
- ✅ u64 range: max ~184.4 billion (at 8 decimals), enough for any financial scenario.
- ✅ Symbol config: `SymbolManager` manages each trading pair's precision settings.
- ✅ Two precision definitions: `decimals` (internal) vs `display_decimals` (display).
0x04 OrderBook Refactoring (BTreeMap)
🇺🇸 English | 🇨🇳 Chinese
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapters, we completed the transition from Float to Integer and established a precision configuration system. However, our OrderBook data structure was still a “toy” implementation—re-sorting on every match! This chapter upgrades it to a truly production-ready data structure.
1. The Problem with the Naive Implementation
Let’s review the original engine.rs:
#![allow(unused)]
fn main() {
pub struct OrderBook {
bids: Vec<PriceLevel>, // Was 'buys'
asks: Vec<PriceLevel>, // Was 'sells'
}
}
💡 Naming Convention: We renamed `buys`/`sells` to `bids`/`asks`. These are standard industry terms:
- Bid: Price buyers are willing to pay.
- Ask: Price sellers are demanding.
Using professional terminology aligns the code with industry docs and APIs.
#![allow(unused)]
fn main() {
fn match_buy(&mut self, buy_order: &mut Order) {
// Problem 1: Re-sort every time! O(n log n)
self.asks.sort_by_key(|l| l.price);
for level in self.asks.iter_mut() {
// ...matching logic...
}
// Problem 2: Removing empty levels shifts the whole array! O(n)
self.asks.retain(|l| !l.orders.is_empty());
}
fn rest_order(&mut self, order: Order) {
// Problem 3: Finding price level is a linear scan! O(n)
let level = self.asks.iter_mut().find(|l| l.price == order.price);
// ...
}
}
Time Complexity Analysis
| Operation | Vec Impl | Issue |
|---|---|---|
| Insert Order | O(n) | Linear scan for price level |
| Pre-match Sort | O(n log n) | Sort required before every match |
| Remove Empty Level | O(n) | Array element shifting |
In an active exchange with tens of thousands of orders per second, O(n) operations quickly become a performance bottleneck.
2. The BTreeMap Solution
Rust’s standard library provides BTreeMap, an ordered map backed by a self-balancing B-Tree:
#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
pub struct OrderBook {
/// Asks: price -> orders (Ascending, Lowest Price = Best Ask)
asks: BTreeMap<u64, VecDeque<Order>>,
/// Bids: (u64::MAX - price) -> orders (Trick: Highest Price First)
bids: BTreeMap<u64, VecDeque<Order>>,
}
}
Key Trick: Key Design for Bids
BTreeMap sorts keys in ascending order by default. This works perfectly for Asks (lowest price first). But for Bids, we need highest price first.
Solution: Use u64::MAX - price as the key.
#![allow(unused)]
fn main() {
// Insert Bid
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);
// Read Real Price
let price = u64::MAX - key;
}
Thus, Price 100 becomes Key u64::MAX - 100, and Price 99 becomes u64::MAX - 99. Since (u64::MAX - 100) < (u64::MAX - 99), Price 100 comes before Price 99!
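This ordering trick can be verified with a short standalone snippet (the helper names `bid_key`/`bid_price` are illustrative, not from the chapter's code):

```rust
use std::collections::{BTreeMap, VecDeque};

// bid_key/bid_price are illustrative helper names (not from the chapter).
fn bid_key(price: u64) -> u64 {
    u64::MAX - price // invert so that higher prices get smaller keys
}

fn bid_price(key: u64) -> u64 {
    u64::MAX - key // recover the real price from a stored key
}

fn main() {
    let mut bids: BTreeMap<u64, VecDeque<u64>> = BTreeMap::new();
    // Insert bids at 99, 100, 98 in arbitrary order.
    for price in [99u64, 100, 98] {
        bids.entry(bid_key(price)).or_default().push_back(price);
    }
    // Ascending key order now yields descending price order: 100, 99, 98.
    let prices: Vec<u64> = bids.keys().map(|&k| bid_price(k)).collect();
    assert_eq!(prices, vec![100, 99, 98]);
    println!("bid iteration order: {:?}", prices);
}
```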
Why not Reverse or Custom Comparator?
You might ask: Why not BTreeMap<Reverse<u64>, ...>?
Comparison:
| Approach | Issue |
|---|---|
BTreeMap<Reverse<u64>> | Reverse is a wrapper; unwrapping on every access adds complexity. |
Custom Ord | Requires a newtype wrapper, increasing boilerplate. |
u64::MAX - price | Zero-Cost Abstraction: Two subtraction ops, easily inlined by compiler. |
Key Advantages:
- Simple: Just two lines of code.
- Zero Overhead: Subtraction is a single-cycle CPU instruction.
- Type Safe: Key remains u64.
- No Overflow: Price is always < u64::MAX, so u64::MAX - price never underflows.
Time Complexity Comparison
| Operation | Vec Impl | BTreeMap Impl |
|---|---|---|
| Insert Order | O(n) | O(log n) |
| Match (No Sort) | - | O(log n) |
| Cancel Order | O(n) | O(n)* |
| Remove Empty Level | O(n) | O(log n) |
| Query Best Price | O(n) / O(n log n) | O(log n)** |
*Note: Cancelling requires a linear scan of the VecDeque at that price level (O(n)). O(1) cancellation requires an auxiliary HashMap index.
**Note: BTreeMap::first_key_value() descends to the leftmost leaf — O(log n), but with a tiny constant in practice; caching the best price would make lookups O(1).
3. New Data Models
Order
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
pub id: u64,
pub price: u64, // Internal Integer Price
pub qty: u64, // Original Qty
pub filled_qty: u64, // Filled Qty
pub side: Side,
pub order_type: OrderType,
pub status: OrderStatus,
}
}
Trade
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub price: u64,
pub qty: u64,
}
}
OrderResult
#![allow(unused)]
fn main() {
pub struct OrderResult {
pub order: Order, // Updated Order
pub trades: Vec<Trade>, // Generated Trades
}
}
4. Core API
#![allow(unused)]
fn main() {
impl OrderBook {
/// Add order, return match result
pub fn add_order(&mut self, order: Order) -> OrderResult;
/// Cancel order
pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
/// Get Best Bid
pub fn best_bid(&self) -> Option<u64>;
/// Get Best Ask
pub fn best_ask(&self) -> Option<u64>;
/// Get Spread
pub fn spread(&self) -> Option<u64>;
}
}
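A minimal sketch of how the three query methods can be written over the BTreeMap layout above. This simplified book stores plain quantities per price level rather than full Orders, and the match arms are assumptions about how an empty or crossed book should behave:

```rust
use std::collections::{BTreeMap, VecDeque};

struct OrderBook {
    asks: BTreeMap<u64, VecDeque<u64>>, // price -> qtys at that level
    bids: BTreeMap<u64, VecDeque<u64>>, // (u64::MAX - price) -> qtys
}

impl OrderBook {
    fn best_ask(&self) -> Option<u64> {
        self.asks.keys().next().copied() // smallest key = lowest ask
    }
    fn best_bid(&self) -> Option<u64> {
        self.bids.keys().next().map(|k| u64::MAX - k) // smallest key = highest bid
    }
    fn spread(&self) -> Option<u64> {
        match (self.best_bid(), self.best_ask()) {
            (Some(b), Some(a)) if a >= b => Some(a - b),
            _ => None, // empty or crossed book
        }
    }
}

fn main() {
    let mut book = OrderBook { asks: BTreeMap::new(), bids: BTreeMap::new() };
    book.asks.entry(101_00).or_default().push_back(5); // ask $101.00
    book.bids.entry(u64::MAX - 99_00).or_default().push_back(10); // bid $99.00
    assert_eq!(book.best_ask(), Some(101_00));
    assert_eq!(book.best_bid(), Some(99_00));
    assert_eq!(book.spread(), Some(2_00)); // matches the chapter's $2.00 spread
}
```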
5. Execution Results
=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00 -> New
Order 2: Sell 5.000 BTC @ $102.00 -> New
Order 3: Sell 5.000 BTC @ $101.00 -> New
Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
Trades:
- Trade #1: 10.000 @ $100.00
- Trade #2: 2.000 @ $101.00
Order Status: Filled, Filled: 12.000/12.000
Book State: Best Bid=None, Best Ask=Some("101.00")
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00 -> New
Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")
=== End of Simulation ===
Observations:
- Orders matched correctly by price priority (First $100, then $101).
- Every trade recorded in Trades.
- Real-time tracking of Best Bid/Ask and Spread.
6. Unit Tests
We added 8 unit tests covering core scenarios:
$ cargo test
running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok
test result: ok. 8 passed; 0 failed
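As one illustration of what these tests cover, FIFO time priority at a single price level falls out of VecDeque's front-in, front-out behavior. This is a standalone sketch, not the chapter's actual test code:

```rust
use std::collections::VecDeque;

fn main() {
    // One price level: order IDs queued in arrival order.
    let mut level: VecDeque<u64> = VecDeque::new();
    level.push_back(1); // first maker
    level.push_back(2); // second maker
    // A taker consumes the level front-first, giving FIFO time priority.
    assert_eq!(level.pop_front(), Some(1));
    assert_eq!(level.pop_front(), Some(2));
    assert!(level.is_empty()); // an empty level would then be removed from the map
}
```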
7. Is BTreeMap Enough?
For an exchange not chasing extreme performance, BTreeMap is perfectly adequate:
| Scenario | BTreeMap Performance |
|---|---|
| 1,000 TPS | Easy |
| 10,000 TPS | Manageable |
| 100,000+ TPS | Need specialized structures |
If you want to build a Ferrari-level matching engine (nanosecond latency, millions of TPS), you need:
- Lock-free data structures
- Memory pools (avoid heap allocation)
- CPU Cache optimization
- FPGA acceleration
But that’s for later. For now, we have a Correct and Efficient baseline implementation.
Summary
This chapter accomplished:
- ✅ Analyzed Problem: O(n) bottleneck in Vec implementation.
- ✅ Refactored to BTreeMap: O(log n) insert/search/delete.
- ✅ Defined Types: Standard Order/Trade/OrderResult models.
- ✅ Refined API: best_bid/ask, spread, cancel_order.
- ✅ Added Tests: 8 tests covering core logic.
🇨🇳 中文
📦 代码变更: 查看 Diff
在前三章中,我们完成了从浮点数到整数的转换,并建立了精度配置系统。但我们的 OrderBook 数据结构还是一个“玩具”实现——每次撮合都需要重新排序!本章我们将把它升级为一个真正生产可用的数据结构。
1. 原有实现的问题
让我们回顾一下原来的 engine.rs:
#![allow(unused)]
fn main() {
pub struct OrderBook {
bids: Vec<PriceLevel>, // 原来叫 buys
asks: Vec<PriceLevel>, // 原来叫 sells
}
}
💡 命名规范:我们把 buys/sells 改名为 bids/asks。这是金融行业的标准术语:
- Bid(买盘):买方愿意出的价格
- Ask(卖盘):卖方要求的价格
使用专业术语可以让代码更易于与行业文档、API 对接。
#![allow(unused)]
fn main() {
fn match_buy(&mut self, buy_order: &mut Order) {
// 问题 1: 每次都要重新排序!O(n log n)
self.asks.sort_by_key(|l| l.price);
for level in self.asks.iter_mut() {
// ...matching logic...
}
// 问题 2: 删除空档位需要移动整个数组!O(n)
self.asks.retain(|l| !l.orders.is_empty());
}
fn rest_order(&mut self, order: Order) {
// 问题 3: 查找价格档位是线性扫描!O(n)
let level = self.asks.iter_mut().find(|l| l.price == order.price);
// ...
}
}
时间复杂度分析
| 操作 | Vec 实现 | 问题 |
|---|---|---|
| 插入订单 | O(n) | 线性查找价格档位 |
| 撮合前排序 | O(n log n) | 每次撮合都要排序 |
| 删除空档位 | O(n) | 数组元素移动 |
在一个活跃的交易所,每秒可能有数万笔订单。如果每笔订单都要 O(n) 操作,这里很快就会成为性能瓶颈。
2. BTreeMap 解决方案
Rust 标准库提供了 BTreeMap,它是基于 B 树(自平衡多路搜索树)的有序映射:
#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
pub struct OrderBook {
/// 卖单: price -> orders (按价格升序,最低价 = 最优卖价)
asks: BTreeMap<u64, VecDeque<Order>>,
/// 买单: (u64::MAX - price) -> orders (技巧:让最高价排在前面)
bids: BTreeMap<u64, VecDeque<Order>>,
}
}
关键技巧:买单的 Key 设计
BTreeMap 默认按 key 升序排列。对于卖单,这正好是我们想要的(最低价优先)。但对于买单,我们需要最高价优先。
解决方案:使用 u64::MAX - price 作为 key:
#![allow(unused)]
fn main() {
// 插入买单
let key = u64::MAX - order.price;
self.bids.entry(key).or_insert_with(VecDeque::new).push_back(order);
// 读取真实价格
let price = u64::MAX - key;
}
这样,价格 100 对应 key u64::MAX - 100,价格 99 对应 key u64::MAX - 99。由于 (u64::MAX - 100) < (u64::MAX - 99),价格 100 会排在价格 99 前面!
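这个排序技巧可以用一小段独立代码直接验证:

```rust
fn main() {
    // u64::MAX - price 让高价获得更小的 key,从而在 BTreeMap 中排在前面
    let k100 = u64::MAX - 100u64;
    let k99 = u64::MAX - 99u64;
    assert!(k100 < k99);              // 价格 100 排在价格 99 前面
    assert_eq!(u64::MAX - k100, 100); // 再减一次即可还原真实价格
}
```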
为什么不用 Reverse 或自定义比较器?
你可能会问:为什么不用 BTreeMap<Reverse<u64>, ...> 或者自定义比较器?
方案对比:
| 方案 | 问题 |
|---|---|
BTreeMap<Reverse<u64>, ...> | Reverse 是一个 wrapper 类型,每次访问 key 都需要解包,增加代码复杂度 |
自定义 Ord trait | 需要创建 newtype wrapper,代码量大增 |
u64::MAX - price | 零成本抽象:两次减法操作,编译器可以内联优化 |
关键优势:
- 简单:只需要两行代码(插入时 u64::MAX - price,读取时再减回来)
- 零开销:减法操作在 CPU 上是单周期指令
- 类型安全:key 仍然是 u64,不需要额外的 wrapper 类型
- 无溢出风险:价格永远小于 u64::MAX,减法不会溢出
时间复杂度对比
| 操作 | Vec 实现 | BTreeMap 实现 |
|---|---|---|
| 插入订单 | O(n) | O(log n) |
| 撮合(不排序) | - | O(log n) |
| 取消订单 | O(n) | O(n)* |
| 删除空价格档 | O(n) | O(log n) |
| 查询最优价 | O(n) 或 O(n log n) | O(log n)** |
*注: 取消订单需要在 VecDeque 中线性查找订单 ID,这是 O(n)。如果需要 O(1) 取消,需要额外的 HashMap 索引。
**注: BTreeMap 的 first_key_value() 需要下降到最左叶子节点,是 O(log n),但常数极小;如果额外缓存最优价,查询可做到 O(1)。
3. 新的数据模型
Order(订单)
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Order {
pub id: u64,
pub price: u64, // 价格(内部单位)
pub qty: u64, // 原始数量
pub filled_qty: u64, // 已成交数量
pub side: Side,
pub order_type: OrderType,
pub status: OrderStatus,
}
}
Trade(成交记录)
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub price: u64,
pub qty: u64,
}
}
OrderResult(下单结果)
#![allow(unused)]
fn main() {
pub struct OrderResult {
pub order: Order, // 更新后的订单
pub trades: Vec<Trade>, // 产生的成交
}
}
4. 核心 API
#![allow(unused)]
fn main() {
impl OrderBook {
/// 添加订单,返回成交结果
pub fn add_order(&mut self, order: Order) -> OrderResult;
/// 取消订单
pub fn cancel_order(&mut self, order_id: u64, price: u64, side: Side) -> bool;
/// 获取最优买价
pub fn best_bid(&self) -> Option<u64>;
/// 获取最优卖价
pub fn best_ask(&self) -> Option<u64>;
/// 获取买卖价差
pub fn spread(&self) -> Option<u64>;
}
}
5. 运行结果
=== 0xInfinity: Stage 4 (BTree OrderBook) ===
Symbol: BTC_USDT (ID: 0)
Price Decimals: 2, Qty Display Decimals: 3
[1] Makers coming in...
Order 1: Sell 10.000 BTC @ $100.00 -> New
Order 2: Sell 5.000 BTC @ $102.00 -> New
Order 3: Sell 5.000 BTC @ $101.00 -> New
Book State: Best Bid=None, Best Ask=Some("100.00"), Spread=None
[2] Taker eats liquidity...
Order 4: Buy 12.000 BTC @ $101.50
Trades:
- Trade #1: 10.000 @ $100.00
- Trade #2: 2.000 @ $101.00
Order Status: Filled, Filled: 12.000/12.000
Book State: Best Bid=None, Best Ask=Some("101.00")
[3] More makers...
Order 5: Buy 10.000 BTC @ $99.00 -> New
Final Book State: Best Bid=Some("99.00"), Best Ask=Some("101.00"), Spread=Some("2.00")
=== End of Simulation ===
可以看到:
- 订单按价格优先级正确匹配(先 $100,再 $101)
- 每笔成交都记录在 Trade 中
- 实时追踪 Best Bid/Ask 和 Spread
6. 单元测试
我们添加了 8 个单元测试来验证核心功能:
$ cargo test
running 8 tests
test engine::tests::test_add_resting_order ... ok
test engine::tests::test_cancel_order ... ok
test engine::tests::test_fifo_at_same_price ... ok
test engine::tests::test_full_match ... ok
test engine::tests::test_multiple_trades_single_order ... ok
test engine::tests::test_partial_match ... ok
test engine::tests::test_price_priority ... ok
test engine::tests::test_spread ... ok
test result: ok. 8 passed; 0 failed
覆盖的场景包括:
- ✅ 订单挂单(无匹配)
- ✅ 完全成交
- ✅ 部分成交
- ✅ 价格优先级(Price Priority)
- ✅ 同价格 FIFO
- ✅ 取消订单
- ✅ 价差计算
- ✅ 一个大单吃掉多个小单
7. BTreeMap 够用吗?
对于一个不追求极致性能的交易所,BTreeMap 完全够用:
| 场景 | BTreeMap 表现 |
|---|---|
| 每秒 1000 单 | 轻松应对 |
| 每秒 10000 单 | 可以应对 |
| 每秒 100000+ 单 | 需要更专业的数据结构 |
如果你要打造一个法拉利级别的撮合引擎(纳秒级延迟、每秒百万单),需要考虑:
- 无锁数据结构
- 内存池(避免动态分配)
- CPU Cache 优化
- FPGA 硬件加速
但那是后话了。现在,我们有了一个正确且高效的基础实现。
Summary
本章完成了以下工作:
- ✅ 分析原有问题:Vec 实现的 O(n) 复杂度瓶颈
- ✅ 重构为 BTreeMap:O(log n) 的插入、查找、删除
- ✅ 定义规范类型:Order、Trade、OrderResult
- ✅ 完善 API:best_bid/ask、spread、cancel_order
- ✅ 添加单元测试:8 个测试覆盖核心场景
0x05 User Account & Balance Management
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
In previous chapters, our matching engine could match orders correctly. But there’s a key question: User Funds? In a real exchange, users must have sufficient funds before placing an order, and funds must be transferred upon matching.
This chapter implements the user account system, including:
- Balance Management (Avail / Frozen)
- Pre-trade Fund Validation
- Post-trade Settlement
1. Dual State of Balance: Avail vs Frozen
In an exchange, a balance has two states:
| State | Meaning | Usage |
|---|---|---|
| Avail | Can be used for trading or withdrawal | Daily operations |
| Frozen | Locked in open orders | Waiting for match or cancel |
Why do we need Frozen?
Suppose Alice has 10 BTC and she places two sell orders:
- Order A: Sell 8 BTC
- Order B: Sell 5 BTC
Without a freeze mechanism, these two orders require 13 BTC, but Alice only has 10! This is the Over-Selling problem.
Correct Flow:
1. Alice has 10 BTC (avail=10, frozen=0)
2. Place Order A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. Place Order B (5 BTC) → try freeze 5 BTC → Fail! avail only 2 ❌
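The three-step flow above can be sketched with a minimal bool-returning freeze helper (a trimmed-down Balance for illustration; the full struct is defined in the next section):

```rust
// Trimmed-down Balance for illustration only.
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    /// Move funds from avail to frozen; reject if avail is insufficient.
    fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut alice_btc = Balance { avail: 10, frozen: 0 };
    assert!(alice_btc.freeze(8)); // Order A: freeze 8 BTC
    assert_eq!((alice_btc.avail, alice_btc.frozen), (2, 8));
    assert!(!alice_btc.freeze(5)); // Order B: rejected, avail is only 2
}
```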
2. Balance Structure
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
pub avail: u64, // Available Balance
pub frozen: u64, // Frozen Balance
}
impl Balance {
/// Deposit (Increase avail)
/// Returns false on overflow - Financial systems must detect this!
pub fn deposit(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // Overflow! Alert and investigate.
}
}
}
Why checked_add?
| Method | Overflow Behavior (250u8 + 10u8) | Use Case |
|---|---|---|
| + (std) | Panic (Debug) or wrap to 4 (Release) | General logic; overflow is a bug |
| wrapping_add | 4 (wraps) | Hashing, graphics |
| saturating_add | 255 (caps) | Quotas, token buckets |
| checked_add | None ✅ | Finance — overflow must be an error! |
⚠️ In financial systems, “too much money causing overflow” is a severe bug. It must return an error for handling, not silently wrap or saturate.
#![allow(unused)]
fn main() {
/// Freeze (avail → frozen)
pub fn freeze(&mut self, amount: u64) -> bool {
if self.avail >= amount {
self.avail -= amount;
self.frozen += amount;
true
} else {
false
}
}
/// Unfreeze (frozen → avail), for cancellations
pub fn unfreeze(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
self.avail += amount;
true
} else {
false
}
}
/// Consume Frozen (Fund leaves account after match)
pub fn consume_frozen(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
true
} else {
false
}
}
/// Receive Funds (Fund enters account after match)
/// Returns false on overflow, mirroring deposit()
pub fn receive(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false,
}
}
}
}
3. User Account Structure
Each user holds balances for multiple assets:
#![allow(unused)]
fn main() {
/// Use FxHashMap for O(1) asset lookup
/// FxHashMap is faster for integer keys
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}
impl UserAccount {
pub fn deposit(&mut self, asset_id: u32, amount: u64) {
self.get_balance_mut(asset_id).deposit(amount);
}
pub fn avail(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
}
pub fn frozen(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
}
}
}
4. Order Placing: Freezing Funds
When placing an order, we freeze specific assets based on order side:
| Order Side | Asset to Freeze | Amount |
|---|---|---|
| Buy | Quote Asset (e.g. USDT) | price × quantity / qty_unit |
| Sell | Base Asset (e.g. BTC) | quantity |
Using SymbolManager for Precision
Each pair has its own precision config:
#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal; // 2
let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals; // 8
let qty_unit = 10u64.pow(qty_decimal); // 100_000_000
// price = 100 USDT (Internal: 100 * price_unit)
// qty = 10 BTC (Internal: 10 * qty_unit)
// cost = price * qty / qty_unit (note: the intermediate price * qty can still overflow u64 for extreme values; a u128 intermediate is safer)
let cost = price * qty / qty_unit;
if accounts.freeze(user_id, USDT, cost) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
println!("REJECTED: Insufficient balance");
}
// Sell Order: Freeze BTC
if accounts.freeze(user_id, BTC, qty) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}
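The cost formula can be isolated into a helper; widening the intermediate product to u128 is an extra precaution beyond the chapter's direct u64 expression, guarding against overflow at extreme prices and quantities (`buy_cost` is a hypothetical helper name):

```rust
// buy_cost is a hypothetical helper name; u128 widening is an extra
// precaution beyond the chapter's direct u64 expression.
fn buy_cost(price: u64, qty: u64, qty_unit: u64) -> u64 {
    ((price as u128 * qty as u128) / qty_unit as u128) as u64
}

fn main() {
    let qty_unit = 100_000_000u64; // 8 decimals for BTC
    // Buy 12 BTC @ $101.00 (price in cents): cost = 1212.00 USDT in cents.
    let cost = buy_cost(101_00, 12 * qty_unit, qty_unit);
    assert_eq!(cost, 1212_00);
}
```

This reproduces the 1212.00 USDT cost from the chapter's sample run.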
5. Settlement: Fund Transfer
When orders match, funds transfer between buyer and seller:
Trade: Alice sells 1 BTC to Bob @ $100
Before:
Alice: BTC(frozen=1), USDT(avail=0)
Bob: BTC(avail=0), USDT(frozen=100)
Settlement:
Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
Bob: consume_frozen(USDT, 100) + receive(BTC, 1)
After:
Alice: BTC(frozen=0), USDT(avail=100)
Bob: BTC(avail=1), USDT(frozen=0)
Code Implementation:
#![allow(unused)]
fn main() {
pub fn settle_trade(
&mut self,
buyer_id: u64,
seller_id: u64,
base_asset_id: u32,
quote_asset_id: u32,
base_amount: u64, // Trade Qty
quote_amount: u64, // Trade Amount (price × qty)
) {
// Buyer: Use USDT, Get BTC
self.get_account_mut(buyer_id)
.get_balance_mut(quote_asset_id)
.consume_frozen(quote_amount);
self.get_account_mut(buyer_id)
.get_balance_mut(base_asset_id)
.receive(base_amount);
// Seller: Use BTC, Get USDT
self.get_account_mut(seller_id)
.get_balance_mut(base_asset_id)
.consume_frozen(base_amount);
self.get_account_mut(seller_id)
.get_balance_mut(quote_asset_id)
.receive(quote_amount);
}
}
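The Alice/Bob ledger above can be replayed with a tiny standalone settle function over flat (avail, frozen) pairs (illustrative only, not the chapter's AccountManager):

```rust
// (avail, frozen) per asset, per user — a flat stand-in for Balance.
type Bal = (u64, u64);

// Mirrors settle_trade: seller spends frozen base and receives quote;
// buyer spends frozen quote and receives base.
fn settle(seller_base: &mut Bal, seller_quote: &mut Bal,
          buyer_base: &mut Bal, buyer_quote: &mut Bal,
          base_amount: u64, quote_amount: u64) {
    seller_base.1 -= base_amount;   // seller: consume_frozen(base)
    seller_quote.0 += quote_amount; // seller: receive(quote)
    buyer_quote.1 -= quote_amount;  // buyer: consume_frozen(quote)
    buyer_base.0 += base_amount;    // buyer: receive(base)
}

fn main() {
    // Alice sells 1 BTC to Bob @ $100.
    let (mut a_btc, mut a_usdt) = ((0, 1), (0, 0));
    let (mut b_btc, mut b_usdt) = ((0, 0), (0, 100));
    settle(&mut a_btc, &mut a_usdt, &mut b_btc, &mut b_usdt, 1, 100);
    assert_eq!((a_btc, a_usdt), ((0, 0), (100, 0)));
    assert_eq!((b_btc, b_usdt), ((1, 0), (0, 0)));
}
```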
6. Refined Trade Structure
To support settlement, Trade needs user IDs:
#![allow(unused)]
fn main() {
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub buyer_user_id: u64, // New
pub seller_user_id: u64, // New
pub price: u64,
pub qty: u64,
}
}
7. Execution Results
=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
Analysis:
- Alice initial 100 BTC. Sold 10+2=12. Remaining 85 avail + 3 frozen = 88 BTC ✓
- Alice got 10×100 + 2×101 = 1202 USDT. Initial 10000 + 1202 = 11202 USDT ✓
- Bob initial 5 BTC. Bought 12. Total 17 BTC ✓
- Bob spent 1202 USDT. Initial 200000 - 1202 = 198798 USDT ✓
Summary
This chapter accomplished:
- ✅ Implemented Balance: Dual-state (avail/frozen).
- ✅ Implemented UserAccount: Multi-asset support.
- ✅ Implemented AccountManager: Managing all users.
- ✅ Pre-trade Freeze: Prevent over-selling/buying.
- ✅ Post-trade Settlement: Correct fund transfer.
- ✅ Refined Trade: Included user_ids.
Now our engine not only matches orders but also ensures funding sufficiency and correct settlement!
🇨🇳 中文
📦 代码变更: 查看 Diff
在前几章中,我们的撮合引擎已经可以正确匹配订单并产生成交。但有一个关键问题:钱从哪里来? 在真实的交易所中,用户必须先有足够的资金才能下单,成交后资金才会转移。
本章我们将实现用户账户系统,包括:
- 余额管理(可用 / 冻结)
- 下单前资金校验
- 成交后资金结算
1. 余额的双重状态:Avail vs Frozen
在交易所中,用户的余额有两种状态:
| 状态 | 含义 | 使用场景 |
|---|---|---|
| Avail (可用) | 可以用于下单或提现 | 日常操作 |
| Frozen (冻结) | 已锁定在挂单中 | 等待成交或取消 |
为什么需要冻结?
假设 Alice 有 10 BTC,她同时挂了两个卖单:
- 卖单 A:卖 8 BTC
- 卖单 B:卖 5 BTC
如果没有冻结机制,这两个订单共需要 13 BTC,但 Alice 只有 10 BTC!这就是超卖问题。
正确的流程:
1. Alice 有 10 BTC (avail=10, frozen=0)
2. 下卖单 A (8 BTC) → freeze 8 BTC → (avail=2, frozen=8) ✅
3. 下卖单 B (5 BTC) → 尝试 freeze 5 BTC → 失败!avail 只有 2 ❌
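上面的三步流程可以用一个极简的 freeze 辅助函数演示(仅作示意,完整的 Balance 结构见下一节):

```rust
// 仅作演示的简化 Balance
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    /// avail → frozen;可用余额不足时拒绝
    fn freeze(&mut self, amount: u64) -> bool {
        if self.avail >= amount {
            self.avail -= amount;
            self.frozen += amount;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut alice = Balance { avail: 10, frozen: 0 };
    assert!(alice.freeze(8));                        // 卖单 A:冻结 8 BTC
    assert_eq!((alice.avail, alice.frozen), (2, 8));
    assert!(!alice.freeze(5));                       // 卖单 B:avail 只剩 2,拒绝
}
```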
2. Balance 结构
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Default)]
pub struct Balance {
pub avail: u64, // 可用余额 (简短命名,JSON 输出更高效)
pub frozen: u64, // 冻结余额
}
impl Balance {
/// 存款 (增加 avail)
/// 返回 false 表示溢出 - 金融系统必须检测此错误
pub fn deposit(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // 溢出!需要报警和调查
}
}
}
为什么要用 checked_add?
| 方法 | 溢出行为 (250u8 + 10u8) | 适用场景 |
|---|---|---|
| + (标准) | Panic (Debug) 或回绕为 4 (Release) | 常规逻辑,溢出是 Bug |
| wrapping_add | 4 (回绕) | 哈希计算、图形算法 |
| saturating_add | 255 (封顶) | 资源配额、令牌桶 |
| checked_add | None ✅ | 金融余额,溢出必须报错! |
⚠️ 金融系统中,“钱多到溢出”是严重的 Bug,必须返回错误让上层处理,而不是静默封顶或回绕。
#![allow(unused)]
fn main() {
/// 冻结 (avail → frozen)
pub fn freeze(&mut self, amount: u64) -> bool {
if self.avail >= amount {
self.avail -= amount;
self.frozen += amount;
true
} else {
false
}
}
/// 解冻 (frozen → avail),用于取消订单
pub fn unfreeze(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
self.avail += amount;
true
} else {
false
}
}
/// 消耗冻结资金 (成交后,资金离开账户)
pub fn consume_frozen(&mut self, amount: u64) -> bool {
if self.frozen >= amount {
self.frozen -= amount;
true
} else {
false
}
}
/// 接收资金 (成交后,资金进入账户)
pub fn receive(&mut self, amount: u64) -> bool {
match self.avail.checked_add(amount) {
Some(new_avail) => {
self.avail = new_avail;
true
}
None => false, // 溢出返回 false,与 deposit 保持一致
}
}
}
}
3. 用户账户结构
每个用户持有多种资产的余额:
#![allow(unused)]
fn main() {
/// 使用 FxHashMap 实现 O(1) 资产查找
/// FxHashMap 使用更简单、更快的哈希函数,特别适合整数键
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>, // asset_id -> Balance
}
impl UserAccount {
pub fn deposit(&mut self, asset_id: u32, amount: u64) {
self.get_balance_mut(asset_id).deposit(amount);
}
pub fn avail(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.avail).unwrap_or(0)
}
pub fn frozen(&self, asset_id: u32) -> u64 {
self.balances.get(&asset_id).map(|b| b.frozen).unwrap_or(0)
}
}
}
4. 下单流程:冻结资金
在下单时,我们需要根据订单类型冻结相应的资产:
| 订单类型 | 需要冻结的资产 | 冻结金额 |
|---|---|---|
| 买单 (Buy) | Quote 资产 (如 USDT) | price × quantity / qty_unit |
| 卖单 (Sell) | Base 资产 (如 BTC) | quantity |
从 SymbolManager 获取精度配置
每个交易对有独立的精度配置:
#![allow(unused)]
fn main() {
let symbol_info = manager.get_symbol_info("BTC_USDT").unwrap();
let price_decimal = symbol_info.price_decimal; // 2 (价格精度)
let base_asset = manager.assets.get(&symbol_info.base_asset_id).unwrap();
let qty_decimal = base_asset.decimals; // 8 (数量精度)
let qty_unit = 10u64.pow(qty_decimal); // 100_000_000
// price = 100 USDT (内部单位: 100 * price_unit)
// qty = 10 BTC (内部单位: 10 * qty_unit)
// cost = price * qty / qty_unit (注意:price * qty 的中间结果在极端取值下仍可能溢出 u64,生产环境可用 u128 中间值)
let cost = price * qty / qty_unit;
if accounts.freeze(user_id, USDT, cost) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Buy));
} else {
println!("REJECTED: Insufficient balance");
}
// 卖单:冻结 BTC
if accounts.freeze(user_id, BTC, qty) {
let result = book.add_order(Order::new(id, user_id, price, qty, Side::Sell));
}
}
这样,精度配置跟着 Symbol 走;在常规的价格与数量范围内,price * qty / qty_unit 的结果也保持在合理区间。
5. 成交结算:资金转移
当订单匹配成交后,需要在买卖双方之间转移资金:
Trade: Alice sells 1 BTC to Bob @ $100
Before:
Alice: BTC(frozen=1), USDT(avail=0)
Bob: BTC(avail=0), USDT(frozen=100)
Settlement:
Alice: consume_frozen(BTC, 1) + receive(USDT, 100)
Bob: consume_frozen(USDT, 100) + receive(BTC, 1)
After:
Alice: BTC(frozen=0), USDT(avail=100)
Bob: BTC(avail=1), USDT(frozen=0)
代码实现:
#![allow(unused)]
fn main() {
pub fn settle_trade(
&mut self,
buyer_id: u64,
seller_id: u64,
base_asset_id: u32, // 如 BTC
quote_asset_id: u32, // 如 USDT
base_amount: u64, // 成交数量
quote_amount: u64, // 成交金额 (price × qty)
) {
// Buyer: 消耗 USDT,获得 BTC
self.get_account_mut(buyer_id)
.get_balance_mut(quote_asset_id)
.consume_frozen(quote_amount);
self.get_account_mut(buyer_id)
.get_balance_mut(base_asset_id)
.receive(base_amount);
// Seller: 消耗 BTC,获得 USDT
self.get_account_mut(seller_id)
.get_balance_mut(base_asset_id)
.consume_frozen(base_amount);
self.get_account_mut(seller_id)
.get_balance_mut(quote_asset_id)
.receive(quote_amount);
}
}
6. Trade 结构的完善
为了正确结算,Trade 结构需要包含买卖双方的用户 ID:
#![allow(unused)]
fn main() {
pub struct Trade {
pub id: u64,
pub buyer_order_id: u64,
pub seller_order_id: u64,
pub buyer_user_id: u64, // 新增
pub seller_user_id: u64, // 新增
pub price: u64,
pub qty: u64,
}
}
在撮合时,从 Order 中提取 user_id 并写入 Trade:
#![allow(unused)]
fn main() {
trades.push(Trade::new(
self.trade_id_counter,
buy_order.id,
sell_order.id,
buy_order.user_id, // 从订单获取用户 ID
sell_order.user_id,
price,
trade_qty,
));
}
7. 运行结果
=== 0xInfinity: Stage 5 (User Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
分析:
- Alice 初始有 100 BTC,卖出 10+2=12 BTC,还剩 85 + 3(frozen) = 88 BTC ✓
- Alice 收到 10×100 + 2×101 = 1202 USDT,加上初始 10000 = 11202 USDT ✓
- Bob 初始有 5 BTC,买入 12 BTC = 17 BTC ✓
- Bob 花费 1202 USDT,初始 200000 - 1202 = 198798 USDT ✓
Summary
本章完成了以下工作:
- ✅ 实现 Balance 结构:avail/frozen 双状态余额管理
- ✅ 实现 UserAccount:一个用户持有多种资产余额
- ✅ 实现 AccountManager:管理所有用户账户
- ✅ 下单前资金冻结:防止超卖/超买
- ✅ 成交后资金结算:在买卖双方间正确转移资金
- ✅ 完善 Trade 结构:包含买卖双方 user_id
- ✅ 添加单元测试:4 个新测试覆盖余额管理
现在我们的撮合引擎不仅能正确匹配订单,还能确保用户有足够的资金,并在成交后正确结算!
0x06 Enforced Balance Management
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
In the previous chapter, we implemented balance management. However, in financial systems, fund operations are the most critical part and must be foolproof. This chapter upgrades balance management to a Type-System Enforced version.
1. Why “Enforced”?
The previous implementation had flaws:
#![allow(unused)]
fn main() {
// ❌ Problem 1: Public fields, easily modified unintentionally
pub struct Balance {
pub avail: u64, // Dev might assign directly, bypassing logic
pub frozen: u64,
}
// ❌ Problem 2: Returns bool, unclear error
fn freeze(&mut self, amount: u64) -> bool {
// Failed? Why? Don't know.
}
// ❌ Problem 3: No Audit Trail
// Balance changed, but no versioning for tracing.
}
These issues can lead to:
- Developers accidentally bypassing checks: In complex logic, one might modify fields directly.
- Hard to debug: “Operation failed” doesn’t tell you why.
- Audit difficulty: No change tracking makes it hard to pinpoint when a bug occurred.
Note: This is not to prevent malicious attacks (it’s an internal system), but to prevent developer errors. Just like Rust’s ownership system—we use types to reduce the chance of shooting ourselves in the foot.
2. Enforced Balance Design
The new version enforces safety via Rust Type System:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Balance {
avail: u64, // ← Private! Only accessible via methods
frozen: u64, // ← Private!
version: u64, // ← Private! Auto-increment on change
}
}
Core Principles
| Principle | Implementation |
|---|---|
| Encapsulation | All fields private, read-only getters provided |
| Explicit Error | All mutations return Result<(), &'static str> |
| Audit Trail | version auto-increments on every mutation |
| Overflow Protection | Use checked_add/sub, overflow returns Error |
Method Renaming
| Old (v0.5) | New (v0.6) | Meaning |
|---|---|---|
freeze() | lock() | More accurate: lock funds for order |
unfreeze() | unlock() | Unlock (when cancelling) |
consume_frozen() | spend_frozen() | Spend frozen funds (after match) |
receive() | deposit() | Unified deposit semantics |
3. Balance API Details
Safe Getters
#![allow(unused)]
fn main() {
impl Balance {
/// Get Available (Read-only)
pub const fn avail(&self) -> u64 { self.avail }
/// Get Frozen (Read-only)
pub const fn frozen(&self) -> u64 { self.frozen }
/// Get Total (avail + frozen)
/// Returns None on overflow (data corruption)
pub const fn total(&self) -> Option<u64> {
self.avail.checked_add(self.frozen)
}
/// Get Version (Read-only)
pub const fn version(&self) -> u64 { self.version }
}
}
Why const fn? The compiler guarantees these getters never modify state, giving the strongest read-only guarantee.
Validated Mutations
Every mutation method:
- Validates preconditions
- Uses checked arithmetic
- Returns Result
- Auto-increments version
#![allow(unused)]
fn main() {
/// Deposit: Increase Available
pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
self.avail = self.avail.checked_add(amount)
.ok_or("Deposit overflow")?; // ← Return Error on Overflow
self.version = self.version.wrapping_add(1); // ← Auto Increment
Ok(())
}
/// Lock: Avail → Frozen
pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.avail < amount {
return Err("Insufficient funds to lock"); // ← Explicit Error
}
self.avail = self.avail.checked_sub(amount)
.ok_or("Lock avail underflow")?;
self.frozen = self.frozen.checked_add(amount)
.ok_or("Lock frozen overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// Unlock: Frozen → Avail
pub fn unlock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Unlock frozen underflow")?;
self.avail = self.avail.checked_add(amount)
.ok_or("Unlock avail overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// Spend Frozen: Funds leave account after match
pub fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Spend frozen underflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
}
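Condensed into a standalone, runnable form (deposit and lock only; the chapter's full type also implements unlock and spend_frozen), the enforced API behaves like this:

```rust
// Condensed enforced Balance: Result-returning mutations and an
// auto-incrementing version, as described above.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
    version: u64,
}

impl Balance {
    fn avail(&self) -> u64 { self.avail }
    fn version(&self) -> u64 { self.version }

    fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
        self.avail = self.avail.checked_add(amount).ok_or("Deposit overflow")?;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }

    fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.avail < amount {
            return Err("Insufficient funds to lock");
        }
        self.avail -= amount;
        self.frozen += amount;
        self.version = self.version.wrapping_add(1);
        Ok(())
    }
}

fn main() {
    let mut b = Balance::default();
    b.deposit(100).unwrap();
    assert_eq!(b.lock(1_000), Err("Insufficient funds to lock")); // no version bump
    b.lock(60).unwrap();
    assert_eq!((b.avail(), b.version()), (40, 2));
    assert_eq!(b.deposit(u64::MAX), Err("Deposit overflow")); // reported, not wrapped
}
```

Note that a failed mutation leaves both the balance and the version untouched, so the audit trail only records successful changes.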
4. UserAccount Refactoring
UserAccount is also refactored:
Data Structure Change
#![allow(unused)]
fn main() {
// Old: FxHashMap
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>,
}
// New: O(1) Direct Array Indexing
pub struct UserAccount {
user_id: UserId, // Private
assets: Vec<Balance>, // Private, asset_id as index
}
}
O(1) Direct Array Indexing
#![allow(unused)]
fn main() {
// deposit() auto-creates slot
pub fn deposit(&mut self, asset_id: AssetId, amount: u64) -> Result<(), &'static str> {
let idx = asset_id as usize;
if idx >= self.assets.len() {
self.assets.resize(idx + 1, Balance::default());
}
self.assets[idx].deposit(amount)
}
// get_balance_mut() returns Result
pub fn get_balance_mut(&mut self, asset_id: AssetId) -> Result<&mut Balance, &'static str> {
self.assets.get_mut(asset_id as usize).ok_or("Asset not found")
}
}
🚀 Why Vec<Balance> offers the highest performance:
1. Cache-Friendly: Vec<Balance> is contiguous in memory, so loading one Balance pulls its neighbors into the same CPU cache line.
2. get_balance() is high frequency: each order triggers 5-10 balance checks, so O(1) indexing plus cache friendliness is critical at millions of TPS.
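A self-contained sketch of the Vec-indexed account (Balance trimmed to avail only for brevity; AssetId assumed to be u32 as in earlier chapters):

```rust
// Balance trimmed to avail only; AssetId assumed to be u32.
#[derive(Default, Clone)]
struct Balance {
    avail: u64,
}

struct UserAccount {
    assets: Vec<Balance>, // asset_id doubles as the Vec index
}

impl UserAccount {
    /// Grow the Vec on demand, then credit the slot.
    fn deposit(&mut self, asset_id: u32, amount: u64) -> Result<(), &'static str> {
        let idx = asset_id as usize;
        if idx >= self.assets.len() {
            self.assets.resize(idx + 1, Balance::default());
        }
        self.assets[idx].avail = self.assets[idx]
            .avail
            .checked_add(amount)
            .ok_or("Deposit overflow")?;
        Ok(())
    }
    /// Missing slots simply read as zero.
    fn avail(&self, asset_id: u32) -> u64 {
        self.assets.get(asset_id as usize).map(|b| b.avail).unwrap_or(0)
    }
}

fn main() {
    let mut acc = UserAccount { assets: Vec::new() };
    acc.deposit(2, 500).unwrap(); // auto-creates slots 0..=2
    assert_eq!(acc.assets.len(), 3);
    assert_eq!(acc.avail(2), 500);
    assert_eq!(acc.avail(7), 0);
}
```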
Settlement Methods
New methods dedicated to handling all settlement logic for buyer/seller in one go:
#![allow(unused)]
fn main() {
/// Buyer Settlement: Spend Quote, Gain Base, Refund unused Quote
pub fn settle_as_buyer(
&mut self,
quote_asset_id: AssetId,
base_asset_id: AssetId,
spend_quote: u64, // Consumed USDT
gain_base: u64, // Gained BTC
refund_quote: u64, // Refunded USDT
) -> Result<(), &'static str> {
// 1. Spend Quote (Frozen)
self.get_balance_mut(quote_asset_id)?.spend_frozen(spend_quote)?;
// 2. Gain Base (Available)
self.get_balance_mut(base_asset_id)?.deposit(gain_base)?;
// 3. Refund (Frozen → Available)
if refund_quote > 0 {
self.get_balance_mut(quote_asset_id)?.unlock(refund_quote)?;
}
Ok(())
}
}
5. Execution Results
=== 0xInfinity: Stage 6 (Enforced Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
=== End of Simulation ===
Results are consistent with the previous chapter, but now all operations are protected by the Type System!
6. Unit Tests
We added 8 new tests for enforced_balance. Total 16 tests passing.
test enforced_balance::tests::test_deposit ... ok
test enforced_balance::tests::test_deposit_overflow ... ok
test enforced_balance::tests::test_lock_unlock ... ok
...
test result: ok. 16 passed; 0 failed
7. Error Handling Example
With the new API, Result must be handled:
#![allow(unused)]
fn main() {
// ❌ Compile Error: Unhandled Result
balance.deposit(100);
// ✅ Correct: Propagate
balance.deposit(100)?;
// ✅ Correct: Unwrap (Only if sure)
balance.deposit(100).unwrap();
// ✅ Correct: Match
match balance.lock(1000) {
Ok(()) => println!("Locked successfully"),
Err(e) => println!("Failed to lock: {}", e),
}
}
Summary
This chapter accomplished:
- ✅ Encapsulation: Private fields prevent accidental modification.
- ✅ Result Return: All mutations return explicit errors.
- ✅ Versioning: Auto-incrementing version for auditing.
- ✅ Checked Arithmetic: Prevents overflow.
- ✅ Renaming: lock/unlock/spend_frozen are clearer.
- ✅ Settlement Helpers: settle_as_buyer/settle_as_seller.
- ✅ Asset ID: Constrained to enable the O(1) array indexing optimization.
Now our balance management is Type-Safe—the compiler prevents most balance-related bugs!
🇨🇳 中文
📦 代码变更: 查看 Diff
在上一章中,我们实现了用户账户的余额管理。但在金融系统中,资金操作是最核心、最关键的操作,必须确保万无一失。本章我们将余额管理升级为类型系统强制的安全版本。
1. 为什么需要“强制“版本?
上一章的实现存在几个隐患:
#![allow(unused)]
fn main() {
// ❌ 旧版问题1:字段是公开的,容易被无意修改
pub struct Balance {
pub avail: u64, // 开发者可能不小心直接赋值,绕过业务逻辑校验
pub frozen: u64,
}
// ❌ 旧版问题2:返回 bool,错误信息不明确
fn freeze(&mut self, amount: u64) -> bool {
// 失败了?为什么失败?不知道
}
// ❌ 旧版问题3:无审计追踪
// 余额变了,但没有版本号,无法追溯
}
这些问题可能导致:
- 开发者无意中绕过校验:在复杂的业务代码中,可能不小心直接修改公开字段
- 错误难以排查:只知道操作失败,不知道具体原因
- 审计困难:没有变更追踪,难以定位问题发生的时间点
注意:这不是防止恶意攻击(这是内部系统),而是防止开发者无意挖坑。 就像 Rust 的所有权系统一样——我们用类型系统来减少挖坑的机会。
2. 强制余额设计 (Enforced Balance)
新版本通过 Rust 类型系统 强制安全:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Balance {
avail: u64, // ← 私有!只能通过方法访问
frozen: u64, // ← 私有!
version: u64, // ← 私有!每次变更自动递增
}
}
核心原则
| 原则 | 实现方式 |
|---|---|
| 封装 | 所有字段私有,提供只读 getter |
| 显式错误 | 所有变更返回 Result<(), &'static str> |
| 审计追踪 | version 在每次变更时自动递增 |
| 溢出保护 | 使用 checked_add/sub,溢出返回错误 |
方法命名变更
| 旧版 (v0.5) | 新版 (v0.6) | 说明 |
|---|---|---|
| freeze() | lock() | 更准确:锁定资金用于订单 |
| unfreeze() | unlock() | 解锁(取消订单时) |
| consume_frozen() | spend_frozen() | 消费冻结资金(成交后) |
| receive() | deposit() | 统一为存款语义 |
3. Balance API 详解
只读方法 (Safe Getters)
#![allow(unused)]
fn main() {
impl Balance {
/// 获取可用余额 (只读)
pub const fn avail(&self) -> u64 { self.avail }
/// 获取冻结余额 (只读)
pub const fn frozen(&self) -> u64 { self.frozen }
/// 获取总余额 (avail + frozen)
/// 返回 None 表示溢出(数据损坏)
pub const fn total(&self) -> Option<u64> {
self.avail.checked_add(self.frozen)
}
/// 获取版本号 (只读)
pub const fn version(&self) -> u64 { self.version }
}
}
为什么用 const fn?这些只读方法接收 &self,本身就无法修改状态;标记为 const fn 后还能在常量上下文中调用,是最强的只读保证。
变更方法 (Validated Mutations)
每个变更方法都:
- 验证前置条件
- 使用 checked 算术
- 返回 Result
- 自动递增 version
#![allow(unused)]
fn main() {
/// 存款:增加可用余额
pub fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
self.avail = self.avail.checked_add(amount)
.ok_or("Deposit overflow")?; // ← 溢出返回错误
self.version = self.version.wrapping_add(1); // ← 自动递增
Ok(())
}
/// 锁定:可用 → 冻结
pub fn lock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.avail < amount {
return Err("Insufficient funds to lock"); // ← 明确错误信息
}
self.avail = self.avail.checked_sub(amount)
.ok_or("Lock avail underflow")?;
self.frozen = self.frozen.checked_add(amount)
.ok_or("Lock frozen overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// 解锁:冻结 → 可用
pub fn unlock(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Unlock frozen underflow")?;
self.avail = self.avail.checked_add(amount)
.ok_or("Unlock avail overflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
/// 消费冻结资金:成交后资金离开账户
pub fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
if self.frozen < amount {
return Err("Insufficient frozen funds");
}
self.frozen = self.frozen.checked_sub(amount)
.ok_or("Spend frozen underflow")?;
self.version = self.version.wrapping_add(1);
Ok(())
}
}
4. UserAccount 重构
新版 UserAccount 也进行了重构:
数据结构变更
#![allow(unused)]
fn main() {
// 旧版:使用 FxHashMap
pub struct UserAccount {
pub user_id: u64,
balances: FxHashMap<u32, Balance>,
}
// 新版:O(1) 直接数组索引
pub struct UserAccount {
user_id: UserId, // 私有
assets: Vec<Balance>, // 私有,asset_id 作为下标
}
}
O(1) 直接数组索引
#![allow(unused)]
fn main() {
// deposit() 自动创建资产槽位(唯一入口)
pub fn deposit(&mut self, asset_id: AssetId, amount: u64) -> Result<(), &'static str> {
    let idx = asset_id as usize;
    if idx >= self.assets.len() {
        self.assets.resize(idx + 1, Balance::default());
    }
    self.assets[idx].deposit(amount)
}
// get_balance_mut() 不创建槽位,返回 Result
pub fn get_balance_mut(&mut self, asset_id: AssetId) -> Result<&mut Balance, &'static str> {
    self.assets.get_mut(asset_id as usize).ok_or("Asset not found")
}
}
🚀 为什么 Vec<Balance> 直接索引是最高效选择?
1. 极佳的缓存友好性 (Cache-Friendly)
Vec<Balance> 是连续内存布局,相邻资产的 Balance 在内存中也相邻。当 CPU 读取一个 Balance 时,整个缓存行(通常 64 字节)会被加载,相邻的 Balance 数据也一并进入 L1/L2 缓存,后续访问几乎零延迟。
2. get_balance() 是高频调用函数
在撮合引擎中,每笔订单都需要多次调用 get_balance():
- 下单前检查余额
- 冻结资金
- 每笔成交结算(买方 + 卖方各 2-3 次)
- 退款未使用资金
一笔订单可能产生 5-10 次 get_balance() 调用。在高频交易场景(每秒万笔订单),这意味着每秒 5-10 万次调用。O(1) + 缓存友好对性能至关重要。
结算方法
新增专门的结算方法,一次性处理买方或卖方的所有结算:
#![allow(unused)]
fn main() {
/// 买方结算:消费 Quote,获得 Base,退款未使用的 Quote
pub fn settle_as_buyer(
&mut self,
quote_asset_id: AssetId,
base_asset_id: AssetId,
spend_quote: u64, // 消费的 USDT
gain_base: u64, // 获得的 BTC
refund_quote: u64, // 退款的 USDT
) -> Result<(), &'static str> {
// 1. 消费 Quote (Frozen)
self.get_balance_mut(quote_asset_id)?.spend_frozen(spend_quote)?;
// 2. 获得 Base (Available)
self.get_balance_mut(base_asset_id)?.deposit(gain_base)?;
// 3. 退款 (Frozen → Available)
if refund_quote > 0 {
self.get_balance_mut(quote_asset_id)?.unlock(refund_quote)?;
}
Ok(())
}
}
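文中只展示了 settle_as_buyer;总结中提到的 settle_as_seller 可以按对称逻辑草拟如下。这是一个可独立运行的最小示意(内联了简化版的 Balance 与 UserAccount),实际实现以仓库代码为准:

```rust
#[derive(Default, Clone, Copy)]
struct Balance {
    avail: u64,
    frozen: u64,
}

impl Balance {
    fn deposit(&mut self, amount: u64) -> Result<(), &'static str> {
        self.avail = self.avail.checked_add(amount).ok_or("Deposit overflow")?;
        Ok(())
    }
    fn spend_frozen(&mut self, amount: u64) -> Result<(), &'static str> {
        if self.frozen < amount {
            return Err("Insufficient frozen funds");
        }
        self.frozen -= amount;
        Ok(())
    }
}

type AssetId = u32;

struct UserAccount {
    assets: Vec<Balance>,
}

impl UserAccount {
    fn get_balance_mut(&mut self, id: AssetId) -> Result<&mut Balance, &'static str> {
        self.assets.get_mut(id as usize).ok_or("Asset not found")
    }

    /// 卖方结算:消费冻结的 Base,获得 Quote(与 settle_as_buyer 对称)
    fn settle_as_seller(
        &mut self,
        base_asset_id: AssetId,
        quote_asset_id: AssetId,
        spend_base: u64,  // 卖出的 BTC(已冻结)
        gain_quote: u64,  // 获得的 USDT
    ) -> Result<(), &'static str> {
        self.get_balance_mut(base_asset_id)?.spend_frozen(spend_base)?;
        self.get_balance_mut(quote_asset_id)?.deposit(gain_quote)?;
        Ok(())
    }
}

fn main() {
    // asset 0 = BTC(冻结 10),asset 1 = USDT
    let mut alice = UserAccount {
        assets: vec![Balance { avail: 0, frozen: 10 }, Balance::default()],
    };
    alice.settle_as_seller(0, 1, 10, 1_000).unwrap();
    assert_eq!(alice.assets[0].frozen, 0);
    assert_eq!(alice.assets[1].avail, 1_000);
    println!("settle_as_seller ok");
}
```

卖方没有"退款"一步:卖单冻结的是 Base 数量,成交多少消费多少,未成交部分在撤单时通过 unlock 返还。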
5. 运行结果
=== 0xInfinity: Stage 6 (Enforced Balance) ===
Symbol: BTC_USDT | Price: 2 decimals, Qty: 8 decimals
Cost formula: price * qty / 100000000
[0] Initial deposits...
Alice: 100.00000000 BTC, 10000.00 USDT
Bob: 5.00000000 BTC, 200000.00 USDT
[1] Alice places sell orders...
Order 1: Sell 10.00000000 BTC @ $100.00 -> New
Order 2: Sell 5.00000000 BTC @ $101.00 -> New
Alice balance: avail=85.00000000 BTC, frozen=15.00000000 BTC
[2] Bob places buy order (taker)...
Order 3: Buy 12.00000000 BTC @ $101.00 (cost: 1212.00 USDT)
Trades:
- Trade #1: 10.00000000 BTC @ $100.00
- Trade #2: 2.00000000 BTC @ $101.00
Order status: Filled
[3] Final balances:
Alice: 85.00000000 BTC (frozen: 3.00000000), 11202.00 USDT
Bob: 17.00000000 BTC, 198798.00 USDT (frozen: 0.00)
Book: Best Bid=None, Best Ask=Some("101.00")
=== End of Simulation ===
结果与前一章一致,但现在所有余额操作都通过类型系统保护!
6. 单元测试
新增 8 个 enforced_balance 测试:
$ cargo test
test result: ok. 16 passed; 0 failed
7. 错误处理示例
使用新 API 时,必须处理 Result:
#![allow(unused)]
fn main() {
// ❌ 编译警告(unused_must_use):未处理的 Result,在 #![deny(unused_must_use)] 下为硬错误
balance.deposit(100);
// ✅ 正确:显式处理
balance.deposit(100)?; // 使用 ? 传播错误
// ✅ 正确:使用 unwrap(仅在确定不会失败时)
balance.deposit(100).unwrap();
// ✅ 正确:匹配处理
match balance.lock(1000) {
Ok(()) => println!("Locked successfully"),
Err(e) => println!("Failed to lock: {}", e),
}
}
Summary
本章完成了以下工作:
- ✅ 私有字段封装:所有余额字段私有化,防止无意修改
- ✅ Result 返回类型:所有变更操作返回明确的错误信息
- ✅ 版本追踪:每次变更自动递增 version,支持审计
- ✅ Checked 算术:所有运算使用 checked_add/sub,溢出返回错误
- ✅ 方法重命名:lock/unlock/spend_frozen 语义更清晰
- ✅ 结算方法:settle_as_buyer/settle_as_seller 一站式结算
- ✅ Asset ID 约束:为未来 O(1) 直接索引优化做准备
- ✅ 16 个测试通过:包括 8 个新的 enforced_balance 测试
现在我们的余额管理是类型安全的——编译器本身就能防止大部分余额操作错误!
0x07-a Testing Framework - Correctness
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To establish a verifiable, repeatable, and traceable testing infrastructure for the matching engine.
This chapter is not just about “how to test”. More importantly, it explains “why it is designed this way”: these design decisions stem directly from real-world exchange requirements.
1. Why a Testing Framework?
1.1 The Uniqueness of Matching Engines
A matching engine is not a generic CRUD app. A single bug can lead to:
- Fund Errors: Users’ funds disappearing or inflating.
- Order Loss: Orders executed but not recorded.
- Inconsistent States: Contradictions between balances, orders, and ledgers.
Therefore, we need:
- Deterministic Testing: Same input must yield same output.
- Complete Audit: Every penny movement must be traceable.
- Fast Verification: Quickly confirm correctness after every code change.
1.2 Golden File Testing Pattern
We adopt the Golden File Pattern:
fixtures/ # Input (Fixed)
├── orders.csv
└── balances_init.csv
baseline/ # Golden Baseline (Result of first correct run, committed to git)
├── t1_balances_deposited.csv
├── t2_balances_final.csv
├── t2_ledger.csv
└── t2_orderbook.csv
output/ # Current Run Result (gitignored)
└── ...
Why this pattern?
- Determinism: Fixed seeds ensure identical random sequences.
- Version Control: Baselines are committed; any change triggers a git diff.
- Fast Feedback: Just diff baseline/ output/.
- Auditable: Baseline is the “contract”; deviations require explanation.
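The “just diff” step can be made concrete. A minimal sketch of what the verify script might do (directory and file names follow the layout above; the real test_03_verify.sh may do more):

```shell
#!/bin/sh
# Compare every golden file in baseline/ against the current run in output/.
verify() {
    baseline_dir="$1"
    output_dir="$2"
    status=0
    for f in "$baseline_dir"/*.csv; do
        name=$(basename "$f")
        if diff -q "$f" "$output_dir/$name" >/dev/null 2>&1; then
            echo "$name: MATCH"
        else
            echo "$name: DIFFER"
            status=1   # keep checking the rest, but remember the failure
        fi
    done
    return $status
}
```

Running `verify baseline output && echo "All tests passed!"` reproduces the per-file MATCH report shown in section 9.2.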
2. Precision Design: decimals vs display_decimals
2.1 Why Two Precisions?
This is the most error-prone area in exchanges. Consider this real case:
User sees: Buy 0.01 BTC @ $85,000.00
Internal store: qty=1000000 (satoshi), price=85000000000 (micro-cents)
If we confuse these layers:
- User enters 0.01, and the system treats it as 0.01 satoshi (= 0.0000000001 BTC).
- Or a user's account shows 100 BTC but actually holds 0.000001 BTC.
Solution: Clearly distinguish two layers.
2.2 Precision Layers
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Client (display_decimals) │
│ - Numbers seen by users │
│ - Can be adjusted based on business needs │
│ - E.g.: BTC displays 6 decimals (0.000001 BTC) │
└─────────────────────────────────────────────────────────────┘
↓
Auto Convert (× 10^decimals)
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Internal (decimals) │
│ - Precision for internal storage and calculation │
│ - NEVER change once set │
│ - E.g.: BTC stored with 8 decimals (satoshi) │
└─────────────────────────────────────────────────────────────┘
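The “Auto Convert (× 10^decimals)” step amounts to a checked string-to-integer scaling. A minimal sketch (an illustrative helper, not the project's actual API; it rejects inputs with more fractional digits than the asset allows):

```rust
/// Convert a user-facing decimal string into internal integer units.
/// Returns None on overflow, on unparsable input, or when the input
/// carries more precision than `decimals` allows.
fn to_internal(display: &str, decimals: u32) -> Option<u64> {
    let (int_part, frac_part) = match display.split_once('.') {
        Some((i, f)) => (i, f),
        None => (display, ""),
    };
    if frac_part.len() as u32 > decimals {
        return None; // more precision than the asset supports
    }
    let scale = 10u64.checked_pow(decimals)?;
    let int_val: u64 = int_part.parse().ok()?;
    // Pad the fractional digits up to `decimals` places.
    let frac_scale = 10u64.checked_pow(decimals - frac_part.len() as u32)?;
    let frac_val: u64 = if frac_part.is_empty() { 0 } else { frac_part.parse().ok()? };
    int_val
        .checked_mul(scale)?
        .checked_add(frac_val.checked_mul(frac_scale)?)
}

fn main() {
    // 0.01 BTC with decimals=8 → 1,000,000 satoshis
    assert_eq!(to_internal("0.01", 8), Some(1_000_000));
    assert_eq!(to_internal("1", 8), Some(100_000_000));
    assert_eq!(to_internal("0.123456789", 8), None); // 9 digits > 8 decimals
    println!("conversion ok");
}
```

Doing this conversion once at the boundary keeps every internal computation in plain u64 arithmetic.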
2.3 Configuration Design
assets_config.csv (Asset Precision Config):
asset_id,asset,decimals,display_decimals
1,BTC,8,6 # Min unit 0.000001 BTC ≈ $0.085
2,USDT,6,4 # Min unit 0.0001 USDT
3,ETH,8,4 # Min unit 0.0001 ETH ≈ $0.40
| Field | Mutability | Explanation |
|---|---|---|
| decimals | ⚠️ Never Change | Defines min unit; changing breaks all existing data. |
| display_decimals | ✅ Dynamic | Client-side precision for Quantity (qty). |
symbols_config.csv (Trading Pair Config):
symbol_id,symbol,base_asset_id,quote_asset_id,price_decimal,price_display_decimal
0,BTC_USDT,1,2,6,2 # Price min unit $0.01
1,ETH_USDT,3,2,6,2
Key Design: Precision Source
| Order Field | Precision Source | Config File |
|---|---|---|
| qty | base_asset.display_decimals | assets_config.csv |
| price | symbol.price_display_decimal | symbols_config.csv |
⚠️ Note: Price precision comes from Symbol config, NOT Quote Asset! This is because the same quote asset (e.g., USDT) may have different price precisions in different pairs.
Why can decimals never change?
Suppose BTC decimals change from 8 to 6:
- Original balance 100,000,000 (= 1 BTC with 8 decimals).
- New interpretation 100,000,000 / 10^6 = 100 BTC.
- User gains 99 BTC out of thin air!
Why can display_decimals change?
This is just the display layer:
- Original display: 0.12345678 BTC.
- New display (6 decimals): 0.123456 BTC.
- Internal storage remains 12,345,678 satoshis.
3. Balance Format: Row vs Column
3.1 Problem: Storing Multi-Asset Balances
Option A: Columnar (One column per asset)
user_id,btc_avail,btc_frozen,usdt_avail,usdt_frozen
1,10000000000,0,10000000000000,0
Option B: Row-based (One row per asset)
user_id,asset_id,avail,frozen,version
1,1,10000000000,0,0
1,2,10000000000000,0,0
3.2 Why Row-based?
| Dimension | Columnar | Row-based |
|---|---|---|
| Extensibility | ❌ Alter table to add asset | ✅ Just add a row |
| Sparse Data | ❌ Many nulls/zeros | ✅ Store only non-zero assets |
| DB Compat | ❌ Non-standard | ✅ Standard normalization |
| Genericity | ❌ Asset names hardcoded | ✅ asset_id is generic |
Real Scenario: An exchange supports 500+ assets, but a user holds only 3-5 of them on average. The row-based design stores only non-zero rows, saving ~99% of the space.
4. Timeline Snapshot Design
4.1 Why Multiple Snapshots?
Matching is a multi-stage process:
T0: Initial State (fixtures/balances_init.csv)
↓ deposit()
T1: Deposit Done (baseline/t1_balances_deposited.csv)
↓ execute orders
T2: Trading Done (baseline/t2_balances_final.csv)
Errors can occur at any stage:
- T0→T1: Is deposit logic correct?
- T1→T2: Is trade settlement correct?
Snapshots pinpoint issues:
# Verify Deposit
diff balances_init.csv t1_balances_deposited.csv
# Verify Settlement
diff t1_balances_deposited.csv t2_balances_final.csv
4.2 Naming Convention
t1_balances_deposited.csv # t1 stage, balances type, deposited state
t2_balances_final.csv # t2 stage, balances type, final state
t2_ledger.csv # t2 stage, ledger type
t2_orderbook.csv # t2 stage, orderbook type
Principle: {Time}_{Type}_{State}.csv
Benefits:
- Natural sort order by time.
- Clear content identification.
- Avoids ambiguity.
5. Settlement Ledger Design
5.1 Why Ledger?
t2_ledger.csv is the system’s Audit Log. Every penny movement is recorded here.
Without Ledger:
- User complaint: “Where did my money go?”
- Support: “Your balance is X.”
- Unanswerable: “When did it change? Why?”
With Ledger:
trade_id,user_id,asset_id,op,delta,balance_after
1,96,2,debit,849700700,9999150299300
1,96,1,credit,1000000,10001000000
Traceability:
- Trade #1 caused User #96’s USDT to decrease by 849,700,700.
- Simultaneously BTC increased by 1,000,000.
- The balance after each change is recorded (balance_after).
5.2 Why delta + after instead of before + after?
Option A: before + after
delta,balance_before,balance_after
849700700,10000000000,9999150299300
Option B: delta + after
delta,balance_after
849700700,9999150299300
Why B?
- Less Redundancy: before = after - delta, so before is derivable.
- Usefulness: We mostly verify “Is the final state correct?”.
- Clarity: Delta directly explains the change.
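Because before = after - delta, the whole ledger can be replayed and checked from just these two columns. A hedged sketch of such a consistency check (`LedgerEntry` and `verify_ledger` are illustrative names; the sign convention follows the op column above):

```rust
/// One ledger row for a single (user, asset):
/// op is "debit" (balance decreases) or "credit" (balance increases).
struct LedgerEntry {
    op: &'static str,
    delta: u64,
    balance_after: u64,
}

/// Recompute each balance_after from the previous balance and verify it
/// matches. `start` is the balance before the first entry.
fn verify_ledger(start: u64, entries: &[LedgerEntry]) -> Result<(), String> {
    let mut bal = start;
    for (i, e) in entries.iter().enumerate() {
        bal = match e.op {
            "credit" => bal.checked_add(e.delta),
            "debit" => bal.checked_sub(e.delta),
            _ => None,
        }
        .ok_or_else(|| format!("entry {i}: overflow or unknown op"))?;
        if bal != e.balance_after {
            return Err(format!("entry {i}: expected {}, got {bal}", e.balance_after));
        }
    }
    Ok(())
}

fn main() {
    // User #96's USDT leg from the sample above: starts at 10_000_000_000_000,
    // Trade #1 debits 849_700_700.
    let entries = [LedgerEntry {
        op: "debit",
        delta: 849_700_700,
        balance_after: 9_999_150_299_300,
    }];
    assert!(verify_ledger(10_000_000_000_000, &entries).is_ok());
    println!("ledger consistent");
}
```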
6. ME Orderbook Snapshot
6.1 Why Orderbook Snapshot?
After trading, the Orderbook still holds unfilled orders. These orders:
- Reside in RAM.
- Are lost if system restarts.
t2_orderbook.csv is a Full Snapshot of ME State:
order_id,user_id,side,order_type,price,qty,filled_qty,status
6,907,sell,limit,85330350000,2000000,0,New
Uses:
- Recovery: Revert Orderbook state after restart.
- Verification: Compare against theoretical expectations.
- Debugging: Check stuck orders.
6.2 Why Record All Fields?
The goal is Full Recovery. Rebuilding Order struct requires:
#![allow(unused)]
fn main() {
struct Order {
    id: u64, user_id: u64, price: u64, qty: u64, filled_qty: u64,
    side: Side, order_type: OrderType, status: OrderStatus, // enums; types shown for illustration
}
}
Missing any field prevents recovery.
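Recovery is then a straight CSV-row-to-struct mapping. A sketch of rebuilding one order from a snapshot row (field order follows the header above; side/order_type/status are kept as strings here for brevity, while the real Order uses enums):

```rust
/// Snapshot row parsed into an order (illustrative, string-typed variant).
#[derive(Debug, PartialEq)]
struct Order {
    id: u64,
    user_id: u64,
    side: String,
    order_type: String,
    price: u64,
    qty: u64,
    filled_qty: u64,
    status: String,
}

fn parse_row(line: &str) -> Option<Order> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 8 {
        return None; // every field is required for full recovery
    }
    Some(Order {
        id: f[0].parse().ok()?,
        user_id: f[1].parse().ok()?,
        side: f[2].into(),
        order_type: f[3].into(),
        price: f[4].parse().ok()?,
        qty: f[5].parse().ok()?,
        filled_qty: f[6].parse().ok()?,
        status: f[7].into(),
    })
}

fn main() {
    let o = parse_row("6,907,sell,limit,85330350000,2000000,0,New").unwrap();
    assert_eq!(o.id, 6);
    assert_eq!(o.side, "sell");
    assert_eq!(o.qty, 2_000_000);
    // A row missing the status field cannot be recovered.
    assert!(parse_row("6,907,sell,limit,85330350000,2000000,0").is_none());
    println!("recovered order #{}", o.id);
}
```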
7. Test Script Design
7.1 Modular Scripts
scripts/
├── test_01_generate.sh # Step 1: Generate Data
├── test_02_baseline.sh # Step 2: Generate Baseline
├── test_03_verify.sh # Step 3: Run & Verify
└── test_e2e.sh # Combo: Full E2E Flow
Why Modular?
- Isolated Debugging: Run only relevant steps.
- Flexible Composition: CI can verify without regenerating.
- Readability: One script, one job.
7.2 Usage
# Daily Test (Use existing baseline)
./scripts/test_e2e.sh
# Regenerate Baseline & Test
./scripts/test_e2e.sh --regenerate
8. CLI Design: --baseline Switch
8.1 Why Switch?
Default behavior:
- Output to output/
- Never overwrite the baseline
Update baseline:
- Add the --baseline arg
- Output to baseline/
Why not auto-overwrite?
- Safety: Prevent accidental baseline corruption.
- Intent: Updating baseline is a conscious decision.
- Git Friendly: Changes trigger diff.
8.2 Implementation
#![allow(unused)]
fn main() {
fn get_output_dir() -> &'static str {
let args: Vec<String> = std::env::args().collect();
if args.iter().any(|a| a == "--baseline") {
"baseline"
} else {
"output"
}
}
}
9. Execution Example
9.1 Full Flow
# 1. Generate Data
python3 scripts/generate_orders.py --orders 100000 --seed 42
# 2. Generate Baseline (First run or update)
cargo run --release -- --baseline
# 3. Daily Test
./scripts/test_e2e.sh
9.2 Verification Output
╔════════════════════════════════════════════════════════════╗
║ 0xInfinity Testing Framework - E2E Test ║
╚════════════════════════════════════════════════════════════╝
t1_balances_deposited.csv: ✅ MATCH
t2_balances_final.csv: ✅ MATCH
t2_ledger.csv: ✅ MATCH
t2_orderbook.csv: ✅ MATCH
✅ All tests passed!
10. Summary
This chapter established a complete testing infrastructure:
| Design Point | Problem Solved | Solution |
|---|---|---|
| Precision Confusion | User vs Internal precision | decimals + display_decimals |
| Asset Extension | Support N assets | Row-based balance format |
| Traceability | Where failed? | Timeline Snapshots (T0→T1→T2) |
| Fund Audit | Where funds go? | Settlement Ledger |
| State Recovery | Restart recovery | Orderbook Snapshot |
| Regression | Breaking changes? | Golden File Pattern |
| Efficiency | Fast feedback | Modular scripts |
Core Philosophy:
Testing is not an afterthought, but part of the design. A good testing framework gives you confidence when changing code.
Next section (0x07-b) will add performance benchmarks on top of this.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:为撮合引擎建立可验证、可重复、可追溯的测试基础设施。
本章不仅是“如何测试“,更重要的是理解“为什么这样设计“——这些设计决策直接源于真实交易所的需求。
1. 为什么需要测试框架?
1.1 撮合引擎的特殊性
撮合引擎不是普通的 CRUD 应用。一个 bug 就可能导致:
- 资金错误:用户资金凭空消失或增加
- 订单丢失:订单被执行但没有记录
- 状态不一致:余额、订单、成交记录互相矛盾
因此,我们需要:
- 确定性测试:相同的输入必须产生相同的输出
- 完整审计:每一分钱的变动都可追溯
- 快速验证:每次修改代码后能快速确认没有破坏正确性
1.2 Golden File 测试模式
我们采用 Golden File 模式:
fixtures/ # 输入(固定)
├── orders.csv
└── balances_init.csv
baseline/ # 黄金基准(第一次正确运行的结果,git 提交)
├── t1_balances_deposited.csv
├── t2_balances_final.csv
├── t2_ledger.csv
└── t2_orderbook.csv
output/ # 当前运行结果(gitignored)
└── ...
为什么选择这种模式?
- 确定性:固定的 seed 保证相同的随机数序列
- 版本控制:baseline 提交到 git,任何变化都能被 diff 检测
- 快速反馈:只需
diff baseline/ output/ - 可审计:baseline 是“合约“,任何偏离都需要解释
2. 精度设计:decimals vs display_decimals
2.1 为什么需要两种精度?
这是交易所最容易出错的地方。看这个真实案例:
用户看到:买入 0.01 BTC @ $85,000.00
内部存储:qty=1000000 (satoshi), price=85000000000 (微美分)
如果混淆这两层,会发生什么?
- 用户输入 0.01,系统理解为 0.01 satoshi(实际 = 0.0000000001 BTC)
- 或者用户账户显示有 100 BTC,实际只有 0.000001 BTC
解决方案:明确区分两层精度
2.2 精度层次
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Client (display_decimals) │
│ - 用户看到的数字 │
│ - 可以根据业务需求调整 │
│ - 例如:BTC 数量显示 6 位小数 (0.000001 BTC) │
└─────────────────────────────────────────────────────────────┘
↓
自动转换 (× 10^decimals)
↓
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Internal (decimals) │
│ - 内部存储和计算的精度 │
│ - 一旦设定永不改变 │
│ - 例如:BTC 存储 8 位精度 (satoshi) │
└─────────────────────────────────────────────────────────────┘
2.3 配置文件设计
assets_config.csv(资产精度配置):
asset_id,asset,decimals,display_decimals
1,BTC,8,6 # 最小单位 0.000001 BTC ≈ $0.085
2,USDT,6,4 # 最小单位 0.0001 USDT
3,ETH,8,4 # 最小单位 0.0001 ETH ≈ $0.40
| 字段 | 可变性 | 说明 |
|---|---|---|
| decimals | ⚠️ 永不改变 | 定义最小单位,改变会破坏所有现有数据 |
| display_decimals | ✅ 可动态调整 | 用于数量 (qty) 的客户端精度 |
symbols_config.csv(交易对配置):
symbol_id,symbol,base_asset_id,quote_asset_id,price_decimal,price_display_decimal
0,BTC_USDT,1,2,6,2 # 价格最小单位 $0.01
1,ETH_USDT,3,2,6,2
关键设计:精度来源
| 订单字段 | 精度来源 | 配置位置 |
|---|---|---|
| qty (数量) | base_asset.display_decimals | assets_config.csv |
| price (价格) | symbol.price_display_decimal | symbols_config.csv |
⚠️ 注意:price 精度来自 symbol 配置,不是 quote_asset! 这样设计是因为同一个 quote asset(如 USDT)在不同交易对中可能有不同的价格精度。
为什么 decimals 不能改变?
假设 BTC decimals 从 8 改为 6:
- 原来账户余额 100000000 (= 1 BTC)
- 现在变成 100000000 / 10^6 = 100 BTC
- 用户凭空获得 99 BTC!
为什么 display_decimals 可以改变?
这只是显示层,不影响存储:
- 原来显示 0.12345678 BTC
- 调整后显示 0.123456 BTC(6位)
- 内部存储仍然是 12345678 satoshi
3. 余额格式设计:行式 vs 列式
3.1 问题:如何存储多资产余额?
Option A:列式(每个资产一列)
user_id,btc_avail,btc_frozen,usdt_avail,usdt_frozen
1,10000000000,0,10000000000000,0
Option B:行式(每个资产一行)
user_id,asset_id,avail,frozen,version
1,1,10000000000,0,0
1,2,10000000000000,0,0
3.2 为什么选择行式?
| 对比维度 | 列式 | 行式 |
|---|---|---|
| 扩展性 | ❌ 添加资产需改表结构 | ✅ 直接添加新行 |
| 稀疏数据 | ❌ 大量空值 | ✅ 只存有余额的资产 |
| 数据库兼容 | ❌ 非标准化 | ✅ 标准化范式 |
| 通用性 | ❌ 资产名硬编码 | ✅ asset_id 通用 |
真实场景:交易所支持 500+ 种资产,但用户平均只持有 3-5 种。行式设计节省 99% 的存储空间。
4. 时间线快照设计
4.1 为什么需要多个快照?
撮合过程不是单一操作,而是多阶段流程:
T0: 初始状态 (fixtures/balances_init.csv)
↓ deposit()
T1: 充值完成 (baseline/t1_balances_deposited.csv)
↓ execute orders
T2: 交易完成 (baseline/t2_balances_final.csv)
每个阶段都可能出错:
- T0→T1:deposit 逻辑是否正确?
- T1→T2:交易结算是否正确?
有了快照,可以精确定位问题:
# 验证 deposit 正确性
diff balances_init.csv t1_balances_deposited.csv
# 验证交易结算正确性
diff t1_balances_deposited.csv t2_balances_final.csv
4.2 文件命名设计
t1_balances_deposited.csv # t1 阶段,balances 类型,deposited 状态
t2_balances_final.csv # t2 阶段,balances 类型,final 状态
t2_ledger.csv # t2 阶段,ledger 类型
t2_orderbook.csv # t2 阶段,orderbook 类型
命名原则:{时间点}_{数据类型}_{状态}.csv
这样的命名:
- 按时间排序时自然有序
- 一眼看出数据是什么
- 避免文件名歧义
5. Settlement Ledger 设计
5.1 为什么需要 Ledger?
t2_ledger.csv 是整个系统的审计日志。每一分钱的变动都记录在这里。
没有 Ledger 的问题:
- 用户投诉:我的钱去哪了?
- 只能说:交易后余额是 X
- 无法回答:什么时候变的?为什么变?
有了 Ledger:
trade_id,user_id,asset_id,op,delta,balance_after
1,96,2,debit,849700700,9999150299300
1,96,1,credit,1000000,10001000000
可以完整追溯:
- Trade #1 导致 User #96 的 USDT 减少 849700700
- 同时 BTC 增加 1000000
- 变化后余额是多少
5.2 为什么用 delta + after,而不是 before + after?
Option A:before + after
delta,balance_before,balance_after
849700700,10000000000,9999150299300
Option B:delta + after
delta,balance_after
849700700,9999150299300
选择 Option B 的原因:
- 冗余更少:before = after - delta,可计算得出
- after 更有用:通常我们想验证的是“最终状态对不对“
- delta 直接说明变化:不需要心算 before - after
6. ME Orderbook 快照
6.1 为什么需要 Orderbook 快照?
交易完成后,Orderbook 里仍然有未成交的挂单。这些订单:
- 在内存中
- 如果系统重启,会丢失
t2_orderbook.csv 是 ME 状态的完整快照:
order_id,user_id,side,order_type,price,qty,filled_qty,status
6,907,sell,limit,85330350000,2000000,0,New
用途:
- 状态恢复:重启后可以从快照恢复 Orderbook
- 正确性验证:与理论预期对比
- 调试:哪些订单还在挂着?
6.2 为什么记录所有字段?
快照目的是完整恢复。恢复时需要重建 Order 结构体:
#![allow(unused)]
fn main() {
struct Order {
    id: u64,
    user_id: u64,
    price: u64,
    qty: u64,
    filled_qty: u64,
    side: Side,            // 枚举;类型标注仅为示意
    order_type: OrderType,
    status: OrderStatus,
}
}
缺少任何字段都无法恢复。
7. 测试脚本设计
7.1 模块化脚本
scripts/
├── test_01_generate.sh # Step 1: 生成测试数据
├── test_02_baseline.sh # Step 2: 生成基准
├── test_03_verify.sh # Step 3: 运行并验证
└── test_e2e.sh # 组合:完整 E2E 流程
为什么模块化?
- 单独调试:出问题时只运行相关步骤
- 灵活组合:CI 可以只运行 verify,不重新生成数据
- 可读性:每个脚本做一件事
7.2 使用方式
# 日常测试(使用现有 baseline)
./scripts/test_e2e.sh
# 重新生成基准并测试
./scripts/test_e2e.sh --regenerate
8. 命令行设计:--baseline 开关
8.1 为什么需要开关?
默认行为:
- 输出到 output/
- 不会覆盖 baseline
需要更新基准时:
- 加 --baseline 参数
- 输出到 baseline/
为什么不自动覆盖?
- 安全:防止意外覆盖基准
- 意图明确:更新基准是有意识的决定
- Git 友好:baseline 变化会触发 git diff
代码实现:
#![allow(unused)]
fn main() {
fn get_output_dir() -> &'static str {
let args: Vec<String> = std::env::args().collect();
if args.iter().any(|a| a == "--baseline") {
"baseline"
} else {
"output"
}
}
}
9. 运行示例
9.1 完整流程
# 1. 生成测试数据
python3 scripts/generate_orders.py --orders 100000 --seed 42
# 2. 生成基准(首次或需要更新时)
cargo run --release -- --baseline
# 3. 日常测试
./scripts/test_e2e.sh
9.2 验证输出
╔════════════════════════════════════════════════════════════╗
║ 0xInfinity Testing Framework - E2E Test ║
╚════════════════════════════════════════════════════════════╝
t1_balances_deposited.csv: ✅ MATCH
t2_balances_final.csv: ✅ MATCH
t2_ledger.csv: ✅ MATCH
t2_orderbook.csv: ✅ MATCH
✅ All tests passed!
10. Summary
本章建立了完整的测试基础设施:
| 设计点 | 解决的问题 | 方案 |
|---|---|---|
| 精度混淆 | 用户精度 vs 内部精度 | decimals + display_decimals |
| 资产扩展 | 支持 N 种资产 | 行式余额格式 |
| 过程追溯 | 哪一步出错? | 时间线快照 (T0→T1→T2) |
| 资金审计 | 每分钱去向 | Settlement Ledger |
| 状态恢复 | 重启后恢复 | Orderbook 快照 |
| 回归测试 | 代码改动是否破坏正确性 | Golden File 模式 |
| 测试效率 | 快速反馈 | 模块化脚本 |
核心理念:
测试不是事后补的,而是设计的一部分。好的测试框架能让你在改动代码时有信心。
下一节 (0x07-b) 将在此基础上添加性能测试和优化基准。
0x07-b Performance Baseline - Initial Setup
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To establish a quantifiable, traceable, and comparable performance baseline.
Building on the testing framework from 0x07-a, this chapter adds detailed performance metric collection and analysis capabilities.
1. Why a Performance Baseline?
1.1 The Performance Trap
Optimization without a baseline is blind:
- Premature Optimization: Optimizing code that accounts for 1% of runtime.
- Delayed Regression Detection: A refactor drops performance by 50%, but it’s only discovered 3 months later.
- Unquantifiable Improvement: Claiming “it's much faster,” but exactly how much faster?
1.2 Value of a Baseline
With a baseline, you can:
- Verify before Commit: Ensure performance hasn’t degraded.
- Pinpoint Bottlenecks: Identify which component consumes the most time.
- Quantify Optimization: “Throughput increased from 30K ops/s to 100K ops/s.”
2. Metric Design
2.1 Throughput Metrics
| Metric | Explanation | Calculation |
|---|---|---|
| throughput_ops | Order Throughput | orders / exec_time |
| throughput_tps | Trade Throughput | trades / exec_time |
2.2 Time Breakdown
We decompose execution time into four components:
┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order) │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check │ Account lookup + balance validation │
│ - Account lookup │ FxHashMap O(1) │
│ - Fund locking │ Check avail >= required, then lock │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine │ book.add_order() │
│ - Price lookup │ BTreeMap O(log n) │
│ - Order matching │ iterate + partial fill │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement │ settle_as_buyer/seller │
│ - Balance update │ HashMap O(1) │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O │ write_entry() │
│ - File write │ Disk I/O │
└─────────────────────────────────────────────────────────────┘
2.3 Latency Percentiles
Sample total processing latency every N orders:
| Percentile | Meaning |
|---|---|
| P50 | Median, typical case |
| P99 | 99% of requests are faster than this |
| P99.9 | Tail latency, worst cases |
| Max | Maximum latency |
3. Initial Baseline Data
3.1 Test Environment
- Hardware: MacBook Pro M Series
- Data: 100,000 Orders, 47,886 Trades
- Mode: Release build (--release)
3.2 Throughput
Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s
3.3 Time Breakdown 🔥
=== Performance Breakdown ===
Balance Check: 17.68ms ( 0.5%) ← FxHashMap O(1)
Matching Engine: 36.04ms ( 1.0%) ← Extremely Fast!
Settlement: 4.77ms ( 0.1%) ← Negligible
Ledger I/O: 3678.68ms ( 98.4%) ← Bottleneck!
Key Findings:
- Ledger I/O consumes 98.4% of time.
- Balance Check + Matching + Settlement total only ~58ms.
- Theoretical Limit: ~1.7 Million orders/sec (without I/O).
3.4 Order Lifecycle Timeline 📊
Order Lifecycle
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Balance │ │ Matching │ │ Settlement │ │ Ledger │
│ Check │───▶│ Engine │───▶│ (Balance) │───▶│ I/O │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FxHashMap │ │ BTreeMap │ │Vec<Balance> │ │ File:: │
│ +Vec O(1) │ │ O(log n) │ │ O(1) │ │ write() │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Time: 17.68ms 36.04ms 4.77ms 3678.68ms
Percentage: 0.5% 1.0% 0.1% 98.4%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Per-Order: 0.18µs 0.36µs 0.05µs 36.79µs
Potential: 5.6M ops/s 2.8M ops/s 20M ops/s 27K ops/s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Business Logic ~58ms (1.6%) I/O ~3679ms (98.4%)
◀─────────────────────────▶ ◀───────────────────────▶
Fast ✅ Bottleneck 🔴
Analysis:
| Phase | Latency/Order | Theoretical OPS | Note |
|---|---|---|---|
| Balance Check | 0.18µs | 5.6M/s | FxHashMap Lookup + Vec O(1) |
| Matching Engine | 0.36µs | 2.8M/s | BTreeMap Price Matching |
| Settlement | 0.05µs | 20M/s | Vec<Balance> O(1) Indexing |
| Ledger I/O | 36.79µs | 27K/s | Unbuffered File Write = Bottleneck! |
E2E Result:
- Actual Throughput: ~29K orders/sec (I/O Bound)
- Theoretical Limit (No I/O): ~1.7M orders/sec (60x room for improvement!)
3.5 Latency Percentiles
=== Latency Percentiles (sampled) ===
Min: 125 ns
Avg: 34022 ns
P50: 583 ns ← Typical order < 1µs
P99: 391750 ns ← 99% of orders < 0.4ms
P99.9: 1243833 ns ← Tail latency ~1.2ms
Max: 3207875 ns ← Worst case ~3ms
4. Output Files
4.1 t2_perf.txt (Machine Readable)
# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875
4.2 t2_summary.txt (Human Readable)
Contains full execution summary and performance breakdown.
5. PerfMetrics Implementation
#![allow(unused)]
fn main() {
/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
// Timing breakdown (nanoseconds)
total_balance_check_ns: u64, // Account lookup + balance check + lock
total_matching_ns: u64, // OrderBook.add_order()
total_settlement_ns: u64, // Balance updates after trade
total_ledger_ns: u64, // Ledger file I/O
// Per-order latency samples
latency_samples: Vec<u64>,
sample_rate: usize,
}
impl PerfMetrics {
fn new(sample_rate: usize) -> Self { ... }
fn add_order_latency(&mut self, latency_ns: u64) { ... }
fn add_balance_check_time(&mut self, ns: u64) { ... }
fn add_matching_time(&mut self, ns: u64) { ... }
fn add_settlement_time(&mut self, ns: u64) { ... }
fn add_ledger_time(&mut self, ns: u64) { ... }
fn percentile(&self, p: f64) -> Option<u64> { ... }
fn min_latency(&self) -> Option<u64> { ... }
fn max_latency(&self) -> Option<u64> { ... }
fn avg_latency(&self) -> Option<u64> { ... }
}
}
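The elided percentile() can be implemented as a sort plus a nearest-rank lookup, which is a common approach for small sample sets (the project's exact rounding rule may differ):

```rust
/// Nearest-rank percentile over raw latency samples (nanoseconds).
/// Returns None when no samples were collected.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Index of the smallest sample that covers p% of the data.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1).min(sorted.len() - 1)])
}

fn main() {
    let samples: Vec<u64> = (1..=100).collect(); // fake latencies 1..100 ns
    assert_eq!(percentile(&samples, 50.0), Some(50));
    assert_eq!(percentile(&samples, 99.0), Some(99));
    assert_eq!(percentile(&samples, 100.0), Some(100));
    assert_eq!(percentile(&[], 50.0), None);
    println!("percentiles ok");
}
```

Sorting a copy on demand is fine here because percentiles are computed once at report time, not on the hot path.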
6. Optimization Roadmap
Based on baseline data, future directions:
6.1 Short Term (0x07-c)
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| Use BufWriter | 10-50x I/O | Low |
| Batch Write | 2-5x | Low |
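The BufWriter row in the table is, in spirit, a one-line change: wrap the ledger file so each write lands in a memory buffer instead of issuing a syscall per entry. A minimal sketch (the file path, buffer size, and entry format are illustrative):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("/tmp/ledger_demo.csv")?;
    // 256 KiB buffer: entries accumulate in memory and flush in large
    // chunks, instead of one write syscall per ledger line.
    let mut ledger = BufWriter::with_capacity(256 * 1024, file);
    for trade_id in 1..=1000u32 {
        writeln!(ledger, "{trade_id},96,2,debit,849700700,9999150299300")?;
    }
    ledger.flush()?; // push the tail of the buffer out before dropping
    Ok(())
}
```

The trade-off is durability: entries sitting in the buffer are lost on a crash, which is why later chapters pair buffering with explicit flush/snapshot points.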
6.2 Mid Term (0x08+)
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| Async I/O | Decouple Matching & Persistence | Medium |
| Memory Pool | Reduce Allocation | Medium |
6.3 Long Term
| Optimization | Expected Gain | Difficulty |
|---|---|---|
| DPDK/io_uring | 10x+ | High |
| FPGA | 100x+ | Extreme |
7. Commands Reference
# Run and generate performance data
cargo run --release
# Update baseline (when code changes)
cargo run --release -- --baseline
# View performance data
cat output/t2_perf.txt
# Compare performance changes
python3 scripts/compare_perf.py
compare_perf.py Output Example
╔════════════════════════════════════════════════════════════════════════╗
║ Performance Comparison Report ║
╚════════════════════════════════════════════════════════════════════════╝
Metric Baseline Current Change
───────────────────────────────────────────────────────────────────────────
Orders 100000 100000 -
Trades 47886 47886 -
Exec Time 3753.87ms 3484.37ms -7.2%
Throughput (orders) 26639/s 28700/s +7.7%
Throughput (trades) 12756/s 13743/s +7.7%
───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):
Metric Baseline Current Change OPS
Balance Check 17.68ms 16.51ms -6.6% 6.1M
Matching Engine 36.04ms 35.01ms -2.8% 2.9M
Settlement 4.77ms 5.22ms +9.4% 19.2M
Ledger I/O 3678.68ms 3411.49ms -7.3% 29K
───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):
Metric Baseline Current Change
Latency MIN 125ns 125ns +0.0%
Latency AVG 37.9µs 34.8µs -8.2%
Latency P50 584ns 541ns -7.4%
Latency P99 420.2µs 398.9µs -5.1%
Latency P99.9 1.63ms 1.24ms -24.3%
Latency MAX 9.76ms 3.53ms -63.9%
───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected
Summary
This chapter accomplished:
- PerfMetrics Structure: Collecting time breakdown & latency samples.
- Time Breakdown: Balance Check / Matching / Settlement / Ledger I/O.
- Latency Percentiles: P50 / P99 / P99.9 / Max.
- t2_perf.txt: Machine-readable baseline file.
- compare_perf.py: Tool to detect regression.
- Key Finding: Ledger I/O takes 98.4%, major bottleneck.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:建立可量化、可追踪、可比较的性能基线。
本章在 0x07-a 测试框架基础上,添加详细的性能指标收集和分析能力。
1. 为什么需要性能基线?
1.1 性能陷阱
没有基线的优化是盲目的:
- 过早优化:优化了占 1% 时间的代码
- 回归发现延迟:某次重构导致性能下降 50%,但 3 个月后才发现
- 无法量化改进:说“快了很多“,但具体快了多少?
1.2 基线的价值
有了基线,你可以:
- 每次提交前验证:性能没有下降
- 精确定位瓶颈:哪个组件消耗最多时间
- 量化优化效果:从 30K ops/s 提升到 100K ops/s
2. 性能指标设计
2.1 吞吐量指标
| 指标 | 说明 | 计算方式 |
|---|---|---|
| throughput_ops | 订单吞吐量 | orders / exec_time |
| throughput_tps | 成交吞吐量 | trades / exec_time |
2.2 时间分解
我们将执行时间分解为四个组件:
┌─────────────────────────────────────────────────────────────┐
│ Order Processing (per order) │
├─────────────────────────────────────────────────────────────┤
│ 1. Balance Check │ Account lookup + balance validation │
│ - Account lookup │ FxHashMap O(1) │
│ - Fund locking │ Check avail >= required, then lock │
├─────────────────────────────────────────────────────────────┤
│ 2. Matching Engine │ book.add_order() │
│ - Price lookup │ BTreeMap O(log n) │
│ - Order matching │ iterate + partial fill │
├─────────────────────────────────────────────────────────────┤
│ 3. Settlement │ settle_as_buyer/seller │
│ - Balance update │ HashMap O(1) │
├─────────────────────────────────────────────────────────────┤
│ 4. Ledger I/O │ write_entry() │
│ - File write │ Disk I/O │
└─────────────────────────────────────────────────────────────┘
2.3 延迟百分位数
采样每 N 个订单的总处理延迟,计算:
| 百分位数 | 含义 |
|---|---|
| P50 | 中位数,典型情况 |
| P99 | 99% 的请求低于此值 |
| P99.9 | 尾延迟,最坏情况 |
| Max | 最大延迟 |
3. 初始基线数据
3.1 测试环境
- 硬件:MacBook Pro M 系列
- 数据:100,000 订单,47,886 成交
- 模式:Release build (--release)
3.2 吞吐量
Throughput: ~29,000 orders/sec | ~14,000 trades/sec
Execution Time: ~3.5s
3.3 时间分解 🔥
=== Performance Breakdown ===
Balance Check: 17.68ms ( 0.5%) ← FxHashMap O(1)
Matching Engine: 36.04ms ( 1.0%) ← 极快!
Settlement: 4.77ms ( 0.1%) ← 几乎可忽略
Ledger I/O: 3678.68ms ( 98.4%) ← 瓶颈!
关键发现:
- Ledger I/O 占用 98.4% 的时间
- Balance Check + Matching + Settlement 总共只需 ~58ms
- 理论上限:~170 万 orders/sec(如果没有 I/O)
3.4 订单生命周期性能时间线 📊
订单生命周期 (Order Lifecycle)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Balance │ │ Matching │ │ Settlement │ │ Ledger │
│ Check │───▶│ Engine │───▶│ (Balance) │───▶│ I/O │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ FxHashMap │ │ BTreeMap │ │Vec<Balance> │ │ File:: │
│ +Vec O(1) │ │ O(log n) │ │ O(1) │ │ write() │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Time: 17.68ms 36.04ms 4.77ms 3678.68ms
Percentage: 0.5% 1.0% 0.1% 98.4%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Per-Order: 0.18µs 0.36µs 0.05µs 36.79µs
Potential: 5.6M ops/s 2.8M ops/s 20M ops/s 27K ops/s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
业务逻辑 ~58ms (1.6%) I/O ~3679ms (98.4%)
◀─────────────────────────▶ ◀───────────────────────▶
极快 ✅ 瓶颈 🔴
性能分析:
| 阶段 | 每订单延迟 | 理论 OPS | 说明 |
|---|---|---|---|
| Balance Check | 0.18µs | 5.6M/s | FxHashMap 账户查找 + Vec O(1) 余额索引 |
| Matching Engine | 0.36µs | 2.8M/s | BTreeMap 价格匹配 |
| Settlement | 0.05µs | 20M/s | Vec<Balance> O(1) 直接索引 |
| Ledger I/O | 36.79µs | 27K/s | unbuffered 文件写入 = 瓶颈! |
E2E 结果:
- 实际吞吐量: ~29K orders/sec (受限于 Ledger I/O)
- 理论上限 (无 I/O): ~1.7M orders/sec (60x 提升空间!)
3.5 延迟百分位数
=== Latency Percentiles (sampled) ===
Min: 125 ns
Avg: 34022 ns
P50: 583 ns ← 典型订单 < 1µs
P99: 391750 ns ← 99% 的订单 < 0.4ms
P99.9: 1243833 ns ← 尾延迟 ~1.2ms
Max: 3207875 ns ← 最坏 ~3ms
4. 输出文件
4.1 t2_perf.txt(机器可读)
# Performance Baseline - 0xInfinity
# Generated: 2025-12-16
orders=100000
trades=47886
exec_time_ms=3451.78
throughput_ops=28971
throughput_tps=13873
matching_ns=32739014
settlement_ns=3085409
ledger_ns=3388134698
latency_min_ns=125
latency_avg_ns=34022
latency_p50_ns=583
latency_p99_ns=391750
latency_p999_ns=1243833
latency_max_ns=3207875
4.2 t2_summary.txt(人类可读)
包含完整的执行摘要和性能分解。
5. PerfMetrics 实现
#![allow(unused)]
fn main() {
/// Performance metrics for execution analysis
#[derive(Default)]
struct PerfMetrics {
// Timing breakdown (nanoseconds)
total_balance_check_ns: u64, // Account lookup + balance check + lock
total_matching_ns: u64, // OrderBook.add_order()
total_settlement_ns: u64, // Balance updates after trade
total_ledger_ns: u64, // Ledger file I/O
// Per-order latency samples
latency_samples: Vec<u64>,
sample_rate: usize,
}
impl PerfMetrics {
fn new(sample_rate: usize) -> Self { ... }
fn add_order_latency(&mut self, latency_ns: u64) { ... }
fn add_balance_check_time(&mut self, ns: u64) { ... }
fn add_matching_time(&mut self, ns: u64) { ... }
fn add_settlement_time(&mut self, ns: u64) { ... }
fn add_ledger_time(&mut self, ns: u64) { ... }
fn percentile(&self, p: f64) -> Option<u64> { ... }
fn min_latency(&self) -> Option<u64> { ... }
fn max_latency(&self) -> Option<u64> { ... }
fn avg_latency(&self) -> Option<u64> { ... }
}
}
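上面签名中的 `percentile` 省略了实现,可以按最近邻排名法(nearest-rank)来写,示意如下(仅为演示思路的草图,并非项目实际代码;入参对应 `latency_samples` 中的原始延迟样本):

```rust
/// Nearest-rank percentile over raw latency samples (sorts a copy).
/// `p` is in [0.0, 100.0]; returns None when there are no samples.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p/100 * n), clamped into [1, n]; the index is rank - 1
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.saturating_sub(1).min(sorted.len() - 1);
    Some(sorted[idx])
}
```

例如对 1..=100 ns 的样本,`percentile(&samples, 99.0)` 返回 `Some(99)`。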
6. 优化路线图
基于基线数据,后续优化方向:
6.1 短期(0x07-c)
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| 使用 BufWriter | 10-50x I/O | 低 |
| 批量写入 | 2-5x | 低 |
6.2 中期(0x08+)
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| 异步 I/O | 解耦撮合和持久化 | 中 |
| 内存池 | 减少分配 | 中 |
6.3 长期
| 优化点 | 预期提升 | 难度 |
|---|---|---|
| DPDK/io_uring | 10x+ | 高 |
| FPGA | 100x+ | 极高 |
7. 命令参考
# 运行并生成性能数据
cargo run --release
# 更新基线(当代码变化时)
cargo run --release -- --baseline
# 查看性能数据
cat output/t2_perf.txt
# 对比性能变化
python3 scripts/compare_perf.py
compare_perf.py 输出示例
╔════════════════════════════════════════════════════════════════════════╗
║ Performance Comparison Report ║
╚════════════════════════════════════════════════════════════════════════╝
Metric Baseline Current Change
───────────────────────────────────────────────────────────────────────────
Orders 100000 100000 -
Trades 47886 47886 -
Exec Time 3753.87ms 3484.37ms -7.2%
Throughput (orders) 26639/s 28700/s +7.7%
Throughput (trades) 12756/s 13743/s +7.7%
───────────────────────────────────────────────────────────────────────────
Timing Breakdown (lower is better):
Metric Baseline Current Change OPS
Balance Check 17.68ms 16.51ms -6.6% 6.1M
Matching Engine 36.04ms 35.01ms -2.8% 2.9M
Settlement 4.77ms 5.22ms +9.4% 19.2M
Ledger I/O 3678.68ms 3411.49ms -7.3% 29K
───────────────────────────────────────────────────────────────────────────
Latency Percentiles (lower is better):
Metric Baseline Current Change
Latency MIN 125ns 125ns +0.0%
Latency AVG 37.9µs 34.8µs -8.2%
Latency P50 584ns 541ns -7.4%
Latency P99 420.2µs 398.9µs -5.1%
Latency P99.9 1.63ms 1.24ms -24.3%
Latency MAX 9.76ms 3.53ms -63.9%
───────────────────────────────────────────────────────────────────────────
✅ No significant regressions detected
Summary
本章完成了以下工作:
- PerfMetrics 结构:收集时间分解和延迟样本
- 时间分解:Balance Check / Matching / Settlement / Ledger I/O
- 延迟百分位数:P50 / P99 / P99.9 / Max
- t2_perf.txt:机器可读的性能基线文件
- compare_perf.py:对比工具,检测性能回归
- 关键发现:Ledger I/O 占 98.4%,是主要瓶颈
0x08-a Trading Pipeline Design
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: To design a complete trading pipeline architecture that ensures order persistence, balance consistency, and system recoverability.
This chapter addresses the most critical design issues in a matching engine: Service Partitioning, Data Flow, and Atomicity Guarantees.
1. Why Persistence?
1.1 The Problem Scenario
Suppose the system crashes during matching:
User A sends Buy Order → ME receives & fills → System Crash
↓
User A's funds deducted
But no trade record
Order Lost!
Consequences of No Persistence:
- Order Loss: User orders vanish.
- Inconsistent State: Funds changed but no record exists.
- Unrecoverable: Upon restart, valid orders are unknown.
1.2 Solution: Persist First, Match Later
User A Buy Order → WAL Persist → ME Match → System Crash
↓ ↓
Order Saved Replay & Recover!
2. Unique Ordering
2.1 Why Unique Ordering?
In distributed systems, multiple nodes must agree on order sequence:
| Scenario | Problem |
|---|---|
| Node A receives Order 1 then Order 2 | |
| Node B receives Order 2 then Order 1 | Inconsistent Order! |
Result: Matching results differ between nodes!
2.2 Solution: Single Sequencer + Global Sequence ID
All Orders → Sequencer → Assign Global sequence_id → Persist → Dispatch to ME
↓
Unique Arrival Order
| Field | Description |
|---|---|
| sequence_id | Monotonically increasing global ID |
| timestamp | Nanosecond precision timestamp |
| order_id | Business-level Order ID |
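The sequencer above can be sketched as a single-threaded counter that stamps every incoming order on arrival (a minimal illustration; `Sequencer` and `SequencedOrder` are hypothetical names, not the project's actual types):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical envelope type for illustration only.
struct SequencedOrder {
    sequence_id: u64,  // monotonically increasing global ID
    timestamp_ns: u64, // nanosecond arrival timestamp
    order_id: u64,     // business-level order ID
}

struct Sequencer {
    next_seq: u64,
}

impl Sequencer {
    fn new() -> Self {
        Sequencer { next_seq: 1 }
    }

    /// Stamp an incoming order with the next global sequence_id.
    /// Single-threaded, so ordering is unique by construction.
    fn assign(&mut self, order_id: u64) -> SequencedOrder {
        let seq = self.next_seq;
        self.next_seq += 1;
        let ts = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_nanos() as u64;
        SequencedOrder { sequence_id: seq, timestamp_ns: ts, order_id }
    }
}
```

Because a single sequencer assigns every `sequence_id`, all downstream nodes replay orders in exactly the same sequence.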
3. Order Lifecycle
3.1 Persist First, Execute Later
┌─────────────────────────────────────────────────────────────────────────┐
│ Order Lifecycle │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Gateway │───▶│Pre-Check│───▶│ WAL │───▶│ ME │ │
│ │(Receiver)│ │(Balance) │ │(Persist)│ │ (Match) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Receive Order Insufficient? Disk Write Execute Match │
│ Early Reject Assign SeqID Guaranteed Exec │
│ │
└─────────────────────────────────────────────────────────────────────────┘
3.2 Pre-Check: Reducing Invalid Orders
Pre-Check queries UBSCore (User Balance Core Service) for balance info. Read-Only, No Side Effects.
#![allow(unused)]
fn main() {
async fn pre_check(order: Order) -> Result<Order, Reject> {
// 1. Query UBSCore for balance (Read-Only)
let balance = ubscore.query_balance(order.user_id, asset);
// 2. Calculate required amount
let required = match order.side {
Buy => order.price * order.qty / QTY_UNIT, // quote
Sell => order.qty, // base
};
// 3. Balance Check (Read-Only, No Lock)
if balance.avail < required {
return Err(Reject::InsufficientBalance);
}
// 4. Pass
Ok(order)
}
// Note: Balance might be consumed by others between Pre-Check and WAL.
// This is allowed; WAL's Balance Lock will handle it.
}
Why Pre-Check?
The Core Flow (WAL + Balance Lock + Matching) is expensive. We must filter garbage orders fast.
| No Pre-Check | With Pre-Check |
|---|---|
| Garbage enters core flow | Filters most invalid orders |
| Core wastes latency on invalid orders | Core processes mostly valid orders |
| Vulnerable to spam attacks | Reduces impact of malicious requests |
Pre-Check Items:
- ✅ Balance Check
- 📋 User Status (Banned?)
- 📋 Format Validation
- 📋 Rate Limiting
- 📋 Risk Rules
3.3 Must Execute Once Persisted
Once an order is persisted, it MUST end in one of these states:
┌─────────────────────┐
│ Order Persisted │
└─────────────────────┘
│
├──▶ Filled
├──▶ PartialFilled
├──▶ New (Booked)
├──▶ Cancelled
├──▶ Expired
└──▶ Rejected (Insufficient Balance) ← Valid Final State!
❌ Never: Logged but state unknown.
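This invariant can be captured as an exhaustive state type, so "state unknown" is simply unrepresentable once an order resolves (a sketch; the variant names mirror the list above, not necessarily the project's real enum):

```rust
/// Hypothetical final-state enum mirroring the list above.
#[derive(Debug, PartialEq)]
enum FinalState {
    Filled,
    PartialFilled,
    New,       // booked, resting on the order book
    Cancelled,
    Expired,
    Rejected,  // insufficient balance — a valid final state
}

/// Every persisted order must resolve to Some(final_state);
/// None ("logged but state unknown") is treated as an invariant violation.
fn resolve(state: Option<FinalState>) -> Result<FinalState, &'static str> {
    state.ok_or("invariant violated: persisted order with unknown state")
}
```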
4. WAL: Why Is It the Best Choice?
4.1 What is WAL (Write-Ahead Log)?
WAL is an Append-Only log structure:
┌─────────────────────────────────────────────────────────────────┐
│ WAL File │
├─────────────────────────────────────────────────────────────────┤
│ Entry 1 │ Entry 2 │ Entry 3 │ Entry 4 │ ... │ ← Append│
│ (seq=1) │ (seq=2) │ (seq=3) │ (seq=4) │ │ │
└─────────────────────────────────────────────────────────────────┘
↑
Append Only!
4.2 Why WAL for HFT?
| Method | Write Pattern | Latency | Throughput | HFT Suitability |
|---|---|---|---|---|
| DB (MySQL) | Random + Txn | ~1-10ms | ~1K ops/s | ❌ Too Slow |
| KV (Redis) | Random | ~0.1-1ms | ~10K ops/s | ⚠️ Average |
| WAL | Sequential | ~1-10µs | ~1M ops/s | ✅ Best |
Why is WAL fast?
- Sequential Write vs Random Write:
- HDD: No seek time (~10ms saved).
- SSD: Reduces Write Amplification.
- Result: 10-100x faster.
- No Transaction Overhead:
- DB: Txn start, lock, redo log, data page, binlog, commit…
- WAL: Serialize -> Append -> (Optional) Fsync.
- Group Commit:
  - Batch multiple writes into one fsync.
#![allow(unused)]
fn main() {
// Group Commit Logic
pub fn flush(&mut self) -> io::Result<()> {
self.file.write_all(&self.buffer)?;
self.file.sync_data()?; // fsync once for N orders
self.buffer.clear();
Ok(())
}
}
5. Single Thread + Lock-Free Architecture
5.1 Why Single Thread?
Intuition: Concurrency = Fast. Reality in HFT: Single Thread is Faster.
| Multi-Thread | Single Thread |
|---|---|
| Locks & Contention | Lock-Free |
| Cache Invalidation | Cache Friendly |
| Context Switch Overhead | No Context Switch |
| Hard Ordering | Naturally Ordered |
| Complex Sync Logic | Simple Code |
5.2 Mechanical Sympathy
CPU Cache Hierarchy:
- L1 Cache: ~1ns
- L2 Cache: ~4ns
- RAM: ~100ns
Single Thread Advantage: Data stays in L1/L2 (Hot). No cache line contention.
5.3 LMAX Disruptor Pattern
Originating from LMAX Exchange (6M TPS on single thread):
- Single Writer (Avoid write contention)
- Pre-allocated Memory (Avoid GC/malloc)
- Cache Padding (Avoid false sharing)
- Batch Consumption
6. Ring Buffer: Inter-Service Communication
6.1 Why Ring Buffer?
| Method | Latency | Throughput |
|---|---|---|
| HTTP/gRPC | ~1ms | ~10K/s |
| Kafka | ~1-10ms | ~1M/s |
| Shared Memory Ring Buffer | ~100ns | ~10M/s |
6.2 Ring Buffer Principle
write_idx read_idx
↓ ↓
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 8 │ 9 │10 │11 │12 │13 │14 │ 0 │ 1 │ 2 │ ...
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
↑ ↑
New Data Consumer
- Fixed size, circular.
- Zero allocation during runtime.
- SPSC (Single Producer Single Consumer) is lock-free.
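The index arithmetic above can be sketched with a power-of-two capacity and a bitmask (a simplified single-threaded illustration of the wrap-around math; a real SPSC queue such as crossbeam's `ArrayQueue` manages the two indices with atomics):

```rust
/// Minimal single-threaded ring buffer illustrating wrap-around indexing.
/// Capacity must be a power of two so `& mask` replaces the modulo.
struct RingBuffer<T> {
    slots: Vec<Option<T>>,
    mask: usize,
    write_idx: usize, // total items ever pushed
    read_idx: usize,  // total items ever popped
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        RingBuffer {
            slots: (0..capacity).map(|_| None).collect(),
            mask: capacity - 1,
            write_idx: 0,
            read_idx: 0,
        }
    }

    /// Non-blocking push; returns the item back when the buffer is full.
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.write_idx - self.read_idx == self.slots.len() {
            return Err(item); // full
        }
        let slot = self.write_idx & self.mask; // wrap around
        self.slots[slot] = Some(item);
        self.write_idx += 1;
        Ok(())
    }

    /// Non-blocking pop; None when empty.
    fn pop(&mut self) -> Option<T> {
        if self.read_idx == self.write_idx {
            return None; // empty
        }
        let slot = self.read_idx & self.mask;
        self.read_idx += 1;
        self.slots[slot].take()
    }
}
```

The indices only ever increase; masking them onto the fixed slot array produces the circular behaviour shown in the diagram, with zero allocation after construction.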
7. Overall Architecture
7.1 Core Services
| Service | Responsibility | State |
|---|---|---|
| Gateway | Receive Requests | Stateless |
| Pre-Check | Read-only Balance Check | Stateless |
| UBSCore | Balance Ops + Order WAL | Stateful (Balance) |
| ME | Matching, Generate Trades | Stateful (OrderBook) |
| Settlement | Persist Events | Stateless |
7.2 UBSCore Service (User Balance Core)
Single Entry Point for ALL Balance Operations.
Why UBSCore?
- Atomic: Single thread = No Double Spend.
- Audit: Complete trace of all changes.
- Recovery: Single WAL restores state.
Pipeline Role:
- Write Order WAL (Persist)
- Lock Balance
- Success → Forward to ME
- Fail → Rejected
- Handle Trade Events (Settlement)
- Update buyer/seller balances.
7.3 Matching Engine (ME)
ME is Pure Matching. It ignores Balances.
- Does: Maintain OrderBook, Match by Price/Time, Generate Trade Events.
- Does NOT: Check balance, lock funds, persist data.
Trade Events Drive Balance Updates:
TradeEvent contains {price, qty, user_ids} → sufficient to calculate balance changes.
7.4 Settlement Service
Settlement Persists, does not modify Balances.
- Persist Trade Events, Order Events.
- Write Audit Log (Ledger).
7.5 Architecture Diagram
┌──────────────────────────────────────────────────────────────────────────────────┐
│ 0xInfinity HFT Architecture │
├──────────────────────────────────────────────────────────────────────────────────┤
│ Client Orders │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Gateway │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ query balance │
│ │ Pre-Check │ ──────────────────────────────▶ UBSCore Service │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Order Buffer │ │ Balance State │ │
│ └──────┬───────┘ │ (RAM, Single Thd) │ │
│ │ Ring Buffer └────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ UBSCore: Order Processing │ Operations: │
│ │ 1. Write Order WAL (Persist) │ - lock / unlock │
│ │ 2. Lock Balance │ - spend_frozen │
│ │ - OK → forward to ME │ - deposit │
│ │ - Fail → Rejected │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (valid orders) │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Matching Engine (ME) │ │
│ │ │ │
│ │ Pure Matching, Ignore Balance │ │
│ │ Output: Trade Events │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (Trade Events) │
│ ┌───────┴────────┐ │
│ ▼ ▼ │
│ ┌───────────┐ ┌─────────────────────────┐ │
│ │ Settlement│ │ Balance Update Events │────▶ Execute Balance Update │
│ │ │ │ (from Trade Events) │ │
│ │ Persist: │ └─────────────────────────┘ │
│ │ - Trades │ │
│ │ - Ledger │ │
│ └───────────┘ │
└───────────────────────────────────────────────────────────────────────────────────┘
7.7 Event Sourcing + Pure State Machine
Order WAL = Single Source of Truth
State(t) = Replay(Order_WAL[0..t])
Any state (Balance, OrderBook) can be 100% reconstructed by replaying the Order WAL.
Pure State Machines:
- UBSCore: Order Events → Balance Events (Deterministic)
- ME: Valid Orders → Trade Events (Deterministic)
Recovery Flow:
- Load Checkpoint (Snapshot).
- Replay Order WAL from checkpoint.
- ME re-matches and generates events.
- UBSCore applies balance updates.
- System Restored.
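The formula `State(t) = Replay(Order_WAL[0..t])` amounts to a deterministic fold over the log. A toy sketch (the events and the balance map are stand-ins for UBSCore's real state machine, not the project's actual types):

```rust
use std::collections::HashMap;

// Toy events standing in for WAL entries (illustration only).
enum Event {
    Deposit { user: u64, amount: u64 },
    Lock { user: u64, amount: u64 },
}

/// Deterministic replay: the same WAL always yields the same state.
/// Values are (avail, frozen); toy code assumes sufficient avail on Lock.
fn replay(wal: &[Event]) -> HashMap<u64, (u64, u64)> {
    let mut balances: HashMap<u64, (u64, u64)> = HashMap::new();
    for event in wal {
        match event {
            Event::Deposit { user, amount } => {
                balances.entry(*user).or_insert((0, 0)).0 += amount;
            }
            Event::Lock { user, amount } => {
                let b = balances.entry(*user).or_insert((0, 0));
                b.0 -= amount; // move avail → frozen
                b.1 += amount;
            }
        }
    }
    balances
}
```

Because `replay` is a pure function of the log, replaying the same WAL twice always reconstructs the identical state, which is exactly what makes checkpoint + replay recovery work.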
8. Summary
Core Decisions:
- Persist First: WAL ensures recoverability.
- Pre-Check: Filters invalid orders early.
- Single Thread + Lock-Free: Avoids contention, maximizes throughput.
- UBSCore: Centralized, atomic balance management.
- Responsibility Segregation: UBSCore (Money), ME (Match), Settlement (Log).
Refactoring: For the upcoming implementation, we refactored the code structure:
lib.rs, main.rs, core_types.rs, config.rs, orderbook.rs, balance.rs, engine.rs, csv_io.rs, ledger.rs, perf.rs
Next: Detailed implementation of UBSCore and Ring Buffer.
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目的:设计完整的交易流水线架构,确保订单持久化、余额一致性和系统可恢复性。
本章解决撮合引擎最关键的设计问题:服务划分、数据流和原子性保证。
1. 为什么需要持久化?
1.1 问题场景
假设系统在撮合过程中崩溃:
用户 A 发送买单 → ME 接收并成交 → 系统崩溃
↓
用户 A 的钱扣了
但没有成交记录
订单丢失!
没有持久化的后果:
- 订单丢失:用户下的单消失了
- 状态不一致:资金变动了但没有记录
- 无法恢复:重启后不知道有哪些订单
1.2 解决方案:先持久化,后撮合
用户 A 发送买单 → WAL 持久化 → ME 撮合 → 系统崩溃
↓ ↓
订单已保存 可以重放恢复!
2. 唯一排序 (Unique Ordering)
2.1 为什么需要唯一排序?
在分布式系统中,多个节点必须对订单顺序达成一致:
| 场景 | 问题 |
|---|---|
| 节点 A 先收到订单 1,再收到订单 2 | |
| 节点 B 先收到订单 2,再收到订单 1 | 顺序不一致! |
结果:两个节点的撮合结果可能不同!
2.2 解决方案:单点排序 + 全局序号
所有订单 → Sequencer → 分配全局 sequence_id → 持久化 → 分发到 ME
↓
唯一的到达顺序
| 字段 | 说明 |
|---|---|
| sequence_id | 单调递增的全局序号 |
| timestamp | 精确到纳秒的时间戳 |
| order_id | 业务层订单 ID |
3. 订单生命周期
3.1 先持久化,后执行
┌─────────────────────────────────────────────────────────────────────────┐
│ 订单生命周期 │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Gateway │───▶│Pre-Check│───▶│ WAL │───▶│ ME │ │
│ │(接收订单)│ │(余额校验)│ │ (持久化)│ │ (撮合) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ 接收订单 余额不足? 写入磁盘 执行撮合 │
│ 提前拒绝 分配seq_id 保证执行 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
3.2 Pre-Check:减少无效订单
Pre-Check 通过查询 UBSCore (User Balance Core Service,用户余额核心服务,详见第 7.2 节) 获取余额信息,只读,无副作用:
#![allow(unused)]
fn main() {
async fn pre_check(order: Order) -> Result<Order, Reject> {
// 1. 查询 UBSCore 获取余额 (只读查询)
let balance = ubscore.query_balance(order.user_id, asset);
// 2. 计算所需金额
let required = match order.side {
Buy => order.price * order.qty / QTY_UNIT, // quote
Sell => order.qty, // base
};
// 3. 余额检查 (只读,不锁定)
if balance.avail < required {
return Err(Reject::InsufficientBalance);
}
// 4. 检查通过,放行订单到下一阶段
Ok(order)
}
// 注意:Pre-Check 不锁定余额!
// 余额可能在 Pre-Check 和 WAL 之间被其他订单消耗
// 这是允许的,WAL 后的 Balance Lock 会处理这种情况
}
为什么需要 Pre-Check?
核心流程(WAL 持久化、Balance Lock、撮合)的延迟成本很高。 用户可能提交大量垃圾订单,我们需要最快速地预过滤,减少进入核心流程的订单量。
| 不 Pre-Check | 有 Pre-Check |
|---|---|
| 垃圾订单直接进入核心流程 | 快速过滤大部分无效订单 |
| 核心流程处理无效订单,浪费延迟 | 核心流程只处理可能有效的订单 |
| 系统容易被刷单攻击 | 减少恶意请求的影响 |
Pre-Check 可以包含多种快速检查:
- ✅ 余额检查(当前实现)
- 📋 用户状态检查(是否被禁用)
- 📋 订单格式校验
- 📋 频率限制 (Rate Limit)
- 📋 风控规则(未来扩展)
重要:Pre-Check 是"尽力而为"的过滤器,不保证 100% 准确。通过 Pre-Check 的订单,仍可能在 WAL + Balance Lock 阶段被拒绝。
3.3 一旦持久化,必须完整执行
订单被持久化后,无论发生什么,都必须有以下其中一个结果:
┌─────────────────────┐
│ 订单已持久化 │
└─────────────────────┘
│
├──▶ 成交 (Filled)
├──▶ 部分成交 (PartialFilled)
├──▶ 挂单中 (New)
├──▶ 用户取消 (Cancelled)
├──▶ 系统过期 (Expired)
└──▶ 余额不足被拒绝 (Rejected) ← 也是合法的终态!
❌ 绝对不能:订单消失 / 状态未知
4. WAL:为什么是最佳选择?
4.1 什么是 WAL (Write-Ahead Log)?
WAL 是一种追加写 (Append-Only) 的日志结构:
┌─────────────────────────────────────────────────────────────────┐
│ WAL File │
├─────────────────────────────────────────────────────────────────┤
│ Entry 1 │ Entry 2 │ Entry 3 │ Entry 4 │ ... │ ← 追加 │
│ (seq=1) │ (seq=2) │ (seq=3) │ (seq=4) │ │ │
└─────────────────────────────────────────────────────────────────┘
↑
只追加,不修改
4.2 为什么 WAL 是 HFT 最佳实践?
| 持久化方式 | 写入模式 | 延迟 | 吞吐量 | HFT 适用性 |
|---|---|---|---|---|
| 数据库 (MySQL/Postgres) | 随机写 + 事务 | ~1-10ms | ~1K ops/s | ❌ 太慢 |
| KV 存储 (Redis/RocksDB) | 随机写 | ~0.1-1ms | ~10K ops/s | ⚠️ 一般 |
| WAL 追加写 | 顺序写 | ~1-10µs | ~1M ops/s | ✅ 最佳 |
为什么 WAL 这么快?
- 顺序写 vs 随机写:
- 机械硬盘不用寻道。
- SSD 减少写放大。
- 结果:快 10-100 倍。
- 无事务开销:
- 无需锁、redo log、binlog 等数据库复杂机制。
- 批量刷盘 (Group Commit):
- 合并多次写入一次 fsync。
5. 单线程 + Lock-Free 架构
5.1 为什么选择单线程?
大多数人直觉认为:并发 = 快。但在 HFT 领域,单线程往往更快:
| 多线程 | 单线程 |
|---|---|
| 需要锁保护共享状态 | 无锁,无竞争 |
| 缓存失效 (cache invalidation) | 缓存友好 |
| 上下文切换开销 | 无切换开销 |
| 顺序难以保证 | 天然有序 |
| 复杂的同步逻辑 | 代码简单直观 |
5.2 Mechanical Sympathy
CPU Cache Hierarchy:
- L1 Cache: ~1ns
- L2 Cache: ~4ns
- RAM: ~100ns
单线程优势:数据始终在 L1/L2 缓存中(热数据),无 cache line 争用。
5.3 LMAX Disruptor 模式
这种单线程 + Ring Buffer 的架构源自 LMAX Exchange(伦敦多资产交易所),号称能在单线程上处理 600 万订单/秒:
- Single Writer (避免写竞争)
- Pre-allocated Memory (避免 GC/malloc)
- Cache Padding (避免 false sharing)
- Batch Consumption
6. Ring Buffer:服务间通信
6.1 为什么使用 Ring Buffer?
服务间通信的选择:
| 方式 | 延迟 | 吞吐量 |
|---|---|---|
| HTTP/gRPC | ~1ms | ~10K/s |
| Kafka | ~1-10ms | ~1M/s |
| Shared Memory Ring Buffer | ~100ns | ~10M/s |
6.2 Ring Buffer 原理
write_idx read_idx
↓ ↓
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 8 │ 9 │ 10│ 11│ 12│ 13│ 14│ 15│ 0 │ 1 │ ...
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
↑ ↑
新数据写入 消费者读取
- 固定大小,循环使用
- 无需动态分配
- Single Producer, Single Consumer (SPSC) 可完全无锁
7. 整体架构
7.1 核心服务
| 服务 | 职责 | 状态 |
|---|---|---|
| Gateway | 接收客户端请求 | 无状态 |
| Pre-Check | 只读查询余额,过滤无效订单 | 无状态 |
| UBSCore | 所有余额操作 + Order WAL | 有状态 (余额) |
| ME | 纯撮合,生成 Trade Events | 有状态 (OrderBook) |
| Settlement | 持久化 events,未来写 DB | 无状态 |
7.2 UBSCore Service (User Balance Core)
UBSCore 是所有账户余额操作的唯一入口,单线程执行保证原子性。
应用场景:
- Write Order WAL (持久化)
- Lock Balance (锁定)
- Handle Trade Events (成交后结算)
7.3 Matching Engine (ME)
ME 是纯撮合引擎,不关心余额。
- 负责:维护 OrderBook,撮合,生成 Trade Events。
- 不负责:检查余额,锁定资金,持久化。
Trade Event 驱动余额更新:
TradeEvent 包含 {price, qty, user_ids},足够计算出余额变化。
7.4 Settlement Service
Settlement 负责持久化,不修改余额。
- 持久化 Trade Events,Order Events。
- 写审计日志 (Ledger)。
7.5 完整架构图
┌──────────────────────────────────────────────────────────────────────────────────┐
│ 0xInfinity HFT Architecture │
├──────────────────────────────────────────────────────────────────────────────────┤
│ Client Orders │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Gateway │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ query balance │
│ │ Pre-Check │ ──────────────────────────────▶ UBSCore Service │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Order Buffer │ │ Balance State │ │
│ └──────┬───────┘ │ (RAM, Single Thd) │ │
│ │ Ring Buffer └────────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ UBSCore: Order Processing │ Operations: │
│ │ 1. Write Order WAL (持久化) │ - lock / unlock │
│ │ 2. Lock Balance │ - spend_frozen │
│ │ - OK → forward to ME │ - deposit │
│ │ - Fail → Rejected │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (valid orders) │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Matching Engine (ME) │ │
│ │ │ │
│ │ 纯撮合,不关心 Balance │ │
│ │ 输出: Trade Events │ │
│ └──────────────┬───────────────────────────┘ │
│ │ Ring Buffer (Trade Events) │
│ ┌───────┴────────┐ │
│ ▼ ▼ │
│ ┌───────────┐ ┌─────────────────────────┐ │
│ │ Settlement│ │ Balance Update Events │────▶ 执行余额更新 │
│ │ │ │ (from Trade Events) │ │
│ │ 持久化: │ └─────────────────────────┘ │
│ │ - Trades │ │
│ │ - Ledger │ │
│ └───────────┘ │
└───────────────────────────────────────────────────────────────────────────────────┘
7.7 Event Sourcing + Pure State Machine
Order WAL = Single Source of Truth
State(t) = Replay(Order_WAL[0..t])
只要有 Order WAL,就能恢复整个系统状态!
Pure State Machines:
- UBSCore: Order Events → Balance Events (确定性)
- ME: Valid Orders → Trade Events (确定性)
恢复流程:
- 加载最近快照 Checkpoint。
- 重放 Order WAL。
- 系统恢复到崩溃前状态。
8. Summary
核心设计:
- 先持久化:WAL 保证可恢复性。
- Pre-Check:提前过滤无效订单。
- 单线程 + 无锁:避免锁竞争,最大化吞吐。
- UBSCore:集中式、原子的余额管理。
- 职责分离:UBSCore (钱),ME (撮合),Settlement (日志)。
代码重构:
为后续章节准备,我们重构了 src 目录结构,模块化了 main.rs, core_types.rs 等。
下一步:实现 UBSCore 和 Ring Buffer。
0x08-b UBSCore Implementation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Objective: From design to implementation: Building a Safety-First Balance Core Service.
In the previous chapter (0x08-a), we designed the full HFT pipeline architecture. Now, it’s time to implement the core components. This chapter covers:
- Ring Buffer - Lock-free inter-service communication.
- Write-Ahead Log (WAL) - Order persistence.
- UBSCore Service - The core balance service.
1. Technology Selection: Safety First
In financial systems, maturity and stability outweigh extreme performance.
1.1 Ring Buffer Selection
| Crate | Maturity | Security | Performance |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (3.3M+ DLs) | Heavily Audited | Very Low Latency |
| ringbuf | 🌟🌟🌟🌟 (600K+ DLs) | Community Verified | Lower Latency |
| rtrb | 🌟🌟🌟 (Newer) | Less Vetted | Lowest Latency |
Our Choice: crossbeam-queue
Reasons:
- Maintained by Rust core team members.
- Base dependency for tokio, actix, rayon.
- If it has a bug, half the Rust ecosystem collapses.
Financial System Selection Principle: Use what lets you sleep at night.
#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;
// Create fixed-size ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);
// Producer: Non-blocking push
queue.push(order_msg).unwrap();
// Consumer: Non-blocking pop
if let Some(msg) = queue.pop() {
process(msg);
}
}
2. Write-Ahead Log (WAL)
WAL is the system’s Single Source of Truth.
2.1 Design Principles
#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// Principles:
/// 1. Append-Only: Sequential I/O, max performance.
/// 2. Group Commit: Batch fsyncs.
/// 3. Monotonic sequence_id: Deterministic replay.
pub struct WalWriter {
writer: BufWriter<File>,
next_seq: SeqNum,
pending_count: usize,
config: WalConfig,
}
}
2.2 Group Commit Strategy
| Flush Strategy | Latency | Throughput | Safety |
|---|---|---|---|
| Every Entry | ~50µs | ~20K/s | Highest |
| Every 100 Entries | ~5µs (amortized) | ~200K/s | High |
| Every 1ms | ~1µs (amortized) | ~1M/s | Medium |
We choose Every 100 Entries to balance performance and safety:
#![allow(unused)]
fn main() {
pub struct WalConfig {
pub path: String,
pub flush_interval_entries: usize, // Flush every N entries
pub sync_on_flush: bool, // Whether to call fsync
}
}
2.3 WAL Entry Format
Currently CSV (readable for dev):
seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit
In production, switch to Binary (54 bytes/entry) for better performance.
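Tying 2.1–2.3 together, the append path with group commit might look like this (a sketch using the `WalWriter` fields shown above; serialization is reduced to writing a pre-formatted CSV line, and the config is inlined as a single threshold field):

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Simplified WalWriter sketch: append-only, group commit every N entries.
struct WalWriter {
    writer: BufWriter<File>,
    next_seq: u64,
    pending_count: usize,
    flush_interval_entries: usize, // group-commit threshold
}

impl WalWriter {
    /// Append one entry and return its sequence_id.
    /// fsync happens only once per `flush_interval_entries` appends,
    /// amortizing the disk-sync cost across the batch.
    fn append(&mut self, line: &str) -> io::Result<u64> {
        let seq = self.next_seq;
        self.next_seq += 1;
        writeln!(self.writer, "{},{}", seq, line)?;
        self.pending_count += 1;
        if self.pending_count >= self.flush_interval_entries {
            self.writer.flush()?;                // drain BufWriter to the OS
            self.writer.get_ref().sync_data()?;  // one fsync for N entries
            self.pending_count = 0;
        }
        Ok(seq)
    }
}
```

With `flush_interval_entries = 100`, 100 appends share a single `sync_data` call, which is the "Every 100 Entries" row in the table above.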
3. UBSCore Service
UBSCore is the Single Entry Point for all balance operations.
3.1 Responsibilities
- Balance State Management: In-memory balance state.
- Order WAL Writing: Persist orders.
- Balance Operations: lock/unlock/spend_frozen/deposit.
3.2 Core Structure
#![allow(unused)]
fn main() {
pub struct UBSCore {
/// User Accounts - Authoritative Balance State
accounts: FxHashMap<UserId, UserAccount>,
/// Write-Ahead Log
wal: WalWriter,
/// Configuration
config: TradingConfig,
/// Pending Orders (Locked but not filled)
pending_orders: FxHashMap<OrderId, PendingOrder>,
/// Statistics
stats: UBSCoreStats,
}
}
3.3 Order Processing Flow
process_order(order):
│
├─ 1. Write to WAL ──────────► Get seq_id
│
├─ 2. Validate order ────────► Check price/qty
│
├─ 3. Get user account ──────► Lookup user
│
├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
│ Sell: qty
│
└─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
Fail → Err(Rejected)
Implementation:
#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
// Step 1: Write to WAL FIRST (persist before any state change)
let seq_id = self.wal.append(&order)?;
// Step 2-4: Validate and calculate
// ...
// Step 5: Lock balance
let lock_result = account
.get_balance_mut(locked_asset_id)
.and_then(|balance| balance.lock(locked_amount));
match lock_result {
Ok(()) => {
// Track pending order
self.pending_orders.insert(order.id, PendingOrder { ... });
Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
}
Err(_) => Err(OrderEvent::Rejected { ... })
}
}
}
3.4 Settlement
#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
let trade = &event.trade;
let quote_amount = trade.price * trade.qty / self.config.qty_unit();
// Buyer: spend USDT, receive BTC
buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;
// Seller: spend BTC, receive USDT
seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;
Ok(())
}
}
4. Message Types
Services communicate via defined message types:
#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
pub seq_id: SeqNum,
pub order: Order,
// ...
}
// UBSCore → ME
pub struct ValidOrder {
pub seq_id: SeqNum,
pub order: Order,
pub locked_amount: u64,
// ...
}
// ME → UBSCore + Settlement
pub struct TradeEvent {
pub trade: Trade,
pub taker_order_id: OrderId,
pub maker_order_id: OrderId,
// ...
}
}
5. Integration & Usage
5.1 CLI Arguments
# Original Pipeline
cargo run --release
# UBSCore Pipeline (Enable WAL)
cargo run --release -- --ubscore
5.2 Performance Comparison
| Metric | Original | UBSCore | Change |
|---|---|---|---|
| Throughput | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL Entries | N/A | 100,000 | 6.67 MB |
| Balance Check | 0.3% | 1.3% | +1% |
| Matching | 45.5% | 45.5% | - |
| Settlement | 0.1% | 0.2% | - |
| Ledger I/O | 54.0% | 53.0% | -1% |
Analysis:
- WAL introduces ~5% overhead.
- Acceptable cost for safety.
- Main bottleneck remains Ledger I/O.
6. Tests
6.1 Unit Tests
cargo test
# 31 tests passing
6.2 E2E Tests
sh scripts/test_e2e.sh
# ✅ All tests passed!
7. New Files
| File | Lines | Description |
|---|---|---|
| src/messages.rs | 265 | Inter-service messages |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |
8. Key Learnings
8.1 Safety First
- Maturity > Performance
- Auditable > Rapid Dev
8.2 WAL is Single Source of Truth
All state = f(WAL). Foundation for Disaster Recovery and Audit.
8.3 Single Thread Advantage
UBSCore uses single thread for natural atomicity (no locking needed for balance ops) and predictable latency.
9. Critical Bug Fix: Cost Calculation Overflow
9.1 The Issue
Testing with --ubscore revealed 1032 rejected orders that were accepted in the legacy mode.
9.2 Root Cause
Overflow in price * qty (u64).
Example Order #21:
- Price: 84,956.01 USDT (6 decimals) -> 84,956,010,000
- Qty: 2.56 BTC (8 decimals) -> 256,284,400
- Product: 2.177 × 10^19 > u64::MAX
9.3 Why Legacy Mode Passed?
Release builds disable overflow checks, so arithmetic wraps silently:
In legacy mode, cost = price * qty wrapped around, producing a much smaller, incorrect value. Users were locked for ~33k USDT but bought ~217k USDT worth of BTC!
9.4 The Fix
#![allow(unused)]
fn main() {
// Use u128 for the intermediate calculation to avoid u64 overflow
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
Ok(cost_128 as u64)
}
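The numbers from 9.2 can be checked directly with a pair of helper functions (a sketch; `checked_cost` / `wrapped_cost` are illustrative names contrasting the fixed and the legacy paths):

```rust
/// Checked cost, mirroring the fix above: widen to u128,
/// then make sure the result still fits in u64.
fn checked_cost(price: u64, qty: u64, qty_unit: u64) -> Option<u64> {
    let cost = (price as u128) * (qty as u128) / (qty_unit as u128);
    if cost > u64::MAX as u128 { None } else { Some(cost as u64) }
}

/// The legacy path: u64 multiplication wraps in release builds.
fn wrapped_cost(price: u64, qty: u64, qty_unit: u64) -> u64 {
    price.wrapping_mul(qty) / qty_unit
}
```

With the order from 9.2 (`price = 84_956_010_000`, `qty = 256_284_400`, `qty_unit = 10^8`), `wrapped_cost` yields roughly 33k USDT while `checked_cost` recovers the true ~217.7k USDT cost.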
9.5 Configuration Issue
USDT with 6 decimals is risky. Recommended: 2 decimals. Binance uses 2 decimals for USDT price.
10. Improvement: Ledger Integrity & Determinism
10.1 Incomplete Ledger
Current Ledger lacks Deposit, Lock, Unlock, SpendFrozen. Only tracks Settlement.
10.2 Pipeline Non-Determinism
Pipeline concurrency means Lock and Settlement events interleave non-deterministically.
Snapshot comparison is impossible.
10.3 Solution: Version Space Separation
Separate version counters for Lock events and Settle events.
| Version Space | Increment On | Sort By | Determinism |
|---|---|---|---|
| lock_version | Lock/Unlock | order_seq_id | ✅ Deterministic |
| settle_version | Settle | trade_id | ✅ Deterministic |
Validation Strategy: Verify the Final Set of events, sorted by their respective versions/source IDs, rather than checking snapshot consistency at arbitrary times.
11. Design Discussion: Causal Chain
UBSCore has inputs from OrderQueue and TradeQueue. Interleaving is random.
Solution:
- OrderQueue strictly follows order_seq_id.
- TradeQueue strictly follows trade_id.
- Link every Balance Event to its source (order_seq_id or trade_id).
- This forms a Causal Chain for audit.
#![allow(unused)]
fn main() {
struct BalanceEvent {
// ...
source_type: SourceType, // Order | Trade
source_id: u64, // order_seq_id | trade_id
}
}
This allows offline verification:
Lock(source=Order N) must exist if Order N exists.
Settle(source=Trade M) must exist if Trade M exists.
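The offline verification rule can be sketched as a set-coverage check over the causally-linked events (hypothetical types extending the `BalanceEvent` sketch above; the real audit tool would stream events rather than hold them in memory):

```rust
use std::collections::HashSet;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
enum SourceType { Order, Trade }

struct BalanceEvent {
    source_type: SourceType,
    source_id: u64, // order_seq_id | trade_id
}

/// Offline audit: every order_seq_id must be covered by an Order-sourced
/// balance event (Lock), every trade_id by a Trade-sourced event (Settle).
fn verify_causal_chain(
    order_seq_ids: &[u64],
    trade_ids: &[u64],
    events: &[BalanceEvent],
) -> bool {
    let covered: HashSet<(SourceType, u64)> = events
        .iter()
        .map(|e| (e.source_type, e.source_id))
        .collect();
    order_seq_ids.iter().all(|id| covered.contains(&(SourceType::Order, *id)))
        && trade_ids.iter().all(|id| covered.contains(&(SourceType::Trade, *id)))
}
```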
12. Next Steps (0x08-c)
- Implement Version Space Separation.
- Expand BalanceEvent with causal links.
- Integrate the Ring Buffer.
- Develop Causal Chain audit tools.
🇨🇳 中文
📦 Code Changes: View Diff
From Design to Implementation: Building a Safety-First Balance Core
Overview
In the previous chapter (0x08-a) we designed the complete HFT trading pipeline architecture. Now it is time to implement the core components. In this chapter we build:
- Ring Buffer - lock-free inter-service communication
- Write-Ahead Log (WAL) - order persistence
- UBSCore Service - the balance core service
1. Technology Selection: Safety First
In a financial system, maturity and stability matter more than peak performance.
1.1 Ring Buffer Selection
| Library | Maturity | Safety | Performance |
|---|---|---|---|
| crossbeam-queue | 🌟🌟🌟🌟🌟 (3.3M+ downloads) | Most heavily audited | Very low latency |
| ringbuf | 🌟🌟🌟🌟 (600k+ downloads) | Community-validated | Lower latency |
| rtrb | 🌟🌟🌟 (newer) | Less scrutiny | Lowest latency |
Our choice: crossbeam-queue
Reasons:
- Maintained with involvement from Rust core team members
- A foundational dependency of tokio, actix, and rayon
- If it had a bug, half the Rust ecosystem would break
Selection principle for financial systems: pick what lets you sleep at night.
#![allow(unused)]
fn main() {
use crossbeam_queue::ArrayQueue;
// Create a fixed-capacity ring buffer
let queue: ArrayQueue<OrderMessage> = ArrayQueue::new(1024);
// Producer: non-blocking push
queue.push(order_msg).unwrap();
// Consumer: non-blocking pop
if let Some(msg) = queue.pop() {
    process(msg);
}
}
2. Write-Ahead Log (WAL)
The WAL is the system's Single Source of Truth.
2.1 Design Principles
#![allow(unused)]
fn main() {
/// Write-Ahead Log for Orders
///
/// Design principles:
/// 1. Append-Only - sequential I/O for maximum performance
/// 2. Group Commit - batch flushes to reduce fsync calls
/// 3. Monotonically increasing sequence_id - guarantees deterministic replay
pub struct WalWriter {
    writer: BufWriter<File>,
    next_seq: SeqNum,
    pending_count: usize,
    config: WalConfig,
}
}
2.2 Group Commit Strategy
| Flush Strategy | Latency | Throughput | Data Safety |
|---|---|---|---|
| fsync per entry | ~50µs | ~20K/s | Highest |
| Every 100 entries | ~5µs (amortized) | ~200K/s | High |
| Every 1ms | ~1µs (amortized) | ~1M/s | Medium |
We flush every 100 entries, balancing performance against safety:
#![allow(unused)]
fn main() {
pub struct WalConfig {
    pub path: String,
    pub flush_interval_entries: usize, // flush every N entries
    pub sync_on_flush: bool,           // whether to call fsync
}
}
2.3 WAL Entry Format
Currently CSV (human-readable during development):
seq_id,timestamp_ns,order_id,user_id,price,qty,side,order_type
1,1702742400000000000,1001,100,85000000000,100000000,Buy,Limit
Production can switch to a binary format (54 bytes/entry) for higher performance.
3. UBSCore Service
UBSCore is the single entry point for all balance operations.
3.1 Responsibilities
- Balance State Management - in-memory balance state
- Order WAL Writing - order persistence
- Balance Operations - lock/unlock/spend_frozen/deposit
3.2 Core Structure
#![allow(unused)]
fn main() {
pub struct UBSCore {
    /// User accounts - the authoritative balance state
    accounts: FxHashMap<UserId, UserAccount>,
    /// Write-Ahead Log
    wal: WalWriter,
    /// Trading configuration
    config: TradingConfig,
    /// Pending orders (locked but not yet filled)
    pending_orders: FxHashMap<OrderId, PendingOrder>,
    /// Statistics
    stats: UBSCoreStats,
}
}
3.3 Order Processing Flow
process_order(order):
│
├─ 1. Write to WAL ──────────► obtain seq_id
│
├─ 2. Validate order ────────► price/qty checks
│
├─ 3. Get user account ──────► look up the user
│
├─ 4. Calculate lock amount ─► Buy: price * qty / qty_unit
│                              Sell: qty
│
└─ 5. Lock balance ──────────► Success → Ok(ValidOrder)
                               Fail → Err(Rejected)
Implementation:
#![allow(unused)]
fn main() {
pub fn process_order(&mut self, order: Order) -> Result<ValidOrder, OrderEvent> {
// Step 1: Write to WAL FIRST (persist before any state change)
let seq_id = self.wal.append(&order)?;
// Step 2-4: Validate and calculate
// ...
// Step 5: Lock balance
let lock_result = account
.get_balance_mut(locked_asset_id)
.and_then(|balance| balance.lock(locked_amount));
match lock_result {
Ok(()) => {
// Track pending order
self.pending_orders.insert(order.id, PendingOrder { ... });
Ok(ValidOrder::new(seq_id, order, locked_amount, locked_asset_id))
}
Err(_) => Err(OrderEvent::Rejected { ... })
}
}
}
3.4 Trade Settlement
#![allow(unused)]
fn main() {
pub fn settle_trade(&mut self, event: &TradeEvent) -> Result<(), &'static str> {
let trade = &event.trade;
let quote_amount = trade.price * trade.qty / self.config.qty_unit();
// Buyer: spend USDT, receive BTC
buyer.get_balance_mut(quote_id)?.spend_frozen(quote_amount)?;
buyer.get_balance_mut(base_id)?.deposit(trade.qty)?;
// Seller: spend BTC, receive USDT
seller.get_balance_mut(base_id)?.spend_frozen(trade.qty)?;
seller.get_balance_mut(quote_id)?.deposit(quote_amount)?;
Ok(())
}
}
4. Message Types
Services communicate through well-defined message types:
#![allow(unused)]
fn main() {
// Gateway → UBSCore
pub struct OrderMessage {
pub seq_id: SeqNum,
pub order: Order,
// ...
}
// UBSCore → ME
pub struct ValidOrder {
pub seq_id: SeqNum,
pub order: Order,
pub locked_amount: u64,
// ...
}
// ME → UBSCore + Settlement
pub struct TradeEvent {
pub trade: Trade,
pub taker_order_id: OrderId,
pub maker_order_id: OrderId,
// ...
}
}
5. Integration & Usage
5.1 Command-Line Flags
# Original pipeline
cargo run --release
# UBSCore pipeline (WAL enabled)
cargo run --release -- --ubscore
5.2 Performance Comparison
| Metric | Original | UBSCore | Change |
|---|---|---|---|
| Throughput | 15,070 ops/s | 14,314 ops/s | -5% |
| WAL entries | N/A | 100,000 | 6.67 MB |
| Balance checks | 0.3% | 1.3% | +1% |
| Matching engine | 45.5% | 45.5% | - |
| Settlement | 0.1% | 0.2% | - |
| Ledger I/O | 54.0% | 53.0% | -1% |
Analysis:
- WAL writing introduces roughly 5% overhead
- An acceptable price for data safety
- The main bottleneck remains Ledger I/O (the next chapter's optimization target)
6. Testing
6.1 Unit Tests
cargo test
# 31 tests passing
6.2 E2E Tests
sh scripts/test_e2e.sh
# ✅ All tests passed!
7. New Files
| File | Lines | Description |
|---|---|---|
| src/messages.rs | 265 | Inter-service message types |
| src/wal.rs | 340 | Write-Ahead Log |
| src/ubscore.rs | 490 | User Balance Core |
8. Key Takeaways
8.1 Safety First
- Maturity and stability > peak performance
- Auditability > development speed
- "Can you sleep at night with it?" is the ultimate selection criterion
8.2 The WAL Is the Single Source of Truth
All state = f(WAL). At any moment the system state can be rebuilt 100% from the WAL. This is also the foundation of disaster recovery and audit compliance.
8.3 Single-Threaded Is a Strength
UBSCore is single-threaded not because it is simpler, but because it gives:
- Natural atomicity (no locks)
- No possibility of double-spending
- Predictable latency
9. Critical Bug Fix: Cost Calculation Overflow
9.1 Discovery
While testing the new --ubscore mode after implementing UBSCore, 1,032 orders were rejected that legacy mode accepted in full.
9.2 Root Cause
price * qty overflows u64 during cost calculation.
Order #21:
- price = 84,956,010,000 (84956.01 USDT, 6-decimal precision)
- qty = 256,284,400 (2.562844 BTC, 8-decimal precision)
- price * qty = 2.177 × 10^19 > u64::MAX
9.3 Why Didn't Legacy Mode Fail?
Wrapping arithmetic in release builds! In legacy mode the overflowed value wrapped to a much smaller number: the check passed, but the locked amount was far too small. A massive financial vulnerability.
9.4 The Fix
#![allow(unused)]
fn main() {
// Use u128 for the intermediate calculation
let cost_128 = (self.price as u128) * (self.qty as u128) / (qty_unit as u128);
if cost_128 > u64::MAX as u128 {
    return Err(CostError::Overflow);
}
let cost = cost_128 as u64;
}
9.5 Configuration Issue: USDT Precision Too High
Quoting USDT at 6-decimal precision creates overflow risk. Recommended: 2 decimals (the Binance standard).
10. To Improve: Ledger Integrity & Determinism
10.1 The Current Ledger Is Incomplete
The current Ledger is missing Deposit, Lock, Unlock, SpendFrozen, and related operations.
10.2 Determinism in Pipeline Mode
Because the Ring Buffer stages run in parallel, the interleaving of Lock and Settle events is not fixed, so consistency cannot be verified by snapshot comparison.
10.3 Solution: Separate Version Spaces
Maintain an independent version per event type:
| Version Space | Increment On | Sort By | Determinism |
|---|---|---|---|
| lock_version | Lock/Unlock events | order_seq_id | ✅ Deterministic |
| settle_version | Settle events | trade_id | ✅ Deterministic |
Validation strategy: instead of checking snapshots at arbitrary times, verify the final event set after processing completes, sorted within each version space.
11. Full Design Discussion
11.1 Causal Chain Design
UBSCore has two input sources: OrderQueue and TradeQueue. For auditability we established a causal chain:
#![allow(unused)]
fn main() {
struct BalanceEvent {
    // ...
    source_type: SourceType, // Order | Trade
    source_id: u64,          // order_seq_id | trade_id
}
}
This not only solves the audit problem but also lets us trace issues to their source quickly: a Lock always corresponds to an Order, and a Settle always corresponds to a Trade.
12. Next Chapter (0x08-c)
- Implement separate version spaces - lock_version / settle_version
- Expand BalanceEvent - add event_type, version, source_id
- Ring Buffer integration
- Causal chain audit tools
0x08-c Complete Event Flow & Verification
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement a complete Event Sourcing architecture, verify equivalence with the legacy version, and upgrade the baseline.
Problems Identified
In the previous chapter (0x08-b), we implemented the UBSCore service but identified several issues:
1. Incomplete Ledger
The current Ledger only records settlement operations (Credit/Debit), missing other critical balance changes:
| Operation | Current Record | Production Req |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ❌ | ✅ |
2. Pipeline Determinism Issue
With a multi-stage Ring Buffer pipeline, the interleaving order of Lock and Settle events is non-deterministic:
Run 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
Run 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]
Result: Final state is identical, but the intermediate version sequence differs. Direct diff verification fails.
Objectives
1. Implement Separate Version Spaces
#![allow(unused)]
fn main() {
struct Balance {
avail: u64,
frozen: u64,
lock_version: u64, // Increments only on lock/unlock
settle_version: u64, // Increments only on settle
}
}
2. Expand BalanceEvent
#![allow(unused)]
fn main() {
struct BalanceEvent {
user_id: u64,
asset_id: u32,
event_type: EventType, // Deposit | Lock | Unlock | Settle
version: u64, // Increments within its own version space
source_type: SourceType, // Order | Trade | External
source_id: u64, // order_seq_id | trade_id | ref_id
delta: i64,
avail_after: u64,
frozen_after: u64,
}
}
3. Record ALL Balance Operations
Order(seq=5) ──Trigger──→ Lock(buyer USDT, lock_version=1)
│
└──→ Trade(id=3)
│
├──Trigger──→ Settle(buyer: -USDT, +BTC, settle_version=1)
└──Trigger──→ Settle(seller: -BTC, +USDT, settle_version=1)
4. Verify Equivalence & Upgrade Baseline
Ensure the refactored system produces the exact same final state as the pre-refactor version.
Implementation Progress
Phase 1: Separate Version Spaces ✅ Done
Goal: Solve Pipeline Determinism.
1.1 Modify Balance Struct
#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
avail: u64,
frozen: u64,
lock_version: u64, // lock/unlock/deposit/withdraw
settle_version: u64, // spend_frozen/deposit
}
}
1.2 Version Increment Logic
| Operation | Version Incremented |
|---|---|
| deposit() | lock_version AND settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |
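The increment table above can be sketched as a std-only Balance (names follow the chapter; the bookkeeping is simplified, with no overflow handling or event emission):

```rust
// Minimal sketch of the dual version counters.
#[derive(Default)]
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,
    settle_version: u64,
}

impl Balance {
    fn deposit(&mut self, amt: u64) {
        self.avail += amt;
        self.lock_version += 1;   // deposit bumps BOTH spaces
        self.settle_version += 1;
    }
    fn lock(&mut self, amt: u64) -> Result<(), &'static str> {
        if self.avail < amt { return Err("insufficient avail"); }
        self.avail -= amt;
        self.frozen += amt;
        self.lock_version += 1;
        Ok(())
    }
    fn spend_frozen(&mut self, amt: u64) -> Result<(), &'static str> {
        if self.frozen < amt { return Err("insufficient frozen"); }
        self.frozen -= amt;
        self.settle_version += 1;
        Ok(())
    }
}

fn main() {
    let mut b = Balance::default();
    b.deposit(100);
    b.lock(40).unwrap();
    b.spend_frozen(40).unwrap();
    assert_eq!((b.avail, b.frozen), (60, 0));
    assert_eq!(b.lock_version, 2);   // deposit + lock
    assert_eq!(b.settle_version, 2); // deposit + spend_frozen
}
```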
1.3 Equivalence Verification ✅
Script: scripts/verify_baseline_equivalence.py
$ python3 scripts/verify_baseline_equivalence.py
╔════════════════════════════════════════════════════════════╗
║ Baseline Equivalence Verification ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL
Phase 2: Expand BalanceEvent ✅ Done
Goal: Full Event Sourcing.
2.1 Event Types & Structure
Implemented in src/messages.rs:
#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }
pub struct BalanceEvent {
pub user_id: u64,
pub asset_id: u32,
pub event_type: BalanceEventType,
pub version: u64,
pub source_type: SourceType,
pub source_id: u64,
pub delta: i64,
// ...
}
}
Phase 3: Record All Operations in Ledger ✅ Done
Goal: Every balance change is recorded.
3.1 Event Log File
UBSCore mode generates output/t2_events.csv:
user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0
3.2 Recorded Operations
| Operation | Status | Note |
|---|---|---|
| Deposit | ✅ | Recorded on init |
| Lock | ✅ | Recorded on order lock |
| Settle | ✅ | Recorded on trade settle |
| Unlock | ⏳ | (No cancel in current test) |
| Withdraw | ⏳ | (No withdraw in current test) |
3.3 Event Stats
Total events: 293,544
Deposit events: 2,000
Lock events: 100,000
Settle events: 191,544
Phase 4: Validation Tests ✅ Done
Goal: Verify Event Correctness.
4.1 Event Correctness Verification
scripts/verify_balance_events.py - 7 Checks:
| Check | Description | Status |
|---|---|---|
| Lock Count | = Accepted Orders | ✅ |
| Settle Count | = Trades × 4 | ✅ |
| Lock Version Continuity | Incremental per User-Asset | ✅ |
| Settle Version Continuity | Incremental per User-Asset | ✅ |
| Delta Conservation | Sum of deltas per trade = 0 | ✅ |
| Source Consistency | Lock→Order, Settle→Trade | ✅ |
| Deposit Correctness | Positive delta + source=external | ✅ |
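The real checks live in the Python scripts; as an illustration, the "Delta Conservation" rule can be sketched in Rust (hypothetical `SettleEvent` type reduced to the fields the check needs):

```rust
use std::collections::HashMap;

// One settle row from the event log.
struct SettleEvent { trade_id: u64, asset_id: u32, delta: i64 }

// For each (trade, asset) pair, the settle deltas must net to zero:
// buyer -quote / +base, seller -base / +quote.
fn deltas_conserved(events: &[SettleEvent]) -> bool {
    let mut sums: HashMap<(u64, u32), i64> = HashMap::new();
    for e in events {
        *sums.entry((e.trade_id, e.asset_id)).or_insert(0) += e.delta;
    }
    sums.values().all(|&s| s == 0)
}

fn main() {
    let trade = vec![
        SettleEvent { trade_id: 1, asset_id: 2, delta: -92_889 }, // buyer pays quote
        SettleEvent { trade_id: 1, asset_id: 2, delta: 92_889 },  // seller receives quote
        SettleEvent { trade_id: 1, asset_id: 1, delta: 500 },     // buyer receives base
        SettleEvent { trade_id: 1, asset_id: 1, delta: -500 },    // seller pays base
    ];
    assert!(deltas_conserved(&trade));
}
```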
4.2 Events Baseline Verification
scripts/verify_events_baseline.py:
$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
deposit: output=2000, baseline=2000 ✅
lock: output=100000, baseline=100000 ✅
settle: output=191544, baseline=191544 ✅
╔════════════════════════════════════════════════════════════╗
║ ✅ Events match baseline! ║
╚════════════════════════════════════════════════════════════╝
4.3 Full E2E Test
Run scripts/test_ubscore_e2e.sh:
$ bash scripts/test_ubscore_e2e.sh
=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
✅ All MATCH
=== Step 3: Verify balance events correctness ===
✅ All 7 checks passed!
=== Step 4: Verify events baseline ===
✅ Events match baseline!
Baseline Files
| File | Description |
|---|---|
| baseline/t2_balances_final.csv | Final Balance State |
| baseline/t2_orderbook.csv | Final OrderBook State |
| baseline/t2_events.csv | Event Log (293,544 events) |
Next Steps
- 0x08-d: Multi-threaded Pipeline: Implement Ring Buffer to connect services.
- 0x09: Multi-Symbol Support: Scale to multiple trading pairs.
References
- Event Sourcing - the event sourcing pattern
- LMAX Disruptor - the Ring Buffer architecture prototype
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: implement a complete Event Sourcing architecture, verify equivalence with the legacy version, and upgrade the baseline.
Problems in This Chapter
In the previous chapter (0x08-b) we implemented the UBSCore service but found several issues:
1. Incomplete Ledger
The current Ledger only records settlement operations (Credit/Debit), missing other balance changes:
| Operation | Currently Recorded | Production Requirement |
|---|---|---|
| Deposit | ❌ | ✅ |
| Lock | ❌ | ✅ |
| Unlock | ❌ | ✅ |
| Settle | ❌ | ✅ |
2. Pipeline Determinism
With a multi-stage Ring Buffer pipeline, the interleaving of Lock and Settle is non-deterministic:
Run 1: [Lock1, Lock2, Lock3, Settle1, Settle2, Settle3]
Run 2: [Lock1, Settle1, Lock2, Settle2, Lock3, Settle3]
The final state is identical, but the intermediate version sequence differs → direct diff verification is impossible.
Chapter Goals
1. Implement Separate Version Spaces
#![allow(unused)]
fn main() {
struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,   // increments only on lock/unlock
    settle_version: u64, // increments only on settle
}
}
2. Expand BalanceEvent
#![allow(unused)]
fn main() {
struct BalanceEvent {
    user_id: u64,
    asset_id: u32,
    event_type: EventType,   // Deposit | Lock | Unlock | Settle
    version: u64,            // increments within its own version space
    source_type: SourceType, // Order | Trade | External
    source_id: u64,          // order_seq_id | trade_id | ref_id
    delta: i64,
    avail_after: u64,
    frozen_after: u64,
}
}
3. Record All Balance Operations
Order(seq=5) ──triggers──→ Lock(buyer USDT, lock_version=1)
│
└──→ Trade(id=3)
     │
     ├──triggers──→ Settle(buyer: -USDT, +BTC, settle_version=1)
     └──triggers──→ Settle(seller: -BTC, +USDT, settle_version=1)
4. Verify Equivalence and Upgrade the Baseline
Ensure the refactored system produces the same final state as the pre-refactor version.
Implementation Progress
Phase 1: Separate Version Spaces ✅ Done
Goal: solve the pipeline determinism problem.
1.1 Modify the Balance Struct
#![allow(unused)]
fn main() {
// src/balance.rs
pub struct Balance {
    avail: u64,
    frozen: u64,
    lock_version: u64,   // incremented by lock/unlock/deposit/withdraw
    settle_version: u64, // incremented by spend_frozen/deposit
}
}
1.2 Version Increment Logic
| Operation | Version Incremented |
|---|---|
| deposit() | lock_version AND settle_version |
| withdraw() | lock_version |
| lock() | lock_version |
| unlock() | lock_version |
| spend_frozen() | settle_version |
1.3 Equivalence Verification ✅
Verification script: scripts/verify_baseline_equivalence.py
$ python3 scripts/verify_baseline_equivalence.py
╔════════════════════════════════════════════════════════════╗
║           Baseline Equivalence Verification                ║
╚════════════════════════════════════════════════════════════╝
...
=== Step 3: Compare avail and frozen values ===
✅ EQUIVALENT: avail and frozen values are IDENTICAL
Phase 2: Expand BalanceEvent ✅ Done
Goal: full event sourcing.
2.1 Event Types and Structure
Implemented in src/messages.rs:
#![allow(unused)]
fn main() {
pub enum BalanceEventType { Deposit, Withdraw, Lock, Unlock, Settle }
pub enum SourceType { Order, Trade, External }
pub struct BalanceEvent {
    pub user_id: u64,
    pub asset_id: u32,
    pub event_type: BalanceEventType,
    pub version: u64,
    pub source_type: SourceType,
    pub source_id: u64,
    pub delta: i64,
    // ...
}
}
Phase 3: Record All Operations in the Ledger ✅ Done
Goal: every balance change is recorded.
3.1 Event Log File
UBSCore mode generates output/t2_events.csv:
user_id,asset_id,event_type,version,source_type,source_id,delta,avail_after,frozen_after
655,2,lock,2,order,1,-3315478,996684522,3315478
96,2,settle,2,trade,1,-92889,999907111,0
604,1,deposit,1,external,1,10000000000,10000000000,0
3.2 Operations Currently Recorded
| Operation | Status | Note |
|---|---|---|
| Deposit | ✅ | Recorded on initial deposit |
| Lock | ✅ | Recorded when an order locks funds |
| Settle | ✅ | Recorded on trade settlement |
| Unlock | ⏳ | Recorded on order cancel (no cancels in the current test) |
| Withdraw | ⏳ | Recorded on withdrawal (no withdrawals in the current test) |
3.3 Event Stats
Total events: 293,544
Deposit events: 2,000
Lock events: 100,000
Settle events: 191,544
Phase 4: Validation Tests ✅ Done
Goal: verify event correctness.
4.1 Event Correctness Verification
scripts/verify_balance_events.py - 7 checks:
| Check | Description | Status |
|---|---|---|
| Lock event count | = accepted orders | ✅ |
| Settle event count | = trades × 4 | ✅ |
| Lock version continuity | Incremental per user-asset pair | ✅ |
| Settle version continuity | Incremental per user-asset pair | ✅ |
| Delta conservation | Sum of deltas per trade = 0 | ✅ |
| Source consistency | Lock→Order, Settle→Trade | ✅ |
| Deposit events | Positive delta + source_type=external | ✅ |
4.2 Events Baseline Verification
scripts/verify_events_baseline.py:
$ python3 scripts/verify_events_baseline.py
...
Comparing by event type...
deposit: output=2000, baseline=2000 ✅
lock: output=100000, baseline=100000 ✅
settle: output=191544, baseline=191544 ✅
╔════════════════════════════════════════════════════════════╗
║              ✅ Events match baseline!                     ║
╚════════════════════════════════════════════════════════════╝
4.3 Full E2E Test
Run scripts/test_ubscore_e2e.sh:
$ bash scripts/test_ubscore_e2e.sh
=== Step 1: Run with UBSCore mode ===
...
=== Step 2: Verify standard baselines ===
✅ All MATCH
=== Step 3: Verify balance events correctness ===
✅ All 7 checks passed!
=== Step 4: Verify events baseline ===
✅ Events match baseline!
Baseline Files
| File | Description |
|---|---|
| baseline/t2_balances_final.csv | Final balance state |
| baseline/t2_orderbook.csv | Final order book state |
| baseline/t2_events.csv | Event log (293,544 events) |
Next Steps
- 0x08-d: Multi-threaded Pipeline - connect the services with Ring Buffers
- 0x09: Multi-Symbol Support - scale to multiple trading pairs
References
- Event Sourcing - the event sourcing pattern
- LMAX Disruptor - the Ring Buffer architecture prototype
0x08-d Complete Order Lifecycle & Cancel Optimization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement full order lifecycle management (including Cancel and Refund), design a dual-track testing framework, and analyze performance bottlenecks.
1. Feature Implementation Overview
In this chapter, we completed the following core features to equip the trading engine with full order processing capabilities:
1.1 Order Events & State Management
Implemented complete OrderEvent enum and CSV logging.
OrderStatus (src/models.rs): follows Binance-style SCREAMING_SNAKE_CASE.
#![allow(unused)]
fn main() {
pub enum OrderStatus {
NEW, // Booked
PARTIALLY_FILLED,
FILLED,
CANCELED, // User Cancelled
REJECTED, // Risk Check Failed
EXPIRED, // System Expired
}
}
OrderEvent (src/messages.rs): Used for Event Sourcing and Audit Logs.
| Event Type | Trigger | Fund Operation |
|---|---|---|
| Accepted | Passed risk check | Lock |
| Rejected | Insufficient balance / bad params | None |
| Filled | Fully filled | Settle |
| PartialFilled | Partially filled | Settle |
| Cancelled | User cancel | Unlock (refund remaining) |
| Expired | System expired | Unlock |
CSV Log Format (output/t2_order_events.csv):
event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
accepted,1,100,101,,,,
rejected,3,102,103,,,,insufficient_balance
partial_filled,1,100,,5000,1000,,
filled,1,100,,0,,85000,
cancelled,5,100,,,2000,,
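As an illustration, two of these rows can be produced by a small serializer sketch (hypothetical helper, not the repository's actual logger; unused columns stay empty, matching the samples above):

```rust
// Serialize an event into the column order:
// event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
enum OrderEvent {
    Accepted { order_id: u64, user_id: u64, seq_id: u64 },
    Cancelled { order_id: u64, user_id: u64, remaining_qty: u64 },
}

fn to_csv_row(e: &OrderEvent) -> String {
    match e {
        OrderEvent::Accepted { order_id, user_id, seq_id } =>
            format!("accepted,{order_id},{user_id},{seq_id},,,,"),
        OrderEvent::Cancelled { order_id, user_id, remaining_qty } =>
            format!("cancelled,{order_id},{user_id},,,{remaining_qty},,"),
    }
}

fn main() {
    let e = OrderEvent::Cancelled { order_id: 5, user_id: 100, remaining_qty: 2000 };
    assert_eq!(to_csv_row(&e), "cancelled,5,100,,,2000,,");
}
```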
1.2 Cancel Workflow
- Parsing: scripts/csv_io.rs supports action=cancel.
- Removal: MatchingEngine calls OrderBook::remove_order_by_id.
- Unlock: UBSCore generates an Unlock event to refund frozen funds.
- Logging: record a Cancelled event.
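The unlock step can be sketched with simplified stand-ins for the chapter's types (a pending order remembers how much is still frozen; a balance is an `(avail, frozen)` pair):

```rust
use std::collections::HashMap;

struct PendingOrder { user_id: u64, locked_remaining: u64 }

// Cancel path sketch: drop the pending order, then unlock (frozen -> avail).
fn cancel(
    pending: &mut HashMap<u64, PendingOrder>,
    balances: &mut HashMap<u64, (u64, u64)>,
    order_id: u64,
) -> Result<u64, &'static str> {
    let p = pending.remove(&order_id).ok_or("unknown order")?;
    let bal = balances.get_mut(&p.user_id).ok_or("unknown user")?;
    bal.1 = bal.1.checked_sub(p.locked_remaining).ok_or("frozen underflow")?;
    bal.0 += p.locked_remaining;
    Ok(p.locked_remaining) // amount refunded, for the Cancelled event log
}

fn main() {
    let mut pending =
        HashMap::from([(5u64, PendingOrder { user_id: 100, locked_remaining: 2000 })]);
    let mut balances = HashMap::from([(100u64, (8000u64, 2000u64))]);
    assert_eq!(cancel(&mut pending, &mut balances, 5).unwrap(), 2000);
    assert_eq!(balances[&100], (10_000, 0)); // frozen fully returned to avail
}
```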
2. Dual-Track Testing Framework
To guarantee baseline stability while adding new features:
2.1 Regression Baseline
- Dataset: fixtures/orders.csv (100k orders, Place only).
- Script: scripts/test_e2e.sh
- Goal: ensure no performance regression for legacy flows.
2.2 Feature Testing
- Dataset: fixtures/test_with_cancel/orders.csv (1M orders, 30% Cancel).
- Script: scripts/test_cancel.sh
- Goal: verify lifecycle closure (Lock = Settle + Unlock).
3. Major Performance Issue
When scaling Cancel tests from 1,000 to 1,000,000 orders, we hit a severe performance wall.
3.1 Symptoms
- Baseline (100k Place): ~3 seconds.
- Cancel Test (1M Place+Cancel): > 7 minutes (430s).
- Bottleneck: the Matching Engine consumes 98% of CPU time.
3.2 Root Cause Analysis
The culprit is OrderBook::remove_order_by_id:
#![allow(unused)]
fn main() {
// src/orderbook.rs
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
// Scan ALL price levels -> Scan ALL orders in level
for (key, orders) in self.bids.iter_mut() {
if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
// ...
}
}
// Scan asks...
}
}
- Complexity: O(N).
- Worst Case: With 500k orders piled up in the book, executing 300k cancels means 150 billion comparisons.
3.3 Solution (Next Step)
Introduce Order Index:
- Structure: HashMap<OrderId, (Price, Side)>.
- Complexity: reduces Cancel lookup from O(N) to O(1).
4. Verification Scripts
- verify_balance_events.py:
  - Added Check 8: verify Frozen Balance history consistency.
  - Verify Unlock events correctly release funds.
- verify_order_events.py:
  - Verify every Accepted order has a final state.
  - Verify Cancelled orders correspond to existing Accepted orders.
5. Summary
We implemented full order lifecycle management and established a rigorous testing framework. Crucially, mass stress testing exposed a Big O algorithm defect in the cancel logic, setting the stage for the next optimization iteration.
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: implement full order lifecycle management (including cancel and refund), design a dual-track testing framework, and analyze the performance bottleneck this introduced.
1. Feature Implementation Overview
In this chapter we completed the following core features, giving the trading engine full order-processing capability:
1.1 Order Events & State Management
Implemented the complete OrderEvent enum and CSV logging.
OrderStatus (src/models.rs): note the Binance-style SCREAMING_SNAKE_CASE.
#![allow(unused)]
fn main() {
pub enum OrderStatus {
    NEW,              // Resting in the book
    PARTIALLY_FILLED, // Partially filled
    FILLED,           // Fully filled
    CANCELED,         // Cancelled by the user (note the spelling: CANCELED)
    REJECTED,         // Rejected by risk checks
    EXPIRED,          // Expired by the system
}
}
OrderEvent (src/messages.rs): used for Event Sourcing and audit logs.
| Event Type | Trigger | Fund Operation |
|---|---|---|
| Accepted | Order passed risk checks and entered matching | Lock (freeze) |
| Rejected | Insufficient balance or bad parameters | None |
| Filled | Fully filled | Settle |
| PartialFilled | Partially filled | Settle |
| Cancelled | User cancel (note the spelling: Cancelled) | Unlock (refund remaining frozen funds) |
| Expired | System expiry | Unlock (unfreeze) |
CSV log format (output/t2_order_events.csv): the column order implemented in code is:
event_type,order_id,user_id,seq_id,filled_qty,remaining_qty,price,reason
accepted,1,100,101,,,,
rejected,3,102,103,,,,insufficient_balance
partial_filled,1,100,,5000,1000,,
filled,1,100,,0,,85000,
cancelled,5,100,,,2000,,
1.2 Cancel Workflow
Implemented the processing flow for the cancel action:
- Input parsing: scripts/csv_io.rs supports both the old and new CSV formats.
  - New format: order_id,user_id,action,side,price,qty (supports action=cancel).
- Matching removal: MatchingEngine calls OrderBook::remove_order_by_id to remove the order.
- Fund unlock: UBSCore emits an Unlock event, refunding the frozen funds.
- Event logging: record a Cancelled event.
2. Dual-Track Testing Framework
To add new features without breaking the existing baseline, we designed a dual-track testing strategy:
2.1 Regression Baseline
- Dataset: fixtures/orders.csv (100k orders, Place only).
- Script: scripts/test_e2e.sh
- Purpose: ensure legacy matching performance does not regress; verify core correctness.
- Principle: keep the baseline stable (change it only for format upgrades or major adjustments).
2.2 Feature Testing
- Dataset: fixtures/test_with_cancel/orders.csv (1M orders, 30% Cancel).
- Script: scripts/test_cancel.sh
- Verification:
  - verify_balance_events.py: verify fund conservation (Lock = Settle + Unlock).
  - verify_order_events.py: verify order lifecycle closure.
3. Major Performance Issue
When scaling the cancel test from 1,000 to 1,000,000 orders, we hit a severe performance collapse.
3.1 Symptoms
- Baseline (100k Place): ~3 seconds.
- Cancel test (1M Place+Cancel): over 7 minutes (430s).
- Bottleneck: the Matching Engine accounts for 98% of the time.
3.2 Root Cause Analysis
Code review pinpointed the bottleneck in OrderBook::remove_order_by_id:
#![allow(unused)]
fn main() {
// src/orderbook.rs
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // Walk every price level of the bid book --> walk every order in each level
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Then walk the ask book...
}
}
- Complexity: O(N), where N is the total number of orders in the OrderBook.
- Degenerate data distribution: the test_with_cancel dataset lacks aggressive taker flow, so huge numbers of unfilled orders pile up in the book. Assume 500k resting orders.
- Work: 300k cancels, each scanning 500k entries = 150 billion CPU comparisons.
This explains why the system is extremely slow under large-scale cancellation.
3.3 Solution (Next Step)
To fix this we must introduce an Order Index:
- Structure: HashMap<OrderId, (Price, Side)>.
- New complexity: cancel lookup drops from O(N) to O(1).
4. Verification Scripts
Two Python scripts verify logical correctness:
- verify_balance_events.py:
  - Added Check 8: verify the historical consistency of frozen balances.
  - Verify that Unlock events actually released the funds.
- verify_order_events.py:
  - Verify that every Accepted order reaches a terminal state (Filled/Cancelled/Rejected).
  - Verify that every Cancelled order corresponds to a real Accepted event.
5. Summary
This chapter not only delivered the feature work but, more importantly, established a data-isolated testing system and used large-scale stress testing to expose an algorithmic complexity defect. This lays a solid foundation for the next iteration.
0x08-e Performance Profiling & Optimization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Background: After introducing Cancel, execution time exploded from ~30s to 7+ minutes. We need to identify and fix the issue.
Goal:
- Establish architecture-level profiling to pinpoint bottlenecks.
- Fix the identified O(N) issues.
- Verify improvements with data.
1. Symptoms
Performance collapsed after adding Cancel:
- Execution Time: ~30s → 7+ minutes
- Throughput: ~34k ops/s → ~3k ops/s
Hypothesis:
- Is it the O(N) Cancel scan?
- VecDeque removal overhead?
- Something else?
A hypothesis is only a guess; profiling provides facts.
2. Optimization 1: Order Index
2.1 The Problem
Cancelling requires looking up an order. The naive remove_order_by_id iterates the entire book:
#![allow(unused)]
fn main() {
// Before: O(N) full scan
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
for (key, orders) in self.bids.iter_mut() {
if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
// ...
}
}
// Scan asks...
}
}
2.2 The Solution
Introduce order_index: FxHashMap<OrderId, (Price, Side)> for O(1) lookup.
#![allow(unused)]
fn main() {
pub struct OrderBook {
asks: BTreeMap<u64, VecDeque<InternalOrder>>,
bids: BTreeMap<u64, VecDeque<InternalOrder>>,
order_index: FxHashMap<u64, (u64, Side)>, // New
trade_id_counter: u64,
}
}
2.3 Index Maintenance
| Operation | Action |
|---|---|
| rest_order() | Insert |
| cancel_order() | Remove |
| remove_order_by_id() | Remove |
| Match Fill | Remove |
2.4 Optimized Implementation
#![allow(unused)]
fn main() {
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
// O(1) Lookup
let (price, side) = self.order_index.remove(&order_id)?;
// O(log n) Find level
let (book, key) = match side {
Side::Buy => (&mut self.bids, u64::MAX - price),
Side::Sell => (&mut self.asks, price),
};
// O(k) Find in level (k is small)
let orders = book.get_mut(&key)?;
let pos = orders.iter().position(|o| o.order_id == order_id)?;
let order = orders.remove(pos)?;
if orders.is_empty() {
book.remove(&key);
}
Some(order)
}
}
2.5 Result 1
| Metric | Before | After |
|---|---|---|
| Time | 7+ min | 87s |
| Throughput | ~3k ops/s | 15k ops/s |
| Boost | - | 5x |
Huge improvement! But 87s for 1.3M orders is still slow (15k ops/s). Further analysis is needed.
3. Architecture Profiling
3.1 Design
Measure time at architectural stages:
Order Input
│
▼
┌─────────────────┐
│ 1. Pre-Trade │ ← UBSCore: WAL + Balance Lock
└────────┬────────┘
│
▼
┌─────────────────┐
│ 2. Matching │ ← Pure ME: process_order
└────────┬────────┘
│
▼
┌─────────────────┐
│ 3. Settlement │ ← UBSCore: settle_trade
└────────┬────────┘
│
▼
┌─────────────────┐
│ 4. Event Log │ ← Ledger writes
└─────────────────┘
3.2 PerfMetrics
#![allow(unused)]
fn main() {
pub struct PerfMetrics {
pub total_pretrade_ns: u64, // UBSCore WAL + Lock
pub total_matching_ns: u64, // Match processing
pub total_settlement_ns: u64, // Balance updates
pub total_event_log_ns: u64, // Ledger I/O
pub place_count: u64,
pub cancel_count: u64,
}
}
4. Optimization 2: Matching Engine
4.1 Bottleneck Identification
Profiling revealed Matching Engine used 96% of time.
Deep dive found:
#![allow(unused)]
fn main() {
// Problem: Copy ALL price keys on every match
let prices: Vec<u64> = book.asks().keys().copied().collect();
}
With 250k+ price levels in the Cancel test, copying every key (O(P) plus a Vec allocation) on every match is disastrous.
4.2 Solution
Use BTreeMap::range() to iterate only relevant prices.
#![allow(unused)]
fn main() {
// Solution: Iterate only valid price range
let max_price = if buy_order.order_type == OrderType::Limit {
buy_order.price
} else {
u64::MAX
};
let prices: Vec<u64> = book.asks().range(..=max_price).map(|(&k, _)| k).collect();
}
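A std-only sketch of why this works (the book here is simplified to price → qty, rather than a VecDeque of orders):

```rust
use std::collections::BTreeMap;

// range(..=max_price) walks only the levels a limit buy can cross,
// instead of copying every key in the book.
fn crossable_levels(asks: &BTreeMap<u64, u64>, max_price: u64) -> Vec<u64> {
    asks.range(..=max_price).map(|(&p, _)| p).collect()
}

fn main() {
    let mut asks = BTreeMap::new();
    for p in [100u64, 101, 105, 250_000] {
        asks.insert(p, 1u64);
    }
    // A limit buy at 101 touches two levels, not all four
    assert_eq!(crossable_levels(&asks, 101), vec![100, 101]);
    // A market buy (max_price = u64::MAX) still sees everything
    assert_eq!(crossable_levels(&asks, u64::MAX).len(), 4);
}
```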
5. Final Results
5.1 Environment
- Dataset: 1.3M Orders (1M Place + 300k Cancel)
- HW: MacBook Pro M1
5.2 Breakdown
=== Performance Breakdown ===
Orders: 1300000, Trades: 538487
1. Pre-Trade: 621.97ms ( 3.5%) [ 0.48 µs/order]
2. Matching: 15014.08ms ( 84.0%) [ 15.01 µs/order]
3. Settlement: 21.57ms ( 0.1%) [ 0.04 µs/trade]
4. Event Log: 2206.71ms ( 12.4%) [ 1.70 µs/order]
Total Tracked: 17864.33ms
5.3 Improvements
| Stage | Latency Before | Latency After | Gain |
|---|---|---|---|
| Matching | 83.53 µs/order | 15.01 µs/order | 5.6x |
| Cancel Lookup | O(N) | 0.29 µs | - |
6. Comparison Table
| Version | Time | Throughput | Gain |
|---|---|---|---|
| Before optimization | 7+ min | ~3k ops/s | - |
| Order Index | 87s | 15k ops/s | 5x |
| + BTreeMap range | 18s | 72k ops/s | 24x |
7. Summary
7.1 Achievements
| Optimization | Problem | Solution | Result |
|---|---|---|---|
| Order Index | O(N) Cancel | FxHashMap | 0.29 µs |
| Range Query | Full key copy | range() | 83→15 µs |
7.2 Final Design Pattern
┌─────────────────────────────────────────────────────────┐
│ OrderBook │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ order_index │◄───│ Sync on: rest, cancel, │ │
│ │ FxHashMap<id, │ │ match, remove │ │
│ │ (price,side)> │ └─────────────────────────────┘ │
│ └────────┬────────┘ │
│ │ O(1) lookup │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ bids │ │ asks │ │
│ │ BTreeMap<price, │ │ BTreeMap<price, │ │
│ │ VecDeque> │ │ VecDeque> │ │
│ │ + range() │ │ + range() │ │
│ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Optimization Conclusion: From 7 minutes to 18 seconds. 24x boost. 🚀
🇨🇳 中文
📦 Code Changes: View Diff
Background: after introducing Cancel, execution time exploded from ~30s to 7+ minutes. We need to locate and fix the problem.
Goals of this chapter:
- Establish a correct architecture-level profiling method
- Use profiling to pinpoint the bottleneck precisely
- Fix the identified problems
Key point: intuition can suggest directions, but profiling data must confirm them.
1. Symptoms
Performance dropped sharply after adding Cancel:
- Execution time: ~30s → 7+ minutes
- Throughput: ~34k ops/s → ~3k ops/s
Initial hypotheses:
- The O(N) Cancel lookup?
- VecDeque removal overhead?
- Some other unknown problem?
Before profiling, these are all just guesses.
2. Order Index (First Optimization)
2.1 The Problem
Cancelling requires finding the order in the OrderBook. The original remove_order_by_id scans the entire book:
#![allow(unused)]
fn main() {
// Before: O(N) full scan
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    for (key, orders) in self.bids.iter_mut() {
        if let Some(pos) = orders.iter().position(|o| o.order_id == order_id) {
            // ...
        }
    }
    // Then scan asks...
}
}
2.2 Solution
Introduce order_index: FxHashMap<OrderId, (Price, Side)> for O(1) lookup:
#![allow(unused)]
fn main() {
pub struct OrderBook {
    asks: BTreeMap<u64, VecDeque<InternalOrder>>,
    bids: BTreeMap<u64, VecDeque<InternalOrder>>,
    order_index: FxHashMap<u64, (u64, Side)>, // new
    trade_id_counter: u64,
}
}
2.3 Index Maintenance
| Operation | Index Action |
|---|---|
| rest_order() | Insert |
| cancel_order() | Remove |
| remove_order_by_id() | Remove |
| Match fill | Remove |
2.4 Optimized Implementation
#![allow(unused)]
fn main() {
pub fn remove_order_by_id(&mut self, order_id: u64) -> Option<InternalOrder> {
    // O(1) lookup
    let (price, side) = self.order_index.remove(&order_id)?;
    // O(log n) locate the price level
    let (book, key) = match side {
        Side::Buy => (&mut self.bids, u64::MAX - price),
        Side::Sell => (&mut self.asks, price),
    };
    // O(k) search within the level (k is usually small)
    let orders = book.get_mut(&key)?;
    let pos = orders.iter().position(|o| o.order_id == order_id)?;
    let order = orders.remove(pos)?;
    if orders.is_empty() {
        book.remove(&key);
    }
    Some(order)
}
}
2.5 First Optimization Result
| Metric | Before | After |
|---|---|---|
| Execution time | 7+ min | 87s |
| Throughput | ~3k ops/s | 15k ops/s |
| Gain | - | 5x |
A huge improvement! But 87s for 1.3M orders is still slow. Further analysis is needed.
3. Architecture-Level Profiling (Finding the Real Bottleneck)
3.1 Profiling Design
Time each stage of the top-level order lifecycle:
Order Input
│
▼
┌─────────────────┐
│ 1. Pre-Trade │ ← UBSCore: WAL + Balance Lock
└────────┬────────┘
│
▼
┌─────────────────┐
│ 2. Matching │ ← Pure ME: process_order
└────────┬────────┘
│
▼
┌─────────────────┐
│ 3. Settlement │ ← UBSCore: settle_trade
└────────┬────────┘
│
▼
┌─────────────────┐
│ 4. Event Log │ ← Ledger writes
└─────────────────┘
3.2 PerfMetrics Design
#![allow(unused)]
fn main() {
pub struct PerfMetrics {
    // Top-level stage timers
    pub total_pretrade_ns: u64,   // UBSCore WAL + Lock
    pub total_matching_ns: u64,   // Pure ME
    pub total_settlement_ns: u64, // Balance updates
    pub total_event_log_ns: u64,  // Ledger writes
    // Operation counters
    pub place_count: u64,
    pub cancel_count: u64,
}
}
4. Matching Engine Optimization (Second Optimization)
4.1 Locating the Problem
Architecture-level profiling showed the Matching Engine consuming 96% of the time. A deeper look found:
#![allow(unused)]
fn main() {
// Problem: every match copies ALL price keys
let prices: Vec<u64> = book.asks().keys().copied().collect();
}
With 250k+ price levels in the book, every match must:
- Walk the entire BTreeMap to collect keys - O(P)
- Allocate a Vec to store them - allocation overhead
- Walk the Vec again to match
4.2 The Fix
Use BTreeMap::range() to collect only the keys within the matchable range:
#![allow(unused)]
fn main() {
// After: collect only keys within the matchable price range
let max_price = if buy_order.order_type == OrderType::Limit {
    buy_order.price
} else {
    u64::MAX
};
let prices: Vec<u64> = book.asks().range(..=max_price).map(|(&k, _)| k).collect();
}
5. 性能测试结果
5.1 测试环境
- 数据集:130万订单(100万 Place + 30万 Cancel)
- 机器:MacBook Pro M1
5.2 最终 Breakdown
=== Performance Breakdown ===
Orders: 1300000 (Place: 1000000, Cancel: 300000), Trades: 538487
1. Pre-Trade: 621.97ms ( 3.5%) [ 0.48 µs/order]
2. Matching: 15014.08ms ( 84.0%) [ 15.01 µs/order]
3. Settlement: 21.57ms ( 0.1%) [ 0.04 µs/trade]
4. Event Log: 2206.71ms ( 12.4%) [ 1.70 µs/order]
Total Tracked: 17864.33ms
5.3 优化效果
| 阶段 | 优化前 | 优化后 | 提升 |
|---|---|---|---|
| Matching | 83.53 µs/order | 15.01 µs/order | 5.6x |
| Cancel Lookup | O(N) 线性扫描 | O(1) 索引,0.29 µs | - |
6. 执行性能对比
| 版本 | 执行时间 | 吞吐量 | 改进 |
|---|---|---|---|
| 优化前 (O(N) 撤单 + 全量 keys) | 7+ 分钟 | ~3k ops/s | - |
| Order Index 优化 | 87s | 15k ops/s | 5x |
| + BTreeMap range query | 18s | 72k ops/s | 24x |
7. 总结
7.1 优化成果
| 优化 | 问题 | 解决方案 | 效果 |
|---|---|---|---|
| Order Index | O(N) 撤单查找 | FxHashMap 索引 | 0.29 µs/cancel |
| BTreeMap range | 全量 keys 复制 | range() 范围查询 | 83→15 µs/order |
7.2 最终设计模式
┌─────────────────────────────────────────────────────────┐
│ OrderBook │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ order_index │◄───│ Sync on: rest, cancel, │ │
│ │ FxHashMap<id, │ │ match, remove │ │
│ │ (price,side)> │ └─────────────────────────────┘ │
│ └────────┬────────┘ │
│ │ O(1) lookup │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ bids │ │ asks │ │
│ │ BTreeMap<price, │ │ BTreeMap<price, │ │
│ │ VecDeque> │ │ VecDeque> │ │
│ │ + range() │ │ + range() │ │
│ └─────────────────┘ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
本次优化先到此为止!从 7 分钟到 18 秒,吞吐量提升 24 倍! 🚀
0x08-f Ring Buffer Pipeline Implementation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Goal: Connect services using Ring Buffers to implement a true Pipeline architecture.
Part 1: Single-Thread Pipeline
1.1 Background
Legacy Execution (Synchronous Serial):
for order in orders:
1. ubscore.process_order(order) # WAL + Lock
2. engine.process_order(order) # Match
3. ubscore.settle_trade(trade) # Settle
4. ledger.write(event) # Persist
Problem: No pipeline parallelism, latency accumulates.
1.2 Single-Thread Pipeline Architecture
Decouple services using Ring Buffers, but polling within a single thread loop:
┌─────────────────────────────────────────────────────────────────────────┐
│ Single-Thread Pipeline (Round-Robin) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Ingestion → order_queue │
│ Stage 2: UBSCore Pre-Trade → valid_order_queue │
│ Stage 3: Matching Engine → trade_queue │
│ Stage 4: Settlement → (Ledger) │
│ │
│ All Stages executed in a round-robin loop │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Core Data Structures:
#![allow(unused)]
fn main() {
pub struct PipelineQueues {
pub order_queue: Arc<ArrayQueue<SequencedOrder>>,
pub valid_order_queue: Arc<ArrayQueue<ValidOrder>>,
pub trade_queue: Arc<ArrayQueue<TradeEvent>>,
}
}
Execution Loop:
#![allow(unused)]
fn main() {
loop {
// UBSCore: order_queue → valid_order_queue
if let Some(order) = queues.order_queue.pop() {
// ...
}
// ME: valid_order_queue → trade_queue
if let Some(valid_order) = queues.valid_order_queue.pop() {
// ...
}
// Settlement: trade_queue → persist
if let Some(trade) = queues.trade_queue.pop() {
// ...
}
}
}
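The loop above can be run as a miniature end to end. A sketch with std VecDeques standing in for the lock-free ArrayQueues and trivial stage bodies (the real stages do WAL/locking, matching, and persistence):

```rust
use std::collections::VecDeque;

// Stand-ins for the ArrayQueue-backed PipelineQueues.
#[derive(Default)]
struct Queues {
    order_queue: VecDeque<u64>,       // SequencedOrder
    valid_order_queue: VecDeque<u64>, // ValidOrder
    trade_queue: VecDeque<u64>,       // TradeEvent
}

// One round-robin pass over all stages; returns true if any stage did work.
fn tick(q: &mut Queues, settled: &mut Vec<u64>) -> bool {
    let mut busy = false;
    if let Some(order) = q.order_queue.pop_front() {
        q.valid_order_queue.push_back(order); // UBSCore: validate + lock
        busy = true;
    }
    if let Some(valid) = q.valid_order_queue.pop_front() {
        q.trade_queue.push_back(valid); // ME: match (every order "trades" here)
        busy = true;
    }
    if let Some(trade) = q.trade_queue.pop_front() {
        settled.push(trade); // Settlement: persist
        busy = true;
    }
    busy
}
```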
Part 2: Multi-Thread Pipeline
2.1 Architecture
Full Multi-Threaded Pipeline based on 0x08-a design:
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline (Full) │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Thread 1: Ingestion Thread 2: UBSCore Thread 3: ME │
│ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────┐ │
│ │ Read orders │ │ PRE-TRADE: │ │ Match Order │ │
│ │ Assign SeqNum │──────▶│ - Write WAL │──────▶│ in OrderBook │ │
│ │ │ ① │ - process_order() │ ③ │ │ │
│ └─────────────────┘ │ - lock_balance() │ │ Generate │ │
│ │ │ │ TradeEvents │ │
│ └──────────┬───────────┘ └────────┬────────┘ │
│ ▲ │ │
│ │ │ │
│ │ ⑤ balance_update_queue │ ④ trade_queue │
│ └────────────────────────────┤ │
│ │ │
│ ┌──────────────────────┐ ▼ │
│ │ POST-TRADE: │ ┌─────────────────┐ │
│ │ - settle_trade() │ │ Thread 4: │ │
│ │ - spend_frozen() │──────▶│ Settlement │ │
│ │ - deposit() │ ⑥ │ │ │
│ │ - Generate Balance │ │ Persist: │ │
│ │ Update Events │ │ - Trade Events │ │
│ └──────────────────────┘ │ - Balance Events│ │
│ │ - Ledger │ │
│ └─────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────────────────┘
2.2 Key Design Points
- ME Fan-out: ME sends TradeEvent in parallel to:
  - trade_queue → Settlement (Persist)
  - balance_update_queue → UBSCore (Balance Settle)
- UBSCore as Single Balance Entry: Handles Pre-Trade Lock, Post-Trade Settle, and Refunds.
- Settlement Consolidation: Consumes both Trade Events and Balance Events.
2.3 Data Types
BalanceUpdateRequest (ME → UBSCore): Contains Trade Event and optional Price Improvement data.
BalanceEvent (UBSCore → Settlement): The unified channel for ALL balance changes (Lock, Settle, Credit, Refund).
#![allow(unused)]
fn main() {
pub enum BalanceEventType {
Lock, // Pre-Trade
SpendFrozen, // Post-Trade
Credit, // Post-Trade
RefundFrozen, // Price Improvement
// ...
}
}
2.4 Implementation Status
| Component | Status |
|---|---|
| All Queues | ✅ Implemented |
| UBSCore BalanceEvent Gen | ✅ Implemented |
| Settlement Persistence | ✅ Implemented |
Verification & Performance (2025-12-17)
Correctness
E2E tests pass for both pipeline modes.
Performance Comparison
1.3M Orders (with 300k Cancel):
| Mode | Time | Throughput | Trades |
|---|---|---|---|
| UBSCore (Baseline) | 23.5s | 55k ops/s | 538,487 |
| Single-Thread Pipeline | 22.1s | 59k ops/s | 538,487 |
| Multi-Thread Pipeline | 29.1s | 45k ops/s | 489,804 |
- Issue: Multi-Thread mode is currently slower (-30%) on large datasets and skips cancel orders.
100k Orders (Place only):
| Mode | Time | Throughput | vs Baseline |
|---|---|---|---|
| UBSCore | 755ms | 132k ops/s | - |
| Single-Thread | 519ms | 193k ops/s | +46% |
| Multi-Thread | 391ms | 256k ops/s | +93% |
- Observation: Multi-threading shines on smaller, simpler datasets (+93%).
Analysis
Multi-threaded pipeline overhead (context switching, queue contention, event generation) outweighs benefits when per-order processing time is very low (due to optimizations). Also, missing Cancel logic reduces correctness.
Key Design Decisions
- Backpressure: Spin Wait (prioritize low latency).
- Shutdown: Graceful drain using Atomic Signals.
- Error Handling: Logging and metric counting; critical paths must succeed.
🇨🇳 中文
📦 代码变更: 查看 Diff
目标:使用 Ring Buffer 串接不同服务,实现真正的 Pipeline 架构
Part 1: 单线程 Pipeline
1.1 背景
原始执行模式 (同步串行):
for order in orders:
1. ubscore.process_order(order) # WAL + Lock
2. engine.process_order(order) # Match
3. ubscore.settle_trade(trade) # Settle
4. ledger.write(event) # Persist
问题:没有 Pipeline 并行,延迟累加
1.2 单线程 Pipeline 架构
使用 Ring Buffer 解耦各服务,但仍在单线程中轮询执行:
┌─────────────────────────────────────────────────────────────────────────┐
│ Single-Thread Pipeline (Round-Robin) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Ingestion → order_queue │
│ Stage 2: UBSCore Pre-Trade → valid_order_queue │
│ Stage 3: Matching Engine → trade_queue │
│ Stage 4: Settlement → (Ledger) │
│ │
│ 所有 Stage 在同一个 while 循环中轮询执行 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
核心数据结构:
#![allow(unused)]
fn main() {
pub struct PipelineQueues {
pub order_queue: Arc<ArrayQueue<SequencedOrder>>,
pub valid_order_queue: Arc<ArrayQueue<ValidOrder>>,
pub trade_queue: Arc<ArrayQueue<TradeEvent>>,
}
}
执行流程:
#![allow(unused)]
fn main() {
loop {
// UBSCore: order_queue → valid_order_queue
if let Some(order) = queues.order_queue.pop() {
// ...
}
// ME: valid_order_queue → trade_queue
if let Some(valid_order) = queues.valid_order_queue.pop() {
// ...
}
// Settlement: trade_queue → persist
if let Some(trade) = queues.trade_queue.pop() {
// ...
}
}
}
Part 2: 多线程 Pipeline
2.1 架构
根据 0x08-a 原始设计,完整的多线程 Pipeline 数据流如下:
┌───────────────────────────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline (完整版) │
├───────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ Thread 1: Ingestion Thread 2: UBSCore Thread 3: ME │
│ ┌─────────────────┐ ┌──────────────────────┐ ┌─────────────────┐ │
│ │ Read orders │ │ PRE-TRADE: │ │ Match Order │ │
│ │ Assign SeqNum │──────▶│ - Write WAL │──────▶│ in OrderBook │ │
│ │ │ ① │ - process_order() │ ③ │ │ │
│ └─────────────────┘ │ - lock_balance() │ │ Generate │ │
│ │ │ │ TradeEvents │ │
│ └──────────┬───────────┘ └────────┬────────┘ │
│ ▲ │ │
│ │ │ │
│ │ ⑤ balance_update_queue │ ④ trade_queue │
│ └────────────────────────────┤ │
│ │ │
│ ┌──────────────────────┐ ▼ │
│ │ POST-TRADE: │ ┌─────────────────┐ │
│ │ - settle_trade() │ │ Thread 4: │ │
│ │ - spend_frozen() │──────▶│ Settlement │ │
│ │ - deposit() │ ⑥ │ │ │
│ │ - Generate Balance │ │ Persist: │ │
│ │ Update Events │ │ - Trade Events │ │
│ └──────────────────────┘ │ - Balance Events│ │
│ │ - Ledger │ │
│ └─────────────────┘ │
│ │
└───────────────────────────────────────────────────────────────────────────────────────┘
2.2 关键设计点
- ME Fan-out: ME 将 TradeEvent 并行发送到:
  - trade_queue → Settlement (持久化交易记录)
  - balance_update_queue → UBSCore (余额结算)
- UBSCore 是余额操作的唯一入口: 处理 Pre-Trade 锁定、Post-Trade 结算和退款。
- Settlement 聚合: 同时消费交易事件和余额事件。
2.3 数据类型
BalanceUpdateRequest (ME → UBSCore): 包含成交事件和可能的价格改善(Price Improvement)数据。
BalanceEvent (UBSCore → Settlement): 所有余额变更的统一通道 (Lock, Settle, Credit, Refund)。
#![allow(unused)]
fn main() {
pub enum BalanceEventType {
Lock, // Pre-Trade
SpendFrozen, // Post-Trade
Credit, // Post-Trade
RefundFrozen, // Price Improvement
// ...
}
}
2.4 实现状态
| 组件 | 状态 |
|---|---|
| 所有队列 | ✅ 已实现 |
| UBSCore BalanceEvent 生成 | ✅ 已实现 |
| Settlement 持久化 | ✅ 已实现 |
验证与性能 (2025-12-17)
正确性
E2E 测试在两种模式下均通过。
性能对比
1.3M 订单 (含 30 万撤单):
| 模式 | 执行时间 | 吞吐量 | 成交数 |
|---|---|---|---|
| UBSCore (Baseline) | 23.5s | 55k ops/s | 538,487 |
| 单线程 Pipeline | 22.1s | 59k ops/s | 538,487 |
| 多线程 Pipeline | 29.1s | 45k ops/s | 489,804 |
- 问题: 多线程模式在大数据集上反而更慢 (-30%),且目前跳过了撤单处理。
100k 订单 (仅 Place):
| 模式 | 时间 | 吞吐量 | 提升 |
|---|---|---|---|
| UBSCore | 755ms | 132k ops/s | - |
| 单线程 | 519ms | 193k ops/s | +46% |
| 多线程 | 391ms | 256k ops/s | +93% |
- 观察: 多线程在简单的小数据集上表现出色 (+93%)。
分析
在单笔处理极快的情况下,多线程带来的开销(上下文切换、队列竞争、事件生成)超过了并行的收益。此外,缺失撤单逻辑降低了正确性。
关键设计决策
- 背压: 自旋等待 (Spin Wait),优先低延迟。
- 关闭: 使用原子信号优雅退出。
- 错误处理: 日志记录,核心路径必须成功。
0x08-g Multi-Thread Pipeline Design
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff | Key File: pipeline_mt.rs
Overview
The Multi-Thread Pipeline distributes processing logic across 4 independent threads, communicating via lock-free queues to achieve high throughput order processing.
Architecture
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ingestion │────▶│ UBSCore │────▶│ ME │────▶│ Settlement │
│ (Thread 1) │ │ (Thread 2) │ │ (Thread 3) │ │ (Thread 4) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ ▲ │ │
│ │ │ │ │
▼ ▼ │ ▼ ▼
order_queue ────▶ action_queue balance_update_queue trade_queue
│ balance_event_queue
└──────────────────────────────────────┘
Thread Responsibilities
| Thread | Responsibility | Input Queue | Output |
|---|---|---|---|
| Ingestion | Parse orders, assign SeqNum | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | Match, Cancel handling | action_queue | trade_queue, balance_update_queue |
| Settlement | Persist Events (Trade, Balance) | trade_queue, balance_event_queue | ledgers |
Queue Design
Using crossbeam-queue::ArrayQueue for lock-free MPSC queues:
#![allow(unused)]
fn main() {
pub struct MultiThreadQueues {
pub order_queue: Arc<ArrayQueue<OrderAction>>, // 64K
pub action_queue: Arc<ArrayQueue<ValidAction>>, // 64K
pub trade_queue: Arc<ArrayQueue<TradeEvent>>, // 64K
pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>, // 64K
pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>, // 64K
}
}
Cancel Handling
- Ingestion: Create OrderAction::Cancel.
- UBSCore: Pass through to action_queue (no balance lock needed).
- ME: Remove the order from the OrderBook, send BalanceUpdateRequest::Cancel.
- UBSCore: Process the unlock, generate BalanceEvent::Unlock.
- Settlement: Persist the BalanceEvent.
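The cancel round-trip above can be modeled in a few lines. A sketch with heavily simplified types (the real messages also carry user_id, asset ids, and sequence numbers):

```rust
use std::collections::HashMap;

// UBSCore → Settlement message produced when a cancel unlocks funds.
#[derive(Debug, PartialEq)]
enum BalanceEvent { Unlock { order_id: u64, amount: u64 } }

// ME stage stand-in: resting orders mapped to their locked amounts.
struct OrderBookStub { resting: HashMap<u64, u64> }

impl OrderBookStub {
    // Remove the order; report how much UBSCore must unlock (None if gone).
    fn cancel(&mut self, order_id: u64) -> Option<u64> {
        self.resting.remove(&order_id)
    }
}

// UBSCore post-trade stage: turn the ME's report into a BalanceEvent.
fn settle_cancel(order_id: u64, locked: u64) -> BalanceEvent {
    BalanceEvent::Unlock { order_id, amount: locked }
}
```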
Consistency Verification
Test Script
# Run full comparison test
./scripts/test_pipeline_compare.sh highbal
# Supported Datasets:
# 100k - 100k orders without cancel
# cancel - 1.3M orders with 30% cancel
# highbal - 1.3M orders with 30% cancel, high balance (Recommended)
Verification Results (1.3M orders, 30% cancel, high balance)
╔════════════════════════════════════════════════════════════════╗
║ ✅ ALL TESTS PASSED ║
║ Multi-thread pipeline matches single-thread exactly! ║
╚════════════════════════════════════════════════════════════════╝
Key Metrics
| Dataset | Total | Place | Cancel | Trades | Result |
|---|---|---|---|---|---|
| 100k | 100,000 | 100,000 | 0 | 47,886 | ✅ Match |
| 1.3M HighBal | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ Match |
Important Considerations
Balance Sufficiency
Insufficient balance may cause rejections. In concurrent environments, rejection timing can vary due to settlement latency, leading to non-deterministic results.
Solution: Use highbal dataset (1000 BTC + 100M USDT per user).
Shutdown Synchronization
Wait for queues to drain before signaling shutdown:
#![allow(unused)]
fn main() {
while !queues.all_empty() {
std::hint::spin_loop();
}
shutdown.request_shutdown();
}
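The drain-then-signal ordering matters: raising the flag first could strand items in the queue. A self-contained sketch of the pattern (a Mutex-wrapped VecDeque stands in for the lock-free ArrayQueue):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Produce n items, drain the queue, then request shutdown; the consumer
// exits only when shutdown is set AND the queue is empty, so nothing is lost.
fn run(n: u64) -> u64 {
    let queue = Arc::new(Mutex::new(VecDeque::new()));
    let shutdown = Arc::new(AtomicBool::new(false));
    let (q, s) = (queue.clone(), shutdown.clone());

    let consumer = thread::spawn(move || {
        let mut processed = 0u64;
        loop {
            let item = q.lock().unwrap().pop_front();
            match item {
                Some(_) => processed += 1,
                None if s.load(Ordering::Acquire) => break, // drained + signaled
                None => std::hint::spin_loop(),
            }
        }
        processed
    });

    for i in 0..n {
        queue.lock().unwrap().push_back(i);
    }
    // Wait for the queue to drain before signaling, as in the snippet above.
    while !queue.lock().unwrap().is_empty() {
        std::hint::spin_loop();
    }
    shutdown.store(true, Ordering::Release);
    consumer.join().unwrap()
}
```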
Performance
| Mode | 100k orders | 1.3M orders |
|---|---|---|
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |
Note: The multi-thread version carries extra overhead for BalanceEvent generation and persistence, yet still matches single-thread performance. Future optimizations: batch I/O, reduced queue contention.
Queue Priority Strategy (Future)
Current Implementation:
Prioritize draining balance_update_queue completely before processing order_queue.
Future: Weighted Round-Robin: Allow alternating processing to improve responsiveness.
#![allow(unused)]
fn main() {
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1
}
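A sketch of what that weighted loop could look like (this is the future design, not the current implementation; types are simplified):

```rust
use std::collections::VecDeque;

// Handle up to SETTLE_WEIGHT settle messages per new order, instead of
// draining the settle queue exhaustively before touching orders.
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1

fn drain_step(
    settle_q: &mut VecDeque<u64>,
    order_q: &mut VecDeque<u64>,
    handled: &mut Vec<(&'static str, u64)>,
) -> bool {
    let mut busy = false;
    for _ in 0..SETTLE_WEIGHT {
        if let Some(s) = settle_q.pop_front() {
            handled.push(("settle", s));
            busy = true;
        }
    }
    if let Some(o) = order_q.pop_front() {
        handled.push(("order", o));
        busy = true;
    }
    busy
}
```

With a backlog of settles, orders are still interleaved instead of starved, which is the responsiveness gain over the drain-completely strategy.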
File Structure
src/
├── pipeline.rs # Shared types
├── pipeline_mt.rs # Multi-thread impl
├── pipeline_runner.rs # Single-thread impl
└── main.rs
🇨🇳 中文
📦 代码变更: 查看 Diff | 关键文件: pipeline_mt.rs
概述
Multi-Thread Pipeline 将处理逻辑分布在 4 个独立线程中,通过无锁队列通信,实现高吞吐量的订单处理。
架构
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ingestion │────▶│ UBSCore │────▶│ ME │────▶│ Settlement │
│ (Thread 1) │ │ (Thread 2) │ │ (Thread 3) │ │ (Thread 4) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ ▲ │ │
│ │ │ │ │
▼ ▼ │ ▼ ▼
order_queue ────▶ action_queue balance_update_queue trade_queue
│ balance_event_queue
└──────────────────────────────────────┘
线程职责
| 线程 | 职责 | 输入队列 | 输出 |
|---|---|---|---|
| Ingestion | 订单解析、序列号分配 | orders (iterator) | order_queue |
| UBSCore | Pre-Trade (WAL + Lock) + Post-Trade (Settle) | order_queue, balance_update_queue | action_queue, balance_event_queue |
| ME | 订单撮合、取消处理 | action_queue | trade_queue, balance_update_queue |
| Settlement | 事件持久化 (TradeEvent, BalanceEvent) | trade_queue, balance_event_queue | ledger files |
队列设计
使用 crossbeam-queue::ArrayQueue 实现无锁 MPSC 队列:
#![allow(unused)]
fn main() {
pub struct MultiThreadQueues {
pub order_queue: Arc<ArrayQueue<OrderAction>>, // 64K capacity
pub action_queue: Arc<ArrayQueue<ValidAction>>, // 64K capacity
pub trade_queue: Arc<ArrayQueue<TradeEvent>>, // 64K capacity
pub balance_update_queue: Arc<ArrayQueue<BalanceUpdateRequest>>, // 64K
pub balance_event_queue: Arc<ArrayQueue<BalanceEvent>>, // 64K
}
}
Cancel 订单处理
Cancel 订单流程:
- Ingestion: 创建 OrderAction::Cancel { order_id, user_id }
- UBSCore: 直接传递到 action_queue(无需 balance lock)
- ME: 从 OrderBook 移除订单,发送 BalanceUpdateRequest::Cancel
- UBSCore (Post-Trade): 处理 unlock,生成 BalanceEvent::Unlock
- Settlement: 持久化 BalanceEvent
一致性验证
测试脚本
# 运行完整对比测试
./scripts/test_pipeline_compare.sh highbal
# 支持的数据集:
# 100k - 100k orders without cancel
# cancel - 1.3M orders with 30% cancel
# highbal - 1.3M orders with 30% cancel, high balance (推荐)
验证结果 (1.3M orders, 30% cancel, high balance)
╔════════════════════════════════════════════════════════════════╗
║ ✅ ALL TESTS PASSED ║
║ Multi-thread pipeline matches single-thread exactly! ║
╚════════════════════════════════════════════════════════════════╝
关键指标
| 数据集 | 总订单 | Place | Cancel | Trades | 结果 |
|---|---|---|---|---|---|
| 100k (无 cancel) | 100,000 | 100,000 | 0 | 47,886 | ✅ 完全一致 |
| 1.3M + 30% cancel (高余额) | 1,300,000 | 1,000,000 | 300,000 | 667,567 | ✅ 完全一致 |
注意事项
余额充足性
如果测试数据中用户余额不足,可能导致部分订单被 reject。在并发环境中,由于 settle 时序不同,这些 reject 可能与单线程结果不同。
解决方案: 使用 highbal 数据集,确保每个用户有充足余额(1000 BTC + 100M USDT)。
Shutdown 同步
Multi-thread pipeline 在 shutdown 时需要确保所有队列都已 drain:
#![allow(unused)]
fn main() {
while !queues.all_empty() {
std::hint::spin_loop();
}
shutdown.request_shutdown();
}
性能
| 模式 | 100k orders | 1.3M orders |
|---|---|---|
| Single-Thread | 350ms | 15.5s |
| Multi-Thread | 330ms | 15.6s |
注:Multi-thread 当前版本包含 BalanceEvent 生成和持久化开销,性能与 Single-Thread 相当。未来优化方向包括批量 I/O 和减少队列竞争。
队列优先级策略 (未来)
当前实现:
完全优先 drain balance_update_queue,然后才处理新订单。
未来优化: 加权轮询 (Weighted Round-Robin): 允许交替处理,提高响应性。
#![allow(unused)]
fn main() {
const SETTLE_WEIGHT: u32 = 3; // settle : order = 3 : 1
}
文件结构
src/
├── pipeline.rs # 共享类型: PipelineStats, MultiThreadQueues, ShutdownSignal
├── pipeline_mt.rs # Multi-thread 实现: run_pipeline_multi_thread()
├── pipeline_runner.rs # Single-thread 实现: run_pipeline()
└── main.rs # --pipeline / --pipeline-mt 模式选择
0x08-h Performance Monitoring & Observability
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff | Key File: pipeline_services.rs
“If you can’t measure it, you can’t improve it.” This chapter focuses on introducing production-grade performance monitoring and observability for our multi-threaded pipeline.
Monitoring Dimensions
1. Latency Metrics
In HFT, averages are misleading. We care about Tail Latency.
- P50 (Median): General performance.
- P99 / P99.9: Stability in extreme cases.
- Max: Jitter, GC, or system calls.
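Percentiles can be computed from a recorded sample of per-order latencies. A minimal nearest-rank sketch (production systems typically use HdrHistogram-style buckets rather than storing every sample):

```rust
// Nearest-rank percentile over raw latency samples (ns). Sorting is
// O(n log n), fine for offline analysis; streaming systems bucket instead.
fn percentile(samples: &mut Vec<u64>, p: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&p));
    samples.sort_unstable();
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}
```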
2. Throughput
- Orders/sec: Processing capacity.
- Trades/sec: Matching capacity.
3. Queue Depth & Backpressure
Monitoring Ring Buffer occupancy reveals downstream bottlenecks and jitter.
4. Architectural Breakdown
Knowing where time is spent (Pre-Trade vs Matching vs Settlement).
Test Execution
Dataset: 1.3M orders (30% cancel) from fixtures/test_with_cancel_highbal/.
Single-Thread Run:
cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal
Multi-Thread Run:
cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal
Compare Script:
./scripts/test_pipeline_compare.sh highbal
Analysis Results (1.3M Dataset)
1. Single-Thread Pipeline
- Throughput: 210,000 orders/sec (P50 Latency: 1.25 µs)
- Breakdown:
- Matching Engine: 91.5% (The bottleneck)
- UBSCore Lock: 5.6%
- Persistence: 2.7%
2. Multi-Thread Pipeline (After Service Refactor)
- Throughput: ~64,450 orders/sec
- E2E Latency (P50): ~113 ms
- E2E Latency (P99): ~188 ms
Conclusion
- Parallelism Works: Total task CPU time (~34s) > Wall time (17.5s).
- Bottleneck: Matching Engine remains the serial bottleneck (~52k ops/s limit).
- Latency Cost: Multi-threading introduces significant message passing latency (µs → ms).
Logging & Observability
We introduced a production-grade asynchronous logging system using tracing.
1. Non-blocking I/O
Using tracing-appender with a dedicated worker thread and memory buffer to prevent I/O blocking.
2. Environment-driven Config
- Dev: Detailed, human-readable output.
- Prod: JSON format, high-frequency tracing disabled (0XINFI=off).
3. Standardized Targets
All pipeline logs use the 0XINFI namespace (e.g., 0XINFI::ME, 0XINFI::UBSC) for precise filtering.
Intent-Based Design: From Functions to Services
“Good architecture is not designed upfront, but evolved through refactoring.”
We refactored tightly coupled spawn_* functions into decoupled Service Structs.
Problem: Coupled Functions
#![allow(unused)]
fn main() {
// ❌ Business logic buried in thread spawning
fn spawn_me_stage(...) -> JoinHandle<OrderBook> {
thread::spawn(move || {
// Logic locked inside closure
})
}
}
- Untestable: Cannot unit test logic without spawning threads.
- Not Reusable: Cannot be used in single-thread mode.
Solution: Service Structs
#![allow(unused)]
fn main() {
// ✅ Intent is clear and decoupled
pub struct MatchingService {
book: OrderBook,
// ...
}
impl MatchingService {
pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}
}
Benefits
- Testability: Services can be instantiated and tested in isolation.
- Reusability: Core logic is decoupled from threading model.
- Clarity: Code expresses “what” (Service), not just “how” (Thread).
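A compilable miniature of the pattern shows why the struct form unit-tests without any thread (names are simplified from the project's services):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// The service owns its state; its logic is independent of the threading model.
struct MatchingService { processed: u64 }

impl MatchingService {
    fn handle_one(&mut self) { self.processed += 1; } // stand-in for matching

    // Threaded deployment is just a loop around the same logic.
    fn run(&mut self, shutdown: &AtomicBool) {
        while !shutdown.load(Ordering::Acquire) {
            self.handle_one();
        }
    }
}
```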
🇨🇳 中文
📦 代码变更: 查看 Diff | 关键文件: pipeline_services.rs
在构建高性能低延迟交易系统时,“如果你无法测量它,你就无法优化它”。本章重点在于为我们的多线程 Pipeline 引入生产级的性能监控和延迟指标分析。
监控维度
1. 延迟指标 (Latency Metrics)
对于 HFT 系统,平均延迟往往是误导性的,我们更关心长尾延迟 (Tail Latency)。
- P50 (Median): 中位数延迟,反映平均水平。
- P99 / P99.9: 长尾延迟,反映系统在极端情况下的稳定性。
- Max: 峰值延迟,通常由系统抖动 (Jitter) 或 GC/系统调用引起。
2. 吞吐量 (Throughput)
- Orders/sec: 每秒处理订单数。
- Trades/sec: 每秒撮合成交数。
3. 队列深度与背压 (Queue Depth & Backpressure)
监控 Ring Buffer 的占用情况,识别下游瓶颈。
4. 架构内部阶段耗时 (Architectural Breakdown)
清晰地知道时间花在了哪里:Pre-Trade / Matching / Settlement / Logging。
测试执行方法
数据集: 130 万订单(含 30% 撤单) fixtures/test_with_cancel_highbal/。
运行单线程:
cargo run --release -- --pipeline --input fixtures/test_with_cancel_highbal
运行多线程:
cargo run --release -- --pipeline-mt --input fixtures/test_with_cancel_highbal
对比脚本:
./scripts/test_pipeline_compare.sh highbal
执行结果与分析 (1.3M 数据集)
1. 单线程流水线
- 性能: 210,000 orders/sec (P50: 1.25 µs)
- 瓶颈: Matching Engine 耗时 91.5%,是最大瓶颈。
2. 多线程流水线 (重构后)
- 吞吐量: ~64,450 orders/sec
- 端到端延迟 (P50): ~113 ms
- 端到端延迟 (P99): ~188 ms
结论
- 并行有效: CPU 总耗时远大于执行时间。
- 瓶颈: Matching Engine 依然是最大的串行瓶颈 (吞吐上限 ~52k)。
- 延迟: 多线程引入的消息传递开销导致端到端延迟从微秒级退化到毫秒级。
日志与可观测性
引入基于 tracing 的生产级异步日志体系。
1. 异步非阻塞架构
使用 tracing-appender 独立线程写入日志,不阻塞业务线程。
2. 环境驱动配置
Dev 开启详细日志,Prod 使用 JSON 并关闭高频追踪。
3. 标准化日志目标
使用 0XINFI 命名空间 (如 0XINFI::ME) 实现精细过滤。
意图编码:从函数到服务
“好的架构不是一开始就设计出来的,而是通过不断重构演进出来的。”
我们将紧耦合的 spawn_* 函数重构为解耦的 Service 结构体。
问题:紧耦合
#![allow(unused)]
fn main() {
// ❌ 业务逻辑埋在线程创建中
fn spawn_me_stage(...) {
thread::spawn(move || { ... })
}
}
无法单元测试,无法复用。
解决方案:Service 结构体
#![allow(unused)]
fn main() {
// ✅ 意图清晰,解耦
pub struct MatchingService { ... }
impl MatchingService {
pub fn run(&mut self, shutdown: &ShutdownSignal) { ... }
}
}
收益
- 可测试性: 服务可独立实例化测试。
- 可复用性: 核心逻辑与线程模型解耦。
- 清晰度: 代码表达“做什么” (Service),而非“怎么做” (Thread)。
0x09-a Gateway: Client Access Layer
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement a lightweight HTTP Gateway to connect clients with the trading core system.
Background: From Core to MVP
We have built a functional Trading Core:
- OrderBook (0x04)
- Balance Management (0x05-0x06)
- Matching Engine (0x08)
- Pipeline & Monitoring (0x08-f/g/h)
To become a usable MVP, we need auxiliary systems:
┌─────────────────────────────────────────────────────────────────────────┐
│ Complete Trading System MVP │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Client (Web/Mobile/API) │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ 0x09-a │ ← This Chapter: Accept orders, return response │
│ │ Gateway │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Trading Core (Completed) │ │
│ │ Ingestion → UBSCore → ME → Settlement │ │
│ └─────────────────────────────────────────────────────────────────┘ │
0x09 Series Plan
| Chapter | Topic | Core Function |
|---|---|---|
| 0x09-a | Gateway | HTTP/WS Entry, Pre-Check |
| 0x09-b | Settlement Persistence | DB Persistence for Balances/Trades |
| 0x09-c | K-Line Aggregation | Real-time Candles |
| 0x09-d | WebSocket Push | Real-time Market Data |
1. Gateway Design
1.1 Responsibilities
The Gateway is the sole entry point for clients.
- Protocol Conversion: HTTP/WebSocket → Internal Formats
- Authentication: API Key / JWT
- Pre-Check: Fast balance validation
- Rate Limiting: Anti-DDoS
- Response: Synchronous acknowledgment
1.2 Why Separate Gateway & Core?
- Decoupling: Network I/O doesn’t block matching.
- Scalability: Gateway can scale horizontally.
- Predictability: Async queues ensure predictable matching latency.
1.3 Tech Stack
- HTTP: axum (high performance, tokio-native)
- WebSocket: tokio-tungstenite
- Serialization: serde + JSON
- Rate Limiting: tower middleware
2. Core Data Flow
2.1 Order Submission
┌──────────┐ HTTP POST ┌──────────┐ Ring Buffer ┌──────────┐
│ Client │ ───────────────▶│ Gateway │ ─────────────────▶│ Ingestion│
│ │ │ │ │ Stage │
│ │◀─────────────── │ │ │ │
└──────────┘ 202 Accepted └──────────┘ └──────────┘
+ │
order_id ▼
seq_id Trading Core
2.2 Pre-Check Logic
#![allow(unused)]
fn main() {
async fn submit_order(order: OrderRequest) -> Result<OrderResponse, ApiError> {
// 1. Validation
validate_order(&order)?;
// 2. Auth
let user_id = authenticate(&headers)?;
// 3. Pre-Check: Balance (Read-Only)
let balance = ubscore.query_balance(user_id, order.asset_id).await?;
if balance.avail < required {
return Err(ApiError::InsufficientBalance);
}
// 4. Assign ID
let order_id = id_generator.next();
// 5. Push to Ring Buffer
order_queue.push(SequencedOrder { ... })?;
// 6. Return Accepted
Ok(OrderResponse { status: "PENDING", ... })
}
}
Key Points:
- Pre-Check is "best effort": it reads a possibly stale balance and does not lock funds.
- Final locking happens in UBSCore, the single source of truth for balances.
- Returns 202 Accepted to indicate asynchronous processing.
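The "best effort" check itself is fixed-point arithmetic on u64 amounts. A hedged sketch of the required-amount computation (the scale constant and function names are illustrative, not the project's actual API):

```rust
// Amounts are u64 fixed-point. A BUY needs the quote asset
// (price * qty / QTY_SCALE); a SELL needs the base asset (qty).
const QTY_SCALE: u64 = 100_000_000; // 1e8, illustrative

#[derive(Clone, Copy)]
enum Side { Buy, Sell }

fn required_amount(side: Side, price: u64, qty: u64) -> u64 {
    match side {
        // u128 intermediate avoids overflow of price * qty.
        Side::Buy => ((price as u128 * qty as u128) / QTY_SCALE as u128) as u64,
        Side::Sell => qty,
    }
}

fn pre_check(side: Side, price: u64, qty: u64, avail: u64) -> bool {
    avail >= required_amount(side, price, qty)
}
```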
3. API Design
3.1 RESTful Endpoints
- POST /api/v1/create_order: Submit order
- POST /api/v1/cancel_order: Cancel order
- GET /api/v1/order/{order_id}: Query status
3.2 Request/Response Format
Submit Order:
// POST /api/v1/create_order
{
"symbol": "BTC_USDT",
"side": "BUY",
"type": "LIMIT",
"price": "85000.00",
"qty": "0.001"
}
// Response (202 Accepted)
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"status": "ACCEPTED",
"accepted_at": 1734533784000
}
}
3.3 Unified Response Format
{
"code": 0, // 0 = Success, Non-0 = Error
"msg": "ok", // Short description
"data": {} // Payload or null
}
3.4 API Conventions
Important: Must follow API Conventions.
- SCREAMING_CASE enums: "BUY", "SELL", "LIMIT".
- Consistent short names: qty (not quantity), cid (not client_order_id).
- SCREAMING_SNAKE_CASE error codes: INVALID_PARAMETER.
4. WebSocket Push
4.1 Flow
Clients connect via WS, authenticate, and subscribe to channels.
4.2 Channels
- order_updates: Private order status changes.
- balance_updates: Private balance changes.
- trades: Public trade feed.
5. Security
| Level | Method | Scenario |
|---|---|---|
| MVP | Header X-User-ID | Internal / Reliability Testing |
| Prod | API Key (HMAC) | Programmatic Trading |
| Prod | JWT | Web/Mobile |
6. Communication Architecture
6.1 MVP Choice: Single Process Ring Buffer
Gateway and Trading Core run in the same process, communicating via Arc<ArrayQueue>.
Pros:
- ✅ Zero network overhead (~100ns latency).
- ✅ Reuse existing crossbeam queues.
- ✅ Simple deployment.
6.2 Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────┐
│ Single Process (--gateway mode) │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ HTTP Server (tokio runtime) │ │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ order_queue │ (Shared Ring Buffer) │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Trading Core Threads │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
6.3 Evolution Path
- MVP: Single Process.
- Phase 2: Unix Domain Socket (Multi-process on same host).
- Phase 3: TCP / RPC (Distributed).
7. Implementation Guidelines
7.1 Startup Modes
# Gateway Mode
cargo run --release -- --gateway --port 8080
# Batch Mode (Original)
cargo run --release -- --pipeline-mt
7.2 Main Integration
#![allow(unused)]
fn main() {
if args.gateway {
    // Spawn the HTTP server on its own thread; clone the Arc-backed
    // queues so `queues` can still be moved into the trading core below.
    let gw_queues = queues.clone();
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(gw_queues));
    });
    // Run Trading Core on the main thread
    run_pipeline_multi_thread(queues, ...);
}
}
Summary
This chapter implements the Gateway as the client access layer.
Core Philosophy:
The Gateway is a speed guard, not a business processor. Accept fast, validate fast, forward fast.
🇨🇳 中文
📦 代码变更: 查看 Diff
本节核心目标:实现一个轻量级的 HTTP Gateway,连接客户端与交易核心系统。
背景:从核心到完整 MVP
在前面的章节中,我们已经构建了一个功能完整的交易核心系统:
- OrderBook (0x04)
- Balance Management (0x05-0x06)
- Matching Engine (0x08)
- Pipeline (0x08-f/g/h)
但要成为一个可用的 MVP (Minimum Viable Product),还需要以下辅助系统:
┌─────────────────────────────────────────────────────────────────────────┐
│ Complete Trading System MVP │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Client (Web/Mobile/API) │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ 0x09-a │ ← 本章:接收订单,返回响应 │
│ │ Gateway │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Trading Core (已完成) │ │
│ │ Ingestion → UBSCore → ME → Settlement │ │
│ └─────────────────────────────────────────────────────────────────┘ │
0x09 系列章节规划
| 章节 | 主题 | 核心功能 |
|---|---|---|
| 0x09-a | Gateway | HTTP/WS 订单接入、Pre-Check |
| 0x09-b | Settlement Persistence | 用户余额、订单、成交入库 |
| 0x09-c | K-Line Aggregation | 实时 K 线聚合 |
| 0x09-d | WebSocket Push | 实时行情推送 |
1. Gateway 设计
1.1 职责
Gateway 是客户端与交易系统的唯一入口:
- 协议转换:HTTP/WebSocket → 内部消息格式
- 身份验证:API Key / JWT
- Pre-Check:快速余额校验
- 限流:防止 DDoS
- 响应:同步返回接收确认
1.2 为什么 Gateway + Trading Core 分离?
- 解耦:网络 I/O 不阻塞撮合。
- 扩展性:Gateway 可水平扩展。
- 可预测性:异步队列确保撮合延迟可预测。
1.3 技术选型
- HTTP: axum(高性能、tokio 原生)
- WebSocket: tokio-tungstenite
- Serialization: serde + JSON
- Rate Limiting: tower middleware
2. 核心数据流
2.1 订单提交流程
┌──────────┐ HTTP POST ┌──────────┐ Ring Buffer ┌──────────┐
│ Client │ ───────────────▶│ Gateway │ ─────────────────▶│ Ingestion│
│ │ │ │ │ Stage │
│ │◀─────────────── │ │ │ │
└──────────┘ 202 Accepted └──────────┘ └──────────┘
+ │
order_id ▼
seq_id Trading Core
2.2 Pre-Check 流程
#![allow(unused)]
fn main() {
async fn submit_order(order: OrderRequest) -> Result<OrderResponse, ApiError> {
// 1. 参数校验
validate_order(&order)?;
// 2. 身份验证
let user_id = authenticate(&headers)?;
// 3. Pre-Check: 余额检查 (只读)
let balance = ubscore.query_balance(user_id, order.asset_id).await?;
if balance.avail < required {
return Err(ApiError::InsufficientBalance);
}
// 4. 分配 ID
let order_id = id_generator.next();
// 5. 推送到 Ring Buffer
order_queue.push(SequencedOrder { ... })?;
// 6. 返回接收确认
Ok(OrderResponse { status: "PENDING", ... })
}
}
关键点:
- Pre-Check 是“尽力而为”的检查。
- 最终锁定在 UBSCore 执行。
- 返回 202 Accepted 表示异步处理中。
3. API 设计
3.1 RESTful Endpoints
- POST /api/v1/create_order: 提交订单
- POST /api/v1/cancel_order: 取消订单
- GET /api/v1/order/{order_id}: 查询状态
3.2 请求/响应格式
提交订单:
// POST /api/v1/create_order
{
"symbol": "BTC_USDT",
"side": "BUY",
"type": "LIMIT",
"price": "85000.00",
"qty": "0.001"
}
// Response (202 Accepted)
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"status": "ACCEPTED",
"accepted_at": 1734533784000
}
}
3.3 统一响应格式
{
"code": 0, // 0 = 成功, 非0 = 错误码
"msg": "ok", // 简短描述
"data": {} // 数据或 null
}
3.4 API 规范
重要: 必须遵循 API Conventions 规范。
- 大写枚举: "BUY", "SELL", "LIMIT"。
- 命名一致: qty(而非 quantity),cid(而非 client_order_id)。
- 大写蛇形错误码: INVALID_PARAMETER。
4. WebSocket 实时推送
4.1 流程
客户端连接 WS,认证,并订阅频道。
4.2 频道
- order_updates: 私有订单状态变更。
- balance_updates: 私有余额变更。
- trades: 公共成交推送。
5. 安全设计
| 级别 | 方法 | 场景 |
|---|---|---|
| MVP | Header X-User-ID | 内部测试 |
| Prod | API Key (HMAC) | 程序化交易 |
| Prod | JWT | Web/移动端 |
6. 通信架构设计
6.1 MVP 选择:单进程 Ring Buffer
Gateway 和 Trading Core 运行在同一进程中,通过 Arc<ArrayQueue> 通信。
优势:
- ✅ 零网络开销 (~100ns 延迟)。
- ✅ 复用现有 crossbeam 队列。
- ✅ 部署简单。
6.2 架构图
┌─────────────────────────────────────────────────────────────────────────┐
│ Single Process (--gateway mode) │
├─────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────┐ │
│ │ HTTP Server (tokio runtime) │ │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│         │       order_queue           │  (shared Ring Buffer)                 │
│ └──────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Trading Core Threads │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
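The single-process handoff above can be sketched with the standard library alone. This is a minimal, illustrative stand-in: a bounded `std::sync::mpsc::sync_channel` plays the role of the crossbeam `ArrayQueue`, and `SequencedOrder`'s fields are simplified placeholders, not the real struct.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

// Illustrative order message; the real SequencedOrder carries more fields.
struct SequencedOrder {
    order_id: u64,
    price: u64,
    qty: u64,
}

// Gateway thread pushes, Trading Core thread pops: same process, no network hop.
// Returns how many orders the core consumed.
fn run_handoff(orders: Vec<SequencedOrder>) -> usize {
    // The bounded channel plays the ring buffer's role (capacity = back-pressure).
    let (tx, rx) = sync_channel::<SequencedOrder>(1024);

    let core = thread::spawn(move || {
        let mut processed = 0usize;
        // Drain until the Gateway side hangs up.
        while rx.recv().is_ok() {
            processed += 1;
        }
        processed
    });

    for order in orders {
        // try_send mirrors ArrayQueue::push: fail fast instead of blocking.
        if let Err(TrySendError::Full(_)) = tx.try_send(order) {
            // In the real Gateway this would surface as a retryable error.
        }
    }
    drop(tx); // close the queue so the core thread exits
    core.join().unwrap()
}
```

The bounded capacity is what gives the Gateway natural back-pressure: a full queue is an explicit, observable event rather than unbounded memory growth.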
6.3 Evolution Path
- MVP: single process.
- Phase 2: Unix Domain Socket (multi-process, same host).
- Phase 3: TCP / RPC (distributed).
7. Implementation Guide
7.1 Launch Modes
# Gateway mode
cargo run --release -- --gateway --port 8080
# Batch mode (existing)
cargo run --release -- --pipeline-mt
7.2 Main Integration
if args.gateway {
    // Spawn the HTTP server on its own thread with a dedicated tokio runtime
    std::thread::spawn(move || {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(run_http_server(queues));
    });
    // Run the Trading Core on the main thread
    run_pipeline_multi_thread(queues, ...);
}
Summary
This chapter implements the Gateway as the client-facing access layer.
Core Philosophy:
The Gateway is a speed gatekeeper, not a business processor: receive fast, validate fast, forward fast.
0x09-b Settlement Persistence: TDengine Integration
📦 Code Changes: View Diff
Core Objective: Persist trade data to TDengine and implement Order Query & History APIs.
Background: From Memory to Persistence
In Gateway Phase 1 (0x09-a), we completed:
- ✅ HTTP API (create_order, cancel_order)
- ✅ Order Validation
- ✅ Ring Buffer Integration
- ⏳ Data Persistence ← This Chapter
Current System Issue:
┌─────────────────────────────────────────────────────────────────┐
│ Trading Core (In-Memory) │
│ │
│ Orders → Match → Trades → Settle → Balance Update │
│ ↓ ↓ ↓ │
│ ❌ ❌ ❌ ← Data LOST on restart! │
└─────────────────────────────────────────────────────────────────┘
This Chapter’s Solution:
┌─────────────────────────────────────────────────────────────────┐
│ Trading Core │
│ │
│ Orders → Match → Trades → Settle → Balance Update │
│ ↓ ↓ ↓ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ TDengine (Persistence) │ │
│ │ orders | trades | balances │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
1. Why TDengine?
Detailed comparison: Database Selection Analysis
Core Advantages
| Feature | TDengine | PostgreSQL |
|---|---|---|
| Write Speed | 1M/sec | 10k/sec |
| Time-Series | Native Support | Index Optimization Needed |
| Storage | 1/10 | 1x |
| Real-time Analytics | Built-in Stream | External Tools Needed |
| Rust Client | ✅ Official taos | ✅ tokio-postgres |
2. Schema Design
2.1 Super Table Architecture
TDengine uses the Super Table concept:
┌─────────────────────────────────────────────────────────┐
│ Super Table: orders │
│ (Unified schema, auto-create sub-table per symbol) │
├─────────────────┬─────────────────┬────────────────────┤
│ orders_1 │ orders_2 │ orders_N │
│ (BTC_USDT) │ (ETH_USDT) │ (...) │
└─────────────────┴─────────────────┴────────────────────┘
2.2 DDL Definitions
-- Database Setup
CREATE DATABASE IF NOT EXISTS trading
KEEP 365d -- Retain data for 1 year
DURATION 10d -- Partition every 10 days
BUFFER 256 -- 256MB Write Buffer
WAL_LEVEL 2 -- WAL Persistence Level
PRECISION 'us'; -- Microsecond Precision
USE trading;
-- Orders Super Table
CREATE STABLE IF NOT EXISTS orders (
ts TIMESTAMP, -- Timestamp (PK)
order_id BIGINT UNSIGNED,
user_id BIGINT UNSIGNED,
side TINYINT UNSIGNED, -- 0=BUY, 1=SELL
order_type TINYINT UNSIGNED,-- 0=LIMIT, 1=MARKET
price BIGINT UNSIGNED, -- Integer representation
qty BIGINT UNSIGNED,
filled_qty BIGINT UNSIGNED,
status TINYINT UNSIGNED,
cid NCHAR(64) -- Client Order ID
) TAGS (
symbol_id INT UNSIGNED -- Partition Key
);
-- Trades Super Table
CREATE STABLE IF NOT EXISTS trades (
ts TIMESTAMP,
trade_id BIGINT UNSIGNED,
order_id BIGINT UNSIGNED,
user_id BIGINT UNSIGNED,
side TINYINT UNSIGNED,
price BIGINT UNSIGNED,
qty BIGINT UNSIGNED,
fee BIGINT UNSIGNED,
role TINYINT UNSIGNED -- 0=MAKER, 1=TAKER
) TAGS (
symbol_id INT UNSIGNED
);
-- Balances Super Table
CREATE STABLE IF NOT EXISTS balances (
ts TIMESTAMP,
avail BIGINT UNSIGNED,
frozen BIGINT UNSIGNED,
lock_version BIGINT UNSIGNED,
settle_version BIGINT UNSIGNED
) TAGS (
user_id BIGINT UNSIGNED,
asset_id INT UNSIGNED
);
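Given the super-table DDL above, a writer targets a per-symbol sub-table and lets TDengine auto-create it via `USING ... TAGS`. The sketch below illustrates that SQL shape only; `TradeRow`, its fields, and the abbreviated column list are hypothetical (a real insert must cover all columns of the `trades` super table).

```rust
// Hypothetical, abbreviated trade record; names mirror the trades super table.
struct TradeRow {
    ts_us: u64, // microsecond timestamp (PRECISION 'us')
    trade_id: u64,
    price: u64,
    qty: u64,
}

// Build an INSERT that auto-creates the per-symbol sub-table via USING ... TAGS,
// with an explicit (abbreviated) column list for illustration.
fn trade_insert_sql(symbol_id: u32, t: &TradeRow) -> String {
    format!(
        "INSERT INTO trades_{sid} USING trades TAGS ({sid}) (ts, trade_id, price, qty) VALUES ({}, {}, {}, {})",
        t.ts_us, t.trade_id, t.price, t.qty,
        sid = symbol_id
    )
}
```

Because the sub-table name is derived from the tag (`symbol_id`), the writer never needs a separate "create table" step per trading pair.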
2.3 Status Enums
// New Enum
pub enum TradeRole {
    Maker = 0,
    Taker = 1,
}
3. API Design
3.1 Query Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/order/{order_id} | GET | Query single order |
| /api/v1/orders | GET | Query order list |
| /api/v1/trades | GET | Query trade history |
| /api/v1/balances | GET | Query user balances |
3.2 Request/Response Format
GET /api/v1/order/{order_id}:
{
"code": 0,
"msg": "ok",
"data": {
"order_id": 1001,
"symbol": "BTC_USDT",
"status": "PARTIALLY_FILLED",
"filled_qty": "0.0005",
"created_at": 1734533784000
}
}
GET /api/v1/balances:
{
"code": 0,
"msg": "ok",
"data": {
"balances": [
{ "asset": "BTC", "avail": "1.50000000", "frozen": "0.10000000" }
]
}
}
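The engine stores amounts as scaled `u64` integers, while the API responds with fixed-point strings like `"1.50000000"`. A minimal conversion helper might look like this (the function name and the per-asset `decimals` parameter are illustrative, not the project's actual API):

```rust
// Render an integer-scaled amount as the fixed-point string the API returns.
// `decimals` is the asset's precision, e.g. 8 for BTC (1 unit = 1e-8 BTC).
fn to_decimal_string(amount: u64, decimals: u32) -> String {
    let scale = 10u64.pow(decimals);
    // Integer part, then the remainder zero-padded to the asset's precision.
    format!("{}.{:0width$}", amount / scale, amount % scale, width = decimals as usize)
}
```

Serializing balances as strings rather than JSON numbers avoids silent precision loss in clients that parse numbers as IEEE-754 doubles.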
4. Implementation Architecture
4.1 Module Structure
src/
├── persistence/
│ ├── mod.rs // Entry
│ ├── tdengine.rs // Connection Manager
│ ├── orders.rs // Order Persistence
│ ├── trades.rs // Trade Persistence
│ └── balances.rs // Balance Persistence
4.2 Data Flow
┌─────────────────────────────────────────────────────────────────┐
│ Settlement Thread │
│ │
│ trade_queue.pop() ──┬── Update In-Memory Balance │
│ │ │
│ └── Write to TDengine │
│ ├── INSERT trades │
│ ├── INSERT order_events │
│ └── INSERT balances (Snapshot) │
└─────────────────────────────────────────────────────────────────┘
4.3 Batch Write Optimization
// Batch write to reduce I/O overhead
const BATCH_SIZE: usize = 1000;

async fn flush_trades(trades: Vec<Trade>) {
    let mut sql = String::from("INSERT INTO ");
    // Construct one multi-row insert statement...
    client.exec(&sql).await;
}
5. Implementation Plan
Phase 1: Basic Persistence (This Chapter)
- TDengine Connection
- Schema Initialization
- Trade/Order/Balance Writes
Phase 2: Query APIs
- Implement GET Endpoints
Phase 3: Optimization
- Batch Writes
- Connection Pool
- Redis Cache
6. Verification Plan
6.1 Integration Test
# 1. Start TDengine
docker run -d -p 6030:6030 -p 6041:6041 tdengine/tdengine:latest
# 2. Run Gateway
cargo run --release -- --gateway --port 8080
# 3. Submit Order
curl -X POST http://localhost:8080/api/v1/create_order ...
# 4. Query Order (Verify Persistence)
curl http://localhost:8080/api/v1/order/1
Summary
This chapter implements Settlement Persistence.
Core Philosophy:
Persistence is a side-channel operation, never blocking the main trading flow: the Settlement thread writes to TDengine asynchronously.
Next Chapter: 0x09-c WebSocket Push.
0x09-c WebSocket Push: Real-time Notification
📦 Code Changes: View Diff
Core Objective: Implement WebSocket real-time push so clients can receive order updates, trade notifications, and balance changes.
Background: From Polling to Push
Current Query Method (Polling):
Client Gateway
│ │
├─── GET /orders ─────────>│ (Poll)
│<──────────────────────────┤
│ ... seconds ... │
├─── GET /orders ─────────>│ (Poll again)
│<──────────────────────────┤
Issues:
- ❌ High Latency
- ❌ Wasted Resources
- ❌ Poor Real-time experience
This Chapter’s Solution (Push):
Client Gateway Trading Core
│ │ │
├── WS Connect ───────────>│ │
│<── Connected ────────────┤ │
│ │ │
│ │<── Order Filled ───────┤
│<── push: order.update ───┤ │
│ │ │
│ │<── Trade ──────────────┤
│<── push: trade ──────────┤ │
1. Push Event Types
1.1 Classification
| Event Type | Trigger | Recipient |
|---|---|---|
| order.update | Status change (NEW/FILLED/CANCELED) | Order Owner |
| trade | Trade execution | Buyer & Seller |
| balance.update | Balance change | Account Owner |
1.2 Message Format
// Order Update
{
"type": "order.update",
"data": {
"order_id": 1001,
"symbol": "BTC_USDT",
"status": "FILLED",
"filled_qty": "0.001",
"avg_price": "85000.00",
"updated_at": 1734533790000
}
}
// Trade Notification
{
"type": "trade",
"data": {
"trade_id": 5001,
"order_id": 1001,
"symbol": "BTC_USDT",
"side": "BUY",
"role": "TAKER",
"traded_at": 1734533790000
}
}
// Balance Update
{
"type": "balance.update",
"data": {
"asset": "BTC",
"avail": "1.501000",
"frozen": "0.000000"
}
}
2. Architecture Design
2.1 Design Principles
Important
Data Consistency First: When a user receives a push, the database MUST already be updated.
Correct Flow: ME Match → Settlement Persist → Push → User Query → Data Exists ✅
Incorrect Flow: ME Match → Push → User Query → Data Not Found ❌
2.2 System Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Multi-Thread Pipeline │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Thread 3: ME ──▶ trade_queue ──▶ Thread 4: Settlement│
│ └──▶ balance_update_queue │
│ │
│ Thread 4: Settlement ──▶ push_event_queue ──▶ WsService │
│ │ │
│ └──▶ TDengine (persist) │
│ │
│ WsService (Gateway) ──▶ ConnectionManager ──▶ Clients │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Decisions:
- ✅ Settlement is the only push source.
- ✅ Push events generated ONLY after persistence success.
- ✅ WsService runs in the Gateway’s tokio runtime.
2.3 Connection Management
ConnectionManager uses DashMap to handle concurrent connections, supporting multiple connections per user.
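As a sketch of that manager, the following uses a `Mutex<HashMap<...>>` from the standard library as a stand-in for `DashMap`, and plain `u64` connection ids in place of real WebSocket sender handles; all names here are illustrative.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Std-library stand-in for the DashMap-based manager: user_id -> connection ids.
struct ConnectionManager {
    conns: Mutex<HashMap<u64, Vec<u64>>>,
}

impl ConnectionManager {
    fn new() -> Self {
        Self { conns: Mutex::new(HashMap::new()) }
    }

    // A user may hold several simultaneous connections (e.g. web + mobile).
    fn register(&self, user_id: u64, conn_id: u64) {
        self.conns.lock().unwrap().entry(user_id).or_default().push(conn_id);
    }

    fn unregister(&self, user_id: u64, conn_id: u64) {
        let mut map = self.conns.lock().unwrap();
        if let Some(list) = map.get_mut(&user_id) {
            list.retain(|&c| c != conn_id);
            if list.is_empty() {
                map.remove(&user_id); // no live connections left for this user
            }
        }
    }

    // Fan a push event out to every live connection of this user.
    fn targets(&self, user_id: u64) -> Vec<u64> {
        self.conns.lock().unwrap().get(&user_id).cloned().unwrap_or_default()
    }
}
```

`DashMap` removes the single global lock by sharding the map, which matters once many connections register and receive pushes concurrently.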
3. API Design
3.1 Endpoint
ws://host:port/ws
3.2 Connection Flow
- Connect.
- Send Auth: {"type": "auth", "token": "..."}.
- Receive Auth Success.
- Receive Push Events.
3.3 Heartbeat
Client sends {"type": "ping"} every 30s, Server responds {"type": "pong"}.
4. Implementation
4.1 Core Structures
PushEvent (Internal Queue):
pub enum PushEvent {
    OrderUpdate { ... },
    Trade { ... },
    BalanceUpdate { ... },
}
TradeEvent Extension:
Added taker_filled_qty, maker_filled_qty etc., to TradeEvent to allow Settlement to determine order status (FILLED vs PARTIAL) without querying generic order state.
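The status derivation this enables is simple: compare the filled quantity carried on the event against the order's total. A minimal sketch (the enum and function names here are illustrative, not the project's exact types):

```rust
#[derive(Debug, PartialEq)]
enum OrderStatus {
    New,
    PartiallyFilled,
    Filled,
}

// With taker_filled_qty / maker_filled_qty carried on the TradeEvent,
// Settlement can classify an order without a round-trip to the order store.
fn status_after_fill(filled_qty: u64, total_qty: u64) -> OrderStatus {
    match filled_qty {
        0 => OrderStatus::New,
        f if f < total_qty => OrderStatus::PartiallyFilled,
        _ => OrderStatus::Filled,
    }
}
```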
4.2 Implementation Plan
- Phase 1: Basic Connection (Manager, Handler, Gateway Integration).
- Phase 2: Push Integration (push_event_queue, WsService, Settlement logic).
- Phase 3: Refinement (Error handling, Performance tests).
5. Verification
5.1 Automated Tests
Run sh run_test.sh:
- Validates WS connection.
- Submits orders and verifies receipt of order_update, trade, and balance_update events.
5.2 Manual Test
websocat "ws://localhost:8080/ws?user_id=1001"
# Send {"type": "ping"} -> Receive {"type": "pong"}
Summary
This chapter implements WebSocket real-time push.
Key Design Decisions:
- Settlement-first: Ensuring consistency.
- Single Source: All events originate from Settlement.
- Extended TradeEvent: Carrying adequate state for downstream consumers.
Next Chapter: 0x09-d K-Line Aggregation.
0x09-d K-Line Aggregation Service
📦 Code Changes: View Diff
Core Objective: Implement real-time K-Line (Candlestick) aggregation service, supporting multiple intervals (1m, 5m, 15m, 30m, 1h, 1d).
Background: Market Data Aggregation
The exchange needs to provide standardized market data:
Trades K-Line (OHLCV)
│ │
├── Trade 1: price=30000, qty=0.1 │
├── Trade 2: price=30100, qty=0.2 ──▶ 1-Min K-Line:
├── Trade 3: price=29900, qty=0.1 │ Open: 30000
└── Trade 4: price=30050, qty=0.3 │ High: 30100
│ Low: 29900
│ Close: 30050
│ Volume: 0.7
1. K-Line Data Structure
1.1 OHLCV
pub struct KLine {
    pub symbol_id: u32,
    pub interval: KLineInterval,
    pub open_time: u64,    // Unix timestamp (ms)
    pub close_time: u64,
    pub open: u64,
    pub high: u64,
    pub low: u64,
    pub close: u64,
    pub volume: u64,       // Base asset volume
    pub quote_volume: u64, // Quote asset volume (price * qty)
    pub trade_count: u32,
}
Warning
quote_volume Overflow: price * qty might overflow u64. Correct SQL:
SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume
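On the Rust side, the same hazard can be shown (and avoided) with integer widening; this is an illustrative sketch, not code from the project:

```rust
// price and qty are integer-scaled u64 values; their product can exceed u64::MAX.
fn would_overflow_u64(price: u64, qty: u64) -> bool {
    price.checked_mul(qty).is_none()
}

// Widening to u128 keeps exact integer precision in Rust; the SQL fix
// (CAST ... AS DOUBLE) instead trades a little precision for TDengine support.
fn quote_volume(price: u64, qty: u64) -> u128 {
    price as u128 * qty as u128
}
```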
1.2 API Response Format
{
"symbol": "BTC_USDT",
"interval": "1m",
"open_time": 1734533760000,
"close_time": 1734533819999,
"open": "30000.00",
"high": "30100.00",
"low": "29900.00",
"close": "30050.00",
"volume": "0.700000",
"quote_volume": "21035.00",
"trade_count": 4
}
2. Architecture: TDengine Stream Computing
2.1 Core Concept
Leverage TDengine built-in Stream Computing for auto-aggregation. No manual aggregator implementation needed:
- Settlement writes to the trades table.
- TDengine automatically triggers stream computing.
- Results are written to the klines tables.
- The HTTP API queries the klines tables directly.
2.2 Data Flow
Settlement ──▶ trades table (TDengine)
│
│ TDengine Stream Computing (Auto)
│
├─── kline_1m_stream ──► klines_1m table
├─── kline_5m_stream ──► klines_5m table
└─── ...
│
┌────────────────────────┴───────────────────────┐
▼ ▼
HTTP API WebSocket Push
GET /api/v1/klines kline.update (Optional)
2.3 TDengine Stream Example
CREATE STREAM IF NOT EXISTS kline_1m_stream
INTO klines_1m SUBTABLE(CONCAT('kl_1m_', CAST(symbol_id AS NCHAR(10))))
AS SELECT
_wstart AS ts,
FIRST(price) AS open,
MAX(price) AS high,
MIN(price) AS low,
LAST(price) AS close,
SUM(qty) AS volume,
SUM(CAST(price AS DOUBLE) * CAST(qty AS DOUBLE)) AS quote_volume,
COUNT(*) AS trade_count
FROM trades
PARTITION BY symbol_id
INTERVAL(1m);
3. API Design
3.1 HTTP Endpoint
GET /api/v1/klines?symbol=BTC_USDT&interval=1m&limit=100
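Handling this endpoint requires mapping the interval strings to window lengths and aligning timestamps to window boundaries (what TDengine's `_wstart` does). A small sketch, with hypothetical function names:

```rust
// Map the API's interval strings onto window lengths in seconds.
// The real code uses a KLineInterval enum; seconds suffice for a sketch.
fn interval_seconds(interval: &str) -> Option<u64> {
    match interval {
        "1m" => Some(60),
        "5m" => Some(5 * 60),
        "15m" => Some(15 * 60),
        "30m" => Some(30 * 60),
        "1h" => Some(3600),
        "1d" => Some(86_400),
        _ => None, // unknown interval -> INVALID_PARAMETER
    }
}

// Align a millisecond timestamp to its window's open_time (mirrors _wstart).
fn open_time_ms(ts_ms: u64, interval_secs: u64) -> u64 {
    let w = interval_secs * 1000;
    ts_ms - ts_ms % w
}
```

Rejecting unknown intervals at the Gateway keeps malformed queries from ever reaching the klines tables.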
3.2 WebSocket Push
{
"type": "kline.update",
"data": {
"symbol": "BTC_USDT",
"interval": "1m",
"open": "30000.00",
"close": "30050.00",
"is_final": false
}
}
4. Module Structure
src/
├── persistence/
│ ├── klines.rs # Create Streams, Query K-Lines
│ ├── schema.rs # Add klines Super Table
│ └── queries.rs # Add query_klines()
├── gateway/
│ ├── handlers.rs # Add get_klines
│ └── ...
Tip
No need for a src/kline/ aggregation module; TDengine's stream computing handles it.
5. Implementation Plan
- Phase 1: Schema: Add the klines super table.
- Phase 2: Stream Computing: Implement create_kline_streams().
- Phase 3: HTTP API: Implement query_klines() and the API endpoint.
- Phase 4: Verification: E2E test.
6. Verification
6.1 E2E Test Scenarios
Script: ./scripts/test_kline_e2e.sh
- Check API connectivity.
- Record initial K-Line count.
- Create matched orders.
- Wait for Stream processing (5s).
- Query K-Line API and verify data structure.
6.2 Binance Standard Alignment
Warning
P0 Fix: Ensure time fields align with Binance standard (Unix Milliseconds Number).
- open_time: 1734611580000 (was an ISO 8601 string)
- close_time: 1734611639999 (was missing)
Summary
This chapter implements K-Line aggregation service leveraging TDengine’s Stream Computing.
Key Concept:
K-Line is derived data. We calculate it from trades in real-time, rather than storing original raw data.
Next Chapter: 0x09-e OrderBook Depth.
0x09-e Order Book Depth
📦 Code Changes: View Diff
Core Objective: Implement Order Book Depth push, allowing users to view the current buy/sell order distribution in real-time.
Background: Depth Data
The Order Book Depth displays the current market’s distribution of limit orders:
Asks (Sells)
┌─────────────────────┐
│ 30100.00 0.3 BTC │ ← Lowest Ask
│ 30050.00 0.5 BTC │
│ 30020.00 1.2 BTC │
├─────────────────────┤
│ Current: 30000 │
├─────────────────────┤
│ 29980.00 0.8 BTC │
│ 29950.00 1.5 BTC │
│ 29900.00 2.0 BTC │ ← Highest Bid
└─────────────────────┘
Bids (Buys)
1. Data Structure
1.1 Depth Response Format
{
"symbol": "BTC_USDT",
"bids": [
["29980.00", "0.800000"],
["29950.00", "1.500000"],
["29900.00", "2.000000"]
],
"asks": [
["30020.00", "1.200000"],
["30050.00", "0.500000"],
["30100.00", "0.300000"]
],
"last_update_id": 12345
}
1.2 Binance Format Comparison
| Field | Us | Binance |
|---|---|---|
| bids | [["price", "qty"], ...] | ✅ Match |
| asks | [["price", "qty"], ...] | ✅ Match |
| last_update_id | 12345 | ✅ Match |
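Since the order book already keeps price levels in BTree maps, a depth snapshot is just two bounded walks: bids from the highest price down, asks from the lowest price up. A std-only sketch (the function name and `(price, qty)` tuples are illustrative):

```rust
use std::collections::BTreeMap;

// Price-level maps keyed by integer price (same u64 convention as the engine).
// Returns bids best (highest) first and asks best (lowest) first.
fn depth_snapshot(
    bids: &BTreeMap<u64, u64>,
    asks: &BTreeMap<u64, u64>,
    limit: usize,
) -> (Vec<(u64, u64)>, Vec<(u64, u64)>) {
    // BTreeMap iterates in ascending key order, so bids walk in reverse.
    let top_bids = bids.iter().rev().take(limit).map(|(&p, &q)| (p, q)).collect();
    let top_asks = asks.iter().take(limit).map(|(&p, &q)| (p, q)).collect();
    (top_bids, top_asks)
}
```

The API layer would then render each `(price, qty)` pair as the `["price", "qty"]` string arrays shown above.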
2. API Design
2.1 HTTP Endpoint
GET /api/v1/depth?symbol=BTC_USDT&limit=20
| Parameter | Type | Description |
|---|---|---|
| symbol | String | Trading Pair |
| limit | u32 | Depth levels (5, 10, 20, 50, 100) |
2.2 WebSocket Push
// Subscribe
{"type": "subscribe", "channel": "depth", "symbol": "BTC_USDT"}
// Push (Incremental)
{
"type": "depth.update",
"symbol": "BTC_USDT",
"bids": [["29980.00", "0.800000"]],
"asks": [["30020.00", "0.000000"]], // qty=0 means removal
"last_update_id": 12346
}
3. Architecture Design
3.1 Comparison with K-Line
| Data | Source | Latency | Method |
|---|---|---|---|
| K-Line | Historical Trades | Minute-level | TDengine Stream |
| Depth | Current Orders | Ms-level | In-Memory |
Depth is too real-time for DB storage. We use Ring Buffer + Independent Service.
3.2 Event-Driven Architecture
Following the pattern: Isolated service, Ring Buffer, Lock-Free.
┌────────────┐ ┌─────────────────────┐
│ ME │ ──(non-blocking)─► │ depth_event_queue │
│ │ drop if full │ (capacity: 1024) │
└────────────┘ └──────────┬──────────┘
│
▼
┌─────────────────────┐
│ DepthService │
│ (tokio async) │
├─────────────────────┤
│ ● HTTP Snapshot │
│ ● WS Incremental │
└─────────────────────┘
Important
Market Data Characteristic: Freshness is key. Dropping a few events is acceptable if the consumer is slow, as eventual consistency is restored by snapshots.
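The "drop if full" publish side can be sketched with a bounded std channel standing in for the ring buffer; `DepthEvent` and `publish` are illustrative names, not the project's actual types.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

// Hypothetical depth event; the real DepthEvent carries price/qty deltas.
struct DepthEvent {
    seq: u64,
}

// ME-side publish: never block the matching thread. Drop the event if the
// consumer lags, and count the drop so snapshots can resynchronize later.
fn publish(tx: &SyncSender<DepthEvent>, ev: DepthEvent, dropped: &mut u64) {
    if let Err(TrySendError::Full(_)) = tx.try_send(ev) {
        *dropped += 1; // freshness over completeness: stale depth is worthless
    }
}
```

This is the market-data trade-off stated above made concrete: the matching engine pays a constant, non-blocking cost per event, and correctness is restored by periodic snapshots rather than by back-pressure.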
4. Module Structure
src/
├── gateway/
│ ├── handlers.rs # Add get_depth
│ └── ...
├── engine.rs # Add get_depth() method
└── websocket/
└── messages.rs # Add DepthUpdate
5. Implementation Plan
- Phase 1: HTTP API: Add OrderBook::get_depth() and the API endpoint.
- Phase 2: WebSocket: depth.update message and subscription logic.
6. Verification
6.1 E2E Test Scenarios
Script: scripts/test_depth.sh
- Query empty depth.
- Submit Buy/Sell orders (creating depth).
- Wait for update (200ms).
- Query depth and verify bids/asks.
- Performance test (100 orders rapid fire).
Expected Result:
- Depth reflects order book state.
- Update latency ≤ 100ms.
- High frequency updates are batched/throttled correctly.
Summary
| Point | Implementation |
|---|---|
| Structure | Compatible with Binance (Array format) |
| API | GET /api/v1/depth |
| WebSocket | depth.update (Future: Incremental) |
| Architecture | Event-driven, Ring Buffer |
Core Concept:
Service Isolation: ME pushes via DepthEvent. DepthService maintains state. Lock-free.
Next Chapter: 0x09-f Integration Test.
0x09-f Integration Test: Full Acceptance
📦 Code Changes: View Diff
Core Objective: Perform comprehensive integration testing on all 0x09 features using historical datasets to establish a reproducible acceptance baseline.
Background
Phase 0x09 delivered multiple key features:
| Chapter | Feature | Status |
|---|---|---|
| 0x09-a | Gateway HTTP API | ✅ |
| 0x09-b | Settlement Persistence | ✅ |
| 0x09-c | WebSocket Push | ✅ |
| 0x09-d | K-Line Aggregation | ✅ |
| 0x09-e | Order Book Depth | ✅ |
We now need to integrate and verify these features to ensure end-to-end correctness.
Test Scope
1. Pipeline Correctness
| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output Identical |
| Single vs Multi-Thread | 1.3M | Output Identical |
2. Settlement Persistence
| Test | Verification Point |
|---|---|
| Orders Table | Status changes recorded correctly |
| Trades Table | Trade data integrity |
| Balances Table | Final balances match |
3. HTTP API
| Endpoint | Verification Point |
|---|---|
| POST /create_order | Success |
| POST /cancel_order | Correct execution |
| GET /orders | Correct list |
| GET /trades | Record integrity |
| GET /depth | Bids/Asks ordered |
Acceptance Criteria
1. Pipeline Correctness (Must Pass All)
- Output diff between Single-Thread and Multi-Thread is empty.
- Final balances match exactly.
- Trade counts match exactly.
2. Settlement Persistence (Must Pass All)
- Orders Row Count == Total Orders.
- Trades Row Count == Total Trades.
- Final Balances match precisely (100% consistency for avail/frozen).
Important
Consistency Requirement: Core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.
3. Performance Baseline
- Record 100K and 1.3M TPS.
- Record P99 Latency.
Test Artifacts & Baseline
Baseline Generation
After testing, organize the following for regression testing:
- 100K Output:
baseline/100k/ - 1.3M Output:
baseline/1.3m/ - Performance Metrics:
docs/src/perf-history/
Regression Testing
Use scripts to automatically compare against baseline:
./scripts/test_pipeline_compare.sh 100k
./scripts/test_integration_full.sh
Large Dataset Testing Notes
Important
Special attention needed for 1.3M dataset tests:
- Output Redirection: Must redirect output to file to avoid IDE freezing.
- Execution Time: Multi-thread mode is slower (~100s vs 16s) due to persistence overhead.
- Balance Events: “Lock events != Accepted orders” is expected (due to cancels).
- Push Queue Overflow: [PUSH] queue full warnings are expected under high load.
Test Report (2025-12-21)
Performance Baseline
| Version | Time | Rate | vs Baseline |
|---|---|---|---|
| Baseline (urllib) | 576s | 174/s | - |
| HTTP Keep-Alive | 117s | 857/s | +393% |
| Optimized (Current) | 69s | 1,435/s | +725% |
Pipeline Correctness (1.3M) ✅
- Core balances consistent.
- Trade count matches (667,567).
- Balance final state 100% MATCH.
Settlement Persistence (100K)
- Orders: 100% MATCH (filled_qty, status).
- Trades: 100% MATCH.
- Balances: 100% MATCH.
Conclusion: All 0x09 features (Persistence & Gateway) are production-ready.
🇨🇳 中文
📦 Code Changes: View Diff
Core objective of this section: run comprehensive integration tests of all 0x09 features against historical datasets and establish a repeatable acceptance baseline.
Background
Phase 0x09 delivered several key features:
| Chapter | Feature | Status |
|---|---|---|
| 0x09-a | Gateway HTTP API | ✅ |
| 0x09-b | Settlement Persistence | ✅ |
| 0x09-c | WebSocket Push | ✅ |
| 0x09-d | K-Line Aggregation | ✅ |
| 0x09-e | Order Book Depth | ✅ |
These features now need to be verified together to ensure end-to-end system correctness.
Test Scope
1. Pipeline Correctness
| Test | Dataset | Verification Point |
|---|---|---|
| Single vs Multi-Thread | 100K | Output identical |
| Single vs Multi-Thread | 1.3M | Output identical |
2. Settlement Persistence
| Test | Verification Point |
|---|---|
| Orders Table | Status changes recorded correctly |
| Trades Table | Trade data integrity |
| Balances Table | Final balances match |
3. HTTP API
Verify the create_order, cancel_order, orders, trades, and depth endpoints.
Acceptance Criteria
1. Pipeline Correctness (Must Pass All)
- 100K/1.3M output diffs are empty.
- Final balance state matches.
- Trade counts match.
2. Settlement Persistence (Must Pass All)
- Orders/Trades row counts match.
- Final balances match 100%.
Important
Consistency Requirement: Core assets (avail, frozen) and order status (filled_qty, status) must be 100% consistent.
3. Performance Baseline
- Record 100K and 1.3M TPS.
- Record P99 latency.
Test Artifacts & Baseline
Baseline Generation & Regression
Store baseline data under the baseline/ directory and run test_pipeline_compare.sh for automated regression testing.
Large Dataset Testing Notes
Important
Special attention needed when running the 1.3M dataset tests:
- Output Redirection: Must redirect output to a file.
- Execution Time: Slower multi-thread mode is expected.
- Balance Events: Lock-event count not equaling order count is expected.
- Push Queue Overflow: Queue-full warnings under high load are expected.
Test Report (2025-12-21)
Performance Baseline
Current optimized TPS is 1,435/s, a 725% improvement over baseline.
Pipeline Correctness (1.3M) ✅
- Trade count matches (667,567).
- Balance final state 100% MATCH.
Settlement Persistence (100K)
- Orders, Trades, and Balances are all 100% MATCH.
Conclusion: All 0x09 persistence and gateway features are production-ready.
Part II: Productization
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Upgrade the core matching engine into a complete trading system with Account System, Fund Transfer, and Security Authentication.
1. Review: Achievements of Part I
| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal Matching Prototype |
| 0x02-03 | Floats & Decimals | Financial Grade Precision |
| 0x04 | BTree OrderBook | O(log n) Matching |
| 0x05-06 | User Balance | Locking/Unlocking |
| 0x07 | Testing Framework | 100K Order Baseline |
| 0x08 | Multi-Thread Pipeline | 4-Thread Concurrency |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |
2. Gap Analysis: From Engine to System
| Dimension | Current State | Target State |
|---|---|---|
| Identity | Raw user_id | API Key Signature |
| Accounts | Single Balance | Funding + Spot Dual-Account |
| Funds | Manual deposit() | Deposit/Withdraw/Transfer |
| Economics | Zero Fee | Maker/Taker Fees |
3. Blueprint for Part II
0x0A ─── Account System & Security
├── 0x0A-a: Account System (exchange_info + DB)
├── 0x0A-b: ID Specification (Asset/Symbol Naming)
└── 0x0A-c: Authentication (API Key Middleware)
0x0B ─── Fund System & Transfers
├── Funding/Spot Dual-Account Structure
└── Deposit/Withdraw API
0x0C ─── Economic Model
└── Fee Calculation & Deduction
0x0D ─── Snapshot & Recovery
└── Graceful Shutdown & State Restoration
4. Tech Stack Choices
| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational Config Data |
| TDengine | Orders/Trades/K-Lines | Time-Series Trading Data |
| sqlx | Rust PG Driver | Async + Compile-time Check |
5. Design Principles
| Principle | Description |
|---|---|
| Minimal External Deps | Auth/Transfer logic is cohesive |
| Auditability | All fund changes must have event logs |
| Progressive | System remains runnable after each module |
| Backward Compatible | Reuse Core types from Part I |
🇨🇳 中文
📦 Code Changes: View Diff
Core Objective: Upgrade the matching-engine core into a complete trading system with an account system, fund transfers, and security authentication.
1. Review: Achievements of Part I
| Chapter | Topic | Key Achievement |
|---|---|---|
| 0x01 | Genesis | Minimal matching prototype |
| 0x02-03 | Floats & Decimals | Financial-grade precision |
| 0x04 | BTree OrderBook | O(log n) matching |
| 0x05-06 | User Balance | Locking/unlocking mechanism |
| 0x07 | Testing Framework | 100K order baseline |
| 0x08 | Multi-Thread Pipeline | 4-thread concurrent architecture |
| 0x09 | Gateway & Persistence | Gateway, TDengine, WebSocket |
2. Gap Analysis: From Engine to System
| Dimension | Current State | Target State |
|---|---|---|
| Identity | Raw user_id | API Key signature verification |
| Accounts | Single balance structure | Funding + Spot dual-account |
| Funds | Manual deposit() | Full deposit/withdraw/transfer flow |
| Economics | Zero fee | Maker/Taker fee rates |
3. Blueprint for Part II
0x0A ─── Account System & Security
├── 0x0A-a: Account System (exchange_info + DB management)
├── 0x0A-b: ID Specification (Asset/Symbol naming)
└── 0x0A-c: Authentication (API Key middleware)
0x0B ─── Fund System & Transfers
├── Funding/Spot dual-account structure
└── Deposit/Withdraw API
0x0C ─── Economic Model
└── Fee calculation & deduction
0x0D ─── Snapshot & Recovery
└── Graceful shutdown & state restoration
4. Tech Stack Choices
| Component | Choice | Purpose |
|---|---|---|
| PostgreSQL 18 | Account/Asset/Symbol | Relational config data |
| TDengine | Orders/Trades/K-Lines | Time-series trading data |
| sqlx | Rust PG Driver | Async + compile-time checks |
5. Design Principles
| Principle | Description |
|---|---|
| Minimal External Deps | Auth/transfer logic stays cohesive |
| Auditability | Every fund change must leave a complete event trail |
| Progressive | System remains runnable after each sub-module |
| Backward Compatible | Reuse core types from Part I |
0x0A-a: Account System
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
This chapter establishes the account infrastructure for the trading system: exchange_info module, naming conventions, and database management.
1. Core Module: exchange_info
1.1 Module Structure
src/exchange_info/
├── mod.rs # Module entry
├── validation.rs # AssetName/SymbolName validation
├── asset/
│ ├── mod.rs
│ ├── models.rs # Asset struct + asset_flags
│ └── manager.rs # AssetManager
└── symbol/
├── mod.rs
├── models.rs # Symbol struct + symbol_flags
└── manager.rs # SymbolManager
1.2 Core Types
#![allow(unused)]
fn main() {
// Asset
pub struct Asset {
pub asset_id: i32,
pub asset: String, // "BTC", "USDT" (UPPERCASE)
pub name: String, // "Bitcoin", "Tether USD"
pub decimals: i16, // 8 for BTC, 6 for USDT
pub status: i16,
pub asset_flags: i32, // Permission bits
}
// Symbol
pub struct Symbol {
pub symbol_id: i32,
pub symbol: String, // "BTC_USDT" (UPPERCASE)
pub base_asset_id: i32,
pub quote_asset_id: i32,
pub price_decimals: i16,
pub qty_decimals: i16,
pub symbol_flags: i32,
}
}
2. Naming Convention
| Category | Standard | Example |
|---|---|---|
| Database Name | _db suffix | exchange_info_db |
| Table Name | _tb suffix | assets_tb, symbols_tb |
| Flags Module | Table name prefix | asset_flags::, symbol_flags:: |
| Codes | UPPERCASE | BTC, BTC_USDT |
See Naming Convention Document.
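The UPPERCASE convention enforced by the DB constraints in section 3.2 is also checked in `validation.rs`. The real module's rules may differ; this is an illustrative sketch of what such validators could look like:

```rust
/// Illustrative asset-name validator mirroring the UPPERCASE convention.
/// The 16-character cap is an assumption, not from the source.
fn is_valid_asset_name(s: &str) -> bool {
    !s.is_empty()
        && s.len() <= 16
        && s.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}

/// A symbol is BASE_QUOTE, each half a valid asset name.
fn is_valid_symbol_name(s: &str) -> bool {
    match s.split_once('_') {
        Some((base, quote)) => is_valid_asset_name(base) && is_valid_asset_name(quote),
        None => false,
    }
}

fn main() {
    assert!(is_valid_asset_name("BTC"));
    assert!(!is_valid_asset_name("btc")); // lowercase rejected, like the DB CHECK
    assert!(is_valid_symbol_name("BTC_USDT"));
    assert!(!is_valid_symbol_name("BTCUSDT")); // missing separator
    println!("validation ok");
}
```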
3. Database Management
3.1 Management Script
# Full Init (Reset + Seed)
python3 scripts/db/manage_db.py init
# Reset Schema Only
python3 scripts/db/manage_db.py reset
# Seed Data Only
python3 scripts/db/manage_db.py seed
# Check Status
python3 scripts/db/manage_db.py status
3.2 Database Constraints
-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))
-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))
4. API Endpoints
4.1 GET /api/v1/exchange_info
Returns full exchange information:
{
"code": 0,
"data": {
"assets": [
{
"asset_id": 1,
"asset": "BTC",
"name": "Bitcoin",
"decimals": 8,
"can_deposit": true,
"can_withdraw": true,
"can_trade": true
}
],
"symbols": [
{
"symbol_id": 1,
"symbol": "BTC_USDT",
"base_asset": "BTC",
"quote_asset": "USDT",
"price_decimals": 2,
"qty_decimals": 8,
"is_tradable": true,
"is_visible": true
}
],
"server_time": 1734897000000
}
}
4.2 Other Endpoints
| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |
5. Verification
5.1 Integration Test
./scripts/test_account_integration.sh
Scope:
- ✅ DB Initialization (Auto reset + seed)
- ✅ Assets/Symbols/ExchangeInfo API
- ✅ DB Constraints (Lowercase rejected)
- ✅ Idempotency
5.2 Unit Test
cargo test --lib
# 150 passed, 0 failed
6. Next Steps
🇨🇳 中文
📦 Code Changes: View Diff
This chapter establishes the account infrastructure for the trading system: the exchange_info module, naming conventions, and database management.
1. Core Module: exchange_info
1.1 Module Structure
src/exchange_info/
├── mod.rs # Module entry
├── validation.rs # AssetName/SymbolName validation
├── asset/
│ ├── mod.rs
│ ├── models.rs # Asset struct + asset_flags
│ └── manager.rs # AssetManager
└── symbol/
├── mod.rs
├── models.rs # Symbol struct + symbol_flags
└── manager.rs # SymbolManager
1.2 Core Types
#![allow(unused)]
fn main() {
// Asset
pub struct Asset {
pub asset_id: i32,
pub asset: String, // "BTC", "USDT" (UPPERCASE enforced)
pub name: String, // "Bitcoin", "Tether USD"
pub decimals: i16, // 8 for BTC, 6 for USDT
pub status: i16,
pub asset_flags: i32, // Permission bits
}
// Symbol
pub struct Symbol {
pub symbol_id: i32,
pub symbol: String, // "BTC_USDT" (UPPERCASE enforced)
pub base_asset_id: i32,
pub quote_asset_id: i32,
pub price_decimals: i16,
pub qty_decimals: i16,
pub symbol_flags: i32,
}
}
2. Naming Convention
| Category | Standard | Example |
|---|---|---|
| Database Name | _db suffix | exchange_info_db |
| Table Name | _tb suffix | assets_tb, symbols_tb |
| Flags Module | Table-name prefix | asset_flags::, symbol_flags:: |
| Asset/Symbol Codes | UPPERCASE | BTC, BTC_USDT |
See the Naming Convention Document.
3. Database Management
3.1 Python Management Script
# Full init (reset + seed data)
python3 scripts/db/manage_db.py init
# Reset schema only (no data)
python3 scripts/db/manage_db.py reset
# Seed data only
python3 scripts/db/manage_db.py seed
# Check current status
python3 scripts/db/manage_db.py status
3.2 Database Constraints
-- Enforce UPPERCASE Asset
CONSTRAINT chk_asset_uppercase CHECK (asset = UPPER(asset))
-- Enforce UPPERCASE Symbol
CONSTRAINT chk_symbol_uppercase CHECK (symbol = UPPER(symbol))
4. API Endpoints
4.1 GET /api/v1/exchange_info
Returns full exchange information:
{
"code": 0,
"data": {
"assets": [
{
"asset_id": 1,
"asset": "BTC",
"name": "Bitcoin",
"decimals": 8,
"can_deposit": true,
"can_withdraw": true,
"can_trade": true
}
],
"symbols": [
{
"symbol_id": 1,
"symbol": "BTC_USDT",
"..."
}
],
"server_time": 1734897000000
}
}
4.2 Other Endpoints
| Endpoint | Description |
|---|---|
| GET /api/v1/assets | Asset list only |
| GET /api/v1/symbols | Symbol list only |
5. Verification
5.1 Integration Test
./scripts/test_account_integration.sh
5.2 Unit Test
cargo test --lib
6. Next Steps
0x0A-b: ID Specification & Account Structure
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: Design Phase | Core Objective: Define ID generation rules and account data structures.
1. ID Generation Rules
1.1 User ID (u64)
- Semantics: Global unique user identifier.
- Strategy: Auto-increment or Snowflake/ULID (for future distributed support).
- Initial Value: 1024 (0-1023 reserved for system accounts).
1.2 Asset ID (u32)
- Semantics: Asset identifier (e.g., BTC=1, USDT=2).
- Strategy: Sequential allocation starting from 1.
- Purpose: Maintain O(1) array indexing performance.
1.3 Symbol ID (u32)
- Semantics: Trading Pair identifier (e.g., BTC_USDT=1).
- Strategy: Sequential allocation starting from 1.
1.4 Account Identification
- Semantics: User’s sub-account (distinguishing Funding vs Spot).
- Strategy: Use a (user_id, account_type) tuple; no composite ID needed.
#![allow(unused)]
fn main() {
struct AccountKey {
    user_id: u64,
    account_type: AccountType, // Funding | Spot
}
}
- Account Types: Spot = 1, Funding = 2.
1.5 Order ID / Trade ID (u64)
- Semantics: Unique identifier for orders/trades within the Matching Engine.
- Strategy: Global atomic increment.
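The "global atomic increment" strategy for Order/Trade IDs can be sketched with an `AtomicU64`; the `NEXT_ORDER_ID` name is illustrative, not the engine's actual symbol:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Global, lock-free ID allocator: each call returns a unique, increasing ID,
/// safe to call from any thread without a mutex.
static NEXT_ORDER_ID: AtomicU64 = AtomicU64::new(1);

fn next_order_id() -> u64 {
    // fetch_add returns the previous value, so IDs start at 1.
    NEXT_ORDER_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let a = next_order_id();
    let b = next_order_id();
    assert!(b > a); // strictly increasing
    println!("ids: {a}, {b}");
}
```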
2. Core Data Structures
2.1 AccountType Enum
#![allow(unused)]
fn main() {
#[repr(u8)]
pub enum AccountType {
Spot = 0x01,
Funding = 0x02,
}
}
2.2 Account Struct (Conceptual)
#![allow(unused)]
fn main() {
pub struct Account {
pub user_id: u64,
pub account_type: AccountType,
pub balances: HashMap<AssetId, Balance>,
pub created_at: u64,
pub status: AccountStatus,
}
}
3. System Reserved Accounts
| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |
Once confirmed, this design will be reflected in src/core_types.rs and src/account/mod.rs.
💡 Future Consideration: Alternative System ID Range
Current: System IDs use 0-1023 (1024 total), users start at 1024.
Problem: Test data might accidentally use 1, 2, 3… which conflicts with system IDs.
Alternative: Use u64::MAX downward for system accounts:
#![allow(unused)]
fn main() {
const REVENUE_ID: u64 = u64::MAX; // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1; // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary
fn is_system_account(user_id: u64) -> bool {
user_id > SYSTEM_MIN
}
}
Benefits:
- Users can start from 1, more natural
- Test data never conflicts with system IDs
- Clear separation: low = users, high = system
🇨🇳 中文
📅 Status: In Design | Core Objective: Define the generation rules for all key IDs in the system and the basic account data structures.
1. ID Generation Rules
1.1 User ID (u64)
- Semantics: Globally unique user identifier.
- Strategy: Auto-increment sequence, or Snowflake/ULID (for future distributed support).
- Initial Value: 1024 (0-1023 reserved for system accounts).
1.2 Asset ID (u32)
- Semantics: Asset identifier (e.g., BTC=1, USDT=2).
- Strategy: Sequential allocation starting from 1.
- Purpose: Maintain O(1) array-indexing performance.
1.3 Symbol ID (u32)
- Semantics: Trading-pair identifier (e.g., BTC/USDT=1).
- Strategy: Sequential allocation starting from 1.
1.4 Account Identification
- Semantics: A user's sub-account (distinguishing Funding vs Spot).
- Strategy: Use a (user_id, account_type) tuple; no composite ID needed.
#![allow(unused)]
fn main() {
struct AccountKey {
    user_id: u64,
    account_type: AccountType, // Funding | Spot
}
}
- Account Types: Spot = 1, Funding = 2.
1.5 Order ID / Trade ID (u64)
- Semantics: Unique identifier for orders/trades within the matching engine.
- Strategy: Global atomic increment.
2. Core Data Structures
2.1 AccountType Enum
#![allow(unused)]
fn main() {
#[repr(u8)]
pub enum AccountType {
Spot = 0x01,
Funding = 0x02,
}
}
2.2 Account Struct (Conceptual)
#![allow(unused)]
fn main() {
pub struct Account {
pub user_id: u64,
pub account_type: AccountType,
pub balances: HashMap<AssetId, Balance>,
pub created_at: u64,
pub status: AccountStatus,
}
}
3. System Reserved Accounts
| User ID | Purpose | Description |
|---|---|---|
| 0 | REVENUE | Platform fee income account |
| 1 | INSURANCE | Insurance fund (future) |
| 2-1023 | Reserved | For future system use (1024 total) |
Once confirmed, this design will be reflected in src/core_types.rs and src/account/mod.rs.
💡 Future Consideration: Alternative System ID Range
Current: System IDs use 0-1023 (1024 total), users start at 1024.
Problem: Test data might use 1, 2, 3…, conflicting with system IDs.
Alternative: Allocate system accounts downward from u64::MAX:
#![allow(unused)]
fn main() {
const REVENUE_ID: u64 = u64::MAX; // 18446744073709551615
const INSURANCE_ID: u64 = u64::MAX - 1; // 18446744073709551614
const SYSTEM_MIN: u64 = u64::MAX - 1000; // Boundary
fn is_system_account(user_id: u64) -> bool {
user_id > SYSTEM_MIN
}
}
Benefits:
- Users can start from 1, which is more natural
- Test data never conflicts with system IDs
- Clear separation: low = users, high = system
0x0A-c: API Authentication
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: ✅ Implemented | Branch: 0x0A-b-api-auth | Date: 2025-12-23 | Code Changes: v0.0A-a-account-system…v0.0A-b-api-auth
Implementation Summary
| Metric | Result |
|---|---|
| Auth Module | 8 Files |
| Unit Tests | 35/35 ✅ |
| Total Tests | 188/188 ✅ |
| Commits | 31 commits |
1. Overview
Implement secure request authentication for Gateway API to protect trading endpoints from unauthorized access.
1.1 Design Goals
| Goal | Description |
|---|---|
| Security | Prevent forgery and replay attacks |
| Performance | Verification latency < 1ms |
| Scalability | Support multiple auth methods |
| Usability | Developer-friendly SDK integration |
1.2 Threat Model
- Request Forgery
- Replay Attack
- Man-in-the-Middle (MITM)
- API Key Leakage
- Brute Force
2. Authentication Scheme Comparison
2.1 Evaluation
| Scheme | Security | Performance | Complexity | Leak Risk |
|---|---|---|---|---|
| HMAC-SHA256 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Medium | 🔴 Secret on server |
| Ed25519 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Medium | 🟢 Public key only |
| JWT Token | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Low | 🔴 Token replayable |
| OAuth 2.0 | ⭐⭐⭐⭐ | ⭐⭐⭐ | High | 🟡 Dependency |
2.2 Decision: Ed25519
Selected Ed25519 Asymmetric Signature.
- No Server Secret: Only public key stored.
- Non-Repudiation: Only private key holder can sign.
- High Security: 128-bit security level (256-bit key).
- Fast Verification: ~100μs.
3. Ed25519 Signature Design
3.1 Key Pair
- Private Key: 32 bytes, stored on Client, NEVER transmitted.
- Public Key: 32 bytes, stored on Server.
- Signature: 64 bytes.
3.2 Request Signature Format
payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)
Header Format:
Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>
| Field | Description | Encoding |
|---|---|---|
| api_key | AK_ + 16 HEX (19 chars) | plain |
| ts_nonce | Monotonic Timestamp (ms) | numeric |
| signature | 64-byte signature | Base62 |
ts_nonce must be strictly monotonically increasing: new_ts = max(now_ms, last_ts + 1).
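The monotonic nonce rule can be sketched as a small client-side generator; `NonceGen` is an illustrative name, not the SDK's actual type:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Client-side ts_nonce generator enforcing new_ts = max(now_ms, last_ts + 1).
struct NonceGen {
    last: AtomicU64,
}

impl NonceGen {
    fn next(&self) -> u64 {
        let now_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as u64;
        let mut new_ts = 0;
        // CAS loop: even bursts within the same millisecond yield
        // strictly increasing nonces.
        self.last
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |last| {
                new_ts = now_ms.max(last + 1);
                Some(new_ts)
            })
            .unwrap();
        new_ts
    }
}

fn main() {
    let g = NonceGen { last: AtomicU64::new(0) };
    let a = g.next();
    let b = g.next();
    assert!(b > a); // strictly monotonic even within one millisecond
    println!("nonces: {a}, {b}");
}
```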
4. Database Design
4.1 api_keys_tb Table
CREATE TABLE api_keys_tb (
key_id SERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
api_key VARCHAR(35) UNIQUE NOT NULL,
key_type SMALLINT NOT NULL DEFAULT 1, -- 1=Ed25519
key_data BYTEA NOT NULL, -- Public Key (32 bytes)
permissions INT NOT NULL DEFAULT 1,
status SMALLINT NOT NULL DEFAULT 1,
...
);
4.2 Key Types
| key_type | Algorithm | key_data |
|---|---|---|
| 1 | Ed25519 | Public Key (32 bytes) |
| 2 | HMAC-SHA256 | SHA256(secret) |
| 3 | RSA | PEM Public Key |
5. Code Architecture
5.1 Module Structure
src/api_auth/
├── mod.rs
├── api_key.rs # Model + Repository
├── signature.rs # Ed25519 verification
├── middleware.rs # Axum Middleware
└── error.rs # Auth Errors
5.2 Request Flow
- Extract Headers.
- Verify Timestamp window.
- Query ApiKey (Cache/DB).
- Verify Ed25519 Signature.
- Check Permissions.
- Inject user_id into the request context.
6. Route Protection
6.1 Public Endpoints (No Auth)
- GET /api/v1/public/exchange_info
- GET /api/v1/public/depth
- GET /api/v1/public/klines
- GET /api/v1/public/ticker
6.2 Private Endpoints (Auth Required)
- GET /api/v1/private/account
- POST /api/v1/private/order (Trade Perm)
- POST /api/v1/private/withdraw (Withdraw Perm)
7. Performance
- Signature Verification: < 50μs (Ed25519).
- DB Query: < 1ms (Cached).
- Total Latency Overhead: < 2ms.
8. SDK Example (Python)
from nacl.signing import SigningKey
import time
# Base62 helper (alphabet order is an assumption; match the server's spec)
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
def base62_encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 62)
        out = BASE62[rem] + out
    return out or "0"
api_key = "AK_..."
private_key = bytes.fromhex("...")  # 32-byte Ed25519 seed
signing_key = SigningKey(private_key)
def sign_request(method, path, body=""):
    ts_nonce = str(int(time.time() * 1000))
    payload = f"{api_key}{ts_nonce}{method}{path}{body}"
    signature = signing_key.sign(payload.encode()).signature
    sig_b62 = base62_encode(signature)
    return f"v1.{api_key}.{ts_nonce}.{sig_b62}"
🇨🇳 中文
📅 Status: ✅ Implemented | Code Changes: View Diff
Implementation Summary
| Metric | Result |
|---|---|
| Auth Module | 8 files |
| Unit Tests | 35/35 ✅ |
| Total Tests | 188/188 ✅ |
1. Overview
Implement secure request authentication for the Gateway API to protect trading endpoints from unauthorized access.
1.1 Design Goals
Security, performance, scalability, and usability.
1.2 Threat Model
Request forgery, replay attacks, man-in-the-middle attacks, API key leakage, etc.
2. Authentication Scheme Comparison
2.2 Decision
Selected Ed25519 asymmetric signatures.
- No server-side secret: only the public key is stored.
- Non-repudiation.
- High security.
- Fast verification (~100μs).
3. Ed25519 Signature Design
3.1 Key Pair
The private key stays on the client; the server stores only the public key.
3.2 Request Signature Format
payload = api_key + ts_nonce + method + path + body
signature = Ed25519.sign(private_key, payload)
Header: Authorization: ZXINF v1.<api_key>.<ts_nonce>.<signature>
4. Database Design
4.1 api_keys_tb Table
Supports key_type (1=Ed25519, 2=HMAC, 3=RSA). key_data stores the public key or secret hash.
5. Code Architecture
src/api_auth/ contains the api_key, signature, and middleware modules.
6. Route Protection
- Public: market-data endpoints, no auth required.
- Private: trading/account endpoints, signature auth required.
7. Performance
Ed25519's fast verification (< 50μs) plus in-memory caching keeps total latency overhead under 2ms.
8. SDK Example (Python)
Python/curl sample code shows how to generate a spec-compliant Authorization header.
0x0B Funding & Transfer: Fund System
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 📝 Draft | Branch: 0x0B-funding-transfer | Date: 2025-12-23
1. Overview
1.1 Objectives
Build a complete fund management system supporting:
- Deposit: External funds entering the exchange.
- Withdraw: Funds leaving the exchange.
- Transfer: Internal fund movement between accounts.
1.2 Design Principles
| Principle | Description |
|---|---|
| Integrity | Complete audit log for every change |
| Double Entry | Debits = Credits, funds conserved |
| Async | Deposits/Withdrawals are async, Transfers sync |
| Idempotency | No duplicate execution |
| Auditability | All actions traceable |
2. Account Model
2.1 Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ Account Architecture │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────┐ ┌───────────────────────────┐ │
│ │ Funding Account │ │ Spot Account │ │
│ │ (account_type = 2) │ │ (account_type = 1) │ │
│ ├───────────────────────────┤ ├───────────────────────────┤ │
│ │ Storage: PostgreSQL │ │ Storage: UBSCore (RAM) │ │
│ │ Table: balances_tb │ │ HashMap in memory │ │
│ │ │ │ │ │
│ │ Purpose: │ │ Purpose: │ │
│ │ - Deposit (充值) │ │ - Trading (撮合) │ │
│ │ - Withdraw (提现) │ │ - Order matching │ │
│ │ - Internal Transfer │ │ - Real-time balance │ │
│ └─────────────┬─────────────┘ └─────────────┬─────────────┘ │
│ │ │ │
│ └──────── Transfer (划转) ───────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
2.2 Storage Summary
| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |
Note:
balances_tbis currently used for Funding account only. Spot balances are managed in-memory by UBSCore and persisted to TDengine as events.
2.3 Schema (PostgreSQL)
Current Implementation: Single balances_tb for all user balances.
-- 001_init_schema.sql
CREATE TABLE balances_tb (
balance_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
available DECIMAL(30, 8) NOT NULL DEFAULT 0,
frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
version INT NOT NULL DEFAULT 1,
UNIQUE (user_id, asset_id)
);
Note: Current design uses single balance per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add
account_typecolumn.
3. Deposit Flow
- User gets address.
- User transfers funds to exchange address.
- Indexer monitors chain.
- Wait for Confirmations.
- Credit to Funding Account.
3.1 Deposit Table
CREATE TYPE deposit_status AS ENUM ('pending', 'confirming', 'completed', 'failed');
CREATE TABLE deposits_tb (
deposit_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users_tb(user_id),
asset_id INTEGER NOT NULL REFERENCES assets_tb(asset_id),
amount BIGINT NOT NULL,
tx_hash VARCHAR(128) UNIQUE,
status deposit_status NOT NULL DEFAULT 'pending',
...
);
4. Withdrawal Flow
- User Request -> Review -> Sign -> Broadcast -> Complete.
4.1 Withdrawal Table
CREATE TYPE withdraw_status AS ENUM ('pending', 'risk_review', 'processing', 'completed', ...);
CREATE TABLE withdrawals_tb (
withdrawal_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
amount BIGINT NOT NULL,
fee BIGINT NOT NULL,
net_amount BIGINT NOT NULL,
status withdraw_status NOT NULL DEFAULT 'pending',
...
);
4.2 Risk Rules
- Small Amount: Auto-approve (< 500 USDT).
- Large Amount: Manual Review (>= 10000 USDT).
- New Address: 24h Delay.
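The three risk rules above can be sketched as a simple classifier. This is illustrative only: the `RiskAction` enum is a made-up name, and the behavior of the band between 500 and 10,000 USDT is not specified in the text, so the default below is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum RiskAction {
    AutoApprove,
    ManualReview,
    Delay24h,
}

/// amount_usdt in whole USDT for simplicity; real code would use scaled u64.
fn classify_withdrawal(amount_usdt: u64, is_new_address: bool) -> RiskAction {
    if is_new_address {
        RiskAction::Delay24h // new address: 24h delay
    } else if amount_usdt >= 10_000 {
        RiskAction::ManualReview // large amount
    } else if amount_usdt < 500 {
        RiskAction::AutoApprove // small amount
    } else {
        RiskAction::ManualReview // middle band: conservative default (assumption)
    }
}

fn main() {
    assert_eq!(classify_withdrawal(100, false), RiskAction::AutoApprove);
    assert_eq!(classify_withdrawal(20_000, false), RiskAction::ManualReview);
    assert_eq!(classify_withdrawal(100, true), RiskAction::Delay24h);
    println!("risk rules ok");
}
```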
5. Transfer
5.1 Types
- funding → spot: Available for trading.
- spot → funding: Available for withdrawal.
- user → user: Internal transfer.
5.2 API Design
POST /api/v1/private/transfer
{
"from_account": "funding",
"to_account": "spot",
"asset": "USDT",
"amount": "100.00"
}
6. Ledger
Complete record of all fund movements.
CREATE TYPE ledger_type AS ENUM ('deposit', 'withdraw', 'transfer_in', 'trade_buy', ...);
CREATE TABLE ledger_tb (
ledger_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
ledger_type ledger_type NOT NULL,
amount BIGINT NOT NULL,
balance_after BIGINT NOT NULL,
ref_id BIGINT,
...
);
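The double-entry principle behind the ledger (debits = credits, funds conserved) can be sketched in a few lines; the `LedgerEntry` shape here is simplified from the table above and the helper name is illustrative:

```rust
/// Double-entry sketch: every transfer posts a debit and a matching credit,
/// so the amounts for one ref_id always sum to zero (funds conserved).
struct LedgerEntry {
    user_id: u64,
    amount: i64, // negative = debit, positive = credit (smallest units)
    ref_id: u64, // groups the two legs of one transfer
}

fn post_transfer(ledger: &mut Vec<LedgerEntry>, from: u64, to: u64, amount: i64, ref_id: u64) {
    ledger.push(LedgerEntry { user_id: from, amount: -amount, ref_id });
    ledger.push(LedgerEntry { user_id: to, amount, ref_id });
}

fn main() {
    let mut ledger = Vec::new();
    // user 1024 pays a 25-unit fee to the REVENUE account (user 0)
    post_transfer(&mut ledger, 1024, 0, 25, 1);
    let sum: i64 = ledger.iter().map(|e| e.amount).sum();
    assert_eq!(sum, 0); // debits == credits
    assert_eq!(ledger.len(), 2); // two legs per transfer
    println!("ledger balanced");
}
```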
7. Implementation Plan
- Phase 1: DB: Migrations for sub_accounts, funding, ledger.
- Phase 2: Transfer: Model + API (Sync).
- Phase 3: Deposit: Model + Address logic.
- Phase 4: Withdraw: Model + Risk logic.
8. Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Account Model | Sub-accounts | Isolate trading risks |
| Storage | PostgreSQL | ACID Requirement |
| Transfer | Synchronous | User Experience |
| Deposit | Asynchronous | Chain dependency |
🇨🇳 中文
📅 Status: 📝 Draft | Branch: 0x0B-funding-transfer
1. Overview
Build a complete fund management system supporting deposits, withdrawals, and transfers.
1.2 Design Principles
Ledger integrity, double-entry accounting, async processing, idempotency, auditability.
2. Account Model
2.1 Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│                         Account Architecture                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌───────────────────────────┐     ┌───────────────────────────┐        │
│  │     Funding Account       │     │      Spot Account         │        │
│  │   (account_type = 2)      │     │   (account_type = 1)      │        │
│  ├───────────────────────────┤     ├───────────────────────────┤        │
│  │ Storage: PostgreSQL       │     │ Storage: UBSCore (RAM)    │        │
│  │ Table: balances_tb        │     │ In-memory HashMap         │        │
│  │                           │     │                           │        │
│  │ Purpose:                  │     │ Purpose:                  │        │
│  │  - Deposit                │     │  - Trading (matching)     │        │
│  │  - Withdraw               │     │  - Order matching         │        │
│  │  - Internal Transfer      │     │  - Real-time balance      │        │
│  └─────────────┬─────────────┘     └─────────────┬─────────────┘        │
│                │                                 │                      │
│                └──────── Transfer ───────────────┘                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
2.2 Storage Summary
| Account | Type | Storage | Table/Structure |
|---|---|---|---|
| Funding | 2 | PostgreSQL | balances_tb |
| Spot | 1 | Memory (UBSCore) | HashMap<(user_id, asset_id), Balance> |
Note: balances_tb is currently used for the Funding account only. Spot balances are managed in-memory by UBSCore and persisted to TDengine as events.
2.3 Schema (PostgreSQL)
Current implementation: balances_tb holds Funding account balances.
-- 001_init_schema.sql
CREATE TABLE balances_tb (
balance_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
asset_id INT NOT NULL,
available DECIMAL(30, 8) NOT NULL DEFAULT 0,
frozen DECIMAL(30, 8) NOT NULL DEFAULT 0,
UNIQUE (user_id, asset_id)
);
Note: The current design keeps one balance row per (user_id, asset_id). Future multi-account support (Spot/Funding/Margin) can add an account_type column.
3. Deposit Flow
Monitor on-chain transactions -> wait for confirmations -> credit the Funding account.
3.3 Confirmation Rules
BTC: 3 confirmations (~30 min); ETH: 12 confirmations (~3 min).
4. Withdrawal Flow
User request -> risk review -> sign & broadcast -> complete.
4.3 Risk Rules
Small amounts are auto-approved, large amounts require manual review, and new addresses incur a withdrawal delay.
5. Transfer
5.1 Transfer Types
Supports funding <-> spot transfers and internal user-to-user transfers.
5.3 API Design
POST /api/v1/private/transfer, requires Ed25519 signature auth.
6. Ledger
Records every fund movement (deposit, withdraw, trade, fee, etc.) to ensure traceability.
7. Implementation Plan
- Phase 1: Database migrations
- Phase 2: Transfer (priority)
- Phase 3: Deposit (P2)
- Phase 4: Withdraw (P2)
8. Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Account Model | Sub-accounts | Isolate trading and deposit/withdrawal funds |
| Deposit/Withdraw Storage | PostgreSQL | Transactional ACID required |
| Transfer | Synchronous | Low latency, better UX |
| Deposit/Withdraw | Asynchronous | Depends on on-chain confirmations |
0x0B-a Internal Transfer Architecture (Strict FSM)
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Problem Statement
1.1 System Topology
| System | Role | Source of Truth | Persistence |
|---|---|---|---|
| PostgreSQL | Funding Account | balances_tb | ACID, Durable |
| UBSCore | Trading Account | RAM | WAL + Volatile |
1.2 The Core Constraint
These two systems cannot share a transaction. There is no XA/2PC database protocol. Therefore: We must build our own 2-Phase Commit using an external FSM Coordinator.
1.5 Security Pre-Validation (MANDATORY)
Caution
Defense-in-Depth: all checks below MUST be performed in every independent module, not just the API layer.
- API Layer: First line of defense, reject obviously invalid requests
- Coordinator: Re-validate, prevent internal calls bypassing API
- Adapters: Final defense, each adapter must independently validate parameters
- UBSCore: Last check before in-memory operations
Safety > Performance. The cost of redundant checks is acceptable; security vulnerabilities are not.
1.5.1 Identity & Authorization Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| User Authentication | Forged request | JWT/Session must be valid | UNAUTHORIZED |
| User ID Consistency | Cross-user transfer attack | request.user_id == auth.user_id | FORBIDDEN |
| Account Ownership | Steal others’ funds | Source/Target accounts belong to same user_id | FORBIDDEN |
1.5.2 Account Type Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| from != to | Infinite wash trading/resource waste | request.from != request.to | SAME_ACCOUNT |
| Account Type Valid | Inject invalid type | from, to ∈ {FUNDING, SPOT} | INVALID_ACCOUNT_TYPE |
| Account Type Supported | Request unlaunched feature | from, to both in supported list | UNSUPPORTED_ACCOUNT_TYPE |
1.5.3 Amount Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| amount > 0 | Zero/negative transfer | amount > 0 | INVALID_AMOUNT |
| Precision Check | Precision overflow | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| Minimum Amount | Dust attack | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| Maximum Single Amount | Risk control bypass | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| Integer Overflow | u64 overflow attack | amount <= u64::MAX / safety_factor | OVERFLOW |
1.5.4 Asset Checks
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Asset Exists | Fake asset_id | asset_id exists in system | INVALID_ASSET |
| Asset Status | Delisted asset | asset.status == ACTIVE | ASSET_SUSPENDED |
| Transfer Permission | Some assets forbid internal transfer | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |
1.5.5 Account Status Checks
Account Initialization Rules (Overview)
| Account Type | Init Timing | Notes |
|---|---|---|
| FUNDING | Created on first deposit request | Triggered by external deposit flow |
| SPOT | Created on first internal transfer | Lazy Init |
| FUTURE | Created on first internal transfer [P2] | Lazy Init |
| MARGIN | Created on first internal transfer [P2] | Lazy Init |
Note
- Specific initialization behaviors and business rules for each account type are defined in their dedicated documents.
- Each account has its own state definitions (e.g., whether transfer is allowed); not detailed here.
- Default State: On account initialization, transfer is allowed by default.
Account Status Check Table
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Source Account Exists | Non-existent account | Source account record must exist | SOURCE_ACCOUNT_NOT_FOUND |
| Target Account Exists/Create | Non-existent target | FUNDING must exist; SPOT/FUTURE/MARGIN can create | TARGET_ACCOUNT_NOT_FOUND (FUNDING only) |
| Source Not Frozen | Frozen account transfer out | source.status != FROZEN | ACCOUNT_FROZEN |
| Source Not Disabled | Disabled account operation | source.status != DISABLED | ACCOUNT_DISABLED |
| Sufficient Balance | Insufficient balance direct reject | source.available >= amount | INSUFFICIENT_BALANCE |
1.5.6 Rate Limiting - [P2 Future Optimization]
Note
This is a V2 optimization. V1 may skip this.
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| Requests Per Second | DoS attack | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| Daily Transfer Count | Abuse | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| Daily Transfer Amount | Large amount risk control | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |
1.5.7 Idempotency Check
| Check | Attack Vector | Validation Logic | Error Code |
|---|---|---|---|
| cid Unique | Duplicate submission | If cid provided, check if exists | DUPLICATE_REQUEST (return original result) |
1.5.8 Check Order (Recommended)
1. Authentication (JWT valid?)
2. Authorization (user_id match?)
3. Request Format (from/to/amount valid?)
4. Account Type (from != to, type supported?)
5. Asset Check (exists? enabled? transferable?)
6. Amount Check (range? precision? overflow?)
7. Rate Limiting (exceeded?)
8. Idempotency (duplicate?)
9. Balance Check (sufficient?) ← Check last, avoid unnecessary queries
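The ordering above (cheap structural checks first, the balance query last) can be sketched as a short-circuiting validation chain. Names and error variants here are illustrative, not the real module's API:

```rust
/// Illustrative subset of the error codes from the tables above.
#[derive(Debug, PartialEq)]
enum TransferError {
    SameAccount,
    InvalidAmount,
    InsufficientBalance,
}

struct TransferReq {
    from: u8, // account_type of source
    to: u8,   // account_type of target
    amount: u64,
}

/// Checks run in the recommended order; the first failure wins, so the
/// (potentially expensive) balance lookup only happens for valid requests.
fn validate(req: &TransferReq, available: u64) -> Result<(), TransferError> {
    if req.from == req.to {
        return Err(TransferError::SameAccount); // step 4
    }
    if req.amount == 0 {
        return Err(TransferError::InvalidAmount); // step 6
    }
    if available < req.amount {
        return Err(TransferError::InsufficientBalance); // step 9, checked last
    }
    Ok(())
}

fn main() {
    let bad = TransferReq { from: 1, to: 1, amount: 0 };
    // SameAccount wins even though amount is also invalid: order matters.
    assert_eq!(validate(&bad, 0), Err(TransferError::SameAccount));
    let ok = TransferReq { from: 2, to: 1, amount: 50 };
    assert_eq!(validate(&ok, 100), Ok(()));
    println!("validation order ok");
}
```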
2. FSM Design (The State Machine)
2.0 Library Choice: rust-fsm
We use the rust-fsm library, providing:
- ✅ Compile-time validation - Illegal state transitions cause compile errors.
- ✅ Declarative DSL - Clearly defined states and transitions.
- ✅ Type Safety - Prevents missing match arms.
Cargo.toml:
[dependencies]
rust-fsm = "0.7"
DSL Definition:
#![allow(unused)]
fn main() {
use rust_fsm::*;
state_machine! {
derive(Debug, Clone, Copy, PartialEq, Eq)
TransferFsm(Init) // Initial State
// State Definitions
Init => {
SourceWithdrawOk => SourceDone,
SourceWithdrawFail => Failed,
},
SourceDone => {
TargetDepositOk => Committed,
TargetDepositFail => Compensating,
TargetDepositUnknown => SourceDone, // self-transition: stay and retry forever
},
Compensating => {
RefundOk => RolledBack,
RefundFail => Compensating, // self-transition: stay and retry forever
},
// Terminal states (no outgoing transitions):
// Committed, Failed, RolledBack
}
}
Note
The DSL above is used for compile-time validation of state transition validity. Actual runtime state is stored in PostgreSQL and updated via CAS.
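The CAS update mentioned in the note corresponds to a conditional `UPDATE … WHERE state = $expected` against PostgreSQL. The idea can be modeled in memory with `compare_exchange`; the numeric state IDs loosely follow the state table (SOURCE_DONE = 20; the COMMITTED value is an assumption, since the table is not fully shown here):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// State IDs mirror the FSM state table (illustrative; real state lives in PostgreSQL).
const SOURCE_DONE: u8 = 20;
const COMMITTED: u8 = 40; // assumed ID for the terminal state

/// CAS transition: succeeds only if the current state equals the expected one,
/// which is exactly what a `WHERE state = $expected` UPDATE gives you in SQL.
fn cas_transition(state: &AtomicU8, from: u8, to: u8) -> bool {
    state
        .compare_exchange(from, to, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

fn main() {
    let state = AtomicU8::new(SOURCE_DONE);
    assert!(cas_transition(&state, SOURCE_DONE, COMMITTED)); // first committer wins
    assert!(!cas_transition(&state, SOURCE_DONE, COMMITTED)); // a retry is a no-op
    assert_eq!(state.load(Ordering::SeqCst), COMMITTED);
    println!("cas ok");
}
```

This is why the commit step is atomic and non-interruptible: two concurrent workers can both attempt the transition, but only one CAS succeeds.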
2.0.1 Core State Flow (Top Level)
┌─────────────────────────────────────────────────────────┐
│ INTERNAL TRANSFER FSM │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────── Happy Path ────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ INIT │ Source Deduct ✓ │ SOURCE_DONE │ Target Credit ✓ │ │ │
│ │(Request)│ ─────────────────▶ │ (In-Flight) │ ─────────────────▶ │ COMMITTED │ │
│ └─────────┘ └─────────────┘ │ │ │
│ │ │ └───────────────┘ │
│ │ │ ✅ │
└─────────│───────────────────────────────│───────────────────────────────────────────────┘
│ │
│ │
│ ▼
│ ╔══════════════════════════════════════════════════╗
│ ║ 🔒 ATOMIC COMMIT ║
│ ║ ║
│ ║ IF AND ONLY IF: ║
│ ║ FROM.withdraw = SUCCESS ✓ ║
│ ║ TO.deposit = SUCCESS ✓ ║
│ ║ ║
│ ║ EXECUTE: CAS(SOURCE_DONE → COMMITTED) ║
│ ║ Must be atomic and non-interruptible. ║
│ ╚══════════════════════════════════════════════════╝
│ │
│ Source Deduction Fail │ Target Credit Fail (EXPLICIT_FAIL)
▼ ▼
┌──────────┐ ┌──────────────┐
│ FAILED │ │ COMPENSATING │◀───────────┐
│ (Source) │ │ (Refunding) │ │ Refund Fail (Infinite Retry)
└──────────┘ └──────────────┘────────────┘
❌ │ Refund Success
▼
┌─────────────┐
│ ROLLED_BACK │
│ (Restored) │
└─────────────┘
↩️
╔════════════════════════════════════════════════════════════════════════════════════════╗
║ ⚠️ Target Unknown (TIMEOUT/UNKNOWN) → Stay SOURCE_DONE, Infinite Retry, NEVER rollback. ║
╚════════════════════════════════════════════════════════════════════════════════════════╝
Core State Description:
| State | Fund Location | Description |
|---|---|---|
| INIT | Source Account | User request accepted, funds haven't moved yet. |
| SOURCE_DONE | In-Flight | CRITICAL! Funds have left source, haven't reached target. |
| COMMITTED | Target Account | Terminal state, transfer succeeded. |
| FAILED | Source Account | Terminal state, source deduction failed, no funds moved. |
| COMPENSATING | In-Flight | Target credit failed, refunding to source. |
| ROLLED_BACK | Source Account | Terminal state, refund succeeded. |
Important
SOURCE_DONE is the most critical state - funds have left the source account but have not yet reached the target. At this point, the state MUST NOT be lost; it must eventually reach COMMITTED or ROLLED_BACK.
2.1 States (Exhaustive)
| ID | State Name | Entry Condition | Terminal? | Funds Location |
|---|---|---|---|---|
| 0 | INIT | User request accepted. | No | Source |
| 10 | SOURCE_PENDING | CAS success, Adapter call initiated. | No | Source (Deducting) |
| 20 | SOURCE_DONE | Source Adapter returned OK. | No | In-Flight |
| 30 | TARGET_PENDING | CAS success, Target Adapter call initiated. | No | In-Flight (Crediting) |
| 40 | COMMITTED | Target Adapter returned OK. | YES | Target |
| -10 | FAILED | Source Adapter returned FAIL. | YES | Source (Unchanged) |
| -20 | COMPENSATING | Target Adapter FAIL AND Source is Reversible. | No | In-Flight (Refunding) |
| -30 | ROLLED_BACK | Source Refund OK. | YES | Source (Restored) |
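The state table above maps naturally onto a Rust enum; this is a sketch, where the `repr(i16)` mapping to the numeric state IDs is our assumption about how the FSM State ID column in `transfers_tb` is encoded:

```rust
/// Transfer FSM states with the numeric IDs from the table above.
/// Negative IDs are failure-side states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
enum TransferState {
    Init = 0,
    SourcePending = 10,
    SourceDone = 20,
    TargetPending = 30,
    Committed = 40,
    Failed = -10,
    Compensating = -20,
    RolledBack = -30,
}

impl TransferState {
    /// Terminal states: the Recovery Worker must never pick these up.
    fn is_terminal(self) -> bool {
        matches!(
            self,
            TransferState::Committed | TransferState::Failed | TransferState::RolledBack
        )
    }
}

fn main() {
    assert!(TransferState::Committed.is_terminal());
    assert!(!TransferState::SourceDone.is_terminal()); // in-flight, must make progress
    assert_eq!(TransferState::Compensating as i16, -20);
}
```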
2.2 State Transition Rules (Exhaustive)
┌───────────────────────────────────────────────────────────────────────────────┐
│ CANONICAL STATE TRANSITIONS │
├───────────────────────────────────────────────────────────────────────────────┤
│ │
│ INIT ──────[CAS OK]───────► SOURCE_PENDING │
│ │ │ │
│ │ ├──[Adapter OK]────► SOURCE_DONE │
│ │ │ │ │
│ │ └──[Adapter FAIL]──► FAILED (Terminal) │
│ │ │ │
│ │ │ │
│ │ SOURCE_DONE ──[CAS OK]──► TARGET_PENDING │
│ │ │ │
│ │ ┌────────────────────────────────────┤ │
│ │ │ │ │
│ │ [Adapter OK]│ [Adapter FAIL] │
│ │ │ │ │
│ │ ▼ ▼ │
│ │ COMMITTED ┌───────────────────┐ │
│ │ (Terminal) │ SOURCE REVERSIBLE?│ │
│ │ └─────────┬─────────┘ │
│ │ YES │ NO │
│ │ ▼ │ ▼ │
│ │ COMPENSATING │ INFINITE │
│ │ │ │ RETRY │
│ │ [Refund OK] │ │ (Stay in │
│ │ ▼ │ │ TARGET_ │
│ │ ROLLED_BACK │ │ PENDING) │
│ │ (Terminal) │ │ │
│ │ │ │ │
│ └─────────────────────────────────────────────────┴─────────┴──────────────┘
2.3 Reversibility Rule (CRITICAL)
Core Principle: Only when an Adapter returns an explicitly defined failure can we safely rollback.
| Response Type | Meaning | Can Safely Rollback? | Handling |
|---|---|---|---|
| SUCCESS | Operation succeeded | N/A | Continue to next step |
| EXPLICIT_FAIL | Explicit business failure (e.g., insufficient balance) | YES | Can enter COMPENSATING |
| TIMEOUT | Timeout, state unknown | NO | Infinite Retry |
| PENDING | Processing, state unknown | NO | Infinite Retry |
| NETWORK_ERROR | Network error, state unknown | NO | Infinite Retry |
| UNKNOWN | Any other situation | NO | Infinite Retry or Manual Intervention |
Caution
Only EXPLICIT_FAIL allows safe rollback. Any unknown state (Timeout, Pending, Network Error) means funds are In-Flight: we cannot know whether the counterparty has processed the request, and a rash rollback will cause Double Spend or Fund Loss. The only safe actions are Infinite Retry or Manual Intervention.
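A minimal Rust sketch of the Reversibility Rule, assuming a hypothetical `AdapterResponse` type that mirrors the table above:

```rust
/// Adapter response taxonomy from the Reversibility Rule table.
#[derive(Debug, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail(String), // explicit business failure, e.g. insufficient balance
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

impl AdapterResponse {
    /// ONLY an explicit failure permits compensation.
    /// Everything else means funds may be In-Flight -> Infinite Retry.
    fn can_safely_rollback(&self) -> bool {
        matches!(self, AdapterResponse::ExplicitFail(_))
    }
}

fn main() {
    let rejected = AdapterResponse::ExplicitFail("INSUFFICIENT_BALANCE".to_string());
    assert!(rejected.can_safely_rollback());
    assert!(!AdapterResponse::Timeout.can_safely_rollback());
    assert!(!AdapterResponse::Pending.can_safely_rollback());
    assert!(!AdapterResponse::Unknown.can_safely_rollback());
}
```

Concentrating the rule in one predicate keeps the Coordinator from ever branching into COMPENSATING on an ambiguous response.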
3. Transfer Scenarios (Step-by-Step)
3.1 Scenario A: Funding → Spot (Deposit to Trading)
Happy Path:
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, Create Record | - | INIT | Funding |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Funding |
| 3 | Coordinator | Call FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | Deducted |
| 5 | Coordinator | On OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | Credit RAM, Write WAL, Emit Event | - | - | Credited |
| 9 | Coordinator | On Event: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Trading |
Failure Path (Target Fails):
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7’ | Coordinator | Call TradingAdapter.deposit(req_id) → FAIL/Timeout | TARGET_PENDING | - | In-Flight |
| 8’ | Coordinator | Check: Source = Funding (Reversible) | - | - | - |
| 9’ | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10’ | Coordinator | Call FundingAdapter.refund(req_id) | - | - | - |
| 11’ | PG | UPDATE balances SET amount = amount + X | - | - | Refunded |
| 12’ | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Funding |
3.2 Scenario B: Spot → Funding (Withdraw from Trading)
Happy Path:
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, Create Record | - | INIT | Trading |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Trading |
| 3 | Coordinator | Call TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | Check Balance, Deduct RAM, Write WAL, Emit Event | - | - | Deducted |
| 5 | Coordinator | On Event: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In-Flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In-Flight |
| 7 | Coordinator | Call FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | Credited |
| 9 | Coordinator | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |
Failure Path (Target Fails):
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7a | Coordinator | Call FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (e.g., constraint) | TARGET_PENDING | - | In-Flight |
| 8a | Coordinator | Check response type = EXPLICIT_FAIL (can safely rollback) | - | - | - |
| 9a | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In-Flight |
| 10a | Coordinator | Call TradingAdapter.refund(req_id) (refund to UBSCore) | - | - | - |
| 11a | UBSCore | Credit RAM balance, write WAL | - | - | Refunded |
| 12a | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Trading |
| Step | Actor | Action | Pre-State | Post-State | Funds |
|---|---|---|---|---|---|
| 7b | Coordinator | Call FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | In-Flight |
| 8b | Coordinator | Check response type = UNKNOWN (cannot safely rollback) | - | - | - |
| 9b | Coordinator | DO NOT TRANSITION. Stay TARGET_PENDING. | TARGET_PENDING | TARGET_PENDING | In-Flight |
| 10b | Coordinator | Log CRITICAL. Alert Ops. Schedule Retry. | - | - | - |
| 11b | Recovery | Retry FundingAdapter.deposit(req_id) INFINITELY. | - | - | - |
| 12b | (Eventually) | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding |
Warning
Only enter COMPENSATING when the Target returns EXPLICIT_FAIL. On Timeout or Unknown, funds are In-Flight; we must Infinite Retry or escalate to Manual Intervention.
4. Failure Mode and Effects Analysis (FMEA)
4.1 Phase 1 Failures (Source Operation)
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Adapter returns FAIL | Insufficient balance, DB constraint | SOURCE_PENDING | Source | Transition to FAILED. User sees error. |
| Adapter returns PENDING | Timeout, network issue | SOURCE_PENDING | Unknown | Retry. Adapter MUST be idempotent. |
| Coordinator crashes after CAS, before call | Process kill | SOURCE_PENDING | Source | Recovery Worker retries call. |
| Coordinator crashes after call, before result | Process kill | SOURCE_PENDING | Unknown | Recovery Worker retries (idempotent). |
4.2 Phase 2 Failures (Target Operation)
| Failure | Cause | Response Type | Current State | Funds | Resolution |
|---|---|---|---|---|---|
| Target explicit reject | Business rule | EXPLICIT_FAIL | TARGET_PENDING | In-Flight | COMPENSATING → Refund. |
| Timeout | Network delay | TIMEOUT | TARGET_PENDING | Unknown | Infinite Retry. |
| Network error | Connection lost | NETWORK_ERROR | TARGET_PENDING | Unknown | Infinite Retry. |
| Unknown error | System exception | UNKNOWN | TARGET_PENDING | Unknown | Infinite Retry or Manual Intervention. |
| Coordinator crashes | Process kill | N/A | TARGET_PENDING | In-Flight | Recovery Worker retries. |
4.3 Compensation Failures
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Refund FAIL | PG down, constraint | COMPENSATING | In-Flight | Infinite Retry. Funds stuck until PG up. |
| Refund PENDING | Timeout | COMPENSATING | Unknown | Retry. |
5. Idempotency Requirements (MANDATORY)
5.1 Why Idempotency?
Retries are the foundation of crash recovery. Without idempotency, a retry will cause double execution (double deduction, double credit).
5.2 Implementation (Funding Adapter)
Requirement: Given the same req_id, calling withdraw() or deposit() multiple times MUST have the same effect as calling it once.
Mechanism:
- transfers_tb has UNIQUE(req_id).
- Atomic Transaction:

BEGIN;

-- Check if already processed
SELECT state FROM transfers_tb WHERE req_id = $1;
IF state >= expected_post_state THEN RETURN 'AlreadyProcessed'; END IF;

-- Perform balance update
UPDATE balances_tb SET amount = amount - $2
WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
IF NOT FOUND THEN RETURN 'InsufficientBalance'; END IF;

-- Update state
UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;

COMMIT;
RETURN 'Success';
5.3 Implementation (Trading Adapter)
Requirement: Same as above. UBSCore MUST reject duplicate req_id.
Mechanism:
InternalOrderincludesreq_idfield (orcid).- UBSCore maintains a
ProcessedTransferSet(HashSet in RAM, rebuilt from WAL on restart). - On receiving Transfer Order:
IF req_id IN ProcessedTransferSet THEN RETURN 'AlreadyProcessed' (Success, no-op) ELSE ProcessTransfer() ProcessedTransferSet.insert(req_id) WriteWAL(TransferEvent) RETURN 'Success' END IF
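The mechanism above can be sketched in Rust; the `Ubs` struct and method names are illustrative, and WAL replay on restart is omitted:

```rust
use std::collections::HashSet;

/// Minimal sketch of UBSCore's duplicate-suppression set.
/// On restart the set would be rebuilt by replaying the WAL (not shown).
struct Ubs {
    processed: HashSet<String>,
    balance: u64,
}

impl Ubs {
    /// Deposit is idempotent per req_id: a retry after a crash or timeout
    /// credits the balance at most once.
    fn deposit(&mut self, req_id: &str, amount: u64) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // success, no-op
        }
        self.balance += amount; // ProcessTransfer()
        self.processed.insert(req_id.to_string());
        // WriteWAL(TransferEvent) would go here
        "Success"
    }
}

fn main() {
    let mut ubs = Ubs { processed: HashSet::new(), balance: 0 };
    assert_eq!(ubs.deposit("req-1", 100), "Success");
    assert_eq!(ubs.deposit("req-1", 100), "AlreadyProcessed"); // Recovery retry
    assert_eq!(ubs.balance, 100); // credited exactly once
}
```

Note that the duplicate call returns success, not an error: the Recovery Worker must be able to retry blindly and still converge.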
6. Recovery Worker (Zombie Handler)
6.1 Purpose
On Coordinator startup (or periodically), scan for “stuck” transfers and resume them.
6.2 Query
SELECT * FROM transfers_tb
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
AND updated_at < NOW() - INTERVAL '1 minute'; -- Stale threshold
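The same stale scan, sketched in Rust over in-memory rows (illustrative types; production runs the SQL query above):

```rust
use std::time::{Duration, Instant};

/// Illustrative stand-in for a transfers_tb row.
struct Row {
    state: i16,
    updated_at: Instant,
}

/// Pick transfers that are in a non-terminal state and have not been
/// touched within the stale threshold.
fn stuck(rows: &[Row], now: Instant, stale: Duration) -> Vec<&Row> {
    // INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
    const NON_TERMINAL: [i16; 5] = [0, 10, 20, 30, -20];
    rows.iter()
        .filter(|r| NON_TERMINAL.contains(&r.state))
        .filter(|r| now.duration_since(r.updated_at) > stale)
        .collect()
}

fn main() {
    let base = Instant::now();
    let now = base + Duration::from_secs(120); // pretend two minutes have passed
    let rows = vec![
        Row { state: 30, updated_at: base }, // stale TARGET_PENDING -> picked up
        Row { state: 40, updated_at: base }, // COMMITTED: terminal, skipped
        Row { state: 20, updated_at: now },  // SOURCE_DONE but fresh, skipped
    ];
    let picked = stuck(&rows, now, Duration::from_secs(60));
    assert_eq!(picked.len(), 1);
    assert_eq!(picked[0].state, 30);
}
```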
6.3 Recovery Logic
| Current State | Action |
|---|---|
| INIT | Call step() (will transition to SOURCE_PENDING). |
| SOURCE_PENDING | Retry Source.withdraw(). |
| SOURCE_DONE | Call step() (will transition to TARGET_PENDING). |
| TARGET_PENDING | Retry Target.deposit(). Apply Reversibility Rule. |
| COMPENSATING | Retry Source.refund(). |
7. Data Model
7.1 Table: transfers_tb
CREATE TABLE transfers_tb (
transfer_id BIGSERIAL PRIMARY KEY,
req_id VARCHAR(26) UNIQUE NOT NULL, -- Server-generated Unique ID (ULID)
cid VARCHAR(64) UNIQUE, -- Client Idempotency Key (Optional)
user_id BIGINT NOT NULL,
asset_id INTEGER NOT NULL,
amount DECIMAL(30, 8) NOT NULL,
transfer_type SMALLINT NOT NULL, -- 1 = Funding->Spot, 2 = Spot->Funding
source_type SMALLINT NOT NULL, -- 1 = Funding, 2 = Trading
state SMALLINT NOT NULL DEFAULT 0, -- FSM State ID
error_message TEXT, -- Last error (for debugging)
retry_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);
7.2 Invariant Check
Run periodically to detect data corruption:
-- Sum of Funding + Trading + In-Flight should be constant per user per asset
-- In-Flight = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
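A minimal sketch of the conservation invariant, assuming illustrative field names (not the production schema):

```rust
/// Fund-conservation check for one (user, asset): the sum of funding,
/// trading, and in-flight amounts must be constant across every transfer step.
struct Ledger {
    funding: i64,
    trading: i64,
    /// SUM(amount) of transfers in SOURCE_DONE / TARGET_PENDING / COMPENSATING
    in_flight: i64,
}

impl Ledger {
    fn total(&self) -> i64 {
        self.funding + self.trading + self.in_flight
    }
}

fn main() {
    let before = Ledger { funding: 1_000, trading: 0, in_flight: 0 };
    // Mid-transfer: 100 has left funding but not yet reached trading.
    let mid = Ledger { funding: 900, trading: 0, in_flight: 100 };
    // After commit: the 100 landed in the trading account.
    let after = Ledger { funding: 900, trading: 100, in_flight: 0 };
    assert_eq!(before.total(), mid.total());
    assert_eq!(mid.total(), after.total()); // invariant holds at every step
}
```

A violation of this equality at any point indicates double spend, fund loss, or money created from nothing, and should halt the service (see MON-003).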
8. API Contract
8.1 Endpoint: POST /api/v1/internal_transfer
Request:
{
"from": "SPOT", // Source account type
"to": "FUNDING", // Target account type
"asset": "USDT",
"amount": "100.00"
}
Account Type Enum (AccountType):
| Value | Meaning | Status |
|---|---|---|
| FUNDING | Funding Account (PostgreSQL) | Supported |
| SPOT | Spot Trading Account (UBSCore) | Supported |
| FUTURE | Futures Account | Future Extension |
| MARGIN | Margin Account | Future Extension |
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U", // Server-generated (ULID)
"from": "SPOT",
"to": "FUNDING",
"state": "COMMITTED", // or "PENDING" if async
"message": "Transfer successful"
}
8.2 Query Endpoint: GET /api/v1/internal_transfer/:req_id
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U",
"from": "SPOT",
"to": "FUNDING",
"asset": "USDT",
"amount": "100.00",
"state": "COMMITTED",
"created_at": "2024-12-23T14:00:00Z",
"updated_at": "2024-12-23T14:00:01Z"
}
Important
req_id is SERVER-GENERATED, not client-supplied. If the client needs idempotency, use the optional cid (client_order_id) field; the server will check for duplicates and return the existing result.
Error Codes:
| Code | Meaning |
|---|---|
| INSUFFICIENT_BALANCE | Source account balance < amount. |
| INVALID_ACCOUNT_TYPE | from or to account type is invalid or unsupported. |
| SAME_ACCOUNT | from and to are the same. |
| DUPLICATE_REQUEST | cid already processed. Return original result. |
| INVALID_AMOUNT | amount <= 0 or exceeds precision. |
| SYSTEM_ERROR | Internal failure. Advise retry. |
9. Implementation Pseudocode (Critical State Checks)
9.1 API Layer
function handle_transfer_request(request, auth_context):
// ========== Defense-in-Depth Layer 1: API Layer ==========
// 1. Identity Authentication
if !auth_context.is_valid():
return Error(UNAUTHORIZED)
// 2. User ID Consistency (Prevent cross-user attacks)
if request.user_id != auth_context.user_id:
return Error(FORBIDDEN, "User ID mismatch")
// 3. Account Type Check
if request.from == request.to:
return Error(SAME_ACCOUNT)
if request.from NOT IN [FUNDING, SPOT]:
return Error(INVALID_ACCOUNT_TYPE)
if request.to NOT IN [FUNDING, SPOT]:
return Error(INVALID_ACCOUNT_TYPE)
// 4. Amount Check
if request.amount <= 0:
return Error(INVALID_AMOUNT)
if decimal_places(request.amount) > asset.precision:
return Error(PRECISION_OVERFLOW)
// 5. Idempotency Check
if request.cid:
existing = db.find_by_cid(request.cid)
if existing:
return Success(existing) // Return existing result
// 6. Asset Check
asset = db.get_asset(request.asset_id)
if !asset or asset.status != ACTIVE:
return Error(INVALID_ASSET)
// 7. Call Coordinator
result = coordinator.create_and_execute(request)
return result
9.2 Coordinator Layer
function create_and_execute(request):
// ========== Defense-in-Depth Layer 2: Coordinator ==========
// Re-verify (Prevent internal calls bypassing API)
ASSERT request.from != request.to
ASSERT request.amount > 0
ASSERT request.user_id > 0
// Generate unique ID
req_id = ulid.new()
// Create transfer record (State = INIT)
transfer = TransferRecord {
req_id: req_id,
user_id: request.user_id,
from: request.from,
to: request.to,
asset_id: request.asset_id,
amount: request.amount,
state: INIT,
created_at: now()
}
db.insert(transfer)
log.info("Transfer created", req_id)
// Execute FSM
return execute_fsm(req_id)
function execute_fsm(req_id):
loop:
transfer = db.get(req_id)
if transfer.state.is_terminal():
return transfer
new_state = step(transfer)
if new_state == transfer.state:
// No progress, wait for retry
sleep(RETRY_INTERVAL)
continue
function step(transfer):
match transfer.state:
INIT:
return step_init(transfer)
SOURCE_PENDING:
return step_source_pending(transfer)
SOURCE_DONE:
return step_source_done(transfer)
TARGET_PENDING:
return step_target_pending(transfer)
COMPENSATING:
return step_compensating(transfer)
_:
return transfer.state // Terminal, no processing
function step_init(transfer):
// CAS: Persist state BEFORE calling adapter (Persist-Before-Call)
success = db.cas_update(
req_id = transfer.req_id,
old_state = INIT,
new_state = SOURCE_PENDING
)
if !success:
return db.get(transfer.req_id).state
// Get source adapter
source_adapter = get_adapter(transfer.from)
// ========== Defense-in-Depth Layer 3: Adapter ==========
result = source_adapter.withdraw(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
return SOURCE_DONE
EXPLICIT_FAIL(reason):
db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
return FAILED
TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
log.warn("Source withdraw unknown state", transfer.req_id)
return SOURCE_PENDING
function step_source_done(transfer):
// ========== Enter SOURCE_DONE: Funds In-Flight, must reach terminal state ==========
// CAS update to TARGET_PENDING
success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
if !success:
return db.get(transfer.req_id).state
// Get target adapter
target_adapter = get_adapter(transfer.to)
// ========== Defense-in-Depth Layer 4: Target Adapter ==========
result = target_adapter.deposit(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
// ╔════════════════════════════════════════════════════════════════╗
// ║ 🔒 ATOMIC COMMIT - CRITICAL STEP! ║
// ║ ║
// ║ At this point: ║
// ║ FROM.withdraw = SUCCESS ✓ (already confirmed) ║
// ║ TO.deposit = SUCCESS ✓ (just confirmed) ║
// ║ ║
// ║ Execute Atomic CAS Commit: ║
// ║ CAS(TARGET_PENDING → COMMITTED) ║
// ║ ║
// ║ Once this CAS succeeds, the transfer is irreversible! ║
// ╚════════════════════════════════════════════════════════════════╝
commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
if !commit_success:
return db.get(transfer.req_id).state
log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
return COMMITTED
EXPLICIT_FAIL(reason):
db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
return COMPENSATING
TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
// ========== CRITICAL: Unknown state, MUST NOT compensate! ==========
log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
return TARGET_PENDING // Stay and retry
function step_compensating(transfer):
source_adapter = get_adapter(transfer.from)
result = source_adapter.refund(
req_id = transfer.req_id,
user_id = transfer.user_id,
asset_id = transfer.asset_id,
amount = transfer.amount
)
match result:
SUCCESS:
db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
log.info("Transfer rolled back", transfer.req_id)
return ROLLED_BACK
_:
log.critical("Refund failed - MUST RETRY", transfer.req_id)
return COMPENSATING
9.3 Adapter Layer (Example: Funding Adapter)
function withdraw(req_id, user_id, asset_id, amount):
// ========== Defense-in-Depth Layer 3: Adapter Internal Verification ==========
// Re-verify parameters (Do not trust caller)
ASSERT amount > 0
ASSERT user_id > 0
ASSERT asset_id > 0
// Idempotency Check
existing = db.find_transfer_operation(req_id, "WITHDRAW")
if existing:
return existing.result
// Begin transaction
tx = db.begin_transaction()
try:
// SELECT FOR UPDATE
account = tx.select_for_update(
"SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
)
if !account:
tx.rollback()
return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
if account.status == FROZEN:
tx.rollback()
return EXPLICIT_FAIL("ACCOUNT_FROZEN")
if account.available < amount:
tx.rollback()
return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
// Execute deduction
tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
// Record operation for idempotency
tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
tx.commit()
return SUCCESS
catch Exception as e:
tx.rollback()
log.error("Withdraw failed", req_id, e)
return UNKNOWN // Uncertainty requires retry
10. Acceptance Test Plan (Security Critical)
Caution
ALL tests below must pass before going to production. Any failure indicates potential fund theft, loss, or creation of funds from thin air.
10.1 Fund Conservation Tests
| Test ID | Scenario | Expected Result | Verification |
|---|---|---|---|
| INV-001 | After normal transfer | Total funds = Before | SUM(source) + SUM(target) = Constant |
| INV-002 | After failed transfer | Total funds = Before | Source balance unchanged |
| INV-003 | After rollback | Total funds = Before | Source balance fully restored |
| INV-004 | After crash recovery | Total funds = Before | Verify all account balances |
10.2 External Attack Tests
| Test ID | Attack Vector | Steps | Expected Result |
|---|---|---|---|
| ATK-001 | Cross-user transfer | Submits user B’s funds with user A’s token | FORBIDDEN |
| ATK-002 | user_id Tampering | Modify user_id in request body | FORBIDDEN |
| ATK-003 | Negative Amount | amount = -100 | INVALID_AMOUNT |
| ATK-004 | Zero Amount | amount = 0 | INVALID_AMOUNT |
| ATK-005 | Precision Overflow | amount = 0.000000001 (>8 decimals) | PRECISION_OVERFLOW |
| ATK-006 | Integer Overflow | amount = u64::MAX + 1 | OVERFLOW or parse error |
| ATK-007 | Same Account | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | Invalid Account Type | from = “INVALID” | INVALID_ACCOUNT_TYPE |
| ATK-009 | Non-existent Asset | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | Duplicate cid | Submit same ID twice | Second returns first result |
| ATK-011 | No Token | Missing Authorization header | UNAUTHORIZED |
| ATK-012 | Expired Token | Use expired JWT | UNAUTHORIZED |
| ATK-013 | Forged Token | Invalid signature JWT | UNAUTHORIZED |
10.3 Balance & Status Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| BAL-001 | amount > available | INSUFFICIENT_BALANCE, no change |
| BAL-002 | amount = available | Success, balance becomes 0 |
| BAL-003 | Concurrent: Total > balance | One success, one INSUFFICIENT_BALANCE |
| BAL-004 | Transfer from frozen account | ACCOUNT_FROZEN |
| BAL-005 | Transfer from disabled account | ACCOUNT_DISABLED |
10.4 FSM State Transition Tests
| Test ID | Scenario | Expected State Flow |
|---|---|---|
| FSM-001 | Normal Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | Normal Spot→Funding | Same as above |
| FSM-003 | Source Failure | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | Target Failure (Explicit) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | Target Timeout | … → TARGET_PENDING (Stay, infinite retry) |
| FSM-006 | Compensation Failure | COMPENSATING (Stay, infinite retry) |
10.5 Crash Recovery Tests
| Test ID | Crash Point | Expected Recovery Behavior |
|---|---|---|
| CRA-001 | After INIT, before SOURCE_PENDING | Recovery reads INIT, restarts step_init |
| CRA-002 | During SOURCE_PENDING, before call | Recovery retries withdraw (idempotent) |
| CRA-003 | During SOURCE_PENDING, after call | Recovery retries withdraw (idempotent, returns handled) |
| CRA-004 | After SOURCE_DONE, before TARGET_PENDING | Recovery executes step_source_done |
| CRA-005 | During TARGET_PENDING | Recovery retries deposit (idempotent) |
| CRA-006 | During COMPENSATING | Recovery retries refund (idempotent) |
10.6 Concurrency & Race Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| CON-001 | Multiple Workers on same req_id | Only one successful CAS, others skip |
| CON-002 | Concurrent Same-Amount Transfers | Two separate req_ids, both execute |
| CON-003 | Transfer + External Withdraw | Sum cannot exceed balance |
| CON-004 | No-lock balance read | No double deduction (SELECT FOR UPDATE) |
10.7 Idempotency Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| IDP-001 | Call withdraw twice | Second returns SUCCESS, balance deducted once |
| IDP-002 | Call deposit twice | Second returns SUCCESS, balance credited once |
| IDP-003 | Call refund twice | Second returns SUCCESS, balance credited once |
| IDP-004 | Recovery multiple retries | Final state consistent, balance correct |
10.8 Fund Anomaly Tests (Most Critical)
| Test ID | Threat | Method | Verification |
|---|---|---|---|
| FND-001 | Double Spend | Source deduct twice | Only deduct once (idempotent) |
| FND-002 | Fund Disappearance | Source success, target fail, no compensation | Must compensate or retry |
| FND-003 | Money from Nothing | Target credit twice | Only credit once (idempotent) |
| FND-004 | Lost in Transit | Crash at any point | Recovery restores integrity |
| FND-005 | State Inconsistency | SOURCE_DONE but DB not updated | WAL + Idempotency parity |
| FND-006 | Partial Commit | PG Transaction partial success | Atomic transaction (all or none) |
10.9 Monitoring & Alerting Tests
| Test ID | Scenario | Expected Alert |
|---|---|---|
| MON-001 | Stuck in TARGET_PENDING > 1m | CRITICAL Alert |
| MON-002 | Compensation fail 3 times | CRITICAL Alert |
| MON-003 | Fund conservation check fail | CRITICAL Alert + HALT Service |
| MON-004 | Abnormal freq per user | WARNING Alert [P2] |
🇨🇳 中文
📦 代码变更: 查看 Diff
1. 问题陈述
1.1 系统拓扑
| 系统 | 角色 | 数据源 | 持久化 |
|---|---|---|---|
| PostgreSQL | 资金账户 (Funding) | balances_tb | ACID, 持久化 |
| UBSCore | 交易账户 (Trading) | RAM | WAL + 易失性 |
1.2 核心约束
这两个系统 无法共享事务。没有 XA/2PC 数据库协议。 因此:我们必须使用外部 FSM 协调器构建自己的两阶段提交。
1.5 安全前置检查 (MANDATORY)
Caution
纵深防御 (Defense-in-Depth) 以下所有检查必须在 每一个独立模块 中执行,不仅仅是 API 层。
- API 层: 第一道防线,拒绝明显非法请求
- Coordinator: 再次验证,防止内部调用绕过 API
- Adapters: 最终防线,每个适配器必须独立验证参数
- UBSCore: 内存操作前最后一次检查
安全 > 性能。重复检查的开销可以接受,安全漏洞不可接受。
1.5.1 身份与授权检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 用户认证 | 伪造请求 | JWT/Session 必须有效 | UNAUTHORIZED |
| 用户 ID 一致性 | 跨用户转账攻击 | request.user_id == auth.user_id | FORBIDDEN |
| 账户归属 | 转走他人资金 | 源/目标账户都属于同一 user_id | FORBIDDEN |
1.5.2 账户类型检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| from != to | 无限刷单/浪费资源 | request.from != request.to | SAME_ACCOUNT |
| 账户类型有效 | 注入无效类型 | from, to ∈ {FUNDING, SPOT} | INVALID_ACCOUNT_TYPE |
| 账户类型支持 | 请求未上线功能 | from, to 都在支持列表中 | UNSUPPORTED_ACCOUNT_TYPE |
1.5.3 金额检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| amount > 0 | 零/负数转账 | amount > 0 | INVALID_AMOUNT |
| 精度检查 | 精度溢出 | decimal_places(amount) <= asset.precision | PRECISION_OVERFLOW |
| 最小金额 | 微额攻击/粉尘攻击 | amount >= asset.min_transfer_amount | AMOUNT_TOO_SMALL |
| 最大单笔金额 | 风控绕过 | amount <= asset.max_transfer_amount | AMOUNT_TOO_LARGE |
| 整数溢出 | u64 溢出攻击 | amount <= u64::MAX / safety_factor | OVERFLOW |
1.5.4 资产检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 资产存在 | 伪造 asset_id | asset_id 在系统中存在 | INVALID_ASSET |
| 资产状态 | 已下架资产 | asset.status == ACTIVE | ASSET_SUSPENDED |
| 转账许可 | 某些资产禁止内部转账 | asset.internal_transfer_enabled == true | TRANSFER_NOT_ALLOWED |
1.5.5 账户状态检查
账户初始化规则(概述)
| 账户类型 | 初始化时机 | 备注 |
|---|---|---|
| FUNDING | 首次申请充值时创建 | 外部充值流程触发 |
| SPOT | 首次内部转账时创建 | 懒加载 (Lazy Init) |
| FUTURE | 首次内部转账时创建 [P2] | 懒加载 |
| MARGIN | 首次内部转账时创建 [P2] | 懒加载 |
Note
- 各账户类型的具体初始化行为和业务规则,请参见各账户类型的专用文档。
- 每个账户都有自己的状态定义(如是否允许划转),当前不详细定义。
- 默认状态:账户初始化时,默认允许划转。
账户状态检查表
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 源账户存在 | 不存在的账户 | 源账户记录必须存在 | SOURCE_ACCOUNT_NOT_FOUND |
| 目标账户存在/创建 | 不存在的目标 | FUNDING必须存在;SPOT/FUTURE/MARGIN可创建 | TARGET_ACCOUNT_NOT_FOUND (仅FUNDING) |
| 源账户未冻结 | 被冻结账户转出 | source.status != FROZEN | ACCOUNT_FROZEN |
| 源账户未禁用 | 被禁用账户操作 | source.status != DISABLED | ACCOUNT_DISABLED |
| 余额充足 | 余额不足直接拒绝 | source.available >= amount | INSUFFICIENT_BALANCE |
1.5.6 频率限制 (Rate Limiting) - [P2 未来优化]
Note
此部分为 V2 优化项,V1 可不实现。
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| 每秒请求数 | DoS 攻击 | user_requests_per_second <= 10 | RATE_LIMIT_EXCEEDED |
| 每日转账次数 | 滥用 | user_daily_transfers <= 100 | DAILY_LIMIT_EXCEEDED |
| 每日转账金额 | 大额风控 | user_daily_amount <= daily_limit | DAILY_AMOUNT_EXCEEDED |
1.5.7 幂等性检查
| 检查项 | 攻击向量 | 验证逻辑 | 错误码 |
|---|---|---|---|
| cid 唯一 | 重复提交 | 如提供 cid,检查是否已存在 | DUPLICATE_REQUEST (返回原结果) |
1.5.8 检查顺序 (推荐)
1. 身份认证 (JWT 有效?)
2. 授权检查 (user_id 匹配?)
3. 请求格式 (from/to/amount 有效?)
4. 账户类型 (from != to, 类型支持?)
5. 资产检查 (存在? 启用? 可转账?)
6. 金额检查 (范围? 精度? 溢出?)
7. 频率限制 (超限?)
8. 幂等性 (重复?)
9. 余额检查 (充足?) ← 最后检查,避免无谓查询
2. FSM 设计 (状态机)
2.0 库选择: rust-fsm
使用 rust-fsm 库,提供:
- ✅ 编译时验证 - 非法状态转换在编译时报错
- ✅ 声明式 DSL - 清晰定义状态和转换
- ✅ 类型安全 - 防止遗漏分支
Cargo.toml:
[dependencies]
rust-fsm = "0.7"
DSL 定义:
#![allow(unused)]
fn main() {
use rust_fsm::*;
state_machine! {
derive(Debug, Clone, Copy, PartialEq, Eq)
TransferFsm(Init) // 初始状态
// 状态定义
Init => {
SourceWithdrawOk => SourceDone,
SourceWithdrawFail => Failed,
},
SourceDone => {
TargetDepositOk => Committed,
TargetDepositFail => Compensating,
TargetDepositUnknown => SourceDone [loop], // 保持,无限重试
},
Compensating => {
RefundOk => RolledBack,
RefundFail => Compensating [loop], // 保持,无限重试
},
// 终态
Committed,
Failed,
RolledBack,
}
}
Note
上述 DSL 用于编译时验证状态转换的合法性。 实际运行时状态存储在 PostgreSQL,使用 CAS 更新。
2.0.1 核心状态流程图 (Top Level)
┌─────────────────────────────────────────────────────────┐
│ INTERNAL TRANSFER FSM │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────── 正常路径 (Happy Path) ──────────────────────────────────┐
│ │
│ ┌─────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ INIT │ 源扣减成功 ✓ │ SOURCE_DONE │ 目标入账成功 ✓ │ │ │
│ │(用户请求)│ ─────────────────▶ │ (资金在途) │ ─────────────────▶ │ COMMITTED │ │
│ └─────────┘ └─────────────┘ │ │ │
│ │ │ └───────────────┘ │
│ │ │ ✅ │
└────────│───────────────────────────────│───────────────────────────────────────────────┘
│ │
│ │
│ ▼
│ ╔══════════════════════════════════════════════════╗
│ ║ 🔒 ATOMIC COMMIT (原子提交) ║
│ ║ ║
│ ║ 当且仅当: ║
│ ║ FROM.withdraw = SUCCESS ✓ ║
│ ║ TO.deposit = SUCCESS ✓ ║
│ ║ ║
│ ║ 执行: CAS(SOURCE_DONE → COMMITTED) ║
│ ║ 此操作必须原子,不可中断 ║
│ ╚══════════════════════════════════════════════════╝
│ │
│ 源扣减失败 │ 目标入账失败 (明确 EXPLICIT_FAIL)
▼ ▼
┌──────────┐ ┌──────────────┐
│ FAILED │ │ COMPENSATING │◀───────────┐
│ (源失败) │ │ (退款中) │ │ 退款失败 (无限重试)
└──────────┘ └──────────────┘────────────┘
❌ │ 退款成功
▼
┌─────────────┐
│ ROLLED_BACK │
│ (已回滚) │
└─────────────┘
↩️
╔════════════════════════════════════════════════════════════════════════════════════════╗
║ ⚠️ 目标入账状态未知 (TIMEOUT/UNKNOWN) → 保持 SOURCE_DONE,无限重试,绝不进入 COMPENSATING║
╚════════════════════════════════════════════════════════════════════════════════════════╝
核心状态说明:
| 状态 | 资金位置 | 说明 |
|---|---|---|
INIT | 源账户 | 用户发起请求,资金尚未移动 |
SOURCE_DONE | 在途 | 关键点!资金已离开源,尚未到达目标 |
COMMITTED | 目标账户 | 终态,转账成功 |
FAILED | 源账户 | 终态,源扣减失败,无资金移动 |
COMPENSATING | 在途 | 目标入账失败,正在退款 |
ROLLED_BACK | 源账户 | 终态,退款成功 |
Important
SOURCE_DONE是最关键的状态 - 资金已离开源账户但尚未到达目标。 此时绝不能丢失状态,必须确保最终到达COMMITTED或ROLLED_BACK。
2.1 状态 (穷举)
| ID | 状态名 | 进入条件 | 终态? | 资金位置 |
|---|---|---|---|---|
| 0 | INIT | 用户请求已接受 | 否 | 源账户 |
| 10 | SOURCE_PENDING | CAS 成功,适配器调用已发起 | 否 | 源账户 (扣减中) |
| 20 | SOURCE_DONE | 源适配器返回 OK | 否 | 在途 |
| 30 | TARGET_PENDING | CAS 成功,目标适配器调用已发起 | 否 | 在途 (入账中) |
| 40 | COMMITTED | 目标适配器返回 OK | 是 | 目标账户 |
| -10 | FAILED | 源适配器返回 FAIL | 是 | 源账户 (未变) |
| -20 | COMPENSATING | 目标适配器 FAIL 且源可逆 | 否 | 在途 (退款中) |
| -30 | ROLLED_BACK | 源退款 OK | 是 | 源账户 (已恢复) |
2.2 State Transition Rules (Exhaustive)
┌───────────────────────────────────────────────────────────────────────────────┐
│                         Canonical State Transitions                           │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  INIT ──────[CAS OK]───────► SOURCE_PENDING                                   │
│                                    │                                          │
│                                    ├──[adapter OK]────► SOURCE_DONE           │
│                                    │                         │                │
│                                    └──[adapter FAIL]──► FAILED (terminal)     │
│                                                              │                │
│                     SOURCE_DONE ──[CAS OK]──► TARGET_PENDING                  │
│                                                              │                │
│                     ┌────────────────────────────────────────┤                │
│                     │                                        │                │
│               [adapter OK]                            [adapter FAIL]          │
│                     │                                        │                │
│                     ▼                                        ▼                │
│                COMMITTED                          ┌───────────────────┐       │
│                (terminal)                         │ Source reversible?│       │
│                                                   └─────────┬─────────┘       │
│                                                   Yes       │       No        │
│                                                    ▼        │        ▼        │
│                                              COMPENSATING   │  retry forever  │
│                                                    │        │  (stay in       │
│                                               [refund OK]   │  TARGET_        │
│                                                    ▼        │  PENDING)       │
│                                              ROLLED_BACK    │                 │
│                                               (terminal)    │                 │
│                                                             │                 │
└─────────────────────────────────────────────────────────────┴─────────────────┘
2.3 Reversibility Rules (Critical)
Core principle: an operation may only be safely undone when the adapter returned a well-defined failure.
| Response Type | Meaning | Safe to Undo? | Handling |
|---|---|---|---|
| SUCCESS | Operation succeeded | N/A | Proceed to the next step |
| EXPLICIT_FAIL | Definite business failure (e.g. insufficient balance) | Yes | May enter COMPENSATING |
| TIMEOUT | Timed out, state unknown | No | Retry forever |
| PENDING | In progress, state unknown | No | Retry forever |
| NETWORK_ERROR | Network error, state unknown | No | Retry forever |
| UNKNOWN | Anything else | No | Retry forever or manual intervention |
Caution
Only EXPLICIT_FAIL can be safely undone. In any unknown-state case (timeout, pending, network error) the funds are In-Flight and we cannot know whether the counterparty has processed the request. Undoing blindly leads to a double spend or lost funds. The only safe actions are infinite retry or manual intervention.
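The reversibility rule above reduces to a single predicate on the adapter's response type. A minimal sketch (the enum and method names are illustrative, not the project's actual API):

```rust
/// Sketch of an adapter response and the reversibility rule:
/// only a well-defined business failure may trigger compensation.
#[derive(Debug, PartialEq)]
enum AdapterResponse {
    Success,
    ExplicitFail(String), // definite business failure with a reason
    Timeout,
    Pending,
    NetworkError,
    Unknown,
}

impl AdapterResponse {
    /// Every unknown-state outcome must be retried, never compensated.
    fn can_safely_compensate(&self) -> bool {
        matches!(self, AdapterResponse::ExplicitFail(_))
    }
}

fn main() {
    assert!(AdapterResponse::ExplicitFail("INSUFFICIENT_BALANCE".into()).can_safely_compensate());
    assert!(!AdapterResponse::Timeout.can_safely_compensate());
    assert!(!AdapterResponse::Pending.can_safely_compensate());
    assert!(!AdapterResponse::Unknown.can_safely_compensate());
}
```

Funneling every "not sure" outcome through one conservative predicate is what keeps a timeout from ever being treated as a failure.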
3. Transfer Scenarios (Step by Step)
3.1 Scenario A: Funding → Trading (deposit into the trading account)
Happy path:
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Funding account |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Funding account |
| 3 | Coordinator | Call FundingAdapter.withdraw(req_id) | - | - | - |
| 4 | PG | UPDATE balances SET amount = amount - X | - | - | Debited |
| 5 | Coordinator | On OK: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In flight |
| 7 | Coordinator | Call TradingAdapter.deposit(req_id) | - | - | - |
| 8 | UBSCore | Credit RAM balance, write WAL, emit event | - | - | Credited |
| 9 | Coordinator | On event: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Trading account |
Failure path (target returns EXPLICIT_FAIL):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7' | Coordinator | Call TradingAdapter.deposit(req_id) → EXPLICIT_FAIL | TARGET_PENDING | - | In flight |
| 8' | Coordinator | Check: source = Funding account (reversible) | - | - | - |
| 9' | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In flight |
| 10' | Coordinator | Call FundingAdapter.refund(req_id) | - | - | - |
| 11' | PG | UPDATE balances SET amount = amount + X | - | - | Refunded |
| 12' | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Funding account |
3.2 Scenario B: Trading → Funding (withdraw from the trading account)
Happy path:
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 1 | API | Validate, create record | - | INIT | Trading account |
| 2 | Coordinator | CAS(INIT → SOURCE_PENDING) | INIT | SOURCE_PENDING | Trading account |
| 3 | Coordinator | Call TradingAdapter.withdraw(req_id) | - | - | - |
| 4 | UBSCore | Check balance, debit RAM, write WAL, emit event | - | - | Debited |
| 5 | Coordinator | On event: CAS(SOURCE_PENDING → SOURCE_DONE) | SOURCE_PENDING | SOURCE_DONE | In flight |
| 6 | Coordinator | CAS(SOURCE_DONE → TARGET_PENDING) | SOURCE_DONE | TARGET_PENDING | In flight |
| 7 | Coordinator | Call FundingAdapter.deposit(req_id) | - | - | - |
| 8 | PG | INSERT ... ON CONFLICT UPDATE SET amount = amount + X | - | - | Credited |
| 9 | Coordinator | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding account |
Failure path (target fails explicitly):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7a | Coordinator | Call FundingAdapter.deposit(req_id) → EXPLICIT_FAIL (e.g. constraint violation) | TARGET_PENDING | - | In flight |
| 8a | Coordinator | Check response type = EXPLICIT_FAIL (safe to undo) | - | - | - |
| 9a | Coordinator | CAS(TARGET_PENDING → COMPENSATING) | TARGET_PENDING | COMPENSATING | In flight |
| 10a | Coordinator | Call TradingAdapter.refund(req_id) (refund into UBSCore) | - | - | - |
| 11a | UBSCore | Credit RAM balance, write WAL | - | - | Refunded |
| 12a | Coordinator | CAS(COMPENSATING → ROLLED_BACK) | COMPENSATING | ROLLED_BACK | Trading account |
Failure path (target state unknown):
| Step | Actor | Action | State Before | State After | Funds |
|---|---|---|---|---|---|
| 7b | Coordinator | Call FundingAdapter.deposit(req_id) → TIMEOUT/UNKNOWN | TARGET_PENDING | - | In flight |
| 8b | Coordinator | Check response type = UNKNOWN (NOT safe to undo) | - | - | - |
| 9b | Coordinator | No state transition. Stay in TARGET_PENDING. | TARGET_PENDING | TARGET_PENDING | In flight |
| 10b | Coordinator | Log CRITICAL. Alert ops. Schedule a retry. | - | - | - |
| 11b | Recovery worker | Retry FundingAdapter.deposit(req_id) indefinitely. | - | - | - |
| 12b | (eventually) | On OK: CAS(TARGET_PENDING → COMMITTED) | TARGET_PENDING | COMMITTED | Funding account |
Warning
Only when the target returns EXPLICIT_FAIL may the transfer enter COMPENSATING. On a timeout or unknown state the funds are In-Flight; retry forever or escalate to manual intervention.
4. Failure Modes and Effects Analysis (FMEA)
4.1 Phase 1 Failures (source operation)
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Adapter returns FAIL | Insufficient balance, DB constraint | SOURCE_PENDING | Source account | Move to FAILED. User sees the error. |
| Adapter returns PENDING | Timeout, network issue | SOURCE_PENDING | Unknown | Retry. The adapter must be idempotent. |
| Coordinator crashes after CAS, before the call | Process killed | SOURCE_PENDING | Source account | Recovery worker retries the call. |
| Coordinator crashes after the call, before the result | Process killed | SOURCE_PENDING | Unknown | Recovery worker retries (idempotent). |
4.2 Phase 2 Failures (target operation)
| Failure | Cause | Response Type | Current State | Funds | Resolution |
|---|---|---|---|---|---|
| Target explicitly rejects | Business rule | EXPLICIT_FAIL | TARGET_PENDING | In flight | COMPENSATING → refund. |
| Timeout | Network latency | TIMEOUT | TARGET_PENDING | Unknown | Retry forever. |
| Network error | Connection dropped | NETWORK_ERROR | TARGET_PENDING | Unknown | Retry forever. |
| Unknown error | System exception | UNKNOWN | TARGET_PENDING | Unknown | Retry forever or manual intervention. |
| Coordinator crash | Process killed | N/A | TARGET_PENDING | In flight | Recovery worker retries. |
4.3 Compensation Failures
| Failure | Cause | Current State | Funds | Resolution |
|---|---|---|---|---|
| Refund FAIL | PG down, constraint | COMPENSATING | In flight | Retry forever. Funds are stuck until PG recovers. |
| Refund PENDING | Timeout | COMPENSATING | Unknown | Retry. |
5. Idempotency Requirements (Mandatory)
5.1 Why Idempotency?
Retries are the foundation of crash recovery. Without idempotency, a retry causes double execution (double debit, double credit).
5.2 Implementation (Funding Adapter)
Requirement: for a given req_id, calling withdraw() or deposit() multiple times must have the same effect as calling it once.
Mechanism:
- transfers_tb has UNIQUE(req_id).
- Atomic transaction:
BEGIN;
-- Check whether this req_id was already processed
SELECT state FROM transfers_tb WHERE req_id = $1;
IF state >= expected_post_state THEN RETURN 'AlreadyProcessed'; END IF;
-- Apply the balance update
UPDATE balances_tb SET amount = amount - $2
    WHERE user_id = $3 AND asset_id = $4 AND amount >= $2;
IF NOT FOUND THEN RETURN 'InsufficientBalance'; END IF;
-- Update the state
UPDATE transfers_tb SET state = $new_state, updated_at = NOW() WHERE req_id = $1;
COMMIT;
RETURN 'Success';
5.3 Implementation (Trading Adapter)
Requirement: same as above. UBSCore must reject duplicate req_ids.
Mechanism:
- InternalOrder carries a req_id field (or cid).
- UBSCore maintains a ProcessedTransferSet (an in-RAM HashSet, rebuilt from the WAL on restart).
- On receiving a transfer order:
IF req_id IN ProcessedTransferSet THEN
    RETURN 'AlreadyProcessed'  (success, no-op)
ELSE
    ProcessTransfer()
    ProcessedTransferSet.insert(req_id)
    WriteWAL(TransferEvent)
    RETURN 'Success'
END IF
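The dedup logic above can be sketched in a few lines of Rust. This is illustrative only: `TransferProcessor` and its fields are assumed names, and the balance mutation and WAL write are stubbed out.

```rust
use std::collections::HashSet;

/// Illustrative sketch of UBSCore's ProcessedTransferSet deduplication.
struct TransferProcessor {
    processed: HashSet<String>, // rebuilt from the WAL on restart
    applied: u32,               // stands in for the real balance mutation
}

impl TransferProcessor {
    fn new() -> Self {
        Self { processed: HashSet::new(), applied: 0 }
    }

    /// Returns "AlreadyProcessed" on a duplicate req_id; the balance
    /// mutation runs at most once no matter how often callers retry.
    fn handle(&mut self, req_id: &str) -> &'static str {
        if self.processed.contains(req_id) {
            return "AlreadyProcessed"; // success, no-op
        }
        self.applied += 1; // ProcessTransfer()
        self.processed.insert(req_id.to_string());
        // WriteWAL(TransferEvent) would go here
        "Success"
    }
}

fn main() {
    let mut p = TransferProcessor::new();
    assert_eq!(p.handle("01JFVQ2X8Z0Y1M3N4P5R6S7T8U"), "Success");
    assert_eq!(p.handle("01JFVQ2X8Z0Y1M3N4P5R6S7T8U"), "AlreadyProcessed");
    assert_eq!(p.applied, 1); // applied exactly once despite the retry
}
```

The set insert must be persisted (via the WAL) in the same logical step as the balance change, otherwise a crash between the two reopens the double-execution window.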
6. Recovery Worker (Zombie Handler)
6.1 Purpose
On coordinator startup (and periodically), scan for "stuck" transfers and resume them.
6.2 Query
SELECT * FROM transfers_tb
WHERE state IN (0, 10, 20, 30, -20) -- INIT, SOURCE_PENDING, SOURCE_DONE, TARGET_PENDING, COMPENSATING
AND updated_at < NOW() - INTERVAL '1 minute'; -- staleness threshold
6.3 Recovery Logic
| Current State | Action |
|---|---|
| INIT | Call step() (moves to SOURCE_PENDING). |
| SOURCE_PENDING | Retry Source.withdraw(). |
| SOURCE_DONE | Call step() (moves to TARGET_PENDING). |
| TARGET_PENDING | Retry Target.deposit(). Apply the reversibility rules. |
| COMPENSATING | Retry Source.refund(). |
7. Data Model
7.1 Table: transfers_tb
CREATE TABLE transfers_tb (
transfer_id BIGSERIAL PRIMARY KEY,
req_id VARCHAR(26) UNIQUE NOT NULL, -- server-generated unique ID (ULID)
cid VARCHAR(64) UNIQUE, -- client idempotency key (optional)
user_id BIGINT NOT NULL,
asset_id INTEGER NOT NULL,
amount DECIMAL(30, 8) NOT NULL,
transfer_type SMALLINT NOT NULL, -- 1 = Funding->Trading, 2 = Trading->Funding
source_type SMALLINT NOT NULL, -- 1 = Funding, 2 = Trading
state SMALLINT NOT NULL DEFAULT 0, -- FSM state ID
error_message TEXT, -- last error (for debugging)
retry_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_transfers_state ON transfers_tb(state) WHERE state NOT IN (40, -10, -30);
7.2 Invariant Check
Run periodically to detect data corruption:
-- For each (user, asset): funding + trading + in-flight must be a constant
-- in-flight = SUM(amount) WHERE state IN (SOURCE_DONE, TARGET_PENDING, COMPENSATING)
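The invariant is simply that the three buckets always sum to the same total at every checkpoint of a transfer. A minimal sketch in scaled integer units (the function name is illustrative):

```rust
/// Conservation invariant: for each (user, asset),
/// funding + trading + in-flight must stay constant across a transfer.
fn total(funding: u64, trading: u64, in_flight: u64) -> u64 {
    funding + trading + in_flight
}

fn main() {
    // Before: 1000 units sitting in the funding account.
    let before = total(1000, 0, 0);
    // Mid-transfer (SOURCE_DONE): 50 left funding and is in flight.
    let mid = total(950, 0, 50);
    // After COMMITTED: the 50 arrived in the trading account.
    let after = total(950, 50, 0);
    assert_eq!(before, mid);
    assert_eq!(mid, after);
}
```

A checker that evaluates this sum per (user, asset) and alarms on any drift catches both lost funds and funds created out of thin air.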
8. API Contract
8.1 Endpoint: POST /api/v1/internal_transfer
Request:
{
"from": "SPOT", // source account type
"to": "FUNDING", // target account type
"asset": "USDT",
"amount": "100.00"
}
Account type enum (AccountType):
| Value | Meaning | Status |
|---|---|---|
| FUNDING | Funding account (PostgreSQL) | Supported |
| SPOT | Spot trading account (UBSCore) | Supported |
| FUTURE | Futures account | Future extension |
| MARGIN | Margin account | Future extension |
Response:
{
"transfer_id": 12345,
"req_id": "01JFVQ2X8Z0Y1M3N4P5R6S7T8U", // server-generated (ULID)
"from": "SPOT",
"to": "FUNDING",
"state": "COMMITTED", // or "PENDING" if asynchronous
"message": "Transfer succeeded"
}
8.2 Query Endpoint: GET /api/v1/internal_transfer/:req_id
Response:
{
"transfer_id": 12345,
"req_id": "sr-1734912345678901234",
"from": "SPOT",
"to": "FUNDING",
"asset": "USDT",
"amount": "100.00",
"state": "COMMITTED",
"created_at": "2024-12-23T14:00:00Z",
"updated_at": "2024-12-23T14:00:01Z"
}
Important
req_id is generated by the server, not the client. A client that needs idempotency should use the optional cid (client_order_id) field; the server detects duplicates and returns the existing result.
Error codes:
| Code | Meaning |
|---|---|
| INSUFFICIENT_BALANCE | Source account balance < amount. |
| INVALID_ACCOUNT_TYPE | from or to is an invalid or unsupported account type. |
| SAME_ACCOUNT | from and to are identical. |
| DUPLICATE_REQUEST | cid already processed. Returns the original result. |
| INVALID_AMOUNT | Amount <= 0 or exceeds the asset's precision. |
| SYSTEM_ERROR | Internal failure. Retry recommended. |
9. Implementation Pseudocode (Key State Checks)
9.1 API Layer
function handle_transfer_request(request, auth_context):
    // ========== Defense in Depth, Layer 1: API ==========
    // 1. Authentication
    if !auth_context.is_valid():
        return Error(UNAUTHORIZED)
    // 2. User ID consistency (prevents cross-user attacks)
    if request.user_id != auth_context.user_id:
        return Error(FORBIDDEN, "User ID mismatch")
    // 3. Account type checks
    if request.from == request.to:
        return Error(SAME_ACCOUNT)
    if request.from NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    if request.to NOT IN [FUNDING, SPOT]:
        return Error(INVALID_ACCOUNT_TYPE)
    // 4. Amount checks
    if request.amount <= 0:
        return Error(INVALID_AMOUNT)
    if decimal_places(request.amount) > asset.precision:
        return Error(PRECISION_OVERFLOW)
    // 5. Idempotency check
    if request.cid:
        existing = db.find_by_cid(request.cid)
        if existing:
            return Success(existing)  // return the existing result
    // 6. Asset check
    asset = db.get_asset(request.asset_id)
    if !asset or asset.status != ACTIVE:
        return Error(INVALID_ASSET)
    // 7. Hand off to the Coordinator
    result = coordinator.create_and_execute(request)
    return result
9.2 Coordinator Layer
function create_and_execute(request):
    // ========== Defense in Depth, Layer 2: Coordinator ==========
    // Validate again (guards against internal calls bypassing the API)
    ASSERT request.from != request.to
    ASSERT request.amount > 0
    ASSERT request.user_id > 0
    // Generate a unique ID
    req_id = ulid.new()
    // Create the transfer record (state = INIT)
    transfer = TransferRecord {
        req_id: req_id,
        user_id: request.user_id,
        from: request.from,
        to: request.to,
        asset_id: request.asset_id,
        amount: request.amount,
        state: INIT,
        created_at: now()
    }
    db.insert(transfer)
    log.info("Transfer created", req_id)
    // Drive the FSM
    return execute_fsm(req_id)
function execute_fsm(req_id):
    loop:
        transfer = db.get(req_id)
        if transfer.state.is_terminal():
            return transfer
        new_state = step(transfer)
        if new_state == transfer.state:
            // no progress; wait, then retry
            sleep(RETRY_INTERVAL)
            continue
function step(transfer):
    match transfer.state:
        INIT:
            return step_init(transfer)
        SOURCE_PENDING:
            return step_source_pending(transfer)
        SOURCE_DONE:
            return step_source_done(transfer)
        TARGET_PENDING:
            return step_target_pending(transfer)
        COMPENSATING:
            return step_compensating(transfer)
        _:
            return transfer.state  // terminal; nothing to do
function step_init(transfer):
    // CAS: persist the state transition before calling the adapter (Persist-Before-Call)
    success = db.cas_update(
        req_id = transfer.req_id,
        old_state = INIT,
        new_state = SOURCE_PENDING
    )
    if !success:
        // concurrent conflict; re-read
        return db.get(transfer.req_id).state
    // Resolve the source adapter
    source_adapter = get_adapter(transfer.from)
    // ========== Defense in Depth, Layer 3: Adapter ==========
    result = source_adapter.withdraw(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, SOURCE_PENDING, SOURCE_DONE)
            return SOURCE_DONE
        EXPLICIT_FAIL(reason):
            // definite failure; safe to terminate
            db.update_with_error(transfer.req_id, SOURCE_PENDING, FAILED, reason)
            return FAILED
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // state unknown; stay in SOURCE_PENDING and wait for a retry
            log.warn("Source withdraw unknown state", transfer.req_id)
            return SOURCE_PENDING
function step_source_done(transfer):
    // ========== Entering SOURCE_DONE: funds are in flight; a terminal state must be guaranteed ==========
    // CAS to TARGET_PENDING
    success = db.cas_update(transfer.req_id, SOURCE_DONE, TARGET_PENDING)
    if !success:
        return db.get(transfer.req_id).state
    // Resolve the target adapter
    target_adapter = get_adapter(transfer.to)
    // ========== Defense in Depth, Layer 4: Target Adapter ==========
    result = target_adapter.deposit(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            // ╔════════════════════════════════════════════════════════════════╗
            // ║ 🔒 ATOMIC COMMIT - the most critical step!                     ║
            // ║                                                                ║
            // ║ At this point:                                                 ║
            // ║   FROM.withdraw = SUCCESS ✓ (confirmed earlier)                ║
            // ║   TO.deposit    = SUCCESS ✓ (just confirmed)                   ║
            // ║                                                                ║
            // ║ Perform the atomic CAS commit:                                 ║
            // ║   CAS(TARGET_PENDING → COMMITTED)                              ║
            // ║                                                                ║
            // ║ This CAS is the final confirmation; once it succeeds,          ║
            // ║ the transfer is irreversible!                                  ║
            // ╚════════════════════════════════════════════════════════════════╝
            commit_success = db.cas_update(transfer.req_id, TARGET_PENDING, COMMITTED)
            if !commit_success:
                // extremely rare: another worker already committed; return the current state
                return db.get(transfer.req_id).state
            log.info("🔒 ATOMIC COMMIT SUCCESS", transfer.req_id)
            return COMMITTED
        EXPLICIT_FAIL(reason):
            // definite failure; compensation is allowed
            db.update_with_error(transfer.req_id, TARGET_PENDING, COMPENSATING, reason)
            return COMPENSATING
        TIMEOUT | PENDING | NETWORK_ERROR | UNKNOWN:
            // ========== Critical: state unknown - must NOT compensate! ==========
            log.critical("Target deposit unknown state - INFINITE RETRY", transfer.req_id)
            alert_ops("Transfer stuck in TARGET_PENDING", transfer.req_id)
            return TARGET_PENDING  // hold the state; wait for a retry
function step_compensating(transfer):
    source_adapter = get_adapter(transfer.from)
    result = source_adapter.refund(
        req_id = transfer.req_id,
        user_id = transfer.user_id,
        asset_id = transfer.asset_id,
        amount = transfer.amount
    )
    match result:
        SUCCESS:
            db.cas_update(transfer.req_id, COMPENSATING, ROLLED_BACK)
            log.info("Transfer rolled back", transfer.req_id)
            return ROLLED_BACK
        _:
            // refund failed; must retry forever
            log.critical("Refund failed - MUST RETRY", transfer.req_id)
            return COMPENSATING
9.3 Adapter Layer (example: Funding Adapter)
function withdraw(req_id, user_id, asset_id, amount):
    // ========== Defense in Depth, Layer 3: checks inside the Adapter ==========
    // Re-validate parameters (never trust the caller)
    ASSERT amount > 0
    ASSERT user_id > 0
    ASSERT asset_id > 0
    // Idempotency check
    existing = db.find_transfer_operation(req_id, "WITHDRAW")
    if existing:
        return existing.result  // return the previously processed result
    // Begin transaction
    tx = db.begin_transaction()
    try:
        // Fetch and lock the account row
        account = tx.select_for_update(
            "SELECT * FROM balances_tb WHERE user_id = ? AND asset_id = ? AND account_type = 'FUNDING'"
        )
        if !account:
            tx.rollback()
            return EXPLICIT_FAIL("SOURCE_ACCOUNT_NOT_FOUND")
        if account.status == FROZEN:
            tx.rollback()
            return EXPLICIT_FAIL("ACCOUNT_FROZEN")
        if account.available < amount:
            tx.rollback()
            return EXPLICIT_FAIL("INSUFFICIENT_BALANCE")
        // Apply the debit
        tx.update("UPDATE balances_tb SET available = available - ? WHERE id = ?", amount, account.id)
        // Record the operation (for idempotency)
        tx.insert("INSERT INTO transfer_operations (req_id, op_type, result) VALUES (?, 'WITHDRAW', 'SUCCESS')")
        tx.commit()
        return SUCCESS
    catch Exception as e:
        tx.rollback()
        log.error("Withdraw failed", req_id, e)
        return UNKNOWN  // unsure whether it executed; must retry
10. Acceptance Test Plan (Safety-Critical)
Caution
All of the following tests must pass before going live. Any failure can mean funds are stolen, vanish, or are created out of thin air.
10.1 Conservation of Funds
| Test ID | Scenario | Expected Result | Verification |
|---|---|---|---|
| INV-001 | After a normal transfer | Total funds = pre-transfer | SUM(source) + SUM(target) = constant |
| INV-002 | After a failed transfer | Total funds = pre-transfer | Source balance unchanged |
| INV-003 | After a rollback | Total funds = pre-transfer | Source balance fully restored |
| INV-004 | After crash recovery | Total funds = pre-crash | Walk all accounts and verify |
10.2 External Attack Tests
| Test ID | Attack Vector | Test Steps | Expected Result |
|---|---|---|---|
| ATK-001 | Cross-user transfer | Use user A's token to move user B's funds | FORBIDDEN |
| ATK-002 | user_id tampering | Modify user_id in the request body | FORBIDDEN |
| ATK-003 | Negative amount | amount = -100 | INVALID_AMOUNT |
| ATK-004 | Zero amount | amount = 0 | INVALID_AMOUNT |
| ATK-005 | Over-precision amount | amount = 0.000000001 (more than 8 decimals) | PRECISION_OVERFLOW |
| ATK-006 | Integer overflow | amount = u64::MAX + 1 | OVERFLOW or parse failure |
| ATK-007 | Same account | from = to = SPOT | SAME_ACCOUNT |
| ATK-008 | Invalid account type | from = "INVALID" | INVALID_ACCOUNT_TYPE |
| ATK-009 | Nonexistent asset | asset_id = 999999 | INVALID_ASSET |
| ATK-010 | Duplicate cid | Send the same cid twice | Second call returns the first result |
| ATK-011 | No token | No Authorization header | UNAUTHORIZED |
| ATK-012 | Expired token | Use an expired JWT | UNAUTHORIZED |
| ATK-013 | Forged token | Use a JWT with an invalid signature | UNAUTHORIZED |
10.3 Insufficient Balance Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| BAL-001 | Amount > available balance | INSUFFICIENT_BALANCE, balance unchanged |
| BAL-002 | Amount = available balance | Success, balance becomes 0 |
| BAL-003 | Concurrent: two transfers summing > balance | One succeeds, one INSUFFICIENT_BALANCE |
| BAL-004 | Transfer out of a frozen account | ACCOUNT_FROZEN |
| BAL-005 | Transfer out of a disabled account | ACCOUNT_DISABLED |
10.4 FSM Transition Tests
| Test ID | Scenario | Expected State Flow |
|---|---|---|
| FSM-001 | Normal Funding→Spot | INIT → SOURCE_PENDING → SOURCE_DONE → TARGET_PENDING → COMMITTED |
| FSM-002 | Normal Spot→Funding | Same as above |
| FSM-003 | Source failure | INIT → SOURCE_PENDING → FAILED |
| FSM-004 | Target failure (explicit) | … → TARGET_PENDING → COMPENSATING → ROLLED_BACK |
| FSM-005 | Target timeout | … → TARGET_PENDING (held; retry forever) |
| FSM-006 | Compensation failure | COMPENSATING (held; retry forever) |
10.5 Crash Recovery Tests
| Test ID | Crash Point | Expected Recovery Behavior |
|---|---|---|
| CRA-001 | After INIT, before SOURCE_PENDING | Recovery reads INIT, re-runs step_init |
| CRA-002 | In SOURCE_PENDING, before the adapter call | Recovery retries withdraw (idempotent) |
| CRA-003 | In SOURCE_PENDING, after the adapter call | Recovery retries withdraw (idempotent; returns already-processed) |
| CRA-004 | After SOURCE_DONE, before TARGET_PENDING | Recovery continues step_source_done |
| CRA-005 | In TARGET_PENDING | Recovery retries deposit (idempotent) |
| CRA-006 | In COMPENSATING | Recovery retries refund (idempotent) |
10.6 Concurrency / Race Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| CON-001 | Multiple workers handle the same req_id | Exactly one CAS succeeds; the rest skip |
| CON-002 | Two identical-amount transfers at once | Two independent req_ids, each executes |
| CON-003 | Transfer concurrent with an external withdrawal | Only operations with sufficient balance succeed |
| CON-004 | Lock-free balance reads | No double debit (SELECT FOR UPDATE) |
10.7 Idempotency Tests
| Test ID | Scenario | Expected Result |
|---|---|---|
| IDP-001 | Call withdraw twice with one req_id | Second returns SUCCESS, balance debited once |
| IDP-002 | Call deposit twice with one req_id | Second returns SUCCESS, balance credited once |
| IDP-003 | Call refund twice with one req_id | Second returns SUCCESS, balance credited once |
| IDP-004 | Recovery retries the same transfer repeatedly | Final state consistent, balances correct |
10.8 Fund Anomaly Tests (Most Critical)
| Test ID | Threat | Test Method | Verification |
|---|---|---|---|
| FND-001 | Double spend | Debit the source twice | Debited only once (idempotent) |
| FND-002 | Vanishing funds | Source debit OK, target fails, no compensation | Must compensate or retry forever |
| FND-003 | Funds out of thin air | Credit the target twice | Credited only once (idempotent) |
| FND-004 | Loss on a mid-flight crash | Crash at any point | Recovery restores integrity |
| FND-005 | Inconsistent state | SOURCE_DONE but DB not updated | WAL + idempotency keep it consistent |
| FND-006 | Partial commit | PG transaction partially succeeds | Atomic transaction: all or nothing |
10.9 Monitoring & Alerting Tests
| Test ID | Scenario | Expected Alert |
|---|---|---|
| MON-001 | Transfer stuck in TARGET_PENDING > 1 minute | CRITICAL alert |
| MON-002 | Compensation fails 3 times in a row | CRITICAL alert |
| MON-003 | Conservation-of-funds check fails | CRITICAL alert + halt service |
| MON-004 | Abnormal per-user transfer frequency | WARNING alert [P2] |
📋 Implementation & Verification | 实现与验证
本章的完整实现细节、API 说明、E2E 测试脚本和验证结果请参阅:
For complete implementation details, API documentation, E2E test scripts, and verification results:
👉 Phase 0x0B-a: Implementation & Testing Guide
包含 / Includes:
- 架构实现与核心模块 (Architecture & Core Modules)
- 新增 API 端点 (New API Endpoints)
- 可复用 E2E 测试脚本 (Reusable E2E Test Script)
- 数据库验证方法 (Database Verification)
- 已修复 Bug 清单 (Fixed Bugs)
Internal Transfer E2E Testing Guide
概述 / Overview
本文档描述了 Phase 0x0B-a 内部转账功能的完成工作、实现细节和端到端测试方法。
This document describes the completed work, implementation details, and end-to-end testing methodology for Phase 0x0B-a Internal Transfer feature.
本章完成工作 / Chapter Deliverables
架构实现 / Architecture Implementation
Implemented the 2-Phase Commit FSM for cross-system fund transfers:
┌─────────────────┐
│ TransferAPI │ Gateway 层
└────────┬────────┘
│
┌────────▼────────┐
│ TransferCoord. │ FSM 协调器
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌────────▼────────┐ ┌───────▼───────┐ ┌───────▼───────┐
│ FundingAdapter │ │ TradingAdapter│ │ TransferDb │
│ (PostgreSQL) │ │ (UBSCore) │ │ (FSM State) │
└─────────────────┘ └───────────────┘ └───────────────┘
核心模块 / Core Modules
| 模块 / Module | 文件 / File | 功能 / Function |
|---|---|---|
| TransferCoordinator | src/transfer/coordinator.rs | FSM 状态机驱动 State machine driver |
| FundingAdapter | src/transfer/adapters/funding.rs | PostgreSQL 资金操作 PostgreSQL balance ops |
| TradingAdapter | src/transfer/adapters/trading.rs | UBSCore 通道通信 UBSCore channel comm |
| TransferDb | src/transfer/db.rs | FSM 状态持久化 FSM state persistence |
| TransferChannel | src/transfer/channel.rs | 跨线程通信 Cross-thread messaging |
新增 API / New APIs
| Endpoint | Method | 描述 / Description |
|---|---|---|
| /api/v1/private/transfer | POST | Create an internal transfer |
| /api/v1/private/transfer/{req_id} | GET | Query transfer status |
| /api/v1/private/balances/all | GET | Query balances across all accounts |
数据库表 / Database Tables
| 表 / Table | 用途 / Purpose |
|---|---|
| fsm_transfers_tb | FSM transfer state records |
| transfer_operations_tb | Idempotent operation tracking |
| balances_tb | Account balances (Funding/Spot) |
交付物 / Deliverables
- ✅ Complete FSM implementation (Init → SourcePending → SourceDone → TargetPending → Committed)
- ✅ Bidirectional transfer verification (Funding ↔ Spot)
- ✅ Reusable E2E test script
- ✅ `/balances/all` balance query API
- ✅ 232 unit tests passing
测试脚本 / Test Script
自动化 E2E 测试 / Automated E2E Test
# Run the full E2E test (starts the Gateway automatically)
./scripts/test_transfer_e2e.sh
Script location: scripts/test_transfer_e2e.sh
测试流程 / Test Flow
[1/6] Prerequisites Check
✓ PostgreSQL connected (port 5433)
✓ Release binary ready
[2/6] Setup Test Data
- Enable CAN_INTERNAL_TRANSFER for USDT
- Create 1000 USDT in Funding for user 1001
- Clear previous transfer records
[3/6] Start Gateway
- Stop existing Gateway (pgrep + kill)
- Start new Gateway with updated config
- Wait for health check
[4/6] Run Transfer Tests
- Funding → Spot (50 USDT)
- Spot → Funding (25 USDT)
- Verify both COMMITTED
[5/6] Verify Balance Changes
- Check Funding: 1000 → 975 (Δ-25)
- Use /balances/all API
[6/6] Cleanup
- Stop Gateway
API 测试 / API Testing
使用 Python 客户端 / Using Python Client
import sys
sys.path.append('scripts/lib')
from api_auth import get_test_client
USER_ID = 1001
client = get_test_client(user_id=USER_ID)
headers = {'X-User-ID': str(USER_ID)}
# 1. 查询余额 / Query balances
resp = client.get('/api/v1/private/balances/all', headers=headers)
print(resp.json())
# 2. 发起转账 / Create transfer
resp = client.post('/api/v1/private/transfer',
json_body={
'from': 'funding',
'to': 'spot',
'asset': 'USDT',
'amount': '50'
},
headers=headers)
print(resp.json())
# 3. 查询转账状态 / Query transfer status
req_id = resp.json()['data']['req_id']
resp = client.get(f'/api/v1/private/transfer/{req_id}', headers=headers)
print(resp.json())
使用 curl / Using curl
# Query balances (requires a valid signature)
curl http://localhost:8080/api/v1/private/balances/all \
-H "X-API-Key: AK_0000000000001001" \
-H "X-Signature: ..." \
-H "X-User-ID: 1001"
数据库验证 / Database Verification
检查余额 / Check Balances
PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT
CASE account_type WHEN 1 THEN 'Spot' WHEN 2 THEN 'Funding' END as account,
(available / 1000000)::text || ' USDT' as balance
FROM balances_tb
WHERE user_id = 1001 AND asset_id = 2
ORDER BY account_type;
"
检查 FSM 状态 / Check FSM State
PGPASSWORD=trading123 psql -h localhost -p 5433 -U trading -d exchange_info_db -c "
SELECT req_id, amount, state, created_at
FROM fsm_transfers_tb
WHERE user_id = 1001
ORDER BY created_at DESC LIMIT 5;
"
State Values:
- 0: INIT
- 10: SOURCE_PENDING
- 20: SOURCE_DONE
- 30: TARGET_PENDING
- 40: COMMITTED ✅
- -10: FAILED
- -20: COMPENSATING
- -30: ROLLED_BACK
已修复的 Bug / Fixed Bugs
1. FSM 未执行 / FSM Not Executing
Problem: create_transfer_fsm only called coordinator.create() and never called coordinator.execute()
Fix: add the execute() call
#![allow(unused)]
fn main() {
// src/transfer/api.rs
let req_id = coordinator.create(core_req).await?;
let state = coordinator.execute(req_id).await?; // ← Added
}
2. 金额解析为 0 / Amount Parsed as 0
Problem: Decimal.to_string().parse::<u64>() fails on "50000000.00000000"
Fix: use trunc().to_i64()
#![allow(unused)]
fn main() {
// src/transfer/db.rs
let amount_u64 = amount.trunc().to_i64().unwrap_or(0) as u64;
}
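The same failure mode can be reproduced and fixed with only the standard library: `str::parse::<u64>` rejects any string containing a decimal point, so truncating at the `.` first (what `trunc().to_i64()` does for `rust_decimal`) recovers the integer part. A hypothetical helper, for illustration only:

```rust
/// Illustrative reproduction of the parse bug: truncate at the decimal
/// point before parsing. (The real fix uses rust_decimal's trunc().to_i64().)
fn parse_scaled_amount(s: &str) -> u64 {
    let int_part = s.split('.').next().unwrap_or("0");
    int_part.parse::<u64>().unwrap_or(0)
}

fn main() {
    // The original bug: a fractional string fails to parse as u64.
    assert_eq!("50000000.00000000".parse::<u64>().ok(), None);
    // Truncating first recovers the scaled integer amount.
    assert_eq!(parse_scaled_amount("50000000.00000000"), 50_000_000);
}
```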
3. 类型不匹配 / Type Mismatch
- `status` column: INT4 (i32), not INT2
- `decimals` column: INT2 (i16), not i32
测试结果示例 / Sample Test Output
==============================================
Internal Transfer E2E Test (Phase 0x0B-a)
==============================================
[1/6] Checking prerequisites...
✓ PostgreSQL connected
✓ Release binary ready
[2/6] Setting up test data...
✓ Test data initialized (1000 USDT in Funding only for user 1001)
[3/6] Starting Gateway...
✓ Gateway ready
[4/6] Running transfer tests with balance verification...
[BEFORE] Getting initial balances...
USDT:funding: 1000.00
[TRANSFER 1] Funding → Spot (50 USDT)...
✓ COMMITTED
[TRANSFER 2] Spot → Funding (25 USDT)...
✓ COMMITTED
[AFTER] Getting final Funding balance...
USDT:funding: 975.00
[VERIFY] Checking Funding balance changes...
✓ Funding: 1000.00 → 975.00 (Δ-25.00)
Results: 3 passed, 0 failed
[5/6] Final database state...
Funding | 975.0000000000000000 USDT
[6/6] Cleanup...
==============================================
✅ All E2E Transfer Tests PASSED
==============================================
相关文件 / Related Files
| 文件 / File | 描述 / Description |
|---|---|
| scripts/test_transfer_e2e.sh | E2E test script |
| scripts/lib/api_auth.py | API auth library |
| src/transfer/api.rs | Transfer API handlers |
| src/transfer/coordinator.rs | FSM coordinator |
| src/transfer/adapters/funding.rs | Funding adapter |
| src/transfer/adapters/trading.rs | Trading adapter |
Build & Verification Guide | 编译与验证事项
0x0C Trade Fee System | 交易手续费系统
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Overview
1.1 Connecting the Dots: From Transfer to Trading
In 0x0B, we built the FSM mechanism for fund transfers between Funding and Spot accounts. Once funds enter the Spot account, the exchange needs a revenue source.
This is the topic of this chapter: Trade Fee.
Whenever buyers and sellers execute trades, the exchange collects a percentage fee. This is the core business model of exchanges and the foundation for sustainable operations.
Design Philosophy: Fee implementation seems simple (just deducting a percentage, right?), but involves multiple key decisions:
- Where to configure fee rates? (Symbol level vs Global)
- Which asset to deduct from? (Paid vs Received)
- When to deduct? (In ME vs In Settlement)
- How to ensure precision? (u64 * bps / 10000 overflow issues)
1.2 Goal
Implement the Maker/Taker fee model for trade execution. Fees are the primary revenue source for exchanges.
1.3 Key Concepts
| Term | Definition |
|---|---|
| Maker | Order that adds liquidity (resting on orderbook) |
| Taker | Order that removes liquidity (matches immediately) |
| Fee Rate | Percentage of trade value charged |
| bps | Basis points (1 bps = 0.01% = 0.0001) |
1.4 Architecture Overview
┌─────────── Fee Model ────────────┐
│ │
│ Final Rate = Symbol.base_fee │
│ × VipDiscount / 100 │
└──────────────────────────────────┘
┌─────────── Data Flow ─────────────────────────────────────────────────────┐
│ │
│ ME ────▶ Trade{role} ────▶ UBSCore ────▶ BalanceEventBatch ────▶ TDengine
│ │ │ │ │
│ │ Memory: VIP/Fees ├── buyer event │
│ │ O(1) fee calc ├── seller event │
│ │ └── revenue event ×2 │
│ │ │
└──────────────┴────────────────────────────────────────────────────────────┘
┌─────────── Core Design ───────────┐
│ ✅ Fee from Gain → No reservation │
│ ✅ UBSCore billing → Balance auth │
│ ✅ Per-User Event → Decoupled │
│ ✅ Event Sourcing → Conservation │
└───────────────────────────────────┘
2. Fee Model Design
2.1 Why Maker/Taker Model?
Traditional stock exchanges use fixed rates, but crypto exchanges universally adopt the Maker/Taker model. This is not arbitrary:
| Problem | How Maker/Taker Solves |
|---|---|
| Low liquidity | Low Maker fees encourage limit orders |
| Price discovery | Deeper orderbook, narrower spreads |
| Fairness | Liquidity takers pay more |
Industry Practice: Binance, OKX, Bybit all use this model.
2.2 Fee Rate Architecture
Two-Layer System: Symbol base rate × VIP discount coefficient
Final Rate = Symbol.base_fee × VipDiscountTable[user.vip_level] / 100
Layer 1: Symbol Base Rate
Each trading pair defines its own base rate:
| Field | Precision | Default | Description |
|---|---|---|---|
| base_maker_fee | 10^6 | 1000 | 0.10% |
| base_taker_fee | 10^6 | 2000 | 0.20% |
Layer 2: VIP Discount Coefficient
VIP levels and discounts are configured from database (not hardcoded).
VIP Level Table Design:
| Field | Type | Description |
|---|---|---|
| level | SMALLINT PK | VIP level (0, 1, 2, …) |
| discount_percent | SMALLINT | Discount % (100 = no discount, 50 = 50% off) |
| min_volume | DECIMAL | Trading volume required for upgrade (optional) |
| description | VARCHAR | Level description (optional) |
Example Data:
| level | discount_percent | description |
|---|---|---|
| 0 | 100 | Normal |
| 1 | 90 | VIP 1 |
| 2 | 80 | VIP 2 |
| 3 | 70 | VIP 3 |
| … | … | … |
Operations can configure any number of VIP levels; code loads from database.
Example Calculation:
BTC_USDT: base_taker_fee = 2000 (0.20%)
User VIP 5: discount = 50%
Final Rate = 2000 × 50 / 100 = 1000 (0.10%)
Why 10^6 Precision?
- 10^4 (bps) only represents down to 0.01%, not fine enough
- 10^6 can represent 0.0001%, sufficient for VIP discounts and rebates
- Safe with u128 intermediate:
(amount as u128 * rate as u128 / 10^6) as u64
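The precision argument above can be made concrete. This sketch (function names are illustrative) applies the two-layer formula and shows why the u128 intermediate keeps `amount × rate` from overflowing u64:

```rust
/// Fee math at 10^6 rate precision with a u128 intermediate.
const RATE_SCALE: u128 = 1_000_000;

/// fee = amount × rate / 10^6, computed in u128 to avoid overflow.
fn calc_fee(amount: u64, rate: u64) -> u64 {
    ((amount as u128 * rate as u128) / RATE_SCALE) as u64
}

/// Final Rate = Symbol.base_fee × VipDiscountTable[level] / 100
fn final_rate(base_fee: u64, discount_percent: u64) -> u64 {
    base_fee * discount_percent / 100
}

fn main() {
    // BTC_USDT taker 0.20% (2000) with a 50% VIP discount → 0.10% (1000).
    assert_eq!(final_rate(2000, 50), 1000);
    // 100,000 USDT (scaled 10^6) at 0.10% = 100 USDT.
    assert_eq!(calc_fee(100_000_000_000, 1000), 100_000_000);
    // Near u64::MAX the u128 intermediate prevents overflow.
    assert_eq!(calc_fee(u64::MAX, 1_000_000), u64::MAX);
}
```

Doing `amount * rate` directly in u64 would overflow for any amount above u64::MAX / rate, which is why the widening cast is not optional.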
2.3 Fee Collection Point
Trade: Alice (Taker, BUY) ← → Bob (Maker, SELL)
Alice buys 1 BTC @ 100,000 USDT
┌──────────────────────────────────────────────────────────┐
│ Before Fee: │
│ Alice: -100,000 USDT, +1 BTC │
│ Bob: +100,000 USDT, -1 BTC │
├──────────────────────────────────────────────────────────┤
│ After Fee (deducted from RECEIVED asset): │
│ Alice (Taker 0.20%): -100,000 USDT, +0.998 BTC │
│ Bob (Maker 0.10%): +99,900 USDT, -1 BTC │
│ │
│ Exchange collects: 0.002 BTC + 100 USDT │
└──────────────────────────────────────────────────────────┘
Rule: Fee is always deducted from what you receive, not what you pay.
Why deduct from received asset?
- Simplify user mental accounting: User pays 100 USDT, it’s exactly 100 USDT
- Avoid budget overrun: Buying 1 BTC won’t require 100,020 USDT due to fees
- Industry practice: Binance, Coinbase all do this
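The Alice/Bob numbers above can be checked mechanically. A sketch of "fee from gain" in scaled integer units (BTC at 10^8, USDT at 10^6; the helper name is illustrative):

```rust
/// "Fee from gain": the fee comes out of the received asset,
/// so the paying side's debit is always exact.
/// Returns (net_credit, fee) for a received amount at a 10^6-scaled rate.
fn net_credit(received: u64, rate: u64) -> (u64, u64) {
    let fee = ((received as u128 * rate as u128) / 1_000_000) as u64;
    (received - fee, fee)
}

fn main() {
    // Alice (taker, 0.20%) receives 1 BTC (10^8 units).
    let (alice_btc, alice_fee) = net_credit(100_000_000, 2000);
    assert_eq!(alice_btc, 99_800_000); // 0.998 BTC credited
    assert_eq!(alice_fee, 200_000);    // 0.002 BTC to the exchange
    // Bob (maker, 0.10%) receives 100,000 USDT (10^6 units).
    let (bob_usdt, bob_fee) = net_credit(100_000_000_000, 1000);
    assert_eq!(bob_usdt, 99_900_000_000); // 99,900 USDT credited
    assert_eq!(bob_fee, 100_000_000);     // 100 USDT to the exchange
}
```

Because the fee is subtracted from an amount the user is about to receive, the calculation can never fail for lack of balance, which is exactly why no fee reservation is needed.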
2.4 Why No Lock Reservation Needed
Since fees are deducted from received asset, no fee reservation needed:
┌─────────────────────────────────────────────────────────────────────┐
│ Benefits of Fee from Gain (Received Asset) │
├─────────────────────────────────────────────────────────────────────┤
│ User receives 1 BTC → Deduct 0.002 BTC fee → Net credit 0.998 BTC │
│ │
│ ✅ Never "insufficient balance for fee" │
│ ✅ Pay amount = Actual pay amount (exact) │
│ ✅ No complex reservation/refund logic │
└─────────────────────────────────────────────────────────────────────┘
Compare with deducting from paid asset:
| Approach | Lock Amount | Issue |
|---|---|---|
| From Gain | base_cost | No extra reservation ✅ |
| From Pay | base_cost + max_fee | May be insufficient; needs a reservation ❌ |
Design Decision: Use “fee from gain” mode, simplify lock logic.
- Buy order locks USDT, fee deducted from received BTC
- Sell order locks BTC, fee deducted from received USDT
2.5 Fee Responsibility: UBSCore (First Principles)
Core Question: Who is responsible for fee calculation?
Fee deduction = Balance change = Must be executed by UBSCore
| Question | Answer |
|---|---|
| Who knows trade occurred? | ME |
| Who manages balances? | UBSCore |
| Who can execute deductions? | UBSCore |
| Who is responsible for fees? | UBSCore |
Data Flow:
ME ──▶ Trade{role} ──▶ UBSCore ──▶ BalanceEvent{fee} ──▶ Settlement ──▶ TDengine
│
① Get VIP level (memory)
② Get Symbol fee rate (memory)
③ Calculate fee = received × rate
④ credit(net_amount)
2.6 High Performance Design
Key to efficiency: All config in UBSCore memory
UBSCore Memory Structure (loaded at startup):
├── user_vip_levels: HashMap<UserId, u8>
├── vip_discounts: HashMap<u8, u8> // level → discount%
└── symbol_fees: HashMap<SymbolId, (u64, u64)> // (maker, taker)
Fee calculation = Pure memory operation, O(1)
| Component | Responsibility | Blocking? |
|---|---|---|
| UBSCore | Calculate fee, update balance | ❌ Pure memory |
| BalanceEvent | Pass fee info | ❌ Async channel |
| Settlement | Write to TDengine | ❌ Separate thread |
Why efficient?
- No I/O on critical path
- All data in memory
- Output reuses existing BalanceEvent channel
2.7 Per-User BalanceEvent Design
Core Insight: One Trade produces two users’ balance changes → Two BalanceEvents
Trade ──▶ UBSCore ──┬──▶ BalanceEvent{user: buyer} ──▶ WS + TDengine
│
└──▶ BalanceEvent{user: seller} ──▶ WS + TDengine
Per-User Event Structure:
| Field | Type | Description |
|---|---|---|
trade_id | u64 | Links to original Trade |
user_id | u64 | Who this event belongs to |
debit_asset | u32 | Asset paid |
debit_amount | u64 | Amount paid |
credit_asset | u32 | Asset received |
credit_amount | u64 | Net amount (after fee) |
fee | u64 | Fee charged |
is_maker | bool | Is Maker role |
Example Code (Pseudocode, for reference only):
// ⚠️ Pseudocode - may change during implementation
enum BalanceEvent {
    TradeSettled {
        trade_id: u64,     // Links to original Trade
        user_id: u64,      // Who this event belongs to
        debit_asset: u32,  // Paid
        debit_amount: u64,
        credit_asset: u32, // Received (net)
        credit_amount: u64,
        fee: u64,          // Fee
        is_maker: bool,    // Role
    },
}
Why Per-User Design?
- Single responsibility: One event = One user’s balance change
- Decoupled: User doesn’t need to know counterparty
- WebSocket friendly: Route directly by user_id
- Query friendly: TDengine partitioned by user_id
- Privacy safe: User only sees own data
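The split into two events can be sketched as follows. A minimal Python sketch (helper and field names are illustrative; the authoritative shape is the Rust `BalanceEvent` pseudocode above):

```python
from dataclasses import dataclass

@dataclass
class BalanceEvent:
    trade_id: int
    user_id: int
    debit_asset: int     # asset paid
    debit_amount: int
    credit_asset: int    # asset received
    credit_amount: int   # net amount, after fee
    fee: int
    is_maker: bool

def settle_trade(trade_id: int, buyer_id: int, seller_id: int,
                 base: int, quote: int, qty: int, quote_amt: int,
                 buyer_fee: int, seller_fee: int, buyer_is_maker: bool):
    """One Trade -> two per-user events; each fee comes out of the received asset."""
    return [
        BalanceEvent(trade_id, buyer_id, quote, quote_amt,
                     base, qty - buyer_fee, buyer_fee, buyer_is_maker),
        BalanceEvent(trade_id, seller_id, base, qty,
                     quote, quote_amt - seller_fee, seller_fee, not buyer_is_maker),
    ]
```

Neither event mentions the counterparty, which is exactly what makes per-user WebSocket routing and privacy isolation trivial.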
3. Data Model
3.1 Symbol Base Fee Configuration
-- Symbol base fee (10^6 precision: 1000 = 0.10%)
ALTER TABLE symbols_tb ADD COLUMN base_maker_fee INTEGER NOT NULL DEFAULT 1000;
ALTER TABLE symbols_tb ADD COLUMN base_taker_fee INTEGER NOT NULL DEFAULT 2000;
3.2 User VIP Level
-- User VIP level (0-9, 0=normal user, 9=top tier)
ALTER TABLE users_tb ADD COLUMN vip_level SMALLINT NOT NULL DEFAULT 0;
3.3 Trade Record Enhancement
Existing Trade struct already has:
- fee: u64 - Amount of fee charged (in the received asset's scaled units)
- role: u8 - 0=Maker, 1=Taker
3.4 Fee Record Storage
Fee info is already included in Trade record:
| Storage | Content |
|---|---|
| trades_tb (TDengine) | fee, fee_asset, role fields |
| Trade Event | Real-time push to downstream (WS, Kafka) |
3.5 Event Sourcing: BalanceEventBatch (Full Traceability)
Core Design: One Trade produces a group of BalanceEvents as atomic unit
Trade ──▶ UBSCore ──▶ BalanceEventBatch{trade_id, events: [...]}
│
├── TradeSettled{user: buyer} // Buyer
├── TradeSettled{user: seller} // Seller
├── FeeReceived{account: REVENUE, from: buyer}
└── FeeReceived{account: REVENUE, from: seller}
Example Structure (Pseudocode):
// ⚠️ Pseudocode - may change during implementation
BalanceEventBatch {
    trade_id: u64,
    ts: Timestamp,
    events: [
        TradeSettled{user: buyer_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        TradeSettled{user: seller_id, debit_asset, debit_amount, credit_asset, credit_amount, fee},
        FeeReceived{account: REVENUE_ID, asset: base_asset, amount: buyer_fee, from_user: buyer_id},
        FeeReceived{account: REVENUE_ID, asset: quote_asset, amount: seller_fee, from_user: seller_id},
    ]
}
Atomic Unit Properties:
| Property | Description |
|---|---|
| Generated together | Same trade_id |
| Persisted together | Single batch write to TDengine |
| Traced together | All events linked by trade_id |
Asset Conservation Verification:
buyer.debit(quote) + buyer.credit(base - fee) = 0 ✓
seller.debit(base) + seller.credit(quote - fee) = 0 ✓
revenue.credit(buyer_fee + seller_fee) = fee_total ✓
Σ changes = 0 (Asset conservation, auditable)
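The conservation check above can be mechanized: sum the signed amounts per asset across the whole batch (debits negative, credits positive, revenue credits included) and require every total to be zero. A small Python sketch with illustrative numbers:

```python
from collections import defaultdict

def verify_conservation(events) -> bool:
    """events: iterable of (asset_id, signed_amount).
    Debits are negative, credits positive; REVENUE fee credits are included.
    Returns True iff every asset's signed sum is exactly zero."""
    totals = defaultdict(int)
    for asset, amount in events:
        totals[asset] += amount
    return all(v == 0 for v in totals.values())

# Buyer pays 100_000 quote for 1_000 base; fees: 2 base (buyer), 200 quote (seller)
batch = [
    (2, -100_000),       # buyer debit quote
    (1, 1_000 - 2),      # buyer credit base, net of fee
    (1, -1_000),         # seller debit base
    (2, 100_000 - 200),  # seller credit quote, net of fee
    (1, 2),              # REVENUE credit buyer fee (base)
    (2, 200),            # REVENUE credit seller fee (quote)
]
assert verify_conservation(batch)
```

Running this invariant over every persisted batch gives a cheap, continuous audit of the ledger.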
TDengine Storage (Event Sourcing):
| Table | Content |
|---|---|
balance_events_tb | All BalanceEvents (TradeSettled + FeeReceived) |
Why Event Sourcing?
- Full traceability: Any fee can be traced to trade_id + user_id
- Asset conservation: Conservation verifiable within event batch
- Aggregation is derived: Balance = SUM(events), computed on demand
4. Implementation Architecture
4.1 Complete Data Flow
┌───────────┐ ┌───────────┐ ┌─────────────────────────────────────────┐
│ ME │───▶│ UBSCore │───▶│ BalanceEventBatch │
│ (Match) │ │ (Fee calc)│ │ ┌─ TradeSettled{buyer} │
└───────────┘ └───────────┘ │ ├─ TradeSettled{seller} │
│ │ ├─ FeeReceived{REVENUE, from:buyer} │
│ │ └─ FeeReceived{REVENUE, from:seller} │
Memory: VIP/Fee rates └───────────────┬─────────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Settlement Service │
│ ① Batch write to TDengine │
│ ② WebSocket push (routed by user_id) │
│ ③ Kafka publish (optional) │
└──────────────────────────────────────────────┘
4.2 TDengine Schema Design
balance_events Super Table:
CREATE STABLE balance_events (
ts TIMESTAMP,
event_type TINYINT, -- 1=TradeSettled, 2=FeeReceived, 3=Deposit...
trade_id BIGINT,
debit_asset INT,
debit_amt BIGINT,
credit_asset INT,
credit_amt BIGINT,
fee BIGINT,
fee_asset INT,
is_maker BOOL,
from_user BIGINT -- FeeReceived: source user
) TAGS (
user_id BIGINT, -- User identifier (0=REVENUE)
account_type TINYINT -- 1=Spot, 2=Funding, 3=Futures...
);
-- Subtable per (user, account_type)
CREATE TABLE user_1001_spot USING balance_events TAGS (1001, 1);
CREATE TABLE user_1001_funding USING balance_events TAGS (1001, 2);
CREATE TABLE revenue_spot USING balance_events TAGS (0, 1); -- REVENUE
Design Points:
| Design | Rationale |
|---|---|
| Dual TAGs (user_id, account_type) | Future-proof for Futures, Margin… |
| Partition by user_id | User queries scan only their tables |
| Partition by account_type | Account-specific queries are O(1) |
| Timestamp index | TDengine native optimization |
4.3 Query Patterns
User query fee history:
SELECT ts, trade_id, fee, fee_asset, is_maker
FROM user_1001_events
WHERE event_type = 1 -- TradeSettled
AND ts > NOW() - 30d
ORDER BY ts DESC
LIMIT 100;
Platform fee income stats:
SELECT fee_asset, SUM(credit_amt) as total_fee
FROM revenue_events
WHERE ts > NOW() - 1d
GROUP BY fee_asset;
Trace all events for a trade:
SELECT * FROM balance_events
WHERE trade_id = 12345
ORDER BY ts;
4.4 Consumer Architecture
BalanceEventBatch
│
├──▶ TDengine Writer (batch write, high throughput)
│ └── Route to subtable by (user_id, account_type)
│
├──▶ WebSocket Router (real-time push)
│ └── Route to WS connection by user_id
│
└──▶ Kafka Publisher (optional, downstream subscription)
└── Topic: balance_events
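The fan-out above can be sketched with stub consumers standing in for the real TDengine writer, WebSocket router, and Kafka publisher (all names here are illustrative, not the actual service code):

```python
from collections import defaultdict

class TdWriter:
    """Stub standing in for the TDengine client: one buffered write per batch."""
    def __init__(self):
        self.rows = []
    def write_batch(self, batch):
        self.rows.extend(batch["events"])

class WsRouter:
    """Stub WebSocket router keyed by user_id."""
    def __init__(self):
        self.pushed = defaultdict(list)
    def push(self, user_id, event):
        self.pushed[user_id].append(event)

def dispatch_batch(batch, td_writer, ws_router, kafka_publish=None):
    """Fan one BalanceEventBatch out to the three consumers shown above."""
    td_writer.write_batch(batch)                 # ① batch write, high throughput
    for ev in batch["events"]:
        ws_router.push(ev["user_id"], ev)        # ② route by user_id
    if kafka_publish is not None:
        kafka_publish("balance_events", batch)   # ③ optional downstream topic

batch = {"trade_id": 12345, "events": [
    {"user_id": 1001, "fee": 200_000},
    {"user_id": 1002, "fee": 200_000_000},
]}
td, ws = TdWriter(), WsRouter()
dispatch_batch(batch, td, ws)
```

Because the batch is handed off whole, UBSCore never blocks on any of the three sinks.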
4.5 Performance Considerations
| Optimization | Strategy |
|---|---|
| Batch write | BalanceEventBatch writes at once |
| Partition strategy | Partition by user_id, avoid hotspots |
| Time partition | TDengine auto partitions by time |
| Async processing | UBSCore doesn’t wait after send |
5. API Changes
5.1 Trade Response
{
"trade_id": "12345",
"price": "100000.00",
"qty": "1.00000000",
"fee": "0.00200000", // NEW: Fee amount
"fee_asset": "BTC", // NEW: Fee asset
"role": "TAKER" // NEW: Maker/Taker
}
5.2 WebSocket Trade Update
{
"e": "trade.update",
"data": {
"trade_id": "12345",
"fee": "0.002",
"fee_asset": "BTC",
"is_maker": false
}
}
6. Edge Cases
| Case | Handling |
|---|---|
| Zero-fee symbol | Allow maker_fee = 0 |
| Insufficient for fee | N/A - fee always deducted from received asset |
7. Verification Plan
7.1 Unit Tests
- Fee calculation accuracy (multiple precisions)
- Maker vs Taker role assignment
7.2 Integration Tests
- E2E trade with fee deduction
- Fee ledger reconciliation
7.3 Acceptance Criteria
- Trades deduct correct fees
- Fee ledger matches Σ(trade.fee)
- API returns fee info
- WS pushes fee info
🇨🇳 中文
📦 Code Changes: View Diff
1. Overview
1.1 From Funds Transfer to Trading
In chapter 0x0B we built the internal funds transfer mechanism. This chapter covers trading fees, the exchange's core business model.
1.2 Goal
Implement the Maker/Taker fee model.
1.3 Key Concepts
| Term | Definition |
|---|---|
| Maker | Order that rests on the book waiting to be filled |
| Taker | Order that matches immediately on arrival |
| Fee rate | Percentage of the traded amount |
| bps | Basis point (1 bps = 0.01%) |
1.4 Architecture Overview
┌─────────── Fee Model ──────────┐
│ Final rate = Symbol.base_fee   │
│   × VipDiscount / 100          │
└────────────────────────────────┘
┌─────────── Data Flow ──────────────────────────────────────────────────┐
│ ME ────▶ Trade{role} ────▶ UBSCore ────▶ BalanceEventBatch ────▶ TDengine
│                              │                    │
│                    Memory: VIP/fee rates    ├── buyer event
│                    O(1) fee calculation     ├── seller event
│                                             └── revenue event ×2
└────────────────────────────────────────────────────────────────────────┘
┌─────────── Core Design ──────────────────────┐
│ ✅ Fee from gain → no reservation needed     │
│ ✅ UBSCore computes fees → balance authority │
│ ✅ Per-user events → decoupling & privacy    │
│ ✅ Event sourcing → asset conservation       │
└──────────────────────────────────────────────┘
2. Fee Model Design
2.1 Why Maker/Taker?
| Problem | Solution |
|---|---|
| Insufficient liquidity | Low maker fees encourage resting orders |
| Price discovery | Deeper books mean tighter spreads |
| Fairness | Those who consume liquidity pay more |
2.2 Two-Layer Fee Structure
Final rate = Symbol.base_fee × VipDiscount[vip_level] / 100
Layer 1: Symbol base fee
| Field | Precision | Default | Meaning |
|---|---|---|---|
| base_maker_fee | 10^6 | 1000 | 0.10% |
| base_taker_fee | 10^6 | 2000 | 0.20% |
Layer 2: VIP discount factor
| Field | Type | Meaning |
|---|---|---|
| level | SMALLINT PK | VIP level |
| discount_percent | SMALLINT | Discount percentage |
2.3 Fee Deduction Point
Rule: the fee is deducted from the asset received, not the asset paid.
Alice (Taker, BUY) buys 1 BTC with 100,000 USDT
Before: Alice -100,000 USDT, +1 BTC
After: Alice -100,000 USDT, +0.998 BTC (fee: 0.002 BTC)
2.4 No Fee Reservation Needed
Benefits of deducting the fee from the gain:
- ✅ "Insufficient balance for the fee" can never happen
- ✅ Amount paid = actual amount paid
- ✅ No complex reserve/refund logic
2.5 Fee Responsibility: UBSCore (First Principles)
Fee deduction = balance change = must be executed by UBSCore
| Question | Answer |
|---|---|
| Who manages balances? | UBSCore |
| Who can execute deductions? | UBSCore |
| Who is responsible for fees? | UBSCore |
2.6 High-Performance Design
UBSCore memory structure (loaded at startup):
├── user_vip_levels: HashMap<UserId, u8>
├── vip_discounts: HashMap<u8, u8>
└── symbol_fees: HashMap<SymbolId, (u64, u64)>
Fee calculation = pure in-memory operation, O(1)
2.7 Per-User BalanceEvent
One Trade → two per-user events
Trade ──▶ UBSCore ──┬──▶ BalanceEvent{user: buyer}
                    └──▶ BalanceEvent{user: seller}
3. Data Model
3.1 Symbol Fee Configuration
ALTER TABLE symbols_tb ADD COLUMN base_maker_fee INTEGER NOT NULL DEFAULT 1000;
ALTER TABLE symbols_tb ADD COLUMN base_taker_fee INTEGER NOT NULL DEFAULT 2000;
3.2 User VIP Level
ALTER TABLE users_tb ADD COLUMN vip_level SMALLINT NOT NULL DEFAULT 0;
3.3 Event Sourcing: BalanceEventBatch
One Trade produces a group of BalanceEvents as an atomic unit:
BalanceEventBatch{trade_id}
├── TradeSettled{user: buyer}
├── TradeSettled{user: seller}
├── FeeReceived{REVENUE, from: buyer}
└── FeeReceived{REVENUE, from: seller}
Asset Conservation Verification:
buyer.debit(quote) + buyer.credit(base - fee) = 0 ✓
seller.debit(base) + seller.credit(quote - fee) = 0 ✓
revenue.credit(buyer_fee + seller_fee) = fee_total ✓
Σ changes = 0 (auditable)
4. Implementation Architecture
4.1 TDengine Schema
CREATE STABLE balance_events (
ts TIMESTAMP,
event_type TINYINT,
trade_id BIGINT,
debit_asset INT,
debit_amt BIGINT,
credit_asset INT,
credit_amt BIGINT,
fee BIGINT,
fee_asset INT,
is_maker BOOL
) TAGS (
user_id BIGINT, -- user ID (0=REVENUE)
account_type TINYINT -- 1=Spot, 2=Funding, 3=Futures...
);
4.2 Query Patterns
-- User fee history
SELECT ts, trade_id, fee FROM user_1001_events WHERE event_type = 1;
-- Platform revenue stats
SELECT fee_asset, SUM(credit_amt) FROM revenue_events GROUP BY fee_asset;
4.3 Consumer Architecture
BalanceEventBatch
├──▶ TDengine Writer (batch write)
├──▶ WebSocket Router (push by user_id)
└──▶ Kafka Publisher (optional)
5. API Changes
5.1 Trade Response
{
"trade_id": "12345",
"fee": "0.002",
"fee_asset": "BTC",
"role": "TAKER"
}
5.2 WebSocket Push
{
"e": "trade.update",
"data": {"trade_id": "12345", "fee": "0.002", "is_maker": false}
}
6. Edge Cases
| Case | Handling |
|---|---|
| Zero-fee symbol | Allow maker_fee = 0 |
7. Verification Plan
- Fee calculation accuracy tests
- E2E trade fee deduction
- API/WS return fee info
- Asset conservation audit
0x0D Snapshot & Recovery: Robustness
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Under Construction Core Objective: Implement graceful shutdown and state recovery mechanisms.
1. Overview
- Snapshot: Periodically save the memory state (OrderBook, Balances) to disk.
- Recovery: Restore state from the latest snapshot + replay WAL (Write-Ahead Log) upon restart.
- Graceful Shutdown: Ensure all pending events are processed before stopping.
(Detailed content coming soon)
🇨🇳 中文
📅 Status: 🚧 Under Construction Core Objective: Implement graceful shutdown and state recovery mechanisms.
1. Overview
- Snapshot: Periodically save the in-memory state (OrderBook, Balances) to disk.
- Recovery: On restart, restore from the latest snapshot and replay the WAL (Write-Ahead Log).
- Graceful Shutdown: Ensure all pending events are processed before stopping.
(Detailed content coming soon)
0x0E OpenAPI Integration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
1. Overview
1.1 Why OpenAPI?
Programmatic traders need API documentation to integrate with our exchange. Instead of maintaining separate docs that drift from code, we auto-generate OpenAPI 3.0 spec directly from Rust types.
1.2 Goal
- Serve interactive API docs at /docs (Swagger UI)
- Export openapi.json for SDK generation
- Keep docs in sync with code (single source of truth)
1.3 Key Concepts
| Term | Definition |
|---|---|
| OpenAPI | Industry-standard API specification format (formerly Swagger) |
| utoipa | Rust crate for compile-time OpenAPI generation |
| Swagger UI | Interactive API documentation interface |
| Code-First | Generate spec from code, not YAML files |
1.4 Architecture Overview
┌─────────── OpenAPI Integration Flow ────────────┐
│ │
│ Rust Handlers ──▶ #[utoipa::path] ──▶ OpenAPI │
│ │ │ │
│ │ ▼ │
│ │ Swagger UI │
│ │ (/docs) │
│ │ │ │
│ ▼ ▼ │
│ Type-Safe API ◀─────────────────▶ openapi.json │
│ │ │
│ ▼ │
│ SDK Clients │
│ (Python, TS) │
└─────────────────────────────────────────────────┘
2. Implementation
2.1 Adding Dependencies
Cargo.toml:
[dependencies]
+ utoipa = { version = "5.3", features = ["axum_extras", "chrono", "uuid"] }
+ utoipa-swagger-ui = { version = "8.0", features = ["axum"] }
2.2 Creating OpenAPI Module
Create src/gateway/openapi.rs:
use utoipa::OpenApi;

#[derive(OpenApi)]
#[openapi(
    info(
        title = "Zero X Infinity Exchange API",
        version = "1.0.0",
        description = "High-performance crypto exchange API (1.3M orders/sec)"
    ),
    paths(
        handlers::health_check,
        handlers::get_depth,
        handlers::get_klines,
        // ... all API handlers
    ),
    components(schemas(
        types::ApiResponse<()>,
        types::DepthApiData,
        // ... all response types
    ))
)]
pub struct ApiDoc;
2.3 Annotating Handlers
Add #[utoipa::path] to each handler:
+ #[utoipa::path(
+ get,
+ path = "/api/v1/public/depth",
+ params(
+ ("symbol" = String, Query, description = "Trading pair"),
+ ("limit" = Option<u32>, Query, description = "Depth levels")
+ ),
+ responses(
+ (status = 200, description = "Order book depth", body = ApiResponse<DepthApiData>)
+ ),
+ tag = "Market Data"
+ )]
pub async fn get_depth(
State(state): State<Arc<AppState>>,
Query(params): Query<HashMap<String, String>>,
) -> impl IntoResponse {
// ... existing implementation ...
}
2.4 Adding Schema Derivations
Add ToSchema to response types:
+ use utoipa::ToSchema;
- #[derive(Serialize, Deserialize)]
+ #[derive(Serialize, Deserialize, ToSchema)]
pub struct DepthApiData {
+ #[schema(example = "BTC_USDT")]
pub symbol: String,
+ #[schema(example = json!([["85000.00", "0.5"]]))]
pub bids: Vec<[String; 2]>,
+ #[schema(example = json!([["85001.00", "0.3"]]))]
pub asks: Vec<[String; 2]>,
}
2.5 Integrating Swagger UI
In src/gateway/mod.rs:
+ use utoipa_swagger_ui::SwaggerUi;
+ use crate::gateway::openapi::ApiDoc;
let app = Router::new()
.route("/api/v1/health", get(handlers::health_check))
.nest("/api/v1/public", public_routes)
.nest("/api/v1/private", private_routes)
+ .merge(
+ SwaggerUi::new("/docs")
+ .url("/api-docs/openapi.json", ApiDoc::openapi())
+ )
.with_state(state);
3. API Endpoints
3.1 Public Endpoints (No Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Health check |
| /api/v1/public/depth | GET | Order book depth |
| /api/v1/public/klines | GET | K-line data |
| /api/v1/public/assets | GET | Asset list |
| /api/v1/public/symbols | GET | Trading pairs |
| /api/v1/public/exchange_info | GET | Exchange metadata |
3.2 Private Endpoints (Ed25519 Auth)
| Endpoint | Method | Description |
|---|---|---|
/api/v1/private/order | POST | Create order |
/api/v1/private/cancel | POST | Cancel order |
/api/v1/private/orders | GET | Query orders |
/api/v1/private/trades | GET | Trade history |
/api/v1/private/balances | GET | Balance query |
/api/v1/private/balances/all | GET | All balances |
/api/v1/private/transfer | POST | Internal transfer |
/api/v1/private/transfer/{id} | GET | Transfer status |
4. SDK Generation
4.1 Python SDK
Auto-generated Python client with Ed25519 signing:
from zero_x_infinity_sdk import ZeroXInfinityClient
client = ZeroXInfinityClient(
api_key="your_api_key",
secret_key_bytes=secret_key # Ed25519 private key
)
# Create order
order = client.create_order(
symbol="BTC_USDT",
side="BUY",
price="85000.00",
qty="0.001"
)
4.2 TypeScript SDK
import { ZeroXInfinityClient } from './zero_x_infinity_sdk';
const client = new ZeroXInfinityClient(apiKey, secretKey);
const depth = await client.getDepth('BTC_USDT');
5. Verification
5.1 Access Swagger UI
cargo run --release -- --gateway --port 8080
# Open: http://localhost:8080/docs
5.2 Test Results
| Test Category | Tests | Result |
|---|---|---|
| Unit Tests | 293 | ✅ All pass |
| Public Endpoints | 6 | ✅ All pass |
| Private Endpoints | 9 | ✅ All pass |
| E2E Total | 17 | ✅ All pass |
6. Summary
In this chapter, we added OpenAPI documentation to our trading engine:
| Achievement | Result |
|---|---|
| Swagger UI | Available at /docs |
| OpenAPI Spec | 15 endpoints documented |
| Python SDK | Auto-generated with Ed25519 |
| TypeScript SDK | Type-safe client |
| Zero Breaking Changes | All existing tests pass |
Next Chapter: With resilience (0x0D) and documentation (0x0E) complete, the foundation is solid. The next logical step is 0x11: Deposit & Withdraw, connecting to the blockchain for real crypto funding.
🇨🇳 中文
📦 Code Changes: View Diff
1. Overview
1.1 Why OpenAPI?
Programmatic traders need API documentation. Rather than hand-writing YAML docs that easily drift out of sync with the code, we generate the OpenAPI 3.0 spec directly from the Rust types.
1.2 Goal
- Serve interactive docs at /docs (Swagger UI)
- Export openapi.json for SDK generation
- Keep docs and code in sync (single source of truth)
1.3 Key Concepts
| Term | Definition |
|---|---|
| OpenAPI | Industry-standard API specification format (formerly Swagger) |
| utoipa | Compile-time OpenAPI generation crate for Rust |
| Swagger UI | Interactive API documentation interface |
| Code-First | Generate the spec from code, not from YAML files |
1.4 Architecture Overview
┌─────────── OpenAPI Integration Flow ────────────┐
│                                                 │
│ Rust Handlers ──▶ #[utoipa::path] ──▶ OpenAPI   │
│       │                  │                      │
│       │                  ▼                      │
│       │             Swagger UI (/docs)          │
│       ▼                  ▼                      │
│ Type-Safe API ◀─────▶ openapi.json              │
│                          ▼                      │
│                    SDK Clients (Python, TS)     │
└─────────────────────────────────────────────────┘
2. Implementation
2.1 Adding Dependencies
Cargo.toml:
[dependencies]
+ utoipa = { version = "5.3", features = ["axum_extras", "chrono", "uuid"] }
+ utoipa-swagger-ui = { version = "8.0", features = ["axum"] }
2.2 Creating the OpenAPI Module
Create src/gateway/openapi.rs:
use utoipa::OpenApi;

#[derive(OpenApi)]
#[openapi(
    info(
        title = "Zero X Infinity Exchange API",
        version = "1.0.0",
        description = "High-performance crypto exchange API (1.3M orders/sec)"
    ),
    paths(
        handlers::health_check,
        handlers::get_depth,
        handlers::get_klines,
        // ... all API handlers
    ),
    components(schemas(
        types::ApiResponse<()>,
        types::DepthApiData,
        // ... all response types
    ))
)]
pub struct ApiDoc;
2.3 Annotating Handlers
Add #[utoipa::path] to each handler:
+ #[utoipa::path(
+     get,
+     path = "/api/v1/public/depth",
+     params(
+         ("symbol" = String, Query, description = "Trading pair"),
+         ("limit" = Option<u32>, Query, description = "Depth levels")
+     ),
+     responses(
+         (status = 200, description = "Order book depth", body = ApiResponse<DepthApiData>)
+     ),
+     tag = "Market Data"
+ )]
pub async fn get_depth(
    State(state): State<Arc<AppState>>,
    Query(params): Query<HashMap<String, String>>,
) -> impl IntoResponse {
    // ... existing implementation ...
}
2.4 Adding Schema Derivations
Add ToSchema to response types:
+ use utoipa::ToSchema;
- #[derive(Serialize, Deserialize)]
+ #[derive(Serialize, Deserialize, ToSchema)]
pub struct DepthApiData {
+     #[schema(example = "BTC_USDT")]
    pub symbol: String,
+     #[schema(example = json!([["85000.00", "0.5"]]))]
    pub bids: Vec<[String; 2]>,
+     #[schema(example = json!([["85001.00", "0.3"]]))]
    pub asks: Vec<[String; 2]>,
}
2.5 Integrating Swagger UI
In src/gateway/mod.rs:
+ use utoipa_swagger_ui::SwaggerUi;
+ use crate::gateway::openapi::ApiDoc;
let app = Router::new()
    .route("/api/v1/health", get(handlers::health_check))
    .nest("/api/v1/public", public_routes)
    .nest("/api/v1/private", private_routes)
+   .merge(
+       SwaggerUi::new("/docs")
+           .url("/api-docs/openapi.json", ApiDoc::openapi())
+   )
    .with_state(state);
3. API Endpoints
3.1 Public Endpoints (No Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/health | GET | Health check |
| /api/v1/public/depth | GET | Order book depth |
| /api/v1/public/klines | GET | K-line data |
| /api/v1/public/assets | GET | Asset list |
| /api/v1/public/symbols | GET | Trading pairs |
| /api/v1/public/exchange_info | GET | Exchange metadata |
3.2 Private Endpoints (Ed25519 Auth)
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/private/order | POST | Create order |
| /api/v1/private/cancel | POST | Cancel order |
| /api/v1/private/orders | GET | Query orders |
| /api/v1/private/trades | GET | Trade history |
| /api/v1/private/balances | GET | Balance query |
| /api/v1/private/balances/all | GET | All balances |
| /api/v1/private/transfer | POST | Internal transfer |
| /api/v1/private/transfer/{id} | GET | Transfer status |
4. SDK Generation
4.1 Python SDK
Auto-generated Python client (with Ed25519 signing):
from zero_x_infinity_sdk import ZeroXInfinityClient
client = ZeroXInfinityClient(
    api_key="your_api_key",
    secret_key_bytes=secret_key  # Ed25519 private key
)
# Create order
order = client.create_order(
    symbol="BTC_USDT",
    side="BUY",
    price="85000.00",
    qty="0.001"
)
4.2 TypeScript SDK
import { ZeroXInfinityClient } from './zero_x_infinity_sdk';
const client = new ZeroXInfinityClient(apiKey, secretKey);
const depth = await client.getDepth('BTC_USDT');
5. Verification
5.1 Access Swagger UI
cargo run --release -- --gateway --port 8080
# Open: http://localhost:8080/docs
5.2 Test Results
| Test Category | Tests | Result |
|---|---|---|
| Unit Tests | 293 | ✅ All pass |
| Public Endpoints | 6 | ✅ All pass |
| Private Endpoints | 9 | ✅ All pass |
| E2E Total | 17 | ✅ All pass |
6. Summary
In this chapter we added OpenAPI documentation to the trading engine:
| Achievement | Result |
|---|---|
| Swagger UI | Available at /docs |
| OpenAPI Spec | 15 endpoints documented |
| Python SDK | Auto-generated (with Ed25519) |
| TypeScript SDK | Type-safe client |
| Zero Breaking Changes | All existing tests pass |
Next Chapter: With resilience (0x0D) and documentation (0x0E) complete, the foundation is solid. The next logical step is 0x11: Deposit & Withdraw, connecting to the blockchain for real crypto funding.
0x0F Admin Dashboard Architecture
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: ✅ Verified (E2E 4/4 Pass) Branch: 0x0F-admin-dashboard Updated: 2024-12-27
📦 Code Changes: View Code
1. Overview
1.1 Goal
Build an admin dashboard for exchange operations using FastAPI Amis Admin + FastAPI-User-Auth.
1.2 Tech Stack
| Component | Technology |
|---|---|
| Backend | FastAPI + SQLAlchemy |
| Admin UI | FastAPI Amis Admin (Baidu Amis) |
| Auth | FastAPI-User-Auth (Casbin RBAC) |
| Database | PostgreSQL (existing) |
1.3 Design Highlights ✨
Why do these designs matter? The Admin Dashboard is a core operations system for the exchange. Incorrect operations can lead to fund loss or system failures. The following design principles are key lessons we’ve learned in practice:
| Design Principle | Why? |
|---|---|
| 🔒 ID Immutability | asset_id, symbol_id cannot be modified after creation. Historical orders and trade records depend on these IDs—changing them would break data relationships. |
| 🔢 DB-Generated IDs | asset_id, symbol_id use PostgreSQL SERIAL for auto-generation, preventing human input conflicts or errors. |
| 📝 Status as Strings | Users see Active/Disabled instead of 1/0, reducing cognitive load and avoiding misinterpretation. |
| 🚫 Base ≠ Quote | Prevent creation of invalid pairs like BTC_BTC—this is a logic bug, not a UX issue. |
| 🔍 Trace ID Evidence Chain | Fundamental financial compliance requirement. Each operation carries a ULID trace_id, forming a complete audit evidence chain. When issues arise: traceable, provable, reproducible. |
| 📜 Mandatory Audit Log | All operations record before/after states, meeting compliance requirements and supporting incident investigation. |
| 🔄 Gateway Hot-Reload | Config changes take effect within 5 seconds without service restart—critical for emergency delisting scenarios. |
| ⬇️ Default Descending Sort | Lists show newest items first—operators typically focus on recent activity. |
Tutorial Tip: These design principles didn’t emerge from nothing—they come from real operational pitfalls in exchange systems. Readers should carefully understand each “Why”.
1.4 Features
| Module | Functions |
|---|---|
| User Management | KYC review, VIP level, ban/unban |
| Asset Management | Deposit confirm, withdrawal review, freeze |
| Trading Monitor | Real-time orders, trades, anomaly alerts |
| Fee Config | Symbol fee rates, VIP discounts |
| System Monitor | Service health, queue depth, latency |
| Audit Log | All admin operations logged |
2. Architecture
┌─────────────────────────────────────────────────────────┐
│ Admin Dashboard │
├─────────────────────────────────────────────────────────┤
│ FastAPI Amis Admin (UI) │
│ ├── User Management │
│ ├── Asset Management │
│ ├── Trading Monitor │
│ ├── Fee Config │
│ └── System Monitor │
├─────────────────────────────────────────────────────────┤
│ FastAPI-User-Auth (RBAC) │
│ ├── Page Permissions │
│ ├── Action Permissions │
│ ├── Field Permissions │
│ └── Data Permissions │
├─────────────────────────────────────────────────────────┤
│ PostgreSQL (existing) │ TDengine (read-only) │
│ - users_tb │ - trades_tb │
│ - balances_tb │ - balance_events_tb │
│ - symbols_tb │ - klines_tb │
│ - transfers_tb │ │
└─────────────────────────────────────────────────────────┘
3. RBAC Roles
| Role | Permissions |
|---|---|
| Super Admin | All permissions |
| Risk Officer | Withdrawal review, user freeze |
| Operations | User management, VIP config |
| Support | View-only, no modifications |
| Auditor | View audit logs only |
4. Implementation Plan
Phase 1: MVP - Config Management
Scope: Basic login + config CRUD (Asset, Symbol, VIP)
Step 1: Project Setup
mkdir admin && cd admin
python -m venv venv && source venv/bin/activate
pip install fastapi-amis-admin fastapi-user-auth sqlalchemy asyncpg
Step 2: Database Connection
- Connect to existing PostgreSQL (zero_x_infinity database)
- Reuse existing tables: assets_tb, symbols_tb, users_tb
Step 3: Admin CRUD
| Model | Table | Operations |
|---|---|---|
| Asset | assets_tb | List, Create, Update, Enable/Disable |
| Symbol | symbols_tb | List, Create, Update, Trading/Halt |
| VIP Level | vip_levels_tb | List, Create, Update |
| Audit Log | admin_audit_log | List (read-only) |
Symbol Status
| Status | Description |
|---|---|
| trading | Normal trading |
| halt | Suspended (maintenance/emergency) |
Step 4: Admin Auth
- Default super admin account
- Login/Logout UI
Acceptance Criteria
| ID | Criteria | Verify |
|---|---|---|
| AC-01 | Admin can login at http://localhost:$ADMIN_PORT/admin | Browser access (dev:8002, ci:8001) |
| AC-02 | Can create Asset (name, symbol, decimals) | UI + DB |
| AC-03 | Can edit Asset | UI + DB |
| AC-04 | Gateway hot-reload Asset config | No restart needed |
| AC-05 | Can create Symbol (base, quote, fees) | UI + DB |
| AC-06 | Can edit Symbol | UI + DB |
| AC-07 | Gateway hot-reload Symbol config | No restart needed |
| AC-08 | Can create/edit VIP Level | UI + DB |
| AC-09 | Reject invalid input (decimals<0, fee>100%) | Boundary tests |
| AC-10 | VIP default Normal (level=0, 100% fee) | Seed data |
| AC-11 | Asset Enable/Disable | Gateway rejects disabled asset |
| AC-12 | Symbol Halt | Gateway rejects new orders |
| AC-13 | Audit log | All CRUD ops queryable |
Input Validation Rules
| Field | Rule |
|---|---|
| decimals | 0-18, must be integer |
| fee_rate | 0-100%, max 10000 bps |
| symbol | Unique, uppercase + underscore |
| base_asset / quote_asset | Must exist |
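The rules in this table can be enforced by a few plain validators before any request reaches the database. A sketch (function names are illustrative, not the actual admin code):

```python
import re

ASSET_CODE_RE = re.compile(r"[A-Z]+\Z")     # uppercase A-Z only, e.g. BTC
SYMBOL_RE = re.compile(r"[A-Z]+_[A-Z]+\Z")  # uppercase + underscore, e.g. BTC_USDT

def validate_asset(asset: str, decimals: int) -> None:
    if not ASSET_CODE_RE.match(asset):
        raise ValueError(f"asset must be uppercase A-Z only, got {asset!r}")
    if not isinstance(decimals, int) or not (0 <= decimals <= 18):
        raise ValueError(f"decimals must be an integer in 0-18, got {decimals!r}")

def validate_symbol(symbol: str, base_id: int, quote_id: int,
                    maker_bps: int, taker_bps: int, known_assets: set) -> None:
    if not SYMBOL_RE.match(symbol):
        raise ValueError("symbol must be uppercase + underscore, e.g. BTC_USDT")
    if base_id == quote_id:
        raise ValueError("Base and Quote assets must be different")
    if base_id not in known_assets or quote_id not in known_assets:
        raise ValueError("base/quote asset must already exist")
    for fee in (maker_bps, taker_bps):
        if not (0 <= fee <= 10_000):   # 10000 bps = 100%
            raise ValueError(f"fee must be 0-10000 bps, got {fee}")
```

In the real app these checks belong in Pydantic validators so Amis surfaces them as form errors, but keeping them as plain functions makes them trivially unit-testable.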
Future Enhancements (P2)
Chain Asset Management (Layer 2): Implementation of ADR-005
- Chain Config: Manage chains_tb (RPC, confirmations)
- Asset Binding: Manage chain_assets_tb (Contract Address, Decimals)
- Auto-Verify: Verify contracts on-chain before binding
- Asset Migration (P3): Unbind/Rebind for Token Swaps (e.g., LEND -> AAVE)
Dual-Confirmation Workflow:
1. Preview: config change preview
2. Second approval: another admin approves
3. Apply: takes effect after confirmation
For: Symbol delisting, Asset disable, and other irreversible ops
Multisig Withdrawal:
- Admin can only create “withdrawal proposal”, not execute directly
- Flow: Support submits → Finance reviews → Offline sign/MPC executes
- Private keys must NEVER touch admin server
5. Security Requirements (MVP Must-Have)
5.1 Mandatory Audit Logging (Middleware)
Every request must be logged:
# FastAPI middleware (assumes auth has already set request.state.admin_id)
from datetime import datetime
from fastapi import Request

@app.middleware("http")
async def audit_log_middleware(request: Request, call_next):
response = await call_next(request)
await AuditLog.create(
admin_id=request.state.admin_id,
ip=request.client.host,
timestamp=datetime.utcnow(),
action=f"{request.method} {request.url.path}",
old_value=...,
new_value=...,
)
return response
5.2 Decimal Precision (Required)
Prevent JSON float precision loss:
from pydantic import BaseModel, field_serializer
from decimal import Decimal
class FeeRateResponse(BaseModel):
rate: Decimal
@field_serializer('rate')
def serialize_rate(self, rate: Decimal, _info):
return str(rate) # Serialize as String
⚠️ All amounts and rates MUST use Decimal; output MUST be String
Naming Consistency (with existing code)
| Entity | Field | Values |
|---|---|---|
| Asset | status | 0=disabled, 1=active |
| Symbol | status | 0=offline, 1=online, 2=maintenance |
⚠️ Implementation MUST match
migrations/001_init_schema.sql
6. UX Requirements (Post-QA Review)
Based on QA feedback from 160+ test cases. These requirements enhance usability and prevent errors.
6.1 Asset/Symbol Display Enhancement
UX-01: Display Asset names in Symbol creation/edit forms
Base Asset: [BTC (ID: 1) ▼] ← Dropdown with asset code
Quote Asset: [USDT (ID: 2) ▼]
Implementation: Use SQLAlchemy relationship display in FastAPI Amis Admin.
6.2 Fee Display Format
UX-02: Show fees in both percentage and basis points
Maker Fee: 0.10% (10 bps)
Taker Fee: 0.20% (20 bps)
Implementation:
@field_serializer('base_maker_fee')
def serialize_fee(self, fee: int, _info):
    # fee is stored with 10^6 precision (1000 = 0.10% = 10 bps)
    pct = fee / 10_000   # 1000 -> 0.10 (percent)
    bps = fee // 100     # 1000 -> 10 bps
    return f"{pct:.2f}% ({bps} bps)"
6.3 Danger Confirmation Dialog
UX-03: Confirm dialog for critical operations (Symbol Halt, Asset Disable)
┌─────────────────────────────────┐
│ ⚠️ Halt Symbol: BTC_USDT │
├─────────────────────────────────┤
│ • Current orders: 1,234 │
│ • 24h volume: $12M │
│ │
│ This action is reversible │
│ │
│ [Confirm Halt] [Cancel] │
└─────────────────────────────────┘
Note: No “type to confirm” required (action is reversible).
6.4 Immutable Field Indicators
UX-04: Visually mark immutable fields in edit forms
Asset Edit:
┌──────────────────────────┐
│ Asset Code: BTC 🔒 │ ← Locked, disabled
│ Decimals: 8 🔒 │ ← Locked, disabled
│ Name: [Bitcoin ] ✏️ │ ← Editable
│ Status: [Active ▼] ✏️ │ ← Editable
└──────────────────────────┘
Implementation: Use readonly_fields in ModelAdmin.
6.5 Structured Error Messages
UX-05: Provide actionable error responses
{
"field": "asset",
"error": "Invalid format",
"got": "btc!",
"expected": "Uppercase letters A-Z only (e.g., BTC)",
"hint": "Remove special character '!'"
}
🚨 6.6 CRITICAL: Base ≠ Quote Validation
UX-06: Prevent creating symbols with same base and quote
This is a LOGIC BUG, not just UX.
@model_validator(mode='after')
def validate_base_quote_different(self):
if self.base_asset_id == self.quote_asset_id:
raise ValueError("Base and Quote assets must be different")
return self
Test Case: BTC_BTC must be rejected.
6.7 ID Auto-Generation (DB Responsibility)
Requirement: asset_id and symbol_id are auto-generated by database, NOT user input.
Create Asset Form:
┌──────────────────────────┐
│ Asset Code: [BTC ] │ ← User fills
│ Name: [Bitcoin ] │ ← User fills
│ Decimals: [8] │ ← User fills
│ │
│ asset_id: (auto) │ ← DB generates (SERIAL)
└──────────────────────────┘
Create Symbol Form:
┌──────────────────────────┐
│ Symbol: [BTC_USDT ] │ ← User fills
│ Base Asset: [BTC ▼] │ ← User selects
│ Quote Asset: [USDT ▼] │ ← User selects
│ │
│ symbol_id: (auto) │ ← DB generates (SERIAL)
└──────────────────────────┘
Implementation: Use PostgreSQL SERIAL or IDENTITY columns.
-- Already in migrations/001_init_schema.sql
CREATE TABLE assets_tb (
asset_id SERIAL PRIMARY KEY, -- Auto-increment
asset VARCHAR(16) NOT NULL UNIQUE,
...
);
6.8 Status/Flags String Display
Requirement: Display Status and Flags as human-readable strings, not raw numbers.
Asset Status Display:
| DB Value | Display String | Color |
|---|---|---|
| 0 | Disabled | 🔴 Red |
| 1 | Active | 🟢 Green |
Symbol Status Display:
| DB Value | Display String | Color |
|---|---|---|
| 0 | Offline | ⚫ Gray |
| 1 | Online | 🟢 Green |
| 2 | Close-Only | 🟡 Yellow |
Asset Flags Display (bitmask):
Flags: [Deposit ✓] [Withdraw ✓] [Trade ✓] [Internal Transfer ✓]
Instead of: asset_flags: 23
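Rendering the checkbox string from a raw bitmask is a small decode step. A sketch, assuming the bit layout below (illustrative only; the actual flag bits are defined by the schema):

```python
# Assumed bit layout (illustrative; bit 3 is left reserved in this sketch)
FLAG_BITS = [
    (1 << 0, "Deposit"),
    (1 << 1, "Withdraw"),
    (1 << 2, "Trade"),
    (1 << 4, "Internal Transfer"),
]

def render_flags(asset_flags: int) -> str:
    """Decode a raw bitmask like 23 into the checkbox string shown above."""
    return " ".join(
        f"[{name} {'✓' if asset_flags & bit else '✗'}]" for bit, name in FLAG_BITS
    )
```

Under this layout, `render_flags(23)` (23 = 0b10111) shows all four capabilities checked, which is far less error-prone for an operator than reading `asset_flags: 23`.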
Implementation (Final Design):
⚠️ API Design: Status accepts STRING INPUT ONLY. Integer input is rejected.
class AssetStatus(IntEnum):
DISABLED = 0
ACTIVE = 1
class SymbolStatus(IntEnum):
OFFLINE = 0
ONLINE = 1
CLOSE_ONLY = 2
# Pydantic schema validation (string-only input)
@field_validator('status', mode='before')
def validate_status(cls, v):
if not isinstance(v, str):
raise ValueError(f"Status must be a string, got: {type(v).__name__}")
return AssetStatus[v.upper()]
# Output serialization (always string)
@field_serializer('status')
def serialize_status(self, value: int) -> str:
return AssetStatus(value).name # "ACTIVE" or "DISABLED"
Test Count: 177 unit tests (5 for UX-08 specifically)
6.9 Default Descending Sorting (UX-09)
Requirement: All list views must default to descending order (newest items first).
Reason: Admins usually want to see recent activity or newly created entities.
Implementation: Set ordering = [Model.pk.desc()] in ModelAdmin classes.
🔒 6.10 Full Lifecycle Trace ID (UX-10) - CRITICAL
Requirement: Every admin operation MUST carry a unique trace_id (ULID) from entry to exit.
Why: Admin Dashboard is critical infrastructure. Full observability is mandatory for:
- Audit compliance
- Debugging production issues
- Security forensics
- Performance monitoring
Trace Lifecycle:
┌──────────────────────────────────────────────────────────────────┐
│ Request Entry │
│ trace_id: 01HRC5K8F1ABCDEFG... (ULID generated) │
├──────────────────────────────────────────────────────────────────┤
│ [LOG] trace_id=01HRC5K8F1... action=START endpoint=/asset │
│ [LOG] trace_id=01HRC5K8F1... action=VALIDATE input={...} │
│ [LOG] trace_id=01HRC5K8F1... action=DB_QUERY sql=SELECT... │
│ [LOG] trace_id=01HRC5K8F1... action=DB_UPDATE before={} after={}│
│ [LOG] trace_id=01HRC5K8F1... action=AUDIT_LOG written │
│ [LOG] trace_id=01HRC5K8F1... action=END status=200 duration=45ms│
├──────────────────────────────────────────────────────────────────┤
│ Response Exit │
│ X-Trace-ID: 01HRC5K8F1ABCDEFG... (returned in header) │
└──────────────────────────────────────────────────────────────────┘
Implementation:
import ulid
from fastapi import Request
from contextvars import ContextVar
# Context variable for trace_id
trace_id_var: ContextVar[str] = ContextVar("trace_id", default="")
@app.middleware("http")
async def trace_middleware(request: Request, call_next):
# Generate ULID for each request
trace_id = str(ulid.new())
trace_id_var.set(trace_id)
# Log entry
logger.info(f"trace_id={trace_id} action=START endpoint={request.url.path}")
response = await call_next(request)
# Log exit
logger.info(f"trace_id={trace_id} action=END status={response.status_code}")
# Return trace_id in response header
response.headers["X-Trace-ID"] = trace_id
return response
# Audit log includes trace_id
class AuditLog(Base):
trace_id = Column(String(26), nullable=False) # ULID is 26 chars
admin_id = Column(BigInteger, nullable=False)
action = Column(String(32), nullable=False)
...
Log Format (structured JSON):
{
"timestamp": "2025-12-27T10:25:00Z",
"trace_id": "01HRC5K8F1ABCDEFGHIJK",
"admin_id": 1001,
"action": "DB_UPDATE",
"entity": "Asset",
"entity_id": 5,
"before": {"status": 1},
"after": {"status": 0},
"duration_ms": 12
}
Verification:
- Every request generates unique ULID trace_id
- All log lines include trace_id
- Audit log table has trace_id column
- Response includes X-Trace-ID header
- Local log files are rotated and retained
7. Testing
Full Testing Guide: 0x0F-admin-testing.md
Quick Start:
./scripts/run_admin_full_suite.sh # Run all tests
Test Summary:
| Category | Count | Status |
|---|---|---|
| Rust unit tests | 5 | ✅ |
| Admin unit tests | 178+ | ✅ |
| Admin E2E tests | 4/4 | ✅ |
| UX-10 Trace ID | 16/16 | ✅ |
Ports: Dev 8002, CI 8001
8. Future Phases
| Phase | Content |
|---|---|
| Phase 2 | User management, balance viewer |
| Phase 3 | TDengine monitoring |
| Phase 4 | Full RBAC, advanced audit |
9. Directory Structure
admin/
├── main.py # FastAPI app entry
├── settings.py # Config
├── models/ # SQLAlchemy models (shared with main app)
├── admin/
│ ├── user.py # User admin
│ ├── asset.py # Asset admin
│ ├── trading.py # Trading admin
│ └── system.py # System admin
├── auth/
│ └── rbac.py # RBAC config
└── requirements.txt
🇨🇳 中文
📅 状态: ✅ 已验证 (E2E 4/4 通过) 分支:
0x0F-admin-dashboard
📦 代码变更: 查看代码
1. 概述
1.1 目标
使用 FastAPI Amis Admin + FastAPI-User-Auth 构建交易所后台管理系统。
1.2 技术栈
| 组件 | 技术 |
|---|---|
| 后端 | FastAPI + SQLAlchemy |
| 管理界面 | FastAPI Amis Admin (百度 Amis) |
| 认证 | FastAPI-User-Auth (Casbin RBAC) |
| 数据库 | PostgreSQL (现有) |
1.3 功能模块
| 模块 | 功能 |
|---|---|
| 用户管理 | KYC 审核、VIP 等级、封禁/解封 |
| 资产管理 | 充值确认、提现审核、资产冻结 |
| 交易监控 | 实时订单/成交、异常报警 |
| 费率配置 | Symbol 费率、VIP 折扣 |
| 系统监控 | 服务健康、队列积压、延迟 |
| 审计日志 | 所有管理操作可追溯 |
2. RBAC 角色
| 角色 | 权限 |
|---|---|
| 超级管理员 | 全部权限 |
| 风控专员 | 提现审核、用户冻结 |
| 运营人员 | 用户管理、VIP 配置 |
| 客服 | 只读,不可修改 |
| 审计员 | 只看审计日志 |
4. 配置与脚本统一 (2024-12-27)
4.1 配置单一源 (Single Source of Truth)
所有环境配置统一从 scripts/lib/db_env.sh 导出:
# 数据库
export PG_HOST, PG_PORT, PG_USER, PG_PASSWORD, PG_DB
export DATABASE_URL, DATABASE_URL_ASYNC
# 服务端口
export GATEWAY_PORT # 8080
export ADMIN_PORT # Dev: 8002, CI: 8001
export ADMIN_URL, GATEWAY_URL
端口约定:
| 环境 | Gateway | Admin |
|---|---|---|
| Dev (本地) | 8080 | 8002 |
| CI | 8080 | 8001 |
| QA | 8080 | 8001 |
4.2 测试脚本命名规范
| 脚本 | 用途 |
|---|---|
| run_admin_full_suite.sh | 统一入口(Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway 传播E2E测试 |
| run_admin_tests_standalone.sh | 一键完整测试(安装deps+启动server) |
命名规范:run_<scope>_<type>.sh
4.3 测试结构
admin/tests/
├── unit/ # pytest 单元测试
├── e2e/ # pytest E2E测试 (需service running)
└── integration/ # 独立脚本 (通过CI运行)
└── test_admin_gateway_e2e.py
运行方式:
# 运行全部
./scripts/run_admin_full_suite.sh
# 快速模式(跳过unit tests)
./scripts/run_admin_full_suite.sh --quick
0x0F Admin Dashboard - Testing Guide
This document contains detailed test cases and scripts for the Admin Dashboard. For architecture overview, see Admin Dashboard.
Test Scripts
One-Click Testing
# Run all tests (Rust + Admin Unit + E2E)
./scripts/run_admin_full_suite.sh
# Quick mode (skip Unit Tests)
./scripts/run_admin_full_suite.sh --quick
# Run only Admin → Gateway propagation E2E
./scripts/run_admin_gateway_e2e.sh
Script Reference
| Script | Purpose |
|---|---|
| run_admin_full_suite.sh | Unified entry (Rust + Admin Unit + E2E) |
| run_admin_gateway_e2e.sh | Admin → Gateway propagation tests |
| run_admin_tests_standalone.sh | One-click full test (install deps + start server) |
Port Configuration
| Environment | Admin Port | Gateway Port |
|---|---|---|
| Dev (local) | 8002 | 8080 |
| CI | 8001 | 8080 |
Test Files
| Script | Function |
|---|---|
| verify_e2e.py | Admin login/logout, health check |
| test_admin_login.py | Authentication tests |
| test_constraints.py | Database constraint validation |
| test_core_flow.py | Asset/Symbol CRUD workflows |
| test_input_validation.py | Invalid input rejection |
| test_security.py | Security and authentication |
| tests/e2e/test_asset_lifecycle.py | Asset enable/disable lifecycle |
| tests/e2e/test_symbol_lifecycle.py | Symbol trading status management |
| tests/e2e/test_fee_update.py | Fee configuration updates |
| tests/e2e/test_audit_log.py | Audit trail verification |
| tests/test_ux10_trace_id.py | UX-10 Trace ID verification |
Running Individual Tests
cd admin && source venv/bin/activate
pytest tests/test_core_flow.py -v
pytest tests/e2e/test_asset_lifecycle.py -v
pytest tests/test_ux10_trace_id.py -v
Test Coverage
Total: 198+ tests
- Rust unit tests: 5 passed
- Admin unit tests: 178+ passed
- Admin E2E tests: 4/4 passed
- UX-10 Trace ID tests: 16/16 passed
UX Requirements Test Matrix
| UX ID | Requirement | Test File |
|---|---|---|
| UX-06 | Base ≠ Quote validation | test_constraints.py |
| UX-07 | ID Auto-Generation | test_id_mapping.py |
| UX-08 | Status String Display | test_ux08_status_strings.py |
| UX-09 | Default Descending Sort | test_core_flow.py |
| UX-10 | Trace ID Evidence Chain | test_ux10_trace_id.py |
Acceptance Criteria
| # | Deliverable | Verification |
|---|---|---|
| 1 | Admin UI accessible | Browser at localhost:$ADMIN_PORT |
| 2 | One-click E2E test | ./scripts/run_admin_full_suite.sh passes |
| 3 | All tests pass | 198+ tests green |
| 4 | Audit log queryable | Admin UI audit page |
| 5 | Gateway hot-reload | Config change without restart |
Standard Operating Procedure (SOP): Token Listing
Role: Operations / Listing Manager System: Admin Dashboard
1. Pre-requisites
Before listing, you need the following information:
| Item | Description | Example | Source |
|---|---|---|---|
| Logic Symbol | The unique ticker on the exchange | UNI | Project Team |
| Asset Name | Full display name | Uniswap | Project Team |
| Chain | The blockchain network | ETH | Project Team |
| Contract Address | The Token’s Smart Contract | 0x1f98... | Etherscan / Project |
| Decimals | Token precision | 18 | Auto-detected |
| Min Deposit | Minimum amount to credit | 0.1 | Ops Decision (Risk) |
| Withdraw Fee | Fee deducted per withdrawal | 5.0 | Ops Decision (Gas Cost) |
2. Workflow Steps
Phase 1: Create Logical Asset (Business Definition)
Define the asset for Trading and User Balances.
1. Navigate: Admin -> Assets -> Create New.
2. Input:
   - Symbol: UNI
   - Name: Uniswap
   - Decimals: 18 (System Internal Precision)
   - Initial Permissions:
     - [x] Can Allow Deposit
     - [ ] Can Allow Withdraw (Recommended: Disable initially)
     - [ ] Can Allow Trade (Recommended: Enable later)
   - Status: Active
3. Click: Save.
   - System Result: assets_tb row created. Asset ID generated (e.g., #10).
Phase 2: Bind Chain Asset (On-Chain Binding)
Tell Sentinel how to find this asset on-chain and set limits.
1. Navigate: Admin -> Assets -> Select UNI (#10) -> Chain Config tab.
2. Click: Add New Binding.
3. Input Configuration (Minimal):
   - Chain: Select ETH (Ethereum).
   - Contract Address: Paste 0x1f98...
   - (Leave other fields empty; the system will fetch them.)
4. Action: Click Auto-Detect from Chain.
   - System Action: queries RPC decimals(), symbol().
   - Result:
     - Decimals: auto-filled 18 (locked, read-only).
     - Symbol: auto-detected UNI (verified against the asset name).
   - Ops Action: verify the fetched data matches. Adjust Min Deposit / Fee only if defaults are unsuitable.
5. Risk Configuration (Review Defaults):
   - Min Deposit: 0.1 (prevent dust attacks).
   - Min Withdraw: 10.0 (must be > Fee).
   - Withdraw Fee: 5.0 (cover Gas + margin).
6. Confirm: check that the detected Decimals match project info.
7. Click: Bind (Saved as Inactive).
   - System Result: chain_assets_tb row created with is_active=false.
Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific.
Phase 3: Validation & Activation
Verify functionality before opening to the public.
Constraint: Sentinel only indexes deposits for chain bindings with is_active=true, so the chain binding must be activated before any deposit test can succeed.
Strategy:
1. Activate the chain binding (Bind & Activate, chain level) so Sentinel can see it.
   - Safety: ensure the Phase 1 logical flags (Deposit/Withdraw) remain UNCHECKED. Sentinel syncs, but users cannot operate.
2. Perform the "User Deposit Test" (see Section 3) with an internal test account.
3. Once verified, enable the logical flags (Phase 4).
Phase 4: Public Launch (Go Live)
1. Navigate: Admin -> Assets -> UNI.
2. Action: Check [x] Can Allow Deposit, [x] Can Allow Withdraw.
3. Click: Save.
   - Result: users can now see their deposit address and transact.
Note: Risk Parameters (Fee, Min Deposit) are Chain-Specific. If you list USDT on both ETH and TRON, you must configure them separately for each chain (e.g., ETH Fee = 5.0, TRON Fee = 1.0).
3. Verification
Verification A: User Deposit (Hot Test)
1. Ask a test user to deposit UNI to their existing ETH address.
   - Note: the user does NOT need to generate a new address.
2. Wait 1-2 minutes (block confirmation).
3. Check Admin -> Deposits: a + UNI record should appear.
Verification B: System Log
1. Check the Sentinel logs: [ETH] New asset watched: UNI (0x1f98...).
4. FAQ
Q: Do users need to regenerate their deposit address?
A: No. Every asset on the ETH chain shares the user's single ETH deposit address. The system distinguishes UNI from USDT automatically by the contract address.
Q: What if the wrong contract address was entered?
A: The Verify On-Chain step will fail (the decimals fetch errors or returns 0). If a wrong address was force-saved, immediately set that binding to Disabled in Admin, then add the correct one.
0x10 Web Frontend Outsourcing Specification
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 📝 RFP / Requirements Spec Goal: Develop a production-grade cryptocurrency exchange frontend.
1. Project Overview
We are looking for a professional development team to build the web frontend for Zero X Infinity, a high-performance cryptocurrency exchange.
Core Requirement: The frontend must be fast, responsive, and visually premium (similar to Binance/Bybit Pro implementations).
Technology Stack: Open Choice (Developer proposes stack).
- Recommended: React, Vue 3, or Svelte.
- Requirement: Must produce static assets manageable by Nginx/Docker.
2. Scope of Work
2.1 Core Pages
| Page | Features | Backend Status |
|---|---|---|
| Home / Landing | Market overview, Tickers, “Start Trading” CTA. | ⚠️ Mock Data (Public API part ready) |
| Authentication | Login, Register, Forgot Password. | ✅ Ready (Phase 0x10.6 Implemented) |
| Trading Interface | (Core) K-Line Chart, OrderBook, Trade History, Order Form. | ✅ Ready (Full API Support) |
| Assets / Wallet | Balance overview, Deposit, Withdrawal, Asset History. | ⚠️ Partial (Read Only ready; Dep/Wdw Pending) |
| User Center | API Key management, Password reset, Activity log. | ✅ Backend Ready (UI Pending) |
2.2 Key Features & Requirements
A. Trading Interface (Critical)
- Layout: 3-column classic layout (Left: Orderbook, Mid: Chart, Right: Trade History/Forms).
- Chart: Integration with TradingView Charting Library (or Lightweight Charts).
- OrderBook: Visual depth representation, clickable price to fill order form.
- Responsiveness: Must work flawlessly on Desktop (1080p+) and Mobile.
B. Technical Constraints
- NO FLOATING POINT MATH: all precision handling must use String or BigInt arithmetic.
  - Backend sends: "123.45670000" (String).
  - Frontend displays: fixed precision per asset config.
- WebSocket Push: Market data is pushed via WebSocket. Frontend must handle reconnection and heartbeat.
- Ed25519 Authentication:
  - API requests require an X-Signature header.
  - Frontend must sign the payload using an Ed25519 private key (stored in memory/session).
  - Note: if a standard password login flow is used, the backend may handle session cookies, but client-side signing is required for high-security actions or "API-Key mode". (Clarification: the MVP will use an opaque Session Token returned by the API, as a standard HTTP-only Cookie or Bearer Token. Ed25519 is for API clients; the Web UI can use the session wrapper.)
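To make the no-float constraint concrete, here is a minimal sketch of float-free parsing and formatting. It is shown in Python for brevity; a JS/TS frontend would use BigInt or a decimal library, and the 8-decimal SCALE is an assumed precision config, not a project constant.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly:
assert 0.1 + 0.2 != 0.3

SCALE = 10 ** 8  # assumed display precision of 8 decimals

def to_units(s: str) -> int:
    """Parse an amount string like "123.45670000" into integer base
    units without ever touching binary floats."""
    return int(Decimal(s) * SCALE)

def to_display(units: int) -> str:
    """Format integer base units back to a fixed-precision string."""
    whole, frac = divmod(units, SCALE)
    return f"{whole}.{frac:08d}"

assert to_units("123.45670000") == 12_345_670_000
assert to_display(to_units("123.45670000")) == "123.45670000"
```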
3. Deliverables
- Source Code: Full git repository history.
- Docker Support: Dockerfile for a multi-stage build (Node build -> Nginx alpine).
- Documentation:
  - README.md: build & run instructions.
  - CONFIG.md: environment variable reference.
- Mock Server: simple mock logic or fixtures for UI testing without the full backend.
4. Resources provided
- API Documentation: Swagger UI / OpenAPI Spec (See Section 6.1)
- WebSocket Protocol: Docs
- UI/UX References: Binance, Kraken Pro.
5. API Inventory (Current Available)
The following APIs are implemented and available for frontend integration.
5.1 Public Market Data
Base URL: /api/v1/public
| Endpoint | Method | Description | Status |
|---|---|---|---|
| /exchange_info | GET | Server time, limits | ✅ Ready |
| /assets | GET | List supported assets | ✅ Ready |
| /symbols | GET | List trading pairs | ✅ Ready |
| /depth | GET | Order book depth | ✅ Ready |
| /klines | GET | OHLCV candles | ✅ Ready |
| /trades | GET | Public trade history | ✅ Ready |
5.2 Private Trading (Requires Signature)
Base URL: /api/v1/private
| Endpoint | Method | Description | Status |
|---|---|---|---|
| /order | POST | Place limit/market order | ✅ Ready |
| /cancel | POST | Cancel order | ✅ Ready |
| /orders | GET | List open/history orders | ✅ Ready |
| /order/{id} | GET | Get single order details | ✅ Ready |
| /trades | GET | User trade history | ✅ Ready |
| /balances | GET | Get specific asset balance | ✅ Ready |
| /balances/all | GET | Get all asset balances | ✅ Ready |
5.3 WebSocket Real-time Stream
Endpoint: ws://host:port/ws
| Channel | Type | Description | Status |
|---|---|---|---|
| order.update | Private | Order status change | ✅ Ready (Authenticated) |
| trade | Private | User trade execution | ✅ Ready (Authenticated) |
| balance.update | Private | Balance change | ✅ Ready (Authenticated) |
| market.depth | Public | Orderbook updates | ✅ Ready |
| market.ticker | Public | 24h Ticker updates | ✅ Ready |
| market.trade | Public | Public trade stream | ✅ Ready |
5.4 Authentication & User
| Feature | Description | Status |
|---|---|---|
| Sign-up/Login | User registration & JWT | ✅ Ready (Implemented) |
| User Profile | KYC, Password reset | ⚠️ Partial (Password Reset Ready) |
| API Keys | Manage API keys | ✅ Ready (Implemented) |
6. Development Resources
6.1 How to Access API Documentation
The backend provides auto-generated OpenAPI 3.0 documentation.
Step 1: Start the Backend (Mock Mode)
# Clone repository
git clone https://github.com/gjwang/zero_x_infinity
cd zero_x_infinity
# Run Gateway (requires Rust installed)
cargo run --release -- --gateway --port 8080
Step 2: Access Documentation
- Interactive Swagger UI: http://localhost:8080/docs
- Raw OpenAPI JSON: http://localhost:8080/api-docs/openapi.json
Step 3: Generate Client SDK
You can use openapi-generator-cli to generate a robust client:
npx @openapitools/openapi-generator-cli generate \
-i http://localhost:8080/api-docs/openapi.json \
-g typescript-axios \
-o ./src/api
🇨🇳 中文
📅 状态: 📝 外包需求文档 (RFP) 目标: 开发一套生产级的加密货币交易所 Web 前端。
1. 项目概览
我们需要一个专业团队为 Zero X Infinity 高性能交易所开发 Web 前端。
核心要求: 界面必须 快速、响应式且具备高级感(对标 Binance/Bybit 专业版体验)。
技术栈: 不限 (由开发方提案)。
- 推荐: React, Vue 3, 或 Svelte。
- 要求: 最终产物必须是静态文件,可由 Nginx/Docker 托管。
2. 工作范围
2.1 核心页面
| 页面 | 功能点 | 后端状态 |
|---|---|---|
| 首页 | 市场概览, 推荐币种, “开始交易”引导 | ⚠️ Mock 数据 (部分公有API就绪) |
| 认证模块 | 登录, 注册, 找回密码 | ✅ 后端就绪 (Phase 0x10.6 已完成) |
| 交易界面 | (核心) K线图, 盘口, 最新成交, 下单面板 | ✅ 完全就绪 (API 齐备) |
| 资产/钱包 | 资产总览, 充值, 提现, 资金流水 | ⚠️ 部分就绪 (仅只读余额; 充提待定) |
| 用户中心 | API Key 管理, 密码修改, 活动日志 | ✅ 后端就绪 (UI 待开发) |
2.2 关键特性与要求
A. 交易界面 (关键)
- 布局: 经典三栏布局 (左: 盘口, 中: K线, 右: 成交/下单)。
- 图表: 集成 TradingView Charting Library (或 Lightweight Charts)。
- 盘口: 带有视觉深度的买卖盘列表,点击价格可填入下单框。
- 响应式: 必须完美适配桌面端 (1080p+) 和移动端浏览器。
B. 技术限制
- 严禁浮点数运算: 所有金额/价格必须使用 String 或 BigInt 处理。
  - 后端下发: "123.45670000" (字符串)。
  - 前端显示: 根据配置的精度进行截断/补零。
- WebSocket 推送: 行情数据通过 WS 推送。前端需处理断线重连和心跳。
- Ed25519 签名 (如需):
- 注: Web 端通常使用 Session Cookie/Token 模式。如涉及客户端签名功能,需支持 Ed25519 算法。
3. 交付物
- 源代码: 完整的 Git 提交记录。
- Docker 支持: Dockerfile (多阶段构建: Node build -> Nginx alpine)。
- 文档:
  - README.md: 构建与运行指南。
  - CONFIG.md: 环境变量说明。
- Mock 服务: 用于 UI 独立开发的 Mock 数据或逻辑。
4. 提供资源
- API 文档: Swagger UI / OpenAPI Spec (见第 6.1 节)
- WebSocket 协议: 文档
- UI/UX 参考: Binance, Kraken Pro.
5. API 清单 (当前可用)
以下 API 已实现并可用于前端集成。
5.1 公开行情数据
基础 URL: /api/v1/public
| 端点 | 方法 | 描述 | 状态 |
|---|---|---|---|
| /exchange_info | GET | 服务器时间, 限制 | ✅ 就绪 |
| /assets | GET | 资产列表 | ✅ 就绪 |
| /symbols | GET | 交易对列表 | ✅ 就绪 |
| /depth | GET | 订单簿深度 | ✅ 就绪 |
| /klines | GET | K线数据 | ✅ 就绪 |
| /trades | GET | 公开成交历史 | ✅ 就绪 |
5.2 私有交易 (需签名)
基础 URL: /api/v1/private
| 端点 | 方法 | 描述 | 状态 |
|---|---|---|---|
| /order | POST | 下单 (限价/市价) | ✅ 就绪 |
| /cancel | POST | 撤单 | ✅ 就绪 |
| /orders | GET | 查询订单 (当前/历史) | ✅ 就绪 |
| /order/{id} | GET | 查询单条订单 | ✅ 就绪 |
| /trades | GET | 用户成交历史 | ✅ 就绪 |
| /balances | GET | 查询特定资产余额 | ✅ 就绪 |
| /balances/all | GET | 查询所有余额 | ✅ 就绪 |
5.3 WebSocket 实时流
端点: ws://host:port/ws
| 频道 | 类型 | 描述 | 状态 |
|---|---|---|---|
| order.update | 私有 | 订单状态变更 | ✅ 就绪 (需鉴权) |
| trade | 私有 | 用户成交通知 | ✅ 就绪 (需鉴权) |
| balance.update | 私有 | 余额变更 | ✅ 就绪 (需鉴权) |
| market.depth | 公开 | 盘口深度更新 | ✅ 就绪 |
| market.ticker | 公开 | 24h Ticker更新 | ✅ 就绪 |
| market.trade | 公开 | 公开成交流 | ✅ 就绪 |
5.4 认证与用户
| 功能 | 描述 | 状态 |
|---|---|---|
| 注册/登录 | 用户注册 & JWT | ✅ 就绪 (已实现) |
| 用户资料 | KYC, 密码重置 | ⚠️ 部分就绪 (支持改密) |
| API Key | 管理 API Key | ✅ 就绪 (已实现) |
6. 开发资源
6.1 如何获取 API 文档
后端提供自动生成的 OpenAPI 3.0 文档。
步骤 1: 启动后端 (Mock 模式)
# 克隆仓库
git clone https://github.com/gjwang/zero_x_infinity
cd zero_x_infinity
# 运行网关 (需要安装 Rust)
cargo run --release -- --gateway --port 8080
步骤 2: 访问文档
- 交互式 Swagger UI: http://localhost:8080/docs
- 原始 OpenAPI JSON: http://localhost:8080/api-docs/openapi.json
步骤 3: 生成客户端 SDK
你可以使用 openapi-generator-cli 生成健壮的客户端代码:
npx @openapitools/openapi-generator-cli generate \
-i http://localhost:8080/api-docs/openapi.json \
-g typescript-axios \
-o ./src/api
0x11 Deposit & Withdraw (Mock Chain)
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📦 Code Changes: View Diff
Core Objective: Implement the Funding Layer (Deposit & Withdraw) using a Mock Chain Architecture to validate asset flows without external blockchain dependencies.
1. Background & Architecture
We have a high-performance Matching Engine (Phase I) and a Product Layer (Accounts/Auth, Phase II). Now we add the Funding Layer to allow assets to enter and leave the system.
1.1 The “Mock Chain” Strategy
Instead of syncing 500GB of Bitcoin data, we implement a Simulator for Phase 0x11.
- Goal: Validate internal logic (Balance Credit, Risk Check, Idempotency).
- Method:
MockBtcChainandMockEvmChaintraits that simulate RPC calls.
graph LR
User[User] -->|API Request| Gateway
Gateway -->|Risk Check| FundingService
FundingService -->|Command| ME[Matching Engine]
FundingService -.->|Simulated RPC| MockChain[Mock Chain Adapter]
MockChain -.->|Callback| FundingService
1.2 Phase Plan
| Chapter | Topic | Status |
|---|---|---|
| 0x11 | Deposit & Withdraw (Mock) | ✅ Completed |
| 0x11-a | Real Chain Integration | 🚧 Under Construction |
2. Core Implementation
2.1 Funding Service (src/funding/service.rs)
The central orchestrator for all funding operations.
- Deposit: Receives “Mock Event”, checks idempotency, credits user balance via matching engine.
- Withdraw: Authenticates user, locks funds in engine, simulates broadcast, updates DB.
2.2 Chain Adapter Trait (src/funding/chain_adapter.rs)
We abstract blockchain specifics behind a trait:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ChainClient: Send + Sync {
async fn generate_address(&self, user_id: i64) -> Result<String, ChainError>;
async fn broadcast_withdraw(&self, to: &str, amount: &str) -> Result<String, ChainError>;
// ... validation methods
}
}
2.3 Database Schema (Migration)
Key tables added in migrations/010_deposit_withdraw.sql:
- deposit_history: tracks incoming transactions (key: tx_hash).
- withdraw_history: tracks outgoing requests (key: request_id).
- user_addresses: maps User <-> Asset <-> Address.
3. Data Flow
3.1 Deposit Flow (Mock)
1. Trigger: POST /internal/mock/deposit { user_id, asset, amount }
2. Idempotency: check whether tx_hash already exists in deposit_history.
3. Engine Execution: send OrderAction::Deposit to the Matching Engine.
4. Result: user balance increases.
#![allow(unused)]
fn main() {
// src/funding/deposit.rs
pub async fn process_deposit(...) {
if db.exists(tx_hash).await? { return Ok(()); }
// Command Engine
engine.execute(Deposit(user_id, asset, amount)).await?;
// Persist
db.insert_deposit(..., "SUCCESS").await?;
}
}
3.2 Withdraw Flow
1. Request: POST /api/v1/private/withdraw/apply
2. Risk Check: 2FA (future), whitelist, balance check.
3. Engine Lock: send OrderAction::WithdrawLock (instant deduction).
4. Broadcast: call mock_chain.broadcast().
5. Finalize: update withdraw_history with tx_hash.
4. Verification
We verified this phase using a comprehensive E2E script.
4.1 Verification Script
Run the master script to verify the full lifecycle:
./scripts/verify_funding_trading_flow.sh
Scenario Covered:
- Register User A & B.
- Deposit BTC to User A (Mock).
- Transfer internal funds.
- Trade (Buy/Sell) to change balances.
- Withdraw USDT from User B.
- Audit: Check DB consistency.
4.2 Security Validation
- Address Validation: strict regex for 0x... (ETH) and 1/3/bc1... (BTC) addresses.
- Internal Auth: mock endpoints protected by X-Internal-Secret.
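The address checks can be sketched as follows. The patterns here are illustrative simplifications (the production validators may be stricter, e.g. with checksum verification):

```python
import re

# Illustrative patterns, not the production validators.
ETH_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")
BTC_RE = re.compile(
    r"^(1|3)[1-9A-HJ-NP-Za-km-z]{25,34}$"   # Base58 legacy / P2SH
    r"|^bc1[02-9ac-hj-np-z]{11,71}$"        # Bech32 (SegWit)
)

def is_valid_address(chain: str, addr: str) -> bool:
    """Return True if the address matches the chain's format."""
    if chain == "ETH":
        return ETH_RE.fullmatch(addr) is not None
    if chain == "BTC":
        return BTC_RE.fullmatch(addr) is not None
    return False

assert is_valid_address("ETH", "0x" + "ab" * 20)
assert not is_valid_address("ETH", "0x123")
```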
Warning
SECURITY ADVISORY: The /internal/mock/deposit endpoint is a major security risk, as it allows direct balance manipulation. It is currently protected by a secret but MUST be removed entirely once the Phase 0x11-a Sentinel (blockchain scanner) is fully integrated and stable.
Summary
Phase 0x11 establishes the “Financial Highways” of the exchange. By using a Mock Chain, we isolated the complex internal logic (Accounting, Risk, Idempotency) from the external chaos of real blockchains.
Key Achievement:
A complete, idempotent Asset Inflow/Outflow system that is “Blockchain Agnostic”.
Next Step:
Phase 0x11-a: Replace the “Mock Adapter” with a “Real Node Sentinel” (Bitcoin Core / Anvil).
🇨🇳 中文
📦 代码变更: 查看 Diff
核心目标:实现 资金层 (Funding Layer) (充值与提现),使用 模拟链架构 (Mock Chain) 来验证资金流转,而不依赖外部区块链环境。
1. 背景与架构
我们已经拥有了高性能的 撮合引擎 (Phase I) 和 产品层 (账户/鉴权, Phase II)。 现在我们需要添加 资金层,允许资产进入和离开系统。
1.1 “Mock Chain” 策略
在 Phase 0x11 中,我们实现一个 模拟器,而不是直接同步 500GB 的比特币数据。
- 目标: 验证内部逻辑 (余额入账、风控检查、幂等性)。
- 方法: MockBtcChain 和 MockEvmChain trait,模拟 RPC 调用。
graph LR
User[用户] -->|API 请求| Gateway
Gateway -->|风控检查| FundingService
FundingService -->|指令| ME[撮合引擎]
FundingService -.->|模拟 RPC| MockChain[Mock Chain 适配器]
MockChain -.->|回调| FundingService
1.2 阶段规划
| 章节 | 主题 | 状态 |
|---|---|---|
| 0x11 | 充值与提现 (Mock) | ✅ 已完成 |
| 0x11-a | 真实链集成 | 🚧 建设中 |
2. 核心实现
2.1 资金服务 (src/funding/service.rs)
资金操作的核心协调器。
- 充值 (Deposit): 接收 “模拟事件”,检查幂等性,通过撮合引擎增加用户余额。
- 提现 (Withdraw): 验证用户,锁定引擎中的资金,模拟广播,更新数据库。
2.2 链适配器接口 (src/funding/chain_adapter.rs)
我们将区块链细节抽象在 Trait 之后:
#![allow(unused)]
fn main() {
#[async_trait]
pub trait ChainClient: Send + Sync {
async fn generate_address(&self, user_id: i64) -> Result<String, ChainError>;
async fn broadcast_withdraw(&self, to: &str, amount: &str) -> Result<String, ChainError>;
// ... 验证方法
}
}
2.3 数据库 Schema (Migration)
migrations/010_deposit_withdraw.sql 新增的关键表:
- deposit_history: 追踪入金 (Key: tx_hash)。
- withdraw_history: 追踪出金 (Key: request_id)。
- user_addresses: 映射 User <-> Asset <-> Address。
3. 数据流
3.1 充值流程 (Mock)
1. 触发: POST /internal/mock/deposit { user_id, asset, amount }
2. 幂等性: 检查 deposit_history 中是否已存在 tx_hash。
3. 引擎执行: 发送 OrderAction::Deposit 给撮合引擎。
4. 结果: 用户余额增加。
#![allow(unused)]
fn main() {
// src/funding/deposit.rs
pub async fn process_deposit(...) {
if db.exists(tx_hash).await? { return Ok(()); }
// Command Engine
engine.execute(Deposit(user_id, asset, amount)).await?;
// Persist
db.insert_deposit(..., "SUCCESS").await?;
}
}
3.2 提现流程
1. 请求: POST /api/v1/private/withdraw/apply
2. 风控: 2FA (规划中), 白名单, 余额检查。
3. 引擎锁定: 发送 OrderAction::WithdrawLock (瞬间扣除)。
4. 广播: 调用 mock_chain.broadcast()。
5. 终结: 更新 withdraw_history 填充 tx_hash。
4. 验证与测试
我们使用全链路 E2E 脚本验证了本阶段功能。
4.1 验证脚本
运行主脚本以验证完整生命周期:
./scripts/verify_funding_trading_flow.sh
覆盖场景:
- 注册 用户 A & B。
- 充值 BTC 给用户 A (模拟)。
- 划转 资金 (Internal Transfer)。
- 交易 (买/卖) 改变余额。
- 提现 USDT (用户 B)。
- 审计: 检查数据库一致性。
4.2 安全性验证
- 地址验证: 针对 0x... (ETH) 和 1/3/bc1... (BTC) 的严格正则校验。
- 内部鉴权: Mock 端点受 X-Internal-Secret 保护。
Caution
安全警告: /internal/mock/deposit 接口存在重大安全隐患,因为它允许直接修改用户余额。虽然目前增加了 Secret 校验,但在 Phase 0x11-a Sentinel(区块链扫描器)完全集成并稳定后,必须彻底移除此接口。
总结
Phase 0x11 建立了交易所的 “资金高速公路”。 通过使用 Mock Chain,我们将复杂的内部逻辑(会计、风控、幂等性)与外部区块链的混乱隔离开来。
关键成就:
一套完整的、幂等的资产流入/流出系统,且做到 “Blockchain Agnostic” (与具体链解耦)。
下一步:
Phase 0x11-a: 将 “Mock Adapter” 替换为 “Real Node Sentinel” (Bitcoin Core / Anvil)。
0x11-a Real Chain Integration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ IMPLEMENTED / QA VERIFIED (Phase 0x11-a Complete) |
|---|---|
| Date | 2025-12-29 |
| Context | Phase 0x11 Extension: From Mock to Reality |
| Goal | Integrate real Blockchain Nodes (Regtest/Testnet) and handle distributed system failures (Re-orgs, Network Partition). |
1. Core Architecture Change: Pull vs Push
The “Mock” phase (0x11) relied on a Push Model (API Call -> Deposit). Real Chain Integration (0x11-a) requires a Pull Model (Sentinel -> DB).
1.1 The Sentinel (New Service)
A dedicated, independent service loop responsible for “watching” the blockchain.
- Block Scanning: polls getblockchaininfo / eth_blockNumber.
- Filter: indexes user_addresses in memory; scans every transaction in new blocks against this filter.
- State Tracking: updates confirmation counts for existing CONFIRMING deposits.
2. Critical Challenge: Re-org (Chain Reorganization)
In a real blockchain, the “latest” block is not final. It can be orphaned.
2.1 Confirmation State Machine
We must expand the Deposit Status flow to handle volatility.
| Status | Confirmations | Action | UI Display |
|---|---|---|---|
| DETECTED | 0 | Log Tx. Do NOT credit balance. | “Confirming (0/X)” |
| CONFIRMING | 1 to (X-1) | Update confirmation count. Check for Re-org (BlockHash mismatch). | “Confirming (N/X)” |
| FINALIZED | >= X | Action: Push OrderAction::Deposit to Pipeline. | “Success” |
Important
X represents the REQUIRED_CONFIRMATIONS parameter. Hardcoding is forbidden; it must be configured per chain.
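The state machine above reduces to a small pure function; `required` stands in for the per-chain REQUIRED_CONFIRMATIONS config value:

```python
def deposit_status(confirmations: int, required: int) -> str:
    """Map a confirmation count to the deposit state machine:
    DETECTED (0) -> CONFIRMING (1..X-1) -> FINALIZED (>= X)."""
    if confirmations <= 0:
        return "DETECTED"      # logged, balance NOT credited
    if confirmations < required:
        return "CONFIRMING"    # still subject to re-org
    return "FINALIZED"         # safe to push OrderAction::Deposit

assert deposit_status(0, 6) == "DETECTED"
assert deposit_status(3, 6) == "CONFIRMING"
assert deposit_status(6, 6) == "FINALIZED"
```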
2.2 Re-org Detection Logic
1. Sentinel remembers Block(Height H) = Hash A.
2. Sentinel scans Height H again later.
3. If Hash != A, a re-org happened.
4. Action: roll back the scan cursor, re-evaluate all affected deposits.
3. Supported Chains (Phase I)
3.1 Bitcoin (The UTXO Archetype)
- Node: bitcoind (Regtest mode).
- Key Challenge: UTXO management. A deposit is not a "balance update"; it is a new Unspent Output.
- Docker: ruimarinho/bitcoin-core:24
3.2 Ethereum (The Account/EVM Archetype) - 🚧 PENDING
- Status: Design Complete, Implementation Pending (Phase 0x11-b).
- Node:
anvil(from Foundry-rs). - Key Challenge: Event Log Parsing. ERC20 deposits are
Transferevents in receipt logs. - Docker:
ghcr.io/foundry-rs/foundry:latest
4. Sentinel Architecture (Detailed)
4.1 BtcSentinel (Implemented)
- getblockhash -> getblock (verbosity 2).
- Iterate outputs (vout): match scriptPubKey against user_addresses.
- Re-org Check: keep a rolling window; if previousblockhash mismatches, trigger a rollback.
4.2 EthSentinel (Planned for 0x11-b)
- eth_getLogs (Topic0 = Transfer).
- Re-org Check: verify the blockHash of confirmed logs.
5. Reconciliation & Safety (The Financial Firewall)
5.1 The “Truncation Protocol”
- Ingress Logic: Deposit_Credited = Truncate(Deposit_Raw, Configured_Precision)
- Residue: the remainder stays in the wallet as "System Dust".
5.2 The Triangular Reconciliation
We verify solvency using three independent data sources:
| Source | Alias | Data Point |
|---|---|---|
| Blockchain RPC | Proof of Assets (PoA) | getbalance() or sum of UTXOs |
| Internal Ledger | Proof of Liabilities (PoL) | SUM(user.available + user.frozen) |
| Transaction History | Proof of Flow (PoF) | SUM(deposits) - SUM(withdrawals) - SUM(fees) |
The Equation: PoA == PoL + SystemProfit
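In integer base units, the solvency check is a single comparison; the amounts below are illustrative, not real reconciliation data:

```python
def reconcile(poa: int, pol: int, system_profit: int) -> bool:
    """Solvency holds when on-chain assets (PoA) equal user
    liabilities (PoL) plus accumulated system profit (fees, dust),
    all expressed in integer base units."""
    return poa == pol + system_profit

# Illustrative figures only.
assert reconcile(poa=1_000_000, pol=950_000, system_profit=50_000)
```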
5.3 Re-org Recovery Protocol
- Shallow Re-org: Sentinel rolls back cursor.
- Deep Re-org (> Max Depth): Manual intervention (Freeze + Clawback).
6. Database Schema Extensions
CREATE TABLE chain_cursor (
chain_id VARCHAR(16) PRIMARY KEY,
last_scanned_height BIGINT NOT NULL,
last_scanned_hash VARCHAR(128) NOT NULL,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
ALTER TABLE deposit_history
ADD COLUMN chain_id VARCHAR(16),
ADD COLUMN block_height BIGINT,
ADD COLUMN block_hash VARCHAR(128),
ADD COLUMN tx_index INT,
ADD COLUMN confirmations INT DEFAULT 0;
7. Configuration: No Hardcoding
All chain-specific parameters (confirmations, reorg depth, dust threshold) must be loaded from YAML.
8. Security: HD Wallet Architecture
8.1 Key Storage
- Cold Storage: Private Key (Mnemonic) offline.
- Hot Server: XPUB only.
8.2 Address Derivation
- BTC: BIP84 (m/84'/0'/0'/0/{index})
- ETH: BIP44 (m/44'/60'/0'/0/{index})
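The two derivation templates can be captured in one helper. This is a sketch of the path construction only, not the actual wallet code (real derivation happens offline from the mnemonic; the hot server holds only the XPUB):

```python
def derivation_path(chain: str, index: int) -> str:
    """Build the per-chain BIP32 derivation path for a user's
    deposit address at the given index."""
    if chain == "BTC":
        return f"m/84'/0'/0'/0/{index}"   # BIP84 (native SegWit)
    if chain == "ETH":
        return f"m/44'/60'/0'/0/{index}"  # BIP44 (coin type 60)
    raise ValueError(f"unsupported chain: {chain}")

assert derivation_path("BTC", 5) == "m/84'/0'/0'/0/5"
```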
8.3 The “Gap Limit” Solution
- Solution: Full Index Scanning. Sentinel loads ALL active allocated addresses from the user_addresses table into a HashSet (memory) or Bloom Filter (future optimization).
- Scanning: scan every block transaction output against this set, ignoring standard gap limits.
9. Future Work (Out of Scope for 0x11-a)
- Bloom Filters: For million-user address matching (Phase 0x12).
- Automated Clawback: For deep re-orgs.
- Multi-Source Validation: Anti-RPC-spoofing.
10. Summary
Phase 0x11-a transitions the Funding System to production-ready blockchain integration.
11. Implementation Status (2025-12-29)
11.1 Completed Features
- Core Funding: DepositService and WithdrawService fully implemented with Integer-Only Persistence (BigInt/i64).
- Sentinel (BTC): basic BtcScanner implemented (polling getblock, HashSet address matching).
- API Layer: deposit/withdraw history APIs fixed (QA-01) and internal auth secured (QA-03).
- Address Validation: strict regex for BTC/ETH addresses (DEF-001).
11.2 Verification & Testing Guide
Run the verified QA suite covering Core, Chaos, and Security scenarios:
bash scripts/run_0x11a_verification.sh
Results:
- Agent B (Core): Address Persistence, Deposit/Withdraw Lifecycle ✅
- Agent A (Chaos): Idempotency, Race Condition Resilience ✅
- Agent C (Security): Address Isolation, Internal Auth ✅
11.3 Known Limitations (Deferred to 0x11-b)
- ETH / ERC20 Support: Real chain integration for Ethereum is pending. `EthScanner` is currently a stub.
- DEF-002 (Sentinel SegWit): The current `bitcoincore-rpc` integration has issues parsing P2WPKH addresses in `regtest`. Sentinel runs but may miss specific SegWit deposits.
- Bloom Filters: Currently using `HashSet` for address matching. Bloom Filters deferred to Phase 0x12 optimizations.
🇨🇳 中文
| 状态 | ✅ 已实施 / QA 验证通过 (Phase 0x11-a 完成) |
|---|---|
| 日期 | 2025-12-29 |
| 上下文 | Phase 0x11 扩展: 从模拟到现实 |
| 目标 | 集成真实区块链节点 (Regtest/Testnet) 并处理分布式系统容错 (链重组、网络分区)。 |
1. 核心架构升级:推 (Push) vs 拉 (Pull)
模拟阶段 (0x11) 依赖 推模式 (API 调用 -> 触发充值)。 真实链集成 (0x11-a) 必须采用 拉模式 (哨兵 -> 被动轮询数据库)。
1.1 哨兵服务 (Sentinel - 新增组件)
一个独立运行的守护进程,负责持续“注视”区块链。
- 区块扫描 (Block Scanning): 轮询 `getblockchaininfo` (BTC) 或 `eth_blockNumber` (ETH)。
- 过滤器 (Filter): 在内存中索引所有 `user_addresses` (HashSet),扫描新块交易时进行快速匹配。
- 状态追踪 (State Tracking): 持续跟进 `CONFIRMING` 状态存款的确认数变化。
2. 核心挑战:链重组 (Chain Re-org)
真实区块链中,“最新” 区块并非最终态。它随时可能被孤立 (Orphaned)。
2.1 确认数状态机 (Confirmation State Machine)
必须扩展存款状态流以处理链的不确定性。
| 状态 | 确认数 | 动作 | UI 显示 |
|---|---|---|---|
| DETECTED (已检测) | 0 | 记录交易,但 绝对不 增加用户余额。 | “确认中 (0/X)” |
| CONFIRMING (确认中) | 1 ~ (X-1) | 更新确认数。检查父哈希以防重组。 | “确认中 (N/X)” |
| FINALIZED (已完成) | >= X | 动作: 向撮合引擎提交 OrderAction::Deposit。 | “成功” |
Important
X 代表 `REQUIRED_CONFIRMATIONS` (所需确认数) 参数。禁止硬编码,必须按链配置。
2.2 重组检测逻辑
- 哨兵记录 `Block(Height H) = Hash A`。
- 哨兵稍后再次扫描 `Height H`。
- 如果 `Hash != A`,说明发生了重组。
- 动作: 回滚扫描游标 (Cursor),重新评估所有受影响的存款。
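上述检测步骤可以用一个极简的内存模型示意(`Cursor` 结构与字段均为假设,并非项目真实实现;真实哨兵维护的是持久化的滚动窗口):

```rust
/// 重组检测示意:哨兵记录每个已扫描高度的区块哈希,
/// 重扫同一高度时若哈希不一致,则回滚游标到前一高度。
struct Cursor {
    scanned: Vec<(u64, String)>, // (height, block_hash) 滚动窗口
}

impl Cursor {
    /// 返回重组后应回滚到的高度;None 表示该高度哈希未变化(无重组)。
    fn check(&self, height: u64, hash_now: &str) -> Option<u64> {
        self.scanned
            .iter()
            .find(|(h, _)| *h == height)
            .and_then(|(h, old)| if old != hash_now { Some(h - 1) } else { None })
    }
}

fn main() {
    let c = Cursor { scanned: vec![(100, "hashA".into()), (101, "hashB".into())] };
    assert_eq!(c.check(101, "hashB"), None);      // 哈希一致,无重组
    assert_eq!(c.check(101, "hashC"), Some(100)); // 哈希变化 → 回滚到 100
}
```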
3. 支持的链 (第一阶段)
3.1 Bitcoin (UTXO 原型)
- 节点: `bitcoind` (Regtest 模式)。
- 挑战: UTXO 管理。比特币存款是新的未花费输出 (UTXO),而非简单的余额变动。
- Docker: `ruimarinho/bitcoin-core:24`
3.2 Ethereum (账户/EVM 原型) - 🚧 待实现
- 状态: 设计完成,等待实现 (Phase 0x11-b)。
- 节点: `anvil` (Foundry-rs)。
- 挑战: Event Log 解析。ERC20 存款体现为 Receipt Log 中的 `Transfer` 事件。
- Docker: `ghcr.io/foundry-rs/foundry:latest`
4. 哨兵架构详解
4.1 BtcSentinel (已实现 - 比特币哨兵)
- `getblockhash` -> `getblock` (Verbosity 2,获取完整交易细节)。
- 遍历输出 `vout`: 将 `scriptPubKey` 与 `user_addresses` 匹配。
- 重组检查: 维护一个滚动窗口。如果 `previousblockhash` 不匹配,触发回滚 (Rollback)。
4.2 EthSentinel (计划中 - 0x11-b)
- `eth_getLogs` (Topic0 = Transfer 事件签名)。
- 重组检查: 检查已确认日志的 `blockHash` 是否变更。
5. 对账与安全 (金融防火墙)
5.1 “截断协议” (The Truncation Protocol)
解决链上浮点数/大整数与系统精度不匹配的问题:
- 入金逻辑: `入账金额 = Truncate(链上原始金额, 系统配置精度)`。
- 系统粉尘 (System Dust): 截断后的余数留在热钱包中,归系统所有,不归属用户。
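“截断协议”本质上是对最小单位金额做整数向下取整。以下为示意实现(函数与参数名为假设,精度值仅作演示):

```rust
/// 将链上原始最小单位金额截断到系统精度。
/// chain_decimals: 链上精度;system_decimals: 系统配置精度。
/// 返回 (入账金额, 系统粉尘),均以链上最小单位表示。
fn truncate_amount(raw: u128, chain_decimals: u32, system_decimals: u32) -> (u128, u128) {
    assert!(system_decimals <= chain_decimals);
    let factor = 10u128.pow(chain_decimals - system_decimals);
    let credited = raw / factor * factor; // 入账金额(向下截断)
    let dust = raw - credited;            // 截断余数 = 系统粉尘
    (credited, dust)
}

fn main() {
    // 链上 18 位精度收到 1.23456789... ETH,系统只记 8 位
    let raw: u128 = 1_234_567_891_234_567_890;
    let (credited, dust) = truncate_amount(raw, 18, 8);
    assert_eq!(credited, 1_234_567_890_000_000_000);
    assert_eq!(dust, 1_234_567_890);
}
```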
5.2 三角对账策略 (Triangular Reconciliation)
使用三个独立数据源验证系统偿付能力:
| 来源 | 别名 | 数据点 |
|---|---|---|
| 区块链 RPC | 资产证明 (PoA) | getbalance() 或 UTXO 总和 |
| 内部账本 | 负债证明 (PoL) | SUM(user.available + user.frozen) |
| 流水历史 | 流水证明 (PoF) | SUM(充值) - SUM(提现) - SUM(手续费) |
核心对账公式: PoA == PoL + 系统利润
5.3 重组恢复协议
- 浅层重组: 哨兵自动回滚游标。
- 深层重组 (> 最大深度): 触发熔断,需人工介入 (冻结提现 + 资金冲正)。
6. 数据库模式扩展
CREATE TABLE chain_cursor (
chain_id VARCHAR(16) PRIMARY KEY, -- 'BTC', 'ETH'
last_scanned_height BIGINT NOT NULL,
last_scanned_hash VARCHAR(128) NOT NULL,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
ALTER TABLE deposit_history
ADD COLUMN chain_id VARCHAR(16),
ADD COLUMN confirmations INT DEFAULT 0;
-- (其他字段省略)
7. 配置:拒绝硬编码
所有特定于链的参数(确认数、重组深度、最小入金阈值)必须从 YAML 配置文件加载。
8. 安全:HD 钱包架构
8.1 密钥存储
- 冷存储 (离线): 私钥/助记词永远离线保存。
- 热服务 (在线): 仅部署 扩展公钥 (XPUB)。
8.2 地址派生
- BTC: BIP84 (原生 SegWit)
m/84'/0'/0'/0/{index} - ETH: BIP44
m/44'/60'/0'/0/{index}
8.3 “Gap Limit” 解决方案
- 问题: 标准钱包在连续 20 个空地址后停止扫描。
- 方案: 全索引扫描。哨兵将 `user_addresses` 表中所有活跃地址加载到 HashSet (当前实现) 或 Bloom Filter (未来优化),无视 Gap Limit。
9. 未来工作 (本次范围之外)
- Bloom Filters: 百万级用户地址匹配优化。
- 自动冲正 (Automated Clawback): 针对深层重组的自动化处理。
- 多源验证: 对抗单一 RPC 节点被劫持的风险。
10. 总结
Phase 0x11-a 将资金系统从模拟环境升级为生产就绪的区块链集成架构。
11. 实施状态报告 (2025-12-29)
11.1 已完成功能
- 核心资金流: `DepositService` / `WithdrawService` 实现,并严格遵守整型持久化 (BigInt/i64)。
- 哨兵 (BTC): 基础 `BtcScanner` 已上线 (轮询 `getblock`,`HashSet` 地址匹配)。
- API 层: 充提历史接口已修复 (QA-01),内部 mock 接口已加固 (QA-03)。
- 地址校验: 实现 BTC/ETH 下的严格格式正则校验 (DEF-001)。
11.2 验证与测试指南
运行全量验证套件 (包含 Core/Chaos/Security 测试):
bash scripts/run_0x11a_verification.sh
验证结果:
- Agent B (Core): 地址持久化, 充提生命周期 ✅
- Agent A (Chaos): 幂等性, 竞态条件鲁棒性 ✅
- Agent C (Security): 地址隔离, 内部接口鉴权 ✅
11.3 已知限制 (推迟至 0x11-b)
- ETH / ERC20 支持: Ethereum 的真实链集成尚未实现 (Pending)。`EthScanner` 目前仅为 Stub。
- DEF-002 (Sentinel SegWit): 当前 `bitcoincore-rpc` 集成在 `regtest` 环境下解析 P2WPKH 地址存在问题,可能会漏掉隔离见证存款。
- Bloom Filter: 当前版本使用 `HashSet` 进行地址匹配,Bloom Filter 优化推迟至 Phase 0x12。
0x11-b Sentinel Hardening
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ COMPLETE (Core) |
|---|---|
| Date | 2025-12-30 |
| Context | Phase 0x11-a Extension: Hardening Sentinel for Production |
| Goal | Fix SegWit blindness (DEF-002), implement ETH/ERC20 & ADR-005/006. |
| Branch | 0x11-b-sentinel-hardening |
| Latest Commit | d307e12 |
1. Objectives
This phase addresses the critical gaps identified during Phase 0x11-a QA:
| Priority | Issue | Description |
|---|---|---|
| P0 | DEF-002 | Sentinel fails to detect P2WPKH (SegWit) deposits on BTC. |
| P1 | ETH Gap | EthScanner is a stub; no real ERC20 event parsing. |
2. Deposit Flow Architecture
Important
🚨 Production Risk Control Requirements
Before crediting user balance on finalization, deposits SHOULD pass through:
- Source Verification - Check if sender address is on sanctions/blacklist
- Amount Thresholds - Large deposits may require enhanced verification
- Pattern Analysis - Detect unusual deposit patterns (structuring, layering)
- AML Compliance - Regulatory reporting for threshold amounts
- Address Attribution - Verify expected vs actual funding sources
The current implementation credits balance automatically on finalization.
2.1 Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ Sentinel Deposit Flow │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ BTC/ETH │───▶│ ChainScanner │───▶│ Confirmation │───▶│ Deposit │ │
│ │ Node │ │ │ │ Monitor │ │ Pipeline │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ ▲ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ ScannedBlock│ │ deposit_ │ │ balances_tb │ │
│ │ │ + Deposits │ │ history │ │ (Balance) │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ DB DB │
└───────┴─────────────────────────────────────────────────────────────────────┘
2.2 State Machine
DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
│ │
└───────── ORPHANED ◀──────────┘
(Re-org detected)
| Status | Meaning | Balance Impact |
|---|---|---|
| DETECTED | On-chain detected, awaiting confirmation | ❌ |
| CONFIRMING | 1+ confirmations, not yet finalized | ❌ |
| FINALIZED | Required confirmations reached | 🔄 Processing |
| SUCCESS | Balance credited | ✅ |
| ORPHANED | Block re-orged, tx invalidated | ❌ |
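The transitions above can be modeled as a pure function over the deposit status. This is an illustrative sketch, not the project's actual `ConfirmationMonitor` logic; the required confirmation count and the re-org signal are assumed inputs:

```rust
/// Deposit states from the table above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DepositStatus { Detected, Confirming, Finalized, Success, Orphaned }

/// One monitoring step: re-orgs always win, then confirmation-count
/// thresholds drive forward progress. Simplified (no DB, no re-org window).
fn advance(status: DepositStatus, confirmations: u32, required: u32, reorged: bool) -> DepositStatus {
    use DepositStatus::*;
    if reorged {
        return Orphaned; // block hash changed under us
    }
    match status {
        Detected | Confirming if confirmations >= required => Finalized,
        Detected if confirmations >= 1 => Confirming,
        Finalized => Success, // pipeline has credited the balance
        s => s,
    }
}

fn main() {
    use DepositStatus::*;
    assert_eq!(advance(Detected, 0, 6, false), Detected);
    assert_eq!(advance(Detected, 1, 6, false), Confirming);
    assert_eq!(advance(Confirming, 6, 6, false), Finalized);
    assert_eq!(advance(Finalized, 6, 6, false), Success);
    assert_eq!(advance(Confirming, 3, 6, true), Orphaned);
}
```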
2.3 Key Components
| Component | File | Responsibility |
|---|---|---|
| `BtcScanner` | `src/sentinel/btc.rs` | Scan BTC blocks, extract P2PKH/P2WPKH addresses |
| `EthScanner` | `src/sentinel/eth.rs` | Scan ETH blocks via JSON-RPC |
| `ConfirmationMonitor` | `src/sentinel/confirmation.rs` | Track confirmations, detect re-orgs |
| `DepositPipeline` | `src/sentinel/pipeline.rs` | Credit balance on finalization |
2.4 Database Schema
deposit_history (Deposit Records):
tx_hash VARCHAR PRIMARY KEY -- Transaction hash
user_id BIGINT -- User ID
asset VARCHAR -- Asset (BTC/ETH)
amount DECIMAL -- Amount
chain_id VARCHAR -- Chain ID
block_height BIGINT -- Block height
block_hash VARCHAR -- Block hash (for re-org detection)
status VARCHAR -- Status (see state machine)
confirmations INT -- Current confirmation count
3. Withdraw Flow Architecture
Caution
⛔ Production Risk Control Requirements ⛔
The current implementation is for MVP/Testing only. Before production deployment, withdrawals MUST pass through:
- Comprehensive Risk Engine - Real-time fraud detection, velocity limits, address blacklist
- Manual Review - Large amounts require human approval
- Multi-signature Approval - Hot wallet threshold triggers cold wallet multi-sig
- AML/KYC Verification - Regulatory compliance checks
- Delay Mechanism - Suspicious transactions held for review period
Never deploy the current auto-approval flow to production!
3.1 Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ Withdraw Flow (Push Model) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ User │───▶│ WithdrawServ │───▶│ Balance │───▶│ Chain │ │
│ │ Request │ │ ice │ │ Deduct │ │ Broadcast │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ Validate │ │ withdraw_ │ │ TX Hash │ │
│ │ │ Address │ │ history │ │ or Fail │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ DB ▼ │
│ │ ┌─────────────────────────────────┐ │
│ │ │ On Fail: AUTO REFUND to balance │ │
│ │ └─────────────────────────────────┘ │
└───────┴─────────────────────────────────────────────────────────────────────┘
3.2 Flow Steps
1. Validate Request
└─▶ Address format ✓, Amount > 0 ✓
2. Lock & Check Balance (FOR UPDATE)
└─▶ available >= amount ? Continue : Error
3. Deduct Balance (Immediate)
└─▶ available -= amount
4. Create Record (PROCESSING)
└─▶ INSERT INTO withdraw_history
5. COMMIT Transaction
└─▶ Balance deducted, record created
6. Broadcast to Chain
├─▶ Success: UPDATE status = 'SUCCESS', tx_hash = ?
└─▶ Failure: AUTO REFUND + status = 'FAILED'
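The deduct-then-refund ordering of steps 3–6 can be sketched in isolation. This is a simplified in-memory model; the `Balance` struct and broadcast closure are stand-ins, not the real `WithdrawService` or `ChainClient`:

```rust
/// In-memory stand-in for the user's available balance row.
struct Balance { available: i64 }

/// Deduct first, then broadcast; on broadcast failure the deduction
/// is automatically refunded (steps 3 and 6 of the flow above).
fn withdraw<F>(bal: &mut Balance, amount: i64, broadcast: F) -> Result<String, String>
where
    F: Fn() -> Result<String, String>,
{
    if amount <= 0 || bal.available < amount {
        return Err("insufficient balance".into());
    }
    bal.available -= amount; // step 3: immediate deduction
    match broadcast() {
        Ok(tx_hash) => Ok(tx_hash), // status = SUCCESS
        Err(e) => {
            bal.available += amount; // step 6: AUTO REFUND, status = FAILED
            Err(e)
        }
    }
}

fn main() {
    let mut bal = Balance { available: 100 };
    assert!(withdraw(&mut bal, 40, || Ok("0xabc".into())).is_ok());
    assert_eq!(bal.available, 60);
    // a failed broadcast leaves the balance unchanged overall
    assert!(withdraw(&mut bal, 60, || Err("node down".into())).is_err());
    assert_eq!(bal.available, 60);
}
```

In the real flow the deduction and the `PROCESSING` record are committed in one DB transaction before the broadcast attempt, so a crash between commit and broadcast leaves a reconcilable record rather than lost funds.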
3.3 State Machine
┌──────────────┐
│ PROCESSING │
└──────┬───────┘
│
┌───────────┼───────────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ SUCCESS │ │ FAILED │
│ (✅ TX) │ │(Refunded)│
└──────────┘ └──────────┘
| Status | Meaning | Balance Impact |
|---|---|---|
| PROCESSING | Request submitted, awaiting broadcast | 💰 Deducted |
| SUCCESS | TX broadcast successful | ✅ Completed |
| FAILED | Broadcast failed, auto-refunded | 🔄 Refunded |
3.4 Key Components
| Component | File | Responsibility |
|---|---|---|
| `WithdrawService` | `src/funding/withdraw.rs` | Validate, deduct, broadcast, refund |
| `ChainClient` | `src/funding/chain_adapter.rs` | Blockchain TX broadcast interface |
| `handlers::apply_withdraw` | `src/funding/handlers.rs` | HTTP API endpoint |
3.5 Database Schema
withdraw_history (Withdraw Records):
request_id VARCHAR PRIMARY KEY -- Request UUID
user_id BIGINT -- User ID
asset VARCHAR -- Asset (BTC/ETH)
amount BIGINT -- Amount (scaled integer)
fee BIGINT -- Network fee (scaled integer)
to_address VARCHAR -- Destination address
status VARCHAR -- PROCESSING/SUCCESS/FAILED
tx_hash VARCHAR -- Blockchain TX hash (on success)
created_at TIMESTAMP -- Created time
updated_at TIMESTAMP -- Updated time
3.6 Amount Calculation
User Balance Delta = -Request Amount
Network Receive = Request Amount - Fee
Example:
- User requests withdraw 1.0 BTC with 0.0001 BTC fee
- Balance deducted: 1.0 BTC
- Network receives: 0.9999 BTC
4. 🛡️ Tiered Risk Control Framework (Defense in Depth)
4.1 Defense Layers
┌─────────────────────────────────────────────────────────────────────────────┐
│ Defense in Depth Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: 🟢 AUTOMATED │
│ ├─▶ Address blacklist/sanctions check │
│ ├─▶ Velocity limits (per hour/day/week) │
│ └─▶ Basic fraud pattern detection │
│ │
│ Layer 2: 🟡 THRESHOLD-BASED │
│ ├─▶ Amount > $1K: Enhanced verification │
│ ├─▶ Amount > $10K: 24-hour delay + notification │
│ └─▶ Amount > $50K: Requires Layer 3 │
│ │
│ Layer 3: 🔴 MANUAL REVIEW │
│ ├─▶ Human analyst verification │
│ ├─▶ Source of funds documentation │
│ └─▶ Multi-party approval (2-of-3) │
│ │
│ Layer 4: ⚫ COLD WALLET MULTI-SIG │
│ ├─▶ Amount > $100K: Cold wallet release │
│ ├─▶ Hardware key requirement │
│ └─▶ Geographic distribution of signers │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Risk Tiers by Amount
| Tier | Amount | Delay | Approval | Wallet |
|---|---|---|---|---|
| 🟢 T1 | < $1,000 | None | Auto | Hot |
| 🟡 T2 | $1K - $10K | 1 hour | Auto + Alert | Hot |
| 🟠 T3 | $10K - $50K | 24 hours | 1-of-2 Manual | Hot |
| 🔴 T4 | $50K - $100K | 48 hours | 2-of-3 Manual | Warm |
| ⚫ T5 | > $100K | 72 hours | 3-of-5 + HSM | Cold |
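Tier selection itself is a simple threshold lookup. A sketch, assuming half-open USD boundaries (which tier owns an exact boundary amount is a policy decision the table above does not pin down; thresholds would live in config, not code):

```rust
/// Map a withdrawal amount (whole USD) to a risk tier from the table.
/// Boundaries are assumed half-open: $10,000 lands in T3, not T2.
fn risk_tier(amount_usd: u64) -> &'static str {
    match amount_usd {
        0..=999 => "T1",          // auto-approved, hot wallet
        1_000..=9_999 => "T2",    // auto + alert, 1h delay
        10_000..=49_999 => "T3",  // 1-of-2 manual, 24h delay
        50_000..=99_999 => "T4",  // 2-of-3 manual, warm wallet
        _ => "T5",                // 3-of-5 + HSM, cold wallet
    }
}

fn main() {
    assert_eq!(risk_tier(500), "T1");
    assert_eq!(risk_tier(10_000), "T3");
    assert_eq!(risk_tier(250_000), "T5");
}
```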
4.3 Automated Checks (All Tiers)
| Check | Block | Alert |
|---|---|---|
| OFAC/Sanctions list | ✅ | ✅ |
| Address blacklist | ✅ | ✅ |
| Velocity limit exceeded | ✅ | ✅ |
| New address (< 24h) | ⚠️ T2+ | ✅ |
| Unusual amount pattern | ⚠️ Delay | ✅ |
| Geographic anomaly | ⚠️ Delay | ✅ |
4.4 Deposit-Specific Checks
┌────────────────────────────────────────────────────────────────┐
│ Deposit Risk Assessment │
├────────────────────────────────────────────────────────────────┤
│ ✓ Source address attribution (known exchange? mixer? unknown?) │
│ ✓ Transaction graph analysis (1-hop, 2-hop connections) │
│ ✓ Timing pattern (structuring detection) │
│ ✓ Historical behavior baseline │
│ ✓ Cross-chain correlation (same entity on ETH/BTC?) │
└────────────────────────────────────────────────────────────────┘
4.5 Withdraw-Specific Checks
┌────────────────────────────────────────────────────────────────┐
│ Withdraw Risk Assessment │
├────────────────────────────────────────────────────────────────┤
│ ✓ Destination address reputation │
│ ✓ First-time address penalty │
│ ✓ Account age vs amount ratio │
│ ✓ Recent password/2FA changes (48h cooldown) │
│ ✓ Device fingerprint verification │
│ ✓ API key usage pattern │
└────────────────────────────────────────────────────────────────┘
5. Problem Analysis: DEF-002 (BTC SegWit Blindness)
5.1 Root Cause
The extract_address function in src/sentinel/btc.rs uses Address::from_script(script, network).
While the rust-bitcoin crate should support P2WPKH scripts (OP_0 <20-byte-hash>), the current implementation may fail due to:
- Network mismatch between the script encoding and the `Network` enum passed.
- Missing feature flags in the `bitcoincore-rpc` dependency.
5.2 Solution
- Verify: Add unit test with raw P2WPKH script construction.
- Fix: If `Address::from_script` fails, manually detect witness v0 scripts:

if script.is_p2wpkh() {
    // Extract 20-byte hash from script[2..22]
    // Construct Address::p2wpkh(...)
}
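If even the crate-level check is unavailable, the raw script bytes can be tested directly: a P2WPKH scriptPubKey is exactly `OP_0` (0x00) followed by a push of 20 bytes (0x14) and the key hash. A self-contained sketch of that fallback (pure byte checks; encoding the extracted hash to a bech32 address would still need the bitcoin crate):

```rust
/// Detect a witness-v0 P2WPKH scriptPubKey and extract its 20-byte
/// key hash. Returns None for any other script shape.
fn p2wpkh_hash(script: &[u8]) -> Option<[u8; 20]> {
    // OP_0 (0x00) + push-20 (0x14) + 20-byte hash = 22 bytes total
    if script.len() == 22 && script[0] == 0x00 && script[1] == 0x14 {
        let mut h = [0u8; 20];
        h.copy_from_slice(&script[2..22]);
        Some(h)
    } else {
        None
    }
}

fn main() {
    let mut script = vec![0x00, 0x14];
    script.extend_from_slice(&[0xab; 20]);
    assert_eq!(p2wpkh_hash(&script), Some([0xab; 20]));
    assert_eq!(p2wpkh_hash(&[0x76, 0xa9]), None); // P2PKH prefix, not witness v0
}
```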
6. Feature Specification: ETH/ERC20 Sentinel
6.1 Architecture
┌─────────────────────────────────────────────────────────────────┐
│ EthScanner │
├─────────────────────────────────────────────────────────────────┤
│ 1. Poll eth_blockNumber (Tip Tracking) │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer]) │
│ 3. Filter: Match log.address (Contract) + topic[2] (To) │
│ 4. Parse: Decode log.data as uint256 amount │
│ 5. Emit: DetectedDeposit { tx_hash, to_address, amount, ... } │
└─────────────────────────────────────────────────────────────────┘
6.2 Key Implementation Details
- Topic0 (Transfer): `keccak256("Transfer(address,address,uint256)")` = `0xddf252ad...`
- Topic1: Sender (indexed)
- Topic2: Recipient (indexed) - match against `user_addresses`
- Data: Amount (uint256, left-padded)
6.3 Precision Handling
| Token | Decimals | Scaling |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |
Important
Token decimals MUST be loaded from `assets_tb`, not hardcoded.
7. Database Schema Extensions
-- EthScanner requires contract address tracking
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- e.g., '0xdAC17F958D2ee523a2206206994597C13D831ec7'
-- Index for fast lookup by contract
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);
8. Configuration: config/sentinel.yaml
eth:
chain_id: "ETH"
network: "anvil" # or "mainnet", "goerli"
rpc:
url: "http://127.0.0.1:8545"
scanning:
required_confirmations: 12
max_reorg_depth: 20
start_height: 0
contracts:
- name: "USDT"
address: "0x..."
decimals: 6
- name: "USDC"
address: "0x..."
decimals: 6
9. Acceptance Criteria
- BTC: Unit test `test_p2wpkh_extraction` passes. ✅ (test_segwit_p2wpkh_extraction_def_002)
- BTC: E2E deposit to `bcrt1...` address is detected and credited. ✅ (Verified via greybox test)
- ETH: Unit test `test_erc20_transfer_parsing` passes. ✅ (7 ETH tests pass)
- ETH: E2E deposit via MockUSDT contract is detected. ⏳ (Pending: ERC20 `eth_getLogs` not yet implemented)
- Regression: All existing Phase 0x11-a tests still pass. ✅ (322 tests)
10. Implementation Status
| Component | Status | Notes |
|---|---|---|
| `BtcScanner` P2WPKH Fix | ✅ Complete | Test test_segwit_p2wpkh_extraction_def_002 passes |
| `EthScanner` Implementation | ✅ Complete | Full JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| Unit Tests | ✅ 22 Pass | All Sentinel tests passing |
| E2E Verification | ⚠️ Partial | Nodes not running during test; scripts ready |
| ERC20 Token Support | 🚧 In Progress | eth_getLogs for Transfer events (Phase 0x11-b scope) |
11. Testing Instructions
Quick Test (Rust Unit Tests)
# Run all Sentinel tests
cargo test --package zero_x_infinity --lib sentinel -- --nocapture
# Run DEF-002 verification test only
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture
# Run ETH Scanner tests only
cargo test sentinel::eth -- --nocapture
Full Test Suite
# Run test script (no nodes required)
./scripts/tests/0x11b_sentinel/run_tests.sh
# Run with node startup (requires docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes
🇨🇳 中文
| 状态 | ✅ 核心功能已完成 |
|---|---|
| 日期 | 2025-12-29 |
| 上下文 | Phase 0x11-a 延续: 强化哨兵服务 |
| 目标 | 修复 SegWit 盲区 (DEF-002) 并实现 ETH/ERC20 支持。 |
| 分支 | 0x11-b-sentinel-hardening |
| 最新提交 | d383b6c |
1. 目标
本阶段解决 Phase 0x11-a QA 中识别的关键缺陷:
| 优先级 | 问题 | 描述 |
|---|---|---|
| P0 | DEF-002 | 哨兵无法检测 BTC P2WPKH (SegWit) 充值。 |
| P1 | ETH 缺口 | EthScanner 只是空壳;无法解析 ERC20 事件。 |
2. 充值流程架构
Important
🚨 生产环境风控要求
在确认完成后为用户入账之前,充值 应该 经过:
- 来源验证 - 检查发送地址是否在制裁/黑名单上
- 金额阈值 - 大额充值可能需要加强验证
- 模式分析 - 检测异常充值模式 (拆分、分层)
- AML 合规 - 超过阈值金额的监管报告
- 地址归属 - 验证预期 vs 实际资金来源
当前实现在确认完成后自动入账。
2.1 概览
┌─────────────────────────────────────────────────────────────────────────────┐
│ Sentinel 充值流程 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ BTC/ETH │───▶│ ChainScanner │───▶│ Confirmation │───▶│ Deposit │ │
│ │ 节点 │ │ 区块扫描器 │ │ Monitor │ │ Pipeline │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ ▲ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ ScannedBlock│ │ deposit_ │ │ balances_tb │ │
│ │ │ 扫描区块 │ │ history │ │ 余额表 │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ 数据库 数据库 │
└───────┴─────────────────────────────────────────────────────────────────────┘
2.2 状态机
DETECTED ──▶ CONFIRMING ──▶ FINALIZED ──▶ SUCCESS
已检测 确认中 已完成 成功
│ │
└───────── ORPHANED ◀──────────┘
已孤立 (区块重组)
| 状态 | 含义 | 余额影响 |
|---|---|---|
| DETECTED | 链上检测到,等待确认 | ❌ |
| CONFIRMING | 有 1+ 确认,尚未达标 | ❌ |
| FINALIZED | 达到所需确认数 | 🔄 处理中 |
| SUCCESS | 已入账到余额 | ✅ |
| ORPHANED | 区块被重组,交易失效 | ❌ |
2.3 关键组件
| 组件 | 文件 | 职责 |
|---|---|---|
| `BtcScanner` | `src/sentinel/btc.rs` | 扫描 BTC 区块,提取 P2PKH/P2WPKH 地址 |
| `EthScanner` | `src/sentinel/eth.rs` | 通过 JSON-RPC 扫描 ETH 区块 |
| `ConfirmationMonitor` | `src/sentinel/confirmation.rs` | 追踪确认数,检测重组 |
| `DepositPipeline` | `src/sentinel/pipeline.rs` | 完成后入账余额 |
2.4 数据库结构
deposit_history (充值记录表):
tx_hash VARCHAR PRIMARY KEY -- 交易哈希
user_id BIGINT -- 用户 ID
asset VARCHAR -- 资产 (BTC/ETH)
amount DECIMAL -- 金额
chain_id VARCHAR -- 链 ID
block_height BIGINT -- 区块高度
block_hash VARCHAR -- 区块哈希 (用于重组检测)
status VARCHAR -- 状态 (见状态机)
confirmations INT -- 当前确认数
3. 提现流程架构
Caution
⛔ 生产环境风控要求 ⛔
当前实现仅用于 MVP/测试。生产部署前,提现请求 必须 经过:
- 完整风控引擎 - 实时欺诈检测、频率限制、地址黑名单
- 人工审核 - 大额提现需人工批准
- 多签审批 - 热钱包阈值触发冷钱包多签
- AML/KYC 验证 - 合规性检查
- 延迟机制 - 可疑交易进入审核等待期
绝对不要将当前自动审批流程部署到生产环境!
3.1 概览
┌─────────────────────────────────────────────────────────────────────────────┐
│ 提现流程 (推送模式) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ 用户 │───▶│ WithdrawServ │───▶│ 余额扣减 │───▶│ 链上广播 │ │
│ │ 请求 │ │ 提现服务 │ │ (立即) │ │ │ │
│ └──────────┘ └──────────────┘ └────────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ │ ▼ ▼ ▼ │
│ │ ┌─────────────┐ ┌───────────┐ ┌─────────────┐ │
│ │ │ 地址验证 │ │ withdraw_ │ │ TX Hash 或 │ │
│ │ │ │ │ history │ │ 失败 │ │
│ │ └─────────────┘ └───────────┘ └─────────────┘ │
│ │ 数据库 ▼ │
│ │ ┌─────────────────────────────────┐ │
│ │ │ 失败时: 自动退款到余额 │ │
│ │ └─────────────────────────────────┘ │
└───────┴─────────────────────────────────────────────────────────────────────┘
3.2 流程步骤
1. 验证请求
└─▶ 地址格式 ✓, 金额 > 0 ✓
2. 锁定并检查余额 (FOR UPDATE)
└─▶ 可用余额 >= 金额 ? 继续 : 错误
3. 扣减余额 (立即)
└─▶ 可用余额 -= 金额
4. 创建记录 (PROCESSING)
└─▶ INSERT INTO withdraw_history
5. 提交事务
└─▶ 余额已扣减,记录已创建
6. 广播到链
├─▶ 成功: UPDATE status = 'SUCCESS', tx_hash = ?
└─▶ 失败: 自动退款 + status = 'FAILED'
3.3 状态机
┌──────────────┐
│ PROCESSING │
│ 处理中 │
└──────┬───────┘
│
┌───────────┼───────────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ SUCCESS │ │ FAILED │
│ 成功 │ │ 失败 │
│ (✅ TX) │ │(已退款) │
└──────────┘ └──────────┘
| 状态 | 含义 | 余额影响 |
|---|---|---|
| PROCESSING | 请求已提交,等待广播 | 💰 已扣减 |
| SUCCESS | 交易广播成功 | ✅ 完成 |
| FAILED | 广播失败,已自动退款 | 🔄 已退款 |
3.4 关键组件
| 组件 | 文件 | 职责 |
|---|---|---|
| `WithdrawService` | `src/funding/withdraw.rs` | 验证、扣减、广播、退款 |
| `ChainClient` | `src/funding/chain_adapter.rs` | 区块链交易广播接口 |
| `handlers::apply_withdraw` | `src/funding/handlers.rs` | HTTP API 端点 |
3.5 数据库结构
withdraw_history (提现记录表):
request_id VARCHAR PRIMARY KEY -- 请求 UUID
user_id BIGINT -- 用户 ID
asset VARCHAR -- 资产 (BTC/ETH)
amount BIGINT -- 金额 (整数缩放)
fee BIGINT -- 网络手续费 (整数缩放)
to_address VARCHAR -- 目标地址
status VARCHAR -- PROCESSING/SUCCESS/FAILED
tx_hash VARCHAR -- 区块链交易哈希 (成功时)
created_at TIMESTAMP -- 创建时间
updated_at TIMESTAMP -- 更新时间
3.6 金额计算
用户余额变化 = -请求金额
链上到账金额 = 请求金额 - 手续费
示例:
- 用户请求提现 1.0 BTC,手续费 0.0001 BTC
- 余额扣减: 1.0 BTC
- 链上到账: 0.9999 BTC
4. 🛡️ 分级纵深防御风控框架
4.1 防御层级
┌─────────────────────────────────────────────────────────────────────────────┐
│ 纵深防御架构 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 第一层: 🟢 自动化检查 │
│ ├─▶ 地址黑名单/制裁名单检查 │
│ ├─▶ 频率限制 (每小时/每天/每周) │
│ └─▶ 基础欺诈模式检测 │
│ │
│ 第二层: 🟡 阈值触发 │
│ ├─▶ 金额 > ¥7K: 加强验证 │
│ ├─▶ 金额 > ¥70K: 24小时延迟 + 通知 │
│ └─▶ 金额 > ¥350K: 进入第三层 │
│ │
│ 第三层: 🔴 人工审核 │
│ ├─▶ 人工分析师验证 │
│ ├─▶ 资金来源证明文件 │
│ └─▶ 多方审批 (2-of-3) │
│ │
│ 第四层: ⚫ 冷钱包多签 │
│ ├─▶ 金额 > ¥700K: 冷钱包释放 │
│ ├─▶ 硬件密钥要求 │
│ └─▶ 签名者地理分布 │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 风险分级 (按金额)
| 层级 | 金额 | 延迟 | 审批 | 钱包 |
|---|---|---|---|---|
| 🟢 T1 | < ¥7,000 | 无 | 自动 | 热 |
| 🟡 T2 | ¥7K - ¥70K | 1小时 | 自动 + 告警 | 热 |
| 🟠 T3 | ¥70K - ¥350K | 24小时 | 1-of-2 人工 | 热 |
| 🔴 T4 | ¥350K - ¥700K | 48小时 | 2-of-3 人工 | 温 |
| ⚫ T5 | > ¥700K | 72小时 | 3-of-5 + HSM | 冷 |
4.3 自动化检查 (所有层级)
| 检查项 | 阻止 | 告警 |
|---|---|---|
| OFAC/制裁名单 | ✅ | ✅ |
| 地址黑名单 | ✅ | ✅ |
| 超过频率限制 | ✅ | ✅ |
| 新地址 (< 24h) | ⚠️ T2+ | ✅ |
| 异常金额模式 | ⚠️ 延迟 | ✅ |
| 地理位置异常 | ⚠️ 延迟 | ✅ |
4.4 充值专项检查
┌────────────────────────────────────────────────────────────────┐
│ 充值风险评估 │
├────────────────────────────────────────────────────────────────┤
│ ✓ 来源地址归属 (已知交易所? 混币器? 未知?) │
│ ✓ 交易图谱分析 (1跳、2跳关联) │
│ ✓ 时序模式 (拆分检测) │
│ ✓ 历史行为基线 │
│ ✓ 跨链关联 (同一实体在 ETH/BTC?) │
└────────────────────────────────────────────────────────────────┘
4.5 提现专项检查
┌────────────────────────────────────────────────────────────────┐
│ 提现风险评估 │
├────────────────────────────────────────────────────────────────┤
│ ✓ 目标地址信誉 │
│ ✓ 首次使用地址惩罚 │
│ ✓ 账户年龄 vs 金额比率 │
│ ✓ 近期密码/2FA变更 (48h冷却) │
│ ✓ 设备指纹验证 │
│ ✓ API密钥使用模式 │
└────────────────────────────────────────────────────────────────┘
5. 问题分析: DEF-002 (BTC SegWit 盲区)
5.1 根因
src/sentinel/btc.rs 中的 extract_address 函数使用 Address::from_script(script, network)。
虽然 rust-bitcoin 库 理论上 支持 P2WPKH 脚本 (OP_0 <20-byte-hash>),但当前实现可能因以下原因失败:
- 脚本编码与传入的 `Network` 枚举不匹配。
- `bitcoincore-rpc` 依赖缺少必要的 feature flags。
5.2 解决方案
- 验证: 添加单元测试,手动构造原始 P2WPKH 脚本。
- 修复: 如果 `Address::from_script` 失败,手动检测 witness v0 脚本:

if script.is_p2wpkh() {
    // 从 script[2..22] 提取 20 字节哈希
    // 构造 Address::p2wpkh(...)
}
6. 功能规格: ETH/ERC20 哨兵
6.1 架构
┌─────────────────────────────────────────────────────────────────┐
│ EthScanner │
├─────────────────────────────────────────────────────────────────┤
│ 1. 轮询 eth_blockNumber (区块高度追踪) │
│ 2. eth_getLogs(fromBlock, toBlock, topics=[Transfer]) │
│ 3. 过滤: 匹配 log.address (合约) + topic[2] (收款人) │
│ 4. 解析: 将 log.data 解码为 uint256 金额 │
│ 5. 产出: DetectedDeposit { tx_hash, to_address, amount, ... } │
└─────────────────────────────────────────────────────────────────┘
6.2 关键实现细节
- Topic0 (Transfer): `keccak256("Transfer(address,address,uint256)")` = `0xddf252ad...`
- Topic1: 发送方 (indexed)
- Topic2: 接收方 (indexed) - 与 `user_addresses` 匹配
- Data: 金额 (uint256, 左填充)
6.3 精度处理
| 代币 | 小数位 | 缩放比例 |
|---|---|---|
| ETH | 18 | amount / 10^18 |
| USDT | 6 | amount / 10^6 |
| USDC | 6 | amount / 10^6 |
Important
代币精度必须从 `assets_tb` 加载,禁止硬编码。
7. 数据库模式扩展
-- EthScanner 需要追踪合约地址
ALTER TABLE assets_tb
ADD COLUMN contract_address VARCHAR(42); -- 例: '0xdAC17F958D2ee523a2206206994597C13D831ec7'
-- 按合约快速查询的索引
CREATE INDEX idx_assets_contract ON assets_tb(contract_address);
8. 配置: config/sentinel.yaml
eth:
chain_id: "ETH"
network: "anvil" # 或 "mainnet", "goerli"
rpc:
url: "http://127.0.0.1:8545"
scanning:
required_confirmations: 12
max_reorg_depth: 20
start_height: 0
contracts:
- name: "USDT"
address: "0x..."
decimals: 6
- name: "USDC"
address: "0x..."
decimals: 6
9. 验收标准
- BTC: 单元测试 `test_p2wpkh_extraction` 通过。 ✅ (test_segwit_p2wpkh_extraction_def_002)
- BTC: E2E 测试中充值到 `bcrt1...` 地址被检测并入账。 ✅ (通过 greybox 测试验证)
- ETH: 单元测试 `test_erc20_transfer_parsing` 通过。 ✅ (7 个 ETH 测试通过)
- ETH: E2E 测试中通过 MockUSDT 合约充值被检测。 ⏳ (待完成: ERC20 `eth_getLogs` 尚未实现)
- 回归: 所有 Phase 0x11-a 现有测试仍然通过。 ✅ (322 个测试)
10. 实施状态
| 组件 | 状态 | 备注 |
|---|---|---|
| `BtcScanner` P2WPKH 修复 | ✅ 已完成 | 测试 test_segwit_p2wpkh_extraction_def_002 通过 |
| `EthScanner` 实现 | ✅ 已完成 | 完整 JSON-RPC (eth_blockNumber, eth_getBlockByNumber, eth_syncing) |
| 单元测试 | ✅ 22 通过 | 所有 Sentinel 测试通过 |
| E2E 验证 | ⚠️ 部分 | 测试时节点未运行;脚本已就绪 |
| ERC20 代币支持 | 🚧 进行中 | eth_getLogs for Transfer events (Phase 0x11-b 范围) |
11. 测试方法
快速测试 (Rust 单元测试)
# 运行所有 Sentinel 测试
cargo test --package zero_x_infinity --lib sentinel -- --nocapture
# 仅运行 DEF-002 验证测试
cargo test test_segwit_p2wpkh_extraction_def_002 -- --nocapture
# 仅运行 ETH Scanner 测试
cargo test sentinel::eth -- --nocapture
完整测试套件
# 运行测试脚本 (无需节点)
./scripts/tests/0x11b_sentinel/run_tests.sh
# 运行测试脚本 (自动启动节点, 需要 docker-compose)
./scripts/tests/0x11b_sentinel/run_tests.sh --with-nodes
Appendix A: Industry Standards Reference
Full Design: See Chains Schema Design for complete schema and industry standards.
Naming Conventions
| Concept | Industry Term | Our Column | Type |
|---|---|---|---|
| Business ID | shortName | chain_slug | VARCHAR |
| EIP-155 ID | chainId | chain_id | INTEGER |
| Native Token | nativeCurrency.symbol | native_currency | VARCHAR |
References
- EIP-155 - Ethereum Chain ID
- ethereum-lists/chains - Chain Registry
- SLIP-0044 - BIP-44 Coin Types
Phase 0x11-b Schema
-- Minimum viable: uses chain_slug only
CREATE TABLE user_addresses (
user_id BIGINT,
asset VARCHAR(32),
chain_slug VARCHAR(32), -- "eth", "btc"
address VARCHAR(255),
PRIMARY KEY (user_id, asset, chain_slug)
);
0x12 Real Trading Verification
🚧 Documentation In Progress
0x13 Market Data Experience
🚧 Documentation In Progress
0x14 Extreme Optimization: Methodology
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
Phase V Keynote Codename: “Metal Mode” Philosophy: “If you can’t measure it, you can’t improve it.”
1. The Performance Ceiling
In the previous chapters, we built a highly reliable exchange core (Phase I-IV). We achieved 1.3M TPS on a single thread using the Ring Buffer architecture. This is “fast enough” for 99% of crypto exchanges.
But for top-tier HFT engines, “Fast Enough” is not enough. We want to hit the physical limits of the CPU and Memory.
1.1 Why “Extreme Optimization”?
| Phase | Focus | Goal |
|---|---|---|
| I-III | Correctness | “Does it work?” |
| IV | Integration | “Does it work end-to-end?” |
| V | Speed | “How fast can it go?” |
In Phase V, we assume correctness is already proven. Our sole focus is performance.
1.2 Why “Metal Mode”?
“Metal Mode” is our internal codename. It means:
- Close to the Metal: We will bypass high-level abstractions and work directly with memory layouts, CPU caches, and SIMD instructions.
- Bare Metal Rust: No unnecessary `clone()`, no hidden `malloc()`, no runtime surprises.
2. The Benchmarking Methodology (Tier 2)
To optimize, we must first measure. But what we measure matters.
2.1 The Problem with Naive Benchmarks
| Benchmark Type | What it Measures | Problem for Optimization |
|---|---|---|
| wrk / curl | HTTP round-trip | Includes OS, network, and kernel noise |
| Unit tests | Function correctness | No performance data |
These are useful for validation (Phase IV), but not for isolation (Phase V).
2.2 Tier 2: Pipeline Benchmarks
We introduce Tier 2 Pipeline Benchmarks:
| Feature | Description |
|---|---|
| No Network I/O | Data is pre-loaded in memory. |
| No Disk I/O | WAL is mocked or in-memory. |
| Pure CPU/Memory | Measures only the “Hot Path”: RingBuffer → UBSCore → ME → Settlement. |
| Deterministic | Same input → Same output → Same timing. |
Goal: Establish the “Red Line” – the current baseline performance under ideal conditions. All future optimizations will be measured against this.
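The Tier 2 idea can be illustrated with a trivial harness: pre-load all input in memory, time only the hot loop, and assert the deterministic result. The summation below is a stand-in for the real RingBuffer → UBSCore → ME → Settlement path, not the project's benchmark code:

```rust
use std::time::Instant;

fn main() {
    // Pre-loaded input: no network or disk I/O inside the timed region
    let orders: Vec<u64> = (0..1_000_000).collect();

    let start = Instant::now();
    let mut acc: u64 = 0;
    for &o in &orders {
        acc = acc.wrapping_add(o); // hot-path stand-in
    }
    let elapsed = start.elapsed();

    // Deterministic: same input => same output, every run
    assert_eq!(acc, 499_999_500_000);

    let tps = orders.len() as f64 / elapsed.as_secs_f64();
    println!("processed {} ops in {:?} ({:.0} ops/sec)", orders.len(), elapsed, tps);
}
```

A real Tier 2 harness would also pin the thread to a core and discard warm-up iterations so the "Red Line" baseline is reproducible across runs.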
🇨🇳 中文
Phase V 基调 内部代号: “Metal Mode” 核心哲学: “无法测量,就无法优化。”
1. 性能天花板
在前几个阶段(Phase I-IV),我们构建了一个高可靠的交易所核心。利用 Ring Buffer 架构,我们在单线程上实现了 130万 TPS。对于 99% 的加密货币交易所来说,这已经“足够快“了。
但对于顶级的 HFT 引擎,“足够快“是不够的。我们要触达 CPU 和内存的物理极限。
1.1 为什么叫 “Extreme Optimization”?
| 阶段 | 关注点 | 目标 |
|---|---|---|
| I-III | 正确性 | “能跑吗?” |
| IV | 集成 | “端到端能跑通吗?” |
| V | 速度 | “能跑多快?” |
在 Phase V,我们假设正确性已经被验证。唯一的焦点是性能。
1.2 为什么叫 “Metal Mode”?
“Metal Mode” 是我们的内部代号,意为:
- 贴近金属 (Close to the Metal):我们将绕过高层抽象,直接操作内存布局、CPU 缓存和 SIMD 指令。
- Bare Metal Rust:没有不必要的 `clone()`,没有隐藏的 `malloc()`,没有运行时惊喜。
2. 基准测试方法论 (Tier 2)
要优化,必须先测量。但测什么至关重要。
2.1 朴素基准测试的问题
| 基准测试类型 | 测量内容 | 优化的问题 |
|---|---|---|
| wrk / curl | HTTP 往返 | 包含操作系统、网络、内核噪声 |
| 单元测试 | 函数正确性 | 没有性能数据 |
这些对于验证 (Phase IV) 有用,但不适合隔离测试 (Phase V)。
2.2 Tier 2: 流水线基准测试 (Pipeline Benchmarks)
我们引入 Tier 2 流水线基准测试:
| 特性 | 描述 |
|---|---|
| 无网络 I/O | 数据预加载在内存中。 |
| 无磁盘 I/O | WAL 被 Mock 或在内存中。 |
| 纯 CPU/内存 | 只测量“热路径“:RingBuffer → UBSCore → ME → Settlement。 |
| 确定性 | 相同输入 → 相同输出 → 相同耗时。 |
目标:建立 “Red Line (红线)” – 理想条件下的当前基线性能。所有后续优化都将以此为基准进行衡量。
0x14-a Benchmark Harness: Test Data Generation
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ IMPLEMENTED / QA VERIFIED (Phase 0x14-a Complete) |
|---|---|
| Date | 2025-12-30 |
| Context | Phase V: Extreme Optimization (Step 1) |
| Goal | Re-implement Exchange-Core test data generation algorithm in Rust and verify correctness against golden data. |
1. Chapter Objectives
| # | Goal | Deliverable |
|---|---|---|
| 1 | Implement LCG PRNG | src/bench/java_random.rs - Java-compatible random generator |
| 2 | Implement Order Generator | src/bench/order_generator.rs - Deterministic order sequence |
| 3 | Verify Correctness | Unit tests that compare generated data with golden_*.csv |
Success Criteria: Generated data matches golden CSV byte-for-byte (same order_id, price, size, uid for each row).
2. Reference Algorithm: LCG PRNG
The Exchange-Core project uses Java’s java.util.Random as its PRNG. We must implement a bit-exact replica.
2.1 Java Random Implementation
#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
seed: u64,
}
impl JavaRandom {
const MULTIPLIER: u64 = 0x5DEECE66D;
const ADDEND: u64 = 0xB;
const MASK: u64 = (1 << 48) - 1;
pub fn new(seed: i64) -> Self {
Self {
seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
}
}
fn next(&mut self, bits: u32) -> i32 {
self.seed = self.seed
.wrapping_mul(Self::MULTIPLIER)
.wrapping_add(Self::ADDEND) & Self::MASK;
(self.seed >> (48 - bits)) as i32
}
pub fn next_int(&mut self, bound: i32) -> i32 {
assert!(bound > 0);
if (bound & (bound - 1)) == 0 {
// Power of two: take the high-order bits directly
return ((bound as i64 * self.next(31) as i64) >> 31) as i32;
}
// Rejection sampling; reproduce Java's signed 32-bit overflow check
// (bits - val + (bound-1) < 0 means overflow -> retry) so results
// stay bit-exact with java.util.Random
loop {
let bits = self.next(31);
let val = bits % bound;
if bits.wrapping_sub(val).wrapping_add(bound - 1) >= 0 {
return val;
}
}
}
pub fn next_long(&mut self) -> i64 {
((self.next(32) as i64) << 32) + self.next(32) as i64
}
pub fn next_double(&mut self) -> f64 {
let a = (self.next(26) as u64) << 27;
let b = self.next(27) as u64;
(a + b) as f64 / ((1u64 << 53) as f64)
}
}
}
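Before comparing against golden CSVs, the LCG itself can be sanity-checked against a couple of widely cited `java.util.Random` outputs for seed 0. A self-contained sketch (the expected constants are the commonly documented Java outputs; treat them as an assumption and verify against a real JVM if in doubt):

```rust
// Standalone sanity check: a compact copy of the LCG compared against
// known `java.util.Random` outputs for seed 0.
struct JavaRandom {
    seed: u64,
}

impl JavaRandom {
    const MULTIPLIER: u64 = 0x5DEECE66D;
    const ADDEND: u64 = 0xB;
    const MASK: u64 = (1 << 48) - 1;

    fn new(seed: i64) -> Self {
        Self { seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK }
    }

    fn next(&mut self, bits: u32) -> i32 {
        self.seed = self
            .seed
            .wrapping_mul(Self::MULTIPLIER)
            .wrapping_add(Self::ADDEND)
            & Self::MASK;
        (self.seed >> (48 - bits)) as i32
    }

    fn next_long(&mut self) -> i64 {
        ((self.next(32) as i64) << 32).wrapping_add(self.next(32) as i64)
    }
}

fn main() {
    // `new java.util.Random(0).nextInt()` is widely cited as -1155484576
    let mut r = JavaRandom::new(0);
    assert_eq!(r.next(32), -1155484576);

    // `new java.util.Random(0).nextLong()` is widely cited as -4962768465676381896
    let mut r = JavaRandom::new(0);
    assert_eq!(r.next_long(), -4962768465676381896);

    println!("LCG matches known Java outputs");
}
```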
2.2 Seed Derivation
Each test session derives its seed from symbol_id and benchmark_seed:
#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
let mut hash: i64 = 1;
hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
hash
}
}
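A quick property check of the formula (a sketch; wrapping arithmetic is used here so that large seeds cannot overflow-panic in debug builds):

```rust
// Property check: the derivation must be deterministic, and distinct
// symbols should yield distinct session seeds (for these small inputs).
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
    let mut hash: i64 = 1;
    hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
    hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
    hash
}

fn main() {
    // Deterministic: identical inputs, identical output.
    assert_eq!(derive_session_seed(1, 1), derive_session_seed(1, 1));
    // Different symbols diverge.
    assert_ne!(derive_session_seed(1, 1), derive_session_seed(2, 1));
    println!("seed derivation is deterministic");
}
```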
3. Golden Data Reference
Location: docs/exchange_core_verification_kit/golden_data/
| File | Records | Seed | Description |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | Margin (futures) contract |
| golden_single_pair_exchange.csv | 11,000 | 1 | Spot exchange |
CSV Format:
phase,command,order_id,symbol,price,size,action,order_type,uid
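A minimal sketch of loading one golden row for comparison. The struct and helper names are illustrative, not the project's actual types; the sample line reuses values from the expected output in Section 6, with hypothetical phase/command/symbol fields:

```rust
// Parse one golden CSV line (header: phase,command,order_id,symbol,
// price,size,action,order_type,uid) into a comparable record.
#[derive(Debug, PartialEq)]
struct GoldenRow {
    phase: String,
    command: String,
    order_id: u64,
    symbol: i32,
    price: u64,
    size: u64,
    action: String,
    order_type: String,
    uid: u64,
}

fn parse_row(line: &str) -> Option<GoldenRow> {
    let f: Vec<&str> = line.split(',').collect();
    if f.len() != 9 {
        return None; // malformed row
    }
    Some(GoldenRow {
        phase: f[0].to_string(),
        command: f[1].to_string(),
        order_id: f[2].parse().ok()?,
        symbol: f[3].parse().ok()?,
        price: f[4].parse().ok()?,
        size: f[5].parse().ok()?,
        action: f[6].to_string(),
        order_type: f[7].to_string(),
        uid: f[8].parse().ok()?,
    })
}

fn main() {
    // Values taken from the expected output in Section 6; the phase,
    // command, and symbol fields here are hypothetical placeholders.
    let row = parse_row("FILL,PLACE,1,241,34386,1,BID,GTC,377").unwrap();
    assert_eq!(row.order_id, 1);
    assert_eq!(row.price, 34386);
    assert_eq!(row.uid, 377);
    assert!(parse_row("bad,row").is_none());
    println!("parsed: {:?}", row);
}
```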
4. Implementation Checklist
- Step 1: Create `src/bench/mod.rs`
- Step 2: Implement `JavaRandom` in `src/bench/java_random.rs`
  - Unit test: verify first 100 random numbers match Java output
- Step 3: Implement `TestOrdersGenerator` in `src/bench/order_generator.rs`
  - Pareto distribution for symbol/user weights
  - Order generation logic (GTC orders for FILL phase)
  - Seed derivation using `Objects.hash` formula
- Step 4: Load and compare with golden CSV
  - `#[test] fn test_golden_single_pair_margin()`
  - `#[test] fn test_golden_single_pair_exchange()`
5. Implementation Results
Note
✅ FILL PHASE: 100% BIT-EXACT MATCH (1,000 orders) ⚠️ BENCHMARK PHASE: Requires matching engine (10,000 orders)
5.1 FILL Phase (Rows 1-1000)
| Field | Match Status | Formula |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4-value averaging |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto user account generation |
5.2 BENCHMARK Phase Analysis
| Component | Status | Notes |
|---|---|---|
| RNG Sequence | ✅ Aligned | nextInt(4) for action FIRST, then nextInt(q_range) |
| Order Selection | ✅ Aligned | Uses orderUids iterator (BTreeMap deterministic) |
| IOC Simulation | ✅ Implemented | Shadow order book with simulate_ioc_match |
| Order Book Feedback | ❌ Gap | Java uses real matcher feedback for lackOfOrders |
Important
BENCHMARK Phase Gap: Java’s `generateRandomOrder` uses `lastOrderBookOrdersSizeAsk/Bid` from the real matching engine (updated in `updateOrderBookSizeStat`). Without a full Rust matching engine, the shadow book diverges from Java’s state.
5.3 Golden Data Scale
| Dataset | FILL | BENCHMARK | Total |
|---|---|---|---|
| golden_single_pair_margin.csv | 1,000 | 10,000 | 11,000 |
| golden_single_pair_exchange.csv | 1,000 | 10,000 | 11,000 |
5.4 Key Implementation Details
- JavaRandom: bit-exact `java.util.Random` LCG
- Seed derivation: `Objects.hash(symbol*-177277, seed*10037+198267)`
- User accounts: `1 + (int)paretoSample` formula
- Currency order: `[978, 840]` based on HashMap bucket index
- CENTRAL_MOVE_ALPHA: `0.01` (not 0.1)
- Shadow Order Book: `ask_orders`/`bid_orders` Vec with O(1) swap_remove
6. Verification Commands
One-Click Verification:
# Run all golden data verification tests
cargo test golden_ -- --nocapture
Detailed Comparison Test:
# Compare first 20 orders against golden CSV with full output
cargo test test_generator_vs_golden_detailed -- --nocapture
All Benchmark Tests:
# Run all tests in the bench module
cargo test bench:: -- --nocapture
Expected Output:
[ 1] ✅ | Golden: id=1, price=34386, size= 1, action=BID, uid=377
[ 2] ✅ | Golden: id=2, price=34135, size= 1, action=BID, uid=110
[ 3] ✅ | Golden: id=3, price=34347, size= 2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size= 1, action=BID, uid=491
7. Fair Benchmark Procedure
Important
Key to Fairness: Generation and Execution must be separated. Java pre-generates all commands into memory before testing.
7.1 Four Phase Separation
Phase 1: Data Pre-generation ───────── ⏸️ Not Timed
Phase 2: FILL (Pre-fill) ───────────── ⏸️ Not Timed
Phase 3: BENCHMARK (Stress) ────────── ⏱️ Timed Phase
Phase 4: Verification ──────────────── ⏸️ Not Timed
7.2 Rust Implementation Spec
#![allow(unused)]
fn main() {
// ✅ Correct: Pre-generate -> Then Execute
let (fill_commands, benchmark_commands) = generator.pre_generate_all();
// Phase 2: FILL (Not Timed)
for cmd in &fill_commands {
exchange.execute(cmd);
}
// Phase 3: BENCHMARK (Timed Only)
let start = Instant::now();
for cmd in &benchmark_commands {
exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}
7.3 Pre-generation Interface
#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
/// Pre-generate all commands for fair benchmarking
pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
let fill_count = self.config.target_orders_per_side * 2;
let benchmark_count = self.config.symbol_messages;
let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
(fill, benchmark)
}
}
}
7.4 Current Status vs ME Requirements
| Task | Current | Needs ME |
|---|---|---|
| Pre-gen Method pre_generate_all() | ✅ | - |
| Generate 3M orders to memory | ✅ | - |
| Export CSV for verification | ✅ | - |
| Execute FILL Phase | - | ✅ |
| Execute BENCHMARK Phase | - | ✅ |
| Global Balance Verification | - | ✅ |
8. Phase 0x14-a Summary
8.1 Completed Components
| Component | Status | Verification |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | Bit-exact with Java |
| Seed Derivation | ✅ | Objects.hash reproduction |
| TestOrdersGenerator | ✅ | FILL 1000 rows 100% matched |
| Shadow OrderBook | ✅ | IOC Simulation implemented |
| Pre-gen Interface | ✅ | pre_generate_all(), pre_generate_3m() |
| Fair Test Procedure Docs | ✅ | Section 7, Appendix B |
8.2 BENCHMARK Phase Gap Analysis
| Cause | Description |
|---|---|
| Matching Engine Feedback | Java uses lastOrderBookOrdersSizeAsk/Bid to decide growOrders. |
| Impact | Command type distribution (GTC vs IOC) differs slightly. |
| Solution | Phase 0x14-b introduces full ME to reach 100% parity. |
8.3 Next Steps
| Priority | Task | Dependency |
|---|---|---|
| P0 | Implement Rust Matching Engine (Phase 0x14-b) | - |
| P1 | 3M Orders Stress Test Verification | Matching Engine |
| P2 | Latency Stats (HdrHistogram) | Matching Engine |
🇨🇳 中文
| 状态 | ✅ 已实施 / QA 验证通过 (Phase 0x14-a 完成) |
|---|---|
| 日期 | 2025-12-30 |
| 上下文 | Phase V: 极致优化 (Step 1) |
| 目标 | 用 Rust 重新实现 Exchange-Core 测试数据生成算法,并对比黄金数据验证正确性。 |
1. 章节目标
| # | 目标 | 交付物 |
|---|---|---|
| 1 | 实现 LCG PRNG | src/bench/java_random.rs - Java 兼容随机数生成器 |
| 2 | 实现订单生成器 | src/bench/order_generator.rs - 确定性订单序列 |
| 3 | 验证正确性 | 单元测试对比生成数据与 golden_*.csv |
成功标准: 生成的数据与黄金 CSV 逐字节匹配(每行的 order_id, price, size, uid 完全一致)。
2. 参考算法: LCG PRNG
Exchange-Core 项目使用 Java 的 java.util.Random 作为 PRNG。我们必须实现一个比特级精确的副本。
2.1 Java Random Implementation
#![allow(unused)]
fn main() {
/// Java-compatible Linear Congruential Generator
pub struct JavaRandom {
seed: u64,
}
impl JavaRandom {
const MULTIPLIER: u64 = 0x5DEECE66D;
const ADDEND: u64 = 0xB;
const MASK: u64 = (1 << 48) - 1;
pub fn new(seed: i64) -> Self {
Self {
seed: (seed as u64 ^ Self::MULTIPLIER) & Self::MASK,
}
}
fn next(&mut self, bits: u32) -> i32 {
self.seed = self.seed
.wrapping_mul(Self::MULTIPLIER)
.wrapping_add(Self::ADDEND) & Self::MASK;
(self.seed >> (48 - bits)) as i32
}
pub fn next_int(&mut self, bound: i32) -> i32 {
    assert!(bound > 0);
    if (bound & bound.wrapping_neg()) == bound {
        // bound is a power of two: take the high bits directly
        return ((bound as i64 * self.next(31) as i64) >> 31) as i32;
    }
    loop {
        let bits = self.next(31);
        let val = bits % bound;
        // Reject samples that would bias the modulo, mirroring
        // Java's signed-overflow retry: `bits - val + (bound-1) < 0`
        if bits.wrapping_sub(val).wrapping_add(bound - 1) >= 0 {
            return val;
        }
    }
}
pub fn next_long(&mut self) -> i64 {
    // wrapping_add matches Java's overflowing 64-bit addition
    ((self.next(32) as i64) << 32).wrapping_add(self.next(32) as i64)
}
pub fn next_double(&mut self) -> f64 {
let a = (self.next(26) as u64) << 27;
let b = self.next(27) as u64;
(a + b) as f64 / ((1u64 << 53) as f64)
}
}
}
2.2 Seed Derivation
Each test session derives its seed from symbol_id and benchmark_seed:
#![allow(unused)]
fn main() {
fn derive_session_seed(symbol_id: i32, benchmark_seed: i64) -> i64 {
let mut hash: i64 = 1;
hash = hash.wrapping_mul(31).wrapping_add((symbol_id as i64).wrapping_mul(-177277));
hash = hash.wrapping_mul(31).wrapping_add(benchmark_seed.wrapping_mul(10037).wrapping_add(198267));
hash
}
}
3. 黄金数据参考
位置: docs/exchange_core_verification_kit/golden_data/
| 文件 | 记录数 | Seed | 描述 |
|---|---|---|---|
| golden_single_pair_margin.csv | 11,000 | 1 | 保证金(期货)合约 |
| golden_single_pair_exchange.csv | 11,000 | 1 | 现货交易 |
4. 实施清单
- 步骤 1: 创建 `src/bench/mod.rs`
- 步骤 2: 在 `src/bench/java_random.rs` 中实现 `JavaRandom`
  - 单元测试: 验证前 100 个随机数与 Java 输出匹配
- 步骤 3: 在 `src/bench/order_generator.rs` 中实现 `TestOrdersGenerator`
  - Pareto 分布用于用户权重
  - 订单生成逻辑 (FILL 阶段的 GTC 订单)
  - 使用 `Objects.hash` 公式进行种子派生
- 步骤 4: 加载并对比黄金 CSV
  - `#[test] fn test_golden_single_pair_margin()`
  - `#[test] fn test_golden_single_pair_exchange()`
5. 实现结果
Note
✅ FILL 阶段: 100% 比特精确匹配 (1,000 订单) ⚠️ BENCHMARK 阶段: 需要匹配引擎 (10,000 订单)
5.1 FILL 阶段 (行 1-1000)
| 字段 | 匹配状态 | 公式 |
|---|---|---|
| Price | ✅ 100% | pow(r,2)*deviation + 4 值平均 |
| Size | ✅ 100% | 1 + rand(6)*rand(6)*rand(6) |
| Action | ✅ 100% | (rand(4)+priceDir>=2) ? BID : ASK |
| UID | ✅ 100% | Pareto 用户账户生成 |
5.2 BENCHMARK 阶段分析
| 组件 | 状态 | 说明 |
|---|---|---|
| RNG 序列 | ✅ 已对齐 | nextInt(4) action 优先,然后 nextInt(q_range) |
| 订单选择 | ✅ 已对齐 | 使用 orderUids 迭代器 (BTreeMap 确定性) |
| IOC 模拟 | ✅ 已实现 | 影子订单簿 simulate_ioc_match |
| 订单簿反馈 | ❌ 缺口 | Java 使用真实匹配引擎反馈 lackOfOrders |
Important
BENCHMARK 阶段缺口: Java 的 `generateRandomOrder` 使用真实匹配引擎的 `lastOrderBookOrdersSizeAsk/Bid`(在 `updateOrderBookSizeStat` 中更新)。没有完整的 Rust 匹配引擎,影子订单簿会与 Java 状态分歧。
5.3 关键实现细节
- JavaRandom: 比特级精确的 `java.util.Random` LCG
- 种子派生: `Objects.hash(symbol*-177277, seed*10037+198267)`
- 用户账户: `1 + (int)paretoSample` 公式
- 货币顺序: `[978, 840]` 基于 HashMap bucket 索引
- CENTRAL_MOVE_ALPHA: `0.01` (不是 0.1)
- 影子订单簿: `ask_orders`/`bid_orders` Vec 支持 O(1) swap_remove
6. 验证命令
一键验证:
# 运行所有黄金数据验证测试
cargo test golden_ -- --nocapture
详细对比测试:
# 逐行对比前 20 个订单与黄金 CSV
cargo test test_generator_vs_golden_detailed -- --nocapture
所有 Benchmark 测试:
# 运行 bench 模块的所有测试
cargo test bench:: -- --nocapture
预期输出:
[ 1] ✅ | Golden: id=1, price=34386, size= 1, action=BID, uid=377
[ 2] ✅ | Golden: id=2, price=34135, size= 1, action=BID, uid=110
[ 3] ✅ | Golden: id=3, price=34347, size= 2, action=BID, uid=459
...
[20] ✅ | Golden: id=20, price=34297, size= 1, action=BID, uid=491
7. 公平压测流程 (Fair Benchmark Procedure)
Important
公平比较的关键: 数据生成与执行必须分离。Java 在测试前预生成所有命令到内存。
7.1 四阶段分离
Phase 1: 数据预生成 ───────────── ⏸️ 不计时
Phase 2: FILL (预填充) ──────────── ⏸️ 不计时
Phase 3: BENCHMARK (压测) ──────── ⏱️ 仅此阶段计时
Phase 4: 验证 ────────────────── ⏸️ 不计时
7.2 Rust 实现规范
#![allow(unused)]
fn main() {
// ✅ 正确: 预生成 → 再执行
let (fill_commands, benchmark_commands) = generator.pre_generate_all();
// Phase 2: FILL (不计时)
for cmd in &fill_commands {
exchange.execute(cmd);
}
// Phase 3: BENCHMARK (仅此阶段计时)
let start = Instant::now();
for cmd in &benchmark_commands {
exchange.execute(cmd);
}
let mtps = benchmark_commands.len() as f64 / start.elapsed().as_secs_f64() / 1_000_000.0;
}
7.3 预生成接口
#![allow(unused)]
fn main() {
impl TestOrdersGeneratorSession {
/// Pre-generate all commands for fair benchmarking
pub fn pre_generate_all(&mut self) -> (Vec<TestCommand>, Vec<TestCommand>) {
let fill_count = self.config.target_orders_per_side * 2;
let benchmark_count = self.config.symbol_messages;
let fill: Vec<_> = (0..fill_count).map(|_| self.next_command()).collect();
let benchmark: Vec<_> = (0..benchmark_count).map(|_| self.next_command()).collect();
(fill, benchmark)
}
}
}
7.4 现阶段可完成 vs 需要 ME 集成
| 任务 | 现阶段 | 需 ME |
|---|---|---|
| 预生成接口 pre_generate_all() | ✅ | - |
| 生成 3M 订单到内存 | ✅ | - |
| 导出 CSV 供验证 | ✅ | - |
| 执行 FILL 阶段 | - | ✅ |
| 执行 BENCHMARK 计时 | - | ✅ |
| 全局余额验证 | - | ✅ |
8. Phase 0x14-a 总结
8.1 已完成组件
| 组件 | 状态 | 验证 |
|---|---|---|
| JavaRandom LCG PRNG | ✅ | 与 Java 比特精确 |
| 种子派生算法 | ✅ | Objects.hash 复现 |
| TestOrdersGenerator | ✅ | FILL 1000 行 100% 匹配 |
| 影子订单簿 | ✅ | IOC 模拟实现 |
| 预生成接口 | ✅ | pre_generate_all(), pre_generate_3m() |
| 公平测试流程文档 | ✅ | Section 7, Appendix B |
8.2 BENCHMARK 阶段差异分析
| 原因 | 说明 |
|---|---|
| 匹配引擎反馈 | Java 使用 lastOrderBookOrdersSizeAsk/Bid 决定 growOrders |
| 影响 | 命令类型分布略有不同(GTC vs IOC 比例) |
| 解决方案 | Phase 0x14-b 实现完整匹配引擎后可达 100% |
8.3 下一步
| 优先级 | 任务 | 依赖 |
|---|---|---|
| P0 | 实现 Rust 匹配引擎 (Phase 0x14-b) | - |
| P1 | 3M 订单压测验证 | 匹配引擎 |
| P2 | 延迟统计 (HdrHistogram) | 匹配引擎 |
0x14-b Order Commands: Feature Completion
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
| Status | ✅ COMPLETED |
|---|---|
| Context | Phase V: Extreme Optimization (Step 2) |
| Goal | Achieve feature parity with Exchange-Core’s Spot Matching Engine to support the Benchmark harness. |
| Scope | Spot Only. Margin/Futures deferred to 0x14-c. |
1. Gap Analysis
Based on code review of src/engine.rs, src/models.rs, src/orderbook.rs:
✅ Already Implemented
| Feature | Location | Notes |
|---|---|---|
| MatchingEngine | src/engine.rs | process_order(), match_buy(), match_sell() |
| Price-Time Priority | engine.rs:80-165 | Lowest ask first (buy), highest bid first (sell), FIFO |
| Limit Orders | engine.rs:61-68 | Unfilled remainder rests in book |
| Market Orders | engine.rs:90-94 | u64::MAX price for buy, matches all |
| Order Status | models.rs:57-68 | NEW, PARTIALLY_FILLED, FILLED, CANCELED, REJECTED, EXPIRED |
| OrderBook | orderbook.rs | BTreeMap storage, cancel_order() by ID+price+side |
❌ Missing (Required for 0x14-b)
Based on exchange_core_verification_kit/test_datasets_and_steps.md L162-171 (Command Distribution):
| Feature | Benchmark % | Current Status | Priority |
|---|---|---|---|
| IOC (Immediate-or-Cancel) | ~35% | ❌ Not Implemented | P0 |
| MoveOrder | ~8% | ❌ Not Implemented | P0 |
| ReduceOrder | ~3% | ❌ Not Implemented | P1 |
| FOK_BUDGET | ~1% | ❌ Not Implemented | P2 |
Note: FOK_BUDGET (Fill-or-Kill by Quote Budget) is ~1% of benchmark commands. Required for full S-to-Huge parity.
2. Architectural Requirements
2.1 Data Model Extensions (Schema)
We must extend InternalOrder to support varied execution strategies without polluting the core OrderType.
New Enum: TimeInForce
#![allow(unused)]
fn main() {
pub enum TimeInForce {
GTC, // Good Till Cancel (Default)
IOC, // Immediate or Cancel (Taker only, cancel remainder)
FOK, // Fill or Kill (All or Nothing) - Optional for now
}
}
Updated InternalOrder:
- Add `pub time_in_force: TimeInForce`
- Add `pub post_only: bool` (future-proofing; the generator doesn’t strictly use it yet, but it is good practice)
2.2 Matching Engine Logic
The Matching Engine must process orders sequentially based on seq_id.
Execution Flow:
- Incoming Order: Parse `TimeInForce` and `OrderType`.
- Matching:
  - Limit GTC: Match against opposite book. Remainder -> add to book.
  - Limit IOC: Match against opposite book. Remainder -> expire (do not add to book).
  - Market: Match against opposite book at any price. Remainder -> expire (or defined slippage protection).
- Command Handling:
  - MoveOrder: Atomic “cancel old ID + place new ID”. Priority loss is acceptable (and expected).
  - ReduceOrder: Reduce qty in place. Priority preservation is required if implemented efficiently, else re-insert. Exchange-Core typically preserves priority on reduce.
2.3 FokBudget Handling (Spot)
- Does the generator produce `FokBudget`? -> Checks show mostly `Gtc`/`Ioc`.
- Correction: `CommandType::FokBudget` exists in the generator enum, but its usage is rare in the Spot benchmark. We prioritize IOC and GTC.
3. Developer Specification
3.1 Task List
- Model Update:
  - Modify `src/models.rs`: add `TimeInForce` enum.
  - Update the `InternalOrder` struct.
- Engine Implementation (`src/engine/matching.rs`):
  - Implement `process_order(&mut self, order: InternalOrder) -> OrderResult`.
  - Implement `match_market_order`.
  - Implement `match_limit_order`.
- Command Logic:
  - Implement `reduce_order(price, old_qty, new_qty)`.
  - Implement `move_order` (atomic cancel + place).
3.2 Acceptance Criteria
- Unit Tests:
  - `test_ioc_partial_fill`: 100 qty order vs 60 qty book -> 60 filled, 40 expired.
  - `test_gtc_maker`: 100 qty order vs empty book -> 100 rests in book.
  - `test_market_sweep`: Market order consumes multiple price levels.
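The first two acceptance tests can be illustrated with a toy model of remainder handling. All types here are illustrative sketches, not the project's actual engine API:

```rust
// Toy sketch of the IOC-vs-GTC remainder semantics: an IOC taker for
// qty 100 against 60 resting fills 60 and expires the 40 remainder,
// while a GTC order against an empty book rests in full.
#[derive(PartialEq, Debug)]
enum TimeInForce {
    Gtc,
    Ioc,
}

struct Book {
    resting_qty: u64,
}

/// Returns (filled, rested, expired).
fn take(book: &mut Book, qty: u64, tif: TimeInForce) -> (u64, u64, u64) {
    let filled = qty.min(book.resting_qty);
    book.resting_qty -= filled;
    let remainder = qty - filled;
    match tif {
        TimeInForce::Gtc => (filled, remainder, 0), // remainder rests in book
        TimeInForce::Ioc => (filled, 0, remainder), // remainder expires
    }
}

fn main() {
    // test_ioc_partial_fill scenario
    let mut book = Book { resting_qty: 60 };
    assert_eq!(take(&mut book, 100, TimeInForce::Ioc), (60, 0, 40));

    // test_gtc_maker scenario
    let mut empty = Book { resting_qty: 0 };
    assert_eq!(take(&mut empty, 100, TimeInForce::Gtc), (0, 100, 0));
    println!("IOC/GTC remainder semantics hold");
}
```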
4. QA Verification Plan
- Property: `Ioc` orders must never appear in `all_orders()` (the book) after processing.
- Property: `Gtc` orders must appear in the book if not fully matched.
- Latency: measured `process_order` time is ✅ < 5µs (verified).
5. Implementation Status & Results
Note
✅ Phase 0x14-b: 100% Feature Parity Achieved
5.1 Verification Matrix
| Module | Purpose | Tests | Status |
|---|---|---|---|
| IOC Logic | Immediate-or-Cancel (Taker) | 9/9 | ✅ |
| MoveOrder | Price modification (Atomic) | 7/7 | ✅ |
| ReduceOrder | Qty reduction (Priority Preserved) | 5/5 | ✅ |
| Persistence | Settlement & DB Sync | 5/5 | ✅ |
| Edge Cases | Robustness & Error Handling | 17/17 | ✅ |
| Total | | 43/43 | ✅ 100% |
5.2 Key Technical Findings
- Asynchronous Consistency: Fixed a critical bug where Cancel/Reduce actions bypassed the `MEResult` persistence queue.
- Priority Preservation: Verified that `ReduceOrder` maintains temporal priority, while `MoveOrder` (price change) correctly resets it.
- Reactive Loop: Optimized the matching engine to handle state transitions without synchronous blocking on I/O.
6. Validation Commands
Automated QA Suite:
# Run all 0x14-b specific QA tests
./scripts/test_0x14b_qa.sh --with-gateway
Unit Verification:
cargo test test_ioc_ test_mov_ test_reduce_
🇨🇳 中文
| 状态 | ✅ 已完成 |
|---|---|
| 上下文 | Phase V: 极致优化 (Step 2) |
| 目标 | 实现与 Exchange-Core 现货撮合引擎的功能对齐,以支持基准测试工具。 |
| 范围 | 仅现货。杠杆/期货推迟至 0x14-c。 |
1. 差距分析 (基于 Verification Kit)
基于 exchange_core_verification_kit/test_datasets_and_steps.md L162-171 命令分布:
✅ 已实现
| 功能 | 基准占比 | 说明 |
|---|---|---|
| GTC 限价单 | ~45% | engine.rs::process_order() |
| Cancel 取消 | ~9% | 完整链路: Gateway → Pipeline → OrderBook → WAL |
❌ 需新增
| 功能 | 基准占比 | 优先级 |
|---|---|---|
| IOC 即时单 | ~35% | P0 |
| Move 移动 | ~8% | P0 |
| Reduce 减量 | ~3% | P1 |
| FOK_BUDGET | ~1% | P2 |
说明: FOK_BUDGET (按报价币金额买入) 占比 ~1%,完成 S-to-Huge 全量测试需实现。
2. 架构需求
2.1 数据模型扩展 (Schema)
必须扩展 InternalOrder 以支持多种执行策略。
新枚举: TimeInForce
#![allow(unused)]
fn main() {
pub enum TimeInForce {
GTC, // Good Till Cancel (默认: 一直有效直到取消)
IOC, // Immediate or Cancel (Taker 专用: 剩余未成交部分立即过期)
FOK, // Fill or Kill (全部成交或全部取消) - 暂可选
}
}
更新 InternalOrder:
- 新增 `pub time_in_force: TimeInForce`
- 新增 `pub post_only: bool`(为未来准备,虽然生成器暂时未严格使用)
2.2 撮合引擎逻辑
撮合引擎必须基于 seq_id 顺序处理订单。
执行流:
- 新订单接入: 解析 `TimeInForce` 和 `OrderType`。
- 撮合过程:
  - Limit GTC: 与对手盘撮合。剩余部分 -> 加入订单簿。
  - Limit IOC: 与对手盘撮合。剩余部分 -> 立即过期 (Expire),不入簿。
  - Market: 与对手盘在任意价格撮合。剩余部分 -> 过期 (或滑点保护)。
- 指令处理:
  - MoveOrder: 原子化“取消旧 ID + 下单新 ID”。优先级丢失是可接受的 (且预期的)。
  - ReduceOrder: 原地减少数量。如果实现得当,应保留优先级。Exchange-Core 通常在减量时保留优先级。
2.3 FokBudget 处理 (现货)
- 生成器会产生 `FokBudget` 吗? -> 代码显示主要是 `Gtc`/`Ioc`。
- 修正: `CommandType::FokBudget` 存在于枚举中,但在现货 Benchmark 中极少使用。我们优先保证 IOC 和 GTC 的正确性。
3. 开发规范 (Developer Specification)
3.1 任务清单
- 模型更新:
  - 修改 `src/models.rs`: 增加 `TimeInForce` 枚举。
  - 更新 `InternalOrder` 结构体。
- 引擎实现 (`src/engine/matching.rs`):
  - 实现 `process_order(&mut self, order: InternalOrder) -> OrderResult`。
  - 实现 `match_market_order`(市价撮合)。
  - 实现 `match_limit_order`(限价撮合)。
- 指令逻辑:
  - 实现 `reduce_order(price, old_qty, new_qty)`。
  - 实现 `move_order`(atomic cancel + place)。
3.2 验收标准
- 单元测试:
  - `test_ioc_partial_fill`: 100 qty 订单 vs 60 qty 深度 -> 成交 60, 过期 40。
  - `test_gtc_maker`: 100 qty 订单 vs 空订单簿 -> 100 进入 OrderBook。
  - `test_market_sweep`: 市价单吃掉多个价格档位。
4. QA 验证计划
- 属性: `Ioc` 订单处理后,绝不应出现在 `all_orders()`(订单簿)中。
- 属性: `Gtc` 订单若未完全成交,必须出现在订单簿中。
- 延迟: 测量 `process_order` 处理时间,✅ < 5µs (已验证)。
5. 实施结果与验证
Note
✅ Phase 0x14-b: 100% 功能对齐已完成
5.1 验证矩阵
| 模块 | 目的 | 测试项 | 状态 |
|---|---|---|---|
| IOC 逻辑 | 立即成交或取消 (Taker) | 9/9 | ✅ |
| MoveOrder | 改价指令 (原子化) | 7/7 | ✅ |
| ReduceOrder | 减量指令 (保留优先级) | 5/5 | ✅ |
| 持久化 | 结算与数据库同步 | 5/5 | ✅ |
| 边界测试 | 鲁棒性与错误处理 | 17/17 | ✅ |
| 合计 | | 43/43 | ✅ 100% |
5.2 关键技术点总结
- 异步一致性: 修复了 Cancel/Reduce 操作绕过 `MEResult` 持久化队列的 Bug,确保数据库状态与内存一致。
- 优先级保留: 通过单元测试验证了 `ReduceOrder` 成功保留时间优先级,而 `MoveOrder` (改价) 正确重置了优先级。
- 响应式架构: 优化了撮合引擎的反应循环,确保所有指令都在微秒级完成且具备确定性的副作用路径。
6. 验证命令
一键回归测试:
# 运行所有 0x14-b QA 自动化测试
./scripts/test_0x14b_qa.sh --with-gateway
单元逻辑验证:
cargo test test_ioc_ test_mov_ test_reduce_
0x13 CPU Affinity & Cache
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Planned
Core Objective: Pin threads to CPU cores and optimize data layout for cache locality.
1. Overview
- CPU Affinity: Bind matching threads to isolated cores to reduce context switching.
- Cache Locality: Optimize `OrderBook` node layout to fit L1/L2 cache lines.
- False Sharing: Pad atomic variables to prevent cache-line contention.
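As a preview of the false-sharing fix, the usual pattern is to align each hot counter to its own cache line (64 bytes is an assumption here; some CPUs prefetch lines in 128-byte pairs):

```rust
// Sketch of cache-line padding to avoid false sharing between two
// counters updated by different threads.
use std::sync::atomic::AtomicU64;

#[allow(dead_code)]
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

#[allow(dead_code)]
struct Stats {
    produced: PaddedCounter, // writer thread's counter: its own cache line
    consumed: PaddedCounter, // reader thread's counter: its own cache line
}

fn main() {
    // Each padded counter occupies (at least) a full 64-byte line, so
    // updates to one never invalidate the other's line.
    assert_eq!(std::mem::align_of::<PaddedCounter>(), 64);
    assert!(std::mem::size_of::<Stats>() >= 128);
    println!("counters sit on separate cache lines");
}
```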
(Detailed content coming soon in Phase III)
🇨🇳 中文
📅 状态: 🚧 计划中
核心目标: 主要线程绑核与缓存友好性优化。
1. 概述
- CPU 亲和性 (Affinity): 将撮合线程绑定到隔离核心,减少上下文切换。
- 缓存局部性 (Locality): 优化 `OrderBook` 节点布局以适应 L1/L2 缓存行。
- 伪共享 (False Sharing): 通过 Padding 避免多线程竞争同一缓存行。
(第三阶段详细内容敬请期待)
0x14 SIMD Matching Acceleration
🇺🇸 English | 🇨🇳 中文
🇺🇸 English
📅 Status: 🚧 Planned
Core Objective: Use SIMD (AVX2/AVX-512) instructions to accelerate order matching.
1. Overview
- Vectorization: Process multiple price levels in parallel.
- Intrinsics: Direct use of Rust `std::arch` intrinsics.
- Benchmark: Aiming for > 5M TPS.
(Detailed content coming soon in Phase III)
🇨🇳 中文
📅 状态: 🚧 计划中
核心目标: 使用 SIMD (AVX2/AVX-512) 指令集加速订单撮合。
1. 概述
- 向量化 (Vectorization): 并行处理多个价格档位。
- Intrinsics: 直接使用 Rust `std::arch` 内联函数。
- 基准目标: 目标吞吐量 > 500 万 TPS。
(第三阶段详细内容敬请期待)
0x17 SIMD Matching Acceleration
Status: Planned
This chapter will cover SIMD (Single Instruction Multiple Data) vectorized matching using AVX-512 or ARM NEON instructions.
Coming soon…
Performance Report
Generated: 2025-12-31 04:48:47
Summary
| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 0 | 100,000 | - |
| Trades | 0 | 0 | - |
| Exec Time | 0.00ms | 119.10ms | +0.0% |
| Throughput | 0/s | 839,618/s | +0.0% |
Timing Breakdown
| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 0ns | 0K | 0.0% |
| Matching Engine | 23.42ms | 4.3M | 100.0% |
| Settlement | 0ns | 0K | 0.0% |
| Ledger I/O | 0ns | 0K | 0.0% |
Latency Percentiles
| Percentile | Value |
|---|---|
| MIN | 150ns |
| AVG | 1.1µs |
| P50 | 1.6µs |
| P99 | 4.2µs |
| P99.9 | 11.1µs |
| MAX | 960.5µs |
Verdict
✅ No significant regressions detected
Performance History
性能报告历史存档。每个重要章节完成后生成一份报告。
报告列表
| 日期 | 章节 | 关键变化 |
|---|---|---|
| 2025-12-18 | 0x08h | 服务化重构,ME占76.6%,1.3M数据集 |
| 2025-12-16 | 0x07b | 性能基线建立,Ledger I/O 占 98.5% |
命名规范
YYYY-MM-DD-章节.md
例如:2025-12-16-0x07b.md
如何生成报告
# 1. 运行性能测试
cargo run --release
# 2. 生成报告
python3 scripts/generate_perf_report.py > docs/src/perf-report.md
# 3. 存档历史
cp docs/src/perf-report.md docs/src/perf-history/$(date +%Y-%m-%d)-章节.md
# 4. 更新此索引文件,添加新条目
# 5. 提交
git add docs/src/perf-report.md docs/src/perf-history/
git commit -m "docs: Update perf report"
Performance Report - 2025-12-18 0x08-h
Branch: 0x08-h-performance-monitoring
Dataset: 1.3M orders (30% cancels, high-balance mode)
Changes: Service-oriented refactoring (IngestionService, UBSCoreService, MatchingService, SettlementService)
Summary
| Metric | Single-Thread | Multi-Thread |
|---|---|---|
| Orders | 1,300,000 | 1,300,000 |
| Trades | 667,567 | 667,567 |
| Exec Time | 14.18s | 20.17s |
| Throughput | 91,710/s | 64,450/s |
| P50 Latency | 2.5 µs | 113 ms |
Multi-Thread Breakdown
| Component | Time | % | Latency/op |
|---|---|---|---|
| Matching Engine | 19.23s | 76.6% | 19.23 µs |
| Persistence | 5.35s | 21.3% | 4.12 µs |
| Settlement | 0.51s | 2.0% | 0.76 µs |
Key Changes
- Extracted 4 service structs from spawn functions
- Reduced `pipeline_mt.rs` from 720 to ~250 lines
- Added `pipeline_services.rs` (~640 lines)
- All tests pass with exact trade count match
Verdict
✅ Correctness Verified: 667,567 trades, 0 balance differences
Performance Report
Generated: 2025-12-16 18:16:36
Summary
| Metric | Baseline | Current | Change |
|---|---|---|---|
| Orders | 100,000 | 100,000 | - |
| Trades | 47,886 | 47,886 | - |
| Exec Time | 3753.87ms | 3956.64ms | +5.4% |
| Throughput | 26,639/s | 25,274/s | -5.1% |
Timing Breakdown
| Component | Time | OPS | % of Total |
|---|---|---|---|
| Balance Check | 17.64ms | 5.7M | 0.4% |
| Matching Engine | 36.37ms | 2.7M | 0.9% |
| Settlement | 4.71ms | 21.2M | 0.1% |
| Ledger I/O | 3.88s | 26K | 98.5% |
Latency Percentiles
| Percentile | Value |
|---|---|
| MIN | 125ns |
| AVG | 38.6µs |
| P50 | 625ns |
| P99 | 429.7µs |
| P99.9 | 1.37ms |
| MAX | 7.25ms |
Verdict
❌ 2 regression(s) detected
- Exec Time: +5.4%
- Throughput: -5.1%
开发规范 (Development Guidelines)
Core Principle: Standardize environments to eliminate “works on my machine” issues.
🐍 Python Environment
We use uv for strict dependency management and execution speed.
1. The Golden Rule
NEVER use system python3 or pip directly for project scripts.
ALWAYS use uv run to execute scripts.
2. Standard Workflow
# 1. Sync dependencies (like npm install)
uv sync
# 2. Run script (like npm run)
uv run python3 scripts/my_script.py
3. Adding Dependencies
# Add new package
uv add requests
🦀 Rust Environment
- Format: `cargo fmt` must pass.
- Lint: `cargo clippy` must pass (no warnings).
- Tests: `cargo test` must pass.
API 规范 (API Conventions)
ID 规范 (ID Specification)
命名规范 (Naming Convention)
Money Type Safety Standard | 资金类型安全规范
Version: 1.3 | Last Updated: 2025-12-31
本文件定义了本项目处理资金(余额、订单金额、成交价格)的治理方案。 重点是:如何在代码层面禁止不符合规范的操作。 任何违反本规范的代码不得合并。
Part I: 背景与设计决策
1.1 核心风险
金额是领域概念,不是原始类型。
在任何金融系统中,“钱”都不应被视为一个裸露的整数。它是一个携带精度语义的领域对象——1 BTC 内部表示为 100_000_000 聪,这个 10^8 的缩放因子是资产的内在属性,而非程序员的临时决定。
当开发者在代码中随意写下 amount * 10u64.pow(8) 时,他实际上在破坏这层抽象,将领域逻辑泄漏到业务代码的每一个角落。这会导致:
| 风险类型 | 后果 |
|---|---|
| 账本无法对齐 | 任何微小误差都会破坏“资金恒等定理”,导致无法 100% 精确对账。我们无法区分“正常误差”还是“真正的 Bug”。 |
| 语义错误 | 错误地将 BTC 金额与 USDT 金额直接相加。 |
| 溢出攻击 | 恶意构造的超大数值导致系统崩溃或资金错算。 |
| 维护噩梦 | 转换逻辑复杂,到处重复写必然到处犯错。 |
1.2 为什么选择 u64 + 内部缩放?
前置阅读: 关于浮点数的问题,请参阅 0x02 浮点数的诅咒,此处不再重复。
核心结论:
- `f64` 无法满足跨平台确定性(不同 CPU/编译器结果可能不同)。
- `Decimal` 无法满足极致性能(比 `u64` 慢 10x+)。
- `u64` 是唯一能同时满足“区块链级验证强度”和“高频撮合性能”的方案。
但 u64 需要内部缩放,这引入了复杂性。因此我们必须:
- 将缩放算法封装在 `money.rs` 中。
- 严禁在其他地方手工进行缩放运算。
1.3 内部缩放方案:如何实现大额处理?
核心机制:我们为每种资产定义系统精度(通常 8 位),而非使用链上原生精度(如 ETH 的 18 位)。
| 资产 | 链上精度 | 系统精度 | u64 最大可处理金额 |
|---|---|---|---|
| BTC | 8 位 | 8 位 | 1844 亿 BTC (远超总供应量) |
| ETH | 18 位 | 8 位 | 1844 亿 ETH ✅ |
| USDT | 6 位 | 6 位 | 18.4 万亿 USDT ✅ |
Important
精度权衡:使用 8 位系统精度意味着 ETH 最小单位是 0.00000001 ETH(10 gwei),而非链上的 1 wei。对于交易所场景,这完全足够——没有人会交易 1 wei 的 ETH。
这就是为什么“缩放”必须封装:
- 不同资产有不同的链上精度和系统精度。
- 入金时:链上精度 → 系统精度(可能截断极小尾数)。
- 出金时:系统精度 → 链上精度(补零)。
- 这套转换逻辑复杂,必须集中管理,严禁各处手写。
Tip
u128 的替代方案:如果不追求极致性能,使用 u128 可以直接采用统一的 18 位精度,避免不同资产间的精度转换问题。但这会牺牲约 10-20% 的撮合性能。
Part II: 解决方案与决策 (Solutions & Decisions)
2.1 类型安全:Newtype 守卫 (The Newtype Guardian)
问题: u64 是原生类型,开发者可以轻易写出 amount * 10u64.pow(8)。
方案: 引入不透明的包装类型 ScaledAmount(u64):
- 内部字段 `u64` 是 private 的,无法直接访问。
- 所有构造必须通过 `money.rs` 提供的经过审计的 Constructor。
- 如果有人想“私自计算”,必须先解包 (`to_raw()`),这种“不自然”的操作在 Code Review 中一眼可见。
#![allow(unused)]
fn main() {
// 🛡️ 核心类型定义
pub struct ScaledAmount(u64); // 无符号:余额、订单数量
pub struct ScaledAmountSigned(i64); // 有符号:盈亏、差额
}
已实现:
- `ScaledAmount` / `ScaledAmountSigned` 定义
- `checked_add` / `checked_sub` 安全算术
- `Deref<Target = u64>` 允许比较,但禁止直接算术
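上述守卫可以用一个自包含的小例子演示(示意代码,字段与方法名以项目 `money.rs` 的实际实现为准):

```rust
// 示意: 私有 u64 字段 + checked 算术 + 通过 Deref 允许比较。
use std::ops::Deref;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ScaledAmount(u64); // 内部字段 private,外部无法直接构造

impl ScaledAmount {
    pub(crate) fn from_raw(v: u64) -> Self { Self(v) } // 仅 crate 内部可构造
    pub fn to_raw(self) -> u64 { self.0 }              // 显式“逃逸”,审查信号
    pub fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(Self)
    }
    pub fn checked_sub(self, rhs: Self) -> Option<Self> {
        self.0.checked_sub(rhs.0).map(Self)
    }
}

impl Deref for ScaledAmount {
    type Target = u64;
    fn deref(&self) -> &u64 { &self.0 }
}

fn main() {
    let a = ScaledAmount::from_raw(150_000_000); // 8 位精度下的 1.5
    let b = ScaledAmount::from_raw(50_000_000);  // 0.5
    assert_eq!(a.checked_add(b), Some(ScaledAmount::from_raw(200_000_000)));
    assert_eq!(b.checked_sub(a), None); // 下溢被显式暴露,绝不静默回绕
    assert!(*a > 0);                    // 通过 Deref 比较是允许的
    assert_eq!(a.to_raw(), 150_000_000);
    println!("newtype guard holds");
}
```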
2.2 访问控制:入口收缩 (Visibility Chokepoint)
问题: 如果底层函数 parse_amount(str, decimals) 是 pub 的,开发者会倾向于直接使用它。
方案: 将 Layer 1 工具函数收缩为 pub(crate):
| 可见性 | 函数 | 用途 |
|---|---|---|
| pub(crate) | parse_amount, format_amount | 仅限 money.rs 和核心模块内部使用 |
| pub | SymbolManager::parse_qty(), SymbolManager::format_price() | 外部唯一入口 |
效果: 在代码自动补全时,开发者首先看到的是 SymbolManager 上的高层方法。
已实现:
- `parse_amount` / `format_amount` 改为 `pub(crate)`
2.3 分层架构 (Layered Architecture)
| 层级 | 组件 | 职责 | 可见性 |
|---|---|---|---|
| Layer 1 (Core) | money.rs | 原子类型定义与底层缩放 | pub(crate) |
| Layer 2 (Domain) | Asset / AssetInfo | 感知资产精度,提供意图封装 API | pub |
| Layer 3 (Integration) | SymbolManager / MoneyFormatter | 交易对级别的转换与批量格式化 | pub |
Tip
扩展性: `MoneyFormatter` 目前服务于深度图。随着 Kline/Ticker 复杂化,此模式可推广至所有行情展示。
2.4 铁律:意图封装 API (Intent-based API)
Caution
业务代码禁止直接调用 `money::` 函数。必须使用 `Asset` / `AssetInfo` 提供的意图封装 API。
问题:直接调用底层函数暴露实现细节
#![allow(unused)]
fn main() {
// ❌ 错误:暴露了 decimals 参数,调用者需要知道内部实现
let amount_scaled = *money::parse_decimal(amount, asset.decimals as u32)?;
}
解决方案:在 Asset / AssetInfo 上提供意图封装方法
#![allow(unused)]
fn main() {
// ✅ 正确:调用者只需表达意图,不需要知道 decimals
let amount_scaled = asset.parse_amount(amount)?;
let fee_scaled = asset.parse_amount_allow_zero(fee)?;
}
设计架构:
┌─────────────────────────────────────────────────────────────────┐
│ 业务代码 (deposit.rs, withdraw.rs, order.rs...) │
│ ✅ asset.parse_amount(decimal) │
│ ✅ asset.parse_amount_allow_zero(decimal) │
│ ✅ asset.format_amount(scaled_amount) │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 意图封装层 (Asset / AssetInfo) │
│ 封装 decimals 参数,提供"类型 → 类型"的简洁 API │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ 核心转换层 (money.rs) │
│ parse_decimal() / parse_decimal_allow_zero() / format_amount() │
│ ⚠️ pub(crate) - 仅供意图封装层调用 │
└─────────────────────────────────────────────────────────────────┘
关键收益:
| 收益 | 说明 |
|---|---|
| 简洁性 | asset.parse_amount(d) vs money::parse_decimal(d, asset.decimals as u32) |
| 封装性 | 调用者不需要知道 decimals、display_decimals 等内部参数 |
| 一致性 | 所有业务代码使用相同的 API 模式 |
| 可审计性 | 直接 money:: 调用是需要审查的红旗 |
Part III: 内外边界与显示策略 (Internal/External Boundary & Display)
3.0 核心规范:内部实现绝不暴露
Caution
内部的 `u64` 表示是实现细节,绝对不能暴露给客户端。
强制规范:
- 统一转换层:内部系统与外部 Client 之间,必须经过统一的转换层。
- API 层使用 Decimal:DTO 中的金额字段使用 `StrictDecimal`(自定义类型),利用 `rust_decimal` 的格式验证能力。
- 分层验证:
  - Serde 层:格式验证(拒绝 `.5`、非数字等)→ 得到 `Decimal`
  - SymbolManager 层:精度/范围验证 → 得到 `ScaledAmount`
- 精度来源唯一:资产精度从 `SymbolManager` 获取,严禁硬编码。
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐
│ Client │ ──→ │ Serde 层 │ ──→ │SymbolManager │ ──→ │ Internal │
│ (String) │ │ (Decimal) │ │ (验证精度) │ │ (u64) │
└─────────────┘ └──────────────┘ └──────────────┘ └─────────────┘
"1.5" 格式验证 Decimal(1.5) 精度验证 ScaledAmount(150_000_000)
设计优势:
- 利用库能力:`rust_decimal` 提供成熟的数字解析
- 早期失败:格式错误在反序列化阶段就拦截
- 关注点分离:格式验证与精度验证分开处理
- 业务代码简化:Handler 拿到的 `Decimal` 已是合法数字,只需验证范围
3.1 截断是唯一合法的舍入策略
决策:所有转换、计算过程中的精度损失,一律使用截断(Truncation),不允许四舍五入。
原因:
- 一致性:与整数除法的行为一致(向零截断)。
- 可预测性:任何人在任何平台重算,结果完全一致。
- 安全性:宁愿少显示,也不能让用户认为自己拥有实际不存在的余额。
| 场景 | 策略 | 示例 |
|---|---|---|
| 入金转换 | 截断 | 链上 1.23456789012345678 ETH → 系统 1.23456789 ETH |
| 余额显示 | 截断 | 内部 123456789 → 显示 "1.2345" (4位显示精度) |
| 成交计算 | 截断 | 避免凭空产生资金 |
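入金时从链上精度截断到系统精度,本质上是一次向零的整数除法。下面是一个示意(`scale_down` 为假想的辅助函数,实际应封装在 `money.rs` 内):

```rust
// 入金缩放示意: 链上 18 位精度 → 系统 8 位精度,整数除法截断。
fn scale_down(on_chain: u128, from_decimals: u32, to_decimals: u32) -> u64 {
    assert!(from_decimals >= to_decimals);
    let factor = 10u128.pow(from_decimals - to_decimals);
    (on_chain / factor) as u64 // 向零截断,绝不四舍五入
}

fn main() {
    // 链上 1.23456789012345678 ETH (18 位) → 系统 1.23456789 ETH (8 位)
    let wei: u128 = 1_234_567_890_123_456_780;
    assert_eq!(scale_down(wei, 18, 8), 123_456_789);
    println!("truncated to 8-decimal system precision");
}
```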
3.2 严格解析:拒绝模糊输入
决策: 拒绝 .5 和 5. 等简写,强制要求 0.5 和 5.0。
原因:处理金额数据,严谨和安全是第一位的。模糊的输入格式可能导致:
- 手抖或脚本错误输入不完整数字
- 不同解析器对歧义格式有不同解读
- 隐蔽的精度丢失
行动项:
- 在 OpenAPI 文档和错误信息中明确提示此规范
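格式层的严格校验可以用一个纯函数示意(仅演示拒绝 `.5` / `5.` 的规则;函数名为假想,实际项目由 `StrictDecimal` 的反序列化实现,精度/范围验证仍由 SymbolManager 负责):

```rust
// 严格解析示意: 拒绝 ".5"、"5."、空串与多余小数点,
// 只接受 "0.5"、"5.0"、"5" 这类完整写法。
fn is_strict_decimal(s: &str) -> bool {
    let mut parts = s.splitn(2, '.');
    let int_part = parts.next().unwrap_or("");
    let frac_part = parts.next(); // None 表示没有小数点
    let all_digits = |p: &str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
    match frac_part {
        None => all_digits(int_part),                           // "5" 合法
        Some(frac) => all_digits(int_part) && all_digits(frac), // "0.5" 合法
    }
}

fn main() {
    assert!(is_strict_decimal("0.5"));
    assert!(is_strict_decimal("5.0"));
    assert!(is_strict_decimal("50000"));
    assert!(!is_strict_decimal(".5"));    // 缺少整数部分,拒绝
    assert!(!is_strict_decimal("5."));    // 缺少小数部分,拒绝
    assert!(!is_strict_decimal("1.2.3")); // 多余小数点,拒绝
    println!("strict decimal format checks pass");
}
```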
3.3 零值处理:默认严格 + 显式入口
问题:零值在某些场景是非法的(订单数量),在另一些场景是合法的(手续费)。
反模式:到处写 workaround
#![allow(unused)]
fn main() {
// ❌ 散落各处,维护噩梦
let fee = if fee_str == "0" {
ScaledAmount::ZERO
} else {
parse_amount(&fee_str, decimals)?
};
}
推荐模式:显式入口
#![allow(unused)]
fn main() {
// ===== 默认入口:严格,拒绝零 =====
/// 用于订单数量、价格等必须非零的场景
pub fn parse_amount(s: &str, decimals: u32) -> Result<ScaledAmount>
// ===== 显式入口:允许零 =====
/// 用于手续费等可能为零的场景
/// 调用者应该知道自己在做什么
pub fn parse_amount_allow_zero(s: &str, decimals: u32) -> Result<ScaledAmount>
}
使用示例:
#![allow(unused)]
fn main() {
// 订单数量:必须非零(使用默认严格版本)
let qty = symbol_mgr.parse_qty(symbol, &req.quantity)?;
// 提现手续费:可以为零(显式表达意图)
let fee = symbol_mgr.parse_fee_allow_zero(symbol, &req.fee)?;
}
设计原则:
| 原则 | 说明 |
|---|---|
| 成功之坑 (Pit of Success) | 默认行为是安全的,需要绕过时必须显式声明 |
| 意图可见 | 代码中看到 _allow_zero 就知道这里允许零 |
| Code Review 信号 | _allow_zero 调用是需要审查的信号 |
| 解析层不做业务判断 | 是否允许零由调用方通过选择入口决定 |
Part IV: How Do We Enforce This in Code?
Core question: how do we stop developers from converting ad hoc all over the codebase?
4.1 First Line of Defense: The Type System (Compile Time)
Newtype encapsulation: the inner field of `ScaledAmount(u64)` is private.

```rust
pub struct ScaledAmount(u64); // the u64 is not directly accessible

impl ScaledAmount {
    pub(crate) fn from_raw(v: u64) -> Self { Self(v) } // constructible only inside the crate
    pub fn to_raw(self) -> u64 { self.0 }              // the explicit "escape hatch"
}
```
Effects:
- ❌ `ScaledAmount::from_raw(100)` — cannot be called from external modules
- ❌ `amount.0` — the inner field is inaccessible
- ❌ `amount + 100u64` — type mismatch, fails to compile
- ✅ `*amount > 0` — comparison allowed via `Deref`
4.2 Second Line of Defense: Visibility Control (Shrinking the API Surface)
Layered isolation:
| Function | Visibility | Who can call it |
|---|---|---|
| `parse_amount()` | `pub(crate)` | Only `money.rs` and core modules |
| `format_amount()` | `pub(crate)` | Only `money.rs` and core modules |
| `SymbolManager::parse_qty()` | `pub` | Any module (the only legal entry point) |
| `SymbolManager::format_price()` | `pub` | Any module (the only legal entry point) |
Effects:
- In a Gateway handler, code completion only surfaces `SymbolManager` methods.
- A developer who tries to call the low-level `parse_amount()` will find it is not in scope.
4.3 Third Line of Defense: API-Layer Data Types (DTO Design)
Hard rule: amount fields in API requests and responses must use the `String` type.

```rust
// ✅ Correct: use String; the handler converts via SymbolManager
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: String, // "1.5"
    pub price: String,    // "50000.00"
}

// ❌ Wrong: raw u64 leaks the internal representation
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: u64, // how would a client know to send 150_000_000?
}
```
Serde does not convert automatically: if a client sends `"quantity": 1.5` (a JSON number), deserialization into `String` fails, forcing the client to send `"1.5"` (a JSON string).
4.4 Fourth Line of Defense: Automated CI Audit
Audit script: `scripts/audit_money_safety.sh`

```bash
#!/bin/bash
set -e
echo "🔍 Auditing money safety..."

# 1. Manual scaling outside money.rs
if grep -rn "10u64.pow" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found 10u64.pow outside money.rs"
    exit 1
fi

# 2. Manual Decimal power operations
if grep -rn "Decimal::from(10).powi" --include="*.rs" src/ | grep -v "money.rs"; then
    echo "❌ FAIL: Found Decimal power operation outside money.rs"
    exit 1
fi

# 3. Hard-coded precision (optional; needs finer-grained rules)
# grep -rn "decimals.*=.*8" --include="*.rs" src/ | grep -v "symbol_manager.rs"

echo "✅ Money safety audit passed!"
```
Integration:
- `.github/workflows/ci.yml` — runs automatically on every PR
- `.git/hooks/pre-commit` — blocks locally before commit
4.5 Fifth Line of Defense: Code Review Signals
High-risk operation checklist (focus areas for PR review):
| Code pattern | Risk | Handling |
|---|---|---|
| `.to_raw()` | ⚠️ High | Must carry a comment explaining why |
| `10u64.pow` outside `money.rs` | 🚫 Forbidden | Reject the merge |
| Hard-coded `decimals: u32` | ⚠️ High | Should come from `SymbolManager` |
| `u64` amount fields in API DTOs | 🚫 Forbidden | Must be `String` |
| Raw arithmetic after `Deref` (`*a + *b`) | ⚠️ High | Use `checked_add` instead |
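The last row of the table can be demonstrated directly: raw `u64` addition wraps in release builds (and panics in debug), while `checked_add` surfaces overflow as a recoverable value. The `ScaledAmount` below is a minimal stand-in for the real newtype, not the project's implementation.

```rust
// Why `*a + *b` on Deref'd amounts is a review flag: raw u64 addition hides
// overflow, while checked_add makes it an explicit, handleable case.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ScaledAmount(u64);

impl ScaledAmount {
    fn checked_add(self, other: ScaledAmount) -> Option<ScaledAmount> {
        self.0.checked_add(other.0).map(ScaledAmount)
    }
}

fn main() {
    let a = ScaledAmount(u64::MAX - 1);
    // Overflow is detected, not silently wrapped:
    assert_eq!(a.checked_add(ScaledAmount(10)), None);
    // Non-overflowing addition works as expected:
    assert_eq!(a.checked_add(ScaledAmount(1)), Some(ScaledAmount(u64::MAX)));
    println!("ok");
}
```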
4.6 Sixth Line of Defense: Agent Memory (AGENTS.md)
In effect: this standard is on the AGENTS.md required-reading list. Every AI agent must read it before starting work, ensuring generated code complies.
Part V: Future Upgrade Path
| Phase | Goal | Status |
|---|---|---|
| Phase 0 | Newtype definitions, API surface shrink, documentation governance | ✅ Done |
| Phase 1 | Integrate `audit_money_safety.sh` into CI | ⏳ Pending |
| Phase 1.5 | API Money Enforcement: forced conversion via Extractor + IntoResponse | ⏳ Pending |
| Phase 2 | Full scan and migration of legacy code | ⏳ Pending |
| Phase 2.5 | Migrate legacy code to the intent-based API (details below) | ⏳ Pending |
Phase 2.5 Details: Legacy Code Migration
Goal: migrate all code that calls `money::` functions directly to the `Asset` / `AssetInfo` intent-based API.
Migration map:
| Old code | New code |
|---|---|
| `money::parse_decimal(d, asset.decimals as u32)` | `asset.parse_amount(d)` |
| `money::parse_decimal_allow_zero(d, asset.decimals as u32)` | `asset.parse_amount_allow_zero(d)` |
| `money::format_amount(amt, decimals, display)` | `asset.format_amount(amt)` |
Done:
- `src/funding/deposit.rs`
- `src/funding/withdraw.rs`
To migrate (scan the whole codebase for `money::parse` and `money::format` calls):
- Full sweep of the remaining business modules
- Add a CI check forbidding business code from calling `money::` functions directly
Summary: Why So Heavy?
Core principle 1: the ledger must reconcile 100%
If any precision error is tolerated, the ledger can never be fully reconciled. We lose the "conservation of funds" invariant (total deposits = total balances + total withdrawals) as an exact check. Once the ledger cannot align 100%, we can no longer tell whether a discrepancy is an "acceptable normal error" or a hidden bug. Real problems can hide behind "error" until they cause irreversible losses.
Core principle 2: conversion logic must converge to a single location
Amount conversion is genuinely complex (precision, rounding, overflow checks). If it may be re-implemented anywhere in the codebase, every site can make a different mistake. By converging conversion into one thoroughly audited and tested location (`money.rs` + `SymbolManager`), we can:
- Test that one location exhaustively (boundary values, overflow, negatives, and so on).
- Guarantee every caller the same safety.
- Fix a bug once and have the fix apply globally.
The Rules
- NO `10u64.pow()` outside `money.rs`.
- NO raw `u64` arithmetic for amounts.
- NO implicit scaling.
- YES `SymbolManager` for all intent-based conversions.
Quick Reference
| Scenario | ✅ Correct | ❌ Wrong |
|---|---|---|
| API DTO field | `quantity: StrictDecimal` | `quantity: u64` or `quantity: String` |
| Decimal → ScaledAmount | `symbol_mgr.decimal_to_scaled(symbol, decimal)` | hand-rolled `decimal * 10^8` |
| ScaledAmount → String | `symbol_mgr.format_price(symbol, amount)` | `format!("{}", amount)` |
| Getting precision | `symbol_mgr.get_decimals(asset)` | `let decimals = 8;` |
| Arithmetic | `amount.checked_add(other)?` | `*amount + *other` |
| Comparison | `*amount > 0` | ✅ allowed (via `Deref`) |
API Money Enforcement | Mandatory Money Types at the API Layer
Goal: ensure every API handler processes amounts through a single conversion layer; no ad-hoc conversion anywhere.
Scope: both directions — Request (in) and Response (out).
1. Problem Statement
The Gateway has many API handlers, and each must:
- Inbound: accept amount strings from JSON (e.g. `"1.5"`) and convert them to the internal `ScaledAmount`
- Outbound: format internal `ScaledAmount` values back into JSON strings for clients
Core challenge: how do we guarantee that every handler converts via `SymbolManager`, instead of rolling its own conversion logic?
2. Option Comparison
Option A: DTO + Explicit Validation Layer
Mechanism: the handler receives a raw DTO and calls a validation function manually.

```rust
// Request
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    // every handler must remember to call validate()
    let validated = symbol_mgr.validate_order(&req)?;
    // ...
}

// Response
async fn get_balance(...) -> Json<BalanceResponse> {
    let raw = service.get_balance(...)?;
    // every handler must remember to call format()
    Json(symbol_mgr.format_balance_response(&raw))
}
```
| Pros | Cons |
|---|---|
| Simple and direct | Relies on developer discipline; easy to miss |
| No extra types | Conversion logic scattered across handlers |
Option B: Service-Layer Encapsulation
Mechanism: handlers may only call service methods; the service converts internally.

```rust
// The handler only forwards the raw DTO
async fn place_order(Json(req): Json<PlaceOrderRequest>) -> Result<...> {
    order_service.place(req).await // the service calls SymbolManager internally
}

async fn get_balance(...) -> Result<Json<BalanceResponse>> {
    Ok(Json(balance_service.get_formatted(...).await?)) // the service returns formatted data
}
```
| Pros | Cons |
|---|---|
| Business logic centralized | The service must still remember to call SymbolManager |
| Handlers stay thin | If the service forgets, the problem still occurs |
Option C: Axum Extractor + IntoResponse Pattern ⭐ Recommended
Mechanism: enforce conversion at the Axum framework layer.
Request side: a custom extractor

```rust
/// A validated order request; the handler receives ScaledAmount directly
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;

    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<RawOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        Ok(ValidatedOrder {
            symbol_id: raw.symbol_id,
            quantity: symbol_mgr.parse_qty(raw.symbol_id, &raw.quantity)?,
            price: symbol_mgr.parse_price(raw.symbol_id, &raw.price)?,
        })
    }
}

// The handler receives the validated type; there is no way around it
async fn place_order(order: ValidatedOrder) -> Result<impl IntoResponse> {
    // order.quantity is already a ScaledAmount; it cannot be an unconverted String
}
```
Response side: a custom IntoResponse

```rust
/// A formatted balance response; SymbolManager formatting happens automatically
pub struct FormattedBalanceResponse {
    pub balances: Vec<(AssetId, ScaledAmount)>,
    pub symbol_mgr: Arc<SymbolManager>,
}

impl IntoResponse for FormattedBalanceResponse {
    fn into_response(self) -> Response {
        let formatted: Vec<BalanceDto> = self.balances.iter()
            .map(|(asset, amount)| BalanceDto {
                asset: asset.to_string(),
                amount: self.symbol_mgr.format_asset_amount(*asset, *amount),
            })
            .collect();
        Json(formatted).into_response()
    }
}

// The handler returns the internal type; formatting happens in IntoResponse
async fn get_balances(State(state): State<AppState>) -> FormattedBalanceResponse {
    let balances = state.service.get_balances().await;
    FormattedBalanceResponse { balances, symbol_mgr: state.symbol_mgr.clone() }
}
```
| Pros | Cons |
|---|---|
| Enforced by the framework; handlers never see the raw String | Requires an extractor per request category |
| Compile-time guarantee | The extractor needs access to SymbolManager |
| Conversion logic fully centralized | Slightly higher upfront cost |
Option D: Type-Driven Design (Strictest)
Mechanism: define an "unvalidated" amount type that can only be consumed by `SymbolManager`.

```rust
/// An unvalidated amount; cannot be used directly
pub struct UnvalidatedAmount(String);

impl UnvalidatedAmount {
    // no .parse() method
    // no Deref<Target = String>
    // the only way out is to hand it to SymbolManager
}

impl SymbolManager {
    pub fn parse(&self, asset: AssetId, amount: UnvalidatedAmount) -> Result<ScaledAmount>;
}

// DTOs use the unvalidated type
#[derive(Deserialize)]
pub struct PlaceOrderRequest {
    pub quantity: UnvalidatedAmount, // .parse() is simply not available
}
```
| Pros | Cons |
|---|---|
| Fully locked down by the type system | Introduces more types |
| Won't compile even if you forget the call | Custom Serde deserialization adds some complexity |
3. Recommended Approach: StrictDecimal + Extractor
3.1 Core Design: Layered Validation
Client (JSON string "1.5")
  ↓ Serde: StrictDecimal custom deserialization
API DTO (StrictDecimal)   ← format validated
  ↓ Extractor: SymbolManager.decimal_to_scaled()
Handler (ScaledAmount)    ← precision validated
Key insights:
- The Serde layer owns format validation: leverage `rust_decimal`'s parsing to reject malformed input
- `SymbolManager` owns precision validation: check the number of decimal places against the asset's precision
- Business code only validates ranges: format and precision are already guaranteed
3.2 StrictDecimal Implementation

```rust
use std::str::FromStr;

use rust_decimal::Decimal;
use serde::{Deserialize, Deserializer};

/// A strictly formatted Decimal, validated during deserialization
#[derive(Debug, Clone, Copy)]
pub struct StrictDecimal(Decimal);

impl StrictDecimal {
    pub fn inner(&self) -> Decimal {
        self.0
    }
}

impl<'de> Deserialize<'de> for StrictDecimal {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?;

        // Strict format checks: reject .5, 5., the empty string, etc.
        if s.is_empty() {
            return Err(serde::de::Error::custom("Amount cannot be empty"));
        }
        if s.starts_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 0.5 not .5"));
        }
        if s.ends_with('.') {
            return Err(serde::de::Error::custom("Invalid format: use 5.0 not 5."));
        }

        // Parse with the Decimal library
        let d = Decimal::from_str(&s)
            .map_err(|e| serde::de::Error::custom(format!("Invalid decimal: {}", e)))?;

        // Reject negatives (amounts must be non-negative)
        if d.is_sign_negative() {
            return Err(serde::de::Error::custom("Amount cannot be negative"));
        }

        Ok(StrictDecimal(d))
    }
}
```
3.3 DTO Usage Example

```rust
#[derive(Debug, Deserialize)]
pub struct PlaceOrderRequest {
    pub symbol: String,
    pub quantity: StrictDecimal, // format already validated
    pub price: StrictDecimal,    // format already validated
}
```
3.4 SymbolManager Extension

```rust
use rust_decimal::prelude::ToPrimitive; // for Decimal::to_u64

impl SymbolManager {
    /// Convert an already-validated Decimal into a ScaledAmount.
    /// Only precision needs checking; format was validated at the Serde layer.
    pub fn decimal_to_scaled(
        &self,
        symbol: SymbolId,
        decimal: Decimal,
    ) -> Result<ScaledAmount, MoneyError> {
        let decimals = self.get_symbol_decimals(symbol)?;

        // Reject input with more decimal places than the symbol allows
        if decimal.scale() > decimals {
            return Err(MoneyError::PrecisionExceeded {
                provided: decimal.scale(),
                max: decimals,
            });
        }

        // Scale up into the u64 representation
        let scaled = decimal * Decimal::from(10u64.pow(decimals));
        let raw = scaled.to_u64()
            .ok_or(MoneyError::Overflow)?;

        Ok(ScaledAmount::from_raw(raw))
    }
}
```
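The precision check and scaling above can be sketched with the standard library alone: count the fractional digits of an already-format-validated decimal string, reject if they exceed the asset's precision, then scale by `10^decimals`. `to_scaled` is an illustrative stand-in for `decimal_to_scaled`, not the real implementation.

```rust
/// Std-only sketch of the precision check: reject strings with more
/// fractional digits than `decimals`, then scale into a u64.
fn to_scaled(s: &str, decimals: u32) -> Result<u64, String> {
    let (int_part, frac_part) = match s.split_once('.') {
        Some((i, f)) => (i, f),
        None => (s, ""),
    };
    if frac_part.len() as u32 > decimals {
        return Err(format!("precision exceeded: {} > {}", frac_part.len(), decimals));
    }
    let int_val: u64 = int_part.parse().map_err(|e| format!("{e}"))?;
    let frac_val: u64 = if frac_part.is_empty() {
        0
    } else {
        frac_part.parse().map_err(|e| format!("{e}"))?
    };
    // Pad the fractional part up to the full precision
    let frac_scale = 10u64.pow(decimals - frac_part.len() as u32);
    int_val
        .checked_mul(10u64.pow(decimals))
        .and_then(|v| v.checked_add(frac_val * frac_scale))
        .ok_or_else(|| "overflow".to_string())
}

fn main() {
    assert_eq!(to_scaled("1.5", 8), Ok(150_000_000));
    assert_eq!(to_scaled("50000.00", 8), Ok(5_000_000_000_000));
    assert!(to_scaled("1.123456789", 8).is_err()); // 9 fractional digits > 8
    println!("ok");
}
```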
3.5 Extractor Integration

```rust
pub struct ValidatedOrder {
    pub symbol_id: SymbolId,
    pub quantity: ScaledAmount,
    pub price: ScaledAmount,
}

#[async_trait]
impl<S> FromRequest<S> for ValidatedOrder
where
    S: Send + Sync,
{
    type Rejection = ApiError;

    async fn from_request(req: Request, state: &S) -> Result<Self, Self::Rejection> {
        let Json(raw): Json<PlaceOrderRequest> = Json::from_request(req, state).await?;
        let symbol_mgr = state.symbol_manager();
        let symbol_id = symbol_mgr.get_symbol_id(&raw.symbol)?;
        Ok(ValidatedOrder {
            symbol_id,
            // StrictDecimal already validated the format; only precision is checked here
            quantity: symbol_mgr.decimal_to_scaled(symbol_id, raw.quantity.inner())?,
            price: symbol_mgr.decimal_to_scaled(symbol_id, raw.price.inner())?,
        })
    }
}
```
3.6 Design Advantages Summary
| Layer | Responsibility | Validates |
|---|---|---|
| Serde (`StrictDecimal`) | Format validation | Rejects `.5`, `5.`, negatives, non-numbers |
| `SymbolManager` | Precision validation | Checks decimal places against the limit |
| Business code | Range validation | Checks the amount is within a sensible range |
Key benefits:
- Leverage library capabilities: `rust_decimal` provides mature numeric parsing
- Fail early: format errors are intercepted at the deserialization stage
- Separation of concerns: each layer performs exactly one kind of validation
- Compile-time guarantee: the handler receives a `ScaledAmount` and cannot get it wrong
4. Automated CI Checks: Enforce by Mechanism, Not Discipline
Core principle: enforce the standard through mechanism and process, not by relying on developers' "self-discipline".
4.1 Audit Script: `scripts/audit_api_types.sh`

```bash
#!/bin/bash
set -e
echo "🔍 Auditing API type safety..."

# 1. u64/i64 amount fields in DTOs
# Amount field names typically contain: amount, quantity, price, balance, volume
AMOUNT_PATTERNS="amount|quantity|price|balance|volume|size|qty"
if grep -rnE "pub +(${AMOUNT_PATTERNS}) *: *u64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found u64 amount field in API DTO"
    echo " → Should use String type instead"
    exit 1
fi
if grep -rnE "pub +(${AMOUNT_PATTERNS}) *: *i64" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found i64 amount field in API DTO"
    echo " → Should use String type instead"
    exit 1
fi

# 2. Direct amount parsing in handlers
if grep -rn "\.parse::<u64>()" --include="*.rs" src/gateway/; then
    echo "❌ FAIL: Found direct u64 parsing in gateway"
    echo " → Should use SymbolManager.parse_qty() instead"
    exit 1
fi

# 3. Direct format!() on amounts
if grep -rn 'format!\s*(\s*"{}"\s*,\s*\w*amount' --include="*.rs" src/gateway/; then
    echo "⚠️ WARNING: Possible direct amount formatting found"
    echo " → Consider using SymbolManager.format_*() instead"
fi

# 4. Decimal parsing that bypasses SymbolManager
if grep -rn "Decimal::from_str" --include="*.rs" src/gateway/ | grep -v "// safe:"; then
    echo "⚠️ WARNING: Direct Decimal parsing found in gateway"
    echo " → Should use SymbolManager for conversions"
fi

echo "✅ API type safety audit passed!"
```
Note: the alternation in `AMOUNT_PATTERNS` requires extended regular expressions, hence `grep -E`.
4.2 Rule Details
| Check | Goal | Pattern |
|---|---|---|
| DTO field types | Amount fields must be `String` | `pub (amount\|quantity\|...): u64` in `src/gateway/` |
| Direct parsing | No `.parse::<u64>()` in handlers | `.parse::<u64>()` in `src/gateway/` |
| Direct formatting | No `format!("{}", amount)` | `format!(...amount...)` in `src/gateway/` |
| Bypassing the conversion layer | No direct `Decimal::from_str` | `Decimal::from_str` in `src/gateway/` |
4.3 CI Integration
GitHub Actions configuration:

```yaml
# .github/workflows/ci.yml
- name: Audit API Type Safety
  run: |
    chmod +x scripts/audit_api_types.sh
    ./scripts/audit_api_types.sh
```
Local pre-commit hook:

```bash
#!/bin/bash
# .git/hooks/pre-commit
./scripts/audit_api_types.sh || exit 1
```
4.4 Exemption Mechanism
For legitimate exceptions (test code, internal tools), mark the line with a comment:

```rust
// safe: test code, direct parsing allowed
let amount = "100".parse::<u64>().unwrap();
```
The audit script should skip lines carrying a `// safe:` comment.
5. Implementation Roadmap
| Phase | Task | Status |
|---|---|---|
| Phase 1 | Implement the `ValidatedOrder` extractor for the core order APIs | ⏳ Pending |
| Phase 2 | Implement `FormattedBalanceResponse` for the balance/asset APIs | ⏳ Pending |
| Phase 3 | Unify all amount-related APIs | ⏳ Pending |
| Phase 4 | Implement `audit_api_types.sh` and wire it into CI | ⏳ Pending |
| Phase 5 | Add a pre-commit hook for local enforcement | 📋 Planned |
6. References
- Money Type Safety Standard — the money type safety specification
- 0x02 The Float Curse — a deep dive into floating-point problems
Common CI Pitfalls and Solutions
This document collects typical problems encountered in GitHub Actions CI and their solutions.
🚨 0. Critical Warning: Never Use pkill -f
Problem
Running `pkill -f "zero_x_infinity"` inside the Antigravity IDE crashes the IDE: the IDE's language_server process has the project path in its arguments, so `pkill -f` kills it too.
Correct approach
Always use the PID or an exact name match:

```bash
# ✅ Option 1: record the PID at startup (recommended)
./target/release/zero_x_infinity --gateway &
GW_PID=$!
# ...
kill "$GW_PID"

# ✅ Option 2: exact process-name match
pkill "^zero_x_infinity$"
```
1. Service Containers
1.1 docker exec Is Not Available
Problem
GitHub Actions `services:` are managed service containers, not local Docker containers you can exec into.

```yaml
services:
  tdengine:
    image: tdengine/tdengine:latest
    ports:
      - 6041:6041
```
Typical error

```bash
docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"
# Error: No such container: tdengine
```
Solution
Connect via REST APIs or network protocols instead of `docker exec`:

```bash
# ❌ Wrong
docker exec tdengine taos -s "DROP DATABASE IF EXISTS trading"

# ✅ TDengine REST API
curl -sf -u root:taosdata -d "DROP DATABASE IF EXISTS trading" http://localhost:6041/rest/sql

# ✅ PostgreSQL psql
PGPASSWORD=trading123 psql -h localhost -U trading -d exchange_info_db -c "..."
```
1.2 Always Connect to Services via localhost

```bash
# In CI:
PG_HOST=localhost   # ✅ correct
PG_HOST=postgres    # ❌ only valid in Docker Compose
```
2. Environment Variables
2.1 Test Scripts Must Load db_env.sh
Problem
A test script that does not set `DATABASE_URL` and related variables will hit PostgreSQL connection timeouts.
Typical error

```
❌ Failed to connect to PostgreSQL: pool timed out while waiting for an open connection
```
Solution
`source db_env.sh` at the top of the script:

```bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/db_env.sh"
```
2.2 Detecting the CI Environment

```bash
if [ -n "$CI" ]; then
    # CI-only logic
else
    # local logic
fi
```
3. Workflow Step Conditions
3.1 The Correct Log-Dump Pattern
Problem
Misusing `continue-on-error: true` marks the job green even when tests fail, hiding the error.
❌ Wrong:

```yaml
- name: Run Test
  run: ./test.sh
  continue-on-error: true # failures are silently ignored

- name: Dump Logs
  run: cat logs/*.log
# Result: the job turns green and the error is hidden!
```
✅ Correct:
Do not use `continue-on-error`. Use an `if: failure()` condition to run the log-dump step only on failure.

```yaml
- name: Run Test
  run: ./test.sh
  # Default behavior: a failure stops all subsequent steps without if: failure()

- name: Dump Logs
  if: failure() # runs only if an earlier step failed
  run: cat logs/*.log
# Note: this step itself succeeds, but the job status is still
# determined by Run Test (red)
```
3.2 Log File Path Consistency
Make sure the path the script writes matches the path the workflow reads:

```bash
# In the script
nohup ./gateway > /tmp/gateway_fee_e2e.log 2>&1 &

# The workflow must match
cat /tmp/gateway_fee_e2e.log   # ✅ same path
cat /tmp/gw_test.log           # ❌ different path
```
4. Database Initialization
4.1 PostgreSQL Health Check
Problem: the default health check uses the `root` user, which does not exist in the database.

```yaml
services:
  postgres:
    options: >-
      --health-cmd "pg_isready -U trading -d exchange_info_db" # specify the user
```
4.2 TDengine Precision
You must create the database with `PRECISION 'us'`:

```sql
CREATE DATABASE IF NOT EXISTS trading PRECISION 'us';
```
With the wrong precision, microsecond timestamps fail with "Timestamp data out of range".
4.3 Service Settle Time

```yaml
- name: Initialize TDengine
  run: ./scripts/db/init.sh td && sleep 5 # wait for metadata initialization
```
5. Binaries and Startup
5.1 Binary Freshness
Before testing locally, make sure the release binary is up to date:

```bash
cargo build --release
```
CI always does a fresh build, but local development may run a stale binary.
5.2 Waiting for Gateway Startup

```bash
for i in $(seq 1 60); do
    if curl -sf "http://localhost:8080/api/v1/health" > /dev/null 2>&1; then
        break
    fi
    sleep 1
done
```
Note: the health check path is `/api/v1/health`, not `/health`.
6. Config & Port Parity
6.1 The 5433 vs 5432 Port Trap
- Local (dev): default port 5433 (`config/dev.yaml`).
- CI: standard port 5432 (`config/ci.yaml`).
- Solution: test scripts must detect `CI=true` and pass `--env ci`.

```bash
if [ "$CI" = "true" ]; then
    GATEWAY_ARGS="--gateway --env ci"
fi
```
6.2 Standardized Script Template
Reuse the standard template: `scripts/templates/test_integration_template.sh`.
7. Python Environment Conventions (uv)
7.1 Never Run Python Bare
Running `python3` directly in CI may fail to find dependencies.
7.2 Solution
Use `uv run` to manage dependencies explicitly; a HEREDOC keeps the environment isolated:

```bash
#!/bin/bash
# Unified entry point (wrapper script) example
export SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Declare dependencies explicitly with --with, and forward all arguments via "$@"
uv run --with requests --with pynacl python3 - "$@" << 'EOF'
import sys
import os
# ... python code ...
EOF
```
8. Quick Reference
| Scenario | Local | CI |
|---|---|---|
| TDengine operations | `docker exec tdengine taos` | `curl localhost:6041/rest/sql` |
| PostgreSQL connection | container name or localhost | localhost only |
| Environment variables | set manually or `.env` | `source db_env.sh` |
| Log output | terminal | file + artifact upload |
9. Race Conditions and Resource Cleanup
9.1 Port in Use ("Address already in use")
Problem: when several test scripts run back to back in one job (e.g. QA Suite + POC), the previous script may not have fully released its port, so the next Gateway fails to start.
Solution
Explicitly kill stale processes before starting the Gateway. In CI (not in a local IDE), `pkill` is acceptable:

```bash
# Ensure clean slate
echo "Cleaning up any existing Gateway processes..."
pkill -9 -f "zero_x_infinity" || true
sleep 2 # give the kernel time to release the port
```
Key point: `kill -9` guarantees immediate termination and prevents zombie processes.
10. Error-Handling Conventions
10.1 Config Loading Must Not Panic
Forbidden:

```rust
File::open("config.yaml").unwrap(); // ❌ crashes with no useful log
```
Required:
Use `anyhow::Result` and add context:

```rust
File::open("config.yaml").with_context(|| "Failed to open config")?; // ✅
```
10.2 Database Unique Constraints (Duplicate Key)
Problem: re-registering a user caused a 500 with a stack trace in the logs, polluting triage.
Solution: catch the "duplicate key" error, log it as a warning, and return 409 Conflict.

```rust
if err.to_string().contains("duplicate key") {
    tracing::warn!("User already exists: {}", err);
    return Err(StatusCode::CONFLICT);
}
```
11. Test Data Parity
11.1 Manual SQL Injection vs API Initialization
Problem
Local development usually relies on `run_poc.sh` (full API-driven verification), while CI may run the lower-level `test_e2e.sh` (fast SQL-injection-based verification). If the two diverge, tests pass locally but fail in CI.
Typical case:
- The API deposit path scales units automatically.
- A manual SQL insert wrongly assumed the database stores scaled integers (10^6) and inserted `1000000`.
- Result: the database actually held 1,000,000 USDT (not 1 USDT), breaking every subsequent balance check.
Solution:
- Prefer API initialization: prepare data through APIs such as `POST /api/v1/private/deposit` wherever possible, so business logic stays consistent.
- Double-check the schema: if SQL injection is unavoidable, consult `migrations/` or `schema.rs` to confirm column types (Decimal vs BigInt).
- Share helpers: use a common Python/Bash library for data injection so scripts don't reinvent it inconsistently.
Last updated: 2025-12-30
12. Bash Script Pitfalls
12.1 Arithmetic Expansion Silently Exits the Script
Problem
Under `set -e`, an arithmetic command that evaluates to 0 is treated as a failure, and the script exits immediately.
Typical scenario

```bash
set -e
TOTAL_TESTS=0
# ...
((TOTAL_TESTS++)) # post-increment yields the old value 0 → exit status 1 → script exits!
```
Consequence
The CI job stops with no error output at all (a silent failure), which is extremely hard to diagnose.
Solution
Use the standard POSIX arithmetic-expansion form, or make sure the expression is never executed as a bare command:

```bash
# ✅ Recommended: assignment form, unaffected by the result value
TOTAL_TESTS=$((TOTAL_TESTS + 1))

# ✅ Alternative: force success (less elegant)
((TOTAL_TESTS++)) || true
```
Pre-Merge to Main Checklist
All items must be completed before merging into the main branch.
1. Code Quality ✓
- `cargo fmt --check` passes
- `cargo test` passes
- Clippy (must use the same configuration as CI):

```bash
cargo clippy -- -D warnings -A clippy::too_many_arguments -A clippy::collapsible_if -A clippy::unwrap_or_default -A clippy::doc_lazy_continuation -A clippy::manual_is_multiple_of -A clippy::implicit_saturating_sub -A clippy::redundant_pattern_matching -A clippy::derivable_impls
```
2. Documentation ✓
- Relevant chapters under `docs/src/*.md` are updated
- The `docs/src/SUMMARY.md` table of contents is correct
- `mdbook build` succeeds
- README.md links the new chapter
- Read agent-testing-notes.md (avoid common pitfalls)
3. CI/CD ✓
3.1 Local verification (required)
- `./scripts/test_ci.sh --quick` passes
- Simulate CI runs individually (important — running everything locally can mask problems):

```bash
CI=true ./scripts/test_ci.sh --test-gateway-e2e
CI=true ./scripts/test_ci.sh --test-kline
CI=true ./scripts/test_ci.sh --test-depth
CI=true ./scripts/test_ci.sh --test-account
```
3.2 CI environment checks
- No `docker exec` (CI service containers do not support it)
- Database connections use `localhost`, not container names
- All helper functions are defined at global scope (not inside `if` blocks)
3.3 When CI fails
- Download logs immediately: `gh run view <run-id> --log-failed`
- Search for errors: `grep -i "error\|fail\|fatal" logs/*.txt`
- Fix based on the logs; don't guess
4. Git ✓
- All changes committed
- `git status` shows clean
- Branch rebased/merged onto the latest main (no conflicts)
5. Release ✓
- After merging, create a Git tag: `git tag v{version}`
- Push the tag: `git push origin --tags`
Caution
⚠️ Never delete feature branches! Branches are part of the project's history and must be kept permanently.
Commands

```bash
# 1. Final checks
cargo check && cargo test && cargo clippy && cargo fmt --check

# 2. Build the docs
cd docs && mdbook build && cd ..

# 3. Merge
git checkout main
git merge <feature-branch> --no-ff -m "Merge branch '<feature-branch>'"

# 4. Tag
git tag v0.10-a-account-system
git push origin main --tags

# 5. Done
echo "✅ Merge complete!"
```
Build & Verification Guide
This document summarizes common local pitfalls around "changes not taking effect" and "port conflicts" during Gateway development and E2E testing.
1. The Stale Binary Trap
When you ran `cargo build --release` but tests still execute the old logic:
Common causes
- Stale fingerprint: Cargo wrongly considers the binary up to date and skips recompiling or relinking.
- Corrupted incremental cache: stale state under `target/release/incremental` keeps old logic alive.
- Timestamp resolution: the source modification time is too close to the previous build (APFS granularity).
Fixes (lightest to heaviest)
- Most common (force relink): `touch src/main.rs && cargo build --release`
- Clear the incremental cache (not a full clean): `rm -rf target/release/incremental`
- Force-rebuild the core crate: `cargo clean -p zero_x_infinity && cargo build --release`
2. Port Conflicts and Zombie Processes
Symptom
Gateway fails to start with: `❌ FATAL: Failed to bind to 0.0.0.0:8080: Address already in use`.
This usually means a stale Gateway process is still running in the background.
Diagnosis and fix
- Kill the stale process (safely): do not use `pkill -f` (it kills the IDE). Instead:

```bash
# Find and kill whatever holds port 8080
lsof -i :8080
kill -9 <PID>
```
- Check for script conflicts: make sure you aren't running the Gateway manually in one terminal while `test_transfer_e2e.sh` runs in another.
3. E2E Testing Best Practices
Confirm binary freshness
Before running tests, compare timestamps manually or watch for the E2E script's warning:

```bash
ls -lh src/funding/service.rs target/release/zero_x_infinity
```
Database consistency
If the logic looks right but the API reports Missing column:
- Confirm the PostgreSQL migrations were applied manually (in case `init.sh` skipped them due to existing data).
- Confirm `account_type` and `status` in `balances_tb` are `SMALLINT`, which must map to `i16` in Rust.
Always pass the environment when running the Gateway
When debugging manually, always include the environment flag:

```bash
./target/release/zero_x_infinity --gateway --env dev
```
Last updated: 2025-12-24
Database Selection: TDengine vs Others
Scenario: Settlement Persistence - Storing orders, trades, and balances.
📊 Comparison
Candidates
| Database | Type | Use Case |
|---|---|---|
| TDengine | Time-Series | IoT, Financial Data, High-Frequency Write |
| PostgreSQL | Relational | General OLTP |
| TimescaleDB | PG Extension | Time-Series (PG based) |
| ClickHouse | Columnar | OLAP, Analytics |
🎯 Why TDengine?
1. Performance (Based on TSBS)
| Metric | TDengine vs TimescaleDB | TDengine vs PostgreSQL |
|---|---|---|
| Write Speed | 1.5-6.7x Faster | 10x+ Faster |
| Query Speed | 1.2-24.6x Faster | 10x+ Faster |
| Storage | 1/12 - 1/27 Space | Huge Saving |
2. Matching Exchange Requirements
| Requirement | TDengine Solution |
|---|---|
| High Frequency Write | Million/sec write capacity |
| Timestamp Index | Native time-series design |
| High Cardinality | Billions of data points, Super Tables |
| Real-time Stream | Built-in Stream Computing |
| Data Subscription | Kafka-like real-time push |
| Auto Partitioning | Auto-sharding by time |
3. Simplified Architecture
TDengine Solution:
┌─────────────────────────────────────────────┐
│ TDengine │
│ Persistence + Stream + Subscription │
└─────────────────────────────────────────────┘
Fewer Components = Lower Ops Complexity + Lower Latency
4. Rust Ecosystem
- ✅ Official Rust client `taos`
- ✅ Async (tokio)
- ✅ Connection pool (r2d2)
- ✅ WebSocket (cloud friendly)
❌ Why Not Others?
PostgreSQL
- ❌ Poor time-series performance.
- ❌ High-frequency write bottleneck.
- ❌ Large storage consumption.
TimescaleDB
- ⚠️ Slower than TDengine.
- ⚠️ Much larger storage footprint.
ClickHouse
- ✅ Fast analytics.
- ❌ Real-time row-by-row write is weak (prefers batch).
- ❌ High Ops complexity.
📋 Data Model
TDengine Super Table
```sql
-- Orders Super Table
CREATE STABLE orders (
    ts TIMESTAMP,        -- PK
    order_id BIGINT,
    user_id BIGINT,
    side TINYINT,
    order_type TINYINT,
    price BIGINT,
    qty BIGINT,
    filled_qty BIGINT,
    status TINYINT
) TAGS (
    symbol_id INT        -- Partition Tag
);

-- Trades
CREATE STABLE trades (...) TAGS (symbol_id INT);

-- Balances
CREATE STABLE balances (...) TAGS (user_id BIGINT, asset_id INT);
```
Advantages
- ✅ Auto-partition by TAG.
- ✅ Auto-aggregation query.
- ✅ Unified Schema.
🏗️ Architecture Integration
Gateway -> Order Queue -> Trading Core -> Events -> TDengine
✅ Final Recommendation
Primary Storage: TDengine
- Orders, Trades, Balances History.
- High performance write/read.
📊 Expected Performance
- Write Latency: < 1ms
- Query Latency: < 5ms
- Storage Compression: 10:1
- Supported TPS: 100,000+
ADR-001: WebSocket Security - Strict Auth Enforcement
| Status | Accepted |
|---|---|
| Date | 2025-12-27 |
| Author | QA / Security Remediation Agent |
| Context | Phase 0x10.5 Backend Gaps |
Context
During the QA Audit of Phase 0x10.5, a critical security vulnerability (Identity Spoofing) was identified in the WebSocket Gateway.
The implementation allowed clients to assert any user_id via query parameter (ws://...?user_id=123) without cryptographic verification (Token/Signature).
Decision
To immediately mitigate this P0 vulnerability while preserving functionality for the “Public Market Data” milestone:
- Strict Anonymous Mode: The Gateway MUST reject any connection attempt where `user_id` is provided and is NOT `0` (Anonymous).
- HTTP 401: Rejection must return `401 Unauthorized`.
- Future Auth: Authenticated access (for Private Channels) is deferred to the Authentication Phase (0x0A-b). Until then, NO private user connections are allowed.
Consequences
- Positive: Eliminates identity spoofing risk. System is secure for public data consumption.
- Negative: Private channel testing (e.g., `private.order`) is temporarily blocked until proper Auth is implemented.
Verification
- `scripts/test_qa_adversarial.py` was created to verify this constraint.
ADR-005: Unified Chain-Asset Schema & Admin Integration
Date: 2025-12-30
Status: Accepted
Supersedes: ADR-004 (Partial), Design Doc 0x11-c (Draft)
Context: Reconciling conflict between Admin’s Logical Assets (assets_tb) and Sentinel’s Physical Chains (chain_assets).
1. Problem Statement
The system currently has ambiguity regarding where “Asset Definition” lives:
- Admin (
assets_tb): Defines “USDT” (Logical), Symbol, Decimal (Internal). - Sentinel (
chain_assets): Needs “USDT” on ETH (Physical), Contract, Decimal (Chain). - Conflict: Potential data duplication (redundancy) and unclear ownership.
2. Architectural Decision: Layered Asset Model
We explicitly separate the domain model into two strictly defined layers.
Layer 1: Logical Asset (Master) -> assets_tb
- Owner: Admin Dashboard (Existing).
- Scope: Business logic, User Balances, Trading Pairs.
- Key Fields: `asset_id` (PK), `asset` (unique identifier, e.g., "USDT"), `decimals` (system precision, e.g., 8), `status` (global switch)
Layer 2: Physical Binding (Extension) -> chain_assets_tb
- Owner: Operations (via Admin Extension).
- Scope: Blockchain adapters, Deposit/Withdrawal addresses, Sentinel config.
- Key Fields: `chain_slug` (FK to `chains_tb`), `asset_id` (FK to `assets_tb`), `contract_address` (physical ID), `decimals` (physical precision)
- Constraint: No re-definition of Logical fields (Symbol, Name).
3. Schema Specification (Finalized)
```sql
-- 1. Chains (Infrastructure)
CREATE TABLE chains_tb (
    chain_slug VARCHAR(32) PRIMARY KEY,  -- 'ETH', 'BTC' (renamed from chain_id)
    chain_name VARCHAR(64) NOT NULL,
    network_id VARCHAR(32),              -- '1', 'regtest'
    rpc_urls TEXT[] NOT NULL,
    confirmation_blocks INT DEFAULT 1,
    is_active BOOLEAN DEFAULT TRUE
);

-- 2. Chain Assets (Physical Extension)
CREATE TABLE chain_assets_tb (
    id SERIAL PRIMARY KEY,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    asset_id INT NOT NULL REFERENCES assets_tb(asset_id),
    -- Physical Properties Only
    contract_address VARCHAR(128),       -- Mutually exclusive unique ID per chain
    decimals SMALLINT NOT NULL,          -- The mapping factor (Chain -> System)
    -- Chain-Specific Overrides
    min_deposit DECIMAL(30, 8) DEFAULT 0,
    min_withdraw DECIMAL(30, 8) DEFAULT 0,
    withdraw_fee DECIMAL(30, 8) DEFAULT 0,
    is_active BOOLEAN DEFAULT FALSE,     -- Safety: inactive by default until verified
    -- Constraints
    UNIQUE(chain_slug, asset_id),        -- 1 asset per chain (for now; bridging later)
    UNIQUE(chain_slug, contract_address) -- 1 contract = 1 asset
);
```
4. Admin Integration Scope
Admin code currently does not support Layer 2. To strictly follow this architecture, Admin must be updated in a future iteration:
- New Model: `ChainAsset` mapping to `chain_assets_tb`.
- New View: "Chain Configurations" tab under Asset details.
- Logic: When viewing “USDT”, allow adding “ETH Binding” (Contract: 0x…, Decimals: 6).
5. Migration Strategy (Immediate)
For Phase 0x11-b (Sentinel Hardening), we implement the Schema and Manual Seeding (Migration 012). Admin UI updates are deferred, but the Schema is future-proofed to support them perfectly.
ADR-006: User Address Decoupling for Account-Based Chains
Date: 2025-12-30
Status: Accepted
Context: Replaces user_addresses definition in migrations/010_deposit_withdraw.sql to enable “Hot Listing”.
1. Problem Statement
The current schema for user addresses matches Assets, not Chains:
```sql
-- OLD (Flawed)
PRIMARY KEY (user_id, asset, chain_slug)
```
The Loophole:
- User A has ETH address `0x123`. DB record: `(UserA, 'ETH', 'ETH', '0x123')`.
- Ops lists `UNI` (ERC20).
- User A deposits `UNI` to `0x123`.
- Sentinel parses `Transfer(0x123, val)`.
- Sentinel looks up: `SELECT user_id FROM user_addresses WHERE address='0x123' AND asset='UNI'`.
- Result: NULL. Deposit ignored.
Impact: Users must manually “Generate UNI Address” (redundant action) before depositing, or funds are lost/stuck. This breaks the “Ops List -> Immediate Deposit” workflow.
2. Solution: Chain-Centric Address Model
We must recognize that for Account-Based Chains (ETH, TRON, BSC, SOL), an address belongs to the Chain Account, not the individual Asset.
2.1 Schema Change
We split the concept into “Account Bindings”.
```sql
-- migration/012_user_chain_addresses.sql
CREATE TABLE user_chain_addresses (
    user_id BIGINT NOT NULL,
    chain_slug VARCHAR(32) NOT NULL REFERENCES chains_tb(chain_slug),
    address VARCHAR(255) NOT NULL,
    -- Metadata
    memo_tag VARCHAR(64),               -- For XRP/EOS destination tags
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Constraint: one address per user per chain (simplified model)
    -- Or multiple? For now, 1:1 is sufficient for MVP.
    PRIMARY KEY (user_id, chain_slug),
    UNIQUE (chain_slug, address)        -- Reverse lookup must be unique
);
```
2.2 Sentinel Lookup Logic
When `EthScanner` detects a `Transfer(to, value, contract)`:
- Identify Asset: Match `contract` -> `asset_id` (via `chain_assets_tb`).
- Identify User: Match `to` -> `user_id` (via `user_chain_addresses` WHERE `chain_slug = 'ETH'`).
- Insert Deposit: `deposit_history(user_id, asset_id, amount)`.
Outcome: The asset_id comes from the Contract, the user_id comes from the Address. They are decoupled.
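The lookup steps above can be sketched as a single resolution function (a simplified illustration; all names are hypothetical stand-ins for the real tables): the asset comes from the contract map, the user from the chain-address map, and only when both resolve is a deposit row produced.

```rust
use std::collections::HashMap;

/// Minimal deposit row, standing in for deposit_history(user_id, asset_id, amount).
#[derive(Debug, PartialEq)]
struct Deposit {
    user_id: u64,
    asset_id: u32,
    amount: u64,
}

/// Decoupled resolution: contract -> asset (chain_assets_tb),
/// recipient address -> user (user_chain_addresses).
fn resolve_transfer(
    contract_to_asset: &HashMap<String, u32>,
    address_to_user: &HashMap<String, u64>,
    contract: &str,
    to: &str,
    value: u64,
) -> Option<Deposit> {
    let asset_id = *contract_to_asset.get(contract)?; // unknown contract -> ignore
    let user_id = *address_to_user.get(to)?;          // unknown address -> ignore
    Some(Deposit { user_id, asset_id, amount: value })
}
```

Listing a new ERC20 only adds an entry to the contract map; every existing chain address immediately starts resolving deposits for it.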
3. Handling UTXO (BTC)
BTC addresses are generally single-use or per-intent. However, for an Exchange Deposit model, we typically generate one “Deposit Address” per User per Chain (or rotate them).
Currently, we can treat BTC the same: “User’s BTC Deposit Address”.
If we need asset-specific addresses (e.g. OMNI USDT vs native BTC), that is a legacy edge case we can ignore for the MVP; modeling it via `chain_slug` variants (e.g. `btc-omni` vs `btc-native`) is possible but usually unnecessary, since both assets live on the same chain.
Decision: The schema `user_chain_addresses(user_id, chain_slug)` works for BTC too, whether we assume “one checkable address per user” or later extend it to a list of addresses.
Refinement: Conceptually, `(chain_slug, address)` is the real physical key: an address maps to exactly one user, and (for the MVP) a user maps to exactly one address per chain.
4. Operational Workflow (Final)
- Listing: Ops lists `UNI` (contract `0x...`) -> `chain_assets_tb`.
- Sentinel: Refreshes map `0x...` -> `UNI`.
- User: Sends `UNI` to their existing ETH address.
- Sentinel:
  - Sees `0x...` -> knows it's `UNI`.
  - Sees `Receiver` -> knows it's User A.
  - Success.
5. Status
Accepted
AR-001: Architecture Request - WebSocket Authentication
| Status | REQUESTED |
|---|---|
| Date | 2025-12-27 |
| Requester | QA / Remediation Agent |
| Driver | Identity Spoofing Remediation |
Problem Statement
The current WebSocket implementation relies on a “Strict Anonymous Mode” (ADR-001), which rejects any connection claiming `user_id != 0`.
While this mitigates immediate identity spoofing, it creates a functional gap: legitimate users cannot prove their identity or access private channels.
The user explicitly rejected ADR-001 as a complete solution (“security is not fixed ... requires further design”), necessitating a robust authentication design.
Requirements
The Architect must provide a design (e.g., ADR-002) that:
- Authentication mechanism: Defines how a WebSocket client proves its identity (e.g., JWT in Query Param vs Header vs Handshake Message).
- Integration: How this integrates with `src/api_auth/` (Ed25519) or standard session management.
- State Management: How `ConnectionManager` stores and validates the authenticated session.
- Migration: Specific steps to replace the temporary “Strict Anonymous Mode” in `handler.rs` with the new mechanism.
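One possible shape for the handshake-message option, as a purely illustrative Rust sketch (the actual mechanism is what ADR-002 must decide; `ConnState`, `TokenVerifier`, and `handle_auth_message` are hypothetical names): connections start anonymous and may upgrade exactly once, with the verifier abstracted so it could later be backed by `src/api_auth/` (Ed25519) or a session store.

```rust
/// Per-connection auth state: sockets start anonymous (public streams only)
/// and may upgrade exactly once via an auth handshake message.
#[derive(Debug, PartialEq)]
enum ConnState {
    Anonymous,
    Authenticated { user_id: u64 },
}

/// Verifier abstraction so the transport layer stays agnostic: production
/// could back this with Ed25519 signature checks or server-side sessions.
trait TokenVerifier {
    /// Returns the user_id the token proves, or None if invalid.
    fn verify(&self, token: &str) -> Option<u64>;
}

/// Handle an auth handshake message on an open connection.
fn handle_auth_message<V: TokenVerifier>(
    state: &mut ConnState,
    verifier: &V,
    token: &str,
) -> Result<(), &'static str> {
    if let ConnState::Authenticated { .. } = state {
        return Err("already authenticated");
    }
    match verifier.verify(token) {
        Some(user_id) => {
            *state = ConnState::Authenticated { user_id };
            Ok(())
        }
        None => Err("invalid token"),
    }
}
```

Because anonymous connections are the default state rather than an error, public trade streams keep working unchanged, addressing the backwards-compatibility constraint below.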
Constraints
- Low Latency: The auth check must not significantly delay connection establishment.
- Backwards Compatibility: Anonymous public trade streams must continue to work alongside authenticated connections.