Architecture

Architecture overview.

Full component graph (Mermaid)
graph TB
subgraph "Client Layer"
USER[User/Client Application]
CLI[CLI Interface]
SDK[Python SDK]
end
subgraph "API Layer (FastAPI)"
API[main.py - FastAPI App]
SCHEMAS[schemas.py - Pydantic Models]
SSE[sse.py - Server-Sent Events]
API --> SCHEMAS
API --> SSE
end
subgraph "Driver Layer"
DRIVER[drivers.py - Unified Driver]
DRIVER_BATCH[Batch Mode]
DRIVER_STREAM[Stream Mode]
DRIVER --> DRIVER_BATCH
DRIVER --> DRIVER_STREAM
end
subgraph "Generation Layer (Core)"
HSM[single_asset.py - HyperSynthReactor]
SF[feed.py - SyntheticTelemetryUplink]
EF[event_feed.py - EventMarketFeed]
UM[unified_market.py - MarketOrchestrationCore]
RUNNER[runner.py - Batch Runner]
COMPONENTS[components.py - Market Components]
HSM --> COMPONENTS
SF --> HSM
EF --> HSM
UM --> HSM
RUNNER --> HSM
end
subgraph "Core Math & Infrastructure"
MATH[math.py - Stochastic Processes]
INFRA[infrastructure.py - Chronometer, Latency]
CONFIG[config.py - SimulationManifest]
GBM[GeometricBrownianMotion]
MJD[MertonJumpDiffusion]
OU[OrnsteinUhlenbeck]
SIMCLOCK[Chronometer]
LATENCY[LatencyModel]
MATH --> GBM
MATH --> MJD
MATH --> OU
INFRA --> SIMCLOCK
INFRA --> LATENCY
end
subgraph "Venue Layer (KEY DIFFERENTIATOR)"
VENUE_BASE[base.py - VenueProtocol Interface]
VENUE_CONFIG[config.py - Venue Configurations]
BINANCE[binance.py - BinanceFundingModel]
HYPERLIQUID[hyperliquid.py - HyperLiquidFundingModel]
OKX[okx.py - OKXFundingModel]
BYBIT[bybit.py - BybitFundingModel]
BINANCE_FEED[binance/feed.py - Binance Format]
HL_FEED[hyperliquid/feed.py - HyperLiquid Format]
VENUE_BASE --> BINANCE
VENUE_BASE --> HYPERLIQUID
VENUE_BASE --> OKX
VENUE_BASE --> BYBIT
BINANCE --> BINANCE_FEED
HYPERLIQUID --> HL_FEED
end
subgraph "Processing Layer"
NORMALIZER[normalizer.py - CanonicalizationEngine]
CACHE[ArtifactCacheController - LZ4 Compression]
LOADER[hyperliquid_loader.py - Historical Data]
NORMALIZER --> CACHE
NORMALIZER --> LOADER
end
subgraph "Presets & Configuration"
PRESETS[presets.py - Exchange Presets]
BINANCE_PRESET[BinancePreset]
HL_PRESET[HyperLiquidPreset]
OKX_PRESET[OKXPreset]
BYBIT_PRESET[BybitPreset]
PRESETS --> BINANCE_PRESET
PRESETS --> HL_PRESET
PRESETS --> OKX_PRESET
PRESETS --> BYBIT_PRESET
end
subgraph "Utility Layer"
CONV[conversions.py - Unit Conversions]
LOGGER[logger.py - Logging]
ENUMS[enums.py - Enumerations]
STUBS[stubs.py - Type Stubs]
end
subgraph "Data Output"
PARQUET[Parquet Files]
LZ4_CACHE[LZ4 Compressed Cache]
SSE_STREAM[SSE Event Stream]
end
%% Client Connections
USER --> API
USER --> SDK
CLI --> SDK
SDK --> HSM
SDK --> PRESETS
%% API Flow
API --> DRIVER
%% Driver to Generator
DRIVER --> HSM
DRIVER_BATCH --> PARQUET
DRIVER_STREAM --> SSE_STREAM
%% Generator Dependencies
HSM --> MATH
HSM --> INFRA
HSM --> CONFIG
HSM --> CONV
HSM --> VENUE_CONFIG
%% Venue Integration
HSM --> BINANCE
HSM --> HYPERLIQUID
HSM --> OKX
HSM --> BYBIT
%% Presets to Config
PRESETS --> CONFIG
PRESETS --> VENUE_CONFIG
%% Processing Flow
HSM --> NORMALIZER
NORMALIZER --> LZ4_CACHE
%% Logging
HSM --> LOGGER
API --> LOGGER
NORMALIZER --> LOGGER
style VENUE_BASE fill:#ff6b6b
style BINANCE fill:#ff6b6b
style HYPERLIQUID fill:#ff6b6b
style OKX fill:#ff6b6b
style BYBIT fill:#ff6b6b
style VENUE_CONFIG fill:#ff6b6b
Client Layer:

  • Entry Points: Direct Python SDK usage, CLI commands, HTTP API clients
  • Use Cases: ML training, backtesting, infrastructure testing
API Layer (FastAPI):

  • main.py: HTTP endpoints (/data/generate, /stream/live)
  • schemas.py: Request/response validation (Pydantic)
  • sse.py: Server-Sent Events for real-time streaming
  • Authentication: API key-based security
Driver Layer:

  • Unified Driver: Single entry point for batch/stream modes
  • Batch Mode: Generates Parquet files (optimized throughput)
  • Stream Mode: Async SSE with real-time delays (realistic simulation)
  • Determinism: Same seed = identical output across modes
Generation Layer (Core):

  • HyperSynthReactor: Core L2 orderbook + trade generator

    • Price process: Merton Jump Diffusion (GBM + Poisson jumps)
    • Orderbook: Dynamic depth, spread elasticity, adverse selection
    • Trades: Poisson arrival, power-law sizing, clustering
    • Microstructure: Burst mode, staleness, queue dynamics
  • SyntheticTelemetryUplink: Spot/Perp/Funding triangular model

  • EventMarketFeed: Multi-frequency event streams

  • MarketOrchestrationCore: Consolidated market generator

Core Math & Infrastructure:

  • math.py:

    • GeometricBrownianMotion: Standard diffusion
    • MertonJumpDiffusion: Fat-tailed returns
    • OrnsteinUhlenbeck: Mean-reverting processes (funding rates)
  • infrastructure.py:

    • Chronometer: Deterministic time progression
    • LatencyModel: Network/processing delays (fixed, jittery)
  • config.py:

    • SimulationManifest: 40+ parameters for full reproducibility
    • Hash-based caching, JSON serialization
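
The price and funding processes listed above can be illustrated with a minimal, standard-library-only sketch. It is not the contents of math.py: the function names, parameter names, and the Euler-style discretisation are assumptions made for illustration. It shows a Merton jump diffusion path (GBM plus compound Poisson jumps in log-price) and a single Ornstein-Uhlenbeck step of the kind that could drive a mean-reverting funding rate.

```python
import math
import random


def _poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def merton_jump_diffusion_path(
    s0: float,
    mu: float,
    sigma: float,
    jump_intensity: float,
    jump_mean: float,
    jump_std: float,
    dt: float,
    n_steps: int,
    seed: int,
) -> list:
    """Simulate prices under GBM plus normally distributed log-jumps.

    All parameter names are hypothetical; they are not taken from the project.
    """
    rng = random.Random(seed)  # a private RNG keeps the path reproducible
    # Compensator: removes the average jump contribution from the drift.
    kappa = math.exp(jump_mean + 0.5 * jump_std ** 2) - 1.0
    drift = (mu - 0.5 * sigma ** 2 - jump_intensity * kappa) * dt
    prices = [s0]
    for _ in range(n_steps):
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        n_jumps = _poisson(rng, jump_intensity * dt)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        prices.append(prices[-1] * math.exp(drift + diffusion + jumps))
    return prices


def ornstein_uhlenbeck_step(
    x: float, theta: float, mean: float, sigma: float, dt: float,
    rng: random.Random,
) -> float:
    """One Euler-Maruyama step of a mean-reverting process."""
    noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x + theta * (mean - x) * dt + noise
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator is one simple way to get the seed-for-seed reproducibility the manifest is meant to guarantee.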

Base Interface:

  • VenueProtocol: Abstract class for venue-specific feeds
  • BookQuote, TradeTick: Generic containers
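
As a rough sketch of how an abstract venue interface with generic containers might look, consider the following. The field names, method names, and the `ExampleVenue` class are hypothetical; the actual base.py may differ in every detail, and the output keys below are not any real exchange's schema.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BookQuote:
    """Generic top-of-book quote (field names are assumptions)."""
    timestamp_ns: int
    bid_price: float
    bid_size: float
    ask_price: float
    ask_size: float


@dataclass(frozen=True)
class TradeTick:
    """Generic trade print (field names are assumptions)."""
    timestamp_ns: int
    price: float
    size: float
    side: str  # "buy" or "sell"


class VenueProtocol(ABC):
    """Abstract base: each venue converts generic events to its own format."""

    @abstractmethod
    def format_quote(self, quote: BookQuote) -> dict:
        ...

    @abstractmethod
    def format_trade(self, trade: TradeTick) -> dict:
        ...


class ExampleVenue(VenueProtocol):
    """Made-up venue used only to show the subclassing pattern."""

    def format_quote(self, quote: BookQuote) -> dict:
        return {
            "t": quote.timestamp_ns // 1_000_000,  # nanoseconds -> milliseconds
            "bid": [quote.bid_price, quote.bid_size],
            "ask": [quote.ask_price, quote.ask_size],
        }

    def format_trade(self, trade: TradeTick) -> dict:
        return {
            "t": trade.timestamp_ns // 1_000_000,
            "px": trade.price,
            "sz": trade.size,
            "buyer_is_taker": trade.side == "buy",
        }
```

Keeping the containers venue-agnostic means the generator only ever produces `BookQuote` and `TradeTick`, and all exchange-specific shaping stays inside the subclasses.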

Funding Models (Exchange-specific):

  • Binance: 8h intervals, interest rate (0.01%) + premium, ±0.75/2% caps
  • HyperLiquid: 1h intervals, velocity coefficients (1.0-2.5x), ±4% cap
  • OKX: 8h intervals, threshold damping, ±0.5/0.75% caps
  • Bybit: 8h intervals, insurance fund effects, ±1/2.5% caps
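
The common shape behind these models (a raw rate built from premium plus interest, then clamped to a venue cap and paid at a venue interval) can be sketched as follows. This is a deliberate simplification: it ignores velocity coefficients, threshold damping, and insurance fund effects, and the `FundingSpec` type and function names are hypothetical. The Binance cap used here assumes the first figure of the listed "±0.75/2%" pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FundingSpec:
    """Per-venue funding parameters (hypothetical container)."""
    interval_hours: int
    cap: float  # symmetric cap on the funding rate, as a decimal fraction


# Values taken from the list above; the Binance cap choice is an assumption.
HYPERLIQUID = FundingSpec(interval_hours=1, cap=0.04)
BINANCE = FundingSpec(interval_hours=8, cap=0.0075)


def capped_funding_rate(premium: float, interest: float, spec: FundingSpec) -> float:
    """Return premium + interest, clamped to [-cap, +cap]."""
    raw = premium + interest
    return max(-spec.cap, min(spec.cap, raw))


def payments_per_day(spec: FundingSpec) -> int:
    """Number of funding settlements in 24 hours."""
    return 24 // spec.interval_hours


if __name__ == "__main__":
    # Binance-style interest component of 0.01% with a 0.02% premium.
    print(capped_funding_rate(0.0002, 0.0001, BINANCE))
    # A 10% premium is clamped to the cap on the 1h venue.
    print(capped_funding_rate(0.10, 0.0, HYPERLIQUID))
```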

Venue Feeds:

  • Format converters to exact exchange JSON schemas
  • Binance WebSocket format, HyperLiquid L2 format
Processing Layer:

  • CanonicalizationEngine:

    • Converts all sources to canonical schema
    • NormalizedBookEvent, NormalizedTradeEvent
    • Monotonic IDs, nanosecond timestamps
    • Venue metadata preservation
  • ArtifactCacheController:

    • LZ4 compression (15-20x reduction)
    • MD5-based cache keys
    • Metadata sidecars (JSON)
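
One plausible way to derive an MD5-based cache key and a JSON sidecar is sketched below, using only the standard library. The function names and the sidecar fields are assumptions, not the ArtifactCacheController API, and the LZ4 compression step is omitted.

```python
import hashlib
import json


def cache_key(params: dict) -> str:
    """Hash a parameter dict into a stable 32-character hex key.

    Sorting keys and fixing separators makes the JSON text canonical, so two
    dicts with the same content always hash to the same key.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # MD5 serves as a fingerprint here, not as a security primitive.
    return hashlib.md5(canonical.encode("utf-8"), usedforsecurity=False).hexdigest()


def sidecar_metadata(params: dict, n_events: int) -> str:
    """Build the JSON text that would sit next to a cached artifact."""
    payload = {
        "cache_key": cache_key(params),
        "n_events": n_events,
        "params": params,
    }
    return json.dumps(payload, sort_keys=True, indent=2)


if __name__ == "__main__":
    params = {"seed": 42, "venue": "binance", "symbol": "BTC"}
    print(cache_key(params))
    print(sidecar_metadata(params, n_events=1000))
```

Hashing a canonical serialization rather than the raw dict avoids spurious cache misses caused by key ordering.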

7 Exchange Profiles:

  1. binance_spot_btc: 100ms updates, tight spreads
  2. binance_spot_eth: High activity, moderate spreads
  3. binance_futures_btc: Leverage volatility, 8h funding
  4. hyperliquid_perp_sol: 200ms updates, 1h funding, on-chain
  5. hyperliquid_perp_btc: Velocity coefficients
  6. okx_spot_btc: Competitive spreads
  7. bybit_futures_btc: 50ms updates, extreme bursts

Utility Layer:

  • conversions.py: BPS ↔ decimal ↔ percentage points, quote ↔ base
  • logger.py: Structured logging
  • enums.py: TradeSide, etc.
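
The unit conversions named above are simple enough to sketch directly. The function names are illustrative and may not match those in conversions.py.

```python
def bps_to_decimal(bps: float) -> float:
    """1 basis point = 0.0001 as a decimal fraction."""
    return bps / 10_000.0


def decimal_to_bps(value: float) -> float:
    return value * 10_000.0


def decimal_to_percent(value: float) -> float:
    """0.0025 as a decimal fraction is 0.25 percentage points."""
    return value * 100.0


def percent_to_decimal(percent: float) -> float:
    return percent / 100.0


def quote_to_base(quote_amount: float, price: float) -> float:
    """Convert a quote-currency amount (e.g. USDT) into base units (e.g. BTC)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return quote_amount / price


def base_to_quote(base_amount: float, price: float) -> float:
    """Convert base units into a quote-currency amount."""
    return base_amount * price
```

For instance, a 25 bps spread is 0.0025 as a decimal and 0.25 percentage points, and 50,000 units of quote currency at a price of 25,000 correspond to 2 base units.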

Batch Generation:

User → API → Driver (batch) → HyperSynthReactor → [Venue Model] → Events → Normalizer → Parquet + LZ4 Cache
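
The Normalizer step in this flow assigns monotonic IDs and nanosecond timestamps while preserving venue metadata. A minimal sketch of that idea follows; the raw field names (`ts_ms`, `px`, `sz`, `side`) and the event fields are hypothetical, and the real NormalizedTradeEvent may carry more information.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Iterator


@dataclass(frozen=True)
class NormalizedTradeEvent:
    """Canonical trade record (fields are assumptions for illustration)."""
    event_id: int
    timestamp_ns: int
    venue: str
    price: float
    size: float
    side: str


def normalize_trades(raw_trades: Iterable[dict], venue: str) -> Iterator[NormalizedTradeEvent]:
    """Convert venue-shaped dicts into canonical events with increasing IDs."""
    ids = count(1)  # strictly increasing IDs, starting at 1
    for raw in raw_trades:
        yield NormalizedTradeEvent(
            event_id=next(ids),
            timestamp_ns=int(raw["ts_ms"]) * 1_000_000,  # milliseconds -> nanoseconds
            venue=venue,  # venue metadata is carried through unchanged
            price=float(raw["px"]),
            size=float(raw["sz"]),
            side=str(raw["side"]).lower(),
        )


if __name__ == "__main__":
    raw = [
        {"ts_ms": 1_700_000_000_000, "px": "100.5", "sz": "0.2", "side": "BUY"},
        {"ts_ms": 1_700_000_000_050, "px": "100.4", "sz": "1.0", "side": "SELL"},
    ]
    for event in normalize_trades(raw, venue="example"):
        print(event)
```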

Stream Generation:

User → API → Driver (stream) → HyperSynthReactor → [Venue Model] → Events → SSE → Client
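
Both flows share one generator, which is what makes the "same seed = identical output across modes" property achievable. The sketch below illustrates that design under stated assumptions: `generate_events`, `run_batch`, and `run_stream` are hypothetical names, the event payload is a toy random walk, and the only real convention used is the SSE wire format of a `data:` line followed by a blank line.

```python
import asyncio
import json
import random
from typing import AsyncIterator, Iterator


def generate_events(seed: int, n_events: int) -> Iterator[dict]:
    """Single source of truth: all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    price = 100.0
    for seq in range(n_events):
        price *= 1.0 + rng.gauss(0.0, 0.001)
        yield {"seq": seq, "price": round(price, 6)}


def run_batch(seed: int, n_events: int) -> list:
    """Batch mode: materialise everything at once."""
    return list(generate_events(seed, n_events))


async def run_stream(seed: int, n_events: int, delay_s: float = 0.0) -> AsyncIterator[str]:
    """Stream mode: same events, paced with a delay and framed as SSE."""
    for event in generate_events(seed, n_events):
        await asyncio.sleep(delay_s)  # pacing only; never touches the RNG
        yield f"data: {json.dumps(event)}\n\n"


async def collect_stream(seed: int, n_events: int) -> list:
    """Decode SSE frames back into event dicts."""
    prefix = "data: "
    return [
        json.loads(frame[len(prefix):])
        async for frame in run_stream(seed, n_events)
    ]


if __name__ == "__main__":
    streamed = asyncio.run(collect_stream(seed=7, n_events=5))
    print(streamed == run_batch(seed=7, n_events=5))
```

The key design point is that delays and transport framing sit outside the generator, so they cannot change the sequence of random draws.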

Preset Usage:

get_preset("binance_futures_btc") → SimulationManifest → HyperSynthReactor → BinanceFundingModel → Events
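
The first hop of this flow, a preset name resolving to a manifest, might be implemented as a simple registry lookup. The sketch below is an illustrative stand-in, not the project's code: its `SimulationManifest` has five hypothetical fields rather than the real 40+ parameters, the `seed` override is an assumption, and only presets whose update and funding intervals are stated in this document are included.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SimulationManifest:
    """Illustrative stand-in with a handful of fields."""
    symbol: str
    venue: str
    update_interval_ms: int
    funding_interval_hours: Optional[int]  # None for spot markets
    seed: int = 42


# Interval values are taken from the profile list in this document.
_PRESETS = {
    "binance_spot_btc": SimulationManifest("BTC", "binance", 100, None),
    "hyperliquid_perp_sol": SimulationManifest("SOL", "hyperliquid", 200, 1),
    "bybit_futures_btc": SimulationManifest("BTC", "bybit", 50, 8),
}


def get_preset(name: str, seed: Optional[int] = None) -> SimulationManifest:
    """Look up a preset by name, optionally overriding the seed."""
    try:
        manifest = _PRESETS[name]
    except KeyError:
        available = ", ".join(sorted(_PRESETS))
        raise KeyError(f"unknown preset {name!r}; available: {available}") from None
    # Frozen dataclasses are immutable, so an override yields a new copy.
    return manifest if seed is None else replace(manifest, seed=seed)


if __name__ == "__main__":
    print(get_preset("hyperliquid_perp_sol"))
    print(get_preset("bybit_futures_btc", seed=7))
```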

Dependencies:

  • pandas: Data manipulation (legacy compatibility)
  • polars: High-performance DataFrames
  • numpy: Numerical computations
  • lz4: Fast compression
  • fastapi: HTTP API framework
  • uvicorn: ASGI server
  • pytest: Testing framework
  • pytest-cov: Coverage reporting
  • black: Code formatting
  • isort: Import sorting
  • flake8: Linting
  • mypy: Type checking
  • Minimum Python version: 3.9
  • Target Python versions: 3.9-3.12