# The Aleatoric Engine: Features
## Core Differentiators
### 1. Exchange-Accurate Cadence Simulation
Unlike generic synthetic data generators that emit perfect metronome ticks, The Aleatoric Engine replicates real exchange behavior:
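One way to picture a non-metronomic cadence is a plain Poisson arrival process, where inter-event gaps are exponentially distributed rather than fixed. The sketch below is illustrative only and uses none of the engine's internals:

```python
import random

def poisson_arrivals(rate_per_sec, duration_s, seed=42):
    """Event timestamps from a Poisson process: exponential inter-arrival gaps."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_sec)   # mean gap = 1 / rate
        if t >= duration_s:
            return times
        times.append(t)

ticks = poisson_arrivals(rate_per_sec=2.0, duration_s=60.0)
# ~120 irregularly spaced trades over 60s, instead of a perfect 0.5s metronome
```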
#### Rate Limit Simulation
```python
market = HyperSynthReactor(
    symbol="SOL",
    book_update_interval_ms=100,  # Match Binance 100ms snapshot rate
    trade_intensity_base=2.0,     # ~2 trades/sec baseline
)
```

#### Burst Mode
Simulate exchange message avalanches during high volatility:
```python
config = SimulationManifest(
    burst_probability=0.05,       # 5% chance of entering burst
    burst_intensity_factor=10.0,  # 10x message rate during burst
)
```

Real-world scenario: During a liquidation cascade, message rates spike 10-100x. The Aleatoric Engine captures this.
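A minimal way to model burst mode is a two-state (normal/burst) regime switch on the message rate. The `exit_prob` knob below is a hypothetical parameter for leaving a burst, not an engine setting:

```python
import random

def burst_intensities(base_rate, burst_prob, burst_factor, exit_prob=0.5,
                      steps=10_000, seed=7):
    """Per-step message rate under a two-state (normal / burst) regime switch."""
    rng = random.Random(seed)
    in_burst = False
    rates = []
    for _ in range(steps):
        if not in_burst:
            in_burst = rng.random() < burst_prob    # enter burst
        else:
            in_burst = rng.random() >= exit_prob    # stay in / leave burst
        rates.append(base_rate * (burst_factor if in_burst else 1.0))
    return rates

rates = burst_intensities(base_rate=2.0, burst_prob=0.05, burst_factor=10.0)
```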
#### Staleness Simulation
Network jitter and exchange processing delays:
```python
config = SimulationManifest(
    staleness_ms=50.0,  # Mean 50ms lag with log-normal distribution
)
```

Includes:
- Log-normal jitter distribution (realistic network behavior)
- Occasional massive lag spikes (0.5% probability of 100-1000ms delays)
- Separate `timestamp_ms` (exchange time) and `capture_time_ms` (receive time)
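The staleness model above can be sketched with the standard library. The log-normal shape parameter and the spike range below are assumptions chosen to match the stated behavior, not the engine's actual values:

```python
import math
import random

def sample_latency_ms(rng, mean_ms=50.0, spike_prob=0.005):
    """Log-normal latency plus rare massive spikes (shape parameters are assumed)."""
    if rng.random() < spike_prob:
        return rng.uniform(100.0, 1000.0)       # occasional massive lag spike
    sigma = 0.5                                 # log-normal shape (assumed)
    mu = math.log(mean_ms) - sigma ** 2 / 2     # chosen so the mean is ~mean_ms
    return rng.lognormvariate(mu, sigma)

rng = random.Random(1)
lags = [sample_latency_ms(rng) for _ in range(10_000)]
# Each event then carries timestamp_ms (exchange) and capture_time_ms = timestamp_ms + lag
```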
### 2. Microstructure-Correct L2/L3 Behavior
#### Order Book Replenishment
Liquidity providers react to aggressive flow:
```python
# After a large sell hits the bid, bid-side depth depletes
# and slowly replenishes over ~5-10 seconds
market = HyperSynthReactor(
    adverse_selection_strength=0.15,  # 15% depth depletion
    impact_decay_halflife_ms=5000.0,  # 5s recovery
)
```

#### Queue Dynamics
Section titled “Queue Dynamics”- Exponential depth profile: Size decreases exponentially away from best bid/ask
- Volatility-dependent liquidity withdrawal: Higher vol → lower depth
- Toxicity-aware spread widening: Informed flow detection → spreads widen
```python
# Spreads widen under high toxicity (informed trading)
spread_vol_sensitivity=0.15,       # Volatility impact
spread_toxicity_sensitivity=0.25,  # Informed flow impact
```

#### Book Shape Distributions
Real orderbooks don’t have uniform depth. The Aleatoric Engine models:
- Power-law depth decay (`depth_decay_rate=0.85`)
- Volume-weighted microprice calculation
- Realistic bid/ask imbalances
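One simple reading of `depth_decay_rate=0.85` is a multiplicative (geometric) size decay per price level. This sketch is illustrative, not the engine's actual book builder:

```python
def depth_profile(best_size, decay_rate=0.85, n_levels=10):
    """Geometric depth decay away from top of book: size_k = best_size * decay_rate**k."""
    return [best_size * decay_rate ** k for k in range(n_levels)]

bid_sizes = depth_profile(best_size=10.0)
# Sizes shrink multiplicatively away from the best bid: 10.0, 8.5, 7.225, ...
```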
#### Trade Size Heteroskedasticity
```python
trade_size_alpha=2.5,  # Pareto exponent (lower = fatter tail)
min_trade_size=0.1,
max_trade_size=50.0,
```

Result: Many small trades (~0.1-1.0 size), occasional whale trades (10-50 size).
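The heavy-tailed size distribution can be sketched as a clipped Pareto draw. The scheme below is an assumption consistent with the parameters above, not the engine's actual sampler:

```python
import random

rng = random.Random(42)

def pareto_trade_size(alpha=2.5, min_size=0.1, max_size=50.0):
    """Pareto draw with scale min_size and tail index alpha, capped at max_size."""
    return min(min_size * rng.paretovariate(alpha), max_size)

sizes = [pareto_trade_size() for _ in range(10_000)]
share_small = sum(s < 1.0 for s in sizes) / len(sizes)
# Mostly sub-1.0 prints, with a heavy right tail of occasional large trades
```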
### 3. Spot-Perp-Funding Triangular Modeling
#### Complete Crypto Derivatives Framework

Spot Price Process:
- Geometric Brownian Motion (GBM) with configurable drift/volatility
- Jump diffusion for tail events (liquidations, news shocks)
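A minimal jump-diffusion step can be written as GBM in log space plus an occasional Gaussian jump. The per-step jump probability and jump size below are illustrative assumptions:

```python
import math
import random

def gbm_jump_step(price, dt_years, mu, sigma, jump_prob, jump_sigma, rng):
    """One Euler step of GBM in log space, plus an occasional Gaussian jump."""
    log_ret = (mu - 0.5 * sigma ** 2) * dt_years \
              + sigma * math.sqrt(dt_years) * rng.gauss(0.0, 1.0)
    if rng.random() < jump_prob:             # rare tail event (liquidation, news)
        log_ret += rng.gauss(0.0, jump_sigma)
    return price * math.exp(log_ret)

rng = random.Random(0)
price = 100.0
for _ in range(1000):                        # 1000 one-minute steps
    price = gbm_jump_step(price, dt_years=1 / 525_600, mu=0.0, sigma=0.80,
                          jump_prob=0.001, jump_sigma=0.02, rng=rng)
```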
Funding Rate Dynamics:
- Ornstein-Uhlenbeck (OU) mean-reverting process
- Hard bounds to prevent unrealistic rates
- White noise component for micro-jitter
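The bounded OU dynamics can be sketched as a clamped Euler–Maruyama step; the step size, starting rate, and seed below are arbitrary choices for illustration:

```python
import math
import random

def ou_step(rate_bps, mean_bps, kappa, sigma, dt_hours, bounds, rng):
    """One Euler-Maruyama step of an OU process, clamped to hard bounds."""
    drift = kappa * (mean_bps - rate_bps) * dt_hours
    shock = sigma * math.sqrt(dt_hours) * rng.gauss(0.0, 1.0)
    lo, hi = bounds
    return min(max(rate_bps + drift + shock, lo), hi)

rng = random.Random(3)
rate, path = 6.0, []                 # start well away from the mean
for _ in range(500):
    rate = ou_step(rate, mean_bps=0.0, kappa=1.5, sigma=2.0,
                   dt_hours=0.1, bounds=(-8.0, 8.0), rng=rng)
    path.append(rate)
# The rate mean-reverts toward 0 and never leaves [-8, 8]
```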
Perpetual Pricing:
```
Perp Price = Spot × (1 + Funding Rate + Basis Deviation)
```

Example configuration:
```python
from aleatoric.gen.feed import SyntheticTelemetryUplink

feed = SyntheticTelemetryUplink(
    clock=clock,
    gbm_drift_annual=0.0,
    gbm_vol_annual=0.80,
    funding_mean_bps_hr=0.0,
    funding_kappa=1.5,      # Mean reversion speed
    funding_sigma=2.0,      # Volatility in bps/√hour
    funding_bounds_bps_hr=(-8.0, 8.0),
)
```

#### Funding Settlement
Section titled “Funding Settlement”- Configurable settlement intervals (1h, 8h, etc.)
- Price convergence simulation near settlement
- TWAP funding rate calculation
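With equally spaced rate samples, TWAP settlement reduces to a plain mean of the sampled rates. The hourly samples below are hypothetical:

```python
def twap_funding(rate_samples_bps):
    """Time-weighted average funding over a settlement window (equal spacing assumed)."""
    return sum(rate_samples_bps) / len(rate_samples_bps)

# Hypothetical hourly samples over an 8h settlement window
settled = twap_funding([1.0, 1.5, 0.5, -0.5, 0.0, 2.0, 1.0, 0.5])
# → 0.75 bps/hr applied at settlement
```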
### 4. Built-in Multi-Exchange Normalizer
#### Problem: The N+1 Integration Nightmare
Building a trading system that works across Binance, HyperLiquid, OKX, and Bybit requires:
- N different WebSocket parsers
- N different data models
- N different edge cases
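The canonical-schema idea can be sketched with a small dataclass. The field names and mapping below are illustrative assumptions, not the library's actual `NormalizedBookEvent` definition:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalBook:
    """Illustrative canonical event; field names are assumptions, not the library's schema."""
    exchange: str
    symbol: str
    timestamp_ms: int       # exchange event time
    capture_time_ms: int    # local receive time
    bids: tuple             # ((price, size), ...), best first
    asks: tuple

def normalize_hyperliquid(raw, symbol, capture_time_ms):
    """Map HyperLiquid-style {'px', 'sz'} levels into canonical (price, size) tuples."""
    def levels(side):
        return tuple((float(l["px"]), float(l["sz"])) for l in side)
    bids_raw, asks_raw = raw["levels"]
    return CanonicalBook("hyperliquid", symbol, raw["time"], capture_time_ms,
                         levels(bids_raw), levels(asks_raw))

raw = {"time": 1700000000000,
       "levels": [[{"px": "100.50", "sz": "10.5000", "n": 3}],
                  [{"px": "100.55", "sz": "4.0000", "n": 1}]]}
book = normalize_hyperliquid(raw, "SOL", capture_time_ms=1700000000042)
```

One canonical model means downstream strategy code never touches exchange-specific field names.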
#### Solution: Emit Exchange-Specific + Normalize
Step 1: Generate exchange-accurate raw data
```python
from aleatoric.gen.hyperliquid_format import stream_hyperliquid_format

for channel, data in stream_hyperliquid_format(market, duration_seconds=60):
    if channel == 'l2Book':
        # Exact HyperLiquid WsBook format
        bids, asks = data['levels']
        print(bids[0])  # {'px': '100.50', 'sz': '10.5000', 'n': 3}
```

Step 2: Normalize to canonical schema
```python
from aleatoric.process.normalizer import CanonicalizationEngine

normalizer = CanonicalizationEngine(enable_cache=True)
norm_event = normalizer.normalize_synthetic(event_type, event)

# Result: NormalizedBookEvent with standardized structure
# Works identically for HyperLiquid, Binance, synthetic data
```

Step 3: Cache for reusability
```python
df = normalizer.normalize_and_cache(
    source="synthetic",
    symbol="SOL",
    start_date="2025-08-01",
    end_date="2025-08-07",
    seed=42,
)
# Cached to ~/.hft_cache/ as LZ4-compressed Parquet
```

## Use Cases
### For AI/ML Training
```python
# Generate 1 year of high-fidelity data in minutes
config = SimulationManifest(
    symbol="BTC",
    volatility_annual=0.8,
    burst_probability=0.10,  # More frequent bursts for stress testing
    seed=42,                 # Reproducible
)

market = HyperSynthReactor.from_config(config)
events = market.stream(duration_seconds=365*24*3600)

# Train your model on realistic microstructure
```

### For Strategy Backtesting
```python
# Test your orderbook-based strategy
for event_type, event in market.stream(duration_seconds=3600):
    if event_type == 'book':
        # Your strategy logic
        imbalance = calculate_imbalance(event.bids, event.asks)
        if imbalance > threshold:
            send_order()
```

### For Infrastructure Testing
```python
# Stress test with burst mode
config = SimulationManifest(
    burst_probability=0.20,       # 20% chance
    burst_intensity_factor=50.0,  # 50x message rate
    staleness_ms=100.0,           # 100ms mean lag
)

# Validate your WebSocket client handles:
# - Message bursts
# - Out-of-order timestamps
# - Stale data detection
```

### For Data Vendor Development
```python
# Generate historical-like datasets
normalizer = CanonicalizationEngine(enable_cache=True)

for symbol in ["BTC", "ETH", "SOL"]:
    df = normalizer.normalize_and_cache(
        source="synthetic",
        symbol=symbol,
        start_date="2024-01-01",
        end_date="2024-12-31",
        force_regenerate=True,
    )
    # Sell to customers as "backtesting dataset"
```

## 📊 Validation & Quality
### Statistical Properties
Section titled “Statistical Properties”The Aleatoric Engine matches real market signatures:
- Volatility clustering: GARCH-like behavior via realized vol feedback
- Heavy tails: Jump diffusion + power-law trade sizes
- Autocorrelation: Impulse responses create realistic serial correlation
- Volume clustering: Temporal trade intensity bursts
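These signatures are easy to check empirically: raw returns show near-zero lag-1 autocorrelation while absolute returns do not. The toy GARCH(1,1) recursion below is a stand-in for the engine's realized-vol feedback, not its actual mechanism:

```python
import math
import random

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

rng = random.Random(0)
sig2, rets = 1.0, []
for _ in range(20_000):
    r = math.sqrt(sig2) * rng.gauss(0.0, 1.0)
    rets.append(r)
    sig2 = 0.05 + 0.10 * r * r + 0.85 * sig2   # GARCH(1,1)-style vol feedback

raw_ac = autocorr(rets, 1)                      # near 0: returns are unpredictable
abs_ac = autocorr([abs(x) for x in rets], 1)    # positive: volatility clusters
```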
### Validation Tools
```python
from aleatoric.gen.feed_eda import validate_bid_ask_consistency

df = build_eda_dataframe(steps=5000)
results = validate_bid_ask_consistency(df)

# Automated checks:
# ✓ Bid/ask ordering (bids descending, asks ascending)
# ✓ Spread consistency
# ✓ Quantity distributions
# ✓ Funding skew correlation
```

## Performance
### Caching System
Section titled “Caching System”- LZ4 compression: 10-20x reduction, 500+ MB/s decompression
- Parquet columnar storage: Efficient time-series queries
- Metadata tracking: Full reproducibility
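Reproducible caching typically hashes the generation parameters into a deterministic key, so identical requests hit the cache and any change forces regeneration. The scheme below is a sketch, not the library's actual key format:

```python
import hashlib
import json

def cache_key(**params):
    """Deterministic 16-hex-char key from generation parameters (illustrative scheme)."""
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

key = cache_key(source="synthetic", symbol="SOL",
                start_date="2025-08-01", end_date="2025-08-07", seed=42)
# Identical parameters reproduce the key (cache hit); any change forces regeneration
```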
Example:
```
✅ Cache hit: a3f2b9c8d1e4f5a6
   Loaded 1,000,000 events (800,000 books, 200,000 trades)
   Compression: 15.2x, Size: 12.4 MB
```

### Generation Speed
Section titled “Generation Speed”- Book updates: ~100,000/sec (single-threaded)
- Trade generation: ~50,000/sec
- Full day (86,400s) of 100ms books: Generated in ~10 seconds