Features

Unlike generic synthetic data generators that emit perfect metronome ticks, The Aleatoric Engine replicates real exchange behavior:

market = HyperSynthReactor(
    symbol="SOL",
    book_update_interval_ms=100,  # Match Binance 100ms snapshot rate
    trade_intensity_base=2.0,     # ~2 trades/sec baseline
)

Simulate exchange message avalanches during high volatility:

config = SimulationManifest(
    burst_probability=0.05,       # 5% chance of entering burst
    burst_intensity_factor=10.0,  # 10x message rate during burst
)

Real-world scenario: During a liquidation cascade, message rates spike 10-100x. The Aleatoric Engine captures this.
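Combined with the ~2 trades/sec baseline from the first example, a burst_intensity_factor of 10.0 implies on the order of 20 trade messages/sec while a burst is active.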

Network jitter and exchange processing delays:

config = SimulationManifest(
    staleness_ms=50.0,  # Mean 50ms lag with log-normal distribution
)

Includes:

  • Log-normal jitter distribution (realistic network behavior)
  • Occasional massive lag spikes (0.5% probability of 100-1000ms delays)
  • Separate timestamp_ms (exchange time) and capture_time_ms (receive time)
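Because both timestamps are emitted, feed latency can be measured directly on the consumer side. A minimal sketch, assuming market is the HyperSynthReactor instance created earlier and that streamed events expose the two fields above:

# Sketch: measure per-event feed lag from the two timestamps described above.
lags_ms = []
for event_type, event in market.stream(duration_seconds=60):
    lag = event.capture_time_ms - event.timestamp_ms
    lags_ms.append(lag)
    if lag > 500:
        print(f"Lag spike: {lag:.1f} ms on {event_type} event")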

Liquidity providers react to aggressive flow:

# After a large sell hits the bid, bid-side depth depletes
# and slowly replenishes over ~5-10 seconds
market = HyperSynthReactor(
    adverse_selection_strength=0.15,  # 15% depth depletion
    impact_decay_halflife_ms=5000.0,  # 5s recovery
)

  • Exponential depth profile: Size decreases exponentially away from best bid/ask
  • Volatility-dependent liquidity withdrawal: Higher vol → lower depth
  • Toxicity-aware spread widening: Informed flow detection → spreads widen

# Spreads widen under high toxicity (informed trading)
spread_vol_sensitivity=0.15,       # Volatility impact
spread_toxicity_sensitivity=0.25,  # Informed flow impact
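The depletion-and-recovery behaviour above can be pictured as an exponentially decaying impact. A purely illustrative sketch (not the engine's internal code), using the adverse_selection_strength and impact_decay_halflife_ms values from the snippet above:

# Illustrative: depth multiplier on the hit side, t milliseconds after a large aggressive fill.
def depth_multiplier(t_ms, strength=0.15, halflife_ms=5000.0):
    remaining_impact = strength * 0.5 ** (t_ms / halflife_ms)
    return 1.0 - remaining_impact

print(depth_multiplier(0))        # 0.85   -> 15% of depth gone right after the fill
print(depth_multiplier(5000))     # 0.925  -> half the impact has decayed after one half-life
print(depth_multiplier(20000))    # ~0.99  -> essentially recovered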

Real orderbooks don’t have uniform depth. The Aleatoric Engine models:

  • Power-law depth decay (depth_decay_rate=0.85)
  • Volume-weighted microprice calculation
  • Realistic bid/ask imbalances

Trade sizes follow a Pareto (power-law) distribution:

trade_size_alpha=2.5,  # Pareto exponent (lower = fatter tail)
min_trade_size=0.1,
max_trade_size=50.0,

Result: Many small trades (~0.1-1.0 size), occasional whale trades (10-50 size).
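As a rough illustration of how such a heavy-tailed distribution behaves, the sketch below samples Pareto-distributed sizes clipped to the configured bounds (an approximation; the engine's exact sampling scheme may differ):

import numpy as np

rng = np.random.default_rng(42)
alpha, size_min, size_max = 2.5, 0.1, 50.0

# Pareto tail starting at min_trade_size, clipped at max_trade_size
sizes = size_min * (1.0 + rng.pareto(alpha, size=100_000))
sizes = np.clip(sizes, size_min, size_max)

print(f"median size: {np.median(sizes):.2f}")           # most trades sit near min_trade_size
print(f"99.9th pct:  {np.quantile(sizes, 0.999):.2f}")  # the tail stretches far beyond the median
print(f"max size:    {sizes.max():.2f}")                # occasional much larger prints, capped at max_trade_size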


Spot Price Process:

  • Geometric Brownian Motion (GBM) with configurable drift/volatility
  • Jump diffusion for tail events (liquidations, news shocks)

Funding Rate Dynamics:

  • Ornstein-Uhlenbeck (OU) mean-reverting process
  • Hard bounds to prevent unrealistic rates
  • White noise component for micro-jitter
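Both processes can be sketched in a few lines. The following is a simplified, self-contained illustration of one simulation step (not the engine's actual implementation); jump_prob and jump_scale are assumptions added for the example, while the drift, volatility, kappa, sigma, and bounds mirror the configuration shown below:

import numpy as np

rng = np.random.default_rng(0)

def gbm_jump_step(price, dt_years, mu=0.0, sigma=0.80, jump_prob=0.001, jump_scale=0.02):
    # GBM increment with drift mu and annualised volatility sigma ...
    ret = (mu - 0.5 * sigma ** 2) * dt_years + sigma * np.sqrt(dt_years) * rng.normal()
    # ... plus an occasional jump for tail events (liquidations, news shocks)
    if rng.random() < jump_prob:
        ret += rng.normal(0.0, jump_scale)
    return price * np.exp(ret)

def ou_funding_step(rate_bps_hr, dt_hours, mean=0.0, kappa=1.5, sigma=2.0, bounds=(-8.0, 8.0)):
    # Ornstein-Uhlenbeck: pull toward the mean at speed kappa, plus white noise
    rate = (rate_bps_hr
            + kappa * (mean - rate_bps_hr) * dt_hours
            + sigma * np.sqrt(dt_hours) * rng.normal())
    # Hard bounds prevent unrealistic funding rates
    return float(np.clip(rate, *bounds))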

Perpetual Pricing:

Perp Price = Spot × (1 + Funding Rate + Basis Deviation)
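Taking the formula at face value: with spot at 100.00, a funding term of 0.0001 (1 bp) and a basis deviation of 0.0005 (5 bp), the perp prices at 100.00 × 1.0006 = 100.06.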

Example configuration:

from aleatoric.gen.feed import SyntheticTelemetryUplink

feed = SyntheticTelemetryUplink(
    clock=clock,
    gbm_drift_annual=0.0,
    gbm_vol_annual=0.80,
    funding_mean_bps_hr=0.0,
    funding_kappa=1.5,   # Mean reversion speed
    funding_sigma=2.0,   # Volatility in bps/√hour
    funding_bounds_bps_hr=(-8.0, 8.0),
)

  • Configurable settlement intervals (1h, 8h, etc.)
  • Price convergence simulation near settlement
  • TWAP funding rate calculation
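As a small illustration of the TWAP idea (assuming funding is sampled at a fixed cadence within the settlement interval, so the TWAP reduces to a simple mean; the sample values are hypothetical):

# Illustrative TWAP over funding-rate samples observed within one settlement interval.
def twap_funding(samples_bps_hr):
    return sum(samples_bps_hr) / len(samples_bps_hr)

hourly_samples = [0.8, 1.2, -0.3, 0.5]   # hypothetical bps/hr readings
print(twap_funding(hourly_samples))      # 0.55 bps/hr settled at interval end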

Building a trading system that works across Binance, HyperLiquid, OKX, and Bybit requires:

  • N different WebSocket parsers
  • N different data models
  • N different edge cases

Solution: Emit Exchange-Specific + Normalize

Step 1: Generate exchange-accurate raw data

from aleatoric.gen.hyperliquid_format import stream_hyperliquid_format

for channel, data in stream_hyperliquid_format(market, duration_seconds=60):
    if channel == 'l2Book':
        # Exact HyperLiquid WsBook format
        bids, asks = data['levels']
        print(bids[0])  # {'px': '100.50', 'sz': '10.5000', 'n': 3}
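Since the WsBook levels encode price and size as strings, consumers typically convert them before doing any math. A small example using the level shape printed above:

best_bid = {'px': '100.50', 'sz': '10.5000', 'n': 3}   # as printed above
bid_px = float(best_bid['px'])     # 100.5
bid_sz = float(best_bid['sz'])     # 10.5
num_orders = best_bid['n']         # 3 resting orders at this level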

Step 2: Normalize to canonical schema

from aleatoric.process.normalizer import CanonicalizationEngine
normalizer = CanonicalizationEngine(enable_cache=True)
norm_event = normalizer.normalize_synthetic(event_type, event)
# Result: NormalizedBookEvent with standardized structure
# Works identically for HyperLiquid, Binance, synthetic data

Step 3: Cache for reusability

df = normalizer.normalize_and_cache(
    source="synthetic",
    symbol="SOL",
    start_date="2025-08-01",
    end_date="2025-08-07",
    seed=42,
)
# Cached to ~/.hft_cache/ as LZ4-compressed Parquet
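A deterministic key over the request parameters (including the seed) is what makes cache hits like the one shown in the example output further down possible. A purely illustrative derivation, not the engine's actual scheme:

import hashlib

# Illustrative only: a stable digest over the request parameters gives a short cache key,
# so re-running the identical request resolves to the same cached file.
def cache_key(source, symbol, start_date, end_date, seed):
    payload = f"{source}|{symbol}|{start_date}|{end_date}|{seed}".encode()
    return hashlib.sha256(payload).hexdigest()[:16]

print(cache_key("synthetic", "SOL", "2025-08-01", "2025-08-07", 42))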

# Generate 1 year of high-fidelity data in minutes
config = SimulationManifest(
    symbol="BTC",
    volatility_annual=0.8,
    burst_probability=0.10,  # More frequent bursts for stress testing
    seed=42,                 # Reproducible
)
market = HyperSynthReactor.from_config(config)
events = market.stream(duration_seconds=365*24*3600)
# Train your model on realistic microstructure
# Test your orderbook-based strategy
for event_type, event in market.stream(duration_seconds=3600):
    if event_type == 'book':
        # Your strategy logic
        imbalance = calculate_imbalance(event.bids, event.asks)
        if imbalance > threshold:
            send_order()
# Stress test with burst mode
config = SimulationManifest(
    burst_probability=0.20,       # 20% chance
    burst_intensity_factor=50.0,  # 50x message rate
    staleness_ms=100.0,           # 100ms mean lag
)
# Validate your WebSocket client handles:
# - Message bursts
# - Out-of-order timestamps
# - Stale data detection
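A minimal sketch of the client-side checks this exercises (assuming events expose the timestamp_ms / capture_time_ms fields described earlier; handle_stale and handle_out_of_order are hypothetical placeholders for your own handlers):

# Sketch: flag stale and out-of-order messages while consuming the stressed stream.
STALE_THRESHOLD_MS = 250
last_exchange_ts = 0

for event_type, event in HyperSynthReactor.from_config(config).stream(duration_seconds=600):
    if event.capture_time_ms - event.timestamp_ms > STALE_THRESHOLD_MS:
        handle_stale(event)             # hypothetical: your stale-data path
    if event.timestamp_ms < last_exchange_ts:
        handle_out_of_order(event)      # hypothetical: your reordering/drop path
    last_exchange_ts = max(last_exchange_ts, event.timestamp_ms)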
# Generate historical-like datasets
normalizer = CanonicalizationEngine(enable_cache=True)
for symbol in ["BTC", "ETH", "SOL"]:
    df = normalizer.normalize_and_cache(
        source="synthetic",
        symbol=symbol,
        start_date="2024-01-01",
        end_date="2024-12-31",
        force_regenerate=True,
    )
# Sell to customers as "backtesting dataset"

The Aleatoric Engine matches real market signatures:

  • Volatility clustering: GARCH-like behavior via realized vol feedback
  • Heavy tails: Jump diffusion + power-law trade sizes
  • Autocorrelation: Impulse responses create realistic serial correlation
  • Volume clustering: Temporal trade intensity bursts
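These signatures are easy to spot-check on generated output. A rough sketch, assuming you have already extracted a 1-D mid-price series from the book events (the extraction itself is omitted):

import numpy as np

def stylized_facts(mid_prices):
    # Log returns of the generated mid-price series
    rets = np.diff(np.log(np.asarray(mid_prices)))
    # Excess kurtosis > 0 indicates heavy tails
    excess_kurtosis = ((rets - rets.mean()) ** 4).mean() / rets.var() ** 2 - 3.0
    # Positive lag-1 autocorrelation of squared returns indicates volatility clustering
    sq = rets ** 2 - (rets ** 2).mean()
    acf1 = (sq[:-1] * sq[1:]).mean() / sq.var()
    return excess_kurtosis, acf1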
Automated consistency checks are also built in:

from aleatoric.gen.feed_eda import build_eda_dataframe, validate_bid_ask_consistency

df = build_eda_dataframe(steps=5000)
results = validate_bid_ask_consistency(df)
# Automated checks:
# ✓ Bid/ask ordering (bids descending, asks ascending)
# ✓ Spread consistency
# ✓ Quantity distributions
# ✓ Funding skew correlation

  • LZ4 compression: 10-20x reduction, 500+ MB/s decompression
  • Parquet columnar storage: Efficient time-series queries
  • Metadata tracking: Full reproducibility
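Because the cache files are plain Parquet, they can also be read back directly. A small sketch assuming pandas with pyarrow installed; the file name under ~/.hft_cache/ is managed by the engine, so the path below is a placeholder:

from pathlib import Path
import pandas as pd

# Placeholder file name: the actual layout under ~/.hft_cache/ is managed by the engine.
cache_file = Path("~/.hft_cache").expanduser() / "<cache-key>.parquet"
df = pd.read_parquet(cache_file)   # pyarrow reads the LZ4-compressed columns transparently
print(f"{len(df):,} events loaded from cache")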

Example:

✅ Cache hit: a3f2b9c8d1e4f5a6
Loaded 1,000,000 events (800,000 books, 200,000 trades)
Compression: 15.2x, Size: 12.4 MB

  • Book updates: ~100,000/sec (single-threaded)
  • Trade generation: ~50,000/sec
  • Full day (86,400s) of 100ms books: Generated in ~10 seconds
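As a sanity check on those figures: a full day of 100 ms books is 864,000 updates, which at ~100,000 updates/sec works out to roughly 8-9 seconds of generation time, consistent with the ~10 seconds quoted above.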