
Core Specifications

The Aleatoric Engine uses a Dual-Driver Architecture so that generated data is the same regardless of consumption mode. Datasets used for training ML models (Batch) are therefore identical, event for event, to the data streams used for live testing (Stream).

Both drivers are located in src/aleatoric/drivers.py.

Batch driver: high-throughput batch generation with optional multiprocessing.

from typing import Optional, Tuple

def run_batch(
    config: SimulationManifest,
    duration_seconds: float,
    chunk_size: int = 1000,
    multiprocess: bool = False,
    workers: Optional[int] = None,
    window_seconds: Optional[float] = None,
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
) -> Tuple[str, int]:
    """Returns: (file_path, row_count)"""

Stream driver: real-time streaming with optional wall-clock timing.

from typing import AsyncGenerator, Optional, Tuple

async def run_stream(
    config: SimulationManifest,
    duration_seconds: Optional[float] = None,
    real_time: bool = True,
) -> AsyncGenerator[Tuple[str, dict], None]:
    """Yields: (event_type, event_data) tuples"""

Batch Mode

  • Purpose: High-throughput generation for historical analysis and ML training
  • Mechanism: Iterates through the market generator as fast as the CPU allows and writes chunks to Parquet
  • Performance: Millions of events per second (CPU bound)
  • Output: File path to the generated Parquet file and the row count
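
The chunked write loop described for batch generation can be sketched roughly as follows. This is only an illustration under stated assumptions: `write_in_chunks`, `fake_events`, and the `flush` callback are hypothetical names, and a plain list stands in for the Parquet writer so the example stays self-contained.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Event = Tuple[str, dict]


def write_in_chunks(
    events: Iterable[Event],
    flush: Callable[[List[Event]], None],
    chunk_size: int = 1000,
) -> int:
    """Buffer events and pass them to `flush` every `chunk_size` rows.

    Returns the total number of rows flushed. `flush` is a stand-in for
    whatever appends a chunk to the output file (hypothetical).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer: List[Event] = []
    row_count = 0
    for event in events:
        buffer.append(event)
        if len(buffer) >= chunk_size:
            flush(buffer)
            row_count += len(buffer)
            buffer = []  # start a fresh chunk; the flushed list is left untouched
    if buffer:  # flush the final partial chunk
        flush(buffer)
        row_count += len(buffer)
    return row_count


def fake_events(n: int) -> Iterator[Event]:
    """Hypothetical event source used only for this demonstration."""
    for i in range(n):
        yield ("trade", {"seq": i})


chunks: List[List[Event]] = []
total = write_in_chunks(fake_events(2500), chunks.append, chunk_size=1000)
```

Keeping at most one chunk in memory is what lets the loop run for arbitrarily long durations; the chunk size trades memory for fewer, larger writes.
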

Multiprocess Batch Mode

  • Purpose: Scale batch generation across multiple CPU cores
  • Mechanism: Divides the duration into time windows and spawns ProcessPool workers
  • Configuration:
    • multiprocess=True: Enable multiprocessing
    • workers: Number of parallel workers (default: min(4, CPU count))
    • window_seconds: Duration per window (auto-calculated if not specified)
  • Determinism: Preserved via per-window seed derivation from the base seed
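
One way per-window seed derivation can keep output independent of the worker count is sketched below. The hashing scheme, the function names, and the event shape are all assumptions made for illustration, not the engine's actual implementation, and a thread pool stands in for the process pool to keep the example self-contained.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def derive_window_seed(base_seed: int, window_index: int) -> int:
    """Derive a stable 64-bit seed from the base seed and the window index."""
    digest = hashlib.sha256(f"{base_seed}:{window_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_window(base_seed: int, window_index: int, n_events: int) -> List[Tuple[int, float]]:
    """Generate one window with its own RNG, so no state is shared across windows."""
    rng = random.Random(derive_window_seed(base_seed, window_index))
    return [(window_index, rng.random()) for _ in range(n_events)]


def run_windows(base_seed: int, n_windows: int, n_events: int, workers: int) -> List[Tuple[int, float]]:
    """Generate all windows in parallel and concatenate them in window order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, regardless of which
        # worker finishes first, so the concatenation is deterministic.
        windows = pool.map(lambda i: generate_window(base_seed, i, n_events), range(n_windows))
        return [event for window in windows for event in window]


sequential = run_windows(base_seed=42, n_windows=8, n_events=5, workers=1)
parallel = run_windows(base_seed=42, n_windows=8, n_events=5, workers=4)
```

Because each window's random stream depends only on the base seed and the window index, the result does not change with scheduling or the number of workers, as long as the window boundaries themselves are the same.
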

Stream Mode

  • Purpose: Real-time simulation for bot testing, UI development, and system integration
  • Mechanism: Injects asyncio.sleep() to match wall-clock time
  • Behavior: “Plays back” the simulation in real time
  • Output: AsyncGenerator yielding event tuples
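
The wall-clock pacing described above might look something like the following sketch. It assumes each event carries a simulated timestamp in seconds under a `ts` key; that field, along with `paced_stream` and `collect`, is hypothetical and chosen only for this example.

```python
import asyncio
import time
from typing import AsyncGenerator, Iterable, List, Tuple

Event = Tuple[str, dict]


async def paced_stream(
    events: Iterable[Event], real_time: bool = True
) -> AsyncGenerator[Event, None]:
    """Yield events, sleeping so each is released at its simulated timestamp."""
    start = time.monotonic()
    for event_type, data in events:
        if real_time:
            # Sleep only for the time remaining until the event is due, so
            # processing time spent by the consumer does not accumulate as drift.
            delay = data["ts"] - (time.monotonic() - start)
            if delay > 0:
                await asyncio.sleep(delay)
        yield event_type, data


async def collect(real_time: bool) -> List[Event]:
    events = [
        ("trade", {"ts": 0.00, "px": 100.0}),
        ("quote", {"ts": 0.01, "bid": 99.9}),
        ("trade", {"ts": 0.02, "px": 100.1}),
    ]
    return [event async for event in paced_stream(events, real_time=real_time)]


fast = asyncio.run(collect(real_time=False))
paced = asyncio.run(collect(real_time=True))
```

Pacing changes only when events are delivered, not what they contain, which is why a stream with `real_time=False` can be compared directly against a paced one.
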

| Variable | Default | Description |
| --- | --- | --- |
| ALEATORIC_DRIVER_ENABLE_MULTIPROCESS | false | Enable multiprocessing by default |
| ALEATORIC_DRIVER_MAX_WORKERS | auto | Maximum worker processes |
| ALEATORIC_DRIVER_WINDOW_SECONDS | auto | Window duration for multiprocess |
| ALEATORIC_DRIVER_MAX_RETRIES | 3 | Retries for failed windows |
| ALEATORIC_DRIVER_BACKOFF_SECONDS | 0.5 | Backoff between retries |
| ALEATORIC_BATCH_CHUNK_SIZE | 1000 | Events per chunk before flush |
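
As a rough illustration of how these variables could be resolved into typed settings, consider the sketch below. `DriverSettings` and `load_driver_settings` are hypothetical names, the set of accepted truthy strings is an assumption, and `auto` for the worker count is resolved with the min(4, CPU count) default stated for `workers`.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class DriverSettings:
    multiprocess: bool
    max_workers: int
    window_seconds: Optional[float]  # None means "auto-calculate"
    max_retries: int
    backoff_seconds: float
    chunk_size: int


def load_driver_settings(env: Mapping[str, str] = os.environ) -> DriverSettings:
    """Read driver settings from `env`, falling back to the documented defaults."""

    def auto_or(name: str, cast: Callable[[str], Any], auto_value: Any) -> Any:
        raw = env.get(name, "auto").strip()
        return auto_value if raw.lower() == "auto" else cast(raw)

    flag = env.get("ALEATORIC_DRIVER_ENABLE_MULTIPROCESS", "false").strip().lower()
    return DriverSettings(
        multiprocess=flag in {"1", "true", "yes"},  # accepted spellings are an assumption
        max_workers=auto_or("ALEATORIC_DRIVER_MAX_WORKERS", int, min(4, os.cpu_count() or 1)),
        window_seconds=auto_or("ALEATORIC_DRIVER_WINDOW_SECONDS", float, None),
        max_retries=int(env.get("ALEATORIC_DRIVER_MAX_RETRIES", "3")),
        backoff_seconds=float(env.get("ALEATORIC_DRIVER_BACKOFF_SECONDS", "0.5")),
        chunk_size=int(env.get("ALEATORIC_BATCH_CHUNK_SIZE", "1000")),
    )


defaults = load_driver_settings({})
custom = load_driver_settings(
    {"ALEATORIC_DRIVER_ENABLE_MULTIPROCESS": "true", "ALEATORIC_DRIVER_MAX_WORKERS": "8"}
)
```

Passing the environment as a mapping keeps the parser easy to exercise without touching the real process environment.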

Both batch and stream modes guarantee bit-for-bit reproducibility given the same seed:

# These produce identical event sequences:
config = SimulationManifest(seed=42)
run_batch(config, duration_seconds=100)
run_stream(config, duration_seconds=100, real_time=False)  # async generator: consume with `async for`

# Multiprocess also preserves determinism:
run_batch(config, duration_seconds=100, multiprocess=False)
run_batch(config, duration_seconds=100, multiprocess=True, workers=4)
# ^ Both produce identical output
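
A reproducibility claim like this is easiest to verify by comparing a fingerprint of each run's event sequence. The helper below is a minimal sketch, assuming events are `(event_type, event_data)` tuples with JSON-serializable payloads; `sequence_fingerprint` is a hypothetical name, and the check compares logical event content rather than the bytes of the Parquet files.

```python
import hashlib
import json
from typing import Iterable, Tuple


def sequence_fingerprint(events: Iterable[Tuple[str, dict]]) -> str:
    """Hash an event sequence in order, using a canonical encoding per event."""
    hasher = hashlib.sha256()
    for event_type, data in events:
        # sort_keys and fixed separators make the encoding independent of
        # dict insertion order and whitespace choices.
        line = json.dumps([event_type, data], sort_keys=True, separators=(",", ":"))
        hasher.update(line.encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()


run_a = [("trade", {"px": 100.5, "qty": 2}), ("quote", {"bid": 100.4, "ask": 100.6})]
run_b = [("trade", {"qty": 2, "px": 100.5}), ("quote", {"ask": 100.6, "bid": 100.4})]
run_c = [("trade", {"px": 100.5, "qty": 3}), ("quote", {"bid": 100.4, "ask": 100.6})]

same = sequence_fingerprint(run_a) == sequence_fingerprint(run_b)
different = sequence_fingerprint(run_a) != sequence_fingerprint(run_c)
```

Since the hash is updated event by event, the same helper works on a list read back from a batch file or on events collected from a stream, without holding both sequences in memory.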

The engine maintains a full double-sided order book.

| Feature | Specification |
| --- | --- |
| Depth | Unlimited (configurable) |
| Matching Engine | FIFO (First-In-First-Out) |
| Order Types | Limit, Market, IOC, FOK, Post-Only |
| Precision | 18 decimal places (floating point safe) |
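
To make FIFO matching concrete, here is a minimal sketch of a two-sided book that matches limit orders by best price and then by arrival order. `Order` and `FifoBook` are hypothetical names, only limit orders are handled, and `Decimal` is used as one plausible way to keep 18-decimal prices exact; none of this is taken from the engine's source.

```python
from collections import deque
from dataclasses import dataclass
from decimal import Decimal
from typing import Deque, Dict, List, Tuple

Fill = Tuple[str, str, Decimal, Decimal]  # (taker_id, maker_id, price, qty)


@dataclass
class Order:
    order_id: str
    side: str  # "buy" or "sell"
    price: Decimal
    qty: Decimal


class FifoBook:
    def __init__(self) -> None:
        # price level -> queue of resting orders, oldest first
        self.bids: Dict[Decimal, Deque[Order]] = {}
        self.asks: Dict[Decimal, Deque[Order]] = {}

    def submit(self, order: Order) -> List[Fill]:
        """Match a limit order against the opposite side, then rest any remainder."""
        is_buy = order.side == "buy"
        own, opposite = (self.bids, self.asks) if is_buy else (self.asks, self.bids)
        fills: List[Fill] = []
        while order.qty > 0 and opposite:
            # Linear scan for the best level: fine for a sketch, not for production.
            best = min(opposite) if is_buy else max(opposite)
            crosses = order.price >= best if is_buy else order.price <= best
            if not crosses:
                break
            queue = opposite[best]
            maker = queue[0]  # oldest order at the best price trades first
            traded = min(order.qty, maker.qty)
            fills.append((order.order_id, maker.order_id, best, traded))
            order.qty -= traded
            maker.qty -= traded
            if maker.qty == 0:
                queue.popleft()
            if not queue:
                del opposite[best]
        if order.qty > 0:
            own.setdefault(order.price, deque()).append(order)
        return fills


price = Decimal("100.000000000000000001")  # 18 decimal places, held exactly
book = FifoBook()
book.submit(Order("a1", "sell", price, Decimal("1")))
book.submit(Order("a2", "sell", price, Decimal("1")))
fills = book.submit(Order("b1", "buy", price, Decimal("1.5")))
```

In this run the buy order fills completely against `a1` first and then partially against `a2`, because `a1` arrived earlier at the same price; the unfilled half unit of `a2` stays at the front of its queue.
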

Artifact storage and provenance:

  • Local: Parquet artifacts written to artifact_storage_dir
  • Object Storage: S3-compatible backend via ALEATORIC_ARTIFACT_* env vars
  • Provenance: Cache manifests include seed, preset, and manifest hashes for deterministic replay