Core Specifications
Dual-Driver Architecture
The Aleatoric Engine uses a Dual-Driver Architecture to ensure that data generation is identical regardless of consumption mode. Datasets used to train ML models (Batch) are guaranteed to be identical, event for event, to the data streams used for live testing (Stream).
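The idea can be illustrated with a minimal sketch in which both drivers consume one seeded event source. Everything here (`generate_events`, `batch_driver`, `stream_driver`, the `"trade"` event type) is a hypothetical stand-in and not the engine's actual code; it only assumes that identical output follows from sharing a single deterministic generator.

```python
import asyncio
import random
from typing import AsyncIterator, Iterator, List, Tuple

Event = Tuple[str, dict]


def generate_events(seed: int, n_events: int) -> Iterator[Event]:
    """Hypothetical stand-in for the market generator: a seeded event source."""
    rng = random.Random(seed)
    price = 100.0
    for seq in range(n_events):
        price += rng.uniform(-0.5, 0.5)
        yield ("trade", {"seq": seq, "price": round(price, 4)})


def batch_driver(seed: int, n_events: int) -> List[Event]:
    """Batch consumption: drain the generator as fast as possible."""
    return list(generate_events(seed, n_events))


async def stream_driver(
    seed: int, n_events: int, real_time: bool = False, interval: float = 0.01
) -> AsyncIterator[Event]:
    """Stream consumption: the same generator, optionally paced with asyncio.sleep()."""
    for event in generate_events(seed, n_events):
        if real_time:
            await asyncio.sleep(interval)
        yield event


async def collect_stream(seed: int, n_events: int) -> List[Event]:
    return [event async for event in stream_driver(seed, n_events)]


batch_events = batch_driver(seed=42, n_events=1000)
stream_events = asyncio.run(collect_stream(seed=42, n_events=1000))
assert batch_events == stream_events  # same seed, same events, either driver
```

Because the drivers differ only in how they pace and deliver events, any divergence between the two outputs would point to a bug in a driver rather than in the generator.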
Driver Functions
Located in `src/aleatoric/drivers.py`:
run_batch()
High-throughput batch generation with optional multiprocessing.

```python
def run_batch(
    config: SimulationManifest,
    duration_seconds: float,
    chunk_size: int = 1000,
    multiprocess: bool = False,
    workers: Optional[int] = None,
    window_seconds: Optional[float] = None,
    max_retries: int = 3,
    backoff_seconds: float = 0.5,
) -> Tuple[str, int]:
    """Returns: (file_path, row_count)"""
```

run_stream()
Real-time streaming with optional wall-clock timing.

```python
async def run_stream(
    config: SimulationManifest,
    duration_seconds: Optional[float] = None,
    real_time: bool = True,
) -> AsyncGenerator[Tuple[str, dict], None]:
    """Yields: (event_type, event_data) tuples"""
```

Driver Modes
1. Batch Mode
- Purpose: High-throughput generation for historical analysis and ML training
- Mechanism: Iterates through the market generator as fast as the CPU allows and writes chunks to Parquet
- Performance: Millions of events per second (CPU bound)
- Output: File path to generated Parquet file and row count
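The chunked write loop described above might look roughly like the following sketch. The Parquet writer is replaced by a plain `flush` callback so the example has no dependencies; `write_in_chunks` and its parameters are hypothetical names, not part of the engine's API.

```python
from typing import Callable, Iterable, List


def write_in_chunks(
    events: Iterable[dict],
    chunk_size: int,
    flush: Callable[[List[dict]], None],
) -> int:
    """Buffer events and hand them to `flush` every `chunk_size` rows.

    Returns the total number of rows flushed.
    """
    buffer: List[dict] = []
    row_count = 0
    for event in events:
        buffer.append(event)
        if len(buffer) >= chunk_size:
            flush(buffer)
            row_count += len(buffer)
            buffer = []
    if buffer:  # flush the final partial chunk
        flush(buffer)
        row_count += len(buffer)
    return row_count


# Stand-in sink: record chunk sizes instead of writing Parquet row groups.
flushed_sizes: List[int] = []
rows = write_in_chunks(
    ({"seq": i} for i in range(2500)),
    chunk_size=1000,
    flush=lambda chunk: flushed_sizes.append(len(chunk)),
)
assert rows == 2500
assert flushed_sizes == [1000, 1000, 500]
```

Flushing in fixed-size chunks keeps memory use bounded regardless of how long the simulated duration is.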
2. Batch Mode with Multiprocessing
- Purpose: Scale batch generation across multiple CPU cores
- Mechanism: Divides duration into time windows, spawns ProcessPool workers
- Configuration:
  - `multiprocess=True`: Enable multiprocessing
  - `workers`: Number of parallel workers (default: min(4, CPU count))
  - `window_seconds`: Duration per window (auto-calculated if not specified)
- Determinism: Preserved via per-window seed derivation from base seed
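One way per-window seed derivation could work is sketched below. The hashing scheme, the function names, and the synthetic event model are all assumptions made for illustration; the engine's actual derivation is not documented here. A thread pool stands in for the process pool so that the example runs unchanged on any platform.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Window = Tuple[int, float, float]  # (index, start_seconds, end_seconds)


def derive_window_seed(base_seed: int, window_index: int) -> int:
    """Derive a stable per-window seed from the base seed (hypothetical scheme)."""
    digest = hashlib.sha256(f"{base_seed}:{window_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def split_windows(duration_seconds: float, window_seconds: float) -> List[Window]:
    """Divide [0, duration) into consecutive windows; the last one may be shorter."""
    windows, index, start = [], 0, 0.0
    while start < duration_seconds:
        end = min(start + window_seconds, duration_seconds)
        windows.append((index, start, end))
        index, start = index + 1, end
    return windows


def generate_window(job: Tuple[int, Window]) -> List[Tuple[float, float]]:
    """Generate (timestamp, value) events for one window using only its own seed."""
    base_seed, (index, start, end) = job
    rng = random.Random(derive_window_seed(base_seed, index))
    events, t = [], start
    while True:
        t += rng.expovariate(10.0)  # roughly 10 events per simulated second
        if t >= end:
            return events
        events.append((t, rng.gauss(0.0, 1.0)))


def run_windows(base_seed: int, duration: float, window: float, workers: int):
    jobs = [(base_seed, w) for w in split_windows(duration, window)]
    if workers <= 1:
        parts = [generate_window(job) for job in jobs]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(generate_window, jobs))  # results keep job order
    return [event for part in parts for event in part]


sequential = run_windows(42, 100.0, 25.0, workers=1)
parallel = run_windows(42, 100.0, 25.0, workers=4)
assert sequential == parallel
```

In this sketch the output depends on the window layout, so two runs match only when they use the same `window` value. How the engine keeps the single-process path identical to the windowed path is not specified in this section.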
3. Stream Mode
- Purpose: Real-time simulation for bot testing, UI development, system integration
- Mechanism: Injects `asyncio.sleep()` to match wall-clock time
- Behavior: “Plays back” the simulation in real time
- Output: AsyncGenerator yielding event tuples
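The pacing mechanism can be sketched as an async generator that sleeps until each event's simulated timestamp is due. `play_back`, the event shape, and the `"trade"` event type are hypothetical; only the use of `asyncio.sleep()` and the `(event_type, event_data)` tuple shape come from the description above.

```python
import asyncio
import time
from typing import AsyncIterator, Iterable, List, Tuple

Event = Tuple[str, dict]


async def play_back(
    timed_events: Iterable[Tuple[float, dict]],
    real_time: bool = True,
) -> AsyncIterator[Event]:
    """Yield (event_type, event_data) tuples, paced to wall-clock time if requested."""
    started = time.monotonic()
    for sim_time, data in timed_events:
        if real_time:
            # Sleep only for the gap between simulated time and elapsed wall time,
            # so per-event processing delays do not accumulate as drift.
            delay = sim_time - (time.monotonic() - started)
            if delay > 0:
                await asyncio.sleep(delay)
        yield ("trade", {"t": sim_time, **data})


async def collect(real_time: bool) -> Tuple[List[Event], float]:
    source = [(0.00, {"price": 100.0}), (0.02, {"price": 100.1}), (0.04, {"price": 99.9})]
    started = time.monotonic()
    events = [event async for event in play_back(source, real_time=real_time)]
    return events, time.monotonic() - started


paced_events, paced_elapsed = asyncio.run(collect(real_time=True))
fast_events, fast_elapsed = asyncio.run(collect(real_time=False))
assert paced_events == fast_events  # pacing changes timing, never content
```

With `real_time=False` the same events arrive without delay, which is the property that lets a stream be compared directly against a batch run.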
Environment Configuration
| Variable | Default | Description |
|---|---|---|
| `ALEATORIC_DRIVER_ENABLE_MULTIPROCESS` | false | Enable multiprocessing by default |
| `ALEATORIC_DRIVER_MAX_WORKERS` | auto | Maximum worker processes |
| `ALEATORIC_DRIVER_WINDOW_SECONDS` | auto | Window duration for multiprocess mode |
| `ALEATORIC_DRIVER_MAX_RETRIES` | 3 | Retries for failed windows |
| `ALEATORIC_DRIVER_BACKOFF_SECONDS` | 0.5 | Backoff between retries |
| `ALEATORIC_BATCH_CHUNK_SIZE` | 1000 | Events per chunk before flush |
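A loader for these variables could be sketched as follows. The variable names and defaults come from the table; the `DriverSettings` class, the accepted boolean spellings, and the mapping of `auto` to `min(4, CPU count)` or `None` are assumptions about how the engine might interpret them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DriverSettings:
    """Hypothetical container for the driver environment settings."""
    enable_multiprocess: bool
    max_workers: int
    window_seconds: Optional[float]  # None means "auto-calculate"
    max_retries: int
    backoff_seconds: float
    batch_chunk_size: int


def _parse_bool(raw: str) -> bool:
    # Assumed truthy spellings; the engine may accept a different set.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_driver_settings(env: Mapping[str, str] = os.environ) -> DriverSettings:
    workers_raw = env.get("ALEATORIC_DRIVER_MAX_WORKERS", "auto")
    window_raw = env.get("ALEATORIC_DRIVER_WINDOW_SECONDS", "auto")
    return DriverSettings(
        enable_multiprocess=_parse_bool(
            env.get("ALEATORIC_DRIVER_ENABLE_MULTIPROCESS", "false")
        ),
        max_workers=(
            min(4, os.cpu_count() or 1) if workers_raw == "auto" else int(workers_raw)
        ),
        window_seconds=None if window_raw == "auto" else float(window_raw),
        max_retries=int(env.get("ALEATORIC_DRIVER_MAX_RETRIES", "3")),
        backoff_seconds=float(env.get("ALEATORIC_DRIVER_BACKOFF_SECONDS", "0.5")),
        batch_chunk_size=int(env.get("ALEATORIC_BATCH_CHUNK_SIZE", "1000")),
    )


defaults = load_driver_settings({})
overridden = load_driver_settings(
    {"ALEATORIC_DRIVER_ENABLE_MULTIPROCESS": "true", "ALEATORIC_DRIVER_MAX_WORKERS": "8"}
)
```

Passing the environment as a mapping keeps the loader easy to test without mutating `os.environ`.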
Determinism Guarantee
Both batch and stream modes guarantee bit-for-bit reproducibility given the same seed:
```python
# These produce identical event sequences:
run_batch(SimulationManifest(seed=42), duration_seconds=100)
run_stream(SimulationManifest(seed=42), duration_seconds=100, real_time=False)

# Multiprocess also preserves determinism:
run_batch(config, duration_seconds=100, multiprocess=False)
run_batch(config, duration_seconds=100, multiprocess=True, workers=4)
# ^ Both produce identical output
```

L2 Order Book
The engine maintains a full double-sided order book.
| Feature | Specification |
|---|---|
| Depth | Unlimited (configurable) |
| Matching Engine | FIFO (First-In-First-Out) |
| Order Types | Limit, Market, IOC, FOK, Post-Only |
| Precision | 18 decimal places (floating point safe) |
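To make the FIFO matching rule concrete, here is a minimal sketch of a limit-only book in which orders at the same price fill in arrival order. The `OrderBook` class and its method are hypothetical and much simpler than a full engine: Market, IOC, FOK, and Post-Only handling are omitted, and `Decimal` is used on the assumption that "floating point safe" means exact decimal arithmetic.

```python
from collections import deque
from decimal import Decimal
from typing import Deque, Dict, List, Tuple

Fill = Tuple[int, Decimal, Decimal]  # (resting_order_id, price, quantity)


class OrderBook:
    """Hypothetical two-sided book: price level -> FIFO queue of [order_id, qty]."""

    def __init__(self) -> None:
        self.bids: Dict[Decimal, Deque[list]] = {}
        self.asks: Dict[Decimal, Deque[list]] = {}
        self._next_id = 1

    def add_limit(self, side: str, price: str, quantity: str) -> Tuple[int, List[Fill]]:
        px, remaining = Decimal(price), Decimal(quantity)
        order_id, self._next_id = self._next_id, self._next_id + 1
        own, opposite = (self.bids, self.asks) if side == "buy" else (self.asks, self.bids)
        fills: List[Fill] = []
        while remaining > 0 and opposite:
            # Best opposing price: lowest ask for a buy, highest bid for a sell.
            best = min(opposite) if side == "buy" else max(opposite)
            crosses = best <= px if side == "buy" else best >= px
            if not crosses:
                break
            queue = opposite[best]
            resting = queue[0]  # oldest order at this price level fills first
            traded = min(remaining, resting[1])
            fills.append((resting[0], best, traded))
            remaining -= traded
            resting[1] -= traded
            if resting[1] == 0:
                queue.popleft()
            if not queue:
                del opposite[best]
        if remaining > 0:  # rest any unfilled quantity at the back of its level
            own.setdefault(px, deque()).append([order_id, remaining])
        return order_id, fills


book = OrderBook()
first_ask, _ = book.add_limit("sell", "100.10", "1.0")
second_ask, _ = book.add_limit("sell", "100.10", "2.0")
_, fills = book.add_limit("buy", "100.10", "1.5")
# The earlier ask fills completely before the later one is touched.
assert [f[0] for f in fills] == [first_ask, second_ask]
```

Scanning for the best price with `min`/`max` keeps the sketch short; a production book would typically use a sorted structure for the price levels.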
Data Retention
- Local: Parquet artifacts written to `artifact_storage_dir`
- Object Storage: S3-compatible backend via `ALEATORIC_ARTIFACT_*` env vars
- Provenance: Cache manifests include `seed`, `preset`, and manifest hashes for deterministic replay
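A provenance record of this kind could be built as in the sketch below. The field names, the example preset, and the choice of SHA-256 over canonical JSON are assumptions for illustration; the engine's actual cache manifest format is not specified here.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a manifest via canonical JSON so key order does not affect the result."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_cache_manifest(seed: int, preset: str, manifest: dict) -> dict:
    """Hypothetical provenance record stored alongside a generated artifact."""
    return {"seed": seed, "preset": preset, "manifest_hash": manifest_hash(manifest)}


# Hypothetical manifest contents; only `seed` and `preset` are named in the text above.
manifest_a = {"seed": 42, "preset": "example", "duration_seconds": 100}
manifest_b = {"duration_seconds": 100, "preset": "example", "seed": 42}
record = build_cache_manifest(42, "example", manifest_a)
assert record["manifest_hash"] == manifest_hash(manifest_b)
```

Matching hashes indicate that a cached artifact was produced from the same inputs and can be reused or replayed instead of regenerated.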