The Problem
Real-time market data is valuable but ephemeral. For backtesting and ML training, you need historical tick data. Commercial solutions are expensive. I built my own archival system.
Architecture
```
Kafka (market-data) -> Go Ingestor -> WAL (local HDD) -> Flusher -> Parquet (Ceph)
                            |                                            |
                       symbols.dict                              date/hour/bucket/
                       (symbol->id)                             part-NNNNNN.parquet
```
Infrastructure:
- Deployed on tick-archive-lxc (VMID 206, 10.31.11.16)
- 4 vCPU, 4GB RAM
- 20GB local-lvm for WAL (fast sequential writes)
- 250GB Ceph RBD for Parquet (distributed, fault-tolerant)
WAL Implementation
The WAL is designed specifically for HDD-friendly sequential writes:
Buffering Strategy:
- 4MB write buffer accumulates records before each write syscall
- fdatasync every 128MB or 15 seconds (safety cap)
- 256MB segment size before rotation
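Here's a rough sketch of how those thresholds fit together in the append path. Names like `walSegment` and `syncBytes` are illustrative, not the actual code, and a real implementation would also enforce the 15-second cap with a timer rather than only on writes:

```go
package wal

import (
	"bufio"
	"os"
	"time"

	"golang.org/x/sys/unix" // for fdatasync (Linux)
)

const (
	syncBytes    = 128 << 20 // fdatasync after 128MB written since the last sync
	syncInterval = 15 * time.Second
	segmentSize  = 256 << 20 // rotate the segment at 256MB
)

type walSegment struct {
	f            *os.File
	w            *bufio.Writer // 4MB buffer in front of the file
	writtenBytes int64         // total bytes written to this segment
	unsynced     int64         // bytes written since the last fdatasync
	lastSync     time.Time
}

func openSegment(path string) (*walSegment, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &walSegment{f: f, w: bufio.NewWriterSize(f, 4<<20), lastSync: time.Now()}, nil
}

// append writes one fixed-size record and syncs when either threshold is hit.
func (s *walSegment) append(rec []byte) error {
	n, err := s.w.Write(rec)
	if err != nil {
		return err
	}
	s.writtenBytes += int64(n)
	s.unsynced += int64(n)

	if s.unsynced >= syncBytes || time.Since(s.lastSync) >= syncInterval {
		if err := s.w.Flush(); err != nil {
			return err
		}
		if err := unix.Fdatasync(int(s.f.Fd())); err != nil {
			return err
		}
		s.unsynced = 0
		s.lastSync = time.Now()
	}
	return nil
}

// full reports whether this segment should be rotated.
func (s *walSegment) full() bool { return s.writtenBytes >= segmentSize }
```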
Fixed-Size Binary Records (48 bytes):
```
ts_nanos   int64   (8)  - Event timestamp, UTC
symbol_id  uint32  (4)  - From symbols.dict
price      int64   (8)  - Scaled: price * 1e6
size       uint32  (4)  - Trade size
exchange   uint16  (2)  - Exchange enum
cond_len   uint8   (1)  - Condition count (0-15)
cond_data  [15]u8  (15) - Packed condition codes
trade_id   int64   (8)  - Trade ID from feed
flags      uint16  (2)  - Replay/correction flags
crc32      uint32  (4)  - CRC of above 44 bytes
```
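As a sketch, an encoder for this layout might look like the following. Field names follow the table; the exact struct, byte order, and CRC polynomial in the real code may differ:

```go
package wal

import (
	"encoding/binary"
	"hash/crc32"
)

type TradeRecord struct {
	TsNanos  int64
	SymbolID uint32
	Price    int64 // price * 1e6
	Size     uint32
	Exchange uint16
	CondLen  uint8
	CondData [15]byte
	TradeID  int64
	Flags    uint16
}

// Encode packs the record little-endian and appends a CRC32 of this record's bytes.
func (r *TradeRecord) Encode(buf []byte) []byte {
	start := len(buf)
	buf = binary.LittleEndian.AppendUint64(buf, uint64(r.TsNanos))
	buf = binary.LittleEndian.AppendUint32(buf, r.SymbolID)
	buf = binary.LittleEndian.AppendUint64(buf, uint64(r.Price))
	buf = binary.LittleEndian.AppendUint32(buf, r.Size)
	buf = binary.LittleEndian.AppendUint16(buf, r.Exchange)
	buf = append(buf, r.CondLen)
	buf = append(buf, r.CondData[:]...)
	buf = binary.LittleEndian.AppendUint64(buf, uint64(r.TradeID))
	buf = binary.LittleEndian.AppendUint16(buf, r.Flags)
	crc := crc32.ChecksumIEEE(buf[start:]) // CRC over everything before the crc32 field
	return binary.LittleEndian.AppendUint32(buf, crc)
}
```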
Backpressure Mechanism:
- Checks total WAL size before each write
- Returns ErrWALFull if exceeds 10GB
- Ingestor pauses consumption, Kafka backs up naturally
- Never drops data silently
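Roughly, the check looks like this; `ErrWALFull` and the 10GB cap come from the design above, while the type and method names are illustrative:

```go
package wal

import "errors"

const maxWALBytes = 10 << 30 // 10GB cap across all segments

var ErrWALFull = errors.New("wal: size cap reached, apply backpressure")

type WAL struct {
	totalBytes int64 // sum of all segment sizes, updated on write and on segment deletion
}

func (w *WAL) Append(rec []byte) error {
	if w.totalBytes+int64(len(rec)) > maxWALBytes {
		return ErrWALFull // caller pauses Kafka consumption; the backlog stays in Kafka
	}
	// ... buffered write as in the earlier sketch ...
	w.totalBytes += int64(len(rec))
	return nil
}
```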
Symbol Dictionary:
- Maps symbol strings to uint32 IDs (4 bytes vs variable string)
- Append-only, persisted to symbols.dict
- fsync before updating in-memory map
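A sketch of the lookup path, assuming a simple line-oriented on-disk format for symbols.dict (the real format may differ); the key point is that the entry is persisted and fsynced before the in-memory map hands out the ID:

```go
package wal

import (
	"fmt"
	"os"
	"sync"
)

type SymbolDict struct {
	mu   sync.Mutex
	f    *os.File          // symbols.dict, opened with O_APPEND
	ids  map[string]uint32 // symbol -> id
	next uint32
}

// ID returns the uint32 ID for a symbol, assigning and persisting a new one if needed.
func (d *SymbolDict) ID(symbol string) (uint32, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if id, ok := d.ids[symbol]; ok {
		return id, nil
	}
	id := d.next
	// Persist first: one "id symbol" line per entry (illustrative format).
	if _, err := fmt.Fprintf(d.f, "%d %s\n", id, symbol); err != nil {
		return 0, err
	}
	if err := d.f.Sync(); err != nil { // fsync before exposing the mapping
		return 0, err
	}
	d.ids[symbol] = id
	d.next++
	return id, nil
}
```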
Parquet Conversion
Partitioning Strategy:
date=YYYY-MM-DD/hour=HH/bucket=NNN/part-NNNNNN.parquet
- date: Natural retention boundary
- hour: Limits files per directory, enables parallel replay
- bucket: xxhash32(symbol_id) & 255 distributes symbols evenly
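A sketch of the routing logic. The real code hashes with xxhash32; this version substitutes the 64-bit github.com/cespare/xxhash/v2 hash truncated to 32 bits, so bucket values won't match the real layout exactly:

```go
package flush

import (
	"encoding/binary"
	"fmt"
	"time"

	"github.com/cespare/xxhash/v2"
)

// bucket maps a symbol ID to one of 256 buckets.
func bucket(symbolID uint32) uint32 {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], symbolID)
	return uint32(xxhash.Sum64(b[:])) & 255
}

// partitionPath builds date=YYYY-MM-DD/hour=HH/bucket=NNN/part-NNNNNN.parquet.
func partitionPath(ts time.Time, symbolID uint32, part int) string {
	ts = ts.UTC()
	return fmt.Sprintf("date=%s/hour=%02d/bucket=%03d/part-%06d.parquet",
		ts.Format("2006-01-02"), ts.Hour(), bucket(symbolID), part)
}
```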
Arrow/Parquet Schema:
- Uses Apache Arrow Go library
- Zstd compression (level 3) for 3-5x compression
- Dictionary encoding enabled
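A sketch of the writer setup, assuming the Arrow Go v15 module layout and its writer-property options; the schema here is inferred from the WAL record fields and the real one likely carries more columns (e.g. conditions):

```go
package flush

import (
	"io"

	"github.com/apache/arrow/go/v15/arrow"
	"github.com/apache/arrow/go/v15/parquet"
	"github.com/apache/arrow/go/v15/parquet/compress"
	"github.com/apache/arrow/go/v15/parquet/pqarrow"
)

var tradeSchema = arrow.NewSchema([]arrow.Field{
	{Name: "ts_nanos", Type: arrow.PrimitiveTypes.Int64},
	{Name: "symbol_id", Type: arrow.PrimitiveTypes.Uint32},
	{Name: "price", Type: arrow.PrimitiveTypes.Int64},
	{Name: "size", Type: arrow.PrimitiveTypes.Uint32},
	{Name: "exchange", Type: arrow.PrimitiveTypes.Uint16},
	{Name: "flags", Type: arrow.PrimitiveTypes.Uint16},
}, nil)

// newParquetWriter returns an Arrow-to-Parquet writer with Zstd level 3 and
// dictionary encoding enabled by default.
func newParquetWriter(w io.Writer) (*pqarrow.FileWriter, error) {
	props := parquet.NewWriterProperties(
		parquet.WithCompression(compress.Codecs.Zstd),
		parquet.WithCompressionLevel(3),
		parquet.WithDictionaryDefault(true),
	)
	return pqarrow.NewFileWriter(tradeSchema, w, props, pqarrow.DefaultWriterProps())
}
```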
Atomic Commit Pattern:
- Write to tmp/ directory first
- fsync the file
- Atomic rename to final path
- fsync parent directory
- Write _SUCCESS marker
- Only then delete WAL segment
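As a sketch, with illustrative paths and helper names (in practice the flusher would sync the writer's own fd before close; re-opening here just keeps the example self-contained):

```go
package flush

import (
	"os"
	"path/filepath"
)

// commit moves tmpPath into finalPath so readers only ever see complete files.
// The WAL segment is deleted only after this returns successfully.
func commit(tmpPath, finalPath string) error {
	// 1. fsync the fully written temp file.
	f, err := os.Open(tmpPath)
	if err != nil {
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	f.Close()

	// 2. Atomic rename into the partition directory.
	if err := os.Rename(tmpPath, finalPath); err != nil {
		return err
	}

	// 3. fsync the parent directory so the rename itself is durable.
	dir, err := os.Open(filepath.Dir(finalPath))
	if err != nil {
		return err
	}
	if err := dir.Sync(); err != nil {
		dir.Close()
		return err
	}
	dir.Close()

	// 4. Drop the _SUCCESS marker.
	return os.WriteFile(filepath.Join(filepath.Dir(finalPath), "_SUCCESS"), nil, 0o644)
}
```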
Storage Tiers
| Tier | Storage | Purpose |
|---|---|---|
| WAL | 20GB local disk (local-lvm) | Durability, crash recovery |
| Hot Parquet | 250GB Ceph RBD | Recent data, fast access |
| Cold Archive | Object storage | Historical (future) |
Performance
Current Throughput:
- Messages: 452/sec (trades + quotes + bars)
- Trades: 134/sec written to WAL
- Symbols: 847 unique in first minute
Storage Projections:
- WAL growth: 6.4 KB/sec = 23 MB/hour
- Trading day (6.5 hrs): ~150 MB
- Parquet with Zstd: 3-5x compression = ~30-50 MB/day
- 14-day retention: ~500-700 MB
Design Decisions
Why Local HDD for WAL (not Ceph, not tmpfs):
- tmpfs: Loses data on restart (rejected)
- Ceph: High latency for fdatasync (rejected)
- Local HDD: Durable, predictable latency, handles sequential writes fine
Why Fixed-Size Binary Records (not JSON, not Protobuf):
- JSON: 5-10x larger, parsing overhead
- Protobuf: Overkill for fixed schema
- Binary: Compact, fast, simple seeking
Why Backpressure over Data Loss:
- System slows down rather than dropping data
- Visible in metrics for alerting
- Kafka naturally buffers during pause
Crash Recovery
- Orphaned tmp files: Cleaned up on startup
- Missing _SUCCESS markers: Re-flush from WAL
- WAL segment survives: Until all Parquet confirmed
- CRC validation: Per-record corruption detection
- Recovery: Scan until last good CRC, truncate
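A sketch of the scan-and-truncate recovery, using the stated 48-byte record size; record layout and CRC placement follow the table earlier, and the function name is illustrative:

```go
package wal

import (
	"encoding/binary"
	"hash/crc32"
	"io"
	"os"
)

const recordSize = 48 // full record including the trailing crc32

// recoverSegment truncates the segment after the last record with a valid CRC.
func recoverSegment(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, recordSize)
	var good int64
	for {
		if _, err := io.ReadFull(f, buf); err != nil {
			break // EOF or a torn tail record
		}
		stored := binary.LittleEndian.Uint32(buf[recordSize-4:])
		if crc32.ChecksumIEEE(buf[:recordSize-4]) != stored {
			break // corruption: stop at the last good record
		}
		good += recordSize
	}
	return f.Truncate(good)
}
```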
Deployment
Two systemd services:
- tick-archive-ingest: Kafka to WAL (port 8080 metrics)
- tick-archive-flush: WAL to Parquet (port 8081 metrics)
Prometheus metrics: tick_archive_ingest_messages_total, tick_archive_ingest_trades_total, tick_archive_ingest_backpressure_total
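For reference, this is roughly how those counters would be registered with the standard Prometheus Go client (github.com/prometheus/client_golang); the actual registration code isn't shown here:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	messagesTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "tick_archive_ingest_messages_total",
		Help: "Kafka messages consumed (trades + quotes + bars).",
	})
	tradesTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "tick_archive_ingest_trades_total",
		Help: "Trade records written to the WAL.",
	})
	backpressureTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "tick_archive_ingest_backpressure_total",
		Help: "Times ingestion paused because the WAL hit its size cap.",
	})
)

// serveMetrics exposes /metrics; the ingest service uses :8080, the flusher :8081.
func serveMetrics(addr string) {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(addr, nil))
}
```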