
Building ATP: An Automated Trading Platform with ML-Driven Signals

The Problem

Manual trading is slow, emotional, and inconsistent. I wanted a system that could scan for momentum opportunities, generate ML-driven signals, and execute trades automatically with proper risk management.

Development Timeline

Built over 2 months (November 2025 – January 2026) with 54 commits. The project evolved through distinct phases:

Phase 1: Foundation (Nov 7)

Initial commit established data infrastructure with SQL Server integration and XGBoost as the ML backbone.

Phase 2: ML Pipeline (Nov 7-8)

Built the training infrastructure: feature engineering, label generation, model trainer. Hit a critical bug early – pandas chained assignment was silently failing to set labels. The fix was to write with df.at[actual_idx, label] instead of chained loc/iloc indexing; the bug had been producing 0% positive labels versus the expected 63%.
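The failure mode is easy to reproduce: assigning through a chained indexer writes to an intermediate copy, so the original frame never changes and no error is raised. A minimal sketch (column names here are illustrative, not the project's actual schema):

```python
import pandas as pd

df = pd.DataFrame({"close": [10.0, 10.2, 10.5], "label": [0, 0, 0]})

# BROKEN (do not do this): df[df["close"] > 10.1]["label"] = 1
# The boolean mask returns an intermediate copy, so the assignment
# lands on a temporary object and df is silently left unchanged.

# FIX: write through a single .at call, which targets df directly.
for idx in df.index[df["close"] > 10.1]:
    df.at[idx, "label"] = 1

assert int(df["label"].sum()) == 2
```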

The first working model achieved a validation AUC of 0.614. I then implemented hybrid adaptive targeting – dynamic profit targets based on ML confidence (50% weight) plus volatility/ATR (50% weight), bounded between 1.5% and 5%.
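The blend can be sketched as a weighted sum clipped to the stated band. Function and parameter names below are mine, not the project's; only the weights and bounds come from the post:

```python
def adaptive_target(ml_confidence: float, atr_pct: float,
                    w_ml: float = 0.5, w_vol: float = 0.5,
                    lo: float = 0.015, hi: float = 0.05) -> float:
    """Blend ML confidence and ATR-based volatility into a profit
    target, then clamp to the 1.5%-5% band from the post."""
    # Map confidence (0..1) into the target band; ATR% is already a
    # fraction of price, so it is used directly as the volatility leg.
    ml_component = lo + ml_confidence * (hi - lo)
    raw = w_ml * ml_component + w_vol * atr_pct
    return max(lo, min(hi, raw))

# High confidence plus moderate volatility lands near the top of the band.
target = adaptive_target(ml_confidence=0.9, atr_pct=0.04)
```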

Phase 3: Paper Trading (Nov 9-10)

Launched paper trading with all components wired together:

  • AlpacaClient + WebSocket streaming
  • PositionManager with persistence
  • RiskManager + OrderExecutor
  • LiveInferenceEngine for real-time ML

Trailing stop: 1% activation, 0.5% trail. $200k buying power (100k cash + 2x margin).
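The trailing-stop rule above (arm after a 1% gain from entry, then trail the high-water mark by 0.5%) can be sketched as a small state machine. Class and method names are illustrative, not the platform's actual API:

```python
class TrailingStop:
    """Arms after a 1% gain from entry, then trails the running
    high-water mark by 0.5%. A sketch of the rule, not the real code."""

    def __init__(self, entry: float, activation: float = 0.01,
                 trail: float = 0.005):
        self.entry = entry
        self.activation = activation
        self.trail = trail
        self.high_water = entry
        self.armed = False

    def update(self, price: float) -> bool:
        """Feed a new price; returns True when the stop fires."""
        self.high_water = max(self.high_water, price)
        if not self.armed and price >= self.entry * (1 + self.activation):
            self.armed = True
        return self.armed and price <= self.high_water * (1 - self.trail)

# Price rallies past +1% (arming the stop), peaks, then gives back
# more than 0.5% from the high: the final tick triggers the exit.
stop = TrailingStop(entry=100.0)
exits = [stop.update(p) for p in [100.5, 101.2, 101.8, 101.2]]
```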

Bug fixed: position persistence – the system was losing track of positions on restart. Fixed by loading existing Alpaca positions on startup and merging them into local state.
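The reconciliation idea is simple: on startup, treat the broker as the source of truth and adopt any position it reports that local state has lost. The sketch below uses plain dicts for clarity; the real client returns Position objects (e.g. TradingClient.get_all_positions in alpaca-py), and all names here are mine:

```python
def reconcile_positions(local: dict, broker_positions: list) -> dict:
    """Merge broker-reported positions into local state so a restart
    cannot orphan an open position. Field names are illustrative."""
    merged = dict(local)
    for pos in broker_positions:
        if pos["symbol"] not in merged:
            # Position exists at the broker but not locally: adopt it.
            merged[pos["symbol"]] = {
                "qty": float(pos["qty"]),
                "avg_entry": float(pos["avg_entry_price"]),
            }
    return merged

# Local state was wiped by a restart, but the broker still holds NRXP.
state = reconcile_positions({}, [{"symbol": "NRXP", "qty": "500",
                                  "avg_entry_price": "2.10"}])
```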

Phase 4: Training V2 (Nov 12)

Complete ML pipeline overhaul:

  • TickFeatureEngineer with microstructure features (order flow, VWAP, trade size)
  • OptimizedLabelGenerator with triple-barrier method
  • Expanded from 12 days to 365 days (660M ticks) of training data
  • Added news integration for event-driven trading
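The triple-barrier method labels each entry by which of three barriers the subsequent price path touches first: an upper profit barrier, a lower stop barrier, or a time horizon. A minimal sketch (barrier widths and the function name are illustrative):

```python
def triple_barrier_label(prices: list, entry_idx: int,
                         up: float = 0.02, down: float = 0.01,
                         horizon: int = 10) -> int:
    """Return +1 if the profit barrier is hit first, -1 if the stop
    barrier is hit first, 0 if the time barrier expires untouched."""
    entry = prices[entry_idx]
    upper = entry * (1 + up)
    lower = entry * (1 - down)
    # Walk forward at most `horizon` ticks past the entry.
    for p in prices[entry_idx + 1 : entry_idx + 1 + horizon]:
        if p >= upper:
            return 1
        if p <= lower:
            return -1
    return 0

# Price reaches +2% before dropping -1%: positive label.
label = triple_barrier_label([100.0, 100.5, 101.2, 102.3], 0)
```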

Phase 5: Real-time Scanner (Nov 17-18)

WebSocket-based momentum scanner detecting rapid price movements:

  • 10-second OHLCV bar construction from live trade stream
  • Subscribes to ~3,000 symbols with less than 100ms latency
  • Slack integration with rich Block Kit formatting
  • Claude AI trade reviewer – every trade analyzed by Claude Sonnet 4.5
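Bar construction from a raw trade stream reduces to bucketing ticks by a fixed time window and folding each tick into that bucket's OHLCV state. A sketch of the idea, assuming (timestamp, price, size) tuples; the real scanner consumes Alpaca WebSocket messages:

```python
def build_bars(ticks, bar_seconds: int = 10) -> dict:
    """Aggregate (timestamp, price, size) ticks into fixed-interval
    OHLCV bars keyed by bucket start time. Field layout is illustrative."""
    bars = {}
    for ts, price, size in ticks:
        # Floor the timestamp to its 10-second bucket.
        bucket = int(ts // bar_seconds) * bar_seconds
        bar = bars.get(bucket)
        if bar is None:
            # First tick in the window seeds every OHLCV field.
            bars[bucket] = {"open": price, "high": price, "low": price,
                           "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick wins
            bar["volume"] += size
    return bars

# Three ticks land in the 0-10s bar, one opens the 10-20s bar.
bars = build_bars([(0.0, 10.0, 100), (4.0, 10.4, 50),
                   (9.9, 10.2, 25), (11.0, 10.3, 10)])
```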

Phase 6: Scaling (Dec-Jan)

  • Wildcard subscription for entire market (~8-10k symbols)
  • Database-driven symbol loading
  • Windows + Linux deployment via systemd

Architecture

+-------------------------------------------------------------+
|                    Apache Airflow 2.x                        |
|   +-------------+  +-------------+  +---------------------+ |
|   |   Training  |  |   Signal    |  |  Position Manager   | |
|   |     DAGs    |  |  Generation |  |   (Trailing Stops)  | |
|   +------+------+  +------+------+  +---------+-----------+ |
+----------+----------------+-------------------+-------------+
           |                |                   |
     +-----v-----------------------------------------+
     |              MS SQL Server                     |
     |    (Historical data, signals, positions)       |
     +------------------------------------------------+
                            |
     +----------------------v--------------------------+
     |           Alpaca Markets API                    |
     |    (WebSocket streaming + Order execution)      |
     +------------------------------------------------+

AI-Powered Development

The project extensively used Claude Code for pair programming, with AI co-authoring many commits. The Claude AI Trade Reviewer provides real-time analysis of every trade:

NRXP was bought due to exceptional ML confidence (91.2%), but the extreme volatility (192%) should have been a red flag…

Cost tracking: ~$0.01-0.05 per trade analysis.

Bugs Fixed

  • Label Generator – symptom: 0% positive labels; fix: use df.at[] instead of chained assignment
  • Position Persistence – symptom: orphaned positions after restart; fix: load existing Alpaca positions on startup
  • Indicator Reset – symptom: indicators reset to 0 at date boundaries; fix: stored procedures with lookback data
  • WebSocket Handlers – symptom: async bar subscription failures; fix: proper async handler wiring
  • DataFrame Fragmentation – symptom: performance warnings; fix: use df.assign(**new_cols)
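On the fragmentation fix: inserting many columns one at a time can fragment the frame's internal blocks and trigger pandas PerformanceWarnings, while df.assign attaches a batch of columns in one pass. A small sketch with made-up feature columns:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 10.2, 10.5]})

# Instead of `df[name] = values` inside a loop (which fragments the
# frame column by column), build the new columns first and attach
# them with a single assign call.
new_cols = {
    "ret_1": df["price"].diff(),   # one-step price change
    "price_sq": df["price"] ** 2,  # illustrative derived feature
}
df = df.assign(**new_cols)
```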

Results

Current status: Phase 3 live paper trading with 12 symbols. Per-trade expectancy improved from +0.005% (not tradeable) to a target of 0.15–0.30% with news integration. The system runs 24/5, trading during market hours.
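For context on those expectancy figures, the standard per-trade expectancy formula is the probability-weighted average of wins and losses. The numbers below are illustrative, not the system's actual statistics:

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Per-trade expectancy in percent: weighted average of the
    typical win and the typical loss (avg_loss is negative)."""
    return win_rate * avg_win + (1 - win_rate) * avg_loss

# 55% winners averaging +1.2% against losers averaging -1.0%
# works out to +0.21% expected per trade.
e = expectancy(win_rate=0.55, avg_win=1.2, avg_loss=-1.0)
```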

Key Learnings

Data quality matters more than model complexity. Spent weeks tuning hyperparameters when the real issue was bad data.

Paper trading reveals edge cases. Backtester did not account for partial fills, after-hours gaps, or WebSocket disconnects.

10-second bars work. Faster signal generation than minute bars, but requires solid infrastructure to handle the data volume.