The Opportunity
SEC filings are public, but retail traders underuse them while institutional investors monitor them obsessively. I built a pipeline that polls SEC EDGAR for new filings in near real time, parses them for signal-worthy events, and delivers tradeable alerts before the news cycle catches up.
Filing Types That Matter
| Form | What It Means | Signal Priority |
|---|---|---|
| Form 4 | Insider buys/sells | High for CEO/CFO buys, cluster buys |
| 8-K | Material events (earnings, M&A, leadership) | High for earnings beat/miss |
| 13D/G | Activist positions (5%+ ownership) | High for activist language |
| 13F | Institutional holdings (quarterly) | High for notable filers |
| S-3, 424B, S-1 | Shelf registrations, prospectus, IPO | Float/dilution tracking |
Architecture
Ingestion Layer:
- Airflow DAGs for historical backfill (2023-present)
- RSS poller (60-second cycle) for real-time new filings

Processing Layer:
- Kafka topic: sec-filings (raw filing notifications)
- Parser workers with form-specific parsers (XML/HTML extraction)
- Signal generator with configurable thresholds

Storage Layer:
- SQL Server (TradingDB) with 9 normalized tables
  - sec_companies, sec_filings_raw, sec_insider_tx, sec_8k_events
  - sec_13f_holdings, sec_beneficial_owners, sec_signals
  - sec_cusip_mapping, sec_float_data, sec_lockup_calendar

Signal Layer:
- Kafka topic: sec-signals (actionable alerts)
- Redis bridge for dashboard consumption
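To make the real-time ingestion step concrete, here is a minimal sketch of what one poll cycle might do: parse an Atom feed and keep only the entries not seen before. The function name `new_filings`, the sample feed, and the shape of the returned dicts are assumptions for illustration, not the pipeline's actual code; publishing to the sec-filings topic and the 60-second scheduling are left out.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; a real poll would fetch the feed body over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>filing-1</id>
    <title>4 - Example Corp</title>
    <link href="https://example.com/filing-1"/>
  </entry>
  <entry>
    <id>filing-2</id>
    <title>8-K - Example Corp</title>
    <link href="https://example.com/filing-2"/>
  </entry>
</feed>"""


def new_filings(atom_xml: str, seen_ids: set) -> list:
    """Return feed entries whose <id> has not been seen, and record them."""
    root = ET.fromstring(atom_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        if not entry_id or entry_id in seen_ids:
            continue  # skip malformed or already-processed entries
        seen_ids.add(entry_id)
        link = entry.find("atom:link", ATOM_NS)
        fresh.append({
            "id": entry_id,
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href") if link is not None else None,
        })
    return fresh
```

Each dict returned here is the kind of raw notification a producer could serialize onto the sec-filings topic. Keeping `seen_ids` outside the function means repeated polls of an overlapping feed do not emit duplicates.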
Signal Generation Logic
Form 4 Signals
- INSIDER_BUY – Any purchase (medium priority)
- CEO_CFO_BUY – C-suite purchase (high priority)
- INSIDER_CLUSTER_BUY – 2+ insiders buy within 7 days (high)
- INSIDER_LARGE_BUY – Purchase greater than $100k (high)
- DIRECTOR_BUY – Director non-officer purchase (medium)
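The rules above can be sketched as a single classification function. This is an illustrative sketch, not the pipeline's signal generator: the `InsiderBuy` record, the role strings, and the function name are hypothetical, and it assumes the caller passes in other recent purchases for cluster detection.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InsiderBuy:
    """Hypothetical record for one open-market purchase from a Form 4."""
    ticker: str
    insider: str
    role: str          # assumed values: "CEO", "CFO", "Director", "Officer"
    value_usd: float
    trade_date: date


LARGE_BUY_USD = 100_000
CLUSTER_WINDOW = timedelta(days=7)
CLUSTER_MIN_INSIDERS = 2


def form4_signals(buy: InsiderBuy, recent_buys: list) -> list:
    """Return the signal names that apply to a single purchase."""
    signals = ["INSIDER_BUY"]  # every purchase gets the base signal
    if buy.role in {"CEO", "CFO"}:
        signals.append("CEO_CFO_BUY")
    elif buy.role == "Director":
        signals.append("DIRECTOR_BUY")
    if buy.value_usd > LARGE_BUY_USD:
        signals.append("INSIDER_LARGE_BUY")
    # Count distinct insiders buying the same ticker in the trailing window.
    insiders = {buy.insider}
    for other in recent_buys:
        age = buy.trade_date - other.trade_date
        if other.ticker == buy.ticker and timedelta(0) <= age <= CLUSTER_WINDOW:
            insiders.add(other.insider)
    if len(insiders) >= CLUSTER_MIN_INSIDERS:
        signals.append("INSIDER_CLUSTER_BUY")
    return signals
```

Counting distinct insiders rather than transactions keeps one executive splitting a purchase across several days from registering as a cluster.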
Form 8-K Signals
- 8K_EARNINGS – Item 2.02 filed (high)
- 8K_EARNINGS_BEAT – Beat keywords detected (high)
- 8K_EARNINGS_MISS – Miss keywords detected (medium)
- Extracts EPS, revenue, quarter from text via regex patterns
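As a rough sketch of the regex-based extraction, the snippet below pulls EPS, revenue, and quarter out of press-release text and tags beat or miss keywords. The patterns and keyword lists are illustrative assumptions, not the production ones, and the function assumes the filing has already been identified as Item 2.02.

```python
import re

# Illustrative patterns only; real press releases vary far more than this.
EPS_RE = re.compile(r"earnings per share (?:of|was|were) \$(\d+\.\d+)", re.I)
REVENUE_RE = re.compile(r"revenues? of \$(\d+(?:\.\d+)?) (million|billion)", re.I)
QUARTER_RE = re.compile(
    r"\b(first|second|third|fourth) quarter (?:of )?(?:fiscal )?(\d{4})", re.I
)
BEAT_RE = re.compile(r"\b(exceeded|beat|surpassed|above expectations)\b", re.I)
MISS_RE = re.compile(r"\b(missed|fell short|below expectations)\b", re.I)

SCALE = {"million": 1_000_000, "billion": 1_000_000_000}
QUARTERS = {"first": 1, "second": 2, "third": 3, "fourth": 4}


def parse_earnings_8k(text: str) -> dict:
    """Extract headline numbers and signals from Item 2.02 text."""
    result = {"signals": ["8K_EARNINGS"], "eps": None,
              "revenue_usd": None, "quarter": None}
    if m := EPS_RE.search(text):
        result["eps"] = float(m.group(1))
    if m := REVENUE_RE.search(text):
        result["revenue_usd"] = float(m.group(1)) * SCALE[m.group(2).lower()]
    if m := QUARTER_RE.search(text):
        result["quarter"] = f"Q{QUARTERS[m.group(1).lower()]} {m.group(2)}"
    # Beat keywords take precedence if both kinds appear in the text.
    if BEAT_RE.search(text):
        result["signals"].append("8K_EARNINGS_BEAT")
    elif MISS_RE.search(text):
        result["signals"].append("8K_EARNINGS_MISS")
    return result
```

Fields that fail to match stay `None` rather than raising, so a filing with unusual wording still produces the base 8K_EARNINGS signal.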
Form 13D/G Signals
- 13D_NEW_POSITION – New 5%+ holder (high)
- 13D_ACTIVIST – Activist language in purpose (high)
- 13G_LARGE_PASSIVE – Greater than 10% passive position (low)
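A minimal sketch of how these three rules might be applied to a parsed filing follows. The form-type strings, the activist keyword list, and the `is_amendment` flag are assumptions for illustration; the source does not specify how activist language is detected.

```python
import re

# Hypothetical keyword list for "activist language" in the purpose section.
ACTIVIST_RE = re.compile(
    r"\b(board representation|strategic alternatives|proxy|nominate|undervalued)\b",
    re.I,
)


def ownership_signals(form_type: str, percent_owned: float,
                      purpose: str = "", is_amendment: bool = False) -> list:
    """Return 13D/G signal names for one beneficial-ownership filing."""
    signals = []
    if form_type == "SC 13D":  # assumed form-type label
        # An original (non-amended) 13D at 5%+ is treated as a new position.
        if not is_amendment and percent_owned >= 5.0:
            signals.append("13D_NEW_POSITION")
        if ACTIVIST_RE.search(purpose):
            signals.append("13D_ACTIVIST")
    elif form_type == "SC 13G" and percent_owned > 10.0:
        signals.append("13G_LARGE_PASSIVE")
    return signals
```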
Form 13F Signals
- 13F_NEW_POSITION – Institution initiates new position
- 13F_ACCUMULATION – Greater than 20% share increase from prior quarter
- 13F_REDUCTION – Greater than 20% decrease (not exit)
- 13F_EXIT – Complete position exit
Notable filers (Berkshire Hathaway, Pershing Square, Renaissance Technologies, Bridgewater, Point72) get elevated priority and lower thresholds ($1M vs $10M minimum).
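The quarter-over-quarter comparison could look something like the sketch below. It assumes the $1M and $10M minimums apply to the position's market value (the larger of the prior and current quarter, so exits still qualify); the function name and parameters are hypothetical.

```python
from typing import Optional

NOTABLE_MIN_USD = 1_000_000    # lower bar for notable filers
DEFAULT_MIN_USD = 10_000_000   # bar for everyone else
CHANGE_THRESHOLD = 0.20        # 20% change in share count


def classify_13f_change(prev_shares: int, curr_shares: int,
                        position_value_usd: float,
                        notable_filer: bool = False) -> Optional[str]:
    """Compare one holding across two quarters and name the signal, if any."""
    min_value = NOTABLE_MIN_USD if notable_filer else DEFAULT_MIN_USD
    if position_value_usd < min_value:
        return None  # too small to be worth an alert
    if prev_shares == 0:
        return "13F_NEW_POSITION" if curr_shares > 0 else None
    if curr_shares == 0:
        return "13F_EXIT"
    change = (curr_shares - prev_shares) / prev_shares
    if change > CHANGE_THRESHOLD:
        return "13F_ACCUMULATION"
    if change < -CHANGE_THRESHOLD:
        return "13F_REDUCTION"
    return None
```

Checking for an exit before computing the percentage change keeps a full exit from being reported as a 100% reduction.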
Why Insider Buys Matter
Insiders sell for many reasons (diversification, taxes, life events), but they buy for only one: they think the stock is going up. Cluster buying (multiple insiders buying within a short period) is especially bullish.
SEC Rate Limiting
The SEC has clear guidelines: at most 10 requests per second, with a proper User-Agent header that identifies the requester. The pipeline stays within these limits and retries with exponential backoff on 503 errors. Getting blocked would defeat the purpose.
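One way to combine a request throttle with exponential backoff is sketched below. `Throttle` and `fetch_with_backoff` are hypothetical names, and `fetch` stands in for whatever HTTP call the pipeline makes (it is assumed to return a `(status, body)` pair and to set the User-Agent header itself); the retry count and base delay are also assumptions.

```python
import time


class Throttle:
    """Space calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval


def fetch_with_backoff(fetch, url, throttle, max_retries=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on HTTP 503 with doubling delays."""
    for attempt in range(max_retries + 1):
        throttle.wait()  # every attempt, including retries, is rate limited
        status, body = fetch(url)
        if status != 503:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up on {url} after {max_retries + 1} attempts")
```

Passing `clock` and `sleep` as parameters is a testing convenience: the timing logic can be exercised with fakes instead of real delays.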
Float Tracking
- sec_float_data – Base float from Alpaca + adjusted float
- sec_float_events – Float-changing events (offerings, lockup expirations)
- sec_lockup_calendar – IPO lockup expiration tracking
- Signals: FLOAT_OFFERING_PRICED, FLOAT_LOCKUP_1D, FLOAT_SHELF_FILED
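As an illustration of the lockup signal, the sketch below assumes FLOAT_LOCKUP_1D means "lockup expires one day from today" and that the calendar can be read as `(ticker, expiry_date)` pairs. Both are assumptions; the actual sec_lockup_calendar schema is not shown here.

```python
from datetime import date, timedelta


def lockup_signals(lockups, today: date) -> list:
    """Emit a FLOAT_LOCKUP_1D signal for each lockup expiring tomorrow.

    `lockups` is an iterable of (ticker, expiry_date) pairs, a hypothetical
    projection of the lockup calendar.
    """
    tomorrow = today + timedelta(days=1)
    return [
        {"signal": "FLOAT_LOCKUP_1D", "ticker": ticker, "expires": expiry}
        for ticker, expiry in lockups
        if expiry == tomorrow
    ]
```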
Infrastructure
| Service | Location | Purpose |
|---|---|---|
| Airflow DAGs | af01:/opt/airflow_env/dags/sec/ | Historical backfill |
| sec-realtime | pve04 LXC | RSS polling, Kafka producer |
| sec-parser | pve04 LXC | Kafka consumer, signal generation |
| SQL Server | sql03.ad.techsnet.net | TradingDB storage |
| Kafka | 10.31.11.10 | Message broker |
| Redis | 10.31.13.10 | Dashboard buffer |
Code Statistics
- 6 Airflow DAGs: Form 4, 8-K, 13D/G, 13F, 13F reverse backfill, float refresh
- 6 Parsers: form4.py, form8k.py, form13dg.py, form13f.py, forms1.py, forms3_424b.py
- 9 SQL Migrations: Normalized schema for all filing types
- Signal Generator: ~400 lines covering all form types
Key Design Decision: Reverse Backfill
13F backfill runs newest-to-oldest to prioritize recent data. Quarterly filings mean older data is less actionable – get the current quarter processed first.
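The ordering itself is simple to sketch: generate calendar quarters from the current one back to the start of the backfill window. The generator name is hypothetical, and the 2023 start year in the usage note is taken from the backfill range mentioned earlier.

```python
from datetime import date


def quarters_newest_first(start_year: int, today: date):
    """Yield (year, quarter) pairs from today's quarter back to Q1 of start_year."""
    year = today.year
    quarter = (today.month - 1) // 3 + 1
    while year >= start_year:
        yield year, quarter
        quarter -= 1
        if quarter == 0:  # roll back into Q4 of the previous year
            year -= 1
            quarter = 4
```

A backfill task could iterate `quarters_newest_first(2023, date.today())` and process one quarter per step, so an interrupted run still leaves the most recent quarters loaded.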