The Problem
I've been trading crypto for years. Like most traders, I was losing more than I was winning. The market moves too fast for human decision-making — by the time you analyze a chart, the opportunity is gone.
So I did what any developer would do: I tried to automate it.
The First Attempt: Tree-Based Models (54% accuracy)
I started with the classic ML approach:
- XGBoost + LightGBM ensemble
- 72 features (RSI, MACD, Bollinger Bands, volume ratios, etc.)
- Walk-forward validation (no data leakage!)
The result? 54% accuracy on out-of-sample data.
That's barely better than a coin flip. And with trading fees eating 0.1% per trade, 54% actually loses money.
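A quick back-of-the-envelope check shows why. With symmetric average wins and losses, fees raise the breakeven win rate (the numbers below are illustrative, not my live stats):

```python
def breakeven_winrate(avg_win, avg_loss, fee_per_trade):
    """Minimum directional accuracy needed to profit after fees.
    Solves p * avg_win - (1 - p) * avg_loss - fee = 0 for p."""
    return (avg_loss + fee_per_trade) / (avg_win + avg_loss)

# ±1% average moves, 0.1% fees per round trip
print(breakeven_winrate(0.01, 0.01, 0.001))  # 0.55 — you need 55% just to break even
```

At 54% accuracy you are below that line on every single trade.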
I tried everything:
- Feature engineering (72 → 100+ features)
- Hyperparameter tuning (Optuna, 500 trials)
- CatBoost added to the ensemble
- Different timeframes (15m, 1h, 4h)
Best I could squeeze out: 60%. Still not enough.
The Breakthrough: LSTM + Attention (84.6%)
Then I rented a GPU on RunPod ($0.44/hour, A40) and tried something different: a Bidirectional LSTM with Multi-Head Attention.
Why LSTM? Because financial markets have temporal dependencies. A candle 30 hours ago affects what happens now. Tree-based models treat each data point independently — they can't learn these sequences.
Architecture
```python
import torch.nn as nn

class LSTMPredictor(nn.Module):
    def __init__(self, input_size=19, hidden_size=128, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size, hidden_size, num_layers,
            batch_first=True, bidirectional=True, dropout=0.3
        )
        self.attention = nn.MultiheadAttention(
            hidden_size * 2, num_heads=4, batch_first=True
        )
        self.fc = nn.Sequential(
            nn.Linear(hidden_size * 2, 64),
            nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 2)  # UP or DOWN
        )

    def forward(self, x):  # x: (batch, seq_len, input_size) — a typical wiring
        out, _ = self.lstm(x)                        # (batch, seq, hidden*2)
        attn_out, _ = self.attention(out, out, out)  # self-attention over timesteps
        return self.fc(attn_out[:, -1])              # classify from the last step
```
Key decisions:
- 19 features (not 72 — less is more with LSTM)
- 64-hour sequences (the model "sees" nearly 3 days of history)
- Bidirectional (learns patterns forward and backward)
- Multi-head attention (focuses on the most important timesteps)
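Those 64-hour sequences are built by slicing the hourly feature matrix into overlapping windows — a minimal sketch (the function name and label convention are mine, not from the repo):

```python
import numpy as np

def make_sequences(features, labels, seq_len=64):
    """Slice a (T, 19) feature matrix into (N, 64, 19) overlapping windows.
    Each window is labeled by its final candle."""
    n = len(features) - seq_len + 1
    X = np.stack([features[i:i + seq_len] for i in range(n)])
    y = labels[seq_len - 1:]  # label of each window's last candle
    return X, y
```

With hourly candles, consecutive windows overlap by 63 steps, so the model sees every candle in many temporal contexts.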
Training Tricks That Mattered
- Data Augmentation. Small Gaussian noise plus random time shifts, so the model can't memorize exact sequences:

```python
import numpy as np

# Add noise to prevent overfitting
noise = np.random.normal(0, 0.01, features.shape)
augmented = features + noise

# Time shift: offset sequences by 1-3 candles
shift = np.random.randint(1, 4)
shifted = features[shift:]  # shift labels by the same offset to keep them aligned
```
- Snapshot Ensemble. Instead of using the single best model, I saved the top 3 checkpoints and averaged their predictions. This reduces variance significantly:
- Snapshot 1: 84.6%
- Snapshot 2: 84.0%
- Snapshot 3: 84.1%
Ensemble: 84.6%
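Averaging the checkpoints can be sketched like this (a minimal illustration; the repo's actual checkpoint-loading code may differ):

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    """Average softmax probabilities across snapshot checkpoints."""
    for m in models:
        m.eval()                 # disable dropout for inference
    with torch.no_grad():
        probs = [F.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)
```

Averaging probabilities (rather than picking a majority vote) keeps the confidence score usable downstream for position sizing.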
Walk-Forward Validation
This is critical. Most "90% accuracy" claims use random train/test splits, which leak future data into training. Walk-forward validation is temporal:
Window 1: Train [Jan-Jun] → Test [Jul-Aug] = 70.3%
Window 2: Train [Mar-Aug] → Test [Sep-Oct] = 72.2%
Window 3: Train [May-Oct] → Test [Nov-Dec] = 85.0%
Window 4: Train [Jul-Dec] → Test [Jan-Feb] = 83.8%
Average: 77.8%
The model genuinely improved on recent data — it's not just memorizing patterns.
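The windows above follow a simple sliding scheme, which can be sketched as (window sizes here are illustrative):

```python
import numpy as np

def walk_forward_splits(n, train_size, test_size, step):
    """Yield (train_idx, test_idx) index windows that slide forward in time.
    Every test window sits strictly after its training window."""
    start = 0
    while start + train_size + test_size <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += step
```

Because the test indices always come after the training indices, no future candle can leak into training — the property random splits destroy.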
The 19 Features
After testing 72+ features, these 19 survived:
| # | Feature | Why It Works |
|-------|---------------------------|---------------------------|
| 1-4 | Returns (1h, 3h, 5h, 10h) | Multi-scale momentum |
| 5-6 | Volatility (10h, 20h) | Risk regime detection |
| 7 | RSI (14) | Overbought/oversold |
| 8 | MACD | Trend strength |
| 9 | Bollinger Width | Squeeze detection |
| 10 | ATR | Volatility-adjusted stops |
| 11 | Volume Ratio | Volume breakouts |
| 12-14 | EMA distance (9, 21, 50) | Trend position |
| 15 | Candle Body Ratio | Price action |
| 16 | Z-Score | Mean reversion signal |
| 17 | Log Volume | Normalized volume |
| 18-19 | Hour/Day encoding | Time patterns |
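A few of these features can be computed from an hourly OHLCV DataFrame like so (a sketch; the column names and DatetimeIndex are assumptions, not the repo's exact pipeline):

```python
import numpy as np
import pandas as pd

def add_features(df):
    """Compute a subset of the 19 features from hourly OHLCV.
    Assumes a DatetimeIndex and 'close'/'volume' columns."""
    out = df.copy()
    for h in (1, 3, 5, 10):
        out[f"ret_{h}h"] = out["close"].pct_change(h)      # multi-scale momentum
    for w in (10, 20):
        out[f"vol_{w}h"] = out["ret_1h"].rolling(w).std()  # volatility regime
    roll = out["close"].rolling(20)
    out["zscore"] = (out["close"] - roll.mean()) / roll.std()  # mean reversion
    out["log_volume"] = np.log1p(out["volume"])            # normalized volume
    out["hour_sin"] = np.sin(2 * np.pi * out.index.hour / 24)  # cyclical time
    return out
```

The cyclical encoding matters: hour 23 and hour 0 are adjacent in time, and a sine/cosine pair preserves that, where a raw 0-23 integer would not.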
From Model to Live Trading Bot
Having a good model is only half the battle. Turning it into a live trading system required:
Risk Management
- ATR-based stop-loss (adapts to volatility)
- Partial close at +1.5% and +3% (lock in profits)
- Trailing stop (let winners run)
- Max 3 concurrent positions (diversification)
- Auto-unstuck (graduated exit from losing trades)
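Of these, the ATR-based stop is the simplest to illustrate (the multiplier is an assumed parameter, not the bot's actual setting):

```python
def atr_stop(entry_price, atr, side, multiplier=2.0):
    """Stop-loss placed a volatility-scaled distance from entry:
    wider in volatile regimes, tighter in calm ones."""
    if side == "long":
        return entry_price - multiplier * atr
    return entry_price + multiplier * atr

print(atr_stop(100.0, 2.0, "long"))   # 96.0
print(atr_stop(100.0, 2.0, "short"))  # 104.0
```

A fixed-percentage stop gets whipsawed in high volatility and gives back too much in low volatility; scaling by ATR sidesteps both.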
Infrastructure
- Bybit API with limit orders (0.02% maker fee vs 0.055% taker)
- Exchange-side stop-loss (safety net if bot crashes)
- PM2 process management with auto-restart
- PostgreSQL for full state persistence (survives restarts)
- Telegram notifications (public + private channels)
The Blend
The final system uses both models:
- 60% tree-based (XGBoost + LightGBM) for baseline
- 40% LSTM for neural boost
- Direction-aware blending for SHORT positions
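The blend itself is a weighted average of the two models' UP probabilities (a sketch; the direction-aware adjustment for SHORTs is not reproduced here):

```python
def blend_prob_up(tree_p, lstm_p, w_tree=0.6, w_lstm=0.4):
    """60/40 weighted blend of tree-ensemble and LSTM UP probabilities."""
    return w_tree * tree_p + w_lstm * lstm_p

print(blend_prob_up(0.5, 1.0))  # 0.7
```

Keeping the tree ensemble as the majority weight means a single bad LSTM checkpoint can shift a signal, but never flip it on its own.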
Results
After deploying the LSTM model:
- Old model (tree-based only): 54% accuracy, losing money to fees
- New model (LSTM blend): 84.6% directional accuracy
- The bot now opens positions with 80-95% confidence signals
- Pump scanner catches volume spikes (5x+ average) for quick trades
Open Source
The entire codebase is open-source:
GitHub: https://github.com/stefanoviana/deepalpha
You can:
- Self-host for free (bring your own API keys)
- Use the cloud version at https://deepalphabot.com (free 7-day trial)
- Train your own models with the included training scripts
Quick Start
```bash
git clone https://github.com/stefanoviana/deepalpha.git
cd deepalpha
pip install -r requirements.txt
cp .env.example .env  # Add your Bybit API keys
python deepalpha.py
```
What I Learned
- Walk-forward validation is non-negotiable. My tree models claimed 70.9% with random splits. Real accuracy: 54%.
- Sequence models beat tabular models for time series. LSTM sees patterns that XGBoost literally cannot.
- Data augmentation prevents overfitting. Adding noise + time shifts doubled the effective training data.
- Fewer features = better. Going from 72 to 19 features improved accuracy by removing noise.
- The model is 30% of the work. Risk management, infrastructure, and edge cases are the other 70%.
What's Next
- Reinforcement learning for dynamic position sizing
- Multi-timeframe fusion (1h + 4h + 1d)
- Continuous online learning from live trades
- More exchange support (currently Bybit, Binance, OKX, Gate.io)
If you find this useful, a star on https://github.com/stefanoviana/deepalpha would mean a lot. And if you have questions about the architecture, training process, or deployment — drop a comment below.
DeepAlpha is built with Python, PyTorch, ccxt, FastAPI, and PostgreSQL. It's MIT licensed and free to use.