BitFunded Decision Transformer (Stage 1)

Decision Transformer trained for the BitFunded prop firm Stage 1 crypto challenge. Predicts optimal trading actions (HOLD, LONG, SHORT, CLOSE) conditioned on a target return of 8%, while respecting drawdown limits.

Challenge Constraints

| Parameter | Value |
|---|---|
| Account | 10,000 USDT |
| Leverage | 1:5 |
| Profit target | 800 USDT (8%) |
| Max daily loss | 500 USDT (5%) |
| Max total loss | 1,000 USDT (10%) |
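The constraints above reduce to simple equity checks. A minimal sketch, assuming equity is marked to market in USDT; the function name and state labels are illustrative, not from the challenge's actual implementation:

```python
# Stage 1 limits from the table above. All thresholds are fractions of
# the starting 10,000 USDT account.
ACCOUNT = 10_000.0     # USDT
PROFIT_TARGET = 0.08   # 8%  -> 800 USDT
MAX_DAILY_LOSS = 0.05  # 5%  -> 500 USDT
MAX_TOTAL_LOSS = 0.10  # 10% -> 1,000 USDT

def check_limits(equity: float, day_start_equity: float) -> str:
    """Classify account state against the Stage 1 rules (illustrative)."""
    total_dd = (ACCOUNT - equity) / ACCOUNT
    daily_dd = (day_start_equity - equity) / ACCOUNT
    if equity >= ACCOUNT * (1 + PROFIT_TARGET):
        return "passed"
    if total_dd >= MAX_TOTAL_LOSS:
        return "failed_total"
    if daily_dd >= MAX_DAILY_LOSS:
        return "failed_daily"
    return "active"
```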

Model Architecture

| Parameter | Value |
|---|---|
| Type | Decision Transformer (causal) |
| Hidden dim | 128 |
| Attention heads | 4 |
| Transformer layers | 3 |
| Context length | 20 timesteps |
| Parameters | ~635K |
| Actions | HOLD (0), LONG (1), SHORT (2), CLOSE (3) |

Training

  • Data: 23 crypto pairs × 800 4H candles from OKX (~6 months)
  • Pairs: BTC, ETH, SOL, BNB, XRP, ADA, DOT, LINK, AVAX, NEAR, SUI, TON, ATOM, APT, ARB, OP, TRX, UNI, LTC, IMX, ONDO, ICP, FET
  • Expert trajectories: 150 episodes with enriched rule-based policy (trend pullback, RSI at S/R, BB squeeze, MACD crossover)
  • Action distribution: HOLD=21,507 | LONG=1,097 | SHORT=1,604 | CLOSE=2,576
  • Training accuracy: 89.4% | Validation accuracy: 85.8%
  • Trade action accuracy: 98.8%
  • Composite score: 0.936
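The action distribution above is heavily skewed toward HOLD (~80% of labels). The training recipe is not published here, but one standard mitigation this distribution suggests is inverse-frequency class weighting in the loss, sketched below as an assumption rather than the actual setup:

```python
import torch

# Label counts from the training section: HOLD, LONG, SHORT, CLOSE
counts = torch.tensor([21507.0, 1097.0, 1604.0, 2576.0])

# Inverse-frequency weights: rare actions (LONG) get the largest weight
weights = counts.sum() / (len(counts) * counts)
weights = weights / weights.mean()  # rescale so the average weight is 1

# Weighted cross-entropy over the 4 action classes
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```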

Input Features (39 dimensions)

| Index | Description |
|---|---|
| 0-4 | Price returns (1/6/18/42 bar log returns + candle body ratio) |
| 5-8 | Volatility (14/42 bar rolling std, normalized ATR, vol ratio) |
| 9-14 | Moving averages (EMA 9/21/50 distance, slopes, alignment signal) |
| 15-17 | RSI (normalized [-1,1], zone signal, divergence) |
| 18-20 | MACD (normalized line, histogram, crossover signal) |
| 21-24 | Bollinger Bands (%B position, width, squeeze ratio, momentum) |
| 25-28 | Volume (ratio to 20-MA, log ratio, trend, price-volume divergence) |
| 29-31 | Support/Resistance (distance to nearest high/low, range position) |
| 32-34 | Market regime (trend strength, mean reversion z-score, composite) |
| 35-38 | Portfolio state (position flag, PnL ratio, daily PnL, drawdown) |
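To make the feature table concrete, here is a hedged sketch of two entries: the 1-bar log return (index 0) and RSI rescaled to [-1, 1] (index 15). Window choices follow the table; the exact formulas used in the training pipeline may differ (e.g. Wilder smoothing for RSI instead of a simple rolling mean):

```python
import numpy as np

def log_return(close: np.ndarray, lag: int = 1) -> np.ndarray:
    """Lagged log return; first `lag` entries are zero-padded."""
    r = np.zeros_like(close)
    r[lag:] = np.log(close[lag:] / close[:-lag])
    return r

def rsi_normalized(close: np.ndarray, period: int = 14) -> np.ndarray:
    """RSI mapped from [0, 100] to [-1, 1], as in feature index 15."""
    delta = np.diff(close, prepend=close[0])
    gain = np.where(delta > 0, delta, 0.0)
    loss = np.where(delta < 0, -delta, 0.0)
    # Simple rolling means; Wilder smoothing is another common choice
    kernel = np.ones(period) / period
    avg_gain = np.convolve(gain, kernel, mode="full")[:len(close)]
    avg_loss = np.convolve(loss, kernel, mode="full")[:len(close)]
    rs = avg_gain / np.maximum(avg_loss, 1e-12)
    rsi = 100.0 - 100.0 / (1.0 + rs)
    return rsi / 50.0 - 1.0  # [0, 100] -> [-1, 1]
```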

Reward Shaping

  • Base: realized + unrealized PnL (normalized)
  • Penalty at 3% daily drawdown (approaching 5% limit)
  • Penalty at 6% total drawdown (approaching 10% limit)
  • Progress bonus toward 8% profit target
  • Small HOLD cost to encourage decision-making
  • Terminal: +5.0 for hitting target, -3.0 for daily loss breach, -5.0 for total loss breach
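A minimal sketch of this shaping scheme follows. The thresholds (3%/6%) and terminal values (+5.0/-3.0/-5.0) come from the list above; the penalty slopes, progress-bonus scale, and HOLD cost magnitude are assumptions:

```python
def shaped_reward(pnl_norm, daily_dd, total_dd, progress, held, terminal=None):
    """Per-step reward sketch. `progress` is fraction of the 8% target reached;
    `terminal` is None, 'target', 'daily_breach', or 'total_breach'."""
    r = pnl_norm                       # base: realized + unrealized PnL
    if daily_dd > 0.03:                # approaching the 5% daily limit
        r -= 0.5 * (daily_dd - 0.03)   # assumed penalty slope
    if total_dd > 0.06:                # approaching the 10% total limit
        r -= 0.5 * (total_dd - 0.06)   # assumed penalty slope
    r += 0.1 * progress                # assumed progress-bonus scale
    if held:
        r -= 0.001                     # small HOLD cost (assumed magnitude)
    if terminal == "target":
        r += 5.0
    elif terminal == "daily_breach":
        r -= 3.0
    elif terminal == "total_breach":
        r -= 5.0
    return r
```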

Usage

```python
import torch
import torch.nn as nn
import json

# Load config (keys must match the DecisionTransformer constructor below)
with open("config.json") as f:
    cfg = json.load(f)

# Define architecture (must match training)
class DTBlock(nn.Module):
    def __init__(self, h, nh, do):
        super().__init__()
        self.attn = nn.MultiheadAttention(h, nh, dropout=do, batch_first=True)
        self.ln1 = nn.LayerNorm(h)
        self.ln2 = nn.LayerNorm(h)
        self.ffn = nn.Sequential(
            nn.Linear(h, h*4), nn.GELU(), nn.Dropout(do),
            nn.Linear(h*4, h), nn.Dropout(do)
        )
    def forward(self, x, mask=None):
        n = self.ln1(x)
        a, _ = self.attn(n, n, n, attn_mask=mask)
        x = x + a
        return x + self.ffn(self.ln2(x))

class DecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden_dim, n_heads, n_layers, max_ep_len, seq_len, dropout=0.0):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.state_embed = nn.Linear(state_dim, hidden_dim)
        self.action_embed = nn.Embedding(act_dim, hidden_dim)
        self.return_embed = nn.Linear(1, hidden_dim)
        self.timestep_embed = nn.Embedding(max_ep_len, hidden_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, 3*seq_len, hidden_dim))
        self.blocks = nn.ModuleList([DTBlock(hidden_dim, n_heads, dropout) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(hidden_dim)
        self.action_head = nn.Linear(hidden_dim, act_dim)

    def forward(self, states, actions, returns_to_go, timesteps):
        B, T = states.shape[:2]
        te = self.timestep_embed(timesteps)
        se = self.state_embed(states) + te
        ae = self.action_embed(actions) + te
        re = self.return_embed(returns_to_go.unsqueeze(-1)) + te
        # Interleave tokens as (return, state, action) triplets
        tok = torch.zeros(B, 3*T, self.hidden_dim, device=states.device)
        tok[:, 0::3] = re
        tok[:, 1::3] = se
        tok[:, 2::3] = ae
        tok = tok + self.pos_embed[:, :3*T]
        # Causal mask: True above the diagonal = position is masked
        mask = torch.triu(torch.ones(3*T, 3*T, device=states.device), diagonal=1).bool()
        x = tok
        for block in self.blocks:
            x = block(x, mask)
        # Predict actions from the state-token positions
        return self.action_head(self.ln_f(x[:, 1::3]))

# Load model
model = DecisionTransformer(**cfg)
model.load_state_dict(torch.load("decision_transformer_v3.pt", map_location="cpu"))
model.eval()

# Inference: predict the action for the last timestep.
# Shapes: states (1, seq_len, 39), actions (1, seq_len) long,
# returns_to_go (1, seq_len), timesteps (1, seq_len) long.
# Placeholder inputs shown below; replace with real market features.
seq_len = cfg["seq_len"]
states = torch.zeros(1, seq_len, 39)
actions = torch.zeros(1, seq_len, dtype=torch.long)
returns_to_go = torch.full((1, seq_len), 0.08)  # condition on the 8% target
timesteps = torch.arange(seq_len).unsqueeze(0)

with torch.no_grad():
    logits = model(states, actions, returns_to_go, timesteps)
    probs = torch.softmax(logits[0, -1], dim=-1)
    action = torch.argmax(probs).item()
    # 0=HOLD, 1=LONG, 2=SHORT, 3=CLOSE
```
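For live use across multiple steps, the 20-step context has to be rolled forward: append the newest (state, action) pair, drop the oldest, and decrement the return-to-go by each step's realized reward. This follows the standard Decision Transformer rollout recipe; `roll_context` and the scalar `step_reward` are illustrative names, not part of this repository:

```python
import torch

def roll_context(states, actions, rtg, timesteps, new_state, new_action, step_reward):
    """Shift the fixed-length context window by one step.

    states: (1, T, 39), actions/timesteps: (1, T) long, rtg: (1, T) float.
    """
    states = torch.cat([states[:, 1:], new_state.view(1, 1, -1)], dim=1)
    actions = torch.cat([actions[:, 1:], torch.tensor([[new_action]])], dim=1)
    # Standard DT recipe: subtract the realized reward from the last RTG
    rtg = torch.cat([rtg[:, 1:], rtg[:, -1:] - step_reward], dim=1)
    timesteps = torch.cat([timesteps[:, 1:], timesteps[:, -1:] + 1], dim=1)
    return states, actions, rtg, timesteps
```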

Integration

This model is used as a confidence overlay in a live signal scanner. Technical analysis generates candidate signals, then the DT provides:

  • Confirmation (upgrades signal strength) when it agrees
  • Caution (downgrades signal strength) when it disagrees

Risk per trade is capped at 40 USDT (0.4%) with a personal daily stop of 200 USDT.
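One way the confirmation/caution logic could look, assuming the scanner expresses signal strength as a 0-1 score; the 0.5 agreement threshold and the upgrade/downgrade factors here are illustrative assumptions, not the scanner's actual values:

```python
def overlay(ta_direction, probs, strength):
    """Adjust a TA signal's strength using DT action probabilities.

    ta_direction: 'LONG' or 'SHORT'; probs: dict of softmax probs per action;
    strength: candidate signal strength in [0, 1].
    """
    opposite = "SHORT" if ta_direction == "LONG" else "LONG"
    if probs.get(ta_direction, 0.0) > 0.5:
        return min(1.0, strength * 1.25)  # confirmation: upgrade
    if probs.get(opposite, 0.0) > 0.5:
        return strength * 0.5             # caution: downgrade
    return strength                       # no strong opinion: unchanged
```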

Disclaimer

This model is for educational and research purposes. Not financial advice. Past performance does not guarantee future results. Always use proper risk management.
