A nano GPT-2-style causal language model trained on the TinyStories dataset (`roneneldan/TinyStories`), using double-double (~FP128) arithmetic in the forward pass.
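As background for the "~FP128" label: under the standard double-double conventions (not taken from this model's code), a value is stored as the unevaluated sum of two float64 words. The pair is normalised so that the high word is the correctly rounded value. The precision figures in the tables then follow from the 53-bit float64 significand:

$$
\begin{aligned}
x &= x_{\mathrm{hi}} + x_{\mathrm{lo}}, \qquad x_{\mathrm{hi}} = \operatorname{fl}(x_{\mathrm{hi}} + x_{\mathrm{lo}}), \qquad |x_{\mathrm{lo}}| \le \tfrac{1}{2}\operatorname{ulp}(x_{\mathrm{hi}}) \\
p_{\mathrm{dd}} &\approx 2 \times 53 = 106 \text{ bits} \\
p_{\mathrm{binary128}} - p_{\mathrm{dd}} &= 113 - 106 = 7 \text{ bits}
\end{aligned}
$$

The exponent range stays that of float64 (11-bit exponent), narrower than binary128's 15-bit exponent. "~FP128" therefore refers to significand precision only.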
| Hyper-parameter | Value |
|---|---|
| Embedding dim | 32 |
| Attention heads | 2 |
| Transformer layers | 2 |
| Context window | 64 tokens (characters) |
| Vocabulary | 82 (character-level, from TinyStories) |
| Parameters | 32,768 |

| Stage | Precision |
|---|---|
| Weight storage | float64 |
| Forward matmuls | ~106-bit significand (double-double with Veltkamp splitting) |
| Backward pass | float64 |
| Comparable to | IEEE binary128 (113-bit significand); double-double is 7 bits short |
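The forward-matmul row can be illustrated with the textbook error-free transformations. Veltkamp's split feeds Dekker's TwoProduct, and Knuth's TwoSum accumulates the exact products into a `(hi, lo)` pair. This is a minimal pure-Python sketch under those standard definitions, not the model's actual kernel. The function names (`two_sum`, `quick_two_sum`, `split`, `two_prod`, `dd_add`, `dd_dot`, `dd_matmul`) are hypothetical. Rounding each output entry back to float64 is also an assumption, inferred from the float64 storage and backward-pass rows.

```python
from __future__ import annotations

# Veltkamp splitting constant for IEEE float64 (53-bit significand): 2**27 + 1.
SPLITTER = 134217729.0


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a: float, b: float) -> tuple[float, float]:
    """Fast renormalisation; exact when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a: float) -> tuple[float, float]:
    """Veltkamp split: a == hi + lo exactly, each part fits in 26 bits.
    Assumes |a| < ~2**996 so SPLITTER * a cannot overflow."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and a * b == p + e exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def dd_add(xh: float, xl: float, yh: float, yl: float) -> tuple[float, float]:
    """Accurate double-double addition: TwoSum on both words, then renormalise."""
    s, e = two_sum(xh, yh)
    t, f = two_sum(xl, yl)
    e += t
    s, e = quick_two_sum(s, e)
    e += f
    return quick_two_sum(s, e)


def dd_dot(x: list[float], w: list[float]) -> tuple[float, float]:
    """Dot product of two float64 vectors with a double-double accumulator."""
    hi, lo = 0.0, 0.0
    for a, b in zip(x, w):
        p, e = two_prod(a, b)          # exact product as a (p, e) pair
        hi, lo = dd_add(hi, lo, p, e)  # ~106-bit running sum
    return hi, lo


def dd_matmul(X: list[list[float]], W: list[list[float]]) -> list[list[float]]:
    """X @ W with double-double accumulation; each entry is rounded once
    back to float64 by keeping only the high word."""
    cols = list(zip(*W))
    return [[dd_dot(row, list(col))[0] for col in cols] for row in X]
```

For example, `dd_dot([0.1] * 10, [1.0] * 10)` returns `(1.0, 2**-54)`. The high word is the correctly rounded sum, whereas adding `0.1` ten times in a plain float64 loop gives `0.9999999999999999`. Where a fused multiply-add is available (for example `math.fma` in Python 3.13+), the product error can be computed as `fma(a, b, -p)` without splitting.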