NanoChat

This repository contains the saved model artifacts from a successful NanoChat training run completed on Nebius using H100 GPUs.

Companion repository

The codebase, report, and supporting project documentation for this run live in the companion GitHub repository:

github.com/angy255/nanochat-d24-nebius

Overview

This model is the result of a complete NanoChat training run that was executed, validated, and preserved. The weights were trained with a minimal GPT-style training pipeline based on Andrej Karpathy's NanoChat project, whose goal is lightweight, reproducible language model training and inference. After training finished, I launched the browser-based chat interface, confirmed the model loaded correctly, and backed up the final checkpoint files. Note that the tokenizer and token byte files were not saved in this run.

Ultimately, this project was more than just training a model: it covered the full workflow from setup and execution to validation and preservation of the final artifacts. That makes the run particularly meaningful, as it is a cleaner and more complete outcome than an earlier attempt in which training stopped abruptly halfway through.

The training process followed the NanoChat setup and speedrun workflow, including:

  • Model initialization and configuration
  • Tokenization and dataset preparation
  • Training loop with gradient updates
  • Checkpointing and weight export
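As a rough illustration of the four steps above, here is a minimal PyTorch sketch. It is not the actual NanoChat pipeline: a toy embedding plus LM head stands in for the GPT-style Transformer, random token ids stand in for the real dataset, and the filename `toy_ckpt.pt` is purely illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, dim, seq_len = 64, 32, 16

# Model initialization and configuration (toy stand-in for a Transformer)
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Tokenization and dataset preparation (random stand-in data)
tokens = torch.randint(0, vocab_size, (8, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]

# Training loop with gradient updates
losses = []
for step in range(20):
    logits = model(inputs)  # shape: (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Checkpointing and weight export
torch.save({"model": model.state_dict(), "step": len(losses)}, "toy_ckpt.pt")
```

The real run differs in scale, not in shape: the same initialize/prepare/train/checkpoint cycle produced the files listed below.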

Model Details

  • Architecture: GPT-style Transformer
  • Framework: PyTorch
  • Training method: NanoChat training pipeline
  • Tokenization: character-level or subword-based, depending on the setup (tokenizer files not included in this repo)

Files Included

  • model_000483.pt – main model weights
  • meta_000483.json – training metadata (config, step info, etc.)
  • optim_000483_rank*.pt – optimizer shards (used for resuming training)
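A hedged sketch of loading the main weights file with plain PyTorch. The exact layout inside `model_000483.pt` (raw state_dict vs. wrapper dict) depends on the NanoChat version that wrote it, so inspect the top-level keys before assuming a structure; the helper name `load_checkpoint` is mine, not part of NanoChat.

```python
import os
import torch

def load_checkpoint(path: str):
    """Load a checkpoint file onto CPU, or return None if it is missing."""
    if not os.path.exists(path):
        return None
    # map_location="cpu" lets the weights load on a machine without a GPU.
    return torch.load(path, map_location="cpu")

ckpt = load_checkpoint("model_000483.pt")  # main weights file from this repo
if isinstance(ckpt, dict):
    # Inspect the top-level keys before assuming a layout.
    print(list(ckpt)[:5])
```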

Usage Note

This repo does NOT include tokenizer files.

That means:

  • You must re-create the same tokenizer setup used during training
  • NanoChat typically uses character-level tokenization or a simple dataset-specific encoding

Since the tokenizer was not saved, loading the model requires either re-creating the tokenizer or using compatible defaults from the original NanoChat setup.
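For the character-level case mentioned above, re-creating the tokenizer can look like the sketch below. This is an assumption-laden illustration, not the tokenizer used in this run: the vocabulary (here derived from a placeholder corpus string) must match the one used during training, or the loaded weights will map token ids to the wrong symbols.

```python
class CharTokenizer:
    """Minimal character-level tokenizer sketch (illustrative only)."""

    def __init__(self, corpus: str):
        # Sorted, deduplicated vocabulary so ids are reproducible across runs.
        self.chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)

# Round-trip check: decode(encode(x)) must return x.
tok = CharTokenizer("hello world")
assert tok.decode(tok.encode("hello")) == "hello"
```

The key property to preserve is the id assignment: rebuild the vocabulary from the same source (or the same defaults) as the original training run.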

Training Context

The run was executed on a Nebius GPU VM using the official NanoChat speedrun workflow.

Intended Use

  • Experimentation with small language models
  • Fine-tuning and research