2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 📊 Evaluation Results
2025-08-31 17:03:52 - pico-train - INFO - └── paloma: inf
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ✨ Training Configuration
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮
2025-08-31 17:03:52 - pico-train - INFO - │ checkpointing:                                      │
2025-08-31 17:03:52 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │
2025-08-31 17:03:52 - pico-train - INFO - │   evaluation:                                       │
2025-08-31 17:03:52 - pico-train - INFO - │     eval_results_dir: eval_results                  │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │
2025-08-31 17:03:52 - pico-train - INFO - │   hf_checkpoint:                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     collection_slug: null                           │
2025-08-31 17:03:52 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │
2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics:                                │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-31 17:03:52 - pico-train - INFO - │     eval_data: null                                 │
2025-08-31 17:03:52 - pico-train - INFO - │     layer_suffixes:                                 │
2025-08-31 17:03:52 - pico-train - INFO - │     - attention.v_proj                              │
2025-08-31 17:03:52 - pico-train - INFO - │     - attention.o_proj                              │
2025-08-31 17:03:52 - pico-train - INFO - │     - swiglu.w_2                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     sequence_idx: -1                                │
2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │
2025-08-31 17:03:52 - pico-train - INFO - │   logs_dir: logs                                    │
2025-08-31 17:03:52 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma250M-v1          │
2025-08-31 17:03:52 - pico-train - INFO - │   runs_dir: runs                                    │
2025-08-31 17:03:52 - pico-train - INFO - │   save_every_n_steps: 2000                          │
2025-08-31 17:03:52 - pico-train - INFO - │   save_to_hf: false                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   training:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │     auto_resume: true                               │
2025-08-31 17:03:52 - pico-train - INFO - │ data:                                               │
2025-08-31 17:03:52 - pico-train - INFO - │   dataloader:                                       │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 16                                  │
2025-08-31 17:03:52 - pico-train - INFO - │   dataset:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │
2025-08-31 17:03:52 - pico-train - INFO - │   tokenizer:                                        │
2025-08-31 17:03:52 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │
2025-08-31 17:03:52 - pico-train - INFO - │     vocab_size: 50304                               │
2025-08-31 17:03:52 - pico-train - INFO - │ evaluation:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │   metrics:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │   - paloma                                          │
2025-08-31 17:03:52 - pico-train - INFO - │   paloma:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │
2025-08-31 17:03:52 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │
2025-08-31 17:03:52 - pico-train - INFO - │     dataset_split: val                              │
2025-08-31 17:03:52 - pico-train - INFO - │     max_length: 2048                                │
2025-08-31 17:03:52 - pico-train - INFO - │ model:                                              │
2025-08-31 17:03:52 - pico-train - INFO - │   activation_hidden_dim: 384                        │
2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_heads: 12                             │
2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_kv_heads: 4                           │
2025-08-31 17:03:52 - pico-train - INFO - │   batch_size: 1024                                  │
2025-08-31 17:03:52 - pico-train - INFO - │   d_model: 96                                       │
2025-08-31 17:03:52 - pico-train - INFO - │   max_seq_len: 2048                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   model_type: pico_decoder                          │
2025-08-31 17:03:52 - pico-train - INFO - │   n_layers: 12                                      │
2025-08-31 17:03:52 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   position_emb_theta: 10000.0                       │
2025-08-31 17:03:52 - pico-train - INFO - │   vocab_size: 50304                                 │
2025-08-31 17:03:52 - pico-train - INFO - │ monitoring:                                         │
2025-08-31 17:03:52 - pico-train - INFO - │   logging:                                          │
2025-08-31 17:03:52 - pico-train - INFO - │     log_every_n_steps: 100                          │
2025-08-31 17:03:52 - pico-train - INFO - │     log_level: INFO                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   save_to_wandb: false                              │
2025-08-31 17:03:52 - pico-train - INFO - │   wandb:                                            │
2025-08-31 17:03:52 - pico-train - INFO - │     entity: boymyc                                  │
2025-08-31 17:03:52 - pico-train - INFO - │     project: pico-decoder-tiny                      │
2025-08-31 17:03:52 - pico-train - INFO - │ training:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │   fabric:                                           │
2025-08-31 17:03:52 - pico-train - INFO - │     accelerator: cuda                               │
2025-08-31 17:03:52 - pico-train - INFO - │     num_devices: 1                                  │
2025-08-31 17:03:52 - pico-train - INFO - │     num_nodes: 1                                    │
2025-08-31 17:03:52 - pico-train - INFO - │     precision: bf16-mixed                           │
2025-08-31 17:03:52 - pico-train - INFO - │   max_steps: 100000                                 │
2025-08-31 17:03:52 - pico-train - INFO - │   optimization:                                     │
2025-08-31 17:03:52 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │
2025-08-31 17:03:52 - pico-train - INFO - │     lr: 0.0002                                      │
2025-08-31 17:03:52 - pico-train - INFO - │     lr_scheduler: cosine                            │
2025-08-31 17:03:52 - pico-train - INFO - │     lr_warmup_steps: 2000                           │
2025-08-31 17:03:52 - pico-train - INFO - │     optimizer: adamw                                │
2025-08-31 17:03:52 - pico-train - INFO - │                                                     │
2025-08-31 17:03:52 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - ⛭ Runtime Summary:
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
2025-08-31 17:03:52 - pico-train - INFO - Starting from step: 100000
2025-08-31 17:03:52 - pico-train - INFO - Model Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Total Parameters: 11,282,784
2025-08-31 17:03:52 - pico-train - INFO - └─ Trainable Parameters: 11,282,784
2025-08-31 17:03:52 - pico-train - INFO - Distributed Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Number of Devices: 1
2025-08-31 17:03:52 - pico-train - INFO - └─ Device Type: NVIDIA H100 80GB HBM3
2025-08-31 17:03:52 - pico-train - INFO - └─ Available Memory: 85.03 GB
2025-08-31 17:03:52 - pico-train - INFO - Software Setup:
2025-08-31 17:03:52 - pico-train - INFO - └─ Python Version: 3.12.3
2025-08-31 17:03:52 - pico-train - INFO - └─ PyTorch Version: 2.8.0+cu128
2025-08-31 17:03:52 - pico-train - INFO - └─ CUDA Version: 12.8
2025-08-31 17:03:52 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic
2025-08-31 17:03:52 - pico-train - INFO - Batch Size Configuration:
2025-08-31 17:03:52 - pico-train - INFO - └─ Global Batch Size: 16
2025-08-31 17:03:52 - pico-train - INFO - └─ Per Device Batch Size: 16
2025-08-31 17:03:52 - pico-train - INFO - └─ Gradient Accumulation Steps: 1
2025-08-31 17:03:52 - pico-train - INFO - ==================================================
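The parameter total reported in the runtime summary can be cross-checked against the logged model settings. The sketch below is not taken from pico-train; `estimate_params` is a hypothetical helper, and it assumes a LLaMA-style decoder with no bias terms, untied input and output embeddings, grouped-query attention, a three-matrix SwiGLU block, and weight-only norms. Under those assumptions the arithmetic reproduces the logged 11,282,784.

```python
# Hypothetical helper (not part of pico-train): estimate the parameter count
# of a LLaMA-style decoder from the values printed in the training config.
# Assumptions: no biases, untied embeddings, weight-only norm layers.

def estimate_params(d_model, n_layers, n_heads, n_kv_heads, hidden_dim, vocab_size):
    head_dim = d_model // n_heads          # 96 // 12 = 8
    kv_dim = n_kv_heads * head_dim         # 4 * 8 = 32 (grouped-query attention)
    attention = (
        d_model * d_model                  # q_proj
        + 2 * d_model * kv_dim             # k_proj and v_proj
        + d_model * d_model                # o_proj
    )
    swiglu = 3 * d_model * hidden_dim      # two up projections plus one down projection
    norms = 2 * d_model                    # one norm before attention, one before the MLP
    per_layer = attention + swiglu + norms
    embeddings = 2 * vocab_size * d_model  # input embedding plus output projection
    final_norm = d_model
    return embeddings + n_layers * per_layer + final_norm


total = estimate_params(
    d_model=96, n_layers=12, n_heads=12, n_kv_heads=4,
    hidden_dim=384, vocab_size=50304,
)
print(f"{total:,}")  # 11,282,784
```

Most of the budget sits in the two vocabulary-sized matrices (9,658,368 parameters); the twelve transformer layers contribute only 1,624,320.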
2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 🔄 Training Metrics
2025-08-31 17:03:52 - pico-train - INFO - ├── Loss: 4.9432
2025-08-31 17:03:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05
2025-08-31 17:03:52 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- 📈 Saving Learning Dynamics
2025-08-31 17:04:49 - pico-train - INFO - Step 100100 -- 🔄 Training Metrics
2025-08-31 17:04:49 - pico-train - INFO - ├── Loss: 4.7703
2025-08-31 17:04:49 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:04:49 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:05:43 - pico-train - INFO - Step 100200 -- 🔄 Training Metrics
2025-08-31 17:05:43 - pico-train - INFO - ├── Loss: 4.8047
2025-08-31 17:05:43 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:05:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:06:37 - pico-train - INFO - Step 100300 -- 🔄 Training Metrics
2025-08-31 17:06:37 - pico-train - INFO - ├── Loss: 4.8076
2025-08-31 17:06:37 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:06:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:07:31 - pico-train - INFO - Step 100400 -- 🔄 Training Metrics
2025-08-31 17:07:31 - pico-train - INFO - ├── Loss: 4.7926
2025-08-31 17:07:31 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:07:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:08:25 - pico-train - INFO - Step 100500 -- 🔄 Training Metrics
2025-08-31 17:08:25 - pico-train - INFO - ├── Loss: 4.8059
2025-08-31 17:08:25 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:08:25 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:09:19 - pico-train - INFO - Step 100600 -- 🔄 Training Metrics
2025-08-31 17:09:19 - pico-train - INFO - ├── Loss: 4.7896
2025-08-31 17:09:19 - pico-train - INFO - ├── Learning Rate: 1.01e-04
2025-08-31 17:09:19 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:10:12 - pico-train - INFO - Step 100700 -- 🔄 Training Metrics
2025-08-31 17:10:12 - pico-train - INFO - ├── Loss: 4.8066
2025-08-31 17:10:12 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:10:12 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:11:07 - pico-train - INFO - Step 100800 -- 🔄 Training Metrics
2025-08-31 17:11:07 - pico-train - INFO - ├── Loss: 4.7870
2025-08-31 17:11:07 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:11:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:12:01 - pico-train - INFO - Step 100900 -- 🔄 Training Metrics
2025-08-31 17:12:01 - pico-train - INFO - ├── Loss: 4.7958
2025-08-31 17:12:01 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:12:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:12:55 - pico-train - INFO - Step 101000 -- 🔄 Training Metrics
2025-08-31 17:12:55 - pico-train - INFO - ├── Loss: 4.8081
2025-08-31 17:12:55 - pico-train - INFO - ├── Learning Rate: 1.00e-04
2025-08-31 17:12:55 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:13:48 - pico-train - INFO - Step 101100 -- 🔄 Training Metrics
2025-08-31 17:13:48 - pico-train - INFO - ├── Loss: 4.8023
2025-08-31 17:13:48 - pico-train - INFO - ├── Learning Rate: 9.98e-05
2025-08-31 17:13:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:14:43 - pico-train - INFO - Step 101200 -- 🔄 Training Metrics
2025-08-31 17:14:43 - pico-train - INFO - ├── Loss: 4.7830
2025-08-31 17:14:43 - pico-train - INFO - ├── Learning Rate: 9.97e-05
2025-08-31 17:14:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:15:38 - pico-train - INFO - Step 101300 -- 🔄 Training Metrics
2025-08-31 17:15:38 - pico-train - INFO - ├── Loss: 4.8071
2025-08-31 17:15:38 - pico-train - INFO - ├── Learning Rate: 9.95e-05
2025-08-31 17:15:38 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:16:32 - pico-train - INFO - Step 101400 -- 🔄 Training Metrics
2025-08-31 17:16:32 - pico-train - INFO - ├── Loss: 4.8072
2025-08-31 17:16:32 - pico-train - INFO - ├── Learning Rate: 9.94e-05
2025-08-31 17:16:32 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:17:27 - pico-train - INFO - Step 101500 -- 🔄 Training Metrics
2025-08-31 17:17:27 - pico-train - INFO - ├── Loss: 4.8027
2025-08-31 17:17:27 - pico-train - INFO - ├── Learning Rate: 9.92e-05
2025-08-31 17:17:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:18:20 - pico-train - INFO - Step 101600 -- 🔄 Training Metrics
2025-08-31 17:18:20 - pico-train - INFO - ├── Loss: 4.7874
2025-08-31 17:18:20 - pico-train - INFO - ├── Learning Rate: 9.90e-05
2025-08-31 17:18:20 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:19:15 - pico-train - INFO - Step 101700 -- 🔄 Training Metrics
2025-08-31 17:19:15 - pico-train - INFO - ├── Loss: 4.7817
2025-08-31 17:19:15 - pico-train - INFO - ├── Learning Rate: 9.89e-05
2025-08-31 17:19:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:20:09 - pico-train - INFO - Step 101800 -- 🔄 Training Metrics
2025-08-31 17:20:09 - pico-train - INFO - ├── Loss: 4.8188
2025-08-31 17:20:09 - pico-train - INFO - ├── Learning Rate: 9.87e-05
2025-08-31 17:20:09 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:21:04 - pico-train - INFO - Step 101900 -- 🔄 Training Metrics
2025-08-31 17:21:04 - pico-train - INFO - ├── Loss: 4.7880
2025-08-31 17:21:04 - pico-train - INFO - ├── Learning Rate: 9.86e-05
2025-08-31 17:21:04 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 17:21:58 - pico-train - INFO - Step 102000 -- 💾 Saving Checkpoint
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 📊 Evaluation Results
2025-08-31 18:00:17 - pico-train - INFO - └── paloma: inf
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 🔄 Training Metrics
2025-08-31 18:00:17 - pico-train - INFO - ├── Loss: 4.8055
2025-08-31 18:00:17 - pico-train - INFO - ├── Learning Rate: 9.84e-05
2025-08-31 18:00:17 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- 📈 Saving Learning Dynamics
2025-08-31 18:01:13 - pico-train - INFO - Step 102100 -- 🔄 Training Metrics
2025-08-31 18:01:13 - pico-train - INFO - ├── Loss: 4.7742
2025-08-31 18:01:13 - pico-train - INFO - ├── Learning Rate: 9.83e-05
2025-08-31 18:01:13 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:02:07 - pico-train - INFO - Step 102200 -- 🔄 Training Metrics
2025-08-31 18:02:07 - pico-train - INFO - ├── Loss: 4.8050
2025-08-31 18:02:07 - pico-train - INFO - ├── Learning Rate: 9.81e-05
2025-08-31 18:02:07 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:03:01 - pico-train - INFO - Step 102300 -- 🔄 Training Metrics
2025-08-31 18:03:01 - pico-train - INFO - ├── Loss: 4.8066
2025-08-31 18:03:01 - pico-train - INFO - ├── Learning Rate: 9.79e-05
2025-08-31 18:03:01 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:03:57 - pico-train - INFO - Step 102400 -- 🔄 Training Metrics
2025-08-31 18:03:57 - pico-train - INFO - ├── Loss: 4.7865
2025-08-31 18:03:57 - pico-train - INFO - ├── Learning Rate: 9.78e-05
2025-08-31 18:03:57 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:04:50 - pico-train - INFO - Step 102500 -- 🔄 Training Metrics
2025-08-31 18:04:50 - pico-train - INFO - ├── Loss: 4.8019
2025-08-31 18:04:50 - pico-train - INFO - ├── Learning Rate: 9.76e-05
2025-08-31 18:04:50 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:05:45 - pico-train - INFO - Step 102600 -- 🔄 Training Metrics
2025-08-31 18:05:45 - pico-train - INFO - ├── Loss: 4.7948
2025-08-31 18:05:45 - pico-train - INFO - ├── Learning Rate: 9.75e-05
2025-08-31 18:05:45 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:06:39 - pico-train - INFO - Step 102700 -- 🔄 Training Metrics
2025-08-31 18:06:39 - pico-train - INFO - ├── Loss: 4.8006
2025-08-31 18:06:39 - pico-train - INFO - ├── Learning Rate: 9.73e-05
2025-08-31 18:06:39 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:07:33 - pico-train - INFO - Step 102800 -- 🔄 Training Metrics
2025-08-31 18:07:33 - pico-train - INFO - ├── Loss: 4.8049
2025-08-31 18:07:33 - pico-train - INFO - ├── Learning Rate: 9.71e-05
2025-08-31 18:07:33 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:08:27 - pico-train - INFO - Step 102900 -- 🔄 Training Metrics
2025-08-31 18:08:27 - pico-train - INFO - ├── Loss: 4.8086
2025-08-31 18:08:27 - pico-train - INFO - ├── Learning Rate: 9.70e-05
2025-08-31 18:08:27 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:09:21 - pico-train - INFO - Step 103000 -- 🔄 Training Metrics
2025-08-31 18:09:21 - pico-train - INFO - ├── Loss: 4.8154
2025-08-31 18:09:21 - pico-train - INFO - ├── Learning Rate: 9.68e-05
2025-08-31 18:09:21 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:10:15 - pico-train - INFO - Step 103100 -- 🔄 Training Metrics
2025-08-31 18:10:15 - pico-train - INFO - ├── Loss: 4.8232
2025-08-31 18:10:15 - pico-train - INFO - ├── Learning Rate: 9.67e-05
2025-08-31 18:10:15 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:11:10 - pico-train - INFO - Step 103200 -- 🔄 Training Metrics
2025-08-31 18:11:10 - pico-train - INFO - ├── Loss: 4.8032
2025-08-31 18:11:10 - pico-train - INFO - ├── Learning Rate: 9.65e-05
2025-08-31 18:11:10 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:12:05 - pico-train - INFO - Step 103300 -- 🔄 Training Metrics
2025-08-31 18:12:05 - pico-train - INFO - ├── Loss: 4.8157
2025-08-31 18:12:05 - pico-train - INFO - ├── Learning Rate: 9.64e-05
2025-08-31 18:12:05 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:13:00 - pico-train - INFO - Step 103400 -- 🔄 Training Metrics
2025-08-31 18:13:00 - pico-train - INFO - ├── Loss: 4.7903
2025-08-31 18:13:00 - pico-train - INFO - ├── Learning Rate: 9.62e-05
2025-08-31 18:13:00 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:13:54 - pico-train - INFO - Step 103500 -- 🔄 Training Metrics
2025-08-31 18:13:54 - pico-train - INFO - ├── Loss: 4.7786
2025-08-31 18:13:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05
2025-08-31 18:13:54 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:14:48 - pico-train - INFO - Step 103600 -- 🔄 Training Metrics
2025-08-31 18:14:48 - pico-train - INFO - ├── Loss: 4.7962
2025-08-31 18:14:48 - pico-train - INFO - ├── Learning Rate: 9.59e-05
2025-08-31 18:14:48 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:15:43 - pico-train - INFO - Step 103700 -- 🔄 Training Metrics
2025-08-31 18:15:43 - pico-train - INFO - ├── Loss: 4.8097
2025-08-31 18:15:43 - pico-train - INFO - ├── Learning Rate: 9.57e-05
2025-08-31 18:15:43 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:16:37 - pico-train - INFO - Step 103800 -- 🔄 Training Metrics
2025-08-31 18:16:37 - pico-train - INFO - ├── Loss: 4.7613
2025-08-31 18:16:37 - pico-train - INFO - ├── Learning Rate: 9.56e-05
2025-08-31 18:16:37 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:17:31 - pico-train - INFO - Step 103900 -- 🔄 Training Metrics
2025-08-31 18:17:31 - pico-train - INFO - ├── Loss: 4.7992
2025-08-31 18:17:31 - pico-train - INFO - ├── Learning Rate: 9.54e-05
2025-08-31 18:17:31 - pico-train - INFO - └── Inf/NaN count: 0
2025-08-31 18:18:25 - pico-train - INFO - Step 104000 -- 💾 Saving Checkpoint
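
The training-metrics entries follow a regular pattern, so loss and throughput can be pulled out of the log with a short script. The sketch below is illustrative only: `parse_metrics` and `tokens_per_second` are hypothetical helpers, the sample lines are abbreviated copies of two entries from this log, and the throughput figure assumes every step processes 16 sequences of exactly 2048 tokens (the logged global batch size and `max_seq_len`).

```python
import re
from datetime import datetime

# Matches the "<timestamp> - pico-train - INFO - <message>" layout used in this log.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - pico-train - INFO - (.*)$"
)
# The pattern skips whatever symbol sits between "--" and "Training Metrics".
STEP_RE = re.compile(r"Step (\d+) -- .*Training Metrics")
LOSS_RE = re.compile(r"Loss: ([0-9.]+)")


def parse_metrics(lines):
    """Return (timestamp, step, loss) tuples for each training-metrics entry."""
    records = []
    pending = None  # (timestamp, step) still waiting for its Loss line
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        step_match = STEP_RE.search(message)
        if step_match:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            pending = (when, int(step_match.group(1)))
            continue
        loss_match = LOSS_RE.search(message)
        if loss_match and pending is not None:
            records.append((*pending, float(loss_match.group(1))))
            pending = None
    return records


def tokens_per_second(earlier, later, batch_size=16, seq_len=2048):
    """Approximate throughput between two records, assuming full-length sequences."""
    seconds = (later[0] - earlier[0]).total_seconds()
    steps = later[1] - earlier[1]
    return steps * batch_size * seq_len / seconds


sample = [
    "2025-08-31 17:04:49 - pico-train - INFO - Step 100100 -- Training Metrics",
    "2025-08-31 17:04:49 - pico-train - INFO - Loss: 4.7703",
    "2025-08-31 17:05:43 - pico-train - INFO - Step 100200 -- Training Metrics",
    "2025-08-31 17:05:43 - pico-train - INFO - Loss: 4.8047",
]
records = parse_metrics(sample)
print(records[1][1], records[1][2])                    # 100200 4.8047
print(round(tokens_per_second(records[0], records[1])))  # 60681
```

Intervals that span a checkpoint and evaluation (for example step 101900 to step 102000, where the timestamps jump from 17:21 to 18:00) include time spent outside the training loop, so they are better excluded from a throughput estimate.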