| 2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- Evaluation Results | |
| 2025-08-31 17:03:52 - pico-train - INFO - └── paloma: inf | |
| 2025-08-31 17:03:52 - pico-train - INFO - ================================================== | |
| 2025-08-31 17:03:52 - pico-train - INFO - ✨ Training Configuration | |
| 2025-08-31 17:03:52 - pico-train - INFO - ================================================== | |
| 2025-08-31 17:03:52 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ checkpointing:                                      │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   evaluation:                                       │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     eval_results_dir: eval_results                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   hf_checkpoint:                                    │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     collection_slug: null                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics:                                │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     eval_data: null                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     layer_suffixes:                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     - attention.v_proj                              │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     - attention.o_proj                              │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     - swiglu.w_2                                    │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     sequence_idx: -1                                │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   logs_dir: logs                                    │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma250M-v1          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   runs_dir: runs                                    │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   save_every_n_steps: 2000                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   save_to_hf: false                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   training:                                         │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     auto_resume: true                               │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ data:                                               │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   dataloader:                                       │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 16                                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   dataset:                                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     name: pico-lm/pretokenized-dolma                │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   tokenizer:                                        │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     vocab_size: 50304                               │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ evaluation:                                         │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   metrics:                                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   - paloma                                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   paloma:                                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     batch_size: 1                                   │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     dataset_split: val                              │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     max_length: 2048                                │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ model:                                              │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   activation_hidden_dim: 384                        │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_heads: 12                             │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   attention_n_kv_heads: 4                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   batch_size: 1024                                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   d_model: 96                                       │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   max_seq_len: 2048                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   model_type: pico_decoder                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   n_layers: 12                                      │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   position_emb_theta: 10000.0                       │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   vocab_size: 50304                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ monitoring:                                         │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   logging:                                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     log_every_n_steps: 100                          │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     log_level: INFO                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   save_to_wandb: false                              │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   wandb:                                            │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     entity: boymyc                                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     project: pico-decoder-tiny                      │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │ training:                                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   fabric:                                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     accelerator: cuda                               │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     num_devices: 1                                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     num_nodes: 1                                    │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     precision: bf16-mixed                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   max_steps: 100000                                 │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │   optimization:                                     │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     lr: 0.0002                                      │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     lr_scheduler: cosine                            │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     lr_warmup_steps: 2000                           │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │     optimizer: adamw                                │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - │                                                     │ | |
| 2025-08-31 17:03:52 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯ | |
| 2025-08-31 17:03:52 - pico-train - INFO - ================================================== | |
| 2025-08-31 17:03:52 - pico-train - INFO - Runtime Summary: | |
| 2025-08-31 17:03:52 - pico-train - INFO - ================================================== | |
| 2025-08-31 17:03:52 - pico-train - INFO - Starting from step: 100000 | |
| 2025-08-31 17:03:52 - pico-train - INFO - Model Setup: | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Total Parameters: 11,282,784 | |
| 2025-08-31 17:03:52 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 | |
| 2025-08-31 17:03:52 - pico-train - INFO - Distributed Setup: | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Number of Devices: 1 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Device Type: NVIDIA H100 80GB HBM3 | |
| 2025-08-31 17:03:52 - pico-train - INFO - └─ Available Memory: 85.03 GB | |
| 2025-08-31 17:03:52 - pico-train - INFO - Software Setup: | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Python Version: 3.12.3 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ CUDA Version: 12.8 | |
| 2025-08-31 17:03:52 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic | |
| 2025-08-31 17:03:52 - pico-train - INFO - Batch Size Configuration: | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Global Batch Size: 16 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├─ Per Device Batch Size: 16 | |
| 2025-08-31 17:03:52 - pico-train - INFO - └─ Gradient Accumulation Steps: 1 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ================================================== | |
| 2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- Training Metrics | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├── Loss: 4.9432 | |
| 2025-08-31 17:03:52 - pico-train - INFO - ├── Learning Rate: 2.00e-05 | |
| 2025-08-31 17:03:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:03:52 - pico-train - INFO - Step 100000 -- Saving Learning Dynamics | |
| 2025-08-31 17:04:49 - pico-train - INFO - Step 100100 -- Training Metrics | |
| 2025-08-31 17:04:49 - pico-train - INFO - ├── Loss: 4.7703 | |
| 2025-08-31 17:04:49 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:04:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:05:43 - pico-train - INFO - Step 100200 -- Training Metrics | |
| 2025-08-31 17:05:43 - pico-train - INFO - ├── Loss: 4.8047 | |
| 2025-08-31 17:05:43 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:05:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:06:37 - pico-train - INFO - Step 100300 -- Training Metrics | |
| 2025-08-31 17:06:37 - pico-train - INFO - ├── Loss: 4.8076 | |
| 2025-08-31 17:06:37 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:06:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:07:31 - pico-train - INFO - Step 100400 -- Training Metrics | |
| 2025-08-31 17:07:31 - pico-train - INFO - ├── Loss: 4.7926 | |
| 2025-08-31 17:07:31 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:07:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:08:25 - pico-train - INFO - Step 100500 -- Training Metrics | |
| 2025-08-31 17:08:25 - pico-train - INFO - ├── Loss: 4.8059 | |
| 2025-08-31 17:08:25 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:08:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:09:19 - pico-train - INFO - Step 100600 -- Training Metrics | |
| 2025-08-31 17:09:19 - pico-train - INFO - ├── Loss: 4.7896 | |
| 2025-08-31 17:09:19 - pico-train - INFO - ├── Learning Rate: 1.01e-04 | |
| 2025-08-31 17:09:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:10:12 - pico-train - INFO - Step 100700 -- Training Metrics | |
| 2025-08-31 17:10:12 - pico-train - INFO - ├── Loss: 4.8066 | |
| 2025-08-31 17:10:12 - pico-train - INFO - ├── Learning Rate: 1.00e-04 | |
| 2025-08-31 17:10:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:11:07 - pico-train - INFO - Step 100800 -- Training Metrics | |
| 2025-08-31 17:11:07 - pico-train - INFO - ├── Loss: 4.7870 | |
| 2025-08-31 17:11:07 - pico-train - INFO - ├── Learning Rate: 1.00e-04 | |
| 2025-08-31 17:11:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:12:01 - pico-train - INFO - Step 100900 -- Training Metrics | |
| 2025-08-31 17:12:01 - pico-train - INFO - ├── Loss: 4.7958 | |
| 2025-08-31 17:12:01 - pico-train - INFO - ├── Learning Rate: 1.00e-04 | |
| 2025-08-31 17:12:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:12:55 - pico-train - INFO - Step 101000 -- Training Metrics | |
| 2025-08-31 17:12:55 - pico-train - INFO - ├── Loss: 4.8081 | |
| 2025-08-31 17:12:55 - pico-train - INFO - ├── Learning Rate: 1.00e-04 | |
| 2025-08-31 17:12:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:13:48 - pico-train - INFO - Step 101100 -- Training Metrics | |
| 2025-08-31 17:13:48 - pico-train - INFO - ├── Loss: 4.8023 | |
| 2025-08-31 17:13:48 - pico-train - INFO - ├── Learning Rate: 9.98e-05 | |
| 2025-08-31 17:13:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:14:43 - pico-train - INFO - Step 101200 -- Training Metrics | |
| 2025-08-31 17:14:43 - pico-train - INFO - ├── Loss: 4.7830 | |
| 2025-08-31 17:14:43 - pico-train - INFO - ├── Learning Rate: 9.97e-05 | |
| 2025-08-31 17:14:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:15:38 - pico-train - INFO - Step 101300 -- Training Metrics | |
| 2025-08-31 17:15:38 - pico-train - INFO - ├── Loss: 4.8071 | |
| 2025-08-31 17:15:38 - pico-train - INFO - ├── Learning Rate: 9.95e-05 | |
| 2025-08-31 17:15:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:16:32 - pico-train - INFO - Step 101400 -- Training Metrics | |
| 2025-08-31 17:16:32 - pico-train - INFO - ├── Loss: 4.8072 | |
| 2025-08-31 17:16:32 - pico-train - INFO - ├── Learning Rate: 9.94e-05 | |
| 2025-08-31 17:16:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:17:27 - pico-train - INFO - Step 101500 -- Training Metrics | |
| 2025-08-31 17:17:27 - pico-train - INFO - ├── Loss: 4.8027 | |
| 2025-08-31 17:17:27 - pico-train - INFO - ├── Learning Rate: 9.92e-05 | |
| 2025-08-31 17:17:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:18:20 - pico-train - INFO - Step 101600 -- Training Metrics | |
| 2025-08-31 17:18:20 - pico-train - INFO - ├── Loss: 4.7874 | |
| 2025-08-31 17:18:20 - pico-train - INFO - ├── Learning Rate: 9.90e-05 | |
| 2025-08-31 17:18:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:19:15 - pico-train - INFO - Step 101700 -- Training Metrics | |
| 2025-08-31 17:19:15 - pico-train - INFO - ├── Loss: 4.7817 | |
| 2025-08-31 17:19:15 - pico-train - INFO - ├── Learning Rate: 9.89e-05 | |
| 2025-08-31 17:19:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:20:09 - pico-train - INFO - Step 101800 -- Training Metrics | |
| 2025-08-31 17:20:09 - pico-train - INFO - ├── Loss: 4.8188 | |
| 2025-08-31 17:20:09 - pico-train - INFO - ├── Learning Rate: 9.87e-05 | |
| 2025-08-31 17:20:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:21:04 - pico-train - INFO - Step 101900 -- Training Metrics | |
| 2025-08-31 17:21:04 - pico-train - INFO - ├── Loss: 4.7880 | |
| 2025-08-31 17:21:04 - pico-train - INFO - ├── Learning Rate: 9.86e-05 | |
| 2025-08-31 17:21:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 17:21:58 - pico-train - INFO - Step 102000 -- 💾 Saving Checkpoint | |
| 2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- Evaluation Results | |
| 2025-08-31 18:00:17 - pico-train - INFO - └── paloma: inf | |
| 2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- Training Metrics | |
| 2025-08-31 18:00:17 - pico-train - INFO - ├── Loss: 4.8055 | |
| 2025-08-31 18:00:17 - pico-train - INFO - ├── Learning Rate: 9.84e-05 | |
| 2025-08-31 18:00:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:00:17 - pico-train - INFO - Step 102000 -- Saving Learning Dynamics | |
| 2025-08-31 18:01:13 - pico-train - INFO - Step 102100 -- Training Metrics | |
| 2025-08-31 18:01:13 - pico-train - INFO - ├── Loss: 4.7742 | |
| 2025-08-31 18:01:13 - pico-train - INFO - ├── Learning Rate: 9.83e-05 | |
| 2025-08-31 18:01:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:02:07 - pico-train - INFO - Step 102200 -- Training Metrics | |
| 2025-08-31 18:02:07 - pico-train - INFO - ├── Loss: 4.8050 | |
| 2025-08-31 18:02:07 - pico-train - INFO - ├── Learning Rate: 9.81e-05 | |
| 2025-08-31 18:02:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:03:01 - pico-train - INFO - Step 102300 -- Training Metrics | |
| 2025-08-31 18:03:01 - pico-train - INFO - ├── Loss: 4.8066 | |
| 2025-08-31 18:03:01 - pico-train - INFO - ├── Learning Rate: 9.79e-05 | |
| 2025-08-31 18:03:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:03:57 - pico-train - INFO - Step 102400 -- Training Metrics | |
| 2025-08-31 18:03:57 - pico-train - INFO - ├── Loss: 4.7865 | |
| 2025-08-31 18:03:57 - pico-train - INFO - ├── Learning Rate: 9.78e-05 | |
| 2025-08-31 18:03:57 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:04:50 - pico-train - INFO - Step 102500 -- Training Metrics | |
| 2025-08-31 18:04:50 - pico-train - INFO - ├── Loss: 4.8019 | |
| 2025-08-31 18:04:50 - pico-train - INFO - ├── Learning Rate: 9.76e-05 | |
| 2025-08-31 18:04:50 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:05:45 - pico-train - INFO - Step 102600 -- Training Metrics | |
| 2025-08-31 18:05:45 - pico-train - INFO - ├── Loss: 4.7948 | |
| 2025-08-31 18:05:45 - pico-train - INFO - ├── Learning Rate: 9.75e-05 | |
| 2025-08-31 18:05:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:06:39 - pico-train - INFO - Step 102700 -- Training Metrics | |
| 2025-08-31 18:06:39 - pico-train - INFO - ├── Loss: 4.8006 | |
| 2025-08-31 18:06:39 - pico-train - INFO - ├── Learning Rate: 9.73e-05 | |
| 2025-08-31 18:06:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:07:33 - pico-train - INFO - Step 102800 -- Training Metrics | |
| 2025-08-31 18:07:33 - pico-train - INFO - ├── Loss: 4.8049 | |
| 2025-08-31 18:07:33 - pico-train - INFO - ├── Learning Rate: 9.71e-05 | |
| 2025-08-31 18:07:33 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:08:27 - pico-train - INFO - Step 102900 -- Training Metrics | |
| 2025-08-31 18:08:27 - pico-train - INFO - ├── Loss: 4.8086 | |
| 2025-08-31 18:08:27 - pico-train - INFO - ├── Learning Rate: 9.70e-05 | |
| 2025-08-31 18:08:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:09:21 - pico-train - INFO - Step 103000 -- Training Metrics | |
| 2025-08-31 18:09:21 - pico-train - INFO - ├── Loss: 4.8154 | |
| 2025-08-31 18:09:21 - pico-train - INFO - ├── Learning Rate: 9.68e-05 | |
| 2025-08-31 18:09:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:10:15 - pico-train - INFO - Step 103100 -- Training Metrics | |
| 2025-08-31 18:10:15 - pico-train - INFO - ├── Loss: 4.8232 | |
| 2025-08-31 18:10:15 - pico-train - INFO - ├── Learning Rate: 9.67e-05 | |
| 2025-08-31 18:10:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:11:10 - pico-train - INFO - Step 103200 -- Training Metrics | |
| 2025-08-31 18:11:10 - pico-train - INFO - ├── Loss: 4.8032 | |
| 2025-08-31 18:11:10 - pico-train - INFO - ├── Learning Rate: 9.65e-05 | |
| 2025-08-31 18:11:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:12:05 - pico-train - INFO - Step 103300 -- Training Metrics | |
| 2025-08-31 18:12:05 - pico-train - INFO - ├── Loss: 4.8157 | |
| 2025-08-31 18:12:05 - pico-train - INFO - ├── Learning Rate: 9.64e-05 | |
| 2025-08-31 18:12:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:13:00 - pico-train - INFO - Step 103400 -- Training Metrics | |
| 2025-08-31 18:13:00 - pico-train - INFO - ├── Loss: 4.7903 | |
| 2025-08-31 18:13:00 - pico-train - INFO - ├── Learning Rate: 9.62e-05 | |
| 2025-08-31 18:13:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:13:54 - pico-train - INFO - Step 103500 -- Training Metrics | |
| 2025-08-31 18:13:54 - pico-train - INFO - ├── Loss: 4.7786 | |
| 2025-08-31 18:13:54 - pico-train - INFO - ├── Learning Rate: 9.60e-05 | |
| 2025-08-31 18:13:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:14:48 - pico-train - INFO - Step 103600 -- Training Metrics | |
| 2025-08-31 18:14:48 - pico-train - INFO - ├── Loss: 4.7962 | |
| 2025-08-31 18:14:48 - pico-train - INFO - ├── Learning Rate: 9.59e-05 | |
| 2025-08-31 18:14:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:15:43 - pico-train - INFO - Step 103700 -- Training Metrics | |
| 2025-08-31 18:15:43 - pico-train - INFO - ├── Loss: 4.8097 | |
| 2025-08-31 18:15:43 - pico-train - INFO - ├── Learning Rate: 9.57e-05 | |
| 2025-08-31 18:15:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:16:37 - pico-train - INFO - Step 103800 -- Training Metrics | |
| 2025-08-31 18:16:37 - pico-train - INFO - ├── Loss: 4.7613 | |
| 2025-08-31 18:16:37 - pico-train - INFO - ├── Learning Rate: 9.56e-05 | |
| 2025-08-31 18:16:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:17:31 - pico-train - INFO - Step 103900 -- Training Metrics | |
| 2025-08-31 18:17:31 - pico-train - INFO - ├── Loss: 4.7992 | |
| 2025-08-31 18:17:31 - pico-train - INFO - ├── Learning Rate: 9.54e-05 | |
| 2025-08-31 18:17:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-31 18:18:25 - pico-train - INFO - Step 104000 -- 💾 Saving Checkpoint | |