CharlesDDDD/looped_window_600M_full_512_256_128_weighted_loss
Looped transformer checkpoint collection
Note: looped_window_600M, full -> SWA=512 -> SWA=256 -> SWA=128, weighted loss. 9 checkpoints: 10B-90B tokens.
Note: looped_window 600M, bookend:2, looped4, window=128. 10 checkpoints: 10B-100B tokens.
Note: looped_window 600M, 4:1 ratio, looped1, window=128. 10 checkpoints: 10B-100B tokens.
Note: looped_transformer 600M, pure full attention, looped4 (loop_count=4). 10 checkpoints: 10B-100B tokens.
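
The notes above mention window sizes, loop counts, and checkpoint ranges without spelling out what they mean. The sketch below is one plausible reading, not the authors' code. It assumes that SWA stands for causal sliding-window attention where a query sees its own position plus the preceding `window - 1` positions, that "looped4" means the same block is applied four times in sequence, and that checkpoints are evenly spaced at 10B tokens. All function names are hypothetical.

```python
def sliding_window_causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len boolean mask; True means query i may attend to key j.

    window=None gives full causal attention. Otherwise each query sees at most
    `window` positions, itself included (an assumed convention).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            in_window = window is None or i - j < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def apply_looped(block, x, loop_count):
    """Apply the same block loop_count times (assumed meaning of loop_count=4)."""
    for _ in range(loop_count):
        x = block(x)
    return x


def checkpoint_token_counts(num_ckpts, step_billion=10):
    """Token counts in billions, assuming evenly spaced checkpoints."""
    return [step_billion * k for k in range(1, num_ckpts + 1)]
```

Under these assumptions, `checkpoint_token_counts(9)` ends at 90 and `checkpoint_token_counts(10)` ends at 100, matching the ranges in the notes. Passing `window=512`, `256`, or `128` to `sliding_window_causal_mask` gives progressively narrower attention, while `window=None` corresponds to the "full" setting.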