12 Commits

Author SHA1 Message Date
Kye Gomez
963e11277d tiny tests 2026-04-22 12:48:33 -04:00
Kye Gomez
7d78ebec79 flash attn 2026-04-22 12:15:37 -04:00
Kye Gomez
289981ba01 [bugf][act-halting][gate halted positions from weight accumulation][bugf][moe-router-bias][stop load balance bias gradient leak][bugf][act-cache-consistency][keep all loops with kv cache][bugf][lora-depth-extrapolation][clamp scale index beyond max loops][improvement][pyproject-version][bump version to 0 4 0] 2026-04-20 09:17:43 -04:00
Kye Gomez
18cca894dd [fix][rope Every decode token was stuck at position 0, so <q_decoded, k_cached> lost the (n - m) term entirely] 2026-04-20 08:19:14 -04:00
Kye Gomez
97bc414977 use gpt-oss tokenizer because it's a great tokenizer 2026-04-19 22:28:09 -04:00
Kye Gomez
5cfef742b5 [feat][dropout][add dropout to config attn and residuals][docs][datasets][add recommended training datasets doc][docs][readme-datasets][link datasets doc in readme] 2026-04-19 22:15:41 -04:00
Kye Gomez
e522ecf630 clean up 2026-04-19 15:04:30 -04:00
Kye Gomez
9cce1c1401 variants 2026-04-19 15:03:45 -04:00
Kye Gomez
806a8da1d6 [bugf][lora-b-init][fix zero-init B making adapter always output zero][bugf][lti-get-a][fix 0 times inf NaN in log space computation][improvement][rope-theta-test][exclude degenerate dim0 from theta angle comparison] 2026-04-18 23:51:01 -04:00
Kye Gomez
0699c00c94 tests 2026-04-18 10:12:45 -04:00
Kye Gomez
c258cdc8da example .py 2026-04-18 09:55:56 -04:00
Kye Gomez
791626667d [FEAT][Main.py] 2026-04-18 09:53:06 -04:00