32 Commits

Author SHA1 Message Date
Kye Gomez
eae0f04b8e fix training 2026-04-20 09:43:25 -04:00
Kye Gomez
289981ba01 [bugf][act-halting][gate halted positions from weight accumulation][bugf][moe-router-bias][stop load balance bias gradient leak][bugf][act-cache-consistency][keep all loops with kv cache][bugf][lora-depth-extrapolation][clamp scale index beyond max loops][improvement][pyproject-version][bump version to 0 4 0] 2026-04-20 09:17:43 -04:00
Kye Gomez
7ba690797b [improvement][loguru-logging][replace print with loguru in training script][feat][ckpt-logging][add checkpoint start and success log events][docs][readme-optimizer][remove muon optimizer reference][feat][train-requirements][add requirements txt to training folder] 2026-04-20 08:25:00 -04:00
Kye Gomez
18cca894dd [fix][rope Every decode token was stuck at position 0, so <q_decoded, k_cached> lost the (n - m) term entirely] 2026-04-20 08:19:14 -04:00
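The RoPE fix above relies on a property of rotary embeddings: the attention score between a query rotated at position n and a key rotated at position m depends only on the offset n - m. The standalone sketch below is not the repository's code. It uses toy 4-dimensional vectors, a hypothetical `rope_rotate` helper and the conventional base `theta=10000.0`, all of which are assumptions for illustration. It shows that rotating every decoded query as if it sat at position 0 makes the score identical for every decode step, so the offset information disappears.

```python
import math

def rope_rotate(vec, pos, theta=10000.0):
    """Rotate each (even, odd) pair of `vec` by pos * theta**(-i/d) (conventional RoPE schedule, assumed)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def score(q, q_pos, k, k_pos):
    """Unscaled attention logit between a rotated query and a rotated key."""
    return sum(a * b for a, b in zip(rope_rotate(q, q_pos), rope_rotate(k, k_pos)))

# Toy query and key with arbitrary values (hypothetical).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.1, 0.4, -0.7, 0.9]
m = 7  # position of the cached key

# Correct decoding: each new token is rotated at its real position n.
correct = {n: score(q, n, k, m) for n in (10, 11)}
# Buggy decoding: every new token is rotated as if it were at position 0.
buggy = {n: score(q, 0, k, m) for n in (10, 11)}

# The same offset n - m gives the same score wherever the pair sits in the sequence.
assert abs(score(q, 10, k, 7) - score(q, 53, k, 50)) < 1e-9

print("correct:", correct)  # changes as n moves away from m
print("buggy:  ", buggy)    # identical for every n: the offset term is gone
```

Because rotations compose, the rotated query and key reduce to `q · R(m - n) k`. Pinning the query to position 0 therefore replaces the relative offset with a fixed angle that depends only on m.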
Kye Gomez
537b116b3e just use adam for now in training maybe add muon later 2026-04-19 23:34:58 -04:00
Kye Gomez
137cd8832e readme 2026-04-19 23:21:50 -04:00
Kye Gomez
f37d405a81 readme 2026-04-19 23:21:39 -04:00
Kye Gomez
ef2cac6c3a readme 2026-04-19 23:20:53 -04:00
Kye Gomez
12f6c5b32e [docs][readme-badges][add pypi twitter github badges to readme][improvement][init-sort][sort imports alphabetically in init][improvement][version-bump][bump version to 0.3.0][feat][tokenizer-class][add MythosTokenizer to init exports][feat][test-tokenizer][add tokenizer test suite with printed output] 2026-04-19 23:18:05 -04:00
Kye Gomez
5ffb897dcf [feat][training-script][add 3b fineweb-edu training script][feat][tokenizer][add MythosTokenizer class with encode decode][improvement][deps][add transformers and datasets dependencies][docs][readme-training][add training section with run commands][improvement][pyproject][pin torch and add new deps] 2026-04-19 22:48:30 -04:00
Kye Gomez
97bc414977 use gpt-oss tokenizer because it's a great tokenizer 2026-04-19 22:28:09 -04:00
Kye Gomez
5cfef742b5 [feat][dropout][add dropout to config attn and residuals][docs][datasets][add recommended training datasets doc][docs][readme-datasets][link datasets doc in readme] 2026-04-19 22:15:41 -04:00
Kye Gomez
0623ceb960 Update README.md 2026-04-19 22:01:10 -04:00
Kye Gomez
a825ba217f [Update variants with parameter count equation] 2026-04-19 21:55:06 -04:00
Kye Gomez
f00d10b59f more references fix 2026-04-19 21:39:13 -04:00
Kye Gomez
e522ecf630 clean up 2026-04-19 15:04:30 -04:00
Kye Gomez
1c54259fa8 format 2026-04-19 15:04:03 -04:00
Kye Gomez
9cce1c1401 variants 2026-04-19 15:03:45 -04:00
Kye Gomez
299a0cdb0a pip and readme 2026-04-19 08:33:09 -04:00
Kye Gomez
806a8da1d6 [bugf][lora-b-init][fix zero-init B making adapter always output zero][bugf][lti-get-a][fix 0 times inf NaN in log space computation][improvement][rope-theta-test][exclude degenerate dim0 from theta angle comparison] 2026-04-18 23:51:01 -04:00
Kye Gomez
53f786afda disclaimer and rope tests 2026-04-18 23:44:31 -04:00
Kye Gomez
14a806c2a1 installation 2026-04-18 20:48:34 -04:00
Kye Gomez
7abd3e5d20 cleanup 2026-04-18 20:44:23 -04:00
Kye Gomez
0818bf815b readme 2026-04-18 20:43:05 -04:00
Kye Gomez
fb2939b6c4 [DOCS] [LICENSE] 2026-04-18 20:20:47 -04:00
Kye Gomez
0699c00c94 tests 2026-04-18 10:12:45 -04:00
Kye Gomez
c258cdc8da example .py 2026-04-18 09:55:56 -04:00
Kye Gomez
791626667d [FEAT][Main.py] 2026-04-18 09:53:06 -04:00
Kye Gomez
4ce503f3dd moe 2026-04-18 09:15:46 -04:00
Kye Gomez
79b916d999 paper 2026-04-18 09:07:19 -04:00
Kye Gomez
4aea9d9dd5 reference 2026-04-18 09:02:50 -04:00
Kye Gomez
8fa70224c7 Initial commit 2026-04-18 08:57:52 -04:00