Commit Graph

  • 8c68c1fcc7 (main) Update README.md Kye Gomez 2026-04-27 10:58:47 +02:00
  • f261645e5f Update README.md Kye Gomez 2026-04-27 01:13:40 +02:00
  • 227dbb1532 new examples folder Kye Gomez 2026-04-22 13:00:57 -04:00
  • 963e11277d tiny tests Kye Gomez 2026-04-22 12:48:33 -04:00
  • 7d78ebec79 flash attn Kye Gomez 2026-04-22 12:15:37 -04:00
  • eae0f04b8e fix training Kye Gomez 2026-04-20 09:43:25 -04:00
  • 289981ba01 [bugf][act-halting][gate halted positions from weight accumulation][bugf][moe-router-bias][stop load balance bias gradient leak][bugf][act-cache-consistency][keep all loops with kv cache][bugf][lora-depth-extrapolation][clamp scale index beyond max loops][improvement][pyproject-version][bump version to 0 4 0] Kye Gomez 2026-04-20 09:17:43 -04:00
  • 7ba690797b [improvement][loguru-logging][replace print with loguru in training script][feat][ckpt-logging][add checkpoint start and success log events][docs][readme-optimizer][remove muon optimizer reference][feat][train-requirements][add requirements txt to training folder] Kye Gomez 2026-04-20 08:25:00 -04:00
  • 18cca894dd [fix][rope Every decode token was stuck at position 0, so <q_decoded, k_cached> lost the (n - m) term entirely] Kye Gomez 2026-04-20 08:19:14 -04:00
  • 537b116b3e just use adam for now in training maybe add muon later Kye Gomez 2026-04-19 23:34:58 -04:00
  • 137cd8832e readme Kye Gomez 2026-04-19 23:21:50 -04:00
  • f37d405a81 readme Kye Gomez 2026-04-19 23:21:39 -04:00
  • ef2cac6c3a readme Kye Gomez 2026-04-19 23:20:53 -04:00
  • 12f6c5b32e [docs][readme-badges][add pypi twitter github badges to readme][improvement][init-sort][sort imports alphabetically in init][improvement][version-bump][bump version to 0.3.0][feat][tokenizer-class][add MythosTokenizer to init exports][feat][test-tokenizer][add tokenizer test suite with printed output] Kye Gomez 2026-04-19 23:18:05 -04:00
  • 5ffb897dcf [feat][training-script][add 3b fineweb-edu training script][feat][tokenizer][add MythosTokenizer class with encode decode][improvement][deps][add transformers and datasets dependencies][docs][readme-training][add training section with run commands][improvement][pyproject][pin torch and add new deps] Kye Gomez 2026-04-19 22:48:30 -04:00
  • 97bc414977 use gpt-oss tokenizer because it's a great tokenizer Kye Gomez 2026-04-19 22:28:09 -04:00
  • 5cfef742b5 [feat][dropout][add dropout to config attn and residuals][docs][datasets][add recommended training datasets doc][docs][readme-datasets][link datasets doc in readme] Kye Gomez 2026-04-19 22:15:41 -04:00
  • 0623ceb960 Update README.md Kye Gomez 2026-04-19 22:01:10 -04:00
  • a825ba217f [Update variants with parameter count equation] Kye Gomez 2026-04-19 21:55:06 -04:00
  • f00d10b59f more references fix Kye Gomez 2026-04-19 21:39:13 -04:00
  • e522ecf630 clean up Kye Gomez 2026-04-19 15:04:30 -04:00
  • 1c54259fa8 format Kye Gomez 2026-04-19 15:04:03 -04:00
  • 9cce1c1401 variants Kye Gomez 2026-04-19 15:03:45 -04:00
  • 299a0cdb0a pip and readme Kye Gomez 2026-04-19 08:33:09 -04:00
  • 806a8da1d6 [bugf][lora-b-init][fix zero-init B making adapter always output zero][bugf][lti-get-a][fix 0 times inf NaN in log space computation][improvement][rope-theta-test][exclude degenerate dim0 from theta angle comparison] Kye Gomez 2026-04-18 23:51:01 -04:00
  • 53f786afda disclaimer and rope tests Kye Gomez 2026-04-18 23:44:31 -04:00
  • 14a806c2a1 installation Kye Gomez 2026-04-18 20:48:34 -04:00
  • 7abd3e5d20 cleanup Kye Gomez 2026-04-18 20:44:23 -04:00
  • 0818bf815b readme Kye Gomez 2026-04-18 20:43:05 -04:00
  • fb2939b6c4 [DOCS] [LICENSE] Kye Gomez 2026-04-18 20:20:47 -04:00
  • 0699c00c94 tests Kye Gomez 2026-04-18 10:12:45 -04:00
  • c258cdc8da example .py Kye Gomez 2026-04-18 09:55:56 -04:00
  • 791626667d [FEAT][Main.py] Kye Gomez 2026-04-18 09:53:06 -04:00
  • 4ce503f3dd moe Kye Gomez 2026-04-18 09:15:46 -04:00
  • 79b916d999 paper Kye Gomez 2026-04-18 09:07:19 -04:00
  • 4aea9d9dd5 reference Kye Gomez 2026-04-18 09:02:50 -04:00
  • 8fa70224c7 Initial commit Kye Gomez 2026-04-18 08:57:52 -04:00