diff --git a/README.md b/README.md
index c8cf190..e37970f 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # OpenMythos
+
 A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
----
 
 ## The Central Hypothesis
@@ -145,6 +145,14 @@ Mythos almost certainly has some version of this. The model cannot naively run t
 
 ---
 
+## Mixture of Experts — Suspected for Large Parameter Counts
+
+The looped transformer explains the depth of Mythos's reasoning, but not the breadth. Handling wildly different domains — code, math, literature, science, law — with the same weights requires **Mixture of Experts (MoE)**. The suspected design replaces every FFN in the Recurrent Block with a fine-grained MoE layer. Each FFN is split into many small experts (1/m the normal size), and a router selects the top-mK of them per token via learned affinity scores. A small number of **shared experts** are always activated regardless of routing; they absorb common cross-domain knowledge — syntax, basic reasoning, general context — that would otherwise be redundantly learned by every routed expert. Routing collapse is prevented through a bias term on the router logits, adjusted dynamically during training, which keeps load balanced across experts without distorting the loss signal.
+
+As the hidden state `h_t` evolves across loop iterations, the router may select different expert subsets at each depth, making every loop computationally distinct despite shared weights. MoE provides breadth; looping provides depth. If the activation ratio is ~5%, Mythos could hold hundreds of billions of total parameters while activating only a small fraction per token — the true parameter count, if ever disclosed, would be a storage number, not a compute number.
+
+---
+
 ## The Memorization-Reasoning Tradeoff
 
 Looped models exhibit an interesting dichotomy: looping improves reasoning but can hurt memorization. The recurrent structure is optimized for iterative composition — running a reasoning chain forward — but does not inherently improve the storage of rote facts.
@@ -181,12 +189,13 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
 | Property | Description |
 |---|---|
 | Architecture | Recurrent-Depth Transformer (Prelude + Looped Recurrent Block + Coda) |
+| FFN layer | Suspected MoE — fine-grained experts + always-on shared experts |
+| Parameter count | Very large total; small fraction activated per token (~5% estimate) |
 | Reasoning mechanism | Implicit multi-hop via iterative latent updates — no token output between steps |
 | Inference-time scaling | More loops = deeper reasoning, following predictable exponential decay |
 | Training stability | LTI-constrained injection parameters with spectral radius < 1 |
 | Loop differentiation | Likely uses loop-index positional embedding (à la RoPE) per iteration |
 | Halting | Adaptive Computation Time or learned convergence criterion |
-| Parameter efficiency | Achieves quality of a ~2x larger fixed-depth transformer |
 | Scaling law | Optimal training scales looping and data together, not parameters alone |
 | Reasoning vs. memory | Structurally biased toward composition; memorization requires separate treatment |
 | Deployment | Continuous Depth-wise Batching enables variable compute per request |
@@ -205,6 +214,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
 
 ### Papers
 
+- Fine-grained expert segmentation and shared expert isolation in MoE: https://arxiv.org/abs/2401.06066
 - Loop, Think, & Generalize — Implicit Reasoning in Recurrent Depth Transformers: https://arxiv.org/pdf/2604.07822
 - Parcae — Scaling Laws for Stable Looped Language Models: https://arxiv.org/abs/2604.12946
 - Parcae blog: https://sandyresearch.github.io/parcae/
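**Sketch of the suspected MoE layer.** To make the routing scheme added in the MoE section above concrete, here is a minimal PyTorch sketch under the stated assumptions: fine-grained experts at 1/m the usual FFN width, a handful of always-on shared experts, top-mK routing from learned affinity scores, and a load-balancing bias that affects which experts are selected but never how their outputs are weighted. All names (`FineGrainedMoE`, `Expert`, `update_bias`), default sizes, and the sigmoid affinity choice are illustrative; this is a reconstruction of the suspected design, not Mythos's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """One fine-grained expert: a standard FFN at 1/m of the usual hidden width."""

    def __init__(self, d_model: int, d_ff_small: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff_small)
        self.down = nn.Linear(d_ff_small, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class FineGrainedMoE(nn.Module):
    """Hypothetical Recurrent Block FFN: routed fine-grained experts + shared experts."""

    def __init__(self, d_model=1024, d_ff=4096, m=4, n_experts=64, n_shared=2, k=2):
        super().__init__()
        d_ff_small = d_ff // m              # each expert is 1/m the normal FFN width
        self.top_k = m * k                  # "top-mK" routed experts per token
        self.routed = nn.ModuleList([Expert(d_model, d_ff_small) for _ in range(n_experts)])
        self.shared = nn.ModuleList([Expert(d_model, d_ff_small) for _ in range(n_shared)])
        self.router = nn.Linear(d_model, n_experts, bias=False)  # learned affinity scores
        # Load-balancing bias: used only to *select* experts, never to weight their
        # outputs, so balancing does not distort the training loss signal.
        self.register_buffer("route_bias", torch.zeros(n_experts))

    def forward(self, h: torch.Tensor):
        """h: (n_tokens, d_model) latent states for one loop iteration."""
        scores = torch.sigmoid(self.router(h))                   # (n_tokens, n_experts)
        # Pick experts with the biased scores, gate with the unbiased ones.
        _, idx = torch.topk(scores + self.route_bias, self.top_k, dim=-1)
        gates = torch.gather(scores, -1, idx)
        gates = gates / gates.sum(dim=-1, keepdim=True)

        # Shared experts: always active for every token.
        shared_out = sum(expert(h) for expert in self.shared)

        # Routed experts: each token is processed only by its selected experts.
        routed_out = torch.zeros_like(h)
        for slot in range(self.top_k):
            for e_id in idx[:, slot].unique().tolist():
                token_ids = (idx[:, slot] == e_id).nonzero(as_tuple=True)[0]
                expert_out = self.routed[e_id](h[token_ids]) * gates[token_ids, slot].unsqueeze(-1)
                routed_out.index_add_(0, token_ids, expert_out)
        return shared_out + routed_out, idx

    @torch.no_grad()
    def update_bias(self, idx: torch.Tensor, lr: float = 1e-3):
        """Nudge the routing bias toward underloaded experts after each training step."""
        load = torch.bincount(idx.flatten(), minlength=self.route_bias.numel()).float()
        self.route_bias += lr * torch.sign(load.mean() - load)
```

Applied inside the recurrent-depth loop, the same layer can route each token to a different expert subset at every iteration, which is the MoE-looping interaction described in the section above:

```python
# Toy usage: the same weights run at every depth, but as the latent state h
# evolves across loop iterations, different experts may be selected.
moe = FineGrainedMoE()
h = torch.randn(16, 1024)          # latent states for 16 tokens
for depth in range(4):             # number of loops chosen at inference time
    delta, chosen = moe(h)
    h = h + delta                  # residual update of the latent state
    moe.update_bias(chosen)        # training only: rebalance expert load
```

Because the balancing bias only shifts the top-k selection and never enters the gate weights, load balancing happens without an auxiliary loss term, matching the "without distorting the loss signal" claim in the section above.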