diff --git a/README.md b/README.md
index af2d774..b1b93a6 100644
--- a/README.md
+++ b/README.md
@@ -414,6 +414,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
 - Training Large Language Models to Reason in a Continuous Latent Space: https://arxiv.org/abs/2412.06769
 - Relaxed Recursive Transformers — Effective Parameter Sharing with Layer-wise LoRA: https://arxiv.org/pdf/2410.20672
 - Mixture-of-Depths Attention: https://arxiv.org/abs/2603.15619
+- Hyperloop Transformers: https://arxiv.org/abs/2604.21254
 
 ---