diff --git a/README.md b/README.md
index af2d774..b1b93a6 100644
--- a/README.md
+++ b/README.md
@@ -414,6 +414,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
 - Training Large Language Models to Reason in a Continuous Latent Space: https://arxiv.org/abs/2412.06769
 - Relaxed Recursive Transformers — Effective Parameter Sharing with Layer-wise LoRA: https://arxiv.org/pdf/2410.20672
 - Mixture-of-Depths Attention: https://arxiv.org/abs/2603.15619
+- Hyperloop Transformers: https://arxiv.org/abs/2604.21254
 ---