diff --git a/README.md b/README.md
index b1b93a6..2f2b4e0 100644
--- a/README.md
+++ b/README.md
@@ -415,6 +415,7 @@ Theoretical analysis suggests 2-3x improvements in inference throughput. For a d
 - Relaxed Recursive Transformers — Effective Parameter Sharing with Layer-wise LoRA: https://arxiv.org/pdf/2410.20672
 - Mixture-of-Depths Attention: https://arxiv.org/abs/2603.15619
 - Hyperloop Transformers: https://arxiv.org/abs/2604.21254
+- The Recurrent Transformer: Greater Effective Depth and Efficient Decoding: https://arxiv.org/abs/2604.21215
 
 ---