Description
Potentially significant implications for scaling distributed inference, and possibly greater implications than a naive implementation would suggest (an initial thought/guess on my part; citation needed). Transformers supports the model via a custom cache:

"The model requires its own KV-cache implementation, HuginnDynamicCache, otherwise the KV-caches of later calls to the recurrent block will overwrite the earlier ones."

but I have no idea whether that approach involves sacrifices or leaves potential unrealized.
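To make the quoted constraint concrete: since the same block of layers is executed several times per token, a cache indexed only by layer would be clobbered on the second recurrence pass. Below is a toy sketch of that idea, not the actual HuginnDynamicCache; the class and method names are made up.

```python
# Toy illustration only (not the actual HuginnDynamicCache): the cache is keyed
# by (layer_idx, recurrence_step) so that repeated passes through the shared
# recurrent block do not overwrite each other's entries.
import torch


class RecurrenceAwareKVCache:
    def __init__(self):
        # (layer_idx, recurrence_step) -> (keys, values)
        self._store: dict[tuple[int, int], tuple[torch.Tensor, torch.Tensor]] = {}

    def update(self, layer_idx: int, step: int, keys: torch.Tensor, values: torch.Tensor):
        """Append new keys/values for this layer at this recurrence step."""
        slot = (layer_idx, step)
        if slot in self._store:
            old_k, old_v = self._store[slot]
            keys = torch.cat([old_k, keys], dim=-2)      # concat along the sequence axis
            values = torch.cat([old_v, values], dim=-2)
        self._store[slot] = (keys, values)
        return self._store[slot]

    def get(self, layer_idx: int, step: int):
        return self._store.get((layer_idx, step))


# With a plain per-layer cache, the second recurrence pass (step=1) would
# clobber what step=0 wrote for the same layer; here the two stay separate.
cache = RecurrenceAwareKVCache()
k = v = torch.zeros(1, 4, 3, 8)  # (batch, heads, seq, head_dim)
cache.update(layer_idx=0, step=0, keys=k, values=v)
cache.update(layer_idx=0, step=1, keys=k, values=v)
assert cache.get(0, 0) is not None and cache.get(0, 1) is not None
```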
Having recently read bigscience-workshop/petals#483 and listened to the pod, I got curious about this. There are the obvious benefits, but I'm wondering more about distributing inference for a single request. It's a pipe-dream until it isn't.
Papers
https://arxiv.org/abs/2502.05171
https://arxiv.org/abs/2402.14020
POC Model: https://huggingface.co/tomg-group-umd/huginn-0125
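For anyone who wants to poke at the POC checkpoint, a minimal loading sketch: the only thing relied on here is standard Transformers custom-code loading via trust_remote_code; how the recurrence depth is controlled at generation time (e.g. a num_steps-style argument) is an assumption of mine and should be verified against the model card.

```python
# Minimal sketch for loading the POC checkpoint; the custom model code ships on
# the Hub, so trust_remote_code=True is required. The exact knob for setting
# test-time recurrence depth is an assumption -- check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomg-group-umd/huginn-0125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="auto")

prompt = "Recurrent-depth transformers scale test-time compute by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```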
Code
https://github.com/seal-rg/recurrent-pretraining
https://github.com/gair-nlp/prox
Interview Pod: https://www.youtube.com/watch?v=dY90DXLi0vk