A from-scratch implementation of a T5 model modified with Rotary Position Embeddings (RoPE). The project includes code for pre-training on the C4 dataset in streaming mode with Flash Attention 2.
nlp pytorch sequence-to-sequence language-model from-scratch rope pre-training huggingface t5 evaluation-benchmark llm rotary-position-embedding flash-attention c4-dataset span-corruption
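Below is a minimal sketch of how rotary position embeddings can be applied to attention queries and keys, which is the core architectural change this project makes to T5. The function names (`build_rope_cache`, `apply_rope`) and tensor shapes are illustrative assumptions, not this repository's actual API.

```python
# Minimal RoPE sketch: rotate query/key channel pairs by position-dependent
# angles before computing attention scores. Names and shapes are illustrative
# only and do not reflect this repository's actual implementation.
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables for RoPE (head_dim must be even)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)   # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate interleaved channel pairs of x: (batch, heads, seq_len, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated_even = x_even * cos - x_odd * sin
    rotated_odd = x_even * sin + x_odd * cos
    # Re-interleave the rotated pairs back into the last dimension.
    return torch.stack((rotated_even, rotated_odd), dim=-1).flatten(-2)

# Usage: rotate queries and keys, then compute attention as usual.
batch, heads, seq_len, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
cos, sin = build_rope_cache(seq_len, head_dim)
q_rot, k_rot = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```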
Updated Jul 9, 2025 · Python