
Conversation

jacobkahn
Contributor

Adds the Code World Model (CWM) - https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/

High-level implementation details:

  • This is a GQA model with interleaved local (sliding-window) and global attention
  • Implemented on top of the HF Llama3 modeling code, with interleaved sliding-window attention added (see the sketch after this list)
  • Inheriting from Gemma2/3 would require weight remapping, which breaks VLLM compatibility and other components, so this is implemented using the existing causal mask utils from HF
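
For intuition, here is a minimal sketch of the two mask types involved. This is plain PyTorch for illustration only, not the actual HF masking utilities the implementation uses, and the 3:1 local/global interleave shown is just an example layout, not necessarily CWM's:

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Global layer: position i attends to all positions j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Local layer: position i attends only to j <= i with i - j < window.
    idx = torch.arange(seq_len)
    return (idx[None, :] <= idx[:, None]) & ((idx[:, None] - idx[None, :]) < window)

# Example interleave: every fourth layer is global, the rest are local.
# The real per-layer pattern is defined by the model config.
layer_masks = [
    causal_mask(16) if i % 4 == 3 else sliding_window_causal_mask(16, window=8)
    for i in range(8)
]
```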

The model repos are:

Note that for VLLM compatibility, the model config.json files still refer to Llama3ForCausalLM and a llama model_type (see example). vllm-project/vllm#25611 adds support for mapping CwmForCausalLM to the Llama3 model class in VLLM, since VLLM already supports Llama3 with layer_types for local/global attention (see docs). The model type in config.json will be updated on HF (and the special automapping condition removed) once this PR is merged and a Transformers release containing the CwmForCausalLM model class has shipped.
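
As a concrete illustration of the interim state, loading the config today reports the llama model_type (the repo id below is a placeholder, not an actual Hub repo name):

```python
from transformers import AutoConfig

# Placeholder repo id; substitute the actual CWM checkpoint on the Hub.
config = AutoConfig.from_pretrained("org/cwm-model")

# Prints "llama" while the interim VLLM-compatible config is in place;
# this will change once the Hub configs are updated after the release.
print(config.model_type)
```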

@ArthurZucker, @zucchini-nlp

Supersedes #41188 due to some fork misery

Member

@zucchini-nlp zucchini-nlp left a comment


Thanks, left some comments to clean up. Btw, do we already have converted weights that we can use for the integration tests?

Comment on lines +144 to +156
```python
config = self.model_tester.get_config()
model = CwmModel(config)
model.to(torch_device)
model.eval()

# input longer than sliding window
seq_length = config.sliding_window + 10
input_ids = torch.randint(0, config.vocab_size, (1, seq_length), device=torch_device)

with torch.no_grad():
    outputs = model(input_ids)

self.assertEqual(outputs.last_hidden_state.shape, (1, seq_length, config.hidden_size))
```
Member

better if we can make an integration test and check that the generated ids are correct
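
Something along these lines could work as a starting point (a rough sketch with a randomly initialized model and assumed Llama-style CwmConfig arguments; the real test would load the released checkpoint and compare against recorded token ids):

```python
import torch
from transformers import CwmConfig, CwmForCausalLM

# Tiny random model; a real integration test would use from_pretrained
# on the released checkpoint and compare against recorded token ids.
config = CwmConfig(
    vocab_size=128,
    hidden_size=32,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=2,
    intermediate_size=64,
    sliding_window=16,
)
model = CwmForCausalLM(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    first = model.generate(input_ids, max_new_tokens=8, do_sample=False)
    second = model.generate(input_ids, max_new_tokens=8, do_sample=False)

# Greedy decoding is deterministic, so repeated runs must agree exactly.
assert torch.equal(first, second)
```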

Comment on lines +184 to +185
```python
# no errors
self.assertIsNotNone(outputs.last_hidden_state)
```
Member

not clear what we are testing for here, last hidden state can never be None, no?
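
If the goal is just "forward pass runs without blowing up", an assertion on the values themselves would be more meaningful, e.g. (continuing the test's existing `outputs`):

```python
# Check the outputs are numerically sane rather than merely non-None.
self.assertFalse(torch.isnan(outputs.last_hidden_state).any())
self.assertFalse(torch.isinf(outputs.last_hidden_state).any())
```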

```python
def tearDown(self):
    cleanup(torch_device, gc_collect=True)

def test_cwm_small_model_forward(self):
```
Member

let's use the actual model for all the tests below and check generated token ids or logit values for a few important cases (sliding window, simple generation, etc.)
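
For the sliding-window case specifically, one self-contained check (a sketch assuming CwmConfig exposes the per-layer layer_types field used by other interleaved models): with identical weights, an all-global model and an all-local model must agree whenever the input is shorter than the window:

```python
import torch
from transformers import CwmConfig, CwmModel

common = dict(
    vocab_size=128, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2,
    intermediate_size=64, sliding_window=32,
)
# layer_types values follow the convention used by other interleaved
# models ("sliding_attention" / "full_attention"); assumed for CwmConfig.
full = CwmModel(CwmConfig(**common, layer_types=["full_attention"] * 2)).eval()
local = CwmModel(CwmConfig(**common, layer_types=["sliding_attention"] * 2)).eval()
local.load_state_dict(full.state_dict())

# With seq_len < sliding_window, the local mask reduces to the plain
# causal mask, so both models must produce identical hidden states.
input_ids = torch.randint(0, 128, (1, 16))
with torch.no_grad():
    a = full(input_ids).last_hidden_state
    b = local(input_ids).last_hidden_state
torch.testing.assert_close(a, b)
```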

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cwm
