Running gpt-oss-120b model with llama.cpp on H100 GPUs? #16198
Unanswered · bartlettroscoe asked this question in Q&A
Replies: 1 comment
- The numerical issues described in the linked issue were specifically due to the use of |
Has anyone had success running the gpt-oss-120b model on NVIDIA H100 GPUs? I can't find any evidence of anyone using llama.cpp to run the gpt-oss-120b model on an H100 GPU, even though there is lots of talk about gpt-oss-120b running on an H100, like:
However, that post mentions vLLM, and vLLM does not support tool calling with the gpt-oss models, so you can't use vLLM to serve the gpt-oss models and use them with an agentic coding agent like Codex CLI (OpenAI's own coding agent). See:
So that leaves us with llama.cpp to try to run the gpt-oss models on H100s (and we actually have a bunch of H100s that we can use). However, when I build and run llama.cpp to serve the gpt-oss-20b and gpt-oss-120b models on our H100s (using `llama-server`), we get gibberish from the model output, like what is reported at:
This seems like it might be some type of numerical problem on this machine or with the CUDA version we are using?
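For context, here is roughly what we are doing. This is just a sketch; the model path, context size, and port below are placeholders rather than our exact configuration:

```bash
# Build llama.cpp with CUDA support (assumes the CUDA toolkit is installed).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Serve the gpt-oss-120b GGUF; -ngl 99 offloads all layers to the GPU.
# The model path is a placeholder for wherever the GGUF file lives.
./build/bin/llama-server \
  -m /path/to/gpt-oss-120b.gguf \
  -ngl 99 \
  -c 8192 \
  --port 8080

# Quick sanity check against llama-server's OpenAI-compatible endpoint;
# this is where we see the gibberish output.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello in one sentence."}]}'
```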
NOTE: There is evidence that people have used llama.cpp to run other models on H100s, like:
Has anyone had any luck getting these gpt-oss models to run on H100s with llama.cpp?
FYI: See the identical Reddit post: