Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock #39743

Malav-P · 2025-07-29T01:31:30Z

What does this PR do?

This PR casts the audio encodings in Gemma3nAudioSSCPConvBlock to the same dtype as the conv.weight tensor. It does so by appending .to(self.conv.weight.dtype) to the right-hand side of the audio_encodings_padded assignment.

This fixes an error thrown by the Conv2d forward pass when the dtype of the audio encodings does not match the dtype of the conv weights. Code to reproduce the error (run on a Mac M1):

from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e2b-it",
    device=0,
    torch_dtype=torch.bfloat16,
    cache_implementation="static"
)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "5676.wav"},
            {"type": "text", "text": "Transcribe this audio file."}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=200, torch_dtype=torch.bfloat16)
print(output[0]["generated_text"][-1]["content"])

The full stack trace is attached as error.txt.
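The mismatch and the fix can be sketched in isolation with a standalone nn.Conv2d. This is a minimal illustration under stated assumptions, not the Gemma3n code: the layer sizes, the input shape and the helper names forward_without_cast / forward_with_cast are hypothetical. Only the .to(conv.weight.dtype) cast mirrors the change in this PR. The weights are cast to bfloat16 to mimic torch_dtype=torch.bfloat16, while the padded audio features stay in float32.

```python
import torch
from torch import nn

# Hypothetical stand-in for the conv inside Gemma3nAudioSSCPConvBlock;
# sizes are illustrative only. Weights are bfloat16, as with torch_dtype=torch.bfloat16.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3).to(torch.bfloat16)

# Hypothetical padded audio features left in float32 (batch, channels, time, freq).
audio_encodings_padded = torch.randn(1, 1, 16, 8)


def forward_without_cast(x: torch.Tensor) -> torch.Tensor:
    # Pre-fix behaviour: input dtype is left untouched, so it differs from the weights.
    return conv(x)


def forward_with_cast(x: torch.Tensor) -> torch.Tensor:
    # Post-fix behaviour: cast the input to the conv weight dtype before the forward pass.
    return conv(x.to(conv.weight.dtype))


try:
    forward_without_cast(audio_encodings_padded)
    mismatch_raised = False
except RuntimeError as err:
    # PyTorch does not promote dtypes inside conv2d, so mixed dtypes raise an error.
    mismatch_raised = True
    print(f"Without cast: {err}")

out = forward_with_cast(audio_encodings_padded)
print(out.dtype, tuple(out.shape))
```

Casting the input rather than the weights keeps the model in the dtype the user requested and only touches one activation tensor.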

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Suggested Reviewers

@ArthurZucker

Rocketknight1 · 2025-07-29T11:12:22Z

cc @eustlb

ArthurZucker

This makes sense, happy to merge!

github-actions · 2025-07-29T17:11:53Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma3n

Malav-P · 2025-07-29T19:23:41Z

Quoting @ArthurZucker: "This makes sense, happy to merge!"

Awesome, ty!

audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock

522ab46

ArthurZucker approved these changes Jul 29, 2025


Malav-P added 3 commits July 29, 2025 11:20

Merge branch 'main' into gemma3n-audiofeatures-dtypefix

0a4256a

Merge branch 'main' into gemma3n-audiofeatures-dtypefix

9876b60

Merge branch 'main' into gemma3n-audiofeatures-dtypefix

0982375
