Skip to content

Add support to send voice messages #10230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open

Conversation

blord0
Copy link
Contributor

@blord0 blord0 commented Jul 16, 2025

Summary

Adds support for voice messags
closes #10165

Checklist

  • If code changes were made then they have been tested.
    • I have updated the documentation to reflect the changes.
  • This PR fixes an issue.
  • This PR adds something new (e.g. new method or parameters).
  • This PR is a breaking change (e.g. methods or parameters removed/renamed)
  • This PR is not a code change (e.g. documentation, README, ...)

@DA-344
Copy link
Contributor

DA-344 commented Jul 16, 2025

This is discussed here, and iirc, this was not planned at the moment, might be wrong though, worth checking: https://discord.com/channels/336642139381301249/1100398536920662117

Copy link
Contributor

@AbstractUmbra AbstractUmbra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the code in this PR AI generated or something? Frankly I'd refuse to believe otherwise, due to seeing 3 different conventions/styles as well as a mish-mash of logic that doesn't really make sense.

I appreciate it's still in draft, but even so.

@blord0
Copy link
Contributor Author

blord0 commented Jul 16, 2025

This is discussed here, and iirc, this was not planned at the moment, might be wrong though, worth checking: https://discord.com/channels/336642139381301249/1100398536920662117

Discord has added the message flag for "IS_VOICE_MESSAGE" in their offical docs so I think we good? https://discord.com/developers/docs/resources/message#message-object-message-flags

@blord0
Copy link
Contributor Author

blord0 commented Jul 16, 2025

Was the code in this PR AI generated or something? Frankly I'd refuse to believe otherwise, due to seeing 3 different conventions/styles as well as a mish-mash of logic that doesn't really make sense.

I appreciate it's still in draft, but even so.

Bit of AI was used as I was stuggling to get it all working. Like how I imported requests so that I could get actual values from the requests.
I will use the actual libraries/clean it all up tho (including the horrific amount of print statments i gotta remove)

@AbstractUmbra
Copy link
Contributor

Pull requests that use AI generated code do not get accepted due to the possible licensing implications.
Not to mention the quality of AI generate code being abysmal.

@blord0
Copy link
Contributor Author

blord0 commented Jul 16, 2025

Pull requests that use AI generated code do not get accepted due to the possible licensing implications. Not to mention the quality of AI generate code being abysmal.

The only ai part is the part where requests is imported and used. That was never intended for the final merge.
All other abysmal code/syntax/logic was just me doing patchwork to get it working and returning some values

@AbstractUmbra
Copy link
Contributor

Additionally I'm not sure where the library stands on sending "fake" data, we don't do it anywhere else so sending a fake waveform and fake size likely won't work.

@blord0
Copy link
Contributor Author

blord0 commented Jul 16, 2025

Additionally I'm not sure where the library stands on sending "fake" data, we don't do it anywhere else so sending a fake waveform and fake size likely won't work.

Fake size is a placeholder (as commented) so I can send stuff while testing, do plan adding actual code to the .size() method for final

The waveform data is only used as a visual preview for how the message looks. I do plan on looking into if I can get the real waveform data, but the only code I've seen that sends a voice message (https://gist.github.com/max-4-3/6704422045b1c21a4a3e9226fca38f68) does the same thing that I am currently.
It isn't checked by discord so if there is no method of getting it (without external tools like ffmpeg), then the message won't fail to send. Also the reason I allow users to pass in their own waveform right now
image

Copy link
Contributor Author

@blord0 blord0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a much simpler way of sending voice messages while looking through https://discord.com/channels/336642139381301249/1100398536920662117

@blord0 blord0 marked this pull request as ready for review July 17, 2025 12:23
@blord0
Copy link
Contributor Author

blord0 commented Jul 17, 2025

Example code for sending a voice message is

  channel = await bot.fetch_channel(xxx)
  file = discord.VoiceMessageFile("test.mp3", duration=21)
  await channel.send(file=file, voice=True)

Copy link
Contributor

@AbstractUmbra AbstractUmbra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't make a whole new File class, and instead have voice be an init parameter or a generated one (i.e. if duration is provided, or similar) of File and use this as needed.

Copy link
Owner

@Rapptz Rapptz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

A few nits though:

  1. Instead of passing voice=True at the send site, it should be at the File site instead because that's where it logically is. A voice message is nothing more than a fancy attachment that toggles a flag.
  2. You should check whether multiple embedded audio attachments are supported per message.
  3. You're missing .. versionadded:: 2.6 on the new File properties.
  4. You're missing documentation for duration (which you typo'd as duation a few times).
  5. The waveform should be generated properly.
  6. More documentation is needed for this in general, such as what audio format is expected to be uploaded for it to be a voice message etc.

@blord0
Copy link
Contributor Author

blord0 commented Jul 28, 2025

Thanks, didn't catch the typos there. Also added File.voice bool to for checking if the file is a voice message now
If you try to add multiple voice messages to a message, discord throws a 400 Bad Request (error code: 50160): Voice messages must have a single audio attachment

For points 5 and 6, discord accepts a wide range of audio formats. Found it to work with ogg, mp3, wav, aac and flac. Doesn't work with .mp4
Each different file format would need its own way to calculate the waveform data. I can find out ways to calculate waveform data for each of the types above, but

  1. that list above is non-exhausitve
  2. it would require adding more external dependencies to the project (not sure how you feel about that)

@Rapptz
Copy link
Owner

Rapptz commented Jul 28, 2025

Looking at the documentation you can probably make a valid waveform for the .ogg ones using the facilities in the library, if possible, otherwise you can fallback to the random waveform.

@blord0
Copy link
Contributor Author

blord0 commented Jul 28, 2025

From what i can see, the oggparse and opus libraries can only be used for voice data, and fail when used on general opus files. Even a simple script like this fails due to a corrupted stream error. I downloaded my own voice-message.ogg from discord and checked with ffprobe to make sure it was a valid ogg file

from discord.oggparse import OggStream
from discord.opus import Decoder

filename = "voice-message.ogg"
f = open(filename, 'rb')
ogg = OggStream(f)
decoder = Decoder()
for packet in ogg.iter_packets():
  decoded = decoder.decode(packet) # discord.opus.OpusError: corrupted stream
  print(decoded)

Looked into another library called soundfile and this seems to be able to generate waveform for ogg, mp3 and wav.
Could impliment somthing similar to PyNacl, where you don't have to install it but it is required to generate the real waveform data without falling back to a random waveform?

@AbstractUmbra
Copy link
Contributor

soundfile is a rather large dependency, since it includes numpy and a few others. Each wheel is around 1mib. I personally think that's a bit overkill.

I've discussed this waveform idea in depth and it should be possible using the opus/ogg items the library provides (I will do some testing when I have the time, soon) and generating the correct and real waveform should be a fast process.

However, this does mean we'll need to accept only OGG formatted files, it will be on the end user to do the necessary conversion somehow, unless we do decide to go in depth.

@blord0
Copy link
Contributor Author

blord0 commented Jul 28, 2025

Just had a deeper look into what each packet from my code example above has in it, and seems like the first 2 are metadata. If I just skip those 2 packets then decoder works fine all the way through.
Going to have a look around to see how to turn that data into waveform stuff

from discord.oggparse import OggStream
from discord.opus import Decoder

filename = "voice-message.ogg"
f = open(filename, 'rb')
ogg = OggStream(f)
decoder = Decoder()
count = 0
for packet in ogg.iter_packets():
    if count < 2:
        count += 1
        continue
    decoded: bytes = decoder.decode(packet) # Gives actual bytes now

Copy link
Contributor

@scarletcafe scarletcafe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal thoughts, I think this is a bit of a tough one to approach.

discord/file.py Outdated
try:
self._waveform = self.generate_waveform()
except Exception:
self._waveform = base64.b64encode(os.urandom(256)).decode('utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm a big fan of this caveat, especially not as a catch-all except case. I think us handling audio extraction for anything outside of opus 'in house' is pretty far out of scope, but this feels like a design 'lock-in' that could prevent people who have the means to do the waveform generation themselves from doing so. Maybe the waveform property could have a setter, or maybe there could be a way to construct a File (or a specialized subclass) with a waveform provided e.g. File.voice_message_with_waveform(data, waveform).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually pass in your own waveform when creating a file
file = discord.File('voice-message.ogg', duration=5.0, waveform='AAAAA...')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that makes this less of an issue, but I still think we should avoid generating a fake waveform if we can, maybe we could just let the exception be raised instead or split generation and non-generation into classmethod-based signatures like suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is that if someone passes in an mp3 file, we still need to generate a waveform. I don't see the point in adding extra steps for the user to generate a waveform dependant on if they have the correct audio type as that could be confusing

Comment on lines +259 to +277
point_count: int = self.duration * 10 # type: ignore
point_count = min(point_count, 255)
points_per_sample: int = len(waveform) // point_count
sample_waveform: list[int] = []

total, count = 0, 0
# Average out the amplitudes for each point within a sample
for i in range(len(waveform)):
total += waveform[i]
count += 1
if i % points_per_sample == 0:
sample_waveform.append(total // count)
total, count = 0, 0

# Maximum value of a waveform is 0xff (255)
highest = max(sample_waveform)
mult = 255 / highest
for i in range(len(sample_waveform)):
sample_waveform[i] = int(sample_waveform[i] * mult)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already rely on audioop or audioop-lts for newer versions of Python, which includes many audio operations implemented in C. I don't have a lot of reason to believe this operation is that slow, but I'm inclined to believe we can probably use audioop to make this simpler and faster to do.

Copy link
Contributor Author

@blord0 blord0 Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

audioop is depricated in python 3.11 and removed since 3.13
https://docs.python.org/3/library/audioop.html

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already include audioop for 3.13+ using audioop-lts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part of code you have commented this on doesn't decode or process the Opus data.
Lines 259 to 277 just processes the list of ints, that represent the waveform, to be in the form that discord expects them to be (values between 0-255 and a maximum of 255 values)
Since its just a list of ints, not sure if audioop would be applicable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Voice message support
5 participants