-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Add support to send voice messages #10230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is discussed here, and iirc, this was not planned at the moment, might be wrong though, worth checking: https://discord.com/channels/336642139381301249/1100398536920662117 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the code in this PR AI generated or something? Frankly I'd refuse to believe otherwise, due to seeing 3 different conventions/styles as well as a mish-mash of logic that doesn't really make sense.
I appreciate it's still in draft, but even so.
Discord has added the message flag for "IS_VOICE_MESSAGE" in their offical docs so I think we good? https://discord.com/developers/docs/resources/message#message-object-message-flags |
Bit of AI was used as I was stuggling to get it all working. Like how I imported |
Pull requests that use AI generated code do not get accepted due to the possible licensing implications. |
The only ai part is the part where |
Additionally I'm not sure where the library stands on sending "fake" data, we don't do it anywhere else so sending a fake waveform and fake size likely won't work. |
Fake size is a placeholder (as commented) so I can send stuff while testing, do plan adding actual code to the The waveform data is only used as a visual preview for how the message looks. I do plan on looking into if I can get the real waveform data, but the only code I've seen that sends a voice message (https://gist.github.com/max-4-3/6704422045b1c21a4a3e9226fca38f68) does the same thing that I am currently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a much simpler way of sending voice messages while looking through https://discord.com/channels/336642139381301249/1100398536920662117
Example code for sending a voice message is channel = await bot.fetch_channel(xxx)
file = discord.VoiceMessageFile("test.mp3", duration=21)
await channel.send(file=file, voice=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't make a whole new File class, and instead have voice
be an init parameter or a generated one (i.e. if duration
is provided, or similar) of File
and use this as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
A few nits though:
- Instead of passing
voice=True
at thesend
site, it should be at theFile
site instead because that's where it logically is. A voice message is nothing more than a fancy attachment that toggles a flag. - You should check whether multiple embedded audio attachments are supported per message.
- You're missing
.. versionadded:: 2.6
on the newFile
properties. - You're missing documentation for
duration
(which you typo'd asduation
a few times). - The waveform should be generated properly.
- More documentation is needed for this in general, such as what audio format is expected to be uploaded for it to be a voice message etc.
Thanks, didn't catch the typos there. Also added For points 5 and 6, discord accepts a wide range of audio formats. Found it to work with
|
Looking at the documentation you can probably make a valid waveform for the .ogg ones using the facilities in the library, if possible, otherwise you can fallback to the random waveform. |
From what i can see, the from discord.oggparse import OggStream
from discord.opus import Decoder
filename = "voice-message.ogg"
f = open(filename, 'rb')
ogg = OggStream(f)
decoder = Decoder()
for packet in ogg.iter_packets():
decoded = decoder.decode(packet) # discord.opus.OpusError: corrupted stream
print(decoded) Looked into another library called soundfile and this seems to be able to generate waveform for ogg, mp3 and wav. |
I've discussed this waveform idea in depth and it should be possible using the opus/ogg items the library provides (I will do some testing when I have the time, soon) and generating the correct and real waveform should be a fast process. However, this does mean we'll need to accept only |
Just had a deeper look into what each packet from my code example above has in it, and seems like the first 2 are metadata. If I just skip those 2 packets then decoder works fine all the way through. from discord.oggparse import OggStream
from discord.opus import Decoder
filename = "voice-message.ogg"
f = open(filename, 'rb')
ogg = OggStream(f)
decoder = Decoder()
count = 0
for packet in ogg.iter_packets():
if count < 2:
count += 1
continue
decoded: bytes = decoder.decode(packet) # Gives actual bytes now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My personal thoughts, I think this is a bit of a tough one to approach.
discord/file.py
Outdated
try: | ||
self._waveform = self.generate_waveform() | ||
except Exception: | ||
self._waveform = base64.b64encode(os.urandom(256)).decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I'm a big fan of this caveat, especially not as a catch-all except case. I think us handling audio extraction for anything outside of opus 'in house' is pretty far out of scope, but this feels like a design 'lock-in' that could prevent people who have the means to do the waveform generation themselves from doing so. Maybe the waveform property could have a setter, or maybe there could be a way to construct a File (or a specialized subclass) with a waveform provided e.g. File.voice_message_with_waveform(data, waveform)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can actually pass in your own waveform when creating a file
file = discord.File('voice-message.ogg', duration=5.0, waveform='AAAAA...')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that makes this less of an issue, but I still think we should avoid generating a fake waveform if we can, maybe we could just let the exception be raised instead or split generation and non-generation into classmethod-based signatures like suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem is that if someone passes in an mp3 file, we still need to generate a waveform. I don't see the point in adding extra steps for the user to generate a waveform dependant on if they have the correct audio type as that could be confusing
point_count: int = self.duration * 10 # type: ignore | ||
point_count = min(point_count, 255) | ||
points_per_sample: int = len(waveform) // point_count | ||
sample_waveform: list[int] = [] | ||
|
||
total, count = 0, 0 | ||
# Average out the amplitudes for each point within a sample | ||
for i in range(len(waveform)): | ||
total += waveform[i] | ||
count += 1 | ||
if i % points_per_sample == 0: | ||
sample_waveform.append(total // count) | ||
total, count = 0, 0 | ||
|
||
# Maximum value of a waveform is 0xff (255) | ||
highest = max(sample_waveform) | ||
mult = 255 / highest | ||
for i in range(len(sample_waveform)): | ||
sample_waveform[i] = int(sample_waveform[i] * mult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already rely on audioop
or audioop-lts
for newer versions of Python, which includes many audio operations implemented in C. I don't have a lot of reason to believe this operation is that slow, but I'm inclined to believe we can probably use audioop to make this simpler and faster to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
audioop is depricated in python 3.11 and removed since 3.13
https://docs.python.org/3/library/audioop.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already include audioop
for 3.13+ using audioop-lts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part of code you have commented this on doesn't decode or process the Opus data.
Lines 259 to 277 just processes the list of int
s, that represent the waveform, to be in the form that discord expects them to be (values between 0-255 and a maximum of 255 values)
Since its just a list of ints, not sure if audioop would be applicable?
Summary
Adds support for voice messags
closes #10165
Checklist