
Add support to send voice messages #10230


Open: wants to merge 28 commits into base: master

Changes from 24 commits

Commits (28):
4480fea  Add VoiceMessageFile class (Jul 16, 2025)
31344ca  Add abc.channel.send_voice_message to allow sending voice messages (Jul 16, 2025)
aa5c26f  Start work on sending the voice messages (not working) (Jul 16, 2025)
a7270b2  First working version of sending voice messages (Jul 16, 2025)
4a40683  Start cleaning up parts of the code (Jul 16, 2025)
9a4617e  More cleanup (Jul 16, 2025)
f9ca81a  Found a much simpler method to send voice messages (Jul 16, 2025)
eb62338  `size` method no longer needed (Jul 17, 2025)
27bca43  Remove unncessary code (Jul 17, 2025)
d1747b9  Doc fixes and made `duration` a required field (Jul 17, 2025)
2a96d13  Remove print statements (blord0, Jul 17, 2025)
8332ca3  Move `VoiceMessageFile` into `File` (Jul 17, 2025)
5231d51  Remove final reference to `VoiceMessageFile` (Jul 17, 2025)
2e6bfd3  Merge branch 'Rapptz:master' into voice-messages (blord0, Jul 18, 2025)
60030d8  Add error checking (Jul 18, 2025)
1d2ab9c  Fix error checking (Jul 18, 2025)
50cb4f6  Merge branch 'Rapptz:master' into voice-messages (blord0, Jul 23, 2025)
3dd7f8f  Rename duation to duration (Jul 28, 2025)
e5cca7d  Add File.voice attribute (Jul 28, 2025)
8f1d548  Change checking for voice messages to use File.voice (Jul 28, 2025)
9936b0d  Formatting change (Jul 28, 2025)
8bc906e  Add real generation of waveforms for Opus files (Jul 28, 2025)
dd2fd33  Formatting (Jul 28, 2025)
bb4de89  Calculate correct number of points per sample (Jul 28, 2025)
0f3bc42  Change TypeError to ValueError (Jul 28, 2025)
8bea5c3  Change waveform data to be input as a list of ints (Jul 29, 2025)
394b16e  Fix doc issues (Jul 29, 2025)
0f1ded6  Merge branch 'Rapptz:master' into voice-messages (blord0, Jul 29, 2025)
discord/abc.py: 11 changes (10 additions, 1 deletion)

@@ -74,7 +74,7 @@
 T = TypeVar('T', bound=VoiceProtocol)

 if TYPE_CHECKING:
     from typing_extensions import Self

Check warning on line 77 in discord/abc.py (GitHub Actions / check 3.x):
Import "typing_extensions" could not be resolved from source (reportMissingModuleSource)

     from .client import Client
     from .user import ClientUser
@@ -1624,12 +1624,21 @@
         if view and not hasattr(view, '__discord_ui_view__'):
             raise TypeError(f'view parameter must be View not {view.__class__.__name__}')

-        if suppress_embeds or silent:
+        voice = False
+        if file is not None and file.voice:
+            if content is not None:
+                raise TypeError('Cannot send content with a voice message')
+            if embed is not None or embeds is not None:
+                raise TypeError('Cannot send embeds with a voice message')
+            voice = True
+
+        if suppress_embeds or silent or voice:
             from .message import MessageFlags  # circular import

             flags = MessageFlags._from_value(0)
             flags.suppress_embeds = suppress_embeds
             flags.suppress_notifications = silent
+            flags.voice = voice
         else:
             flags = MISSING

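Taken on its own, the new branch in `send()` reduces to two rules: a voice file rules out `content` and embeds, and it adds the voice bit to the message flags. The sketch below restates those rules outside discord.py. `resolve_send_flags` is a hypothetical helper (not discord.py API) that folds `embed`/`embeds` into a single `embeds` argument, and the bit values are assumed from Discord's message-flag table (SUPPRESS_EMBEDS = 1 << 2, SUPPRESS_NOTIFICATIONS = 1 << 12, IS_VOICE_MESSAGE = 1 << 13).

```python
from typing import Optional, Sequence

# Assumed Discord message flag bits (from the Discord API message-flag table).
SUPPRESS_EMBEDS = 1 << 2
SUPPRESS_NOTIFICATIONS = 1 << 12
IS_VOICE_MESSAGE = 1 << 13


def resolve_send_flags(
    *,
    content: Optional[str] = None,
    embeds: Optional[Sequence[object]] = None,
    is_voice_file: bool = False,
    suppress_embeds: bool = False,
    silent: bool = False,
) -> Optional[int]:
    """Hypothetical helper: return the raw flags value, or None when no flag is needed."""
    if is_voice_file:
        # A voice message may only carry the audio attachment itself.
        if content is not None:
            raise TypeError('Cannot send content with a voice message')
        if embeds is not None:
            raise TypeError('Cannot send embeds with a voice message')

    if not (suppress_embeds or silent or is_voice_file):
        return None  # plays the role of `flags = MISSING` in the diff

    value = 0
    if suppress_embeds:
        value |= SUPPRESS_EMBEDS
    if silent:
        value |= SUPPRESS_NOTIFICATIONS
    if is_voice_file:
        value |= IS_VOICE_MESSAGE
    return value
```

For example, a silent voice message would carry 4096 | 8192 = 12288. As in the diff, only the single `file` argument is inspected when deciding whether a message is a voice message.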
discord/file.py: 123 changes (122 additions, 1 deletion)

@@ -27,6 +27,10 @@

 import os
 import io
+import base64
+from .oggparse import OggStream
+from .opus import Decoder
+import struct

 from .utils import MISSING

@@ -75,9 +79,37 @@ class File:
         The file description to display, currently only supported for images.

         .. versionadded:: 2.0
+    voice: :class:`bool`
+        Whether the file is a voice message. If left unspecified, the :attr:`~File.duration` is used
+        to determine if the file is a voice message.
+
+        .. note::
+
+            Voice files must be an audio only format.
+
+            A *non-exhaustive* list of supported formats are: `ogg`, `mp3`, `wav`, `aac`, and `flac`.
+
+        .. versionadded:: 2.6
+    duration: Optional[:class:`float`]
+        The duration of the voice message in seconds
+
+        .. versionadded:: 2.6
     """

-    __slots__ = ('fp', '_filename', 'spoiler', 'description', '_original_pos', '_owner', '_closer')
+    __slots__ = (
+        'fp',
+        '_filename',
+        'spoiler',
+        'description',
+        '_original_pos',
+        '_owner',
+        '_closer',
+        'duration',
+        '_waveform',
+        'voice',
+    )

     def __init__(
         self,
@@ -86,6 +118,9 @@ def __init__(
         *,
         spoiler: bool = MISSING,
         description: Optional[str] = None,
+        voice: bool = MISSING,
+        duration: Optional[float] = None,
+        waveform: Optional[str] = None,
     ):
         if isinstance(fp, io.IOBase):
             if not (fp.seekable() and fp.readable()):
@@ -117,6 +152,15 @@ def __init__(

         self.spoiler: bool = spoiler
         self.description: Optional[str] = description
+        self.duration = duration
+        self._waveform = waveform
+
+        if voice is MISSING:
+            voice = duration is not None
+        self.voice = voice
+
+        if duration is None and voice:
+            raise TypeError('Voice messages must have a duration')

     @property
     def filename(self) -> str:
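In the constructor, `duration` doubles as the voice switch. Here is a small sketch of just that rule. It uses a local `MISSING` sentinel as a stand-in for `discord.utils.MISSING`, and `resolve_voice` is a hypothetical name used only for illustration:

```python
from typing import Any, Optional

# Local stand-in for discord.utils.MISSING (assumption for this sketch).
MISSING: Any = object()


def resolve_voice(voice: Any = MISSING, duration: Optional[float] = None) -> bool:
    """Hypothetical helper mirroring the diff: infer `voice` from `duration`, then validate."""
    if voice is MISSING:
        # No explicit choice: supplying a duration marks the file as a voice message.
        voice = duration is not None
    if voice and duration is None:
        raise TypeError('Voice messages must have a duration')
    return bool(voice)
```

So `duration` alone opts a file into being a voice message, `voice=False` keeps a timed file as a plain attachment, and `voice=True` without a duration fails at construction time rather than at upload.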
@@ -126,6 +170,24 @@ def filename(self) -> str:
         """
         return 'SPOILER_' + self._filename if self.spoiler else self._filename

+    @property
+    def waveform(self) -> str:
+        """:class:`str`: The waveform data for the voice message.
+
+        .. note::
+            If a waveform was not given, it will be generated
+
+            Only supports generating the waveform for Opus format files, other files will be given a random waveform
+
+        .. versionadded:: 2.6"""
+        if self._waveform is None:
+            try:
+                self._waveform = self.generate_waveform()
+            except Exception:
+                self._waveform = base64.b64encode(os.urandom(256)).decode('utf-8')
Contributor:
I'm not sure I'm a big fan of this caveat, especially not as a catch-all except case. I think handling audio extraction for anything outside of Opus 'in house' is pretty far out of scope, but this feels like a design 'lock-in' that could prevent people who have the means to do the waveform generation themselves from doing so. Maybe the waveform property could have a setter, or maybe there could be a way to construct a File (or a specialized subclass) with a waveform provided, e.g. File.voice_message_with_waveform(data, waveform).

Contributor Author:
You can actually pass in your own waveform when creating a file:
file = discord.File('voice-message.ogg', duration=5.0, waveform='AAAAA...')

Contributor:
Hmm, that makes this less of an issue, but I still think we should avoid generating a fake waveform if we can. Maybe we could just let the exception be raised instead, or split generation and non-generation into classmethod-based signatures as suggested.

Contributor Author:
The problem is that if someone passes in an mp3 file, we still need to generate a waveform. I don't see the point in adding extra steps that require users to generate a waveform depending on whether they have the correct audio type, as that could be confusing.
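One possible shape for the classmethod-based split suggested above makes generation failures surface to the caller instead of falling back to `os.urandom` data. `VoiceClip`, `with_waveform`, and `generated` are hypothetical names, not discord.py API. The `generator` callable stands in for something like `File.generate_waveform`, and the setter covers the other option the reviewer mentions.

```python
import base64
from typing import Callable, List, Optional, Sequence


class VoiceClip:
    """Hypothetical stand-in for discord.File that models only the waveform handling."""

    def __init__(self, data: bytes, duration: float, waveform: Optional[str] = None) -> None:
        self.data = data
        self.duration = duration
        self._waveform = waveform

    @classmethod
    def with_waveform(cls, data: bytes, duration: float, amplitudes: Sequence[int]) -> 'VoiceClip':
        """Build a clip from caller-supplied amplitudes; nothing is generated or guessed."""
        if any(not 0 <= a <= 255 for a in amplitudes):
            raise ValueError('waveform values must be between 0 and 255')
        encoded = base64.b64encode(bytes(amplitudes)).decode('ascii')
        return cls(data, duration, encoded)

    @classmethod
    def generated(
        cls,
        data: bytes,
        duration: float,
        generator: Callable[[bytes, float], List[int]],
    ) -> 'VoiceClip':
        """Build a clip by running `generator`; its errors propagate (no random fallback)."""
        return cls.with_waveform(data, duration, generator(data, duration))

    @property
    def waveform(self) -> Optional[str]:
        return self._waveform

    @waveform.setter
    def waveform(self, value: str) -> None:
        # Lets callers replace the waveform after construction.
        self._waveform = value
```

The trade-off the author raises still applies: with this shape, a caller sending an mp3 must supply a waveform (or a generator) explicitly.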

+            self.reset()
+        return self._waveform
+
     @filename.setter
     def filename(self, value: str) -> None:
         self._filename, self.spoiler = _strip_spoiler(value)
@@ -156,4 +218,63 @@ def to_dict(self, index: int) -> Dict[str, Any]:
         if self.description is not None:
             payload['description'] = self.description

+        if self.voice:
+            payload['duration_secs'] = self.duration
+            payload['waveform'] = self.waveform
+
         return payload
+
+    def generate_waveform(self) -> str:
+        if not self.voice:
+            raise TypeError("Cannot produce waveform for non voice file")
+        self.reset()
+        ogg = OggStream(self.fp)  # type: ignore
+        decoder = Decoder()
+        waveform: list[int] = []
+        prefixes = [b'OpusHead', b'OpusTags']
+        for packet in ogg.iter_packets():
+            if packet[:8] in prefixes:
+                continue
+
+            if b'vorbis' in packet:
+                raise TypeError("File format is 'vorbis'. Format of 'opus' is required for waveform generation")
+
+            # these are PCM bytes in 16-bit signed little-endian form
+            decoded = decoder.decode(packet, fec=False)
+
+            # 16 bits -> 2 bytes per sample
+            num_samples = len(decoded) // 2
+
+            # https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment
+            format = '<' + 'h' * num_samples
+            samples: tuple[int] = struct.unpack(format, decoded)
+
+            waveform.extend(samples)
+
+        # Make sure all values are positive
+        for i in range(len(waveform)):
+            if waveform[i] < 0:
+                waveform[i] = -waveform[i]
+
+        point_count: int = self.duration * 10  # type: ignore
+        point_count = min(point_count, 255)
+        points_per_sample: int = len(waveform) // point_count
+        sample_waveform: list[int] = []
+
+        total, count = 0, 0
+        # Average out the amplitudes for each point within a sample
+        for i in range(len(waveform)):
+            total += waveform[i]
+            count += 1
+            if i % points_per_sample == 0:
+                sample_waveform.append(total // count)
+                total, count = 0, 0
+
+        # Maximum value of a waveform is 0xff (255)
+        highest = max(sample_waveform)
+        mult = 255 / highest
+        for i in range(len(sample_waveform)):
+            sample_waveform[i] = int(sample_waveform[i] * mult)
Comment on lines +266 to +284

Contributor:
We already rely on audioop or audioop-lts for newer versions of Python, which includes many audio operations implemented in C. I don't have a lot of reason to believe this operation is that slow, but I'm inclined to believe we can use audioop to make this simpler and faster.

Contributor Author (blord0, Jul 28, 2025):
audioop was deprecated in Python 3.11 and removed in 3.13:
https://docs.python.org/3/library/audioop.html

Owner:
We already include audioop for 3.13+ using audioop-lts.

Contributor Author:
The part of the code you commented on doesn't decode or process the Opus data.
Lines 259 to 277 just process the list of ints that represents the waveform, putting it into the form Discord expects (values between 0 and 255, and a maximum of 255 values).
Since it's just a list of ints, I'm not sure audioop would be applicable?
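For reference, the list-of-ints step discussed in this thread (absolute amplitudes in, at most 255 values in 0-255 out) can be written as a standalone function. This is a sketch rather than the PR's code, and `downsample_waveform` is a hypothetical name. It differs from the diff on purpose in three ways. First, `point_count` stays an `int` (the diff multiplies the float `duration`). Second, bucket edges use integer arithmetic, so the trailing partial bucket is not dropped and no one-sample bucket is emitted at `i == 0`. Third, silent or very short input cannot cause a division by zero.

```python
import base64
from typing import List, Sequence


def downsample_waveform(amplitudes: Sequence[int], duration: float, max_points: int = 255) -> str:
    """Average |amplitude| into at most max_points buckets, scale to 0-255, base64-encode."""
    if not amplitudes:
        raise ValueError('no samples to build a waveform from')

    n = len(amplitudes)
    # One point per 100 ms as in the PR, kept an int, capped, and never more points than samples.
    point_count = max(1, min(int(duration * 10), max_points, n))

    points: List[int] = []
    for idx in range(point_count):
        # Integer bucket boundaries spread any remainder evenly, so no samples are dropped.
        start = idx * n // point_count
        end = (idx + 1) * n // point_count
        bucket = amplitudes[start:end]
        points.append(sum(abs(a) for a in bucket) // len(bucket))

    highest = max(points)
    if highest == 0:
        scaled = [0] * point_count  # silence: avoid dividing by zero
    else:
        scaled = [value * 255 // highest for value in points]
    return base64.b64encode(bytes(scaled)).decode('ascii')
```

Where `audioop` (or `audioop-lts` on 3.13+) is available, calling `audioop.rms(chunk, 2)` on each bucket of raw PCM bytes would be one way to push this work into C, as suggested in the thread. That variant is not shown here.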


+        print(len(sample_waveform))
+        return base64.b64encode(bytes(sample_waveform)).decode('utf-8')
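The first half of `generate_waveform` (unpacking 16-bit little-endian PCM and flipping negative samples) can be expressed more compactly with a `struct` repeat count and `abs()`. This is a sketch: `pcm16le_to_amplitudes` is a hypothetical helper, and its output is the kind of list of ints that the bucketing step consumes.

```python
import struct
from typing import List


def pcm16le_to_amplitudes(pcm: bytes) -> List[int]:
    """Decode signed 16-bit little-endian PCM into absolute sample values."""
    count = len(pcm) // 2
    # f'<{count}h' is the repeat-count form of '<' + 'h' * count.
    # unpack_from ignores a trailing odd byte instead of raising.
    samples = struct.unpack_from(f'<{count}h', pcm)
    return [abs(sample) for sample in samples]
```

`struct.unpack_from` tolerates a trailing odd byte, whereas `struct.unpack` raises `struct.error` whenever the buffer length does not match the format exactly.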