torchaudio.backend¶

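Overview¶

The torchaudio.backend module provides implementations of the audio file I/O functions: torchaudio.info, torchaudio.load, torchaudio.load_wav and torchaudio.save.

There are currently four implementations available.

"sox" (deprecated, default on Linux/macOS)
"sox_io" (default on Linux/macOS from the 0.8.0 release)
"soundfile" - legacy interface (deprecated, default on Windows)
"soundfile" - new interface (default on Windows from the 0.8.0 release)

On Windows, only the "soundfile" backend (with both interfaces) is available. The new interface is recommended because the legacy interface is deprecated.

On Linux/macOS, please use the "sox_io" backend. Use of the "sox" backend is strongly discouraged because it cannot correctly handle formats other than 16-bit integer WAV. See #726 for details.

Note

Instead of calling functions in torchaudio.backend directly, please use torchaudio.info, torchaudio.load, torchaudio.load_wav and torchaudio.save, with the proper backend set through torchaudio.set_audio_backend().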
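
The platform guidance above can be sketched as a small helper. This is an illustration only, not part of torchaudio: pick_backend and configure_backend are hypothetical names, and the sketch assumes a torchaudio release that still provides torchaudio.set_audio_backend() and the USE_SOUNDFILE_LEGACY_INTERFACE flag. The torchaudio import is deferred so the selection logic can be read and checked on its own.

```python
import platform


def pick_backend(system=None):
    """Return the backend name recommended above for the given OS name."""
    system = system or platform.system()
    # Only "soundfile" is available on Windows; "sox_io" is preferred elsewhere.
    return "soundfile" if system == "Windows" else "sox_io"


def configure_backend(system=None):
    """Switch torchaudio to the recommended backend (hypothetical helper)."""
    import torchaudio  # deferred so pick_backend() works without torchaudio

    backend = pick_backend(system)
    if backend == "soundfile":
        # Opt in to the new interface; the flag must be set before switching.
        torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
    torchaudio.set_audio_backend(backend)
    return backend
```

After configure_backend() has run, the public functions such as torchaudio.load and torchaudio.info would dispatch to the selected backend.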

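Availability¶

The "sox" and "sox_io" backends require a C++ extension module, which is included in the Linux/macOS binary distributions. These backends are not available on Windows.

The "soundfile" backend requires SoundFile. Please refer to the SoundFile documentation for installation instructions.

Changes in default backend and deprecation¶

The backend module is going through a major overhaul. The following table summarizes the timeline for the changes and deprecations.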

Backend                                      0.7.0                   0.8.0                   0.9.0
"sox" (deprecated)                           Default on Linux/macOS  Available               Removed
"sox_io"                                     Available               Default on Linux/macOS  Default on Linux/macOS
"soundfile" (legacy interface, deprecated)   Default on Windows      Available               Removed
"soundfile" (new interface)                  Available               Default on Windows      Default on Windows

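The default backend on Linux/macOS changes from "sox" to "sox_io" in the 0.8.0 release.
The "sox" backend will be removed in the 0.9.0 release.
Starting with the 0.8.0 release, the "soundfile" backend uses the new interface, which is the same as the interface of the "sox_io" backend. The legacy interface will be removed in the 0.9.0 release.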
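
For code that has to run against several releases, the timeline can be written down as data. The following is a minimal sketch that transcribes the table above; DEFAULT_BACKEND, REMOVED_IN, default_backend and the "linux_macos"/"windows" keys are hypothetical names chosen for illustration and are not part of torchaudio.

```python
# Default backend per release and platform family, transcribed from the table.
DEFAULT_BACKEND = {
    "0.7.0": {"linux_macos": "sox", "windows": "soundfile (legacy interface)"},
    "0.8.0": {"linux_macos": "sox_io", "windows": "soundfile (new interface)"},
    "0.9.0": {"linux_macos": "sox_io", "windows": "soundfile (new interface)"},
}

# Release in which each deprecated implementation is removed.
REMOVED_IN = {
    "sox": "0.9.0",
    "soundfile (legacy interface)": "0.9.0",
}


def default_backend(release, platform_family):
    """Look up the default backend for a release and platform family."""
    return DEFAULT_BACKEND[release][platform_family]
```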

Common Data Structure¶

Structures used to report the metadata of audio files.

AudioMetaData¶

class torchaudio.backend.common.AudioMetaData(sample_rate: int, num_frames: int, num_channels: int)[source]¶

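Return type of the torchaudio.info function.

This class is used by the "sox_io" backend and by the "soundfile" backend with the new interface.

Variables

sample_rate (int) – Sample rate
num_frames (int) – The number of frames
num_channels (int) – The number of channels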
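
As an illustration of how these three fields combine, the sketch below derives the duration and the total sample count. MetaData is a hypothetical stand-in with the same field names as AudioMetaData, used here so the example runs without torchaudio; the helper names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class MetaData:
    """Hypothetical stand-in with the same three fields as AudioMetaData."""
    sample_rate: int
    num_frames: int
    num_channels: int


def duration_seconds(meta):
    """Length of the audio in seconds: frames divided by frames per second."""
    return meta.num_frames / meta.sample_rate


def total_samples(meta):
    """Number of individual sample values across all channels."""
    return meta.num_frames * meta.num_channels


meta = MetaData(sample_rate=16000, num_frames=32000, num_channels=2)
print(duration_seconds(meta))  # 2.0
print(total_samples(meta))     # 64000
```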

SignalInfo (Deprecated)¶

class torchaudio.backend.common.SignalInfo[source]¶

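One of the return types of the torchaudio.info function.

This class is used by the "sox" backend (deprecated) and by the "soundfile" backend with the legacy interface (deprecated).

See https://fossies.org/dox/sox-14.4.2/structsox__signalinfo__t.html

Variables

channels (Optional[int]) – The number of channels
rate (Optional[float]) – Sampling rate
precision (Optional[int]) – Bit depth
length (Optional[int]) – For the sox backend, the number of samples (frames * channels). For the soundfile backend, the number of frames.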
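
Because length means different things under the two legacy backends, code that needs a frame count has to branch on the backend. A minimal sketch of that conversion follows; frames_from_length is a hypothetical helper, not a torchaudio function.

```python
def frames_from_length(length, channels, backend):
    """Convert SignalInfo.length to a frame count for a legacy backend."""
    if backend == "sox":
        # "sox" reports the total number of samples, i.e. frames * channels.
        return length // channels
    if backend == "soundfile":
        # The legacy "soundfile" interface already reports frames.
        return length
    raise ValueError(f"unsupported backend: {backend!r}")
```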

EncodingInfo (Deprecated)¶

class torchaudio.backend.common.EncodingInfo[source]¶

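One of the return types of the torchaudio.info function.

This class is used by the "sox" backend (deprecated) and by the "soundfile" backend with the legacy interface (deprecated).

See https://fossies.org/dox/sox-14.4.2/structsox__encodinginfo__t.html

Variables

encoding (Optional[int]) – sox_encoding_t
bits_per_sample (Optional[int]) – Bit depth
compression (Optional[float]) – Compression option
reverse_bytes (Any) –
reverse_nibbles (Any) –
reverse_bits (Any) –
opposite_endian (Optional[bool]) –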

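Sox Backend (Deprecated)¶

The "sox" backend is available on Linux/macOS and not available on Windows. This backend is currently the default when available, but it is deprecated and will be removed in the 0.9.0 release.

You can switch from another backend to the sox backend with the following: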

torchaudio.set_audio_backend("sox")

info¶

torchaudio.backend.sox_backend.info(filepath: str) → Tuple[torchaudio.backend.common.SignalInfo, torchaudio.backend.common.EncodingInfo][source]¶

Gets metadata from an audio file without loading the signal.

Parameters

filepath – Path to the audio file

Returns

si (sox_signalinfo_t): signal info as a Python object. ei (sox_encodinginfo_t): encoding info.

Return type

(sox_signalinfo_t, sox_encodinginfo_t)

Example

>>> si, ei = torchaudio.info('foo.wav')
>>> rate, channels, encoding = si.rate, si.channels, ei.encoding

load¶

torchaudio.backend.sox_backend.load(filepath: str, out: Optional[torch.Tensor] = None, normalization: bool = True, channels_first: bool = True, num_frames: int = 0, offset: int = 0, signalinfo: torchaudio.backend.common.SignalInfo = None, encodinginfo: torchaudio.backend.common.EncodingInfo = None, filetype: Optional[str] = None) → Tuple[torch.Tensor, int][source]¶

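Loads an audio file from disk into a tensor.

Parameters

filepath – Path to the audio file
out – An optional output tensor to use instead of creating one. (Default: None)
normalization – Optional normalization. If boolean True, the output is divided by 1 << 31. Assuming the input is signed 32-bit audio, this normalizes it to [-1, 1]. If float, the output is divided by that number. If Callable, the output is passed as a parameter to the given function, and the output is then divided by the result. (Default: True)
channels_first – Set channels first or length first in the result. (Default: True)
num_frames – Number of frames to load. 0 loads everything after the offset. (Default: 0)
offset – Number of frames from the start of the file at which to begin loading data. (Default: 0)
signalinfo – A sox_signalinfo_t type, which can help if the audio type cannot be determined automatically. (Default: None)
encodinginfo – A sox_encodinginfo_t type, which can be set if the audio type cannot be determined automatically. (Default: None)
filetype – A file type or extension to set if sox cannot determine it automatically. (Default: None)

Returns

An output tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).

Return type

(Tensor, int)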

Example

>>> data, sample_rate = torchaudio.load('foo.mp3')
>>> print(data.size())
torch.Size([2, 278756])
>>> print(sample_rate)
44100
>>> data_vol_normalized, _ = torchaudio.load('foo.mp3', normalization=lambda x: torch.abs(x).max())
>>> print(data_vol_normalized.abs().max())
1.
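
The three accepted forms of the normalization argument can be imitated on a plain list of numbers. The sketch below only mirrors the rules stated in the parameter description; apply_normalization is a hypothetical helper, and treating False as "leave the samples unchanged" is an assumption rather than something the description states.

```python
def apply_normalization(samples, normalization=True):
    """Mimic the documented `normalization` rules on a list of numbers."""
    if callable(normalization):
        # The samples are passed to the callable; its result is the divisor.
        divisor = normalization(samples)
    elif normalization is True:
        # Full scale of signed 32-bit audio.
        divisor = 1 << 31
    elif normalization is False:
        # Assumption: False leaves the samples untouched.
        return list(samples)
    else:
        # A plain number is used directly as the divisor.
        divisor = float(normalization)
    return [s / divisor for s in samples]


peak = lambda xs: max(abs(x) for x in xs)
print(apply_normalization([1 << 30, -(1 << 31)]))  # [0.5, -1.0]
print(apply_normalization([2, -4], peak))          # [0.5, -1.0]
```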

torchaudio.backend.sox_backend.load_wav(filepath, **kwargs)[source]¶

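Loads a wave file.

It assumes that the wav file uses 16 bits per sample and that the input needs to be normalized by shifting it right by 16 bits.

Parameters

filepath – Path to the audio file

Returns

An output tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).

Return type

(Tensor, int)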
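
The 16-bit assumption can be illustrated with plain integers. The sketch assumes, as the description implies, that each 16-bit sample arrives in the upper half of a signed 32-bit value; the helper names are hypothetical and the code is not taken from the backend.

```python
def to_32bit_container(sample16):
    """Place a signed 16-bit sample in the upper 16 bits of a 32-bit value."""
    return sample16 << 16


def from_32bit_container(sample32):
    """Undo the scaling with an arithmetic right shift by 16 bits."""
    return sample32 >> 16


for s in (-32768, -1, 0, 1, 32767):
    # The round trip recovers the original 16-bit value, including negatives.
    assert from_32bit_container(to_32bit_container(s)) == s
```

For values produced this way, the shift gives the same result as dividing by 1 << 16.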

save¶

torchaudio.backend.sox_backend.save(filepath: str, src: torch.Tensor, sample_rate: int, precision: int = 16, channels_first: bool = True) → None[source]¶

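Saves a tensor to an audio file.

Parameters

filepath – Path to the audio file
src – An input 2D tensor of shape [C x L] or [L x C], where L is the number of audio frames and C is the number of channels
sample_rate – An integer giving the sample rate of the audio (as listed in the metadata of the file)
precision – Bit precision (Default: 16)
channels_first (bool, optional) – Set channels first or length first in the result. (Default: True)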

others¶

torchaudio.backend.sox_backend.get_sox_bool(i: int = 0) → Any[source]¶

Gets the sox_bool enum for sox encodinginfo options.

Parameters: i (int, optional) – Choose the type, or leave it unspecified and use __members__ to get a dict of all possible options. (Default: sox_false or 0)
Returns: A sox_bool type
Return type: sox_bool

torchaudio.backend.sox_backend.get_sox_encoding_t(i: int = None) → torchaudio.backend.common.EncodingInfo[source]¶

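Gets the sox_encoding_t enum for sox encodings.

Parameters: i (int, optional) – Choose the type, or leave it unspecified and use __members__ to get a dict of all possible options. (Default: None)
Returns: A sox_encoding_t type for the output encoding
Return type: sox_encoding_t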

torchaudio.backend.sox_backend.get_sox_option_t(i: int = 2) → Any[source]¶

Gets the sox_option_t enum for sox encodinginfo options.

Parameters: i (int, optional) – Choose the type, or leave it unspecified and use __members__ to get a dict of all possible options. (Default: sox_option_default or 2)
Returns: A sox_option_t type
Return type: sox_option_t

torchaudio.backend.sox_backend.save_encinfo(filepath: str, src: torch.Tensor, channels_first: bool = True, signalinfo: Optional[torchaudio.backend.common.SignalInfo] = None, encodinginfo: Optional[torchaudio.backend.common.EncodingInfo] = None, filetype: Optional[str] = None) → None[source]¶

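Saves a tensor of an audio signal to disk in a standard format such as mp3 or wav.

Parameters

filepath (str) – Path to the audio file
src (Tensor) – An input 2D tensor of shape [C x L] or [L x C], where L is the number of audio frames and C is the number of channels
channels_first (bool, optional) – Set channels first or length first in the result. (Default: True)
signalinfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which can help if the audio type cannot be determined automatically. (Default: None)
encodinginfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which can be set if the audio type cannot be determined automatically. (Default: None)
filetype (str, optional) – A file type or extension to set if sox cannot determine it automatically. (Default: None)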

Example

>>> data, sample_rate = torchaudio.load('foo.mp3')
>>> torchaudio.save('foo.wav', data, sample_rate)

torchaudio.backend.sox_backend.sox_encodinginfo_t() → torchaudio.backend.common.EncodingInfo[source]¶

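Creates a sox_encodinginfo_t object. This object can be used to set the encoding type, bit precision, compression factor, reverse bytes, reverse nibbles, reverse bits and endianness. It can be used in an effects chain to encode the final output, or to save a file with a specific encoding. For example, the sox ulaw encoding can be used for 8-bit ulaw encoding. Note that in a tensor output the result will be 32-bit numbers, but the number of unique values is determined by the bit precision.

Returns: sox_encodinginfo_t(object)

encoding (sox_encoding_t), output encoding
bits_per_sample (int), bit precision, same as precision in sox_signalinfo_t
compression (float), compression for lossy formats, 0.0 for default compression
reverse_bytes (sox_option_t), reverse bytes, use sox_option_default
reverse_nibbles (sox_option_t), reverse nibbles, use sox_option_default
reverse_bits (sox_option_t), reverse bits, use sox_option_default
opposite_endian (sox_bool), change endianness, use sox_false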

Example

>>> ei = torchaudio.sox_encodinginfo_t()
>>> ei.encoding = torchaudio.get_sox_encoding_t(1)
>>> ei.bits_per_sample = 16
>>> ei.compression = 0
>>> ei.reverse_bytes = torchaudio.get_sox_option_t(2)
>>> ei.reverse_nibbles = torchaudio.get_sox_option_t(2)
>>> ei.reverse_bits = torchaudio.get_sox_option_t(2)
>>> ei.opposite_endian = torchaudio.get_sox_bool(0)

torchaudio.backend.sox_backend.sox_signalinfo_t() → torchaudio.backend.common.SignalInfo[source]¶

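Creates a sox_signalinfo_t object. This object can be used to set the sample rate, number of channels, length, bit precision and headroom multiplier, primarily for effects.

Returns: sox_signalinfo_t(object)

rate (float), sample rate as a float; in practice it will likely be a whole number stored as a float
channels (int), number of audio channels
precision (int), bit precision
length (int), length of the audio in samples * channels, 0 for unspecified and -1 for unknown
mult (float, optional), headroom multiplier for effects, None for no multiplier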

Example

>>> si = torchaudio.sox_signalinfo_t()
>>> si.channels = 1
>>> si.rate = 16000.
>>> si.precision = 16
>>> si.length = 0

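Sox IO Backend¶

The "sox_io" backend is available on Linux/macOS and not available on Windows. This backend is recommended over the "sox" backend and will become the default in the 0.8.0 release.

The I/O functions of this backend support TorchScript.

You can switch from another backend to the sox_io backend with the following: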

torchaudio.set_audio_backend("sox_io")

info¶

torchaudio.backend.sox_io_backend.info(filepath: str) → torchaudio.backend.common.AudioMetaData[source]¶

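Gets the signal information of an audio file.

Parameters: filepath (str or pathlib.Path) – Path to the audio file. This function also handles pathlib.Path objects, but is annotated as str for TorchScript compatibility.
Returns: Metadata of the given audio.
Return type: AudioMetaData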

load¶

torchaudio.backend.sox_io_backend.load(filepath: str, frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

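Loads audio data from a file.

Note

This function can handle all the codecs that the underlying libsox can handle, but it has been tested on the following formats: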

WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
MP3
FLAC
OGG/VORBIS
OPUS
SPHERE

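To load MP3, FLAC, OGG/VORBIS, OPUS and other codecs that libsox does not handle natively, your installation of torchaudio has to be linked to libsox and the corresponding codec libraries, such as libmad or libmp3lame.

By default (normalize=True, channels_first=True), this function returns a Tensor with float32 dtype and shape [channel, time]. The samples are normalized to fit in the range [-1.0, 1.0].

When the input format is WAV with an integer type, such as 32-bit signed integer, 16-bit signed integer or 8-bit unsigned integer (24-bit signed integer is not supported), passing normalize=False makes this function return an integer Tensor in which the samples are expressed within the whole range of the corresponding dtype: an int32 tensor for 32-bit signed PCM, int16 for 16-bit signed PCM and uint8 for 8-bit unsigned PCM.

The normalize parameter has no effect on 32-bit floating-point WAV or on other formats such as flac and mp3. For these formats, this function always returns a float32 Tensor with values normalized to [-1.0, 1.0].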

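Parameters

filepath (str or pathlib.Path) – Path to the audio file. This function also handles pathlib.Path objects, but is annotated as str for TorchScript compiler compatibility.
frame_offset (int) – Number of frames to skip before starting to read data.
num_frames (int) – Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if the given file does not contain enough frames.
normalize (bool) – When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is integer WAV, passing False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor has dimension [time, channel].

Returns

The loaded audio Tensor and the sample rate. If the input file has integer WAV format and normalization is off, the Tensor has an integer type; otherwise it is float32. If channels_first=True, its shape is [channel, time]; otherwise it is [time, channel].

Return type

(torch.Tensor, int)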
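
The dtype rules above can be summarized as a small lookup. This is a sketch of the documented behavior only, not the backend's implementation: expected_dtype and its string arguments are hypothetical, and raising an error for 24-bit WAV with normalize=False is one reading of "not supported".

```python
def expected_dtype(file_format, encoding, bits, normalize=True):
    """Return the dtype name the rules above imply for the loaded tensor."""
    integer_wav = {
        ("signed", 32): "int32",
        ("signed", 16): "int16",
        ("unsigned", 8): "uint8",
    }
    if file_format == "wav" and not normalize:
        if (encoding, bits) == ("signed", 24):
            # Assumption: the text lists 24-bit signed integer as unsupported.
            raise ValueError("24-bit signed integer WAV is not supported")
        if (encoding, bits) in integer_wav:
            return integer_wav[(encoding, bits)]
    # normalize=True, 32-bit float WAV and non-WAV formats all give float32.
    return "float32"


print(expected_dtype("wav", "signed", 16, normalize=False))  # int16
print(expected_dtype("mp3", "signed", 16, normalize=False))  # float32
```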

torchaudio.backend.sox_io_backend.load_wav(filepath: str, frame_offset: int = 0, num_frames: int = -1, channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

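Loads a wave file.

This function is defined only for compatibility with other backends, for simple use cases such as torchaudio.load_wav(filepath). The implementation is the same as load().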

save¶

torchaudio.backend.sox_io_backend.save(filepath: str, src: torch.Tensor, sample_rate: int, channels_first: bool = True, compression: Optional[float] = None)[source]¶

Saves audio data to a file.

Note

Supported formats are:

WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
MP3
FLAC
OGG/VORBIS
SPHERE

To save MP3, FLAC, OGG/VORBIS and other codecs that libsox does not handle natively, your installation of torchaudio has to be linked to libsox and the corresponding codec libraries, such as libmad or libmp3lame.

Parameters

filepath (str or pathlib.Path) – Path to the saved file. This function also handles pathlib.Path objects, but is annotated as str for TorchScript compiler compatibility.
src (torch.Tensor) – Audio data to save. Must be a 2D tensor.
sample_rate (int) – Sampling rate
channels_first (bool) – If True, the given tensor is interpreted as [channel, time], otherwise as [time, channel].
compression (Optional[float]) – Used for formats other than WAV. This corresponds to the -C option of the sox command.
- MP3: Either bitrate (in kbps) with quality factor, such as 128.2, or VBR encoding with quality factor, such as -4.2. Default: -4.5.
- FLAC: Compression level. Whole number from 0 to 8. 8 is the default and the highest compression.
- OGG/VORBIS: Number from -1 to 10; -1 is the highest compression and lowest quality. Default: 3.
See http://sox.sourceforge.net/soxformat.html for details.
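
The per-format defaults and ranges listed for compression can be checked before calling save. The following is a minimal sketch under the documented values only; resolve_compression and DEFAULT_COMPRESSION are hypothetical names, the format keys are illustrative, and MP3 values are passed through unchecked because the text gives no range for them.

```python
# Defaults as listed in the parameter description above.
DEFAULT_COMPRESSION = {"mp3": -4.5, "flac": 8, "ogg": 3}


def resolve_compression(file_format, compression=None):
    """Return the compression value to use, validating documented ranges."""
    fmt = file_format.lower()
    if fmt == "wav":
        return None  # compression is not used for WAV
    if compression is None:
        return DEFAULT_COMPRESSION.get(fmt)
    if fmt == "flac":
        whole = float(compression).is_integer()
        if not (whole and 0 <= compression <= 8):
            raise ValueError("FLAC compression must be a whole number, 0 to 8")
    if fmt == "ogg" and not -1 <= compression <= 10:
        raise ValueError("OGG/VORBIS compression must be between -1 and 10")
    return compression
```

The resolved value could then be passed as the compression argument of torchaudio.save under the "sox_io" backend.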

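Soundfile Backend¶

The "soundfile" backend is available when SoundFile is installed. This backend is the default on Windows.

The "soundfile" backend has two interfaces, legacy and new.

In the 0.7.0 release, the legacy interface is used by default when switching to the "soundfile" backend.
In the 0.8.0 release, the new interface will become the default.
In the 0.9.0 release, the legacy interface will be removed.

To change the interface, set the torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE flag before switching the backend.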

torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = True
torchaudio.set_audio_backend("soundfile")  # The legacy interface

torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
torchaudio.set_audio_backend("soundfile")  # The new interface

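Legacy Interface (Deprecated)¶

The "soundfile" backend with the legacy interface is currently the default on Windows, but this interface is deprecated and will be removed in the 0.9.0 release.

To switch to this backend/interface, set the torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE flag before switching the backend.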

torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = True
torchaudio.set_audio_backend("soundfile")  # The legacy interface

info¶

torchaudio.backend.soundfile_backend.info(filepath: str) → Tuple[torchaudio.backend.common.SignalInfo, torchaudio.backend.common.EncodingInfo][source]¶

Gets metadata from an audio file without loading the signal.

Parameters

filepath – Path to the audio file

Returns

si (sox_signalinfo_t): signal info as a Python object. ei (sox_encodinginfo_t): encoding info.

Return type

(sox_signalinfo_t, sox_encodinginfo_t)

Example

>>> si, ei = torchaudio.info('foo.wav')
>>> rate, channels, encoding = si.rate, si.channels, ei.encoding

load¶

torchaudio.backend.soundfile_backend.load(filepath: str, out: Optional[torch.Tensor] = None, normalization: Optional[bool] = True, channels_first: Optional[bool] = True, num_frames: int = 0, offset: int = 0, signalinfo: torchaudio.backend.common.SignalInfo = None, encodinginfo: torchaudio.backend.common.EncodingInfo = None, filetype: Optional[str] = None) → Tuple[torch.Tensor, int][source]¶

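Loads an audio file from disk into a tensor.

Parameters

filepath – Path to the audio file
out – An optional output tensor to use instead of creating one. (Default: None)
normalization – Optional normalization. If boolean True, the output is divided by 1 << 31. Assuming the input is signed 32-bit audio, this normalizes it to [-1, 1]. If float, the output is divided by that number. If Callable, the output is passed as a parameter to the given function, and the output is then divided by the result. (Default: True)
channels_first – Set channels first or length first in the result. (Default: True)
num_frames – Number of frames to load. 0 loads everything after the offset. (Default: 0)
offset – Number of frames from the start of the file at which to begin loading data. (Default: 0)
signalinfo – A sox_signalinfo_t type, which can help if the audio type cannot be determined automatically. (Default: None)
encodinginfo – A sox_encodinginfo_t type, which can be set if the audio type cannot be determined automatically. (Default: None)
filetype – A file type or extension to set if sox cannot determine it automatically. (Default: None)

Returns

An output tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).

Return type

(Tensor, int)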

Example

>>> data, sample_rate = torchaudio.load('foo.mp3')
>>> print(data.size())
torch.Size([2, 278756])
>>> print(sample_rate)
44100
>>> data_vol_normalized, _ = torchaudio.load('foo.mp3', normalization=lambda x: torch.abs(x).max())
>>> print(data_vol_normalized.abs().max())
1.

torchaudio.backend.soundfile_backend.load_wav(filepath, **kwargs)[source]¶

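Loads a wave file.

It assumes that the wav file uses 16 bits per sample and that the input needs to be normalized by shifting it right by 16 bits.

Parameters

filepath – Path to the audio file

Returns

An output tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).

Return type

(Tensor, int)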

save¶

torchaudio.backend.soundfile_backend.save(filepath: str, src: torch.Tensor, sample_rate: int, precision: int = 16, channels_first: bool = True) → None[source]¶

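Saves a tensor to an audio file.

Parameters

filepath – Path to the audio file
src – An input 2D tensor of shape [C x L] or [L x C], where L is the number of audio frames and C is the number of channels
sample_rate – An integer giving the sample rate of the audio (as listed in the metadata of the file)
precision – Bit precision (Default: 16)
channels_first (bool, optional) – Set channels first or length first in the result. (Default: True)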

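New Interface¶

The "soundfile" backend with the new interface will become the default in the 0.8.0 release.

To switch to this backend/interface, set the torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE flag before switching the backend.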

torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
torchaudio.set_audio_backend("soundfile")  # The new interface

info¶

torchaudio.backend._soundfile_backend.info(filepath: str) → torchaudio.backend.common.AudioMetaData[source]¶

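Gets the signal information of an audio file.

Parameters: filepath (str or pathlib.Path) – Path to the audio file. This function also handles pathlib.Path objects, but is annotated as str for consistency with the "sox_io" backend and for TorchScript compiler compatibility.
Returns: Metadata of the given audio.
Return type: AudioMetaData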

load¶

torchaudio.backend._soundfile_backend.load(filepath: str, frame_offset: int = 0, num_frames: int = -1, normalize: bool = True, channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

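Loads audio data from a file.

Note

The formats this function can handle depend on the soundfile installation. This function has been tested on the following formats: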

WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
FLAC
OGG/VORBIS
SPHERE

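By default (normalize=True, channels_first=True), this function returns a Tensor with float32 dtype and shape [channel, time]. The samples are normalized to fit in the range [-1.0, 1.0].

When the input format is WAV with an integer type, such as 32-bit signed integer, 16-bit signed integer or 8-bit unsigned integer (24-bit signed integer is not supported), passing normalize=False makes this function return an integer Tensor in which the samples are expressed within the whole range of the corresponding dtype: an int32 tensor for 32-bit signed PCM, int16 for 16-bit signed PCM and uint8 for 8-bit unsigned PCM.

The normalize parameter has no effect on 32-bit floating-point WAV or on other formats such as flac and mp3. For these formats, this function always returns a float32 Tensor with values normalized to [-1.0, 1.0].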

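Parameters

filepath (str or pathlib.Path) – Path to the audio file. This function also handles pathlib.Path objects, but is annotated as str for consistency with the "sox_io" backend and for TorchScript compiler compatibility.
frame_offset (int) – Number of frames to skip before starting to read data.
num_frames (int) – Maximum number of frames to read. -1 reads all the remaining samples, starting from frame_offset. This function may return fewer frames if the given file does not contain enough frames.
normalize (bool) – When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is integer WAV, passing False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor has dimension [time, channel].

Returns

The loaded audio Tensor and the sample rate. If the input file has integer WAV format and normalization is off, the Tensor has an integer type; otherwise it is float32. If channels_first=True, its shape is [channel, time]; otherwise it is [time, channel].

Return type

(torch.Tensor, int)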
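
Since frame_offset and num_frames count frames rather than seconds, reading a time window requires a conversion using the sample rate. A minimal sketch follows; frame_window is a hypothetical helper, not part of torchaudio, and it assumes the sample rate is already known (for example from torchaudio.info).

```python
def frame_window(sample_rate, start_sec, duration_sec=None):
    """Translate a time window in seconds into (frame_offset, num_frames)."""
    frame_offset = int(round(start_sec * sample_rate))
    if duration_sec is None:
        # -1 asks load() to read everything from frame_offset onward.
        num_frames = -1
    else:
        num_frames = int(round(duration_sec * sample_rate))
    return frame_offset, num_frames


print(frame_window(16000, 1.5, 2.0))  # (24000, 32000)
print(frame_window(44100, 0))         # (0, -1)
```

The pair could then be passed as the frame_offset and num_frames arguments of torchaudio.load; as noted above, fewer frames may come back if the file is shorter than the requested window.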

torchaudio.backend._soundfile_backend.load_wav(filepath: str, frame_offset: int = 0, num_frames: int = -1, channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

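Loads a wave file.

This function is defined only for compatibility with other backends, for simple use cases such as torchaudio.load_wav(filepath). The implementation is the same as load().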

save¶

torchaudio.backend._soundfile_backend.save(filepath: str, src: torch.Tensor, sample_rate: int, channels_first: bool = True, compression: Optional[float] = None)[source]¶

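Saves audio data to a file.

Note

The formats this function can handle depend on the soundfile installation. This function has been tested on the following formats: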

WAV
- 32-bit floating-point
- 32-bit signed integer
- 16-bit signed integer
- 8-bit unsigned integer
FLAC
OGG/VORBIS
SPHERE

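Parameters

filepath (str or pathlib.Path) – Path to the audio file. This function also handles pathlib.Path objects, but is annotated as str for consistency with the "sox_io" backend and for TorchScript compiler compatibility.
src (torch.Tensor) – Audio data to save. Must be a 2D tensor.
sample_rate (int) – Sampling rate
channels_first (bool) – If True, the given tensor is interpreted as [channel, time], otherwise as [time, channel].
compression (Optional[float]) – Not used. It is present only for interface compatibility with the "sox_io" backend.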

