torchaudio.sox_effects¶

Warning

The SoxEffect and SoxEffectsChain classes are deprecated. Please migrate to apply_effects_tensor() and apply_effects_file().

Resource initialization / shutdown¶

torchaudio.sox_effects.init_sox_effects()[source]¶

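Initialize the resources required to use sox effects.

Note

You do not need to call this function manually. It is called automatically.

Once initialized, you do not need to call this function again across multiple uses of sox effects, though calling it again is safe as long as shutdown_sox_effects() has not been called. Once shutdown_sox_effects() has been called, you can no longer use SoX effects, and initializing again will result in an error.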

torchaudio.sox_effects.shutdown_sox_effects()[source]¶

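Clean up the resources required to use sox effects.

Note

You do not need to call this function manually. It is called automatically.

It is safe to call this function multiple times. Once shutdown_sox_effects() has been called, you can no longer use SoX effects, and initializing again will result in an error.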

Listing supported effects¶

torchaudio.sox_effects.effect_names() → List[str][source]¶

Get the list of valid sox effect names.

Returns: list of available effect names.
Return type: List[str]

Example

>>> torchaudio.sox_effects.effect_names()
['allpass', 'band', 'bandpass', ... ]
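
Before building an effects chain, it can help to confirm that every effect name appears in the list returned by effect_names(). The following is a minimal sketch under that assumption; `find_unsupported` is a hypothetical helper (not part of torchaudio), and the `available` list here is an illustrative subset rather than real output.

```python
from typing import List


def find_unsupported(effects: List[List[str]], available: List[str]) -> List[str]:
    """Return the effect names in `effects` that are missing from `available`.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    known = set(available)
    # The first element of each effect is its name; the rest are its options.
    return [effect[0] for effect in effects if effect and effect[0] not in known]


# In real use, pass torchaudio.sox_effects.effect_names() as `available`.
available = ["allpass", "band", "bandpass"]  # illustrative subset only
effects = [["bandpass", "1000", "200"], ["no_such_effect"]]
missing = find_unsupported(effects, available)
```

An empty result suggests the chain only uses effects that the installed SoX build reports as available.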

Applying effects¶

Apply a SoX effects chain to a torch.Tensor, or to a file that is then loaded as a torch.Tensor.

Applying effects on Tensor¶

torchaudio.sox_effects.apply_effects_tensor(tensor: torch.Tensor, sample_rate: int, effects: List[List[str]], channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

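Apply sox effects to the given Tensor.

Note

This function works in a way very similar to the sox command, with slight differences. For example, the sox command adds certain effects automatically (such as a rate effect after speed and pitch, among other effects), but this function applies only the given effects. (Therefore, to actually apply the speed effect, you also need to give a rate effect with the desired sampling rate.)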
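
Because the rate effect is not added automatically, a speed change is usually expressed as a pair of effects. The sketch below builds such a pair; `speed_effects` is a hypothetical helper name (not part of torchaudio), and the five-decimal formatting is simply borrowed from the dataset example on this page.

```python
from typing import List


def speed_effects(factor: float, target_rate: int) -> List[List[str]]:
    """Build a chain that changes speed and then resamples to `target_rate`.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [
        ["speed", f"{factor:.5f}"],  # speed change; needs a following rate effect
        ["rate", str(target_rate)],  # resample to the desired sampling rate
    ]


effects = speed_effects(1.2, 16000)
```

The resulting list can be passed as the `effects` argument of apply_effects_tensor() or apply_effects_file().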

Parameters

tensor (torch.Tensor) – Input 2D Tensor.
sample_rate (int) – Sample rate.
effects (List[List[str]]) – List of effects.
channels_first (bool) – Indicates whether the input Tensor's dimension is [channels, time] or [time, channels].

Returns

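Resulting Tensor and sample rate. The resulting Tensor has the same dtype as the input Tensor and the same channel order. The shape of the Tensor can differ depending on the effects applied. The sample rate can also differ depending on the effects applied.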
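
As a rough illustration of how the shape can change, the sketch below estimates the frame count after a rate effect, assuming the effect keeps the duration unchanged. `frames_after_rate` is a hypothetical helper, and the count SoX actually returns may differ slightly from this estimate.

```python
def frames_after_rate(num_frames: int, orig_rate: int, new_rate: int) -> int:
    """Estimate the frame count after resampling, assuming duration is unchanged.

    Hypothetical helper for illustration; it is not part of torchaudio.
    """
    if orig_rate <= 0 or new_rate <= 0:
        raise ValueError("sample rates must be positive")
    return round(num_frames * new_rate / orig_rate)


# One second at 16000 Hz resampled to 8000 Hz.
estimated = frames_after_rate(16000, 16000, 8000)
```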

Return type

Tuple[torch.Tensor, int]

Example - Basic usage

>>> import torch
>>> from torchaudio.sox_effects import apply_effects_tensor
>>>
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Generate pseudo wave:
>>> # normalized, channels first, 2ch, sampling rate 16000, 1 second
>>> sample_rate = 16000
>>> waveform = 2 * torch.rand([2, sample_rate * 1]) - 1
>>> waveform.shape
torch.Size([2, 16000])
>>> waveform
tensor([[ 0.3138,  0.7620, -0.9019,  ..., -0.7495, -0.4935,  0.5442],
        [-0.0832,  0.0061,  0.8233,  ..., -0.5176, -0.9140, -0.2434]])
>>>
>>> # Apply effects
>>> waveform, sample_rate = apply_effects_tensor(
...     waveform, sample_rate, effects, channels_first=True)
>>>
>>> # Check the result
>>> # The new waveform is sampling rate 8000, 1 second.
>>> # normalization and channel order are preserved
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 0.5054, -0.5518, -0.4800,  ..., -0.0076,  0.0096, -0.0110],
        [ 0.1331,  0.0436, -0.3783,  ..., -0.0035,  0.0012,  0.0008]])
>>> sample_rate
8000

Example - Torchscript-able transform

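>>> from typing import List
>>> import torch
>>> import torchaudio
>>> from torchaudio import sox_effects
>>>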
>>> # Use `apply_effects_tensor` in `torch.nn.Module` and dump it to file,
>>> # then run sox effect via Torchscript runtime.
>>>
>>> class SoxEffectTransform(torch.nn.Module):
...     effects: List[List[str]]
...
...     def __init__(self, effects: List[List[str]]):
...         super().__init__()
...         self.effects = effects
...
...     def forward(self, tensor: torch.Tensor, sample_rate: int):
...         return sox_effects.apply_effects_tensor(
...             tensor, sample_rate, self.effects)
...
...
>>> # Create transform object
>>> effects = [
...     ["lowpass", "-1", "300"],  # apply single-pole lowpass filter
...     ["rate", "8000"],  # change sample rate to 8000
... ]
>>> transform = SoxEffectTransform(effects)
>>>
>>> # Dump it to file and load
>>> path = 'sox_effect.zip'
>>> torch.jit.script(transform).save(path)
>>> transform = torch.jit.load(path)
>>>
>>> # Run transform
>>> waveform, input_sample_rate = torchaudio.load("input.wav")
>>> waveform, sample_rate = transform(waveform, input_sample_rate)
>>> assert sample_rate == 8000

Applying effects on file¶

torchaudio.sox_effects.apply_effects_file(path: str, effects: List[List[str]], normalize: bool = True, channels_first: bool = True) → Tuple[torch.Tensor, int][source]¶

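Apply sox effects to the audio file and load the resulting data as a Tensor.

Note

This function works in a way very similar to the sox command, with slight differences. For example, the sox command adds certain effects automatically (such as a rate effect after speed, pitch, etc.), but this function applies only the given effects. Therefore, to actually apply the speed effect, you also need to give a rate effect with the desired sampling rate, because internally the speed effect only alters the sampling rate and leaves the samples untouched.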

Parameters

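path (str) – Path to the audio file.
effects (List[List[str]]) – List of effects.
normalize (bool) – When True, this function always returns float32, and sample values are normalized to [-1.0, 1.0]. If the input file is an integer WAV, giving False changes the resulting Tensor type to an integer type. This argument has no effect for formats other than integer WAV.
channels_first (bool) – When True, the returned Tensor has dimension [channel, time]. Otherwise, the returned Tensor's dimension is [time, channel].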

Returns

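Resulting Tensor and sample rate. If normalize=True, the resulting Tensor is always float32 type. If normalize=False and the input audio file is an integer WAV file, the resulting Tensor has the corresponding integer type. (Note that the 24-bit integer type is not supported.) If channels_first=True, the resulting Tensor has dimension [channel, time], otherwise [time, channel].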
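
To make the effect of normalize more concrete, the sketch below scales signed 16-bit sample values into the floating-point range. It assumes the conventional full-scale divisor of 2**15 for 16-bit audio; the text above only states that values are normalized to [-1.0, 1.0], so treat the exact divisor as an assumption. `normalize_int16` is a hypothetical helper that works on plain Python lists, not on Tensors.

```python
from typing import List


def normalize_int16(samples: List[int]) -> List[float]:
    """Scale signed 16-bit integer samples by 1 / 2**15.

    Hypothetical helper for illustration; the divisor is an assumption
    based on the usual convention for 16-bit audio.
    """
    return [s / 32768.0 for s in samples]


normalized = normalize_int16([-32768, 0, 16384, 32767])
```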

Return type

Tuple[torch.Tensor, int]

Example - Basic usage

>>> from torchaudio.sox_effects import apply_effects_file
>>>
>>> # Defines the effects to apply
>>> effects = [
...     ['gain', '-n'],  # normalises to 0dB
...     ['pitch', '5'],  # 5 cent pitch shift
...     ['rate', '8000'],  # resample to 8000 Hz
... ]
>>>
>>> # Apply effects and load data with channels_first=True
>>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
>>>
>>> # Check the result
>>> waveform.shape
torch.Size([2, 8000])
>>> waveform
tensor([[ 5.1151e-03,  1.8073e-02,  2.2188e-02,  ...,  1.0431e-07,
         -1.4761e-07,  1.8114e-07],
        [-2.6924e-03,  2.1860e-03,  1.0650e-02,  ...,  6.4122e-07,
         -5.6159e-07,  4.8103e-07]])
>>> sample_rate
8000

Example - Apply random speed perturbation to dataset

>>> import random
>>> from typing import List
>>> import torch
>>> import torchaudio
>>>
>>> # Load data from file, apply random speed perturbation
>>> class RandomPerturbationFile(torch.utils.data.Dataset):
...     """Given flist, apply random speed perturbation
...
...     Suppose all the input files are at least one second long.
...     """
...     def __init__(self, flist: List[str], sample_rate: int):
...         super().__init__()
...         self.flist = flist
...         self.sample_rate = sample_rate
...         self.rng = random.Random()  # random number generator for the speed factor
...
...     def __getitem__(self, index):
...         speed = self.rng.uniform(0.5, 2.0)
...         effects = [
...             ['gain', '-n', '-10'],  # apply 10 db attenuation
...             ['remix', '-'],  # merge all the channels
...             ['speed', f'{speed:.5f}'],  # duration is now 0.5 ~ 2.0 seconds.
...             ['rate', f'{self.sample_rate}'],
...             ['pad', '0', '1.5'],  # add 1.5 seconds silence at the end
...             ['trim', '0', '2'],  # get the first 2 seconds
...         ]
...         waveform, _ = torchaudio.sox_effects.apply_effects_file(
...             self.flist[index], effects)
...         return waveform
...
...     def __len__(self):
...         return len(self.flist)
...
>>> dataset = RandomPerturbationFile(file_list, sample_rate=8000)
>>> loader = torch.utils.data.DataLoader(dataset, batch_size=32)
>>> for batch in loader:
...     pass

Legacy¶

SoxEffect¶

class torchaudio.sox_effects.SoxEffect[source]¶

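Create an object for passing SoX effect information between Python and C++.

Warning

This class is deprecated. Please migrate to apply_effects_file() or apply_effects_tensor().

Returns: An object with the following attributes: ename (str), the name of the effect, and eopts (List[str]), the list of effect options.
Return type: SoxEffect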

SoxEffectsChain¶

class torchaudio.sox_effects.SoxEffectsChain(normalization: Union[bool, float, Callable] = True, channels_first: bool = True, out_siginfo: Any = None, out_encinfo: Any = None, filetype: str = 'raw')[source]¶

SoX effects chain class.

Warning

This class is deprecated. Please migrate to apply_effects_file() or apply_effects_tensor().

Parameters

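normalization (bool, number, or callable, optional) – If boolean True, the output is divided by 1 << 31 (assumes signed 32-bit audio) and normalized to [-1, 1]. If a number, the output is divided by that number. If a callable, the output is passed as a parameter to the given function, and the output is then divided by the result. (Default: True)
channels_first (bool, optional) – Whether the result is channels first or length first. (Default: True)
out_siginfo (sox_signalinfo_t, optional) – A sox_signalinfo_t type, which could be helpful if the audio type cannot be determined automatically. (Default: None)
out_encinfo (sox_encodinginfo_t, optional) – A sox_encodinginfo_t type, which could be set if the audio type cannot be determined automatically. (Default: None)
filetype (str, optional) – A file type or extension to set if sox cannot determine it automatically. (Default: 'raw')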
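
The three forms of the normalization argument can be read as a choice of divisor. The sketch below mirrors that description on a plain list of floats rather than a Tensor; `apply_normalization` is a hypothetical name, and treating False as "no scaling" is an assumption, since the description above only covers True.

```python
from typing import Callable, List, Union

Normalization = Union[bool, float, Callable[[List[float]], float]]


def apply_normalization(samples: List[float],
                        normalization: Normalization = True) -> List[float]:
    """Divide `samples` by a divisor chosen from `normalization`.

    Hypothetical sketch of the documented behaviour; not the library's code.
    """
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(normalization, bool):
        # True: signed 32-bit full scale. False: left unscaled (assumption).
        divisor = float(1 << 31) if normalization else 1.0
    elif callable(normalization):
        # The output is passed to the function; its result is the divisor.
        divisor = normalization(samples)
    else:
        divisor = float(normalization)
    return [s / divisor for s in samples]


full_scale = apply_normalization([float(1 << 30)], True)
by_number = apply_normalization([2.0, 4.0], 2.0)
by_peak = apply_normalization([1.0, -4.0], lambda xs: max(abs(x) for x in xs))
```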

Returns

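An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).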

Return type

Tuple[Tensor, int]

Example

>>> class MyDataset(Dataset):
...     def __init__(self, audiodir_path):
...         self.data = [
...             os.path.join(audiodir_path, fn)
...             for fn in os.listdir(audiodir_path)]
...         self.E = torchaudio.sox_effects.SoxEffectsChain()
...         self.E.append_effect_to_chain("rate", ["16000"])  # resample to 16000hz
...         self.E.append_effect_to_chain("channels", ["1"])  # mono signal
...     def __getitem__(self, index):
...         fn = self.data[index]
...         self.E.set_input_file(fn)
...         x, sr = self.E.sox_build_flow_effects()
...         return x, sr
...
...     def __len__(self):
...         return len(self.data)
...
>>> ds = MyDataset(path_to_audio_files)
>>> for sig, sr in ds:
...    pass
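
Since this class is deprecated, the legacy example above could be migrated to apply_effects_file() roughly as follows. This is a sketch, not an official migration recipe: `EFFECTS` and `load_item` are hypothetical names, and the two effects simply restate the append_effect_to_chain() calls as a List[List[str]].

```python
from typing import List

# Effects equivalent to the two append_effect_to_chain() calls in the legacy example.
EFFECTS: List[List[str]] = [
    ["rate", "16000"],  # resample to 16000 Hz
    ["channels", "1"],  # mono signal
]


def load_item(path: str):
    """Load one file with EFFECTS applied; returns (waveform, sample_rate).

    Hypothetical helper for illustration.
    """
    # Imported lazily so this module can be imported without torchaudio installed.
    import torchaudio

    return torchaudio.sox_effects.apply_effects_file(path, EFFECTS)
```

A dataset's __getitem__ could then return `load_item(self.data[index])` in place of the set_input_file() and sox_build_flow_effects() calls.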

append_effect_to_chain(ename: str, eargs: Union[List[str], str, None] = None) → None[source]¶

Append an effect to the SoX effects chain.

Parameters

ename (str) – The name of the effect.
eargs (List[str] or str, optional) – A list of effect options. (Default: None)

clear_chain() → None[source]¶

Clear the effects chain in Python.

set_input_file(input_file: str) → None[source]¶

Set the input file for the chain.

Parameters: input_file (str) – Path to the input file.

sox_build_flow_effects(out: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, int][source]¶

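Build the effects chain and flow the effects from the input file to the output Tensor.

Parameters: out (Tensor, optional) – Where the output will be written to. (Default: None)
Returns: An output Tensor of size [C x L] or [L x C], where L is the number of audio frames and C is the number of channels, and an integer giving the sample rate of the audio (as listed in the metadata of the file).
Return type: Tuple[Tensor, int]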
