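torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset and implement the __getitem__ and __len__ methods. They can therefore be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example: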

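import torch
import torchaudio

# Download YESNO into the current directory if it is not already there.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)  # number of worker processes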
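The example above uses batch_size=1. With larger batches, clips of different lengths cannot be stacked by the default collation. The following is a minimal sketch of one way to handle that, assuming each sample is a (waveform, sample_rate, labels) tuple as YESNO returns and that each waveform has shape (channels, time). The helper name pad_collate is hypothetical and not part of torchaudio.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(batch):
    """Collate (waveform, sample_rate, labels) samples into a zero-padded batch.

    Assumes each waveform has shape (channels, time). This is an
    illustrative helper, not a torchaudio function.
    """
    # pad_sequence pads along the first dimension, so move time first.
    waveforms = [sample[0].t() for sample in batch]      # (time, channels)
    lengths = torch.tensor([w.shape[0] for w in waveforms])
    padded = pad_sequence(waveforms, batch_first=True)   # (batch, max_time, channels)
    padded = padded.transpose(1, 2)                      # (batch, channels, max_time)
    sample_rates = [sample[1] for sample in batch]
    labels = [sample[2] for sample in batch]
    return padded, lengths, sample_rates, labels
```

Such a function could be passed to the DataLoader through its collate_fn argument; the returned lengths let downstream code ignore the padded tail of each clip.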

CMUARCTIC

class torchaudio.datasets.CMUARCTIC(root: Union[str, pathlib.Path], url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)

Create a Dataset for CMU_ARCTIC.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt". (default: "aew")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, utterance_id)
Return type: (Tensor, int, str, str)
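Because every audio sample carries its waveform together with its sample rate, the clip length in seconds can be derived from the two. A minimal sketch, assuming the waveform is laid out as (channels, time); the helper duration_seconds and the stand-in object are hypothetical and only illustrate the arithmetic.

```python
from types import SimpleNamespace


def duration_seconds(waveform, sample_rate):
    """Length of a clip in seconds, assuming shape (channels, time)."""
    return waveform.shape[-1] / sample_rate


# Stand-in for a real tensor: only the .shape attribute is needed here.
fake_waveform = SimpleNamespace(shape=(1, 32000))
print(duration_seconds(fake_waveform, 16000))  # 2.0
```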

CMUDict

class torchaudio.datasets.CMUDict(root: Union[str, pathlib.Path], exclude_punctuations: bool = True, *, download: bool = False, url: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b', url_symbols: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols')

Create a Dataset for the CMU Pronouncing Dictionary (CMUDict).

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
exclude_punctuations (bool, optional) – When enabled, exclude the pronunciations of punctuation marks, such as !EXCLAMATION-POINT and #HASH-MARK. (default: True)
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dictionary from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b")
url_symbols (str, optional) – The URL to download the list of symbols from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols")

__getitem__(n: int) → Tuple[str, List[str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded.
Returns: The corresponding word and phonemes (word, [phonemes]).
Return type: (str, List[str])

property symbols

A list of phoneme symbols, such as AA, AE, AH.

Type: List[str]
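A common use of the symbols list is to turn the phonemes of an entry into integer ids for a model. The sketch below is one way to do that under stated assumptions: build_phoneme_index and encode_phonemes are hypothetical helpers, and the short symbols list is a stand-in for the value of the symbols property.

```python
def build_phoneme_index(symbols):
    """Map each phoneme symbol to an integer id, in the order given."""
    return {symbol: idx for idx, symbol in enumerate(symbols)}


def encode_phonemes(phonemes, index):
    """Convert a list of phoneme strings to a list of integer ids."""
    return [index[p] for p in phonemes]


# Stand-in values; with the real dataset these would come from
# dataset.symbols and from `word, phonemes = dataset[n]`.
symbols = ["AA", "AE", "AH", "K", "T"]
index = build_phoneme_index(symbols)
encoded = encode_phonemes(["K", "AE", "T"], index)
print(encoded)  # [3, 1, 4]
```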

COMMONVOICE

class torchaudio.datasets.COMMONVOICE(root: Union[str, pathlib.Path], tsv: str = 'train.tsv')

Create a Dataset for CommonVoice.

Parameters

root (str or Path) – Path to the directory where the dataset is located, that is, where the tsv file is present.
tsv (str, optional) – The name of the tsv file used to construct the metadata, such as "train.tsv", "test.tsv", "dev.tsv", "invalidated.tsv", "validated.tsv" and "other.tsv". (default: "train.tsv")

__getitem__(n: int) → Tuple[torch.Tensor, int, Dict[str, str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, dictionary), where dictionary is built from the TSV file with the following keys: client_id, path, sentence, up_votes, down_votes, age, gender and accent.
Return type: (Tensor, int, Dict[str, str])
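Since the metadata dictionary is typed Dict[str, str], numeric fields such as up_votes and down_votes arrive as strings and need converting before they are compared. A minimal sketch of a vote-based filter follows; is_well_rated and its min_margin parameter are hypothetical, and treating a missing or empty field as zero is an assumption of this example.

```python
def is_well_rated(metadata, min_margin=1):
    """Return True if up votes exceed down votes by at least min_margin.

    Values are strings, so they are converted to int first. Missing or
    empty fields are treated as zero (an assumption of this sketch).
    """
    up = int(metadata.get("up_votes") or 0)
    down = int(metadata.get("down_votes") or 0)
    return up - down >= min_margin


example = {"sentence": "hello", "up_votes": "3", "down_votes": "1"}
print(is_well_rated(example))  # True
```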

GTZAN

class torchaudio.datasets.GTZAN(root: Union[str, pathlib.Path], url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)

Create a Dataset for GTZAN.

Note

Please see http://marsyas.info/downloads/datasets.html if you plan to use this dataset to publish results.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "genres")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)

__getitem__(n: int) → Tuple[torch.Tensor, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label)
Return type: (Tensor, int, str)
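The label is returned as a string, while classifiers usually expect integer targets. One possible mapping is sketched below; build_label_index is a hypothetical helper, and the label values are stand-ins for whatever labels are observed in the dataset.

```python
def build_label_index(labels):
    """Assign integer class ids to the distinct labels, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


# Stand-in labels; in practice they would be collected from the dataset.
observed = ["rock", "blues", "jazz", "rock"]
label_to_id = build_label_index(observed)
targets = [label_to_id[label] for label in observed]
print(targets)  # [2, 0, 1, 2]
```

Sorting before numbering keeps the ids stable across runs regardless of the order in which samples are visited.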

LIBRISPEECH

class torchaudio.datasets.LIBRISPEECH(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)

Create a Dataset for LibriSpeech.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, int, int, int)
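The three integer ids can be combined into a single string key, for example to match a sample against an external transcript or alignment file. The sketch below assumes the corpus naming scheme speaker-chapter-utterance with the utterance number zero-padded to four digits; librispeech_file_id is a hypothetical helper, and the padding width should be checked against the files on disk.

```python
def librispeech_file_id(speaker_id, chapter_id, utterance_id):
    """Build a 'speaker-chapter-utterance' key from the three integer ids.

    Assumes the utterance number is zero-padded to four digits.
    """
    return f"{speaker_id}-{chapter_id}-{utterance_id:04d}"


print(librispeech_file_id(1272, 128104, 3))  # 1272-128104-0003
```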

LIBRITTS

class torchaudio.datasets.LIBRITTS(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)

Create a Dataset for LibriTTS.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, str, int, int, str)
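With seven positional fields, indexing the returned tuple by number is easy to get wrong. One option is to wrap it in a named tuple so fields are accessed by name. This is only a sketch: LibriTTSSample is a hypothetical class mirroring the field order listed under Returns, and the raw tuple holds placeholder values rather than real data.

```python
from typing import Any, NamedTuple


class LibriTTSSample(NamedTuple):
    """Named view over the 7-field tuple; not part of torchaudio."""
    waveform: Any          # a torch.Tensor in practice
    sample_rate: int
    original_text: str
    normalized_text: str
    speaker_id: int
    chapter_id: int
    utterance_id: str


# Placeholder values standing in for `dataset[n]`.
raw = (None, 24000, "Hello, Dr. Smith.", "hello doctor smith", 84, 121123, "utt-0")
sample = LibriTTSSample(*raw)
print(sample.normalized_text, sample.speaker_id)
```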

LJSPEECH

class torchaudio.datasets.LJSPEECH(root: Union[str, pathlib.Path], url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)

Create a Dataset for LJSpeech-1.1.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, normalized_transcript)
Return type: (Tensor, int, str, str)

SPEECHCOMMANDS

class torchaudio.datasets.SPEECHCOMMANDS(root: Union[str, pathlib.Path], url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False, subset: Optional[str] = None)

Create a Dataset for Speech Commands.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Select a subset of the dataset: None, "training", "validation" or "testing". None means the whole dataset. "validation" and "testing" are defined in "validation_list.txt" and "testing_list.txt", respectively, and "training" is the rest. Details about the files "validation_list.txt" and "testing_list.txt" are given in the README of the dataset and in the introduction of Section 7 of the original paper and its reference 12. (default: None)
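The subset rule described above (validation and testing come from two list files, training is everything else) amounts to a set difference. The following sketch illustrates that rule only; it is not torchaudio's implementation, split_speech_commands is a hypothetical helper, and the file names are made up for the example.

```python
def split_speech_commands(all_files, validation_list, testing_list):
    """Partition files: listed ones go to validation/testing, the rest to training."""
    validation = set(validation_list)
    testing = set(testing_list)
    held_out = validation | testing
    return {
        "validation": [f for f in all_files if f in validation],
        "testing": [f for f in all_files if f in testing],
        "training": [f for f in all_files if f not in held_out],
    }


# Made-up relative paths standing in for the dataset's audio files.
files = ["yes/a_0.wav", "yes/b_0.wav", "no/c_0.wav", "no/d_0.wav"]
splits = split_speech_commands(files, ["yes/b_0.wav"], ["no/c_0.wav"])
print(splits["training"])  # ['yes/a_0.wav', 'no/d_0.wav']
```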

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label, speaker_id, utterance_number)
Return type: (Tensor, int, str, str, int)

TEDLIUM

class torchaudio.datasets.TEDLIUM(root: Union[str, pathlib.Path], release: str = 'release1', subset: Optional[str] = None, download: bool = False, audio_ext: str = '.sph')

Create a Dataset for Tedlium. It supports releases 1, 2 and 3.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
subset (str, optional) – The subset of the dataset to use. Valid options are "train", "dev" and "test" for releases 1 and 2, and None for release 3. Defaults to "train" or None.
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
audio_ext (str, optional) – Extension of the audio files. (default: ".sph")

__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, talk_id, speaker_id, identifier)
Return type: tuple

property phoneme_dict

Phonemes. A mapping from each word to a tuple of its phonemes. Note that some words have empty phonemes.

Type: Dict[str, Tuple[str]]
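Because some words map to an empty tuple, code that converts a transcript to phonemes has to handle both unknown words and known words without phonemes. A minimal sketch, assuming whitespace-separated transcripts; transcript_to_phonemes is a hypothetical helper and the small dictionary is a stand-in for the phoneme_dict property.

```python
def transcript_to_phonemes(transcript, phoneme_dict):
    """Look up each word; return (phonemes, words_without_phonemes)."""
    phonemes, missing = [], []
    for word in transcript.split():
        entry = phoneme_dict.get(word, ())
        if entry:
            phonemes.extend(entry)
        else:
            # Either the word is absent or its phoneme tuple is empty.
            missing.append(word)
    return phonemes, missing


# Stand-in for dataset.phoneme_dict.
toy_dict = {"hello": ("HH", "AH", "L", "OW"), "uh": ()}
print(transcript_to_phonemes("hello uh world", toy_dict))
```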

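VCTK

class torchaudio.datasets.VCTK(root: Union[str, pathlib.Path], url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False)

Create a Dataset for VCTK.

Note

This dataset is no longer publicly available. Please use VCTK_092 instead.
Directory p315 is ignored because it has no corresponding text files. For more information about the dataset, visit: https://datashare.is.ed.ac.uk/handle/10283/3443

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – Not used, because the dataset is no longer publicly available.
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "VCTK-Corpus")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False) Passing download=True raises an error, because the dataset is no longer publicly available.
downsample (bool, optional) – Not used.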

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, utterance, speaker_id, utterance_id)
Return type: tuple

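VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')

Create a Dataset for VCTK 0.92.

Parameters

root (str) – Root directory where the dataset's top-level directory is found.
mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio extension, for use when the dataset has been converted to a non-default audio format.

Note

All utterances from speaker p315 are skipped because the corresponding text files are missing.
All utterances from speaker p280 are skipped for mic_id="mic2" because the audio files are missing.
Some utterances from speaker p362 are skipped because the audio files are missing.
See also: https://datashare.is.ed.ac.uk/handle/10283/3443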

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, utterance_id)
Return type: (Tensor, int, str, str, str)

YESNO

class torchaudio.datasets.YESNO(root: Union[str, pathlib.Path], url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False)

Create a Dataset for YesNo.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, List[int]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, labels)
Return type: (Tensor, int, List[int])
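The labels field is a list of integers, one per spoken word in the clip. The sketch below turns such a list into readable words, assuming that 1 stands for "yes" and 0 for "no"; that encoding is an assumption of this example and should be checked against the dataset's README. labels_to_words is a hypothetical helper.

```python
def labels_to_words(labels):
    """Map integer labels to words, assuming 1 means 'yes' and 0 means 'no'."""
    return ["yes" if label == 1 else "no" for label in labels]


print(labels_to_words([0, 0, 1, 1]))  # ['no', 'no', 'yes', 'yes']
```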
