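torchaudio.datasets

All datasets are subclasses of torch.utils.data.Dataset and implement the __getitem__ and __len__ methods. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers. For example: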

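import torch
import torchaudio

yesno_data = torchaudio.datasets.YESNO('.', download=True)
# num_workers sets how many worker processes load samples in parallel; adjust as needed
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)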
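
To make the protocol concrete, here is a minimal, torch-free sketch of a map-style dataset: any object that implements __getitem__ and __len__ can be indexed and measured the way a loader does it. ToyAudioDataset and its fake samples are hypothetical stand-ins used only for illustration; they are not part of torchaudio.

```python
from typing import List, Tuple


class ToyAudioDataset:
    """Hypothetical stand-in that mimics (waveform, sample_rate, labels) tuples."""

    def __init__(self, num_items: int, sample_rate: int = 8000) -> None:
        self._sample_rate = sample_rate
        # Fake "waveforms": item i is a list of i + 1 zero-valued samples.
        self._waveforms = [[0.0] * (i + 1) for i in range(num_items)]

    def __len__(self) -> int:
        return len(self._waveforms)

    def __getitem__(self, n: int) -> Tuple[List[float], int, List[int]]:
        waveform = self._waveforms[n]
        labels = [n % 2]  # made-up label, only for illustration
        return waveform, self._sample_rate, labels


dataset = ToyAudioDataset(num_items=3)
# With batch_size=1, a loader effectively performs this lookup for each sampled index.
for index in range(len(dataset)):
    waveform, sample_rate, labels = dataset[index]
    print(index, len(waveform), sample_rate, labels)
```

Because the fake waveforms have different lengths, batching more than one of them would need padding or a custom collate step; the sketch leaves that out.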

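CMUARCTIC

class torchaudio.datasets.CMUARCTIC(root: Union[str, pathlib.Path], url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)

Create a Dataset for CMU ARCTIC [1].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt". (default: "aew")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)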

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, utterance_id)
Return type: (Tensor, int, str, str)

CMUDict

class torchaudio.datasets.CMUDict(root: Union[str, pathlib.Path], exclude_punctuations: bool = True, *, download: bool = False, url: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b', url_symbols: str = 'http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols')

Create a Dataset for the CMU Pronouncing Dictionary [2] (CMUDict).

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
exclude_punctuations (bool, optional) – When enabled, exclude the pronunciation of punctuation entries, such as !EXCLAMATION-POINT and #HASH-MARK. (default: True)
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dictionary from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b")
url_symbols (str, optional) – The URL to download the list of symbols from. (default: "http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b.symbols")

__getitem__(n: int) → Tuple[str, List[str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded.
Returns: The corresponding word and phonemes (word, [phonemes]).
Return type: (str, List[str])
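
As an illustration of the (word, [phonemes]) shape, the following sketch parses lines written in the dictionary's plain-text layout (a word followed by whitespace-separated phonemes, with ";;;" marking comment lines). The helper parse_entry and the two-item PUNCTUATION_WORDS set are hypothetical; the real dictionary contains more punctuation entries, and this is not the library's loader.

```python
from typing import List, Optional, Tuple

# Assumption: only the two punctuation entries named in the parameter description.
PUNCTUATION_WORDS = {"!EXCLAMATION-POINT", "#HASH-MARK"}


def parse_entry(
    line: str, exclude_punctuations: bool = True
) -> Optional[Tuple[str, List[str]]]:
    """Turn one dictionary line into (word, [phonemes]); return None if it is skipped."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None  # blank line or comment
    word, *phonemes = line.split()
    if exclude_punctuations and word in PUNCTUATION_WORDS:
        return None
    return word, phonemes


sample_lines = [
    ";;; a comment line",
    "!EXCLAMATION-POINT  EH2 K S",  # phonemes shortened for the example
    "HELLO  HH AH0 L OW1",
]
entries = [e for e in map(parse_entry, sample_lines) if e is not None]
print(entries)
```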

property symbols

A list of phoneme symbols, such as AA, AE, AH.

Type: list[str]

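COMMONVOICE

class torchaudio.datasets.COMMONVOICE(root: Union[str, pathlib.Path], tsv: str = 'train.tsv')

Create a Dataset for CommonVoice [3].

Parameters

root (str or Path) – Path to the directory where the dataset is located, that is, the directory that contains the tsv file.
tsv (str, optional) – The name of the tsv file used to construct the metadata, such as "train.tsv", "test.tsv", "dev.tsv", "invalidated.tsv", "validated.tsv" and "other.tsv". (default: "train.tsv")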

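__getitem__(n: int) → Tuple[torch.Tensor, int, Dict[str, str]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, dictionary), where dictionary is built from the TSV file with the following keys: client_id, path, sentence, up_votes, down_votes, age, gender and accent.
Return type: (Tensor, int, Dict[str, str])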
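
The dictionary in the returned tuple maps the TSV column names to string values. A minimal sketch of how such rows could be read with the standard csv module follows; the inline TSV_TEXT sample and the read_metadata helper are made up for illustration and are not the library's loader.

```python
import csv
import io
from typing import Dict, List, TextIO

# Made-up two-line TSV using the column names listed above.
TSV_TEXT = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\tage\tgender\taccent\n"
    "abc123\tclip_0001.mp3\tHello world\t2\t0\ttwenties\tfemale\t\n"
)


def read_metadata(tsv_file: TextIO) -> List[Dict[str, str]]:
    """Read every row of a tab-separated file into a dict keyed by the header."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [dict(row) for row in reader]


rows = read_metadata(io.StringIO(TSV_TEXT))
# Every value stays a string, including the vote counts.
print(rows[0]["path"], rows[0]["up_votes"])
```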

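GTZAN

class torchaudio.datasets.GTZAN(root: Union[str, pathlib.Path], url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)

Create a Dataset for GTZAN [4].

Note

Please see http://marsyas.info/downloads/datasets.html if you are planning to publish results using this dataset.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "genres")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)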

__getitem__(n: int) → Tuple[torch.Tensor, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label)
Return type: (Tensor, int, str)

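LibriMix

class torchaudio.datasets.LibriMix(root: Union[str, pathlib.Path], subset: str = 'train-360', num_speakers: int = 2, sample_rate: int = 8000, task: str = 'sep_clean')

Create the LibriMix [5] dataset.

Parameters

root (str or Path) – The path to the directory where the Libri2Mix or Libri3Mix directory is stored.
subset (str, optional) – The subset to use. Options: [train-360, train-100, dev, test] (default: train-360)
num_speakers (int, optional) – The number of speakers, which determines the directories to traverse. The Dataset will traverse the s1 to sN directories to collect N source audios. (default: 2)
sample_rate (int, optional) – Sample rate of the audio files. The sample_rate determines which subdirectory the audio is fetched from. If any of the audio has a different sample rate, a ValueError is raised. Options: [8000, 16000] (default: 8000)
task (str, optional) – The task of LibriMix. Options: [enh_single, enh_both, sep_clean, sep_noisy] (default: sep_clean)

Note

The LibriMix dataset needs to be generated manually. Please check https://github.com/JorisCos/LibriMix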

__getitem__(key: int) → Tuple[int, torch.Tensor, List[torch.Tensor]]

Load the n-th sample from the dataset.

Parameters: key (int) – The index of the sample to be loaded
Returns: (sample_rate, mix_waveform, list_of_source_waveforms)
Return type: (int, Tensor, List[Tensor])

LIBRISPEECH

class torchaudio.datasets.LIBRISPEECH(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)

Create a Dataset for LibriSpeech [6].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, int, int, int)
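
The last three fields of the tuple are integers. As a sketch of where they could come from, LibriSpeech utterances carry IDs of the form <speaker>-<chapter>-<utterance>; the hypothetical helper below splits such an ID into the three integers. It illustrates the naming convention only and is not the library's implementation.

```python
from typing import Tuple


def parse_file_id(file_id: str) -> Tuple[int, int, int]:
    """Split an ID such as '1089-134686-0000' into (speaker, chapter, utterance)."""
    speaker_id, chapter_id, utterance_id = file_id.split("-")
    return int(speaker_id), int(chapter_id), int(utterance_id)


print(parse_file_id("1089-134686-0000"))  # (1089, 134686, 0)
```

Converting to int drops the zero padding of the utterance number, so rebuilding a file name from the tuple needs the padding width to be known separately.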

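LibriLightLimited

class torchaudio.datasets.LibriLightLimited(root: Union[str, pathlib.Path], subset: str = '10min', download: bool = False)

Create a Dataset for LibriLightLimited, which is the supervised subset of the LibriLight dataset.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
subset (str, optional) – The subset to use. Options: [10min, 1h, 10h] (default: 10min)
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)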

__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, int, int, int)

LIBRITTS

class torchaudio.datasets.LIBRITTS(root: Union[str, pathlib.Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)

Create a Dataset for LibriTTS [7].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id)
Return type: (Tensor, int, str, str, int, int, str)

LJSPEECH

class torchaudio.datasets.LJSPEECH(root: Union[str, pathlib.Path], url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)

Create a Dataset for LJSpeech-1.1 [8].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, normalized_transcript)
Return type: (Tensor, int, str, str)

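SPEECHCOMMANDS

class torchaudio.datasets.SPEECHCOMMANDS(root: Union[str, pathlib.Path], url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False, subset: Optional[str] = None)

Create a Dataset for Speech Commands [9].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
subset (str or None, optional) – Select a subset of the dataset: one of None, "training", "validation" or "testing". None means the whole dataset. "validation" and "testing" are defined in "validation_list.txt" and "testing_list.txt", respectively, and "training" is the rest. Details of the files "validation_list.txt" and "testing_list.txt" are explained in the README of the dataset and in the introduction of Section 7 of the original paper [9] and its reference 12. (default: None)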
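
The subset rule described above (validation and testing come from two list files, training is everything else) can be sketched in plain Python as follows. The select_subset helper and the file paths are hypothetical and exist only to show the partition logic; this is not the library's implementation.

```python
from typing import Iterable, List, Optional, Set


def select_subset(
    all_files: Iterable[str],
    validation: Set[str],
    testing: Set[str],
    subset: Optional[str] = None,
) -> List[str]:
    """Return the files that belong to the requested subset."""
    files = sorted(all_files)
    if subset is None:
        return files  # the whole dataset
    if subset == "validation":
        return [f for f in files if f in validation]
    if subset == "testing":
        return [f for f in files if f in testing]
    if subset == "training":
        # Training is whatever the two list files do not claim.
        held_out = validation | testing
        return [f for f in files if f not in held_out]
    raise ValueError(f"unknown subset: {subset!r}")


# Made-up relative paths standing in for the entries of the two list files.
all_files = ["yes/aaa_nohash_0.wav", "no/bbb_nohash_0.wav", "up/ccc_nohash_1.wav"]
validation = {"no/bbb_nohash_0.wav"}
testing = {"up/ccc_nohash_1.wav"}
print(select_subset(all_files, validation, testing, "training"))
```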

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, label, speaker_id, utterance_number)
Return type: (Tensor, int, str, str, int)

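TEDLIUM

class torchaudio.datasets.TEDLIUM(root: Union[str, pathlib.Path], release: str = 'release1', subset: str = 'train', download: bool = False, audio_ext: str = '.sph')

Create a Dataset for Tedlium [10]. It supports releases 1, 2 and 3.

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
subset (str, optional) – The subset of the dataset to use. Valid options are "train", "dev" and "test". (default: "train")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
audio_ext (str, optional) – Extension of the audio files. (default: ".sph")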

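__getitem__(n: int) → Tuple[torch.Tensor, int, str, int, int, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, talk_id, speaker_id, identifier)
Return type: tuple

property phoneme_dict

Phonemes. A mapping from each word to a tuple of phonemes. Note that some words have empty phonemes.

Type: dict[str, tuple[str]]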

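VCTK_092

class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')

Create the VCTK 0.92 [11] dataset.

Parameters

root (str) – Root directory where the dataset's top-level directory is found.
mic_id (str, optional) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
audio_ext (str, optional) – Custom audio extension, for use if the dataset has been converted to a non-default audio format.

Note

All the speeches from speaker p315 will be skipped due to the lack of the corresponding text files.
All the speeches from p280 will be skipped for mic_id="mic2" due to the lack of the audio files.
Some of the speeches from speaker p362 will be skipped due to the lack of the audio files.
See also: https://datashare.is.ed.ac.uk/handle/10283/3443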

__getitem__(n: int) → Tuple[torch.Tensor, int, str, str, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, transcript, speaker_id, utterance_id)
Return type: (Tensor, int, str, str, str)

DR_VCTK

class torchaudio.datasets.DR_VCTK(root: Union[str, pathlib.Path], subset: str = 'train', *, download: bool = False, url: str = 'https://datashare.ed.ac.uk/bitstream/handle/10283/3038/DR-VCTK.zip')

Create a Dataset for Device Recorded VCTK (Small subset version) [12].

Parameters

root (str or Path) – Root directory where the dataset's top-level directory is found.
subset (str) – The subset to use. Can be one of "train" and "test". (default: "train")
download (bool) – Whether to download the dataset if it is not found at the root path. (default: False)
url (str) – The URL to download the dataset from. (default: "https://datashare.ed.ac.uk/bitstream/handle/10283/3038/DR-VCTK.zip")

__getitem__(n: int) → Tuple[torch.Tensor, int, torch.Tensor, int, str, str, str, int]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform_clean, sample_rate_clean, waveform_noisy, sample_rate_noisy, speaker_id, utterance_id, source, channel_id)
Return type: (Tensor, int, Tensor, int, str, str, str, int)

YESNO

class torchaudio.datasets.YESNO(root: Union[str, pathlib.Path], url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False)

Create a Dataset for YesNo [13].

Parameters

root (str or Path) – Path to the directory where the dataset is found or downloaded.
url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)

Tutorials using YESNO: Audio Datasets

__getitem__(n: int) → Tuple[torch.Tensor, int, List[int]]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, labels)
Return type: (Tensor, int, List[int])
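
The labels field is a list of integers. In the YesNo recordings, each file name spells out its own labels as underscore-separated 0/1 digits, so a list like this can be derived from the name alone. The helper below is a hypothetical sketch of that convention and is not the library's implementation.

```python
from typing import List


def labels_from_name(file_name: str) -> List[int]:
    """Turn a name such as '0_0_0_0_1_1_1_1.wav' into a list of 0/1 labels."""
    stem = file_name.rsplit(".", 1)[0]  # drop the extension
    return [int(token) for token in stem.split("_")]


print(labels_from_name("0_0_0_0_1_1_1_1.wav"))  # [0, 0, 0, 0, 1, 1, 1, 1]
```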

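QUESST14

class torchaudio.datasets.QUESST14(root: Union[str, pathlib.Path], subset: str, language: Optional[str] = 'nnenglish', download: bool = False)

Create the QUESST14 [14] dataset.

Parameters

root (str or Path) – Root directory where the dataset's top-level directory is found.
subset (str) – The subset of the dataset to use. Options: ["docs", "dev", "eval"].
language (str or None, optional) – The language of the dataset to load. Options: [None, albanian, basque, czech, nnenglish, romanian, slovak]. If None, the dataset consists of all languages. (default: "nnenglish")
download (bool, optional) – Whether to download the dataset if it is not found at the root path. (default: False)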

__getitem__(n: int) → Tuple[torch.Tensor, int, str]

Load the n-th sample from the dataset.

Parameters: n (int) – The index of the sample to be loaded
Returns: (waveform, sample_rate, file_name)
Return type: (Tensor, int, str)

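References

1: John Kominek and Alan W Black. CMU ARCTIC databases for speech synthesis. Technical Report, 2003.
2: R.L. Weide. The Carnegie Mellon pronouncing dictionary. 1998. URL: http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
3: Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, and Gregor Weber. Common Voice: a massively-multilingual speech corpus. 2020. arXiv:1912.06670.
4: George Tzanetakis, Georg Essl, and Perry Cook. Automatic musical genre classification of audio signals. 2001. URL: http://ismir2001.ismir.net/pdf/tzanetakis.pdf.
5: Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, and Emmanuel Vincent. LibriMix: an open-source dataset for generalizable speech separation. 2020. arXiv:2005.11262.
6: Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. 2015. doi:10.1109/ICASSP.2015.7178964.
7: Heiga Zen, Viet-Trung Dang, Robert A. J. Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Z. Chen, and Yonghui Wu. LibriTTS: a corpus derived from LibriSpeech for text-to-speech. ArXiv, 2019.
8: Keith Ito and Linda Johnson. The LJ Speech dataset. https://keithito.com/LJ-Speech-Dataset/, 2017.
9: P. Warden. Speech Commands: a dataset for limited-vocabulary speech recognition. ArXiv e-prints, April 2018. URL: https://arxiv.org/abs/1804.03209, arXiv:1804.03209.
10: Anthony Rousseau, Paul Deléglise, and Yannick Estève. TED-LIUM: an automatic speech recognition dedicated corpus. In Conference on Language Resources and Evaluation (LREC), pp. 125–129. 2012.
11: Junichi Yamagishi, Christophe Veaux, and Kirsten MacDonald. CSTR VCTK Corpus: English multi-speaker corpus for CSTR voice cloning toolkit (version 0.92). 2019. doi:10.7488/ds/2645.
12: Seyyed Saeed Sarfjoo and Junichi Yamagishi. Device recorded VCTK (small subset version). 2018.
13: YesNo. URL: http://www.openslr.org/1/.
14: Xavier Anguera Miro, Luis Javier Rodriguez-Fuentes, Andi Buzo, Florian Metze, Igor Szoke, and Mikel Penagarikano. QUESST2014: evaluating query-by-example speech search in a zero-resource setting with real-life queries. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5833–5837. 2015.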
