torchaudio.datasets
All datasets are subclasses of torch.utils.data.Dataset,
i.e., they implement the __getitem__ and __len__ methods.
Hence, they can all be passed to a torch.utils.data.DataLoader,
which can load multiple samples in parallel using torch.multiprocessing workers.
For example:
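import torch
import torchaudio

# Download the YESNO dataset into the current directory if it is not already there.
yesno_data = torchaudio.datasets.YESNO('.', download=True)
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=2)  # number of worker processes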
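The two methods named above are all that a map-style dataset needs. As a minimal sketch in plain Python, the protocol and a simplified, single-process batching loop might look like the following. ToyDataset and iter_batches are hypothetical stand-ins for illustration and are not part of torch or torchaudio.

```python
class ToyDataset:
    """Hypothetical stand-in for a map-style dataset (not a torchaudio class)."""

    def __init__(self, samples):
        # Each sample is a (waveform, label) pair; plain lists stand in for tensors.
        self._samples = list(samples)

    def __getitem__(self, index):
        # Fetch one sample by integer index.
        return self._samples[index]

    def __len__(self):
        # Report how many indices are valid.
        return len(self._samples)


def iter_batches(dataset, batch_size):
    """Yield lists of samples, mimicking unshuffled single-process batching."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


toy = ToyDataset([([0.0, 0.1], "yes"), ([0.2], "no"), ([0.3, 0.4, 0.5], "yes")])
batches = list(iter_batches(toy, batch_size=2))
```

A real DataLoader layers shuffling, collation into tensors, and worker processes on top of this indexing pattern.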
The following datasets are available:
The datasets share a similar API. Of the classes documented here, VCTK and YESNO also accept two arguments,
transform and target_transform, which transform the input and the target respectively.
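For datasets that accept them, a common pattern is to apply transform and target_transform inside __getitem__, just before the sample is returned. The sketch below illustrates that pattern in plain Python. TransformedDataset and halve are hypothetical names, and torchaudio's own classes may differ in detail.

```python
class TransformedDataset:
    """Hypothetical wrapper showing where the two transforms are applied."""

    def __init__(self, samples, transform=None, target_transform=None):
        self._samples = list(samples)
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        waveform, target = self._samples[index]
        # Transform the input (the waveform) if a callable was given.
        if self.transform is not None:
            waveform = self.transform(waveform)
        # Transform the target (here, the label) if a callable was given.
        if self.target_transform is not None:
            target = self.target_transform(target)
        return waveform, target

    def __len__(self):
        return len(self._samples)


def halve(waveform):
    """Scale every sample value by 0.5."""
    return [0.5 * x for x in waveform]


ds = TransformedDataset([([1.0, -2.0], "Yes")],
                        transform=halve,
                        target_transform=str.lower)
waveform, target = ds[0]
```

Both arguments default to None, so a dataset built without them returns its samples unchanged.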
CMUARCTIC
class torchaudio.datasets.CMUARCTIC(root: str, url: str = 'aew', folder_in_archive: str = 'ARCTIC', download: bool = False)
Create a Dataset for CMU_ARCTIC.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "aew", "ahw", "aup", "awb", "axb", "bdl", "clb", "eey", "fem", "gka", "jmk", "ksp", "ljm", "lnh", "rms", "rxr", "slp" or "slt". (default: "aew")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "ARCTIC")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
 
 
COMMONVOICE
class torchaudio.datasets.COMMONVOICE(root: str, tsv: str = 'train.tsv', url: str = 'english', folder_in_archive: str = 'CommonVoice', version: str = 'cv-corpus-4-2019-12-10', download: bool = False)
Create a Dataset for CommonVoice.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- tsv (str, optional) – The name of the tsv file used to construct the metadata. (default: "train.tsv")
- url (str, optional) – The URL to download the dataset from, or the language of the dataset to download. Allowed language values are "tatar", "english", "german", "french", "welsh", "breton", "chuvash", "turkish", "kyrgyz", "irish", "kabyle", "catalan", "taiwanese", "slovenian", "italian", "dutch", "hakha chin", "esperanto", "estonian", "persian", "portuguese", "basque", "spanish", "chinese", "mongolian", "sakha", "dhivehi", "kinyarwanda", "swedish", "russian", "indonesian", "arabic", "tamil", "interlingua", "latvian", "japanese", "votic", "abkhaz", "cantonese" and "romansh sursilvan". (default: "english")
- folder_in_archive (str, optional) – The top-level directory of the dataset.
- version (str) – Version string. For the other allowed values, please check https://commonvoice.mozilla.org/en/datasets. (default: "cv-corpus-4-2019-12-10")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
 
 
GTZAN
class torchaudio.datasets.GTZAN(root: str, url: str = 'http://opihi.cs.uvic.ca/sound/genres.tar.gz', folder_in_archive: str = 'genres', download: bool = False, subset: Optional[str] = None)
Create a Dataset for GTZAN.
Note: Please see http://marsyas.info/downloads/datasets.html if you are planning to use this dataset to publish results.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from. (default: "http://opihi.cs.uvic.ca/sound/genres.tar.gz")
- folder_in_archive (str, optional) – The top-level directory of the dataset.
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
- subset (str, optional) – Which subset of the dataset to use. One of "training", "validation", "testing" or None. If None, the entire dataset is used. (default: None)
 
 
LIBRISPEECH
class torchaudio.datasets.LIBRISPEECH(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)
Create a Dataset for LibriSpeech.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
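Because url accepts either a full URL or one of a fixed set of type values, it can help to validate the argument before constructing the dataset. The sketch below is one way to do that. check_librispeech_url and load_librispeech are hypothetical helpers, and treating any string that starts with "http://" or "https://" as a URL is an assumption, since the text above does not say how the library tells the two cases apart.

```python
# Allowed type values, copied from the parameter description above.
LIBRISPEECH_SUBSETS = (
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
)


def check_librispeech_url(url):
    """Return url unchanged if it is an allowed type value or looks like a URL."""
    # Assumption: anything starting with http:// or https:// is a download URL.
    if url in LIBRISPEECH_SUBSETS or url.startswith(("http://", "https://")):
        return url
    raise ValueError(
        f"unknown LibriSpeech type {url!r}; expected one of {LIBRISPEECH_SUBSETS}"
    )


def load_librispeech(root, url="train-clean-100", download=False):
    """Validate url, then build the dataset with the documented arguments."""
    # Imported lazily so the validation helper works without torchaudio installed.
    import torchaudio

    return torchaudio.datasets.LIBRISPEECH(
        root, url=check_librispeech_url(url), download=download
    )
```

For example, load_librispeech("./data", url="dev-clean", download=True) would fetch the dev-clean subset into ./data, while a misspelled value such as "train-clean-10" fails before any download starts.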
 
 
LIBRITTS
class torchaudio.datasets.LIBRITTS(root: str, url: str = 'train-clean-100', folder_in_archive: str = 'LibriTTS', download: bool = False)
Create a Dataset for LibriTTS.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriTTS")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
 
 
LJSPEECH
class torchaudio.datasets.LJSPEECH(root: str, url: str = 'https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2', folder_in_archive: str = 'wavs', download: bool = False)
Create a Dataset for LJSpeech-1.1.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from. (default: "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "wavs")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
 
 
SPEECHCOMMANDS
class torchaudio.datasets.SPEECHCOMMANDS(root: str, url: str = 'speech_commands_v0.02', folder_in_archive: str = 'SpeechCommands', download: bool = False)
Create a Dataset for Speech Commands.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "speech_commands_v0.01" and "speech_commands_v0.02". (default: "speech_commands_v0.02")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "SpeechCommands")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
 
 
TEDLIUM
class torchaudio.datasets.TEDLIUM(root: str, release: str = 'release1', subset: str = None, download: bool = False, audio_ext='.sph')
Create a Dataset for Tedlium. It supports releases 1, 2 and 3.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- release (str, optional) – Release version. Allowed values are "release1", "release2" or "release3". (default: "release1")
- subset (str, optional) – The subset of the dataset to use. Valid options are "train", "dev" and "test" for releases 1 and 2, and None for release 3. Defaults to "train" or None.
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
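The interaction between release and subset can be written out as a small function. The sketch below encodes one reading of the description above: releases 1 and 2 fall back to "train" when no subset is given, and release 3 accepts only None. resolve_tedlium_subset is a hypothetical helper, not a torchaudio function, and the library's actual behaviour may differ.

```python
# Allowed values, copied from the parameter descriptions above.
TEDLIUM_RELEASES = ("release1", "release2", "release3")
TEDLIUM_SUBSETS = ("train", "dev", "test")


def resolve_tedlium_subset(release, subset=None):
    """Return the effective subset for a release, or raise ValueError."""
    if release not in TEDLIUM_RELEASES:
        raise ValueError(f"unknown release {release!r}")
    if release == "release3":
        # Assumption: release 3 has no named subsets, so only None is valid.
        if subset is not None:
            raise ValueError("release3 does not take a subset")
        return None
    if subset is None:
        # Assumption: releases 1 and 2 default to the training subset.
        return "train"
    if subset not in TEDLIUM_SUBSETS:
        raise ValueError(f"unknown subset {subset!r}")
    return subset
```

Under this reading, resolve_tedlium_subset("release2") yields "train", and passing subset="dev" together with "release3" is rejected.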
 
 
VCTK
class torchaudio.datasets.VCTK(root: str, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', folder_in_archive: str = 'VCTK-Corpus', download: bool = False, downsample: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for VCTK.
Note: This dataset is no longer publicly available. Please use VCTK_092.
Directory p315 is ignored because it has no corresponding text files. For more information about the dataset, visit https://datashare.is.ed.ac.uk/handle/10283/3443
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – Not used, as the dataset is no longer publicly available.
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "VCTK-Corpus")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False) Passing download=True results in an error, as the dataset is no longer publicly available.
- downsample (bool, optional) – Not used.
- transform (callable, optional) – Optional transform applied on waveform. (default: None)
- target_transform (callable, optional) – Optional transform applied on utterance. (default: None)
 
 
VCTK_092
class torchaudio.datasets.VCTK_092(root: str, mic_id: str = 'mic2', download: bool = False, url: str = 'https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip', audio_ext='.flac')
Create a Dataset for VCTK 0.92.
Parameters
- root (str) – Root directory where the dataset’s top-level directory is found.
- mic_id (str) – Microphone ID. Either "mic1" or "mic2". (default: "mic2")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
- url (str, optional) – The URL to download the dataset from. (default: "https://datashare.is.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip")
- audio_ext (str, optional) – Custom audio extension if the dataset has been converted to a non-default audio format.
Note
- All the speeches from speaker p315 will be skipped due to the lack of the corresponding text files.
- All the speeches from p280 will be skipped for mic_id="mic2" due to the lack of the audio files.
- Some of the speeches from speaker p362 will be skipped due to the lack of the audio files.
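The speaker-level skipping rules in the notes above can be sketched as a simple predicate. is_speaker_skipped is a hypothetical helper written for illustration and is not torchaudio's implementation. The p362 case is not modelled because it drops only some utterances, so it cannot be decided from the speaker ID alone.

```python
def is_speaker_skipped(speaker_id, mic_id="mic2"):
    """Return True if every utterance of the speaker is skipped for mic_id."""
    if mic_id not in ("mic1", "mic2"):
        raise ValueError(f"mic_id must be 'mic1' or 'mic2', got {mic_id!r}")
    # p315 has no corresponding text files, so it is skipped for both microphones.
    if speaker_id == "p315":
        return True
    # p280 lacks audio files for mic2 only.
    if speaker_id == "p280" and mic_id == "mic2":
        return True
    return False


speakers = ["p225", "p280", "p315", "p362"]
kept_mic1 = [s for s in speakers if not is_speaker_skipped(s, "mic1")]
kept_mic2 = [s for s in speakers if not is_speaker_skipped(s, "mic2")]
```

The practical consequence is that the set of speakers, and therefore the dataset length, can differ between mic_id="mic1" and mic_id="mic2".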
 
YESNO
class torchaudio.datasets.YESNO(root: str, url: str = 'http://www.openslr.org/resources/1/waves_yesno.tar.gz', folder_in_archive: str = 'waves_yesno', download: bool = False, transform: Any = None, target_transform: Any = None)
Create a Dataset for YesNo.
Parameters
- root (str) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from. (default: "http://www.openslr.org/resources/1/waves_yesno.tar.gz")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "waves_yesno")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
- transform (callable, optional) – Optional transform applied on waveform. (default: None)
- target_transform (callable, optional) – Optional transform applied on utterance. (default: None)