torchaudio.functional
Functions to perform common audio operations.
Utility
| Turn a spectrogram from the power/amplitude scale to the decibel scale. | |
| Turn a tensor from the decibel scale to the power/amplitude scale. | |
| Create a frequency bin conversion matrix. | |
| Creates a linear triangular filterbank. | |
| Create a DCT transformation matrix. | |
| Apply a mask along a given axis. | |
| Apply a mask along a given axis. | |
| Encode signal based on mu-law companding. | |
| Decode mu-law encoded signal. | |
| DEPRECATED: Apply codecs as a form of augmentation. | |
| Resamples the waveform at the new frequency using bandlimited interpolation. | |
| Measure audio loudness according to the ITU-R BS.1770-4 recommendation. | |
| Convolves inputs along their last dimension using the direct method. | |
| Convolves inputs along their last dimension using FFT. | |
| Scales and adds noise to waveform per signal-to-noise ratio. | |
| Pre-emphasizes a waveform along its last dimension. | |
| De-emphasizes a waveform along its last dimension. | |
| Adjusts waveform speed. | |
| Computes the Fréchet distance between two multivariate normal distributions [Dowson and Landau, 1982]. | 
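As an illustration of the mu-law companding entries above, here is a minimal plain-Python sketch of the encode/decode round trip for a single sample. The function names `mu_law_encode` and `mu_law_decode` and the default of 256 quantization channels are assumptions made for this example; it is not the library's tensor implementation.

```python
import math


def mu_law_encode(x: float, quantization_channels: int = 256) -> int:
    """Map a sample in [-1, 1] to an integer code in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest integer code.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(code: int, quantization_channels: int = 256) -> float:
    """Map an integer code back to an approximate sample in [-1, 1]."""
    mu = quantization_channels - 1
    compressed = code / mu * 2 - 1
    # Inverse of the logarithmic compression.
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)
```

Decoding an encoded sample returns a value close to the original, with quantization error that is smallest near zero amplitude.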
Forced Alignment
| Align a CTC label sequence to an emission. | |
| Removes repeated tokens and blank tokens from the given CTC token sequence. | |
| Token with time stamps and score. | 
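The token-merging step described above (dropping repeats and blanks from a frame-level CTC sequence) can be sketched in plain Python as follows. The `Span` tuple and the `collapse_ctc` helper are hypothetical names for this example, and averaging the per-frame scores within a span is an assumption about how a span score might be formed.

```python
from itertools import groupby
from typing import List, NamedTuple, Sequence


class Span(NamedTuple):
    token: int    # token index
    start: int    # first frame of the span (inclusive)
    end: int      # last frame of the span (exclusive)
    score: float  # mean of the per-frame scores inside the span


def collapse_ctc(tokens: Sequence[int], scores: Sequence[float], blank: int = 0) -> List[Span]:
    """Merge runs of identical tokens and drop blank runs."""
    spans = []
    frame = 0
    for token, run in groupby(tokens):
        length = len(list(run))
        if token != blank:
            segment = scores[frame:frame + length]
            spans.append(Span(token, frame, frame + length, sum(segment) / length))
        frame += length
    return spans
```

Two runs of the same token separated by a blank stay separate spans, which is what lets CTC represent repeated labels.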
Filtering
| Design two-pole all-pass filter. | |
| Design two-pole band filter. | |
| Design two-pole band-pass filter. | |
| Design two-pole band-reject filter. | |
| Design a bass tone-control effect. | |
| Perform a biquad filter of input tensor. | |
| Apply contrast effect. | |
| Apply a DC shift to the audio. | |
| Apply ISO 908 CD de-emphasis (shelving) IIR filter. | |
| Apply dither. | |
| Design biquad peaking equalizer filter and perform filtering. | |
| Apply an IIR filter forward and backward to a waveform. | |
| Apply a flanger effect to the audio. | |
| Apply amplification or attenuation to the whole waveform. | |
| Design biquad highpass filter and perform filtering. | |
| Perform an IIR filter by evaluating difference equation, using differentiable implementation developed independently by Yu et al. [Yu and Fazekas, 2023] and Forgione et al. [Forgione and Piga, 2021]. | |
| Design biquad lowpass filter and perform filtering. | |
| Apply an overdrive effect to the audio. | |
| Apply a phasing effect to the audio. | |
| Apply RIAA vinyl playback equalization. | |
| Design a treble tone-control effect. | 
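Many entries in this table reduce to evaluating a second-order (biquad) difference equation. The sketch below shows that evaluation in plain Python for a single channel; the name `biquad_filter` and the coefficient ordering `(b0, b1, b2)` and `(a0, a1, a2)` are assumptions for this example, and it omits any output clamping or batching the library may apply.

```python
from typing import List, Sequence


def biquad_filter(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Evaluate y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]) / a0."""
    b0, b1, b2 = b
    a0, a1, a2 = a
    y = []
    x1 = x2 = y1 = y2 = 0.0  # filter state, starting from silence
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn  # shift the input history
        y2, y1 = y1, yn  # shift the output history
    return y
```

Feeding a unit impulse through `b = (0.5, 0, 0)`, `a = (1, -0.5, 0)` yields a geometrically decaying response, the behaviour of a simple one-pole low-pass filter.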
Feature Extractions
| Voice Activity Detector. | |
| Create a spectrogram or a batch of spectrograms from a raw audio signal. | |
| Create an inverse spectrogram or a batch of inverse spectrograms from the provided complex-valued spectrogram. | |
| Compute waveform from a linear scale magnitude spectrogram using the Griffin-Lim transformation. | |
| Given a STFT tensor, speed up in time by a given rate factor without modifying pitch. | |
| Shift the pitch of a waveform by a given number of steps. | |
| Compute delta coefficients of a tensor, usually a spectrogram. | |
| Detect pitch frequency. | |
| Apply sliding-window cepstral mean (and optionally variance) normalization per utterance. | |
| Compute the spectral centroid for each channel along the time axis. | 
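For the delta-coefficient entry above, a common definition is d_t = Σ_{n=1..N} n·(c_{t+n} − c_{t−n}) / (2·Σ_{n=1..N} n²) with N = (win_length − 1) / 2. The sketch below applies that formula to a single feature track in plain Python; the name `compute_deltas_1d` and the choice to repeat edge values at the boundaries are assumptions for this example.

```python
from typing import List, Sequence


def compute_deltas_1d(frames: Sequence[float], win_length: int = 5) -> List[float]:
    """Regression-based delta coefficients along time for one feature track."""
    if win_length < 3 or win_length % 2 == 0:
        raise ValueError("win_length must be an odd integer >= 3")
    n_max = (win_length - 1) // 2
    denom = 2 * sum(n * n for n in range(1, n_max + 1))
    last = len(frames) - 1

    def at(t: int) -> float:
        # Repeat the first/last frame outside the valid range (edge padding).
        return frames[min(max(t, 0), last)]

    return [
        sum(n * (at(t + n) - at(t - n)) for n in range(1, n_max + 1)) / denom
        for t in range(len(frames))
    ]
```

On a linear ramp the interior deltas equal the slope, while the edge values are damped by the padding.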
Multi-channel
| Compute cross-channel power spectral density (PSD) matrix. | |
| Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights by the method proposed by Souden et al. [Souden et al., 2009]. | |
| Compute the Minimum Variance Distortionless Response (MVDR [Capon, 1969]) beamforming weights based on the relative transfer function (RTF) and power spectral density (PSD) matrix of noise. | |
| Estimate the relative transfer function (RTF) or the steering vector by eigenvalue decomposition. | |
| Estimate the relative transfer function (RTF) or the steering vector by the power method. | |
| Apply the beamforming weight to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum. | 
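Applying beamforming weights, the last step in this table, amounts to taking the inner product w^H x across channels for each frequency bin and frame. The plain-Python sketch below uses nested lists of complex numbers; the name `apply_weights` and the layouts `weights[freq][channel]` and `specgram[channel][freq][time]` are assumptions for this example.

```python
from typing import List, Sequence


def apply_weights(
    weights: Sequence[Sequence[complex]],
    specgram: Sequence[Sequence[Sequence[complex]]],
) -> List[List[complex]]:
    """Combine a multi-channel spectrum into one channel: y[f][t] = sum_c conj(w[f][c]) * x[c][f][t]."""
    n_channels = len(specgram)
    n_freqs = len(weights)
    n_frames = len(specgram[0][0])
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t] for c in range(n_channels))
            for t in range(n_frames)
        ]
        for f in range(n_freqs)
    ]
```

With equal real weights of 0.5 on two channels, the result is simply the per-bin average of the two channel spectra.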
Loss
| Compute the RNN Transducer loss from Sequence Transduction with Recurrent Neural Networks [Graves, 2012]. | 
Metric
| Calculate the word level edit (Levenshtein) distance between two sequences. |
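To make the metric above concrete, here is a minimal plain-Python sketch of the Levenshtein distance computed over sequence elements (words when given lists of words). The name `levenshtein` is hypothetical and the rolling-row dynamic program is one standard way to compute it, not necessarily how the library does.

```python
from typing import Sequence


def levenshtein(seq1: Sequence, seq2: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning seq1 into seq2."""
    # previous[j] holds the distance between the processed prefix of seq1 and seq2[:j].
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]
        for j, item2 in enumerate(seq2, start=1):
            substitution = previous[j - 1] + (item1 != item2)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

Passing lists of words gives a word-level distance, the quantity typically divided by the reference length to obtain a word error rate; passing strings gives a character-level distance.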