The naive approach to streaming audio synthesis with deep neural networks is to break the input into chunks and run synthesis on each chunk independently. Unfortunately, this wastes computation re-synthesizing overlapping context and introduces audible discontinuities at chunk boundaries. In this blog post, I present a simple and robust alternative.
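To make the failure mode concrete, here is a minimal sketch of that naive scheme; the `model.synthesize` interface and the one-output-sample-per-input-frame assumption are hypothetical, not any particular library's API:

```python
import numpy as np

def naive_streaming_synthesis(model, features, chunk_size, context_size):
    """Naive chunked synthesis: re-run the model on each chunk plus some
    left context, then throw the context outputs away.

    Assumes one output sample per input frame for simplicity.
    """
    audio = []
    for start in range(0, len(features), chunk_size):
        left = max(0, start - context_size)
        window = features[left:start + chunk_size]
        out = model.synthesize(window)    # full forward pass for every chunk
        audio.append(out[start - left:])  # drop the re-synthesized context
    return np.concatenate(audio)
```

The `context_size` frames are recomputed for every chunk (the wasted work), and because no hidden state carries across chunk boundaries, adjacent chunks generally disagree on the boundary samples (the discontinuities).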
Neural transducers are commonly used for automatic speech recognition (ASR), often achieving state-of-the-art results in quality and inference speed; for instance, they power Google's offline ASR engine. In this post, I'd like to …
This is Part 2 of a two-part series, picking up where Part 1 left off. In this post, I'll go over the implementation of PQMF filters in enough detail that you'll be able to …
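As a preview of the implementation, here is a minimal sketch of a PQMF analysis bank built by cosine-modulating a Kaiser-window lowpass prototype; the tap count, cutoff, and Kaiser β below are illustrative defaults, not tuned values from any particular paper:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def pqmf_analysis_filters(num_bands=4, taps=62, cutoff=0.14, beta=9.0):
    """Build the analysis filters of a pseudo-QMF bank by cosine-modulating
    a lowpass prototype.

    `cutoff` is normalized to Nyquist and should sit near 1/(2*num_bands).
    """
    h = firwin(taps + 1, cutoff, window=("kaiser", beta))  # prototype lowpass
    n = np.arange(taps + 1)
    k = np.arange(num_bands)[:, None]
    # h_k[n] = h[n] * cos((2k+1) * pi/(2K) * (n - taps/2) + (-1)^k * pi/4)
    return h * np.cos(
        (2 * k + 1) * np.pi / (2 * num_bands) * (n - taps / 2)
        + ((-1.0) ** k) * np.pi / 4
    )

def pqmf_analysis(x, filters, num_bands=4):
    """Filter the signal with each band filter, then decimate by num_bands."""
    return np.stack([lfilter(f, [1.0], x)[::num_bands] for f in filters])
```

A matching synthesis bank uses the same modulation with the sign of the phase term flipped: upsample each band by `num_bands`, filter, and sum.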
In the past year or so, several papers have investigated using sub-band coding with neural vocoders to model audio and accelerate inference: FFTNet with sub-band coding, WaveNet with sub-band coding, and the DurIAN TTS system.
A deep dive into several Facebook publications about knowledge-augmented language tasks, such as question answering and entity linking.
In this post, I'll derive the equations for DiffWave and WaveGrad using diffusion probabilistic processes.
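As a preview of the starting point, both models build on the same forward (diffusion) process from denoising diffusion probabilistic models; with a noise schedule \(\beta_1, \dots, \beta_T\), it is:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)

% Writing \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s,
% x_t can be sampled directly from the clean audio x_0:
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)
```

The derivation proceeds from here to the reverse process and each paper's training objective.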