Basic Usage

This section shows how to load the pretrained models provided by parakeet and run inference with them.

Pretrained models are provided as an archive. Extract it to get a folder with the following structure:

checkpoint_name/
├──config.yaml
└──step-310000.pdparams

config.yaml stores the configuration used to train the model, and step-N.pdparams is the parameter file, where N is the number of steps the model was trained for.
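For example, assuming the checkpoint is distributed as a gzipped tarball (the exact file name depends on the release you download, so the name below is only a placeholder), it can be extracted with the standard library:

>>> import tarfile
>>>
>>> # hypothetical archive name; substitute the file you actually downloaded
>>> with tarfile.open("transformer_tts_pretrained.tar.gz") as f:
...     f.extractall(".")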

The example code below shows how to use the models for prediction.

text to spectrogram

The code below shows how to use a pretrained TransformerTTS model. After loading the model, use model.predict(sentence) to generate a spectrogram (as a numpy.ndarray), which can then be used to synthesize raw audio with a vocoder.

>>> import parakeet
>>> from parakeet.frontend import English
>>> from parakeet.models import TransformerTTS
>>> from pathlib import Path
>>> import yacs.config
>>>
>>> # load the pretrained model
>>> frontend = English()
>>> checkpoint_dir = Path("transformer_tts_pretrained")
>>> config = yacs.config.CfgNode.load_cfg(str(checkpoint_dir / "config.yaml"))
>>> checkpoint_path = str(checkpoint_dir / "step-310000")
>>> model = TransformerTTS.from_pretrained(
...     frontend, config, checkpoint_path)
>>> model.eval()
>>>
>>> # text to spectrogram
>>> sentence = "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition"
>>> outputs = model.predict(sentence)
>>> mel_output = outputs["mel_output"]
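Since the returned spectrogram is a numpy.ndarray, it can be inspected or saved for later vocoding with the usual numpy tools; a small sketch:

>>> import numpy as np
>>>
>>> # mel_output is a numpy.ndarray, per the description above
>>> print(mel_output.shape, mel_output.dtype)
>>> np.save("mel_output.npy", mel_output)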

vocoder

As in the example above, after loading the pretrained ConditionalWaveFlow model, call model.predict(mel) to synthesize raw audio (a waveform array), which can then be saved to a wav file.

>>> import soundfile as sf
>>> from parakeet.models import ConditionalWaveFlow
>>>
>>> # load the pretrained model
>>> checkpoint_dir = Path("waveflow_pretrained")
>>> config = yacs.config.CfgNode.load_cfg(str(checkpoint_dir / "config.yaml"))
>>> checkpoint_path = str(checkpoint_dir / "step-2000000")
>>> vocoder = ConditionalWaveFlow.from_pretrained(config, checkpoint_path)
>>> vocoder.eval()
>>>
>>> # synthesize
>>> audio = vocoder.predict(mel_output)
>>> audio_path = "syn_audio.wav"  # example output path
>>> sf.write(audio_path, audio, config.data.sample_rate)
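Putting the two steps together, a minimal end-to-end sketch (reusing the model, vocoder, and config objects loaded above) might look like this:

>>> def text_to_wav(sentence, path):
...     """Synthesize a sentence and write it to a wav file."""
...     mel = model.predict(sentence)["mel_output"]
...     wav = vocoder.predict(mel)
...     sf.write(path, wav, config.data.sample_rate)
>>>
>>> text_to_wav("Hello, world.", "hello.wav")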

For more details on how to use the models, please refer to the documentation.