parakeet.frontend package

Subpackages

parakeet.frontend.normalizer package

Submodules

parakeet.frontend.phonectic module

class parakeet.frontend.phonectic.Chinese[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize a Chinese text sequence and convert it into pronunciation ids.

numericalize(phonemes)[source]

Convert a pronunciation sequence into a pronunciation id sequence.

Parameters

phonemes: List[str]: The pronunciation sequence.

Returns

List[int]: The pronunciation id sequence.

phoneticize(sentence)[source]

Normalize the input text sequence and convert it into a pronunciation sequence.

Parameters

sentence: str: The input text sequence.

Returns

List[str]: The pronunciation sequence.

reverse(ids)[source]

Convert a pronunciation id sequence back into a pronunciation sequence.

Parameters

ids: List[int]: The pronunciation id sequence.

Returns

List[str]: The pronunciation sequence.

property vocab_size: Vocab size.

class parakeet.frontend.phonectic.English[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize the input text sequence and convert it into a pronunciation id sequence.

numericalize(phonemes)[source]

Convert a pronunciation sequence into a pronunciation id sequence.

Parameters

phonemes: List[str]: The pronunciation sequence.

Returns

List[int]: The pronunciation id sequence.

phoneticize(sentence)[source]

Normalize the input text sequence and convert it into a pronunciation sequence.

Parameters

sentence: str: The input text sequence.

Returns

List[str]: The pronunciation sequence.

reverse(ids)[source]

Convert a pronunciation id sequence back into a pronunciation sequence.

Parameters

ids: List[int]: The pronunciation id sequence.

Returns

List[str]: The pronunciation sequence.

property vocab_size: Vocab size.
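
The three methods above compose into a round trip: phoneticize yields phonemes, numericalize maps them to ids, and reverse maps ids back to phonemes. The following is a minimal sketch of that flow, not Parakeet's implementation. The class name `ToyEnglish`, its two-word lexicon, and the sorted id assignment are all assumptions made for illustration.

```python
from typing import Dict, List


class ToyEnglish:
    """Hypothetical stand-in mirroring phoneticize/numericalize/reverse."""

    # Assumed toy lexicon; a real frontend would rely on a G2P model or dictionary.
    LEXICON: Dict[str, List[str]] = {
        "hello": ["HH", "AH0", "L", "OW1"],
        "world": ["W", "ER1", "L", "D"],
    }

    def __init__(self) -> None:
        # Assumption: ids are positions in the sorted set of known phonemes.
        phonemes = sorted({p for seq in self.LEXICON.values() for p in seq})
        self._symbols: List[str] = phonemes
        self._index: Dict[str, int] = {p: i for i, p in enumerate(phonemes)}

    def phoneticize(self, sentence: str) -> List[str]:
        # Normalize (lowercase, split on whitespace), then look up each word.
        # Words missing from the toy lexicon are silently dropped.
        result: List[str] = []
        for word in sentence.lower().split():
            result.extend(self.LEXICON.get(word, []))
        return result

    def numericalize(self, phonemes: List[str]) -> List[int]:
        return [self._index[p] for p in phonemes]

    def reverse(self, ids: List[int]) -> List[str]:
        return [self._symbols[i] for i in ids]

    @property
    def vocab_size(self) -> int:
        return len(self._symbols)


frontend = ToyEnglish()
phonemes = frontend.phoneticize("Hello world")
ids = frontend.numericalize(phonemes)
restored = frontend.reverse(ids)
```

In this sketch `restored` equals `phonemes`, which is the property a reverse method is expected to provide for ids produced by numericalize.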

class parakeet.frontend.phonectic.EnglishCharacter[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize the input text sequence and convert it into a character id sequence.

numericalize(sentence)[source]

Convert a text sequence into ids.

Parameters

sentence: str: The input text sequence.

Returns

List[int]: The character id sequence.

phoneticize(sentence)[source]

Normalize the input text sequence.

Parameters

sentence: str: The input text sequence.

Returns

str: The normalized text sequence.

reverse(ids)[source]

Convert a character id sequence into text.

Parameters

ids: List[int]: The character id sequence.

Returns

str: The text sequence recovered from the ids.

property vocab_size: Vocab size.
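
Unlike the phoneme-based classes, the character-level interface keeps text as a string: phoneticize returns normalized text, and numericalize works on characters. A minimal sketch of that shape follows. `ToyEnglishCharacter`, its alphabet, and its normalization rules (lowercasing, collapsing whitespace, skipping unknown characters) are assumptions for illustration and may differ from what Parakeet does.

```python
import string
from typing import Dict, List


class ToyEnglishCharacter:
    """Hypothetical character-level stand-in; not Parakeet's implementation."""

    def __init__(self) -> None:
        # Assumed alphabet: lowercase letters, space, and a few punctuation marks.
        self._symbols: List[str] = list(string.ascii_lowercase + " ,.!?")
        self._index: Dict[str, int] = {c: i for i, c in enumerate(self._symbols)}

    def phoneticize(self, sentence: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        return " ".join(sentence.lower().split())

    def numericalize(self, sentence: str) -> List[int]:
        # Characters outside the assumed alphabet are skipped in this sketch.
        return [self._index[c] for c in sentence if c in self._index]

    def reverse(self, ids: List[int]) -> str:
        return "".join(self._symbols[i] for i in ids)

    @property
    def vocab_size(self) -> int:
        return len(self._symbols)


frontend = ToyEnglishCharacter()
text = frontend.phoneticize("Hello,   World!")
ids = frontend.numericalize(text)
restored = frontend.reverse(ids)
```

Here `restored` matches the normalized `text` rather than the raw input, since normalization is not invertible.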

class parakeet.frontend.phonectic.Phonetics[source]

Bases: abc.ABC

abstract numericalize(phonemes)[source]

abstract phoneticize(sentence)[source]
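
Because both methods are abstract, a subclass has to implement each of them before it can be instantiated. The sketch below demonstrates this with a hypothetical mirror of the interface, `PhoneticsLike`, rather than the Parakeet class itself. The subclasses `Incomplete` and `WordIds` are likewise invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class PhoneticsLike(ABC):
    """Hypothetical mirror of the two abstract methods listed above."""

    @abstractmethod
    def phoneticize(self, sentence):
        ...

    @abstractmethod
    def numericalize(self, phonemes):
        ...


class Incomplete(PhoneticsLike):
    # Only one abstract method is implemented, so instantiation fails.
    def phoneticize(self, sentence: str) -> List[str]:
        return sentence.split()


class WordIds(PhoneticsLike):
    """Toy subclass: whitespace tokens act as 'phonemes', ids assigned on first sight."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}

    def phoneticize(self, sentence: str) -> List[str]:
        return sentence.split()

    def numericalize(self, phonemes: List[str]) -> List[int]:
        # setdefault returns the existing id, or stores the next free one.
        return [self._index.setdefault(p, len(self._index)) for p in phonemes]


try:
    Incomplete()
    incomplete_error = None
except TypeError as exc:
    # abc refuses to instantiate a class with unimplemented abstract methods.
    incomplete_error = exc

frontend = WordIds()
ids = frontend.numericalize(frontend.phoneticize("a b a"))
```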

parakeet.frontend.punctuation module

parakeet.frontend.punctuation.get_punctuations(lang)[source]

parakeet.frontend.vocab module

class parakeet.frontend.vocab.Vocab(symbols: Iterable[str], padding_symbol='<pad>', unk_symbol='<unk>', start_symbol='<s>', end_symbol='</s>')[source]

Bases: object

Vocabulary.

Parameters

symbols: Iterable[str]: Common symbols.
padding_symbol: str, optional: Symbol for padding. Defaults to “<pad>”.
unk_symbol: str, optional: Symbol for unknown tokens. Defaults to “<unk>”.
start_symbol: str, optional: Symbol for the start of a sequence. Defaults to “<s>”.
end_symbol: str, optional: Symbol for the end of a sequence. Defaults to “</s>”.

add_symbol(symbol)[source]: Add a new symbol to the vocab.

add_symbols(symbols)[source]: Add multiple symbols to the vocab.

property end_index: The index of the end symbol.

lookup(symbol)[source]: The index that corresponds to the symbol.

property num_specials: The number of special symbols.

property padding_index: The index of the padding symbol.

reverse(index)[source]: The symbol that corresponds to the index.

property start_index: The index of the start symbol.

property unk_index: The index of the unknown symbol.
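
To make the lookup/reverse relationship concrete, here is a minimal sketch of a vocabulary with special symbols. `ToyVocab` is a hypothetical stand-in, not the Parakeet class. Two behaviors in particular are assumptions this page does not confirm: that special symbols occupy the lowest indices in the order shown, and that lookup falls back to the unknown index for unseen symbols.

```python
from typing import Dict, Iterable, List


class ToyVocab:
    """Hypothetical stand-in for a vocabulary with special symbols."""

    def __init__(self, symbols: Iterable[str], padding_symbol: str = "<pad>",
                 unk_symbol: str = "<unk>", start_symbol: str = "<s>",
                 end_symbol: str = "</s>") -> None:
        self._symbols: List[str] = []
        self._index: Dict[str, int] = {}
        self.unk_symbol = unk_symbol
        # Assumption: special symbols take the lowest indices, in this order.
        self.add_symbols([padding_symbol, unk_symbol, start_symbol, end_symbol])
        self.add_symbols(symbols)

    def add_symbol(self, symbol: str) -> None:
        # Adding an existing symbol is a no-op, so indices stay stable.
        if symbol not in self._index:
            self._index[symbol] = len(self._symbols)
            self._symbols.append(symbol)

    def add_symbols(self, symbols: Iterable[str]) -> None:
        for symbol in symbols:
            self.add_symbol(symbol)

    def lookup(self, symbol: str) -> int:
        # Assumption: unseen symbols map to the unknown index.
        return self._index.get(symbol, self._index[self.unk_symbol])

    def reverse(self, index: int) -> str:
        return self._symbols[index]


vocab = ToyVocab(["a", "b", "c"])
known = vocab.lookup("a")
unknown = vocab.lookup("zzz")
```

Under these assumptions, `vocab.reverse(vocab.lookup(s))` returns `s` for every symbol in the vocabulary, while unseen symbols reverse to the unknown symbol.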
