parakeet.frontend package

Subpackages

Submodules

parakeet.frontend.phonectic module

class parakeet.frontend.phonectic.Chinese[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize a Chinese text sequence and convert it into a sequence of pronunciation ids.

numericalize(phonemes)[source]

Convert a pronunciation sequence into a sequence of pronunciation ids.

Parameters
phonemes: List[str]

The pronunciation sequence.

Returns
List[int]

The corresponding sequence of pronunciation ids.

phoneticize(sentence)[source]

Normalize the input text sequence and convert it into a pronunciation sequence.

Parameters
sentence: str

The input text sequence.

Returns
List[str]

The pronunciation sequence.

reverse(ids)[source]

Convert a sequence of pronunciation ids back into a pronunciation sequence.

Parameters
ids: List[int]

The sequence of pronunciation ids.

Returns
List[str]

The corresponding pronunciation sequence.

property vocab_size

The number of symbols in the vocabulary.

class parakeet.frontend.phonectic.English[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize the input text sequence and convert it into a sequence of pronunciation ids.

numericalize(phonemes)[source]

Convert a pronunciation sequence into a sequence of pronunciation ids.

Parameters
phonemes: List[str]

The pronunciation sequence.

Returns
List[int]

The corresponding sequence of pronunciation ids.

phoneticize(sentence)[source]

Normalize the input text sequence and convert it into a pronunciation sequence.

Parameters
sentence: str

The input text sequence.

Returns
List[str]

The pronunciation sequence.

reverse(ids)[source]

Convert a sequence of pronunciation ids back into a pronunciation sequence.

Parameters
ids: List[int]

The sequence of pronunciation ids.

Returns
List[str]

The corresponding pronunciation sequence.

property vocab_size

The number of symbols in the vocabulary.
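The three methods above form a round trip: text to phonemes, phonemes to ids, and ids back to phonemes. The following is a minimal, self-contained sketch of that flow. It does not import Parakeet: the class `ToyEnglish`, the two-word `LEXICON`, the `<unk>` fallback, and the lowercase-and-split normalization are all assumptions made for illustration, not the library's implementation.

```python
import re
from typing import Dict, List

# Hypothetical pronunciation lexicon; a real frontend would rely on a full
# pronunciation dictionary or a grapheme-to-phoneme model.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


class ToyEnglish:
    """Toy stand-in that mirrors the documented method names."""

    def __init__(self) -> None:
        # Build the phoneme inventory from the lexicon, sorted so that
        # the id assignment is deterministic.
        phonemes = sorted({p for ps in LEXICON.values() for p in ps})
        self._symbols = ["<unk>"] + phonemes
        self._ids = {s: i for i, s in enumerate(self._symbols)}

    def phoneticize(self, sentence: str) -> List[str]:
        # Normalize: lowercase the text and keep only alphabetic words.
        words = re.findall(r"[a-z]+", sentence.lower())
        phonemes: List[str] = []
        for word in words:
            # Words missing from the lexicon become a single "<unk>".
            phonemes.extend(LEXICON.get(word, ["<unk>"]))
        return phonemes

    def numericalize(self, phonemes: List[str]) -> List[int]:
        return [self._ids.get(p, self._ids["<unk>"]) for p in phonemes]

    def reverse(self, ids: List[int]) -> List[str]:
        return [self._symbols[i] for i in ids]

    @property
    def vocab_size(self) -> int:
        return len(self._symbols)


frontend = ToyEnglish()
phonemes = frontend.phoneticize("Hello, world!")
ids = frontend.numericalize(phonemes)
restored = frontend.reverse(ids)
```

In this sketch `reverse` undoes `numericalize` only for phonemes that are in the inventory; anything else comes back as `<unk>`.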

class parakeet.frontend.phonectic.EnglishCharacter[source]

Bases: parakeet.frontend.phonectic.Phonetics

Normalize the input text sequence and convert it into a sequence of character ids.

numericalize(sentence)[source]

Convert a text sequence into a sequence of character ids.

Parameters
sentence: str

The input text sequence.

Returns
List[int]

The sequence of character ids.

phoneticize(sentence)[source]

Normalize the input text sequence.

Parameters
sentence: str

The input text sequence.

Returns
str

The normalized text sequence.

reverse(ids)[source]

Convert a character id sequence into text.

Parameters
ids: List[int]

The sequence of character ids.

Returns
str

The text sequence that the ids encode.

property vocab_size

The number of symbols in the vocabulary.
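Unlike the phoneme-based classes, this class works on characters: `phoneticize` returns a normalized string and `numericalize` takes a string. A minimal sketch of that character-level variant follows. It is self-contained and does not use Parakeet; the class `ToyEnglishCharacter`, its character inventory, and its normalization rules (lowercasing, collapsing whitespace, dropping unsupported characters) are assumptions chosen for illustration.

```python
import string
from typing import Dict, List


class ToyEnglishCharacter:
    """Toy character-level frontend mirroring the documented method names."""

    def __init__(self) -> None:
        # Hypothetical inventory: lowercase letters, space, basic punctuation.
        self._symbols: List[str] = list(string.ascii_lowercase + " ,.!?'")
        self._ids: Dict[str, int] = {c: i for i, c in enumerate(self._symbols)}

    def phoneticize(self, sentence: str) -> str:
        # Lowercase and turn every run of whitespace into a single space.
        text = " ".join(sentence.lower().split())
        # Drop characters outside the inventory, then collapse any
        # spaces left behind by the removed characters.
        kept = "".join(c for c in text if c in self._ids)
        return " ".join(kept.split())

    def numericalize(self, sentence: str) -> List[int]:
        # Expects already-normalized text; raises KeyError otherwise.
        return [self._ids[c] for c in sentence]

    def reverse(self, ids: List[int]) -> str:
        return "".join(self._symbols[i] for i in ids)

    @property
    def vocab_size(self) -> int:
        return len(self._symbols)


frontend = ToyEnglishCharacter()
normalized = frontend.phoneticize("  Hello,   World! 42 ")
ids = frontend.numericalize(normalized)
restored = frontend.reverse(ids)
```

Because normalization discards information (case, digits, extra spaces), `reverse` recovers the normalized text in this sketch, not the raw input.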

class parakeet.frontend.phonectic.Phonetics[source]

Bases: abc.ABC

abstract numericalize(phonemes)[source]
abstract phoneticize(sentence)[source]
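Because `Phonetics` derives from `abc.ABC` and declares both methods abstract, a subclass can only be instantiated once it implements `phoneticize` and `numericalize`. The sketch below re-creates that contract locally to show the behavior. The `Phonetics` class here is a local copy of the documented interface, and `Incomplete` and `WordFrontend` are hypothetical subclasses; none of this is Parakeet code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Phonetics(ABC):
    """Local re-creation of the documented interface, for illustration."""

    @abstractmethod
    def phoneticize(self, sentence: str) -> List[str]:
        ...

    @abstractmethod
    def numericalize(self, phonemes: List[str]) -> List[int]:
        ...


class Incomplete(Phonetics):
    # Implements only one of the two abstract methods.
    def phoneticize(self, sentence: str) -> List[str]:
        return sentence.split()


class WordFrontend(Phonetics):
    # Hypothetical frontend that treats each known word as one unit.
    def __init__(self, words: List[str]) -> None:
        self._ids: Dict[str, int] = {w: i for i, w in enumerate(words)}

    def phoneticize(self, sentence: str) -> List[str]:
        return sentence.lower().split()

    def numericalize(self, phonemes: List[str]) -> List[int]:
        return [self._ids[p] for p in phonemes]


# Instantiating a class with unimplemented abstract methods raises TypeError.
try:
    Incomplete()
    incomplete_ok = True
except TypeError:
    incomplete_ok = False

frontend = WordFrontend(["good", "morning"])
ids = frontend.numericalize(frontend.phoneticize("Good morning"))
```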

parakeet.frontend.punctuation module

parakeet.frontend.punctuation.get_punctuations(lang)[source]

parakeet.frontend.vocab module

class parakeet.frontend.vocab.Vocab(symbols: Iterable[str], padding_symbol='<pad>', unk_symbol='<unk>', start_symbol='<s>', end_symbol='</s>')[source]

Bases: object

A vocabulary that maps symbols to integer indices and back.

Parameters
symbols: Iterable[str]

The ordinary (non-special) symbols.

padding_symbol: str, optional

The padding symbol. Defaults to “<pad>”.

unk_symbol: str, optional

The symbol for unknown inputs. Defaults to “<unk>”.

start_symbol: str, optional

The start-of-sequence symbol. Defaults to “<s>”.

end_symbol: str, optional

The end-of-sequence symbol. Defaults to “</s>”.

add_symbol(symbol)[source]

Add a new symbol to the vocabulary.

add_symbols(symbols)[source]

Add multiple symbols to the vocabulary.

property end_index

The index of the end symbol.

lookup(symbol)[source]

Return the index that corresponds to symbol.

property num_specials

The number of special symbols.

property padding_index

The index of the padding symbol.

reverse(index)[source]

Return the symbol that corresponds to index.

property start_index

The index of the start symbol.

property unk_index

The index of the unknown symbol.
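To make the relationship between the constructor arguments, `lookup`, `reverse`, and the index properties concrete, here is a minimal sketch of a vocabulary with the same method names. It is not Parakeet's implementation. `ToyVocab` is a hypothetical class, and two of its behaviors are assumptions: the special symbols take the lowest indices, and `lookup` falls back to the unknown index for unseen symbols.

```python
from typing import Dict, Iterable, List


class ToyVocab:
    """Illustrative stand-in; not Parakeet's Vocab implementation."""

    def __init__(
        self,
        symbols: Iterable[str],
        padding_symbol: str = "<pad>",
        unk_symbol: str = "<unk>",
        start_symbol: str = "<s>",
        end_symbol: str = "</s>",
    ) -> None:
        self._specials = [padding_symbol, unk_symbol, start_symbol, end_symbol]
        self._itos: List[str] = []       # index -> symbol
        self._stoi: Dict[str, int] = {}  # symbol -> index
        # Assumption: the special symbols take the lowest indices.
        self.add_symbols(self._specials)
        self.add_symbols(symbols)

    def add_symbol(self, symbol: str) -> None:
        # A symbol that is already present keeps its index.
        if symbol not in self._stoi:
            self._stoi[symbol] = len(self._itos)
            self._itos.append(symbol)

    def add_symbols(self, symbols: Iterable[str]) -> None:
        for symbol in symbols:
            self.add_symbol(symbol)

    def lookup(self, symbol: str) -> int:
        # Assumption: unseen symbols fall back to the unknown index.
        return self._stoi.get(symbol, self.unk_index)

    def reverse(self, index: int) -> str:
        return self._itos[index]

    @property
    def num_specials(self) -> int:
        return len(self._specials)

    @property
    def padding_index(self) -> int:
        return self._stoi[self._specials[0]]

    @property
    def unk_index(self) -> int:
        return self._stoi[self._specials[1]]

    @property
    def start_index(self) -> int:
        return self._stoi[self._specials[2]]

    @property
    def end_index(self) -> int:
        return self._stoi[self._specials[3]]


vocab = ToyVocab(["a", "b", "c"])
index_of_a = vocab.lookup("a")
index_of_unseen = vocab.lookup("z")
vocab.add_symbol("d")
index_of_d = vocab.lookup("d")
```

Placing the special symbols first keeps their indices stable no matter how many ordinary symbols are added later, which is why the sketch registers them before `symbols`.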

Module contents