API Documentation

Basic usage

from hfst_optimized_lookup import TransducerFile

fst = TransducerFile("path/to/transducer.hfstol")

# Now you can .lookup() in the FST to your heart's content!

See hfst_optimized_lookup.TransducerFile for further usage.

TransducerFile

class hfst_optimized_lookup.TransducerFile(path)

Load an .hfstol transducer file.

>>> analyzer = TransducerFile("path/to/fst.hfstol")

Example usage of an English analyzer:

>>> analyzer.lookup("bank")
['bank+Noun+Sg', 'bank+Verb']
>>> analyzer.lookup_lemma_with_affixes("bank")
[Analysis(prefixes=(), lemma='bank', suffixes=('+Noun', '+Sg')), Analysis(prefixes=(), lemma='bank', suffixes=('+Verb',))]
>>> analyzer.lookup_symbols("bank")
[['bank', '+Noun', '+Sg'], ['bank', '+Verb']]

Example usage of an English generator:

>>> generator.bulk_lookup(["cactus+Noun+Sg", "octopus+Noun+Pl"])
{"cactus+Noun+Sg": {"cactuses", "cacti"}, "octopus+Noun+Pl": {"octopuses", "octopi", "octopodes"}}

Parameters

path (str or os.PathLike) – the path to the .hfstol file

bulk_lookup(words)

Like lookup() but applied to multiple inputs. Useful for generating multiple surface forms.

Parameters

words (list[str]) – list of words to look up

Returns

a dictionary mapping each input word to the set of its transductions

Return type

dict[str, set[str]]
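
Because each value in the returned dictionary is a set, the order of the surface forms is not defined. The following is a minimal sketch of one way to print the results deterministically. It does not call the library: the `results` dictionary is sample data shaped like the generator example above, and `format_forms` is a hypothetical helper, not part of hfst_optimized_lookup.

```python
# Sample data shaped like the return value of bulk_lookup();
# it is copied from the generator example, not produced by a real FST.
results = {
    "cactus+Noun+Sg": {"cactuses", "cacti"},
    "octopus+Noun+Pl": {"octopuses", "octopi", "octopodes"},
}


def format_forms(results):
    """Hypothetical helper: one line per input, forms sorted alphabetically."""
    lines = []
    for analysis, forms in results.items():
        # Sets are unordered, so sort before joining for stable output.
        lines.append(f"{analysis}: {', '.join(sorted(forms))}")
    return lines


for line in format_forms(results):
    print(line)
```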

lookup(string)

Look up the input string, returning a list of transductions. This is most similar to using hfst-optimized-lookup on the command line.

Parameters

string (str) – The string to look up.

Returns

list of analyses as concatenated strings, or an empty list if the input cannot be analyzed.

Return type

list[str]

lookup_lemma_with_affixes(string)

New in version 0.10.0.

Analyze the input string, returning a list of hfst_optimized_lookup.Analysis objects.

Note

This method assumes an analyzer in which all multicharacter symbols represent affixes, and all lexical symbols are contiguous.

Parameters

string (str) – The string to look up.

Returns

list of analyses as hfst_optimized_lookup.Analysis objects, or an empty list if there are no analyses.

Return type

list of hfst_optimized_lookup.Analysis
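
To make the assumption in the note above concrete, here is a rough sketch of how a list of symbols could be split into prefixes, lemma, and suffixes. It is not the library's implementation: `AnalysisSketch`, `is_lexical`, and `split_symbols` are hypothetical names, and the sketch assumes that every lexical symbol is exactly one character long.

```python
import unicodedata
from typing import List, NamedTuple, Tuple


class AnalysisSketch(NamedTuple):
    """Hypothetical stand-in with the same three fields as Analysis."""
    prefixes: Tuple[str, ...]
    lemma: str
    suffixes: Tuple[str, ...]


def is_lexical(symbol: str) -> bool:
    # Assumption: single-character symbols are lexical,
    # multicharacter symbols are affixes.
    return len(unicodedata.normalize("NFC", symbol)) == 1


def split_symbols(symbols: List[str]) -> AnalysisSketch:
    lexical = [i for i, s in enumerate(symbols) if is_lexical(s)]
    if not lexical:
        raise ValueError("no lexical symbols found")
    first, last = lexical[0], lexical[-1]
    # The lexical symbols must form one unbroken run.
    if len(lexical) != last - first + 1:
        raise ValueError("lexical symbols are not contiguous")
    return AnalysisSketch(
        prefixes=tuple(symbols[:first]),
        lemma="".join(symbols[first:last + 1]),
        suffixes=tuple(symbols[last + 1:]),
    )


# Illustrative input: one prefix tag, the letters of the lemma, two suffix tags.
symbols = ["PV/e+", *"w\u00e2pam\u00eaw", "+V", "+TA"]
print(split_symbols(symbols))
```

An affix that appears between two lexical symbols breaks the contiguity assumption, which is why the sketch raises an error in that case.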

lookup_symbols(string)

Transduce the input string. The result is a list of transductions. Each transduction is a list of symbols as returned by the model; that is, the symbols are not concatenated into a single string.

Parameters

string (str) – The string to look up.

Returns

a list of transductions, each of which is a list of symbols

Return type

list[list[str]]
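
The relationship between lookup_symbols() and lookup() can be illustrated with the "bank" example from the TransducerFile class description. This sketch uses that sample output as literal data rather than calling the library, and it assumes that the strings returned by lookup() are the plain concatenation of the symbols, as the two examples suggest.

```python
# Sample data copied from the English analyzer example in this document.
symbol_lists = [["bank", "+Noun", "+Sg"], ["bank", "+Verb"]]

# Joining each inner list yields the concatenated form shown for lookup().
concatenated = ["".join(symbols) for symbols in symbol_lists]
print(concatenated)  # ['bank+Noun+Sg', 'bank+Verb']
```

Keeping the symbols separate is useful when tags need to be inspected individually, since splitting the concatenated string on "+" can be ambiguous.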

symbol_count() → int

Returns the number of symbols in the sigma (the symbol table).

Return type

int

Analysis

class hfst_optimized_lookup.Analysis(prefixes: Tuple[str, ...], lemma: str, suffixes: Tuple[str, ...])

An analysis of a wordform.

This is a named tuple, so you can use it both with attributes and indices:

>>> analysis = Analysis(('PV/e+',), 'wâpamêw', ('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO'))

Using attributes:

>>> analysis.lemma
'wâpamêw'
>>> analysis.prefixes
('PV/e+',)
>>> analysis.suffixes
('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO')

Using indices:

>>> len(analysis)
3
>>> analysis[0]
('PV/e+',)
>>> analysis[1]
'wâpamêw'
>>> analysis[2]
('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO')
>>> prefixes, lemma, suffixes = analysis
>>> lemma
'wâpamêw'

prefixes: Tuple[str, ...]

Tags that appear before the lemma.

lemma: str

The base form of the analyzed wordform.

suffixes: Tuple[str, ...]

Tags that appear after the lemma.
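
Since the three fields keep the tags in order, an analysis can be turned back into a single string by joining them. The sketch below uses a local stand-in, `AnalysisSketch`, with the same fields as Analysis so that it runs without the library; `to_string` is a hypothetical helper. It assumes the tags already carry their "+" delimiters, as in the example above.

```python
from typing import NamedTuple, Tuple


class AnalysisSketch(NamedTuple):
    """Hypothetical stand-in with the same three fields as Analysis."""
    prefixes: Tuple[str, ...]
    lemma: str
    suffixes: Tuple[str, ...]


def to_string(analysis: AnalysisSketch) -> str:
    # Prefix tags, then the lemma, then suffix tags, with no extra separators.
    return "".join(analysis.prefixes) + analysis.lemma + "".join(analysis.suffixes)


analysis = AnalysisSketch(
    ("PV/e+",),
    "w\u00e2pam\u00eaw",
    ("+V", "+TA", "+Cnj", "+3Sg", "+4Sg/PlO"),
)
print(to_string(analysis))
```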