API Documentation
Basic usage
from hfst_optimized_lookup import TransducerFile
fst = TransducerFile("path/to/transducer.hfstol")
# Now you can .lookup() in the FST to your heart's content!
See hfst_optimized_lookup.TransducerFile for further usage.
TransducerFile
-
class hfst_optimized_lookup.TransducerFile(path)
Load an .hfstol transducer file.
>>> analyzer = TransducerFile("path/to/fst.hfst")
Example usage of an English analyzer:
>>> analyzer.lookup("bank")
['bank+Noun+Sg', 'bank+Verb']
>>> analyzer.lookup_lemma_with_affixes("bank")
[Analysis(prefixes=(), lemma="bank", suffixes=("+Noun", "+Sg")), Analysis(prefixes=(), lemma="bank", suffixes=("+Verb",))]
>>> analyzer.lookup_symbols("bank")
[['bank', '+Noun', '+Sg'], ['bank', '+Verb']]
Example usage of an English generator:
>>> generator.bulk_lookup(["cactus+Noun+Sg", "octopus+Noun+Pl"])
{"cactus+Noun+Sg": set(["cactuses", "cacti"]), "octopus+Noun+Pl": set(["octopuses", "octopi", "octopodes"])}
- Parameters
path (str or os.PathLike) – the path to the .hfstol file
-
bulk_lookup(words)
Like lookup() but applied to multiple inputs. Useful for generating multiple surface forms.
- Parameters
words (list[str]) – list of words to look up
- Returns
a dictionary mapping each input word to the set of its transductions
- Return type
dict[str, set[str]]
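The relationship between bulk_lookup() and lookup() described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `bulk_lookup_sketch`, `fake_lookup`, and the small hard-coded table are hypothetical names introduced here so the example runs without a transducer file. The table reuses the generator outputs from the example above.

```python
from typing import Callable, Dict, Iterable, List, Set


def bulk_lookup_sketch(
    lookup: Callable[[str], List[str]], words: Iterable[str]
) -> Dict[str, Set[str]]:
    """Map each input word to the set of its transductions."""
    return {word: set(lookup(word)) for word in words}


# Hypothetical stand-in for a generator's lookup(); a real TransducerFile
# would consult the .hfstol file instead of this hard-coded table.
_FAKE_TABLE = {
    "cactus+Noun+Sg": ["cactuses", "cacti"],
    "octopus+Noun+Pl": ["octopuses", "octopi", "octopodes"],
}


def fake_lookup(word: str) -> List[str]:
    return _FAKE_TABLE.get(word, [])


results = bulk_lookup_sketch(fake_lookup, ["cactus+Noun+Sg", "octopus+Noun+Pl"])
```

With a loaded generator, the single call generator.bulk_lookup(words) would presumably replace the whole helper.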
-
lookup(string)
Look up the input string, returning a list of transductions. This is most similar to using hfst-optimized-lookup on the command line.
- Parameters
string (str) – The string to look up.
- Returns
list of analyses as concatenated strings, or an empty list if the input cannot be analyzed.
- Return type
list[str]
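Because an unanalyzable input yields an empty list rather than an error, callers can test the result for truthiness. The following is a minimal sketch of that pattern. `partition_by_analyzability` and `fake_lookup` are hypothetical helpers written for this example; in real code you would pass the bound method analyzer.lookup instead of the stand-in.

```python
from typing import Callable, Iterable, List, Tuple


def partition_by_analyzability(
    lookup: Callable[[str], List[str]], words: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split words into those with at least one analysis and those with none."""
    analyzed: List[str] = []
    unanalyzed: List[str] = []
    for word in words:
        # An empty list is falsy, so it marks the word as unanalyzable.
        if lookup(word):
            analyzed.append(word)
        else:
            unanalyzed.append(word)
    return analyzed, unanalyzed


# Hypothetical stand-in for analyzer.lookup, using the "bank" output shown earlier.
def fake_lookup(word: str) -> List[str]:
    return {"bank": ["bank+Noun+Sg", "bank+Verb"]}.get(word, [])


known, unknown = partition_by_analyzability(fake_lookup, ["bank", "bnak"])
```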
-
lookup_lemma_with_affixes(string)
New in version 0.10.0.
Analyze the input string, returning a list of hfst_optimized_lookup.Analysis objects.
Note
This method assumes an analyzer in which all multicharacter symbols represent affixes, and all lexical symbols are contiguous.
- Parameters
string (str) – The string to look up.
- Returns
list of analyses as hfst_optimized_lookup.Analysis objects, or an empty list if there are no analyses.
- Return type
list of hfst_optimized_lookup.Analysis
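To make the assumption in the note concrete, here is one way a raw symbol sequence could be split into prefixes, lemma, and suffixes. This is a sketch under that stated assumption, not the library's actual code: `AnalysisSketch` and `split_symbols` are hypothetical names, and the input is assumed to be a flat list in which every lexical symbol is a single character.

```python
from typing import List, NamedTuple, Tuple


class AnalysisSketch(NamedTuple):
    """Local stand-in with the same three fields as the documented Analysis."""
    prefixes: Tuple[str, ...]
    lemma: str
    suffixes: Tuple[str, ...]


def split_symbols(symbols: List[str]) -> AnalysisSketch:
    prefixes: List[str] = []
    lemma_chars: List[str] = []
    suffixes: List[str] = []
    for symbol in symbols:
        if len(symbol) == 1:
            # Single-character symbols are treated as lexical (part of the lemma).
            lemma_chars.append(symbol)
        elif lemma_chars:
            # Multicharacter symbol after the lemma has started: a suffix.
            suffixes.append(symbol)
        else:
            # Multicharacter symbol before any lexical symbol: a prefix.
            prefixes.append(symbol)
    return AnalysisSketch(tuple(prefixes), "".join(lemma_chars), tuple(suffixes))


english = split_symbols(list("bank") + ["+Noun", "+Sg"])
cree = split_symbols(["PV/e+"] + list("wâpamêw") + ["+V", "+TA"])
```

If lexical symbols were not contiguous, this sketch would silently merge them into one lemma, which is why the contiguity assumption matters.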
-
lookup_symbols(string)
Transduce the input string. The result is a list of transductions. Each transduction is a list of symbols as returned by the model; that is, the symbols are not concatenated into a single string.
- Parameters
string (str) – The string to look up.
- Returns
a list of transductions, each a list of symbols
- Return type
list[list[str]]
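Judging by the example outputs for "bank" shown earlier, joining the symbols of each transduction gives the same strings that lookup() returns. The snippet below illustrates this with those example values copied in as literal data; it does not call the library, and the variable names are chosen for this example only.

```python
# Example lookup_symbols("bank") output, copied from the analyzer example.
symbol_lists = [["bank", "+Noun", "+Sg"], ["bank", "+Verb"]]

# Concatenate the symbols of each transduction into a single string.
concatenated = ["".join(symbols) for symbols in symbol_lists]
```

Keeping the symbols separate is handy when you need to tell tags such as +Noun apart from the lemma without parsing the concatenated string.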
-
symbol_count() → int
Returns the number of symbols in the sigma (the symbol table).
- Return type
int
Analysis
-
class hfst_optimized_lookup.Analysis(prefixes: Tuple[str, …], lemma: str, suffixes: Tuple[str, …])
An analysis of a wordform.
This is a named tuple, so you can use it both with attributes and indices:
>>> analysis = Analysis(('PV/e+',), 'wâpamêw', ('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO'))
Using attributes:
>>> analysis.lemma
'wâpamêw'
>>> analysis.prefixes
('PV/e+',)
>>> analysis.suffixes
('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO')
Using indices:
>>> len(analysis)
3
>>> analysis[0]
('PV/e+',)
>>> analysis[1]
'wâpamêw'
>>> analysis[2]
('+V', '+TA', '+Cnj', '+3Sg', '+4Sg/PlO')
>>> prefixes, lemma, suffix = analysis
>>> lemma
'wâpamêw'
-
prefixes: Tuple[str, …]
Tags that appear before the lemma.
-
lemma: str
The base form of the analyzed wordform.
-
suffixes: Tuple[str, …]
Tags that appear after the lemma.
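Since Analysis is described as a named tuple, the standard named-tuple helpers _asdict() and _replace() should also be available on it. The sketch below demonstrates them on a local stand-in class with the same three fields. Defining the stand-in with typing.NamedTuple is an assumption made here so the example runs without the library installed.

```python
from typing import NamedTuple, Tuple


class Analysis(NamedTuple):
    """Local stand-in mirroring the documented fields (not the library class)."""
    prefixes: Tuple[str, ...]
    lemma: str
    suffixes: Tuple[str, ...]


analysis = Analysis(("PV/e+",), "wâpamêw", ("+V", "+TA", "+Cnj", "+3Sg", "+4Sg/PlO"))

# Convert to a plain mapping, e.g. for JSON serialization.
as_dict = analysis._asdict()

# Named tuples are immutable; _replace returns a modified copy.
without_prefixes = analysis._replace(prefixes=())
```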