categorizer module
class categorizer.FirstLetterSplitter(structure, ngram_index)
Bases: object
Creates a token-tree per letter to categorize tokens based on the first letter of each token. There is a tree for every letter, every digit, and a misc tree for everything else.
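The bucketing rule described above can be illustrated with a minimal sketch. This is not the module's implementation: `bucket_key`, `split_by_first_letter`, and the `"misc"` label are hypothetical names, plain sets stand in for the token-trees, and folding uppercase letters into lowercase buckets is an assumption the source does not confirm.

```python
from collections import defaultdict
import string

# Hypothetical label for tokens that start with neither a letter nor a digit.
MISC = "misc"


def bucket_key(token):
    """Return the category of a token: its first letter, its first digit, or MISC."""
    if not token:
        return MISC
    # Assumption: categories are case-insensitive, so "Apple" and "apple" share a bucket.
    first = token[0].lower()
    if first in string.ascii_lowercase or first in string.digits:
        return first
    return MISC


def split_by_first_letter(tokens):
    """Group tokens into one set per category (sets stand in for token-trees)."""
    buckets = defaultdict(set)
    for token in tokens:
        buckets[bucket_key(token)].add(token)
    return buckets


buckets = split_by_first_letter(["apple", "Avocado", "42nd", "_private", "banana"])
print(sorted(buckets))
```

Keeping one structure per first character means a lookup only has to touch the bucket that matches the token's first character, rather than one large shared structure.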
find(token)
Looks for a token key-node.
size(category=None)
Gets the size of the token-trees.
traverse()
Traverses every token-tree. Expensive on memory.
update_tree(document)
Updates the tree with a new document.
visualize_tree()