categorizer module

class categorizer.FirstLetterSplitter(structure, ngram_index)

Bases: object

Creates one token-tree per first character and categorizes each token by its first letter. There is a tree for every letter, a tree for every digit, and a misc tree for everything else.
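The categorization rule described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper name `category_of`, the lowercase folding, the restriction to ASCII letters and digits, and the `"misc"` key are all assumptions made for the example.

```python
import string

def category_of(token):
    """Hypothetical helper: map a token to its category key.

    Returns the lowercased first character if it is an ASCII letter or
    digit, and "misc" for anything else (including the empty string).
    """
    first = token[:1].lower()
    # Guard against "" explicitly: the empty string is "in" every string.
    if first and (first in string.ascii_lowercase or first in string.digits):
        return first
    return "misc"

# Group a few sample tokens by category.
tokens = ["Apple", "avocado", "42nd", "_private", ""]
buckets = {}
for tok in tokens:
    buckets.setdefault(category_of(tok), []).append(tok)
```

Here `buckets` stands in for the per-category trees: each key would own one token-tree in the real class.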

find(token)

Looks for the key-node of the given token.

size(category=None)

Gets the size of the token-trees. The optional category argument defaults to None.

traverse()

Traverses every token-tree. Expensive on memory.

update_tree(document)

Updates the tree with a new document.
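To make the interplay of update_tree, find, and size concrete, here is a toy stand-in built on nested dictionaries. It is a sketch under stated assumptions, not the real FirstLetterSplitter: the class name `ToySplitter`, the character-level trie, the `"$end"` marker, and the idea that a document is an iterable of string tokens are all hypothetical. The real constructor takes `structure` and `ngram_index`, which this sketch does not model.

```python
from collections import defaultdict

class ToySplitter:
    """Hypothetical stand-in; not the real FirstLetterSplitter."""

    def __init__(self):
        # One nested-dict trie per category, created on first use.
        self.trees = defaultdict(dict)
        # Number of distinct tokens stored per category.
        self.counts = defaultdict(int)

    @staticmethod
    def _category(token):
        # Assumed rule: ASCII letter or digit, lowercased; otherwise "misc".
        first = token[:1].lower()
        return first if first.isascii() and first.isalnum() else "misc"

    def update_tree(self, document):
        # Assumes a document is an iterable of string tokens.
        for token in document:
            cat = self._category(token)
            node = self.trees[cat]
            for ch in token:
                node = node.setdefault(ch, {})
            if "$end" not in node:  # count each distinct token once
                node["$end"] = True
                self.counts[cat] += 1

    def find(self, token):
        # Return the token's terminal node, or None if it is not stored.
        node = self.trees.get(self._category(token))
        if node is None:
            return None
        for ch in token:
            node = node.get(ch)
            if node is None:
                return None
        return node if "$end" in node else None

    def size(self, category=None):
        # With no category, report the total across all trees.
        if category is None:
            return sum(self.counts.values())
        return self.counts.get(category, 0)

splitter = ToySplitter()
splitter.update_tree(["apple", "ant", "42", "apple"])
```

Because each lookup first selects a tree by category, find only ever walks the tree for the token's first character, which is the point of splitting by first letter.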

visualize_tree()