Original Recurser

Classes:

Recurser(corpus[, internal_only])

Find words that contain another word such that, when the inner word is deleted, the remaining letters still form a word.

Functions:

recursive_walk(in_list)

recursive_density(in_list)

dedupe_density(in_list)

recursive_density() doesn’t dedupe; this counts unique edges only.

recursive_depth(in_list)

count_leaves(in_list)

recursive_translate(in_list, lut)

class recurse_words.recursers.recurse_words.Recurser(corpus: str, internal_only=True)

Bases: object

Find words that contain another word such that, when the inner word is deleted, the remaining letters still form a word.

Variables
  • words (typing.Dict[int, typing.List[str]]) – Dict of lists of words keyed by word length, e.g. length-4 words are in words[4]

  • _words – like Recurser.words, only instead of a list, each length maps to a dict with words as keys and all values == 1, for faster lookups

  • word_trees – dict of lists of tuples, each tuple consisting of ('original_word', 'subword', 'sliced_word'); since each original word can have multiple subwords, they’re combined in nested lists

Parameters
  • corpus (path) – text corpus. If a file, the default corpi.Txt corpus loader is used; otherwise, use the ‘name’ attribute of a loader, like “english”

  • internal_only (bool) – Whether to consider matching strings only if they are in the interior of the word, as opposed to the beginning or end (i.e. exclude matches that are prefixes/suffixes).
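To make internal_only concrete, here is a one-level sketch of the splicing test, assuming a length-keyed {word: 1} lookup shaped like _words. make_lookup and find_splices are hypothetical names; this is an illustration of the idea, not the library's recurse_word().

```python
from typing import Dict, List, Tuple


def make_lookup(corpus: List[str]) -> Dict[int, Dict[str, int]]:
    # Length-keyed {word: 1} dicts, in the spirit of Recurser._words.
    lookup: Dict[int, Dict[str, int]] = {}
    for word in corpus:
        lookup.setdefault(len(word), {})[word] = 1
    return lookup


def find_splices(word: str, lookup: Dict[int, Dict[str, int]],
                 internal_only: bool = True,
                 min_test_word: int = 2) -> List[Tuple[str, str, str]]:
    """Hypothetical one-level check returning (original_word, subword, sliced_word)."""
    found = []
    n = len(word)
    for length in range(min_test_word, n):
        for start in range(n - length + 1):
            end = start + length
            # internal_only drops matches touching either end (prefixes/suffixes).
            if internal_only and (start == 0 or end == n):
                continue
            sub = word[start:end]
            if sub not in lookup.get(length, {}):
                continue
            sliced = word[:start] + word[end:]
            if sliced in lookup.get(len(sliced), {}):
                found.append((word, sub, sliced))
    return found


lookup = make_lookup(["pirate", "rat", "pie", "pi", "rate"])
print(find_splices("pirate", lookup))
# [('pirate', 'rat', 'pie')]
print(find_splices("pirate", lookup, internal_only=False))
# [('pirate', 'pi', 'rate'), ('pirate', 'rat', 'pie'), ('pirate', 'rate', 'pi')]
```

With internal_only=True only "rat" qualifies, because "pi" is a prefix and "rate" is a suffix of "pirate".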

Methods:

__init__(corpus[, internal_only])

Parameters
  • corpus (path) – text corpus. If a file, the default corpi.Txt corpus loader is used; otherwise, use the ‘name’ attribute of a loader, like “english”

load_words(corpus)

recurse_word(word[, min_test_word, …])

Recurse a single word – see recurse_all_words() for args

recurse_all_words([min_include_word, …])

Populate word_trees by searching recursively through words for recurse words

save(filename)

load(filename)

_reindex_trees(func)

Despite what the internal variable names might suggest, reindex the word trees according to a function that takes a tree and returns some index, such as an integer.

draw_graph(trees, output_dir[, extension, …])

Draw a network diagram of a recurseword tree

Attributes:

word_edges

The edges of word_trees as a flat list, made unique by calling set()

word_chains

chains of trees, without tuple structure.

by_leaves

word_trees reindexed by total number of unique leaves

by_density

word_trees reindexed by dedupe_density()

by_absolute_density

word_trees reindexed by recursive_density()

by_depth

word_trees reindexed by recursive_depth()

__init__(corpus: str, internal_only=True)
Parameters
  • corpus (path) – text corpus. If a file, the default corpi.Txt corpus loader is used; otherwise, use the ‘name’ attribute of a loader, like “english”

  • internal_only (bool) – Whether to consider matching strings only if they are in the interior of the word, as opposed to the beginning or end (i.e. exclude matches that are prefixes/suffixes).

property word_edges

The edges of word_trees as a flat list, made unique by calling set()

Returns

[(from_word, transformation, to_word),…]
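A minimal sketch of pulling a flat, deduplicated edge list out of nested trees. It assumes, from the word_trees description, that each tree is a list mixing ('original_word', 'subword', 'sliced_word') tuples with nested lists, and that each tuple reads as one (from_word, transformation, to_word) edge. walk and unique_edges are hypothetical names. The result is sorted only to make output deterministic; set() alone gives no order.

```python
from typing import Iterator, List, Tuple

Edge = Tuple[str, str, str]


def walk(tree: list) -> Iterator[Edge]:
    # Yield every tuple in an arbitrarily nested list, depth-first.
    for item in tree:
        if isinstance(item, tuple):
            yield item
        else:
            yield from walk(item)


def unique_edges(word_trees: dict) -> List[Edge]:
    edges = set()
    for tree in word_trees.values():
        edges.update(walk(tree))
    return sorted(edges)


# Hypothetical trees: "beaming" minus "am" gives "being", minus "in" gives "beg".
word_trees = {
    "beaming": [("beaming", "am", "being"), [("being", "in", "beg")]],
    "being": [("being", "in", "beg")],
}
print(unique_edges(word_trees))
# [('beaming', 'am', 'being'), ('being', 'in', 'beg')]
```

The ("being", "in", "beg") edge appears in both trees but only once in the result.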

load_words(corpus: List[str])
recurse_word(word: str, min_test_word: int = 2, min_clipped_word: int = 3, max_depth: int = 0, current_depth: int = 0)

Recurse a single word – see recurse_all_words() for args

recurse_all_words(min_include_word: int = 9, min_test_word: int = 2, min_clipped_word: int = 3, max_depth: int = 0, n_procs: int = 12, batch_size: int = 100)

Populate word_trees by searching recursively through words for recurse words

Parameters
  • min_include_word (int) – Minimum length of original words to test

  • min_test_word (int) – Minimum length of the subwords to test splicing out of a word

  • min_clipped_word (int) – Minimum size of the resulting spliced/clipped word to be considered for additional recursive subwords

  • max_depth (int) – Maximum recursion depth to allow; if 0, unlimited

  • n_procs (int) – Number of processors to spawn in the multiprocessing pool
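The parameters above can be illustrated with a single-process recursive sketch. It assumes a length-keyed {word: 1} lookup like _words, considers interior matches only, counts the first splice as depth 1, and nests each sliced word's own results in a list right after its tuple. That nesting is an assumption, and the n_procs/batch_size multiprocessing is not reproduced. recurse is a hypothetical name.

```python
from typing import Dict


def recurse(word: str, lookup: Dict[int, Dict[str, int]],
            min_test_word: int = 2, min_clipped_word: int = 3,
            max_depth: int = 0, depth: int = 1) -> list:
    """Hypothetical recursive splice search returning a nested tree."""
    tree: list = []
    n = len(word)
    for length in range(min_test_word, n):
        for start in range(1, n - length):  # interior matches only
            sub = word[start:start + length]
            if sub not in lookup.get(length, {}):
                continue
            sliced = word[:start] + word[start + length:]
            if sliced not in lookup.get(len(sliced), {}):
                continue
            tree.append((word, sub, sliced))
            # Recurse only into long-enough results, and only while below
            # max_depth (0 means unlimited).
            if len(sliced) >= min_clipped_word and (max_depth == 0 or depth < max_depth):
                subtree = recurse(sliced, lookup, min_test_word,
                                  min_clipped_word, max_depth, depth + 1)
                if subtree:
                    tree.append(subtree)
    return tree


corpus = ["beaming", "being", "beg", "am", "in"]
lookup: Dict[int, Dict[str, int]] = {}
for w in corpus:
    lookup.setdefault(len(w), {})[w] = 1

print(recurse("beaming", lookup))
# [('beaming', 'am', 'being'), [('being', 'in', 'beg')]]
print(recurse("beaming", lookup, max_depth=1))
# [('beaming', 'am', 'being')]
```

Raising min_clipped_word to 6 has the same effect as max_depth=1 here, since "being" is too short to be searched again.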

save(filename: pathlib.Path)
load(filename: pathlib.Path)
property word_chains

chains of trees, without tuple structure.

Returns:

_reindex_trees(func) → dict

Despite what the internal variable names might suggest, reindex the word trees according to a function that takes a tree and returns some index, such as an integer, or any other value.

Parameters

func (callable) – a function that takes a tree and returns its new index

Returns

dict
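A minimal sketch of the reindexing idea. It assumes the result maps each index to a {word: tree} dict, so that several trees can share one index; the documented return type is only dict. reindex is a hypothetical name, and len serves as a stand-in index function that counts only the top-level items of each tree.

```python
from collections import defaultdict
from typing import Any, Callable, Dict


def reindex(word_trees: Dict[str, list],
            func: Callable[[list], Any]) -> Dict[Any, Dict[str, list]]:
    """Hypothetical: group {word: tree} under whatever key func(tree) returns."""
    grouped: Dict[Any, Dict[str, list]] = defaultdict(dict)
    for word, tree in word_trees.items():
        grouped[func(tree)][word] = tree
    return dict(grouped)


word_trees = {
    "beaming": [("beaming", "am", "being"), [("being", "in", "beg")]],
    "being": [("being", "in", "beg")],
    "pirate": [("pirate", "rat", "pie")],
}
by_top_level = reindex(word_trees, len)
print(sorted(by_top_level[1]))  # ['being', 'pirate']
```

Swapping len for a leaf-counting or depth function gives groupings in the spirit of by_leaves or by_depth.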

property by_leaves

word_trees reindexed by total number of unique leaves

property by_density

word_trees reindexed by dedupe_density()

i.e., the total number of unique edges

property by_absolute_density

word_trees reindexed by recursive_density()

i.e., by counting the total number of nodes and edges in the tree, allowing for repeated paths

property by_depth

word_trees reindexed by recursive_depth()

i.e., by the maximum depth of the tree

draw_graph(trees: Union[dict, list, str], output_dir: pathlib.Path, extension: str = '.svg', graph_attr: dict = {}, node_attr: dict = {}, edge_attr: dict = {}, translate=True)

Draw a network diagram of a recurseword tree

Parameters
  • trees (dict, list, str) – either a dictionary of {word: tree}, a list of [words], or a single word

  • output_dir (Path) – output directory; each diagram is saved as word{extension}

  • extension (str) – default '.svg', but any output format that pygraphviz supports

  • graph_attr (dict) – supplementary parameters for graph attributes

  • node_attr (dict) – supplementary parameters for node attributes

  • edge_attr (dict) – supplementary parameters for edge attributes
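draw_graph() renders through pygraphviz. As a dependency-free illustration of what such a diagram encodes, the sketch below builds Graphviz DOT source by hand from a list of edge tuples. It applies attribute dicts through DOT's graph [...], node [...] and edge [...] statements, which is one natural reading of graph_attr, node_attr and edge_attr. tree_to_dot is hypothetical and is not how draw_graph() is implemented; the "-subword" edge label is an arbitrary choice, and quotes inside names are not escaped.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def _fmt(attrs: Dict[str, str]) -> str:
    return ", ".join(f'{key}="{value}"' for key, value in attrs.items())


def tree_to_dot(word: str, edges: List[Tuple[str, str, str]],
                graph_attr: Optional[Dict[str, str]] = None,
                node_attr: Optional[Dict[str, str]] = None,
                edge_attr: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical: DOT source for one word's tree (no escaping of quotes)."""
    lines = [f'digraph "{word}" {{']
    for kind, attrs in (("graph", graph_attr), ("node", node_attr), ("edge", edge_attr)):
        if attrs:
            lines.append(f"  {kind} [{_fmt(attrs)}];")
    for src, sub, dst in edges:
        # One edge per splice, labelled with the deleted subword.
        lines.append(f'  "{src}" -> "{dst}" [label="-{sub}"];')
    lines.append("}")
    return "\n".join(lines)


edges = [("beaming", "am", "being"), ("being", "in", "beg")]
dot = tree_to_dot("beaming", edges, node_attr={"shape": "box"})
extension = ".svg"
out_path = Path("graphs") / f"beaming{extension}"  # mirrors the word{extension} naming
print(dot)
```

The printed text can be passed to the Graphviz dot tool to render the same kind of diagram; the sketch itself never writes out_path.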

recurse_words.recursers.recurse_words.recursive_walk(in_list)
recurse_words.recursers.recurse_words.recursive_density(in_list) → int
recurse_words.recursers.recurse_words.dedupe_density(in_list) → int

recursive_density() doesn’t dedupe, so the same path can be counted multiple times.

Instead, we can recursive_walk() the tree and take the length of the set of all unique tuples.
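The dedupe strategy can be sketched as follows. walk, count_with_repeats and unique_edge_count are hypothetical names. count_with_repeats is only a simplified stand-in for recursive_density(), which is documented as counting nodes and edges; here only tuples are counted, to isolate the effect of repeated paths. The tree uses abstract placeholder labels.

```python
from typing import Iterator, Tuple

Edge = Tuple[str, str, str]


def walk(tree: list) -> Iterator[Edge]:
    # Depth-first over a nested list, yielding every tuple it contains.
    for item in tree:
        if isinstance(item, tuple):
            yield item
        else:
            yield from walk(item)


def count_with_repeats(tree: list) -> int:
    # Every occurrence counts, so a path reached twice is counted twice.
    return sum(1 for _ in walk(tree))


def unique_edge_count(tree: list) -> int:
    # The dedupe approach: walk the tree, then count distinct tuples.
    return len(set(walk(tree)))


# Placeholder tree: "a" reaches "b" two different ways, so the
# "b" -> "c" step is recorded twice.
tree = [("a", "x", "b"), [("b", "y", "c")],
        ("a", "z", "b"), [("b", "y", "c")]]
print(count_with_repeats(tree), unique_edge_count(tree))  # 4 3
```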

recurse_words.recursers.recurse_words.recursive_depth(in_list) → int
recurse_words.recursers.recurse_words.count_leaves(in_list) → int
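recursive_depth() and count_leaves() are listed without descriptions, so this sketch rests on two assumptions: depth is the number of nested list levels in a tree, and a leaf is a sliced word that never appears as the original word of another tuple in the same tree (by_leaves counts unique leaves). tree_depth and leaf_count are hypothetical names.

```python
from typing import Iterator, Tuple


def tree_depth(tree: list) -> int:
    # One level for this list, plus the deepest nested list inside it.
    return 1 + max((tree_depth(item) for item in tree if isinstance(item, list)), default=0)


def _tuples(tree: list) -> Iterator[Tuple[str, str, str]]:
    for item in tree:
        if isinstance(item, tuple):
            yield item
        else:
            yield from _tuples(item)


def leaf_count(tree: list) -> int:
    edges = list(_tuples(tree))
    sources = {src for src, _, _ in edges}
    # Unique end points that are never spliced further.
    return len({dst for _, _, dst in edges} - sources)


chain = [("beaming", "am", "being"), [("being", "in", "beg")]]
print(tree_depth(chain), leaf_count(chain))  # 2 1
```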
recurse_words.recursers.recurse_words.recursive_translate(in_list: list, lut: dict) → list
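recursive_translate(in_list, lut) is also listed without a description. Going only by its signature, one plausible reading is that it maps values in a nested list through a lookup table while keeping the nesting. The sketch assumes that reading, leaves values missing from lut unchanged, and preserves tuples. translate_nested is a hypothetical name.

```python
from typing import Any, Dict


def translate_nested(in_list: list, lut: Dict[Any, Any]) -> list:
    """Hypothetical: swap every value found in lut, keeping list/tuple nesting."""
    out = []
    for item in in_list:
        if isinstance(item, list):
            out.append(translate_nested(item, lut))
        elif isinstance(item, tuple):
            out.append(tuple(lut.get(value, value) for value in item))
        else:
            out.append(lut.get(item, item))
    return out


tree = [("a", "x", "b"), [("b", "y", "c")]]
print(translate_nested(tree, {"a": "A", "c": "C"}))
# [('A', 'x', 'b'), [('b', 'y', 'C')]]
```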