Original Recurser
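Classes:
Recurser – Find words that contain other words such that, when the inner word is deleted, the remaining letters still form a word.
Functions:
recursive_walk
recursive_density
dedupe_density – recursive_density() doesn't dedupe; this counts only unique tuples.
recursive_depth
count_leaves
recursive_translate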
-
class recurse_words.recursers.recurse_words.Recurser(corpus: str, internal_only=True)
Bases: object
Find words that contain other words such that, when the inner word is deleted, the remaining letters still form a word.
- Variables
words (typing.Dict[int, typing.List[str]]) – Dict of lists of words keyed by word length, e.g. length-4 words are words[4]
_words – Same as Recurser.words, except that each list is replaced by a dict with the words as keys and all values == 1, for faster lookups
word_trees – Dict of lists of tuples. Each tuple consists of ('original_word', 'subword', 'sliced_word'); since each original word can have multiple subwords, the tuples are combined in nested lists
- Parameters
corpus (path) – Text corpus. If a file, use the default corpi.Txt corpus loader; otherwise use the 'name' attribute of the loader, like "english"
internal_only (bool) – Whether to consider matching strings only if they are in the interior of the word, as opposed to the beginning or end (i.e. exclude matches that are prefixes/suffixes).
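As a rough illustration of the variables and the internal_only flag described above, the sketch below shows one way a length-keyed word index, a dict-based lookup table, and the prefix/suffix exclusion could work. The helper names (index_by_length, lookup_table, subword_positions) are hypothetical and are not part of recurse_words; the actual implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_length(corpus: List[str]) -> Dict[int, List[str]]:
    """Group words by length, so that length-4 words end up in words[4]."""
    words: Dict[int, List[str]] = defaultdict(list)
    for word in corpus:
        words[len(word)].append(word)
    return dict(words)


def lookup_table(words: Dict[int, List[str]]) -> Dict[int, Dict[str, int]]:
    """Same grouping, but each length maps to a {word: 1} dict for fast membership tests."""
    return {length: {w: 1 for w in group} for length, group in words.items()}


def subword_positions(word: str, subword: str, internal_only: bool = True) -> List[int]:
    """Return start indices of subword in word.

    With internal_only=True, matches that are prefixes or suffixes are skipped.
    """
    positions = []
    start = word.find(subword)
    while start != -1:
        end = start + len(subword)
        is_edge = start == 0 or end == len(word)
        if not (internal_only and is_edge):
            positions.append(start)
        start = word.find(subword, start + 1)
    return positions


words = index_by_length(["cat", "dog", "bird"])
print(words)                                                  # {3: ['cat', 'dog'], 4: ['bird']}
print(subword_positions("pirate", "rat"))                     # [2]  (interior match)
print(subword_positions("pirate", "ate"))                     # []   (suffix, excluded)
print(subword_positions("pirate", "ate", internal_only=False))  # [3]
```

Here "rat" sits in the interior of "pirate" (and removing it leaves "pie"), while "ate" is a suffix and is only reported when internal_only is False.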
Methods:
__init__(corpus[, internal_only])
load_words(corpus)
recurse_word(word[, min_test_word, …]) – Recurse a single word; see recurse_all_words() for args
recurse_all_words([min_include_word, …]) – Populate word_trees by searching recursively through words for recurse words
save(filename)
load(filename)
_reindex_trees(func) – Despite how the internal variables might describe it, reindex the word trees according to some function that takes the tree itself and returns some index, like an integer.
draw_graph(trees, output_dir[, extension, …]) – Draw a network diagram of a recurseword tree
Attributes:
word_edges – word_trees, except as just a list of the edges after they have been made unique by calling set()
word_chains – Chains of trees, without tuple structure.
by_leaves – word_trees reindexed by total number of unique leaves
by_density – word_trees reindexed by dedupe_density()
by_absolute_density – word_trees reindexed by recursive_density()
by_depth – word_trees reindexed by recursive_depth()
-
__init__(corpus: str, internal_only=True)
- Parameters
corpus (path) – Text corpus. If a file, use the default corpi.Txt corpus loader; otherwise use the 'name' attribute of the loader, like "english"
internal_only (bool) – Whether to consider matching strings only if they are in the interior of the word, as opposed to the beginning or end (i.e. exclude matches that are prefixes/suffixes).
-
property word_edges
word_trees, except as just a list of the edges after they have been made unique by calling set()
- Returns
[(from_word, transformation, to_word), …]
-
load_words(corpus: List[str])
-
recurse_word(word: str, min_test_word: int = 2, min_clipped_word: int = 3, max_depth: int = 0, current_depth: int = 0)
Recurse a single word; see recurse_all_words() for args
-
recurse_all_words(min_include_word: int = 9, min_test_word: int = 2, min_clipped_word: int = 3, max_depth: int = 0, n_procs: int = 12, batch_size: int = 100)
Populate word_trees by searching recursively through words for recurse words
- Parameters
min_include_word (int) – Minimum length of original words to test
min_test_word (int) – Minimum size of subwords to test splicing subwords with
min_clipped_word (int) – Minimum size of the resulting spliced/clipped word to be considered for additional recursive subwords
max_depth (int) – Maximum recursion depth to allow; if 0, infinite
n_procs (int) – Number of processes to spawn in the multiprocessing pool
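The parameters above suggest a recursive search along the following lines. This is only a minimal single-process sketch under stated assumptions, not the library's implementation: recurse_word_sketch is a hypothetical name, the vocabulary is a plain set rather than the _words lookup, and the nested-list layout of the result is a guess based on the description of word_trees. Multiprocessing (n_procs, batch_size) and min_include_word are left out.

```python
from typing import Set


def recurse_word_sketch(word: str, vocab: Set[str], internal_only: bool = True,
                        min_test_word: int = 2, min_clipped_word: int = 3,
                        max_depth: int = 0, current_depth: int = 0) -> list:
    """Return (original, subword, clipped) tuples, with child results in nested lists."""
    # max_depth == 0 means no depth limit
    if max_depth and current_depth >= max_depth:
        return []

    tree = []
    for size in range(min_test_word, len(word)):
        for start in range(len(word) - size + 1):
            end = start + size
            # optionally skip subwords that are prefixes or suffixes
            if internal_only and (start == 0 or end == len(word)):
                continue
            subword = word[start:end]
            clipped = word[:start] + word[end:]
            # both the removed piece and the remainder must be words
            if subword not in vocab or clipped not in vocab:
                continue
            tree.append((word, subword, clipped))
            # assumption: only sufficiently long remainders are searched further
            if len(clipped) >= min_clipped_word:
                children = recurse_word_sketch(
                    clipped, vocab, internal_only, min_test_word,
                    min_clipped_word, max_depth, current_depth + 1)
                if children:
                    tree.append(children)
    return tree


# toy vocabulary chosen for the example
vocab = {"breathed", "bathed", "bread", "bad", "the", "re"}
tree = recurse_word_sketch("breathed", vocab)
print(tree)
# [('breathed', 're', 'bathed'), [('bathed', 'the', 'bad')],
#  ('breathed', 'the', 'bread'), [('bread', 're', 'bad')]]
```

In this toy run, "breathed" reaches "bad" along two different routes: removing "re" then "the", or removing "the" then "re". Passing max_depth=1 keeps only the two top-level tuples.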
-
save(filename: pathlib.Path)
-
load(filename: pathlib.Path)
-
property word_chains
Chains of trees, without tuple structure.
-
_reindex_trees(func) → dict
Despite how the internal variables might describe it, reindex the word trees according to some function that takes the tree itself and returns some index, like an integer or any other value.
- Parameters
func (callable) – Takes a tree and returns the value to index it by
- Returns
dict
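To make the idea of reindexing concrete, here is a small sketch that groups trees by the value an indexing function returns. The names reindex_trees and nesting_depth are hypothetical, and the shape of the returned dict ({index: {word: tree}}) is an assumption; the source only states that a dict is returned.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable


def reindex_trees(word_trees: Dict[str, list], func: Callable[[list], Hashable]) -> dict:
    """Group {word: tree} entries under the index that func(tree) returns."""
    reindexed = defaultdict(dict)
    for word, tree in word_trees.items():
        reindexed[func(tree)][word] = tree
    return dict(reindexed)


def nesting_depth(tree) -> int:
    """Example index function: how deeply lists are nested (a flat list is 1)."""
    if not isinstance(tree, list):
        return 0
    return 1 + max((nesting_depth(item) for item in tree), default=0)


# toy trees in the assumed nested-list-of-tuples layout
word_trees = {
    "breathed": [("breathed", "the", "bread"), [("bread", "re", "bad")]],
    "pirate": [("pirate", "rat", "pie")],
}
by_depth = reindex_trees(word_trees, nesting_depth)
print(sorted(by_depth))   # [1, 2]
print(list(by_depth[2]))  # ['breathed']
```

Any function of the tree can serve as the index, which is presumably how properties such as by_leaves, by_density, and by_depth are derived.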
-
property by_leaves
word_trees reindexed by total number of unique leaves
-
property by_density
word_trees reindexed by dedupe_density(), i.e. by the total number of unique edges
-
property by_absolute_density
word_trees reindexed by recursive_density(), i.e. by counting the total number of nodes and edges in the tree, allowing for repeated paths
-
property by_depth
word_trees reindexed by recursive_depth(), i.e. by the maximum depth of the tree
-
draw_graph(trees: Union[dict, list, str], output_dir: pathlib.Path, extension: str = '.svg', graph_attr: dict = {}, node_attr: dict = {}, edge_attr: dict = {}, translate=True)
Draw a network diagram of a recurseword tree
- Parameters
trees (dict, list, str) – either a dictionary of {word: tree}, a list of [words], or a single word
output_dir (Path) – output directory; the file will be named word{extension}
extension (str) – default '.svg', but any output format that pygraphviz supports
graph_attr (dict) – supplementary parameters for graph attributes
node_attr (dict) – supplementary parameters for node attributes
edge_attr (dict) – supplementary parameters for edge attributes
-
recurse_words.recursers.recurse_words.recursive_walk(in_list)
-
recurse_words.recursers.recurse_words.recursive_density(in_list) → int
-
recurse_words.recursers.recurse_words.dedupe_density(in_list) → int
recursive_density() doesn't dedupe, so the same path can appear multiple times.
Instead, we can do a recursive walk and take the length of the set of all the unique tuples.
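The difference between counting every tuple and counting unique tuples can be sketched as follows. The function names (walk_edges, count_all_edges, count_unique_edges) are hypothetical stand-ins rather than the library's functions, the nested-list layout is assumed, and the non-deduplicated count here covers tuples only, which may be simpler than what recursive_density() actually counts.

```python
from typing import Iterator, Tuple


def walk_edges(tree: list) -> Iterator[Tuple[str, str, str]]:
    """Yield every tuple in an arbitrarily nested list, depth-first."""
    for item in tree:
        if isinstance(item, list):
            yield from walk_edges(item)
        else:
            yield item


def count_all_edges(tree: list) -> int:
    """Count every tuple, including ones reached along several paths."""
    return sum(1 for _ in walk_edges(tree))


def count_unique_edges(tree: list) -> int:
    """Count each distinct tuple once by collecting them into a set."""
    return len(set(walk_edges(tree)))


# toy tree in which the same tuple is reached along two branches
shared = ("bread", "re", "bad")
tree = [
    ("word_a", "x", "bread"), [shared],
    ("word_a", "y", "bread"), [shared],
]
print(count_all_edges(tree))     # 4
print(count_unique_edges(tree))  # 3
```

Because tuples are hashable, a set removes the repeated occurrence of the shared tuple without any extra bookkeeping.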
-
recurse_words.recursers.recurse_words.recursive_depth(in_list) → int
-
recurse_words.recursers.recurse_words.count_leaves(in_list) → int
-
recurse_words.recursers.recurse_words.recursive_translate(in_list: list, lut: dict) → list