2024-heraklion-data/exercises/anagrams/anagrams.ipynb
2024-08-27 15:27:53 +03:00

4.7 KiB

Exercise: Find the anagrams for all words in a list

  • You are given an English dictionary containing M words (“the dictionary”), and a separate list of N words (“the input”, saved in the file words_to_search.txt)
  • For each word in the input, find all the anagrams in the dictionary (e.g., for input 'acme' the anagrams are ['acme', 'came', 'mace'])

How to proceed?

  1. Write an algorithm to find all anagrams for one input word first
  2. What is the Big-O class of this algorithm when executed the full N-words input?
  3. Is there a way to pre-process the dictionary to improve the Big-O performance?

1. Load the system dictionary and the input words

In [7]:
# Load the system dictionary
with open('/usr/share/dict/words', 'r') as f:
    dict_words = [w.strip() for w in f.readlines()]
In [8]:
# Print the start and end of the dictionary
dict_words[:5] + ['...'] + dict_words[-5:]
Out[8]:
['A',
 'a',
 'aa',
 'aal',
 'aalii',
 '...',
 'zythem',
 'Zythia',
 'zythum',
 'Zyzomys',
 'Zyzzogeton']
In [13]:
# Load the input words
with open('words_to_search.txt', 'r') as f:
    words = [w.strip() for w in f.readlines()]
In [14]:
# Print the start and end of the input list
words[:5] + ['...'] + words[-5:]
Out[14]:
['acer',
 'acers',
 'aces',
 'aches',
 'acme',
 '...',
 'yap',
 'yaw',
 'yea',
 'zendo',
 'zoned']

2. Look for the anagrams of one input word, e.g. "organ"

  • There are several anagrams, including "groan" and "argon".

  • What is the Big-O performance oh your algorithm? In terms of M, the number of words in the dictionary, and K, the number of letters in a word

In [15]:
word = 'organ'
In [ ]:

The performance of this implementation is ... .

3. Look for the anagrams of the words in the input list

  • How does the Big-O performance of your one-word implementation scale to an input list of M words?
  • Is there a way to pre-process the dictionary words in a data structure that is better suited for this task?
In [ ]:

In [ ]: