2024-heraklion-data/exercises/anagrams/.ipynb_checkpoints/anagrams-checkpoint.ipynb
2024-08-27 15:27:53 +03:00

211 lines
4.7 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "245d99ab",
"metadata": {},
"source": [
"# Exercise: Find the anagrams for all words in a list"
]
},
{
"cell_type": "markdown",
"id": "d1e46c4a",
"metadata": {},
"source": [
"* You are given an English dictionary containing M words (“the dictionary”), and a separate list of N words (“the input”, saved in the file `words_to_search.txt`)\n",
"* For each word in the input, find all the anagrams in the dictionary (e.g., for input 'acme' the anagrams are `['acme', 'came', 'mace']`)\n",
"\n",
"How to proceed?\n",
"1. Write an algorithm to find all anagrams for one input word first\n",
"2. What is the Big-O class of this algorithm when executed the full N-words input?\n",
"3. Is there a way to pre-process the dictionary to improve the Big-O performance?"
]
},
{
"cell_type": "markdown",
"id": "b9b9cecd",
"metadata": {},
"source": [
"# 1. Load the system dictionary and the input words"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "24070a26",
"metadata": {},
"outputs": [],
"source": [
"# Load the system dictionary\n",
"with open('/usr/share/dict/words', 'r') as f:\n",
" dict_words = [w.strip() for w in f.readlines()]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4002fcdd",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"['A',\n",
" 'a',\n",
" 'aa',\n",
" 'aal',\n",
" 'aalii',\n",
" '...',\n",
" 'zythem',\n",
" 'Zythia',\n",
" 'zythum',\n",
" 'Zyzomys',\n",
" 'Zyzzogeton']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the start and end of the dictionary\n",
"dict_words[:5] + ['...'] + dict_words[-5:]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "823537ef",
"metadata": {},
"outputs": [],
"source": [
"# Load the input words\n",
"with open('words_to_search.txt', 'r') as f:\n",
" words = [w.strip() for w in f.readlines()]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "4ccec6a3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['acer',\n",
" 'acers',\n",
" 'aces',\n",
" 'aches',\n",
" 'acme',\n",
" '...',\n",
" 'yap',\n",
" 'yaw',\n",
" 'yea',\n",
" 'zendo',\n",
" 'zoned']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Print the start and end of the input list\n",
"words[:5] + ['...'] + words[-5:]"
]
},
{
"cell_type": "markdown",
"id": "14d91685",
"metadata": {},
"source": [
"# 2. Look for the anagrams of one input word, e.g. \"organ\"\n",
"\n",
"* There are several anagrams, including \"groan\" and \"argon\".\n",
"\n",
"* What is the Big-O performance oh your algorithm? In terms of M, the number of words in the dictionary, and K, the number of letters in a word"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "badf44c1",
"metadata": {},
"outputs": [],
"source": [
"word = 'organ'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef938d95",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "7cd1196c",
"metadata": {},
"source": [
"The performance of this implementation is ... ."
]
},
{
"cell_type": "markdown",
"id": "115c3219",
"metadata": {},
"source": [
"# 3. Look for the anagrams of the words in the input list\n",
"\n",
"* How does the Big-O performance of your one-word implementation scale to an input list of M words?\n",
"* Is there a way to pre-process the dictionary words in a data structure that is better suited for this task?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03ce3e28",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2fc5ec4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}