{ "cells": [ { "cell_type": "markdown", "id": "245d99ab", "metadata": {}, "source": [ "# Exercise: Find the anagrams for all words in a list" ] }, { "cell_type": "markdown", "id": "d1e46c4a", "metadata": {}, "source": [ "* You are given an English dictionary containing M words (“the dictionary”), and a separate list of N words (“the input”, saved in the file `words_to_search.txt`)\n", "* For each word in the input, find all the anagrams in the dictionary (e.g., for input 'acme' the anagrams are `['acme', 'came', 'mace']`)\n", "\n", "How to proceed?\n", "1. Write an algorithm to find all anagrams for one input word first\n", "2. What is the Big-O class of this algorithm when executed the full N-words input?\n", "3. Is there a way to pre-process the dictionary to improve the Big-O performance?" ] }, { "cell_type": "markdown", "id": "b9b9cecd", "metadata": {}, "source": [ "# 1. Load the system dictionary and the input words" ] }, { "cell_type": "code", "execution_count": 7, "id": "24070a26", "metadata": {}, "outputs": [], "source": [ "# Load the system dictionary\n", "with open('/usr/share/dict/words', 'r') as f:\n", " dict_words = [w.strip() for w in f.readlines()]" ] }, { "cell_type": "code", "execution_count": 8, "id": "4002fcdd", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['A',\n", " 'a',\n", " 'aa',\n", " 'aal',\n", " 'aalii',\n", " '...',\n", " 'zythem',\n", " 'Zythia',\n", " 'zythum',\n", " 'Zyzomys',\n", " 'Zyzzogeton']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print the start and end of the dictionary\n", "dict_words[:5] + ['...'] + dict_words[-5:]" ] }, { "cell_type": "code", "execution_count": 13, "id": "823537ef", "metadata": {}, "outputs": [], "source": [ "# Load the input words\n", "with open('words_to_search.txt', 'r') as f:\n", " words = [w.strip() for w in f.readlines()]" ] }, { "cell_type": "code", "execution_count": 14, "id": "4ccec6a3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['acer',\n", " 'acers',\n", " 'aces',\n", " 'aches',\n", " 'acme',\n", " '...',\n", " 'yap',\n", " 'yaw',\n", " 'yea',\n", " 'zendo',\n", " 'zoned']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print the start and end of the input list\n", "words[:5] + ['...'] + words[-5:]" ] }, { "cell_type": "markdown", "id": "14d91685", "metadata": {}, "source": [ "# 2. Look for the anagrams of one input word, e.g. \"organ\"\n", "\n", "* There are several anagrams, including \"groan\" and \"argon\".\n", "\n", "* What is the Big-O performance oh your algorithm? In terms of M, the number of words in the dictionary, and K, the number of letters in a word" ] }, { "cell_type": "code", "execution_count": 15, "id": "badf44c1", "metadata": {}, "outputs": [], "source": [ "word = 'organ'" ] }, { "cell_type": "code", "execution_count": null, "id": "ef938d95", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "7cd1196c", "metadata": {}, "source": [ "The performance of this implementation is ... ." ] }, { "cell_type": "markdown", "id": "115c3219", "metadata": {}, "source": [ "# 3. Look for the anagrams of the words in the input list\n", "\n", "* How does the Big-O performance of your one-word implementation scale to an input list of M words?\n", "* Is there a way to pre-process the dictionary words in a data structure that is better suited for this task?" ] }, { "cell_type": "code", "execution_count": null, "id": "03ce3e28", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a2fc5ec4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }