2024-heraklion-data/exercises/numpy_broadcasting_extra/broadcasting.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "282817dd",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T20:08:23.900532Z",
     "start_time": "2023-06-27T20:08:22.963157Z"
    },
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def print_info(a):\n",
    "    \"\"\" Print the content of an array, and its metadata. \"\"\"\n",
    "    \n",
    "    txt = f\"\"\"\n",
    "dtype\\t{a.dtype}\n",
    "ndim\\t{a.ndim}\n",
    "shape\\t{a.shape}\n",
    "strides\\t{a.strides}\n",
    "    \"\"\"\n",
    "\n",
    "    print(a)\n",
    "    print(txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cd0f8cf",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<font size=9> Mind-on exercises </font>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acba732f",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise 1: warm up\n",
    "\n",
    "```What is the expected output shape for each operation?```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a41d0f74",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.881059Z",
     "start_time": "2023-06-27T19:58:57.830Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = np.arange(5)\n",
    "b = 5\n",
    "\n",
    "np.shape(a-b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f82a2fb",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.884966Z",
     "start_time": "2023-06-27T19:58:57.833Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = np.ones((7, 1))\n",
    "b = np.arange(7)\n",
    "np.shape(a*b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "808095ad",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.888119Z",
     "start_time": "2023-06-27T19:58:57.836Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = np.random.randint(0, 50, (2, 3, 3))\n",
    "b = np.random.randint(0, 10, (3, 1))\n",
    "\n",
    "np.shape(a-b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9a12a90",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.891462Z",
     "start_time": "2023-06-27T19:58:57.839Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "a = np.arange(100).reshape(10, 10)\n",
    "b = np.arange(1, 10)\n",
    "\n",
    "np.shape(a+b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69632f95",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise 2\n",
    "\n",
    "```\n",
    "1. Create a random 2D array of shape (5, 3)\n",
    "2. Calculate the maximum value of each row\n",
    "3. Divide each row by its maximum\n",
    "```\n",
    "\n",
    "Remember to use broadcasting: NO FOR LOOPS!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54e2a53e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.894433Z",
     "start_time": "2023-06-27T19:58:57.843Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "## Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9facc0f",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise 3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e8156d0",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Task: Find the closest **cluster** to the **observation**.\n",
    "\n",
    "Again, use broadcasting: DO NOT iterate cluster by cluster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2969994e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.899204Z",
     "start_time": "2023-06-27T19:58:57.847Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "observation = np.array([30.0, 99.0])  # Observation\n",
    "\n",
    "# Clusters\n",
    "clusters = np.array([[102.0, 203.0],\n",
    "                     [132.0, 193.0],\n",
    "                     [45.0, 155.0],\n",
    "                     [57.0, 173.0]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f13352ff",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Let's plot this data.\n",
    "\n",
    "In the plot below, **+** is the observation and the dots are the cluster coordinates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9f6b5cf",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.906715Z",
     "start_time": "2023-06-27T19:58:57.850Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.scatter(clusters[:, 0], clusters[:, 1])  # Scatter plot of clusters\n",
    "for n, x in enumerate(clusters):\n",
    "    print(f'cluster {n}')\n",
    "    plt.annotate(f'cluster{n}', (x[0], x[1]))  # Label each cluster\n",
    "plt.plot(observation[0], observation[1], '+');  # Plot observation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f9b84e2",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The closest cluster, as seen in the plot, is **2**. Your task is to write a function that computes this."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8aea6781",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-26T19:25:08.202848Z",
     "start_time": "2023-06-26T19:25:08.194923Z"
    }
   },
   "source": [
    "\n",
    "**Hint:** Compute the distance between the observation and each row of `clusters`. The observation belongs to the cluster whose row has the minimum distance.\n",
    "\n",
    "distance = $\\sqrt {\\left( {x_1 - x_2 } \\right)^2 + \\left( {y_1 - y_2 } \\right)^2 }$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea8a7240",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-06-27T19:58:58.916610Z",
     "start_time": "2023-06-27T19:58:57.854Z"
    },
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "## Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beaee243",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "## Sources + Resources\n",
    "\n",
    "ASPP 2016 - Stéfan van der Walt - https://github.com/ASPP/2016_numpy\n",
    "\n",
    "Basic Numpy: http://scipy-lectures.org/intro/numpy/index.html\n",
    "\n",
    "Advanced Numpy: http://scipy-lectures.org/advanced/advanced_numpy/index.html\n",
    "\n",
    "Numpy chapter in \"Python Data Science Handbook\" https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  },
  "rise": {
   "scroll": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
ASPP 2024 material 2024-08-27 14:27:53 +02:00