2024-heraklion-data/exercises/numpy_broadcasting_extra/broadcasting.ipynb

{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "282817dd",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T20:08:23.900532Z",
"start_time": "2023-06-27T20:08:22.963157Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def print_info(a):\n",
"    \"\"\"Print the contents of an array and its metadata.\"\"\"\n",
"\n",
"    txt = f\"\"\"\n",
"dtype\\t{a.dtype}\n",
"ndim\\t{a.ndim}\n",
"shape\\t{a.shape}\n",
"strides\\t{a.strides}\n",
"    \"\"\"\n",
"\n",
"    print(a)\n",
"    print(txt)"
]
},
{
"cell_type": "markdown",
"id": "6cd0f8cf",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<font size=9> Mind-on exercises </font>"
]
},
{
"cell_type": "markdown",
"id": "acba732f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercise 1: warm up\n",
"\n",
"```What is the expected output shape for each operation?```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a41d0f74",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.881059Z",
"start_time": "2023-06-27T19:58:57.830Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.arange(5)\n",
"b = 5\n",
"\n",
"np.shape(a-b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f82a2fb",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.884966Z",
"start_time": "2023-06-27T19:58:57.833Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.ones((7, 1))\n",
"b = np.arange(7)\n",
"np.shape(a*b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "808095ad",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.888119Z",
"start_time": "2023-06-27T19:58:57.836Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.random.randint(0, 50, (2, 3, 3))\n",
"b = np.random.randint(0, 10, (3, 1))\n",
"\n",
"np.shape(a-b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a12a90",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.891462Z",
"start_time": "2023-06-27T19:58:57.839Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.arange(100).reshape(10, 10)\n",
"b = np.arange(1, 10)\n",
"\n",
"np.shape(a+b)"
]
},
{
"cell_type": "markdown",
"id": "69632f95",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercise 2\n",
"\n",
"```\n",
"1. Create a random 2D array of shape (5, 3)\n",
"2. Calculate the maximum value of each row\n",
"3. Divide each row by its maximum\n",
"```\n",
"\n",
"Remember to use broadcasting: NO FOR LOOPS!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54e2a53e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.894433Z",
"start_time": "2023-06-27T19:58:57.843Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"## Your code here"
]
},
{
"cell_type": "markdown",
"id": "b9facc0f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercise 3"
]
},
{
"cell_type": "markdown",
"id": "7e8156d0",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Task: Find the closest **cluster** to the **observation**. \n",
"\n",
"Again, use broadcasting: DO NOT iterate cluster by cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2969994e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.899204Z",
"start_time": "2023-06-27T19:58:57.847Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"observation = np.array([30.0, 99.0])  # Observation\n",
"\n",
"# Clusters\n",
"clusters = np.array([[102.0, 203.0],\n",
"                     [132.0, 193.0],\n",
"                     [45.0, 155.0],\n",
"                     [57.0, 173.0]])"
]
},
{
"cell_type": "markdown",
"id": "f13352ff",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let's plot this data.\n",
"\n",
"In the plot below, **+** is the observation and the dots are the cluster coordinates."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9f6b5cf",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.906715Z",
"start_time": "2023-06-27T19:58:57.850Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.scatter(clusters[:, 0], clusters[:, 1])  # Scatter plot of clusters\n",
"for n, x in enumerate(clusters):\n",
"    print('cluster %d' % n)\n",
"    plt.annotate('cluster%d' % n, (x[0], x[1]))  # Label each cluster\n",
"plt.plot(observation[0], observation[1], '+');  # Plot observation"
]
},
{
"cell_type": "markdown",
"id": "4f9b84e2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As the plot shows, the closest cluster is **2**. Your task is to write a function that calculates this."
]
},
{
"cell_type": "markdown",
"id": "8aea6781",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-26T19:25:08.202848Z",
"start_time": "2023-06-26T19:25:08.194923Z"
}
},
"source": [
"\n",
"**Hint:** Find the distance between the observation and each row of `clusters`. The cluster to which the observation belongs is the row with the minimum distance.\n",
"\n",
"distance = $\\sqrt {\\left( {x_1 - x_2 } \\right)^2 + \\left( {y_1 - y_2 } \\right)^2 }$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea8a7240",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-27T19:58:58.916610Z",
"start_time": "2023-06-27T19:58:57.854Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"## Your code here"
]
},
{
"cell_type": "markdown",
"id": "beaee243",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Sources + Resources\n",
"\n",
"ASPP 2016 - Stéfan van der Walt - https://github.com/ASPP/2016_numpy\n",
"\n",
"Basic NumPy: http://scipy-lectures.org/intro/numpy/index.html\n",
"\n",
"Advanced NumPy: http://scipy-lectures.org/advanced/advanced_numpy/index.html\n",
"\n",
"NumPy chapter in \"Python Data Science Handbook\": https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"rise": {
"scroll": true
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}