{ "cells": [ { "cell_type": "markdown", "id": "81bfa588", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Numpy arrays" ] }, { "cell_type": "code", "execution_count": 1, "id": "87f078ef", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "id": "4670f195", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Numpy arrays in memory (representation)\n", "**reminder**" ] }, { "cell_type": "code", "execution_count": 4, "id": "f006625e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 2]\n", " [3 4 5]\n", " [6 7 8]]\n" ] } ], "source": [ "X = np.arange(0,9).reshape(3,3)\n", "print(X)" ] }, { "cell_type": "markdown", "id": "0dfd22f3", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![memory_lists1](images/memory_layout_array3.png)" ] }, { "cell_type": "code", "execution_count": 5, "id": "08bf5b5e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# flatten\n", "X.ravel()" ] }, { "cell_type": "code", "execution_count": 8, "id": "45e1988e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 3, 6],\n", " [1, 4, 7],\n", " [2, 5, 8]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# transpose\n", "X.T" ] }, { "cell_type": "code", "execution_count": 9, "id": "0aab72b3", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 2],\n", " [6, 8]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slice\n", "X[::2, ::2]" ] }, { "cell_type": "markdown", "id": "d9518a25", "metadata": { "slideshow": { 
"slide_type": "slide" } }, "source": [ "![numpy_meta](images/numpy_metadata.png)" ] }, { "cell_type": "markdown", "id": "7d938ab8", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![numpy_magic](images/numpy_views.png)" ] }, { "cell_type": "code", "execution_count": 12, "id": "5c8ad61e", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "cecb3364", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Views and Copies: an important distinction!\n", "\n", "\n", "**View**\n", "\n", "- accesses the array's data without duplicating the data buffer\n", "- **regular indexing** and **slicing** give views\n", "- *in-place* operations can be done on views, and they modify the original array\n", "\n", "\n", "**Copy**\n", "- a new array created by duplicating the data buffer as well as the array metadata\n", "- **fancy indexing** always gives copies\n", "- a copy can be forced with the method **.copy()**\n", "\n", "How to tell them apart? Check the ```base``` attribute." ] }, { "cell_type": "code", "execution_count": 11, "id": "a169f158", "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def is_view(a, x):\n", "    # True if the base of a is x itself, i.e. a is a view onto the data owned by x\n", "    return a.base is x" ] }, { "cell_type": "code", "execution_count": 12, "id": "23f95dca", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a = [1 2 3 4 5 6]\n" ] } ], "source": [ "a = np.arange(1,7)\n", "print('a = ',a)" ] }, { "cell_type": "code", "execution_count": 13, "id": "c3150638", "metadata": { "lines_to_next_cell": 2, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a_slice = [3 4 5]\n", "The base of a_slice is [1 2 3 4 5 6]\n", "Is a_slice a view of a? 
True\n" ] } ], "source": [ "# create slice of a and print its base\n", "a_slice = a[2:5]\n", "\n", "print('a_slice = ', a_slice)\n", "print('The base of a_slice is ', a_slice.base)\n", "\n", "print('Is a_slice a view of a?', is_view(a_slice, a))\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "19d2ae59", "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a_copy = [[1 2 3]\n", " [4 5 6]]\n", "the base of a_copy None\n", "Is a_copy a view of a? False\n" ] } ], "source": [ "# create a copy of a and print its base\n", "\n", "a_copy = np.reshape(a, (2,3)).copy()\n", "\n", "print('a_copy = ', a_copy)\n", "print('the base of a_copy ', a_copy.base)\n", "print('Is a_copy a view of a?', is_view(a_copy, a))\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "c2e2e7ab", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]]\n", "True\n", "True\n", "False\n" ] } ], "source": [ "# reshape returns a view whenever it can, so a_2_3 and b are both views, not copies\n", "a_2_3 = np.reshape(a, (2,3))\n", "\n", "b = np.reshape(a_2_3, (2,3))\n", "print(b)\n", "print(is_view(b, a))\n", "print(is_view(a_2_3, a))\n", "# False: the base of a view of a view is the array that owns the data (a), not a_2_3\n", "print(is_view(b, a_2_3))" 
] }, { "cell_type": "markdown", "id": "6446c6a7", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Because a copy is a separate array in memory, modifying it will *not* change the original array" ] }, { "cell_type": "code", "execution_count": 17, "id": "728a2740", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a [1 2 3 4 5 6]\n", "a_copy [[ 1 2 3]\n", " [ 4 666 6]]\n" ] } ], "source": [ "a = np.arange(1, 7)\n", "\n", "# create a copy\n", "a_copy = np.reshape(a, (2,3)).copy()\n", "\n", "a_copy[1,1] = 666\n", "\n", "print('a ', a)\n", "print('a_copy ', a_copy)" ] }, { "cell_type": "markdown", "id": "3978394b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now change an element in both a copy and a view, and print the original array\n" ] }, { "cell_type": "code", "execution_count": 18, "id": "4763cc05", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a [ 1 2 3 4 101 6]\n", "a_view: [[ 1 2 3]\n", " [ 4 101 6]]\n", "a_view strides: (24, 8)\n", "a_copy [[ 1. 2. 3. ]\n", " [ 4. 666.44 6. 
]]\n", "a_copy strides: (24, 8)\n", "a_copy base: None\n" ] } ], "source": [ "a = np.arange(1, 7)\n", "\n", "# astype returns a copy (here with dtype float64); reshape alone returns a view\n", "a_copy = np.reshape(a, (2,3)).astype('float64')\n", "a_view = np.reshape(a, (2,3))\n", "\n", "a_copy[1,1] = 666.44\n", "a_view[1,1] = 101.6555  # stored as 101: the original array keeps its int dtype, so the value is truncated\n", "\n", "print('a ', a)\n", "print('a_view: ', a_view)\n", "print('a_view strides: ', a_view.strides)\n", "print('a_copy ', a_copy)\n", "print('a_copy strides: ', a_copy.strides)\n", "print('a_copy base: ', a_copy.base)" ] }, { "cell_type": "markdown", "id": "adaf1f45", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The same operation on a *view*, however, will carry the change over to the original array" ] }, { "cell_type": "markdown", "id": "94fc8724", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Take-away**: you **do** need to know whether you are using a **view** or a **copy**, particularly when you are operating on the array **in-place**" ] }, { "cell_type": "markdown", "id": "512588d1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 1.2.1 Strides - why does some indexing give copies and other indexing views?\n", "\n", "- how does numpy arrange data in memory? 
- When you create an array, numpy allocates an amount of memory that depends on the data type you choose" ] }, { "cell_type": "code", "execution_count": 20, "id": "f94ffd04", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 2]\n", " [3 4 5]\n", " [6 7 8]]\n" ] } ], "source": [ "a = np.arange(9).reshape(3,3)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 21, "id": "2f3e051a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.itemsize" ] }, { "cell_type": "markdown", "id": "b141572c", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In this example the array has 8 bytes allocated per item.\n", "\n", "Memory is *linear*, which means that the 2-D array is laid out in memory something like this (blue boxes):\n", "\n", "![linear_mem](images/memory_linear.png)\n", "\n", "However, the user 'sees' the array in 2D (green boxes).\n", "\n", "How does numpy accomplish this? By defining ```strides```.\n" ] }, { "cell_type": "code", "execution_count": 22, "id": "f8dffd3a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(24, 8)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.strides" ] }, { "cell_type": "markdown", "id": "10883b65", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Strides tell you how many bytes you have to move in memory to advance one step along each dimension.\n", "\n", "![strides](images/strides.png)\n", "\n", "To go from the first item in the first row to the first item in the second row, you need to move 3 * 8 = 24 bytes. To move to the next column within a row, you only need to move 8 bytes." 
] }, { "cell_type": "markdown", "id": "7be8064c", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Views** are created when you read the same data with different strides. Slicing and regular indexing allow that, because the byte step between the selected elements is constant.\n", "\n", "**Fancy indexing** does not allow that, because the data you are asking for **cannot** in general be obtained by just changing the strides. Thus, numpy needs to make a **copy** of it in memory." ] }, { "cell_type": "markdown", "id": "66a2711b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now, you can change the strides of an array at will." ] }, { "cell_type": "code", "execution_count": 23, "id": "541bb33c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[0, 3, 6],\n", " [1, 4, 7],\n", " [2, 5, 8]])" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.strides=(8,24)\n", "a" ] }, { "cell_type": "markdown", "id": "05a3e933", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " But be careful! Changing the strides to something nonsensical will also **give you nonsense**, and numpy will not complain. " ] }, { "cell_type": "code", "execution_count": 24, "id": "8c20fea5", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a.strides=(8, 9)" ] }, { "cell_type": "markdown", "id": "e7bccdc1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Exercises on indexing, views/copies\n" ] }, { "cell_type": "markdown", "id": "694ed250", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise 1: indexing, dimensionality of the output, view or copy?\n", "\n", "Look at the following code examples and, before running them, try to answer for each case: \\\n", "(1) what is the dimensionality of the result? \\\n", "(2) is the result a view or a copy?" 
] }, { "cell_type": "code", "execution_count": 25, "id": "cc625fdd", "metadata": {}, "outputs": [], "source": [ "x = np.arange(0,12).reshape(3,4)" ] }, { "cell_type": "code", "execution_count": 26, "id": "02319316", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[::2, :] #dim, view or copy\n", "\n", "#is_view(x[::2, :], x.base)" ] }, { "cell_type": "code", "execution_count": 27, "id": "c3336d1e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "array([4, 5, 6, 7])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1, :]\n", "\n", "#is_view(x[1, :], x.base)" ] }, { "cell_type": "code", "execution_count": 45, "id": "1c8449f8", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1]\n", "\n", "#is_view(x[1], x.base)" ] }, { "cell_type": "code", "execution_count": 59, "id": "b635ea0e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2 3]\n", " [ 4 5 6 7]\n", " [ 8 9 10 11]]\n" ] }, { "data": { "text/plain": [ "array([5, 9, 2])" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(x)\n", "x[[1, 2, 0], [1, 1, 2]]\n", "\n", "#is_view(x[[1, 2, 0], [1, 1, 2]], x.base)" ] }, { "cell_type": "markdown", "id": "ab7c3609", "metadata": {}, "source": [ "### Fancy indexing\n", "\n", "![fancy](images/fancy_indexing_lookup.png)" ] }, { "cell_type": "code", "execution_count": 61, "id": "1371c65a", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, 
"execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[[0, 2], :]\n", "\n", "#is_view(x[[0, 2], :], x.base)" ] }, { "cell_type": "code", "execution_count": 62, "id": "9db68cf6", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.reshape((6, 2))\n", "\n", "#is_view(x.reshape((6, 2)), x.base)" ] }, { "cell_type": "code", "execution_count": null, "id": "9374de16", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x.ravel()\n", "\n", "#is_view(x.ravel(), x.base)" ] }, { "cell_type": "code", "execution_count": null, "id": "4dce1b10", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x.T.ravel()\n", "\n", "#is_view(x.T.ravel(), x.base)" ] }, { "cell_type": "code", "execution_count": 68, "id": "fd4bd129", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(32, 8)" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[(x % 2) == 1]\n", "\n", "#is_view(x[(x % 2) == 1], x.base)" ] }, { "cell_type": "code", "execution_count": 82, "id": "baf5b337", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = x + 2\n", "\n", "##### Is this because \n", "\n", "#is_view(y, x)" ] }, { "cell_type": "code", "execution_count": 83, "id": "a5e79e50", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = np.sort(x, axis=1)\n", "\n", "#is_view(y, x)" ] }, { "cell_type": "markdown", "id": "08b9c593", "metadata": { "slideshow": { "slide_type": "slide" 
} }, "source": [ "# Sources + other resources\n", "\n", "\n", "ASPP Bilbao 2022 - Lisa Schwetlick & Aina Frau-Pascual\n", "https://github.com/ASPP/2022-bilbao-advanced-numpy\n", "\n", "\n", "Scipy lecture notes, 2022.1\n", "- Basic Numpy: http://scipy-lectures.org/intro/numpy/index.html\n", "- Advanced Numpy: http://scipy-lectures.org/advanced/advanced_numpy/index.html\n", "\n", "Numpy chapter in \"Python Data Science Handbook\"\n", "https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html\n", "\n", "\n", "\n", "Further resources on strides: \n", "- https://scipy-lectures.org/advanced/advanced_numpy/#indexing-scheme-strides\n", "- https://ajcr.net/stride-guide-part-1/\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" }, "rise": { "scroll": true } }, "nbformat": 4, "nbformat_minor": 5 }