2024-heraklion-data/notebooks/020_numpy/001_numpy_views_and_copies.ipynb


2024-08-27 14:27:53 +02:00
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "86b10564",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def print_info(a):\n",
"    \"\"\" Print the content of an array, and its metadata. \"\"\"\n",
"\n",
"    txt = f\"\"\"\n",
"dtype\\t{a.dtype}\n",
"ndim\\t{a.ndim}\n",
"shape\\t{a.shape}\n",
"strides\\t{a.strides}\n",
"    \"\"\"\n",
"\n",
"    print(a)\n",
"    print(txt)"
]
},
{
"cell_type": "markdown",
"id": "a5bbf650",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# NumPy views and copies\n",
"\n",
"- Operations that only require changing the metadata always do so, and return a **view**\n",
"- Operations that cannot be executed by changing the metadata create a new memory block, and return a **copy**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53bd92f9",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"x = np.arange(12).reshape(3, 4).copy()\n",
"print_info(x)"
]
},
{
"cell_type": "markdown",
"id": "d2ee43d7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Views"
]
},
{
"cell_type": "markdown",
"id": "f4838e77",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Operations that only require changing the metadata always do so, and return a **view**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1b82845",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# slice\n",
"y = x[0::2, 1::2]\n",
"print_info(y)"
]
},
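{
"cell_type": "markdown",
"id": "added-slice-strides-md",
"metadata": {},
"source": [
"A minimal sketch of why slicing needs no copy, assuming a fresh array `demo` (a hypothetical name, chosen so that `x` and `y` stay untouched): a step of 2 along an axis simply doubles the stride of that axis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-slice-strides-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# `demo` and `demo_view` are illustrative names, not part of the lecture code\n",
"demo = np.arange(12).reshape(3, 4).copy()\n",
"demo_view = demo[0::2, 1::2]\n",
"\n",
"# A step of 2 along each axis doubles the corresponding stride\n",
"row_stride, col_stride = demo.strides\n",
"print(demo_view.strides == (2 * row_stride, 2 * col_stride))  # True\n",
"\n",
"# No data was copied: both arrays read from the same memory block\n",
"print(np.shares_memory(demo, demo_view))  # True"
]
},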
{
"cell_type": "markdown",
"id": "3199b45b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A view shares the same memory block as the original array. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28ea1c71",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"z = x.reshape(1, 12)\n",
"print_info(z)"
]
},
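{
"cell_type": "markdown",
"id": "added-reshape-check-md",
"metadata": {},
"source": [
"Note that `reshape` only returns a view when the new shape can be described with strides over the existing memory block. A small sketch, using a hypothetical array `demo`, suggests how to check this with `np.shares_memory`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-reshape-check-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# `demo` is an illustrative name, not part of the lecture code\n",
"demo = np.arange(12).reshape(3, 4).copy()\n",
"\n",
"# Contiguous data: the new shape only needs new metadata, so we get a view\n",
"flat_view = demo.reshape(1, 12)\n",
"print(np.shares_memory(demo, flat_view))  # True\n",
"\n",
"# The transpose is not contiguous in memory: flattening it requires a copy\n",
"flat_copy = demo.T.reshape(12)\n",
"print(np.shares_memory(demo, flat_copy))  # False"
]
},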
{
"cell_type": "markdown",
"id": "d88fbf5d",
"metadata": {},
"source": [
"CAREFUL: Modifying the view **changes the original array** and all other views of that array as well!"
]
},
{
"cell_type": "markdown",
"id": "7f35dcc3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"##### In-place operations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46822b5a",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"y += 100\n",
"print_info(y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad9a7950",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"print_info(x)\n",
"print_info(z)"
]
},
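{
"cell_type": "markdown",
"id": "added-inplace-vs-rebind-md",
"metadata": {},
"source": [
"In contrast to `y += 100`, an expression of the form `y = y + 100` does not operate in place: it allocates a new array and rebinds the name. A short sketch with the hypothetical arrays `demo` and `demo_view`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-inplace-vs-rebind-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# `demo` and `demo_view` are illustrative names, not part of the lecture code\n",
"demo = np.zeros(4)\n",
"demo_view = demo[:2]\n",
"\n",
"# In-place: writes into the memory block shared with `demo`\n",
"demo_view += 1\n",
"print(demo)  # [1. 1. 0. 0.]\n",
"\n",
"# Not in-place: the result is a new array, `demo` is left alone\n",
"demo_view = demo_view + 1\n",
"print(demo)  # [1. 1. 0. 0.]\n",
"print(np.shares_memory(demo, demo_view))  # False"
]
},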
{
"cell_type": "markdown",
"id": "4fc789c1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Functions that take an array as an input should **avoid modifying it in place!**\n",
"\n",
"Always make a copy, or state very clearly in the docstring that the input is modified."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa25ac4b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def robust_log(x, cte=1e-10):\n",
"    \"\"\" Returns the log of an array, deals with values that are 0.\n",
"\n",
"    `x` is expected to have non-negative values.\n",
"    \"\"\"\n",
"    x[x == 0] += cte\n",
"    return np.log(x)\n",
"\n",
"# The docstring does not make it clear that `x` is modified in place"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "471d9d6b",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"a = np.array([[0.3, 0.01], [0, 1]])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c05d356",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# This is a view of `a`\n",
"b = a[1, :]\n",
"print_info(b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d96fb61",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# what is the output?\n",
"robust_log(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35d0327d",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# what is the output?\n",
"b # what about b??"
]
},
{
"cell_type": "markdown",
"id": "fa8cf77a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Better to make a copy!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5359eac",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def robust_log(x, cte=1e-10):\n",
"    \"\"\" Returns the log of an array, deals with values that are 0.\n",
"\n",
"    `x` is expected to have non-negative values.\n",
"    \"\"\"\n",
"    x = x.copy()\n",
"    x[x == 0] += cte\n",
"    return np.log(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bf9b2d5",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a = np.array([[0.3, 0.01], [0, 1]])\n",
"b = a[1, :]\n",
"\n",
"robust_log(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "895209ce",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"a # what is the output? \n",
"# b"
]
},
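{
"cell_type": "markdown",
"id": "added-robust-log-where-md",
"metadata": {},
"source": [
"An alternative that avoids mutating the input altogether is to build the corrected array with `np.where`. This is only a sketch of one possible design; the names `robust_log_where` and `demo` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-robust-log-where-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def robust_log_where(x, cte=1e-10):\n",
"    \"\"\" Returns the log of an array, replacing zeros with `cte` first.\n",
"\n",
"    `x` is expected to have non-negative values. `x` is not modified.\n",
"    \"\"\"\n",
"    # np.where allocates a new array, so the caller's data is untouched\n",
"    return np.log(np.where(x == 0, cte, x))\n",
"\n",
"demo = np.array([[0.3, 0.01], [0, 1]])\n",
"robust_log_where(demo)\n",
"\n",
"# The zero in the input is still there\n",
"print(demo[1, 0])  # 0.0"
]
},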
{
"cell_type": "markdown",
"id": "d664b462",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Copies\n",
"\n",
"- Operations that cannot be executed by changing the metadata create a new memory block, and return a **copy**\n",
"\n",
"- How can you find out whether an operation returned a view or a copy?"
]
},
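{
"cell_type": "markdown",
"id": "added-view-or-copy-md",
"metadata": {},
"source": [
"One way to answer the question, sketched with hypothetical names: `np.shares_memory` tells you whether two arrays overlap in memory, and the `.base` attribute of a view points to the array that owns the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-view-or-copy-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# `owner`, `sliced` and `picked` are illustrative names\n",
"owner = np.arange(12).reshape(3, 4).copy()\n",
"sliced = owner[:, 1]\n",
"picked = owner[[0, 2], :]\n",
"\n",
"# A slice is a view: it shares memory, and its base is the owning array\n",
"print(np.shares_memory(owner, sliced))  # True\n",
"print(sliced.base is owner)  # True\n",
"\n",
"# Fancy indexing gives a copy: no memory is shared with the original\n",
"print(np.shares_memory(owner, picked))  # False"
]
},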
{
"cell_type": "markdown",
"id": "716aec53",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Choosing rows, columns, or individual elements of an array by explicitly giving their indices (a.k.a. \"fancy indexing\") is an operation that, in general, cannot be executed by changing the metadata alone.\n",
"\n",
"Therefore, **fancy indexing always returns a copy**."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fbcf3100",
"metadata": {},
"outputs": [],
"source": [
"x = np.arange(12).reshape(3, 4).copy()\n",
"print_info(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c50e46e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#print(x)\n",
"z = x[[0, 0, 2], [1, 0, 3]]\n",
"# Can you guess what z is equal to?\n",
"\n",
"print_info(z)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d65a5c3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"z += 1000\n",
"print_info(z)\n",
"\n",
"# the original array is unchanged => not a view!\n",
"print_info(x)"
]
},
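{
"cell_type": "markdown",
"id": "added-fancy-offsets-md",
"metadata": {},
"source": [
"Why is a view impossible here? A sketch, assuming a hypothetical array `demo` with the same layout as `x`: the requested elements do not sit at regular steps in memory, so a single stride cannot describe them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-fancy-offsets-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# `demo`, `rows` and `cols` are illustrative names\n",
"demo = np.arange(12).reshape(3, 4).copy()\n",
"rows = np.array([0, 0, 2])\n",
"cols = np.array([1, 0, 3])\n",
"\n",
"# Position of each requested element in the memory block, counted in items\n",
"offsets = (rows * demo.strides[0] + cols * demo.strides[1]) // demo.itemsize\n",
"print(offsets)  # positions 1, 0 and 11\n",
"\n",
"# The steps between consecutive elements are not constant\n",
"print(np.diff(offsets))  # steps -1 and 11"
]
},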
{
"cell_type": "markdown",
"id": "25aa99a4",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Views** are created when the requested data can be read from the same memory block using different strides. Slicing and regular indexing allow that, because NumPy knows how many bytes to step to reach each element.\n",
"\n",
"**Fancy indexing** does not allow that, because the data you are asking for **cannot** be obtained by just changing the strides. Thus, NumPy needs to create a **copy** of it in memory."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}