2024-heraklion-data/notebooks/.ipynb_checkpoints/numpy_views_and_copies-checkpoint.ipynb
2024-08-27 15:27:53 +03:00

{
"cells": [
{
"cell_type": "markdown",
"id": "20df51b1",
"metadata": {},
"source": [
"# NumPy views and copies\n",
"\n",
"- Operations that only require changing the metadata always do so, and return a **view**\n",
"- Operations that cannot be executed by changing the metadata create a new memory block, and return a **copy**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "4ed67e38",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def print_info(a):\n",
"    \"\"\" Print the content of an array, and its metadata. \"\"\"\n",
"\n",
"    txt = f\"\"\"\n",
"dtype\\t{a.dtype}\n",
"ndim\\t{a.ndim}\n",
"shape\\t{a.shape}\n",
"strides\\t{a.strides}\n",
" \"\"\"\n",
"\n",
"    print(a)\n",
"    print(txt)\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "53bd92f9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0  1  2  3]\n",
" [ 4  5  6  7]\n",
" [ 8  9 10 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n"
]
}
],
"source": [
"x = np.arange(12).reshape(3, 4).copy()\n",
"print_info(x)"
]
},
{
"cell_type": "markdown",
"id": "d2ee43d7",
"metadata": {},
"source": [
"# Views"
]
},
{
"cell_type": "markdown",
"id": "f4838e77",
"metadata": {},
"source": [
"Operations that only require changing the metadata always do so, and return a **view**"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "f1b82845",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1  3]\n",
" [ 9 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(2, 2)\n",
"strides\t(64, 16)\n",
" \n"
]
}
],
"source": [
"y = x[0::2, 1::2]\n",
"print_info(y)"
]
},
{
"cell_type": "markdown",
"id": "3199b45b",
"metadata": {},
"source": [
"A view shares the same memory block as the original array. \n",
"\n",
"CAREFUL: Modifying a view also changes the original array and every other view of that array!"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "28ea1c71",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0  1  2  3  4  5  6  7  8  9 10 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(1, 12)\n",
"strides\t(96, 8)\n",
" \n"
]
}
],
"source": [
"z = x.reshape(1, 12)\n",
"print_info(z)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "46822b5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[101 103]\n",
" [109 111]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(2, 2)\n",
"strides\t(64, 16)\n",
" \n"
]
}
],
"source": [
"y += 100\n",
"print_info(y)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "ad9a7950",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[  0 101   2 103]\n",
" [  4   5   6   7]\n",
" [  8 109  10 111]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n",
"[[  0 101   2 103   4   5   6   7   8 109  10 111]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(1, 12)\n",
"strides\t(96, 8)\n",
" \n"
]
}
],
"source": [
"print_info(x)\n",
"print_info(z)"
]
},
{
"cell_type": "markdown",
"id": "4fc789c1",
"metadata": {},
"source": [
"Functions that take an array as an input should avoid modifying it in place! \n",
"\n",
"Either work on a copy, or state very clearly in the docstring that the input is modified."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "aa25ac4b",
"metadata": {},
"outputs": [],
"source": [
"def robust_log(a, cte=1e-10):\n",
"    \"\"\" Return the log of an array, avoiding trouble when a value is 0.\n",
"\n",
"    Add the tiny constant `cte` to the zero entries of `a` before taking the log.\n",
"    `a` is expected to have non-negative values.\n",
"    \"\"\"\n",
"    a[a == 0] += cte\n",
"    return np.log(a)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "471d9d6b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_48764/1018405258.py:2: RuntimeWarning: divide by zero encountered in log\n",
" np.log(a)\n"
]
},
{
"data": {
"text/plain": [
"array([[-1.2039728 , -4.60517019],\n",
" [ -inf, 0. ]])"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([[0.3, 0.01], [0, 1]])\n",
"np.log(a)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "6c05d356",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0. 1.]\n",
"\n",
"dtype\tfloat64\n",
"ndim\t1\n",
"shape\t(2,)\n",
"strides\t(8,)\n",
" \n"
]
}
],
"source": [
"# This is a view of `a`\n",
"b = a[1, :]\n",
"print_info(b)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "9d96fb61",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ -1.2039728 , -4.60517019],\n",
" [-23.02585093, 0. ]])"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"robust_log(a)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "35d0327d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3.e-01, 1.e-02],\n",
" [1.e-10, 1.e+00]])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "4a2b95c5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1.e-10, 1.e+00])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b"
]
},
{
"cell_type": "markdown",
"id": "fa8cf77a",
"metadata": {},
"source": [
"Better to make a copy!"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "c5359eac",
"metadata": {},
"outputs": [],
"source": [
"def robust_log(a, cte=1e-10):\n",
"    \"\"\" Return the log of an array, avoiding trouble when a value is 0.\n",
"\n",
"    Add the tiny constant `cte` to the zero entries of a copy of `a` before\n",
"    taking the log. `a` is expected to have non-negative values.\n",
"    \"\"\"\n",
"    # Work on a copy so that the caller's array (and its views) stay untouched\n",
"    a = a.copy()\n",
"    a[a == 0] += cte\n",
"    return np.log(a)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "0bf9b2d5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ -1.2039728 , -4.60517019],\n",
" [-23.02585093, 0. ]])"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([[0.3, 0.01], [0, 1]])\n",
"b = a[1, :]\n",
"\n",
"robust_log(a)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "895209ce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.3 , 0.01],\n",
" [0. , 1. ]])"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "18004050",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 1.])"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"b"
]
},
{
"cell_type": "markdown",
"id": "d664b462",
"metadata": {},
"source": [
"# Copies\n",
"\n",
"Operations that cannot be executed by changing the metadata create a new memory block, and return a **copy**"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "8c8f77e1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0  1  2  3]\n",
" [ 4  5  6  7]\n",
" [ 8  9 10 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n"
]
}
],
"source": [
"x = np.arange(12).reshape(3, 4).copy()\n",
"print_info(x)"
]
},
{
"cell_type": "markdown",
"id": "716aec53",
"metadata": {},
"source": [
"Selecting rows, columns, or individual elements of an array by explicitly listing their indices (a.k.a. \"fancy indexing\") is an operation that, in general, cannot be executed by changing the metadata alone.\n",
"\n",
"Therefore, **fancy indexing always returns a copy**."
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "40fb1777",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0 1]\n",
" [4 5]\n",
" [8 9]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 2)\n",
"strides\t(8, 24)\n",
" \n"
]
}
],
"source": [
"# Get the first and second columns\n",
"y = x[:, [0, 1]]\n",
"print_info(y)"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "b8ed81d5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[2000 2001]\n",
" [2004 2005]\n",
" [2008 2009]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 2)\n",
"strides\t(8, 24)\n",
" \n",
"[[ 0  1  2  3]\n",
" [ 4  5  6  7]\n",
" [ 8  9 10 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n"
]
}
],
"source": [
"y += 1000\n",
"print_info(y)\n",
"# the original array is unchanged => not a view!\n",
"print_info(x)"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "6c50e46e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1  0 11]\n",
"\n",
"dtype\tint64\n",
"ndim\t1\n",
"shape\t(3,)\n",
"strides\t(8,)\n",
" \n"
]
}
],
"source": [
"y = x[[0, 0, 2], [1, 0, 3]]\n",
"print_info(y)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "9d65a5c3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1001 1000 1011]\n",
"\n",
"dtype\tint64\n",
"ndim\t1\n",
"shape\t(3,)\n",
"strides\t(8,)\n",
" \n",
"[[ 0  1  2  3]\n",
" [ 4  5  6  7]\n",
" [ 8  9 10 11]]\n",
"\n",
"dtype\tint64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n"
]
}
],
"source": [
"y += 1000\n",
"print_info(y)\n",
"# the original array is unchanged => not a view!\n",
"print_info(x)"
]
},
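{
"cell_type": "markdown",
"id": "5e76ea7a",
"metadata": {},
"source": [
"Any operation that computes new values also returns a new array rather than a view. The exception is in-place operators such as `+=`, which write into the existing memory block."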
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "b8a3d44c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0.   7.1 14.2 21.3]\n",
" [28.4 35.5 42.6 49.7]\n",
" [56.8 63.9 71.  78.1]]\n",
"\n",
"dtype\tfloat64\n",
"ndim\t2\n",
"shape\t(3, 4)\n",
"strides\t(32, 8)\n",
" \n"
]
}
],
"source": [
"y = x * 7.1\n",
"print_info(y)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}