2025-plovdiv-comp-arch/exercise.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_series = 32\n",
    "len_one_series = 5*2**20\n",
    "time_series = np.random.rand(n_series, len_one_series)\n",
    "gap = 16*2**10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of one time series: 40 M\n",
      "Size of collection: 1280 M\n",
      "Gap size: 128 K\n",
      "Gapped series size: 2 K\n"
     ]
    }
   ],
   "source": [
    "print(f'Size of one time series: {int(time_series[0].nbytes/2**20)} M')\n",
    "print(f'Size of collection: {int(time_series.nbytes/2**20)} M')\n",
    "print(f'Gap size: {int(gap*8/2**10)} K')\n",
    "print(f'Gapped series size: {int(time_series[0, ::gap].nbytes/2**10)} K')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following function implements an approximation of a power series of every `gap` value in our time series.\n",
    "\n",
    "If we define one time series of length `N` to be:\n",
    "\n",
    "$[x_0, x_1, x_2, ..., x_N]$,\n",
    "\n",
    "then the \"gapped\" series with `gap=g` is:\n",
    "\n",
    "$[x_0, x_g, x_{2g}, ..., x_{N/g}]$,\n",
    "\n",
    "where $N/g$ is the number of gaps.\n",
    "\n",
    "The approximation of the power series up to power `30` for our \"gapped\" series is defined as:\n",
    "\n",
    "$$\\mathbf{P} = \\sum_{p=0}^{30} \\sum_i x_i^{p} = \\sum_i x_i^0 + \\sum_i x_i^1 + \\sum_i x_i^2 + ... + \\sum_i x_i^{30} $$\n",
    "\n",
    "where $i \\in [0, g, 2g, ..., N/g]$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute an approximation of a power series for a collection of gapped timeseries\n",
    "def power(time_series, P, gap):\n",
    "    for row in range(time_series.shape[0]):\n",
    "        for pwr in range(30):\n",
    "            P[row] += (time_series[row, ::gap]**pwr).sum()\n",
    "    return P\n",
    "       "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "P = np.zeros_like(time_series)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.53 s ± 1.3 s per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%timeit power(time_series, P, gap)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "- Can you improve on the above implementation of the `power` function?\n",
    "- Change the following `power_improved` function and see what you can do\n",
    "- **Remember**: you can't change any other cell in this notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:08.461Z",
     "iopub.status.busy": "2024-03-04T10:06:08.459Z",
     "iopub.status.idle": "2024-03-04T10:06:08.466Z",
     "shell.execute_reply": "2024-03-04T10:06:08.468Z"
    }
   },
   "outputs": [],
   "source": [
    "def power_improved(time_series, P, gap):\n",
    "    for row in range(time_series.shape[0]):\n",
    "        for pwr in range(30):\n",
    "            P[row] += (time_series[row, ::gap]**pwr).sum()\n",
    "    return P"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def power_improved(time_series, P, gap):\n",
    "    for pwr in range(30):\n",
    "        for row_idx, row_values in enumerate(time_series):\n",
    "            P[row_idx] += (row_values[::gap]**pwr).sum()\n",
    "    return P"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# verify that they yield the same results\n",
    "P = np.zeros(n_series, dtype='float64')\n",
    "out1 = power(time_series, P, gap)\n",
    "P = np.zeros(n_series, dtype='float64')\n",
    "out2 = power_improved(time_series, P, gap)\n",
    "np.testing.assert_allclose(out1, out2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:14.959Z",
     "iopub.status.busy": "2024-03-04T10:06:14.956Z",
     "iopub.status.idle": "2024-03-04T10:06:17.437Z",
     "shell.execute_reply": "2024-03-04T10:06:17.443Z"
    }
   },
   "outputs": [],
   "source": [
    "P = np.zeros(n_series, dtype='float64')\n",
    "%timeit power(time_series, P, gap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "17.8 ms ± 1.56 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
    "P = np.zeros(n_series, dtype='float64')\n",
    "%timeit power_improved(time_series, P, gap)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Learnings:\n",
    "- just changing the order of the loops: 5.5 s -> 24.2 ms\n",
    "- also enumerating and accessing the rows: 17.8 ms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.6"
  },
  "nteract": {
   "version": "0.28.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}