add some rules of thumb

Tiziano Zito 2024-08-21 11:11:58 +02:00
parent 3db360b13b
commit dea903409e


@@ -59,6 +59,17 @@ Setup:
## Back to the Python benchmark (third try)
- can we explain what is happening now? Yes, more or less ;-)
- the default memory layout is also called row-major == `C_CONTIGUOUS`
- rule of thumb for multi-dimensional numpy arrays:
- the right-most index should be the inner-most loop in a series of nested loops over the dimensions of a multi-dimensional array
- the previous rule can be remembered as *the right-most index changes the fastest* in a series of nested loops
- the logically contiguous data, for example the data points of a single time series, should be stored along the right-most dimension:
```python
x = np.zeros((n_series, length_of_one_series)) # ➔ good!
y = np.zeros((length_of_one_series, n_series)) # ➔ bad!
```
- … unless of course you plan to mostly loop *across* time series :)
- watch out when migrating code from MATLAB® or to `pandas.DataFrame` ➔ they store data in memory using the opposite convention, the column-major order!!!
- quick fix for the [puzzle](puzzle.ipynb): try and add `order='F'` in the "bad" snippet and see that it "fixes" the bug ➔ why?
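
One way to check the rules of thumb above is to look at the `flags` and `strides` of the arrays. The following is only a minimal sketch: the sizes are made up for illustration, and the array `z` (the "bad" shape created with `order='F'`) is an assumption about what the quick fix looks like, not code taken from the puzzle notebook.

```python
import numpy as np

# hypothetical sizes, chosen only for illustration
n_series, length_of_one_series = 4, 1000

x = np.zeros((n_series, length_of_one_series))             # "good": row-major, one series per row
y = np.zeros((length_of_one_series, n_series))             # "bad": row-major, one series per column
z = np.zeros((length_of_one_series, n_series), order='F')  # "bad" shape, but column-major

# strides = number of bytes to jump to move one step along each axis
print(x.flags['C_CONTIGUOUS'], x.strides)
print(y.flags['C_CONTIGUOUS'], y.strides)
print(z.flags['F_CONTIGUOUS'], z.strides)

# is a single time series stored in one contiguous chunk of memory?
print(x[0, :].flags['C_CONTIGUOUS'])  # series along the right-most axis of a C array
print(y[:, 0].flags['C_CONTIGUOUS'])  # series along the left-most axis of a C array
print(z[:, 0].flags['C_CONTIGUOUS'])  # series along the left-most axis of an F array
```

With `order='F'` the left-most index is the one that changes the fastest in memory, so the elements of one column sit next to each other, just as the elements of one row do in the default layout.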
Notes on the [Python benchmark](benchmark_python/):