add some rules of thumb

2024-08-21 11:11:58 +02:00 · 2024-08-21 11:11:58 +02:00 · dea903409e
commit dea903409e
parent 3db360b13b
1 changed file with 11 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -59,6 +59,17 @@ Setup:

 ## Back to the Python benchmark (third try)
  - can we explain what is happening now? Yes, more or less ;-)
+  - the default memory layout of numpy arrays is row-major (also called C order), which numpy reports as `C_CONTIGUOUS`
+  - rule of thumb for multi-dimensional numpy arrays:
+    - when nesting loops over the dimensions of a multi-dimensional array, the loop over the right-most index should be the inner-most one
+    - the previous rule can be remembered as *the right-most index changes the fastest* in nested loops
+    - logically contiguous data, for example the data points of a single time series, should be stored along the right-most dimension:
+        ```python
+          x = np.zeros((n_series, length_of_one_series)) # ➔ good!
+          y = np.zeros((length_of_one_series, n_series)) # ➔ bad!
+        ```
+    - … unless, of course, you mostly plan to loop *across* time series :)
+    - watch out when porting code from MATLAB® or moving data into a `pandas.DataFrame` ➔ they store data in memory using the opposite convention, column-major order!!!
  - quick fix for the [puzzle](puzzle.ipynb): try adding `order='F'` to the "bad" snippet and see that it "fixes" the bug ➔ why?

 Notes on the [Python benchmark](benchmark_python/):
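The row-major (`C_CONTIGUOUS`) default from the rules of thumb in this commit can be checked directly with `ndarray.flags` and `ndarray.strides`. This is a minimal sketch that assumes `float64` elements (8 bytes each) and an arbitrary 3×4 shape; the second array uses `order='F'` to get the column-major layout (the MATLAB® convention) for comparison.

```python
import numpy as np

# A small 2-D array of float64 (8 bytes per element), created with numpy's default order='C'
a = np.zeros((3, 4), dtype=np.float64)

print(a.flags['C_CONTIGUOUS'])  # True: row-major is the default
print(a.flags['F_CONTIGUOUS'])  # False
# strides = number of bytes to skip to reach the next element along each axis:
# along the right-most axis we move by one element (8 bytes),
# along the left-most axis we skip a whole row (4 * 8 = 32 bytes)
print(a.strides)  # (32, 8)

# Same shape, column-major (Fortran/MATLAB) order: the strides are flipped
b = np.zeros((3, 4), dtype=np.float64, order='F')
print(b.flags['F_CONTIGUOUS'])  # True
print(b.strides)  # (8, 24)
```

The smallest stride tells you which index "changes the fastest" in memory: the right-most one for `a`, the left-most one for `b`.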
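The *right-most index changes the fastest* rule of thumb can be made visible without any timing. In a C-ordered array built with `np.arange(...).reshape(...)`, every value equals its position in the memory buffer, so the sequence of values read by a pair of nested loops shows how those loops walk through memory. A minimal sketch with an arbitrary 3×4 shape; it only illustrates the access pattern, not the speed difference measured in the benchmark.

```python
import numpy as np

n_rows, n_cols = 3, 4
# In a C-ordered array built with arange, each value equals its offset in the memory buffer
a = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

# Good: the right-most index j is the inner-most loop -> memory is read sequentially
good_order = [int(a[i, j]) for i in range(n_rows) for j in range(n_cols)]

# Bad: the left-most index i is the inner-most loop -> every step jumps one whole row ahead
bad_order = [int(a[i, j]) for j in range(n_cols) for i in range(n_rows)]

print(good_order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(bad_order)   # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```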
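The good/bad time-series layouts from this commit can be compared by inspecting the slice that holds one series. The sizes below are hypothetical, chosen only for illustration. The last array has the same shape as the "bad" `y` but is created with `order='F'` (column-major), which makes each column contiguous again. This is one plausible illustration of why `order='F'` can help; the actual "bad" snippet of the [puzzle](puzzle.ipynb) is not reproduced here, so treat the connection as an assumption.

```python
import numpy as np

n_series, length_of_one_series = 5, 1000  # hypothetical example sizes

# Good: one time series per row -> each series is one contiguous block of memory
x = np.zeros((n_series, length_of_one_series))
print(x[0].flags['C_CONTIGUOUS'])  # True
print(x[0].strides)                # (8,): consecutive samples are 8 bytes apart

# Bad (in C order): one time series per column -> consecutive samples are a whole row apart
y = np.zeros((length_of_one_series, n_series))
print(y[:, 0].flags['C_CONTIGUOUS'])  # False
print(y[:, 0].strides)                # (40,): n_series * 8 bytes between consecutive samples

# Same shape as y, but column-major (order='F'): each column is contiguous again
z = np.zeros((length_of_one_series, n_series), order='F')
print(z[:, 0].flags['C_CONTIGUOUS'])  # True
print(z[:, 0].strides)                # (8,)
```

With the made-up `n_series = 5`, a scan over one series of `y` reads one value every 40 bytes instead of every 8, so it makes poorer use of each chunk of memory the CPU fetches.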