The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)
- Go through the notebook to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control (see the first sketch after the list below)
- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
  - run `overcommit.py` while monitoring with `htop`
  - try the `submit.sh` script
  - see problems with overcommitting (a hypothetical sketch of this kind of oversubscription follows this list)
  - explain the PSI (Pressure Stall Information) fields in `htop` (a small `/proc/pressure` reader is sketched below). Useful readings:
  - discuss implications for local and cluster workflows
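
As a warm-up for the notebook exercises, here is a minimal sketch, assuming `numpy` and `threadpoolctl` are installed and a Linux machine (for `os.sched_getaffinity`): it prints the CPU affinity of the process, lists the BLAS/OpenMP thread pools numpy is using, and times the same matrix multiplication under different thread-pool limits.

```python
# Minimal sketch: inspect and limit the BLAS/OpenMP thread pool behind numpy's
# linear algebra, and check which CPUs this process is allowed to run on.
import os
import time

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

N = 4000  # matrix size; adjust to your machine
A = np.random.rand(N, N)
B = np.random.rand(N, N)

# CPU affinity of the current process (Linux-only call)
print("CPU affinity:", sorted(os.sched_getaffinity(0)))

# Thread pools (OpenBLAS / MKL / OpenMP) that numpy currently links against
for pool in threadpool_info():
    print(pool["internal_api"], "->", pool["num_threads"], "threads")

# Time the same matmul under different thread-pool limits
for n_threads in (1, 2, 4):
    with threadpool_limits(limits=n_threads):
        t0 = time.perf_counter()
        A @ B
        print(f"{n_threads:2d} threads: {time.perf_counter() - t0:.2f} s")
```

Setting `OMP_NUM_THREADS` (or `OPENBLAS_NUM_THREADS` / `MKL_NUM_THREADS`) in the environment before starting Python has the same effect as the `threadpool_limits` context, but for the whole run.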
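
For the overcommitting step, do use the repo's `overcommit.py`; what follows is only a hypothetical illustration (not the repo's script) of the kind of oversubscription it is about: more worker processes than cores, each of whose BLAS is itself free to use every core. Run it while watching `htop` and the load average should climb far above the number of cores.

```python
# Hypothetical oversubscription demo (NOT the repo's overcommit.py):
# launch 2x as many processes as cores, each running a multithreaded matmul.
import multiprocessing as mp
import os
import time

import numpy as np


def timed_matmul(n):
    # Each worker lets its BLAS use all cores, so the workers fight each other.
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    t0 = time.perf_counter()
    A @ B
    return time.perf_counter() - t0


if __name__ == "__main__":
    n_cores = len(os.sched_getaffinity(0))  # Linux-only
    n_procs = 2 * n_cores                   # deliberate overcommit
    with mp.Pool(n_procs) as pool:
        times = pool.map(timed_matmul, [3000] * n_procs)
    print(f"{n_procs} processes on {n_cores} cores: "
          f"mean matmul time {sum(times) / len(times):.2f} s")
```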
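
The PSI percentages that `htop` can display come from `/proc/pressure/` (Linux >= 4.20); a small helper like this one (the function name is ours) logs the same numbers from Python, which is handy while an overcommitted run is going on:

```python
# Read the kernel's Pressure Stall Information counters directly.
from pathlib import Path


def read_psi(resource="cpu"):
    """Parse /proc/pressure/<resource> into {'some': {...}, 'full': {...}}."""
    out = {}
    for line in Path(f"/proc/pressure/{resource}").read_text().splitlines():
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out


if __name__ == "__main__":
    for res in ("cpu", "memory", "io"):
        # 'some' = fraction of time at least one task was stalled on <res>
        print(res, read_psi(res)["some"])
```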
Hands on
- Let's try to make it more quantitative (a possible starting point is sketched after this list):
  - Write a benchmark in the style of benchmark_python
  - We want to assess the performance of matrix multiplication as a function of:
    - the size of the matrix `N`
    - the number of OpenMP threads `T`, controlled with `threadpoolctl` or by the environment variable `OMP_NUM_THREADS`
    - the number of processes `P`, controlled by the `submit.sh` script or something similar
- The results will of course depend on the particular architecture of the machine on which you are running
- Submit your benchmark, together with some plotting routines, as a PR to this repo!
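
A possible starting point for the benchmark is sketched below; the CSV layout, the `--processes` flag and the chosen sweep values are only suggestions. `N` and `T` are swept inside the script with `threadpoolctl`, while `P` is varied from outside by launching several copies of the script (e.g. with `submit.sh`) and is recorded in the output so the plots can distinguish the runs.

```python
# Sketch of a matmul benchmark over matrix size N and thread count T.
import argparse
import csv
import time

import numpy as np
from threadpoolctl import threadpool_limits


def time_matmul(n, n_threads, repeats=3):
    # Best of a few repeats, to reduce noise from other processes.
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    times = []
    with threadpool_limits(limits=n_threads):
        for _ in range(repeats):
            t0 = time.perf_counter()
            A @ B
            times.append(time.perf_counter() - t0)
    return min(times)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--processes", type=int, default=1,
                        help="P: how many copies of this script run in parallel")
    parser.add_argument("--output", default="benchmark.csv")
    args = parser.parse_args()

    with open(args.output, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["N", "T", "P", "best_time_s", "gflops"])
        for n in (500, 1000, 2000, 4000):
            for n_threads in (1, 2, 4, 8):
                t = time_matmul(n, n_threads)
                gflops = 2 * n**3 / t / 1e9  # ~2 N^3 flops per dense matmul
                writer.writerow([n, n_threads, args.processes, t, gflops])
                print(f"N={n:5d} T={n_threads:2d} P={args.processes}: "
                      f"{t:.3f} s, {gflops:6.1f} GFLOP/s")
```

Reporting GFLOP/s (using the roughly 2 N^3 floating-point operations of a dense matmul) makes timings for different `N` directly comparable.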