The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)
-
Go through the notebook to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control
-
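Outside the notebook, a minimal sketch of the same checks (assuming `numpy` and `threadpoolctl` are installed; the affinity call `os.sched_getaffinity` is Linux-only) might look like this:

```python
import os
import time
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Which native thread pools (BLAS, OpenMP) does this numpy build expose,
# and how many threads is each of them using by default?
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"], pool["filepath"])

# On Linux: how many CPUs is this process actually allowed to run on?
print("CPUs in affinity mask:", len(os.sched_getaffinity(0)))

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# Default: the BLAS library parallelizes the matmul on its own
t0 = time.perf_counter()
a @ b
print(f"auto-parallel: {time.perf_counter() - t0:.3f} s")

# Same product with the thread pool capped to a single thread
with threadpool_limits(limits=1):
    t0 = time.perf_counter()
    a @ b
    print(f"1 thread:      {time.perf_counter() - t0:.3f} s")
```

The single-threaded run can also be obtained without touching the code by exporting `OMP_NUM_THREADS=1` (or `OPENBLAS_NUM_THREADS` / `MKL_NUM_THREADS`, depending on which BLAS your numpy is linked against) before starting Python.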
Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
- run `overcommit.py` while monitoring with `htop`
  - try the `submit.sh` script
  - see the problems caused by overcommitting (a sketch reproducing this is given after this list)
- explain the PSI (Pressure Stall Information) fields in `htop`. Useful readings:
- discuss the implications for local and cluster workflows
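To see what overcommitting does in practice, here is a hypothetical minimal sketch: it only mirrors the kind of oversubscribed workload `overcommit.py` and `submit.sh` are meant to expose, not their actual contents. It launches many worker processes, each of which lets the BLAS thread pool use every core, and then reads the CPU PSI counters directly from `/proc/pressure/cpu` (Linux ≥ 4.20 with PSI enabled):

```python
import multiprocessing as mp
import numpy as np

def worker(n=3000, reps=10):
    # Each worker lets numpy's BLAS spawn a full thread pool, so
    # P workers x T BLAS threads quickly exceeds the number of cores.
    a = np.random.rand(n, n)
    for _ in range(reps):
        a = (a @ a) / n

def cpu_pressure():
    # PSI "some" line: share of wall time in which at least one runnable
    # task was stalled waiting for a CPU (averaged over 10s/60s/300s windows).
    with open("/proc/pressure/cpu") as f:
        return f.read().strip()

if __name__ == "__main__":
    n_procs = 4 * mp.cpu_count()          # deliberately oversubscribe
    procs = [mp.Process(target=worker) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(cpu_pressure())
```

While this runs, `htop` shows a load far above the core count, and the `some avg10` figure (the same quantity behind `htop`'s PSI meters) climbs as tasks queue for CPU time; the cure is to keep `P x T` at or below the number of physical cores.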
Hands on
-
- Let's try to make it more quantitative:
  - Write a benchmark in the style of `benchmark_python` (a minimal sketch is given after this list)
  - We want to assess the performance of matrix multiplication as a function of:
    - the size of the matrix `N`
    - the number of OpenMP threads `T`, controlled with `threadpoolctl` or by the environment variable `OMP_NUM_THREADS`
    - the number of processes `P`, controlled by the `submit.sh` script or something similar
  - The results will of course depend on the particular architecture of the machine on which you are running
- Submit your benchmark, together with some plotting routines, as a PR to this repo!
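A minimal sketch of such a benchmark, assuming `numpy` and `threadpoolctl` are available (the sizes, thread counts, and CSV output format are only placeholders), could look like this:

```python
import time
import numpy as np
from threadpoolctl import threadpool_limits

def time_matmul(n, n_threads, reps=3):
    """Best-of-`reps` wall time of an n x n matrix multiplication
    with the BLAS thread pool capped at `n_threads`."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    with threadpool_limits(limits=n_threads):
        for _ in range(reps):
            t0 = time.perf_counter()
            a @ b
            best = min(best, time.perf_counter() - t0)
    return best

if __name__ == "__main__":
    # CSV on stdout: collect the results from several runs and plot them later
    print("N,T,seconds,gflops")
    for n in (256, 512, 1024, 2048):
        for t in (1, 2, 4, 8):
            dt = time_matmul(n, t)
            gflops = 2 * n**3 / dt / 1e9   # a dense matmul costs ~2*N^3 flops
            print(f"{n},{t},{dt:.4f},{gflops:.1f}")
```

The number of processes `P` can then be scanned by launching several copies of this script at once, for example from `submit.sh` or a simple shell loop, and comparing the per-process timings as the machine becomes oversubscribed.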