The dangers and joys of automatic parallelization (as in numpy's linear algebra routines) and of clusters/schedulers (but also relevant on your own laptop).
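A quick way to see this automatic parallelization, assuming `numpy` and the `threadpoolctl` package are installed (a sketch, not part of the repo's notebook): numpy delegates matrix operations to a native BLAS/OpenMP library, and `threadpool_info()` reveals which one and how many threads it opens by default.

```python
import numpy as np
from threadpoolctl import threadpool_info

a = np.random.rand(1000, 1000)
_ = a @ a  # a single matmul already fans out over the whole BLAS thread pool

# Report which native libraries numpy delegates to and their default thread counts
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], "threads:", pool["num_threads"])
```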
- Go through the notebook to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control (see the sketch after this list)
- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
  - run `overcommit.py` while monitoring with `htop`
  - try the `submit.sh` script and see the problems caused by overcommitting
  - explain the PSI (Pressure Stall Information) fields in `htop`
- Discuss implications for local and cluster workflows
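As a starting point for what to watch in `htop`, here is a minimal sketch (assuming `numpy` and `threadpoolctl`; it is not the repo's `overcommit.py`) that caps the BLAS/OpenMP thread pool, reports CPU affinity and reads the PSI counters straight from `/proc`:

```python
import os
import numpy as np
from threadpoolctl import threadpool_limits

# CPUs this process is currently allowed to run on (Linux-specific call)
print("CPU affinity:", sorted(os.sched_getaffinity(0)))

a = np.random.rand(2000, 2000)

# Restrict every detected BLAS/OpenMP pool to 2 threads inside this block;
# setting OMP_NUM_THREADS=2 before starting Python has a similar effect.
with threadpool_limits(limits=2):
    a @ a

# PSI (Pressure Stall Information): share of time runnable tasks were stalled
# waiting for a CPU. Requires Linux >= 4.20 with PSI enabled.
with open("/proc/pressure/cpu") as f:
    print(f.read().strip())
```

Launching several copies of a script like this at once, as `submit.sh` presumably does with the real code, is a quick way to watch the PSI numbers climb once threads × processes exceed the available cores.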
 
## Hands on
- Let's try to make it more quantitative:
- Write a benchmark in the style of `benchmark_python` (a minimal sketch follows at the end of this section)
- We want to assess the performance of matrix multiplication as a function of:
  - the size of the matrix `N`
  - the number of OpenMP threads `T`, controlled with `threadpoolctl` or via the environment variable `OMP_NUM_THREADS`
  - the number of processes `P`, controlled by the `submit.sh` script or something similar
 - The results will of course depend on the particular architecture of the machine on which you are running
 - Submit your benchmark, together with some plotting routines, as a PR to this repo!
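A minimal sketch of such a benchmark (the sizes, thread counts and output file below are illustrative, and the exact interface of `benchmark_python` is not reproduced here; the process count `P` is left to `submit.sh` or a similar launcher):

```python
import json
import time
import numpy as np
from threadpoolctl import threadpool_limits


def time_matmul(n, threads, repeats=3):
    """Best wall-clock time of an n x n matmul using `threads` BLAS/OpenMP threads."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    with threadpool_limits(limits=threads):
        for _ in range(repeats):
            t0 = time.perf_counter()
            a @ b
            best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    results = []
    for n in (256, 512, 1024, 2048):  # matrix size N
        for t in (1, 2, 4, 8):        # thread count T
            dt = time_matmul(n, t)
            results.append({"N": n, "T": t, "seconds": dt})
            print(f"N={n:5d} T={t:2d} {dt:8.4f} s  ({2 * n**3 / dt / 1e9:6.1f} GFLOP/s)")
    # Keep raw timings on disk so plotting can be done separately (and per process)
    with open("matmul_benchmark.json", "w") as f:
        json.dump(results, f, indent=2)
```

Plotting time (or GFLOP/s) against `N` for each `T`, and repeating the run under different process counts `P`, should make any overcommitting penalty visible.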