# The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)
- Go through the [notebook](../parallel.ipynb) to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control (a short sketch of these knobs follows below)
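
As a taste of what the notebook covers, here is a minimal sketch that inspects CPU affinity and the loaded BLAS/OpenMP thread pools, then times a matrix product under different thread limits via `threadpoolctl` (the matrix size and thread counts are arbitrary choices):

```python
import os
import time

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Which CPUs is this process allowed to run on? (Linux-only call)
print("CPU affinity:", os.sched_getaffinity(0))

# Which BLAS/OpenMP runtimes did numpy load, and how many threads does each use?
for pool in threadpool_info():
    print(pool["internal_api"], "->", pool["num_threads"], "threads")

a = np.random.rand(2000, 2000)  # big enough that BLAS threading matters

for n_threads in (1, 2, 4):
    with threadpool_limits(limits=n_threads):
        t0 = time.perf_counter()
        a @ a
        print(f"{n_threads} thread(s): {time.perf_counter() - t0:.3f} s")
```

Setting `OMP_NUM_THREADS` (or `OPENBLAS_NUM_THREADS`/`MKL_NUM_THREADS`) in the environment before starting Python caps the pools in roughly the same way as the `threadpool_limits` context manager.
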
- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
  - run [`overcommit.py`](overcommit.py) while monitoring with `htop` (a rough self-contained sketch of this experiment appears after this list)
  - try the [`submit.sh`](submit.sh) script
  - see the problems caused by overcommitting
  - explain the PSI (Pressure Stall Information) fields in `htop`. Useful readings:
    - https://docs.kernel.org/accounting/psi.html
    - https://facebookmicrosites.github.io/psi/docs/overview
- Discuss implications for local and cluster workflows
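
To connect overcommitting with PSI, here is a rough, self-contained sketch in the same spirit as [`overcommit.py`](overcommit.py) (whose actual contents may differ): it deliberately launches several matmul workers per core, each of which spins up its own full-size BLAS thread pool, while the parent process polls the kernel's CPU pressure file. The worker multiplier and matrix size are arbitrary.

```python
import multiprocessing as mp
import os
import time

import numpy as np


def worker(n: int) -> None:
    """Repeatedly multiply two n x n matrices using the default (full) BLAS thread pool."""
    a = np.random.rand(n, n)
    for _ in range(10):
        a = a @ a / n  # keeps values bounded while keeping BLAS busy


if __name__ == "__main__":
    n_cores = len(os.sched_getaffinity(0))
    n_procs = 4 * n_cores  # deliberate overcommit: each process also spawns ~n_cores BLAS threads

    procs = [mp.Process(target=worker, args=(1500,)) for _ in range(n_procs)]
    for p in procs:
        p.start()

    # Poll the CPU pressure file while the workers fight over the cores.
    # "some" = share of time at least one runnable task was stalled waiting for a CPU.
    while any(p.is_alive() for p in procs):
        with open("/proc/pressure/cpu") as f:
            print(f.readline().strip())
        time.sleep(1)

    for p in procs:
        p.join()
```

With `P` processes each running `T` BLAS threads you get roughly `P * T` runnable threads competing for the available cores, and the `avg10` figure printed above should climb well above zero; these are the same numbers `htop` can display in its PSI meters.
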
# Hands on
- Let's try to make it more quantitative:
  - Write a benchmark in the style of [benchmark_python](../benchmark_python/bench.py) (see the sketch after this list for a possible starting point)
  - We want to assess the performance of matrix multiplication as a function of:
    - the size of the matrix `N`
    - the number of OpenMP threads `T`, controlled with `threadpoolctl` or by the environment variable `OMP_NUM_THREADS`
    - the number of processes `P`, controlled by the [`submit.sh`](submit.sh) script or something similar
  - The results will of course depend on the particular architecture of the machine on which you are running
- Submit your benchmark, together with some plotting routines, as a PR to this repo!
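
As a possible starting point (not the reference solution), here is a sketch of such a benchmark that sweeps `N` and `T` and writes a CSV. The size and thread grids, the repetition count, and the output layout are arbitrary choices; sweeping `P` is left to [`submit.sh`](submit.sh) or your scheduler, since that has to happen across processes.

```python
import csv
import time

import numpy as np
from threadpoolctl import threadpool_limits

SIZES = (256, 512, 1024, 2048)  # matrix sizes N (arbitrary grid)
THREADS = (1, 2, 4, 8)          # BLAS/OpenMP thread counts T (arbitrary grid)
REPEATS = 5                     # timing repetitions per (N, T) point


def time_matmul(n: int) -> float:
    """Return the best-of-REPEATS wall time for one n x n matrix product."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(REPEATS):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    with open("matmul_bench.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["N", "T", "seconds", "gflops"])
        for t in THREADS:
            with threadpool_limits(limits=t):
                for n in SIZES:
                    dt = time_matmul(n)
                    gflops = 2 * n**3 / dt / 1e9  # a matmul costs ~2*N^3 flops
                    writer.writerow([n, t, f"{dt:.6f}", f"{gflops:.2f}"])
                    print(f"N={n:5d} T={t:2d}: {dt:.4f} s, {gflops:7.2f} GFLOP/s")
```

Plotting GFLOP/s against `T` for each `N` makes it easy to spot where adding threads stops paying off; repeating the run with several copies of the script at once then shows how process-level parallelism (`P`) interacts with thread-level parallelism.
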