# The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)

- Go through the [notebook](../parallel.ipynb) to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control (a minimal thread-control sketch follows at the end of this section)
- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
  - run [`overcommit.py`](overcommit.py) while monitoring with `htop`
  - try the [`submit.sh`](submit.sh) script
  - see the problems caused by overcommitting (the sketch below illustrates the basic idea)
  - explain the PSI (Pressure Stall Information) fields in `htop` (a snippet for reading them directly is included below). Useful readings:
    - https://docs.kernel.org/accounting/psi.html
    - https://facebookmicrosites.github.io/psi/docs/overview
  - discuss the implications for local and cluster workflows

# Hands on

Let's try to make it more quantitative:

- Write a benchmark in the style of [benchmark_python](../benchmark_python/bench.py); a starter sketch is given at the end of this section
- We want to assess the performance of matrix multiplication as a function of:
  - the size of the matrix `N`
  - the number of OpenMP threads `T`, controlled with `threadpoolctl` or via the environment variable `OMP_NUM_THREADS`
  - the number of processes `P`, controlled by the [`submit.sh`](submit.sh) script or something similar
- The results will of course depend on the architecture of the machine you are running on
- Submit your benchmark, together with some plotting routines, as a PR to this repo!
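
To get you started, here are a few sketches of the pieces referenced above. First, thread pool control: assuming your `numpy` is linked against a multi-threaded BLAS (e.g. OpenBLAS or MKL), `threadpoolctl` can both inspect the pools and cap the number of threads a single matrix multiplication uses:

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Inspect which BLAS numpy is linked against and its current thread count.
for pool in threadpool_info():
    print(pool["internal_api"], "->", pool["num_threads"], "threads")

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# Cap all detected thread pools at 2 threads for this block only;
# setting OMP_NUM_THREADS=2 before starting Python has a similar effect.
with threadpool_limits(limits=2):
    c = a @ b
```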
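The real `overcommit.py` lives in this directory; the hypothetical sketch below (which may differ from the actual script) captures the core idea: launch more worker processes than the machine has cores, each of which also spins up its own BLAS thread pool, so the total number of runnable threads far exceeds the hardware and the scheduler starts thrashing. In `htop` this shows up as a load average well above the core count and rising PSI values.

```python
import multiprocessing as mp
import os

import numpy as np

def matmul_worker(n):
    # Each worker triggers a threaded BLAS matmul, so the total number
    # of runnable threads is roughly (workers) * (threads per pool).
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return float((a @ b).sum())

if __name__ == "__main__":
    n_workers = 2 * os.cpu_count()  # deliberately overcommit the CPUs
    with mp.Pool(n_workers) as pool:
        pool.map(matmul_worker, [3000] * n_workers)
```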
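The PSI numbers `htop` displays come straight from `/proc/pressure/`. Per the kernel docs linked above, each file reports the share of wall time during which at least one task (`some`) or every non-idle task (`full`) was stalled on that resource, averaged over 10/60/300-second windows:

```python
# Print the raw PSI files behind htop's pressure meters.
# A line looks like: some avg10=1.23 avg60=0.87 avg300=0.45 total=123456
# ("total" is cumulative stalled time in microseconds).
for resource in ("cpu", "memory", "io"):
    with open(f"/proc/pressure/{resource}") as f:
        print(f"{resource}:")
        print(f.read().rstrip())
```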
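Finally, a minimal starting point for the benchmark itself. This is a sketch, not a reference solution: the sweep values, repeat count and output format are placeholders to adapt. `T` is swept in-process with `threadpoolctl`, while `P` comes from launching several copies of the script, e.g. via `submit.sh`:

```python
import time

import numpy as np
from threadpoolctl import threadpool_limits

def time_matmul(n, threads, repeats=5):
    """Best-of-`repeats` wall time of an n x n matmul with `threads` BLAS threads."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    times = []
    with threadpool_limits(limits=threads):
        a @ b  # warm-up, excluded from timing
        for _ in range(repeats):
            t0 = time.perf_counter()
            a @ b
            times.append(time.perf_counter() - t0)
    return min(times)

if __name__ == "__main__":
    for n in (256, 512, 1024, 2048):
        for t in (1, 2, 4, 8):
            dt = time_matmul(n, t)
            # A dense n x n matmul costs ~2*n^3 floating-point operations.
            print(f"N={n:5d} T={t:2d} time={dt:.4f}s gflops={2 * n**3 / dt / 1e9:.1f}")
```

Taking the minimum over repeats is the usual choice for compute-bound kernels, since it filters out interference from other processes; reporting GFLOP/s makes runs at different `N` directly comparable.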