Exercise A: multithreading with NumPy

Objective: investigate the speed-up of NumPy code when it runs with multiple threads.

HINT: Use htop in your terminal to track what the CPUs are doing.

First

The script heavy_computation.py performs some matrix calculations with numpy.

You can change the number of threads that numpy uses for the calculation using the OMP_NUM_THREADS environment variable like this:

OMP_NUM_THREADS=7 python heavy_computation.py

The script will also measure the time to run the calculation and will save the timing results into the timings/ folder as a .txt file.

TASK: Execute the script heavy_computation.py several times, varying the number of threads. You will plot the resulting computation times in the second part below.
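If you prefer not to type each command by hand, the sweep can be automated with a small driver. This is a minimal sketch, not part of the exercise files: the helper names `env_for_threads` and `run_sweep` and the example thread counts are hypothetical, and it assumes heavy_computation.py sits in the current directory and writes its own timing file as described above.

```python
import os
import subprocess
import sys


def env_for_threads(n_threads, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set to n_threads."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    return env


def run_sweep(thread_counts, script="heavy_computation.py"):
    """Run the script once per thread count, each in a fresh Python process."""
    for n in thread_counts:
        print(f"Running {script} with OMP_NUM_THREADS={n}")
        # A fresh process ensures the variable is set before numpy is imported.
        subprocess.run([sys.executable, script], env=env_for_threads(n), check=True)


# Example call (thread counts are placeholders; adapt them to your machine):
# run_sweep([1, 2, 4, 8])
```

Launching a new process per run matters here: the variable is set in the environment before the script starts, which is the same thing the command-line form above does.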

QUESTION

What happens if OMP_NUM_THREADS is not set? How many threads are there? Why?
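To explore this question, it can help to print what a Python process actually sees. The following is a minimal sketch using only the standard library; `describe_thread_setting` is a hypothetical helper, and the logical CPU count is shown only as a point of comparison, not as a claim about how many threads NumPy will use.

```python
import os


def describe_thread_setting(environ=None):
    """Report OMP_NUM_THREADS (None when unset) and the logical CPU count."""
    environ = os.environ if environ is None else environ
    return {
        "OMP_NUM_THREADS": environ.get("OMP_NUM_THREADS"),
        "logical_cpus": os.cpu_count(),
    }


print(describe_thread_setting())
```

If the third-party package threadpoolctl happens to be installed, `threadpoolctl.threadpool_info()` additionally lists the thread pools that have been loaded by libraries such as NumPy.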

Second

In plot.py, we have provided code that loads all of the timing data in timings/.

TASK: Add code to plot the execution duration vs. the number of threads.

TASK: Open a Pull Request with your plotting code and post your plots in the conversation. Don't upload binaries to the Git remote!

OPTIONAL TASK: Add code to calculate and plot the speed-up relative to single-threaded execution. Include your code and plot in the PR.
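Speed-up is conventionally defined as the single-threaded duration divided by the duration with n threads. The sketch below assumes the timings are available as a dict mapping thread count to duration in seconds; the data structure that plot.py actually produces may differ, and `compute_speedup` and the example numbers are made up for illustration.

```python
def compute_speedup(timings):
    """Map each thread count n to T(1) / T(n).

    `timings` is assumed to be {n_threads: duration_in_seconds}. An entry
    with a None key (assumed to mean OMP_NUM_THREADS was unset) is skipped.
    """
    baseline = timings[1]  # single-threaded duration
    known = sorted((n, d) for n, d in timings.items() if n is not None)
    return {n: baseline / duration for n, duration in known}


# Illustrative numbers only, not measured results.
example = {1: 8.0, 2: 4.0, 4: 2.5, None: 2.4}
print(compute_speedup(example))
```

A speed-up equal to the number of threads would be ideal scaling; comparing your curve against that line makes the deviation easy to see.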

QUESTIONS

What does the result tell us about the optimum number of threads? Why?

Does the script take the same time to run on your machine as on your colleagues' machines? Why?

Optional tasks

Investigate the runtime variability. Systematically run multiple instances with the same number of threads by modifying heavy_computation.py.
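One way to quantify the variability is to summarize repeated runs by their mean and standard deviation. This is a sketch under the assumption that you have collected the durations of several runs with the same thread count into a list; `summarize_runs` is a hypothetical helper, not something heavy_computation.py provides.

```python
import statistics


def summarize_runs(durations):
    """Return (mean, sample standard deviation) of repeated run times."""
    mean = statistics.mean(durations)
    # stdev needs at least two samples; report 0.0 for a single run.
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread
```

Plotting the mean with error bars given by the spread shows whether differences between thread counts are larger than the run-to-run noise.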

How is the runtime affected when the problem becomes bigger? Is the optimum number of threads always the same?

How is the runtime affected when the memory is almost full? You can fill it up by creating a separate, otherwise unused, large numpy array.

How about running on battery vs. having your laptop plugged in?