renaming exercises

Jenni Rinker 2024-08-29 18:23:32 +03:00
parent b9988fdda9
commit 55cccc5f4f
6 changed files with 2 additions and 2 deletions


@@ -0,0 +1,52 @@
# Exercise A: multithreading with NumPy
Objective: investigate speed-up of numpy code with multiple threads.
**HINT**: Use `htop` in your terminal to track what the CPUs are doing.
## First
The script `heavy_computation.py` performs some matrix calculations with numpy.
You can change the number of threads that numpy uses for the calculation
using the `OMP_NUM_THREADS` environment variable like this:
```bash
OMP_NUM_THREADS=7 python heavy_computation.py
```
The script will also measure the time to run the calculation and will save
the timing results into the `timings/` folder as a `.txt` file.
**TASK**: Execute the script `heavy_computation.py`, varying the number of threads (a sketch of one way to automate the sweep follows).
You will plot the resulting calculation times in the second part below.
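For instance, a minimal sketch of how the sweep could be automated from Python (the helper script name and the thread counts are illustrative, not part of the exercise files):
```python
# sweep.py -- hypothetical helper; run it from the exercise folder
import os
import subprocess

for n_threads in [1, 2, 4, 8]:
    # Copy the current environment and override the thread count
    env = {**os.environ, "OMP_NUM_THREADS": str(n_threads)}
    # Each run writes a new timing file into timings/
    subprocess.run(["python", "heavy_computation.py"], env=env, check=True)
```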
**QUESTION**
> What happens if `OMP_NUM_THREADS` is not set? How many threads are there? Why?
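If you want to inspect how many threads numpy's linear-algebra backend actually uses, one option (assuming you have the optional `threadpoolctl` package installed) is:
```python
# Requires: pip install threadpoolctl
from threadpoolctl import threadpool_info

# Prints one entry per thread pool (e.g. OpenBLAS or MKL),
# including its current number of threads
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"])
```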
## Second
In `plot.py`, we have given code that will load all of the timing data in `timings/`.
**TASK**: Add code to plot the execution duration vs. the number of threads.
Open a PR with your plotting code and post your plots in the conversation; don't upload binaries to the Git remote!
**OPTIONAL TASK**: Add code to calculate and plot the speed-up compared
to single-threaded execution. Include your code and plot in the PR. (A sketch of the speed-up calculation follows.)
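If you get stuck, here is one possible starting point, assuming the `threads` and `timings` arrays as loaded in `plot.py` and exactly one timing per thread count:
```python
# Sketch only: speed-up relative to the single-threaded run
order = np.argsort(threads)
threads_sorted = threads[order]
timings_sorted = timings[order]
t_serial = timings_sorted[threads_sorted == 1][0]  # 1-thread reference time
speedup = t_serial / timings_sorted                # >1 means faster than serial
```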
**QUESTIONS**
> What does the result tell us about the optimum number of threads? Why?
> Does it take the same time to run for you as for your colleagues? Why?
## Optional tasks
- Investigate the runtime variability: modify `heavy_computation.py` so that it systematically runs multiple instances with the same number of threads.
- How is the runtime affected when the problem becomes bigger? Is the optimum number of threads always the same?
- How is the runtime affected when the memory is almost full? You can fill it up by creating a separate (unused) large numpy array (see the sketch after this list).
- How about running on battery vs. having your laptop plugged in?
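A minimal sketch of such a memory hog (the array size is illustrative; scale it to your machine's RAM):
```python
import numpy as np

# Roughly 8 GB of float64 values, allocated but never used afterwards;
# np.ones (unlike np.zeros) actually touches every page of memory
ballast = np.ones(1_000_000_000)
```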


@@ -0,0 +1,32 @@
import os
import time
from datetime import datetime

import numpy as np

# Timestamp that will be put in the file name
timestamp = datetime.now().strftime("%H%M%S%f")

# Get the environment variable for threads (None if it is not set)
threads = os.getenv('OMP_NUM_THREADS')

# A relatively large matrix to work on
n = 5_000
x = np.random.random(size=(n, n))

print(f"Executing with OMP_NUM_THREADS={threads} for {n=}")

# Measure the time required for matrix multiplication
start_time = time.time()
y = x @ x  # The heavy computation
stop_time = time.time()
elapsed_time = stop_time - start_time

print(f'Time used for matrix multiplication: {elapsed_time:.2f} s')

# Make sure the timings folder exists
os.makedirs('timings', exist_ok=True)

# IO: Save the timing to a unique txt file
with open(f'timings/{threads}_threads_t{timestamp}.txt', 'w') as file:
    file.write(f'{threads},{elapsed_time:.6f}')


@@ -0,0 +1,23 @@
import os

import matplotlib.pyplot as plt
import numpy as np

# IO: This loads the timings for you
threads, timings = [], []
for file in os.listdir('timings'):
    with open(f'timings/{file}', 'r') as f:
        n, t = f.read().strip().split(',')
    if n == 'None':  # skip runs made without OMP_NUM_THREADS set
        continue
    threads.append(int(n))
    timings.append(float(t))
threads = np.array(threads)
timings = np.array(timings)

print('This is the data I loaded: threads =', threads, ', timings =', timings)

fig, ax = plt.subplots()

# CREATE YOUR PLOT HERE
# Remember to label your axes
# Feel free to make it pretty

plt.savefig('threads_v_timings.png', dpi=300)


@@ -0,0 +1,27 @@
# Exercise B: multiprocessing and map
Objective: introduce `map` and `Pool.map`.
In the `numerical_integration.py` file, we give Python code that calculates
the integral of a function in two different ways: numerically and analytically.
The given functions are `integrate` (numerical integration), `f` (the function
to integrate), and `F` (the analytical integral).
We want to check the precision of the numerical integration as a function of
the number of steps in the domain. To do this, we calculate and print the
relative difference between the analytical and numerical results
for different numbers of steps. (A minimal illustration of the `map` pattern follows.)
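Since the exercise revolves around replacing a serial `map` with `Pool.map`, here is a minimal, self-contained illustration of that pattern (the `square` function is just a toy example):
```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    xs = [1, 2, 3, 4]
    print(list(map(square, xs)))     # serial built-in map: [1, 4, 9, 16]
    with Pool() as pool:
        print(pool.map(square, xs))  # same result, computed by worker processes
```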
**TASKS**:
0. Read `numerical_integration.py` and familiarize yourselves with the code.
1. Update the `main` function so that it calculates the numerical error without
any parallelization. You can use a for loop or `map`.
2. Note the execution time for this serial implementation.
3. Implement the parallel version using `multiprocessing.Pool`.
4. Compare the timing for the parallel version with the serial time.
What speed-up did you get? (A timing sketch follows this list.)
**BONUS TASKS (very optional)**:
5. Implement a parallel version with threads (using `multiprocessing.pool.ThreadPool`).
6. Time this version, and hypothesize about the result.
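A minimal sketch of how one run could be timed (using `time.perf_counter`; the import of `compute_error` assumes you run this next to `numerical_integration.py`):
```python
import time
from multiprocessing import Pool

from numerical_integration import compute_error

if __name__ == '__main__':
    ns = [10_000, 25_000, 50_000, 75_000]
    start = time.perf_counter()
    with Pool() as pool:                      # one worker per CPU by default
        errors = pool.map(compute_error, ns)  # parallel version of map
    print(f'Parallel map took {time.perf_counter() - start:.3f} s')
```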


@@ -0,0 +1,38 @@
"""Exercise 2b: multiprocessing
"""
def integrate(f, a, b, n):
"Perform numerical integration of f in range [a, b], with n steps"
s = []
for i in range(n):
dx = (b - a) / n
x = a + (i + 0.5) * dx
y = f(x)
s = s + [y * dx]
return sum(s)
def f(x):
"A polynomial that we'll integrate"
return x ** 4 - 3 * x
def F(x):
"The analatic integral of f. (F' = f)"
return 1 / 5 * x ** 5 - 3 / 2 * x ** 2
def compute_error(n):
"Calculate the difference between the numerical and analytical integration results"
a = -1.0
b = +2.0
F_analytical = F(b) - F(a)
F_numerical = integrate(f, a, b, n)
return abs((F_numerical - F_analytical) / F_analytical)
def main():
ns = [10_000, 25_000, 50_000, 75_000]
errors = ... # TODO: write a for loop, serial map, and parallel map here
for n, e in zip(ns, errors):
print(f'{n} {e:.8%}')
if __name__ == '__main__':
main()


@@ -0,0 +1,30 @@
import sys

from numerical_integration import compute_error


def main(arg):
    ns = [10_000, 25_000, 50_000, 75_000]
    # Select the implementation; match/case requires Python 3.10+
    match arg:
        case 'for':  # Serial: plain for loop
            errors = []
            for n in ns:
                errors.append(compute_error(n))
        case 'lc':  # Serial: list comprehension
            errors = [compute_error(n) for n in ns]
        case 'map':  # Serial: built-in map
            errors = list(map(compute_error, ns))
        case 'mp':  # Parallel: a pool of worker processes
            from multiprocessing import Pool as ProcessPool
            with ProcessPool() as pool:
                errors = pool.map(compute_error, ns)
        case 'mt':  # Parallel: a pool of threads in one process
            from multiprocessing.pool import ThreadPool
            with ThreadPool(10) as pool:
                errors = pool.map(compute_error, ns)
    for n, e in zip(ns, errors):
        print(f'{n} {e:.8%}')


if __name__ == '__main__':
    # Use the first command-line argument if given, defaulting to 'for'
    arg = (sys.argv[1:] + ['for'])[0]
    main(arg)