Compare commits

...

4 commits

46 changed files with 20451 additions and 0 deletions

93
README.md Normal file

@ -0,0 +1,93 @@
# What every scientist should know about computer architecture
## Introduction
- [Puzzle](puzzle.ipynb)
- Question: why does swapping the dimensions in a for-loop cause a huge slowdown?
- Let students play around with the notebook and try to find the "bug"
- A more thorough [benchmark](benchmark_python/)
## A digression in CPU architecture and the memory hierarchy
- Go to [A Primer in CPU architecture](architecture/)
- Measure size and timings for the memory hierarchy on my machine with a low level [C benchmark](benchmark_low_level/)
## Analog programming
- [Two exercises to activate the body and the mind](analog_programming.md)
## Back to the Python benchmark (second try)
- can we explain what is happening?
- it must have to do with the good (or bad) use of cache properties
- but how are numpy arrays laid out in memory?
## Anatomy of a numpy array
- [memory layout of numpy arrays](numpy/)
## Back to the Python benchmark (third try)
- can we explain what is happening now? Yes, more or less ;-)
- quick fix for the [puzzle](puzzle.ipynb): try and add `order='F'` in the "bad" snippet and see that it "fixes" the bug ➔ why?
- the default memory layout is called "C-contiguous" or "row-major":
```python
np.zeros((2,2)).flags.c_contiguous == True
np.zeros((2,2)).flags.f_contiguous == False
```
- note that for one-dimensional arrays it makes no difference:
```python
np.zeros(2).flags.c_contiguous == True
np.zeros(2).flags.f_contiguous == True
```
- rule of thumb for multi-dimensional numpy arrays:
- the right-most index should be the inner-most loop in a series of nested loops over the dimensions of a multi-dimensional array
- the previous rule can be remembered as *the right-most index changes the fastest* in a series of nested loops
- the logically contiguous data, for example the data points of a single time series, should be stored along the right-most dimension:
```python
x = np.zeros((n_series, length_of_one_series)) # ➔ good!
y = np.zeros((length_of_one_series, n_series)) # ➔ bad!
```
- … unless of course you plan to mostly loop *across* time series :)
- watch out when migrating code from MATLAB®: it stores data in memory using the opposite convention, the column-major order!
- **DANGER**: watch out when working with [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html):
➔ the data are stored in memory using different conventions depending on how the `DataFrame` was initialized! Be sure to
check the `DataFrame.values.flags` attribute!
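The rule of thumb above can be checked on the `strides` attribute of an array, which gives the number of bytes to skip in memory to move by one step along each dimension. This is a minimal sketch, assuming `float64` data (8 bytes per item) and a hypothetical small shape `(3, 4)` chosen only for illustration:

```python
import numpy as np

# hypothetical small array: 3 time series with 4 samples each
c_arr = np.zeros((3, 4))             # default layout: C-contiguous (row-major)
f_arr = np.zeros((3, 4), order='F')  # Fortran-contiguous (column-major)

# strides: bytes to skip to move by one step along each dimension
c_strides = c_arr.strides  # right-most index moves by one float64 (8 bytes)
f_strides = f_arr.strides  # left-most index moves by one float64 (8 bytes)

# a transposed view swaps the strides without copying any data
t_view = c_arr.T

print(c_strides, f_strides, t_view.flags.f_contiguous)
```

The dimension with the smallest stride is the one whose elements are adjacent in memory, so that is the dimension the inner-most loop should run over.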
## A final exercise to put it all together
- fork this repo to your account and clone your fork on the laptop
- create a branch `ex` and switch to it
- work on the [exercise](exercise.ipynb)
- push your solution to your fork and create a Pull Request to this repo
## Notes on the benchmarks
- while running the benchmarks attached to one core on my laptop, the core was running under a constant load of 100% (almost completely user-time) and at a fixed frequency of 3.8 GHz, where the theoretical max would be 5.2 GHz
➔ does the CPU avoid "starving" by scaling its speed down to match the memory throughput? Or am I misinterpreting this? A problem that at first sight should be perfectly memory-bound becomes CPU-bound, or actually exactly balanced? From the [Lenovo Press documentation on UEFI settings for Intel Xeon processors](https://lenovopress.lenovo.com/lp1836-tuning-uefi-settings-4th-gen-intel-xeon-scalable-processor):
> **Energy Efficient Turbo**
>
> When `Energy Efficient Turbo` is enabled, the CPUs optimal turbo
> frequency will be tuned dynamically based on CPU utilization. The actual
> turbo frequency the CPU is set to is proportionally adjusted based on the
> duration of the turbo request. Memory usage of the OS is also monitored.
> If the OS is using memory heavily and the CPU core performance is limited
> by the available memory resources, the turbo frequency will be reduced
> until more memory load dissipates, and more memory resources become
> available. The power/performance bias setting also influences energy
> efficient turbo. `Energy Efficient Turbo` is best used when attempting to
> maximize power consumption over performance.
## Concluding remarks
- how is all of this relevant for the users of a computing cluster?
- Never trust benchmarks! See for example [Producing Wrong Data Without Doing Anything Obviously Wrong!](https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf)
## Additional material if there's time left
- [Excerpts of parallel Python](parallel)
- how does memory *allocation* to processes work at the OS level?
- virtual memory
- swap
- optimistic over-committing allocation policies
- the oom-killer watchdog

Binary file not shown.


@ -0,0 +1,67 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 12.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 51448) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
<!ENTITY ns_svg "http://www.w3.org/2000/svg">
<!ENTITY ns_xlink "http://www.w3.org/1999/xlink">
<!ENTITY st0 "letter-spacing:-3px;">
<!ENTITY st1 "fill:none;">
<!ENTITY st2 "fill:none;stroke:#000000;stroke-width:3px;">
<!ENTITY st3 "fill:none;stroke:#000000;stroke-width:6px;">
<!ENTITY st4 "font-size:36px;">
<!ENTITY st5 "font-size:48px;">
<!ENTITY st6 "font-family:Times,'Times New Roman',serif; font-weight:normal; font-style:italic;">
]>
<svg version="1.1" id="Layer_2" xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" width="648" height="531" viewBox="0 0 648 531"
style="overflow:visible;enable-background:new 0 0 648 531;" xml:space="preserve">
<rect style="&st1;" width="648" height="531"/>
<g>
<g>
<line style="&st3;" x1="441" y1="297" x2="441" y2="369"/>
<line style="&st2;" x1="441" y1="324" x2="486" y2="297"/>
<g>
<line style="&st2;" x1="441" y1="342" x2="486" y2="369"/>
<polygon points="466.9,366.817 486,369 475.086,353.175 "/>
</g>
<circle style="&st2;" cx="459" cy="333" r="45"/>
</g>
<g>
<line style="&st3;" x1="288" y1="234" x2="216" y2="234"/>
<line style="&st2;" x1="261" y1="234" x2="288" y2="279"/>
<g>
<line style="&st2;" x1="243" y1="234" x2="216" y2="279"/>
<polygon points="218.183,259.9 216,279 231.825,268.086 "/>
</g>
<g>
<line style="&st2;" x1="261" y1="234" x2="234" y2="279"/>
<polygon points="236.183,259.9 234,279 249.825,268.086 "/>
</g>
<circle style="&st2;" cx="252" cy="252" r="45"/>
</g>
<polyline style="&st2;" points="288,279 360,279 360,333 441,333 "/>
<polyline style="&st2;" points="234,279 180,369 81,369 "/>
<polyline style="&st2;" points="216,279 189,324 81,324 "/>
<line style="&st2;" x1="81" y1="468" x2="594" y2="468"/>
<line style="&st2;" x1="486" y1="495" x2="540" y2="495"/>
<line style="&st2;" x1="495" y1="504" x2="531" y2="504"/>
<line style="&st2;" x1="504" y1="513" x2="522" y2="513"/>
<line style="&st2;" x1="513" y1="495" x2="513" y2="468"/>
<line style="&st2;" x1="486" y1="369" x2="486" y2="468"/>
<line style="&st2;" x1="252" y1="234" x2="252" y2="171"/>
<polyline style="&st2;" points="252,171 266,164.25 238,150.75 266,137.25 238,123.75 266,110.25 238,96.75 252,90 "/>
<polyline style="&st2;" points="486,171 500,164.25 472,150.75 500,137.25 472,123.75 500,110.25 472,96.75 486,90 "/>
<line style="&st2;" x1="486" y1="297" x2="486" y2="171"/>
<line style="&st2;" x1="252" y1="90" x2="252" y2="45"/>
<line style="&st2;" x1="486" y1="45" x2="486" y2="90"/>
<line style="&st2;" x1="81" y1="45" x2="594" y2="45"/>
<line style="&st2;" x1="486" y1="216" x2="594" y2="216"/>
<text transform="matrix(1 0 0 1 18 54)"><tspan x="0" y="0" style="&st6; &st5; &st0;">V</tspan><tspan x="25.48" y="5" style="&st6; &st4;">cc</tspan></text>
<text transform="matrix(1 0 0 1 18 459)" style="&st6; &st5;">GND</text>
<text transform="matrix(1 0 0 1 170 140)" style="&st6; &st5;">R1</text>
<text transform="matrix(1 0 0 1 410 140)" style="&st6; &st5;">R2</text>
<text transform="matrix(1 0 0 1 120 230)" style="&st6; &st5;">VT1</text>
<text transform="matrix(1 0 0 1 380 280)" style="&st6; &st5;">VT2</text>
<text transform="matrix(1 0 0 1 19.2959 333)" style="&st6; &st5;">A</text>
<text transform="matrix(1 0 0 1 18.4336 377)" style="&st6; &st5;">B</text>
<text transform="matrix(1 0 0 1 603 223.1104)" style="&st6; &st5;">Q</text>
</g>
</svg>


123
architecture/README.md Normal file

@ -0,0 +1,123 @@
# A Primer on computer architecture
## Binary representation of common data types
- **integers**
- int32 ➔ 32b (bits) = 4B (bytes)
- 1 bit for sign, 31 bits for magnitude
- min = -2^31 = -2,147,483,648
- max = 2^31-1 = 2,147,483,647
- [visualization](https://manderc.com/apps/umrechner/index_eng.php)
- In Python:
- `bin(14)` ➔ `'0b1110'`
- `np.iinfo(np.int32)` ➔ `iinfo(min=-2147483648, max=2147483647, dtype=int32)`
- Python integers, as opposed to numpy integer types, are represented with a flexible number of bits: `sys.int_info` ➔ `bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640`
- they are called "long" or "superlong" integers, because they can have arbitrary size. Low level implementation explained:
- [Arpit Bhayani's blog](https://arpitbhayani.me/blogs/long-integers-python/)
- [Artem Golubin's blog](https://rushter.com/blog/python-integer-implementation/)
- **real numbers**, a.k.a. floating point numbers (IEEE-754 standard):
- float64 ➔ 64b (bits) = 8B (bytes)
- 1 bit for sign, 52 bits for mantissa, 11 bits for exponent
- min, max ≈ ± 1.8 × 10^308
- smallest positive normal number ≈ 2.2 × 10^-308
- example in Python:
- `np.finfo(np.float64)` ➔ `finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)`
- `sys.float_info` ➔ `max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1`
- next floating point number after `1.` in the direction of `2.`: `np.nextafter(1., 2.)` ➔ `1.0000000000000002`
- watch out for equality between floating point numbers:
- `1.1 + 2.2 == 3.3` ➔ `False`
- `(1.1+2.2).hex()` ➔ `0x1.a666666666667p+1` and `(3.3).hex()` ➔ `0x1.a666666666666p+1`
- visualization:
- [`float32`](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
- [all types](https://float.exposed/) with [explanation](https://ciechanow.ski/exposing-floating-point/)
- Docs with more details: [What Every Programmer Should Know About Floating-Point Arithmetic or Why don't my numbers add up?](https://floating-point-gui.de)
- Docs with the gory details, a.k.a. the floating point bible: [What every computer scientist should know about floating-point arithmetic](https://doi.org/10.1145/103162.103163)
- **strings**:
- UTF8 encoded, flexible width from 1B (byte) to 4B (bytes): 1,112,064 Unicode characters (code points)
- ASCII: 7 bits (fits in one byte), 128 characters ➔ [ASCII table](https://upload.wikimedia.org/wikipedia/commons/2/26/ASCII_Table_%28suitable_for_printing%29.svg)
- [visualization](https://sonarsource.github.io/utf8-visualizer/)
- actually, in Python strings (more precisely: unicode objects) are stored in different formats depending on which characters they contain, for memory efficiency. Look at the gory details [here](https://docs.python.org/3.14/c-api/unicode.html) ➔ not for the faint-hearted!
- **hexadecimal notation**:
- base16 ➔ '0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f'
- a compact way of representing binary data: 8 bits = 1 byte = 2 hexadecimal digits
- Example: `254` (decimal) ➔ `1111 1110` (binary) ➔ `fe` (hex)
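The bit layouts listed above can be inspected from Python using only the standard library. The following is a minimal sketch; the helper name `float64_fields` is made up for this illustration and is not part of any library:

```python
import struct

def float64_fields(x):
    """Split a float64 into its sign, exponent and mantissa bit fields."""
    # reinterpret the 8 bytes of the double as an unsigned 64-bit integer
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)   # 52 bits (the leading 1 is implicit)
    return sign, exponent, mantissa

# 1.0 = +1.0 × 2^0: sign 0, biased exponent 1023, mantissa 0
sign, exponent, mantissa = float64_fields(1.0)

# the hexadecimal example above: 254 (decimal), 1111 1110 (binary), fe (hex)
hex_254 = format(254, 'x')
bin_254 = format(254, '08b')

# UTF-8 is a variable-width encoding: 1 to 4 bytes per code point
utf8_lengths = [len(char.encode('utf-8')) for char in ('a', 'é', '€')]

print(sign, exponent, mantissa, hex_254, bin_254, utf8_lengths)
```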
## CPU architecture
![CPU schematics](comp_architecture_schematics.svg)
- Primer on CPU (x86_64) architecture and the memory hierarchy:
- CPU registers ≈ 160 (plus another ~500 model specific), latency: 0 cycles, capacity: 8 bytes
- x86-64 instruction set, with ≈ 2000 instructions with mnemonic (plus an unknown number of "undocumented" instructions ~ 10k)
- SIMD (single instruction, multiple data [8 or 16 data units]) CPUs
- L-Caches: L1/L2/L3, with cache lines of 64B, latencies: 1-40 cycles, capacity: ~KB and ~MB
- Main memory: RAM pages 4KB or 64KB, latency: 50~100 cycles, capacity ~GBs
- Storage (local disks): disk transfer blocks 4KB to 64MB, latency: 0.1ms (300k cycles), capacity: ~TBs
- Remote Storage (network): typically limited by ethernet connection (1-10 Gbit/s), latency: 10~100 ms, capacity: ∞
- Understand the trade-offs involved:
- **capacity** measured in T/G/M/K/B
- **latency** ≈ (time-when-data-available-on-output-pins − time-when-data-requested) ➔ measured in nanoseconds or in (CPU) cycles
- **bandwidth** ≈ clock frequency × data-transfer/tick × bus-width (in bytes) ➔ measured in T/G/M/K/B per second (this is what is usually advertised as the **speed**)
- data **volatility** vs. **persistence**
- cost
- physical limits (heat dissipation, density, size, lifetime)
- temporal and spatial locality of data
- The gory details about the memory hierarchy: [What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf) by the notorious Ulrich Drepper
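The bandwidth formula in the list above can be turned into a small worked example. This is only a sketch: the function name is hypothetical, and the example assumes a DDR4-3200 module (command rate 1600 MHz, two transfers per tick) on a single 64-bit (8-byte) channel, which is an assumption and not a measurement:

```python
def peak_bandwidth_gb_per_s(clock_mhz, transfers_per_tick, bus_width_bytes):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_second = clock_mhz * 1e6 * transfers_per_tick * bus_width_bytes
    return bytes_per_second / 1e9

# assumed example: DDR4-3200 on a single 64-bit channel
ddr4_3200 = peak_bandwidth_gb_per_s(clock_mhz=1600,
                                    transfers_per_tick=2,
                                    bus_width_bytes=8)
print(f'{ddr4_3200:.1f} GB/s')
```

This is an upper bound: the bandwidth actually observed by a program is usually lower, because of latencies and access patterns.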
![Computer architecture - big picture](comp_architecture_big_picture.svg)
# …and what about the GPU?
![GPU vs CPU architecture](GPUvsCPU-architecture.png)
- A GPU has many (in the order of hundreds) SIMT (single instruction, multiple thread) cores, so called SMs (Streaming Multiprocessors), each one with local L1 and shared L2 caches, and shared RAM (due to the high parallelism, with huge bandwidth, in the order of ~1TB/s)
- The SMs are specialized on data types. In order of abundance, the following data types are supported: int8, int32, int64, float16, float32, float64
- Performance depends on:
- memory bandwidth: usually higher than CPU's RAM
- "math" bandwidth: usually higher than CPU's, but much more limited in capability; for example branches (if/else) are expensive
- latency: usually much higher than CPU's ➔ the more parallel threads are run the less the price of high latency is paid (latency "hiding")
- spatial locality is extremely critical
- A portion of the GPU-RAM is accessible to the CPU ➔ the GPU performs the copies
- The PCI-Bus (Peripheral Component Interconnect bus) is the bottleneck: data needs to flow from main (CPU) memory to GPU memory and back!
- Problems on a cluster: the GPU does not really support simultaneous payloads from multiple users!
# Computer Architecture (a concrete example)
My Laptop:
![Graphical representation](topology.png)
- Lenovo - T14 Gen 4
- CPU i7-1365U:
- 2× "performance cores" (Intel Core) max 5.20 GHz (0.19 ns/cycle) with Hyper-Threading
- 8× "efficient cores" (Intel Atom) max 3.90 GHz (0.26 ns/cycle)
- L1 (data) cache P-Core 48 KB
- L1 (data) cache E-Core 32 KB
- L2 cache P-Core 1280 KB
- L2 cache E-Core 2048 KB (shared x4)
- L3 cache 12 MB (shared P+E-Cores)
- RAM DDR5-5200: 32GB (16GB soldered + 16GB bank):
- Data rate 5200 MT/s, Transfer time 0.192 ns/cycle
- Command rate (bus clock) 2600 MHz, Cycle time 0.385 ns
- Internal clock 650 MHz, 1.54 ns
- CAS Latency 34 cycles, Total latency = CAS latency x cycle = 13.09 ns, Throughput 40.6 GB/s
- DMI (Direct Media Interface): 8×16 GT/s (≈128 GB/s)
- PCI (Peripheral Component Interconnect) Express bridges:
- Graphics: 16 GT/s (≈ 8 GB/s)
- 2× Thunderbolt: 2.5 GT/s (≈ 1 GB/s) and 16 GT/s (≈ 8 GB/s)
- GPU Intel Iris, Internal clock 300 MHz-1.30 GHz, memory 4 GB/2.1 GHz with a bandwidth of 68 GB/s
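The cycle times and the CAS latency quoted in the list above follow from simple arithmetic on the clock frequencies. A minimal sketch, using only numbers from the list; it is a first-order estimate that ignores all other DRAM timings, and the helper name is made up for this example:

```python
def cycle_time_ns(frequency_ghz):
    """Duration of one clock cycle in nanoseconds."""
    return 1.0 / frequency_ghz

p_core_cycle = cycle_time_ns(5.2)   # P-core at its maximum clock
ram_cycle = cycle_time_ns(2.6)      # RAM command rate: 2600 MHz

# CAS latency: 34 RAM cycles, expressed in nanoseconds
cas_latency_ns = 34 * ram_cycle

# the same latency expressed in P-core cycles at the maximum clock
cas_in_cpu_cycles = cas_latency_ns / p_core_cycle

print(f'{p_core_cycle:.3f} ns, {ram_cycle:.3f} ns, '
      f'{cas_latency_ns:.1f} ns, {cas_in_cpu_cycles:.0f} cycles')
```

The result, roughly 13.1 ns or 68 P-core cycles, sits inside the 50~100 cycles range given for main memory in the primer above.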
## Historical evolution of speed of different components in a computer
(data source: Wikipedia)
### CPU Clock Rate
![](cpu_clock_rate.svg)
### Memory (RAM) Clock Cycle Time
![](memory_clock.svg)
### Memory (RAM) Bandwidth
![](memory_bandwidth.svg)
### Memory (RAM) Latency
![](memory_latency.svg)
### Storage Read Speed
![](storage.svg)

File diff suppressed because one or more lines are too long


File diff suppressed because one or more lines are too long


@ -0,0 +1,201 @@
1969,1 MHz
1971,740 kHz
1972,200 kHz
1972,400 kHz
1972,500 kHz
1973,1 MHz
1973,2 MHz
1973,715 kHz
1974,1.33 MHz
1974,1.4 MHz
1974,1 MHz
1974,2 MHz
1974,400 kHz
1974,500 kHz
1974,715 kHz
1974,740 kHz
1975,1.2 MHz
1975,10 MHz
1975,1 MHz
1975,256 kHz
1975,2 MHz
1975,3.3 MHz
1975,4 MHz
1976,2.5 MHz
1976,3.3 MHz
1976,6.4 MHz
1976,8 MHz
1977,1.0 MHz
1977,2.0 MHz
1977,3.0 MHz
1978,1 MHz
1978,5 MHz
1979,5 MHz
1979,8 MHz
1981,10 MHz
1981,2.5 MHz
1982,18 MHz
1982,1 MHz
1982,6 MHz
1982,8 MHz
1983,2 MHz
1983,3 MHz
1984,16 MHz
1984,5 MHz
1985,12 MHz
1985,5 MHz
1985,8 MHz
1986,15 MHz
1986,16 MHz
1987,10 MHz
1987,12.5 MHz
1987,16 MHz
1987,20 MHz
1987,8 MHz
1988,10 MHz
1988,12 MHz
1988,25 MHz
1989,16-33 MHz
1989,25 MHz
1989,35 MHz
1990,20-30 MHz
1990,40 MHz
1991,100 MHz
1991,33 MHz
1991,62.5-90.91 MHz
1992,100 MHz
1992,100-200 MHz
1992,20 MHz
1992,40 MHz
1992,40-50 MHz
1993,120 MHz
1993,50-80 MHz
1993,55-71.5 MHz
1993,60-66 MHz
1994,100 MHz
1994,100-125 MHz
1994,100-180 MHz
1994,125 MHz
1994,200-300 MHz
1994,50 MHz
1994,60-120 MHz
1994,60-125 MHz
1995,101-118 MHz
1995,143-167 MHz
1995,150-200 MHz
1995,266-333 MHz
1996,141-161 MHz
1996,150 MHz
1996,150-250 MHz
1996,160-180 MHz
1996,180-250 MHz
1996,400-500 MHz
1996,75-100 MHz
1997,120-150 MHz
1997,125 MHz
1997,166-233 MHz
1997,200 MHz
1997,233-300 MHz
1997,233-366 MHz
1997,250-400 MHz
1997,370 MHz
1998,200 MHz
1998,250-300 MHz
1998,250-330 MHz
1998,262 MHz
1998,270-400 MHz
1998,300-440 MHz
1998,450-600 MHz
1998,500 MHz
1999,294-300 MHz
1999,350-500 MHz
1999,450 MHz
1999,450-600 MHz
1999,500-1000 MHz
1999,550-637 MHz
2000,1.33-1.73 GHz
2000,1.3-2 GHz
2000,450-810 MHz
2000,550 MHz-1.3 GHz
2000,600-750 MHz
2000,918 MHz
2001,1.1-1.4 GHz
2001,500-600 MHz
2001,733-800 MHz
2001,750-1200 MHz
2002,0.9-1 GHz
2002,1.1-1.35 GHz
2003,0.9-1.7 GHz
2003,1.4-2.4 GHz
2003,1.6-2.0 GHz
2004,1.65-1.9 GHz
2004,700 MHz
2005,1.05-1.35 GHz
2005,1.2-2.5 GHz
2005,1.6-3.0 GHz
2005,1-1.4 GHz
2005,2.8-3.2 GHz
2005,2-2.4 GHz
2005,3.2 GHz
2006,1.06-2.67 GHz
2006,1.1-2.33 GHz
2006,1.4-1.6 GHz
2006,3.2-4.6 GHz
2007,1.8-3.2 GHz
2007,1-1.4 GHz
2007,2.15-2.4 GHz
2007,3.5-4.7 GHz
2007,600-900 MHz
2007,850 MHz
2008,0.8-1.6 GHz
2008,1.8-2.6 GHz
2008,2.3-2.9 GHz
2008,2.4-2.88 GHz
2008,2.66-3.2 GHz
2008,2.8-4.0 GHz
2008,4.4 GHz
2008,600-866 MHz
2009,2.2-2.8 GHz
2009,2.5-3.2 GHz
2010,1.6 GHz
2010,1.73-2.66 GHz
2010,1.7-2.4 GHz
2010,1.86-3.33 GHz
2010,2.66-3.0 GHz
2010,2 GHz
2010,3.8-5.2 GHz
2010,3-4.14 GHz
2011,1.0-1.6 GHz
2011,1.6 GHz
2011,1.6-3.4 GHz
2011,1.73-2.67 GHz
2011,2.0 GHz
2011,2.8-3 GHz
2011,3.1-3.6 GHz
2012,1.73-2.53 GHz
2012,1.848 GHz
2012,3.1-5.3 GHz
2012,5.5 GHz
2013,1.9-4.4 GHz
2013,2.8-3 GHz
2013,3.6 GHz
2014,1.8-4 GHz
2014,2.5-5 GHz
2015,3.6 GHz
2015,5 GHz
2016,320 MHz
2017,1.5 GHz
2017,3.2-4.1 GHz
2017,4 GHz
2017,5.2 GHz
2017,5 GHz
2018,1.5 GHz
2018,2.2-3.2 GHz
2018,2.8-3.7 GHz
2019,2-4.7 GHz
2019,5.2 GHz
2020,3.2 GHz
2020,3.4-4.9 GHz
2021,3.2 GHz
2022,3.2 GHz
2022,5 GHz


@ -0,0 +1,83 @@
# CPU clock rate data from Wikipedia
# https://en.wikipedia.org/wiki/Microprocessor_chronology
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']
# first, remove units and rescale everything to MHz
rescaled = []
with open('cpu_clock_rate.csv', 'rt') as data:
    for line in data:
        date, raw = line.split(',')
        try:
            value, unit = raw.split()
        except ValueError:
            # there are lines with multiple units, for example
            # 550 MHz-1.3 GHz
            # take the right-most one, i.e. the biggest
            raw = raw.split('-')[1]
            value, unit = raw.split()
        # if value is in the form X-Y, just use the biggest, i.e. Y
        if '-' in value:
            value = value.split('-')[1]
        value = float(value)
        # rescale value
        if unit == 'kHz':
            value = value/1000
        elif unit == 'GHz':
            value = value*1000
        elif unit == 'MHz':
            pass
        else:
            raise ValueError(f'Unit not understood! {unit}')
        rescaled.append((date, value))
dtype = [('year', np.float64), ('clock', np.float64)]
rescaled = np.array(rescaled, dtype=dtype)
# sort first by year and then by value
rescaled.sort(order=['year', 'clock'])
# add some jitter on values corresponding to the same year, so that the plot
# looks more understandable
old_year = rescaled[0][0]
count = 0
for row in range(rescaled.shape[0]):
    year = rescaled[row][0]
    count += 1
    if year != old_year:
        # add jitter to the values corresponding to the previous year
        prev_count = count-1
        if prev_count > 1:
            jitter = 1/prev_count
            for i in range(1, count):
                loc_year, loc_value = rescaled[row-count+i]
                rescaled[row-count+i] = (loc_year+(jitter*(i-1)), loc_value)
        # restart counting
        count = 1
        old_year = year
# plot the thing
plt.figure(figsize=(8.5,7.5))
plt.semilogy(rescaled['year'], rescaled['clock'], 'o')
# my laptop here
plt.semilogy([2020], [4900], 'o')
plt.grid(None)
plt.grid(which='both', axis='both')
plt.ylim(0.1, 10000)
plt.xlim(1968, 2025)
years = np.arange(1970, 2025, 5)
plt.xticks(years, years)
# values are in MHz, so the lowest tick (0.1) is 100 kHz, i.e. a period of 10 µs
plt.yticks([0.1, 1, 10, 100, 1000, 10000], ['100 kHz\n10 µs', '1 MHz\n1 µs', '10 MHz\n100 ns',
                                            '100 MHz\n10 ns', '1 GHz\n1 ns', '10 GHz\n0.1 ns'])
plt.tick_params(labelright=True, top=True, right=True)
plt.title('CPU clock rate')
plt.savefig('cpu_clock_rate.svg')

File diff suppressed because it is too large Load diff


148
architecture/memory.csv Normal file

@ -0,0 +1,148 @@
Generation,Type,Data rate (MT/s),Transfer time (ns),Command rate (MHz),Cycle time (ns),CAS latency,First word (ns),Fourth word (ns),Eighth word (ns)
SDRAM,PC100,100,10.000,100,10.000,2,20.00,50.00,90.00
SDRAM,PC133,133,7.500,133,7.500,3,22.50,45.00,75.00
DDR SDRAM,DDR-333,333,3.000,166,6.000,2.5,15.00,24.00,36.00
DDR SDRAM,DDR-400,400,2.500,200,5.000,3,15.00,22.50,32.50
DDR SDRAM,DDR-400,400,2.500,200,5.000,2.5,12.50,20.00,30.00
DDR SDRAM,DDR-400,400,2.500,200,5.000,2,10.00,17.50,27.50
DDR2 SDRAM,DDR2-400,400,2.500,200,5.000,4,20.00,27.50,37.50
DDR2 SDRAM,DDR2-400,400,2.500,200,5.000,3,15.00,22.50,32.50
DDR2 SDRAM,DDR2-533,533,1.875,266,3.750,4,15.00,20.63,28.13
DDR2 SDRAM,DDR2-533,533,1.875,266,3.750,3,11.25,16.88,24.38
DDR2 SDRAM,DDR2-667,667,1.500,333,3.000,5,15.00,19.50,25.50
DDR2 SDRAM,DDR2-667,667,1.500,333,3.000,4,12.00,16.50,22.50
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,6,15.00,18.75,23.75
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,5,12.50,16.25,21.25
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,4.5,11.25,15.00,20.00
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,4,10.00,13.75,18.75
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,7,13.13,15.94,19.69
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,6,11.25,14.06,17.81
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,5,9.38,12.19,15.94
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,4.5,8.44,11.25,15.00
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,4,7.50,10.31,14.06
DDR3 SDRAM,DDR3-1066,1066,0.938,533,1.875,7,13.13,15.94,19.69
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,9,13.50,15.75,18.75
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,7,10.50,12.75,15.75
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,6,9.00,11.25,14.25
DDR3 SDRAM,DDR3-1375,1375,0.727,687,1.455,5,7.27,9.45,12.36
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,11,13.75,15.63,18.13
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,10,12.50,14.38,16.88
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,9,11.25,13.13,15.63
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,8,10.00,11.88,14.38
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,7,8.75,10.63,13.13
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,6,7.50,9.38,11.88
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,10,10.71,12.32,14.46
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,9,9.64,11.25,13.39
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,8,8.57,10.18,12.32
DDR3 SDRAM,DDR3-2000,2000,0.500,1000,1.000,9,9.00,10.50,12.50
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,12,11.25,12.66,14.53
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,11,10.31,11.72,13.59
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,10,9.38,10.78,12.66
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,9,8.44,9.84,11.72
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,8,7.50,8.91,10.78
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,7,6.56,7.97,9.84
DDR3 SDRAM,DDR3-2200,2200,0.455,1100,0.909,7,6.36,7.73,9.55
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,13,10.83,12.08,13.75
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,12,10.00,11.25,12.92
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,11,9.17,10.42,12.08
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,10,8.33,9.58,11.25
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,9,7.50,8.75,10.42
DDR3 SDRAM,DDR3-2600,2600,0.385,1300,0.769,11,8.46,9.62,11.15
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,15,11.25,12.38,13.88
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,13,9.75,10.88,12.38
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,12,9.00,10.13,11.63
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,11,8.25,9.38,10.88
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,16,11.43,12.50,13.93
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,12,8.57,9.64,11.07
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,11,7.86,8.93,10.36
DDR3 SDRAM,DDR3-2933,2933,0.341,1466,0.682,12,8.18,9.20,10.57
DDR3 SDRAM,DDR3-3000,3000,0.333,1500,0.667,12,8.00,9.00,10.33
DDR3 SDRAM,DDR3-3100,3100,0.323,1550,0.645,12,7.74,8.71,10.00
DDR3 SDRAM,DDR3-3200,3200,0.313,1600,0.625,16,10.00,10.94,12.19
DDR3 SDRAM,DDR3-3300,3300,0.303,1650,0.606,16,9.70,10.61,11.82
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,12,15.00,16.88,19.38
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,11,13.75,15.63,18.13
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,10,12.50,14.38,16.88
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,14,15.00,16.61,18.75
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,13,13.93,15.54,17.68
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,12,12.86,14.46,16.61
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,16,15.00,16.41,18.28
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,15,14.06,15.47,17.34
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,14,13.13,14.53,16.41
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,17,14.17,15.42,17.08
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,16,13.33,14.58,16.25
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,15,12.50,13.75,15.42
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,19,14.25,15.38,16.88
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,17,12.75,13.88,15.38
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,16,12.00,13.13,14.63
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,15,11.25,12.38,13.88
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,13,9.75,10.88,12.38
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,17,12.14,13.21,14.64
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,16,11.43,12.50,13.93
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,15,10.71,11.79,13.21
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,14,10.00,11.07,12.50
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,17,11.33,12.33,13.67
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,16,10.67,11.67,13.00
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,15,10.00,11.00,12.33
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,14,9.33,10.33,11.67
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,16,10.00,10.94,12.19
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,15,9.38,10.31,11.56
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,14,8.75,9.69,10.94
DDR4 SDRAM,DDR4-3300,3300,0.303,1650,0.606,16,9.70,10.61,11.82
DDR4 SDRAM,DDR4-3333,3333,0.300,1666,0.600,16,9.60,10.50,11.70
DDR4 SDRAM,DDR4-3400,3400,0.294,1700,0.588,16,9.41,10.29,11.47
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,18,10.38,11.25,12.40
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,17,9.81,10.67,11.83
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,16,9.23,10.10,11.25
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,19,10.56,11.39,12.50
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,18,10.00,10.83,11.94
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,17,9.44,10.28,11.39
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,16,8.89,9.72,10.83
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,15,8.33,9.17,10.28
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,14,7.78,8.61,9.72
DDR4 SDRAM,DDR4-3733,3733,0.268,1866,0.536,17,9.11,9.91,10.98
DDR4 SDRAM,DDR4-3866,3866,0.259,1933,0.517,18,9.31,10.09,11.12
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,19,9.50,10.25,11.25
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,18,9.00,9.75,10.75
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,17,8.50,9.25,10.25
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,16,8.00,8.75,9.75
DDR4 SDRAM,DDR4-4133,4133,0.242,2066,0.484,19,9.19,9.92,10.89
DDR4 SDRAM,DDR4-4200,4200,0.238,2100,0.476,19,9.05,9.76,10.71
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,19,8.91,9.61,10.55
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,18,8.44,9.14,10.08
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,17,7.97,8.67,9.61
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,16,7.50,8.20,9.14
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,19,8.64,9.32,10.23
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,18,8.18,8.86,9.77
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,17,7.73,8.41,9.32
DDR4 SDRAM,DDR4-4600,4600,0.217,2300,0.435,19,8.26,8.91,9.78
DDR4 SDRAM,DDR4-4600,4600,0.217,2300,0.435,18,7.82,8.48,9.35
DDR4 SDRAM,DDR4-4800,4800,0.208,2400,0.417,20,8.33,8.96,9.79
DDR4 SDRAM,DDR4-4800,4800,0.208,2400,0.417,19,7.92,8.54,9.38
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,40,16.67,17.29,18.13
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,38,15.83,16.46,17.29
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,36,15.00,15.63,16.46
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,34,14.17,14.79,15.63
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,40,15.38,15.96,16.73
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,38,14.62,15.19,15.96
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,36,13.85,14.42,15.19
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,34,13.08,13.65,14.42
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,40,14.29,14.82,15.54
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,38,13.57,14.11,14.82
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,36,12.86,13.39,14.11
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,34,12.14,12.68,13.39
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,30,10.71,11.25,11.96
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,40,13.33,13.83,14.50
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,38,12.67,13.17,13.83
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,36,12.00,12.50,13.17
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,32,10.67,11.17,11.83
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,30,10.00,10.50,11.17
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,40,12.90,13.39,14.03
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,38,12.26,12.74,13.39
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,36,11.61,12.10,12.74
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,40,12.50,12.97,13.59
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,38,11.88,12.34,12.97
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,36,11.25,11.72,12.34
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,34,10.63,11.09,11.72
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,32,10.00,10.47,11.09
DDR5 SDRAM,DDR5-6600,6600,0.152,3300,0.303,34,10.30,10.76,11.36

101
architecture/memory.py Normal file
View file

@ -0,0 +1,101 @@
# RAM clock rate and transfer rate data from Wikipedia
# https://en.wikipedia.org/wiki/DDR_SDRAM
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import pandas
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']
data = pandas.read_csv('memory.csv')
#data = data.sort_values('Memory clock (MHz)')
_types = list(data['Type'])
_transfers = list(data['Data rate (MT/s)'])
_cycle_times = list(data['Cycle time (ns)'])
_latencies = list(data['Eighth word (ns)'])
# remove redundant data
types = []
transfers = []
cycle_times = []
latencies = []
for idx, typ in enumerate(_types):
    # just select the first occurrence of this type
    if typ in types:
        continue
    # skip DDR4 modules slower than DDR4-3333
    prefix = 'DDR4-'
    if typ.startswith(prefix) and int(typ.removeprefix(prefix)) < 3333:
        continue
    types.append(typ)
    transfers.append(_transfers[idx])
    cycle_times.append(_cycle_times[idx])
    latencies.append(_latencies[idx])
# convert MT/s to GB/s, assuming 8 bytes per transfer (64-bit wide module)
transfers = np.array(transfers, dtype='float64')*8/1024
plt.figure(figsize=(8.5,7.5))
plt.title('Memory Bandwidth [GB/s] (1995-2023)')
# my laptop first
me = types.index('DDR5-5200')
plt.plot(range(len(types)), transfers, 'o')
plt.plot(me, transfers[me], 'o')
plt.xticks(range(len(types))[::3], types[::3], rotation=30, ha='right')
yticks = range(0,56)
ylabels = []
for t in yticks:
    if not t%5:
        ylabels.append(str(t))
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.ylabel('GB/s')
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_bandwidth.svg')
plt.figure(figsize=(8.5,7.5))
plt.title('Memory Cycle Time [ns] (1995-2023)')
# my laptop first
me = types.index('DDR5-5200')
plt.semilogy(range(len(types)), cycle_times, 'o')
plt.semilogy(me, cycle_times[me], 'o')
plt.xticks(range(len(types))[::3], types[::3], rotation=30, ha='right')
plt.ylim(0.1, 11)
line = np.arange(0,10).astype('float64')
yticks = list(line[1:]*0.1)+list(line[1:])+list(line[1:2]*10)
ylabels = []
for value in yticks:
    if value in (0.1, 0.5, 1., 1.5, 5., 10.):
        ylabels.append(f'{value} ns')
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_clock.svg')
plt.figure(figsize=(8.5,7.5))
plt.title('Memory Latency [ns] (1998-2023)')
# my laptop first
me = types.index('DDR5-5200')
mel = latencies[me]
# skip the first two entries (PC100 and PC133)
y = latencies[2:]
x = types[2:]
plt.plot(range(len(x)), y, 'o')
# shift the index by 2 to account for the skipped entries
plt.plot(me-2, mel, 'o')
plt.xticks(range(len(x))[::3], x[::3], rotation=30, ha='right')
plt.ylim(8,40)
yticks = range(8,41)
ylabels = []
for t in yticks:
    if not t%5:
        ylabels.append(str(t)+' ns')
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_latency.svg')
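
The latency columns of `memory.csv` follow from the data rate and the CAS latency. The sketch below reproduces them for one row, assuming double data rate memory (two transfers per clock cycle, so it does not apply to the PC100 and PC133 rows). The function name `word_latencies` is hypothetical and not part of `memory.py`.

```python
def word_latencies(data_rate_mts, cas_latency):
    """Return (first, fourth, eighth) word latencies in ns for DDR memory."""
    transfer_time = 1000 / data_rate_mts   # ns per transfer
    cycle_time = 2 * transfer_time         # DDR: two transfers per clock cycle
    first = cas_latency * cycle_time       # wait for the CAS latency to elapse
    fourth = first + 3 * transfer_time     # three more transfers after the first word
    eighth = first + 7 * transfer_time     # seven more transfers after the first word
    return first, fourth, eighth

# DDR4-3200 with CAS latency 16, as listed in memory.csv
first, fourth, eighth = word_latencies(3200, 16)
print(f'{first:.2f} {fourth:.2f} {eighth:.2f}')   # prints "10.00 10.94 12.19"
```

The eighth-word value is the one `memory.py` plots as "latency".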


49
architecture/storage.csv Normal file
View file

@ -0,0 +1,49 @@
Teletype Model 33 paper tape,10 B/s,1963
Single Density 8-inch FM Floppy Disk Controller (160 KB),31 KB/s,1973
C2N Commodore Datasette 1530 cassette tape interface,15 B/s,1977
Apple II cassette tape interface,200 B/s,1977
TRS-80 Model 1 Level 1 BASIC cassette tape interface,32 B/s,1977
Single Density 5.25-inch FM Floppy Disk Controller (180 KB),15.5 KB/s,1978
MFM hard disk,0.625 MB/s,1980
Amstrad CPC tape,250 B/s,1984
High Density MFM Floppy Disk Controller (1.2 MB/1.44 MB),31 KB/s,1984
ATA PIO Mode 0,3.3 MB/s,1986
SCSI (Narrow SCSI) (5 MHz),5 MB/s,1986
CD Controller (1×),0.146 MB/s,1988
Serial Storage Architecture SSA,80 MB/s,1990
ATA PIO Mode 1,5.2 MB/s,1994
ATA PIO Mode 2,8.3 MB/s,1994
ATA PIO Mode 3,11.1 MB/s,1996
ATA PIO Mode 4,16.7 MB/s,1996
Fibre Channel 1GFC (1.0625 GHz),103.23 MB/s,1997
Ultra DMA ATA 33,33 MB/s,1998
Ultra DMA ATA 66,66.7 MB/s,2000
Fibre Channel 2GFC (2.125 GHz),206.5 MB/s,2001
Ultra DMA ATA 100,100 MB/s,2002
SATA revision 1.0,150 MB/s,2003
iSCSI over 10GbE,1.239 GB/s,2004
iSCSI over Fast Ethernet,11.9 MB/s,2004
iSCSI over gigabit Ethernet- jumbo frames,123.9 MB/s,2004
Serial Attached SCSI (SAS) SAS-1,300 MB/s,2004
SATA Revision 2.0,300 MB/s,2004
Fibre Channel 4GFC (4.25 GHz),413 MB/s,2004
Ultra DMA ATA 133,133 MB/s,2005
Fibre Channel 8GFC (8.50 GHz),826 MB/s,2005
iSCSI over InfiniBand 4×,4 GB/s,2007
SATA Revision 3.0,600 MB/s,2008
FCoE over 10GbE,1.206 GB/s,2009
AoE over 10GbE,1.242 GB/s,2009
AoE over Fast Ethernet,11.9 MB/s,2009
AoE over gigabit Ethernet- jumbo frames,124.2 MB/s,2009
Serial Attached SCSI (SAS) SAS-2,600 MB/s,2009
FCoE over 100G Ethernet,12.064 GB/s,2010
iSCSI over 100G Ethernet,12.392 GB/s,2010
Fibre Channel 16GFC (14.025 GHz),1.652 GB/s,2011
Serial Attached SCSI (SAS) SAS-3,1.2 GB/s,2013
SATA Express,2 GB/s,2013
NVMe over M.2 or U.2 (using PCI Express 3.0 ×4 link),3.938 GB/s,2013
Fibre Channel 32GFC (28.05 GHz),3.303 GB/s,2016
Serial Attached SCSI (SAS) SAS-4,2.4 GB/s,2017
NVMe over M.2 or U.2 (using PCI Express 4.0 ×4 link),7.876 GB/s,2017
UFS (version 3.0),2.9 GB/s,2018
NVMe over M.2- U.2- U.3 or EDSFF (using PCI Express 5.0 ×4 link),15.754 GB/s,2019

54
architecture/storage.py Normal file
View file

@ -0,0 +1,54 @@
# Storage interfaces rates from
# https://en.wikipedia.org/wiki/List_of_interface_bit_rates#Storage
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']
data = open('storage.csv', 'rt')
# remove units and rescale everything to MB/s
b_to_mb = 1/(1024*1024)
kb_to_mb = 1/1024
gb_to_mb = 1024
rescaled = []
for line in data:
    typ, rate, year = line.split(',')
    value, unit = rate.split()
    value = float(value)
    if unit == 'B/s':
        value = value*b_to_mb
    elif unit == 'KB/s':
        value = value*kb_to_mb
    elif unit == 'MB/s':
        pass
    elif unit == 'GB/s':
        value = value*gb_to_mb
    else:
        raise ValueError(f'Unit not understood! {unit}')
    rescaled.append((int(year), value))
dtype = [('year', np.float64), ('speed', np.float64)]
rescaled = np.array(rescaled, dtype=dtype)
# sort first by year and then by value
rescaled.sort(order=['year', 'speed'])
# plot the thing
plt.figure(figsize=(8.5,7.5))
plt.semilogy(rescaled['year'], rescaled['speed'], 'o')
# my laptop here
plt.semilogy([2023], [6585], 'o')
plt.grid(None)
plt.grid(which='both', axis='y')
plt.grid(which='both', axis='x')
plt.ylim(b_to_mb, 100*gb_to_mb)
plt.xlim(1960, 2025)
years = range(1960,2026,5)
plt.xticks(years, years, rotation=45, ha='center')
plt.yticks([b_to_mb, kb_to_mb, 1, 10, 100, gb_to_mb, 10*gb_to_mb, 100*gb_to_mb],
           ['1 B/s', '1 KB/s', '1 MB/s', '10 MB/s', '100 MB/s', '1 GB/s', '10 GB/s', '100 GB/s'])
plt.tick_params(labeltop=False, labelright=True, top=True, right=True)
plt.title('Storage (read) speed')
plt.savefig('storage.svg')
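
The unit handling in the script can also be written as a lookup table, which makes the conversion easy to check in isolation. This is only a sketch: `FACTORS` and `to_mb_per_s` are hypothetical names, and the binary multiples (1024) mirror what `storage.py` uses.

```python
# Conversion factors to MB/s, using binary multiples as in storage.py
FACTORS = {
    'B/s': 1 / (1024 * 1024),
    'KB/s': 1 / 1024,
    'MB/s': 1.0,
    'GB/s': 1024.0,
}

def to_mb_per_s(rate):
    """Convert a string like '3.938 GB/s' to a float in MB/s."""
    value, unit = rate.split()
    if unit not in FACTORS:
        raise ValueError(f'Unit not understood! {unit}')
    return float(value) * FACTORS[unit]

print(to_mb_per_s('31 KB/s'))   # prints 0.0302734375
print(to_mb_per_s('2 GB/s'))    # prints 2048.0
```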

1526
architecture/storage.svg Normal file


BIN
architecture/topology.png Normal file

Binary file not shown.


View file

@ -0,0 +1,57 @@
# Low Level Memory Benchmark
These are the results of a low-level memory benchmark (written in C) on my [laptop](../architecture/README.md)
## Summary plots (details below)
![Memory Bandwidth P-core](bandwidth-t14.svg)
![Memory Latency P-core](latency-t14.svg)
## Benchmark details:
- Bandwidth (read), [bw_mem_rd](http://lmbench.sourceforge.net/man/bw_mem_rd.8.html). Allocate the specified amount of memory, zero it, and then time the reading of that memory as a series of integer loads and adds. Each 4-byte integer is loaded and added to an accumulator.
[Results](t14-bwr.csv) (block size in MB, bandwidth in MB/s)
- Bandwidth (write), [bw_mem](http://lmbench.sourceforge.net/man/bw_mem.8.html). Allocate twice the specified amount of memory, zero it, and then time the copying of the first half to the second half.
[Results](t14-bww.csv) (block size in MB, bandwidth in MB/s)
- Latency (sequential access), [lat_mem_rd](http://lmbench.sourceforge.net/man/lat_mem_rd.8.html). Run two nested loops: the outer loop runs over the stride size (here 128 bytes), the inner loop over the block size. For each block size, create a ring of pointers that point backward one stride. Traverse the block with `p = (char **)*p` in a for loop and time the load latency over the block.
[Results](t14-lseq.csv) (block size in MB, latency in ns)
- Latency (random access). Like above, but with a stride size of 16 bytes.
[Results](t14-lrnd.csv) (block size in MB, latency in ns)
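
The pointer ring used by the latency benchmarks can be illustrated with a small Python sketch. This is not the lmbench code: the helper names `make_ring`, `chase` and `time_chase` are hypothetical, the "pointers" are list indices rather than addresses, and interpreter overhead dominates the timing, so the numbers are not comparable to the C results.

```python
import time

def make_ring(n_slots, stride):
    """Build a ring in which every visited slot holds the index one stride earlier."""
    if n_slots % stride:
        raise ValueError('n_slots must be a multiple of stride')
    ring = [0] * n_slots
    for i in range(0, n_slots, stride):
        ring[i] = (i - stride) % n_slots   # slot 0 wraps around to the last visited slot
    return ring

def chase(ring, steps):
    """Follow the ring for a number of steps, the analogue of p = (char **)*p."""
    p = 0
    for _ in range(steps):
        p = ring[p]                        # each load depends on the previous one
    return p

def time_chase(ring, steps):
    """Return the average time per dependent load in ns (dominated by Python overhead)."""
    start = time.perf_counter()
    chase(ring, steps)
    return (time.perf_counter() - start) / steps * 1e9

ring = make_ring(8, 2)
print(ring)             # prints [6, 0, 0, 0, 2, 0, 4, 0]
print(chase(ring, 4))   # prints 0: four steps walk the whole ring once
```

Because each load needs the result of the previous one, the loads cannot overlap, which is why this access pattern measures latency rather than bandwidth.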
## Running the benchmarks on Linux:
- You need the [lmbench](http://lmbench.sourceforge.net/) library and [cpuset](https://github.com/SUSE/cpuset)
- All commands must be run as root after having killed as many processes/services as possible, so that the CPUs are almost idle
- Disable address space randomization:
```bash
echo 0 > /proc/sys/kernel/randomize_va_space
```
- Set scaling governor to performance for CPU0:
```bash
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```
- Reserve CPU0 for our benchmark, i.e. kick out (almost) all other processes:
```bash
cset shield --cpu 0 --kthread=on
```
- If you are on Intel and CPU0 is part of an SMT pair (hyper-threading), disable the peer:
```bash
echo 0 > /sys/devices/system/cpu/cpu1/online
```
- Disable turbo mode on Intel:
```bash
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
```
- Run the configuration script for lmbench. Select only the `HARDWARE` set of benchmarks and set the maximum amount of memory to something like 1024MB
```bash
cd /usr/lib/lmbench/scripts
# the following command will create the configuration file /usr/lib/lmbench/bin/x86_64-linux-gnu/CONFIG.<hostname>
cset shield --exec -- ./config-run
# run the benchmark
cset shield --exec -- /usr/bin/lmbench-run
# results are in /var/lib/lmbench/results/x86_64-linux-gnu/<hostname>
```
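
Before starting a run it can help to verify that the settings above are actually in place. The following is a minimal sketch, assuming the same paths as in the commands above and that CPU1 is the SMT peer of CPU0. `EXPECTED` and `check_settings` are hypothetical names, and the `root` parameter exists only so that the check can be tried on a test directory.

```python
from pathlib import Path

# Expected values for the settings listed above (paths relative to the filesystem root)
EXPECTED = {
    'proc/sys/kernel/randomize_va_space': '0',
    'sys/devices/system/cpu/cpu0/cpufreq/scaling_governor': 'performance',
    'sys/devices/system/cpu/cpu1/online': '0',
    'sys/devices/system/cpu/intel_pstate/no_turbo': '1',
}

def check_settings(root='/'):
    """Return a list of (path, expected, found) for every setting that differs."""
    problems = []
    for rel, expected in EXPECTED.items():
        path = Path(root) / rel
        try:
            found = path.read_text().strip()
        except OSError:
            found = None   # file missing or unreadable, e.g. on a non-Intel CPU
        if found != expected:
            problems.append((str(path), expected, found))
    return problems

if __name__ == '__main__':
    for path, expected, found in check_settings():
        print(f'{path}: expected {expected!r}, found {found!r}')
```

An empty list means all four settings match. A `found` value of `None` only says that the file could not be read, which is expected for the Intel-specific entries on other CPUs.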


View file

@ -0,0 +1,39 @@
import os
import sys
# prefix is something like results_
results = sys.argv[1]
name = results.removeprefix('results_')
types = {}
results = open(results, 'rt')
for idx, line in enumerate(results):
    if line.startswith('Memory read bandwidth'):
        types['bwr'] = idx
    elif line.startswith('Memory write bandwidth'):
        types['bww'] = idx
    elif line.startswith('Memory load latency'):
        types['lseq'] = idx
    elif line.startswith('Random load latency'):
        types['lrnd'] = idx
    else:
        pass
for typ, idx in types.items():
    csv = open(f'{name}-{typ}.csv', 'wt')
    results.seek(0)
    for count, line in enumerate(results):
        if count <= idx:
            continue
        if line.startswith('"'):
            continue
        try:
            val1, val2 = line.split(" ")
        except ValueError:
            # we are at the end of the section
            csv.close()
            break
        csv.write(f'{val1},{val2}')

115
benchmark_low_level/plot.py Normal file
View file

@ -0,0 +1,115 @@
import os
import sys
import numpy as np
import matplotlib
import itertools
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
name = 't14'
caches = (48*1024, 1280*1024, 12*1024*1024)
def get_labels(x):
    xlabels = []
    for value in x:
        b = int(2**value)
        if b < 1024:
            xlabels.append(f'{b}B')
        elif b < 1048576:
            xlabels.append(f'{b//1024}K')
        elif b < 1073741824:
            xlabels.append(f'{b//1024//1024}M')
        else:
            xlabels.append(f'{b//1024//1024//1024}G')
    return xlabels
# manually set ticks, to disable, set ticks = None
line = np.linspace(1, 10, 9, endpoint=False)
yticks = list(line)+list(line*10)+list(line[:2]*100)
ylabels = {1 : '1 ns', 5 : '5 ns', 10 : '10 ns', 50 : '50 ns', 100: '100 ns'}
ticks = {'l': (yticks, [ylabels[i] if i in ylabels else '' for i in yticks]),
         'bw': (range(5,46,5), [f'{i} GB/s' for i in range(5,46,5)]),
        }
# manually set limits, to disable set to ylim = None
ylim = {'l' : (1, 200),
        'bw' : (5,45),
       }
for type_ in ('bw', 'l'):
    if type_ == 'bw':
        suffix = ('r', 'w')
        ylabel = ''
        title = f'Memory Bandwidth ({name}) [GB/s]'
        legend1, legend2 = 'read', 'write'
        pic = f'bandwidth-{name}.svg'
        plt_func = plt.plot
    else:
        suffix = ('seq', 'rnd')
        ylabel = ''
        title = f'Memory Latency ({name}) [ns]'
        legend1, legend2 = 'sequential access', 'random access'
        pic = f'latency-{name}.svg'
        plt_func = plt.semilogy
    data1 = np.loadtxt(f'{name}-{type_}{suffix[0]}.csv', delimiter=',')
    data2 = np.loadtxt(f'{name}-{type_}{suffix[1]}.csv', delimiter=',')
    # convert to bytes and then to the corresponding power of two
    if type_ == 'bw':
        x1 = np.log2(data1[:,0]*1024*1024).round()
        y1 = data1[:,1]/1024
        x2 = np.log2(data2[:,0]*1024*1024).round()
        y2 = data2[:,1]/1024
    else:
        x1 = np.log2(data1[::2,0]*1024*1024).round()
        y1 = data1[::2,1]
        x2 = np.log2(data2[::2,0]*1024*1024).round()
        y2 = data2[::2,1]
    ylabels = None
    xlabel = 'block size'
    xlabels = get_labels(x1)
    plt.figure(figsize=(8.5,7.5))
    if type_ == 'l':
        # plot two empty plots so we advance the color cycle (bad trick)
        _ = plt_func([],[])
        _ = plt_func([],[])
    p1, = plt_func(x1, y1, 'o')
    plt.ylabel(ylabel)
    plt.xlabel(xlabel)
    p2, = plt_func(x2, y2, 'o')
    if ylim and type_ in ylim:
        plt.ylim(*ylim[type_])
    plt.xticks(x1, xlabels, rotation=60)
    if ticks and type_ in ticks:
        plt.yticks(*ticks[type_])
    plt.legend((p1, p2), (legend1, legend2))
    if ylim and type_ in ylim:
        miny, maxy = ylim[type_]
    else:
        miny = min(y1.min(), y2.min())
        maxy = max(y1.max(), y2.max())
    # mark the cache sizes (L1, L2, L3) with vertical lines
    for idx, cache in enumerate(caches):
        level = idx + 1
        size = np.log2(cache)
        plt.plot((size, size), (miny, maxy),
                 color = 'darkblue', alpha=0.4)
        plt.text(size-1, 2*miny, f'L{level}\n',
                 color='darkblue', verticalalignment='top')
    plt.title(title)
    plt.savefig(pic)

View file

@ -0,0 +1,485 @@
[lmbench3.0 results for Linux multivac 6.10.3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04) x86_64 GNU/Linux]
[LMBENCH_VER: 3.0-a9]
[BENCHMARK_HARDWARE: YES]
[BENCHMARK_OS: NO]
[ALL: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1024m]
[DISKS: ]
[DISK_DESC: ]
[ENOUGH: 5000]
[FAST: ]
[FASTMEM: NO]
[FILE: /var/tmp/lmbench/XXX]
[FSDIR: /var/tmp/lmbench]
[HALF: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m]
[INFO: INFO.multivac]
[LINE_SIZE: ]
[LOOP_O: 0.00000000]
[MB: 1024]
[MHZ: 1296 MHz, 0.7716 nanosec clock]
[MOTHERBOARD: ]
[NETWORKS: ]
[PROCESSORS: 11]
[REMOTE: ]
[SLOWFS: YES]
[OS: x86_64-linux-gnu]
[SYNC_MAX: 1]
[LMBENCH_SCHED: DEFAULT]
[TIMING_O: 0]
[LMBENCH VERSION: 3.0-20240810]
[USER: root]
[HOSTNAME: multivac]
[NODENAME: multivac]
[SYSNAME: Linux]
[PROCESSOR: unknown]
[MACHINE: x86_64]
[RELEASE: 6.10.3-amd64]
[VERSION: #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04)]
[Sat Aug 10 04:20:43 PM CEST 2024]
[ 16:20:43 up 1:18, 4 users, load average: 0.37, 0.94, 1.05]
[net: Kernel Interface table]
[net: Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg]
[net: eth0 1500 0 0 0 0 0 0 0 0 BMU]
[if: eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500]
[if: ether fc:5c:ee:4d:b5:eb txqueuelen 1000 (Ethernet)]
[if: RX packets 0 bytes 0 (0.0 B)]
[if: RX errors 0 dropped 0 overruns 0 frame 0]
[if: TX packets 0 bytes 0 (0.0 B)]
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
[if: device interrupt 16 memory 0xbc300000-bc320000]
[if: ]
[net: eth1 1500 34948 0 2352 0 7773 0 0 0 BMRU]
[if: eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500]
[if: inet 192.168.111.103 netmask 255.255.255.0 broadcast 192.168.111.255]
[if: inet6 fe80::44e3:4a35:5130:3045 prefixlen 64 scopeid 0x20<link>]
[if: inet6 2003:ef:2f2e:c900:e437:85c7:3d97:f353 prefixlen 64 scopeid 0x0<global>]
[if: ether b0:4f:13:ef:1a:3e txqueuelen 1000 (Ethernet)]
[if: RX packets 34948 bytes 33936985 (32.3 MiB)]
[if: RX errors 0 dropped 2352 overruns 0 frame 0]
[if: TX packets 7773 bytes 1213416 (1.1 MiB)]
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
[if: ]
[net: lo 65536 95 0 0 0 95 0 0 0 LRU]
[if: lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536]
[if: inet 127.0.0.1 netmask 255.0.0.0]
[if: inet6 ::1 prefixlen 128 scopeid 0x10<host>]
[if: loop txqueuelen 1000 (Local Loopback)]
[if: RX packets 95 bytes 5588 (5.4 KiB)]
[if: RX errors 0 dropped 0 overruns 0 frame 0]
[if: TX packets 95 bytes 5588 (5.4 KiB)]
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
[if: ]
[mount: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)]
[mount: proc on /proc type proc (rw,relatime)]
[mount: udev on /dev type devtmpfs (rw,nosuid,relatime,size=16228560k,nr_inodes=4057140,mode=755,inode64)]
[mount: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)]
[mount: tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3251140k,mode=755,inode64)]
[mount: /dev/mapper/CRYPT-ROOT on / type ext4 (rw,relatime,errors=remount-ro)]
[mount: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)]
[mount: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)]
[mount: cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)]
[mount: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)]
[mount: efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)]
[mount: bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)]
[mount: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=67)]
[mount: hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)]
[mount: none on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)]
[mount: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)]
[mount: tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)]
[mount: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)]
[mount: tmpfs on /run/credentials/systemd-journald.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: tmpfs on /run/credentials/systemd-udev-load-credentials.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev-early.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)]
[mount: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)]
[mount: tmpfs on /run/credentials/systemd-sysctl.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=16777216k,inode64)]
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)]
[mount: sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)]
[mount: tmpfs on /run/user/1002 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,uid=1002,gid=100,inode64)]
[mount: tmpfs on /run/credentials/getty@tty1.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
[mount: tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,inode64)]
[mount: none on /cpusets type cgroup (rw,relatime,cpuset,noprefix,release_agent=/sbin/cpuset_release_agent)]
integer bit: 0.54 nanoseconds
integer add: 0.77 nanoseconds
integer div: 8.49 nanoseconds
integer mod: 12.58 nanoseconds
int64 bit: 0.52 nanoseconds
uint64 add: 0.77 nanoseconds
int64 div: 11.58 nanoseconds
int64 mod: 14.91 nanoseconds
float add: 1.54 nanoseconds
float mul: 3.09 nanoseconds
float div: 8.49 nanoseconds
double add: 1.54 nanoseconds
double mul: 3.09 nanoseconds
double div: 10.80 nanoseconds
float bogomflops: 1.16 nanoseconds
double bogomflops: 1.54 nanoseconds
integer bit parallelism: 2.77
integer add parallelism: 2.73
integer div parallelism: 1.83
integer mod parallelism: 2.83
int64 bit parallelism: 2.49
int64 add parallelism: 2.60
int64 div parallelism: 1.50
int64 mod parallelism: 1.90
float add parallelism: 4.00
float mul parallelism: 8.00
float div parallelism: 3.67
double add parallelism: 4.00
double mul parallelism: 8.00
double div parallelism: 3.50
unable to register (XACT_PROG, XACT_VERS, udp).
: RPC: Unable to receive
"libc bcopy unaligned
0.000512 41652.95
0.001024 47761.25
0.002048 50233.88
0.004096 55637.27
0.008192 64524.03
0.016384 67719.30
0.032768 18212.36
0.065536 18407.52
0.131072 18473.55
0.262144 18475.00
0.524288 14642.79
1.05 8957.30
2.10 8208.03
4.19 8208.03
8.39 9645.77
16.78 7631.79
33.55 7129.38
67.11 6951.41
134.22 6900.65
268.44 6848.89
536.87 6861.76
"libc bcopy aligned
0.000512 44106.76
0.001024 49354.68
0.002048 51472.69
0.004096 55925.21
0.008192 63828.24
0.016384 66379.51
0.032768 18202.45
0.065536 18336.03
0.131072 18457.77
0.262144 18327.76
0.524288 15715.46
1.05 8922.33
2.10 8367.89
4.19 8343.10
8.39 9679.16
16.78 7632.95
33.55 7179.72
67.11 6990.51
134.22 6911.31
268.44 6892.15
536.87 6891.97
Memory bzero bandwidth
0.000512 73586.23
0.001024 78019.46
0.002048 80349.42
0.004096 74573.30
0.008192 78524.11
0.016384 80567.79
0.032768 81708.84
0.065536 21219.16
0.131072 21299.79
0.262144 21333.96
0.524288 21347.23
1.05 19382.88
2.10 12829.98
4.19 12611.10
8.39 12606.02
16.78 10399.64
33.55 9537.93
67.11 9140.41
134.22 9007.90
268.44 8931.77
536.87 8918.57
1073.74 8908.13
"unrolled bcopy unaligned
0.000512 10357.22
0.001024 10363.21
0.002048 10356.95
0.004096 10357.76
0.008192 10343.49
0.016384 10351.27
0.032768 7899.27
0.065536 7893.76
0.131072 7873.84
0.262144 7832.99
0.524288 7281.78
1.05 6503.77
2.10 6418.22
4.19 6461.47
8.39 5194.99
16.78 4722.65
33.55 4639.72
67.11 4606.91
134.22 4593.51
268.44 4596.34
536.87 4587.27
"unrolled partial bcopy unaligned
0.000512 41402.69
0.001024 41453.86
0.002048 41452.30
0.004096 41425.45
0.008192 41418.12
0.016384 41333.58
0.032768 18957.19
0.065536 18955.39
0.131072 18962.49
0.262144 18969.69
0.524288 14659.04
1.05 8844.77
2.10 8192.00
4.19 8206.57
8.39 6326.25
16.78 5801.25
33.55 5644.14
67.11 5609.70
134.22 5600.81
268.44 5589.38
536.87 5591.24
Memory read bandwidth
0.000512 29201.61
0.001024 29294.55
0.002048 29363.12
0.004096 29433.86
0.008192 29442.59
0.016384 29285.40
0.032768 29336.30
0.065536 27978.05
0.131072 28392.59
0.262144 28408.05
0.524288 28424.68
1.05 28385.92
2.10 28385.92
4.19 28395.43
8.39 28334.85
16.78 26342.45
33.55 23489.28
67.11 22195.75
134.22 21644.53
268.44 21620.12
536.87 21505.80
1073.74 21526.50
Memory partial read bandwidth
0.000512 58916.90
0.001024 59661.44
0.002048 61203.68
0.004096 58783.21
0.008192 61320.45
0.016384 61266.70
0.032768 60940.09
0.065536 30488.23
0.131072 30517.76
0.262144 30516.13
0.524288 29627.83
1.05 24662.86
2.10 17384.93
4.19 17168.66
8.39 16915.36
16.78 13189.64
33.55 11584.48
67.11 11024.95
134.22 10892.53
268.44 10824.45
536.87 10781.84
1073.74 10759.80
Memory write bandwidth
0.000512 41405.52
0.001024 41396.47
0.002048 41429.93
0.004096 41445.34
0.008192 41401.00
0.016384 41398.70
0.032768 41426.50
0.065536 21381.05
0.131072 21388.82
0.262144 21374.31
0.524288 21370.17
1.05 18114.68
2.10 12417.83
4.19 12264.05
8.39 12250.61
16.78 9679.16
33.55 8978.98
67.11 8703.00
134.22 8589.38
268.44 8520.41
536.87 8543.59
1073.74 8544.75
Memory partial write bandwidth
0.000512 41406.05
0.001024 41431.27
0.002048 41414.90
0.004096 41425.45
0.008192 41431.04
0.016384 41453.60
0.032768 41366.48
0.065536 21392.21
0.131072 21364.37
0.262144 21381.05
0.524288 21366.56
1.05 18649.81
2.10 12411.48
4.19 12249.30
8.39 12300.01
16.78 9693.61
33.55 9024.86
67.11 8771.25
134.22 8618.06
268.44 8557.89
536.87 8549.44
1073.74 8543.32
Memory partial read/write bandwidth
0.000512 20712.63
0.001024 20714.87
0.002048 20703.88
0.004096 20718.77
0.008192 20719.28
0.016384 20715.33
0.032768 20722.87
0.065536 20693.70
0.131072 20690.58
0.262144 20638.28
0.524288 20665.37
1.05 18846.95
2.10 12887.53
4.19 12613.33
8.39 12576.62
16.78 10295.93
33.55 9551.50
67.11 9191.74
134.22 9087.19
268.44 9035.49
536.87 9018.95
1073.74 9023.04
Usage: tlb [-c] [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
Memory load parallelism
Usage: par_mem [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
STREAM copy latency: 1.48 nanoseconds
STREAM copy bandwidth: 10781.39 MB/sec
STREAM scale latency: 1.50 nanoseconds
STREAM scale bandwidth: 10668.00 MB/sec
STREAM add latency: 2.11 nanoseconds
STREAM add bandwidth: 11374.24 MB/sec
STREAM triad latency: 2.13 nanoseconds
STREAM triad bandwidth: 11264.63 MB/sec
STREAM2 fill latency: 0.89 nanoseconds
STREAM2 fill bandwidth: 8955.12 MB/sec
STREAM2 copy latency: 1.48 nanoseconds
STREAM2 copy bandwidth: 10775.60 MB/sec
STREAM2 daxpy latency: 1.81 nanoseconds
STREAM2 daxpy bandwidth: 13256.52 MB/sec
STREAM2 sum latency: 1.60 nanoseconds
STREAM2 sum bandwidth: 5006.72 MB/sec
Memory load latency
"stride=128
0.00049 3.859
0.00098 3.859
0.00195 3.859
0.00293 3.859
0.00391 3.861
0.00586 3.858
0.00781 3.858
0.01172 3.859
0.01562 3.859
0.02344 3.859
0.03125 3.859
0.04688 3.861
0.06250 11.580
0.09375 11.576
0.12500 11.577
0.18750 11.583
0.25000 11.577
0.37500 11.576
0.50000 11.579
0.75000 11.578
1.00000 11.590
1.50000 13.543
2.00000 13.936
3.00000 13.999
4.00000 13.996
6.00000 13.997
8.00000 14.002
12.00000 14.976
16.00000 19.832
24.00000 20.880
32.00000 21.339
48.00000 21.899
64.00000 22.023
96.00000 22.156
128.00000 22.213
192.00000 22.283
256.00000 22.320
384.00000 22.306
512.00000 22.325
768.00000 22.345
1024.00000 22.361
Random load latency
"stride=16
0.00049 3.859
0.00098 3.858
0.00195 3.858
0.00293 3.858
0.00391 3.858
0.00586 3.858
0.00781 3.859
0.01172 3.858
0.01562 3.858
0.02344 3.859
0.03125 3.859
0.04688 3.864
0.06250 11.575
0.09375 14.276
0.12500 15.462
0.18750 16.079
0.25000 16.646
0.37500 16.373
0.50000 16.352
0.75000 18.529
1.00000 18.245
1.50000 42.351
2.00000 55.350
3.00000 61.011
4.00000 62.143
6.00000 63.587
8.00000 65.259
12.00000 84.563
16.00000 107.165
24.00000 131.898
32.00000 141.864
48.00000 150.654
64.00000 156.245
96.00000 162.950
128.00000 167.497
192.00000 170.394
256.00000 171.779
384.00000 172.858
512.00000 172.877
768.00000 173.626
1024.00000 173.702
[Sat Aug 10 04:39:13 PM CEST 2024]

View file

@ -0,0 +1,22 @@
0.000512,29201.61
0.001024,29294.55
0.002048,29363.12
0.004096,29433.86
0.008192,29442.59
0.016384,29285.40
0.032768,29336.30
0.065536,27978.05
0.131072,28392.59
0.262144,28408.05
0.524288,28424.68
1.05,28385.92
2.10,28385.92
4.19,28395.43
8.39,28334.85
16.78,26342.45
33.55,23489.28
67.11,22195.75
134.22,21644.53
268.44,21620.12
536.87,21505.80
1073.74,21526.50

View file

@ -0,0 +1,22 @@
0.000512,41405.52
0.001024,41396.47
0.002048,41429.93
0.004096,41445.34
0.008192,41401.00
0.016384,41398.70
0.032768,41426.50
0.065536,21381.05
0.131072,21388.82
0.262144,21374.31
0.524288,21370.17
1.05,18114.68
2.10,12417.83
4.19,12264.05
8.39,12250.61
16.78,9679.16
33.55,8978.98
67.11,8703.00
134.22,8589.38
268.44,8520.41
536.87,8543.59
1073.74,8544.75

View file

@ -0,0 +1,41 @@
0.00049,3.859
0.00098,3.858
0.00195,3.858
0.00293,3.858
0.00391,3.858
0.00586,3.858
0.00781,3.859
0.01172,3.858
0.01562,3.858
0.02344,3.859
0.03125,3.859
0.04688,3.864
0.06250,11.575
0.09375,14.276
0.12500,15.462
0.18750,16.079
0.25000,16.646
0.37500,16.373
0.50000,16.352
0.75000,18.529
1.00000,18.245
1.50000,42.351
2.00000,55.350
3.00000,61.011
4.00000,62.143
6.00000,63.587
8.00000,65.259
12.00000,84.563
16.00000,107.165
24.00000,131.898
32.00000,141.864
48.00000,150.654
64.00000,156.245
96.00000,162.950
128.00000,167.497
192.00000,170.394
256.00000,171.779
384.00000,172.858
512.00000,172.877
768.00000,173.626
1024.00000,173.702

View file

@ -0,0 +1,41 @@
0.00049,3.859
0.00098,3.859
0.00195,3.859
0.00293,3.859
0.00391,3.861
0.00586,3.858
0.00781,3.858
0.01172,3.859
0.01562,3.859
0.02344,3.859
0.03125,3.859
0.04688,3.861
0.06250,11.580
0.09375,11.576
0.12500,11.577
0.18750,11.583
0.25000,11.577
0.37500,11.576
0.50000,11.579
0.75000,11.578
1.00000,11.590
1.50000,13.543
2.00000,13.936
3.00000,13.999
4.00000,13.996
6.00000,13.997
8.00000,14.002
12.00000,14.976
16.00000,19.832
24.00000,20.880
32.00000,21.339
48.00000,21.899
64.00000,22.023
96.00000,22.156
128.00000,22.213
192.00000,22.283
256.00000,22.320
384.00000,22.306
512.00000,22.325
768.00000,22.345
1024.00000,22.361

View file

@ -0,0 +1,29 @@
## Benchmark
![](loading-timings-ns128.svg)
![](loading-slowdown-ns128.svg)
## Benchmark details
- Create in memory a list of `N=128` one-dimensional numpy arrays of length `L`, from `L=4` up to `L=2^22=4_194_304` in steps of powers of two.
- Given that each item of the array is of type `float64`, i.e. 8 bytes, the size of one array goes from `32B` to `32M`
- The total memory required is at least `128 × 32M × 2 = 8G`
- Load the whole list into one big numpy array of size `N`x`L` (*good*) and `L`x`N` (*bad*). The corresponding loops are:
```python
# good loop (store each time series on a different row)
for row, time_series in enumerate(collection):
    ts[row, :] = time_series
```
```python
# bad loop (store each time series on a different column)
for column, time_series in enumerate(collection):
    ts[:, column] = time_series
```
- Time the *bad* and the *good* loop
- Plot the timings for the *good* and the *bad* loop as a function of `L`
- Plot `slowdown = time_bad/time_good` as a function of `L`
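A scaled-down version of the steps above can be sketched as follows. This is only an illustration, not the benchmark script itself: the sizes `N=16` and `L=4096` and the helper names `load_rows` and `load_columns` are assumptions chosen to keep the run short.

```python
import timeit

import numpy as np

# assumed small sizes; the real benchmark uses N=128 and L up to 2**22
N, L = 16, 4096
collection = [np.random.rand(L) for _ in range(N)]


def load_rows(collection):
    """Good loop: each time series is written to one contiguous row."""
    ts = np.empty((N, L), dtype='float64')
    for row, time_series in enumerate(collection):
        ts[row, :] = time_series
    return ts


def load_columns(collection):
    """Bad loop: each time series is scattered over one strided column."""
    ts = np.empty((L, N), dtype='float64')
    for column, time_series in enumerate(collection):
        ts[:, column] = time_series
    return ts


# best of 3 repetitions, each averaging over 5 calls
time_good = min(timeit.repeat(lambda: load_rows(collection), number=5, repeat=3)) / 5
time_bad = min(timeit.repeat(lambda: load_columns(collection), number=5, repeat=3)) / 5
print(f'good: {time_good:.2e} s, bad: {time_bad:.2e} s, slowdown: {time_bad / time_good:.1f}')
```

Both loops store the same data, only transposed. The measured slowdown depends on the machine and on `L`, so no particular value should be expected from this small example.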
## Scripts
- [Benchmark script](bench.py) and [measurements](results_ns128)
- [Plotting script](bench_plot.py)

68
benchmark_python/bench.py Executable file
View file

@ -0,0 +1,68 @@
#!/usr/bin/python3
# Commands and ideas from: https://llvm.org/docs/Benchmarking.html
# we assume that CPU 0 and 1 are together in an intel SMT-pair (hyperthreading)
# - Disable address space randomization:
# echo 0 > /proc/sys/kernel/randomize_va_space
# - Set scaling governor to performance for CPU 0
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# - Reserve CPU 0 for our benchmark
# cset shield --cpu 0 --kthread=on
# - Disable the SMT-peer of CPU 0, i.e. CPU 1
# echo 0 > /sys/devices/system/cpu/cpu1/online
# - Disable turbo mode (works only on Intel):
# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# - Run with
# cset shield --exec -- ./bench.py
#
# To use the full power of the CPU, skip all the other steps and
# just run with
# taskset --cpu-list 0 ./bench.py
import os
import sys
import timeit
import numpy as np
NSERIES = (128, )
POWS = 2**np.arange(2, 23, dtype=int)
# Size of one dimensional numpy arrays of dtype 'float64':
# A fixed overhead of 96 bytes plus a variable size:
# (n_items x 8 bytes)
def load_data_row(x, time_series):
    """Store one time series per row"""
    for row, ts in enumerate(time_series):
        x[row,:] = ts
    return x
def load_data_column(x, time_series):
    """Store one time series per column"""
    for column, ts in enumerate(time_series):
        x[:, column] = ts
    return x
if __name__ == '__main__':
    for nseries in NSERIES:
        print(30*'=', '\n', nseries)
        float_items = POWS
        byte_sizes = (float_items*8) #+ 96
        bads = []
        goods = []
        results = open(f'results_ns{nseries}', 'wt')
        for i, len_one_series in enumerate(float_items):
            time_series = np.zeros((nseries, len_one_series), dtype='float64')
            x = np.empty((nseries, len_one_series), dtype='float64')
            print('Timing good...')
            good = min(timeit.repeat(lambda: load_data_row(x, time_series), number=5))/5
            x = np.empty((len_one_series, nseries), dtype='float64')
            print('Timing bad...')
            bad = min(timeit.repeat(lambda: load_data_column(x, time_series), number=5))/5
            print(f'{len_one_series}/{POWS[-1]} {good} {bad}')
            bads.append(bad)
            goods.append(good)
            results.write(f'{byte_sizes[i]} {good} {bad}\n')
            results.flush()
        results.close()

106
benchmark_python/bench_plot.py Executable file
View file

@ -0,0 +1,106 @@
#!/usr/bin/python3
import os
import sys
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']
from bench import NSERIES
def get_xlabels(x):
    xlabels = []
    for value in x:
        b = int(2**value)
        if b < 1024:
            xlabels.append(f'{b}B')
        elif b < 1048576:
            xlabels.append(f'{b//1024}K')
        elif b < 1073741824:
            xlabels.append(f'{b//1024//1024}M')
        else:
            xlabels.append(f'{b//1024//1024//1024}G')
    return xlabels
def get_ylabels(y):
    ylabels = []
    for power in y:
        power = int(np.log10(power))
        if power < -6:
            value = 10**(power+9)
            ylabels.append(f'{value}ns')
        elif power < -3:
            value = 10**(power+6)
            ylabels.append(f'{value}μs')
        elif power < 0:
            value = 10**(power+3)
            ylabels.append(f'{value}ms')
        else:
            value = 10**power
            ylabels.append(f'{value}s')
    return ylabels
prefix = 'results_ns'
maxy = 1e1
miny = 1e-6
for results in (f for f in os.listdir('.') if f.startswith(prefix)):
    num_series = results.removeprefix(prefix)
    sizes, bads, goods = [], [], []
    with open(results, 'r') as fh:
        for line in fh:
            size, good, bad = line.split()
            bads.append(float(bad))
            goods.append(float(good))
            sizes.append(int(size))
    goods = np.array(goods)
    bads = np.array(bads)
    x = np.log2(sizes)
    y1 = goods
    y2 = bads
    # generate two plots: good+bad timings and slowdown plot
    plt.figure(figsize=(8.5, 7.5))
    p1, = plt.semilogy(x, y1, 'o')
    p2, = plt.semilogy(x, y2, 'o')
    plt.xlabel('size of one time series')
    plt.ylabel('loading time')
    plt.grid(None)
    plt.grid(which='both', axis='both')
    plt.xticks(x, get_xlabels(x), rotation=60)
    plt.ylim(miny, maxy)
    yticks = np.logspace(int(np.log10(miny)),
                         int(np.log10(maxy)),
                         num=int(np.log10(maxy/miny))+1)
    plt.yticks(yticks, get_ylabels(yticks))
    plt.tick_params(axis='y', labelright=True, right=True)
    lgd = plt.legend((p1, p2), ('good', 'bad'), frameon=True)
    lgd.get_frame().set_edgecolor('black')
    plt.title(f'Timings\n{num_series} time series')
    plt.savefig(f'loading-timings-ns{num_series}.svg')
    # slowdown plot
    plt.figure(figsize=(8.5, 7.5))
    p1, = plt.plot(x, bads/goods, 'og', label=r'$\frac{\mathrm{time\_bad}}{\mathrm{time\_good}}$')
    plt.xlabel('size of one time series')
    plt.ylabel('slowdown')
    plt.grid(None)
    plt.grid(which='both', axis='both')
    plt.xticks(x, get_xlabels(x), rotation=60)
    plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
    lmaxy = (bads/goods).max()
    yticks = range(0, int(np.ceil(lmaxy))+1)
    yticks_labels = []
    for i in yticks:
        if not i%5:
            yticks_labels.append(str(i))
        else:
            yticks_labels.append('')
    plt.yticks(yticks, yticks_labels)
    #plt.legend((p1,), ('time_bad/time_good',))
    lgd = plt.legend(frameon=True, fontsize=16)
    lgd.get_frame().set_edgecolor('black')
    plt.title(f'Slowdown\n{num_series} time series')
    plt.savefig(f'loading-slowdown-ns{num_series}.svg')


View file

@ -0,0 +1,21 @@
32 3.624640012276359e-05 3.637080008047633e-05
64 3.6378599907038735e-05 3.75960000383202e-05
128 3.5999999818159266e-05 3.659379981399979e-05
256 3.647979974630289e-05 3.8503600080730394e-05
512 3.6466600067797114e-05 4.5260399929247797e-05
1024 3.694279985211324e-05 6.389359987224452e-05
2048 3.771199990296736e-05 0.00010158859986404422
4096 3.9600799937034026e-05 0.00017927039989444892
8192 4.872880017501302e-05 0.00035019980023207606
16384 0.0001166390000435058 0.0009462125999561977
32768 0.00020714380007120782 0.0018965299997944385
65536 0.00041309340012958274 0.003799166400131071
131072 0.0009548601999995298 0.013321203599844011
262144 0.002020656999957282 0.03497214360031649
524288 0.004090915599954314 0.07588242120000359
1048576 0.008223017600175807 0.15468134920010926
2097152 0.01658681320004689 0.3135109368002304
4194304 0.03318921820027754 0.629940018599882
8388608 0.031091862199900788 1.2587968365998676
16777216 0.0620614524003031 2.517574722000063
33554432 0.125347068800329 5.040295794399936

241
exercise-my-solution.ipynb Normal file
View file

@ -0,0 +1,241 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T09:40:28.904Z",
"iopub.status.busy": "2024-03-04T09:40:28.896Z",
"iopub.status.idle": "2024-03-04T09:40:28.978Z",
"shell.execute_reply": "2024-03-04T09:40:28.967Z"
}
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:02:39.062Z",
"iopub.status.busy": "2024-03-04T10:02:39.057Z",
"iopub.status.idle": "2024-03-04T10:02:39.068Z",
"shell.execute_reply": "2024-03-04T10:02:39.071Z"
}
},
"outputs": [],
"source": [
"n_series = 32\n",
"len_one_series = 5*2**20\n",
"time_series = np.random.rand(n_series, len_one_series)\n",
"gap = 16*2**10"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:02:41.027Z",
"iopub.status.busy": "2024-03-04T10:02:41.020Z",
"iopub.status.idle": "2024-03-04T10:02:41.036Z",
"shell.execute_reply": "2024-03-04T10:02:41.040Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Size of one time series: 40 M\n",
"Size of collection: 1280 M\n",
"Gap size: 128 K\n",
"Gapped series size: 2 K\n"
]
}
],
"source": [
"print(f'Size of one time series: {int(time_series[0].nbytes/2**20)} M')\n",
"print(f'Size of collection: {int(time_series.nbytes/2**20)} M')\n",
"print(f'Gap size: {int(gap*8/2**10)} K')\n",
"print(f'Gapped series size: {int(time_series[0, ::gap].nbytes/2**10)} K')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function implements an approximation of a power series computed on every `gap`-th value in our time series.\n",
"\n",
"If we define one time series of length `N` to be:\n",
"\n",
"$[x_0, x_1, x_2, ..., x_{N-1}]$,\n",
"\n",
"then the \"gapped\" series with `gap=g` is:\n",
"\n",
"$[x_0, x_g, x_{2g}, ..., x_{(M-1)g}]$,\n",
"\n",
"where $M = \\lceil N/g \\rceil$ is the number of elements in the gapped series.\n",
"\n",
"The approximation of the power series up to power `29` (the code loops over `range(30)`) for our \"gapped\" series is defined as:\n",
"\n",
"$$\\mathbf{P} = \\sum_{p=0}^{29} \\sum_i x_i^{p} = \\sum_i x_i^0 + \\sum_i x_i^1 + \\sum_i x_i^2 + ... + \\sum_i x_i^{29} $$\n",
"\n",
"where $i \\in [0, g, 2g, ..., (M-1)g]$"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:06:08.461Z",
"iopub.status.busy": "2024-03-04T10:06:08.459Z",
"iopub.status.idle": "2024-03-04T10:06:08.466Z",
"shell.execute_reply": "2024-03-04T10:06:08.468Z"
}
},
"outputs": [],
"source": [
"# compute an approximation of a power series for a collection of gapped timeseries\n",
"def power(time_series, P, gap):\n",
"    for row in range(time_series.shape[0]):\n",
"        for pwr in range(30):\n",
"            P[row] += (time_series[row, ::gap]**pwr).sum()\n",
"    return P\n",
"    "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Challenge\n",
"- Can you improve on the above implementation of the `power` function?\n",
"- Change the following `power_improved` function and see what you can do\n",
"- **Remember**: you can't change any other cell in this notebook!"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:06:08.461Z",
"iopub.status.busy": "2024-03-04T10:06:08.459Z",
"iopub.status.idle": "2024-03-04T10:06:08.466Z",
"shell.execute_reply": "2024-03-04T10:06:08.468Z"
}
},
"outputs": [],
"source": [
"def power_improved(time_series, P, gap):\n",
"    y = time_series[:,::gap].copy()\n",
"    for row in range(time_series.shape[0]):\n",
"        for pwr in range(30):\n",
"            P[row] += (y[row, :]**pwr).sum()\n",
"    return P"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# verify that they yield the same results\n",
"P = np.zeros(n_series, dtype='float64')\n",
"out1 = power(time_series, P, gap)\n",
"P = np.zeros(n_series, dtype='float64')\n",
"out2 = power_improved(time_series, P, gap)\n",
"np.testing.assert_allclose(out1, out2)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:06:14.959Z",
"iopub.status.busy": "2024-03-04T10:06:14.956Z",
"iopub.status.idle": "2024-03-04T10:06:17.437Z",
"shell.execute_reply": "2024-03-04T10:06:17.443Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"38.9 ms ± 492 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"P = np.zeros(n_series, dtype='float64')\n",
"%timeit power(time_series, P, gap)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-03-04T10:06:20.056Z",
"iopub.status.busy": "2024-03-04T10:06:20.053Z",
"iopub.status.idle": "2024-03-04T10:06:21.695Z",
"shell.execute_reply": "2024-03-04T10:06:21.700Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6.79 ms ± 35.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"P = np.zeros(n_series, dtype='float64')\n",
"%timeit power_improved(time_series, P, gap)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
},
"nteract": {
"version": "0.28.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

31
numpy/README.md Normal file
View file

@ -0,0 +1,31 @@
# Anatomy of a numpy array
## one dimension, float64
![1d array](ndarray-memory-layout-1d.svg)
## two dimensions, square, float64
![2d array - square](ndarray-memory-layout-2d-square.svg)
## two dimensions, rectangular, int32
![2d array - rectangular](ndarray-memory-layout-2d-rectangular.svg)
## what about Python lists?
![memory layout of a Python list](python-list-memory-layout.svg)
## interesting attributes of numpy arrays
- `x.data`, `x.data.hex()`, `x.data.format`, `x.tobytes()`
- `x.flags`:
- `OWNDATA`
- `C_CONTIGUOUS`
- `F_CONTIGUOUS`
- more [flags](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flags.html)
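The attributes listed above can be inspected on a small array. The following is a minimal sketch; the array `x`, its shape and its dtype are assumptions chosen for illustration.

```python
import numpy as np

# 2 rows, 3 columns, 4 bytes per item
x = np.arange(6, dtype='int32').reshape(2, 3)

# default layout is C-contiguous (row-major)
print(x.flags.c_contiguous, x.flags.f_contiguous)
# strides: number of bytes to skip to reach the next row / the next column
print(x.strides)
# raw bytes of the buffer: the items of one row sit next to each other
print(x.tobytes().hex())

# transposing returns a view on the same buffer: no data is copied,
# only the strides are swapped
xt = x.T
print(xt.flags.owndata, xt.flags.f_contiguous, xt.strides)
print(np.shares_memory(x, xt))
```

Since `xt` does not own its data, writing to `xt` also changes `x`.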
## If your arrays are bigger than RAM
- [`numpy.memmap`](https://numpy.org/doc/stable/reference/generated/numpy.memmap.html): an array-like
object that maps memory to an array stored on disk, used for accessing small segments of large
files on disk, without reading the entire file into memory. Use with caution!
- [`HDF5`](https://support.hdfgroup.org/documentation/hdf5/latest/_intro_h_d_f5.html): hierarchical
data format, with arbitrary metadata and multi-language support,
accessible from Python through [`h5py`](https://docs.h5py.org/en/stable/) with an array-like interface
- other projects, for example [`xarray`](https://docs.xarray.dev/en/stable/)
and [`zarr`](https://zarr.readthedocs.io/en/stable/)
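As an illustration of the first option, here is a minimal `numpy.memmap` sketch. The file name `big_array.dat` and the shape are hypothetical and kept tiny; in a real use case the file would be larger than the available RAM.

```python
import os
import tempfile

import numpy as np

# hypothetical file name and shape, chosen only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'big_array.dat')
shape = (128, 1024)

# create the file-backed array and write a single row
mm = np.memmap(path, dtype='float64', mode='w+', shape=shape)
mm[3, :] = np.arange(shape[1])
mm.flush()  # make sure the changes are written to the file
del mm

# reopen read-only and copy only a small segment into RAM
ro = np.memmap(path, dtype='float64', mode='r', shape=shape)
segment = np.array(ro[3, :10])
print(segment)
```

Note that the shape and dtype are not stored in the file: they have to be supplied again when the file is reopened.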


24
parallel/README.md Normal file
View file

@ -0,0 +1,24 @@
# The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)
- Go through the [notebook](../parallel.ipynb) to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control
- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
- run [`overcommit.py`](overcommit.py) while monitoring with htop
- try the [`submit.sh`](submit.sh) script
- see problems with overcommitting
- explain the PSI (Pressure Stall Information) fields in `htop`. Useful readings:
- https://docs.kernel.org/accounting/psi.html
- https://facebookmicrosites.github.io/psi/docs/overview
- Discuss implications for local and cluster workflows
# Hands on
- Let's try to make it more quantitative:
- Write a benchmark in the style of [benchmark_python](../benchmark_python/bench.py)
- We want to assess the performance of matrix multiplication as a function of:
- the size of the matrix `N`
- the number of OpenMP threads `T`, controlled with `threadpoolctl` or by environment variable `OMP_NUM_THREADS`
- the number of processes `P`, controlled by the [`submit.sh`](submit.sh) script or something similar
- The results will of course depend on the particular architecture of the machine on which you are running
- Submit your benchmark, together with some plotting routines, as a PR to this repo!
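A possible starting point for such a benchmark is sketched below. It is a skeleton under stated assumptions, not a reference solution: the matrix sizes in `SIZES` and the helper `time_matmul` are made up for illustration, and it covers only the matrix size `N` and the thread count `T` (set through `OMP_NUM_THREADS`), leaving the number of processes `P` to the submission script.

```python
import os

# OMP_NUM_THREADS must be set before numpy is imported to have an effect;
# whether it is honoured depends on the BLAS library numpy is linked against
os.environ.setdefault('OMP_NUM_THREADS', '1')

import timeit

import numpy as np

# assumed matrix sizes, kept small for a quick run
SIZES = (64, 128, 256)


def time_matmul(n, number=3, repeat=3):
    """Return the best time in seconds for one n x n matrix multiplication."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return min(timeit.repeat(lambda: a @ b, number=number, repeat=repeat)) / number


threads = os.environ['OMP_NUM_THREADS']
results = {n: time_matmul(n) for n in SIZES}
for n, seconds in results.items():
    print(f'N={n} T={threads} {seconds:.2e} s')
```

Running the script several times with different values of `OMP_NUM_THREADS` in the environment gives one timing curve per thread count.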

View file

@ -0,0 +1,37 @@
MSVG = 'template.svg'
TEMPL = '0xNN'
NREG = 4
N = 30
import sys
import subprocess
if N > 255:
    print('cannot do it!')
    sys.exit()
addresses = [hex(i).replace("0x", "0x0") for i in range(N//2+1)] + \
            [hex(i) for i in range(N//2+1,N)] + \
            [f'REG{i}' for i in range(NREG)]
msvg = open(MSVG).read()
delete = []
to_join = []
for a in addresses:
    print(f'Processing {a}...')
    new_svg = a+MSVG
    new_svg_pdf = new_svg.replace('.svg', '.pdf')
    with open(new_svg, 'wt') as svg:
        svg.write(msvg.replace(TEMPL,a))
    delete.extend([new_svg, new_svg_pdf])
    subprocess.run(['inkscape', '--export-filename='+new_svg_pdf, new_svg], capture_output=True)
    to_join.append(new_svg_pdf)
subprocess.run(['pdftk'] + to_join + ['output', 'materials.pdf'], capture_output=True)
subprocess.check_call(['rm'] + delete)

BIN
setup/materials.pdf Normal file

Binary file not shown.

105
setup/template.svg Normal file
View file

@ -0,0 +1,105 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
width="297mm"
height="210mm"
viewBox="0 0 297 210"
version="1.1"
id="svg1"
inkscape:version="1.4 (e7c3feb100, 2024-10-09)"
sodipodi:docname="template.svg"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview1"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:document-units="mm"
showguides="false"
inkscape:zoom="0.8979798"
inkscape:cx="543.44207"
inkscape:cy="383.08211"
inkscape:window-width="1681"
inkscape:window-height="1210"
inkscape:window-x="654"
inkscape:window-y="108"
inkscape:window-maximized="0"
inkscape:current-layer="layer1" />
<defs
id="defs1">
<rect
x="416.49044"
y="455.46682"
width="189.31384"
height="63.475816"
id="rect3" />
</defs>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1">
<path
style="fill:#000000;stroke-width:0.264999"
d="M 8.5446427,90.75 C 284.33035,91.339285 283.74107,91.339285 283.74107,91.339285"
id="path1" />
<text
xml:space="preserve"
transform="scale(0.26458333)"
id="text3"
style="font-size:16px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-1.2px;writing-mode:lr-tb;direction:ltr;white-space:pre;shape-inside:url(#rect3);fill:#000000;stroke:#000000;stroke-width:11.3386;stroke-dasharray:none;stroke-opacity:1" />
<text
xml:space="preserve"
style="font-size:4.23333px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
x="113.4375"
y="140.83928"
id="text4"><tspan
sodipodi:role="line"
id="tspan4"
style="stroke-width:0.1;stroke-dasharray:none;stroke:#000000;fill:#000000"
x="113.4375"
y="140.83928" /></text>
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="M 13.503009,105.28242 C 283.49699,104.71758 283.49699,104.71758 283.49699,104.71758"
id="path2" />
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="M 13.503011,199.34547 C 283.49699,198.78063 283.49699,198.78063 283.49699,198.78063"
id="path2-1" />
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="m 13.50301,12.253503 c 269.99398,-0.56484 269.99398,-0.56484 269.99398,-0.56484"
id="path2-4" />
<text
xml:space="preserve"
style="font-size:50.8px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
x="84.542786"
y="162.20676"
id="text5"><tspan
sodipodi:role="line"
id="tspan5"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:50.8px;font-family:'MesloLGL Nerd Font Mono';-inkscape-font-specification:'MesloLGL Nerd Font Mono';stroke-width:0.1"
x="84.542786"
y="162.20676">0xNN</tspan></text>
<text
xml:space="preserve"
style="font-size:50.8px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
x="-212.45721"
y="-47.793243"
id="text5-7"
transform="scale(-1)"><tspan
sodipodi:role="line"
id="tspan5-4"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:50.8px;font-family:'MesloLGL Nerd Font Mono';-inkscape-font-specification:'MesloLGL Nerd Font Mono';stroke-width:0.1"
x="-212.45721"
y="-47.793243">0xNN</tspan></text>
</g>
</svg>
