Compare commits (4 commits): cae320374b, 3dbc354460, b2e104b696, 2ea0c0b60c

README.md
# What every scientist should know about computer architecture

## Introduction

- [Puzzle](puzzle.ipynb)
- Question: how come swapping the dimensions in a for-loop causes such a huge slowdown?
- Let students play around with the notebook and try to find the "bug"
- A more thorough [benchmark](benchmark_python/)

## A digression on CPU architecture and the memory hierarchy

- Go to [A Primer on CPU architecture](architecture/)
- Measure the sizes and timings of the memory hierarchy on my machine with a low-level [C benchmark](benchmark_low_level/)

## Analog programming

- [Two exercises to activate the body and the mind](analog_programming.md)

## Back to the Python benchmark (second try)

- can we explain what is happening?
- it must have to do with good (or bad) use of the caches
- but how are numpy arrays laid out in memory?

## Anatomy of a numpy array

- [memory layout of numpy arrays](numpy/)

## Back to the Python benchmark (third try)

- can we explain what is happening now? Yes, more or less ;-)
- quick fix for the [puzzle](puzzle.ipynb): try adding `order='F'` to the "bad" snippet and see that it "fixes" the bug ➔ why?
- the default memory layout is called "C-contiguous" or "row-major":

  ```python
  import numpy as np

  np.zeros((2,2)).flags.c_contiguous == True
  np.zeros((2,2)).flags.f_contiguous == False
  ```

- note that for one-dimensional arrays it makes no difference:

  ```python
  import numpy as np

  np.zeros(2).flags.c_contiguous == True
  np.zeros(2).flags.f_contiguous == True
  ```

- rule of thumb for multi-dimensional numpy arrays:
  - the right-most index should be the inner-most loop in a series of nested loops over the dimensions of a multi-dimensional array
  - the previous rule can be remembered as *the right-most index changes the fastest* in a series of nested loops
  - logically contiguous data, for example the data points of a single time series, should be stored along the right-most dimension:

    ```python
    import numpy as np

    n_series, length_of_one_series = 100, 10_000  # example sizes

    x = np.zeros((n_series, length_of_one_series))  # ➔ good!
    y = np.zeros((length_of_one_series, n_series))  # ➔ bad!
    ```

  - … unless of course you plan to mostly loop *across* time series :)
- watch out when migrating code from MATLAB®: it stores data in memory using the opposite convention, column-major order!
- **DANGER**: watch out when working with [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html):

  ➔ the data are stored in memory using different conventions depending on how the `DataFrame` was initialized! Be sure to check the `DataFrame.values.flags` attribute!

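A minimal sketch of the rule of thumb above, assuming only NumPy and the standard library. It first inspects `strides` (the number of bytes to step in memory to move by one element along each axis) for a C-ordered and an F-ordered array, then times summing a C-ordered array row by row (contiguous slices) versus column by column (strided slices). The size `n = 2000` and the helpers `sum_rows`/`sum_cols` are arbitrary illustrative choices, and the timings depend on your machine.

```python
import timeit

import numpy as np

# strides: bytes to jump in memory to move one step along each axis
c = np.zeros((3, 4))                # default: C-contiguous (row-major)
f = np.zeros((3, 4), order='F')     # Fortran-contiguous (column-major)
print(c.strides)   # right-most axis has the smallest stride (8 bytes = 1 float64)
print(f.strides)   # left-most axis has the smallest stride

n = 2000                            # ~32 MB of float64, larger than a typical L3 cache
a = np.random.default_rng(0).random((n, n))   # C-contiguous


def sum_rows(x):
    # each slice runs over the right-most index: contiguous memory
    return sum(x[i, :].sum() for i in range(x.shape[0]))


def sum_cols(x):
    # each slice runs over the left-most index: every step jumps a whole row
    return sum(x[:, j].sum() for j in range(x.shape[1]))


t_rows = timeit.timeit(lambda: sum_rows(a), number=3)
t_cols = timeit.timeit(lambda: sum_cols(a), number=3)
print(f'row-wise: {t_rows:.3f} s, column-wise: {t_cols:.3f} s')
```

On a C-ordered array one would expect `sum_cols` to be the slower of the two; converting `a` with `np.asfortranarray(a)` should flip the roles, which is the same effect as the `order='F'` quick fix for the puzzle.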
## A final exercise to put it all together

- fork this repo to your account and clone your fork on your laptop
- create a branch `ex` and switch to it
- work on the [exercise](exercise.ipynb)
- push your solution to your fork and create a Pull Request to this repo

## Notes on the benchmarks

- while running the benchmarks pinned to one core of my laptop, the core was running under a constant load of 100% (almost entirely user time) at a fixed frequency of 3.8 GHz, whereas the theoretical maximum would be 5.2 GHz

➔ does the CPU avoid "starving" by scaling its frequency down to match the memory throughput? Or am I misinterpreting this? Does this problem, which at first sight should be perfectly memory-bound, become CPU-bound, or actually exactly balanced? From the [Lenovo Press guide on tuning UEFI settings for 4th Gen Intel Xeon Scalable processors](https://lenovopress.lenovo.com/lp1836-tuning-uefi-settings-4th-gen-intel-xeon-scalable-processor):

> **Energy Efficient Turbo**
>
> When `Energy Efficient Turbo` is enabled, the CPU’s optimal turbo
> frequency will be tuned dynamically based on CPU utilization. The actual
> turbo frequency the CPU is set to is proportionally adjusted based on the
> duration of the turbo request. Memory usage of the OS is also monitored.
> If the OS is using memory heavily and the CPU core performance is limited
> by the available memory resources, the turbo frequency will be reduced
> until more memory load dissipates, and more memory resources become
> available. The power/performance bias setting also influences energy
> efficient turbo. `Energy Efficient Turbo` is best used when attempting to
> maximize power consumption over performance.

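One possible way to reproduce a "pinned to one core" setup on Linux, sketched in Python (the text does not say which tool was actually used): `os.sched_setaffinity` restricts the current process to a single core, and the core's current clock can be read from the cpufreq sysfs file `/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq`, which reports kHz. The helper names are hypothetical, and the sysfs file may be missing in some VMs and containers, in which case `current_freq_ghz` returns `None`.

```python
import os


def pin_to_core(core):
    """Restrict the current process (pid 0 = self) to one CPU core (Linux only)."""
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


def current_freq_ghz(core):
    """Current frequency of `core` in GHz, or None if cpufreq is not available."""
    path = f'/sys/devices/system/cpu/cpu{core}/cpufreq/scaling_cur_freq'
    try:
        with open(path) as fh:
            return int(fh.read()) / 1e6   # the file reports kHz
    except (OSError, ValueError):
        return None


core = min(os.sched_getaffinity(0))   # any core this process is allowed to use
print('running on cores:', pin_to_core(core))
print('current frequency (GHz):', current_freq_ghz(core))
```

Sampling `current_freq_ghz` repeatedly while a benchmark runs is one way to observe a frequency cap such as the 3.8 GHz mentioned above.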
## Concluding remarks

- how is all of this relevant for the users of a computing cluster?
- Never trust benchmarks! See for example [Producing Wrong Data Without Doing Anything Obviously Wrong!](https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf)

## Additional material if there's time left

- [Excerpts of parallel Python](parallel)
- how does memory *allocation* to processes work at the OS level?
  - virtual memory
  - swap
  - optimistic over-committing allocation policies
  - the OOM killer (out-of-memory killer) watchdog

architecture/GPUvsCPU-architecture.png
architecture/NAND-gate.svg
<?xml version="1.0" encoding="utf-8"?>
<!-- Generator: Adobe Illustrator 12.0.1, SVG Export Plug-In . SVG Version: 6.00 Build 51448) -->
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd" [
<!ENTITY ns_svg "http://www.w3.org/2000/svg">
<!ENTITY ns_xlink "http://www.w3.org/1999/xlink">
<!ENTITY st0 "letter-spacing:-3px;">
<!ENTITY st1 "fill:none;">
<!ENTITY st2 "fill:none;stroke:#000000;stroke-width:3px;">
<!ENTITY st3 "fill:none;stroke:#000000;stroke-width:6px;">
<!ENTITY st4 "font-size:36px;">
<!ENTITY st5 "font-size:48px;">
<!ENTITY st6 "font-family:Times,'Times New Roman',serif; font-weight:normal; font-style:italic;">
]>
<svg version="1.1" id="Layer_2" xmlns="&ns_svg;" xmlns:xlink="&ns_xlink;" width="648" height="531" viewBox="0 0 648 531"
style="overflow:visible;enable-background:new 0 0 648 531;" xml:space="preserve">
<rect style="&st1;" width="648" height="531"/>
<g>
<g>
<line style="&st3;" x1="441" y1="297" x2="441" y2="369"/>
<line style="&st2;" x1="441" y1="324" x2="486" y2="297"/>
<g>
<line style="&st2;" x1="441" y1="342" x2="486" y2="369"/>
<polygon points="466.9,366.817 486,369 475.086,353.175 "/>
</g>
<circle style="&st2;" cx="459" cy="333" r="45"/>
</g>
<g>
<line style="&st3;" x1="288" y1="234" x2="216" y2="234"/>
<line style="&st2;" x1="261" y1="234" x2="288" y2="279"/>
<g>
<line style="&st2;" x1="243" y1="234" x2="216" y2="279"/>
<polygon points="218.183,259.9 216,279 231.825,268.086 "/>
</g>
<g>
<line style="&st2;" x1="261" y1="234" x2="234" y2="279"/>
<polygon points="236.183,259.9 234,279 249.825,268.086 "/>
</g>
<circle style="&st2;" cx="252" cy="252" r="45"/>
</g>
<polyline style="&st2;" points="288,279 360,279 360,333 441,333 "/>
<polyline style="&st2;" points="234,279 180,369 81,369 "/>
<polyline style="&st2;" points="216,279 189,324 81,324 "/>
<line style="&st2;" x1="81" y1="468" x2="594" y2="468"/>
<line style="&st2;" x1="486" y1="495" x2="540" y2="495"/>
<line style="&st2;" x1="495" y1="504" x2="531" y2="504"/>
<line style="&st2;" x1="504" y1="513" x2="522" y2="513"/>
<line style="&st2;" x1="513" y1="495" x2="513" y2="468"/>
<line style="&st2;" x1="486" y1="369" x2="486" y2="468"/>
<line style="&st2;" x1="252" y1="234" x2="252" y2="171"/>
<polyline style="&st2;" points="252,171 266,164.25 238,150.75 266,137.25 238,123.75 266,110.25 238,96.75 252,90 "/>
<polyline style="&st2;" points="486,171 500,164.25 472,150.75 500,137.25 472,123.75 500,110.25 472,96.75 486,90 "/>
<line style="&st2;" x1="486" y1="297" x2="486" y2="171"/>
<line style="&st2;" x1="252" y1="90" x2="252" y2="45"/>
<line style="&st2;" x1="486" y1="45" x2="486" y2="90"/>
<line style="&st2;" x1="81" y1="45" x2="594" y2="45"/>
<line style="&st2;" x1="486" y1="216" x2="594" y2="216"/>
<text transform="matrix(1 0 0 1 18 54)"><tspan x="0" y="0" style="&st6; &st5; &st0;">V</tspan><tspan x="25.48" y="5" style="&st6; &st4;">cc</tspan></text>
<text transform="matrix(1 0 0 1 18 459)" style="&st6; &st5;">GND</text>
<text transform="matrix(1 0 0 1 170 140)" style="&st6; &st5;">R1</text>
<text transform="matrix(1 0 0 1 410 140)" style="&st6; &st5;">R2</text>
<text transform="matrix(1 0 0 1 120 230)" style="&st6; &st5;">VT1</text>
<text transform="matrix(1 0 0 1 380 280)" style="&st6; &st5;">VT2</text>
<text transform="matrix(1 0 0 1 19.2959 333)" style="&st6; &st5;">A</text>
<text transform="matrix(1 0 0 1 18.4336 377)" style="&st6; &st5;">B</text>
<text transform="matrix(1 0 0 1 603 223.1104)" style="&st6; &st5;">Q</text>
</g>
</svg>

architecture/README.md
# A Primer on computer architecture

## Binary representation of common data types

- **integers**
  - int32 ➔ 32b (bits) = 4B (bytes)
  - 1 bit for the sign, 31 bits for the value, in two's complement: the sign bit has weight -2^31, which is why the range is asymmetric
  - min = -2^31 = -2,147,483,648
  - max = 2^31-1 = 2,147,483,647
  - [visualization](https://manderc.com/apps/umrechner/index_eng.php)
  - In Python:
    - `bin(14)` ➔ `'0b1110'`
    - `np.iinfo(np.int32)` ➔ `iinfo(min=-2147483648, max=2147483647, dtype=int32)`
  - Python integers, as opposed to numpy integer types, are represented with a flexible number of bits: `sys.int_info` ➔ `bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640`
  - they are called "long" or "superlong" integers, because they can have arbitrary size. The low-level implementation is explained in:
    - [Arpit Bhayani's blog](https://arpitbhayani.me/blogs/long-integers-python/)
    - [Artem Golubin's blog](https://rushter.com/blog/python-integer-implementation/)

- **real numbers**, a.k.a. floating point numbers (IEEE-754 standard):
  - float64 ➔ 64b (bits) = 8B (bytes)
  - 1 bit for the sign, 11 bits for the exponent, 52 bits for the mantissa (53 significant bits counting the implicit leading 1, see `mant_dig=53` below)
  - min, max ≈ ± 1.8 × 10^308
  - smallest positive normal number ≈ 2.2 × 10^-308
  - examples in Python:
    - `np.finfo(np.float64)` ➔ `finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)`
    - `sys.float_info` ➔ `max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1`
    - next floating point number after `1.` in the direction of `2.`: `np.nextafter(1., 2.)` ➔ `1.0000000000000002`
  - watch out for equality between floating point numbers:
    - `1.1 + 2.2 == 3.3` ➔ `False`
    - `(1.1+2.2).hex()` ➔ `0x1.a666666666667p+1` and `(3.3).hex()` ➔ `0x1.a666666666666p+1`
  - visualization:
    - [`float32`](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
    - [all types](https://float.exposed/) with [explanation](https://ciechanow.ski/exposing-floating-point/)
  - Docs with more details: [What Every Programmer Should Know About Floating-Point Arithmetic or Why don’t my numbers add up?](https://floating-point-gui.de)
  - Docs with the gory details, a.k.a. the floating point bible: [What every computer scientist should know about floating-point arithmetic](https://doi.org/10.1145/103162.103163)

- **strings**:
  - UTF-8 encoded, variable width from 1B (byte) to 4B (bytes): 1,112,064 Unicode characters (code points)
  - ASCII: 7 bits (fits in one byte), 128 characters ➔ [ASCII table](https://upload.wikimedia.org/wikipedia/commons/2/26/ASCII_Table_%28suitable_for_printing%29.svg)
  - [visualization](https://sonarsource.github.io/utf8-visualizer/)
  - actually, for memory efficiency, Python strings (more precisely: unicode objects) are stored in different formats depending on which characters they contain. Look at the gory details [here](https://docs.python.org/3.14/c-api/unicode.html) ➔ not for the faint-hearted!

- **hexadecimal notation**:
  - base16 ➔ '0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f'
  - a compact way of representing binary data: 8 bits = 1 byte = 2 hexadecimal digits
  - Example: `254` (decimal) ➔ `1111 1110` (binary) ➔ `fe` (hex)

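The representations listed above can be inspected from plain Python with the standard `struct` module. The helpers `int32_bits` and `float64_fields` are illustrative names, not library functions: the first masks an integer to 32 bits to show its two's complement pattern, the second reinterprets the 8 bytes of a float64 as an unsigned integer and splits off the sign, the 11-bit biased exponent and the 52-bit fraction.

```python
import struct


def int32_bits(n):
    """Bit pattern of n as a 32-bit two's complement integer (string of 0/1)."""
    return format(n & 0xFFFFFFFF, '032b')


def float64_fields(x):
    """Split a float64 into its (sign, biased exponent, fraction) bit fields."""
    # reinterpret the 8 bytes of the double as an unsigned 64-bit integer
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)     # 52 bits, the leading 1 is implicit
    return sign, exponent, fraction


print(int32_bits(14))        # 28 zeros, then 1110
print(int32_bits(-1))        # 32 ones
print(int32_bits(-2**31))    # a single 1 (the sign bit), then 31 zeros

print(float64_fields(1.0))                 # biased exponent 1023 means 2**0
print(float64_fields(-2.0))                # sign bit set, exponent 1024 means 2**1
print(float64_fields(1.0000000000000002))  # np.nextafter(1., 2.): fraction is just 1

# UTF-8: 1 to 4 bytes per code point, ASCII characters take exactly 1 byte
for ch in ['A', 'é', '€', '😀']:
    encoded = ch.encode('utf-8')
    print(ch, encoded, len(encoded), 'byte(s)')

# hexadecimal: two hex digits per byte
print(format(254, '08b'), hex(254), bytes([254, 1]).hex())
```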
## CPU architecture



- Primer on CPU (x86_64) architecture and the memory hierarchy:
  - CPU registers ≈ 160 (plus another ~500 model specific), latency: 0 cycles, capacity: 8 bytes each (general-purpose registers)
  - x86-64 instruction set, with ≈ 2000 instructions with mnemonic (plus an unknown number of "undocumented" instructions ~ 10k)
  - SIMD (single instruction, multiple data) CPUs: one instruction operates on 8 or 16 data units at once
  - L-Caches: L1/L2/L3, with cache lines of 64B, latencies: 1-40 cycles, capacity: ~KB and ~MB
  - Main memory: RAM pages of 4KB (huge pages: 2MB or 1GB), latency: 50~100 ns (a few hundred cycles), capacity ~GBs
  - Storage (local disks): disk transfer blocks 4KB to 64MB, latency: 0.1ms (300k cycles), capacity: ~TBs
  - Remote Storage (network): typically limited by the Ethernet connection (1-10 Gbit/s), latency: 10~100 ms, capacity: ∞
- Understand the trade-offs involved:
  - **capacity** measured in TB/GB/MB/KB/B
  - **latency** ≈ (time-when-data-available-on-output-pins – time-when-data-requested) ➔ measured in nanoseconds or in (CPU) cycles
  - **bandwidth** ≈ clock frequency × data-transfers/tick × bus-width (in bytes) ➔ measured in TB/GB/MB/KB/B per second (this is what is usually advertised as the **speed**)
  - data **volatility** vs. **persistence**
  - cost
  - physical limits (heat dissipation, density, size, lifetime)
  - temporal and spatial locality of data
- The gory details about the memory hierarchy: [What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf) by the notorious Ulrich Drepper



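The bandwidth formula above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes DDR memory (2 transfers per clock tick) on a 64-bit (8-byte) wide memory bus, and uses the bus clocks of DDR4-3200 and DDR5-4800 from `memory.csv` and of the DDR5-5200 module in the laptop section below; `peak_bandwidth` is an illustrative helper, not a library function.

```python
def peak_bandwidth(clock_hz, transfers_per_tick, bus_width_bytes):
    """Theoretical peak bandwidth in bytes/s: clock × transfers/tick × bus width."""
    return clock_hz * transfers_per_tick * bus_width_bytes


# bus clocks in MHz; DDR = 2 transfers per tick, 8-byte bus assumed
for name, clock_mhz in [('DDR4-3200', 1600), ('DDR5-4800', 2400), ('DDR5-5200', 2600)]:
    bw = peak_bandwidth(clock_mhz * 10**6, 2, 8)
    print(f'{name}: {bw / 1e9:.1f} GB/s')
```

These are theoretical peak figures for a single 64-bit channel; the throughput a benchmark actually sustains is typically lower.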
# …and what about the GPU?



- A GPU has many (in the order of hundreds) SIMT (single instruction, multiple threads) cores, the so-called SMs (Streaming Multiprocessors), each one with its own local L1 cache, plus a shared L2 cache and shared RAM (which, due to the high parallelism, has a huge bandwidth, in the order of ~1TB/s)
- The SMs are specialized for specific data types. In order of abundance, the following data types are supported: int8, int32, int64, float16, float32, float64
- Performance depends on:
  - memory bandwidth: usually higher than that of the CPU's RAM
  - "math" bandwidth: usually higher than the CPU's, but much more limited in capability; for example, branches (if/else) are expensive
  - latency: usually much higher than the CPU's ➔ the more threads run in parallel, the better the high latency is amortized (latency "hiding")
  - spatial locality is extremely critical
- A portion of the GPU RAM is accessible to the CPU ➔ the GPU performs the copies
- The PCI bus (Peripheral Component Interconnect bus) is the bottleneck: data needs to flow from main (CPU) memory to GPU memory and back!
- Problems on a cluster: the GPU does not really support workloads from multiple users simultaneously!

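The latency "hiding" mentioned above can be quantified with Little's law: to sustain a bandwidth B with a latency L, about B × L bytes must be in flight at any time. A back-of-the-envelope sketch; the 1 TB/s, the 500 ns GPU memory latency, the 4-byte load per thread and the CPU-like 40 GB/s / 100 ns pair are illustrative assumptions, not figures for a specific device.

```python
def bytes_in_flight(bandwidth_gb_s, latency_ns):
    """Little's law: bytes that must be in transit to sustain a given bandwidth.

    1 GB/s (10**9 bytes/s) sustained over 1 ns corresponds to exactly 1 byte,
    so the product of the two numbers is directly a byte count.
    """
    return bandwidth_gb_s * latency_ns


# GPU-like numbers (assumed): ~1 TB/s of memory bandwidth, 500 ns latency
needed = bytes_in_flight(1000, 500)

# if every thread has a single 4-byte load outstanding (an assumption),
# this many threads must be active at once to keep the memory busy
threads = needed // 4
print(f'{needed} bytes in flight, {threads} threads')

# CPU-like numbers (assumed): 40 GB/s, 100 ns, expressed in 64-byte cache lines
print(bytes_in_flight(40, 100) // 64, 'cache lines in flight')
```

The contrast between the two results is one way to see why a GPU needs tens of thousands of threads while a CPU gets by with a handful of outstanding cache misses per core.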
# Computer Architecture (a concrete example)

My Laptop:



- Lenovo - T14 Gen 4
- CPU i7-1365U:
  - 2× "performance cores" (Intel Core) max 5.20 GHz (0.19 ns/cycle) with Hyper-Threading
  - 8× "efficient cores" (Intel Atom) max 3.90 GHz (0.26 ns/cycle)
  - L1 (data) cache P-Core 48 KB
  - L1 (data) cache E-Core 32 KB
  - L2 cache P-Core 1280 KB
  - L2 cache E-Core 2048 KB (shared x4)
  - L3 cache 12 MB (shared P+E-Cores)
- RAM DDR5-5200: 32GB (16GB soldered + 16GB bank):
  - Data rate 5200 MT/s, Transfer time 0.192 ns/transfer
  - Command rate (bus clock) 2600 MHz, Cycle time 0.385 ns
  - Internal clock 650 MHz, 1.54 ns
  - CAS Latency 34 cycles, Total latency = CAS latency × cycle time = 13.09 ns, Throughput 40.6 GB/s
- DMI (Direct Media Interface): 8×16 GT/s (≈ 16 GB/s)
- PCI (Peripheral Component Interconnect) Express bridges:
  - Graphics: 16 GT/s (≈ 8 GB/s)
  - 2× Thunderbolt: 2.5 GT/s (≈ 1 GB/s) and 16 GT/s (≈ 8 GB/s)
- GPU Intel Iris, Internal clock 300 MHz-1.30 GHz, memory 4 GB/2.1 GHz with a bandwidth of 68 GB/s

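The DDR5-5200 timings above follow from a few divisions. A small sketch using the values listed (5200 MT/s, 2600 MHz command rate, CAS latency 34) and the 5.2 GHz maximum P-core clock; without rounding the cycle time first, the CAS latency comes out at about 13.08 ns (the 13.09 ns above results from using the rounded 0.385 ns). CAS latency is only one part of the full time needed to fetch data from RAM.

```python
# values from the list above (DDR5-5200 on the T14 Gen 4)
data_rate_mt_s = 5200      # mega-transfers per second
command_rate_mhz = 2600    # bus clock: data is transferred on both clock edges
cas_latency_cycles = 34    # CAS latency, counted in bus clock cycles
p_core_max_ghz = 5.2       # maximum P-core clock

transfer_time_ns = 1000 / data_rate_mt_s       # ≈ 0.192 ns between two transfers
cycle_time_ns = 1000 / command_rate_mhz        # ≈ 0.385 ns per bus clock cycle
cas_ns = cas_latency_cycles * cycle_time_ns    # ≈ 13.08 ns
cpu_cycles = cas_ns * p_core_max_ghz           # P-core cycles spent waiting on CAS alone

print(f'transfer time: {transfer_time_ns:.3f} ns')
print(f'cycle time:    {cycle_time_ns:.3f} ns')
print(f'CAS latency:   {cas_ns:.2f} ns = {cpu_cycles:.0f} P-core cycles at {p_core_max_ghz} GHz')
```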
## Historical evolution of the speed of different components in a computer

(data source: Wikipedia)

### CPU Clock Rate



### Memory (RAM) Clock Cycle Time



### Memory (RAM) Bandwidth



### Memory (RAM) Latency



### Storage Read Speed



architecture/comp_architecture_big_picture.svg
architecture/comp_architecture_big_picture_presentation.pdf
architecture/comp_architecture_schematics.svg
architecture/comp_architecture_schematics_presentation.pdf
architecture/cpu_clock_rate.csv
1969,1 MHz
1971,740 kHz
1972,200 kHz
1972,400 kHz
1972,500 kHz
1973,1 MHz
1973,2 MHz
1973,715 kHz
1974,1.33 MHz
1974,1.4 MHz
1974,1 MHz
1974,2 MHz
1974,400 kHz
1974,500 kHz
1974,715 kHz
1974,740 kHz
1975,1.2 MHz
1975,10 MHz
1975,1 MHz
1975,256 kHz
1975,2 MHz
1975,3.3 MHz
1975,4 MHz
1976,2.5 MHz
1976,3.3 MHz
1976,6.4 MHz
1976,8 MHz
1977,1.0 MHz
1977,2.0 MHz
1977,3.0 MHz
1978,1 MHz
1978,5 MHz
1979,5 MHz
1979,8 MHz
1981,10 MHz
1981,2.5 MHz
1982,18 MHz
1982,1 MHz
1982,6 MHz
1982,8 MHz
1983,2 MHz
1983,3 MHz
1984,16 MHz
1984,5 MHz
1985,12 MHz
1985,5 MHz
1985,8 MHz
1986,15 MHz
1986,16 MHz
1987,10 MHz
1987,12.5 MHz
1987,16 MHz
1987,20 MHz
1987,8 MHz
1988,10 MHz
1988,12 MHz
1988,25 MHz
1989,16-33 MHz
1989,25 MHz
1989,35 MHz
1990,20-30 MHz
1990,40 MHz
1991,100 MHz
1991,33 MHz
1991,62.5-90.91 MHz
1992,100 MHz
1992,100-200 MHz
1992,20 MHz
1992,40 MHz
1992,40-50 MHz
1993,120 MHz
1993,50-80 MHz
1993,55-71.5 MHz
1993,60-66 MHz
1994,100 MHz
1994,100-125 MHz
1994,100-180 MHz
1994,125 MHz
1994,200-300 MHz
1994,50 MHz
1994,60-120 MHz
1994,60-125 MHz
1995,101-118 MHz
1995,143-167 MHz
1995,150-200 MHz
1995,266-333 MHz
1996,141-161 MHz
1996,150 MHz
1996,150-250 MHz
1996,160-180 MHz
1996,180-250 MHz
1996,400-500 MHz
1996,75-100 MHz
1997,120-150 MHz
1997,125 MHz
1997,166-233 MHz
1997,200 MHz
1997,233-300 MHz
1997,233-366 MHz
1997,250-400 MHz
1997,370 MHz
1998,200 MHz
1998,250-300 MHz
1998,250-330 MHz
1998,262 MHz
1998,270-400 MHz
1998,300-440 MHz
1998,450-600 MHz
1998,500 MHz
1999,294-300 MHz
1999,350-500 MHz
1999,450 MHz
1999,450-600 MHz
1999,500-1000 MHz
1999,550-637 MHz
2000,1.33-1.73 GHz
2000,1.3-2 GHz
2000,450-810 MHz
2000,550 MHz-1.3 GHz
2000,600-750 MHz
2000,918 MHz
2001,1.1-1.4 GHz
2001,500-600 MHz
2001,733-800 MHz
2001,750-1200 MHz
2002,0.9-1 GHz
2002,1.1-1.35 GHz
2003,0.9-1.7 GHz
2003,1.4-2.4 GHz
2003,1.6-2.0 GHz
2004,1.65-1.9 GHz
2004,700 MHz
2005,1.05-1.35 GHz
2005,1.2-2.5 GHz
2005,1.6-3.0 GHz
2005,1-1.4 GHz
2005,2.8-3.2 GHz
2005,2-2.4 GHz
2005,3.2 GHz
2006,1.06-2.67 GHz
2006,1.1-2.33 GHz
2006,1.4-1.6 GHz
2006,3.2-4.6 GHz
2007,1.8-3.2 GHz
2007,1-1.4 GHz
2007,2.15-2.4 GHz
2007,3.5-4.7 GHz
2007,600-900 MHz
2007,850 MHz
2008,0.8-1.6 GHz
2008,1.8-2.6 GHz
2008,2.3-2.9 GHz
2008,2.4-2.88 GHz
2008,2.66-3.2 GHz
2008,2.8-4.0 GHz
2008,4.4 GHz
2008,600-866 MHz
2009,2.2-2.8 GHz
2009,2.5-3.2 GHz
2010,1.6 GHz
2010,1.73-2.66 GHz
2010,1.7-2.4 GHz
2010,1.86-3.33 GHz
2010,2.66-3.0 GHz
2010,2 GHz
2010,3.8-5.2 GHz
2010,3-4.14 GHz
2011,1.0-1.6 GHz
2011,1.6 GHz
2011,1.6-3.4 GHz
2011,1.73-2.67 GHz
2011,2.0 GHz
2011,2.8-3 GHz
2011,3.1-3.6 GHz
2012,1.73-2.53 GHz
2012,1.848 GHz
2012,3.1-5.3 GHz
2012,5.5 GHz
2013,1.9-4.4 GHz
2013,2.8-3 GHz
2013,3.6 GHz
2014,1.8-4 GHz
2014,2.5-5 GHz
2015,3.6 GHz
2015,5 GHz
2016,320 MHz
2017,1.5 GHz
2017,3.2-4.1 GHz
2017,4 GHz
2017,5.2 GHz
2017,5 GHz
2018,1.5 GHz
2018,2.2-3.2 GHz
2018,2.8-3.7 GHz
2019,2-4.7 GHz
2019,5.2 GHz
2020,3.2 GHz
2020,3.4-4.9 GHz
2021,3.2 GHz
2022,3.2 GHz
2022,5 GHz
architecture/cpu_clock_rate.py
# CPU clock rate data from Wikipedia
# https://en.wikipedia.org/wiki/Microprocessor_chronology
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import matplotlib
from matplotlib import pyplot as plt

plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']

# first, remove units and rescale everything to MHz
rescaled = []
with open('cpu_clock_rate.csv', 'rt') as data:
    for line in data:
        if not line.strip():
            # skip empty lines
            continue
        date, raw = line.split(',')
        try:
            value, unit = raw.split()
        except ValueError:
            # there are lines with multiple units, for example
            # 550 MHz-1.3 GHz
            # take the right-most (i.e. the biggest) one
            raw = raw.split('-')[1]
            value, unit = raw.split()
        # if value is in the form X-Y, just use the biggest, i.e. Y
        if '-' in value:
            value = value.split('-')[1]
        value = float(value)
        # rescale value
        if unit == 'kHz':
            value = value/1000
        elif unit == 'GHz':
            value = value*1000
        elif unit == 'MHz':
            pass
        else:
            raise ValueError(f'Unit not understood! {unit}')
        rescaled.append((float(date), value))

dtype = [('year', np.float64), ('clock', np.float64)]
rescaled = np.array(rescaled, dtype=dtype)
# sort first by year and then by value
rescaled.sort(order=['year', 'clock'])

# add some jitter on values corresponding to the same year, so that the plot
# looks more understandable: the n points of one year get the offsets
# 0, 1/n, 2/n, ..., (n-1)/n (this also covers the last year in the data)
orig_years = rescaled['year'].copy()
for year in np.unique(orig_years):
    idx = np.flatnonzero(orig_years == year)
    if idx.size > 1:
        rescaled['year'][idx] += np.arange(idx.size) / idx.size

# plot the thing
plt.figure(figsize=(8.5, 7.5))
plt.semilogy(rescaled['year'], rescaled['clock'], 'o')
# my laptop here
plt.semilogy([2020], [4900], 'o')
plt.grid(None)
plt.grid(which='both', axis='both')
plt.ylim(0.1, 10000)
plt.xlim(1968, 2025)
years = np.arange(1970, 2025, 5)
plt.xticks(years, years)
# tick values are in MHz: 0.1 MHz = 100 kHz, i.e. a 10 µs clock period
plt.yticks([0.1, 1, 10, 100, 1000, 10000], ['100 kHz\n10 µs', '1 MHz\n1 µs', '10 MHz\n100 ns',
           '100 MHz\n10 ns', '1 GHz\n1 ns', '10 GHz\n0.1 ns'])
plt.tick_params(labelright=True, top=True, right=True)
plt.title('CPU clock rate')
plt.savefig('cpu_clock_rate.svg')
architecture/cpu_clock_rate.svg
architecture/memory.csv
Generation,Type,Data rate (MT/s),Transfer time (ns),Command rate (MHz),Cycle time (ns),CAS latency,First word (ns),Fourth word (ns),Eighth word (ns)
SDRAM,PC100,100,10.000,100,10.000,2,20.00,50.00,90.00
SDRAM,PC133,133,7.500,133,7.500,3,22.50,45.00,75.00
DDR SDRAM,DDR-333,333,3.000,166,6.000,2.5,15.00,24.00,36.00
DDR SDRAM,DDR-400,400,2.500,200,5.000,3,15.00,22.50,32.50
DDR SDRAM,DDR-400,400,2.500,200,5.000,2.5,12.50,20.00,30.00
DDR SDRAM,DDR-400,400,2.500,200,5.000,2,10.00,17.50,27.50
DDR2 SDRAM,DDR2-400,400,2.500,200,5.000,4,20.00,27.50,37.50
DDR2 SDRAM,DDR2-400,400,2.500,200,5.000,3,15.00,22.50,32.50
DDR2 SDRAM,DDR2-533,533,1.875,266,3.750,4,15.00,20.63,28.13
DDR2 SDRAM,DDR2-533,533,1.875,266,3.750,3,11.25,16.88,24.38
DDR2 SDRAM,DDR2-667,667,1.500,333,3.000,5,15.00,19.50,25.50
DDR2 SDRAM,DDR2-667,667,1.500,333,3.000,4,12.00,16.50,22.50
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,6,15.00,18.75,23.75
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,5,12.50,16.25,21.25
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,4.5,11.25,15.00,20.00
DDR2 SDRAM,DDR2-800,800,1.250,400,2.500,4,10.00,13.75,18.75
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,7,13.13,15.94,19.69
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,6,11.25,14.06,17.81
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,5,9.38,12.19,15.94
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,4.5,8.44,11.25,15.00
DDR2 SDRAM,DDR2-1066,1066,0.938,533,1.875,4,7.50,10.31,14.06
DDR3 SDRAM,DDR3-1066,1066,0.938,533,1.875,7,13.13,15.94,19.69
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,9,13.50,15.75,18.75
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,7,10.50,12.75,15.75
DDR3 SDRAM,DDR3-1333,1333,0.750,666,1.500,6,9.00,11.25,14.25
DDR3 SDRAM,DDR3-1375,1375,0.727,687,1.455,5,7.27,9.45,12.36
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,11,13.75,15.63,18.13
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,10,12.50,14.38,16.88
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,9,11.25,13.13,15.63
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,8,10.00,11.88,14.38
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,7,8.75,10.63,13.13
DDR3 SDRAM,DDR3-1600,1600,0.625,800,1.250,6,7.50,9.38,11.88
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,10,10.71,12.32,14.46
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,9,9.64,11.25,13.39
DDR3 SDRAM,DDR3-1866,1866,0.536,933,1.071,8,8.57,10.18,12.32
DDR3 SDRAM,DDR3-2000,2000,0.500,1000,1.000,9,9.00,10.50,12.50
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,12,11.25,12.66,14.53
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,11,10.31,11.72,13.59
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,10,9.38,10.78,12.66
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,9,8.44,9.84,11.72
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,8,7.50,8.91,10.78
DDR3 SDRAM,DDR3-2133,2133,0.469,1066,0.938,7,6.56,7.97,9.84
DDR3 SDRAM,DDR3-2200,2200,0.455,1100,0.909,7,6.36,7.73,9.55
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,13,10.83,12.08,13.75
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,12,10.00,11.25,12.92
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,11,9.17,10.42,12.08
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,10,8.33,9.58,11.25
DDR3 SDRAM,DDR3-2400,2400,0.417,1200,0.833,9,7.50,8.75,10.42
DDR3 SDRAM,DDR3-2600,2600,0.385,1300,0.769,11,8.46,9.62,11.15
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,15,11.25,12.38,13.88
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,13,9.75,10.88,12.38
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,12,9.00,10.13,11.63
DDR3 SDRAM,DDR3-2666,2666,0.375,1333,0.750,11,8.25,9.38,10.88
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,16,11.43,12.50,13.93
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,12,8.57,9.64,11.07
DDR3 SDRAM,DDR3-2800,2800,0.357,1400,0.714,11,7.86,8.93,10.36
DDR3 SDRAM,DDR3-2933,2933,0.341,1466,0.682,12,8.18,9.20,10.57
DDR3 SDRAM,DDR3-3000,3000,0.333,1500,0.667,12,8.00,9.00,10.33
DDR3 SDRAM,DDR3-3100,3100,0.323,1550,0.645,12,7.74,8.71,10.00
DDR3 SDRAM,DDR3-3200,3200,0.313,1600,0.625,16,10.00,10.94,12.19
DDR3 SDRAM,DDR3-3300,3300,0.303,1650,0.606,16,9.70,10.61,11.82
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,12,15.00,16.88,19.38
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,11,13.75,15.63,18.13
DDR4 SDRAM,DDR4-1600,1600,0.625,800,1.250,10,12.50,14.38,16.88
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,14,15.00,16.61,18.75
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,13,13.93,15.54,17.68
DDR4 SDRAM,DDR4-1866,1866,0.536,933,1.071,12,12.86,14.46,16.61
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,16,15.00,16.41,18.28
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,15,14.06,15.47,17.34
DDR4 SDRAM,DDR4-2133,2133,0.469,1066,0.938,14,13.13,14.53,16.41
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,17,14.17,15.42,17.08
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,16,13.33,14.58,16.25
DDR4 SDRAM,DDR4-2400,2400,0.417,1200,0.833,15,12.50,13.75,15.42
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,19,14.25,15.38,16.88
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,17,12.75,13.88,15.38
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,16,12.00,13.13,14.63
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,15,11.25,12.38,13.88
DDR4 SDRAM,DDR4-2666,2666,0.375,1333,0.750,13,9.75,10.88,12.38
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,17,12.14,13.21,14.64
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,16,11.43,12.50,13.93
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,15,10.71,11.79,13.21
DDR4 SDRAM,DDR4-2800,2800,0.357,1400,0.714,14,10.00,11.07,12.50
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,17,11.33,12.33,13.67
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,16,10.67,11.67,13.00
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,15,10.00,11.00,12.33
DDR4 SDRAM,DDR4-3000,3000,0.333,1500,0.667,14,9.33,10.33,11.67
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,16,10.00,10.94,12.19
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,15,9.38,10.31,11.56
DDR4 SDRAM,DDR4-3200,3200,0.313,1600,0.625,14,8.75,9.69,10.94
DDR4 SDRAM,DDR4-3300,3300,0.303,1650,0.606,16,9.70,10.61,11.82
DDR4 SDRAM,DDR4-3333,3333,0.300,1666,0.600,16,9.60,10.50,11.70
DDR4 SDRAM,DDR4-3400,3400,0.294,1700,0.588,16,9.41,10.29,11.47
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,18,10.38,11.25,12.40
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,17,9.81,10.67,11.83
DDR4 SDRAM,DDR4-3466,3466,0.288,1733,0.577,16,9.23,10.10,11.25
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,19,10.56,11.39,12.50
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,18,10.00,10.83,11.94
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,17,9.44,10.28,11.39
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,16,8.89,9.72,10.83
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,15,8.33,9.17,10.28
DDR4 SDRAM,DDR4-3600,3600,0.278,1800,0.556,14,7.78,8.61,9.72
DDR4 SDRAM,DDR4-3733,3733,0.268,1866,0.536,17,9.11,9.91,10.98
DDR4 SDRAM,DDR4-3866,3866,0.259,1933,0.517,18,9.31,10.09,11.12
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,19,9.50,10.25,11.25
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,18,9.00,9.75,10.75
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,17,8.50,9.25,10.25
DDR4 SDRAM,DDR4-4000,4000,0.250,2000,0.500,16,8.00,8.75,9.75
DDR4 SDRAM,DDR4-4133,4133,0.242,2066,0.484,19,9.19,9.92,10.89
DDR4 SDRAM,DDR4-4200,4200,0.238,2100,0.476,19,9.05,9.76,10.71
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,19,8.91,9.61,10.55
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,18,8.44,9.14,10.08
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,17,7.97,8.67,9.61
DDR4 SDRAM,DDR4-4266,4266,0.234,2133,0.469,16,7.50,8.20,9.14
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,19,8.64,9.32,10.23
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,18,8.18,8.86,9.77
DDR4 SDRAM,DDR4-4400,4400,0.227,2200,0.454,17,7.73,8.41,9.32
DDR4 SDRAM,DDR4-4600,4600,0.217,2300,0.435,19,8.26,8.91,9.78
DDR4 SDRAM,DDR4-4600,4600,0.217,2300,0.435,18,7.82,8.48,9.35
DDR4 SDRAM,DDR4-4800,4800,0.208,2400,0.417,20,8.33,8.96,9.79
DDR4 SDRAM,DDR4-4800,4800,0.208,2400,0.417,19,7.92,8.54,9.38
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,40,16.67,17.29,18.13
|
||||
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,38,15.83,16.46,17.29
|
||||
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,36,15.00,15.63,16.46
|
||||
DDR5 SDRAM,DDR5-4800,4800,0.208,2400,0.417,34,14.17,14.79,15.63
|
||||
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,40,15.38,15.96,16.73
|
||||
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,38,14.62,15.19,15.96
|
||||
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,36,13.85,14.42,15.19
|
||||
DDR5 SDRAM,DDR5-5200,5200,0.192,2600,0.385,34,13.08,13.65,14.42
|
||||
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,40,14.29,14.82,15.54
|
||||
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,38,13.57,14.11,14.82
|
||||
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,36,12.86,13.39,14.11
|
||||
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,34,12.14,12.68,13.39
|
||||
DDR5 SDRAM,DDR5-5600,5600,0.179,2800,0.357,30,10.71,11.25,11.96
|
||||
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,40,13.33,13.83,14.50
|
||||
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,38,12.67,13.17,13.83
|
||||
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,36,12.00,12.50,13.17
|
||||
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,32,10.67,11.17,11.83
|
||||
DDR5 SDRAM,DDR5-6000,6000,0.167,3000,0.333,30,10.00,10.50,11.17
|
||||
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,40,12.90,13.39,14.03
|
||||
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,38,12.26,12.74,13.39
|
||||
DDR5 SDRAM,DDR5-6200,6200,0.161,3100,0.323,36,11.61,12.10,12.74
|
||||
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,40,12.50,12.97,13.59
|
||||
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,38,11.88,12.34,12.97
|
||||
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,36,11.25,11.72,12.34
|
||||
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,34,10.63,11.09,11.72
|
||||
DDR5 SDRAM,DDR5-6400,6400,0.156,3200,0.313,32,10.00,10.47,11.09
|
||||
DDR5 SDRAM,DDR5-6600,6600,0.152,3300,0.303,34,10.30,10.76,11.36
|
|
101
architecture/memory.py
Normal file
|
@ -0,0 +1,101 @@
|
|||
# RAM clock rate and transfer rate data from Wikipedia
# https://en.wikipedia.org/wiki/DDR_SDRAM
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import pandas
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']


data = pandas.read_csv('memory.csv')
#data = data.sort_values('Memory clock (MHz)')
_types = list(data['Type'])
_transfers = list(data['Data rate (MT/s)'])
_cycle_times = list(data['Cycle time (ns)'])
# 'Eighth word' = time until the eighth word of a burst arrives:
# CAS latency * cycle time + 7 * transfer time,
# e.g. DDR4-3200 CL16: 16 * 0.625 ns + 7 * 0.3125 ns ≈ 12.19 ns
_latencies = list(data['Eighth word (ns)'])

# remove redundant data
types = []
transfers = []
cycle_times = []
latencies = []
for idx, typ in enumerate(_types):
    # keep only the first occurrence of each type (one CAS latency variant per type)
    if typ in types:
        continue
    # skip DDR4 grades slower than DDR4-3333
    prefix = 'DDR4-'
    if typ.startswith(prefix) and int(typ.removeprefix(prefix)) < 3333:
        continue
    types.append(typ)
    transfers.append(_transfers[idx])
    cycle_times.append(_cycle_times[idx])
    latencies.append(_latencies[idx])

# transform transfers from MT/s to GB/s: a 64-bit DIMM moves 8 bytes per transfer,
# e.g. DDR4-3200: 3200 MT/s * 8 B = 25600 MB/s, / 1024 = 25 GB/s
transfers = np.array(transfers, dtype='float64')*8/1024
plt.figure(figsize=(8.5,7.5))
plt.title('Memory Bandwidth [GB/s] (1995-2023)')
# highlight my laptop
me = types.index('DDR5-5200')
plt.plot(range(len(types)), transfers, 'o')
plt.plot(me, transfers[me], 'o')
plt.xticks(range(len(types))[::3], types[::3], rotation=30, ha='right')
yticks = range(0,56)
ylabels = []
for t in yticks:
    if not t%5:
        ylabels.append(str(t))
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.ylabel('GB/s')
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_bandwidth.svg')

plt.figure(figsize=(8.5,7.5))
plt.title('Memory Cycle Time [ns] (1995-2023)')
# highlight my laptop
me = types.index('DDR5-5200')
plt.semilogy(range(len(types)), cycle_times, 'o')
plt.semilogy(me, cycle_times[me], 'o')
plt.xticks(range(len(types))[::3], types[::3], rotation=30, ha='right')
plt.ylim(0.1, 11)
line = np.arange(0,10).astype('float64')
yticks = list(line[1:]*0.1)+list(line[1:])+list(line[1:2]*10)
ylabels = []
for value in yticks:
    if value in (0.1, 0.5, 1., 1.5, 5., 10.):
        ylabels.append(f'{value} ns')
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_clock.svg')

# latency until the eighth word of a burst has been delivered
plt.figure(figsize=(8.5,7.5))
plt.title('Memory Latency [ns] (1998-2023)')
# highlight my laptop
me = types.index('DDR5-5200')
mel = latencies[me]
# the first two types are left out of this plot
y = latencies[2:]
x = types[2:]
plt.plot(range(len(x)), y, 'o')
# x starts at types[2], so shift my laptop's position by 2 as well
plt.plot(me - 2, mel, 'o')
plt.xticks(range(len(x))[::3], x[::3], rotation=30, ha='right')
plt.ylim(8,40)
yticks = range(8,41)
ylabels = []
for t in yticks:
    if not t%5:
        ylabels.append(str(t)+' ns')
    else:
        ylabels.append('')
plt.yticks(yticks, ylabels)
plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
plt.savefig('memory_latency.svg')
|
2563
architecture/memory_bandwidth.svg
Normal file
1734
architecture/memory_clock.svg
Normal file
1944
architecture/memory_latency.svg
Normal file
49
architecture/storage.csv
Normal file
|
@ -0,0 +1,49 @@
|
|||
Teletype Model 33 paper tape,10 B/s,1963
|
||||
Single Density 8-inch FM Floppy Disk Controller (160 KB),31 KB/s,1973
|
||||
C2N Commodore Datasette 1530 cassette tape interface,15 B/s,1977
|
||||
Apple II cassette tape interface,200 B/s,1977
|
||||
TRS-80 Model 1 Level 1 BASIC cassette tape interface,32 B/s,1977
|
||||
Single Density 5.25-inch FM Floppy Disk Controller (180 KB),15.5 KB/s,1978
|
||||
MFM hard disk,0.625 MB/s,1980
|
||||
Amstrad CPC tape,250 B/s,1984
|
||||
High Density MFM Floppy Disk Controller (1.2 MB/1.44 MB),31 KB/s,1984
|
||||
ATA PIO Mode 0,3.3 MB/s,1986
|
||||
SCSI (Narrow SCSI) (5 MHz),5 MB/s,1986
|
||||
CD Controller (1×),0.146 MB/s,1988
|
||||
Serial Storage Architecture SSA,80 MB/s,1990
|
||||
ATA PIO Mode 1,5.2 MB/s,1994
|
||||
ATA PIO Mode 2,8.3 MB/s,1994
|
||||
ATA PIO Mode 3,11.1 MB/s,1996
|
||||
ATA PIO Mode 4,16.7 MB/s,1996
|
||||
Fibre Channel 1GFC (1.0625 GHz),103.23 MB/s,1997
|
||||
Ultra DMA ATA 33,33 MB/s,1998
|
||||
Ultra DMA ATA 66,66.7 MB/s,2000
|
||||
Fibre Channel 2GFC (2.125 GHz),206.5 MB/s,2001
|
||||
Ultra DMA ATA 100,100 MB/s,2002
|
||||
SATA revision 1.0,150 MB/s,2003
|
||||
iSCSI over 10GbE,1.239 GB/s,2004
|
||||
iSCSI over Fast Ethernet,11.9 MB/s,2004
|
||||
iSCSI over gigabit Ethernet- jumbo frames,123.9 MB/s,2004
|
||||
Serial Attached SCSI (SAS) SAS-1,300 MB/s,2004
|
||||
SATA Revision 2.0,300 MB/s,2004
|
||||
Fibre Channel 4GFC (4.25 GHz),413 MB/s,2004
|
||||
Ultra DMA ATA 133,133 MB/s,2005
|
||||
Fibre Channel 8GFC (8.50 GHz),826 MB/s,2005
|
||||
iSCSI over InfiniBand 4×,4 GB/s,2007
|
||||
SATA Revision 3.0,600 MB/s,2008
|
||||
FCoE over 10GbE,1.206 GB/s,2009
|
||||
AoE over 10GbE,1.242 GB/s,2009
|
||||
AoE over Fast Ethernet,11.9 MB/s,2009
|
||||
AoE over gigabit Ethernet- jumbo frames,124.2 MB/s,2009
|
||||
Serial Attached SCSI (SAS) SAS-2,600 MB/s,2009
|
||||
FCoE over 100G Ethernet,12.064 GB/s,2010
|
||||
iSCSI over 100G Ethernet,12.392 GB/s,2010
|
||||
Fibre Channel 16GFC (14.025 GHz),1.652 GB/s,2011
|
||||
Serial Attached SCSI (SAS) SAS-3,1.2 GB/s,2013
|
||||
SATA Express,2 GB/s,2013
|
||||
NVMe over M.2 or U.2 (using PCI Express 3.0 ×4 link),3.938 GB/s,2013
|
||||
Fibre Channel 32GFC (28.05 GHz),3.303 GB/s,2016
|
||||
Serial Attached SCSI (SAS) SAS-4,2.4 GB/s,2017
|
||||
NVMe over M.2 or U.2 (using PCI Express 4.0 ×4 link),7.876 GB/s,2017
|
||||
UFS (version 3.0),2.9 GB/s,2018
|
||||
NVMe over M.2- U.2- U.3 or EDSFF (using PCI Express 5.0 ×4 link),15.754 GB/s,2019
|
|
54
architecture/storage.py
Normal file
|
@ -0,0 +1,54 @@
|
|||
# Storage interfaces rates from
# https://en.wikipedia.org/wiki/List_of_interface_bit_rates#Storage
# Table data extracted with: https://wikitable2csv.ggor.de/
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']

# remove units and rescale everything to MB/s
b_to_mb = 1/(1024*1024)
kb_to_mb = 1/1024
gb_to_mb = 1024
rescaled = []
with open('storage.csv', 'rt') as data:
    for line in data:
        # names in storage.csv contain no commas, so every line has exactly three fields
        typ, rate, year = line.split(',')
        value, unit = rate.split()
        value = float(value)
        if unit == 'B/s':
            value = value*b_to_mb
        elif unit == 'KB/s':
            value = value*kb_to_mb
        elif unit == 'MB/s':
            pass
        elif unit == 'GB/s':
            value = value*gb_to_mb
        else:
            raise ValueError(f'Unit not understood! {unit}')
        rescaled.append((int(year), value))

dtype = [('year', np.float64), ('speed', np.float64)]
rescaled = np.array(rescaled, dtype=dtype)
# sort first by year and then by value
rescaled.sort(order=['year', 'speed'])

# plot the thing
plt.figure(figsize=(8.5,7.5))
plt.semilogy(rescaled['year'], rescaled['speed'], 'o')
# my laptop here
plt.semilogy([2023], [6585], 'o')
plt.grid(None)
plt.grid(which='both', axis='y')
plt.grid(which='both', axis='x')
plt.ylim(b_to_mb, 100*gb_to_mb)
plt.xlim(1960, 2025)
years = range(1960,2026,5)
plt.xticks(years, years, rotation=45, ha='center')
plt.yticks([b_to_mb, kb_to_mb, 1, 10, 100, gb_to_mb, 10*gb_to_mb, 100*gb_to_mb],
           ['1 B/s', '1 KB/s', '1 MB/s', '10 MB/s', '100 MB/s', '1 GB/s', '10 GB/s', '100 GB/s'])
plt.tick_params(labeltop=False, labelright=True, top=True, right=True)
plt.title('Storage (read) speed')
plt.savefig('storage.svg')
1526
architecture/storage.svg
Normal file
BIN
architecture/topology.png
Normal file
57
benchmark_low_level/README.md
Normal file
|
@ -0,0 +1,57 @@
|
|||
# Low Level Memory Benchmark

These are the results of a low-level memory benchmark (written in C) on my [laptop](../architecture/README.md).

## Summary plots (details below)



## Benchmark details:

- Bandwidth (read), [bw_mem_rd](http://lmbench.sourceforge.net/man/bw_mem_rd.8.html). Allocate the specified amount of memory, zero it, and then time the reading of that memory as a series of integer loads and adds. Each 4-byte integer is loaded and added to an accumulator.

  [Results](t14-bwr.csv) (block size in MB, bandwidth in MB/s)
- Bandwidth (write), [bw_mem](http://lmbench.sourceforge.net/man/bw_mem.8.html). Allocate twice the specified amount of memory, zero it, and then time the copying of the first half to the second half.

  [Results](t14-bww.csv) (block size in MB, bandwidth in MB/s)
- Latency (sequential access), [lat_mem_rd](http://lmbench.sourceforge.net/man/lat_mem_rd.8.html). Run two nested loops: the outer loop is over the stride size (128 bytes here), the inner loop is over the block size. For each block size, create a ring of pointers that point backward one stride. Traverse the block with `p = (char **)*p` in a for loop and time the load latency over the block.

  [Results](t14-lseq.csv) (block size in MB, latency in ns)
- Latency (random access). Like above, but with a stride size of 16 bytes.

  [Results](t14-lrnd.csv) (block size in MB, latency in ns)
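
The pointer ring behind the latency benchmarks can be illustrated with a small Python sketch. `backward_ring`, `random_ring` and `chase` are hypothetical helpers, not lmbench code. Slots are list indices instead of real pointers. The shuffled ring is only one common way to obtain a random access order; it is not necessarily how lmbench builds its random variant. Timing this Python loop would measure interpreter overhead rather than memory latency. The point here is the access pattern: each load needs the result of the previous one, so loads cannot overlap. With a shuffled order there is also no constant step for a stride prefetcher to detect.

```python
import random


def backward_ring(block_size, stride):
    """Slot k (at byte offset k * stride) points one stride backward, wrapping around."""
    n = block_size // stride
    return [(k - 1) % n for k in range(n)]


def random_ring(block_size, stride, seed=0):
    """Same slots, visited in a shuffled order that still forms one single cycle."""
    n = block_size // stride
    order = list(range(n))
    random.Random(seed).shuffle(order)
    ring = [0] * n
    # link each slot to the next one in the shuffled order, last back to first
    for here, there in zip(order, order[1:] + order[:1]):
        ring[here] = there
    return ring


def chase(ring, loads, start=0):
    """Follow the ring like `p = (char **)*p`: every step depends on the previous one."""
    p = start
    visited = []
    for _ in range(loads):
        visited.append(p)
        p = ring[p]
    return visited


if __name__ == "__main__":
    stride = 128
    ring = backward_ring(1024, stride)
    # byte offsets touched in one lap over a 1 KB block
    print([slot * stride for slot in chase(ring, 8)])
```

Both rings are single cycles, so one lap of `block_size // stride` loads touches every slot exactly once. Only the order in which the slots are visited differs.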

## Running the benchmarks on Linux:

- You need the [lmbench](http://lmbench.sourceforge.net/) benchmark suite and [cpuset](https://github.com/SUSE/cpuset)
- All commands must be run as root, after killing as many processes/services as possible so that the CPUs are almost idle
- Disable address space randomization:
  ```bash
  echo 0 > /proc/sys/kernel/randomize_va_space
  ```
- Set the scaling governor to performance for CPU0:
  ```bash
  echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  ```
- Reserve CPU0 for our benchmark, i.e. kick out (almost) all other processes:
  ```bash
  cset shield --cpu 0 --kthread=on
  ```
- If you are on Intel and CPU0 is part of an SMT pair (hyperthreading), disable its peer (CPU1 in the command below):
  ```bash
  echo 0 > /sys/devices/system/cpu/cpu1/online
  ```
- Disable turbo mode on Intel:
  ```bash
  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
  ```
- Run the configuration script for lmbench. Select only the `HARDWARE` set of benchmarks and set the maximum amount of memory to something like 1024 MB:
  ```bash
  cd /usr/lib/lmbench/scripts
  # the following command will create the configuration file /usr/lib/lmbench/bin/x86_64-linux-gnu/CONFIG.<hostname>
  cset shield --exec -- ./config-run
  # run the benchmark
  cset shield --exec -- /usr/bin/lmbench-run
  # results are in /var/lib/lmbench/results/x86_64-linux-gnu/<hostname>
  ```
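
Before starting a run, it can help to read back the files that the commands above write. The following is a minimal sketch, not part of lmbench or cpuset. `CHECKS`, `read_setting` and `report` are hypothetical names. The `thread_siblings_list` entry is an extra, read-only check of which logical CPUs share a core with CPU0. Files that do not exist on a given machine are reported as `n/a`; for example, `intel_pstate/no_turbo` is missing when the intel_pstate driver is not in use.

```python
from pathlib import Path

# sysfs/procfs files touched by the setup steps, plus the SMT sibling list of CPU0
CHECKS = {
    "ASLR (0 = disabled)": "/proc/sys/kernel/randomize_va_space",
    "cpu0 scaling governor": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "cpu0 SMT siblings": "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list",
    "cpu1 online (1 = yes)": "/sys/devices/system/cpu/cpu1/online",
    "Intel turbo off (1 = yes)": "/sys/devices/system/cpu/intel_pstate/no_turbo",
}


def read_setting(path):
    """Return the stripped file content, or None if the file is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def report(checks=CHECKS):
    """Map each label to the current value of its file (None when unavailable)."""
    return {label: read_setting(path) for label, path in checks.items()}


if __name__ == "__main__":
    for label, value in report().items():
        print(f"{label:28} {value if value is not None else 'n/a'}")
```

The sketch only reads files, so it does not need root and can be run before and after the setup to compare.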
|
1719
benchmark_low_level/bandwidth-t14.svg
Normal file
1801
benchmark_low_level/latency-t14.svg
Normal file
39
benchmark_low_level/parse_results.py
Normal file
|
@ -0,0 +1,39 @@
|
|||
import sys

# the lmbench results file is expected to be named results_<name>, e.g. results_t14
filename = sys.argv[1]
name = filename.removeprefix('results_')
# maps a short type key to the line number of its section header
types = {}

with open(filename, 'rt') as results:
    for idx, line in enumerate(results):
        if line.startswith('Memory read bandwidth'):
            types['bwr'] = idx
        elif line.startswith('Memory write bandwidth'):
            types['bww'] = idx
        elif line.startswith('Memory load latency'):
            types['lseq'] = idx
        elif line.startswith('Random load latency'):
            types['lrnd'] = idx

    # write each section as "<block size>,<value>" lines to <name>-<type>.csv
    for typ, idx in types.items():
        results.seek(0)
        with open(f'{name}-{typ}.csv', 'wt') as csv:
            for count, line in enumerate(results):
                # skip everything up to and including the section header
                if count <= idx:
                    continue
                # skip quoted sub-headers
                if line.startswith('"'):
                    continue
                try:
                    val1, val2 = line.split(" ")
                except ValueError:
                    # we are at the end of the section
                    break
                csv.write(f'{val1},{val2}')
|
115
benchmark_low_level/plot.py
Normal file
|
@ -0,0 +1,115 @@
|
|||
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12

name = 't14'

# sizes in bytes of the L1, L2 and L3 caches (drawn as vertical lines)
caches = (48*1024, 1280*1024, 12*1024*1024)

def get_labels(x):
    # turn log2(bytes) values into human-readable labels, e.g. 10 -> '1K'
    xlabels = []
    for value in x:
        b = int(2**value)
        if b < 1024:
            xlabels.append(f'{b}B')
        elif b < 1048576:
            xlabels.append(f'{b//1024}K')
        elif b < 1073741824:
            xlabels.append(f'{b//1024//1024}M')
        else:
            xlabels.append(f'{b//1024//1024//1024}G')
    return xlabels


# manually set ticks; to disable, set ticks = None

line = np.linspace(1, 10, 9, endpoint=False)
yticks = list(line)+list(line*10)+list(line[:2]*100)

ylabels = {1 : '1 ns', 5 : '5 ns', 10 : '10 ns', 50 : '50 ns', 100: '100 ns'}
ticks = {'l': (yticks, [ylabels[i] if i in ylabels else '' for i in yticks]),
         'bw': (range(5,46,5), [f'{i} GB/s' for i in range(5,46,5)]),
        }

# manually set limits; to disable, set ylim = None

ylim = {'l' : (1, 200),
        'bw' : (5,45),
       }

for type_ in ('bw', 'l'):
    if type_ == 'bw':
        suffix = ('r', 'w')
        ylabel = ''
        title = f'Memory Bandwidth ({name}) [GB/s]'
        legend1, legend2 = 'read', 'write'
        pic = f'bandwidth-{name}.svg'
        plt_func = plt.plot
    else:
        suffix = ('seq', 'rnd')
        ylabel = ''
        title = f'Memory Latency ({name}) [ns]'
        legend1, legend2 = 'sequential access', 'random access'
        pic = f'latency-{name}.svg'
        plt_func = plt.semilogy


    data1 = np.loadtxt(f'{name}-{type_}{suffix[0]}.csv', delimiter=',')
    data2 = np.loadtxt(f'{name}-{type_}{suffix[1]}.csv', delimiter=',')

    # convert to bytes and then to the corresponding power of two

    if type_ == 'bw':
        x1 = np.log2(data1[:,0]*1024*1024).round()
        y1 = data1[:,1]/1024
        x2 = np.log2(data2[:,0]*1024*1024).round()
        y2 = data2[:,1]/1024
    else:
        x1 = np.log2(data1[::2,0]*1024*1024).round()
        y1 = data1[::2,1]
        x2 = np.log2(data2[::2,0]*1024*1024).round()
        y2 = data2[::2,1]
        ylabels = None


    xlabel = 'block size'
    xlabels = get_labels(x1)

    plt.figure(figsize=(8.5,7.5))
    if type_ == 'l':
        # plot two empty plots so we advance the color cycle (bad trick)
        _ = plt_func([],[])
        _ = plt_func([],[])
    p1, = plt_func(x1, y1, 'o')
    plt.ylabel(ylabel)
    plt.xlabel(xlabel)
    p2, = plt_func(x2, y2, 'o')
    if ylim and type_ in ylim:
        plt.ylim(*ylim[type_])
    plt.xticks(x1, xlabels, rotation=60)
    if ticks and type_ in ticks:
        plt.yticks(*ticks[type_])
    plt.legend((p1, p2), (legend1, legend2))
    if ylim and type_ in ylim:
        miny, maxy = ylim[type_]
    else:
        miny = min(y1.min(), y2.min())
        maxy = max(y1.max(), y2.max())
    # caches
    for idx, cache in enumerate(caches):
        level = idx + 1
        size = np.log2(cache)
        plt.plot((size, size), (miny, maxy),
                 color='darkblue', alpha=0.4)
        plt.text(size-1, 2*miny, f'L{level}\n⟵',
                 color='darkblue', verticalalignment='top')

    plt.title(title)
    plt.savefig(pic)
|
485
benchmark_low_level/results_t14
Normal file
|
@ -0,0 +1,485 @@
|
|||
[lmbench3.0 results for Linux multivac 6.10.3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04) x86_64 GNU/Linux]
|
||||
[LMBENCH_VER: 3.0-a9]
|
||||
[BENCHMARK_HARDWARE: YES]
|
||||
[BENCHMARK_OS: NO]
|
||||
[ALL: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1024m]
|
||||
[DISKS: ]
|
||||
[DISK_DESC: ]
|
||||
[ENOUGH: 5000]
|
||||
[FAST: ]
|
||||
[FASTMEM: NO]
|
||||
[FILE: /var/tmp/lmbench/XXX]
|
||||
[FSDIR: /var/tmp/lmbench]
|
||||
[HALF: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m]
|
||||
[INFO: INFO.multivac]
|
||||
[LINE_SIZE: ]
|
||||
[LOOP_O: 0.00000000]
|
||||
[MB: 1024]
|
||||
[MHZ: 1296 MHz, 0.7716 nanosec clock]
|
||||
[MOTHERBOARD: ]
|
||||
[NETWORKS: ]
|
||||
[PROCESSORS: 11]
|
||||
[REMOTE: ]
|
||||
[SLOWFS: YES]
|
||||
[OS: x86_64-linux-gnu]
|
||||
[SYNC_MAX: 1]
|
||||
[LMBENCH_SCHED: DEFAULT]
|
||||
[TIMING_O: 0]
|
||||
[LMBENCH VERSION: 3.0-20240810]
|
||||
[USER: root]
|
||||
[HOSTNAME: multivac]
|
||||
[NODENAME: multivac]
|
||||
[SYSNAME: Linux]
|
||||
[PROCESSOR: unknown]
|
||||
[MACHINE: x86_64]
|
||||
[RELEASE: 6.10.3-amd64]
|
||||
[VERSION: #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04)]
|
||||
[Sat Aug 10 04:20:43 PM CEST 2024]
|
||||
[ 16:20:43 up 1:18, 4 users, load average: 0.37, 0.94, 1.05]
|
||||
[net: Kernel Interface table]
|
||||
[net: Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg]
|
||||
[net: eth0 1500 0 0 0 0 0 0 0 0 BMU]
|
||||
[if: eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500]
|
||||
[if: ether fc:5c:ee:4d:b5:eb txqueuelen 1000 (Ethernet)]
|
||||
[if: RX packets 0 bytes 0 (0.0 B)]
|
||||
[if: RX errors 0 dropped 0 overruns 0 frame 0]
|
||||
[if: TX packets 0 bytes 0 (0.0 B)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: device interrupt 16 memory 0xbc300000-bc320000]
|
||||
[if: ]
|
||||
[net: eth1 1500 34948 0 2352 0 7773 0 0 0 BMRU]
|
||||
[if: eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500]
|
||||
[if: inet 192.168.111.103 netmask 255.255.255.0 broadcast 192.168.111.255]
|
||||
[if: inet6 fe80::44e3:4a35:5130:3045 prefixlen 64 scopeid 0x20<link>]
|
||||
[if: inet6 2003:ef:2f2e:c900:e437:85c7:3d97:f353 prefixlen 64 scopeid 0x0<global>]
|
||||
[if: ether b0:4f:13:ef:1a:3e txqueuelen 1000 (Ethernet)]
|
||||
[if: RX packets 34948 bytes 33936985 (32.3 MiB)]
|
||||
[if: RX errors 0 dropped 2352 overruns 0 frame 0]
|
||||
[if: TX packets 7773 bytes 1213416 (1.1 MiB)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: ]
|
||||
[net: lo 65536 95 0 0 0 95 0 0 0 LRU]
|
||||
[if: lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536]
|
||||
[if: inet 127.0.0.1 netmask 255.0.0.0]
|
||||
[if: inet6 ::1 prefixlen 128 scopeid 0x10<host>]
|
||||
[if: loop txqueuelen 1000 (Local Loopback)]
|
||||
[if: RX packets 95 bytes 5588 (5.4 KiB)]
|
||||
[if: RX errors 0 dropped 0 overruns 0 frame 0]
|
||||
[if: TX packets 95 bytes 5588 (5.4 KiB)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: ]
|
||||
[mount: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: proc on /proc type proc (rw,relatime)]
|
||||
[mount: udev on /dev type devtmpfs (rw,nosuid,relatime,size=16228560k,nr_inodes=4057140,mode=755,inode64)]
|
||||
[mount: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)]
|
||||
[mount: tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3251140k,mode=755,inode64)]
|
||||
[mount: /dev/mapper/CRYPT-ROOT on / type ext4 (rw,relatime,errors=remount-ro)]
|
||||
[mount: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)]
|
||||
[mount: cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)]
|
||||
[mount: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)]
|
||||
[mount: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=67)]
|
||||
[mount: hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)]
|
||||
[mount: none on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)]
|
||||
[mount: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/credentials/systemd-journald.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-udev-load-credentials.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev-early.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/credentials/systemd-sysctl.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=16777216k,inode64)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)]
|
||||
[mount: tmpfs on /run/user/1002 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,uid=1002,gid=100,inode64)]
|
||||
[mount: tmpfs on /run/credentials/getty@tty1.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,inode64)]
|
||||
[mount: none on /cpusets type cgroup (rw,relatime,cpuset,noprefix,release_agent=/sbin/cpuset_release_agent)]
|
||||
integer bit: 0.54 nanoseconds
|
||||
integer add: 0.77 nanoseconds
|
||||
integer div: 8.49 nanoseconds
|
||||
integer mod: 12.58 nanoseconds
|
||||
int64 bit: 0.52 nanoseconds
|
||||
uint64 add: 0.77 nanoseconds
|
||||
int64 div: 11.58 nanoseconds
|
||||
int64 mod: 14.91 nanoseconds
|
||||
float add: 1.54 nanoseconds
|
||||
float mul: 3.09 nanoseconds
|
||||
float div: 8.49 nanoseconds
|
||||
double add: 1.54 nanoseconds
|
||||
double mul: 3.09 nanoseconds
|
||||
double div: 10.80 nanoseconds
|
||||
float bogomflops: 1.16 nanoseconds
|
||||
double bogomflops: 1.54 nanoseconds
|
||||
integer bit parallelism: 2.77
|
||||
integer add parallelism: 2.73
|
||||
integer div parallelism: 1.83
|
||||
integer mod parallelism: 2.83
|
||||
int64 bit parallelism: 2.49
|
||||
int64 add parallelism: 2.60
|
||||
int64 div parallelism: 1.50
|
||||
int64 mod parallelism: 1.90
|
||||
float add parallelism: 4.00
|
||||
float mul parallelism: 8.00
|
||||
float div parallelism: 3.67
|
||||
double add parallelism: 4.00
|
||||
double mul parallelism: 8.00
|
||||
double div parallelism: 3.50
|
||||
unable to register (XACT_PROG, XACT_VERS, udp).
|
||||
: RPC: Unable to receive
|
||||
|
||||
"libc bcopy unaligned
|
||||
0.000512 41652.95
|
||||
0.001024 47761.25
|
||||
0.002048 50233.88
|
||||
0.004096 55637.27
|
||||
0.008192 64524.03
|
||||
0.016384 67719.30
|
||||
0.032768 18212.36
|
||||
0.065536 18407.52
|
||||
0.131072 18473.55
|
||||
0.262144 18475.00
|
||||
0.524288 14642.79
|
||||
1.05 8957.30
|
||||
2.10 8208.03
|
||||
4.19 8208.03
|
||||
8.39 9645.77
|
||||
16.78 7631.79
|
||||
33.55 7129.38
|
||||
67.11 6951.41
|
||||
134.22 6900.65
|
||||
268.44 6848.89
|
||||
536.87 6861.76
|
||||
|
||||
"libc bcopy aligned
|
||||
0.000512 44106.76
|
||||
0.001024 49354.68
|
||||
0.002048 51472.69
|
||||
0.004096 55925.21
|
||||
0.008192 63828.24
|
||||
0.016384 66379.51
|
||||
0.032768 18202.45
|
||||
0.065536 18336.03
|
||||
0.131072 18457.77
|
||||
0.262144 18327.76
|
||||
0.524288 15715.46
|
||||
1.05 8922.33
|
||||
2.10 8367.89
|
||||
4.19 8343.10
|
||||
8.39 9679.16
|
||||
16.78 7632.95
|
||||
33.55 7179.72
|
||||
67.11 6990.51
|
||||
134.22 6911.31
|
||||
268.44 6892.15
|
||||
536.87 6891.97
|
||||
|
||||
Memory bzero bandwidth
|
||||
0.000512 73586.23
0.001024 78019.46
0.002048 80349.42
0.004096 74573.30
0.008192 78524.11
0.016384 80567.79
0.032768 81708.84
0.065536 21219.16
0.131072 21299.79
0.262144 21333.96
0.524288 21347.23
1.05 19382.88
2.10 12829.98
4.19 12611.10
8.39 12606.02
16.78 10399.64
33.55 9537.93
67.11 9140.41
134.22 9007.90
268.44 8931.77
536.87 8918.57
1073.74 8908.13

"unrolled bcopy unaligned
0.000512 10357.22
0.001024 10363.21
0.002048 10356.95
0.004096 10357.76
0.008192 10343.49
0.016384 10351.27
0.032768 7899.27
0.065536 7893.76
0.131072 7873.84
0.262144 7832.99
0.524288 7281.78
1.05 6503.77
2.10 6418.22
4.19 6461.47
8.39 5194.99
16.78 4722.65
33.55 4639.72
67.11 4606.91
134.22 4593.51
268.44 4596.34
536.87 4587.27

"unrolled partial bcopy unaligned
0.000512 41402.69
0.001024 41453.86
0.002048 41452.30
0.004096 41425.45
0.008192 41418.12
0.016384 41333.58
0.032768 18957.19
0.065536 18955.39
0.131072 18962.49
0.262144 18969.69
0.524288 14659.04
1.05 8844.77
2.10 8192.00
4.19 8206.57
8.39 6326.25
16.78 5801.25
33.55 5644.14
67.11 5609.70
134.22 5600.81
268.44 5589.38
536.87 5591.24

Memory read bandwidth
0.000512 29201.61
0.001024 29294.55
0.002048 29363.12
0.004096 29433.86
0.008192 29442.59
0.016384 29285.40
0.032768 29336.30
0.065536 27978.05
0.131072 28392.59
0.262144 28408.05
0.524288 28424.68
1.05 28385.92
2.10 28385.92
4.19 28395.43
8.39 28334.85
16.78 26342.45
33.55 23489.28
67.11 22195.75
134.22 21644.53
268.44 21620.12
536.87 21505.80
1073.74 21526.50

Memory partial read bandwidth
0.000512 58916.90
0.001024 59661.44
0.002048 61203.68
0.004096 58783.21
0.008192 61320.45
0.016384 61266.70
0.032768 60940.09
0.065536 30488.23
0.131072 30517.76
0.262144 30516.13
0.524288 29627.83
1.05 24662.86
2.10 17384.93
4.19 17168.66
8.39 16915.36
16.78 13189.64
33.55 11584.48
67.11 11024.95
134.22 10892.53
268.44 10824.45
536.87 10781.84
1073.74 10759.80

Memory write bandwidth
0.000512 41405.52
0.001024 41396.47
0.002048 41429.93
0.004096 41445.34
0.008192 41401.00
0.016384 41398.70
0.032768 41426.50
0.065536 21381.05
0.131072 21388.82
0.262144 21374.31
0.524288 21370.17
1.05 18114.68
2.10 12417.83
4.19 12264.05
8.39 12250.61
16.78 9679.16
33.55 8978.98
67.11 8703.00
134.22 8589.38
268.44 8520.41
536.87 8543.59
1073.74 8544.75

Memory partial write bandwidth
0.000512 41406.05
0.001024 41431.27
0.002048 41414.90
0.004096 41425.45
0.008192 41431.04
0.016384 41453.60
0.032768 41366.48
0.065536 21392.21
0.131072 21364.37
0.262144 21381.05
0.524288 21366.56
1.05 18649.81
2.10 12411.48
4.19 12249.30
8.39 12300.01
16.78 9693.61
33.55 9024.86
67.11 8771.25
134.22 8618.06
268.44 8557.89
536.87 8549.44
1073.74 8543.32

Memory partial read/write bandwidth
0.000512 20712.63
0.001024 20714.87
0.002048 20703.88
0.004096 20718.77
0.008192 20719.28
0.016384 20715.33
0.032768 20722.87
0.065536 20693.70
0.131072 20690.58
0.262144 20638.28
0.524288 20665.37
1.05 18846.95
2.10 12887.53
4.19 12613.33
8.39 12576.62
16.78 10295.93
33.55 9551.50
67.11 9191.74
134.22 9087.19
268.44 9035.49
536.87 9018.95
1073.74 9023.04

Usage: tlb [-c] [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]

Memory load parallelism
Usage: par_mem [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]

STREAM copy latency: 1.48 nanoseconds
STREAM copy bandwidth: 10781.39 MB/sec
STREAM scale latency: 1.50 nanoseconds
STREAM scale bandwidth: 10668.00 MB/sec
STREAM add latency: 2.11 nanoseconds
STREAM add bandwidth: 11374.24 MB/sec
STREAM triad latency: 2.13 nanoseconds
STREAM triad bandwidth: 11264.63 MB/sec
STREAM2 fill latency: 0.89 nanoseconds
STREAM2 fill bandwidth: 8955.12 MB/sec
STREAM2 copy latency: 1.48 nanoseconds
STREAM2 copy bandwidth: 10775.60 MB/sec
STREAM2 daxpy latency: 1.81 nanoseconds
STREAM2 daxpy bandwidth: 13256.52 MB/sec
STREAM2 sum latency: 1.60 nanoseconds
STREAM2 sum bandwidth: 5006.72 MB/sec

Memory load latency
"stride=128
0.00049 3.859
0.00098 3.859
0.00195 3.859
0.00293 3.859
0.00391 3.861
0.00586 3.858
0.00781 3.858
0.01172 3.859
0.01562 3.859
0.02344 3.859
0.03125 3.859
0.04688 3.861
0.06250 11.580
0.09375 11.576
0.12500 11.577
0.18750 11.583
0.25000 11.577
0.37500 11.576
0.50000 11.579
0.75000 11.578
1.00000 11.590
1.50000 13.543
2.00000 13.936
3.00000 13.999
4.00000 13.996
6.00000 13.997
8.00000 14.002
12.00000 14.976
16.00000 19.832
24.00000 20.880
32.00000 21.339
48.00000 21.899
64.00000 22.023
96.00000 22.156
128.00000 22.213
192.00000 22.283
256.00000 22.320
384.00000 22.306
512.00000 22.325
768.00000 22.345
1024.00000 22.361

Random load latency
"stride=16
0.00049 3.859
0.00098 3.858
0.00195 3.858
0.00293 3.858
0.00391 3.858
0.00586 3.858
0.00781 3.859
0.01172 3.858
0.01562 3.858
0.02344 3.859
0.03125 3.859
0.04688 3.864
0.06250 11.575
0.09375 14.276
0.12500 15.462
0.18750 16.079
0.25000 16.646
0.37500 16.373
0.50000 16.352
0.75000 18.529
1.00000 18.245
1.50000 42.351
2.00000 55.350
3.00000 61.011
4.00000 62.143
6.00000 63.587
8.00000 65.259
12.00000 84.563
16.00000 107.165
24.00000 131.898
32.00000 141.864
48.00000 150.654
64.00000 156.245
96.00000 162.950
128.00000 167.497
192.00000 170.394
256.00000 171.779
384.00000 172.858
512.00000 172.877
768.00000 173.626
1024.00000 173.702

[Sat Aug 10 04:39:13 PM CEST 2024]

22
benchmark_low_level/t14-bwr.csv
Normal file
@ -0,0 +1,22 @@
0.000512,29201.61
0.001024,29294.55
0.002048,29363.12
0.004096,29433.86
0.008192,29442.59
0.016384,29285.40
0.032768,29336.30
0.065536,27978.05
0.131072,28392.59
0.262144,28408.05
0.524288,28424.68
1.05,28385.92
2.10,28385.92
4.19,28395.43
8.39,28334.85
16.78,26342.45
33.55,23489.28
67.11,22195.75
134.22,21644.53
268.44,21620.12
536.87,21505.80
1073.74,21526.50

22
benchmark_low_level/t14-bww.csv
Normal file
@ -0,0 +1,22 @@
0.000512,41405.52
0.001024,41396.47
0.002048,41429.93
0.004096,41445.34
0.008192,41401.00
0.016384,41398.70
0.032768,41426.50
0.065536,21381.05
0.131072,21388.82
0.262144,21374.31
0.524288,21370.17
1.05,18114.68
2.10,12417.83
4.19,12264.05
8.39,12250.61
16.78,9679.16
33.55,8978.98
67.11,8703.00
134.22,8589.38
268.44,8520.41
536.87,8543.59
1073.74,8544.75

41
benchmark_low_level/t14-lrnd.csv
Normal file
@ -0,0 +1,41 @@
0.00049,3.859
0.00098,3.858
0.00195,3.858
0.00293,3.858
0.00391,3.858
0.00586,3.858
0.00781,3.859
0.01172,3.858
0.01562,3.858
0.02344,3.859
0.03125,3.859
0.04688,3.864
0.06250,11.575
0.09375,14.276
0.12500,15.462
0.18750,16.079
0.25000,16.646
0.37500,16.373
0.50000,16.352
0.75000,18.529
1.00000,18.245
1.50000,42.351
2.00000,55.350
3.00000,61.011
4.00000,62.143
6.00000,63.587
8.00000,65.259
12.00000,84.563
16.00000,107.165
24.00000,131.898
32.00000,141.864
48.00000,150.654
64.00000,156.245
96.00000,162.950
128.00000,167.497
192.00000,170.394
256.00000,171.779
384.00000,172.858
512.00000,172.877
768.00000,173.626
1024.00000,173.702

41
benchmark_low_level/t14-lseq.csv
Normal file
@ -0,0 +1,41 @@
0.00049,3.859
0.00098,3.859
0.00195,3.859
0.00293,3.859
0.00391,3.861
0.00586,3.858
0.00781,3.858
0.01172,3.859
0.01562,3.859
0.02344,3.859
0.03125,3.859
0.04688,3.861
0.06250,11.580
0.09375,11.576
0.12500,11.577
0.18750,11.583
0.25000,11.577
0.37500,11.576
0.50000,11.579
0.75000,11.578
1.00000,11.590
1.50000,13.543
2.00000,13.936
3.00000,13.999
4.00000,13.996
6.00000,13.997
8.00000,14.002
12.00000,14.976
16.00000,19.832
24.00000,20.880
32.00000,21.339
48.00000,21.899
64.00000,22.023
96.00000,22.156
128.00000,22.213
192.00000,22.283
256.00000,22.320
384.00000,22.306
512.00000,22.325
768.00000,22.345
1024.00000,22.361

29
benchmark_python/README.md
Normal file
@ -0,0 +1,29 @@
## Benchmark



## Benchmark details

- Create in memory a list of `N=128` one-dimensional numpy arrays of length `L`, from `L=2^2=4` up to `L=2^22=4_194_304` in powers of two.
- Given that each item of the array is of type `float64`, i.e. 8 bytes, the size of each array ranges from `32B` to `32M`
- The total memory required is at least `128 × 32M × 2 = 8G`
- Load the whole list in one big numpy array of size `N`x`L` (*good*) and `L`x`N` (*bad*). The corresponding loops are:
```python
# good loop (store each time series on a different row)
for row, time_series in enumerate(collection):
    ts[row, :] = time_series
```

```python
# bad loop (store each time series on a different column)
for column, time_series in enumerate(collection):
    ts[:, column] = time_series
```
- Time the *bad* and the *good* loop
- Plot the timings for the *good* and the *bad* loop as a function of `L`
- Plot `slowdown = time_bad/time_good` as a function of `L`

## Scripts

- [Benchmark script](bench.py) and [measurements](results_ns128)
- [Plotting script](bench_plot.py)

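A minimal sketch of why the two loops behave so differently, assuming tiny illustrative sizes `N=4` and `L=8` (the real benchmark uses `N=128` and `L` up to `2^22`): in the default C-ordered array, the row view written by the *good* loop has consecutive elements 8 bytes apart, while the column view written by the *bad* loop jumps `N*8` bytes between elements. The variable names are hypothetical and only meant for this illustration.

```python
import numpy as np

# Illustrative sizes only (the benchmark uses N=128 and L up to 2**22)
N, L = 4, 8

good = np.empty((N, L), dtype='float64')  # one time series per row
bad = np.empty((L, N), dtype='float64')   # one time series per column

row_view = good[0, :]   # what one iteration of the good loop writes to
col_view = bad[:, 0]    # what one iteration of the bad loop writes to

# Distance in bytes between two consecutive elements of each view
print('row view strides:', row_view.strides)     # (8,)
print('column view strides:', col_view.strides)  # (32,), i.e. N * 8 bytes

print(row_view.flags.c_contiguous)  # True
print(col_view.flags.c_contiguous)  # False

# With Fortran (column-major) order the columns become contiguous instead
bad_f = np.empty((L, N), dtype='float64', order='F')
print(bad_f[:, 0].flags.c_contiguous)  # True
```

With `N=128` the column stride is 1024 bytes, so each write of the *bad* loop most likely touches a different cache line (typically 64 bytes on current CPUs, an assumption about the hardware), while the *good* loop fills whole cache lines sequentially.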
68
benchmark_python/bench.py
Executable file
@ -0,0 +1,68 @@
#!/usr/bin/python3
# Commands and ideas from: https://llvm.org/docs/Benchmarking.html
# We assume that CPU 0 and 1 are together in an Intel SMT pair (hyperthreading)
# - Disable address space randomization:
#   echo 0 > /proc/sys/kernel/randomize_va_space
# - Set scaling governor to performance for CPU 0
#   echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# - Reserve CPU 0 for our benchmark
#   cset shield --cpu 0 --kthread=on
# - Disable the SMT peer of CPU 0, i.e. CPU 1
#   echo 0 > /sys/devices/system/cpu/cpu1/online
# - Disable turbo mode (works only on Intel):
#   echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
# - Run with
#   cset shield --exec -- ./bench.py
#

# To use the full power of the CPU, skip all the other steps and
# just run with
#   taskset --cpu-list 0 ./bench.py
import os
import sys
import timeit

import numpy as np

NSERIES = (128, )
POWS = 2**np.arange(2, 23, dtype=int)

# Size of one-dimensional numpy arrays of dtype 'float64':
# a fixed overhead of 96 bytes plus a variable size:
# (n_items x 8 bytes)

def load_data_row(x, time_series):
    """Store one time series per row"""
    for row, ts in enumerate(time_series):
        x[row, :] = ts
    return x

def load_data_column(x, time_series):
    """Store one time series per column"""
    for column, ts in enumerate(time_series):
        x[:, column] = ts
    return x

if __name__ == '__main__':
    for nseries in NSERIES:
        print(30*'=', '\n', nseries)
        float_items = POWS
        byte_sizes = (float_items*8) #+ 96
        bads = []
        goods = []
        results = open(f'results_ns{nseries}', 'wt')
        for i, len_one_series in enumerate(float_items):
            time_series = np.zeros((nseries, len_one_series), dtype='float64')
            # good: one time series per row (contiguous writes)
            x = np.empty((nseries, len_one_series), dtype='float64')
            print('Timing good...')
            good = min(timeit.repeat(lambda: load_data_row(x, time_series), number=5))/5
            # bad: one time series per column (strided writes)
            x = np.empty((len_one_series, nseries), dtype='float64')
            print('Timing bad...')
            bad = min(timeit.repeat(lambda: load_data_column(x, time_series), number=5))/5
            print(f'{len_one_series}/{POWS[-1]} {good} {bad}')
            bads.append(bad)
            goods.append(good)
            results.write(f'{byte_sizes[i]} {good} {bad}\n')
            results.flush()
        results.close()

106
benchmark_python/bench_plot.py
Executable file
@ -0,0 +1,106 @@
#!/usr/bin/python3
import os
import sys

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
plt.style.use('ggplot')
matplotlib.rcParams['font.size'] = 12
matplotlib.rcParams['font.family'] = ['Exo 2', 'sans-serif']

from bench import NSERIES

def get_xlabels(x):
    xlabels = []
    for value in x:
        b = int(2**value)
        if b < 1024:
            xlabels.append(f'{b}B')
        elif b < 1048576:
            xlabels.append(f'{b//1024}K')
        elif b < 1073741824:
            xlabels.append(f'{b//1024//1024}M')
        else:
            xlabels.append(f'{b//1024//1024//1024}G')
    return xlabels

def get_ylabels(y):
    ylabels = []
    for power in y:
        power = int(np.log10(power))
        if power < -6:
            value = 10**(power+9)
            ylabels.append(f'{value}ns')
        elif power < -3:
            value = 10**(power+6)
            ylabels.append(f'{value}μs')
        elif power < 0:
            value = 10**(power+3)
            ylabels.append(f'{value}ms')
        else:
            value = 10**power
            ylabels.append(f'{value}s')
    return ylabels

prefix = 'results_ns'
maxy = 1e1
miny = 1e-6
for results in (f for f in os.listdir('.') if f.startswith(prefix)):
    num_series = results.removeprefix(prefix)

    sizes, bads, goods = [], [], []
    with open(results, 'r') as fh:
        for line in fh:
            size, good, bad = line.split()
            bads.append(float(bad))
            goods.append(float(good))
            sizes.append(int(size))
    goods = np.array(goods)
    bads = np.array(bads)
    x = np.log2(sizes)
    y1 = goods
    y2 = bads
    # generate two plots: good+bad timings and slowdown plot
    plt.figure(figsize=(8.5, 7.5))
    p1, = plt.semilogy(x, y1, 'o')
    p2, = plt.semilogy(x, y2, 'o')
    plt.xlabel('size of one time series')
    plt.ylabel('loading time')
    plt.grid(None)
    plt.grid(which='both', axis='both')
    plt.xticks(x, get_xlabels(x), rotation=60)
    plt.ylim(miny, maxy)
    yticks = np.logspace(int(np.log10(miny)),
                         int(np.log10(maxy)),
                         num=int(np.log10(maxy/miny))+1)
    plt.yticks(yticks, get_ylabels(yticks))
    plt.tick_params(axis='y', labelright=True, right=True)
    lgd = plt.legend((p1, p2), ('good', 'bad'), frameon=True)
    lgd.get_frame().set_edgecolor('black')
    plt.title(f'Timings\n{num_series} time series')
    plt.savefig(f'loading-timings-ns{num_series}.svg')

    # slowdown plot
    plt.figure(figsize=(8.5, 7.5))
    p1, = plt.plot(x, bads/goods, 'og', label=r'$\frac{\mathrm{time\_bad}}{\mathrm{time\_good}}$')
    plt.xlabel('size of one time series')
    plt.ylabel('slowdown')
    plt.grid(None)
    plt.grid(which='both', axis='both')
    plt.xticks(x, get_xlabels(x), rotation=60)
    plt.tick_params(axis='y', which='both', reset=True, labelright=True, right=True)
    lmaxy = (bads/goods).max()
    yticks = range(0, int(np.ceil(lmaxy))+1)
    yticks_labels = []
    for i in yticks:
        if not i%5:
            yticks_labels.append(str(i))
        else:
            yticks_labels.append('')
    plt.yticks(yticks, yticks_labels)
    #plt.legend((p1,), ('time_bad/time_good',))
    lgd = plt.legend(frameon=True, fontsize=16)
    lgd.get_frame().set_edgecolor('black')
    plt.title(f'Slowdown\n{num_series} time series')
    plt.savefig(f'loading-slowdown-ns{num_series}.svg')

2360
benchmark_python/loading-slowdown-ns128.svg
Normal file
After Width: | Height: | Size: 70 KiB |
2348
benchmark_python/loading-timings-ns128.svg
Normal file
After Width: | Height: | Size: 73 KiB |
21
benchmark_python/results_ns128
Normal file
@ -0,0 +1,21 @@
32 3.624640012276359e-05 3.637080008047633e-05
64 3.6378599907038735e-05 3.75960000383202e-05
128 3.5999999818159266e-05 3.659379981399979e-05
256 3.647979974630289e-05 3.8503600080730394e-05
512 3.6466600067797114e-05 4.5260399929247797e-05
1024 3.694279985211324e-05 6.389359987224452e-05
2048 3.771199990296736e-05 0.00010158859986404422
4096 3.9600799937034026e-05 0.00017927039989444892
8192 4.872880017501302e-05 0.00035019980023207606
16384 0.0001166390000435058 0.0009462125999561977
32768 0.00020714380007120782 0.0018965299997944385
65536 0.00041309340012958274 0.003799166400131071
131072 0.0009548601999995298 0.013321203599844011
262144 0.002020656999957282 0.03497214360031649
524288 0.004090915599954314 0.07588242120000359
1048576 0.008223017600175807 0.15468134920010926
2097152 0.01658681320004689 0.3135109368002304
4194304 0.03318921820027754 0.629940018599882
8388608 0.031091862199900788 1.2587968365998676
16777216 0.0620614524003031 2.517574722000063
33554432 0.125347068800329 5.040295794399936

241
exercise-my-solution.ipynb
Normal file
@ -0,0 +1,241 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T09:40:28.904Z",
     "iopub.status.busy": "2024-03-04T09:40:28.896Z",
     "iopub.status.idle": "2024-03-04T09:40:28.978Z",
     "shell.execute_reply": "2024-03-04T09:40:28.967Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:02:39.062Z",
     "iopub.status.busy": "2024-03-04T10:02:39.057Z",
     "iopub.status.idle": "2024-03-04T10:02:39.068Z",
     "shell.execute_reply": "2024-03-04T10:02:39.071Z"
    }
   },
   "outputs": [],
   "source": [
    "n_series = 32\n",
    "len_one_series = 5*2**20\n",
    "time_series = np.random.rand(n_series, len_one_series)\n",
    "gap = 16*2**10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:02:41.027Z",
     "iopub.status.busy": "2024-03-04T10:02:41.020Z",
     "iopub.status.idle": "2024-03-04T10:02:41.036Z",
     "shell.execute_reply": "2024-03-04T10:02:41.040Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Size of one time series: 40 M\n",
      "Size of collection: 1280 M\n",
      "Gap size: 128 K\n",
      "Gapped series size: 2 K\n"
     ]
    }
   ],
   "source": [
    "print(f'Size of one time series: {int(time_series[0].nbytes/2**20)} M')\n",
    "print(f'Size of collection: {int(time_series.nbytes/2**20)} M')\n",
    "print(f'Gap size: {int(gap*8/2**10)} K')\n",
    "print(f'Gapped series size: {int(time_series[0, ::gap].nbytes/2**10)} K')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following function computes, for each time series, a truncated power series over every `gap`-th value of the series.\n",
    "\n",
    "If we define one time series of length `N` to be:\n",
    "\n",
    "$[x_0, x_1, x_2, ..., x_{N-1}]$,\n",
    "\n",
    "then the \"gapped\" series with `gap=g` is:\n",
    "\n",
    "$[x_0, x_g, x_{2g}, ..., x_{kg}]$,\n",
    "\n",
    "where $k = \\lfloor (N-1)/g \\rfloor$, so that the gapped series has $k+1 = \\lceil N/g \\rceil$ elements.\n",
    "\n",
    "The approximation of the power series up to power `29` (i.e. `range(30)` in the code) for our \"gapped\" series is defined as:\n",
    "\n",
    "$$\\mathbf{P} = \\sum_{p=0}^{29} \\sum_i x_i^{p} = \\sum_i x_i^0 + \\sum_i x_i^1 + \\sum_i x_i^2 + ... + \\sum_i x_i^{29} $$\n",
    "\n",
    "where $i \\in \\{0, g, 2g, ..., kg\\}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:08.461Z",
     "iopub.status.busy": "2024-03-04T10:06:08.459Z",
     "iopub.status.idle": "2024-03-04T10:06:08.466Z",
     "shell.execute_reply": "2024-03-04T10:06:08.468Z"
    }
   },
   "outputs": [],
   "source": [
    "# compute an approximation of a power series for a collection of gapped time series\n",
    "def power(time_series, P, gap):\n",
    "    for row in range(time_series.shape[0]):\n",
    "        for pwr in range(30):\n",
    "            P[row] += (time_series[row, ::gap]**pwr).sum()\n",
    "    return P\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Challenge\n",
    "- Can you improve on the above implementation of the `power` function?\n",
    "- Change the following `power_improved` function and see what you can do\n",
    "- **Remember**: you can't change any other cell in this notebook!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:08.461Z",
     "iopub.status.busy": "2024-03-04T10:06:08.459Z",
     "iopub.status.idle": "2024-03-04T10:06:08.466Z",
     "shell.execute_reply": "2024-03-04T10:06:08.468Z"
    }
   },
   "outputs": [],
   "source": [
    "def power_improved(time_series, P, gap):\n",
    "    # copy the gapped (strided) values once into a small contiguous array\n",
    "    y = time_series[:,::gap].copy()\n",
    "    for row in range(time_series.shape[0]):\n",
    "        for pwr in range(30):\n",
    "            P[row] += (y[row, :]**pwr).sum()\n",
    "    return P"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# verify that they yield the same results\n",
    "P = np.zeros(n_series, dtype='float64')\n",
    "out1 = power(time_series, P, gap)\n",
    "P = np.zeros(n_series, dtype='float64')\n",
    "out2 = power_improved(time_series, P, gap)\n",
    "np.testing.assert_allclose(out1, out2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:14.959Z",
     "iopub.status.busy": "2024-03-04T10:06:14.956Z",
     "iopub.status.idle": "2024-03-04T10:06:17.437Z",
     "shell.execute_reply": "2024-03-04T10:06:17.443Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "38.9 ms ± 492 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
     ]
    }
   ],
   "source": [
    "P = np.zeros(n_series, dtype='float64')\n",
    "%timeit power(time_series, P, gap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-03-04T10:06:20.056Z",
     "iopub.status.busy": "2024-03-04T10:06:20.053Z",
     "iopub.status.idle": "2024-03-04T10:06:21.695Z",
     "shell.execute_reply": "2024-03-04T10:06:21.700Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "6.79 ms ± 35.2 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
     ]
    }
   ],
   "source": [
    "P = np.zeros(n_series, dtype='float64')\n",
    "%timeit power_improved(time_series, P, gap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  },
  "nteract": {
   "version": "0.28.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

31
numpy/README.md
Normal file
@ -0,0 +1,31 @@
# Anatomy of a numpy array
## one dimension, float64


## two dimensions, square, float64


## two dimensions, rectangular, int32


## what about Python lists?


## interesting attributes of numpy arrays
- `x.data`, `x.data.hex()`, `x.data.format`, `x.tobytes()`
- `x.flags`:
  - `OWNDATA`
  - `C_CONTIGUOUS`
  - `F_CONTIGUOUS`
  - more [flags](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flags.html)
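A short sketch of the attributes listed above, assuming a tiny hand-written `int32` array (values chosen only for illustration): a slice is a view that does not own its buffer, and a transpose only swaps the strides, turning a C-contiguous array into a Fortran-contiguous one without copying.

```python
import numpy as np

# A small 2x3 int32 array, created directly so that it owns its buffer
x = np.array([[0, 1, 2],
              [3, 4, 5]], dtype='int32')

print(x.flags['OWNDATA'], x.flags['C_CONTIGUOUS'], x.flags['F_CONTIGUOUS'])
# True True False

# Slicing a column gives a view: no copy, data not owned, not contiguous
col = x[:, 1]
print(col.flags['OWNDATA'], col.flags['C_CONTIGUOUS'])
# False False

# Transposing swaps the strides: same buffer, now Fortran-contiguous
xt = x.T
print(x.strides, xt.strides)  # (12, 4) (4, 12)
print(xt.flags['C_CONTIGUOUS'], xt.flags['F_CONTIGUOUS'])
# False True

# tobytes() serialises the *logical* array in C order by default,
# so the transposed view yields a different byte sequence
print(x.tobytes() == xt.tobytes())  # False

# For a C-contiguous array, the raw buffer exposed by x.data matches tobytes()
print(x.data.hex() == x.tobytes().hex())  # True
```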

## If your arrays are bigger than RAM
- [`numpy.memmap`](https://numpy.org/doc/stable/reference/generated/numpy.memmap.html): an array-like
  object that maps memory to an array stored in a binary file on disk, used for accessing small segments of large
  files on disk without reading the entire file into memory. Use with caution!

- [`HDF5`](https://support.hdfgroup.org/documentation/hdf5/latest/_intro_h_d_f5.html): hierarchical
  data format, with arbitrary metadata and multi-language support,
  with [`h5py`](https://docs.h5py.org/en/stable/) providing an array-like interface
- other projects, for example [`xarray`](https://docs.xarray.dev/en/stable/)
  and [`zarr`](https://zarr.readthedocs.io/en/stable/)

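A minimal `numpy.memmap` sketch, assuming a hypothetical file `big_array.dat` in a temporary directory and a deliberately small shape so it runs anywhere: the array is written through the mapping, flushed to disk, then reopened read-only so that only the pages backing the requested rows have to be read.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and shape, chosen only for illustration
path = os.path.join(tempfile.mkdtemp(), 'big_array.dat')
shape = (1000, 256)

# Create the file on disk and fill it through the mapping
mm = np.memmap(path, dtype='float64', mode='w+', shape=shape)
mm[:] = np.arange(shape[0])[:, None]   # row i is filled with the value i
mm.flush()                             # make sure changes reach the file
del mm                                 # drop the mapping

# Reopen read-only and touch only a small segment of the file
ro = np.memmap(path, dtype='float64', mode='r', shape=shape)
segment = np.array(ro[500:502, :4])    # copy the slice into an ordinary array
print(segment)
print(os.path.getsize(path))           # 1000 * 256 * 8 = 2048000 bytes
```

Note that the dtype and shape are not stored in the file: they must be passed again when reopening, which is one reason to use it with caution.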
1
numpy/ndarray-memory-layout-1d.svg
Normal file
After Width: | Height: | Size: 60 KiB |
1
numpy/ndarray-memory-layout-2d-rectangular.svg
Normal file
After Width: | Height: | Size: 69 KiB |
1
numpy/ndarray-memory-layout-2d-square.svg
Normal file
After Width: | Height: | Size: 78 KiB |
1
numpy/python-list-memory-layout.svg
Normal file
After Width: | Height: | Size: 64 KiB |
24
parallel/README.md
Normal file
@ -0,0 +1,24 @@
# The dangers and joys of automatic parallelization (like in numpy linear algebra routines) and the use of clusters/schedulers (but also on your laptop)
- Go through the [notebook](../parallel.ipynb) to play around with numpy auto-parallelization, CPU affinity and OpenMP thread pool control

- Now we want to submit our code to a cluster, or even just run it in parallel on our own laptop:
  - run [`overcommit.py`](overcommit.py) while monitoring with htop
  - try the [`submit.sh`](submit.sh) script
  - see problems with overcommitting
  - explain the PSI (Pressure Stall Information) fields in `htop`. Useful readings:
    - https://docs.kernel.org/accounting/psi.html
    - https://facebookmicrosites.github.io/psi/docs/overview
  - discuss implications for local and cluster workflows
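A minimal sketch for reading the same PSI numbers that `htop` displays, assuming a Linux kernel with PSI enabled (files under `/proc/pressure/`). The line format `some avg10=... avg60=... avg300=... total=...` follows the kernel documentation linked above; the helper name `parse_psi` is hypothetical.

```python
from pathlib import Path

def parse_psi(text):
    """Parse the content of a /proc/pressure/* file into a nested dict.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg10/avg60/avg300 are the percentages of time stalled over the last
    10, 60 and 300 seconds; total is the cumulative stall time in microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()   # kind is 'some' or 'full'
        values = {}
        for field in fields:
            key, value = field.split('=')
            values[key] = int(value) if key == 'total' else float(value)
        result[kind] = values
    return result

if __name__ == '__main__':
    for resource in ('cpu', 'memory', 'io'):
        try:
            psi = parse_psi((Path('/proc/pressure') / resource).read_text())
        except OSError:   # missing file or PSI disabled in the kernel
            print(f'{resource}: PSI not available on this system')
            continue
        print(f"{resource}: some avg10 = {psi['some']['avg10']} %")
```

Polling these files while `overcommit.py` runs is one way to see the stall percentages rise as the machine gets overcommitted.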

# Hands-on
- Let's try to make it more quantitative:
  - Write a benchmark in the style of [benchmark_python](../benchmark_python/bench.py)
  - We want to assess the performance of matrix multiplication as a function of:
    - the size of the matrix `N`
    - the number of OpenMP threads `T`, controlled with `threadpoolctl` or by the environment variable `OMP_NUM_THREADS`
    - the number of processes `P`, controlled by the [`submit.sh`](submit.sh) script or something similar
  - The results will of course depend on the particular architecture of the machine on which you are running
  - Submit your benchmark, together with some plotting routines, as a PR to this repo!

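One possible starting point for the hands-on, as a minimal sketch: it times `A @ B` for a few sizes `N` and thread counts `T` using `threadpoolctl.threadpool_limits`, and falls back to the default thread pool if `threadpoolctl` is not installed. The sizes, thread counts and repetition counts, as well as the helper name `time_matmul`, are illustrative assumptions; varying the number of processes `P` is left to `submit.sh`.

```python
import contextlib
import timeit

import numpy as np

try:
    from threadpoolctl import threadpool_limits
except ImportError:   # threadpoolctl not installed: threads cannot be limited
    threadpool_limits = None

SIZES = (128, 256, 512)   # illustrative matrix sizes N
THREADS = (1, 2)          # illustrative thread counts T

def time_matmul(n, threads, repeat=3, number=3):
    """Best-of-`repeat` time in seconds of one n x n float64 matrix product."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    if threadpool_limits is not None:
        # limit the BLAS thread pool used by numpy for the duration of the block
        ctx = threadpool_limits(limits=threads, user_api='blas')
    else:
        ctx = contextlib.nullcontext()
    with ctx:
        best = min(timeit.repeat(lambda: a @ b, repeat=repeat, number=number))
    return best / number

if __name__ == '__main__':
    for n in SIZES:
        for t in THREADS:
            print(f'N={n:5d} T={t} {time_matmul(n, t):.6f} s')
```

A context manager is convenient here because `OMP_NUM_THREADS` is typically read only once, when the BLAS library is loaded at `import numpy`, so changing it inside a running script usually has no effect.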
37
setup/generate_materials.py
Normal file
@ -0,0 +1,37 @@
# Generate one page per memory address (0x00 up to N-1, in hex) and one per
# register from the SVG template, then join all pages into materials.pdf.
# Requires the external tools inkscape and pdftk.
MSVG = 'template.svg'   # template file
TEMPL = '0xNN'          # placeholder text in the template
NREG = 4                # number of registers
N = 30                  # number of memory addresses


import sys
import subprocess

if N > 255:
    # addresses are printed with two hex digits
    print('cannot do it!')
    sys.exit(1)

# two-digit hex addresses 0x00, 0x01, ..., followed by REG0, REG1, ...
addresses = [f'0x{i:02x}' for i in range(N)] + \
            [f'REG{i}' for i in range(NREG)]

with open(MSVG) as template:
    msvg = template.read()
delete = []
to_join = []
for a in addresses:
    print(f'Processing {a}...')
    new_svg = a+MSVG
    new_svg_pdf = new_svg.replace('.svg', '.pdf')
    with open(new_svg, 'wt') as svg:
        svg.write(msvg.replace(TEMPL,a))
    delete.extend([new_svg, new_svg_pdf])
    subprocess.run(['inkscape', '--export-filename='+new_svg_pdf, new_svg], capture_output=True)
    to_join.append(new_svg_pdf)

subprocess.run(['pdftk'] + to_join + ['output', 'materials.pdf'], capture_output=True)
subprocess.check_call(['rm'] + delete)

BIN
setup/materials.pdf
Normal file
105
setup/template.svg
Normal file
|
@ -0,0 +1,105 @@
|
|||
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||
<!-- Created with Inkscape (http://www.inkscape.org/) -->
|
||||
|
||||
<svg
|
||||
width="297mm"
|
||||
height="210mm"
|
||||
viewBox="0 0 297 210"
|
||||
version="1.1"
|
||||
id="svg1"
|
||||
inkscape:version="1.4 (e7c3feb100, 2024-10-09)"
|
||||
sodipodi:docname="template.svg"
|
||||
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
|
||||
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
|
||||
xmlns="http://www.w3.org/2000/svg"
|
||||
xmlns:svg="http://www.w3.org/2000/svg">
|
||||
<sodipodi:namedview
|
||||
id="namedview1"
|
||||
pagecolor="#ffffff"
|
||||
bordercolor="#000000"
|
||||
borderopacity="0.25"
|
||||
inkscape:showpageshadow="2"
|
||||
inkscape:pageopacity="0.0"
|
||||
inkscape:pagecheckerboard="0"
|
||||
inkscape:deskcolor="#d1d1d1"
|
||||
inkscape:document-units="mm"
|
||||
showguides="false"
|
||||
inkscape:zoom="0.8979798"
|
||||
inkscape:cx="543.44207"
|
||||
inkscape:cy="383.08211"
|
||||
inkscape:window-width="1681"
|
||||
inkscape:window-height="1210"
|
||||
inkscape:window-x="654"
|
||||
inkscape:window-y="108"
|
||||
inkscape:window-maximized="0"
|
||||
inkscape:current-layer="layer1" />
|
||||
<defs
|
||||
id="defs1">
|
||||
<rect
|
||||
x="416.49044"
|
||||
y="455.46682"
|
||||
width="189.31384"
|
||||
height="63.475816"
|
||||
id="rect3" />
|
||||
</defs>
|
||||
<g
|
||||
inkscape:label="Layer 1"
|
||||
inkscape:groupmode="layer"
|
||||
id="layer1">
|
||||
<path
|
||||
style="fill:#000000;stroke-width:0.264999"
|
||||
d="M 8.5446427,90.75 C 284.33035,91.339285 283.74107,91.339285 283.74107,91.339285"
|
||||
id="path1" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
transform="scale(0.26458333)"
|
||||
id="text3"
|
||||
style="font-size:16px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-1.2px;writing-mode:lr-tb;direction:ltr;white-space:pre;shape-inside:url(#rect3);fill:#000000;stroke:#000000;stroke-width:11.3386;stroke-dasharray:none;stroke-opacity:1" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:4.23333px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
|
||||
x="113.4375"
|
||||
y="140.83928"
id="text4"><tspan
sodipodi:role="line"
id="tspan4"
style="stroke-width:0.1;stroke-dasharray:none;stroke:#000000;fill:#000000"
x="113.4375"
y="140.83928" /></text>
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="M 13.503009,105.28242 C 283.49699,104.71758 283.49699,104.71758 283.49699,104.71758"
id="path2" />
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="M 13.503011,199.34547 C 283.49699,198.78063 283.49699,198.78063 283.49699,198.78063"
id="path2-1" />
<path
style="fill:#000000;stroke:#000000;stroke-width:2.87555;stroke-dasharray:11.50220013,2.87555003;stroke-opacity:1;stroke-dashoffset:0"
d="m 13.50301,12.253503 c 269.99398,-0.56484 269.99398,-0.56484 269.99398,-0.56484"
id="path2-4" />
<text
xml:space="preserve"
style="font-size:50.8px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
x="84.542786"
y="162.20676"
id="text5"><tspan
sodipodi:role="line"
id="tspan5"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:50.8px;font-family:'MesloLGL Nerd Font Mono';-inkscape-font-specification:'MesloLGL Nerd Font Mono';stroke-width:0.1"
x="84.542786"
y="162.20676">0xNN</tspan></text>
<text
xml:space="preserve"
style="font-size:50.8px;line-height:100%;font-family:'Exo 2';-inkscape-font-specification:'Exo 2, ';text-align:start;text-decoration-color:#000000;letter-spacing:0px;word-spacing:-0.3175px;writing-mode:lr-tb;direction:ltr;text-anchor:start;fill:#000000;stroke:#000000;stroke-width:0.1;stroke-dasharray:none;stroke-opacity:1"
x="-212.45721"
y="-47.793243"
id="text5-7"
transform="scale(-1)"><tspan
sodipodi:role="line"
id="tspan5-4"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:50.8px;font-family:'MesloLGL Nerd Font Mono';-inkscape-font-specification:'MesloLGL Nerd Font Mono';stroke-width:0.1"
x="-212.45721"
y="-47.793243">0xNN</tspan></text>
</g>
</svg>