added low level benchmark
This commit is contained in:
parent
f8b83015fa
commit
05e5e18e59
57
benchmark_low_level/README.md
Normal file
57
benchmark_low_level/README.md
Normal file
|
@ -0,0 +1,57 @@
|
|||
# Low Level Memory Benchmark
|
||||
|
||||
These are the results of a low level memory benchmark (written in C) on my [laptop](../architecture/README.md)
|
||||
|
||||
## Summary plots (details below)
|
||||
![Memory Bandwidth P-core](bandwidth-t14.svg)
|
||||
![Memory Latency P-core](latency-t14.svg)
|
||||
|
||||
## Benchmarks details:
|
||||
|
||||
- Bandwidth (read), [bw_mem_rd](http://lmbench.sourceforge.net/man/bw_mem_rd.8.html). Allocate the specified amount of memory, zero it, and then time the reading of that memory as a series of integer loads and adds. Each 4-byte integer is loaded and added to accumulator.
|
||||
|
||||
[Results](t14-bwr.csv) (block size in MB, bandwith in MB/s)
|
||||
- Bandwidth (write),[bw_mem](http://lmbench.sourceforge.net/man/bw_mem.8.html). Allocate twice the specified amount of memory, zero it, and then time the copying of the first half to the second half.
|
||||
|
||||
[Results](t14-bww.csv) (block size in MB, bandwith in MB/s)
|
||||
- Latency (sequential access), [lat_mem_rd](http://lmbench.sourceforge.net/man/lat_mem_rd.8.html). Run two nested loops. The outer loop is the stride size of 128 bytes. The inner loop is the block size. For each block size, create a ring of pointers that point backward one stride. Traverse the block by `p = (char **)*p` in a for loop and time the load ladency over block.
|
||||
|
||||
[Results](t14-lseq.csv) (block size in MB, latency in ns)
|
||||
- Latency (random access). Like above, but with a stride size of 16 bytes.
|
||||
|
||||
[Results](t14-lrnd.csv) (block size in MB, latency in ns)
|
||||
|
||||
## Running the benchmarks on Linux:
|
||||
- You need the [lmbench](http://lmbench.sourceforge.net/) library and [cpuset](https://github.com/SUSE/cpuset)
|
||||
- All commands must be run as root after having killed as many processes/services as possible, so that the CPUs are almost idle
|
||||
- Disable address space randomization:
|
||||
```bash
|
||||
echo 0 > /proc/sys/kernel/randomize_va_space
|
||||
```
|
||||
- Set scaling governor to performance for CPU0:
|
||||
```bash
|
||||
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
|
||||
```
|
||||
- Reserve CPU 0 fro our benchmark, i.e. kick out (almost) all other processes
|
||||
```bash
|
||||
cset shield --cpu 0 --kthread=on
|
||||
```
|
||||
- If you are on INTEL and CPU0 is part of a SMT-pair (hyperthreading), disable the peer
|
||||
```bash
|
||||
echo 0 > /sys/devices/system/cpu/cpu1/online
|
||||
```
|
||||
- Disable turbo mode on INTEL:
|
||||
```bash
|
||||
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
|
||||
```
|
||||
- Run the configuration script for lmbench. Select only the `HARDWARE` set of benchmarks and set the maximum amount of memory to something like 1024MB
|
||||
```bash
|
||||
cd /usr/lib/lmbench/scripts
|
||||
# the following command will create the configuration file /usr/lib/lmbench/bin/x86_64-linux-gnu/CONFIG.<hostname>
|
||||
cset shield --exec -- ./config-run
|
||||
# run the benchmark
|
||||
cset shield --exec -- /usr/bin/lmbench-run
|
||||
# results are in /var/lib/lmbench/results/x86_64-linux-gnu/<hostname>
|
||||
```
|
||||
|
||||
|
1721
benchmark_low_level/bandwidth-cpu4.svg
Normal file
1721
benchmark_low_level/bandwidth-cpu4.svg
Normal file
File diff suppressed because it is too large
Load diff
After Width: | Height: | Size: 50 KiB |
1672
benchmark_low_level/bandwidth-t14.svg
Normal file
1672
benchmark_low_level/bandwidth-t14.svg
Normal file
File diff suppressed because it is too large
Load diff
After Width: | Height: | Size: 49 KiB |
1796
benchmark_low_level/latency-cpu4.svg
Normal file
1796
benchmark_low_level/latency-cpu4.svg
Normal file
File diff suppressed because it is too large
Load diff
After Width: | Height: | Size: 52 KiB |
1769
benchmark_low_level/latency-t14.svg
Normal file
1769
benchmark_low_level/latency-t14.svg
Normal file
File diff suppressed because it is too large
Load diff
After Width: | Height: | Size: 52 KiB |
39
benchmark_low_level/parse_results.py
Normal file
39
benchmark_low_level/parse_results.py
Normal file
|
@ -0,0 +1,39 @@
|
|||
import os
|
||||
import sys
|
||||
|
||||
# prefix is something like results_
|
||||
results = sys.argv[1]
|
||||
name = results.removeprefix('results_')
|
||||
types = {}
|
||||
results = open(results, 'rt')
|
||||
|
||||
|
||||
for idx, line in enumerate(results):
|
||||
if line.startswith('Memory read bandwidth'):
|
||||
types['bwr'] = idx
|
||||
elif line.startswith('Memory write bandwidth'):
|
||||
types['bww'] = idx
|
||||
elif line.startswith('Memory load latency'):
|
||||
types['lseq'] = idx
|
||||
elif line.startswith('Random load latency'):
|
||||
types['lrnd'] = idx
|
||||
else:
|
||||
pass
|
||||
|
||||
for typ, idx in types.items():
|
||||
csv = open(f'{name}-{typ}.csv', 'wt')
|
||||
results.seek(0)
|
||||
for count, line in enumerate(results):
|
||||
if count <= idx:
|
||||
continue
|
||||
if line.startswith('"'):
|
||||
continue
|
||||
try:
|
||||
val1, val2 = line.split(" ")
|
||||
except ValueError:
|
||||
# we are at the end of the section
|
||||
csv.close()
|
||||
break
|
||||
csv.write(f'{val1},{val2}')
|
||||
|
||||
|
111
benchmark_low_level/plot.py
Normal file
111
benchmark_low_level/plot.py
Normal file
|
@ -0,0 +1,111 @@
|
|||
import os
|
||||
import sys
|
||||
import numpy as np
|
||||
import matplotlib
|
||||
import itertools
|
||||
from matplotlib import pyplot as plt
|
||||
plt.style.use('ggplot')
|
||||
matplotlib.rcParams['font.size'] = 12
|
||||
|
||||
name = 't14'
|
||||
|
||||
caches = (48*1024, 1280*1024, 12*1024*1024)
|
||||
|
||||
def get_labels(x):
|
||||
xlabels = []
|
||||
for value in x:
|
||||
b = int(2**value)
|
||||
if b < 1024:
|
||||
xlabels.append(f'{b}B')
|
||||
elif b < 1048576:
|
||||
xlabels.append(f'{b//1024}K')
|
||||
elif b < 1073741824:
|
||||
xlabels.append(f'{b//1024//1024}M')
|
||||
else:
|
||||
xlabels.append(f'{b//1024//1024//1024}G')
|
||||
return xlabels
|
||||
|
||||
|
||||
# manually set ticks, to disable, set ticks = None
|
||||
|
||||
line = np.linspace(1, 10, 9, endpoint=False)
|
||||
yticks = list(line)+list(line*10)+list(line[:2]*100)
|
||||
|
||||
ylabels = (1, 10, 100)
|
||||
ticks = {'l': (yticks, [str(int(i)) if i in ylabels else '' for i in yticks]),
|
||||
'bw': (range(5,46,5), range(5,46,5)),
|
||||
}
|
||||
|
||||
# manually set limits, to disable set to ylim = None
|
||||
|
||||
ylim = {'l' : (1, 200),
|
||||
'bw' : (5,45),
|
||||
}
|
||||
|
||||
for type_ in ('bw', 'l'):
|
||||
if type_ == 'bw':
|
||||
suffix = ('r', 'w')
|
||||
ylabel = ''
|
||||
title = f'Memory Bandwidth ({name}) [GB/s]'
|
||||
legend1, legend2 = 'read', 'write'
|
||||
pic = f'bandwidth-{name}.svg'
|
||||
plt_func = plt.plot
|
||||
else:
|
||||
suffix = ('seq', 'rnd')
|
||||
ylabel = ''
|
||||
title = f'Memory Latency ({name}) [ns]'
|
||||
legend1, legend2 = 'sequential access', 'random access'
|
||||
pic = f'latency-{name}.svg'
|
||||
plt_func = plt.semilogy
|
||||
|
||||
|
||||
data1 = np.loadtxt(f'{name}-{type_}{suffix[0]}.csv', delimiter=',')
|
||||
data2 = np.loadtxt(f'{name}-{type_}{suffix[1]}.csv', delimiter=',')
|
||||
|
||||
# convert to bytes and then to the corresponding power of two
|
||||
|
||||
if type_ == 'bw':
|
||||
x1 = np.log2(data1[:,0]*1024*1024).round()
|
||||
y1 = data1[:,1]/1024
|
||||
x2 = np.log2(data2[:,0]*1024*1024).round()
|
||||
y2 = data2[:,1]/1024
|
||||
else:
|
||||
x1 = np.log2(data1[::2,0]*1024*1024).round()
|
||||
y1 = data1[::2,1]
|
||||
x2 = np.log2(data2[::2,0]*1024*1024).round()
|
||||
y2 = data2[::2,1]
|
||||
ylabels = None
|
||||
|
||||
|
||||
xlabel = 'block size'
|
||||
xlabels = get_labels(x1)
|
||||
|
||||
plt.figure(figsize=(8.5,7.5))
|
||||
p1, = plt_func(x1, y1, 'o')
|
||||
plt.ylabel(ylabel)
|
||||
plt.xlabel(xlabel)
|
||||
p2, = plt_func(x2, y2, 'o')
|
||||
if ylim and type_ in ylim:
|
||||
plt.ylim(*ylim[type_])
|
||||
plt.xticks(x1, xlabels, rotation=60)
|
||||
if ticks and type_ in ticks:
|
||||
plt.yticks(*ticks[type_])
|
||||
plt.legend((p1, p2), (legend1, legend2))
|
||||
if ylim and type_ in ylim:
|
||||
miny, maxy = ylim[type_]
|
||||
else:
|
||||
miny = min(y1.min(), y2.min())
|
||||
maxy = max(y1.max(), y2.max())
|
||||
# caches
|
||||
for idx, cache in enumerate(caches):
|
||||
level = idx + 1
|
||||
size = np.log2(cache)
|
||||
plt.plot((size, size), (miny, maxy),
|
||||
color = 'darkblue', alpha=0.4)
|
||||
plt.text(size-1, 2*miny, f'L{level}\n⟵',
|
||||
color='darkblue', verticalalignment='top')
|
||||
|
||||
plt.title(title)
|
||||
plt.savefig(pic)
|
||||
|
||||
|
485
benchmark_low_level/results_t14
Normal file
485
benchmark_low_level/results_t14
Normal file
|
@ -0,0 +1,485 @@
|
|||
[lmbench3.0 results for Linux multivac 6.10.3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04) x86_64 GNU/Linux]
|
||||
[LMBENCH_VER: 3.0-a9]
|
||||
[BENCHMARK_HARDWARE: YES]
|
||||
[BENCHMARK_OS: NO]
|
||||
[ALL: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m 1024m]
|
||||
[DISKS: ]
|
||||
[DISK_DESC: ]
|
||||
[ENOUGH: 5000]
|
||||
[FAST: ]
|
||||
[FASTMEM: NO]
|
||||
[FILE: /var/tmp/lmbench/XXX]
|
||||
[FSDIR: /var/tmp/lmbench]
|
||||
[HALF: 512 1k 2k 4k 8k 16k 32k 64k 128k 256k 512k 1m 2m 4m 8m 16m 32m 64m 128m 256m 512m]
|
||||
[INFO: INFO.multivac]
|
||||
[LINE_SIZE: ]
|
||||
[LOOP_O: 0.00000000]
|
||||
[MB: 1024]
|
||||
[MHZ: 1296 MHz, 0.7716 nanosec clock]
|
||||
[MOTHERBOARD: ]
|
||||
[NETWORKS: ]
|
||||
[PROCESSORS: 11]
|
||||
[REMOTE: ]
|
||||
[SLOWFS: YES]
|
||||
[OS: x86_64-linux-gnu]
|
||||
[SYNC_MAX: 1]
|
||||
[LMBENCH_SCHED: DEFAULT]
|
||||
[TIMING_O: 0]
|
||||
[LMBENCH VERSION: 3.0-20240810]
|
||||
[USER: root]
|
||||
[HOSTNAME: multivac]
|
||||
[NODENAME: multivac]
|
||||
[SYSNAME: Linux]
|
||||
[PROCESSOR: unknown]
|
||||
[MACHINE: x86_64]
|
||||
[RELEASE: 6.10.3-amd64]
|
||||
[VERSION: #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04)]
|
||||
[Sat Aug 10 04:20:43 PM CEST 2024]
|
||||
[ 16:20:43 up 1:18, 4 users, load average: 0.37, 0.94, 1.05]
|
||||
[net: Kernel Interface table]
|
||||
[net: Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg]
|
||||
[net: eth0 1500 0 0 0 0 0 0 0 0 BMU]
|
||||
[if: eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500]
|
||||
[if: ether fc:5c:ee:4d:b5:eb txqueuelen 1000 (Ethernet)]
|
||||
[if: RX packets 0 bytes 0 (0.0 B)]
|
||||
[if: RX errors 0 dropped 0 overruns 0 frame 0]
|
||||
[if: TX packets 0 bytes 0 (0.0 B)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: device interrupt 16 memory 0xbc300000-bc320000]
|
||||
[if: ]
|
||||
[net: eth1 1500 34948 0 2352 0 7773 0 0 0 BMRU]
|
||||
[if: eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500]
|
||||
[if: inet 192.168.111.103 netmask 255.255.255.0 broadcast 192.168.111.255]
|
||||
[if: inet6 fe80::44e3:4a35:5130:3045 prefixlen 64 scopeid 0x20<link>]
|
||||
[if: inet6 2003:ef:2f2e:c900:e437:85c7:3d97:f353 prefixlen 64 scopeid 0x0<global>]
|
||||
[if: ether b0:4f:13:ef:1a:3e txqueuelen 1000 (Ethernet)]
|
||||
[if: RX packets 34948 bytes 33936985 (32.3 MiB)]
|
||||
[if: RX errors 0 dropped 2352 overruns 0 frame 0]
|
||||
[if: TX packets 7773 bytes 1213416 (1.1 MiB)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: ]
|
||||
[net: lo 65536 95 0 0 0 95 0 0 0 LRU]
|
||||
[if: lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536]
|
||||
[if: inet 127.0.0.1 netmask 255.0.0.0]
|
||||
[if: inet6 ::1 prefixlen 128 scopeid 0x10<host>]
|
||||
[if: loop txqueuelen 1000 (Local Loopback)]
|
||||
[if: RX packets 95 bytes 5588 (5.4 KiB)]
|
||||
[if: RX errors 0 dropped 0 overruns 0 frame 0]
|
||||
[if: TX packets 95 bytes 5588 (5.4 KiB)]
|
||||
[if: TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0]
|
||||
[if: ]
|
||||
[mount: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: proc on /proc type proc (rw,relatime)]
|
||||
[mount: udev on /dev type devtmpfs (rw,nosuid,relatime,size=16228560k,nr_inodes=4057140,mode=755,inode64)]
|
||||
[mount: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)]
|
||||
[mount: tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3251140k,mode=755,inode64)]
|
||||
[mount: /dev/mapper/CRYPT-ROOT on / type ext4 (rw,relatime,errors=remount-ro)]
|
||||
[mount: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)]
|
||||
[mount: cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)]
|
||||
[mount: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)]
|
||||
[mount: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=67)]
|
||||
[mount: hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)]
|
||||
[mount: none on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)]
|
||||
[mount: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/credentials/systemd-journald.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-udev-load-credentials.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev-early.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: tmpfs on /run/credentials/systemd-sysctl.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup-dev.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=16777216k,inode64)]
|
||||
[mount: tmpfs on /run/credentials/systemd-tmpfiles-setup.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)]
|
||||
[mount: sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)]
|
||||
[mount: tmpfs on /run/user/1002 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,uid=1002,gid=100,inode64)]
|
||||
[mount: tmpfs on /run/credentials/getty@tty1.service type tmpfs (ro,nosuid,nodev,noexec,relatime,nosymfollow,size=1024k,nr_inodes=1024,mode=700,inode64,noswap)]
|
||||
[mount: tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=3251136k,nr_inodes=812784,mode=700,inode64)]
|
||||
[mount: none on /cpusets type cgroup (rw,relatime,cpuset,noprefix,release_agent=/sbin/cpuset_release_agent)]
|
||||
integer bit: 0.54 nanoseconds
|
||||
integer add: 0.77 nanoseconds
|
||||
integer div: 8.49 nanoseconds
|
||||
integer mod: 12.58 nanoseconds
|
||||
int64 bit: 0.52 nanoseconds
|
||||
uint64 add: 0.77 nanoseconds
|
||||
int64 div: 11.58 nanoseconds
|
||||
int64 mod: 14.91 nanoseconds
|
||||
float add: 1.54 nanoseconds
|
||||
float mul: 3.09 nanoseconds
|
||||
float div: 8.49 nanoseconds
|
||||
double add: 1.54 nanoseconds
|
||||
double mul: 3.09 nanoseconds
|
||||
double div: 10.80 nanoseconds
|
||||
float bogomflops: 1.16 nanoseconds
|
||||
double bogomflops: 1.54 nanoseconds
|
||||
integer bit parallelism: 2.77
|
||||
integer add parallelism: 2.73
|
||||
integer div parallelism: 1.83
|
||||
integer mod parallelism: 2.83
|
||||
int64 bit parallelism: 2.49
|
||||
int64 add parallelism: 2.60
|
||||
int64 div parallelism: 1.50
|
||||
int64 mod parallelism: 1.90
|
||||
float add parallelism: 4.00
|
||||
float mul parallelism: 8.00
|
||||
float div parallelism: 3.67
|
||||
double add parallelism: 4.00
|
||||
double mul parallelism: 8.00
|
||||
double div parallelism: 3.50
|
||||
unable to register (XACT_PROG, XACT_VERS, udp).
|
||||
: RPC: Unable to receive
|
||||
|
||||
"libc bcopy unaligned
|
||||
0.000512 41652.95
|
||||
0.001024 47761.25
|
||||
0.002048 50233.88
|
||||
0.004096 55637.27
|
||||
0.008192 64524.03
|
||||
0.016384 67719.30
|
||||
0.032768 18212.36
|
||||
0.065536 18407.52
|
||||
0.131072 18473.55
|
||||
0.262144 18475.00
|
||||
0.524288 14642.79
|
||||
1.05 8957.30
|
||||
2.10 8208.03
|
||||
4.19 8208.03
|
||||
8.39 9645.77
|
||||
16.78 7631.79
|
||||
33.55 7129.38
|
||||
67.11 6951.41
|
||||
134.22 6900.65
|
||||
268.44 6848.89
|
||||
536.87 6861.76
|
||||
|
||||
"libc bcopy aligned
|
||||
0.000512 44106.76
|
||||
0.001024 49354.68
|
||||
0.002048 51472.69
|
||||
0.004096 55925.21
|
||||
0.008192 63828.24
|
||||
0.016384 66379.51
|
||||
0.032768 18202.45
|
||||
0.065536 18336.03
|
||||
0.131072 18457.77
|
||||
0.262144 18327.76
|
||||
0.524288 15715.46
|
||||
1.05 8922.33
|
||||
2.10 8367.89
|
||||
4.19 8343.10
|
||||
8.39 9679.16
|
||||
16.78 7632.95
|
||||
33.55 7179.72
|
||||
67.11 6990.51
|
||||
134.22 6911.31
|
||||
268.44 6892.15
|
||||
536.87 6891.97
|
||||
|
||||
Memory bzero bandwidth
|
||||
0.000512 73586.23
|
||||
0.001024 78019.46
|
||||
0.002048 80349.42
|
||||
0.004096 74573.30
|
||||
0.008192 78524.11
|
||||
0.016384 80567.79
|
||||
0.032768 81708.84
|
||||
0.065536 21219.16
|
||||
0.131072 21299.79
|
||||
0.262144 21333.96
|
||||
0.524288 21347.23
|
||||
1.05 19382.88
|
||||
2.10 12829.98
|
||||
4.19 12611.10
|
||||
8.39 12606.02
|
||||
16.78 10399.64
|
||||
33.55 9537.93
|
||||
67.11 9140.41
|
||||
134.22 9007.90
|
||||
268.44 8931.77
|
||||
536.87 8918.57
|
||||
1073.74 8908.13
|
||||
|
||||
"unrolled bcopy unaligned
|
||||
0.000512 10357.22
|
||||
0.001024 10363.21
|
||||
0.002048 10356.95
|
||||
0.004096 10357.76
|
||||
0.008192 10343.49
|
||||
0.016384 10351.27
|
||||
0.032768 7899.27
|
||||
0.065536 7893.76
|
||||
0.131072 7873.84
|
||||
0.262144 7832.99
|
||||
0.524288 7281.78
|
||||
1.05 6503.77
|
||||
2.10 6418.22
|
||||
4.19 6461.47
|
||||
8.39 5194.99
|
||||
16.78 4722.65
|
||||
33.55 4639.72
|
||||
67.11 4606.91
|
||||
134.22 4593.51
|
||||
268.44 4596.34
|
||||
536.87 4587.27
|
||||
|
||||
"unrolled partial bcopy unaligned
|
||||
0.000512 41402.69
|
||||
0.001024 41453.86
|
||||
0.002048 41452.30
|
||||
0.004096 41425.45
|
||||
0.008192 41418.12
|
||||
0.016384 41333.58
|
||||
0.032768 18957.19
|
||||
0.065536 18955.39
|
||||
0.131072 18962.49
|
||||
0.262144 18969.69
|
||||
0.524288 14659.04
|
||||
1.05 8844.77
|
||||
2.10 8192.00
|
||||
4.19 8206.57
|
||||
8.39 6326.25
|
||||
16.78 5801.25
|
||||
33.55 5644.14
|
||||
67.11 5609.70
|
||||
134.22 5600.81
|
||||
268.44 5589.38
|
||||
536.87 5591.24
|
||||
|
||||
Memory read bandwidth
|
||||
0.000512 29201.61
|
||||
0.001024 29294.55
|
||||
0.002048 29363.12
|
||||
0.004096 29433.86
|
||||
0.008192 29442.59
|
||||
0.016384 29285.40
|
||||
0.032768 29336.30
|
||||
0.065536 27978.05
|
||||
0.131072 28392.59
|
||||
0.262144 28408.05
|
||||
0.524288 28424.68
|
||||
1.05 28385.92
|
||||
2.10 28385.92
|
||||
4.19 28395.43
|
||||
8.39 28334.85
|
||||
16.78 26342.45
|
||||
33.55 23489.28
|
||||
67.11 22195.75
|
||||
134.22 21644.53
|
||||
268.44 21620.12
|
||||
536.87 21505.80
|
||||
1073.74 21526.50
|
||||
|
||||
Memory partial read bandwidth
|
||||
0.000512 58916.90
|
||||
0.001024 59661.44
|
||||
0.002048 61203.68
|
||||
0.004096 58783.21
|
||||
0.008192 61320.45
|
||||
0.016384 61266.70
|
||||
0.032768 60940.09
|
||||
0.065536 30488.23
|
||||
0.131072 30517.76
|
||||
0.262144 30516.13
|
||||
0.524288 29627.83
|
||||
1.05 24662.86
|
||||
2.10 17384.93
|
||||
4.19 17168.66
|
||||
8.39 16915.36
|
||||
16.78 13189.64
|
||||
33.55 11584.48
|
||||
67.11 11024.95
|
||||
134.22 10892.53
|
||||
268.44 10824.45
|
||||
536.87 10781.84
|
||||
1073.74 10759.80
|
||||
|
||||
Memory write bandwidth
|
||||
0.000512 41405.52
|
||||
0.001024 41396.47
|
||||
0.002048 41429.93
|
||||
0.004096 41445.34
|
||||
0.008192 41401.00
|
||||
0.016384 41398.70
|
||||
0.032768 41426.50
|
||||
0.065536 21381.05
|
||||
0.131072 21388.82
|
||||
0.262144 21374.31
|
||||
0.524288 21370.17
|
||||
1.05 18114.68
|
||||
2.10 12417.83
|
||||
4.19 12264.05
|
||||
8.39 12250.61
|
||||
16.78 9679.16
|
||||
33.55 8978.98
|
||||
67.11 8703.00
|
||||
134.22 8589.38
|
||||
268.44 8520.41
|
||||
536.87 8543.59
|
||||
1073.74 8544.75
|
||||
|
||||
Memory partial write bandwidth
|
||||
0.000512 41406.05
|
||||
0.001024 41431.27
|
||||
0.002048 41414.90
|
||||
0.004096 41425.45
|
||||
0.008192 41431.04
|
||||
0.016384 41453.60
|
||||
0.032768 41366.48
|
||||
0.065536 21392.21
|
||||
0.131072 21364.37
|
||||
0.262144 21381.05
|
||||
0.524288 21366.56
|
||||
1.05 18649.81
|
||||
2.10 12411.48
|
||||
4.19 12249.30
|
||||
8.39 12300.01
|
||||
16.78 9693.61
|
||||
33.55 9024.86
|
||||
67.11 8771.25
|
||||
134.22 8618.06
|
||||
268.44 8557.89
|
||||
536.87 8549.44
|
||||
1073.74 8543.32
|
||||
|
||||
Memory partial read/write bandwidth
|
||||
0.000512 20712.63
|
||||
0.001024 20714.87
|
||||
0.002048 20703.88
|
||||
0.004096 20718.77
|
||||
0.008192 20719.28
|
||||
0.016384 20715.33
|
||||
0.032768 20722.87
|
||||
0.065536 20693.70
|
||||
0.131072 20690.58
|
||||
0.262144 20638.28
|
||||
0.524288 20665.37
|
||||
1.05 18846.95
|
||||
2.10 12887.53
|
||||
4.19 12613.33
|
||||
8.39 12576.62
|
||||
16.78 10295.93
|
||||
33.55 9551.50
|
||||
67.11 9191.74
|
||||
134.22 9087.19
|
||||
268.44 9035.49
|
||||
536.87 9018.95
|
||||
1073.74 9023.04
|
||||
|
||||
Usage: tlb [-c] [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
|
||||
|
||||
Memory load parallelism
|
||||
Usage: par_mem [-L <line size>] [-M len[K|M]] [-W <warmup>] [-N <repetitions>]
|
||||
|
||||
STREAM copy latency: 1.48 nanoseconds
|
||||
STREAM copy bandwidth: 10781.39 MB/sec
|
||||
STREAM scale latency: 1.50 nanoseconds
|
||||
STREAM scale bandwidth: 10668.00 MB/sec
|
||||
STREAM add latency: 2.11 nanoseconds
|
||||
STREAM add bandwidth: 11374.24 MB/sec
|
||||
STREAM triad latency: 2.13 nanoseconds
|
||||
STREAM triad bandwidth: 11264.63 MB/sec
|
||||
STREAM2 fill latency: 0.89 nanoseconds
|
||||
STREAM2 fill bandwidth: 8955.12 MB/sec
|
||||
STREAM2 copy latency: 1.48 nanoseconds
|
||||
STREAM2 copy bandwidth: 10775.60 MB/sec
|
||||
STREAM2 daxpy latency: 1.81 nanoseconds
|
||||
STREAM2 daxpy bandwidth: 13256.52 MB/sec
|
||||
STREAM2 sum latency: 1.60 nanoseconds
|
||||
STREAM2 sum bandwidth: 5006.72 MB/sec
|
||||
|
||||
Memory load latency
|
||||
"stride=128
|
||||
0.00049 3.859
|
||||
0.00098 3.859
|
||||
0.00195 3.859
|
||||
0.00293 3.859
|
||||
0.00391 3.861
|
||||
0.00586 3.858
|
||||
0.00781 3.858
|
||||
0.01172 3.859
|
||||
0.01562 3.859
|
||||
0.02344 3.859
|
||||
0.03125 3.859
|
||||
0.04688 3.861
|
||||
0.06250 11.580
|
||||
0.09375 11.576
|
||||
0.12500 11.577
|
||||
0.18750 11.583
|
||||
0.25000 11.577
|
||||
0.37500 11.576
|
||||
0.50000 11.579
|
||||
0.75000 11.578
|
||||
1.00000 11.590
|
||||
1.50000 13.543
|
||||
2.00000 13.936
|
||||
3.00000 13.999
|
||||
4.00000 13.996
|
||||
6.00000 13.997
|
||||
8.00000 14.002
|
||||
12.00000 14.976
|
||||
16.00000 19.832
|
||||
24.00000 20.880
|
||||
32.00000 21.339
|
||||
48.00000 21.899
|
||||
64.00000 22.023
|
||||
96.00000 22.156
|
||||
128.00000 22.213
|
||||
192.00000 22.283
|
||||
256.00000 22.320
|
||||
384.00000 22.306
|
||||
512.00000 22.325
|
||||
768.00000 22.345
|
||||
1024.00000 22.361
|
||||
|
||||
Random load latency
|
||||
"stride=16
|
||||
0.00049 3.859
|
||||
0.00098 3.858
|
||||
0.00195 3.858
|
||||
0.00293 3.858
|
||||
0.00391 3.858
|
||||
0.00586 3.858
|
||||
0.00781 3.859
|
||||
0.01172 3.858
|
||||
0.01562 3.858
|
||||
0.02344 3.859
|
||||
0.03125 3.859
|
||||
0.04688 3.864
|
||||
0.06250 11.575
|
||||
0.09375 14.276
|
||||
0.12500 15.462
|
||||
0.18750 16.079
|
||||
0.25000 16.646
|
||||
0.37500 16.373
|
||||
0.50000 16.352
|
||||
0.75000 18.529
|
||||
1.00000 18.245
|
||||
1.50000 42.351
|
||||
2.00000 55.350
|
||||
3.00000 61.011
|
||||
4.00000 62.143
|
||||
6.00000 63.587
|
||||
8.00000 65.259
|
||||
12.00000 84.563
|
||||
16.00000 107.165
|
||||
24.00000 131.898
|
||||
32.00000 141.864
|
||||
48.00000 150.654
|
||||
64.00000 156.245
|
||||
96.00000 162.950
|
||||
128.00000 167.497
|
||||
192.00000 170.394
|
||||
256.00000 171.779
|
||||
384.00000 172.858
|
||||
512.00000 172.877
|
||||
768.00000 173.626
|
||||
1024.00000 173.702
|
||||
|
||||
|
||||
|
||||
[Sat Aug 10 04:39:13 PM CEST 2024]
|
22
benchmark_low_level/t14-bwr.csv
Normal file
22
benchmark_low_level/t14-bwr.csv
Normal file
|
@ -0,0 +1,22 @@
|
|||
0.000512,29201.61
|
||||
0.001024,29294.55
|
||||
0.002048,29363.12
|
||||
0.004096,29433.86
|
||||
0.008192,29442.59
|
||||
0.016384,29285.40
|
||||
0.032768,29336.30
|
||||
0.065536,27978.05
|
||||
0.131072,28392.59
|
||||
0.262144,28408.05
|
||||
0.524288,28424.68
|
||||
1.05,28385.92
|
||||
2.10,28385.92
|
||||
4.19,28395.43
|
||||
8.39,28334.85
|
||||
16.78,26342.45
|
||||
33.55,23489.28
|
||||
67.11,22195.75
|
||||
134.22,21644.53
|
||||
268.44,21620.12
|
||||
536.87,21505.80
|
||||
1073.74,21526.50
|
|
22
benchmark_low_level/t14-bww.csv
Normal file
22
benchmark_low_level/t14-bww.csv
Normal file
|
@ -0,0 +1,22 @@
|
|||
0.000512,41405.52
|
||||
0.001024,41396.47
|
||||
0.002048,41429.93
|
||||
0.004096,41445.34
|
||||
0.008192,41401.00
|
||||
0.016384,41398.70
|
||||
0.032768,41426.50
|
||||
0.065536,21381.05
|
||||
0.131072,21388.82
|
||||
0.262144,21374.31
|
||||
0.524288,21370.17
|
||||
1.05,18114.68
|
||||
2.10,12417.83
|
||||
4.19,12264.05
|
||||
8.39,12250.61
|
||||
16.78,9679.16
|
||||
33.55,8978.98
|
||||
67.11,8703.00
|
||||
134.22,8589.38
|
||||
268.44,8520.41
|
||||
536.87,8543.59
|
||||
1073.74,8544.75
|
|
41
benchmark_low_level/t14-lrnd.csv
Normal file
41
benchmark_low_level/t14-lrnd.csv
Normal file
|
@ -0,0 +1,41 @@
|
|||
0.00049,3.859
|
||||
0.00098,3.858
|
||||
0.00195,3.858
|
||||
0.00293,3.858
|
||||
0.00391,3.858
|
||||
0.00586,3.858
|
||||
0.00781,3.859
|
||||
0.01172,3.858
|
||||
0.01562,3.858
|
||||
0.02344,3.859
|
||||
0.03125,3.859
|
||||
0.04688,3.864
|
||||
0.06250,11.575
|
||||
0.09375,14.276
|
||||
0.12500,15.462
|
||||
0.18750,16.079
|
||||
0.25000,16.646
|
||||
0.37500,16.373
|
||||
0.50000,16.352
|
||||
0.75000,18.529
|
||||
1.00000,18.245
|
||||
1.50000,42.351
|
||||
2.00000,55.350
|
||||
3.00000,61.011
|
||||
4.00000,62.143
|
||||
6.00000,63.587
|
||||
8.00000,65.259
|
||||
12.00000,84.563
|
||||
16.00000,107.165
|
||||
24.00000,131.898
|
||||
32.00000,141.864
|
||||
48.00000,150.654
|
||||
64.00000,156.245
|
||||
96.00000,162.950
|
||||
128.00000,167.497
|
||||
192.00000,170.394
|
||||
256.00000,171.779
|
||||
384.00000,172.858
|
||||
512.00000,172.877
|
||||
768.00000,173.626
|
||||
1024.00000,173.702
|
|
41
benchmark_low_level/t14-lseq.csv
Normal file
41
benchmark_low_level/t14-lseq.csv
Normal file
|
@ -0,0 +1,41 @@
|
|||
0.00049,3.859
|
||||
0.00098,3.859
|
||||
0.00195,3.859
|
||||
0.00293,3.859
|
||||
0.00391,3.861
|
||||
0.00586,3.858
|
||||
0.00781,3.858
|
||||
0.01172,3.859
|
||||
0.01562,3.859
|
||||
0.02344,3.859
|
||||
0.03125,3.859
|
||||
0.04688,3.861
|
||||
0.06250,11.580
|
||||
0.09375,11.576
|
||||
0.12500,11.577
|
||||
0.18750,11.583
|
||||
0.25000,11.577
|
||||
0.37500,11.576
|
||||
0.50000,11.579
|
||||
0.75000,11.578
|
||||
1.00000,11.590
|
||||
1.50000,13.543
|
||||
2.00000,13.936
|
||||
3.00000,13.999
|
||||
4.00000,13.996
|
||||
6.00000,13.997
|
||||
8.00000,14.002
|
||||
12.00000,14.976
|
||||
16.00000,19.832
|
||||
24.00000,20.880
|
||||
32.00000,21.339
|
||||
48.00000,21.899
|
||||
64.00000,22.023
|
||||
96.00000,22.156
|
||||
128.00000,22.213
|
||||
192.00000,22.283
|
||||
256.00000,22.320
|
||||
384.00000,22.306
|
||||
512.00000,22.325
|
||||
768.00000,22.345
|
||||
1024.00000,22.361
|
|
Loading…
Reference in a new issue