diff --git a/architecture/README.md b/architecture/README.md
index b912d14..eefadfc 100644
--- a/architecture/README.md
+++ b/architecture/README.md
@@ -73,7 +73,7 @@
 - latency: usually much higher than CPU's ➔ the more parallel threads are run the less the price of high latency is paid (latency "hiding")
 - spatial locality is extremely critical
 - A portion of the GPU-RAM is accessible to the CPU ➔ the GPU performs the copies
- - The PCI-Bus bottleneck: data needs to flow from main (CPU) memory to GPU memory and back!
+ - The PCI-Bus (Peripheral Component Interconnect bus) is the bottleneck: data needs to flow from main (CPU) memory to GPU memory and back!
 - Problems on a cluster: the GPU does not really support simultanous multiple users payloads!
 
 # Computer Architecture (a concrete example)
@@ -94,7 +94,7 @@ My Laptop:
 - Internal clock 650 MHz, 1.54 ns
 - CAS Latency 34 cycles, Total latency = CAS latency x cycle = 13.09 ns, Throughput 40.6 GB/s
 - DMI (Direct Media Interface): 8×16 GT/s (≈128 GB/s)
- - PCI Express bridges:
+ - PCI (Peripheral Component Interconnect) Express bridges:
 - Graphics: 16 GT/s (≈ 8 GB/s)
 - 2× Thunderbolt: 2.5 GT/s (≈ 1 GB/s) and 16 GT/s (≈ 8 GB/s)
 - GPU Intel Iris, Internal clock 300 Mhz-1.30 GHz, memory 4 GB/2.1 GHz with a bandwidth of 68 GB/s