introduction:
* CPU <> GPU
* CUDA framework
* compute capability
hardware architecture:
Overview:
* host
* device
* SM
Memory:
Device:
* global
* shared
* registers
* local
* L1
* L2
Transfer:
* PCIe protocol characteristics
* PCIe generation overview / comparison
* comparison to global memory / device memory
Kernel:
* ??
* threads/blocks/grid...?
memory management:
allocation:
dynamic allocation
pitched layout
streams and synchronization:
synchronization:
streams:
Unified Virtual Addressing (UVA):
Unified Memory
memory transfer optimization:
pinned memory:
portable:
mapped:
write-combined (wcpm?):
Zero-Copy:
Memory access optimization:
global memory access:
CC 1.x (Tesla):
CC 2.x (Fermi):
CC 3.x (Kepler):
CC 5.x (Maxwell):
2d access (pitched)
* example code, show throughput table
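A possible starting point for the pitched example: a host-side C++ sketch of the addressing arithmetic only, not device code and not a throughput benchmark. The helper names (`pitch_bytes`, `pitched_offset`) and the 512-byte alignment used in the usage note are assumptions for illustration; on a real device the pitch is whatever `cudaMallocPitch` returns.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a row width (in bytes) up to the next multiple
// of `alignment`. This only models how a pitch can come about; the real
// value is chosen by the CUDA runtime.
std::size_t pitch_bytes(std::size_t width_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: byte offset of element (row, col) in a pitched 2D
// allocation. Rows start every `pitch` bytes; columns are packed inside a row.
std::size_t pitched_offset(std::size_t pitch, std::size_t row,
                           std::size_t col, std::size_t elem_size) {
    assert((col + 1) * elem_size <= pitch);  // element must fit inside the row
    return row * pitch + col * elem_size;
}
```

With an assumed alignment of 512 bytes, a row of 100 floats (400 bytes) gets a pitch of 512 bytes, and element (2, 3) sits at byte offset 2 * 512 + 3 * 4 = 1036. The same row-times-pitch arithmetic applies inside a kernel when indexing memory obtained from `cudaMallocPitch`.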
shared memory access:
* shared memory fast, on-chip
* configurable size: 64 KB on-chip memory split between shared memory and L1 (CC 2.x/3.x)
* 32 memory banks (16 on CC 1.x) => high performance; bandwidth per bank differs by CC
* bank conflicts; behavior differs between CCs
* example code, show throughput table
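A possible starting point for the bank-conflict example: a host-side C++ model, not device code, that counts how many distinct 32-bit words of one warp fall into the same bank. It assumes 32 banks of 4-byte width and a warp of 32 threads where thread t accesses word t * stride; the names `bank_of` and `conflict_degree` are made up for this sketch. Threads reading the same word are treated as a broadcast, not a conflict.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

constexpr std::size_t kBanks = 32;     // assumption: CC 2.x or later
constexpr std::size_t kWarpSize = 32;  // threads per warp

// Bank that holds a given 32-bit word (successive words map to successive banks).
std::size_t bank_of(std::size_t word_index) {
    return word_index % kBanks;
}

// Worst-case number of distinct words that hit one bank when thread t of a
// warp accesses word t * stride_words. 1 means conflict-free; n means the
// access is serialized into n passes in this simplified model.
std::size_t conflict_degree(std::size_t stride_words) {
    std::array<std::set<std::size_t>, kBanks> words_per_bank;
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        const std::size_t word = t * stride_words;
        words_per_bank[bank_of(word)].insert(word);  // same word counts once
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    assert(worst >= 1);
    return worst;
}
```

In this model stride 1 is conflict-free, stride 2 gives a 2-way conflict, and stride 32 gives a 32-way conflict, while stride 33 is conflict-free again. That last case is why padding a 32-column shared tile to 33 columns is a common way to avoid conflicts on column-wise access.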