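These are old GPUs by now, so it is not a big deal, but as the two runs below show, NVIDIA's Tesla C2070 and C2075 GPUs differ noticeably in on-device (device-to-device) memory bandwidth, which has become a bit of a problem for us. Host-to-device and device-to-host bandwidth are nearly identical on both, while the cumulative device-to-device figure across four devices is 421157.3 MB/s on the C2075 versus 398906.1 MB/s on the C2070.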
./bandwidthTest --device=all
[CUDA Bandwidth Test] - Starting...
!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!
Running on...
Device 0: Tesla C2075
Device 1: Tesla C2075
Device 2: Tesla C2075
Device 3: Tesla C2075
Quick Mode
Host to Device Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 21816.0
Device to Host Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 21593.8
Device to Device Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 421157.3
Result = PASS
./bandwidthTest --device=all
[CUDA Bandwidth Test] - Starting...
!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!
Running on...
Device 0: Tesla C2070
Device 1: Tesla C2070
Device 2: Tesla C2070
Device 3: Tesla C2070
Quick Mode
Host to Device Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 21925.0
Device to Host Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 21651.0
Device to Device Bandwidth, 4 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 398906.1
Result = PASS
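
To put the gap in perspective, here is a minimal sketch that turns the two cumulative device-to-device figures above into per-device numbers and a relative difference. It assumes the cumulative value is split evenly across the four devices and that 1 GB/s = 1000 MB/s. The helper names (`per_device`, `relative_gap`) are made up for this illustration and are not part of `bandwidthTest`.

```python
# Cumulative device-to-device bandwidth (MB/s), copied from the two runs above.
NUM_DEVICES = 4
d2d_total_mb_s = {
    "Tesla C2075": 421157.3,
    "Tesla C2070": 398906.1,
}


def per_device(total_mb_s, num_devices=NUM_DEVICES):
    """Average bandwidth per device, assuming an even split (an assumption)."""
    return total_mb_s / num_devices


def relative_gap(faster, slower):
    """Fractional difference of `faster` relative to `slower`."""
    return (faster - slower) / slower


for name, total in d2d_total_mb_s.items():
    # Assumes 1 GB/s = 1000 MB/s for display purposes.
    print(f"{name}: {per_device(total) / 1000:.1f} GB/s per device")

gap = relative_gap(d2d_total_mb_s["Tesla C2075"], d2d_total_mb_s["Tesla C2070"])
print(f"C2075 is {gap * 100:.1f}% higher than C2070")
```

Under those assumptions the C2075 comes out a little under 6% ahead in device-to-device bandwidth, while the host-to-device and device-to-host figures differ by well under 1%.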