TEPS その２ - 大規模最適化問題、グラフ探索、機械学習やデジタルツインなど

本来は BFS (幅優先探索）での測定になるが、最短路のプログラム(1対全:優先キュー付きダイクストラ法)を用いて traversed edges per second (TEPS) の値を測定した。

http://www.graph500.org/Specifications.html

We measure TEPS through the benchmarking of kernel 2 as follows. Let timeK2(n) be the measured execution time for kernel 2. Let m be the number of input edge tuples within the component traversed by the search, counting any multiple edges and self-loops. We define the normalized performance rate (number of edge traversals per second) as:

TEPS(n) = m / timeK2(n)

以下の Intel Core i7 の計算サーバ１を用いて、全米データ 256 クエリの最短路探索を行い、1 クエリ単位で TEPS 値を測定する。

○１コアのみ：TEPS 値
最大値：21.01 ME/S
最小値：12.22 ME/S
平均値：14.27 ME/S

１コアのみで計算を行うと Core i7 では最大の場合、１秒間に 2100万枝の探索を行うことができる。

○４コア同時計算：TEPS 値
最大値：20.04 ME/S
最小値：9.90 ME/S
平均値：12.43 ME/S

4コア同時計算でも意外と性能低下は少ない。

○計算サーバ1 (1CPU x 4 コア = 4 コア)
CPU : Intel Core i7 965 3.2GHz x 1
メモリ : 12GB
OS : CentOS 5.5 for x86_64

上記の場合では、おそらく優先キューのサイズが L3 キャッシュのサイズと比較して小さいのだが、例えば超大規模グラフ(点数=1,073,741,824, 枝数=2,147,483,647 つまり点数 1G, 枝数 2G)になると L3 キャッシュから溢れてしまい、単純にメモリバンド幅に依存した性能になってしまう。

○１コアのみ：TEPS 値（計算サーバ２）
2.08 ME/S

○計算サーバ２ (2 CPU x 4 コア = 8 コア)
CPU : Intel Xeon 5550 (2.66GHz / 8MB L3) x 2
Memory : 72GB (18 x 4GB / 800MHz)
OS : Fedora 13 for x86_64

ちなみに以下は BFS (幅優先探索）の結果なので最短路探索よりは性能上は有利である。
http://www.graph500.org/Results.html