Posted to dev@systemml.apache.org by Mingyang Wang <mi...@eng.ucsd.edu> on 2017/04/20 00:48:01 UTC

Questions about the Composition of Execution Time

Hi all,

I have run some simple matrix multiplications in SystemML and found that JVM
GC time and Spark collect time dominate the total execution time.

For example, given 4 executors with 20 cores and 100 GB of memory each, and a
driver with 10 GB of memory, the first setting is

R = read($R)   # 1,000,000 x 80 (~612 MB)
S = read($S)   # 20,000,000 x 20 (~3 GB)
FK = read($FK) # 20,000,000 x 1,000,000, sparse (~358 MB)
wS = rand(rows=ncol(S), cols=1, min=0, max=1, pdf="uniform")
wR = rand(rows=ncol(R), cols=1, min=0, max=1, pdf="uniform")

temp = S %*% wS + FK %*% (R %*% wR)
# some code to enforce the execution
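
If my back-of-envelope arithmetic is right, every intermediate here is a
narrow vector (assuming dense double precision, 8 bytes per value, and
ignoring block overhead):

# R %*% wR          : 1,000,000 x 1  x 8 bytes ~= 8 MB
# FK %*% (R %*% wR) : 20,000,000 x 1 x 8 bytes ~= 160 MB
# S %*% wS          : 20,000,000 x 1 x 8 bytes ~= 160 MB
# temp              : 20,000,000 x 1 x 8 bytes ~= 160 MB

so I would not expect much memory pressure from the intermediates themselves.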

This run took 77.597s to execute, of which JVM GC accounted for 70.282s.
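
For reference, I launch both scripts roughly like this (a sketch: the master,
jar name, and script path are placeholders for my actual setup; the resource
flags match the numbers above, and -stats asks SystemML to print its runtime
statistics, including JVM GC time):

spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 20 \
  --executor-memory 100g \
  --driver-memory 10g \
  SystemML.jar -f scenario1.dml -stats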

Another setting is

T = read($T) # 20,000,000 x 100 (~15 GB)
w = rand(rows=ncol(T), cols=1, min=0, max=1, pdf="uniform")

temp = T %*% w
# some code to enforce the execution

This run took 92.582s to execute, of which Spark collect accounted for 91.991s.
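
The same rough arithmetic for this case (dense double precision, 8 bytes per
value):

# T    : 20,000,000 x 100 x 8 bytes ~= 16 GB (~15 GiB, matching the input size)
# temp : 20,000,000 x 1   x 8 bytes ~= 160 MB

So, if I understand correctly, T alone exceeds the 10 GB of driver memory,
while the collected result temp is only ~160 MB.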

My questions are:
1. Are these behaviors expected? It seems that only a tiny fraction of the
time is spent on actual computation.
2. How can I tweak the configuration to improve performance?
3. Is there any way to measure the time spent on data loading, computation,
disk access, and communication separately?
4. Is there a rule of thumb for estimating the memory needed by a SystemML
program?

I would really appreciate your input!


Best,
Mingyang Wang