Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/10/05 17:52:20 UTC
[jira] [Closed] (SYSTEMML-1011) Slow sparse append cbind (sparse row re-allocations)
[ https://issues.apache.org/jira/browse/SYSTEMML-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-1011.
------------------------------------
> Slow sparse append cbind (sparse row re-allocations)
> ----------------------------------------------------
>
> Key: SYSTEMML-1011
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1011
> Project: SystemML
> Issue Type: Bug
> Reporter: Matthias Boehm
> Assignee: Matthias Boehm
> Fix For: SystemML 0.11
>
>
> All algorithms that support the 'intercept' option (e.g., LinregCG, LinregDS, L2SVM, MSVM, Mlogreg, and GLM) append a column of 1s at the beginning of the script. On large sparse data, this append sometimes dominates end-to-end performance. For example, here are the LinregCG results for a 10Mx1K scenario with sparsity 0.01; the reported times roughly double once the intercept is enabled (ict=1 or ict=2) compared to ict=0.
> {code}
> -- Running runLinearRegCG on 10M_1k_sparse (all configs)
> LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
> LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
> LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
> -- Running runLinearRegCG on 10M_1k_sparse (all configs)
> LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
> LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
> LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
> -- Running runLinearRegCG on 10M_1k_sparse (all configs)
> LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 6
> LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 15
> LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 16
> -- Running runLinearRegCG on 10M_1k_sparse (all configs)
> LinRegCG train ict=0 on mbperftest/binomial/X10M_1k_sparse: 7
> LinRegCG train ict=1 on mbperftest/binomial/X10M_1k_sparse: 16
> LinRegCG train ict=2 on mbperftest/binomial/X10M_1k_sparse: 15
> {code}
> and here is the related -stats output for ict=1.
> {code}
> Total elapsed time: 16.893 sec.
> Total compilation time: 2.412 sec.
> Total execution time: 14.480 sec.
> Number of compiled Spark inst: 0.
> Number of executed Spark inst: 0.
> Cache hits (Mem, WB, FS, HDFS): 172/0/0/2.
> Cache writes (WB, FS, HDFS): 77/0/1.
> Cache times (ACQr/m, RLS, EXP): 1.734/0.003/2.143/0.209 sec.
> HOP DAGs recompiled (PRED, SB): 0/0.
> HOP DAGs recompile time: 0.000 sec.
> Spark ctx create time (lazy): 0.000 sec.
> Spark trans counts (par,bc,col):0/0/0.
> Spark trans times (par,bc,col): 0.000/0.000/0.000 secs.
> Total JIT compile time: 5.357 sec.
> Total JVM GC count: 2.
> Total JVM GC time: 5.628 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) append 8.595 sec 26
> -- 2) mmchain 4.443 sec 8
> -- 3) ba+* 0.537 sec 10
> -- 4) r' 0.411 sec 10
> -- 5) write 0.210 sec 1
> -- 6) - 0.087 sec 20
> -- 7) uak+ 0.059 sec 2
> -- 8) tsmm 0.049 sec 11
> -- 9) rand 0.043 sec 5
> -- 10) +* 0.007 sec 24
> {code}
> The large GC time (5.628 sec against 14.480 sec total execution time) indicates that sparse row re-allocations are a major issue here. We should compute the joint nnz (number of non-zeros) per output row and allocate each output sparse row just once.
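
The proposed fix can be illustrated with a minimal sketch. This is not SystemML's actual code: the `SparseRow` and `SparseCbind` classes below are hypothetical stand-ins, assuming a row-wise sparse layout in which each row holds sorted column indexes and values. The point is only that the output row is sized from the joint nnz of both inputs up front, so no row is ever grown and re-allocated during the append.

```java
/** Hypothetical minimal sparse row (not a SystemML class): column indexes plus values. */
final class SparseRow {
    final int[] indexes;
    final double[] values;
    int size = 0;

    SparseRow(int capacity) {
        indexes = new int[capacity];
        values = new double[capacity];
    }

    // Appends without growing; the caller is expected to pre-size the row exactly.
    void append(int col, double val) {
        indexes[size] = col;
        values[size] = val;
        size++;
    }
}

/** Hypothetical helper illustrating cbind with a single allocation per output row. */
final class SparseCbind {
    /**
     * Computes cbind(a, b) for row-wise sparse matrices.
     * aCols is the number of columns of a; null entries denote empty rows.
     */
    static SparseRow[] cbind(SparseRow[] a, int aCols, SparseRow[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row count mismatch");
        }
        SparseRow[] out = new SparseRow[a.length];
        for (int i = 0; i < a.length; i++) {
            int nnzA = (a[i] == null) ? 0 : a[i].size;
            int nnzB = (b[i] == null) ? 0 : b[i].size;
            int jointNnz = nnzA + nnzB;
            if (jointNnz == 0) {
                continue; // leave empty output rows unallocated
            }
            // Single allocation, sized by the joint nnz of both input rows.
            SparseRow row = new SparseRow(jointNnz);
            for (int k = 0; k < nnzA; k++) {
                row.append(a[i].indexes[k], a[i].values[k]);
            }
            // Columns of b are shifted right by the column count of a.
            for (int k = 0; k < nnzB; k++) {
                row.append(aCols + b[i].indexes[k], b[i].values[k]);
            }
            out[i] = row;
        }
        return out;
    }
}
```

For the intercept case, `b` would be the single column of 1s, so every output row is allocated once with exactly one more entry than the corresponding input row, rather than being copied into a larger array when the extra value is appended.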
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)