You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@systemml.apache.org by mb...@apache.org on 2017/07/20 08:25:34 UTC

[3/3] systemml git commit: [SYSTEMML-1792] Performance sparse-dense block matrix multiply

[SYSTEMML-1792] Performance sparse-dense block matrix multiply

Our sparse-dense matrix multiply was already cache-conscious but used
very small block static block sizes, which were optimized for moderate
sparsity. However, for cases with very sparse matrices (and skinny right
hand size matrices), the small block sizes added substantial overhead of
more than an order of magnitude. This patch makes these block sizes
adaptive, consistent with our cache-conscious implementations of
sparsity exploiting matrix multiply operators such as wsloss.

On a scenario with 1K x 2M mini batches of average sparsity 0.0003 and a
dense right hand side of 2M x 100, the runtime improved from ~300ms to
<2ms, without hurting the case of moderate sparsity.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/cca4f942
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/cca4f942
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/cca4f942

Branch: refs/heads/master
Commit: cca4f942cd8d4a2ac0b6e5994ebac4efdcdc06c3
Parents: 4a24b9a
Author: Matthias Boehm <mb...@gmail.com>
Authored: Thu Jul 20 01:21:19 2017 -0700
Committer: Matthias Boehm <mb...@gmail.com>
Committed: Thu Jul 20 01:24:27 2017 -0700

----------------------------------------------------------------------
 .../org/apache/sysml/runtime/matrix/data/LibMatrixMult.java | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/cca4f942/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java b/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java
index 30e7d3d..c2af52d 100644
--- a/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java
+++ b/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixMult.java
@@ -1292,8 +1292,11 @@ public class LibMatrixMult
 			{							
 				//blocksizes to fit blocks of B (dense) and several rows of A/C in common L2 cache size, 
 				//while blocking A/C for L1/L2 yet allowing long scans (2 pages) in the inner loop over j
-				final int blocksizeI = 32;
-				final int blocksizeK = 24; 
+				//in case of almost ultra-sparse matrices, we cannot ensure the blocking for the rhs and
+				//output - however, in this case it's unlikely that we consume every cache line in the rhs
+				
+				final int blocksizeI = (int) (8L*m*cd/m1.nonZeros);
+				final int blocksizeK = (int) (8L*m*cd/m1.nonZeros);
 				final int blocksizeJ = 1024; 
 				
 				//temporary array of current sparse positions
@@ -1314,7 +1317,7 @@ public class LibMatrixMult
 									int[] aix = a.indexes(i);
 									double[] avals = a.values(i);					
 									
-									int k = curk[i-bi] + apos;									
+									int k = curk[i-bi] + apos;			
 					    			//rest not aligned to blocks of 4 rows
 									int bn = alen%4;
 									for( ; k<apos+bn && aix[k]<bkmin; k++ )