Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2017/03/17 20:18:42 UTC

[jira] [Closed] (SYSTEMML-455) OOM CP transpose in Spark hybrid mode

     [ https://issues.apache.org/jira/browse/SYSTEMML-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-455.
-----------------------------------

> OOM CP transpose in Spark hybrid mode 
> --------------------------------------
>
>                 Key: SYSTEMML-455
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-455
>             Project: SystemML
>          Issue Type: Bug
>          Components: Compiler
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> The following data generation script failed with OOM in hybrid_spark execution mode (config: 20GB driver memory), whereas the same script runs fine with the same memory budget in hybrid_mr execution mode.
> {code}
> n = 30000;
> B = Rand (rows = n, cols = n, min = -1, max = 1, pdf = "uniform", seed = 1234);
> v = exp (Rand (rows = n, cols = 1, min = -3, max = 3, pdf = "uniform", seed = 5678));
> A = t(B) %*% (B * v);
> write(A, "./tmp/A", format="binary");
> {code}
> The resulting hop explain output is as follows:
> {code}
> # Memory Budget local/remote = 13739MB/184320MB/8602MB
> # Degree of Parallelism (vcores) local/remote = 16/120
> PROGRAM
> --MAIN PROGRAM
> ----GENERIC (lines 4-12) [recompile=true]
> ------(10) dg(rand) [30000,30000,1000,1000,900000000] [0,0,6866 -> 6866MB], CP
> ------(21) r(t) (10) [30000,30000,1000,1000,900000000] [6866,0,6866 -> 13733MB], CP
> ------(19) dg(rand) [30000,1,1000,1000,30000] [0,0,0 -> 0MB], CP
> ------(20) u(exp) (19) [30000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> ------(22) b(*) (10,20) [30000,30000,1000,1000,-1] [6867,0,6866 -> 13733MB], CP
> ------(23) ba(+*) (21,22) [30000,30000,1000,1000,-1] [13733,6866,6866 -> 27466MB], SPARK
> ------(28) PWrite A (23) [30000,30000,1000,1000,-1] [6866,0,0 -> 6866MB], CP
> {code}
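> Reading each operator line as [rows, cols, rows/block, cols/block, nnz] [input, intermediate, output -> total memory], execution type (my interpretation of the explain format), the compiler's placement decisions line up with the 13739MB local budget:
> {code}
> (21) r(t)   : 13733 MB total <= 13739 MB budget  => compiled to CP (only 6 MB headroom)
> (23) ba(+*) : 27466 MB total >  13739 MB budget  => compiled to SPARK
> {code}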
> The script fails at the CP transpose with:
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:414)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transposeDenseToDense(LibMatrixReorg.java:752)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.transpose(LibMatrixReorg.java:136)
>         at org.apache.sysml.runtime.matrix.data.LibMatrixReorg.reorg(LibMatrixReorg.java:105)
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.reorgOperations(MatrixBlock.java:3458)
>         at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:129)
> {code}
> It's noteworthy that the failing CP instruction requires 13733MB against a memory budget of 13739MB, leaving almost no headroom. The current guess is that the Spark runtime itself occupies substantial driver heap, which eventually leads to the OOM; we should adjust our memory budget in Spark execution modes to account for this overhead.
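> A minimal sketch of the proposed direction (hypothetical code, not SystemML's actual optimizer API; the 0.7 heap-utilization factor is an assumption about the current default):
> {code}
> // Hypothetical: shrink the CP memory budget when compiling for spark
> // execution modes, reserving heap for Spark driver overhead that shares
> // the same JVM (block manager, broadcast and shuffle buffers, etc.).
> public static long getLocalMemBudget(boolean sparkExecMode) {
>     double utilFactor = 0.7;               // assumed current default factor
>     if( sparkExecMode )
>         utilFactor -= 0.1;                 // illustrative overhead reserve
>     return (long)(utilFactor * Runtime.getRuntime().maxMemory());
> }
> {code}
> Under such a discounted budget, the 13733MB r(t) would no longer fit in CP and would be compiled to SPARK as well.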



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)