Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2017/07/15 03:49:01 UTC

[jira] [Resolved] (SYSTEMML-1772) Perftest: MultiLogReg 100M x 1K, sparse fails with OOM

     [ https://issues.apache.org/jira/browse/SYSTEMML-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1772.
--------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.0

> Perftest: MultiLogReg 100M x 1K, sparse fails with OOM
> ------------------------------------------------------
>
>                 Key: SYSTEMML-1772
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1772
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> Our perftest MultiLogReg 100M x 1K, sparse fails with the following OOM when run with a 20GB driver budget.
> {code}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 17/07/14 13:42:04 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
> 	at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:423)
> 	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
> 	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
> 	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
> 	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:673)
> 	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
> 	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:195)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:169)
> 	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
> 	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1880)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1829)
> 	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1843)
> 	at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel$ReadFileTask.call(ReaderBinaryBlockParallel.java:150)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> Thanks for catching this issue [~acs_s]. The root cause can be seen in the following HOP characteristics and the generated runtime plan, which contains a CP mmchain operation for hop 456:
> {code}
> 17/07/14 16:48:50 INFO recompile.Recompiler: EXPLAIN RECOMPILE 
> GENERIC (lines 207-208):
> --(432) TRead X [100000000,1000,1000,1000,999978303] [0,0,23270 -> 23270MB], SPARK
> --(439) r(t) (432) [1000,100000000,1000,1000,999978303] [23270,0,11444 -> 34714MB], SPARK
> --(431) TRead P [100000000,2,1000,1000,200000000] [0,0,1526 -> 1526MB], CP
> --(436) rix (431) [100000000,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
> --(1276) u(sprop) (436) [100000000,1,1000,1000,-1] [763,0,763 -> 1526MB], CP
> --(429) TRead ssX_V [1000,1,1000,1000,1000] [0,0,0 -> 0MB], CP
> --(437) ba(+*) (432,429) [100000000,1,1000,1000,-1] [23270,0,763 -> 24033MB], SPARK
> --(1275) b(*) (1276,437) [100000000,1,1000,1000,-1] [1526,0,763 -> 2289MB], CP
> --(456) ba(+*) (439,1275) [1000,1,1000,1000,-1] [12207,0,0 -> 12207MB], CP
> --(457) TWrite HV (456) [1000,1,1000,1000,-1] [0,0,0 -> 0MB], CP
> {code}
> The final matrix multiplication {{t(X) %*% tmp}} fits in CP and satisfies the mmchain pattern. However, mmchain avoids the transpose and instead operates directly over the rows of X (implicitly assuming that X fits into memory whenever t(X) does). Given our MCSR and CSR representations, this is not necessarily true, because each row carries a certain sparse-row overhead independent of its number of non-zeros, and X has 100M rows while t(X) has only 1K.
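> The asymmetry is visible in the estimates above: hop 456 sees only 12207MB of inputs (11444MB for the {{t(X)}} output of hop 439 plus 763MB for the vector), which fits the 20GB budget, whereas reading {{X}} itself (hop 432) requires 23270MB. A back-of-the-envelope sketch in plain Java reproduces this gap; the 124B per-row and 12B per-non-zero constants are illustrative assumptions, not SystemML's exact internals.
> {code}
> // Rough sparse-size estimate: X and t(X) hold the same ~1e9 non-zeros
> // but have very different row counts. PER_ROW_OVERHEAD and PER_NNZ are
> // assumed constants for illustration only.
> public class SparseSizeSketch {
>     static long estimateSparseMB(long rows, long nnz) {
>         final long PER_ROW_OVERHEAD = 124; // assumed: sparse-row object + array headers
>         final long PER_NNZ = 12;           // 8B double value + 4B column index
>         return (rows * PER_ROW_OVERHEAD + nnz * PER_NNZ) >> 20; // bytes -> MB
>     }
>     public static void main(String[] args) {
>         long nnz = 999978303L; // non-zeros of X from the HOP statistics above
>         System.out.println("X    (100M x 1K): " + estimateSparseMB(100000000L, nnz) + " MB");
>         System.out.println("t(X) (1K x 100M): " + estimateSparseMB(1000L, nnz) + " MB");
>     }
> }
> {code}
> Under these assumptions the sketch yields ~23270MB for X vs ~11444MB for t(X): the 100M-row side pays roughly 12GB of pure per-row overhead that the 1K-row side avoids.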
> We should consider this scenario during execution type selection and send the entire pattern to SPARK in such cases, which is a good idea anyway because the first matrix multiplication already runs in SPARK. If the additional broadcast and blocksize constraints are met, we compile a SPARK mmchain; otherwise, two subsequent SPARK matrix multiplications.
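> A minimal sketch of this selection logic (hypothetical names and signatures, not SystemML's actual compiler code):
> {code}
> // Decide where to execute the pattern t(X) %*% (w * (X %*% v)).
> // Key change: test the memory estimate of X itself, including its
> // per-row sparse overhead, not just the estimate of the final matmult.
> public class MMChainExecTypeSketch {
>     enum ExecType { CP_MMCHAIN, SPARK_MMCHAIN, SPARK_TWO_MATMULTS }
>
>     static ExecType chooseExecType(double memEstimateX, double memBudget,
>         boolean broadcastAndBlocksizeOk) {
>         if (memEstimateX < memBudget)
>             return ExecType.CP_MMCHAIN; // safe: X itself fits into the driver
>         // otherwise keep the entire pattern in SPARK, where the first
>         // matrix multiplication already executes anyway
>         return broadcastAndBlocksizeOk
>             ? ExecType.SPARK_MMCHAIN       // fused SPARK mmchain operator
>             : ExecType.SPARK_TWO_MATMULTS; // two subsequent matrix multiplications
>     }
> }
> {code}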


