Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2018/02/24 09:51:00 UTC

[jira] [Closed] (SYSTEMML-1078) Ultra Sparse Invalid number of serialized non-zeros

     [ https://issues.apache.org/jira/browse/SYSTEMML-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1078.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.1

I'm closing this issue because there is no reproducible scenario, and it has likely been fixed by the recent sparse block fixes SYSTEMML-1959, SYSTEMML-2035, SYSTEMML-2051, SYSTEMML-2052, and SYSTEMML-2098.

> Ultra Sparse Invalid number of serialized non-zeros
> ---------------------------------------------------
>
>                 Key: SYSTEMML-1078
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1078
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Mike Dusenberry
>            Assignee: Matthias Boehm
>            Priority: Blocker
>             Fix For: SystemML 1.1
>
>
> The following error occurs at random during model training.  It appears that the characteristics of the intermediate matrices can change over the course of training, and if one of them becomes sparse enough to fall into the "Ultra Sparse" category, an internal runtime error is raised while the block is serialized for cache eviction, because the *true* and *expected* numbers of non-zeros diverge.
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:377)
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:320)
> 	at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:287)
> 	... 11 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
> 	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:152)
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:375)
> 	... 13 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in while program block generated from while statement block between lines 17 and 45 -- Error evaluating while program block
> 	at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:181)
> 	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
> 	... 14 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 32 and 32 -- Error evaluating instruction: CP°extfunct°./mnist_lenet.dml°train°14°10°X·MATRIX·DOUBLE°Y·MATRIX·DOUBLE°X_val·MATRIX·DOUBLE°Y_val·MATRIX·DOUBLE°C·SCALAR·DOUBLE·false°Hin·SCALAR·DOUBLE·false°Win·SCALAR·DOUBLE·false°lr·SCALAR·DOUBLE·false°mu·SCALAR·DOUBLE·false°decay·SCALAR·DOUBLE·false°lambda·SCALAR·DOUBLE·false°50·SCALAR·INT·true°1·SCALAR·INT·true°iters·SCALAR·DOUBLE·false°Wc1°bc1°Wc2°bc2°Wc3°bc3°Wa1°ba1°Wa2°ba2
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
> 	at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:169)
> 	... 15 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: error executing function ./mnist_lenet.dml::train
> 	at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:184)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
> 	... 18 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in function program block generated from function statement block between lines 38 and 270 -- Error evaluating function program block
> 	at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:121)
> 	at org.apache.sysml.runtime.instructions.cp.FunctionCallCPInstruction.processInstruction(FunctionCallCPInstruction.java:177)
> 	... 19 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 131 and 269 -- Error evaluating for program block
> 	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
> 	at org.apache.sysml.runtime.controlprogram.FunctionProgramBlock.execute(FunctionProgramBlock.java:114)
> 	... 20 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in for program block generated from for statement block between lines 132 and 244 -- Error evaluating for program block
> 	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:162)
> 	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
> 	... 21 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 157 and 217 -- Error evaluating instruction: CP°r'°outc3p·MATRIX·DOUBLE°_mVar1077501·MATRIX·DOUBLE°48
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:335)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:224)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
> 	at org.apache.sysml.runtime.controlprogram.ForProgramBlock.execute(ForProgramBlock.java:150)
> 	... 22 more
> Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Eviction to local path /tmp/systemml/_p6456_10.168.31.80//cache/cache000546482.dat (_mVar1077501) failed.
> 	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:651)
> 	at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.setMatrixOutput(ExecutionContext.java:426)
> 	at org.apache.sysml.runtime.instructions.cp.ReorgCPInstruction.processInstruction(ReorgCPInstruction.java:135)
> 	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:305)
> 	... 25 more
> Caused by: java.io.IOException: Failed to serialize cache block.
> 	at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:82)
> 	at org.apache.sysml.runtime.controlprogram.caching.LazyWriteBuffer.writeBlock(LazyWriteBuffer.java:113)
> 	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.release(CacheableData.java:647)
> 	... 28 more
> Caused by: java.io.IOException: Invalid number of serialized non-zeros: 842 (expected: 2044)
> 	at org.apache.sysml.runtime.matrix.data.MatrixBlock.writeSparseToUltraSparse(MatrixBlock.java:2208)
> 	at org.apache.sysml.runtime.matrix.data.MatrixBlock.write(MatrixBlock.java:2073)
> 	at org.apache.sysml.runtime.controlprogram.caching.ByteBuffer.serializeBlock(ByteBuffer.java:73)
> 	... 30 more
> {code}
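
The check that fails at the bottom of the trace can be illustrated with a small, self-contained sketch. This is not SystemML's MatrixBlock code: the class SparseBlockSketch, its fields, and the triple layout are hypothetical and chosen only for illustration. It assumes a block that keeps a cached non-zero counter as metadata and a writer that emits (row, column, value) triples and compares the number actually written against that counter. A stale counter is one way such a mismatch could arise; the actual cause in this issue was never reproduced.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical toy block, NOT SystemML's MatrixBlock: rows are plain arrays
// and cachedNnz is metadata that a buggy update could leave out of date.
class SparseBlockSketch {
    final double[][] rows; // rows[i] holds the values of row i, or null if the row is empty
    long cachedNnz;        // expected number of non-zeros according to the metadata

    SparseBlockSketch(double[][] rows, long cachedNnz) {
        this.rows = rows;
        this.cachedNnz = cachedNnz;
    }

    // Recount the non-zeros directly from the data, ignoring the cached counter.
    long recomputeNnz() {
        long n = 0;
        for (double[] r : rows) {
            if (r == null) continue;
            for (double v : r) {
                if (v != 0) n++;
            }
        }
        return n;
    }

    // Write the cached count followed by (row, col, value) triples, then verify
    // that the number of triples written matches the cached count.
    byte[] writeTriples() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(cachedNnz);
        long written = 0;
        for (int i = 0; i < rows.length; i++) {
            if (rows[i] == null) continue;
            for (int j = 0; j < rows[i].length; j++) {
                if (rows[i][j] != 0) {
                    out.writeInt(i);
                    out.writeInt(j);
                    out.writeDouble(rows[i][j]);
                    written++;
                }
            }
        }
        if (written != cachedNnz) {
            throw new IOException("Invalid number of serialized non-zeros: "
                + written + " (expected: " + cachedNnz + ")");
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[][] data = { {0, 1.5, 0}, null, {2.0, 0, 0} };
        // Consistent metadata: serialization succeeds.
        System.out.println(new SparseBlockSketch(data, 2).writeTriples().length + " bytes written");
        // Stale metadata: the same data fails the check.
        try {
            new SparseBlockSketch(data, 5).writeTriples();
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a sketch like this, calling recomputeNnz() before writing and comparing it with cachedNnz would localize the problem to whichever operation last updated the block, rather than to the eviction that happens to serialize it.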


