Posted to issues@systemml.apache.org by "LI Guobao (JIRA)" <ji...@apache.org> on 2018/04/05 10:53:00 UTC

[jira] [Commented] (SYSTEMML-2197) Multi-threaded broadcast creation

    [ https://issues.apache.org/jira/browse/SYSTEMML-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426745#comment-16426745 ] 

LI Guobao commented on SYSTEMML-2197:
-------------------------------------

OK. I got an error when launching this test. [~mboehm7], could you help me with this? I have set the classpath to the systemml module, and the generated folders inside target are present.
{code:java}
18/04/05 12:40:42 INFO api.DMLScript: END DML run 04/05/2018 12:40:42
starting R script
cmd: Rscript --default-packages=methods,datasets,graphics,grDevices,stats,utils ./src/test/scripts/functions/binary/matrix_full_other/FullDistributedMatrixMultiplication.R target/testTemp/functions/binary/matrix_full_other/FullDistributedMatrixMultiplicationTest/in/ target/testTemp/functions/binary/matrix_full_other/FullDistributedMatrixMultiplicationTest/expected/0.7_0.1/
java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
 at java.lang.Runtime.exec(Runtime.java:620)
 at java.lang.Runtime.exec(Runtime.java:450)
 at java.lang.Runtime.exec(Runtime.java:347)
 at org.apache.sysml.test.integration.AutomatedTestBase.runRScript(AutomatedTestBase.java:990)
 at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.runDistributedMatrixMatrixMultiplicationTest(FullDistributedMatrixMultiplicationTest.java:277)
 at org.apache.sysml.test.integration.functions.binary.matrix_full_other.FullDistributedMatrixMultiplicationTest.testDenseSparseRmmSpark(FullDistributedMatrixMultiplicationTest.java:209)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
 at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
 at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
 at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.io.IOException: error=2, No such file or directory
 at java.lang.UNIXProcess.forkAndExec(Native Method)
 at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
 at java.lang.ProcessImpl.start(ProcessImpl.java:134)
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
 ... 32 more
{code}
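
The first line of the trace, {{Cannot run program "Rscript": error=2, No such file or directory}}, means the JVM could not find an executable named Rscript on the PATH of the test process (error=2 is ENOENT). The {{com.intellij}} frames show the test was started from the IDE, and an IDE process may not see the same PATH as an interactive shell. The following is a minimal sketch for checking this from inside the same JVM environment. The class and method names ({{ExecutableLookup}}, {{findOnPath}}) are hypothetical and are not part of SystemML or its test framework.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of SystemML: looks up an executable on a PATH-style string.
final class ExecutableLookup {

    // Returns the first regular, executable file called `name` in the directories of `pathVar`.
    static Optional<Path> findOnPath(String name, String pathVar) {
        if (pathVar == null || pathVar.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Uses the PATH of this JVM process, which is what Runtime.exec relies on.
        Optional<Path> rscript = findOnPath("Rscript", System.getenv("PATH"));
        System.out.println(rscript
            .map(p -> "Rscript found at " + p)
            .orElse("Rscript not found on the PATH of this process"));
    }
}
```

If this reports that Rscript is missing when run from the same IDE run configuration, installing R or adding its bin directory to the PATH of that run configuration would be the likely fix. This is an assumption based on the trace alone.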

> Multi-threaded broadcast creation
> ---------------------------------
>
>                 Key: SYSTEMML-2197
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2197
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> All Spark instructions that broadcast one of the input operands rely on a shared primitive {{sec.getBroadcastForVariable(var)}} for creating partitioned broadcasts, which are wrapper objects around potentially many broadcast variables to overcome Spark's 2GB limitation for compressed broadcasts. Each individual broadcast blocks the matrix into squared blocks for direct access without unnecessary copy per task. So far this broadcast creation is single-threaded. 
> This task aims to parallelize the blocking of the given in-memory matrix into squared blocks (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/data/PartitionedBlock.java#L82) as well as the subsequent partition creation and actual broadcasting (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L548). 
> For consistency and in order to avoid excessive over-provisioning, this multi-threading should use the common internal thread pool or parallel Java streams, which similarly use the shared {{ForkJoinPool.commonPool}}. An example is the multi-threaded parallelization of RDDs, which similarly blocks a given matrix into its squared blocks (see https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L679).
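
The blocking step described in the issue can be illustrated with a small standalone sketch. It splits a dense matrix into squared blocks using a parallel Java stream, which by default runs on {{ForkJoinPool.commonPool()}}. This is not the SystemML implementation: it uses plain {{double[][]}} arrays instead of SystemML's matrix block classes, and the names {{ParallelBlockingSketch}} and {{toBlocks}} are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical sketch, not SystemML code: parallel blocking of a dense matrix.
final class ParallelBlockingSketch {

    // Splits m into blen x blen blocks in row-major block order; boundary blocks may be smaller.
    static double[][][] toBlocks(double[][] m, int blen) {
        final int rows = m.length;
        final int cols = (rows == 0) ? 0 : m[0].length;
        final int nrb = (rows + blen - 1) / blen; // number of row blocks
        final int ncb = (cols + blen - 1) / blen; // number of column blocks
        final double[][][] blocks = new double[nrb * ncb][][];

        // Each block index is handled by exactly one task, so the writes
        // to distinct slots of `blocks` do not conflict.
        IntStream.range(0, nrb * ncb).parallel().forEach(ix -> {
            int r0 = (ix / ncb) * blen;
            int c0 = (ix % ncb) * blen;
            int h = Math.min(blen, rows - r0);
            int w = Math.min(blen, cols - c0);
            double[][] b = new double[h][w];
            for (int i = 0; i < h; i++) {
                System.arraycopy(m[r0 + i], c0, b[i], 0, w);
            }
            blocks[ix] = b;
        });
        return blocks;
    }

    public static void main(String[] args) {
        double[][] m = new double[5][7];
        for (int i = 0; i < 5; i++) {
            for (int j = 0; j < 7; j++) {
                m[i][j] = i * 10 + j;
            }
        }
        double[][][] blocks = toBlocks(m, 2);
        System.out.println("number of blocks: " + blocks.length);
    }
}
```

Using the common pool through a parallel stream avoids creating a separate thread pool per broadcast, which matches the stated goal of avoiding over-provisioning.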



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)