Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/07/07 04:14:11 UTC

[jira] [Comment Edited] (SYSTEMML-775) Distribute Data for spark

    [ https://issues.apache.org/jira/browse/SYSTEMML-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365578#comment-15365578 ] 

Matthias Boehm edited comment on SYSTEMML-775 at 7/7/16 4:13 AM:
-----------------------------------------------------------------

There are, of course, also workarounds, but it's hard to make a recommendation without the script:
{code}
X = read()
parfor( i in 1:nrow(X), opt=CONSTRAINED, mode=REMOTE_SPARK ) {
   Xi = X[i, ];
   # do some extremely CPU-intensive work that justifies forcing distributed computation
}
{code}
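
As a slightly more concrete sketch of the same pattern (the actual script has not been posted, so this is illustrative only): the $X and $R command-line arguments and the per-row sum of squares are placeholders assumed here, standing in for the real input, output, and CPU-intensive work. Each iteration writes only to its own row of R, so the iterations are independent and parfor can merge the partial results.
{code}
# $X and $R are assumed script arguments (input and output paths)
X = read($X);

# one result cell per row of X; R is the parfor result variable
R = matrix(0, rows=nrow(X), cols=1);

parfor( i in 1:nrow(X), opt=CONSTRAINED, mode=REMOTE_SPARK ) {
   Xi = X[i, ];
   # placeholder for the real CPU-intensive per-row computation
   R[i, 1] = sum(Xi ^ 2);
}

write(R, $R);
{code}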


was (Author: mboehm7):
there are of course also workarounds but it's hard to make a recommendation without the script:
{code}
X = read()
parfor( i in 1:nrow(X), opt=CONSTRAINED, mode=REMOTE_SPARK ) {
   Xi = X[, i];
   #do some extremely CPU-intensive work that justifies forcing distributed computation    
}
{code}

> Distribute Data for spark
> -------------------------
>
>                 Key: SYSTEMML-775
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-775
>             Project: SystemML
>          Issue Type: Question
>          Components: Algorithms
>    Affects Versions: SystemML 0.10
>            Reporter: Johannes Wilke
>            Priority: Minor
>
> Hi!
> I need to run computations in parallel on data on a Spark cluster with SystemML.
> The program works fine on the cluster, but not in parallel, because I don't know how to distribute my data across the cluster so that SystemML can use it.
> In Scala I have tried the following:
>  val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, dff, mc, false)
>  sysMlMatrix.saveAsObjectFile("/home/hduser/test.obj")
>  val sysMlMatrix2 = sc.sequenceFile[MatrixIndexes, MatrixBlock]("/home/hduser/test.obj",1000);
>  val sysMlMatrix3 = JavaPairRDD.fromRDD(sysMlMatrix2)
>     ml.reset()
>     ml.registerInput("X", sysMlMatrix3, numRows, numCols)
> But I get a ClassCastException when I try to load the object file.
> My matrix has 1000 rows and I want to work in parallel on these rows.
> How can I achieve this? I hope you can help me!


