Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/07/07 04:14:11 UTC
[jira] [Comment Edited] (SYSTEMML-775) Distribute Data for spark
[ https://issues.apache.org/jira/browse/SYSTEMML-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365578#comment-15365578 ]
Matthias Boehm edited comment on SYSTEMML-775 at 7/7/16 4:13 AM:
-----------------------------------------------------------------
There are, of course, also workarounds, but it's hard to make a recommendation without the script:
{code}
X = read()
parfor( i in 1:nrow(X), opt=CONSTRAINED, mode=REMOTE_SPARK ) {
  Xi = X[i, ];
  # do some extremely CPU-intensive work that justifies forcing distributed computation
}
{code}
was (Author: mboehm7):
there are of course also workarounds but it's hard to make a recommendation without the script:
{code}
X = read()
parfor( i in 1:nrow(X), opt=CONSTRAINED, mode=REMOTE_SPARK ) {
  Xi = X[, i];
  # do some extremely CPU-intensive work that justifies forcing distributed computation
}
{code}
> Distribute Data for spark
> -------------------------
>
> Key: SYSTEMML-775
> URL: https://issues.apache.org/jira/browse/SYSTEMML-775
> Project: SystemML
> Issue Type: Question
> Components: Algorithms
> Affects Versions: SystemML 0.10
> Reporter: Johannes Wilke
> Priority: Minor
>
> Hi!
> I need to run computations in parallel on data on a Spark cluster with SystemML.
> The program works fine on the cluster, but not in parallel, because I don't know how to distribute my data across the cluster so that SystemML can use it.
> In Scala I have tried the following:
> val sysMlMatrix = RDDConverterUtils.dataFrameToBinaryBlock(sc, dff, mc, false)
> sysMlMatrix.saveAsObjectFile("/home/hduser/test.obj")
> val sysMlMatrix2 = sc.sequenceFile[MatrixIndexes, MatrixBlock]("/home/hduser/test.obj",1000);
> val sysMlMatrix3 = JavaPairRDD.fromRDD(sysMlMatrix2)
> ml.reset()
> ml.registerInput("X", sysMlMatrix3, numRows, numCols)
> But I get a ClassCastException when I try to load the object file.
> My Matrix has 1000 rows and I want to work in parallel on these rows.
> How can I achieve this? I hope you can help me!