Posted to dev@systemml.apache.org by Matthias Boehm <mb...@us.ibm.com> on 2016/04/03 07:08:34 UTC

Re: Guides about running SystemML by spark cluster

Thanks again for catching
https://issues.apache.org/jira/browse/SYSTEMML-609. Yes, the change is in
SystemML head now, so please rebuild SystemML or use one of our nightly
builds (https://sparktc.ibmcloud.com/repo/latest/).

For running SystemML on Spark, you have multiple options
(http://apache.github.io/incubator-systemml/#running-systemml): either
MLContext or spark-submit. Since our documentation does not yet show many
examples for spark-submit, here is a typical command-line invocation:

../spark/bin/spark-submit \
      --class org.apache.sysml.api.DMLScript \
      --master yarn-client \
      --num-executors 10 \
      --driver-memory 20g \
      --executor-memory 60g \
      --executor-cores 24 \
      --queue default \
      --conf spark.driver.maxResultSize=0 \
      ./SystemML.jar \
      -f test.dml -stats -exec hybrid_spark -nvargs ...

Everything else is similar to the Hadoop invocation. We also provide a
script that simplifies this configuration:
https://github.com/apache/incubator-systemml/blob/master/scripts/sparkDML.sh
Keep in mind that if you want to run in yarn-cluster mode, you should put the
DML script and potentially the SystemML-config into HDFS too.
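
As a rough sketch of the yarn-cluster case, the following stages the DML
script in HDFS and then submits with the master switched to yarn-cluster.
The HDFS directory, the DRY_RUN switch, and the helper function names are
hypothetical and only for illustration; the spark-submit options are copied
from the invocation above, and staging the SystemML-config is left out. By
default it only prints the commands, so it can be inspected without a cluster.

```shell
#!/usr/bin/env bash
# Sketch only: stage a DML script in HDFS, then submit in yarn-cluster mode.
set -euo pipefail

DML_LOCAL="test.dml"                    # DML script name from the example above
HDFS_DIR="/user/${USER:-me}/systemml"   # hypothetical HDFS target directory

# Print each command by default; set DRY_RUN=0 to execute on a real cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stage_and_submit() {
  # Copy the script into HDFS so the driver running inside YARN can read it.
  run hdfs dfs -mkdir -p "$HDFS_DIR"
  run hdfs dfs -put -f "$DML_LOCAL" "$HDFS_DIR/"

  # Same options as the yarn-client example; only --master and -f differ.
  run ../spark/bin/spark-submit \
    --class org.apache.sysml.api.DMLScript \
    --master yarn-cluster \
    --num-executors 10 \
    --driver-memory 20g \
    --executor-memory 60g \
    --executor-cores 24 \
    --queue default \
    --conf spark.driver.maxResultSize=0 \
    ./SystemML.jar \
    -f "$HDFS_DIR/$DML_LOCAL" -stats -exec hybrid_spark
}

stage_and_submit
```

Running it as is prints three command lines; with DRY_RUN=0 the same
commands are executed, so adjust memory and executor settings to your
cluster first.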

Regards,
Matthias




From:	Wenjie Zhuang <ka...@vt.edu>
To:	dev@systemml.incubator.apache.org
Cc:	Matthias Boehm/Almaden/IBM@IBMUS
Date:	04/02/2016 07:50 PM
Subject:	Re: Guides about running SystemML by spark cluster



Hi,

I tried running StepLinearRegDS.dml in Spark YARN mode today and got the
following result. Is it correct?

Thanks.

BEGIN STEPWISE LINEAR REGRESSION SCRIPT
Reading X and Y...
Best AIC without any features: 4123.134539784949
Best AIC 4068.2916533784332 achieved with feature: 22
Running linear regression with selected features...
Computing the statistics...
Writing the output matrix...



On Sat, Apr 2, 2016 at 8:37 AM, Wenjie Zhuang <ka...@vt.edu> wrote:
  Hi,

  I am now trying to run experiments with SystemML on a Spark cluster. Could
  you please share some guides on how to run StepLinearRegDS.dml on a
  Spark cluster? The official guide I found is mostly about Hadoop.

  Thanks & Have a good weekend!