Posted to reviews@spark.apache.org by mateiz <gi...@git.apache.org> on 2014/05/28 01:20:50 UTC
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
GitHub user mateiz opened a pull request:
https://github.com/apache/spark/pull/896
[SPARK-1566] consolidate programming guide, and general doc updates
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:
* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions
You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mateiz/spark 1.0-docs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/896.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #896
----
commit 038d8feb700758f48d4c777a47a9b36c637a5372
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-26T00:30:51Z
Change color of doc title bar to differentiate from 0.9.0
commit 4298ce9145585b0ad02809202b4bc22df79063a8
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-26T00:50:47Z
More CSS tweaks
commit dec99f104c484d8ce5cffd8e1ed9db530e59374a
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-26T01:11:25Z
More CSS tweaks
commit 6b618f6af881f7cdbf29b7cbdf0abe35c8d4e555
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-26T20:52:59Z
First pass at updating programming guide to support all languages, plus
other tweaks throughout
commit 08f0c861a881ebc3c5cd83619ce885431c478f0e
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-26T21:56:53Z
Actually added programming guide to Git
commit 5d29d82b6c53fa0f0f14e60939b1369c4b3888b3
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T07:57:25Z
New section on basics and function syntax
commit af747d8d78ee30ee8b1067ab44d1cab560be5d34
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T08:04:10Z
tweaks
commit 12aa10c5485ea3855b6a1d59be5cc38c77a23f21
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T08:31:26Z
Added key-value pairs section
commit bdf22a28a17955bbbe733b68ad1340eefa0cdcd5
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T08:37:41Z
tweaks
commit fce1c797e73d1704e91807a23c7cc5b088edcb7a
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T18:29:27Z
Add more API functions
commit cf22e27e186ec9c3981488ee2cd7c58b8b1b8599
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T18:49:59Z
migration guide, remove old language guides
commit 64cb7c2ceb82483088156928bfac2fe53772c6be
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T21:11:58Z
stuff
commit 59ef0289aa148e0b5b3ac198c09cc28fb5713fc4
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T21:12:28Z
Moved submitting apps to separate doc
commit 09c57b252b62209fd92d35a4443ae65e637fc2fb
Author: Matei Zaharia <ma...@databricks.com>
Date: 2014-05-27T23:07:34Z
miscellaneous changes
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44371355
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15245/
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348380
Merged build started.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44445558
Merged build triggered.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44351157
Merged build finished. All automated tests passed.
---
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44483569
@pwendell is it okay to merge this as is or do you want to look at it more?
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44451149
Merged build finished. All automated tests passed.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348783
Merged build finished.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44445572
Merged build started.
---
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13116427
--- Diff: docs/submitting-applications.md ---
@@ -0,0 +1,153 @@
+---
+layout: global
+title: Submitting Applications
+---
+
+The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster.
+It can use all of Spark's supported [cluster managers](cluster-overview.html#cluster-manager-types)
+through a uniform interface so you don't have to configure your application specially for each one.
+
+# Bundling Your Application's Dependencies
+If your code depends on other projects, you will need to package them alongside
+your application in order to distribute the code to a Spark cluster. To do this,
+create an assembly jar (or "uber" jar) containing your code and its dependencies. Both
+[sbt](https://github.com/sbt/sbt-assembly) and
+[Maven](http://maven.apache.org/plugins/maven-shade-plugin/)
+have assembly plugins. When creating assembly jars, list Spark and Hadoop
+as `provided` dependencies; these need not be bundled since they are provided by
+the cluster manager at runtime. Once you have an assembled jar you can call the `bin/spark-submit`
+script as shown here while passing your jar.
+
+For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
+files to be distributed with your application. If you depend on multiple Python files we recommend
+packaging them into a `.zip` or `.egg`.
+
+# Launching Applications with spark-submit
+
+Once a user application is bundled, it can be launched using the `bin/spark-submit` script.
+This script takes care of setting up the classpath with Spark and its
+dependencies, and can support different cluster managers and deploy modes that Spark supports:
+
+{% highlight bash %}
+./bin/spark-submit \
+ --class <main-class>
+ --master <master-url> \
+ --deploy-mode <deploy-mode> \
+ ... # other options
+ <application-jar> \
+ [application-arguments]
+{% endhighlight %}
+
+Some of the commonly used options are:
+
+* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
+* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
+* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+* `application-arguments`: Arguments passed to the main method of your main class, if any
+
+For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
+and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
+
+To enumerate all options available to `spark-submit` run it with `--help`. Here are a few
+examples of common options:
+
+{% highlight bash %}
+# Run application locally on 8 cores
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master local[8] \
+ /path/to/examples.jar \
+ 100
+
+# Run on a Spark standalone cluster
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master spark://207.184.161.138:7077 \
+ --executor-memory 20G \
+ --total-executor-cores 100 \
+ /path/to/examples.jar \
+ 1000
+
+# Run on a YARN cluster
+export HADOOP_CONF_DIR=XXX
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master yarn-cluster \ # can also be `yarn-client` for client mode
+ --executor-memory 20G \
+ --num-executors 50 \
+ /path/to/examples.jar \
+ 1000
+
+# Run a Python application on a cluster
+./bin/spark-submit \
+ --master spark://207.184.161.138:7077 \
+ examples/src/main/python/pi.py \
+ 1000
+{% endhighlight %}
+
+# Master URLs
+
+The master URL passed to Spark can be in one of the following formats:
+
+<table class="table">
+<tr><th>Master URL</th><th>Meaning</th></tr>
+<tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
+<tr><td> local[K] </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). </td></tr>
+<tr><td> local[*] </td><td> Run Spark locally with as many worker threads as logical cores on your machine.</td></tr>
+<tr><td> spark://HOST:PORT </td><td> Connect to the given <a href="spark-standalone.html">Spark standalone
+ cluster</a> master. The port must be whichever one your master is configured to use, which is 7077 by default.
+</td></tr>
+<tr><td> mesos://HOST:PORT </td><td> Connect to the given <a href="running-on-mesos.html">Mesos</a> cluster.
+ The port must be whichever one your Mesos master is configured to use, which is 5050 by default.
+ Or, for a Mesos cluster using ZooKeeper, use <code>mesos://zk://...</code>.
+</td></tr>
+<tr><td> yarn-client </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+client mode. The cluster location will be found based on the HADOOP_CONF_DIR variable.
+</td></tr>
+<tr><td> yarn-cluster </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+cluster mode. The cluster location will be found based on HADOOP_CONF_DIR.
+</td></tr>
+</table>
+
+
+# Loading Configuration from a File
+
+The `spark-submit` script can load default [Spark configuration values](configuration.html) from a
+properties file and pass them on to your application. By default it will read options
+from `conf/spark-defaults.conf` in the Spark directory. For more detail, see the section on
+[loading default configurations](configuration.html#loading-default-configurations).
+
+Loading default Spark configurations this way can obviate the need for certain flags to
+`spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the
+`--master` flag from `spark-submit`. In general, configuration values explicitly set on a
+`SparkConf` take the highest precedence, then flags passed to `spark-submit`, then values in the
+defaults file.
+
+If you are ever unclear where configuration options are coming from, you can print out fine-grained
+debugging information by running `spark-submit` with the `--verbose` option.
--- End diff --
Yes.
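The "Loading Configuration from a File" section quoted above can be made concrete with a minimal defaults file. A sketch of `conf/spark-defaults.conf` (the property values here are illustrative, not recommendations):

```
# conf/spark-defaults.conf -- whitespace-separated key/value pairs
spark.master            spark://207.184.161.138:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
```

With `spark.master` set here, the `--master` flag can be omitted from `spark-submit`; a value set explicitly on a `SparkConf` would still override both, per the precedence order the section describes.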
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44349520
Merged build started.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44351158
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15237/
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44352724
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15239/
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44481139
Merged build finished.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44350541
Merged build triggered.
---
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13109195
--- Diff: docs/submitting-applications.md ---
@@ -0,0 +1,153 @@
+---
+layout: global
+title: Submitting Applications
+---
+
+The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster.
+It can use all of Spark's supported [cluster managers](cluster-overview.html#cluster-manager-types)
+through a uniform interface so you don't have to configure your application specially for each one.
+
+# Bundling Your Application's Dependencies
+If your code depends on other projects, you will need to package them alongside
+your application in order to distribute the code to a Spark cluster. To do this,
+create an assembly jar (or "uber" jar) containing your code and its dependencies. Both
+[sbt](https://github.com/sbt/sbt-assembly) and
+[Maven](http://maven.apache.org/plugins/maven-shade-plugin/)
+have assembly plugins. When creating assembly jars, list Spark and Hadoop
+as `provided` dependencies; these need not be bundled since they are provided by
+the cluster manager at runtime. Once you have an assembled jar you can call the `bin/spark-submit`
+script as shown here while passing your jar.
+
+For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
+files to be distributed with your application. If you depend on multiple Python files we recommend
+packaging them into a `.zip` or `.egg`.
+
+# Launching Applications with spark-submit
+
+Once a user application is bundled, it can be launched using the `bin/spark-submit` script.
+This script takes care of setting up the classpath with Spark and its
+dependencies, and can support different cluster managers and deploy modes that Spark supports:
+
+{% highlight bash %}
+./bin/spark-submit \
+ --class <main-class>
+ --master <master-url> \
+ --deploy-mode <deploy-mode> \
+ ... # other options
+ <application-jar> \
+ [application-arguments]
+{% endhighlight %}
+
+Some of the commonly used options are:
+
+* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
+* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
+* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+* `application-arguments`: Arguments passed to the main method of your main class, if any
+
+For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
+and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
+
+To enumerate all options available to `spark-submit` run it with `--help`. Here are a few
+examples of common options:
+
+{% highlight bash %}
+# Run application locally on 8 cores
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi
--- End diff --
This needs to have a backslash after it (it doesn't paste correctly into a shell), and the same applies in the cases below.
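The point about the trailing backslash can be seen with a minimal shell example (nothing Spark-specific is assumed):

```shell
# With a trailing backslash, the shell joins the two lines into one command:
echo one \
  two
# prints: one two

# Without the backslash, the second line would run as a separate command
# (`two: command not found`), which is why the pasted example breaks.
```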
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348013
Merged build started.
---
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44622890
Thanks Matei, I've merged this!
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44352098
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15238/
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44478696
Merged build started.
---
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44481140
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15269/
---
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13116340
--- Diff: docs/submitting-applications.md ---
@@ -0,0 +1,153 @@
+---
+layout: global
+title: Submitting Applications
+---
+
+The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster.
+It can use all of Spark's supported [cluster managers](cluster-overview.html#cluster-manager-types)
+through a uniform interface so you don't have to configure your application specially for each one.
+
+# Bundling Your Application's Dependencies
+If your code depends on other projects, you will need to package them alongside
+your application in order to distribute the code to a Spark cluster. To do this,
+create an assembly jar (or "uber" jar) containing your code and its dependencies. Both
+[sbt](https://github.com/sbt/sbt-assembly) and
+[Maven](http://maven.apache.org/plugins/maven-shade-plugin/)
+have assembly plugins. When creating assembly jars, list Spark and Hadoop
+as `provided` dependencies; these need not be bundled since they are provided by
+the cluster manager at runtime. Once you have an assembled jar you can call the `bin/spark-submit`
+script as shown here while passing your jar.
+
+For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
+files to be distributed with your application. If you depend on multiple Python files we recommend
+packaging them into a `.zip` or `.egg`.
+
+# Launching Applications with spark-submit
+
+Once a user application is bundled, it can be launched using the `bin/spark-submit` script.
+This script takes care of setting up the classpath with Spark and its
+dependencies, and can support different cluster managers and deploy modes that Spark supports:
+
+{% highlight bash %}
+./bin/spark-submit \
+ --class <main-class>
+ --master <master-url> \
+ --deploy-mode <deploy-mode> \
+ ... # other options
+ <application-jar> \
+ [application-arguments]
+{% endhighlight %}
+
+Some of the commonly used options are:
+
+* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
+* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
+* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+* `application-arguments`: Arguments passed to the main method of your main class, if any
+
+For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
+and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
+
+To enumerate all options available to `spark-submit` run it with `--help`. Here are a few
+examples of common options:
+
+{% highlight bash %}
+# Run application locally on 8 cores
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master local[8] \
+ /path/to/examples.jar \
+ 100
+
+# Run on a Spark standalone cluster
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master spark://207.184.161.138:7077 \
+ --executor-memory 20G \
+ --total-executor-cores 100 \
+ /path/to/examples.jar \
+ 1000
+
+# Run on a YARN cluster
+export HADOOP_CONF_DIR=XXX
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master yarn-cluster \ # can also be `yarn-client` for client mode
+ --executor-memory 20G \
+ --num-executors 50 \
+ /path/to/examples.jar \
+ 1000
+
+# Run a Python application on a cluster
+./bin/spark-submit \
+ --master spark://207.184.161.138:7077 \
+ examples/src/main/python/pi.py \
+ 1000
+{% endhighlight %}
+
+# Master URLs
+
+The master URL passed to Spark can be in one of the following formats:
+
+<table class="table">
+<tr><th>Master URL</th><th>Meaning</th></tr>
+<tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
+<tr><td> local[K] </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). </td></tr>
+<tr><td> local[*] </td><td> Run Spark locally with as many worker threads as logical cores on your machine.</td></tr>
+<tr><td> spark://HOST:PORT </td><td> Connect to the given <a href="spark-standalone.html">Spark standalone
+ cluster</a> master. The port must be whichever one your master is configured to use, which is 7077 by default.
+</td></tr>
+<tr><td> mesos://HOST:PORT </td><td> Connect to the given <a href="running-on-mesos.html">Mesos</a> cluster.
+ The port must be whichever one your Mesos master is configured to use, which is 5050 by default.
+ Or, for a Mesos cluster using ZooKeeper, use <code>mesos://zk://...</code>.
+</td></tr>
+<tr><td> yarn-client </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+client mode. The cluster location will be found based on the HADOOP_CONF_DIR variable.
+</td></tr>
+<tr><td> yarn-cluster </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+cluster mode. The cluster location will be found based on HADOOP_CONF_DIR.
+</td></tr>
+</table>
+
+
+# Loading Configuration from a File
+
+The `spark-submit` script can load default [Spark configuration values](configuration.html) from a
+properties file and pass them on to your application. By default it will read options
+from `conf/spark-defaults.conf` in the Spark directory. For more detail, see the section on
+[loading default configurations](configuration.html#loading-default-configurations).
+
+Loading default Spark configurations this way can obviate the need for certain flags to
+`spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the
+`--master` flag from `spark-submit`. In general, configuration values explicitly set on a
+`SparkConf` take the highest precedence, then flags passed to `spark-submit`, then values in the
+defaults file.
+
+If you are ever unclear where configuration options are coming from, you can print out fine-grained
+debugging information by running `spark-submit` with the `--verbose` option.
--- End diff --
`spark-submit -h` doesn't show `--verbose` as an option, but `--verbose` does appear when running with invalid options. Should I file a bug to include that in spark-submit's normal help output?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/896
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348007
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44350542
Merged build started.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13116488
--- Diff: docs/submitting-applications.md ---
@@ -0,0 +1,153 @@
+---
+layout: global
+title: Submitting Applications
+---
+
+The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster.
+It can use all of Spark's supported [cluster managers](cluster-overview.html#cluster-manager-types)
+through a uniform interface so you don't have to configure your application specially for each one.
+
+# Bundling Your Application's Dependencies
+If your code depends on other projects, you will need to package them alongside
+your application in order to distribute the code to a Spark cluster. To do this,
+create an assembly jar (or "uber" jar) containing your code and its dependencies. Both
+[sbt](https://github.com/sbt/sbt-assembly) and
+[Maven](http://maven.apache.org/plugins/maven-shade-plugin/)
+have assembly plugins. When creating assembly jars, list Spark and Hadoop
+as `provided` dependencies; these need not be bundled since they are provided by
+the cluster manager at runtime. Once you have an assembled jar you can call the `bin/spark-submit`
+script as shown here while passing your jar.
+
+For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
+files to be distributed with your application. If you depend on multiple Python files we recommend
+packaging them into a `.zip` or `.egg`.
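+
+As a hypothetical illustration (the file names here are examples, not part of this PR),
+a Python application with helper modules might be submitted as:
+
+{% highlight bash %}
+# deps.zip is an assumed archive of the application's helper modules, built beforehand
+./bin/spark-submit \
+  --master local[4] \
+  --py-files deps.zip \
+  my_script.py
+{% endhighlight %}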
+
+# Launching Applications with spark-submit
+
+Once a user application is bundled, it can be launched using the `bin/spark-submit` script.
+This script takes care of setting up the classpath with Spark and its
+dependencies, and can support different cluster managers and deploy modes that Spark supports:
+
+{% highlight bash %}
+./bin/spark-submit \
+ --class <main-class> \
+ --master <master-url> \
+ --deploy-mode <deploy-mode> \
+ ... # other options
+ <application-jar> \
+ [application-arguments]
+{% endhighlight %}
+
+Some of the commonly used options are:
+
+* `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
+* `--master`: The [master URL](#master-urls) for the cluster (e.g. `spark://23.195.26.187:7077`)
+* `--deploy-mode`: Whether to deploy your driver program within the cluster or run it locally as an external client (either `cluster` or `client`)
+* `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
+* `application-arguments`: Arguments passed to the main method of your main class, if any
+
+For Python applications, simply pass a `.py` file in the place of `<application-jar>` instead of a JAR,
+and add Python `.zip`, `.egg` or `.py` files to the search path with `--py-files`.
+
+To enumerate all options available to `spark-submit` run it with `--help`. Here are a few
+examples of common options:
+
+{% highlight bash %}
+# Run application locally on 8 cores
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master local[8] \
+ /path/to/examples.jar \
+ 100
+
+# Run on a Spark standalone cluster
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master spark://207.184.161.138:7077 \
+ --executor-memory 20G \
+ --total-executor-cores 100 \
+ /path/to/examples.jar \
+ 1000
+
+# Run on a YARN cluster
+export HADOOP_CONF_DIR=XXX
+./bin/spark-submit \
+ --class org.apache.spark.examples.SparkPi \
+ --master yarn-cluster \ # can also be `yarn-client` for client mode
+ --executor-memory 20G \
+ --num-executors 50 \
+ /path/to/examples.jar \
+ 1000
+
+# Run a Python application on a cluster
+./bin/spark-submit \
+ --master spark://207.184.161.138:7077 \
+ examples/src/main/python/pi.py \
+ 1000
+{% endhighlight %}
+
+# Master URLs
+
+The master URL passed to Spark can be in one of the following formats:
+
+<table class="table">
+<tr><th>Master URL</th><th>Meaning</th></tr>
+<tr><td> local </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr>
+<tr><td> local[K] </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). </td></tr>
+<tr><td> local[*] </td><td> Run Spark locally with as many worker threads as logical cores on your machine.</td></tr>
+<tr><td> spark://HOST:PORT </td><td> Connect to the given <a href="spark-standalone.html">Spark standalone
+ cluster</a> master. The port must be whichever one your master is configured to use, which is 7077 by default.
+</td></tr>
+<tr><td> mesos://HOST:PORT </td><td> Connect to the given <a href="running-on-mesos.html">Mesos</a> cluster.
+ The port must be whichever one your cluster is configured to use, which is 5050 by default.
+ Or, for a Mesos cluster using ZooKeeper, use <code>mesos://zk://...</code>.
+</td></tr>
+<tr><td> yarn-client </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+client mode. The cluster location will be found based on the HADOOP_CONF_DIR variable.
+</td></tr>
+<tr><td> yarn-cluster </td><td> Connect to a <a href="running-on-yarn.html"> YARN </a> cluster in
+cluster mode. The cluster location will be found based on HADOOP_CONF_DIR.
+</td></tr>
+</table>
+
+
+# Loading Configuration from a File
+
+The `spark-submit` script can load default [Spark configuration values](configuration.html) from a
+properties file and pass them on to your application. By default it will read options
+from `conf/spark-defaults.conf` in the Spark directory. For more detail, see the section on
+[loading default configurations](configuration.html#loading-default-configurations).
+
+Loading default Spark configurations this way can obviate the need for certain flags to
+`spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the
+`--master` flag from `spark-submit`. In general, configuration values explicitly set on a
+`SparkConf` take the highest precedence, then flags passed to `spark-submit`, then values in the
+defaults file.
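+
+As an illustrative sketch (the values below are examples, not shipped defaults), such a
+properties file contains whitespace-separated property/value pairs:
+
+{% highlight bash %}
+# conf/spark-defaults.conf
+spark.master            spark://5.6.7.8:7077
+spark.executor.memory   512m
+spark.eventLog.enabled  true
+{% endhighlight %}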
+
+If you are ever unclear where configuration options are coming from, you can print out fine-grained
+debugging information by running `spark-submit` with the `--verbose` option.
--- End diff --
https://issues.apache.org/jira/browse/SPARK-1944
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44369237
Merged build started.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13116882
--- Diff: docs/running-on-mesos.md ---
@@ -103,7 +103,7 @@ the `make-distribution.sh` script included in a Spark source tarball/checkout.
## Using a Mesos Master URL
The Master URLs for Mesos are in the form `mesos://host:5050` for a single-master Mesos
-cluster, or `zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
+cluster, or `mesos://zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
--- End diff --
Yeah I'd leave this for another time. In general none of these are URLs anyway, perhaps they could be called URIs, but I don't even want to get into the difference ;)
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44371354
Merged build finished.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44478690
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44451153
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15259/
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44349512
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13116025
--- Diff: docs/running-on-mesos.md ---
@@ -103,7 +103,7 @@ the `make-distribution.sh` script included in a Spark source tarball/checkout.
## Using a Mesos Master URL
The Master URLs for Mesos are in the form `mesos://host:5050` for a single-master Mesos
-cluster, or `zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
+cluster, or `mesos://zk://host:2181` for a multi-master Mesos cluster using ZooKeeper.
--- End diff --
I thought this change was wrong because the `MESOS_REGEX` at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1437 is `"""(mesos|zk)://.*""".r` but after reading through the code this change is actually correct.
I'm not sure that `mesos://zk://host:2181` is a [valid URL](https://en.wikipedia.org/wiki/Uniform_resource_locator#Syntax) though. JDBC for example uses `jdbc:postgresql://` like in `jdbc:postgresql://localhost/test` rather than `jdbc://postgresql://...` Issue for a later time?
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44369235
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348370
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/896#discussion_r13108963
--- Diff: docs/submitting-applications.md ---
@@ -0,0 +1,153 @@
+---
+layout: global
+title: Submitting Applications
+---
+
+The `spark-submit` script in Spark's `bin` directory is used to launch applications on a cluster.
+It can use all of Spark's supported [cluster managers](cluster-overview.html#cluster-manager-types)
+through a uniform interface so you don't have to configure your application specially for each one.
+
+# Bundling Your Application's Dependencies
+If your code depends on other projects, you will need to package them alongside
+your application in order to distribute the code to a Spark cluster. To do this,
+create an assembly jar (or "uber" jar) containing your code and its dependencies. Both
+[sbt](https://github.com/sbt/sbt-assembly) and
+[Maven](http://maven.apache.org/plugins/maven-shade-plugin/)
+have assembly plugins. When creating assembly jars, list Spark and Hadoop
+as `provided` dependencies; these need not be bundled since they are provided by
+the cluster manager at runtime. Once you have an assembled jar you can call the `bin/spark-submit`
+script as shown here while passing your jar.
+
+For Python, you can use the `--py-files` argument of `spark-submit` to add `.py`, `.zip` or `.egg`
+files to be distributed with your application. If you depend on multiple Python files we recommend
+packaging them into a `.zip` or `.egg`.
+
+# Launching Applications with spark-submit
+
+Once a user application is bundled, it can be launched using the `bin/spark-submit` script
--- End diff --
This sentence needs a period. Wasn't part of your change but I just noticed it.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44352723
Merged build finished. All automated tests passed.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44348784
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15236/
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44349180
Hey @mateiz looks great. I added two small comments which I think were both typos unrelated to your patch.
---
[GitHub] spark pull request: [SPARK-1566] consolidate programming guide, an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/896#issuecomment-44352097
Merged build finished. All automated tests passed.
---