Posted to commits@spark.apache.org by sh...@apache.org on 2015/10/24 06:38:23 UTC

spark git commit: [SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Repository: spark
Updated Branches:
  refs/heads/master 4725cb988 -> 2462dbcce


[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.

Add a new Spark conf option "spark.r.driver.command" to specify the executable for an R script run by the driver in client mode.

The existing conf option "spark.sparkr.r.command" (deprecated by this change in favor of "spark.r.command") specifies the executable for an R script in cluster mode, for both the driver and the workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).

Note that the [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.

For reference, PySpark has two environment variables serving a similar purpose:
PYSPARK_PYTHON         Python binary executable to use for PySpark in both the driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON  Python binary executable to use for PySpark in the driver only (default is PYSPARK_PYTHON).
PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.
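
As a usage sketch (the Rscript path and example script are illustrative), the new option lets client mode pick up a non-default Rscript at submit time:

    bin/spark-submit \
      --deploy-mode client \
      --conf spark.r.driver.command=/opt/R/bin/Rscript \
      examples/src/main/r/dataframe.R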

Author: Sun Rui <ru...@intel.com>

Closes #9179 from sun-rui/SPARK-10971.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2462dbcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2462dbcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2462dbcc

Branch: refs/heads/master
Commit: 2462dbcce89d657bca17ae311c99c2a4bee4a5fa
Parents: 4725cb9
Author: Sun Rui <ru...@intel.com>
Authored: Fri Oct 23 21:38:04 2015 -0700
Committer: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Committed: Fri Oct 23 21:38:04 2015 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/deploy/RRunner.scala   | 11 ++++++++++-
 docs/configuration.md                             | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
index 58cc1f9..ed183cf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
@@ -40,7 +40,16 @@ object RRunner {
 
     // Time to wait for SparkR backend to initialize in seconds
     val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
-    val rCommand = "Rscript"
+    val rCommand = {
+      // "spark.sparkr.r.command" is deprecated and replaced by "spark.r.command",
+      // but kept here for backward compatibility.
+      var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+      cmd = sys.props.getOrElse("spark.r.command", cmd)
+      if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+        cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+      }
+      cmd
+    }
 
     // Check if the file path exists.
     // If not, change directory to current working directory for YARN cluster mode
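
For clarity, the effective resolution order implemented above is: spark.r.driver.command (client mode only), then spark.r.command, then the deprecated spark.sparkr.r.command, then the default "Rscript". A minimal standalone Scala sketch of that precedence (a hypothetical helper, not Spark code; only the property names are taken from the patch):

    object RCommandResolution {
      // Mirrors the logic in RRunner above, as a pure function over properties.
      def resolve(props: Map[String, String]): String = {
        // Base executable: the new name wins over the deprecated one;
        // fall back to "Rscript" when neither is set.
        val base = props.getOrElse("spark.r.command",
          props.getOrElse("spark.sparkr.r.command", "Rscript"))
        // The driver-side override applies only in client mode.
        if (props.getOrElse("spark.submit.deployMode", "client") == "client") {
          props.getOrElse("spark.r.driver.command", base)
        } else {
          base
        }
      }
    }

    // Example (REPL-style): returns "/opt/R/bin/Rscript"
    RCommandResolution.resolve(Map(
      "spark.submit.deployMode" -> "client",
      "spark.r.driver.command" -> "/opt/R/bin/Rscript"))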

http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index be9c36b..682384d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
     Number of threads used by RBackend to handle RPC calls from SparkR package.
   </td>
 </tr>
+<tr>
+  <td><code>spark.r.command</code></td>
+  <td>Rscript</td>
+  <td>
+    Executable for executing R scripts in cluster modes for both driver and workers.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.r.driver.command</code></td>
+  <td>spark.r.command</td>
+  <td>
+    Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+  </td>
+</tr>
 </table>
 
 #### Cluster Managers
@@ -1629,6 +1643,10 @@ The following variables can be set in `spark-env.sh`:
     <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
   </tr>
   <tr>
+    <td><code>SPARKR_DRIVER_R</code></td>
+    <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+  </tr>
+  <tr>
     <td><code>SPARK_LOCAL_IP</code></td>
     <td>IP address of the machine to bind to.</td>
   </tr>
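
As a hedged example (the install path is illustrative), the SPARKR_DRIVER_R entry documented above would be set in conf/spark-env.sh:

    # Point the SparkR shell at a custom R build (path is illustrative)
    export SPARKR_DRIVER_R=/opt/R/bin/R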

