Posted to commits@spark.apache.org by sh...@apache.org on 2015/10/24 06:38:23 UTC
spark git commit: [SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.
Repository: spark
Updated Branches:
refs/heads/master 4725cb988 -> 2462dbcce
[SPARK-10971][SPARKR] RRunner should allow setting path to Rscript.
Add a new spark conf option "spark.r.driver.command" to specify the executable for an R script in client mode.
The existing spark conf option "spark.sparkr.r.command" is used to specify the executable for an R script in cluster modes for both driver and workers. See also [launch R worker script](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L395).
BTW, [environment variable "SPARKR_DRIVER_R"](https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L275) is used to locate the R shell on the local host.
For your information, PySpark has two environment variables serving a similar purpose:
PYSPARK_PYTHON Python binary executable to use for PySpark in both driver and workers (default is `python`).
PYSPARK_DRIVER_PYTHON Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON).
PySpark uses the code [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala#L41) to determine the Python executable for a Python script.
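The resolution order this patch introduces (deprecated "spark.sparkr.r.command" as the baseline, overridden by "spark.r.command", and in client mode further overridden by "spark.r.driver.command") can be sketched standalone. This is only an illustration, not the patched RRunner itself; the `props` map stands in for `sys.props`, and `RCommandResolution` is a hypothetical name.

```scala
// Sketch of the R command resolution order from this patch.
// `props` stands in for sys.props; the keys match the diff below.
object RCommandResolution {
  def resolve(props: Map[String, String]): String = {
    // Deprecated key first, kept as the backward-compatible baseline.
    var cmd = props.getOrElse("spark.sparkr.r.command", "Rscript")
    // "spark.r.command" overrides it for both driver and workers.
    cmd = props.getOrElse("spark.r.command", cmd)
    // In client mode only, "spark.r.driver.command" overrides for the driver.
    if (props.getOrElse("spark.submit.deployMode", "client") == "client") {
      cmd = props.getOrElse("spark.r.driver.command", cmd)
    }
    cmd
  }
}
```

Note that in cluster mode the driver-only key is skipped, so the driver and workers fall back to the same executable.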
Author: Sun Rui <ru...@intel.com>
Closes #9179 from sun-rui/SPARK-10971.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2462dbcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2462dbcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2462dbcc
Branch: refs/heads/master
Commit: 2462dbcce89d657bca17ae311c99c2a4bee4a5fa
Parents: 4725cb9
Author: Sun Rui <ru...@intel.com>
Authored: Fri Oct 23 21:38:04 2015 -0700
Committer: Shivaram Venkataraman <sh...@cs.berkeley.edu>
Committed: Fri Oct 23 21:38:04 2015 -0700
----------------------------------------------------------------------
.../scala/org/apache/spark/deploy/RRunner.scala | 11 ++++++++++-
docs/configuration.md | 18 ++++++++++++++++++
2 files changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
index 58cc1f9..ed183cf 100644
--- a/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/RRunner.scala
@@ -40,7 +40,16 @@ object RRunner {
// Time to wait for SparkR backend to initialize in seconds
val backendTimeout = sys.env.getOrElse("SPARKR_BACKEND_TIMEOUT", "120").toInt
- val rCommand = "Rscript"
+ val rCommand = {
+ // "spark.sparkr.r.command" is deprecated and replaced by "spark.r.command",
+ // but kept here for backward compatibility.
+ var cmd = sys.props.getOrElse("spark.sparkr.r.command", "Rscript")
+ cmd = sys.props.getOrElse("spark.r.command", cmd)
+ if (sys.props.getOrElse("spark.submit.deployMode", "client") == "client") {
+ cmd = sys.props.getOrElse("spark.r.driver.command", cmd)
+ }
+ cmd
+ }
// Check if the file path exists.
// If not, change directory to current working directory for YARN cluster mode
http://git-wip-us.apache.org/repos/asf/spark/blob/2462dbcc/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index be9c36b..682384d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1589,6 +1589,20 @@ Apart from these, the following properties are also available, and may be useful
Number of threads used by RBackend to handle RPC calls from SparkR package.
</td>
</tr>
+<tr>
+ <td><code>spark.r.command</code></td>
+ <td>Rscript</td>
+ <td>
+ Executable for executing R scripts in cluster modes for both driver and workers.
+ </td>
+</tr>
+<tr>
+ <td><code>spark.r.driver.command</code></td>
+ <td>spark.r.command</td>
+ <td>
+ Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
+ </td>
+</tr>
</table>
#### Cluster Managers
@@ -1629,6 +1643,10 @@ The following variables can be set in `spark-env.sh`:
<td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).</td>
</tr>
<tr>
+ <td><code>SPARKR_DRIVER_R</code></td>
+ <td>R binary executable to use for SparkR shell (default is <code>R</code>).</td>
+ </tr>
+ <tr>
<td><code>SPARK_LOCAL_IP</code></td>
<td>IP address of the machine to bind to.</td>
</tr>