Posted to issues@spark.apache.org by "Michael Sannella (JIRA)" <ji...@apache.org> on 2015/06/01 21:54:27 UTC

[jira] [Created] (SPARK-8019) [SparkR] Create worker R processes with a command other than Rscript

Michael Sannella created SPARK-8019:
---------------------------------------

             Summary: [SparkR] Create worker R processes with a command other than Rscript
                 Key: SPARK-8019
                 URL: https://issues.apache.org/jira/browse/SPARK-8019
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
            Reporter: Michael Sannella


Currently, SparkR creates worker R processes by calling the command
"Rscript", so it depends on R being installed with that command
globally visible.

This could be a problem if one wants to use an R engine that is not
installed in this way.  For example, suppose that one has multiple
versions of R on the worker machines, and wants to try a new version
of R under SparkR before it has been formally installed.  Ideally, one
could do this by running SparkR and specifying the full path name to
the Rscript command (such as "/usr/local/R-alt/bin/Rscript").

I ran into this problem in a different situation: I am working on an
alternative R engine (TERR), which has its own version of the
Rscript command (TERRscript).  I could make TERR work with SparkR by
linking the file Rscript to my TERRscript, but I'd rather not
disable normal access to R.

I finally dealt with this by making a one-line change to
core/src/main/scala/org/apache/spark/api/r/RRDD.scala (which I will
shortly submit as a pull request for this issue) that reads a new
configuration setting, "spark.sparkr.r.command", to get the path for
spawning R engines.  If this setting is not defined, it defaults to
"Rscript", so we get the old behavior.  With this change, I can start
SparkR with TERR using a command such as:

{noformat}
sc <- sparkR.init(
        sparkEnvir=list(spark.sparkr.use.daemon="false",
                        spark.sparkr.r.command="/usr/local/TERR/bin/TERRscript"))
{noformat}
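
For illustration only, the lookup-with-default behavior described above
can be sketched in plain Scala as follows.  This is not the actual patch
to RRDD.scala: the object name, the helper functions, the use of a plain
Map in place of Spark's configuration, and the script name "worker.R"
are all assumptions made for this sketch.

```scala
// Hypothetical sketch of "use spark.sparkr.r.command, else Rscript".
// Not the real RRDD.scala code; a plain Map stands in for Spark's config.
object RCommandSketch {
  // Name of the setting proposed in this issue.
  val RCommandKey = "spark.sparkr.r.command"

  // Return the configured command, or "Rscript" when the setting is absent,
  // which preserves the old behavior.
  def resolveRCommand(conf: Map[String, String]): String =
    conf.getOrElse(RCommandKey, "Rscript")

  // Build (but do not start) a process that would run a worker script.
  // The script argument is a placeholder for illustration.
  def workerProcess(conf: Map[String, String], script: String): ProcessBuilder =
    new ProcessBuilder(resolveRCommand(conf), script)

  def main(args: Array[String]): Unit = {
    val custom = Map(RCommandKey -> "/usr/local/TERR/bin/TERRscript")
    println(resolveRCommand(Map.empty))
    println(resolveRCommand(custom))
    println(workerProcess(custom, "worker.R").command())
  }
}
```

Keeping the default inside the lookup means that existing installations,
which never set the value, continue to launch "Rscript" unchanged.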

This is a very low-risk change that could be generally useful to other
people.



