Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2016/08/22 00:46:20 UTC

[jira] [Comment Edited] (SPARK-16578) Configurable hostname for RBackend

    [ https://issues.apache.org/jira/browse/SPARK-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429942#comment-15429942 ] 

Xiangrui Meng edited comment on SPARK-16578 at 8/22/16 12:46 AM:
-----------------------------------------------------------------

[~shivaram] I had an offline discussion with [~junyangq], and I feel we might have a misunderstanding of the user scenarios.

The old workflow for SparkR is the following (a rough R sketch follows the list):

1. Users download and install a Spark distribution by themselves.
2. Users let R know where to find the locally installed SparkR package.
3. `library(SparkR)`
4. Launch driver/SparkContext (in client mode) and connect to a local or remote cluster.
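
To make this concrete, here is a rough sketch of the old workflow in R (the Spark path below is just a placeholder):

    # Point R at a manually installed Spark distribution (placeholder path).
    Sys.setenv(SPARK_HOME = "/path/to/spark-2.0.0-bin-hadoop2.7")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library(SparkR)
    # Client-mode driver; master can be local[*] or a remote cluster URL.
    sparkR.session(master = "local[*]")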

And the ideal workflow is the following (again, a sketch follows the list):

1. install.packages("SparkR") from CRAN
2. optionally `install.spark`
3. Launch driver/SparkContext (in client mode) and connect to a local or remote cluster.
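
A rough sketch of the ideal workflow (assuming the CRAN package ships `install.spark`, as in step 2):

    # Everything comes from CRAN; no manual Spark download or path setup.
    install.packages("SparkR")   # from CRAN
    library(SparkR)
    install.spark()              # optional: fetch a Spark distribution locally
    # Client-mode driver; master can be local[*] or a remote cluster URL.
    sparkR.session(master = "local[*]")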

So the way we run spark-submit, RBackend, and the R process, and the way we create the SparkContext, doesn't really change: they all still run on the same machine (e.g., the user's laptop). It is therefore not necessary to run RBackend remotely for this scenario.
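
For example (the master URL below is only a placeholder), connecting to a remote standalone cluster from a laptop still keeps everything except the executors local:

    # Driver, RBackend, and the R process all stay on the laptop; only the
    # executors run on the remote cluster behind this placeholder master URL.
    library(SparkR)
    sparkR.session(master = "spark://cluster-master.example.com:7077")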

Running RBackend remotely would be a new Spark deployment mode, and I think it requires more design and discussion.


was (Author: mengxr):
[~shivaram] I had an offline discussion with [~junyangq], and I feel we might have a misunderstanding of the user scenarios.

The old workflow for SparkR is the following:

1. Users download and install a Spark distribution by themselves.
2. Users let R know where to find the locally installed SparkR package.
3. `library(SparkR)`
4. Launch driver/SparkContext (in client mode) and connect to a local or remote cluster.

And the ideal workflow is the following:

1. install.packages("SparkR")
2. optionally `install.spark`
3. Launch driver/SparkContext (in client mode) and connect to a local or remote cluster.

So the way we run spark-submit, RBackend, and the R process, and the way we create the SparkContext, doesn't really change: they all still run on the same machine (e.g., the user's laptop). It is therefore not necessary to run RBackend remotely for this scenario.

Running RBackend remotely would be a new Spark deployment mode, and I think it requires more design and discussion.

> Configurable hostname for RBackend
> ----------------------------------
>
>                 Key: SPARK-16578
>                 URL: https://issues.apache.org/jira/browse/SPARK-16578
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>            Assignee: Junyang Qian
>
> One of the requirements that comes up with SparkR becoming a standalone package is that users can now install just the R package on the client side and connect to a remote machine that runs the RBackend class.
> We should check whether we can support this mode of execution and what its pros and cons are.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org