You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by falaki <gi...@git.apache.org> on 2015/10/20 21:07:05 UTC

[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

GitHub user falaki opened a pull request:

    https://github.com/apache/spark/pull/9185

    [SPARKR] [SPARK-11199] Improve R context management story and add getOrCreate

    * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context. 
    * Adds a simple test 
    
    [SPARK-11199] #comment link with JIRA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/falaki/spark SPARK-11199

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9185
    
----
commit 58fff3fd032a4d1822a7dd8f09b0a012bf2e919d
Author: Hossein <ho...@databricks.com>
Date:   2015-10-20T19:04:41Z

    Using getOrCreate instead of creating new SQLContext

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-150162723
  
    On the R side, there is a cache of created SQLContext/HiveContext, so R won't call createSQLContext() second time. See https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L218 and https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L249. Also [SPARK-11042](https://github.com/apache/spark/pull/9058) does prevent user from creating multiple root SQLContexts (if multiple root SQLContexts is not allowed). So there is no need to change createSQLContext().
    
    However, I am not sure if we will support session in SparkR. If so, it would make sense to do such change as getOrCreate() will return the active SQLContext for current thread if it exists before returning the root SQLContext/HiveContext? 
    
    @davies, I am curious why you implemented createSQLContext() in Scala as a helper function for SparkR to create a SQLContext? It seems SparkR can directly use newJObject("org.apache.spark.sql.SQLContext", sc) to create a SQLContext just as the [creation of HiveContext](https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L262).
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167738526
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-151562392
  
    @davies @rxin Any updates on this ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167688400
  
    As pointed out above, R code actually does not call `createSQLContext` multiple times:
    
    https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L243
    ```
    sparkRSQL.init <- function(jsc = NULL) {
      if (exists(".sparkRSQLsc", envir = .sparkREnv)) {
        return(get(".sparkRSQLsc", envir = .sparkREnv))
      }
    ...
      sqlContext <- callJStatic("org.apache.spark.sql.api.r.SQLUtils",
                                "createSQLContext",
                                sc)
      assign(".sparkRSQLsc", sqlContext, envir = .sparkREnv)
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167668477
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48375/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-150077277
  
    HiveContext is a subclass of SQLContext. So all of SQLContext functionality continues to work. cc @marmbrus what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154531048
  
    /cc @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149670410
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-153782645
  
    Is there an example of how people use sessions ? Or rather can @davies or you describe what is the API used to support sessions in Scala / Python ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167862437
  
    Merging to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-155406712
  
    @davies, I don't understand your two cases. SparkR is actually a standalone spark application in R, the JVM backend is dedicated, won't share with other applications.
    
    SparkR is not a thrift client to an SparkSQL service,  it won't be a session of an SparkSQL thrift service,  so it makes no sense to add a flag about session into sparkR.init.
    
    SparkR itself won't try to provide multi-session support (as SparkSQL thrift server does), because R is a single-threaded environment.
    
    so my point is that a single root SQLContext is what we want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-151568758
  
    **[Test build #1957 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1957/consoleFull)** for PR 9185 at commit [`58fff3f`](https://github.com/apache/spark/commit/58fff3fd032a4d1822a7dd8f09b0a012bf2e919d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-153614611
  
    sessions are critical - actually one of the most important features for spark 1.6.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149670494
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149696821
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44006/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149696782
  
    **[Test build #44006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44006/consoleFull)** for PR 9185 at commit [`58fff3f`](https://github.com/apache/spark/commit/58fff3fd032a4d1822a7dd8f09b0a012bf2e919d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167668475
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167684925
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167689398
  
    **[Test build #48386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48386/consoleFull)** for PR 9185 at commit [`0633a73`](https://github.com/apache/spark/commit/0633a73ddbc6a328d579434f3c3ec349765d70ef).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154333467
  
    From user's point of view, multiple concurrent R sessions are expected to allow for parallel analysis when SparkR is running as service in cloud.
    
    However, R at its core is single threaded, and it does support concept of session. So there exists some intermediate layers that enable multiple R sessions, for example, [RStudio Server Pro](https://support.rstudio.com/hc/en-us/articles/211789298-Multiple-R-Sessions-in-RStudio-Server-Pro) and [Rserve](https://cran.r-project.org/web/packages/Rserve/index.html). Both of them enable multiple R sessions by spawning multiple R processes.
    
    So my point is that within SparkR, we don't need to support SQL session and a single SQLContext is enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167703809
  
    **[Test build #48386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48386/consoleFull)** for PR 9185 at commit [`0633a73`](https://github.com/apache/spark/commit/0633a73ddbc6a328d579434f3c3ec349765d70ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-150318041
  
    I vote for simplicity for SparkR and not have multiple session.
    In fact I observe it is already messy to handle DataFrame created by a different SparkContext (after stop() and init()). I would argue these concepts do not translate well to R - for which for the most part 'session' == 'process'


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-151610589
  
    **[Test build #1957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1957/consoleFull)** for PR 9185 at commit [`58fff3f`](https://github.com/apache/spark/commit/58fff3fd032a4d1822a7dd8f09b0a012bf2e919d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-152774944
  
    @rxin can you comment on how important sessions are (esp. with respect to SparkR) ? If they are not important we can significantly simplify things and just support one SQLContext (or HiveContext if thats used) for a SparkR session.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149781581
  
    @falaki SQLContext.getOrCreate could return HiveContext, it's slightly different than `new SQLContext`, is this what we want?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154524422
  
    I would propose that sparkR.init just has a flag that says if sessions should be isolated or not.  When it connects it can either call `SQLContext.getOrCreate(sc).newSession()` or just `SQLContext.getOrCreate(sc)` based on that flag.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149696819
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-151567766
  
    @falaki HiveContext has more functionality than SQLContext (Window functions, ORC files etc.), and a few semantic difference (how to parse decimal and intervals). Usually user will only want one, so it means the created one should be the one user also want in SparkR. I think this change is fine.
    
    @sun-rui That change is not necessary, I may did not figure out the better way to do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167703848
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48386/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167688403
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167664825
  
    ping @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167688404
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48384/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9185


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154529467
  
    @shivaram I did plan on changing `sparkR.init()` and `sparkRSQL.init()`
    I like the idea @marmbrus proposed - we could have `sparkR.init(new.sql.session = TRUE)` to call `SQLContext.getOrCreate(sc).newSession()` - we could have the option of keeping the same `SparkContext` or stop/create a new one from a simple API call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149672129
  
    **[Test build #44006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44006/consoleFull)** for PR 9185 at commit [`58fff3f`](https://github.com/apache/spark/commit/58fff3fd032a4d1822a7dd8f09b0a012bf2e919d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-151568092
  
    We could merge this after pass tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-149764857
  
    cc @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154490703
  
    Thanks @marmbrus - That helps. Yeah so I think from my perspective it makes sense to just have one *active* session for SparkR and thus just one active SQLContext. This means that the change @felixcheung was working on to hide `sqlContext` from users would still be required. 
    
    We should however add support for two things (a) Specifying a session in `sparkR.init` (are sessions just named as strings ?) (b) a command to switch sessions which will change the sqlContext and all the necessary data structures under the covers so that commands run after this will run with the new session. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167703846
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167665605
  
    This seems fine to me as a first step.  Eventually we will probably want to make the RBackend multi-session aware.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154212591
  
    Sessions allow multiple users to share a Spark SQL Cluster without clobbering each other.  Imagine you have multiple R sessions connected to the same Spark cluster.  If one user runs `sql("use database foo")` its going to break all the other users that are connected.
    
    One question is if a single instance of Spark R needs to be able to have more than one session (forgive me if this doesn't make sense as I'm super unfamiliar with our R arch).  I think the most important thing is that its possible to inject an isolate session in to Spark R, not that in a single instance of Spark R you can have more than one session.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-167688501
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/9185#issuecomment-154548391
  
    R is does not support threading, it's reasonable that SparkR does not support multiple sessions in the same R process in the same time, as we move to have a SQLContext a singleton in SparkR.
    
    Considering two cases: 
    1) the JVM is launched by R process,  then call `new SQLContext` or `SQLContext.getOrCreate` will be the same.
     2) The JVM is launched first, then launch R process and pass the information of R backend into it. Should SparkR share the same SQL session with other applications in JVM or not? Then @marmbrus 's suggestion sounds like a better approach, giving use the ability to choose what they want.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org