Posted to reviews@spark.apache.org by adrian555 <gi...@git.apache.org> on 2018/09/18 21:15:11 UTC

[GitHub] spark pull request #22455: [SPARK-24572][SparkR] "eager execution" for R she...

GitHub user adrian555 opened a pull request:

    https://github.com/apache/spark/pull/22455

    [SPARK-24572][SparkR] "eager execution" for R shell, IDE

    ## What changes were proposed in this pull request?
    
    Check the `spark.sql.repl.eagerEval.enabled` configuration property in the SparkDataFrame `show()` method. If the `SparkSession` has eager execution enabled, the data is returned to the R client as soon as the `SparkDataFrame` is printed. So instead of seeing this
    ```
    > df <- createDataFrame(faithful)
    > df
    SparkDataFrame[eruptions:double, waiting:double]
    ```
    you will see
    ```
    > df <- createDataFrame(faithful)
    > df
    +---------+-------+                                                             
    |eruptions|waiting|
    +---------+-------+
    |      3.6|   79.0|
    |      1.8|   54.0|
    |    3.333|   74.0|
    |    2.283|   62.0|
    |    4.533|   85.0|
    |    2.883|   55.0|
    |      4.7|   88.0|
    |      3.6|   85.0|
    |     1.95|   51.0|
    |     4.35|   85.0|
    |    1.833|   54.0|
    |    3.917|   84.0|
    |      4.2|   78.0|
    |     1.75|   47.0|
    |      4.7|   83.0|
    |    2.167|   52.0|
    |     1.75|   62.0|
    |      4.8|   84.0|
    |      1.6|   52.0|
    |     4.25|   79.0|
    +---------+-------+
    only showing top 20 rows
    ```
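    
    A minimal sketch of the change in `show()` (illustrative only; see the diff for the actual patch):
    ```
    setMethod("show", "SparkDataFrame",
              function(object) {
                if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
                  # Eager execution: print the data, as showDF() would
                  showDF(object)
                } else {
                  # Default behavior: print only the class and schema
                  cols <- lapply(dtypes(object), function(l) paste(l, collapse = ":"))
                  cat(paste0(class(object), "[", paste(cols, collapse = ", "), "]\n"))
                }
              })
    ```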
    
    ## How was this patch tested?
    Manual tests as well as unit tests (one new test case is added).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian555/spark eager_execution

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22455
    
----
commit b3e014c4e8050e8a0b3da190bb327347f9136b7e
Author: adrian555 <v2ave10p>
Date:   2018-09-18T20:48:56Z

    support eager execution

commit d0be3a8582de548862a78cfa5ffb58df933efa8d
Author: adrian555 <v2ave10p>
Date:   2018-09-18T20:51:43Z

    Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a
Author: adrian555 <v2ave10p>
Date:   2018-09-18T21:09:48Z

    add newline

----


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219576011
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -226,7 +226,8 @@ setMethod("showDF",
     
     #' show
     #'
    -#' Print class and type information of a Spark object.
    +#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96383/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219029847
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    --- End diff --
    
    should be `####` I think?


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219351707
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Can you also correct the param doc?


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 closed the pull request at:

    https://github.com/apache/spark/pull/22455


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3359/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96569/testReport)** for PR 22455 at commit [`57eb008`](https://github.com/apache/spark/commit/57eb00824b072b2a326810ff42edef2ca2626eb6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97551/testReport)** for PR 22455 at commit [`76767f9`](https://github.com/apache/spark/commit/76767f9e5288d593efedda000c7fc01799d43085).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97551/testReport)** for PR 22455 at commit [`76767f9`](https://github.com/apache/spark/commit/76767f9e5288d593efedda000c7fc01799d43085).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219347786
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    `show` also works on other Spark objects like `Column`. I think we only need to do `showDF` for `SparkDataFrame`?


---



[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3203/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030775
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +244,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Also not sure if it's done for Python; consider adding to the doc above (L229) how it behaves with eagerEval.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96383/testReport)** for PR 22455 at commit [`a083d64`](https://github.com/apache/spark/commit/a083d64ea3210c367cc4d63ad51a1c6beab129e7).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219384936
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Had thought about this. First, I consider it out of the scope of this JIRA, because I think these configs conflict with the current `showDF()` behavior.
    
    Some details: `showDF()` already takes `numRows` and `truncate` arguments. So if we are going to respect those two configs as well, we have to decide what behavior is best suited for `showDF()`. For example, whether `showDF()` should simply ignore eager execution, or pick up the `maxNumRows` and `truncate` values set through eager execution, like the following:
    
    ```
    setMethod("showDF",
              signature(x = "SparkDataFrame"),
              function(x, numRows = 20, truncate = TRUE, vertical = FALSE) {
                eagerNumRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
                numRows <- ifelse(eagerNumRows == 0, numRows, eagerNumRows)
                eagerTruncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
                truncate <- ifelse(eagerTruncate == 0, truncate, eagerTruncate)
                if (is.logical(truncate) && truncate) {
                  s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(20), vertical)
                } else {
                  truncate2 <- as.numeric(truncate)
                  s <- callJMethod(x@sdf, "showString", numToInt(numRows), numToInt(truncate2),
                                   vertical)
                }
                cat(s)
              })
    ```
    
    And if we think that `showDF()` can ignore the eager execution settings while we still want `show()` to observe them, we can certainly just grab the `maxNumRows` and `truncate` settings and pass them to the `showDF()` call.
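    
    For example (a rough sketch with a hypothetical helper; not the actual patch):
    ```
    eagerShow <- function(object) {
      # Hypothetical helper: pick up the eager-eval settings (if set) and delegate to showDF()
      argsList <- list(x = object)
      numRows <- as.integer(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
      if (numRows > 0) argsList$numRows <- numRows
      truncate <- as.integer(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
      if (truncate > 0) argsList$truncate <- truncate
      do.call(showDF, argsList)
    }
    ```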
    
    However, my second point is that I don't think these two configs matter much or are that important/necessary. Since eager execution just shows a snippet of the SparkDataFrame's data, our defaults of `numRows = 20` and `truncate = TRUE` are good enough IMO. If users want to see more or fewer rows, they should call `showDF()`.
    
    @felixcheung, your thought?


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96381/testReport)** for PR 22455 at commit [`a89bf37`](https://github.com/apache/spark/commit/a89bf37e7fbfa089a39e0fb91cdd4a1bdd409b9f).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226874309
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    --- End diff --
    
    One thing to note: since tests run alphabetically, this test will run before sparkSQL - I think we should rename this to test_sparkSQL_eager.R to ensure it runs after.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r223197875
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    Could we consider leaving the print/show option out? I'd like to get eager compute to work even in the basic sparkR / R shell.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219330082
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session without eager execution enabled
    --- End diff --
    
    Here we are testing with eager execution enabled, right? Do we need to fix the comment here?


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96387/testReport)** for PR 22455 at commit [`7b121e6`](https://github.com/apache/spark/commit/7b121e65b99e177d8870b4e098797f1f1e86ce65).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96387/testReport)** for PR 22455 at commit [`7b121e6`](https://github.com/apache/spark/commit/7b121e65b99e177d8870b4e098797f1f1e86ce65).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220030686
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,60 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    --- End diff --
    
    actually you can use `, enableHiveSupport = FALSE` for this file in all cases - we only need the opposite for SQL tests


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315429
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +244,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Done


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r227500208
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,33 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            allConf <- sparkR.conf()
    +            prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
    +            if (!is.null(prop) && identical(prop, "true")) {
    +              argsList <- list()
    +              argsList$x <- object
    +              prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
    +              if (!is.null(prop)) {
    +                numRows <- as.numeric(prop)
    +                if (numRows > 0) {
    +                  argsList$numRows <- numRows
    --- End diff --
    
    A Spark session would not start if the maxNumRows config is not an integer, so replacing `as.numeric` with `as.integer` here won't make a real difference. But I made the code change anyway to make the intent clearer in the code.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220292513
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,31 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            allConf <- sparkR.conf()
    +            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96906/testReport)** for PR 22455 at commit [`44df922`](https://github.com/apache/spark/commit/44df922fb347211c9a962bf1771e2e0005634305).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226472697
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    I have backed out the display-function option change. Also opened JIRA https://issues.apache.org/jira/browse/SPARK-25770 to track that issue.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r227500368
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session with eager execution enabled
    +  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
    +                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219402281
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    ^ that - it's two different "overloads" for two different "classes"


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030537
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster,
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    --- End diff --
    
    ditto


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3312/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315320
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    +
    +# Instead of displaying the SparkDataFrame class, displays the data returned
    +df
    +
    +##+---------+-------+                                                             
    +##|eruptions|waiting|
    +##+---------+-------+
    +##|      3.6|   79.0|
    +##|      1.8|   54.0|
    +##|    3.333|   74.0|
    +##|    2.283|   62.0|
    +##|    4.533|   85.0|
    +##|    2.883|   55.0|
    +##|      4.7|   88.0|
    +##|      3.6|   85.0|
    +##|     1.95|   51.0|
    +##|     4.35|   85.0|
    +##+---------+-------+
    +##only showing top 10 rows
    +
    +{% endhighlight %} 
    +</div>
    +
    +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable.
    --- End diff --
    
    Changed to this: 
    ```
    Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
    ```


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220033577
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,31 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            allConf <- sparkR.conf()
    +            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    --- End diff --
    
    you can also save off `foo <- allConf[["spark.sql.repl.eagerEval.enabled"]]` and reuse it, ditto for other cases


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r223197863
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    IMO pretty print should plug into something more standard in R, like
    [printr](https://yihui.name/printr/),
    [lemon](https://cran.r-project.org/web/packages/lemon/vignettes/lemon_print.html), or
    [print.data.frame](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.dataframe.html).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030211
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    +
    +# Instead of displaying the SparkDataFrame class, displays the data returned
    --- End diff --
    
    We could also start here by saying something like "similar to an R `data.frame`"...


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    > I'm not sure about the new changes. IMO we are mixing S3 and S4 convention here as per @adrian555.
    
    The implemented code still conforms to the S4 convention that `show()` is the method used to display an object.
    
    > IMO, we should have an option for show to return a data.frame, if we are running in an IDE. then we can leverage any custom print logic the IDE has
    
    `show()` ideally should return an invisible NULL to be consistent with other R objects.
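    
    For illustration (a generic S4 example, not SparkR code; the class name is made up):
    ```
    library(methods)
    
    setClass("Probe", slots = c(x = "numeric"))
    setMethod("show", "Probe", function(object) {
      # show() prints a representation and returns invisibly, per S4 convention
      cat("Probe with x =", object@x, "\n")
      invisible(NULL)
    })
    new("Probe", x = 1)  # auto-printing at the prompt dispatches to show()
    ```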
    
    CC @falaki for any further discussion on this topic.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219938610
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,48 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively.
    --- End diff --
    
    Done


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219351194
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    I think the `param` doc is not correct... the signature of this `show()` is `SparkDataFrame`, so it won't apply to `Column`. The column.R file has its own `show()` implementation.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226874142
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session with eager execution enabled
    +  sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
    +                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
    --- End diff --
    
    Add a test with `spark.sql.repl.eagerEval.maxNumRows` not set.
    Also change this test to add `spark.sql.repl.eagerEval.truncate` - e.g. something like the sketch below.
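    
    An untested sketch (the config values and expected-output string are assumptions, mirroring the existing eager-execution tests):
    ```
    test_that("eager execution is enabled with maxNumRows and truncate set", {
      sparkConfig <- list(spark.sql.repl.eagerEval.enabled = "true",
                          spark.sql.repl.eagerEval.maxNumRows = as.integer(5),
                          spark.sql.repl.eagerEval.truncate = as.integer(2))
      sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                     sparkConfig = sparkConfig)
    
      df <- createDataFrame(faithful)
      expect_is(df, "SparkDataFrame")
      expect_output(show(df), "only showing top 5 rows")
    
      sparkR.session.stop()
    })
    ```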


---



[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96645/testReport)** for PR 22455 at commit [`df582fc`](https://github.com/apache/spark/commit/df582fc8d31c4e993a6e215fa0901a4728affc0e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3499/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030474
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    --- End diff --
    
    I'm neutral; should these tests be in test_sparkSQL.R? It takes longer to run with many test files.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r227500418
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
       dropTempView("cars")
     })
     
    +
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219938566
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,25 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Retrieving all confs through `sparkR.conf()` does not provide a default value for each conf, so an extra condition check is needed. Ideally, I believe we should cache all of the session's configuration inside the SparkR environment to avoid backend calls altogether.
    
    Anyway, I made the suggested change.
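
    For reference, a minimal sketch of the two access patterns being discussed (assuming an active SparkSession):

    ```r
    # Per-key lookup accepts a default value.
    enabled <- sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]]

    # Retrieving all confs returns only the keys that were explicitly set,
    # so a NULL check is needed before using a value.
    allConf <- sparkR.conf()
    prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
    enabled <- if (is.null(prop)) "false" else prop
    ```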


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219388335
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    @adrian555 Thanks for the explanation. 
    > However, my second point is that I don't think these two configs matter much or are that important/necessary. Since the eager execution is just to show a snippet of data from the SparkDataFrame, our default numRows = 20 and truncate = TRUE are good enough IMO. If users want to see more or fewer rows, they should call showDF().
    
    So I just wanted to make sure it is possible to have parity with how it works in Python. It seems to me that in Python, we just get the two configs and call the `showString` method.
    
    > And if we think that showDF() can ignore the eager execution setting and still want show() to observe the eager execution config, we can certainly just grab the maxNumRows and truncate settings and pass them to the showDF() call.
    
    What will happen if we grab these configs in show() when eager execution is enabled and then call showDF(), passing these parameters?
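
    For example, something along these lines (a sketch only; this is essentially what later revisions of the patch end up doing):

    ```r
    # Inside show() for SparkDataFrame, once eager execution is known to be enabled:
    argsList <- list(x = object)
    numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    if (numRows > 0) {
      argsList$numRows <- numRows
    }
    truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    if (truncate > 0) {
      argsList$truncate <- truncate
    }
    do.call(showDF, argsList)
    ```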



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96515/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3501/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    LGTM. merged to master


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4448/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96383/testReport)** for PR 22455 at commit [`a083d64`](https://github.com/apache/spark/commit/a083d64ea3210c367cc4d63ad51a1c6beab129e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96647/testReport)** for PR 22455 at commit [`d156277`](https://github.com/apache/spark/commit/d1562777924fd84f14655df9c7418575aeae7fbe).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219971196
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,25 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    +              argsList <- list()
    +              argsList$x <- object
    +              numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    +              if (numRows > 0) {
    +                argsList$numRows <- numRows
    +              }
    +              truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    --- End diff --
    
    I did test it and it worked. Any specific concern? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96645/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219404319
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    I see. The generated documentation looks like https://spark.apache.org/docs/latest/api/R/index.html.
    
    So rewriting the description is good and clear.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96515/testReport)** for PR 22455 at commit [`100bac7`](https://github.com/apache/spark/commit/100bac715ee3fc6baee960dfa7c9466baa032bcb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315410
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster,
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96569/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3447/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
GitHub user adrian555 reopened a pull request:

    https://github.com/apache/spark/pull/22455

    [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

    ## What changes were proposed in this pull request?
    
    Check the `spark.sql.repl.eagerEval.enabled` configuration property in SparkDataFrame `show()` method. If the `SparkSession` has eager execution enabled, the data will be returned to the R client when the data frame is created. So instead of seeing this
    ```
    > df <- createDataFrame(faithful)
    > df
    SparkDataFrame[eruptions:double, waiting:double]
    ```
    you will see
    ```
    > df <- createDataFrame(faithful)
    > df
    +---------+-------+                                                             
    |eruptions|waiting|
    +---------+-------+
    |      3.6|   79.0|
    |      1.8|   54.0|
    |    3.333|   74.0|
    |    2.283|   62.0|
    |    4.533|   85.0|
    |    2.883|   55.0|
    |      4.7|   88.0|
    |      3.6|   85.0|
    |     1.95|   51.0|
    |     4.35|   85.0|
    |    1.833|   54.0|
    |    3.917|   84.0|
    |      4.2|   78.0|
    |     1.75|   47.0|
    |      4.7|   83.0|
    |    2.167|   52.0|
    |     1.75|   62.0|
    |      4.8|   84.0|
    |      1.6|   52.0|
    |     4.25|   79.0|
    +---------+-------+
    only showing top 20 rows
    ```
    
    ## How was this patch tested?
    Manual tests as well as unit tests (one new test case is added).


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian555/spark eager_execution

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22455
    
----
commit b3e014c4e8050e8a0b3da190bb327347f9136b7e
Author: adrian555 <v2ave10p>
Date:   2018-09-18T20:48:56Z

    support eager execution

commit d0be3a8582de548862a78cfa5ffb58df933efa8d
Author: adrian555 <v2ave10p>
Date:   2018-09-18T20:51:43Z

    Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit cd8a7041c6eecc59d22db72f4f2065ed9f06640a
Author: adrian555 <v2ave10p>
Date:   2018-09-18T21:09:48Z

    add newline

commit a89bf37e7fbfa089a39e0fb91cdd4a1bdd409b9f
Author: adrian555 <v2ave10p>
Date:   2018-09-20T20:58:02Z

    address review comment

commit a083d64ea3210c367cc4d63ad51a1c6beab129e7
Author: adrian555 <v2ave10p>
Date:   2018-09-20T22:03:30Z

    address review comment

commit 7b121e65b99e177d8870b4e098797f1f1e86ce65
Author: adrian555 <v2ave10p>
Date:   2018-09-21T00:02:31Z

    address review comment

commit 4492b278ba5a4721d6a5dc836436191ad155dfc6
Author: adrian555 <v2ave10p>
Date:   2018-09-21T17:36:33Z

    address review comment

commit 100bac715ee3fc6baee960dfa7c9466baa032bcb
Author: adrian555 <v2ave10p>
Date:   2018-09-24T18:16:41Z

    address review comment

commit 57eb00824b072b2a326810ff42edef2ca2626eb6
Author: adrian555 <v2ave10p>
Date:   2018-09-25T17:51:34Z

    address review comment

commit df582fc8d31c4e993a6e215fa0901a4728affc0e
Author: adrian555 <v2ave10p>
Date:   2018-09-26T23:06:03Z

    add sparkr.SparkDataFrame.base_show_func option

commit d1562777924fd84f14655df9c7418575aeae7fbe
Author: adrian555 <v2ave10p>
Date:   2018-09-26T23:11:41Z

    Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit 44df922fb347211c9a962bf1771e2e0005634305
Author: adrian555 <v2ave10p>
Date:   2018-10-03T20:27:05Z

    address review comment

commit 7ce8f9454d222b42dc3969894147461eaa850b46
Author: adrian555 <v2ave10p>
Date:   2018-10-18T17:13:32Z

    Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

commit 76767f9e5288d593efedda000c7fc01799d43085
Author: adrian555 <v2ave10p>
Date:   2018-10-18T18:05:47Z

    address review comment

commit b5e463f8babe75463054fe1224348bb707afc195
Author: adrian555 <v2ave10p>
Date:   2018-10-23T17:52:17Z

    address review comment

commit 5b39d737aea4b4a680e63e25f5df81993ced1c70
Author: adrian555 <v2ave10p>
Date:   2018-10-23T17:53:17Z

    Merge remote-tracking branch 'remotes/upstream/master' into eager_execution

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r222451616
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    I don't have any naming preference. One thing to note is that the `spark.sparkr.r.command` you mentioned above is a config for Spark, while the one I tried to define here is a config on the R side.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219402870
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Respecting `spark.sql.repl.eagerEval.maxNumRows` somewhat makes sense. But instead of changing `showDF`, which has other use cases beyond eagerEval, we could change where `showDF` is called by `show` and pass the max rows there.
    
    here https://github.com/apache/spark/pull/22455/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R249


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Thanks @falaki. We are taking the `print` method out, so the requirement seems clear now. Yes, for Jupyter with IRkernel, the two options `jupyter.rich_display` and `jupyter.base_display_func` are used to display the data.frame in a nicer way, as you mentioned. And since it is a kernel with a backend, the automatic print does not really call the `show` method. Instead, the data is evaluated and the resulting message is returned as a response. Hence, `df` and `show(df)` have different output formats.
    
    I think you want to have a common option in SparkR that other frameworks can use instead of creating their own. Will `sparkr.SparkDataFrame.base_show_func` work for you? If so, I will modify the `show()` method for `SparkDataFrame` as follows:
    ```r
    setMethod("show", "SparkDataFrame",
              function(object) {
                showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
                if (!is.null(showFunc)) {
                  stopifnot(typeof(showFunc) == "closure")
                  showFunc(object)
                } else {
                  allConf <- sparkR.conf()
                  prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
                  if (!is.null(prop) && identical(prop, "true")) {
                    argsList <- list()
                    argsList$x <- object
                    prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
                    if (!is.null(prop)) {
                      numRows <- as.numeric(prop)
                      if (numRows > 0) {
                        argsList$numRows <- numRows
                      }
                    }
                    prop <- allConf[["spark.sql.repl.eagerEval.truncate"]]
                    if (!is.null(prop)) {
                      truncate <- as.numeric(prop)
                      if (truncate > 0) {
                        argsList$truncate <- truncate
                      }
                    }
                    do.call(showDF, argsList)
                  } else {
                    cols <- lapply(dtypes(object), function(l) {
                      paste(l, collapse = ":")
                    })
                    s <- paste(cols, collapse = ", ")
                    cat(paste(class(object), "[", s, "]\n", sep = ""))
                  }              
                }
              })
    ```
    Basically, if that option is set to a predefined function, `show` will run that function instead. This approach also makes `df` and `show(df)` act the same.
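
    For illustration, a frontend could plug its own rendering in like this (a hypothetical example; the function body is just a placeholder):

    ```r
    # Hypothetical: render a small sample of the SparkDataFrame as a plain R data.frame.
    options(sparkr.SparkDataFrame.base_show_func = function(df) {
      print(collect(limit(df, 10L)))
    })

    df <- createDataFrame(faithful)
    df  # now displayed via the custom function
    ```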
    
    Please have a look. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r222450798
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")
     
     test_that("eager execution is not enabled", {
       # Start Spark session without eager execution enabled
    -  sparkSession <- if (windows_with_hadoop()) {
    -    sparkR.session(master = sparkRTestMaster)
    -  } else {
    -    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    -  }
    +  sparkR.session(master = sparkRTestMaster)
    --- End diff --
    
    Sure. Guess I misinterpreted your other comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96647/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Looks like the test was terminated in the middle, likely not related to any code change. Could someone please ask for a retest? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4413/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test was terminated as "Build execution time has reached the maximum allowed time for your plan (90 minutes)."
    
    Try to close and reopen the PR to see how the retest goes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219348795
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Or is the param doc not correct?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030512
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    --- End diff --
    
    use a different dataset that does not require `suppressWarnings`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315374
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    --- End diff --
    
    Since this test will terminate the Spark session and restart it, I think it is better for it to be self-contained.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96450/testReport)** for PR 22455 at commit [`4492b27`](https://github.com/apache/spark/commit/4492b278ba5a4721d6a5dc836436191ad155dfc6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97368/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030350
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    +
    +# Instead of displaying the SparkDataFrame class, displays the data returned
    +df
    +
    +##+---------+-------+                                                             
    +##|eruptions|waiting|
    +##+---------+-------+
    +##|      3.6|   79.0|
    +##|      1.8|   54.0|
    +##|    3.333|   74.0|
    +##|    2.283|   62.0|
    +##|    4.533|   85.0|
    +##|    2.883|   55.0|
    +##|      4.7|   88.0|
    +##|      3.6|   85.0|
    +##|     1.95|   51.0|
    +##|     4.35|   85.0|
    +##+---------+-------+
    +##only showing top 10 rows
    +
    +{% endhighlight %} 
    +</div>
    +
    +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable.
    --- End diff --
    
    Actually, I think the suggestion should be to set that on the `sparkR` command line as a Spark conf?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96450/testReport)** for PR 22455 at commit [`4492b27`](https://github.com/apache/spark/commit/4492b278ba5a4721d6a5dc836436191ad155dfc6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96381/testReport)** for PR 22455 at commit [`a89bf37`](https://github.com/apache/spark/commit/a89bf37e7fbfa089a39e0fb91cdd4a1bdd409b9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97368/testReport)** for PR 22455 at commit [`44df922`](https://github.com/apache/spark/commit/44df922fb347211c9a962bf1771e2e0005634305).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #4388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4388/testReport)** for PR 22455 at commit [`5b39d73`](https://github.com/apache/spark/commit/5b39d737aea4b4a680e63e25f5df81993ced1c70).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaPrefixSpanExample `
      * `trait ScalaReflection extends Logging `
      * `    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Great feature! cc @falaki 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96203/testReport)** for PR 22455 at commit [`cd8a704`](https://github.com/apache/spark/commit/cd8a7041c6eecc59d22db72f4f2065ed9f06640a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    @falaki, trying to understand what you were asking for. I know that `print` takes some arguments like `digits` and `quote` that can further format the output; `print` is for S3 objects and `show` is for S4 objects. So, are you asking for a `print.SparkDataFrame` function like the following?
    ```
    print.SparkDataFrame <- function(x, ...) {
      argsList <- list(...)
      if (!is.null(argsList$eager) && !argsList$eager) {
        cols <- lapply(dtypes(x), function(l) {
          paste(l, collapse = ":")
        })
        s <- paste(cols, collapse = ", ")
        cat(paste(class(x), "[", s, "]\n", sep = ""))
      } else {
        show(x)
      }
    }
    ```
    With this, 
    * `print(df)` will have the same behavior as `show(df)`. 
    * `print(df, eager=FALSE)` will continue to display just the SparkDataFrame class info, even if eager execution is enabled. 
    * `print(df, eager=TRUE)` will be the same as `show()` when eager execution is enabled. But if eager execution is disabled, the passed-in `eager=TRUE` is ignored. (This way, we guarantee only the session's SQLConf is in effect; see the usage sketch below.)
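
    A minimal usage sketch of the behaviors above (hypothetical, assuming eager execution is enabled in the session):

    ```r
    df <- createDataFrame(faithful)

    print(df)                  # same as show(df): displays the data
    print(df, eager = FALSE)   # always displays just the SparkDataFrame class info
    print(df, eager = TRUE)    # same as show(df); ignored if eager execution is off
    ```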
    
    Also, are you thinking of passing `maxNumRows` and `truncate` as arguments to `print.SparkDataFrame`, replacing `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate`?
    
    Please let me know if this is in line with your thinking, and I will push a commit.
    
    Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219689697
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,25 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    +              argsList <- list()
    +              argsList$x <- object
    +              numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    +              if (numRows > 0) {
    +                argsList$numRows <- numRows
    +              }
    +              truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    --- End diff --
    
    Have you tested the case where only `spark.sql.repl.eagerEval.truncate` is set and `spark.sql.repl.eagerEval.maxNumRows` is not?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    > Looks like the test was terminated in the middle, likely not related to any code change. Could someone please ask for a retest? Thanks.
    
    The AppVeyor one? You need to close and re-open this PR to trigger it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r227500321
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220034035
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,25 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    +              argsList <- list()
    +              argsList$x <- object
    +              numRows <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "0")[[1]])
    +              if (numRows > 0) {
    +                argsList$numRows <- numRows
    +              }
    +              truncate <- as.numeric(sparkR.conf("spark.sql.repl.eagerEval.truncate", "0")[[1]])
    --- End diff --
    
    Just asking, since this variation was not covered in the unit tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96381/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220292594
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,48 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled and can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If these properties are not set explicitly, by default, data up to 20 rows and up to 20 characters per column will be showed.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]",
    +               sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    +
    +# Create a grouped and sorted SparkDataFrame
    +df <- createDataFrame(faithful)
    +df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
    +
    +# Similar to R data.frame, displays the data returned, instead of SparkDataFrame class string
    +df2
    +
    +##+-------+-----+
    +##|waiting|count|
    +##+-------+-----+
    +##|   43.0|    1|
    +##|   45.0|    3|
    +##|   46.0|    5|
    +##|   47.0|    4|
    +##|   48.0|    3|
    +##|   49.0|    5|
    +##|   50.0|    5|
    +##|   51.0|    6|
    +##|   52.0|    5|
    +##|   53.0|    7|
    +##+-------+-----+
    +##only showing top 10 rows
    +
    +{% endhighlight %} 
    +</div>
    +
    +Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
    --- End diff --
    
    In the same doc "From Data Sources", it has `either be added by specifying --packages with spark-submit or sparkR commands`, that is why I used `command` instead of `shell`. I would think that `script`, `shell` and `command` are exchangeable here. But if viewed by the angle that `sparkR` ends with a R execution env, maybe `shell` makes more sense. :)
    
    So I made the change.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315277
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3661/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r221416198
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")
     
     test_that("eager execution is not enabled", {
       # Start Spark session without eager execution enabled
    -  sparkSession <- if (windows_with_hadoop()) {
    -    sparkR.session(master = sparkRTestMaster)
    -  } else {
    -    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    -  }
    +  sparkR.session(master = sparkRTestMaster)
    --- End diff --
    
    as mentioned here https://github.com/apache/spark/pull/22455#discussion_r220030686



---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96515/testReport)** for PR 22455 at commit [`100bac7`](https://github.com/apache/spark/commit/100bac715ee3fc6baee960dfa7c9466baa032bcb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96906/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315251
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    --- End diff --
    
    This is at the same level as "Applying User-Defined Function", whose parent topic is "SparkDataFrame Operations". So the heading uses three `#`.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r221416164
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -21,11 +21,7 @@ context("Show SparkDataFrame when eager execution is enabled.")
     
     test_that("eager execution is not enabled", {
       # Start Spark session without eager execution enabled
    -  sparkSession <- if (windows_with_hadoop()) {
    -    sparkR.session(master = sparkRTestMaster)
    -  } else {
    -    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    -  }
    +  sparkR.session(master = sparkRTestMaster)
    --- End diff --
    
    you should definitely set `enableHiveSupport = FALSE` - historically this hasn't worked well in other R tests when the Hive catalog is enabled
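
    A minimal sketch of the suggested setup, assuming the `sparkRTestMaster` variable already provided by the test harness:

    ```r
    # Always start the test session without the Hive catalog
    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    ```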


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226874107
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,33 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            allConf <- sparkR.conf()
    +            prop <- allConf[["spark.sql.repl.eagerEval.enabled"]]
    +            if (!is.null(prop) && identical(prop, "true")) {
    +              argsList <- list()
    +              argsList$x <- object
    +              prop <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
    +              if (!is.null(prop)) {
    +                numRows <- as.numeric(prop)
    +                if (numRows > 0) {
    +                  argsList$numRows <- numRows
    --- End diff --
    
    could you manually check if this works correctly when numRows or truncate (L265) is a numeric but not an integer - e.g. 1.4
    
    if not, we should replace `as.numeric` with `as.integer` in both places
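
    A rough illustration of the check being asked for, where the value "1.4" is hypothetical and `df` stands for an existing SparkDataFrame:

    ```r
    prop <- "1.4"                            # hypothetical non-integer conf value
    numRows <- as.integer(as.numeric(prop))  # coerces 1.4 down to 1L
    if (!is.na(numRows) && numRows > 0) {
      showDF(df, numRows = numRows)          # showDF() expects a whole number of rows
    }
    ```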


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #4388 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4388/testReport)** for PR 22455 at commit [`5b39d73`](https://github.com/apache/spark/commit/5b39d737aea4b4a680e63e25f5df81993ced1c70).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226874251
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -3731,6 +3731,7 @@ test_that("catalog APIs, listTables, listColumns, listFunctions", {
       dropTempView("cars")
     })
     
    +
    --- End diff --
    
    remove empty line


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219689694
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +246,25 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    instead of calling sparkR.conf three times, I'm wondering if it is cleaner/safer to get a copy of all the configs once, as done here:
    https://github.com/apache/spark/blob/95b177c8f0862c6965a7c3cd76b3935c975adee9/R/pkg/tests/fulltests/test_sparkSQL.R#L3538
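
    A minimal sketch of that approach, using the same property names as this PR; `sparkR.conf()` with no arguments returns all session configs as a named list:

    ```r
    allConf <- sparkR.conf()  # fetch the configs once instead of three separate calls
    eagerEnabled <- identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")
    maxNumRows <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
    truncate <- allConf[["spark.sql.repl.eagerEval.truncate"]]
    ```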



---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219689581
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session with eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster,
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    --- End diff --
    
    I'd assign the sparkConfig list to a variable before this and simply reuse it in both cases instead of duplicating it
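
    A sketch of that refactor, reusing the names from the test above:

    ```r
    # Build the config list once and pass it to both branches
    eagerConf <- list(spark.sql.repl.eagerEval.enabled = "true",
                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10))
    sparkSession <- if (windows_with_hadoop()) {
      sparkR.session(master = sparkRTestMaster, sparkConfig = eagerConf)
    } else {
      sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
                     sparkConfig = eagerConf)
    }
    ```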


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219355827
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    I might have misunderstood your previous question. The `@param` does not need to change, as it is common to `SparkDataFrame`, `Column`, etc. for the `show()` method. The same applies to the description, so I rewrote the description. Please have a look.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96450/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Let's also update the doc of `REPL_EAGER_EVAL_ENABLED` in `SQLConf`. After this patch, eager evaluation is no longer supported only in PySpark.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3966/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96569/testReport)** for PR 22455 at commit [`57eb008`](https://github.com/apache/spark/commit/57eb00824b072b2a326810ff42edef2ca2626eb6).


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r221415826
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    hmm, about this naming convention: typically we don't mix `.` and `_`, and I don't think we have anything else named with `SparkDataFrame`
    
    it seems we have used `spark.sparkr.something` before: https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L343



---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030277
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    +
    +# Instead of displaying the SparkDataFrame class, displays the data returned
    +df
    +
    +##+---------+-------+                                                             
    +##|eruptions|waiting|
    +##+---------+-------+
    +##|      3.6|   79.0|
    +##|      1.8|   54.0|
    +##|    3.333|   74.0|
    +##|    2.283|   62.0|
    +##|    4.533|   85.0|
    +##|    2.883|   55.0|
    +##|      4.7|   88.0|
    +##|      3.6|   85.0|
    +##|     1.95|   51.0|
    +##|     4.35|   85.0|
    +##+---------+-------+
    +##only showing top 10 rows
    +
    +{% endhighlight %} 
    +</div>
    +
    +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable.
    --- End diff --
    
    change to `Note that the `SparkSession` created by `sparkR` shell by default does not `


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219356224
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    @adrian555 Please correct me if I'm wrong on this one. Should we also respect `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate`?
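
    If so, a minimal sketch (assuming the 20-row/20-character defaults documented for these properties) would read them and pass them through to `showDF()`:

    ```r
    maxNumRows <- as.integer(sparkR.conf("spark.sql.repl.eagerEval.maxNumRows", "20")[[1]])
    truncate <- as.integer(sparkR.conf("spark.sql.repl.eagerEval.truncate", "20")[[1]])
    # object is the SparkDataFrame being shown, as in the show() method above
    showDF(object, numRows = maxNumRows, truncate = truncate)
    ```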


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219410274
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -226,7 +226,8 @@ setMethod("showDF",
     
     #' show
     #'
    -#' Print class and type information of a Spark object.
    +#' If eager evaluation is enabled and the Spark object is a SparkDataFrame, return the data of
    --- End diff --
    
    return the data of the SparkDataFrame object -> evaluate the SparkDataFrame and print top rows of the SparkDataFrame


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96645/testReport)** for PR 22455 at commit [`df582fc`](https://github.com/apache/spark/commit/df582fc8d31c4e993a6e215fa0901a4728affc0e).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r227500277
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    --- End diff --
    
    Done.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4098/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97984/testReport)** for PR 22455 at commit [`2d2c3c6`](https://github.com/apache/spark/commit/2d2c3c638062633193642931e8c81e80151dd09d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97551/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220292560
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1491,9 +1491,10 @@ object SQLConf {
       val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled")
         .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " +
           "displayed if and only if the REPL supports the eager evaluation. Currently, the " +
    -      "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " +
    -      "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " +
    -      "the returned outputs are formatted like dataframe.show().")
    +      "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " +
    +      "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " +
    +      "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " +
    +      "outputs are showed as R data.frame.")
    --- End diff --
    
    Done


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22455


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97984/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219331649
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session without eager execution enabled
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97984/testReport)** for PR 22455 at commit [`2d2c3c6`](https://github.com/apache/spark/commit/2d2c3c638062633193642931e8c81e80151dd09d).


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3316/
    Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96906/testReport)** for PR 22455 at commit [`44df922`](https://github.com/apache/spark/commit/44df922fb347211c9a962bf1771e2e0005634305).


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219030085
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    --- End diff --
    
    perhaps a more complete example - like `summarize(groupBy(df, df$waiting), count = n(df$waiting))`
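
    For reference, a sketch of what such an example could look like (close to what the doc ends up showing later in this review):

    ```r
    df <- createDataFrame(faithful)
    # Group, aggregate and sort so the displayed result is more interesting
    df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
    df2  # with eager execution enabled, this prints the top rows directly
    ```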


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220033775
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,48 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. By default, eager execution is not enabled and can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively. These properties are only effective when eager execution is enabled. If these properties are not set explicitly, by default, data up to 20 rows and up to 20 characters per column will be showed.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]",
    +               sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                  spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    +
    +# Create a grouped and sorted SparkDataFrame
    +df <- createDataFrame(faithful)
    +df2 <- arrange(summarize(groupBy(df, df$waiting), count = n(df$waiting)), "waiting")
    +
    +# Similar to R data.frame, displays the data returned, instead of SparkDataFrame class string
    +df2
    +
    +##+-------+-----+
    +##|waiting|count|
    +##+-------+-----+
    +##|   43.0|    1|
    +##|   45.0|    3|
    +##|   46.0|    5|
    +##|   47.0|    4|
    +##|   48.0|    3|
    +##|   49.0|    5|
    +##|   50.0|    5|
    +##|   51.0|    6|
    +##|   52.0|    5|
    +##|   53.0|    7|
    +##+-------+-----+
    +##only showing top 10 rows
    +
    +{% endhighlight %} 
    +</div>
    +
    +Note that to enable eager execution through `sparkR` command, add `spark.sql.repl.eagerEval.enabled=true` configuration property to the `--conf` option.
    --- End diff --
    
    `sparkR command` - I think it should be `sparkR shell`?


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220292390
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,60 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96203/testReport)** for PR 22455 at commit [`cd8a704`](https://github.com/apache/spark/commit/cd8a7041c6eecc59d22db72f4f2065ed9f06640a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    @adrian555 yes, that looks good. Thank you!


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315297
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,42 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +<div data-lang="r" markdown="1">
    +{% highlight r %}
    +
    +# Start up spark session with eager execution enabled
    +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))
    +
    +df <- createDataFrame(faithful)
    +
    +# Instead of displaying the SparkDataFrame class, displays the data returned
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3313/
    Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219315391
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,58 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- suppressWarnings(createDataFrame(iris))
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Another flavor would be this
    ```r
    print.SparkDataFrame <- function(x, ...) {
      if ((length(list(...)) > 0) && identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
        # TODO: limit the maxNumRows and truncate
        df <- collect(x)
        print(df, ...)
      } else {
        show(x)
      }
    }
    ```
    this allows the other `print` arguments to be passed through. For example, when eager execution is enabled, `print(createDataFrame(iris), quote = TRUE)` will produce output like this
    ```
        Sepal_Length Sepal_Width Petal_Length Petal_Width      Species              
    1          "5.1"       "3.5"        "1.4"       "0.2"     "setosa"
    2          "4.9"       "3.0"        "1.4"       "0.2"     "setosa"
    3          "4.7"       "3.2"        "1.3"       "0.2"     "setosa"
    4          "4.6"       "3.1"        "1.5"       "0.2"     "setosa"
    ```


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r226874469
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    --- End diff --
    
    change to `test show SparkDataFrame when eager execution is enabled.`


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97368/testReport)** for PR 22455 at commit [`44df922`](https://github.com/apache/spark/commit/44df922fb347211c9a962bf1771e2e0005634305).


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #96647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96647/testReport)** for PR 22455 at commit [`d156277`](https://github.com/apache/spark/commit/d1562777924fd84f14655df9c7418575aeae7fbe).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219897812
  
    --- Diff: docs/sparkr.md ---
    @@ -450,6 +450,48 @@ print(model.summaries)
     {% endhighlight %}
     </div>
     
    +### Eager execution
    +
    +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up.
    +
    +Maximum number of rows and maximum number of characters per column of data to display can be controlled by `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` configuration properties, respectively.
    --- End diff --
    
    Let's note the default values.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219575988
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -244,11 +245,15 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            cols <- lapply(dtypes(object), function(l) {
    -              paste(l, collapse = ":")
    -            })
    -            s <- paste(cols, collapse = ", ")
    -            cat(paste(class(object), "[", s, "]\n", sep = ""))
    +            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
    --- End diff --
    
    Done


---



[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r220033684
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1491,9 +1491,10 @@ object SQLConf {
       val REPL_EAGER_EVAL_ENABLED = buildConf("spark.sql.repl.eagerEval.enabled")
         .doc("Enables eager evaluation or not. When true, the top K rows of Dataset will be " +
           "displayed if and only if the REPL supports the eager evaluation. Currently, the " +
    -      "eager evaluation is only supported in PySpark. For the notebooks like Jupyter, " +
    -      "the HTML table (generated by _repr_html_) will be returned. For plain Python REPL, " +
    -      "the returned outputs are formatted like dataframe.show().")
    +      "eager evaluation is supported in PySpark and SparkR. In PySpark, for the notebooks like " +
    +      "Jupyter, the HTML table (generated by _repr_html_) will be returned. For plain Python " +
    +      "REPL, the returned outputs are formatted like dataframe.show(). In SparkR, the returned " +
    +      "outputs are showed as R data.frame.")
    --- End diff --
    
    ` are showed as R data.frame` - it's not actually the exact format, so how about we say ` are showed similar to R data.frame would`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    @adrian555 These are all great points. My high-level goal was enabling other platforms (e.g., Jupyter) to plug in more advanced (custom) functions for displaying a SparkDataFrame. If the framework does not set anything, SparkR would use `show()` by default. But if the notebook environment has a nicer way of displaying data, it can simply set a config to use that by default.
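
    As a rough illustration of this plug-in idea (based on the `sparkr.SparkDataFrame.base_show_func` option that appears in the current diff; the hook is still under discussion and may not be part of the final change), a notebook front end could do something like:

    ```
    # Hypothetical sketch: register a custom renderer for SparkDataFrame objects.
    # Assumes library(SparkR) is attached and a session is already running.
    # If no option is set, SparkR would fall back to the default show()/showDF().
    richShow <- function(df) {
      # a notebook could render an HTML table here; this just prints locally
      print(head(collect(df), 20L))
    }

    options(sparkr.SparkDataFrame.base_show_func = richShow)

    df <- createDataFrame(faithful)
    df   # would dispatch to richShow() instead of the default console output
    ```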


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by falaki <gi...@git.apache.org>.
Github user falaki commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    @adrian555 thanks for submitting this. Can we have a config to set the default `print` function in eager mode? It can default to `show`, but I can imagine it is useful to make it configurable. For example, in Jupyter there are richer ways of displaying an R data.frame.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96387/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SparkR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96203/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    @adrian555 The changes look fine to me. Thank you.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97936/testReport)** for PR 22455 at commit [`5b39d73`](https://github.com/apache/spark/commit/5b39d737aea4b4a680e63e25f5df81993ced1c70).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaPrefixSpanExample `
      * `trait ScalaReflection extends Logging `
      * `    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r219937585
  
    --- Diff: R/pkg/tests/fulltests/test_eager_execution.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("Show SparkDataFrame when eager execution is enabled.")
    +
    +test_that("eager execution is not enabled", {
    +  # Start Spark session without eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster)
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
    +  }
    +  
    +  df <- createDataFrame(faithful)
    +  expect_is(df, "SparkDataFrame")
    +  expected <- "eruptions:double, waiting:double"
    +  expect_output(show(df), expected)
    +  
    +  # Stop Spark session
    +  sparkR.session.stop()
    +})
    +
    +test_that("eager execution is enabled", {
    +  # Start Spark session with eager execution enabled
    +  sparkSession <- if (windows_with_hadoop()) {
    +    sparkR.session(master = sparkRTestMaster,
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    +  } else {
    +    sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
    +                   sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true",
    +                                      spark.sql.repl.eagerEval.maxNumRows = as.integer(10)))
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...

Posted by adrian555 <gi...@git.apache.org>.
Github user adrian555 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22455#discussion_r223257553
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -246,30 +248,38 @@ setMethod("showDF",
     #' @note show(SparkDataFrame) since 1.4.0
     setMethod("show", "SparkDataFrame",
               function(object) {
    -            allConf <- sparkR.conf()
    -            if (!is.null(allConf[["spark.sql.repl.eagerEval.enabled"]]) &&
    -                identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
    -              argsList <- list()
    -              argsList$x <- object
    -              if (!is.null(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])) {
    -                numRows <- as.numeric(allConf[["spark.sql.repl.eagerEval.maxNumRows"]])
    -                if (numRows > 0) {
    -                  argsList$numRows <- numRows
    +            showFunc <- getOption("sparkr.SparkDataFrame.base_show_func")
    --- End diff --
    
    It seems like a good idea to separate out the print/show issue, since it is not 100% related to this JIRA. It may help to open a new JIRA to track the requirement and the discussion for that issue. Please confirm so that I can revert the related changes. Thanks.
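
    For readers following along, here is a rough reconstruction (not the exact PR code) of the dispatch inside `show()` that this diff is iterating on:

    ```
    # Rough sketch of the eager-evaluation dispatch in show(SparkDataFrame).
    # Assumes library(SparkR) is attached and a session is already running.
    eagerShow <- function(object) {
      allConf <- sparkR.conf()
      if (identical(allConf[["spark.sql.repl.eagerEval.enabled"]], "true")) {
        args <- list(x = object)
        numRows <- allConf[["spark.sql.repl.eagerEval.maxNumRows"]]
        if (!is.null(numRows) && as.numeric(numRows) > 0) {
          args$numRows <- as.numeric(numRows)
        }
        truncate <- allConf[["spark.sql.repl.eagerEval.truncate"]]
        if (!is.null(truncate) && as.numeric(truncate) > 0) {
          args$truncate <- as.numeric(truncate)
        }
        do.call(showDF, args)
      } else {
        # original behaviour: print only the class and schema
        cols <- lapply(dtypes(object), function(l) paste(l, collapse = ":"))
        cat(paste(class(object), "[", paste(cols, collapse = ", "), "]\n", sep = ""))
      }
    }
    ```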


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97936/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22455
  
    **[Test build #97936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97936/testReport)** for PR 22455 at commit [`5b39d73`](https://github.com/apache/spark/commit/5b39d737aea4b4a680e63e25f5df81993ced1c70).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org