Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/05/04 00:35:41 UTC

[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17851

    [SPARK-20585][SPARKR] R generic hint support

    ## What changes were proposed in this pull request?
    
    Adds support for generic hints on `SparkDataFrame`
    
    ## How was this patch tested?
    
    Unit tests, `check-cran.sh`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20585

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17851
    
----
commit e21d51e2a7a3794b2807c18239e1ce889dc41dcf
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-03T09:30:22Z

    Initial implementation

commit 261e5a636198a1dcc87770bff1cc75bc745b4043
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-03T09:48:51Z

    Add since note

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76436/


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698548
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -572,6 +572,10 @@ setGeneric("first", function(x, ...) { standardGeneric("first") })
     #' @export
     setGeneric("group_by", function(x, ...) { standardGeneric("group_by") })
     
    +#' @rdname hint
    +#' @export
    +setGeneric("hint", function(x, name, ...) { standardGeneric("hint") })
    +
    --- End diff --
    
    this should move after `groupBy`, I think


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698295
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    Or do you want to show that these are two different datasets? If so, it may be worthwhile to add a comment saying that.


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698515
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -2147,6 +2147,18 @@ test_that("join(), crossJoin() and merge() on a DataFrame", {
     
       unlink(jsonPath2)
       unlink(jsonPath3)
    +
    +  # Join with broadcast hint
    +  df1 <- sql("SELECT * FROM range(10e10)")
    +  df2 <- sql("SELECT * FROM range(10e10)")
    +
    +  execution_plan <- capture.output(explain(join(df1, df2, df1$id == df2$id)))
    +  expect_false(any(grepl("BroadcastHashJoin", execution_plan)))
    +
    +  execution_plan_hint <- capture.output(
    +    explain(join(df1, hint(df2, "broadcast"), df1$id == df2$id))
    +  )
    +  expect_true(any(grepl("BroadcastHashJoin", execution_plan_hint)))
    --- End diff --
    
    awesome!


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698072
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    --- End diff --
    
    In this case the `...` is actually meaningful, so I'd suggest documenting it, e.g. similar to Scala, as "(optional) properties".


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    It looks like AppVeyor has been stuck for about 22 hours.


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114709260
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    Also, with aliases it will be quite dense:
    
    ```r
    #' @examples
    #' \dontrun{
    #' # Set aliases to avoid ambiguity
    #' df <- alias(createDataFrame(mtcars), "cars")
    #' avg_mpg <- alias(mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg"), "avg_mpg")
    #'
    #' head(join(
    #'   df, hint(avg_mpg, "broadcast"), 
    #'   column("cars.cyl") == column("avg_mpg.cyl")
    #' ))
    #' }
    ```


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76442/testReport)** for PR 17851 at commit [`1183441`](https://github.com/apache/spark/commit/1183441a8dcebb8938081a0a7b2203ae2809b30b).


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    @felixcheung Do you think this makes `o.a.s.sql.functions.broadcast` obsolete? I have a WIP on this, but it is a tricky one. There is an internal, non-generic `broadcast` with a different signature, so we'd have to either adjust it or use a different name (`broadcast_df`, `broadcast_table`?).


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76436/testReport)** for PR 17851 at commit [`ee52b53`](https://github.com/apache/spark/commit/ee52b53d1668ab1b48cf0ce659fbb9e32ebd2a3c).


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114708146
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    --- End diff --
    
    hmm, yes ;)


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17851


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    @felixcheung, was this merged only into master and not into branch-2.2?



[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114708664
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -572,6 +572,10 @@ setGeneric("first", function(x, ...) { standardGeneric("first") })
     #' @export
     setGeneric("group_by", function(x, ...) { standardGeneric("group_by") })
     
    +#' @rdname hint
    +#' @export
    +setGeneric("hint", function(x, name, ...) { standardGeneric("hint") })
    +
    --- End diff --
    
    Done.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76436/testReport)** for PR 17851 at commit [`ee52b53`](https://github.com/apache/spark/commit/ee52b53d1668ab1b48cf0ce659fbb9e32ebd2a3c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76449/testReport)** for PR 17851 at commit [`e6c6d82`](https://github.com/apache/spark/commit/e6c6d82d0494da1645d1f1e3c113c07fadf004cd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76435/testReport)** for PR 17851 at commit [`261e5a6`](https://github.com/apache/spark/commit/261e5a636198a1dcc87770bff1cc75bc745b4043).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Thanks.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76442/


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76435/


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114702184
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    I wanted to use different datasets to avoid aliasing and the _trivially equal_ warning. It works, but it is confusing (note: equi-join syntax like the one in Scala or Python would be great, and it shouldn't be that hard to add). Once we merge `alias` this shouldn't be an issue. Since we don't run the example, I can add aliases now.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76435/testReport)** for PR 17851 at commit [`261e5a6`](https://github.com/apache/spark/commit/261e5a636198a1dcc87770bff1cc75bc745b4043).


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    do you mean [this](https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/R/pkg/R/broadcast.R#L43)?


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114724979
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    Right, I think the example makes sense now, but it might not be very obvious. For example, the difference between
    ```
    createDataFrame(mtcars)
    createDataFrame(mtcars)
    ```
    and
    ```
    df <- createDataFrame(mtcars)
    df
    ```
    is not obvious unless you know what Spark is doing differently here. This is why I suggested pointing out the need to have distinct "copies" of the data.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76442/testReport)** for PR 17851 at commit [`1183441`](https://github.com/apache/spark/commit/1183441a8dcebb8938081a0a7b2203ae2809b30b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698385
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    --- End diff --
    
    The R programming model is a bit different: I think it is better to point out that the original SparkDataFrame is not actually changed, so saying `.... hint and return a new SparkDataFrame` instead would be better.


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698264
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    You recreated `createDataFrame(mtcars)` here; did you mean to use `df` from the previous line?


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Merged to master/2.2.


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114708724
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    `alias` is going into 2.3 and this is going into 2.2, so I think we leave this for now and improve it in master later.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76449/


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114714285
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    --- End diff --
    
    Ouch, sorry, I didn't mean for you to literally put ".... hint and return a new SparkDataFrame",
    but rather "Specifies execution plan hint and return a new SparkDataFrame".


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    Jenkins, retest this please


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    This is in branch-2.2:

    https://github.com/apache/spark/commit/3f5c548128c17d058b5ab2142938f6d03b38e0b1

    It missed RC2 by a few hours, though.
    



[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114704425
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    --- End diff --
    
    Scala is even [more cryptic](https://github.com/rxin/spark/blob/b84badc29c5ca315791e4d023bdebb046e0b3b4f/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1163-L1171) here. I adjusted it a bit, but I think we can revisit this once we have some practical examples.


[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17851
  
    **[Test build #76449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76449/testReport)** for PR 17851 at commit [`e6c6d82`](https://github.com/apache/spark/commit/e6c6d82d0494da1645d1f1e3c113c07fadf004cd).


[GitHub] spark pull request #17851: [SPARK-20585][SPARKR] R generic hint support

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114698221
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    --- End diff --
    
    nit: remove empty line

