Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/05/18 06:36:28 UTC

[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

GitHub user actuaryzhang opened a pull request:

    https://github.com/apache/spark/pull/18025

    [WIP][SparkR] Update doc and examples for sql functions

    ## What changes were proposed in this pull request?
    Create better examples for sql functions. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRDoc4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18025
    
----
commit 5c8cd1e5da896d78ea3cb4fcf5e046d22090dc2a
Author: Wayne Zhang <ac...@uber.com>
Date:   2017-05-18T06:32:42Z

    sql function examples prototype

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77378/testReport)** for PR 18025 at commit [`21e54c0`](https://github.com/apache/spark/commit/21e54c0fd035892066bc4964b8eca5f51331c29e).



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117548839
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) { standardGeneric("agg") })
     #' @param object x a SparkDataFrame or a Column
     #' @param data new name to use
     #' @return a SparkDataFrame or a Column
    +#' @noRd
     NULL
     
     #' @rdname arrange
     #' @export
    +#' @noRd
     setGeneric("arrange", function(x, col, ...) { standardGeneric("arrange") })
     
     #' @rdname as.data.frame
     #' @export
    +#' @noRd
     setGeneric("as.data.frame",
                function(x, row.names = NULL, optional = FALSE, ...) {
                  standardGeneric("as.data.frame")
                })
     
     #' @rdname attach
     #' @export
    +#' @noRd
     setGeneric("attach")
     
     #' @rdname cache
     #' @export
    +#' @noRd
     setGeneric("cache", function(x) { standardGeneric("cache") })
     
     #' @rdname checkpoint
     #' @export
    +#' @noRd
     setGeneric("checkpoint", function(x, eager = TRUE) { standardGeneric("checkpoint") })
     
     #' @rdname coalesce
     #' @param x a Column or a SparkDataFrame.
    --- End diff --
    
    see here https://github.com/apache/spark/pull/18025/files#diff-04c14efaae2b7b0f0a45038482f2590cR135
    How do we decide this goes in Column.R and not DataFrame.R? It's very easy for someone else to later add more comments in DataFrame.R, thinking there is no documentation, and then the help content ends up duplicated (this has happened a few times before).




[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117602305
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) { standardGeneric("agg") })
     #' @param object x a SparkDataFrame or a Column
     #' @param data new name to use
     #' @return a SparkDataFrame or a Column
    +#' @noRd
     NULL
     
     #' @rdname arrange
     #' @export
    +#' @noRd
     setGeneric("arrange", function(x, col, ...) { standardGeneric("arrange") })
     
     #' @rdname as.data.frame
     #' @export
    +#' @noRd
     setGeneric("as.data.frame",
                function(x, row.names = NULL, optional = FALSE, ...) {
                  standardGeneric("as.data.frame")
                })
     
     #' @rdname attach
     #' @export
    +#' @noRd
     setGeneric("attach")
     
     #' @rdname cache
     #' @export
    +#' @noRd
     setGeneric("cache", function(x) { standardGeneric("cache") })
     
     #' @rdname checkpoint
     #' @export
    +#' @noRd
     setGeneric("checkpoint", function(x, eager = TRUE) { standardGeneric("checkpoint") })
     
     #' @rdname coalesce
     #' @param x a Column or a SparkDataFrame.
    --- End diff --
    
    Wait, I think some wires are crossed. Let me clarify, taking `coalesce` as an example.
    
    There are 2 `coalesce` methods: `coalesce(df)`, which works like repartition, and `coalesce(df$foo)`, which works on a column as in SQL. So these are in fact 2 different `coalesce` methods, and `x` is either a SparkDataFrame or a Column.
    
    To elaborate, the history behind the approach we have today is that we used to have this
    ```
    @param x a Column
    ...
    function(x = "Column"...)
    ```
    and at the same time, in a different .R file
    ```
    @param x a SparkDataFrame
    ...
    function(x = "SparkDataFrame"...)
    ```
    These seem fine when we write the code, and it seems logical/easy to maintain, but when the Rd/doc page is generated it has
    ```
    x a SparkDataFrame
    x a Column
    ```
    Here `x` is explained twice. Worse, the order is largely arbitrary (it follows the alphabetical order of the .R files).
    
    It also goes against the standard R pattern of one description line per parameter, with the choice of types separated by `.. or ..`, like https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html, not to mention that I think CRAN check complains about a parameter documented more than once.
    
    So in short, having one `@param x a SparkDataFrame or a Column` is intentional. Since this describes 2 things, from discussions back then it felt more natural to put it somewhere independent. In fact, as you are touching on, I'd argue it's better to look it up in generics.R than to try to figure out which other existing overload classes that method already has.
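    
    A minimal sketch of that layout, assuming roxygen2-style comments. The classes `DemoFrame` and `DemoColumn` and the generic `demo_coalesce` are hypothetical stand-ins, not SparkR code. The single `@param x` line lives on the generic, and each method only points back to the shared page with `@rdname`:
    
    ```r
    library(methods)

    # Hypothetical stand-in classes; the real SparkDataFrame and Column
    # classes are defined in SparkR and are not used here.
    setClass("DemoFrame", representation(n = "numeric"))
    setClass("DemoColumn", representation(name = "character"))

    #' Coalesce (illustrative only)
    #'
    #' @param x a DemoFrame or a DemoColumn.
    #' @param ... additional arguments passed to methods.
    #' @rdname demo_coalesce
    #' @export
    setGeneric("demo_coalesce", function(x, ...) { standardGeneric("demo_coalesce") })

    # The methods carry no @param of their own, so `x` is described once.
    #' @rdname demo_coalesce
    #' @aliases demo_coalesce,DemoFrame-method
    setMethod("demo_coalesce", signature(x = "DemoFrame"),
              function(x, ...) { "frame method" })

    #' @rdname demo_coalesce
    #' @aliases demo_coalesce,DemoColumn-method
    setMethod("demo_coalesce", signature(x = "DemoColumn"),
              function(x, ...) { "column method" })
    ```
    
    With this arrangement the generated page should list `x` once, regardless of which .R file each method is defined in.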



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r118639090
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1403,20 +1416,25 @@ setGeneric("unix_timestamp", function(x, format) { standardGeneric("unix_timesta
     #' @export
     setGeneric("upper", function(x) { standardGeneric("upper") })
     
    -#' @rdname var
    +#' @rdname column_aggregate_functions
    +#' @param y,na.rm,use currently not used.
    --- End diff --
    
    Should this be moved, as was done for this line? https://github.com/apache/spark/pull/18025/files#diff-8e3d61ff66c9ffcd6ffb7a8eedc08409L923



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    I guess this comes from not seeing a lot of S4 methods? In fact, an overwhelming percentage of all methods in packages are S3, and I think users are used to searching with `?avg` and also expect to see different classes documented on the same page.
    
    Granted, I am aware some (many?) of these don't make sense (like coalesce(DF) and coalesce(col), which don't have much in common), and that is a problem.
    
    Would you propose they go to different pages then? Would that also solve https://issues.apache.org/jira/browse/SPARK-18825 ?




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    The AppVeyor failure is unfortunate, but it passed before a doc-only change.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77631/testReport)** for PR 18025 at commit [`0044b29`](https://github.com/apache/spark/commit/0044b29853c949b0baac7c70ed35658ed6005943).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122617405
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    -#'
     #' @param colName1 the name of the first column
     #' @param colName2 the name of the second column
    -#' @return The covariance of the two columns.
     #'
     #' @rdname cov
    -#' @name cov
     #' @aliases cov,SparkDataFrame-method
     #' @family stat functions
     #' @export
     #' @examples
    -#'\dontrun{
    -#' df <- read.json("/path/to/file.json")
    -#' cov <- cov(df, "title", "gender")
    -#' }
    +#'
    +#' \dontrun{
    --- End diff --
    
    No. The newline should go between `@examples` and `\dontrun`, to separate multiple `\dontrun` blocks.
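    
    A small sketch of the pattern, assuming roxygen2 merges the `@examples` of every block that shares an `@rdname` into one Examples section. The page name `demo_stats` and the functions `demo_mean` and `demo_var` are hypothetical, used only for illustration:
    
    ```r
    #' Demo statistics (hypothetical grouped page)
    #'
    #' @param x a numeric vector.
    #' @name demo_stats
    #' @rdname demo_stats
    NULL

    # The empty "#'" line right after @examples is the newline in question:
    # it keeps this block's \dontrun apart from the one contributed below.
    #' @rdname demo_stats
    #' @examples
    #'
    #' \dontrun{
    #' demo_mean(c(1, 2, 3))
    #' }
    demo_mean <- function(x) mean(x)

    #' @rdname demo_stats
    #' @examples
    #'
    #' \dontrun{
    #' demo_var(c(1, 2, 3))
    #' }
    demo_var <- function(x) stats::var(x)
    ```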
    ![image](https://user-images.githubusercontent.com/11082368/27269043-73785762-5468-11e7-9a31-5cca104e005b.png)




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122608217
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    -#'
     #' @param colName1 the name of the first column
     #' @param colName2 the name of the second column
    -#' @return The covariance of the two columns.
     #'
     #' @rdname cov
    -#' @name cov
     #' @aliases cov,SparkDataFrame-method
     #' @family stat functions
     #' @export
     #' @examples
    -#'\dontrun{
    -#' df <- read.json("/path/to/file.json")
    -#' cov <- cov(df, "title", "gender")
    -#' }
    +#'
    +#' \dontrun{
    --- End diff --
    
    Shouldn't the newline be after the `\dontrun`?



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122631024
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    -#'
     #' @param colName1 the name of the first column
     #' @param colName2 the name of the second column
    -#' @return The covariance of the two columns.
    --- End diff --
    
    Possibly, but better clarity wouldn't hurt, right?



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77105/
    Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    re: title, would explicitly adding `@title` help?
    re: multiple classes - agreed, a link or `@seealso` should be good. Wouldn't `?coalesce` show the overloads, though?



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78264/testReport)** for PR 18025 at commit [`4cf5ab9`](https://github.com/apache/spark/commit/4cf5ab98771f19924e483ac716bd8a0618ba3f2e).



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77105/testReport)** for PR 18025 at commit [`ead9781`](https://github.com/apache/spark/commit/ead9781d38fe5b74b9e5ad4a658b1a0e632c9740).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77414/
    Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    very cool, thanks, I guess there's only this last comment https://github.com/apache/spark/pull/18025#discussion_r122631306



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122133379
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -2254,18 +2198,12 @@ setMethod("approxCountDistinct",
                 column(jc)
               })
     
    -#' Count Distinct Values
    +#' @section Details:
    +#' \code{countDistinct}: Returns the number of distinct items in a group.
     #'
    -#' @param x Column to compute on
    -#' @param ... other columns
    --- End diff --
    
    This is an example where perhaps we are losing some detail by grouping these functions: instead of a line that says "other columns" we now have "other arguments", which is perhaps less clear?
    
    Thoughts?



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung Thanks for your feedback. 
    - This does not affect discoverability: the name of the method is still on the index list 
    - No problem with help either, e.g., one can use `?avg`. 
    
    ![image](https://cloud.githubusercontent.com/assets/11082368/26232656/945b3afe-3c0c-11e7-8c17-fa8df5e4ee2e.png)
    
    Another benefit is that we can get rid of most warnings on no examples since we now document all the tiny functions together. 
    
    I think it is important and the change is straightforward. However, this is a pretty manual (and big) change. I would like to get a `Yes` from you for doing this. Thanks.  



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122133897
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -919,10 +920,9 @@ setGeneric("array_contains", function(x, value) { standardGeneric("array_contain
     #' @export
     setGeneric("ascii", function(x) { standardGeneric("ascii") })
     
    -#' @param x Column to compute on or a GroupedData object.
    --- End diff --
    
    I think we need to add this back to GroupedData, since that is now in a different rd?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    I think we need to give it a title explicitly - see the header/first line of https://cloud.githubusercontent.com/assets/11082368/26429381/64dd117e-409b-11e7-9661-659b5fbe8206.png



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117602096
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -385,6 +385,7 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
     #' @return A SparkDataFrame.
     #' @rdname summarize
     #' @export
    +#' @noRd
    --- End diff --
    
    hmm, I think some of the problems are orthogonal - for example, to avoid an `avg.rd` file we could just change `avg` in generics.R to have `@rdname column_aggregate_functions`?
    
    I feel the standard is to have documentation on the generic (and have it on the same rd) like https://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html




[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77313/testReport)** for PR 18025 at commit [`10a24b3`](https://github.com/apache/spark/commit/10a24b3c4ece2fcf4711e1c6c5c4518b097cf93f).



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025

@felixcheung Thanks for asking this. I should have been clearer.

`@name` is the "name" of the Rd object represented, which is unique. An Rd object can have multiple aliases pointing to the same Rd. Usually, when one specifies `@name xxx`, an alias `xxx` is created automatically to reference `xxx`.

In the `avg` case, I actually don't need `@name avg`, since the doc of `avg` goes to `column_aggregate_functions`. In the new commit, I used `@aliases avg avg,Column-method`, which creates two aliases: `avg` and `avg,Column-method`.

This now allows the following shortcuts for help search, all of which lead to the Rd object `column_aggregate_functions` (the `@name` object):

- `?avg`
- `?"avg,Column-method"`
- `help("avg")`
- `?column_aggregate_functions`
- Indeed, since this is an S4 method, we can also do `method?avg("Column")`.

Details of R documentation here:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Writing-R-documentation-files
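
A minimal sketch of the layout described above, assuming roxygen2 is used to generate the Rd files. The `Column` class here is a toy stand-in (hypothetical, not SparkR's real class), and the description text is illustrative. The shared page gets its `@name` from a documented `NULL`, and the method joins it through `@rdname` while contributing the two aliases:

```r
library(methods)

# Toy stand-in for SparkR's Column class (hypothetical; the real class wraps a JVM reference).
setClass("Column", representation(expr = "character"))

#' Aggregate functions for Column operations
#'
#' @param x Column to compute on.
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
NULL

setGeneric("avg", function(x) { standardGeneric("avg") })

#' @details
#' \code{avg}: Returns the average of the values in a group.
#'
#' @rdname column_aggregate_functions
#' @aliases avg avg,Column-method
#' @export
setMethod("avg", signature(x = "Column"),
          function(x) {
            # Build a new toy Column that records the aggregate expression.
            new("Column", expr = paste0("avg(", x@expr, ")"))
          })
```

With this arrangement roxygen2 should write a single `column_aggregate_functions.Rd`, and no separate `avg.Rd`, because no block sets `@name avg`.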


[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77313/testReport)** for PR 18025 at commit [`10a24b3`](https://github.com/apache/spark/commit/10a24b3c4ece2fcf4711e1c6c5c4518b097cf93f).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77084/
    Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78190/testReport)** for PR 18025 at commit [`978e13b`](https://github.com/apache/spark/commit/978e13b498b492495a2fa21e915c120791b59b9f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77341/testReport)** for PR 18025 at commit [`038eac3`](https://github.com/apache/spark/commit/038eac3a60b330a29fc7099c31913175f6593e3c).



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    This is what the `'column_aggregate_functions.Rd'` doc looks like:
    
    ![image](https://cloud.githubusercontent.com/assets/11082368/26190195/fd353224-3b5c-11e7-9a78-2607cc665f49.png)
    ![image](https://cloud.githubusercontent.com/assets/11082368/26190198/ff5e2cae-3b5c-11e7-8942-271cd8a5a3cc.png)
    ![image](https://cloud.githubusercontent.com/assets/11082368/26190200/013f8356-3b5d-11e7-82fd-ee9e79228027.png)




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78190/
    Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77343/testReport)** for PR 18025 at commit [`21e54c0`](https://github.com/apache/spark/commit/21e54c0fd035892066bc4964b8eca5f51331c29e).



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78215/testReport)** for PR 18025 at commit [`875db0d`](https://github.com/apache/spark/commit/875db0dc02e03fab1df57ba105033a6597d45249).



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    thanks, I am somewhat familiar with all the tags. I see that you have `@aliases avg` instead of `@name avg`. may I direct you to this JIRA - https://issues.apache.org/jira/browse/SPARK-18825 this is the one tracking all the double/triple links because of aliases and `@seealso`
    
    I think we are trying to remove `@aliases avg,Column-method` (ie. `-method` ones) and so it's better to make sure we come up with a consistent way to handle this.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    I guess we don't need a link to stddev_samp since it's on the same page
    shouldn't std_dev and var_samp also be on this page?




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78215/testReport)** for PR 18025 at commit [`875db0d`](https://github.com/apache/spark/commit/875db0dc02e03fab1df57ba105033a6597d45249).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77631/testReport)** for PR 18025 at commit [`0044b29`](https://github.com/apache/spark/commit/0044b29853c949b0baac7c70ed35658ed6005943).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78152/testReport)** for PR 18025 at commit [`0a7f5fc`](https://github.com/apache/spark/commit/0a7f5fcac2e0295d92b82d8909c4f1b11c82f016).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122358689
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -919,10 +920,9 @@ setGeneric("array_contains", function(x, value) { standardGeneric("array_contain
     #' @export
     setGeneric("ascii", function(x) { standardGeneric("ascii") })
     
    -#' @param x Column to compute on or a GroupedData object.
    --- End diff --
    
    In this case, we will have to document `avg` on its own, like `count`, `first` and `last`. I cannot document the `x` param here since it will show up in the  doc for the column class. Interestingly, there is not even a doc of the `avg` method from the `GroupedData` class.... 



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77341/testReport)** for PR 18025 at commit [`038eac3`](https://github.com/apache/spark/commit/038eac3a60b330a29fc7099c31913175f6593e3c).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78216/testReport)** for PR 18025 at commit [`79d9fdf`](https://github.com/apache/spark/commit/79d9fdf424cc24277673f30ec673ed6ae3eafeee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    so I think better examples are great, and we might be too verbose with individual pages for each function, so it might be a good idea to consolidate them. but one question: does this affect discoverability?
    
    can the function be found on http://spark.apache.org/docs/latest/api/R/index.html
    also can one find the function in the shell with `?abs`?
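
One way to check the shell-discoverability question programmatically is to ask R's help system whether a topic resolves. This is a sketch, not part of the PR: `has_help_topic` is a hypothetical helper name, and it relies on `utils::help()` returning a zero-length object when nothing matches.

```r
# Returns TRUE when `topic` resolves to at least one help file in package `pkg`.
# `pkg` must be an installed package, otherwise help() raises an error.
has_help_topic <- function(topic, pkg) {
  # The parentheses around pkg stop help() from treating the argument
  # name itself ("pkg") as the package to search.
  found <- utils::help(topic, package = (pkg))
  length(found) > 0
}

# Base R already groups several functions on one page, so `abs` is reachable
# through an alias rather than through a page of its own.
has_help_topic("abs", "base")

# With SparkR installed, the same check would apply to the grouped page:
# has_help_topic("avg", "SparkR")
```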



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    since this is a fairly big change to the documentation could you please open a JIRA? this is also how we could track this work against SPARK-18825



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    - New commit now resolves the Name issue. `@title` does not work: it only sets the header in the second line, `\title{Aggregate functions for Column operations}`. The solution is to use `@name NULL` for the generics.  Now we have:
    
    ![image](https://cloud.githubusercontent.com/assets/11082368/26437454/3780b8d4-40d2-11e7-83e9-80eec206f000.png)
    
    - Also added several more practical examples. But most of these functions are very straightforward to use. 
    
    ![image](https://cloud.githubusercontent.com/assets/11082368/26437488/5be621be-40d2-11e7-8df8-0e5c99fb6ef6.png)
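
A sketch of what `@name NULL` on a generic could look like, assuming roxygen2. The choice of `stddev_samp` is illustrative only; the thread does not say which generics were changed. As far as I understand roxygen2, `@name NULL` stops the block from contributing a topic name derived from the generic, so the page keeps the name set elsewhere:

```r
library(methods)

# The generic joins the shared page via @rdname but contributes no name of its own.
#' @rdname column_aggregate_functions
#' @export
#' @name NULL
setGeneric("stddev_samp", function(x) { standardGeneric("stddev_samp") })
```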
     




[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122133612
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -3948,26 +3869,18 @@ setMethod("grouping_bit",
                 column(jc)
               })
     
    -#' grouping_id
    -#'
    -#' Returns the level of grouping.
    -#'
    +#' @section Details:
    +#' \code{grouping_id}: Returns the level of grouping.
     #' Equals to \code{
     #' grouping_bit(c1) * 2^(n - 1) + grouping_bit(c2) * 2^(n - 2)  + ... + grouping_bit(cn)
     #' }
     #'
    -#' @param x Column to compute on
    -#' @param ... additional Column(s) (optional).
    --- End diff --
    
    ditto here
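
For reference, the `grouping_id` formula quoted in the diff above can be worked through in plain R. This is only an illustration of the arithmetic, not SparkR code; `grouping_id_from_bits` is a hypothetical helper that takes the `grouping_bit` values of columns `c1..cn` in order:

```r
# bits: grouping_bit values for columns c1..cn, in that order (each 0 or 1).
grouping_id_from_bits <- function(bits) {
  n <- length(bits)
  weights <- 2^(n - seq_len(n))  # 2^(n-1), 2^(n-2), ..., 2^0
  sum(bits * weights)
}

# Three grouping columns with bits 1, 0, 1: 1*4 + 0*2 + 1*1 = 5
grouping_id_from_bits(c(1, 0, 1))
```

In other words, the grouping bits are read as a binary number with `c1` as the most significant bit.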



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78264/testReport)** for PR 18025 at commit [`4cf5ab9`](https://github.com/apache/spark/commit/4cf5ab98771f19924e483ac716bd8a0618ba3f2e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77046/
    Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77378/testReport)** for PR 18025 at commit [`21e54c0`](https://github.com/apache/spark/commit/21e54c0fd035892066bc4964b8eca5f51331c29e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77318/testReport)** for PR 18025 at commit [`782ffc1`](https://github.com/apache/spark/commit/782ffc13a22dd36258b7736cb8a4b5f9f69953b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78190/testReport)** for PR 18025 at commit [`978e13b`](https://github.com/apache/spark/commit/978e13b498b492495a2fa21e915c120791b59b9f).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122362550
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -919,10 +920,9 @@ setGeneric("array_contains", function(x, value) { standardGeneric("array_contain
     #' @export
     setGeneric("ascii", function(x) { standardGeneric("ascii") })
     
    -#' @param x Column to compute on or a GroupedData object.
    --- End diff --
    
    yes, that's one of the code-gen methods that don't actually have documentation (which is a problem) but somehow inherit one from base::, so the CRAN check doesn't complain about it



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122608370
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    -#'
     #' @param colName1 the name of the first column
     #' @param colName2 the name of the second column
    -#' @return The covariance of the two columns.
    --- End diff --
    
    what would the `@return` line be in the final doc?



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    also, since we have an Rd now, what do you think about collecting all the examples into one - that should eliminate all the `Not run` on every other line.
    
    I think this would then also be a great opportunity to do more than a simple `head(select(...))`: something expanded and more practical? what do you think?
    
    also this https://github.com/apache/spark/pull/18025#issuecomment-303838880
    
    I like this approach - these are my comments from your screen shot - I'll review more closely after more changes, thanks!
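
One way to read the suggestion above is a single shared example block that builds one dataset and then exercises several functions together. The sketch below is illustrative only: the `mtcars` dataset, the chosen columns and the particular aggregates are assumptions rather than content taken from the PR, and it assumes a local Spark installation that SparkR can start a session against.

```r
library(SparkR)

# Assumes a local Spark installation that SparkR can start a session against.
sparkR.session()

# One shared dataset for all examples (mtcars is an illustrative choice).
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))

# A single select call exercising several aggregate functions at once.
head(select(df, avg(df$mpg), min(df$wt), max(df$qsec)))

# A more practical use: several metrics per number of cylinders.
by_cyl <- agg(groupBy(df, "cyl"), avg(df$mpg), sd(df$hp), countDistinct(df$gear))
head(arrange(by_cyl, "cyl"))

sparkR.session.stop()
```

Sharing one `df` across the block means the whole example can sit under one `Not run` wrapper instead of one per function.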



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    haha. I like the `\emph`



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77318/
    Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    ![image](https://cloud.githubusercontent.com/assets/11082368/26429381/64dd117e-409b-11e7-9661-659b5fbe8206.png)
    ![image](https://cloud.githubusercontent.com/assets/11082368/26429388/69fcb452-409b-11e7-9c1b-5c91483af094.png)
    ![image](https://cloud.githubusercontent.com/assets/11082368/26429389/6cac649a-409b-11e7-80be-0c2af78124b0.png)
    
    
    
    




[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122356625
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1403,20 +1416,25 @@ setGeneric("unix_timestamp", function(x, format) { standardGeneric("unix_timesta
     #' @export
     setGeneric("upper", function(x) { standardGeneric("upper") })
     
    -#' @rdname var
    +#' @rdname column_aggregate_functions
    +#' @param y,na.rm,use currently not used.
    --- End diff --
    
    Good point. Moved to `column_aggregate_functions`.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Thanks for the update. Look forward to your feedback. 



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung Could you take another look and let me know if there is anything else needed?



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung I think we may want to distinguish a few cases:
    1. For methods that are mainly defined by only one class, e.g., most function methods for Column, it makes sense to group and document them together. For example, most aggregate functions of Column go into one single Rd, since they are not defined for other classes. In this case, `avg` will go to this doc since it is not used by other classes.
    2. For methods that are defined by multiple classes, e.g., the `show` method defined for SparkDataFrame, GroupedData, Column and StreamingQuery, we can still document them in `show.Rd`. In this case, `show` will go to this doc and show the help for all classes that have defined a `show` method.
    3. When it makes sense, we can also combine 1 & 2 above. For example, `gapply` and `gapplyCollect` are defined for both SparkDataFrame and GroupedData. But we can still document them together and create shared examples.
    
    Let me know if this makes sense.
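
Case 1 can be sketched with roxygen2's `@rdname` tag. The page name `column_aggregate_functions`, the `@section Details:` form and the `countDistinct` description appear in this PR's diffs; the title, description and `@param` wording below are illustrative assumptions, and the generics are simplified stand-ins rather than the package's actual definitions.

```r
library(methods)

# Shared documentation page: roxygen2 writes every block tagged with
# @rdname column_aggregate_functions into one Rd file.
# The title, description and @param wording are illustrative.

#' Aggregate functions for Column operations
#'
#' Aggregate functions defined for \code{Column}.
#'
#' @param x Column to compute on.
#' @param ... additional argument(s).
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
NULL

# The generic only points at the shared page; it carries no title of its own.
#' @rdname column_aggregate_functions
#' @export
setGeneric("avg", function(x, ...) { standardGeneric("avg") })

# A second function joins the same page and contributes its own detail text.
#' @section Details:
#' \code{countDistinct}: Returns the number of distinct items in a group.
#'
#' @rdname column_aggregate_functions
#' @export
setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct") })
```

Because shared parameters such as `x` and `...` are documented once on the `NULL` block, the individual functions do not need to repeat them.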


[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117548367
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) { standardGeneric("agg") })
     #' @param object x a SparkDataFrame or a Column
     #' @param data new name to use
     #' @return a SparkDataFrame or a Column
    +#' @noRd
     NULL
     
     #' @rdname arrange
     #' @export
    +#' @noRd
     setGeneric("arrange", function(x, col, ...) { standardGeneric("arrange") })
     
     #' @rdname as.data.frame
     #' @export
    +#' @noRd
     setGeneric("as.data.frame",
                function(x, row.names = NULL, optional = FALSE, ...) {
                  standardGeneric("as.data.frame")
                })
     
     #' @rdname attach
     #' @export
    +#' @noRd
     setGeneric("attach")
     
     #' @rdname cache
     #' @export
    +#' @noRd
     setGeneric("cache", function(x) { standardGeneric("cache") })
     
     #' @rdname checkpoint
     #' @export
    +#' @noRd
     setGeneric("checkpoint", function(x, eager = TRUE) { standardGeneric("checkpoint") })
     
     #' @rdname coalesce
     #' @param x a Column or a SparkDataFrame.
    --- End diff --
    
    wait - is having these here intentional? if a function supports different types we don't want the documentation to go to DataFrame.R because then Column.R will look empty.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122607813
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -2254,18 +2198,12 @@ setMethod("approxCountDistinct",
                 column(jc)
               })
     
    -#' Count Distinct Values
    +#' @section Details:
    +#' \code{countDistinct}: Returns the number of distinct items in a group.
     #'
    -#' @param x Column to compute on
    -#' @param ... other columns
    --- End diff --
    
    ok



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @HyukjinKwon Thanks for catching this. They were incorrectly labeled as math functions instead of aggregate functions in SparkR. And that's why I did not change them. 
    The new commit fixes this. Note they are still documented in their own Rd because there is also a method defined for SparkDataFrame. I did some cleanup and updated the example to be runnable.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122133805
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1403,20 +1416,25 @@ setGeneric("unix_timestamp", function(x, format) { standardGeneric("unix_timesta
     #' @export
     setGeneric("upper", function(x) { standardGeneric("upper") })
     
    -#' @rdname var
    +#' @rdname column_aggregate_functions
    +#' @param y,na.rm,use currently not used.
    --- End diff --
    
    hmm, this is in `var` but in `sd` too - perhaps it should move to column_aggregate_functions
    
    in fact, in your screen shot they are both listed https://github.com/apache/spark/pull/18025#issuecomment-303877945
    `y,na.rm,use` and then `na.rm`



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    merged to master, thanks!



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77318/testReport)** for PR 18025 at commit [`782ffc1`](https://github.com/apache/spark/commit/782ffc13a22dd36258b7736cb8a4b5f9f69953b2).



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78152/testReport)** for PR 18025 at commit [`0a7f5fc`](https://github.com/apache/spark/commit/0a7f5fcac2e0295d92b82d8909c4f1b11c82f016).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122131981
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -85,17 +100,20 @@ setMethod("acos",
                 column(jc)
               })
     
    -#' Returns the approximate number of distinct items in a group
    +#' @section Details:
    --- End diff --
    
    would it be more concise to do `@details`?
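
For comparison, the two spellings discussed here can be sketched side by side. Both are standard roxygen2 tags. The `countDistinct` description comes from this PR's diff and the `approxCountDistinct` text adapts the title line removed above; the generics are simplified stand-ins, and which function uses which form is purely illustrative.

```r
library(methods)

# Form used in the diff: an explicit custom section titled "Details".
#' @section Details:
#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group.
#'
#' @rdname column_aggregate_functions
setGeneric("approxCountDistinct", function(x, ...) { standardGeneric("approxCountDistinct") })

# Shorter form suggested in the review: the built-in @details tag.
#' @details
#' \code{countDistinct}: Returns the number of distinct items in a group.
#'
#' @rdname column_aggregate_functions
setGeneric("countDistinct", function(x, ...) { standardGeneric("countDistinct") })
```

How the two forms are merged and ordered on a shared Rd page may depend on the roxygen2 version, so it is worth checking the rendered page before settling on one.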



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77046/testReport)** for PR 18025 at commit [`5c8cd1e`](https://github.com/apache/spark/commit/5c8cd1e5da896d78ea3cb4fcf5e046d22090dc2a).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r118643715
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1403,20 +1416,25 @@ setGeneric("unix_timestamp", function(x, format) { standardGeneric("unix_timesta
     #' @export
     setGeneric("upper", function(x) { standardGeneric("upper") })
     
    -#' @rdname var
    +#' @rdname column_aggregate_functions
    +#' @param y,na.rm,use currently not used.
    --- End diff --
    
    Good catch. But I think this one makes more sense to stay here because all these arguments are specific to the `var` generic, which is not used anywhere else. I moved the `...` in `avg` to `column_aggregate_functions` to avoid duplicated doc for `...`.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77420/testReport)** for PR 18025 at commit [`ab6e4f1`](https://github.com/apache/spark/commit/ab6e4f1651ec09e576b8dcf8a611c9f2ea2169a5).



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    BTW, I checked the description/examples. These generally look good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78149/
    Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Thanks for summarizing. I think they make sense. To be clear though, we should also talk about:
    - what if a method is defined in one class and belongs in a group, but is also defined for another class (e.g. sql function: `cov`)
    - what if it is defined for multiple classes but the meanings are drastically different (e.g. coalesce(DF) and coalesce(col) in my example above)
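
To make the second case concrete, here is a minimal sketch of how the two `coalesce` methods differ. It assumes a SparkR version that provides both methods and a local Spark installation; the toy data and column names are made up for illustration.

```r
library(SparkR)

# Assumes a local Spark installation that SparkR can start a session against.
sparkR.session()

# Toy data with missing values (illustrative only).
df <- createDataFrame(data.frame(a = c(1, NA, 3), b = c(10, 20, NA)))

# SparkDataFrame method: returns a SparkDataFrame with the requested
# number of partitions; the rows themselves are unchanged.
one_part <- coalesce(df, 1L)
getNumPartitions(one_part)

# Column method: builds a Column holding, per row, the first value among
# its arguments that is not missing.
head(select(df, alias(coalesce(df$a, df$b), "first_non_missing")))

sparkR.session.stop()
```

The first call changes how the data is laid out, while the second computes a new column, which is why a single shared description would be hard to write.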




[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung 
    - The links to `stddev_samp` etc are already removed in the latest commit. 
    - About collecting all the examples into one, I think that'll work for this particular one. But I'm not sure about this in general. These methods are still spread out across the `.R` files. And if we decide to change the grouping of these functions later on, it will be very difficult if we don't have examples in those methods.
    - For a method that is defined for multiple classes but whose meanings are drastically different, I agree that it's best to document by class. One downside is that a generic `?coalesce` can only go to one help page, e.g., the help for SparkDataFrame, not the other classes. However, we can add links to the `coalesce` methods for the other classes in the `SeeAlso` section.
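
The cross-linking idea in the last point could look roughly like the roxygen2 sketch below. The topic names `coalesce_sparkdataframe` and `coalesce_column` are hypothetical and are not taken from the PR; the description wording is also an assumption.

```r
# Sketch of per-class pages cross-linked through @seealso.
# The topic names coalesce_sparkdataframe and coalesce_column are
# hypothetical; they are not taken from the PR.

#' Coalesce (SparkDataFrame)
#'
#' Returns a new SparkDataFrame with the requested number of partitions.
#'
#' @param x a SparkDataFrame.
#' @param numPartitions the number of partitions to use.
#' @name coalesce_sparkdataframe
#' @aliases coalesce
#' @seealso \link{coalesce_column} for the Column method.
NULL

#' Coalesce (Column)
#'
#' Returns the first column that is not NA.
#'
#' @param x a Column.
#' @param ... additional Columns.
#' @name coalesce_column
#' @seealso \link{coalesce_sparkdataframe} for the SparkDataFrame method.
NULL
```

In this sketch only the first page carries the plain `coalesce` alias, so `?coalesce` would land there and the `@seealso` link leads on to the Column page.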



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r119538471
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1081,19 +1098,12 @@ setMethod("md5",
                 column(jc)
               })
     
    -#' mean
    -#'
    -#' Aggregate function: returns the average of the values in a group.
    -#' Alias for avg.
    -#'
    -#' @param x Column to compute on.
    +#' @section Details:
    +#' \code{mean}: Returns the average of the values in a group. Alias for avg.
    --- End diff --
    
    Fixed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78215/
    Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @actuaryzhang, are `corr`, `covar_pop` and `covar_samp` the same kind of case that was missed? They also look like aggregate functions in `functions.scala`, and I can see R has them, but they seem to be missing here.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    This is what the doc for `column_aggregate_functions` looks like (snapshots of the main parts only):
    
    ![image](https://user-images.githubusercontent.com/11082368/27269174-85df12fa-5469-11e7-872d-d740fd382294.png)
    ![image](https://user-images.githubusercontent.com/11082368/27269177-8b35a67e-5469-11e7-80ac-7c804c3728d2.png)
    ![image](https://user-images.githubusercontent.com/11082368/27269180-8eb8c7a4-5469-11e7-8c4a-1de037bf078d.png)
    ![image](https://user-images.githubusercontent.com/11082368/27269184-91e39cb0-5469-11e7-932c-5eab772ec845.png)




[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78244/
    Test PASSed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18025



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77420/
    Test PASSed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122132206
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -85,17 +100,20 @@ setMethod("acos",
                 column(jc)
               })
     
    -#' Returns the approximate number of distinct items in a group
    +#' @section Details:
    +#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group.
     #'
    -#' Returns the approximate number of distinct items in a group. This is a column
    -#' aggregate function.
    -#'
    -#' @rdname approxCountDistinct
    -#' @name approxCountDistinct
    -#' @return the approximate number of distinct items in a group.
    +#' @rdname column_aggregate_functions
     #' @export
    -#' @aliases approxCountDistinct,Column-method
    -#' @examples \dontrun{approxCountDistinct(df$c)}
    +#' @aliases approxCountDistinct approxCountDistinct,Column-method
    +#' @examples
    +#'
    --- End diff --
    
    extra newline?



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78152/
    Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    I will also do my best tonight to help double-check the examples and contents for correctness and consistency.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78149/testReport)** for PR 18025 at commit [`014b9f3`](https://github.com/apache/spark/commit/014b9f3069a6e2075cb8be307c5d74081dabe15a).



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122322352
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -2254,18 +2198,12 @@ setMethod("approxCountDistinct",
                 column(jc)
               })
     
    -#' Count Distinct Values
    +#' @section Details:
    +#' \code{countDistinct}: Returns the number of distinct items in a group.
     #'
    -#' @param x Column to compute on
    -#' @param ... other columns
    --- End diff --
    
    I agree it is less clear, but the impact is very minor if we have examples to illustrate passing additional columns. I have now updated the doc of the argument to
    `#' @param ... additional argument(s). For example, it could be used to pass additional Columns.`
    and updated the example to use multiple columns:
    `head(select(df, countDistinct(df$gear, df$cyl)))`
    Do the above changes address your concern?




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78153/testReport)** for PR 18025 at commit [`19d063c`](https://github.com/apache/spark/commit/19d063c6995fa6bd780830a941f6b1f7c45c1bac).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025

@felixcheung I just made a new commit which I think has the cleanest solution so far. In this one, I implemented grouping for all aggregate functions for Column, except those that are also defined for other classes (`count`, `first` and `last`). As you can see, it achieves the following:
- Centralized documentation for easy navigation.
- Reduced number of items in `See also`.
- Better examples using shared data. This avoids creating a data frame for each function, as would be needed if they were documented separately.
- Cleaner structure and far fewer Rd files.
- Removed duplicated definitions of `@param`.
- No need to write meaningless examples for trivial functions (because of grouping).

In this version, I also demonstrate that for methods defined for multiple classes (`count`, `first` and `last`), we can still document them in their own Rd files and simply give a link in the `SeeAlso` section. Of course, we can combine the docs for these three into something like `shared_methods.Rd`, since each of them is tiny.

Also, to facilitate review, perhaps we can break the changes into several PRs, one for each of `aggregate_functions`, `datetime_functions`, `math_function`, and `misc_functions`?

After making the change to the Column methods, I will work on the doc for SparkDataFrame and GroupedData.

Please let me know your thoughts.
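
As a rough illustration of the grouped documentation described in this comment, the following roxygen2 sketch documents two toy functions on one shared Rd page. The page name `toy_aggregate_functions` and the functions `toy_avg` and `toy_sum` are hypothetical stand-ins that operate on plain numeric vectors; they are not the SparkR `Column` methods changed in this PR.

```r
#' Toy aggregate functions (hypothetical names, for illustration only)
#'
#' Aggregate functions documented together on one help page.
#'
#' @param x a numeric vector to compute on. Declared once for the whole page.
#' @name toy_aggregate_functions
#' @rdname toy_aggregate_functions
#' @examples
#' x <- c(1, 2, 3, 4)  # shared data used by every example on the page
NULL

#' @section Details:
#' \code{toy_avg}: Returns the average of the values.
#'
#' @rdname toy_aggregate_functions
#' @aliases toy_avg
#' @export
#' @examples
#'
#' toy_avg(x)
toy_avg <- function(x) sum(x) / length(x)

#' @section Details:
#' \code{toy_sum}: Returns the sum of the values.
#'
#' @rdname toy_aggregate_functions
#' @aliases toy_sum
#' @export
#' @examples
#'
#' toy_sum(x)
toy_sum <- function(x) sum(x)
```

Under these assumptions, roxygen2 would be expected to write both functions into a single `toy_aggregate_functions.Rd`, so `?toy_avg` and `?toy_sum` open the same page and `@param x` is defined only once. The empty `#'` line after each `@examples` tag is there to separate one function's example from the previous one on the shared page.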


[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78153/testReport)** for PR 18025 at commit [`19d063c`](https://github.com/apache/spark/commit/19d063c6995fa6bd780830a941f6b1f7c45c1bac).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r119530100
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1081,19 +1098,12 @@ setMethod("md5",
                 column(jc)
               })
     
    -#' mean
    -#'
    -#' Aggregate function: returns the average of the values in a group.
    -#' Alias for avg.
    -#'
    -#' @param x Column to compute on.
    +#' @section Details:
    +#' \code{mean}: Returns the average of the values in a group. Alias for avg.
    --- End diff --
    
    little suggestion: `avg` -> `\code{avg}`.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung All comments are addressed now and I think this is ready for review.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122609690
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -361,10 +361,13 @@ setMethod("column",
     #'
     #' @rdname corr
     #' @name corr
    -#' @family math functions
    +#' @family aggregate functions
     #' @export
     #' @aliases corr,Column-method
    -#' @examples \dontrun{corr(df$c, df$d)}
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
    --- End diff --
    
    this one does not need the extra newline since it's in its own Rd and there are no examples before it. 



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117548343
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -385,6 +385,7 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
     #' @return A SparkDataFrame.
     #' @rdname summarize
     #' @export
    +#' @noRd
    --- End diff --
    
    What's the reason for the `@noRd` here? I think it's fairly standard to "document" generics, even though it's not useful most of the time.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77420/testReport)** for PR 18025 at commit [`ab6e4f1`](https://github.com/apache/spark/commit/ab6e4f1651ec09e576b8dcf8a611c9f2ea2169a5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Jenkins, retest this please



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122608317
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    --- End diff --
    
    hmm, this is one of the tricky ones where there is one page for DataFrame & Columns.
    I think it's useful to touch on how this works with a SparkDataFrame and keep this line in some form?



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117565367
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -385,6 +385,7 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
     #' @return A SparkDataFrame.
     #' @rdname summarize
     #' @export
    +#' @noRd
    --- End diff --
    
    @felixcheung I intentionally suppressed the documentation for generic functions. The main reason is that documenting the generics conflicts with documenting groups of methods together. For example, if I document the `avg` method for the Column class together with the other Column aggregate methods, here are a couple of scenarios that could happen to the generic `avg`:
    1. Without `@noRd`, roxygen will create an `avg` Rd file that documents ONLY the `avg` generic. Since the generics definitions have no `@title` or `@description`, the package won't compile.
    2. Suppose we get rid of `@noRd` and document all generic functions under a new `generics` Rd name. This will compile, but then we have conflicting definitions of the arguments. For example, `...` could mean different things in different generics. But this is a minor issue.
    
    The current approach is to suppress the documentation of the generics, since we document each method of the generics. This does not lose any information IMO. The user can always use `getGeneric("avg")` to get the prototype of the generic. I would like to hear your thoughts.
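    
    A minimal sketch of the `getGeneric` lookup mentioned above, using only the base `methods` package. The class `ToyColumn` and this `avg` generic are hypothetical stand-ins, not SparkR's own definitions.
    
    ```r
    library(methods)
    
    # Hypothetical stand-in for a column-like class.
    setClass("ToyColumn", representation(values = "numeric"))
    
    # The generic carries no documentation of its own (as with @noRd).
    setGeneric("avg", function(x, ...) { standardGeneric("avg") })
    
    # The method is what would be documented on the grouped help page.
    setMethod("avg", signature(x = "ToyColumn"),
              function(x, ...) mean(x@values))
    
    # The prototype of the generic is still discoverable at the prompt.
    g <- getGeneric("avg")
    print(names(formals(g)))                 # argument names of the generic
    print(existsMethod("avg", "ToyColumn"))  # is a method registered for the class?
    
    result <- avg(new("ToyColumn", values = c(1, 2, 3)))
    ```
    
    The point of the sketch is only that the argument list of a generic stays available through `getGeneric` even when no Rd page is generated for the generic itself.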




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77631/
    Test PASSed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r119530600
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1630,18 +1609,12 @@ setMethod("sqrt",
                 column(jc)
               })
     
    -#' sum
    -#'
    -#' Aggregate function: returns the sum of all values in the expression.
    +#' @section Details:
    +#' \code{sum}: Returns the sum of all values in the expression.
     #'
    -#' @param x Column to compute on.
    -#'
    -#' @rdname sum
    -#' @name sum
    -#' @family aggregate functions
    -#' @aliases sum,Column-method
    +#' @rdname column_aggregate_functions
    +#' @aliases sum sum,Column-method
     #' @export
    -#' @examples \dontrun{sum(df$c)}
    --- End diff --
    
    Little nit: it looks like this example is missing from the examples?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    thanks for looking into this, and yes, it will be useful to understand this better before proceeding further.
    
    how does the `?` help search or the API index page work in this case? I wonder if it's because of the `@aliases` tag (or might it be `@name`?). We are discussing ways to get rid of `aliases`, since it creates duplicated links everywhere. I just want to make sure we don't break usability: people not being able to find help in the shell would be a huge problem ;)




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78264/
    Test PASSed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122630839
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -361,10 +361,13 @@ setMethod("column",
     #'
     #' @rdname corr
     #' @name corr
    -#' @family math functions
    +#' @family aggregate functions
     #' @export
     #' @aliases corr,Column-method
    -#' @examples \dontrun{corr(df$c, df$d)}
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
    --- End diff --
    
    or maybe we should have a newline at the end of every `@examples` block (when there are multiple examples on one Rd)? This way we don't have to know which one goes first.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r117566353
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -396,67 +397,81 @@ setGeneric("agg", function (x, ...) { standardGeneric("agg") })
     #' @param object x a SparkDataFrame or a Column
     #' @param data new name to use
     #' @return a SparkDataFrame or a Column
    +#' @noRd
     NULL
     
     #' @rdname arrange
     #' @export
    +#' @noRd
     setGeneric("arrange", function(x, col, ...) { standardGeneric("arrange") })
     
     #' @rdname as.data.frame
     #' @export
    +#' @noRd
     setGeneric("as.data.frame",
                function(x, row.names = NULL, optional = FALSE, ...) {
                  standardGeneric("as.data.frame")
                })
     
     #' @rdname attach
     #' @export
    +#' @noRd
     setGeneric("attach")
     
     #' @rdname cache
     #' @export
    +#' @noRd
     setGeneric("cache", function(x) { standardGeneric("cache") })
     
     #' @rdname checkpoint
     #' @export
    +#' @noRd
     setGeneric("checkpoint", function(x, eager = TRUE) { standardGeneric("checkpoint") })
     
     #' @rdname coalesce
     #' @param x a Column or a SparkDataFrame.
    --- End diff --
    
    Thanks for pointing this out. In this case, it should be `a Column`, not `a SparkDataFrame or a Column`. Indeed, this is a consequence of documenting only the generics and having each method inherit its parameter doc from the generic. Ideally, each specific method of a generic should have its own documentation, rather than documenting the generics as we do now. Suppose we add a new method `alias` for another class: the doc in the generics then needs to be updated to include this new class, but it's not clear to the developer where to make the change. If we instead document each method of the generics, it will be much clearer.
    I will do a thorough cleanup of these issues after we decide on the big items.
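
The per-method style described above could look roughly like the following. This is a minimal sketch, assuming roxygen2 and the `alias` generic from the quoted diff. The generic carries only `@noRd`, while the `Column` method owns its `@param` text, so adding a method for another class would not require touching this block. The method body is illustrative and relies on SparkR internals (`callJMethod`, `column`); treat it as an assumption rather than the PR's code.

```r
# Generic: kept out of the Rd output, as in the quoted diff.
#' @noRd
setGeneric("alias", function(object, data) { standardGeneric("alias") })

# Method: documents exactly the class it is defined for.
#' alias
#'
#' Returns a new Column with the given name.
#'
#' @param object a Column.
#' @param data new name to use.
#' @return a Column.
#' @rdname alias
#' @aliases alias,Column-method
#' @export
setMethod("alias",
          signature(object = "Column"),
          function(object, data) {
            # Illustrative body (assumption): rename via the JVM column's `as`
            column(callJMethod(object@jc, "as", data))
          })
```

With this layout the parameter description never has to enumerate every class that implements the generic.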




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78216/
    Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Opened a JIRA. We would need several PRs to fix all doc issues. 
    Also, not sure why Jenkins failed as the error msg is not clear and all tests passed on my computer. 



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung Your comments are all addressed now. Please let me know if there is anything else needed. 



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77046/testReport)** for PR 18025 at commit [`5c8cd1e`](https://github.com/apache/spark/commit/5c8cd1e5da896d78ea3cb4fcf5e046d22090dc2a).



[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77378/
    Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Jenkins, retest this please



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @HyukjinKwon Thanks much for the review. New commit now fixes the issues you pointed out. 



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    @felixcheung 
    I'm curious what the motivation is for removing the `avg,Column-method` style.
    This is the default and (I believe) preferred way to refer to S4 methods in R.
    Suppose we have new classes and methods on `avg`, say `avg,DataFrame-method`, `avg,RDD-method`, `avg,Row-method`, etc. Is the idea to document all `avg` methods in a single doc and use one `\alias{avg}` to reference it?
    
    If that's the case, then the idea of grouping different methods by class will not be useful. 




[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77343/
    Test FAILed.



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77341/
    Test FAILed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77313/
    Test FAILed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122630544
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -361,10 +361,13 @@ setMethod("column",
     #'
     #' @rdname corr
     #' @name corr
    -#' @family math functions
    +#' @family aggregate functions
     #' @export
     #' @aliases corr,Column-method
    -#' @examples \dontrun{corr(df$c, df$d)}
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
    --- End diff --
    
    great - I know we talked about it, but we might consider getting all examples on the Rd into one `\dontrun` block again. As of now it's very hard to review a new PR without knowing whether a newline is needed or not...
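
For illustration, the single-block alternative mentioned here might be sketched as below, assuming roxygen2 and a grouped page named `column_aggregate_functions` as used elsewhere in this PR. The title line and the choice of `mtcars` columns (`mpg`, `hp`, `gear`) are assumptions made for the example. All examples live in one `\dontrun` block on the page-level `NULL` block, so individual methods contribute no `@examples` and no blank-line bookkeeping is needed.

```r
#' Aggregate functions for Column operations
#'
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
#' @examples
#' \dontrun{
#' # Shared data for every example on this page
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
#' head(select(df, corr(df$mpg, df$hp)))
#' head(select(df, sum(df$mpg)))
#' head(select(df, approxCountDistinct(df$gear)))}
NULL
```

The trade-off is that an example sits away from the method it illustrates, which is the cost of not having to track where each block lands in the merged Rd.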




[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Great point. 
    
    - For a method that is defined in one class and belongs in a group, like `cov`, we can document it in its own Rd and add a link to it in the `SeeAlso` section of the group doc. In this case, the `\alias{cov}` will be in `cov.Rd`.
    - For a method that is defined for multiple classes but whose meanings are drastically different: I think we can still document them in one Rd, and add a `details` section to describe the method for each class.
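
The first option above can be sketched as follows, assuming roxygen2. The titles and descriptions are placeholders, and only the `@seealso` link and the separate `@rdname cov` reflect the proposal itself.

```r
# Group page: links out to the stand-alone page instead of absorbing it.
#' Aggregate functions for Column operations
#'
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
#' @seealso \link{cov}
NULL

# Stand-alone page: roxygen2 writes \alias{cov} into cov.Rd.
#' cov
#'
#' Placeholder description for the covariance methods.
#'
#' @name cov
#' @rdname cov
NULL
```

Keeping `cov` on its own page means `?cov` resolves to a single topic, while readers of the group page can still reach it through the See Also section.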




[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78216/testReport)** for PR 18025 at commit [`79d9fdf`](https://github.com/apache/spark/commit/79d9fdf424cc24277673f30ec673ed6ae3eafeee).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122312290
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -85,17 +100,20 @@ setMethod("acos",
                 column(jc)
               })
     
    -#' Returns the approximate number of distinct items in a group
    +#' @section Details:
    --- End diff --
    
    Yes, changed. Thanks for the suggestion. 



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    thanks! sorry about the delay - this is very important to have (and among your other pending PRs) - will pick this up again in ~ a week



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77084/testReport)** for PR 18025 at commit [`8d21ebf`](https://github.com/apache/spark/commit/8d21ebfbb473ca21e9ede6c22be24cdc8043a22b).



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78153/
    Test FAILed.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r119538499
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1630,18 +1609,12 @@ setMethod("sqrt",
                 column(jc)
               })
     
    -#' sum
    -#'
    -#' Aggregate function: returns the sum of all values in the expression.
    +#' @section Details:
    +#' \code{sum}: Returns the sum of all values in the expression.
     #'
    -#' @param x Column to compute on.
    -#'
    -#' @rdname sum
    -#' @name sum
    -#' @family aggregate functions
    -#' @aliases sum,Column-method
    +#' @rdname column_aggregate_functions
    +#' @aliases sum sum,Column-method
     #' @export
    -#' @examples \dontrun{sum(df$c)}
    --- End diff --
    
    Good catch. Added to example.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6).



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025

@felixcheung @HyukjinKwon

Per this [suggestion](https://github.com/apache/spark/pull/18003#discussion-diff-116853922L57), I'm creating more meaningful examples for the SQL functions.

Since these functions can be grouped, we can create a single doc page for each group of functions and construct concrete, useful examples for each group. The benefits are:
- Centralized documentation of related functions. This makes it easier for users to navigate. Right now there are TOO many items in the `see also` section.
- Examples can share the same data. This avoids creating a data frame for each function, as would be needed if they were documented separately.
- Cleaner structure and far fewer Rd files.

Indeed, this is part of what was discussed in #17161. I have explored this for a few functions to illustrate the idea. Since this is a big effort, I would like to get folks' opinions before extending this to all functions.

In this commit, I created docs for some sample functions in three groups:
- 'column_datetime_functions' to document all datetime functions
- 'column_aggregate_functions' to document all aggregate functions
- 'column_math_functions' to document all math functions
- ...

Below is what 'column_datetime_functions.Rd' looks like:

![image](https://cloud.githubusercontent.com/assets/11082368/26189797/426029f0-3b5b-11e7-9175-c63b0e5c0014.png)
![image](https://cloud.githubusercontent.com/assets/11082368/26189810/56630954-3b5b-11e7-9d70-3e74b6d3b032.png)
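
As a rough sketch of the grouping idea (not the PR's exact code), a page-level `NULL` block can define the shared Rd and each method can attach to it with `@rdname`. The roxygen tags on `sum` mirror the diff quoted in this thread. The page title, the `@param` text placement, and the `callJStatic` line in the method body are assumptions for illustration.

```r
#' Aggregate functions for Column operations
#'
#' Aggregate functions defined for \code{Column}.
#'
#' @param x Column to compute on.
#' @name column_aggregate_functions
#' @rdname column_aggregate_functions
#' @family aggregate functions
#' @examples
#' \dontrun{
#' # Data frame shared by the examples of every function on this page
#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))}
NULL

#' @section Details:
#' \code{sum}: Returns the sum of all values in the expression.
#'
#' @rdname column_aggregate_functions
#' @aliases sum sum,Column-method
#' @export
#' @examples
#'
#' \dontrun{
#' head(select(df, sum(df$mpg)))}
setMethod("sum",
          signature(x = "Column"),
          function(x) {
            # Assumed body: delegate to the JVM-side SQL function
            jc <- callJStatic("org.apache.spark.sql.functions", "sum", x@jc)
            column(jc)
          })
```

Every further aggregate function would repeat only the second block, so the page grows by one Details entry and one example per function rather than by one Rd file.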


[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122318874
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -85,17 +100,20 @@ setMethod("acos",
                 column(jc)
               })
     
    -#' Returns the approximate number of distinct items in a group
    +#' @section Details:
    +#' \code{approxCountDistinct}: Returns the approximate number of distinct items in a group.
     #'
    -#' Returns the approximate number of distinct items in a group. This is a column
    -#' aggregate function.
    -#'
    -#' @rdname approxCountDistinct
    -#' @name approxCountDistinct
    -#' @return the approximate number of distinct items in a group.
    +#' @rdname column_aggregate_functions
     #' @export
    -#' @aliases approxCountDistinct,Column-method
    -#' @examples \dontrun{approxCountDistinct(df$c)}
    +#' @aliases approxCountDistinct approxCountDistinct,Column-method
    +#' @examples
    +#'
    --- End diff --
    
    Yes, this newline is needed to separate blocks of examples. 



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122616787
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    --- End diff --
    
    The method for SparkDataFrame is still there. I'm just removing redundant doc here. 



[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #78149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78149/testReport)** for PR 18025 at commit [`014b9f3`](https://github.com/apache/spark/commit/014b9f3069a6e2075cb8be307c5d74081dabe15a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122608072
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -361,10 +361,13 @@ setMethod("column",
     #'
     #' @rdname corr
     #' @name corr
    -#' @family math functions
    +#' @family aggregate functions
     #' @export
     #' @aliases corr,Column-method
    -#' @examples \dontrun{corr(df$c, df$d)}
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
    --- End diff --
    
    Do we need a space/newline in front of this example, like the other ones?



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77105/testReport)** for PR 18025 at commit [`ead9781`](https://github.com/apache/spark/commit/ead9781d38fe5b74b9e5ad4a658b1a0e632c9740).



[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122617531
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    -#'
     #' @param colName1 the name of the first column
     #' @param colName2 the name of the second column
    -#' @return The covariance of the two columns.
    --- End diff --
    
    OK. I added this back. The doc should be very clear even without this return value. Indeed, most functions in SparkR do not document a return value. See what it looks like in the image attached in the next comment.



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by actuaryzhang <gi...@git.apache.org>.

Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    OK. Updated the doc for the cov method for SparkDataFrame. 



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    > For a method that is defined for multiple classes but whose meanings are drastically different: I think we can still document them in one Rd, and add a details section to describe the method for each class.
    
    Would this be too confusing, though? Particularly if we are moving to one -method per doc page.




[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18025#discussion_r122631306
  
    --- Diff: R/pkg/R/stats.R ---
    @@ -52,22 +52,17 @@ setMethod("crosstab",
                 collect(dataFrame(sct))
               })
     
    -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
    --- End diff --
    
    I see. In that case, can we add this to the `@details` for `cov`?
    I feel like this has bits of info that could be useful: `Calculate the sample covariance of two numerical columns of a SparkDataFrame`. Say, *numerical* columns of *one* SparkDataFrame, as opposed to `cov(df, df$name, df2$bar)`.
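
The distinction raised here can be sketched as follows. This is only an illustration, assuming SparkR is installed and a local Spark session can be started; `mtcars` is used purely as sample data and is not tied to this diff. The SparkDataFrame method takes two column names from a single SparkDataFrame, while the Column method is an aggregate expression.

```r
# Illustrative sketch only; requires a working SparkR installation.
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# SparkDataFrame method: both columns are given by name and must be
# numerical columns of the same SparkDataFrame. Returns an R numeric value.
sample_cov <- cov(df, "mpg", "hp")

# Column method: an aggregate expression, evaluated inside agg().
# Returns a SparkDataFrame with a single row.
cov_df <- agg(df, cov(df$mpg, df$hp))
head(cov_df)

sparkR.session.stop()
```

Keeping the "numerical columns of one SparkDataFrame" wording in the details would make the first form's contract explicit for readers who only see the shared page.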



[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    **[Test build #77414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77414/testReport)** for PR 18025 at commit [`ab6e4f1`](https://github.com/apache/spark/commit/ab6e4f1651ec09e576b8dcf8a611c9f2ea2169a5).



[GitHub] spark issue #18025: [SPARK-20889][SparkR][WIP] Grouped documentation for agg...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18025
  
    Merged build finished. Test PASSed.

