Posted to reviews@spark.apache.org by NarineK <gi...@git.apache.org> on 2016/05/06 22:09:16 UTC

[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

GitHub user NarineK opened a pull request:

    https://github.com/apache/spark/pull/12966

    [SPARK-15196][SparkR] Add a wrapper for dapply(repartition(col,...), ... )

    ## What changes were proposed in this pull request?
    
    As mentioned in:
    https://github.com/apache/spark/pull/12836#issuecomment-217338855
    we would like to create a wrapper for: dapply(repartition(col, ...), ...)
    This will allow running aggregate functions on groups identified by a list of grouping columns.
    
    I named the wrapper method gapply; we can rename it if we want to call it differently.
    We could instead add an overload:
    `setMethod("dapply", signature(x = "SparkDataFrame", func = "function", schema = "structType", col = "Column"))`
    However, dapply already has many examples in the docs, and adding new examples for aggregate functions would make the documentation longer and less clear.
    
    ## How was this patch tested?
    Unit tests
    1. Group by a column and compute the mean
    2. Group by a column and train a linear model
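
    To make the proposed semantics concrete, here is a toy sketch in plain Python (not SparkR; the `gapply` helper below is a hypothetical stand-in written for illustration): group rows by a key column, apply a user function to each group, and concatenate the results, which is what dapply over repartition(col, ...) is meant to approximate.

```python
# Toy sketch of the proposed gapply semantics in plain Python:
# group rows by the grouping column, apply a user function to each
# group, and concatenate the per-group results.
from collections import defaultdict

def gapply(rows, key, func):
    """Group `rows` (a list of dicts) by `key`, apply `func` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = []
    for group in groups.values():
        out.extend(func(group))
    return out

rows = [
    {"Species": "setosa", "Sepal_Width": 3.5},
    {"Species": "setosa", "Sepal_Width": 3.0},
    {"Species": "virginica", "Sepal_Width": 2.8},
]

# Mirrors the mean-by-Species example in the docs: one output row per group.
means = gapply(rows, "Species",
               lambda g: [{"Species": g[0]["Species"],
                           "Avg": sum(r["Sepal_Width"] for r in g) / len(g)}])
```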
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NarineK/spark repartitionWithDapply

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12966.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12966
    
----
commit be5de6a42e50c42b4af15b87623bc7b49aecb353
Author: NarineK <na...@us.ibm.com>
Date:   2016-05-06T21:51:35Z

    Add a wrapper for dapply(repartition(col,...), ... )

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217611301
  
    **[Test build #58059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58059/consoleFull)** for PR 12966 at commit [`057ff9b`](https://github.com/apache/spark/commit/057ff9b30e56de4172c957e467b9b35dd932999a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217610810
  
    **[Test build #58059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58059/consoleFull)** for PR 12966 at commit [`057ff9b`](https://github.com/apache/spark/commit/057ff9b30e56de4172c957e467b9b35dd932999a).




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217936323
  
    Thanks, @davies!
    This means that we cannot implement group-apply using repartitioning.
    What would you suggest in this case?
    My previous pull request works fine for one key.
    
    We can try another implementation with groupBy -> agg. In that case, I understand we would need to implement an imperative or declarative aggregate, which would most likely collect the rows that share a grouping column value into a buffer and pass that buffer of rows to the R worker.
    Is this what you'd prefer?




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12966#discussion_r62399743
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1214,6 +1214,77 @@ setMethod("dapply",
                 dataFrame(sdf)
               })
     
    +#' gapply
    +#'
    +#' Apply a function to each group of a DataFrame. The group is defined by an input
    +#' grouping column(s).
    +#'
    +#' @param x A SparkDataFrame
    +#' @param func A function to be applied to each group partition specified by grouping
    +#'             column(s) of the SparkDataFrame.
    +#'             The output of func is a local R data.frame.
    +#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
    +#'               It must match the output of func.
    +#' @family SparkDataFrame functions
    +#' @rdname gapply
    +#' @name gapply
    +#' @export
    +#' @examples
    +#'
    +#' \dontrun{
    +#'
    +#' Computes the arithmetic mean of `Sepal_Width` by grouping
    +#' on `Species`. Output the grouping value and the average.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <-  structType(structField("Species", "string"), structField("Avg", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     data.frame(x$Species[1], mean(x$Sepal_Width), stringsAsFactors = FALSE)
    +#'   },
    +#'   schema, col=df$"Species")
    +#' collect(df1)
    +#'
    +#' Species      Avg
    +#' -----------------
    +#' virginica   2.974
    +#' versicolor  2.770
    +#' setosa      3.428
    +#'
    +#' Fits linear models on iris dataset by grouping on the `Species` column and
    +#' using `Sepal_Length` as a target variable, `Sepal_Width`, `Petal_Length`
    +#' and `Petal_Width` as training features.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <- structType(structField("(Intercept)", "double"),
    +#'   structField("Sepal_Width", "double"), structField("Petal_Length", "double"),
    +#'   structField("Petal_Width", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     model <- suppressWarnings(lm(Sepal_Length ~
    +#'     Sepal_Width + Petal_Length + Petal_Width, x))
    +#'     data.frame(t(coef(model)))
    +#'   }, schema, df$"Species")
    +#' collect(df1)
    +#'
    +#'Result
    +#'---------
    +#' Model  (Intercept)  Sepal_Width  Petal_Length  Petal_Width
    +#' 1        0.699883    0.3303370    0.9455356    -0.1697527
    +#' 2        1.895540    0.3868576    0.9083370    -0.6792238
    +#' 3        2.351890    0.6548350    0.2375602     0.2521257
    +#'
    +#'}
    +setMethod("gapply",
    +          signature(x = "SparkDataFrame", func = "function", schema = "structType",
    +                    col = "Column"),
    --- End diff --
    
    can we handle multiple columns now that we are using repartition ?




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217611334
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58059/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217602308
  
    **[Test build #58050 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58050/consoleFull)** for PR 12966 at commit [`bf3a74d`](https://github.com/apache/spark/commit/bf3a74d34b21eaa6c3d1422c1135658d9be58a8a).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217739297
  
    Thanks, @sun-rui!
    
    Yes, that seems to be the case, @sun-rui.
    I recently hit a case where the number of partitions was less than the number of actual groups.
    
    I tried the same thing on my previous implementation with groupByKey -> flatMap, and it works fine regardless of the repartitioning.
    
    Maybe @davies has some suggestions about this.
    If there is a partitioner that guarantees each partition holds exactly one group, then we can use it; otherwise this approach won't give the expected result.





[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217588851
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58041/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217593698
  
    **[Test build #58043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58043/consoleFull)** for PR 12966 at commit [`9704956`](https://github.com/apache/spark/commit/97049564433607544beef439ddce272f607298d9).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217578938
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58035/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217602310
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217699516
  
    @NarineK, it is guaranteed that all items in the same group end up in the same partition. But it is not guaranteed that a partition contains only a single group; there can be multiple groups in one partition.
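
    That behavior can be illustrated with a toy partitioner in plain Python (the modular scheme below is an assumption for illustration, not Spark's actual HashPartitioner): every row with a given key lands in the same partition, but with fewer partitions than distinct keys, some partition must hold more than one group.

```python
# Toy modular "hash" partitioner: rows with equal keys always land in the
# same partition (co-location), but nothing prevents one partition from
# holding several distinct groups.
NUM_PARTITIONS = 2
keys = ["setosa", "versicolor", "virginica"]

def partition_of(key):
    # Deterministic stand-in for a hash (Python's str hash is salted).
    return sum(map(ord, key)) % NUM_PARTITIONS

partitions = {}
for k in keys:
    for _ in range(3):  # several rows per key
        partitions.setdefault(partition_of(k), set()).add(k)

# Co-location holds by construction, since partition_of depends only on
# the key. With 3 distinct keys and 2 partitions, the pigeonhole
# principle forces some partition to contain at least 2 groups.
max_groups_per_partition = max(len(g) for g in partitions.values())
```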




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12966#discussion_r62400586
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1214,6 +1214,77 @@ setMethod("dapply",
                 dataFrame(sdf)
               })
     
    +#' gapply
    +#'
    +#' Apply a function to each group of a DataFrame. The group is defined by an input
    +#' grouping column(s).
    +#'
    +#' @param x A SparkDataFrame
    +#' @param func A function to be applied to each group partition specified by grouping
    +#'             column(s) of the SparkDataFrame.
    +#'             The output of func is a local R data.frame.
    +#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
    +#'               It must match the output of func.
    +#' @family SparkDataFrame functions
    +#' @rdname gapply
    +#' @name gapply
    +#' @export
    +#' @examples
    +#'
    +#' \dontrun{
    +#'
    +#' Computes the arithmetic mean of `Sepal_Width` by grouping
    +#' on `Species`. Output the grouping value and the average.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <-  structType(structField("Species", "string"), structField("Avg", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     data.frame(x$Species[1], mean(x$Sepal_Width), stringsAsFactors = FALSE)
    +#'   },
    +#'   schema, col=df$"Species")
    +#' collect(df1)
    +#'
    +#' Species      Avg
    +#' -----------------
    +#' virginica   2.974
    +#' versicolor  2.770
    +#' setosa      3.428
    +#'
    +#' Fits linear models on iris dataset by grouping on the `Species` column and
    +#' using `Sepal_Length` as a target variable, `Sepal_Width`, `Petal_Length`
    +#' and `Petal_Width` as training features.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <- structType(structField("(Intercept)", "double"),
    +#'   structField("Sepal_Width", "double"), structField("Petal_Length", "double"),
    +#'   structField("Petal_Width", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     model <- suppressWarnings(lm(Sepal_Length ~
    +#'     Sepal_Width + Petal_Length + Petal_Width, x))
    +#'     data.frame(t(coef(model)))
    +#'   }, schema, df$"Species")
    +#' collect(df1)
    +#'
    +#'Result
    +#'---------
    +#' Model  (Intercept)  Sepal_Width  Petal_Length  Petal_Width
    +#' 1        0.699883    0.3303370    0.9455356    -0.1697527
    +#' 2        1.895540    0.3868576    0.9083370    -0.6792238
    +#' 3        2.351890    0.6548350    0.2375602     0.2521257
    +#'
    +#'}
    +setMethod("gapply",
    +          signature(x = "SparkDataFrame", func = "function", schema = "structType",
    +                    col = "Column"),
    --- End diff --
    
    I might have an issue in the signature, I'll fix it




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217611333
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217958625
  
    Well, since the user provides the R function, I think they should provide the aggregation too.
    Instead of providing:
    ```
    function(x) {
      data.frame(x$Species[1], mean(x$Sepal_Width), stringsAsFactors = FALSE)
    }
    ```
    
    they would provide:
    ```
    function(x) {
      data.frame(aggregate(x$Sepal_Width, by = list(x$Species), FUN = mean), stringsAsFactors = FALSE)
    }
    ```
    
    This is my understanding; there may also be other ways.
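
    A plain-Python sketch (written for illustration, mirroring what R's aggregate() does in the snippet above) of why this works: when the user function aggregates by key itself, it emits one row per group even if its input partition mixes several groups.

```python
# If the user function does the aggregation itself (as aggregate() does
# in the R snippet above), it returns one result per group even when its
# input partition contains several groups.
def func(partition_rows):
    sums, counts = {}, {}
    for species, width in partition_rows:
        sums[species] = sums.get(species, 0.0) + width
        counts[species] = counts.get(species, 0) + 1
    # One mean per distinct species seen in this partition.
    return {k: sums[k] / counts[k] for k in sums}

# One partition that happens to contain two groups:
mixed_partition = [("versicolor", 2.0), ("versicolor", 3.0),
                   ("virginica", 3.0)]
per_group_means = func(mixed_partition)
```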




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217593702
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217588848
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217592573
  
    **[Test build #58043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58043/consoleFull)** for PR 12966 at commit [`9704956`](https://github.com/apache/spark/commit/97049564433607544beef439ddce272f607298d9).




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217587290
  
    **[Test build #58041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58041/consoleFull)** for PR 12966 at commit [`204a105`](https://github.com/apache/spark/commit/204a1053dabd74d39a25e725276e31bb3a592917).




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217580578
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217580569
  
    **[Test build #58037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58037/consoleFull)** for PR 12966 at commit [`30693c2`](https://github.com/apache/spark/commit/30693c2b40ab459a9fe252a2d00595c8190f2094).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217593705
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58043/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217611993
  
    Change LGTM. Thanks @NarineK 
    
    cc @sun-rui @felixcheung @davies - any other comments on this ?




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217577994
  
    **[Test build #58037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58037/consoleFull)** for PR 12966 at commit [`30693c2`](https://github.com/apache/spark/commit/30693c2b40ab459a9fe252a2d00595c8190f2094).




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217945476
  
    I see, that sounds good too, but I'm not sure how user-friendly it will be.
    I guess on the R side we need something like this:
    df <- createDataFrame(sqlContext, iris)
    schema <- structType(structField("Species", "string"), structField("avg", "double"))
    df <- repartition(df, col = df$"Species")
    df1 <- dapply(
        df,
        function(x) {
            data.frame(aggregate(x$Sepal_Width, by = list(x$Species), FUN = mean), stringsAsFactors = FALSE)
        },
        schema)
    collect(df1)




[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217602311
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58050/
    Test FAILed.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12966#discussion_r62400077
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1214,6 +1214,77 @@ setMethod("dapply",
                 dataFrame(sdf)
               })
     
    +#' gapply
    +#'
    +#' Apply a function to each group of a DataFrame. The group is defined by an input
    +#' grouping column(s).
    +#'
    +#' @param x A SparkDataFrame
    +#' @param func A function to be applied to each group partition specified by grouping
    +#'             column(s) of the SparkDataFrame.
    +#'             The output of func is a local R data.frame.
    +#' @param schema The schema of the resulting SparkDataFrame after the function is applied.
    +#'               It must match the output of func.
    +#' @family SparkDataFrame functions
    +#' @rdname gapply
    +#' @name gapply
    +#' @export
    +#' @examples
    +#'
    +#' \dontrun{
    +#'
    +#' Computes the arithmetic mean of `Sepal_Width` by grouping
    +#' on `Species`. Output the grouping value and the average.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <-  structType(structField("Species", "string"), structField("Avg", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     data.frame(x$Species[1], mean(x$Sepal_Width), stringsAsFactors = FALSE)
    +#'   },
    +#'   schema, col=df$"Species")
    +#' collect(df1)
    +#'
    +#' Species      Avg
    +#' -----------------
    +#' virginica   2.974
    +#' versicolor  2.770
    +#' setosa      3.428
    +#'
    +#' Fits linear models on iris dataset by grouping on the `Species` column and
    +#' using `Sepal_Length` as a target variable, `Sepal_Width`, `Petal_Length`
    +#' and `Petal_Width` as training features.
    +#'
    +#' df <- createDataFrame (sqlContext, iris)
    +#' schema <- structType(structField("(Intercept)", "double"),
    +#'   structField("Sepal_Width", "double"), structField("Petal_Length", "double"),
    +#'   structField("Petal_Width", "double"))
    +#' df1 <- gapply(
    +#'   df,
    +#'   function(x) {
    +#'     model <- suppressWarnings(lm(Sepal_Length ~
    +#'     Sepal_Width + Petal_Length + Petal_Width, x))
    +#'     data.frame(t(coef(model)))
    +#'   }, schema, df$"Species")
    +#' collect(df1)
    +#'
    +#'Result
    +#'---------
    +#' Model  (Intercept)  Sepal_Width  Petal_Length  Petal_Width
    +#' 1        0.699883    0.3303370    0.9455356    -0.1697527
    +#' 2        1.895540    0.3868576    0.9083370    -0.6792238
    +#' 3        2.351890    0.6548350    0.2375602     0.2521257
    +#'
    +#'}
    +setMethod("gapply",
    +          signature(x = "SparkDataFrame", func = "function", schema = "structType",
    +                    col = "Column"),
    --- End diff --
    
    yes, absolutely!



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217578915
  
    **[Test build #58035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58035/consoleFull)** for PR 12966 at commit [`be5de6a`](https://github.com/apache/spark/commit/be5de6a42e50c42b4af15b87623bc7b49aecb353).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217955181
  
    What do you think, @shivaram , @sun-rui , @felixcheung  ?



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-221144808
  
    @NarineK, I think we can close this PR?



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217956433
  
    I am not sure why it should affect SparkR users. Users will still pass the same function to `gapply` as before, but we will implement it on the SparkR side using the `aggregate` code snippet above?



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-221157067
  
    sure!



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217632210
  
    @NarineK, @shivaram, sorry for missing the previous discussion. The problem is that repartition() can leave multiple groups in a partition, which is not what gapply() is meant to handle. Am I missing something?



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217588838
  
    **[Test build #58041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58041/consoleFull)** for PR 12966 at commit [`204a105`](https://github.com/apache/spark/commit/204a1053dabd74d39a25e725276e31bb3a592917).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217578934
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217612497
  
    Thanks, @shivaram !
    I have one question regarding factor datatypes. It seems that SparkR doesn't support them, and we have to set `stringsAsFactors = FALSE` in order to avoid "UnsupportedType" exceptions in data.frame.
    
    Is it possible to map a factor to, say, a list or a string?
    
    R's data.frame converts strings to factors by default, but a factor in general doesn't have to be a string.
    
    Here is a snippet from R's documentation:
    
    `Character variables passed to data.frame are converted to factor columns unless protected by I or argument stringsAsFactors is false. 
    `
    https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217580580
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58037/
    Test FAILed.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217939409
  
    @NarineK After repartition(), all the rows with the same grouping key are in the same partition, so we could do another groupBy() in R within each partition and then run the aggregate.
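    The two-stage approach described above (hash-partition by key, then a second group-by within each partition) can be sketched outside Spark in plain Python; this is only an illustrative model of the idea, not SparkR code, and all names in it are made up:

    ```python
    # Model of davies' suggestion: after repartitioning by key, a partition may
    # still hold several distinct keys, so each partition must be grouped again
    # before the user's aggregate function runs.
    from collections import defaultdict

    rows = [
        ("setosa", 3.5), ("setosa", 3.0),
        ("virginica", 3.3), ("versicolor", 3.2),
        ("virginica", 2.7), ("versicolor", 2.8),
    ]
    NUM_PARTITIONS = 2

    # Stage 1: hash-partition by the grouping key, so every row with a given
    # key lands in exactly one partition (what repartition(df, col) guarantees).
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[hash(key) % NUM_PARTITIONS].append((key, value))

    # Stage 2: group within each partition, then aggregate each group
    # (here: the mean, standing in for the user-supplied function).
    means = {}
    for part in partitions.values():
        groups = defaultdict(list)
        for key, value in part:
            groups[key].append(value)
        for key, values in groups.items():
            means[key] = sum(values) / len(values)
    ```

    Because stage 1 never splits a key across partitions, stage 2 sees every row for a key at once, which is exactly the property gapply needs.
    
    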



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217932852
  
    @NarineK There is no such partitioner right now (it can't be cheap).



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217971884
  
    This is similar to the following test case for repartition:
    https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala#L1238



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-218022565
  
    I suggest we follow the original design. I will take a detailed look at the previous PR soon.



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK closed the pull request at:

    https://github.com/apache/spark/pull/12966



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217575956
  
    **[Test build #58035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58035/consoleFull)** for PR 12966 at commit [`be5de6a`](https://github.com/apache/spark/commit/be5de6a42e50c42b4af15b87623bc7b49aecb353).



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217601769
  
    **[Test build #58050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58050/consoleFull)** for PR 12966 at commit [`bf3a74d`](https://github.com/apache/spark/commit/bf3a74d34b21eaa6c3d1422c1135658d9be58a8a).



[GitHub] spark pull request: [SPARK-15196][SparkR] Add a wrapper for dapply...

Posted by NarineK <gi...@git.apache.org>.
Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12966#issuecomment-217662459
  
    Hi @sun-rui,
    I think it depends on how we do the repartitioning. It shouldn't happen when we do it via:
    https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2191
    @davies, can that be the case?

