Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/04/30 12:46:49 UTC

[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/12813

    [SPARK-15030] [ML] [SparkR] Support formula in spark.kmeans in SparkR

    ## What changes were proposed in this pull request?
    * Support formula in ```spark.kmeans``` in SparkR.
    * Fix some outdated docs for SparkR.
    
    ## How was this patch tested?
    Unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-15030

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12813
    
----
commit 2080090e7a2ec55eb0c2a9f9be79db76f8a98189
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-30T09:46:22Z

    Support formula in spark.kmeans in SparkR

commit 12dfae14670e3c526530c4aa70810d2e45195b73
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-30T10:06:03Z

    fix some docs

commit f1ba442350378296d07855a71e2299bd9fecde11
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-30T10:43:33Z

    update some docs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215964440
  
    **[Test build #57445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57445/consoleFull)** for PR 12813 at commit [`79d1be4`](https://github.com/apache/spark/commit/79d1be46a518010247a722a51868f7d71bb56557).



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215956645
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57439/
    Test FAILed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215966562
  
    **[Test build #57446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57446/consoleFull)** for PR 12813 at commit [`5bdce92`](https://github.com/apache/spark/commit/5bdce92bb427d5a665fba9eb20b6c6ec9bb3428b).



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215971073
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57446/
    Test PASSed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215956643
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by yanboliang <gi...@git.apache.org>.

GitHub user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61669864
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -271,22 +274,25 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     #' Fit a k-means model, similarly to R's kmeans().
     #'
     #' @param data SparkDataFrame for training
    -#' @param k Number of centers
    -#' @param maxIter Maximum iteration number
    -#' @param initializationMode Algorithm choosen to fit the model
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#'                Note that the response variable of formula is empty in spark.kmeans.
    +#' @param centers Number of centers
    +#' @param iter.max Maximum iteration number
    +#' @param algorithm The initialization algorithm choosen to fit the model
     #' @return A fitted k-means model
     #' @rdname spark.kmeans
     #' @export
     #' @examples
     #' \dontrun{
    -#' model <- spark.kmeans(data, k = 2, initializationMode="random")
    +#' model <- spark.kmeans(data, ~ ., centers = 2, algorithm="random")
     #' }
    -setMethod("spark.kmeans", signature(data = "SparkDataFrame"),
    -          function(data, k, maxIter = 10, initializationMode = c("random", "k-means||")) {
    -            columnNames <- as.array(colnames(data))
    -            initializationMode <- match.arg(initializationMode)
    -            jobj <- callJStatic("org.apache.spark.ml.r.KMeansWrapper", "fit", data@sdf,
    -                                k, maxIter, initializationMode, columnNames)
    +setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, centers, iter.max = 10, algorithm = c("random", "k-means||")) {
    --- End diff --
    
    In ```spark.glm```, we use
    ```
    function(data, formula, family = gaussian, epsilon = 1e-06, maxit = 25)
    ```
    Should we also rename these parameters in this PR so the naming is consistent?



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215963096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57442/
    Test FAILed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215970154
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215963089
  
    **[Test build #57442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57442/consoleFull)** for PR 12813 at commit [`f1ba442`](https://github.com/apache/spark/commit/f1ba442350378296d07855a71e2299bd9fecde11).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215961834
  
    **[Test build #57442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57442/consoleFull)** for PR 12813 at commit [`f1ba442`](https://github.com/apache/spark/commit/f1ba442350378296d07855a71e2299bd9fecde11).



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by mengxr <gi...@git.apache.org>.

GitHub user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215973911
  
    LGTM. Merged into master. Thanks!



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by yanboliang <gi...@git.apache.org>.

GitHub user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61669670
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -271,22 +274,25 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     #' Fit a k-means model, similarly to R's kmeans().
     #'
     #' @param data SparkDataFrame for training
    -#' @param k Number of centers
    -#' @param maxIter Maximum iteration number
    -#' @param initializationMode Algorithm choosen to fit the model
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#'                Note that the response variable of formula is empty in spark.kmeans.
    +#' @param centers Number of centers
    +#' @param iter.max Maximum iteration number
    +#' @param algorithm The initialization algorithm choosen to fit the model
     #' @return A fitted k-means model
     #' @rdname spark.kmeans
     #' @export
     #' @examples
     #' \dontrun{
    -#' model <- spark.kmeans(data, k = 2, initializationMode="random")
    +#' model <- spark.kmeans(data, ~ ., centers = 2, algorithm="random")
     #' }
    -setMethod("spark.kmeans", signature(data = "SparkDataFrame"),
    -          function(data, k, maxIter = 10, initializationMode = c("random", "k-means||")) {
    -            columnNames <- as.array(colnames(data))
    -            initializationMode <- match.arg(initializationMode)
    -            jobj <- callJStatic("org.apache.spark.ml.r.KMeansWrapper", "fit", data@sdf,
    -                                k, maxIter, initializationMode, columnNames)
    +setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, centers, iter.max = 10, algorithm = c("random", "k-means||")) {
    --- End diff --
    
    I vote for ```max.iter/init.mode``` and will update the PR.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by mengxr <gi...@git.apache.org>.

GitHub user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61669606
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -271,22 +274,25 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     #' Fit a k-means model, similarly to R's kmeans().
     #'
     #' @param data SparkDataFrame for training
    -#' @param k Number of centers
    -#' @param maxIter Maximum iteration number
    -#' @param initializationMode Algorithm choosen to fit the model
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#'                Note that the response variable of formula is empty in spark.kmeans.
    +#' @param centers Number of centers
    +#' @param iter.max Maximum iteration number
    +#' @param algorithm The initialization algorithm choosen to fit the model
     #' @return A fitted k-means model
     #' @rdname spark.kmeans
     #' @export
     #' @examples
     #' \dontrun{
    -#' model <- spark.kmeans(data, k = 2, initializationMode="random")
    +#' model <- spark.kmeans(data, ~ ., centers = 2, algorithm="random")
     #' }
    -setMethod("spark.kmeans", signature(data = "SparkDataFrame"),
    -          function(data, k, maxIter = 10, initializationMode = c("random", "k-means||")) {
    -            columnNames <- as.array(colnames(data))
    -            initializationMode <- match.arg(initializationMode)
    -            jobj <- callJStatic("org.apache.spark.ml.r.KMeansWrapper", "fit", data@sdf,
    -                                k, maxIter, initializationMode, columnNames)
    +setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, centers, iter.max = 10, algorithm = c("random", "k-means||")) {
    --- End diff --
    
    Since `spark.kmeans` already indicates that this method is different from R's `kmeans`, we made the param names consistent with MLlib params. In particular, the change from `algorithm` to `initMode` makes more sense. We should discuss whether to use `maxIter`/`initMode` or `max.iter`/`init.mode` as param names.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by mengxr <gi...@git.apache.org>.

GitHub user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61669769
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -271,22 +274,25 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
     #' Fit a k-means model, similarly to R's kmeans().
     #'
     #' @param data SparkDataFrame for training
    -#' @param k Number of centers
    -#' @param maxIter Maximum iteration number
    -#' @param initializationMode Algorithm choosen to fit the model
    +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
    +#'                operators are supported, including '~', '.', ':', '+', and '-'.
    +#'                Note that the response variable of formula is empty in spark.kmeans.
    +#' @param centers Number of centers
    +#' @param iter.max Maximum iteration number
    +#' @param algorithm The initialization algorithm choosen to fit the model
     #' @return A fitted k-means model
     #' @rdname spark.kmeans
     #' @export
     #' @examples
     #' \dontrun{
    -#' model <- spark.kmeans(data, k = 2, initializationMode="random")
    +#' model <- spark.kmeans(data, ~ ., centers = 2, algorithm="random")
     #' }
    -setMethod("spark.kmeans", signature(data = "SparkDataFrame"),
    -          function(data, k, maxIter = 10, initializationMode = c("random", "k-means||")) {
    -            columnNames <- as.array(colnames(data))
    -            initializationMode <- match.arg(initializationMode)
    -            jobj <- callJStatic("org.apache.spark.ml.r.KMeansWrapper", "fit", data@sdf,
    -                                k, maxIter, initializationMode, columnNames)
    +setMethod("spark.kmeans", signature(data = "SparkDataFrame", formula = "formula"),
    +          function(data, formula, centers, iter.max = 10, algorithm = c("random", "k-means||")) {
    --- End diff --
    
    We need to be consistent with other SparkR APIs. For example, we have `approxQuantile(x, col, probabilities, relativeError)` and `insertInto(x, tableName)`. Given those existing APIs, I think we should use `maxIter` and `initMode` here, though I haven't scanned all existing methods.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215971072
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by yanboliang <gi...@git.apache.org>.

GitHub user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215961694
  
    Jenkins, test this please.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215956628
  
    **[Test build #57439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57439/consoleFull)** for PR 12813 at commit [`f1ba442`](https://github.com/apache/spark/commit/f1ba442350378296d07855a71e2299bd9fecde11).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by asfgit <gi...@git.apache.org>.

GitHub user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12813



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215970127
  
    **[Test build #57445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57445/consoleFull)** for PR 12813 at commit [`79d1be4`](https://github.com/apache/spark/commit/79d1be46a518010247a722a51868f7d71bb56557).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215954387
  
    **[Test build #57439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57439/consoleFull)** for PR 12813 at commit [`f1ba442`](https://github.com/apache/spark/commit/f1ba442350378296d07855a71e2299bd9fecde11).



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by SparkQA <gi...@git.apache.org>.

GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215971047
  
    **[Test build #57446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57446/consoleFull)** for PR 12813 at commit [`5bdce92`](https://github.com/apache/spark/commit/5bdce92bb427d5a665fba9eb20b6c6ec9bb3428b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215970155
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57445/
    Test PASSed.



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by yanboliang <gi...@git.apache.org>.

GitHub user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12813#discussion_r61670513
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/stat/JavaStatisticsSuite.java ---
    @@ -72,7 +72,7 @@ public void testCorr() {
         Double corr1 = Statistics.corr(x, y);
         Double corr2 = Statistics.corr(x, y, "pearson");
         // Check default method
    -    assertEquals(corr1, corr2);
    +    assertEquals(corr1, corr2, 1e-5);
    --- End diff --
    
    Fix the unstable test BTW.
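
The one-line change above swaps an exact comparison of two `Double` values for a comparison within a tolerance of `1e-5`. As a rough illustration of why that can stabilize a test, the sketch below contrasts the two styles without depending on JUnit. The class `ToleranceCheck` and the helper `approxEquals` are hypothetical names introduced only for this example; they are not part of Spark or JUnit, and the helper only approximates what a delta-based assertion checks.

```java
// Hypothetical, self-contained sketch; not code from the pull request.
class ToleranceCheck {

    // Returns true when the two values differ by at most `delta`.
    // This approximates the check a delta-based assertion performs.
    static boolean approxEquals(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        // Two values that are mathematically equal but are computed
        // along different paths and differ in the last bits.
        Double a = 0.1 + 0.2;
        Double b = 0.3;

        // Exact comparison of boxed doubles: fails for tiny rounding differences.
        System.out.println("exact equals:      " + a.equals(b));

        // Tolerance-based comparison: accepts differences up to 1e-5.
        System.out.println("within tolerance:  " + approxEquals(a, b, 1e-5));
    }
}
```

If the two `Statistics.corr` calls accumulate floating-point rounding in a different order from run to run, an exact comparison may fail intermittently, while a small tolerance would absorb that noise. Whether that was the actual cause of the instability is not stated in the thread.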



[GitHub] spark pull request: [SPARK-15030] [ML] [SparkR] Support formula in...

Posted by AmplabJenkins <gi...@git.apache.org>.

GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12813#issuecomment-215963095
  
    Merged build finished. Test FAILed.

