Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/05/30 01:39:47 UTC

[GitHub] spark pull request #18140: Spark r formula

GitHub user actuaryzhang opened a pull request:

    https://github.com/apache/spark/pull/18140

    Spark r formula

    ## What changes were proposed in this pull request?
    
    Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` so that string encoding can be made consistent with R's default behavior.
    
    ## How was this patch tested?
    New tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRFormula

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18140
    
----
commit be7a0fb993ad1fbe60576cd39ca86b20d45289a6
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-28T01:39:51Z

    add stringIndexerOrderType to SparkR glm and test result consistency with R

commit 826e784e3bf83c3b9a84fc7d9500d15971a7ffd8
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-30T01:36:39Z

    add stringIndexerOrderType to survreg

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77526/testReport)** for PR 18140 at commit [`0109aaf`](https://github.com/apache/spark/commit/0109aaf16b9035b0c6e491cd3147fa6ced8bafe6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78232/
    Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77601/testReport)** for PR 18140 at commit [`65f69fa`](https://github.com/apache/spark/commit/65f69fa26d5483300abffdca75f5171dfa42fb77).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by actuaryzhang <gi...@git.apache.org>.
GitHub user actuaryzhang reopened a pull request:

    https://github.com/apache/spark/pull/18140

    [SPARK-20917][ML][SparkR] SparkR supports string encoding consistent with R

    ## What changes were proposed in this pull request?
    
    Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` so that string encoding can be made consistent with R's default behavior.
    
    ## How was this patch tested?
    New tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRFormula

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18140
    
----
commit aba1429c48580ed19ae0a653830d065c681b7150
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-28T01:39:51Z

    add stringIndexerOrderType to SparkR glm and test result consistency with R

commit 49e50849ac7566aad9eb251535a29a59b659a68a
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-30T01:36:39Z

    add stringIndexerOrderType to survreg

commit cdc6c377ada3187111cdf984e8cd595ba78b69dc
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-30T02:52:22Z

    fix test

commit 18cbeb79b7cbf12a6d77110673312b82edbed92a
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-30T07:39:49Z

    address comments on doc

commit 6ae4d56592aef607a9e6d29b11fbb703bc4b971c
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-31T16:50:44Z

    add match arg

commit 3c1b85eb4db97723576927a2f972543c7ae69678
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-05-31T18:31:43Z

    add match arg in survreg

commit f33d0eafa5fc2a0b806c7016b42574045c3261af
Author: actuaryzhang <ac...@gmail.com>
Date:   2017-06-19T17:08:10Z

    address comments

----




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    merged to master




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119285081
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib_regression.R ---
    @@ -379,6 +379,49 @@ test_that("glm save/load", {
       unlink(modelPath)
     })
     
    +test_that("spark.glm and glm with string encoding", {
    --- End diff --
    
    Added. Thank you!




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    do you want to bring this up to date?




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119021523
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -70,6 +70,12 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
     #'                      the relationship between the variance and mean of the distribution. Only
     #'                      applicable to the Tweedie family.
     #' @param link.power the index in the power link function. Only applicable to the Tweedie family.
    +#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to
    +#'                               decide the base level of a string feature as the last category after
    +#'                               ordering is dropped when encoding strings. Supported options are
    +#'                               'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
    --- End diff --
    
    let's quote with `"` 
    optionally, use `\code{}`
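The rule described in the `@param` text above (order the categories, then drop the last one as the base level) can be illustrated with a minimal base-R sketch. `order_categories` and `base_level` are hypothetical helpers written for this illustration, not SparkR functions, and the tie-breaking among equally frequent categories is an assumption. The sketch suggests why `"alphabetDesc"` reproduces R's default, where the alphabetically first level is the reference level.

```r
# Hypothetical helper (not part of SparkR): order the distinct categories of a
# string vector under one of the four documented options.
order_categories <- function(x, type = c("frequencyDesc", "frequencyAsc",
                                         "alphabetDesc", "alphabetAsc")) {
  type <- match.arg(type)
  counts <- table(x)          # names are the distinct categories
  labels <- names(counts)
  switch(type,
    frequencyDesc = labels[order(-as.integer(counts))],
    frequencyAsc  = labels[order(as.integer(counts))],
    alphabetDesc  = sort(labels, decreasing = TRUE),
    alphabetAsc   = sort(labels))
}

# Hypothetical helper: the base level is the last category after ordering.
base_level <- function(x, type) {
  ordered <- order_categories(x, type)
  ordered[length(ordered)]
}

sex <- c("Male", "Female", "Male", "Male")

# Base level under each option.
sapply(c("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc"),
       function(type) base_level(sex, type))

# R's default treatment contrasts use the first factor level as the base.
r_base <- levels(factor(sex))[1]
identical(base_level(sex, "alphabetDesc"), r_base)
```

With this toy data, `"frequencyDesc"` happens to agree with R as well, but only because the rarest category is also the alphabetically first one; in general only `"alphabetDesc"` follows R's convention.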




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r122632631
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -443,10 +478,14 @@ setMethod("write.ml", signature(object = "IsotonicRegressionModel", path = "char
     #' }
     #' @note spark.survreg since 2.0.0
     setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula"),
    -          function(data, formula, aggregationDepth = 2) {
    +          function(data, formula, aggregationDepth = 2,
    +                   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
    +                                              "alphabetDesc", "alphabetAsc")) {
    +            stringIndexerOrderType <- match.arg(stringIndexerOrderType)
                 formula <- paste(deparse(formula), collapse = "")
                 jobj <- callJStatic("org.apache.spark.ml.r.AFTSurvivalRegressionWrapper",
    -                                "fit", formula, data@sdf, as.integer(aggregationDepth))
    +                                "fit", formula, data@sdf, as.integer(aggregationDepth),
    +                                as.character(stringIndexerOrderType))
    --- End diff --
    
    ditto




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Ah I didn't know it could be omitted with match.arg.
    
    What does it pick when it is not specified? The first one?
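On the question above: in base R, `match.arg` returns the first element of the choices vector when the argument is left at its default. A minimal sketch follows; `resolve_order_type` is a hypothetical function that only mirrors the signature pattern from the diff and resolves the option without fitting a model.

```r
# Hypothetical function: same default-vector pattern as in the diff, but it
# only returns the resolved option.
resolve_order_type <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                          "alphabetDesc", "alphabetAsc")) {
  match.arg(stringIndexerOrderType)
}

# Argument omitted: the first choice is selected.
default_choice <- resolve_order_type()

# Argument supplied: it is validated against the choices and returned.
explicit_choice <- resolve_order_type("alphabetDesc")

# An unsupported value raises an error instead of being passed through.
invalid_choice <- tryCatch(resolve_order_type("random"),
                           error = function(e) "error")

print(c(default_choice, explicit_choice, invalid_choice))
```

Under this pattern the function signature itself documents the supported values, and callers who omit the argument get `"frequencyDesc"`.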
    
    





[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77526/testReport)** for PR 18140 at commit [`0109aaf`](https://github.com/apache/spark/commit/0109aaf16b9035b0c6e491cd3147fa6ced8bafe6).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    can you kick AppVeyor?




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang closed the pull request at:

    https://github.com/apache/spark/pull/18140




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18140




[GitHub] spark issue #18140: [ML][SparkR] SparkR supports string encoding consistent ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77526/
    Test PASSed.




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r122632729
  
    --- Diff: R/pkg/tests/fulltests/test_mllib_regression.R ---
    @@ -367,6 +367,51 @@ test_that("glm save/load", {
       unlink(modelPath)
     })
     
    +test_that("spark.glm and glm with string encoding", {
    +  skip_on_cran()
    --- End diff --
    
    sorry, no longer needed




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Oh, great. Did that, and the checks pass now.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77595/testReport)** for PR 18140 at commit [`5aa8946`](https://github.com/apache/spark/commit/5aa8946f740135b90376154aab81ab182b3ba888).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77601/testReport)** for PR 18140 at commit [`65f69fa`](https://github.com/apache/spark/commit/65f69fa26d5483300abffdca75f5171dfa42fb77).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77516/
    Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #78261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78261/testReport)** for PR 18140 at commit [`f33d0ea`](https://github.com/apache/spark/commit/f33d0eafa5fc2a0b806c7016b42574045c3261af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r122632393
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -145,7 +163,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
                 jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
                                     "fit", formula, data@sdf, tolower(family$family), family$link,
                                     tol, as.integer(maxIter), weightCol, regParam,
    -                                as.double(var.power), as.double(link.power))
    +                                as.double(var.power), as.double(link.power),
    +                                as.character(stringIndexerOrderType))
    --- End diff --
    
    nit: I think we don't need `as.character` now as `stringIndexerOrderType` is from `match.arg`?
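The nit above can be checked in base R: the value returned by `match.arg` is already a length-one character vector, so wrapping it in `as.character` changes nothing. A small sketch, with the `choices` vector copied from the diff and the variable names chosen for illustration:

```r
# Choices as listed in the diff.
choices <- c("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")

# match.arg validates the value and returns it as a character vector.
picked <- match.arg("alphabetDesc", choices)

already_character <- is.character(picked) && length(picked) == 1L

# as.character() on a plain character vector returns an identical value.
conversion_is_noop <- identical(picked, as.character(picked))

print(c(already_character, conversion_is_noop))
```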




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #78232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78232/testReport)** for PR 18140 at commit [`3c1b85e`](https://github.com/apache/spark/commit/3c1b85eb4db97723576927a2f972543c7ae69678).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    you can close and re-open this PR on github here




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    @felixcheung Please take a look. Thanks. 




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119403261
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
     #' @seealso \link{glm}, \link{read.ml}
     setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
    -                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
    +                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
    +                   stringIndexerOrderType = "frequencyDesc") {
    --- End diff --
    
    actually, I meant something like https://github.com/actuaryzhang/spark/blob/66bc786add41df52baead5a7d38b0b6b035d764d/R/pkg/R/mllib_clustering.R#L167
    
    but then we will need to tweak it to have a default value
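
A small sketch of the `match.arg` pattern being suggested, in base R only. The function name `pick_order_type` is hypothetical. Listing the allowed values as the default gives a default for free (the first element) and rejects anything outside the list.

```r
# Hypothetical helper; the choices mirror the four values listed in this PR.
pick_order_type <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                       "alphabetDesc", "alphabetAsc")) {
  match.arg(stringIndexerOrderType)
}

# Omitting the argument selects the first choice as the default.
default_value <- pick_order_type()

# A supported value is passed through unchanged.
explicit_value <- pick_order_type("alphabetAsc")

# An unsupported value raises an error instead of reaching the backend.
rejected <- inherits(
  tryCatch(pick_order_type("alphabetical"), error = function(e) e),
  "error"
)
```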




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77595/
    Test FAILed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #78261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78261/testReport)** for PR 18140 at commit [`f33d0ea`](https://github.com/apache/spark/commit/f33d0eafa5fc2a0b806c7016b42574045c3261af).




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Thanks for the comments. Addressed them in the new commit. 




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r122632556
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -182,9 +207,13 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
     #' @seealso \link{spark.glm}
     setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDataFrame"),
               function(formula, family = gaussian, data, epsilon = 1e-6, maxit = 25, weightCol = NULL,
    -                   var.power = 0.0, link.power = 1.0 - var.power) {
    +                   var.power = 0.0, link.power = 1.0 - var.power,
    +                   stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
    +                                              "alphabetDesc", "alphabetAsc")) {
    +            stringIndexerOrderType <- match.arg(stringIndexerOrderType)
    --- End diff --
    
    maybe we don't need this here, since we are calling spark.glm, which will do the same check
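
A base-R sketch of why the outer check can be dropped, assuming two hypothetical functions `inner_fit` and `outer_fit` that stand in for `spark.glm` and `glm`. When the wrapper forwards its argument untouched, the inner `match.arg` still resolves the default and still rejects bad values.

```r
choices_doc <- c("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")

# Stand-in for spark.glm: the only place that validates the argument.
inner_fit <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                 "alphabetDesc", "alphabetAsc")) {
  match.arg(stringIndexerOrderType)
}

# Stand-in for glm: forwards the argument without calling match.arg itself.
outer_fit <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                 "alphabetDesc", "alphabetAsc")) {
  inner_fit(stringIndexerOrderType = stringIndexerOrderType)
}

# The full default vector is forwarded; match.arg sees it is identical to
# the inner choices and returns the first element.
via_default <- outer_fit()
via_explicit <- outer_fit("alphabetAsc")

# Invalid input is still caught, by the inner function.
outer_rejects <- inherits(tryCatch(outer_fit("bogus"), error = function(e) e), "error")
```

This only works as long as both signatures list the same choices in the same order; if they drift apart, the forwarded default would no longer be identical to the inner choices.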




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77601/
    Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Simple example to illustrate:
    ```
    > df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))
    > rModel <- stats::glm(Freq ~ Sex + Age, family = "gaussian", data = as.data.frame(df))
    > summary(rModel)$coefficients
                  Estimate Std. Error   t value    Pr(>|t|)
    (Intercept)   91.34375   35.99417  2.537737 0.016790098
    SexMale       78.81250   41.56249  1.896241 0.067931094
    AgeChild    -123.93750   41.56249 -2.981956 0.005752153
     
    > model <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian")
    > summary(model)$coefficients
                 Estimate Std. Error    t value    Pr(>|t|)
    (Intercept) -32.59375   35.99417 -0.9055286 0.372647658
    Sex_Male     78.81250   41.56249  1.8962412 0.067931094
    Age_Adult   123.93750   41.56249  2.9819558 0.005752153
    
    > model2 <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian",
    +                     stringIndexerOrderType = "alphabetDesc")
    > summary(model2)$coefficients
                  Estimate Std. Error   t value    Pr(>|t|)
    (Intercept)   91.34375   35.99417  2.537737 0.016790098
    Sex_Male      78.81250   41.56249  1.896241 0.067931094
    Age_Child   -123.93750   41.56249 -2.981956 0.005752153
    ```
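
The first two tables above appear to describe the same fit under different reference levels for `Age`: 91.34375 + (-123.93750) = -32.59375, and the `Age` coefficient flips sign. The sketch below checks that relationship in base R only (no Spark session), by releveling `Age` so that `Child` is the reference. Variable names are illustrative.

```r
# Same data as in the example, kept as a local data.frame.
d <- as.data.frame(Titanic, stringsAsFactors = FALSE)

# Default R encoding: strings become factors with alphabetical levels,
# so "Adult" is the reference level and the coefficient is AgeChild.
fit_adult_ref <- stats::glm(Freq ~ Sex + Age, family = "gaussian", data = d)

# Make "Child" the reference level instead.
d2 <- d
d2$Age <- relevel(factor(d2$Age), ref = "Child")
fit_child_ref <- stats::glm(Freq ~ Sex + Age, family = "gaussian", data = d2)

b1 <- coef(fit_adult_ref)
b2 <- coef(fit_child_ref)

# The intercept absorbs the old Age effect, and the Age effect changes sign.
intercept_shift_ok <- isTRUE(all.equal(unname(b2["(Intercept)"]),
                                       unname(b1["(Intercept)"] + b1["AgeChild"])))
sign_flip_ok <- isTRUE(all.equal(unname(b2["AgeAdult"]), unname(-b1["AgeChild"])))
sex_unchanged <- isTRUE(all.equal(unname(b2["SexMale"]), unname(b1["SexMale"])))
```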




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119022717
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib_regression.R ---
    @@ -379,6 +379,49 @@ test_that("glm save/load", {
       unlink(modelPath)
     })
     
    +test_that("spark.glm and glm with string encoding", {
    --- End diff --
    
    we are trying to cut down to only a core set of tests to run on CRAN.
    please add `skip_on_cran()` here
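
For illustration, a minimal sketch of where `skip_on_cran()` goes in a testthat test. The test body is a placeholder that uses base R only, not the SparkR test added in this PR, and it assumes the testthat package is installed.

```r
library(testthat)

test_that("string encoding follows base R level order (illustrative)", {
  # Skipped when testthat considers the run to be on CRAN.
  skip_on_cran()

  # Placeholder check: base R sorts factor levels alphabetically.
  age <- factor(c("Child", "Adult"))
  expect_equal(levels(age), c("Adult", "Child"))
})
```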




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119022422
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
     #' @seealso \link{glm}, \link{read.ml}
     setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
    -                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
    +                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
    +                   stringIndexerOrderType = "frequencyDesc") {
    --- End diff --
    
    given we need to spell it out, I'm wondering if it would be better to check the `stringIndexerOrderType` parameter to match one of the supported options in R?




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119029879
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
     #' @seealso \link{glm}, \link{read.ml}
     setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
    -                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
    +                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
    +                   stringIndexerOrderType = "frequencyDesc") {
    --- End diff --
    
    I don't think there are corresponding R options for this. One can convert the string into a factor and manipulate the factor easily. It's just that R's default approach drops the first alphabetical category.
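
A short base-R sketch of the behaviour described here, with illustrative variable names: a string column becomes a factor with alphabetically sorted levels, the first level is dropped as the reference in the design matrix, and `relevel` changes which one is dropped.

```r
age <- c("Child", "Adult", "Adult", "Child")

# Levels are sorted alphabetically, regardless of order of appearance.
age_default <- factor(age)
default_levels <- levels(age_default)

# With default treatment contrasts the first level ("Adult") is the
# reference, so only a "Child" column appears.
default_cols <- colnames(model.matrix(~ age_default))

# Manipulating the factor moves the reference level to "Child".
age_releveled <- relevel(age_default, ref = "Child")
releveled_cols <- colnames(model.matrix(~ age_releveled))
```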




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #78232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78232/testReport)** for PR 18140 at commit [`3c1b85e`](https://github.com/apache/spark/commit/3c1b85eb4db97723576927a2f972543c7ae69678).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    How do I do that? 




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    **[Test build #77595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77595/testReport)** for PR 18140 at commit [`5aa8946`](https://github.com/apache/spark/commit/5aa8946f740135b90376154aab81ab182b3ba888).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77511/
    Test FAILed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Thanks for the comments. Fixed them all in the new commit. 




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    @felixcheung It's up to date now. Any additional comments on this one?  




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    @felixcheung Yes, the first one is the default. 




[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18140#discussion_r119438348
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
     #' @seealso \link{glm}, \link{read.ml}
     setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
    -                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
    +                   regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
    +                   stringIndexerOrderType = "frequencyDesc") {
    --- End diff --
    
    I see. Added argument matching in the new commit.




[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18140
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78261/
    Test PASSed.

