Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/05/30 01:39:47 UTC
[GitHub] spark pull request #18140: Spark r formula
GitHub user actuaryzhang opened a pull request:
https://github.com/apache/spark/pull/18140
Spark r formula
## What changes were proposed in this pull request?
Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` to support string encoding that is consistent with R's default behavior.
## How was this patch tested?
new tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/actuaryzhang/spark sparkRFormula
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18140.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18140
----
commit be7a0fb993ad1fbe60576cd39ca86b20d45289a6
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-28T01:39:51Z
add stringIndexerOrderType to SparkR glm and test result consistency with R
commit 826e784e3bf83c3b9a84fc7d9500d15971a7ffd8
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-30T01:36:39Z
add stringIndexerOrderType to survreg
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77526/testReport)** for PR 18140 at commit [`0109aaf`](https://github.com/apache/spark/commit/0109aaf16b9035b0c6e491cd3147fa6ced8bafe6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test PASSed.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78232/
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77601/testReport)** for PR 18140 at commit [`65f69fa`](https://github.com/apache/spark/commit/65f69fa26d5483300abffdca75f5171dfa42fb77).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by actuaryzhang <gi...@git.apache.org>.
GitHub user actuaryzhang reopened a pull request:
https://github.com/apache/spark/pull/18140
[SPARK-20917][ML][SparkR] SparkR supports string encoding consistent with R
## What changes were proposed in this pull request?
Add `stringIndexerOrderType` to `spark.glm` and `spark.survreg` to support string encoding that is consistent with R's default behavior.
## How was this patch tested?
new tests
----
commit aba1429c48580ed19ae0a653830d065c681b7150
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-28T01:39:51Z
add stringIndexerOrderType to SparkR glm and test result consistency with R
commit 49e50849ac7566aad9eb251535a29a59b659a68a
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-30T01:36:39Z
add stringIndexerOrderType to survreg
commit cdc6c377ada3187111cdf984e8cd595ba78b69dc
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-30T02:52:22Z
fix test
commit 18cbeb79b7cbf12a6d77110673312b82edbed92a
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-30T07:39:49Z
address comments on doc
commit 6ae4d56592aef607a9e6d29b11fbb703bc4b971c
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-31T16:50:44Z
add match arg
commit 3c1b85eb4db97723576927a2f972543c7ae69678
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-05-31T18:31:43Z
add match arg in survreg
commit f33d0eafa5fc2a0b806c7016b42574045c3261af
Author: actuaryzhang <ac...@gmail.com>
Date: 2017-06-19T17:08:10Z
address comments
----
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/18140
merged to master
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119285081
--- Diff: R/pkg/inst/tests/testthat/test_mllib_regression.R ---
@@ -379,6 +379,49 @@ test_that("glm save/load", {
unlink(modelPath)
})
+test_that("spark.glm and glm with string encoding", {
--- End diff --
Added. Thank you!
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test FAILed.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/18140
do you want to bring this up to date?
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119021523
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -70,6 +70,12 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' the relationship between the variance and mean of the distribution. Only
#' applicable to the Tweedie family.
#' @param link.power the index in the power link function. Only applicable to the Tweedie family.
+#' @param stringIndexerOrderType how to order categories of a string feature column. This is used to
+#' decide the base level of a string feature as the last category after
+#' ordering is dropped when encoding strings. Supported options are
+#' 'frequencyDesc', 'frequencyAsc', 'alphabetDesc', 'alphabetAsc'.
--- End diff --
let's quote with `"`
optionally, use `\code{}`
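For context on the parameter documented in the diff above, here is a minimal base-R sketch of why the ordering matters. It assumes, as the doc comment states, that the last category after ordering is the one dropped. The toy vector `x` and the helper `dropped_category` are hypothetical and only imitate the ordering rule; they are not Spark code.

```r
# Hypothetical toy data: "a" is the most frequent category, "b" the rarest.
x <- c("a", "a", "a", "b", "c", "c")

# Base R sorts factor levels alphabetically, and the default treatment
# contrasts use the first level ("a") as the base.
r_base <- levels(factor(x))[1]
colnames(model.matrix(~ x, data.frame(x = x)))  # "(Intercept)" "xb" "xc"

# Illustration only: order the categories as each option name suggests and
# return the last one, which is the category that would be dropped.
dropped_category <- function(values, order_type) {
  counts <- table(values)
  ordered <- switch(order_type,
    frequencyDesc = names(sort(counts, decreasing = TRUE)),
    frequencyAsc  = names(sort(counts, decreasing = FALSE)),
    alphabetDesc  = sort(names(counts), decreasing = TRUE),
    alphabetAsc   = sort(names(counts), decreasing = FALSE))
  ordered[length(ordered)]
}

dropped_category(x, "frequencyDesc")  # "b": differs from R's base level
dropped_category(x, "alphabetDesc")   # "a": same base level as R
```

Under these assumptions, `"alphabetDesc"` is the option that reproduces R's choice of base level, while the `"frequencyDesc"` default can pick a different one.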
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r122632631
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -443,10 +478,14 @@ setMethod("write.ml", signature(object = "IsotonicRegressionModel", path = "char
#' }
#' @note spark.survreg since 2.0.0
setMethod("spark.survreg", signature(data = "SparkDataFrame", formula = "formula"),
- function(data, formula, aggregationDepth = 2) {
+ function(data, formula, aggregationDepth = 2,
+ stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+ "alphabetDesc", "alphabetAsc")) {
+ stringIndexerOrderType <- match.arg(stringIndexerOrderType)
formula <- paste(deparse(formula), collapse = "")
jobj <- callJStatic("org.apache.spark.ml.r.AFTSurvivalRegressionWrapper",
- "fit", formula, data@sdf, as.integer(aggregationDepth))
+ "fit", formula, data@sdf, as.integer(aggregationDepth),
+ as.character(stringIndexerOrderType))
--- End diff --
ditto
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test PASSed.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/18140
Ah I didn't know it could be omitted with match.arg.
What does it pick when it is not specified? The first one?
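The question above can be checked in plain R without Spark. A minimal sketch, where `pick_order` is a hypothetical stand-in that only copies the argument shape from the diff:

```r
# Hypothetical stand-in for the signature in the diff; no Spark required.
pick_order <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                  "alphabetDesc", "alphabetAsc")) {
  match.arg(stringIndexerOrderType)
}

pick_order()                # "frequencyDesc": argument omitted, first choice is used
pick_order("alphabetDesc")  # "alphabetDesc"
pick_order("alphabetD")     # "alphabetDesc": unambiguous partial matches are accepted

# An unsupported value raises an error inside match.arg.
tryCatch(pick_order("random"), error = function(e) "error")  # "error"
```

So when the argument is omitted, `match.arg` returns the first element of the default vector, which keeps `"frequencyDesc"` as the default while also validating user input.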
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77526/testReport)** for PR 18140 at commit [`0109aaf`](https://github.com/apache/spark/commit/0109aaf16b9035b0c6e491cd3147fa6ced8bafe6).
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/18140
can you kick AppVeyor?
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang closed the pull request at:
https://github.com/apache/spark/pull/18140
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/18140
---
[GitHub] spark issue #18140: [ML][SparkR] SparkR supports string encoding consistent ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77526/
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r122632729
--- Diff: R/pkg/tests/fulltests/test_mllib_regression.R ---
@@ -367,6 +367,51 @@ test_that("glm save/load", {
unlink(modelPath)
})
+test_that("spark.glm and glm with string encoding", {
+ skip_on_cran()
--- End diff --
sorry, no longer needed
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
Oh, great. Did that and checks passed now.
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77595/testReport)** for PR 18140 at commit [`5aa8946`](https://github.com/apache/spark/commit/5aa8946f740135b90376154aab81ab182b3ba888).
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77601/testReport)** for PR 18140 at commit [`65f69fa`](https://github.com/apache/spark/commit/65f69fa26d5483300abffdca75f5171dfa42fb77).
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77516/
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #78261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78261/testReport)** for PR 18140 at commit [`f33d0ea`](https://github.com/apache/spark/commit/f33d0eafa5fc2a0b806c7016b42574045c3261af).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r122632393
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -145,7 +163,8 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
"fit", formula, data@sdf, tolower(family$family), family$link,
tol, as.integer(maxIter), weightCol, regParam,
- as.double(var.power), as.double(link.power))
+ as.double(var.power), as.double(link.power),
+ as.character(stringIndexerOrderType))
--- End diff --
nit: I think we don't need `as.character` now as `stringIndexerOrderType` is from `match.arg`?
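A quick way to check the nit above in plain R. This is a sketch only; `check_type` is a hypothetical helper with a shortened choice list:

```r
# Hypothetical helper: inspect what match.arg hands back.
check_type <- function(type = c("frequencyDesc", "alphabetDesc")) {
  type <- match.arg(type)
  c(is_character = is.character(type),
    length_one   = length(type) == 1L,
    unchanged    = identical(type, as.character(type)))
}

check_type()                # all TRUE
check_type("alphabetDesc")  # all TRUE
```

Since `match.arg` already returns a length-one character vector, wrapping the result in `as.character` is a no-op.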
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #78232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78232/testReport)** for PR 18140 at commit [`3c1b85e`](https://github.com/apache/spark/commit/3c1b85eb4db97723576927a2f972543c7ae69678).
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/18140
you can close and re-open this PR on github here
---
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
@felixcheung Please take a look. Thanks.
---
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119403261
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
- regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
+ regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
+ stringIndexerOrderType = "frequencyDesc") {
--- End diff --
Actually, I meant it as in https://github.com/actuaryzhang/spark/blob/66bc786add41df52baead5a7d38b0b6b035d764d/R/pkg/R/mllib_clustering.R#L167
but then we will need to tweak it to have a default value.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test PASSed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77516/testReport)** for PR 18140 at commit [`66bc786`](https://github.com/apache/spark/commit/66bc786add41df52baead5a7d38b0b6b035d764d).
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77511/testReport)** for PR 18140 at commit [`826e784`](https://github.com/apache/spark/commit/826e784e3bf83c3b9a84fc7d9500d15971a7ffd8).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77595/
Test FAILed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #78261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78261/testReport)** for PR 18140 at commit [`f33d0ea`](https://github.com/apache/spark/commit/f33d0eafa5fc2a0b806c7016b42574045c3261af).
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
Thanks for the comments. Addressed them in the new commit.
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r122632556
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -182,9 +207,13 @@ setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
#' @seealso \link{spark.glm}
setMethod("glm", signature(formula = "formula", family = "ANY", data = "SparkDataFrame"),
function(formula, family = gaussian, data, epsilon = 1e-6, maxit = 25, weightCol = NULL,
- var.power = 0.0, link.power = 1.0 - var.power) {
+ var.power = 0.0, link.power = 1.0 - var.power,
+ stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
+ "alphabetDesc", "alphabetAsc")) {
+ stringIndexerOrderType <- match.arg(stringIndexerOrderType)
--- End diff --
Maybe we don't need this check here, since we are calling `spark.glm`, which will do the same check.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77601/
Test PASSed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
Simple example to illustrate:
```
> df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))
> rModel <- stats::glm(Freq ~ Sex + Age, family = "gaussian", data = as.data.frame(df))
> summary(rModel)$coefficients
              Estimate Std. Error   t value    Pr(>|t|)
(Intercept)   91.34375   35.99417  2.537737 0.016790098
SexMale       78.81250   41.56249  1.896241 0.067931094
AgeChild    -123.93750   41.56249 -2.981956 0.005752153
> model <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian")
> summary(model)$coefficients
             Estimate Std. Error    t value    Pr(>|t|)
(Intercept) -32.59375   35.99417 -0.9055286 0.372647658
Sex_Male     78.81250   41.56249  1.8962412 0.067931094
Age_Adult   123.93750   41.56249  2.9819558 0.005752153
> model2 <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian",
+                     stringIndexerOrderType = "alphabetDesc")
> summary(model2)$coefficients
              Estimate Std. Error   t value    Pr(>|t|)
(Intercept)   91.34375   35.99417  2.537737 0.016790098
Sex_Male      78.81250   41.56249  1.896241 0.067931094
Age_Child   -123.93750   41.56249 -2.981956 0.005752153
```
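Why `alphabetDesc` lines up with R in the output above can be sketched in base R, without Spark. The sketch assumes that the last category after ordering is the one dropped, which is consistent with the coefficients shown; `baseline_for` is a hypothetical helper written for illustration, not a SparkR function, and it only covers the two alphabetical orderings.

```r
# Hypothetical helper, not part of SparkR: orders the distinct labels as the
# named option suggests and reports the last one as the dropped baseline.
# Assumption: the category dropped is the last one after ordering.
baseline_for <- function(values, order_type = c("alphabetDesc", "alphabetAsc")) {
  order_type <- match.arg(order_type)
  labels <- sort(unique(values), decreasing = (order_type == "alphabetDesc"))
  labels[length(labels)]
}

age <- c("Child", "Adult", "Child", "Adult")

baseline_for(age, "alphabetDesc")  # "Adult"
baseline_for(age, "alphabetAsc")   # "Child"

# Base R uses the first level of the sorted factor as the baseline.
levels(factor(age))[1]             # "Adult"
```

Under that assumption, descending alphabetical order puts the alphabetically first label last, so it becomes the baseline, just as in `stats::glm`.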
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119022717
--- Diff: R/pkg/inst/tests/testthat/test_mllib_regression.R ---
@@ -379,6 +379,49 @@ test_that("glm save/load", {
unlink(modelPath)
})
+test_that("spark.glm and glm with string encoding", {
--- End diff --
We are trying to cut down to only a core set of tests to run on CRAN.
Please add `skip_on_cran()` here.
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119022422
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
- regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
+ regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
+ stringIndexerOrderType = "frequencyDesc") {
--- End diff --
Given that we need to spell it out, I'm wondering if it would be better to check the `stringIndexerOrderType` parameter so that it matches one of the supported options in R?
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119029879
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
- regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
+ regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
+ stringIndexerOrderType = "frequencyDesc") {
--- End diff --
I don't think there are corresponding R options for this. One can convert the string into a factor and manipulate the factor easily. It's just that R's default approach is to drop the first alphabetical category.
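The factor manipulation mentioned here can be sketched in base R. The toy data frame and its column names are made up for illustration; only `factor`, `relevel`, and `model.matrix` from base R and `stats` are used.

```r
# Toy data, invented for this sketch.
toy <- data.frame(
  y   = c(1, 2, 3, 4),
  age = c("Child", "Adult", "Child", "Adult"),
  stringsAsFactors = FALSE
)

# Default: the string column becomes a factor with alphabetically sorted
# levels, so "Adult" is the baseline and only "Child" gets a column.
default_cols <- colnames(model.matrix(y ~ age, data = toy))

# Choosing the baseline explicitly by reordering the factor levels.
toy$age <- relevel(factor(toy$age), ref = "Child")
releveled_cols <- colnames(model.matrix(y ~ age, data = toy))

default_cols    # "(Intercept)" "ageChild"
releveled_cols  # "(Intercept)" "ageAdult"
```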
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #78232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78232/testReport)** for PR 18140 at commit [`3c1b85e`](https://github.com/apache/spark/commit/3c1b85eb4db97723576927a2f972543c7ae69678).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
How do I do that?
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test PASSed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18140
**[Test build #77595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77595/testReport)** for PR 18140 at commit [`5aa8946`](https://github.com/apache/spark/commit/5aa8946f740135b90376154aab81ab182b3ba888).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77511/
Test FAILed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test FAILed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
Thanks for the comments. Fixed them all in the new commit.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
@felixcheung It's up to date now. Any additional comments on this one?
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Merged build finished. Test PASSed.
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/18140
@felixcheung Yes, the first one is the default.
[GitHub] spark pull request #18140: [SPARK-20917][ML][SparkR] SparkR supports string ...
Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:
https://github.com/apache/spark/pull/18140#discussion_r119438348
--- Diff: R/pkg/R/mllib_regression.R ---
@@ -110,7 +125,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' @seealso \link{glm}, \link{read.ml}
setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"),
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
- regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power) {
+ regParam = 0.0, var.power = 0.0, link.power = 1.0 - var.power,
+ stringIndexerOrderType = "frequencyDesc") {
--- End diff --
I see. Added argument matching in the new commit.
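For readers unfamiliar with the pattern, argument matching of this kind is usually done with base R's `match.arg`. The sketch below is a minimal illustration under that assumption; `fit_with_order` is a hypothetical function name, and only the argument's choice vector mirrors the one shown in the `glm` diff elsewhere in this thread.

```r
# Hypothetical function; only the argument pattern mirrors the diff.
fit_with_order <- function(stringIndexerOrderType = c("frequencyDesc", "frequencyAsc",
                                                      "alphabetDesc", "alphabetAsc")) {
  # match.arg() returns the first choice when the argument is not supplied
  # and raises an error for a value outside the listed choices.
  match.arg(stringIndexerOrderType)
}

default_choice  <- fit_with_order()                # "frequencyDesc"
explicit_choice <- fit_with_order("alphabetDesc")  # "alphabetDesc"

# An unsupported value is rejected before any model fitting would start.
bad_choice <- tryCatch(fit_with_order("random"), error = function(e) "rejected")
```

Listing the choices in the signature keeps the default visible in the generated documentation while still validating user input.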
[GitHub] spark issue #18140: [SPARK-20917][ML][SparkR] SparkR supports string encodin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18140
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78261/
Test PASSed.