You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2016/07/03 06:20:18 UTC
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/14035
[SPARK-16356][ML] Add testImplicits for ML unit tests and promote toDF()
## What changes were proposed in this pull request?
This was suggested in https://github.com/apache/spark/commit/101663f1ae222a919fc40510aa4f2bad22d1be6f#commitcomment-17114968.
This PR adds `testImplicits` to `MLlibTestSparkContext` so that some implicits such as `toDF()` can be sued across ml tests.
This PR also changes all the usages of `spark.createDataFrame( ... )` to `toDF()` where applicable in ml tests in Scala.
## How was this patch tested?
Existing tests should work.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark minor-ml-test
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14035.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14035
----
commit 79453ac4806bc55dc5ac57da1fb4c706cdd0a762
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-06-28T04:25:35Z
Promote toDF() instead of createDataFrame from a Product-type RDD
commit 902b9132a029df3f879d70e0f01c04b640a97ebc
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-06-28T04:35:49Z
Fix indentation
commit 0df2e44c1871ce30a29878450b0d2024779a3e73
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-06-29T03:31:15Z
Add some more tests to use toDF API
commit 4f1fc1cfdd9d3cd55ce56b852d5a4a6d6b7ea958
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-07-03T04:46:03Z
Fetch upstream
commit 5f7f85b40709eee0eb261edd24eaaef9b7fc3783
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-07-03T05:45:43Z
Fix some more cases
commit 52e7f1601df73dc35aac7627a6e0466b19cd8248
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-07-03T05:56:24Z
Take out the change in SQL and consistent imports
commit 54c27d4d359a7e6ad445856e06f15e29132d582c
Author: hyukjinkwon <gu...@gmail.com>
Date: 2016-07-03T06:12:35Z
Remove unused imports and cleanup nits
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69432798
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
generateMultinomialLogisticInput(coefficients, xMean, xVariance,
addIntercept = true, nPoints, 42)
- spark.createDataFrame(sc.parallelize(testData, 4))
+ sc.parallelize(testData, 4).toDF()
--- End diff --
It'd be nice to know what was the purpose of the explicit partition setting.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Hi @jkbradley, could you take a look for this one please?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65758/consoleFull)** for PR 14035 at commit [`13b1a67`](https://github.com/apache/spark/commit/13b1a6751902493e458af162b222aebf879d41da).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390117
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala ---
@@ -71,8 +71,7 @@ class ClassifierSuite extends SparkFunSuite with MLlibTestSparkContext {
test("getNumClasses") {
def getTestData(labels: Seq[Double]): DataFrame = {
--- End diff --
repeated. What about Moving it outside `test` methods?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65883/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390189
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -158,7 +159,7 @@ class RandomForestClassifierSuite
}
test("Fitting without numClasses in metadata") {
- val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc))
+ val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
--- End diff --
Why is the type annotation needed here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65881/consoleFull)** for PR 14035 at commit [`ad9d7ac`](https://github.com/apache/spark/commit/ad9d7ac49b26805eb3c975dc61bd15d2d47c34a4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65883/consoleFull)** for PR 14035 at commit [`b60c952`](https://github.com/apache/spark/commit/b60c95279c809c59cd08ed1e1b239b3688f50b12).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Thank you for reviewing this!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Thank you!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #63487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63487/consoleFull)** for PR 14035 at commit [`5157d77`](https://github.com/apache/spark/commit/5157d77127541bd39517a5802618d6c1f7f6beaa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
ping @mengxr and @yanboliang
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390273
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite with MLlibTestSparkContext with De
test("MinMaxScaler arguments max must be larger than min") {
withClue("arguments max must be larger than min") {
- val dummyDF = spark.createDataFrame(Seq(
- (1, Vectors.dense(1.0, 2.0)))).toDF("id", "feature")
+ val dummyDF = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "feature")
--- End diff --
It's just a column name, but for consistency...`features` (not `feature`) (unless there's a reason for this)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Thanks @yanboliang and @jaceklaskowski . I addressed comments except for few comments I am not too sure of and I think are not related changes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Gentle ping @mengxr and @yanboliang
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65758/consoleFull)** for PR 14035 at commit [`13b1a67`](https://github.com/apache/spark/commit/13b1a6751902493e458af162b222aebf879d41da).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390144
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
generateMultinomialLogisticInput(coefficients, xMean, xVariance,
addIntercept = true, nPoints, 42)
- spark.createDataFrame(sc.parallelize(testData, 4))
+ sc.parallelize(testData, 4).toDF()
--- End diff --
`testData.toDF.repartition(4)`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63487/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69400558
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -158,7 +159,7 @@ class RandomForestClassifierSuite
}
test("Fitting without numClasses in metadata") {
- val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc))
+ val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
--- End diff --
I also agree with this but actually it seems both are fine assuming from this discussion, https://github.com/apache/spark/pull/12452
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376695
--- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala ---
@@ -42,9 +43,10 @@ class RegressionEvaluatorSuite
* data.map(x=> x.label + ", " + x.features(0) + ", " + x.features(1))
* .saveAsTextFile("path")
*/
- val dataset = spark.createDataFrame(
- sc.parallelize(LinearDataGenerator.generateLinearInput(
- 6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 100, 42, 0.1), 2).map(_.asML))
+ val dataset = sc.parallelize(
--- End diff --
+1 @jaceklaskowski
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390204
--- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala ---
@@ -42,9 +43,10 @@ class RegressionEvaluatorSuite
* data.map(x=> x.label + ", " + x.features(0) + ", " + x.features(1))
* .saveAsTextFile("path")
*/
- val dataset = spark.createDataFrame(
- sc.parallelize(LinearDataGenerator.generateLinearInput(
- 6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 100, 42, 0.1), 2).map(_.asML))
+ val dataset = sc.parallelize(
--- End diff --
Remove `sc.parallelize`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376849
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala ---
@@ -282,9 +281,7 @@ class MLUtilsSuite extends SparkFunSuite with MLlibTestSparkContext {
val z = Vectors.dense(4.0).asML
val p = (5.0, z)
val w = Vectors.dense(6.0)
- val df = spark.createDataFrame(Seq(
- (0, x, y, p, w)
- )).toDF("id", "x", "y", "p", "w")
+ val df = Seq((0, x, y, p, w)).toDF("id", "x", "y", "p", "w")
.withColumn("x", col("x"), metadata)
--- End diff --
We are more prefer to use ```col("x")``` for DataFrame operation.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390147
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -869,8 +870,7 @@ class LogisticRegressionSuite
}
}
- (spark.createDataFrame(sc.parallelize(data1, 4)),
- spark.createDataFrame(sc.parallelize(data2, 4)))
+ (sc.parallelize(data1, 4).toDF(), sc.parallelize(data2, 4).toDF())
--- End diff --
Same as above
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61679/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69391151
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
@@ -52,23 +53,20 @@ class GeneralizedLinearRegressionSuite
import GeneralizedLinearRegressionSuite._
- datasetGaussianIdentity = spark.createDataFrame(
- sc.parallelize(generateGeneralizedLinearRegressionInput(
- intercept = 2.5, coefficients = Array(2.2, 0.6), xMean = Array(2.9, 10.5),
- xVariance = Array(0.7, 1.2), nPoints = 10000, seed, noiseLevel = 0.01,
- family = "gaussian", link = "identity"), 2))
+ datasetGaussianIdentity = sc.parallelize(generateGeneralizedLinearRegressionInput(
--- End diff --
Why is this `sc.parallelize` needed here? Why are `2` partitions used?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390185
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -55,7 +56,7 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
rdd = sc.parallelize(generateMultinomialLogisticInput(
coefficients, xMean, xVariance, true, nPoints, 42), 2)
- dataset = spark.createDataFrame(rdd)
+ dataset = rdd.toDF()
--- End diff --
Merge it with line 57.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390423
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderSuite.scala ---
@@ -29,10 +29,11 @@ import org.apache.spark.sql.types._
class OneHotEncoderSuite
extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest {
+ import testImplicits._
def stringIndexed(): DataFrame = {
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
--- End diff --
Remove `sc.parallelize`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80449593
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala ---
@@ -29,8 +29,7 @@ class ChiSqSelectorSuite extends SparkFunSuite with MLlibTestSparkContext
with DefaultReadWriteTest {
test("Test Chi-Square selector") {
- val spark = this.spark
- import spark.implicits._
+ import testImplicits._
--- End diff --
Nit: Actually it should be moved out of this test function and can be shared between all test cases if necessary.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #61679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61679/consoleFull)** for PR 14035 at commit [`54c27d4`](https://github.com/apache/spark/commit/54c27d4d359a7e6ad445856e06f15e29132d582c).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69400523
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
// the input seed is somewhat magic, to make this test pass
val rdd = sc.parallelize(generateMultinomialLogisticInput(
coefficients, xMean, xVariance, true, nPoints, 1), 2)
- val dataFrame = spark.createDataFrame(rdd).toDF("label", "features")
+ val dataFrame = rdd.toDF("label", "features")
--- End diff --
Again, I also agree with this but I am hesitated to change this because it is explicitly set.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #63487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63487/consoleFull)** for PR 14035 at commit [`5157d77`](https://github.com/apache/spark/commit/5157d77127541bd39517a5802618d6c1f7f6beaa).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64632/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390216
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala ---
@@ -44,7 +45,7 @@ class CountVectorizerSuite extends SparkFunSuite with MLlibTestSparkContext
(3, split(""), Vectors.sparse(4, Seq())), // empty string
--- End diff --
Replace the comment `// empty string` with `val EMPTY_STRING = ""`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/14035
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Hi @mengxr, @yanboliang and @jkbradley, if these changes are so big, I can just leave `testImplicits` and let others fix them later without sweeping. Could you please take a look?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390388
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala ---
@@ -61,7 +62,7 @@ class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
Vectors.sparse(3, Seq())
)
- dataFrame = spark.createDataFrame(sc.parallelize(data, 2).map(NormalizerSuite.FeatureData))
+ dataFrame = sc.parallelize(data, 2).map(NormalizerSuite.FeatureData).toDF()
--- End diff --
Remove `sc.parallelize`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69391176
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala ---
@@ -282,9 +281,7 @@ class MLUtilsSuite extends SparkFunSuite with MLlibTestSparkContext {
val z = Vectors.dense(4.0).asML
val p = (5.0, z)
val w = Vectors.dense(6.0)
- val df = spark.createDataFrame(Seq(
- (0, x, y, p, w)
- )).toDF("id", "x", "y", "p", "w")
+ val df = Seq((0, x, y, p, w)).toDF("id", "x", "y", "p", "w")
.withColumn("x", col("x"), metadata)
--- End diff --
Replace `col("x")` with `$"x"` or (better) `'x`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65883/consoleFull)** for PR 14035 at commit [`b60c952`](https://github.com/apache/spark/commit/b60c95279c809c59cd08ed1e1b239b3688f50b12).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
ping @mengxr and @yanboliang
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65757/consoleFull)** for PR 14035 at commit [`2cbcabd`](https://github.com/apache/spark/commit/2cbcabdcef32280316db1ede1a22934dacf3cf35).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/14035
Sorry I'm seeing this so late, but thank you all for the PR & reviews!
@jaceklaskowski Regarding the explicit partitioning in unit tests, that's historical: In the past, we had run into some bugs which only showed up with multiple partitions, so we got in the habit of using multiple partitions. I still think it's a nice idea in general, though perhaps there are fewer such bugs nowadays.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69400465
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
generateMultinomialLogisticInput(coefficients, xMean, xVariance,
addIntercept = true, nPoints, 42)
- spark.createDataFrame(sc.parallelize(testData, 4))
+ sc.parallelize(testData, 4).toDF()
--- End diff --
I guess, to be strict, `sc.parallelize(testData, 4).toDF()` and `testData.toDF.repartition(4)` would not be exactly the same. It seems the author of this test code intended to explicitly set the initial number of partitions to `4` and I left as it is although I think as you said because I am not 100% sure and it is not the part of this issue.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65757/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69391139
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala ---
@@ -102,7 +103,7 @@ class VectorIndexerSuite extends SparkFunSuite with MLlibTestSparkContext
}
test("Cannot fit an empty DataFrame") {
- val rdd = spark.createDataFrame(sc.parallelize(Array.empty[Vector], 2).map(FeatureData))
+ val rdd = sc.parallelize(Array.empty[Vector], 2).map(FeatureData).toDF()
--- End diff --
Do you need `sc.parallelize`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390173
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
// the input seed is somewhat magic, to make this test pass
val rdd = sc.parallelize(generateMultinomialLogisticInput(
coefficients, xMean, xVariance, true, nPoints, 1), 2)
- val dataFrame = spark.createDataFrame(rdd).toDF("label", "features")
+ val dataFrame = rdd.toDF("label", "features")
--- End diff --
Could we merge this line with 118? I don't think 118 needs `sc.parallelize`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376739
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite with MLlibTestSparkContext with De
test("MinMaxScaler arguments max must be larger than min") {
withClue("arguments max must be larger than min") {
- val dummyDF = spark.createDataFrame(Seq(
- (1, Vectors.dense(1.0, 2.0)))).toDF("id", "feature")
+ val dummyDF = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "feature")
--- End diff --
+1 @jaceklaskowski
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390178
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala ---
@@ -47,7 +48,7 @@ class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext with Defa
Array(0.10, 0.10, 0.70, 0.10) // label 2
).map(_.map(math.log))
- dataset = spark.createDataFrame(generateNaiveBayesInput(pi, theta, 100, 42))
+ dataset = generateNaiveBayesInput(pi, theta, 100, 42).toDF()
--- End diff --
Exactly my point above :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
hm.. I can close if it looks inappropriate or it seems making a lot of conflicts across PRs. Could you give some feedback please @mengxr and @yanboliang ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #61679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61679/consoleFull)** for PR 14035 at commit [`54c27d4`](https://github.com/apache/spark/commit/54c27d4d359a7e6ad445856e06f15e29132d582c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
Hi @mengxr, is this the change you meant? Could you please take a look?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61681/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #61681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)** for PR 14035 at commit [`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80379623
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -55,7 +56,7 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
val xVariance = Array(0.6856, 0.1899, 3.116, 0.581)
rdd = sc.parallelize(generateMultinomialLogisticInput(
coefficients, xMean, xVariance, true, nPoints, 42), 2)
- dataset = spark.createDataFrame(rdd)
+ dataset = rdd.toDF()
--- End diff --
It seems the `rdd` is being used in the tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14035
Sorry for late response. I like this change and will have a look soon. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14035
LGTM, merged into master. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
cc @mengxr, @yanboliang and @jaceklaskowski
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #61681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)** for PR 14035 at commit [`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65757/consoleFull)** for PR 14035 at commit [`2cbcabd`](https://github.com/apache/spark/commit/2cbcabdcef32280316db1ede1a22934dacf3cf35).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65758/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69391120
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -39,7 +40,7 @@ class StringIndexerSuite
test("StringIndexer") {
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
--- End diff --
Could you remove `sc.parallelize`, too?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14035
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65881/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #64632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64632/consoleFull)** for PR 14035 at commit [`f2990b1`](https://github.com/apache/spark/commit/f2990b1b8240a43a7a5fcee5b39ada05b432aef1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80380667
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala ---
@@ -85,11 +87,13 @@ class VectorIndexerSuite extends SparkFunSuite with MLlibTestSparkContext
checkPair(densePoints1Seq, sparsePoints1Seq)
checkPair(densePoints2Seq, sparsePoints2Seq)
- densePoints1 = spark.createDataFrame(sc.parallelize(densePoints1Seq, 2).map(FeatureData))
- sparsePoints1 = spark.createDataFrame(sc.parallelize(sparsePoints1Seq, 2).map(FeatureData))
- densePoints2 = spark.createDataFrame(sc.parallelize(densePoints2Seq, 2).map(FeatureData))
- sparsePoints2 = spark.createDataFrame(sc.parallelize(sparsePoints2Seq, 2).map(FeatureData))
- badPoints = spark.createDataFrame(sc.parallelize(badPointsSeq, 2).map(FeatureData))
+ densePoints1 = densePoints1Seq.map(FeatureData).toDF()
+ sparsePoints1 = sparsePoints1Seq.map(FeatureData).toDF()
+ // TODO: If we directly use `toDF` without parallelize, the test in
+ // "Throws error when given RDDs with different size vectors" is failed for an unknown reason.
+ densePoints2 = sc.parallelize(densePoints2Seq, 2).map(FeatureData).toDF()
--- End diff --
BTW, It seems a test is failed when I change this to `densePoints2Seq.map(FeatureData).toDF()` for an unknown reason.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r80376680
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
// the input seed is somewhat magic, to make this test pass
val rdd = sc.parallelize(generateMultinomialLogisticInput(
coefficients, xMean, xVariance, true, nPoints, 1), 2)
- val dataFrame = spark.createDataFrame(rdd).toDF("label", "features")
+ val dataFrame = rdd.toDF("label", "features")
--- End diff --
+1 @jaceklaskowski
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:
https://github.com/apache/spark/pull/14035
@HyukjinKwon I have made a pass and this PR look good overall. Could you double check whether all ML test cases are covered? Since I found we used implicit import of different style at [```ChiSqSelectorSuite```](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala#L33), it's better we can unify them. Then I'd like to get this in. Thanks for working on this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #64632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64632/consoleFull)** for PR 14035 at commit [`f2990b1`](https://github.com/apache/spark/commit/f2990b1b8240a43a7a5fcee5b39ada05b432aef1).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/14035
**[Test build #65881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65881/consoleFull)** for PR 14035 at commit [`ad9d7ac`](https://github.com/apache/spark/commit/ad9d7ac49b26805eb3c975dc61bd15d2d47c34a4).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14035
@mengxr, @yanboliang, Could you review this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/14035#discussion_r69390132
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala ---
@@ -134,15 +135,14 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext
*/
test("Fitting without numClasses in metadata") {
- val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc))
+ val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
val gbt = new GBTClassifier().setMaxDepth(1).setMaxIter(1)
gbt.fit(df)
--- End diff --
Wonder why this line is separate not part of 139? Any reason?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org