You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gaborgsomogyi <gi...@git.apache.org> on 2018/01/23 12:13:11 UTC
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
GitHub user gaborgsomogyi opened a pull request:
https://github.com/apache/spark/pull/20362
[Spark-22886][ML][TESTS] ML test for structured streaming: ml.recomme…
## What changes were proposed in this pull request?
Converting spark.ml.recommendation tests to also check code with structured streaming, using the ML testing infrastructure implemented in SPARK-22882.
## How was this patch tested?
Automated: Pass the Jenkins.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gaborgsomogyi/spark SPARK-22886
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20362.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20362
----
commit 33654c93c2fe240eb0c6a6932353239ab84b0ce0
Author: Gabor Somogyi <ga...@...>
Date: 2018-01-18T20:27:08Z
[Spark-22886][ML][TESTS] ML test for structured streaming: ml.recommendation
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20362
**[Test build #4126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4126/testReport)** for PR 20362 at commit [`acfb092`](https://github.com/apache/spark/commit/acfb0920d0c5ae4601580cec633f2c3b87a2673f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20362
**[Test build #4126 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4126/testReport)** for PR 20362 at commit [`acfb092`](https://github.com/apache/spark/commit/acfb0920d0c5ae4601580cec633f2c3b87a2673f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165378573
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
test("read/write") {
val spark = this.spark
import spark.implicits._
+
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r167005865
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
(ex, act) =>
ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
} { (ex, act, _) =>
- ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
- act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+ testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction") {
+ case actRows: Seq[Row] =>
+ ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
+ actRows(0).getDouble(0) absTol 1e-6
+ }
--- End diff --
Woah, didn't check the original functionality in such a depth. This is really dead code in runtime environment. Fixed 👍
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20362
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165396480
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
val knownItem = data.select(max("item")).as[Int].first()
val unknownItem = knownItem + 20
val test = Seq(
- (unknownUser, unknownItem),
- (knownUser, unknownItem),
- (unknownUser, knownItem),
- (knownUser, knownItem)
- ).toDF("user", "item")
+ (unknownUser, unknownItem, true),
+ (knownUser, unknownItem, true),
+ (unknownUser, knownItem, true),
+ (knownUser, knownItem, false)
+ ).toDF("user", "item", "expectedIsNaN")
val als = new ALS().setMaxIter(1).setRank(1)
// default is 'nan'
val defaultModel = als.fit(data)
- val defaultPredictions = defaultModel.transform(test).select("prediction").as[Float].collect()
- assert(defaultPredictions.length == 4)
- assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
- assert(!defaultPredictions.last.isNaN)
+ var defaultPredictionNotNaN = Float.NaN
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165312057
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -662,28 +676,32 @@ class ALSSuite
val knownItem = data.select(max("item")).as[Int].first()
val unknownItem = knownItem + 20
val test = Seq(
- (unknownUser, unknownItem),
- (knownUser, unknownItem),
- (unknownUser, knownItem),
- (knownUser, knownItem)
- ).toDF("user", "item")
+ (unknownUser, unknownItem, true),
+ (knownUser, unknownItem, true),
+ (unknownUser, knownItem, true),
+ (knownUser, knownItem, false)
+ ).toDF("user", "item", "expectedIsNaN")
val als = new ALS().setMaxIter(1).setRank(1)
// default is 'nan'
val defaultModel = als.fit(data)
- val defaultPredictions = defaultModel.transform(test).select("prediction").as[Float].collect()
- assert(defaultPredictions.length == 4)
- assert(defaultPredictions.slice(0, 3).forall(_.isNaN))
- assert(!defaultPredictions.last.isNaN)
+ var defaultPredictionNotNaN = Float.NaN
--- End diff --
I would get rid of this variable.
In `testTransformer` it just adds overhead, `assert(!defaultPredictionNotNaN.isNaN)` asserts something that was already checked in testTransformer, so it's only use is in `testTransformerByGlobalCheckFunc`.
Producing it is a bit convoluted, it's not easy to understand why it's needed.
I would make it clearer by doing a plain old transform using the `test` DF (or a smaller one containing only the knownUser, knownItem pair) and selecting the value.
An alternative solution could be to use real expected values in the `test` DF instead of "isNan" flags.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165305989
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
(ex, act) =>
ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
} { (ex, act, _) =>
- ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
- act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+ testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, "prediction") {
+ case exRows: Seq[Row] =>
--- End diff --
I think it's ok to keep ex.transform here. This way the code will be a bit simpler.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170072578
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -413,34 +411,36 @@ class ALSSuite
.setSeed(0)
val alpha = als.getAlpha
val model = als.fit(training.toDF())
- val predictions = model.transform(test.toDF()).select("rating", "prediction").rdd.map {
- case Row(rating: Float, prediction: Float) =>
- (rating.toDouble, prediction.toDouble)
+ testTransformerByGlobalCheckFunc[Rating[Int]](test.toDF(), model, "rating", "prediction") {
+ case rows: Seq[Row] =>
+ val predictions = rows.map(row => (row.getFloat(0).toDouble, row.getFloat(1).toDouble))
+
+ val rmse =
+ if (implicitPrefs) {
+ // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
+ // We limit the ratings and the predictions to interval [0, 1] and compute the
+ // weighted RMSE with the confidence scores as weights.
+ val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
+ val confidence = 1.0 + alpha * math.abs(rating)
+ val rating01 = math.max(math.min(rating, 1.0), 0.0)
+ val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
+ val err = prediction01 - rating01
+ (confidence, confidence * err * err)
+ }.reduce[(Double, Double)] { case ((c0, e0), (c1, e1)) =>
+ (c0 + c1, e0 + e1)
+ }
+ math.sqrt(weightedSumSq / totalWeight)
+ } else {
+ val errorSquares = predictions.map { case (rating, prediction) =>
+ val err = rating - prediction
+ err * err
+ }
+ val mse = errorSquares.sum / errorSquares.length
+ math.sqrt(mse)
+ }
+ logInfo(s"Test RMSE is $rmse.")
+ assert(rmse < targetRMSE)
}
- val rmse =
--- End diff --
Mainly move but there was no mean function so implemented.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170046788
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
allModelParamSettings, checkModelData)
}
+ private def checkNumericTypesALS(
--- End diff --
Is this mostly just moving the code from the test class, or did it change too?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20362
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165379876
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +599,15 @@ class ALSSuite
(ex, act) =>
ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
} { (ex, act, _) =>
- ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
- act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+ testTransformerByGlobalCheckFunc[Float](_: DataFrame, ex, "prediction") {
+ case exRows: Seq[Row] =>
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:
https://github.com/apache/spark/pull/20362
gentle ping @jkbradley @WeichenXu123
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:
https://github.com/apache/spark/pull/20362
ping @srowen
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165308205
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
test("ALS cold start user/item prediction strategy") {
val spark = this.spark
import spark.implicits._
+
--- End diff --
nit: no need for empty line here
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:
https://github.com/apache/spark/pull/20362
cc @jkbradley @WeichenXu123
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165304423
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -566,6 +565,7 @@ class ALSSuite
test("read/write") {
val spark = this.spark
import spark.implicits._
+
--- End diff --
nit: new line is not needed
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20362
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165745445
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -599,8 +598,11 @@ class ALSSuite
(ex, act) =>
ex.userFactors.first().getSeq[Float](1) === act.userFactors.first.getSeq[Float](1)
} { (ex, act, _) =>
- ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
- act.transform(_: DataFrame).select("prediction").first.getDouble(0) absTol 1e-6
+ testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction") {
+ case actRows: Seq[Row] =>
+ ex.transform(_: DataFrame).select("prediction").first.getDouble(0) ~==
+ actRows(0).getDouble(0) absTol 1e-6
+ }
--- End diff --
I think this code does not check anything.
`testTransformerByGlobalCheckFunc[Float](_: DataFrame, act, "prediction")` is just a partial application of `testTransformerByGlobalCheckFunc`.
However, `checkNumericTypesALS` expects `check2: (ALSModel, ALSModel, DataFrame) => Unit`. It's happy to call the provided function, discard the partially applied function and use `()` instead, so it will typecheck.
As a consequence, the function doing the assert is never called, so the `~===` assertion never happens. You can check it say by asking for the 100th column of the first row - it will not produce an error.
This problem is not a result of your change, the original code had the same issue.
It could probably be simplified a bit but I think the original intent was to do a check like this:
```
{ (ex, act, df) =>
ex.transform(df).selectExpr("cast(prediction as double)").first.getDouble(0) ~==
act.transform(df).selectExpr("cast(prediction as double)").first.getDouble(0) absTol
1e-6
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165308129
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
}
withClue("transform should fail when ids exceed integer range. ") {
val model = als.fit(df)
- assert(intercept[SparkException] {
- model.transform(df.select(df("user_big").as("user"), df("item"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("user_small").as("user"), df("item"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("item_big").as("item"), df("user"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("item_small").as("item"), df("user"))).first
- }.getMessage.contains(msg))
+ def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = {
+ assert(intercept[SparkException] {
+ model.transform(dataFrame).first
+ }.getMessage.contains(msg))
+ assert(intercept[StreamingQueryException] {
+ testTransformer[A](dataFrame, model, "prediction") {
+ case _ =>
--- End diff --
No need for a partial function here, you can simplify it to `{ _ => }`.
I would also add a small comment to make it explicit that we intentionally do not check anything.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165392031
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +711,9 @@ class ALSSuite
val data = ratings.toDF
val model = new ALS().fit(data)
Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
- model.setColdStartStrategy(s).transform(data)
+ testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") {
+ case _ =>
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:
https://github.com/apache/spark/pull/20362
@smurakozi fix added for the original test issue.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165382316
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -628,18 +635,24 @@ class ALSSuite
}
withClue("transform should fail when ids exceed integer range. ") {
val model = als.fit(df)
- assert(intercept[SparkException] {
- model.transform(df.select(df("user_big").as("user"), df("item"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("user_small").as("user"), df("item"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("item_big").as("item"), df("user"))).first
- }.getMessage.contains(msg))
- assert(intercept[SparkException] {
- model.transform(df.select(df("item_small").as("item"), df("user"))).first
- }.getMessage.contains(msg))
+ def testTransformIdExceedsIntRange[A : Encoder](dataFrame: DataFrame): Unit = {
+ assert(intercept[SparkException] {
+ model.transform(dataFrame).first
+ }.getMessage.contains(msg))
+ assert(intercept[StreamingQueryException] {
+ testTransformer[A](dataFrame, model, "prediction") {
+ case _ =>
--- End diff --
Partial function removed. This code part expects `StreamingQueryException` which is quite close to this area. Not sure whether a comment would make it better.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165312157
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +711,9 @@ class ALSSuite
val data = ratings.toDF
val model = new ALS().fit(data)
Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
- model.setColdStartStrategy(s).transform(data)
+ testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") {
+ case _ =>
--- End diff --
Just like above, no need for partial function.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170047180
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -693,7 +766,7 @@ class ALSSuite
val data = ratings.toDF
val model = new ALS().fit(data)
Seq("nan", "NaN", "Nan", "drop", "DROP", "Drop").foreach { s =>
- model.setColdStartStrategy(s).transform(data)
+ testTransformer[Rating[Int]](data, model.setColdStartStrategy(s), "prediction") { _ => }
--- End diff --
What's the no-op function at the end for -- just because it requires an argument?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r165382596
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -653,6 +666,7 @@ class ALSSuite
test("ALS cold start user/item prediction strategy") {
val spark = this.spark
import spark.implicits._
+
--- End diff --
Fixed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/20362
Nice work! 👍
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170074167
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
allModelParamSettings, checkModelData)
}
+ private def checkNumericTypesALS(
--- End diff --
It's changed because one of the test hasn't done anything. Please take a look at the last commit it contains a test bugfix not only the transformation.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170046857
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -413,34 +411,36 @@ class ALSSuite
.setSeed(0)
val alpha = als.getAlpha
val model = als.fit(training.toDF())
- val predictions = model.transform(test.toDF()).select("rating", "prediction").rdd.map {
- case Row(rating: Float, prediction: Float) =>
- (rating.toDouble, prediction.toDouble)
+ testTransformerByGlobalCheckFunc[Rating[Int]](test.toDF(), model, "rating", "prediction") {
+ case rows: Seq[Row] =>
+ val predictions = rows.map(row => (row.getFloat(0).toDouble, row.getFloat(1).toDouble))
+
+ val rmse =
+ if (implicitPrefs) {
+ // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
+ // We limit the ratings and the predictions to interval [0, 1] and compute the
+ // weighted RMSE with the confidence scores as weights.
+ val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
+ val confidence = 1.0 + alpha * math.abs(rating)
+ val rating01 = math.max(math.min(rating, 1.0), 0.0)
+ val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
+ val err = prediction01 - rating01
+ (confidence, confidence * err * err)
+ }.reduce[(Double, Double)] { case ((c0, e0), (c1, e1)) =>
+ (c0 + c1, e0 + e1)
+ }
+ math.sqrt(weightedSumSq / totalWeight)
+ } else {
+ val errorSquares = predictions.map { case (rating, prediction) =>
+ val err = rating - prediction
+ err * err
+ }
+ val mse = errorSquares.sum / errorSquares.length
+ math.sqrt(mse)
+ }
+ logInfo(s"Test RMSE is $rmse.")
+ assert(rmse < targetRMSE)
}
- val rmse =
--- End diff --
This change is just a move, really? or did something else change as well?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20362: [Spark-22886][ML][TESTS] ML test for structured s...
Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on a diff in the pull request:
https://github.com/apache/spark/pull/20362#discussion_r170072392
--- Diff: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ---
@@ -586,6 +586,68 @@ class ALSSuite
allModelParamSettings, checkModelData)
}
+ private def checkNumericTypesALS(
--- End diff --
Mainly move but there was no mean function so implemented this oneliner.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/20362
Merged to master
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org