Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2017/10/25 18:25:53 UTC
[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/19122#discussion_r146940974
--- Diff: python/pyspark/ml/tests.py ---
@@ -836,6 +836,27 @@ def test_save_load_simple_estimator(self):
         loadedModel = CrossValidatorModel.load(cvModelPath)
         self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
+    def test_parallel_evaluation(self):
+        dataset = self.spark.createDataFrame(
+            [(Vectors.dense([0.0]), 0.0),
+             (Vectors.dense([0.4]), 1.0),
+             (Vectors.dense([0.5]), 0.0),
+             (Vectors.dense([0.6]), 1.0),
+             (Vectors.dense([1.0]), 1.0)] * 10,
+            ["features", "label"])
+
+        lr = LogisticRegression()
+        grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build()
--- End diff --
With only 0 or 1 iterations, I don't think we could expect to see big differences between parallelism 1 and parallelism 2, even if there were bugs in our implementation. How about trying more, say 5 and 6 iterations?
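The property this test is after — that the winning parameter combination is the same regardless of the parallelism level — can be illustrated without Spark. The sketch below is a minimal stand-in: the hypothetical `evaluate` function plays the role of CrossValidator fitting and scoring one grid point, and `best_params` mimics selecting the best model either serially or with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for fitting and scoring one param combination;
# in the real test this would be CrossValidator evaluating LogisticRegression.
def evaluate(max_iter):
    # Toy metric that genuinely depends on max_iter, so distinct grid
    # values (e.g. 5 vs 6) produce distinct scores -- the point above:
    # with maxIter in {0, 1} the fitted models may be too similar to
    # distinguish a correct implementation from a buggy one.
    return 1.0 - 1.0 / (max_iter + 2)

def best_params(grid, parallelism):
    if parallelism == 1:
        scores = [evaluate(p) for p in grid]          # serial path
    else:
        with ThreadPoolExecutor(max_workers=parallelism) as pool:
            scores = list(pool.map(evaluate, grid))   # parallel path
    # pool.map preserves input order, so argmax is deterministic either way
    return grid[max(range(len(grid)), key=scores.__getitem__)]

grid = [5, 6]  # the larger iteration counts suggested above
assert best_params(grid, parallelism=1) == best_params(grid, parallelism=2)
```

Because `map` preserves ordering, the serial and parallel paths see identical score lists and pick the same winner; a regression in either path would make the final assertion fail.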
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org