Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2017/10/25 18:25:53 UTC

[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19122#discussion_r146940974
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -836,6 +836,27 @@ def test_save_load_simple_estimator(self):
             loadedModel = CrossValidatorModel.load(cvModelPath)
             self.assertEqual(loadedModel.bestModel.uid, cvModel.bestModel.uid)
     
    +    def test_parallel_evaluation(self):
    +        dataset = self.spark.createDataFrame(
    +            [(Vectors.dense([0.0]), 0.0),
    +             (Vectors.dense([0.4]), 1.0),
    +             (Vectors.dense([0.5]), 0.0),
    +             (Vectors.dense([0.6]), 1.0),
    +             (Vectors.dense([1.0]), 1.0)] * 10,
    +            ["features", "label"])
    +
    +        lr = LogisticRegression()
    +        grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build()
    --- End diff --
    
    With only 0 or 1 iterations, I don't think we could expect to see big differences between parallelism 1 or 2, even if there were bugs in our implementation.  How about trying more, say 5 and 6 iterations?
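    
    For illustration only, a minimal sketch of what the suggested change could look like, reusing `dataset` from the diff above. The [5, 6] values, the `parallelism` param on CrossValidator (the param this PR is adding), and the metric comparison are assumptions for the sketch, not the final patch:
    
        from pyspark.ml.classification import LogisticRegression
        from pyspark.ml.evaluation import BinaryClassificationEvaluator
        from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    
        # Use enough iterations that each candidate model does real work,
        # so differences between serial and parallel runs would show up.
        lr = LogisticRegression()
        grid = ParamGridBuilder().addGrid(lr.maxIter, [5, 6]).build()
        evaluator = BinaryClassificationEvaluator()
    
        # Assumed API from this PR: a `parallelism` param on CrossValidator.
        cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                            evaluator=evaluator, parallelism=2)
        cvSerialModel = cv.copy({cv.parallelism: 1}).fit(dataset)
        cvParallelModel = cv.fit(dataset)
    
        # Both runs should report the same per-candidate metrics.
        assert cvSerialModel.avgMetrics == cvParallelModel.avgMetrics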


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org