Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/08/14 19:38:00 UTC

[jira] [Updated] (SPARK-28736) pyspark.mllib.clustering fails on JDK11

     [ https://issues.apache.org/jira/browse/SPARK-28736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-28736:
----------------------------------
    Description: 
Build Spark and run the PySpark unit tests with JDK 11. Two `GaussianMixtureModel` doctests in `pyspark.mllib.clustering` fail.

{code}
$ build/sbt -Phadoop-3.2 test:package
$ python/run-tests --testnames 'pyspark.mllib.clustering' --python-executables python
...
File "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", line 386, in __main__.GaussianMixtureModel
Failed example:
    abs(softPredicted[0] - 1.0) < 0.001
Expected:
    True
Got:
    False
**********************************************************************
File "/Users/dongjoon/APACHE/spark-master/python/pyspark/mllib/clustering.py", line 388, in __main__.GaussianMixtureModel
Failed example:
    abs(softPredicted[1] - 0.0) < 0.001
Expected:
    True
Got:
    False
**********************************************************************
   2 of  31 in __main__.GaussianMixtureModel
***Test Failed*** 2 failures.
{code}
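The doctest output above only says `False`, which hides how far `softPredicted` is from the expected value. A minimal sketch of a check that also reports the deviation follows. The helper name `check_close` and the sample numbers are hypothetical, chosen for illustration only; they are not the values observed on JDK 11.

```python
def check_close(actual, expected, tol=0.001):
    """Return (passed, deviation) for the same test the doctest performs.

    The doctest evaluates abs(actual - expected) < tol and prints only the
    boolean; returning the deviation as well makes a failure easier to read.
    """
    deviation = abs(actual - expected)
    return deviation < tol, deviation


# Hypothetical soft predictions, NOT taken from the failing run.
soft_predicted = [0.9995, 0.0005]
expected = [1.0, 0.0]

for i, (a, e) in enumerate(zip(soft_predicted, expected)):
    passed, deviation = check_close(a, e)
    print("softPredicted[%d]: passed=%s deviation=%g" % (i, passed, deviation))
```

Running the same kind of check against the real `softPredicted` values on JDK 8 and JDK 11 would show whether the result is slightly outside the 0.001 tolerance or qualitatively different.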

  was:
Build Spark and run the PySpark unit tests with JDK 11. The last `assertTrue`, shown commented out below, failed.

{code}
$ build/sbt -Phadoop-3.2 test:package
$ python/run-tests --testnames 'pyspark.ml.tests.test_algorithms' --python-executables python
...
======================================================================
FAIL: test_raw_and_probability_prediction (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dongjoon/APACHE/spark-master/python/pyspark/ml/tests/test_algorithms.py", line 89, in test_raw_and_probability_prediction
    self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1E-4))
AssertionError: False is not true
{code}

{code:python}
class MultilayerPerceptronClassifierTest(SparkSessionTestCase):
    def test_raw_and_probability_prediction(self):
        data_path = "data/mllib/sample_multiclass_classification_data.txt"
        df = self.spark.read.format("libsvm").load(data_path)
        mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[4, 5, 4, 3],
                                             blockSize=128, seed=123)
        model = mlp.fit(df)
        test = self.sc.parallelize([Row(features=Vectors.dense(0.1, 0.1, 0.25, 0.25))]).toDF()
        result = model.transform(test).head()
        expected_prediction = 2.0
        expected_probability = [0.0, 0.0, 1.0]
        expected_rawPrediction = [-11.6081922998, -8.15827998691, 22.17757045]
        self.assertTrue(result.prediction, expected_prediction)
        self.assertTrue(np.allclose(result.probability, expected_probability, atol=1E-4))
        self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1E-4))
        # self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, atol=1E-4))
{code}
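The failing assertion relies on `np.allclose`, which passes only if every element satisfies `|a - b| <= atol + rtol * |b|` (with `rtol` defaulting to 1e-5). As a rough sketch, the same rule can be written in plain Python to show which component of `rawPrediction` exceeds the tolerance. The helper name `allclose_report` and the `actual` values are hypothetical and only illustrate the check; they are not the values produced on JDK 11.

```python
def allclose_report(actual, expected, rtol=1e-5, atol=1e-8):
    """Apply the np.allclose rule element by element.

    Returns one (index, difference, limit, ok) tuple per element, where
    ok is True when difference <= atol + rtol * abs(expected value).
    """
    rows = []
    for i, (a, b) in enumerate(zip(actual, expected)):
        diff = abs(a - b)
        limit = atol + rtol * abs(b)
        rows.append((i, diff, limit, diff <= limit))
    return rows


# Expected values copied from the test above.
expected_rawPrediction = [-11.6081922998, -8.15827998691, 22.17757045]
# Hypothetical result, NOT the values observed in the failing run.
actual = [-11.6081922998, -8.15827998691, 22.18]

for i, diff, limit, ok in allclose_report(actual, expected_rawPrediction, atol=1e-4):
    print("element %d: diff=%g limit=%g ok=%s" % (i, diff, limit, ok))
```

Printing the per-element differences for the real `result.rawPrediction` would show how large the JDK 11 discrepancy is relative to `atol=1E-4`.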


> pyspark.mllib.clustering fails on JDK11
> ---------------------------------------
>
>                 Key: SPARK-28736
>                 URL: https://issues.apache.org/jira/browse/SPARK-28736
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>


