Posted to issues@spark.apache.org by "huangtianhua (Jira)" <ji...@apache.org> on 2019/09/23 06:51:00 UTC

[jira] [Commented] (SPARK-29205) Pyspark tests failed for suspected performance problem on ARM

    [ https://issues.apache.org/jira/browse/SPARK-29205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935578#comment-16935578 ] 

huangtianhua commented on SPARK-29205:
--------------------------------------

We also found that the community faced a similar issue before: https://github.com/apache/spark/commit/ab76900fedc05df7080c9b6c81d65a3f260c1c26#diff-f7e50078760ce2d40f35e4c3b9112227. If we increase the timeout, the tests pass.
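
To illustrate why a longer timeout makes a difference, here is a minimal sketch of a polling helper in the spirit of the `_eventually` method that appears in the tracebacks below. It is not the actual Spark code: the name `eventually`, the `interval` parameter, and the 30-second default are assumptions made for this example. Only the general shape (a condition, a timeout, and optional catching of assertions) is taken from the calls visible in the tracebacks.

```python
import time


def eventually(condition, timeout=30.0, catch_assertions=False, interval=0.01):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Hypothetical sketch, not the Spark implementation. If catch_assertions
    is True, an AssertionError raised by condition() is remembered and
    re-raised only when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    last_value = None
    while time.monotonic() < deadline:
        if catch_assertions:
            try:
                last_value = condition()
            except AssertionError as e:
                last_value = e
        else:
            last_value = condition()
        if last_value is True:
            return
        time.sleep(interval)
    # Timed out: surface the most recent failure.
    if isinstance(last_value, AssertionError):
        raise last_value
    raise AssertionError(
        "Test failed due to timeout after %g sec, with last condition returning: %s"
        % (timeout, last_value))
```

With a helper like this, a condition that needs longer to become true on a slower machine fails with a short timeout and passes once the timeout is raised, which matches what we observe on the ARM VM.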

> Pyspark tests failed for suspected performance problem on ARM
> -------------------------------------------------------------
>
>                 Key: SPARK-29205
>                 URL: https://issues.apache.org/jira/browse/SPARK-29205
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.0
>         Environment: OS: Ubuntu16.04
> Arch: aarch64
> Host: Virtual Machine
>            Reporter: zhao bo
>            Priority: Major
>
> We tested PySpark on an ARM VM and found that some tests fail. Once we change the source code to extend the wait time so that the test tasks can finish, the tests pass.
>  
> The affected test cases include:
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLinearRegressionWithTests.test_parameter_convergence
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_convergence
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_parameter_accuracy
> pyspark.mllib.tests.test_streaming_algorithms:StreamingLogisticRegressionWithSGDTests.test_training_and_prediction
> The error messages for the failing tests above:
> ======================================================================
> FAIL: test_parameter_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLinearRegressionWithTests)
> Test that the model parameters improve with streaming data.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 429, in test_parameter_convergence
>     self._eventually(condition, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 425, in condition
>     self.assertEqual(len(model_weights), len(batches))
> AssertionError: 6 != 10
>  
>  
> ======================================================================
> FAIL: test_convergence (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 292, in test_convergence
>     self._eventually(condition, 60.0, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 288, in condition
>     self.assertEqual(len(models), len(input_batches))
> AssertionError: 19 != 20
>  
> ======================================================================
> FAIL: test_parameter_accuracy (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 266, in test_parameter_accuracy
>     self._eventually(condition, catch_assertions=True)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 74, in _eventually
>     raise lastValue
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 65, in _eventually
>     lastValue = condition()
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 263, in condition
>     self.assertAlmostEqual(rel, 0.1, 1)
> AssertionError: 0.21309223935797794 != 0.1 within 1 places
>  
> ======================================================================
> FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
> Test that the model improves on toy data with no. of batches
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
>     self._eventually(condition, timeout=60.0)
>   File "/usr/local/src/hth/spark/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
>     % (timeout, lastValue))
> AssertionError: Test failed due to timeout after 60 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74, 0.76, 0.78, 0.7, 0.78, 0.8
>  
>  
> Would it be possible to extend the wait time so that the jobs can run to completion, as long as the test results are not otherwise changed? Any help or advice is welcome. Thanks.
>  


