Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2022/05/25 20:13:08 UTC

[GitHub] [incubator-mxnet] waytrue17 opened a new issue, #21040: [v1.x][CI] test_laop failed on CI

waytrue17 opened a new issue, #21040:
URL: https://github.com/apache/incubator-mxnet/issues/21040

   ## Description
   On the v1.x CI, [test_operator_gpu.test_laop](https://github.com/apache/incubator-mxnet/blob/2a381e83de6335b9a0d09379f28284a8d29808d6/tests/python/unittest/test_operator.py#L6537) fails the numeric gradient check for `test_potri` by a large margin: at least 68.75% of the 16 gradient elements exceed the tolerance (rtol=1e-05, atol=1e-05).
   ```
   [2022-05-25T18:08:54.036Z] ======================================================================
   [2022-05-25T18:08:54.036Z] FAIL: test_operator_gpu.test_laop
   [2022-05-25T18:08:54.036Z] ----------------------------------------------------------------------
   [2022-05-25T18:08:54.036Z] Traceback (most recent call last):
   [2022-05-25T18:08:54.036Z]   File "/usr/local/lib/python3.7/dist-packages/nose/case.py", line 198, in runTest
   [2022-05-25T18:08:54.037Z]     self.test(*self.arg)
   [2022-05-25T18:08:54.037Z]   File "/usr/local/lib/python3.7/dist-packages/nose/util.py", line 620, in newfunc
   [2022-05-25T18:08:54.037Z]     return func(*arg, **kw)
   [2022-05-25T18:08:54.037Z]   File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 218, in test_new
   [2022-05-25T18:08:54.037Z]     orig_test(*args, **kwargs)
   [2022-05-25T18:08:54.037Z]   File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6593, in test_laop
   [2022-05-25T18:08:54.037Z]     check_fw_grad(test_potri, [data_in], [res_potri])
   [2022-05-25T18:08:54.037Z]   File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6558, in check_fw_grad
   [2022-05-25T18:08:54.037Z]     atol=atol_bw, dtype=dtype)
   [2022-05-25T18:08:54.037Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 1238, in check_numeric_gradient
   [2022-05-25T18:08:54.037Z]     ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
   [2022-05-25T18:08:54.037Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 749, in assert_almost_equal
   [2022-05-25T18:08:54.037Z]     raise AssertionError(msg)
   [2022-05-25T18:08:54.037Z] AssertionError: 
   [2022-05-25T18:08:54.037Z] Items are not equal:
   [2022-05-25T18:08:54.037Z] Error 62764.305585 exceeds tolerance rtol=1.000000e-05, atol=1.000000e-05 (mismatch at least 68.750000%).
   [2022-05-25T18:08:54.037Z] Location of maximum error: (1, 2, 0, 0), NUMERICAL_data1=-0.00285029, BACKWARD_data1=-1.69324987
   [2022-05-25T18:08:54.037Z]  ACTUAL: array([[[[-0.00179968]],
   [2022-05-25T18:08:54.037Z] 
   [2022-05-25T18:08:54.037Z]         [[-0.00871502]],...
   [2022-05-25T18:08:54.037Z]  DESIRED: array([[[[-0.63388058]],
   [2022-05-25T18:08:54.037Z] 
   [2022-05-25T18:08:54.037Z]         [[-1.51251872]],...
   [2022-05-25T18:08:54.037Z] -------------------- >> begin captured stdout << ---------------------
   [2022-05-25T18:08:54.037Z] 
   [2022-05-25T18:08:54.037Z] *** Maximum errors for vector of size 16:  rtol=1e-05, atol=1e-05
   [2022-05-25T18:08:54.037Z] 
   [2022-05-25T18:08:54.037Z]   1: Error 62764.305585  Location of error: (1, 2, 0, 0), NUMERICAL_data1=-0.00285029, BACKWARD_data1=-1.69324987
   [2022-05-25T18:08:54.037Z]   2: Error 61551.914711  Location of error: (2, 3, 0, 0), NUMERICAL_data1=-0.00620331, BACKWARD_data1=-1.61704400
   [2022-05-25T18:08:54.037Z]   3: Error 60891.178802  Location of error: (2, 0, 0, 0), NUMERICAL_data1=-0.02516404, BACKWARD_data1=-1.62131153
   [2022-05-25T18:08:54.037Z]   4: Error 59852.437601  Location of error: (0, 1, 0, 0), NUMERICAL_data1=-0.00871502, BACKWARD_data1=-1.51251872
   [2022-05-25T18:08:54.037Z]   5: Error 59245.766913  Location of error: (0, 3, 0, 0), NUMERICAL_data1=-0.29727407, BACKWARD_data1=-2.18316398
   [2022-05-25T18:08:54.037Z]   6: Error 57879.102497  Location of error: (0, 2, 0, 0), NUMERICAL_data1=-0.00213730, BACKWARD_data1=-1.37919265
   [2022-05-25T18:08:54.037Z]   7: Error 57522.249004  Location of error: (3, 1, 0, 0), NUMERICAL_data1=-0.00394569, BACKWARD_data1=-1.36346243
   [2022-05-25T18:08:54.037Z]   8: Error 54722.891910  Location of error: (3, 0, 0, 0), NUMERICAL_data1=-0.00189854, BACKWARD_data1=-1.21281479
   [2022-05-25T18:08:54.037Z]   9: Error 53389.739167  Location of error: (1, 1, 0, 0), NUMERICAL_data1=-0.11656958, BACKWARD_data1=-1.39554458
   [2022-05-25T18:08:54.037Z]  10: Error 51433.366492  Location of error: (1, 0, 0, 0), NUMERICAL_data1=-0.00173236, BACKWARD_data1=-1.06259378
   [2022-05-25T18:08:54.037Z] 
   [2022-05-25T18:08:54.037Z] --------------------- >> end captured stdout << ----------------------
   [2022-05-25T18:08:54.037Z] -------------------- >> begin captured logging << --------------------
   [2022-05-25T18:08:54.037Z] common: WARNING: Error seen with seeded test, use MXNET_TEST_SEED=563683812 to reproduce.
   [2022-05-25T18:08:54.037Z] --------------------- >> end captured logging << ---------------------
   ```
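
For readers wondering how an "Error" of ~62764 arises from gradients of magnitude ~1, the sketch below recomputes it from the values printed in the log. It assumes the check scores each element as `|actual - desired| / (atol + rtol * |desired|)`; that definition is inferred from the numbers in the log, not quoted from `test_utils.py`, and the names `ROWS` and `violation` are made up for this illustration.

```python
# Two rows copied from the captured stdout in the log:
# (location, NUMERICAL_data1, BACKWARD_data1).
ROWS = [
    ((1, 2, 0, 0), -0.00285029, -1.69324987),
    ((0, 1, 0, 0), -0.00871502, -1.51251872),
]

RTOL = 1e-05  # rtol reported in the log
ATOL = 1e-05  # atol reported in the log


def violation(actual, desired, rtol=RTOL, atol=ATOL):
    """Assumed metric: absolute difference divided by the allowed tolerance.

    A value above 1 means the element is outside the tolerance.
    """
    return abs(actual - desired) / (atol + rtol * abs(desired))


# Score each row, treating the backward gradient as the reference value.
errors = {loc: violation(num, bwd) for loc, num, bwd in ROWS}

for loc, err in errors.items():
    print(loc, round(err, 2))
```

Under that assumption the two rows come out at roughly 62764.31 and 59852.44, matching the reported 62764.305585 and 59852.437601 up to the rounding of the printed inputs. The large figure is therefore a ratio to the tolerance, not an absolute difference: the two gradients differ by about 1.7, which is tens of thousands of times the allowed ~2.7e-05.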
   
   ## Occurrences
   Re-ran the test more than 10 times; only a few runs passed: https://jenkins.mxnet-ci.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/activity?branch=PR-21039
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org