Posted to dev@mxnet.apache.org by Marco de Abreu <ma...@googlemail.com> on 2018/05/09 12:32:15 UTC

Problems with test_sparse_operator.test_sparse_mathematical_core

Hello,

I'm currently working on auto scaling and encountering a consistent test
failure on CPU. At the moment, I'm not really sure what's causing this,
considering the setup should be identical.

http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/ci-master/557/pipeline/694

======================================================================

FAIL: test_sparse_operator.test_sparse_mathematical_core

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in
runTest

    self.test(*self.arg)

  File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new

    orig_test(*args, **kwargs)

  File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
1084, in test_sparse_mathematical_core

    density=density, ograd_density=ograd_density)

  File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
1056, in check_mathematical_core

    density=density, ograd_density=ograd_density)

  File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
698, in check_sparse_mathematical_core

    assert_almost_equal(arr_grad, input_grad, equal_nan=True)

  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in
assert_almost_equal

    raise AssertionError(msg)

AssertionError:

Items are not equal:

Error nan exceeds tolerance rtol=0.000010, atol=0.000000.  Location of
maximum error:(0, 0), a=inf, b=-inf

 a: array([[inf],

       [inf],

       [inf],...

 b: array([[-inf],

       [-inf],

       [-inf],...

-------------------- >> begin captured stdout << ---------------------

pass 0

0.0, 0.0, False

--------------------- >> end captured stdout << ----------------------

-------------------- >> begin captured logging << --------------------

common: INFO: Setting test np/mx/python random seeds, use
MXNET_TEST_SEED=2103230797 to reproduce.

--------------------- >> end captured logging << ---------------------
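The key symptom above is that a and b are infinities of opposite sign, and the relative error between +inf and -inf is nan, which trips the tolerance check even with equal_nan=True. A minimal numpy sketch of that failure mode (an illustration, not MXNet's actual assert_almost_equal):

```python
import numpy as np

# Opposite-sign infinities, as in the gradient comparison above.
a = np.array([[np.inf], [np.inf], [np.inf]])
b = np.array([[-np.inf], [-np.inf], [-np.inf]])

# Relative error |a - b| / (|a| + |b|) evaluates to inf / inf = nan,
# which is why the message reads "Error nan exceeds tolerance".
with np.errstate(invalid="ignore"):
    rel_err = np.abs(a - b) / (np.abs(a) + np.abs(b))
print(rel_err)

# equal_nan=True only pardons nan-vs-nan; +inf and -inf are still unequal.
print(np.allclose(a, b, rtol=1e-5, atol=0, equal_nan=True))  # False
```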


Does this ring any bells?

Thanks in advance!

-Marco

Re: Problems with test_sparse_operator.test_sparse_mathematical_core

Posted by Marco de Abreu <ma...@googlemail.com>.
Thanks Haibin, it was in fact related to scipy. The latest version 1.1.0
causes this test to fail. Switching back to 1.0.1 makes the test pass. I
have created an issue at [1].

I'm not familiar with Scipy internals and don't really know where to start
on a fix. For now, I have submitted a PR [2] to pin the version to 1.0.1.
Nonetheless, it would be good if we could get this fixed properly. Could
somebody assist me here?

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/issues/10901
[2]: https://github.com/apache/incubator-mxnet/pull/10902
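
For reference, a version pin like the one in [2] would look roughly like this in a pip requirements file (a hypothetical fragment; the actual file touched in the PR may differ):

```
scipy==1.0.1
```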




Re: Problems with test_sparse_operator.test_sparse_mathematical_core

Posted by Marco de Abreu <ma...@googlemail.com>.
Hi Haibin,

auto scaling is currently not enabled on MXNet Apache CI. This only happens
on my test environment. Thanks for the hint with Scipy, I will definitely
look into this!

That's a good idea. I have spoken to Steffen in the last days and we
brainstormed some ideas how to handle test failures. We will let the
community know if we have a more detailed plan.

Best regards,
Marco


Re: Problems with test_sparse_operator.test_sparse_mathematical_core

Posted by Haibin Lin <ha...@gmail.com>.
Hi Marco,

Is auto scaling already enabled on MXNet Apache CI, or does this only happen
in your setup? I see the test is using scipy. Do both environments have the
same version of scipy installed?

I have recently seen lots of test failures on mxnet master. One item on my
wish list is a database that stores all occurrences of test failures along
with their commit ids, which would be very helpful for initially diagnosing
which code changes potentially introduced bugs. Otherwise, clicking through
all past tests and reading those logs requires a lot of manual work.

Best,
Haibin
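
The failure-tracking database described above could be sketched like this (a hypothetical minimal schema; table and column names are illustrative, not an existing MXNet tool):

```python
import sqlite3

# One row per observed test failure, keyed by test name and commit id.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test_failures (
        test_name   TEXT NOT NULL,
        commit_id   TEXT NOT NULL,
        observed_at TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO test_failures VALUES (?, ?, ?)",
    ("test_sparse_operator.test_sparse_mathematical_core",
     "abc1234", "2018-05-09T12:32:15Z"),
)

# First diagnosis query: which commits coincide with failures of a test?
rows = conn.execute(
    "SELECT commit_id, COUNT(*) FROM test_failures "
    "WHERE test_name = ? GROUP BY commit_id",
    ("test_sparse_operator.test_sparse_mathematical_core",),
).fetchall()
print(rows)  # [('abc1234', 1)]
```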
