Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/02/18 13:39:30 UTC

[GitHub] [incubator-mxnet] leezu opened a new issue #19915: test_subgraph_exe1 fails on windows

leezu opened a new issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915


   https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-19908/2/pipeline


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] mseth10 commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-827800958


   Are we still seeing this error? @leezu 




[GitHub] [incubator-mxnet] mseth10 commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
mseth10 commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-782790891


   The error occurs for the network
   ```
       data1 = mx.sym.Variable('data1', shape=(3, 3, 10, 10), dtype=np.float32)
       data2 = mx.sym.Variable('data2', shape=(1, 0, 2, 2))
       data3 = mx.sym.sin(data2)
       conv = mx.sym.Convolution(data=data1, weight=data3, kernel=(2, 2), num_filter=1)
       return (conv, ['data1'], [(3, 3, 10, 10)])
   ```
   when calling `simple_bind`, during `infer_shape`, and it is flaky.
   
   @samskalicky Do you think we can change the shape of `data2` from `(1, 0, 2, 2)` to `(1, 3, 2, 2)`? Or is the 0 intended to be filled in during shape inference?




[GitHub] [incubator-mxnet] DickJC123 commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-858201350


   I recently set up master with an internal build/CI system and see the reported failure on Linux, but so far only on the CI machines when running the full test suite. The test_subgraph_exe* tests pass when run individually on a non-CI machine. The failure I'm seeing matches the reported one:
   ```
   Shape inconsistent, Provided = [1,0,2,2], inferred shape=(1,3,2,2)
   ```
   This error text comes from the macro SHAPE_ASSIGN_CHECK, which calls shape_assign():
   https://github.com/apache/incubator-mxnet/blob/master/src/operator/operator_common.h#L157-L181
   
   My confusion is in the interpretation of the shape [1,0,2,2]. It seems the test author wanted the C-dimension of this input weight tensor shape to be inferred. However, shape_assign() seems to be applying the 'np_shape' view of the shape, where a 0 represents a known zero-size dimension (so incompatible with [1,3,2,2]). I wonder if a 'use_np_shape' mode is being non-deterministically applied somehow to this test. Thoughts, anyone?
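
The two readings of `[1,0,2,2]` described above can be illustrated with a small pure-Python model. This is only a sketch of the idea, not MXNet's `shape_assign()`: the `UNKNOWN = -1` marker, the `to_backend_shape` helper and the `np_shape` flag are assumptions made for illustration.

```python
# Simplified model of the two shape conventions; NOT MXNet's actual code.
UNKNOWN = -1  # assumed marker for "dimension not known yet"


def to_backend_shape(shape, np_shape):
    """Translate a user-provided shape into the model's internal convention.

    Legacy mode (np_shape=False): a 0 means "unknown, please infer".
    NumPy mode (np_shape=True): a 0 is a real, known zero-size dimension.
    """
    if np_shape:
        return tuple(shape)
    return tuple(UNKNOWN if dim == 0 else dim for dim in shape)


def shape_assign(provided, inferred):
    """Merge an inferred shape into a provided one.

    Returns the merged shape, or None if the two are inconsistent.
    """
    if len(provided) != len(inferred):
        return None
    merged = []
    for p, q in zip(provided, inferred):
        if p == UNKNOWN:
            merged.append(q)      # fill in the unknown dimension
        elif q == UNKNOWN or p == q:
            merged.append(p)      # nothing new, or both sides agree
        else:
            return None           # two known dimensions disagree
    return tuple(merged)


weight = (1, 0, 2, 2)     # shape given for `data2` in the test
inferred = (1, 3, 2, 2)   # shape Convolution infers for its weight

legacy = shape_assign(to_backend_shape(weight, np_shape=False), inferred)
numpy_mode = shape_assign(to_backend_shape(weight, np_shape=True), inferred)

print(legacy)      # (1, 3, 2, 2)
print(numpy_mode)  # None, i.e. "Shape inconsistent"
```

Under this model the same test input succeeds or fails depending only on which convention is active, which is consistent with the hypothesis that the mode is being toggled non-deterministically.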
   




[GitHub] [incubator-mxnet] leezu commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-828038860


   The test is currently disabled on Windows:
   
   https://github.com/apache/incubator-mxnet/blob/5722f8b38af58c5a296e46ca695bfaf7cff85040/tests/python/unittest/test_subgraph_op.py#L126-L127
   
   If you think it has been fixed, let's re-enable it :)
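
For readers unfamiliar with how a test is disabled on a single platform, here is a minimal sketch of the pattern. It uses the standard library's `unittest.skipIf` to stay dependency-free, whereas the linked test file uses pytest; the class name, test name and skip reason below are hypothetical and not copied from the MXNet source.

```python
import sys
import unittest

IS_WINDOWS = sys.platform == "win32"


class SubgraphSmokeTest(unittest.TestCase):
    # Hypothetical stand-in for a subgraph test that is flaky on Windows.
    @unittest.skipIf(IS_WINDOWS, "flaky on Windows, see issue #19915")
    def test_placeholder(self):
        # The real test would bind a partitioned symbol and compare outputs.
        self.assertEqual(1 + 1, 2)


def run():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SubgraphSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Re-enabling the test then amounts to removing the skip decorator and watching the Windows pipeline over several runs, since a single green run says little about a flaky failure.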




[GitHub] [incubator-mxnet] leezu commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-786311783


   Yes. Maybe there was a change to the Windows CI infrastructure that triggered this. I'm not sure.




[GitHub] [incubator-mxnet] leezu edited a comment on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-781353480


   The first related error I can find on the master branch windows-cpu pipeline is https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/master/2455/pipeline, which ran on commit https://github.com/apache/incubator-mxnet/commit/e164ceeb2c4b5fb8cacdac1f0cced683a80b70b0
   
   ```
   [2021-02-16T21:06:05.273Z] _______________ test_subgraph_exe4[sym14-op_names14-default_v2] _______________
   [2021-02-16T21:06:05.273Z] [gw0] win32 -- Python 3.7.0 C:\Python37\python.exe
   [2021-02-16T21:06:05.273Z] 
   [2021-02-16T21:06:05.273Z] sym = <Symbol convolution38>, subgraph_backend = 'default_v2'
   [2021-02-16T21:06:05.273Z] op_names = ['sin', 'Convolution']
   [2021-02-16T21:06:05.273Z] 
   [2021-02-16T21:06:05.273Z]     @pytest.mark.parametrize('subgraph_backend', ['default', 'default_v2'])
   [2021-02-16T21:06:05.273Z]     @pytest.mark.parametrize('sym,op_names', get_graphs())
   [2021-02-16T21:06:05.273Z]     def test_subgraph_exe4(sym, subgraph_backend, op_names):
   [2021-02-16T21:06:05.273Z]         """Use env var MXNET_SUBGRAPH_BACKEND=default to trigger graph partitioning in bind
   [2021-02-16T21:06:05.273Z]         and compare results of the partitioned sym and the original sym."""
   [2021-02-16T21:06:05.273Z]         def get_executor(sym, subgraph_backend=None, op_names=None, original_exec=None):
   [2021-02-16T21:06:05.273Z]             arg_shapes, _, aux_shapes = sym.infer_shape()
   [2021-02-16T21:06:05.273Z]             if subgraph_backend is None:
   [2021-02-16T21:06:05.273Z]                 arg_array = [mx.nd.random.uniform(shape=shape) for shape in arg_shapes]
   [2021-02-16T21:06:05.273Z]                 aux_array = [mx.nd.random.uniform(shape=shape) for shape in aux_shapes]
   [2021-02-16T21:06:05.273Z]             else:
   [2021-02-16T21:06:05.273Z]                 arg_array = None
   [2021-02-16T21:06:05.273Z]                 aux_array = None
   [2021-02-16T21:06:05.273Z]             exe = sym._bind(ctx=mx.current_context(),
   [2021-02-16T21:06:05.273Z]                            args=arg_array if subgraph_backend is None else original_exec.arg_arrays,
   [2021-02-16T21:06:05.273Z]                            aux_states=aux_array if subgraph_backend is None else original_exec.aux_arrays,
   [2021-02-16T21:06:05.273Z]                            grad_req='null')
   [2021-02-16T21:06:05.273Z]             exe.forward()
   [2021-02-16T21:06:05.273Z]             return exe
   [2021-02-16T21:06:05.273Z]     
   [2021-02-16T21:06:05.273Z]         sym, _, _ = sym
   [2021-02-16T21:06:05.273Z] >       original_exec = get_executor(sym)
   [2021-02-16T21:06:05.273Z] 
   [2021-02-16T21:06:05.273Z] tests\python\unittest\test_subgraph_op.py:237: 
   [2021-02-16T21:06:05.273Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   [2021-02-16T21:06:05.273Z] tests\python\unittest\test_subgraph_op.py:222: in get_executor
   [2021-02-16T21:06:05.273Z]     arg_shapes, _, aux_shapes = sym.infer_shape()
   [2021-02-16T21:06:05.273Z] windows_package\python\mxnet\symbol\symbol.py:1132: in infer_shape
   [2021-02-16T21:06:05.273Z]     res = self._infer_shape_impl(False, *args, **kwargs)
   [2021-02-16T21:06:05.273Z] windows_package\python\mxnet\symbol\symbol.py:1267: in _infer_shape_impl
   [2021-02-16T21:06:05.273Z]     ctypes.byref(complete)))
   [2021-02-16T21:06:05.273Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   [2021-02-16T21:06:05.273Z] 
   [2021-02-16T21:06:05.273Z] ret = -1
   [2021-02-16T21:06:05.273Z] 
   [2021-02-16T21:06:05.273Z]     def check_call(ret):
   [2021-02-16T21:06:05.273Z]         """Check the return value of C API call.
   [2021-02-16T21:06:05.273Z]     
   [2021-02-16T21:06:05.273Z]         This function will raise an exception when an error occurs.
   [2021-02-16T21:06:05.273Z]         Wrap every API call with this function.
   [2021-02-16T21:06:05.273Z]     
   [2021-02-16T21:06:05.273Z]         Parameters
   [2021-02-16T21:06:05.273Z]         ----------
   [2021-02-16T21:06:05.273Z]         ret : int
   [2021-02-16T21:06:05.273Z]             return value from API calls.
   [2021-02-16T21:06:05.273Z]         """
   [2021-02-16T21:06:05.273Z]         if ret != 0:
   [2021-02-16T21:06:05.273Z] >           raise get_last_ffi_error()
   [2021-02-16T21:06:05.273Z] E           mxnet.base.MXNetError: MXNetError: Error in operator convolution38: Shape inconsistent, Provided = [1,0,2,2], inferred shape=(1,3,2,2)
   ```




[GitHub] [incubator-mxnet] samskalicky edited a comment on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
samskalicky edited a comment on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-783605055


   No idea. If it's flaky, then it's working (sometimes), and we should figure out why it fails. Just changing the inputs is not a good way to "fix" this, but it might be a good place to start debugging if that makes the problem go away consistently. It shouldn't be the final resolution, though; that just hides the problem.




[GitHub] [incubator-mxnet] samskalicky commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
samskalicky commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-786297723


   So these tests pass on Linux but are flaky on Windows? Is that the current state of things?




[GitHub] [incubator-mxnet] leezu commented on issue #19915: test_subgraph_exe1 fails on windows

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19915:
URL: https://github.com/apache/incubator-mxnet/issues/19915#issuecomment-786167250


   This essentially blocks the master CI. I marked more subgraph tests to be disabled on Windows in https://github.com/apache/incubator-mxnet/pull/19908

