Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/08/16 02:25:13 UTC

[GitHub] [incubator-mxnet] leeyeetonn opened a new issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

leeyeetonn opened a new issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936


   ## Description
   `mxnet.ndarray.op.random_pdf_dirichlet` crashes with a floating point exception when the shape of the given `sample` contains a 0 (i.e. `sample` is a zero-size array). Please see the code below for an example.
   ### Error Message
   >Floating point exception (core dumped)
   ## To Reproduce
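   ```python
   import mxnet
   import numpy as np
   
   # `sample` is a zero-size array: its shape (4, 0) contains a 0.
   sample = mxnet.nd.array(np.random.rand(4, 0))
   alpha = mxnet.nd.array(np.random.rand(1))
   # With mxnet 1.6.0 this call aborts with "Floating point exception (core dumped)".
   mxnet.ndarray.op.random_pdf_dirichlet(sample=sample, alpha=alpha)
   ```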
   ### Steps to reproduce
   
   1. Run the provided code in a Python interpreter or as a script.
   
   ## Environment
   
   We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   ```
   The script URL returned a 404 error, so no diagnostic output is available.
   
   Some environment information:
   
   * OS: Ubuntu 18.04
   * Python: 3.7.6
   * pip: 20.0.2
   * numpy: 1.18.5
   * mxnet: 1.6.0


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha edited a comment on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
szha edited a comment on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-678062954


   So here's the problem:
   ```
   % DMLC_LOG_STACK_TRACE_DEPTH=150 MXNET_ENGINE_TYPE=NaiveEngine lldb python3.7 -- test_18936.py
   (lldb) target create "python3.7"
   Current executable set to 'python3.7' (x86_64).
   (lldb) settings set -- target.run-args  "test_18936.py"
   (lldb) run
   Process 27100 launched: '/usr/local/bin/python3.7' (x86_64)
   Process 27100 stopped
   * thread #2, stop reason = exec
       frame #0: 0x0000000100006000 dyld`_dyld_start
   dyld`_dyld_start:
   ->  0x100006000 <+0>: popq   %rdi
       0x100006001 <+1>: pushq  $0x0
       0x100006003 <+3>: movq   %rsp, %rbp
       0x100006006 <+6>: andq   $-0x10, %rsp
   (lldb) cont
   Process 27100 resuming
   [23:14:55] ../src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
   [23:14:55] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
   Process 27100 stopped
   * thread #2, queue = 'com.apple.main-thread', stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)
       frame #0: 0x0000000116f5b318 libmxnet.dylib`void mxnet::op::mxnet_op::Kernel<mxnet::op::LaunchExWrapper<mxnet::op::PDF_Dirichlet<false> >, mshadow::cpu>::LaunchEx<int, int, float*, float*, float*>(mshadow::Stream<mshadow::cpu>*, unsigned long, int, int, float*, float*, float*) at pdf_op.h:443
      440 	    index_t i = start;
      441
      442 	    // Get aligned
   -> 443 	    const index_t align_step = sample_size - (i % sample_size);
      444 	    const index_t first_stride = length > align_step ? align_step : length;
      445 	    OP::Map(i, first_stride, sample_size, args...);
      446 	    i += first_stride;
   ```
   
   https://github.com/apache/incubator-mxnet/blob/9bdd4d6347c284770ee5bfe5ae98f1dabc283829/src/operator/random/pdf_op.h#L443
   
   The code needs to guard against a zero `sample_size` (the right operand of `%`), which occurs for zero-size arrays, and we should add a smoke test to guard against this kind of problem in this op, similar to https://github.com/apache/incubator-mxnet/pull/18972/files
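A minimal sketch of that guard, written as a hypothetical Python model of the chunking loop shown at `pdf_op.h:443` rather than the real C++ kernel. The function name `chunk_ranges`, its `(offset, stride)` return format, and the loop after the first stride are assumptions, since the trace only shows lines 440 to 446. In Python, `i % 0` raises `ZeroDivisionError`, whereas the compiled kernel trapped with `EXC_I386_DIV` in the trace above. Returning early when `sample_size` is 0 means the modulo is never evaluated:

```python
def chunk_ranges(start, length, sample_size):
    """Hypothetical model of the aligned chunking loop in pdf_op.h.

    Splits the flat index range [start, start + length) into
    (offset, stride) pieces that never cross a boundary between
    consecutive samples of `sample_size` elements.
    """
    # Guard: an empty range or a zero-size sample means there is no work,
    # and for sample_size == 0 the modulo below would divide by zero.
    if sample_size == 0 or length == 0:
        return []
    chunks = []
    i = start
    end = start + length
    # First piece: advance `i` up to the next sample boundary
    # (mirrors align_step / first_stride at pdf_op.h:443-444).
    align_step = sample_size - (i % sample_size)
    first_stride = min(length, align_step)
    chunks.append((i, first_stride))
    i += first_stride
    # Remaining pieces (assumed): whole samples, then a possibly shorter tail.
    while i < end:
        stride = min(sample_size, end - i)
        chunks.append((i, stride))
        i += stride
    return chunks


# Without the guard, the modulo itself fails when sample_size is 0.
i, sample_size = 0, 0
try:
    i % sample_size
except ZeroDivisionError:
    print("unguarded: i % sample_size divides by zero")

print(chunk_ranges(0, 0, 0))  # [] (zero-size sample: nothing to compute)
print(chunk_ranges(1, 5, 3))  # [(1, 2), (3, 3)]
```

Returning no chunks, rather than raising, lets a zero-size input produce a zero-size output; a smoke test for this op could assert exactly that.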





[GitHub] [incubator-mxnet] szha edited a comment on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
szha edited a comment on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-678064250


   @xidulu since we are deprecating ndarray in favor of np/npx, do we need to register an alias of this op in np/npx? (Or is it already registered?)





[GitHub] [incubator-mxnet] szha commented on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-677755384


   Yes, an FPE should no longer abort the program now. The bug itself still needs to be fixed.





[GitHub] [incubator-mxnet] leeyeetonn commented on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
leeyeetonn commented on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-674471057


   @szha Thanks for your feedback! I agree: these inputs should raise runtime exceptions in Python rather than FPEs. I believe each of them needs some additional input validity checks.
   
   I have a few more cases of FPEs caused by similar inputs. If you don't mind, I'd like to report them as individual issues, which makes them easier to keep track of.
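One possible shape for such a check, sketched as a hypothetical Python helper (`check_nonempty` is an illustrative name, not an MXNet API): it computes the element count from the shape and raises `ValueError`, an ordinary Python exception with a traceback, before any native kernel could divide by a zero extent. This assumes zero-size inputs should be rejected; the alternative is to return an empty result.

```python
from functools import reduce
import operator


def check_nonempty(name, shape):
    """Hypothetical input check run before launching a native kernel.

    Returns the number of elements implied by `shape`, or raises
    ValueError if any axis has length 0 (a zero-size array).
    """
    size = reduce(operator.mul, shape, 1)
    if size == 0:
        raise ValueError(
            "%s must not be a zero-size array, got shape %s" % (name, tuple(shape))
        )
    return size


# Shapes taken from the reproduction script in this issue.
print(check_nonempty("alpha", (1,)))  # 1
try:
    check_nonempty("sample", (4, 0))
except ValueError as err:
    print(err)  # sample must not be a zero-size array, got shape (4, 0)
```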





[GitHub] [incubator-mxnet] szha commented on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-674469336


   @leeyeetonn thanks a lot for identifying and reporting these issues (#18927, #18933, #18934, and this one). It's very helpful.
   
   I think a general problem with these issues is that an FPE exits the program without a stack trace. I will work on improving the signal handler so that it treats an FPE as a regular runtime error instead.





[GitHub] [incubator-mxnet] xidulu commented on issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
xidulu commented on issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936#issuecomment-677740834


   @leeyeetonn
   I guess this issue should now be resolved by https://github.com/apache/incubator-mxnet/pull/18956, authored by @szha (if my understanding is correct).





[GitHub] [incubator-mxnet] szha closed issue #18936: Floating point exception in mxnet.ndarray.op.random_pdf_dirichlet

Posted by GitBox <gi...@apache.org>.
szha closed issue #18936:
URL: https://github.com/apache/incubator-mxnet/issues/18936


   





---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org