Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/10/01 18:35:44 UTC

[GitHub] [incubator-mxnet] Zha0q1 opened a new issue #19265: MKLDNN RNN seg fault

Zha0q1 opened a new issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265


   A customer is experiencing a seg fault when feeding a large input to the MKL-DNN LSTM. I have reduced the code to this:
   ```python
   import mxnet as mx
   from mxnet import gluon, nd, autograd
   from mxnet.gluon import nn, rnn, Trainer

   hidden_size = 30
   num_embed = 100
   vocab_size = 13028  # len(vocab.token_to_idx.keys())

   inp = nd.random.uniform(0, vocab_size, (16758, 500))
   print(inp)

   context = mx.cpu()

   model = nn.Sequential()
   model.add(nn.Embedding(vocab_size, num_embed),                      # Embedding layer
             rnn.LSTM(hidden_size, num_layers=1, bidirectional=True),  # Recurrent (bidirectional) layer
             nn.Dense(3))                                              # Output layer

   model.collect_params().initialize(mx.init.Xavier(), ctx=context)

   val_predictions = model(inp)
   nd.waitall()
   print(val_predictions)
   ```
   I think this is some sort of out-of-memory issue: if we shrink the input (the first dimension of `inp`), there is no seg fault. Still, shall we add an error message here so that users are told to reduce the input size?
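
   Until such a check exists in MXNet, a rough user-side guard is possible. A minimal sketch; the formula is a back-of-envelope estimate loosely modeled on the workspace discussion later in this thread, not MXNet's actual allocation logic, and the 8 GB threshold is arbitrary:
   ```python
   def rough_lstm_workspace_bytes(T, N, C, gates=4, dtype_bytes=4):
       # Crude estimate: gates * seq_len * batch * feature_size * sizeof(float).
       return gates * T * N * C * dtype_bytes

   T, N, C = 16758, 500, 100  # sequence length, batch size, embedding size from the repro
   est = rough_lstm_workspace_bytes(T, N, C)
   if est > 8 * 1024 ** 3:
       # Fires for the repro shape (~12.5 GB estimate) instead of seg faulting later.
       raise MemoryError(f"estimated LSTM workspace ~{est / 1024**3:.1f} GB; reduce the input size")
   ```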
   
   I also noticed that the same input runs fine with `export MXNET_USE_MKLDNN_RNN=0`, but that fallback is about 3x slower than the MKL-DNN implementation.
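
   For reference, the fallback can also be selected from inside a script. A minimal sketch, assuming the variable is read when the RNN operator is created, so setting it before the import is the safe option:
   ```python
   import os

   # Disable the MKL-DNN RNN kernels before MXNet is loaded.
   os.environ["MXNET_USE_MKLDNN_RNN"] = "0"

   import mxnet as mx
   print(mx.__version__)
   ```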
   
   @PatricZhao 



[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
sandeep-krishnamurthy commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-702399869


   @anko-intel 



[GitHub] [incubator-mxnet] mozga-intel edited a comment on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
mozga-intel edited a comment on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-707744679


   Hi,

   When running our pre-model (a simple imitation of the LSTM model) as a test, I create a large LSTM input tensor, for example (20758, 500). You can see that ~170 GB of memory is allocated for scratchpad computations, and the global scratchpad memory mode is always in effect. As a result, depending on the oneDNN version, I got the following error messages:
   1. mkldnn v1.3: `Segmentation fault: 11`
   2. mkldnn v1.6: `mxnet.base.MXNetError: MXNetError: could not create a primitive`

   The error is only visible for a large LSTM tensor, and step-by-step reproduction casts light on the issue. Looking at the code: the standard vanilla-LSTM algorithm in MKLDNN allocates a block of memory of `sizeof(float) * work_space` bytes, where `work_space` is an offset (in bytes). For the given test (input: 20758, 500), ~170 GB of memory is allocated for the scratchpad: `workspace = 47952392192 * sizeof(float) = 191809568768 bytes ~ 170 GB`. If you don't have enough memory, you get one of the two errors above (**1** or **2**). MKLDNN primitives can use either individual memory or a global buffer for intermediate computations: the first may perform better, since the memory will most likely stay attached to a thread; the second may save a lot of memory.

   **For brevity:**
   The input tensor is `T x N x C`; for the given example `(10758, 500)`, T is `10758` and C is `500`. That means we need at least `4 * 10758 * 500 * 500 * 4 bytes ~ 40 GB`, possibly more. Basically, the workspace is comparable to the grid size `n_layers * mb * n_time_stamps * 4 (gates) * max(sic, slc, dhsc)^2`. For the oneDNN versions at hand (1.3 and 1.6), the workspace (i.e. the LSTM scratchpad) is booked as `book<float>(num_elems, ...) ~ 40 GB * sizeof(T) = 40 GB * 4 ~ 160 GB`. The upper bound (the size of the input tensor) has not been clearly defined and is limited only by the physically available memory. So the buffer needed for the LSTM tensor is `4 * 10758 * 500 * 500 * 4 bytes ~ 40 GB`, and this value is then multiplied again by the size of the element type (in this case `<T>` = float).
   Approximately, it should instead be defined as follows:
   1. the workspace size times `sizeof(<T>)`, where `<T>` is `<uint8_t>`, i.e. ~1 byte [potentially];
   2. the workspace limited only by the total number of elements of the given tensor.

   The upper bound of a given tensor (the upper bound of the LSTM) is
   `n^2 * m = memory_space / (16 bytes)`
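
   To make the arithmetic above concrete, here is the same bookkeeping as a small Python check (plain arithmetic only; the constants come straight from this comment):
   ```python
   GB = 1024 ** 3
   sizeof_float = 4

   # Reported scratchpad booking: element count multiplied by sizeof(float).
   workspace_elems = 47952392192
   print(workspace_elems * sizeof_float / GB)    # ~178.6 -> the "~170 GB" figure

   # Lower-bound estimate for the (10758, 500) example: 4 gates * T * C * C * sizeof(float).
   gates, T, C = 4, 10758, 500
   print(gates * T * C * C * sizeof_float / GB)  # ~40.1 -> the "~40 GB" figure
   ```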



[GitHub] [incubator-mxnet] Zha0q1 closed issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
Zha0q1 closed issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265


   



[GitHub] [incubator-mxnet] mozga-intel edited a comment on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
mozga-intel edited a comment on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-703568699


   @Zha0q1 Could you please give me a few more details about this issue, such as the branch name and commit SHA, and which version (commit SHA) of MKLDNN you have? Thanks!





[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-718328187


   > @TaoLv @ciyongch @PatricZhao - Hello guys, can you please help with this issue? We saw at least 2 production users impacted by this; USE_MKLDNN=0 was a temporary fix, but performance is really bad, as expected. This is a blocker.

   Sorry about that; the team is working on fixing any possible issues. Feel free to ping us about any issue :)



[GitHub] [incubator-mxnet] mozga-intel commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
mozga-intel commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-702835605


   Thanks, @Zha0q1 @sandeep-krishnamurthy! I'll have a look at this issue.



[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-713138375


   @mozga-intel Thanks for your investigation! Yes, this improvement is huge and will help our users who run inference tasks on pre-trained models. It would be great to include this fix in the next oneDNN release.



[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-702324807


   seg fault:
   ```
   Segmentation fault: 11
   
   terminate called without an active exception
   Aborted (core dumped)
   ```
   
   GDB:
   ```
   
   Thread 9 "python" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x7fffbac26700 (LWP 18164)]
   bt
   0x00007fff9c0743f0 in ?? ()
   (gdb) bt
   #0  0x00007fff9c0743f0 in ?? ()
   #1  0x00007fffe5e905ec in float** dnnl::impl::memory_tracking::grantor_t::get<float*>(unsigned int const&) const
       () from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #2  0x00007fffe5e93697 in dnnl::impl::cpu::_ref_rnn_common_t<(dnnl_prop_kind_t)64, (dnnl_data_type_t)3, (dnnl_data_type_t)3, (dnnl_data_type_t)3>::execute_(dnnl::impl::exec_ctx_t const&) const ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #3  0x00007fffe5d05de9 in dnnl::impl::cpu::_ref_rnn_common_t<(dnnl_prop_kind_t)64, (dnnl_data_type_t)3, (dnnl_data_type_t)3, (dnnl_data_type_t)3>::execute(dnnl::impl::exec_ctx_t const&) const ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #4  0x00007fffe5890788 in dnnl_primitive_execute ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #5  0x00007fffe0a5eb1a in mxnet::MKLDNNStream::Submit(bool) ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #6  0x00007fffe0b13343 in mxnet::op::MKLDNNRnnOp::Forward(mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #7  0x00007fffe5306633 in mxnet::op::RNNStatefulComputeExCPU(mxnet::OpStatePtr const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #8  0x00007fffe4f503fd in mxnet::imperative::PushOperator(mxnet::OpStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}::operator()(mxnet::RunContext, mxnet::engine::CallbackOnComplete) const () from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #9  0x00007fffe4f506cd in std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushOperator(mxnet::O---Type <return> to continue, or q <return> to quit---
   pStatePtr const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode)::{lambda(mxnet::RunContext)#2}>::_M_invoke(std::_Any_data const&, mxnet::RunContext) () from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #10 0x00007fffe501d754 in std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::engine::ThreadedEngine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext, mxnet::engine::CallbackOnComplete) ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #11 0x00007fffe50180a5 in mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*) () from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #12 0x00007fffe502a294 in std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>) ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #13 0x00007fffe5016934 in std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run() ()
      from /home/ubuntu/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so
   #14 0x00007fffded79421 in std::execute_native_thread_routine_compat (__p=<optimized out>)
       at /home/nwani/m3/conda-bld/compilers_linux-64_1560109574129/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:94
   #15 0x00007ffff7bbd6db in start_thread (arg=0x7fffbac26700) at pthread_create.c:463
   #16 0x00007ffff78e6a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
   ```
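
   For anyone else reproducing this, the standard-library `faulthandler` module gives a Python-level traceback alongside the GDB one; a small sketch (enable it at the very top of the repro script):
   ```python
   import faulthandler
   faulthandler.enable()  # dump the Python traceback if the process receives SIGSEGV

   import mxnet as mx
   # ... then run the model from the issue description ...
   ```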




[GitHub] [incubator-mxnet] mozga-intel commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
mozga-intel commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-712736973


   Hi @Zha0q1 
   There’s a bug in oneDNN LSTM forward inference that results in using ~4x more memory for LSTM workspace in inference cases.
   Could you please tell me whether this fix (see the table below) is acceptable and resolves your issue?

   | (dim: 20756, 500) | Before | After |
   | -- | -- | -- |
   | Total memory needed to allocate the LSTM tensor | 230 GB (~4x more memory) | 56 GB (~4x less memory) |
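
   The ~4x factor is consistent with the earlier analysis: the workspace element count was effectively being scaled by `sizeof(float)` (4 bytes) where ~1 byte per element would do. A trivial sanity check of the table in Python:
   ```python
   before_gb, after_gb = 230, 56
   print(before_gb / after_gb)  # ~4.1, matching the "~4x" reduction in the table
   ```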
   
   






[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
sandeep-krishnamurthy commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-702373078


   @TaoLv @ciyongch @PatricZhao - Hello guys, can you please help with this issue? We saw at least 2 production users impacted by this; USE_MKLDNN=0 was a temporary fix, but performance is really bad, as expected. This is a blocker.





[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19265: MKLDNN RNN seg fault

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19265:
URL: https://github.com/apache/incubator-mxnet/issues/19265#issuecomment-703777997


   I am using mxnet 1.7 (https://github.com/apache/incubator-mxnet/releases/tag/1.7.0) from `pip install mxnet`. The machine was a c5.9xlarge EC2 instance with the Ubuntu 18 Deep Learning AMI.
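
   For completeness, the wheel's build flags can be confirmed from Python; a small sketch using the runtime feature API available in MXNet 1.7:
   ```python
   import mxnet as mx
   from mxnet.runtime import Features

   print(mx.__version__)                   # expect 1.7.0
   print(Features().is_enabled("MKLDNN"))  # should be True if MKLDNN is built in
   ```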



