Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/04/09 18:27:53 UTC

[GitHub] [incubator-mxnet] matteosal opened a new issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

matteosal opened a new issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149


   This program creates a simple operator and runs forward + backward with the C API, checking for failures:
   ```
   #include <iostream>
   
   #include "mxnet/c_api.h"
   #include "nnvm/c_api.h"
   
   #define checkedMXCall(func, ...)                              \
     {                                                           \
       if (func(__VA_ARGS__) != 0) {                             \
         printf("MX call %s failed at line %d:\n%s",             \
               #func, __LINE__, MXGetLastError());               \
        exit(1);                                                  \
       }                                                         \
     }
   
   int main() {
   
     /* Create symbol variables */
     SymbolHandle in_sym;
     checkedMXCall(MXSymbolCreateVariable, "in", &in_sym);
   
     /* Create symbol */
     OpHandle op;
     NNGetOpHandle("sin", &op);
     SymbolHandle sym;
     checkedMXCall(MXSymbolCreateAtomicSymbol, op, 0, nullptr, nullptr, &sym);
     checkedMXCall(MXSymbolCompose, sym, "Sin", 1, nullptr, &in_sym);
   
     /* Create NDArray for argument */
     int dev_type = 1;
     int dev_id = 0; 
     mx_uint shape[2] = {2, 3};
     NDArrayHandle in_arr;
     checkedMXCall(MXNDArrayCreate, shape, 2, dev_type, dev_id, 0, 0, &in_arr);
   
     /* Create NDArray for gradient and attach gradient */
     NDArrayHandle grad_arr;
     checkedMXCall(MXNDArrayCreate, shape, 2, dev_type, dev_id, 0, 0, &grad_arr);
     uint32_t grad_req[1] = {1}; 
     checkedMXCall(MXAutogradMarkVariables, 1, &in_arr, grad_req, &grad_arr);
   
     /* Create cached op */
     const char *cachedop_keys[1] = {"static_alloc"};
     const char *cachedop_vals[1] = {"true"};
     CachedOpHandle cached_op;
     checkedMXCall(MXCreateCachedOp, sym, 1, cachedop_keys, cachedop_vals, &cached_op, true);
   
     /* Set autograd to record */
     int dummy_prev;
     checkedMXCall(MXAutogradSetIsRecording, 1, &dummy_prev);
   
     /* Run cached op */
     int n_outs;
     NDArrayHandle *out_arr = nullptr;
     const int *dummy_stypes = nullptr;
     checkedMXCall(MXInvokeCachedOp, cached_op, 1, &in_arr, dev_type, dev_id, &n_outs, &out_arr, &dummy_stypes);
   
     /* Check that autograd is recording */
     bool res;
     checkedMXCall(MXAutogradIsRecording, &res);
     std::cout << "IsRecording: " << res << "\n";
   
     /* Create NDArray for outgrad and run backward */
     NDArrayHandle outgrad_arr;
     checkedMXCall(MXNDArrayCreate, shape, 2, dev_type, dev_id, 0, 0, &outgrad_arr);
     checkedMXCall(MXAutogradBackward, 1, out_arr, &outgrad_arr, true);
   
     return 0;
   }
   ```
   The output is:
   ```
   [20:23:58] /home/matteo/Git/mxnet-build/Build/Linux-x86-64/MKL/mxnet/src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
   IsRecording: 1
   MX call MXAutogradBackward failed at line 65:
    MXNetError: Check failed: !AGInfo::IsNone(*i): Cannot differentiate node because it is not in a computational graph. You need to set is_recording to true or use autograd.record() to save computational graphs for backward. If you want to differentiate the same graph twice, you need to pass retain_graph=True to backward.
   Stack trace:
     File "/home/matteo/Git/mxnet-build/Build/Linux-x86-64/MKL/mxnet/src/imperative/imperative.cc", line 402
   ```
   
   Why is `MXAutogradBackward` complaining about autograd recording if `MXAutogradIsRecording` gives `true`? Am I doing something wrong here?
   
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] matteosal commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
matteosal commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-817939702


   No luck; with these flags I still see the same error:
   ```
   const char *cachedop_keys[3] = {"static_alloc", "data_indices", "param_indices"};
   const char *cachedop_vals[3] = {"true", "[0]", "[]"};
   ```
   Besides, I'm trying to mimic the pipeline of `executor.py`, and there is no mention of either `data_indices` or `param_indices` there.
   By inspecting the source, something related to `AGInfo` entries is created by `MXAutogradMarkVariables`, but only for argument and gradient arrays (which are present in my example). In the `executor.py` interface, this step is performed via the ndarray method `attach_grad`, but I can't find an equivalent step for output arrays there.




[GitHub] [incubator-mxnet] szha commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
szha commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-818865063


   cc @anirudh2290, who authored the thread-safe variant of cached op. I think when the thread-safe cached op was added, only inference use cases were in mind. The default cached op was not built to be thread-safe because of the state it keeps internally.




[GitHub] [incubator-mxnet] szha commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
szha commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-818886227


   > Once it's feature complete, indeed, the unsafe version can be removed.
   
   I think it might take more than that. The current approach for making cached op thread-safe involves locks (see [here](https://github.com/apache/incubator-mxnet/blob/master/src/imperative/cached_op_threadsafe.cc#L162)). This has performance implications. Thus, in order to replace the non-thread-safe version, we may need a better approach to achieving thread-safety, such as making the cached op stateless.




[GitHub] [incubator-mxnet] matteosal edited a comment on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
matteosal edited a comment on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-818860713


   Ok, I've found the problem: the last argument to `MXCreateCachedOp`, `thread_safe`. Setting it to `false` instead of `true` makes the backward pass compute correctly. This argument is always `false` in the Python executor interface.
   It sounds very surprising that executor computation can never be thread-safe. Actually, I wonder why one would *ever* want to discard thread-safety. Maybe this is not about thread-safety in the graph computation (as I assumed initially), but something else?
   
   Can someone elaborate on this? Thanks
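   For reference, a minimal sketch of the corrected call, assuming the same `sym`, `cachedop_keys`, `cachedop_vals`, and `checkedMXCall` macro as in the program above:
   ```c
   /* Same call as in the original program, but with the final thread_safe
    * argument set to false (matching what the Python executor interface
    * always passes); with this flag the forward pass is recorded by
    * autograd and MXAutogradBackward succeeds. */
   CachedOpHandle cached_op;
   checkedMXCall(MXCreateCachedOp, sym, 1, cachedop_keys, cachedop_vals,
                 &cached_op, false);
   ```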




[GitHub] [incubator-mxnet] szha commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
szha commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-819598646


   @matteosal in your example above, invoking the same cached op from multiple threads will cause issues.
   ```
     /* Run cached op */
     int n_outs;
     NDArrayHandle *out_arr = nullptr;
     const int *dummy_stypes = nullptr;
     checkedMXCall(MXInvokeCachedOp, cached_op, 1, &in_arr, dev_type, dev_id, &n_outs, &out_arr, &dummy_stypes);
   ```






[GitHub] [incubator-mxnet] matteosal commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
matteosal commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-818948512


   I'd like to understand what *kind* of thread-safety is being controlled by the argument. In other words, can someone describe an example which would be unsafe to run if `thread_safe=false` is passed?




[GitHub] [incubator-mxnet] matteosal commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
matteosal commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-819607691


   Ok that's clear, thank you




[GitHub] [incubator-mxnet] leezu commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-818883777


   > Actually, I wonder why one would ever want to discard thread-safety.
   
   The problem here is that the original code wasn't thread-safe, and the thread-safe implementation mentioned by @szha isn't feature complete. Once it's feature complete, indeed, the unsafe version can be removed. 






[GitHub] [incubator-mxnet] szha commented on issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
szha commented on issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149#issuecomment-817187705


   The error comes from autograd not finding the `AGInfo` entry on the output array (in your case, `out_arr`); see [here](https://github.com/apache/incubator-mxnet/blob/master/src/imperative/imperative.cc#L401-L402). I'm guessing it might have to do with the missing flags for `data_indices` and `param_indices`; see [here](https://github.com/apache/incubator-mxnet/blame/master/python/mxnet/gluon/block.py#L1250-L1251).






[GitHub] [incubator-mxnet] matteosal closed issue #20149: MXAutogradBackward complains about autograd recording despite being set? (C API)

Posted by GitBox <gi...@apache.org>.
matteosal closed issue #20149:
URL: https://github.com/apache/incubator-mxnet/issues/20149


   

