Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/11/15 23:56:27 UTC

[GitHub] jermainewang opened a new issue #8673: Autograd variable scope confusion

URL: https://github.com/apache/incubator-mxnet/issues/8673
 
 
   I'm a little bit confused about how the autograd scope behaves together with `mark_variables`. I tried out the following test code:
   ```python
   import mxnet as mx
   import mxnet.ndarray as nd
   import mxnet.autograd as ag
   
   ctx = mx.cpu(0)
   
   def PP(arr):
       if arr is None:
           print('None')
       else:
           print(arr.asnumpy()[0, 0])
   
   def test():
       a = nd.ones((2, 3), ctx=ctx)  # a is 1
       a_grad = nd.zeros((2, 3), ctx=ctx)
       ag.mark_variables(a, a_grad, 'write')
       d = nd.zeros((2, 3), ctx=ctx) + 100  # d is 100
       d_grad = nd.zeros((2, 3), ctx=ctx)
       ag.mark_variables(d, d_grad, 'write')
       for i in range(10):
           print('>>>>>>iter', i)
           with ag.record():
               b = a * (i + 1)
               if i % 2 == 0:
                   b_grad = nd.zeros((2, 3), ctx=ctx)
                   ag.mark_variables(b, b_grad)
               t = b * (i + 2)
               if i % 3 == 0:
                   t = t * d
               t.backward(retain_graph=True)
               PP(a.grad)
               PP(b.grad)
               PP(d.grad)
   
   test()
   ```
   
   The code has two variables created outside the autograd scope (`a` and `d`) and a temporary variable `b` that is marked once every two iterations. The variable `d` only enters the computation graph once every three iterations. *What will be the output of the three print statements?*
   
   Actually, I don't know the correct answer. Here is what I would propose:
   * For iterations that do not enter either branch (e.g. iter 1, 5, ...), the code is essentially equivalent to:
     ```
     b = a * (i + 1)
     t = b * (i + 2)
     t.backward()
     ```
     So `a.grad` should always be `(i + 1) * (i + 2)`. Since `b` is not marked as a variable, I suppose its gradient array will not be saved, so `b.grad` could be either zero or None. In contrast, `d` has been marked as a variable but is not in the computation graph, so `d.grad` should be zero.
   * For iterations that enter **only** the first branch (e.g. iter 2, 4, ...), `a.grad` should be the same as above, `b.grad` should be `(i + 2)`, and `d.grad` should be zero.
   * For iterations that enter **only** the second branch (e.g. iter 3, 9, ...), the code is equivalent to:
     ```
     b = a * (i + 1)  # a is always 1
     t = b * (i + 2)
     t = t * d   # d is always 100
     t.backward()
     ```
     In this case, `a.grad` should be `(i + 1) * (i + 2) * 100`. `b` is not declared as a variable. `d.grad` should equal the value `t` had before the multiplication by `d`, which is `(i + 1) * (i + 2)`.
   * For iterations that enter both branches (e.g. iter 0, 6, ...), `a.grad` and `d.grad` should be the same as above, and `b.grad` should equal `(i + 2) * 100`. (A sketch of these expectations follows this list.)
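   
   To make these expectations concrete, here is a minimal sketch (not the code under discussion) in which every array is re-created and re-marked inside each iteration and `b` is never marked mid-graph; this is how I would expect the gradients to come out:
   ```python
   import mxnet as mx
   import mxnet.ndarray as nd
   import mxnet.autograd as ag
   
   ctx = mx.cpu(0)
   
   def expected_grads(i):
       # Fresh arrays every call, so no gradient state can leak between iterations.
       a = nd.ones((2, 3), ctx=ctx)          # a is 1
       d = nd.zeros((2, 3), ctx=ctx) + 100   # d is 100
       ag.mark_variables([a, d], [nd.zeros((2, 3), ctx=ctx), nd.zeros((2, 3), ctx=ctx)])
       with ag.record():
           b = a * (i + 1)
           t = b * (i + 2)
           if i % 3 == 0:
               t = t * d
       t.backward()
       # Expectation: a.grad == (i + 1) * (i + 2), times 100 when d is in the graph;
       # d.grad == (i + 1) * (i + 2) when d is in the graph, otherwise it stays 0.
       return a.grad.asnumpy()[0, 0], d.grad.asnumpy()[0, 0]
   
   for i in range(4):
       print(i, expected_grads(i))
   ```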
   
   So here is what I got after running this:
   
   First of all, I had to set `retain_graph` to True. I suppose this is because the autograd history would be cleared otherwise. However, it is a little bit confusing, since `a` and `d` are declared as variables outside the autograd scope but are nonetheless cleared afterwards.
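   
   For illustration, this is a minimal sketch of the pattern that, as far as I can tell, forces `retain_graph=True`: a marked array is reused in a second record scope after its first graph has already been backward-ed. This only encodes my understanding of what happens, not the intended semantics:
   ```python
   x = nd.ones((2, 3), ctx=ctx)
   ag.mark_variables(x, nd.zeros((2, 3), ctx=ctx))
   
   with ag.record():
       y = x * 2
   y.backward()    # without retain_graph=True here, the history attached to x seems to be cleared
   
   with ag.record():
       z = x * 3
   z.backward()    # ...and this second backward did not behave correctly for me
   ```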
   
   The running output is as follows:
   ```
   >>>>>>iter 0
   0.0
   200.0
   2.0
   >>>>>>iter 1
   6.0
   None
   2.0
   >>>>>>iter 2
   6.0
   4.0
   2.0
   >>>>>>iter 3
   2000.0
   None
   20.0
   >>>>>>iter 4
   2000.0
   6.0
   20.0
   >>>>>>iter 5
   42.0
   None
   20.0
   >>>>>>iter 6
   42.0
   800.0
   56.0
   >>>>>>iter 7
   72.0
   None
   56.0
   >>>>>>iter 8
   72.0
   10.0
   56.0
   >>>>>>iter 9
   11000.0
   None
   110.0
   ```
   
   There are some weird behaviors here:
   1. iter#0's `a.grad` is incorrect. It falls into the fourth case above and should be 200.
   2. iter#1's and iter#2's `d.grad` should be zero. It looks like the old gradient value is being returned (the 2 calculated in iter#0). If I change `d`'s marking to `ag.mark_variables(d, d_grad, 'add')` (i.e., the grad request is `'add'`), the result is the same, which implies the newly calculated grad is 0.0. This is inconsistent.
   3. iter#4's `a.grad` is also incorrect. It falls into the second case and should be `(4 + 1) * (4 + 2) = 30` (a quick arithmetic check follows this list).
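   
   For points 1 and 3, here is a quick arithmetic check of the values I expect; it only encodes my proposed formulas from above, nothing from the library:
   ```python
   def expected_a_grad(i):
       # dt/da = (i + 1) * (i + 2), multiplied by d (== 100) whenever d enters the graph.
       g = (i + 1) * (i + 2)
       if i % 3 == 0:
           g *= 100
       return g
   
   print(expected_a_grad(0))   # 200, but the run above printed 0.0
   print(expected_a_grad(4))   # 30, but the run above printed 2000.0
   ```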
   
   Overall, there seem to be many puzzles and inconsistencies here. My questions are: *Are these behaviours intended?* and *What are the rules of thumb when using the autograd scope and variables?*
