Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/10/29 23:32:24 UTC

[GitHub] [incubator-tvm] tkonolige opened a new pull request #6794: [TOPI] Add embedding op and gradient

tkonolige opened a new pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794


   This PR adds the `embed` op and its gradient. `embed` is a specialization of `take` with a 2D lookup table.
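   A rough NumPy sketch of the intended semantics (shapes and names here are illustrative, not the PR's actual signature):
   
   ```python
   import numpy as np
   
   # Illustrative shapes only; the actual op signature may differ.
   table = np.random.rand(10, 4).astype("float32")   # lookup table: (vocab_size, dim)
   indices = np.array([1, 3, 3, 7], dtype="int32")   # (flat_length,)
   
   out = np.take(table, indices, axis=0)             # (flat_length, dim)
   assert out.shape == (4, 4)
   assert np.array_equal(out, table[indices])        # same as plain fancy indexing
   ```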
   
   @altanh 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen edited a comment on pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#issuecomment-719094251


   Thanks @tkonolige , given that this is a new op, it would be great to do an API review per https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures
   
   In particular, it would be great to check out the conventions of similar APIs in existing frameworks like numpy, PyTorch, Keras, and TensorFlow; we should ideally follow common conventions. See the previous related discussion on NMS: https://github.com/apache/incubator-tvm/issues/2535





[GitHub] [incubator-tvm] altanh commented on pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
altanh commented on pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#issuecomment-719117205


   cc @antinucleon 





[GitHub] [incubator-tvm] altanh commented on pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
altanh commented on pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#issuecomment-719116287


   re: API review
   - mxnet: https://mxnet.apache.org/versions/1.6/api/r/docs/api/mx.symbol.Embedding.html
   - PyTorch: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
   - TF/Keras: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding
   
   ### Naming
   All of the above APIs call it `Embedding`, so we may want to rename `embed` to `embedding` (although grammatically I do feel like "embed" is more correct).
   
   ### Arguments
   I don't think we need to pass in the vocabulary size or embedding dimension like these examples do, since we can infer them from the shape of the weight/table matrix (I imagine they use them for bookkeeping in training, which is a separate matter). Likewise, we can ignore anything related to weight initialization.
   
   PyTorch has the following additional arguments:
   - `padding_idx: int`, index into embedding table that will always have 0 gradient, generally used for padding
   - `scale_grad_by_freq: boolean`, "scale gradients by the inverse of frequency of the words in the mini-batch." I believe this means the gradient update for index `j` will be divided by `sum(indices == j)` (the count of `j` in the input indices); see the sketch after this list.
   - `sparse: boolean`: "gradient w.r.t. weight matrix will be a sparse tensor."
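   
   A small NumPy sketch of that reading of `scale_grad_by_freq` (purely illustrative; not taken from PyTorch's implementation):
   
   ```python
   import numpy as np
   
   vocab_size, dim = 5, 3
   indices = np.array([2, 2, 4])                        # index 2 appears twice
   grad_out = np.ones((indices.size, dim), "float32")   # upstream gradient, one row per index
   
   grad_table = np.zeros((vocab_size, dim), "float32")
   np.add.at(grad_table, indices, grad_out)             # scatter-add: row 2 accumulates 2.0
   
   counts = np.bincount(indices, minlength=vocab_size).clip(min=1)
   grad_table /= counts[:, None].astype("float32")      # divide row j by sum(indices == j)
   # row 2 ends up at 1.0 instead of 2.0, i.e. scaled by the inverse frequency
   ```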
   
   mxnet has:
   - `sparse_grad: boolean`: gradient for weight is row-sparse (probably the same as PyTorch above?)
   
   TF/keras has:
   - `mask_zero: boolean`: "whether or not the input value 0 is a special "padding" value that should be masked out. This is useful when using recurrent layers which may take variable length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised. If mask_zero is set to True, as a consequence, index 0 cannot be used in the vocabulary (input_dim should equal size of vocabulary + 1)." I don't fully understand this, but it seems similar to PyTorch's `padding_idx` while also requiring TF-specific masking support. I prefer PyTorch's approach if the two are equivalent.
   - `input_length: int`: "Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed)." Again, this sounds like some kind of TF/Keras design quirk.
   
   In my opinion, we should aim for PyTorch's API over TF/Keras, but perhaps others can give more insight. We are also thinking about adding sparse gradient support, so it may be best to add it as an `attr` but raise an error for now.
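   
   A hedged sketch of the "accept the attribute, reject it for now" idea (names here are hypothetical, not this PR's code):
   
   ```python
   from tvm import topi
   
   def embedding(table, indices, sparse_grad=False):
       # Hypothetical wrapper: keep sparse_grad in the signature so the API is
       # stable, but refuse it until sparse gradients are actually implemented.
       if sparse_grad:
           raise NotImplementedError("sparse gradient for embedding is not supported yet")
       return topi.take(table, indices, axis=0)
   ```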
   
   ### Shapes
   PyTorch and mxnet support arbitrary input shape. In particular, if our embedding dimension is `dim`, the shape relation is `(d1,...,dn) -> (d1,...,dn, dim)`. 
   
   TF/Keras is strange as they have `(batch_size, input_length) -> (batch_size, input_length, dim)`, which just seems like a restriction of PyTorch and mxnet.
   
   This PR currently proposes `(flat_length,) -> (flat_length, dim)`. Note that we can easily support the PyTorch and mxnet approach by flattening the indices and then "reshaping the first n dimensions": `(d1,...,dn) -> (d1 * ... * dn) -> (d1 * ... * dn, dim) -> (d1,...,dn,dim)`. I imagine this should be easy to implement but I'm not too familiar with TOPI.
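   
   A quick NumPy sketch of that flatten-then-reshape trick (illustrative only; the same steps would map onto TOPI's reshape/take):
   
   ```python
   import numpy as np
   
   table = np.random.rand(10, 6).astype("float32")             # (vocab_size, dim)
   indices = np.random.randint(0, 10, size=(2, 3, 4))          # (d1, d2, d3)
   
   flat = indices.reshape(-1)                                   # (d1 * d2 * d3,)
   flat_out = np.take(table, flat, axis=0)                      # (d1 * d2 * d3, dim)
   out = flat_out.reshape(indices.shape + (table.shape[1],))    # (d1, d2, d3, dim)
   assert out.shape == (2, 3, 4, 6)
   ```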





[GitHub] [incubator-tvm] tqchen commented on pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#issuecomment-719094251


   Thanks @tkonolige , given that this is a new op, it would be great to do an API review per https://tvm.apache.org/docs/contribute/code_review.html#deliberate-on-api-and-data-structures
   
   In particular, it would be great to check out the conventions of similar APIs in existing frameworks like PyTorch and TensorFlow; we should ideally follow common conventions.





[GitHub] [incubator-tvm] tkonolige commented on a change in pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
tkonolige commented on a change in pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#discussion_r520735630



##########
File path: python/tvm/topi/x86/nn.py
##########
@@ -69,3 +69,33 @@ def schedule_softmax(outs):
         s[exp].compute_at(s[softmax], fused_outer_axes)
 
     return s
+
+
+def schedule_embed_grad(outs):
+    """Schedule for embed_grad
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+          The computation graph description of embed_grad
+          in the format of an array of tensors.
+
+    Returns
+    -------
+    sch: Schedule
+        The computation schedule for the op.
+    """
+    s = te.create_schedule([outs[0].op])
+
+    vec_size = 8  # should autotune this, but we can't with hybrid script

Review comment:
       We could reuse it. I just didn't have a good way to figure out the width of the vector instructions.
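       One possible direction, sketched only (the helper name and the lane counts below are assumptions, not something settled here): derive the lane count from the current target's `mcpu` instead of hard-coding 8.
   
   ```python
   import tvm
   
   def guess_fp32_vec_size():
       # Hypothetical helper: pick the fp32 vector width from the target's mcpu.
       # The mapping is an assumption: AVX-512 parts -> 16 lanes, AVX2 -> 8, else 4.
       target = tvm.target.Target.current(allow_none=True)
       mcpu = target.mcpu if target is not None else ""
       if mcpu in ("skylake-avx512", "cascadelake"):
           return 16
       if mcpu in ("core-avx2", "haswell", "broadwell", "skylake"):
           return 8
       return 4
   ```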







[GitHub] [incubator-tvm] giuseros commented on a change in pull request #6794: [TOPI] Add embedding op and gradient

Posted by GitBox <gi...@apache.org>.
giuseros commented on a change in pull request #6794:
URL: https://github.com/apache/incubator-tvm/pull/6794#discussion_r520667413



##########
File path: python/tvm/topi/x86/nn.py
##########
@@ -69,3 +69,33 @@ def schedule_softmax(outs):
         s[exp].compute_at(s[softmax], fused_outer_axes)
 
     return s
+
+
+def schedule_embed_grad(outs):
+    """Schedule for embed_grad
+
+    Parameters
+    ----------
+    outs: Array of Tensor
+          The computation graph description of embed_grad
+          in the format of an array of tensors.
+
+    Returns
+    -------
+    sch: Schedule
+        The computation schedule for the op.
+    """
+    s = te.create_schedule([outs[0].op])
+
+    vec_size = 8  # should autotune this, but we can't with hybrid script

Review comment:
       May I ask why 8? I am just wondering if we could reuse this schedule for the ARM backend as well.



