Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/14 03:31:39 UTC

[GitHub] [incubator-mxnet] anjishnu opened a new pull request #17298: [MXNET-1438] Adding SDML loss function

anjishnu opened a new pull request #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298
 
 
   ## Description ##
   (Brief description on what this PR is about)
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - [x] Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - [x] Code is well-documented: 
   - [x] For user-facing API changes, API doc string has been updated. 
   - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   
   ### Changes ###
   - [x] Added a new loss for information retrieval: Smoothed Deep Metric Learning (SDML) loss, with relevant API doc.
   
   ## Comments ##
   - Added loss function outlined in https://arxiv.org/abs/1905.12786 - example to follow.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574415884
 
 
   @haojin2 Sure, will address the sanity cases.
   
   Can you give an example of a unit test that is appropriately randomized so I can base it on that?


[GitHub] [incubator-mxnet] anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574458020
 
 
   It looks a little tricky to port this into the 'fit' and 'score' paradigm, since this is a retrieval-specific loss function that uses the other elements in a batch as implicit negative samples, and I'm not sure how cleanly it fits into the Module API for this kind of test. Especially since the loss computation needs to know the shape of the minibatch, which doesn't seem to be possible in the symbol API.
   
   The loss only guarantees that associated pairs will be closer in the chosen metric space after learning as compared to the non-associated pairs. 
   
   Maybe I can write something equivalent using the gluon API, to train a small network and ensure it learns the right associations. I'll come up with a proposal shortly.


[GitHub] [incubator-mxnet] anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu edited a comment on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574802286
 
 
   @haojin2 If I randomized the input data in the original test code, the losses would have different values on each run (SDML loss imposes a distribution over the relative distances of data points in a minibatch), so I would no longer be able to compare the output against precomputed loss values; thus the original unit test procedure cannot be reused.
   
   That's why I added a test that fits a toy model to some toy data instead. The current test was running in ~50 ms on my machine on CPU. Would love to hear your thoughts on how to improve on this.
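   For context, the computation that makes precomputed targets brittle can be sketched in plain NumPy (a hypothetical, simplified re-derivation of the SDML formulation from the paper, not the PR's actual Gluon code):

```python
import numpy as np

def sdml_loss(x1, x2, smoothing=0.3):
    """Hypothetical NumPy sketch of the SDML loss from arXiv:1905.12786."""
    n, _ = x1.shape
    # pairwise squared Euclidean distances between the two aligned batches
    diffs = x1[:, None, :] - x2[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    # log-softmax over negative distances: closer pairs get higher probability
    neg = -distances
    neg = neg - neg.max(axis=1, keepdims=True)  # numerical stability
    log_probs = neg - np.log(np.exp(neg).sum(axis=1, keepdims=True))
    # smoothed identity labels (label smoothing as in Pereyra et al., 2017)
    eye = np.eye(n)
    labels = eye * (1 - smoothing) + (1 - eye) * smoothing / (n - 1)
    # cross-entropy term of the divergence against the smoothed labels
    return -(labels * log_probs).sum(axis=1)

# aligned pairs should score a lower loss than misaligned ones
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
assert sdml_loss(x, x).mean() < sdml_loss(x, x[::-1]).mean()
```

   Because every term depends on the relative distances of all points in the minibatch, any change to the random inputs shifts every loss value, which is why a fixed table of expected outputs cannot survive randomization.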


[GitHub] [incubator-mxnet] anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575405571
 
 
   @haojin2 
   With the original configuration I was getting around 1 in 100 or 1 in 200 failures, which is not acceptable for a unit test.
   
   Changing the hyperparameters (e.g. increasing dimensionality, lowering N) does make it more robust (tried up to MXNET_TEST_COUNT=2000 runs), but there's always a chance of failure.
   
   Other options:
   - Go with the original precomputed-values approach, but maybe over a larger range of precomputed values.
   - Set up a similar test ... but use correlated data instead of purely random data:
      - Generate positive samples that are actually correlated to the paired data rather than purely random.
      - Always use some set of samples that are known to converge successfully.
   - Add a test to check for loss reduction (e.g. to <0.05, as in many of the other tests).
   
   Which of these would be your preferred approach? I have tested them all to MXNET_TEST_COUNT=2000.
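   The correlated-data option can be sketched as follows (toy parameters chosen for illustration; the real test's hyperparameters may differ):

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed keeps the test reproducible
batch_size, dim = 32, 64          # hypothetical test hyperparameters

# anchor vectors
x1 = rng.normal(size=(batch_size, dim))
# positive samples: small perturbations of the anchors rather than fresh noise,
# so each pair is genuinely closer than the implicit in-batch negatives
x2 = x1 + 0.1 * rng.normal(size=(batch_size, dim))

# sanity check: paired distances are much smaller than cross-pair distances
paired = ((x1 - x2) ** 2).sum(axis=1)
cross = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=2)
assert paired.mean() < cross.mean()
```

   With positives constructed this way the loss has a clear margin to exploit, so a convergence-style test is far less likely to fail by chance than one on purely random pairs.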
   
   


[GitHub] [incubator-mxnet] anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575768941
 
 
   @haojin2 ok, thanks! sounds good!
   


[GitHub] [incubator-mxnet] haojin2 merged pull request #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 merged pull request #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298
 
 
   


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574418897
 
 
   @anjishnu Like this: https://github.com/apache/incubator-mxnet/pull/17298/files#diff-b129d416cc6980021cb35420a66e4d3aR334


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575757199
 
 
   @anjishnu It seems OK. I've re-triggered your failed website build; I think it's good to merge after that passes.


[GitHub] [incubator-mxnet] anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574927315
 
 
   Verifying that the mention of this PR in #17292 is a typo; the author meant to refer to #17208. We are not breaking Horovod.


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575348787
 
 
   @anjishnu I'm only concerned about whether there's any flakiness in such a test. To verify, please try
   `MXNET_TEST_COUNT=500 nosetests tests/python/unittest/test_loss.py:test_sdml_loss` on your end and let me know if it passes.


[GitHub] [incubator-mxnet] anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575756538
 
 
   @haojin2 Could you take a look at the latest version? The new unit test (which checks for loss reduction) passes at MXNET_TEST_COUNT=2000.


[GitHub] [incubator-mxnet] anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
anjishnu commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575347534
 
 
   @haojin2 Are there any serious concerns with the new unit test?


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574779763
 
 
   @anjishnu I was actually only asking to randomize the input data while leaving the original version of the test code untouched.


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-574016263
 
 
   @anjishnu Please address the sanity errors: http://jenkins.mxnet-ci.amazon-ml.com/job/mxnet-validation/job/sanity/job/PR-17298/1/display/redirect . Also, can you randomize your unit test to ensure that we're covering more numerically different cases?
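   One common way to randomize a unit test's inputs while keeping failures reproducible is to draw them from a seeded generator (a generic sketch; the helper name and ranges here are illustrative, not from the PR):

```python
import numpy as np

def random_loss_inputs(seed=None):
    """Draw fresh random inputs per run; pin the seed to reproduce a failure."""
    rng = np.random.default_rng(seed)
    batch_size = int(rng.integers(2, 9))   # vary the shapes as well as the values
    dim = int(rng.integers(4, 33))
    x1 = rng.normal(size=(batch_size, dim)).astype(np.float32)
    x2 = rng.normal(size=(batch_size, dim)).astype(np.float32)
    return x1, x2

a1, a2 = random_loss_inputs(seed=0)
b1, b2 = random_loss_inputs(seed=0)
# the same seed reproduces the same inputs exactly
assert (a1 == b1).all() and (a2 == b2).all()
```

   Logging the seed on failure lets a flaky run be replayed deterministically instead of chasing it across CI retries.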


[GitHub] [incubator-mxnet] haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
haojin2 commented on issue #17298: [MXNET-1438] Adding SDML loss function
URL: https://github.com/apache/incubator-mxnet/pull/17298#issuecomment-575814068
 
 
   @anjishnu Merged, thanks for the contribution.


[GitHub] [incubator-mxnet] szha commented on a change in pull request #17298: [MXNET-1438] Adding SDML loss function

Posted by GitBox <gi...@apache.org>.
szha commented on a change in pull request #17298:
URL: https://github.com/apache/incubator-mxnet/pull/17298#discussion_r460480501



##########
File path: python/mxnet/gluon/loss.py
##########
@@ -930,3 +930,118 @@ def _cosine_similarity(self, F, x, y, axis=-1):
         else:
             eps_arr = F.full((1, 1), 1e-12)
         return (x_dot_y / F.broadcast_maximum(x_norm * y_norm, eps_arr))
+
+
+class SDMLLoss(Loss):
+    r"""Calculates the batchwise Smoothed Deep Metric Learning (SDML) loss given two input tensors and a smoothing weight.
+    SDML loss learns similarity between paired samples by using unpaired samples in the minibatch
+    as potential negative examples.
+
+    The loss is described in greater detail in
+    "Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning."
+    - by Bonadiman, Daniele, Anjishnu Kumar, and Arpit Mittal.  arXiv preprint arXiv:1905.12786 (2019).
+    URL: https://arxiv.org/pdf/1905.12786.pdf
+
+    According to the authors, this loss formulation achieves comparable or higher accuracy to
+    Triplet Loss but converges much faster.
+    The loss assumes that the items in both tensors in each minibatch
+    are aligned such that x1[0] corresponds to x2[0] and all other datapoints in the minibatch are unrelated.
+    `x1` and  `x2` are minibatches of vectors.
+
+    Parameters
+    ----------
+    smoothing_parameter : float
+        Probability mass to be distributed over the minibatch. Must be < 1.0.
+    weight : float or None
+        Global scalar weight for loss.
+    batch_axis : int, default 0
+        The axis that represents mini-batch.
+
+    Inputs:
+        - **x1**: Minibatch of data points with shape (batch_size, vector_dim)
+        - **x2**: Minibatch of data points with shape (batch_size, vector_dim)
+          Each item in x2 is a positive sample for the same index in x1.
+          That is, x1[0] and x2[0] form a positive pair, x1[1] and x2[1] form a positive pair - and so on.
+          All data points in different rows should be decorrelated
+
+    Outputs:
+        - **loss**: loss tensor with shape (batch_size,).
+    """
+
+    def __init__(self, smoothing_parameter=0.3, weight=1., batch_axis=0, **kwargs):
+        super(SDMLLoss, self).__init__(weight, batch_axis, **kwargs)
+        self.kl_loss = KLDivLoss(from_logits=True)
+        self.smoothing_parameter = smoothing_parameter # Smoothing probability mass
+
+    def _compute_distances(self, x1, x2):
+        """
+        This function computes the euclidean distance between every vector
+        in the two batches in input.
+        """
+
+        # extracting sizes expecting [batch_size, dim]
+        assert x1.shape == x2.shape
+        batch_size, dim = x1.shape
+        # expanding both tensors from [batch_size, dim] to [batch_size, batch_size, dim]
+        x1_ = x1.expand_dims(1).broadcast_to([batch_size, batch_size, dim])
+        x2_ = x2.expand_dims(0).broadcast_to([batch_size, batch_size, dim])
+        # pointwise squared differences
+        squared_diffs = (x1_ - x2_)**2
+        # sum of squared differences distance
+        return squared_diffs.sum(axis=2)
+
+
+    def _compute_labels(self, F, batch_size):
+        """
+        The function creates the label matrix for the loss.
+        It is an identity matrix of size [BATCH_SIZE x BATCH_SIZE]
+        labels:
+            [[1, 0]
+             [0, 1]]
+
+        after this process, the labels are smoothed by a small amount to
+        account for errors.
+
+        labels:
+            [[0.9, 0.1]
+             [0.1, 0.9]]
+
+
+        Pereyra, Gabriel, et al. "Regularizing neural networks by penalizing
+        confident output distributions." arXiv preprint arXiv:1701.06548 (2017).
+        """
+
+        # TODO: replace with mx.nd.eye(batch_size) with mxnet 1.2
+        gold = F.one_hot(F.arange(batch_size), batch_size)
+        labels = gold * (1 - self.smoothing_parameter) + (1 - gold) * self.smoothing_parameter / (batch_size - 1)
+        return labels
+
+
+    def _loss(self, F, x1, x2):
+        """
+        The function computes the KL divergence between the negative distances
+        (internally it computes a softmax to cast them into probabilities) and the
+        smoothed identity matrix.
+
+        This assumes that the two batches are aligned, therefore the most similar
+        vector should be the one with the same index.
+
+        Batch1                                Batch2
+
+        President of France                   French President
+        President of US                       American President
+
+        Given the question president of France in batch 1 the model will
+        learn to predict french president comparing it with all the other
+        vectors in batch 2
+        """
+        batch_size = x1.shape[0]
+        labels = self._compute_labels(F, batch_size)
+        distances = self._compute_distances(x1, x2)
+        log_probabilities = F.log_softmax(-distances, axis=1)
+        # multiply by the batch size to obtain the correct loss (gluon kl_loss averages instead of summing)
+        return self.kl_loss(log_probabilities, labels.as_in_context(distances.context)) * batch_size

Review comment:
       this doesn't work in sym; 1.x will need to be fixed, while 2.0 will be switched to deferred compute mode



