Posted to discuss-archive@mxnet.apache.org by plemeur via MXNet Forum <mx...@discoursemail.com.INVALID> on 2020/07/02 09:26:25 UTC

[MXNet Forum] [Discussion] Siamese network issue, No gradient, No clue why


Hello, I've been trying to reproduce this experiment
https://github.com/adambielski/siamese-triplet/blob/master/Experiments_MNIST.ipynb using MXNet.
The classification part worked with no problem, but the Siamese part is not converging as expected.

First things first: my dataset is based on the MNIST dataset available in Gluon.

```
import random

import numpy as np
import mxnet as mx
from mxnet.gluon.data.vision import datasets


class SiameseNetworkDataset(datasets.MNIST):
    def __init__(self, root, train, transform=None):
        super().__init__(root, train, transform=transform)
        self.root = root
        self.transform = transform

        # NCHW layout, scaled to [0, 1]
        self._data = self._data.transpose((0, 3, 1, 2)).astype('float32') / 255
        # map each digit class to the indices of its samples
        self._label_indexes = {
            i: np.where(self._label == i)[0] for i in range(10)
        }

    def __getitem__(self, index):
        items_with_index = list(enumerate(self._label))
        img0_index, img0_tuple = random.choice(items_with_index)
        # we need to make sure approx. 50% of the pairs are from the same class
        should_get_same_class = random.randint(0, 1)
        if should_get_same_class:
            img1_index = random.choice(self._label_indexes[img0_tuple])
            img1_tuple = self._label[img1_index]
        else:
            img1_index, img1_tuple = random.choice(items_with_index)

        img0 = self._data[img0_index]
        img1 = self._data[img1_index]

        # label is 1 for a "different class" pair, 0 for a "same class" pair
        return img0, img1, mx.nd.array([int(img1_tuple != img0_tuple)])

    def __len__(self):
        return super().__len__()
```

It does what I want: after batching it gives back three outputs, two of shape `(batch_size, 1, 28, 28)` and one of shape `(batch_size, 1)`. So I doubt the error comes from here.
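As a quick sanity check, this is roughly how I verify those shapes (the `root` path and batch size here are just examples):

```
dataset = SiameseNetworkDataset(root='~/.mxnet/datasets/mnist', train=True)
train_dataloader = mx.gluon.data.DataLoader(dataset, batch_size=64, shuffle=True)
img0, img1, label = next(iter(train_dataloader))
print(img0.shape, img1.shape, label.shape)  # (64, 1, 28, 28) (64, 1, 28, 28) (64, 1)
```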

Then there is the model; I'm using the same one as the above-mentioned example:
```
from mxnet.gluon import HybridBlock, nn


class SiameseNetwork(HybridBlock):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        with self.name_scope():
            self.CNN = nn.HybridSequential()
            self.CNN.add(nn.Conv2D(channels=32, kernel_size=5),
                         nn.PReLU(),
                         nn.MaxPool2D(pool_size=2, strides=2),
                         nn.Conv2D(channels=64, kernel_size=5),
                         nn.PReLU(),
                         nn.MaxPool2D(pool_size=2, strides=2),
                         nn.Flatten(),
                         nn.Dense(256),
                         nn.PReLU(),
                         nn.Dense(256),
                         nn.PReLU(),
                         nn.Dense(2))

    def hybrid_forward(self, F, input1, input2):
        # run both images through the same (shared-weight) branch
        output1 = self.CNN(input1)
        output2 = self.CNN(input2)
        return output1, output2
```
Originally the `CNN` was another custom `HybridBlock`, but right now it's written like this (I thought maybe I had an issue because of nested models, but apparently not).
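For reference, it gets instantiated and sanity-checked along these lines (the Xavier initializer here is just an example, not necessarily what I used):

```
net = SiameseNetwork()
net.initialize(mx.init.Xavier(), ctx=mx.gpu())
net.hybridize()
e1, e2 = net(mx.nd.zeros((64, 1, 28, 28), ctx=mx.gpu()),
             mx.nd.zeros((64, 1, 28, 28), ctx=mx.gpu()))
print(e1.shape, e2.shape)  # (64, 2) (64, 2): 2-D embeddings, like the reference notebook
```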

Then there is the loss function:
```
from mxnet.gluon.loss import Loss


class ContrastiveLoss(Loss):
    def __init__(self, margin=2.0, weight=None, batch_axis=0, **kwargs):
        super(ContrastiveLoss, self).__init__(weight, batch_axis, **kwargs)
        self.margin = margin

    def hybrid_forward(self, F, output1, output2, label):
        # squared euclidean distance between the embeddings, shape (batch_size,)
        dist_sq = F.sum(F.square(output1 - output2), axis=1)
        dist = F.sqrt(dist_sq + 1e-8)  # epsilon keeps the sqrt differentiable at 0
        # label == 0: same class, pull the embeddings together;
        # label == 1: different class, push apart until the margin is reached
        loss_contrastive = F.mean((1 - label) * dist_sq +
                                  label * F.square(F.relu(self.margin - dist)))
        return loss_contrastive
```
I tried many things here, using different ndarray functions (`norm`, `relu`, `sum`, etc.), basically always the same loss function written in different ways. I checked the shapes at every step to be sure the vector operations were doing the right thing (not broadcasting to a matrix of shape `(batch_size, batch_size)`, for instance).
It might be the culprit, but if it is, I might not be familiar enough with MXNet to see it.
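Here is the kind of sanity check I run on the loss with hand-made embeddings (the values are arbitrary):

```
loss_fn = ContrastiveLoss(margin=2.0)
a = mx.nd.array([[0.0, 0.0], [0.0, 0.0]])
b = mx.nd.array([[0.0, 0.0], [3.0, 4.0]])  # distances: 0 and 5
y = mx.nd.array([0, 1])                    # same-class pair, different-class pair
print(loss_fn(a, b, y))                    # ~0: pair 0 is together, pair 1 is past the margin
```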

Finally, the training loop is:
```
for epoch in range(Config.train_number_epochs):
    # decay the learning rate by 10x every 5 epochs
    if (epoch + 1) % 5 == 0:
        lr *= 0.1
        print(lr)
        trainer.set_learning_rate(lr)
    for i, data in enumerate(train_dataloader):
        img0, img1, label = data
        # the copies don't need gradients, so they stay outside the record scope
        img0 = img0.copyto(mx.gpu())
        img1 = img1.copyto(mx.gpu())
        label = label.squeeze().copyto(mx.gpu())

        with autograd.record():
            output1, output2 = net(img0, img1)
            loss_contrastive = loss(output1, output2, label)
        loss_contrastive.backward()

        if mx.nd.contrib.isnan(loss_contrastive):
            print(loss_contrastive)
            break  # the original loop lives inside a function and returned here

        trainer.step(256)  # 256 = batch size used to normalise the gradient

        if i % 20 == 0:
            print("Epoch number {}: Current loss = {}".format(
                epoch, loss_contrastive.mean().asscalar()))
```
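(For reference, the remaining objects the loop assumes are set up roughly like this; the optimizer and starting learning rate shown here are placeholders, not necessarily what I used:)

```
loss = ContrastiveLoss(margin=2.0)
lr = 1e-3  # placeholder starting learning rate
trainer = mx.gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})
```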
I tried opening the `autograd.record()` scope both before and after copying the batch to the GPU; it made no difference. I would think best practice is to open it after the copies, since I don't need gradients through that operation.
`label` is squeezed because its shape of `(64, 1)` is problematic in the loss function.
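To illustrate the problem (shapes as in my run): multiplying the `(64,)` distance vector by an unsqueezed `(64, 1)` label silently broadcasts to a `(64, 64)` matrix:

```
d = mx.nd.zeros((64,))           # per-sample distances
y = mx.nd.zeros((64, 1))         # labels straight from the dataloader
print((y * d).shape)             # (64, 64): silent broadcast, wrong
print((y.squeeze() * d).shape)   # (64,): what the loss expects
```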
I checked the gradients recorded in the model with `net.CNN[0].weight.grad` and they were non-zero.
However, when I tried to check the loss gradient w.r.t. `output1` and `output2`, I got an error saying that they were not in a computational graph.
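A workaround I could try (a sketch, untested): detach the embeddings, attach gradient buffers to the detached copies, and backprop through the loss alone:

```
# output1/output2 are intermediate nodes, so they carry no .grad buffer;
# detached copies with attach_grad() become graph inputs of their own.
e1, e2 = output1.detach(), output2.detach()
e1.attach_grad()
e2.attach_grad()
with autograd.record():
    l = loss(e1, e2, label)
l.backward()
print(e1.grad, e2.grad)  # d(loss)/d(embedding)
```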

I'm open to any suggestions on why it is not training (and especially to suggestions on the dataset; the data loading is slow as hell).

I will try transfer learning from the classification network to the metric learning one; maybe it's just stuck in a local minimum, but I really doubt it, because when I plot the embeddings, it's just a point cloud without any structure.
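(The plot I'm describing comes from something like this; `test_images` and `test_labels` stand in for whatever test split is used:)

```
import matplotlib.pyplot as plt

# the embeddings are 2-D (the last layer is Dense(2)), so they can be scattered directly
embeddings = net.CNN(test_images.copyto(mx.gpu())).asnumpy()
plt.scatter(embeddings[:, 0], embeddings[:, 1],
            c=test_labels.asnumpy(), cmap='tab10', s=4)
plt.colorbar()
plt.show()
```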

Thanks for reading. It's the first time I've posted on any technical forum, so if something is off about the form of this post, please tell me as well :slight_smile:






[MXNet Forum] [Discussion] Siamese network issue, No gradient, No clue why

Posted by plemeur via MXNet Forum <mx...@discoursemail.com.INVALID>.

I found [this](https://mxnet.apache.org/api/python/docs/tutorials/packages/gluon/loss/custom-loss.html) while going through the docs, but it doesn't seem to work on MNIST...






[MXNet Forum] [Discussion] Siamese network issue, No gradient, No clue why

Posted by plemeur via MXNet Forum <mx...@discoursemail.com.INVALID>.

I figured out the slow dataloader, and it turned out not all of my dataset was being used...
I turned this:
```
def __getitem__(self, index):
    items_with_index = list(enumerate(self._label))
    img0_index, img0_tuple = random.choice(items_with_index)
```

into this:
```
def __getitem__(self, index):
    img0_tuple = self._label[index]
```
and made `items_with_index` a member, so it isn't recomputed every time for
`img1_index, img1_tuple = random.choice(self._items_with_index)`
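Put together, the revised methods look roughly like this (reconstructed from the changes above, not copied from my actual file):

```
class SiameseNetworkDataset(datasets.MNIST):
    def __init__(self, root, train, transform=None):
        super().__init__(root, train, transform=transform)
        # (same preprocessing as before, omitted)
        self._items_with_index = list(enumerate(self._label))  # built once, not per item

    def __getitem__(self, index):
        img0_tuple = self._label[index]  # use the requested index, so every sample is seen
        if random.randint(0, 1):  # same-class pair
            img1_index = random.choice(self._label_indexes[img0_tuple])
            img1_tuple = self._label[img1_index]
        else:  # random (usually different-class) pair
            img1_index, img1_tuple = random.choice(self._items_with_index)
        return (self._data[index], self._data[img1_index],
                mx.nd.array([int(img1_tuple != img0_tuple)]))
```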

But it's still not converging; the embedding plot looks the same every time despite the loss going down...




