Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/01/01 04:00:12 UTC
[GitHub] [tvm] crawlingcub opened a new issue #9818: [Bug]
crawlingcub opened a new issue #9818:
URL: https://github.com/apache/tvm/issues/9818
Hi,
I generated a model from `MobileNet-V3` by mutating some layers. When run with TVM, the model produces lower accuracy than it does with PyTorch, although I expected the results to be exactly the same.
To reproduce the problem, download the model and data from this [link](https://drive.google.com/drive/folders/1_AOA-9hA1I92vBUeQMbZ6F-2r69_aIrF?usp=sharing) and run the script below using:
`python runscript.py model.pt data.pt`
I suspect the bug may be caused by the `BatchNorm2d` layer. In one of the mutations, I changed the `BatchNorm2d` config from `{"momentum": 0.01, "affine": true, "eps": 0.001}` to `{"momentum": 0.19067323835610506, "affine": true, "eps": -1.0}`.
Can the difference be due to this layer?
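For reference, batch normalization at inference time divides by `sqrt(running_var + eps)`, so a negative `eps` can make that denominator zero or undefined. Below is a minimal pure-Python sketch of that arithmetic only. It is not the PyTorch or TVM implementation; the function name `bn_normalize` and the sample values are made up for illustration, and whether this is what actually causes the mismatch here is not established.

```python
import math


def bn_normalize(x, mean, var, eps):
    """Scalar form of the inference-time batch-norm normalization.

    Computes (x - mean) / sqrt(var + eps) and mirrors IEEE float behavior
    for the degenerate cases instead of raising a Python exception.
    """
    shifted = var + eps
    if shifted < 0:
        # Square root of a negative number is NaN in floating point.
        return math.nan
    if shifted == 0:
        # Division by zero: 0/0 is NaN, anything else is signed infinity.
        num = x - mean
        return math.nan if num == 0 else math.copysign(math.inf, num)
    return (x - mean) / math.sqrt(shifted)


# Original config: eps = 0.001 keeps the denominator positive.
print(bn_normalize(2.0, mean=1.0, var=0.25, eps=0.001))
# Mutated config: eps = -1.0 with a variance below 1 yields NaN.
print(bn_normalize(2.0, mean=1.0, var=0.25, eps=-1.0))
```

How a given backend handles these degenerate cases may differ, which is one reason two frameworks could disagree on such a model.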
Please let me know if there is any other info that you need.
**Environment**:
```
TVM installed from source
Pytorch 1.9.1+cu111
Python 3.7
OS: Ubuntu 18.04
Cuda 11.1
GPUs: 8 NVidia GA100
```
**Code to Reproduce**:
```python
import torch
from torch.utils.data import DataLoader
import sys
import numpy as np


def accuracy(output, target, topk=(1,)):
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)
        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))
        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


def eval_model_tvm(model, dataset, device, batch_size):
    import tvm
    from tvm import relay
    from tvm.contrib.download import download_testdata
    from tvm.contrib import graph_executor
    import logging
    logger = logging.getLogger('compile_engine')
    logger.setLevel(logging.ERROR)
    validation_dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    if "cpu" in device.lower():
        target = tvm.target.Target("llvm", host="llvm")
    else:
        target = tvm.target.cuda()
    print("target", target)
    dev = tvm.device(str(target))
    model = model.to("cpu")
    model.eval()
    mod = None
    lib = None
    acc1s = []
    acc5s = []
    for i, (images, targets) in enumerate(validation_dataloader):
        input_name = "input0"
        if mod is None:
            scripted_model = torch.jit.trace(model, images).eval()
            print("scripted")
            input_data = np.array([images[i].data.numpy() for i in range(len(images))], dtype="float32")
            shape_list = [(input_name, input_data.shape)]
            mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
            with tvm.transform.PassContext(opt_level=3):
                lib = relay.build(mod, target=target, params=params)
        m = graph_executor.GraphModule(lib["default"](dev))
        m.set_input(input_name, tvm.nd.array(images))
        m.run()
        output = torch.tensor(m.get_output(0).asnumpy())
        acc1, acc5 = accuracy(output, targets, topk=(1, 5))
        acc1s.append(acc1.item())
        acc5s.append(acc5.item())
    return {'acc1': np.mean(acc1s), 'acc5': np.mean(acc5s)}


def eval_model_vision(model, dataset, device, criterion, compute_metrics_fn, batch_size):
    print("Running validation...")
    from tqdm import tqdm
    if not isinstance(model, torch.nn.DataParallel):
        model = torch.nn.DataParallel(model)
    if not isinstance(dataset, DataLoader):
        validation_dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=8, pin_memory=True)
    else:
        validation_dataloader = dataset
    acc1s = []
    acc2s = []
    losses = []
    model.to(device)
    model.eval()
    with torch.no_grad():
        for i, (images, target) in tqdm(enumerate(validation_dataloader), total=len(validation_dataloader)):
            # compute output
            images = images.to(device)
            target = target.to(device)
            output = model(images)
            loss = criterion(output, target)
            # measure accuracy and record loss
            acc1, acc5 = compute_metrics_fn(output, target, topk=(1, 5))
            acc1s.append(acc1.item())
            acc2s.append(acc5.item())
            losses.append(loss.item())
            #if i % 10 == 0:
            #    print(i, loss)
    return {'acc1': np.mean(acc1s), 'acc5': np.mean(acc2s)}


model=sys.argv[1]
data=sys.argv[2]
DEVICE='cuda'
model = torch.load(sys.argv[1])
data=torch.load(sys.argv[2])
res1=eval_model_vision(model, data, device=DEVICE, criterion=torch.nn.CrossEntropyLoss(), compute_metrics_fn=accuracy, batch_size=100)
print(res1)
res2= eval_model_tvm(model, data, DEVICE, batch_size=100)
print(res2)
```
**Actual output**:
```
{'acc1': 0.3, 'acc5': 0.8}
{'acc1': 0.0, 'acc5': 0.8}
```
**Expected output**:
Both results should be the same.
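Since accuracy is an aggregate, comparing the raw outputs of the two backends element by element may localize a difference more directly, and small floating-point differences between backends are common, so a tolerance is usually more practical than exact equality. The following is a minimal sketch, assuming both outputs have been flattened to plain Python lists (for example with `.flatten().tolist()`). The helper `compare_outputs`, its tolerances, and the sample values are illustrative choices, not part of the script above.

```python
import math


def compare_outputs(reference, candidate, rtol=1e-4, atol=1e-5):
    """Compare two equal-length sequences of floats element-wise.

    Returns a dict with:
      nan_count    - number of NaN values in `candidate`
      mismatches   - elements that differ beyond the tolerance (non-finite
                     values count as mismatches unless they are identical)
      max_abs_diff - largest |ref - cand| where both values are finite
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    nan_count = 0
    mismatches = 0
    max_abs_diff = 0.0
    for ref, cand in zip(reference, candidate):
        if math.isnan(cand):
            nan_count += 1
            continue
        if not (math.isfinite(ref) and math.isfinite(cand)):
            if ref != cand:
                mismatches += 1
            continue
        diff = abs(ref - cand)
        max_abs_diff = max(max_abs_diff, diff)
        if diff > atol + rtol * abs(ref):
            mismatches += 1
    return {"nan_count": nan_count, "mismatches": mismatches, "max_abs_diff": max_abs_diff}


# Made-up values standing in for one row of PyTorch and TVM logits.
pytorch_out = [0.10, 2.50, -1.30]
tvm_out = [0.10, float("nan"), -1.31]
print(compare_outputs(pytorch_out, tvm_out))
```

A non-zero `nan_count` would point to a numerical problem in the compiled model rather than an ordinary precision difference.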
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [tvm] masahi commented on issue #9818: [Bug] Lower accuracy for a variant of MobileNet-V3 model
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #9818:
URL: https://github.com/apache/tvm/issues/9818#issuecomment-1005211048
OK, I have the ImageNet dataset and was able to reproduce your issue. I'll take a look.
[GitHub] [tvm] manojec054 commented on issue #9818: [Bug] Lower accuracy for a variant of MobileNet-V3 model
Posted by GitBox <gi...@apache.org>.
manojec054 commented on issue #9818:
URL: https://github.com/apache/tvm/issues/9818#issuecomment-1065151079
@crawlingcub Were you able to figure out what was wrong? I have a similar issue with resnet50.
[GitHub] [tvm] masahi commented on issue #9818: [Bug] Lower accuracy for a variant of MobileNet-V3 model
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #9818:
URL: https://github.com/apache/tvm/issues/9818#issuecomment-1004626130
I got the following error when trying to run your script:
```
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 363, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 363, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 232, in __getitem__
    sample = self.loader(path)
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 269, in default_loader
    return pil_loader(path)
  File "/home/masa/anaconda3/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 249, in pil_loader
    with open(path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: './pt_models/data/val/n01518878/ILSVRC2012_val_00036105.JPEG'
```
[GitHub] [tvm] crawlingcub commented on issue #9818: [Bug] Lower accuracy for a variant of MobileNet-V3 model
Posted by GitBox <gi...@apache.org>.
crawlingcub commented on issue #9818:
URL: https://github.com/apache/tvm/issues/9818#issuecomment-1005157959
Hi @masahi, sorry about that. I missed the fact that the data is lazily loaded, so the script requires the ImageNet validation dataset. If you already have it and can place it in `./pt_models/data/val`, this should work.
Please let me know if that doesn't work. I will try an alternative way of sending the dataset.
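To illustrate what lazy loading means here: a folder-style dataset keeps file paths and opens each image only when an item is requested, so a saved dataset object carries references rather than pixels. The sketch below uses a hypothetical `LazyImageDataset` class and a made-up path. It is not the torchvision implementation, only a pure-Python stand-in for the same behavior.

```python
class LazyImageDataset:
    """Hypothetical stand-in for a dataset that stores paths, not pixels."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The file is opened only when an item is actually requested.
        with open(self.paths[idx], "rb") as f:
            return f.read()


# Constructing the dataset succeeds even though the file is absent
# (the path below is made up and assumed not to exist).
dataset = LazyImageDataset(["./pt_models/data/val/does_not_exist.JPEG"])
print(len(dataset))

# The failure only appears on access, as in the traceback reported above.
try:
    dataset[0]
except FileNotFoundError as err:
    print("missing file:", err.filename)
```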
[GitHub] [tvm] masahi commented on issue #9818: [Bug] Lower accuracy for a variant of MobileNet-V3 model
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #9818:
URL: https://github.com/apache/tvm/issues/9818#issuecomment-1006326650
Apparently, the output from TVM is all `nan`. I tried replacing `eps = -1.0` with `0.001`, but it didn't help.
It's hard to tell what's going on. Can you start from the unmodified MobileNet-V3, which should give accurate results with TVM, and add your changes one at a time until the accuracy drops? It takes some time, but it should pinpoint which change caused the issue.
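The incremental approach can be sketched as a small loop. This is only an outline under assumptions: `first_breaking_change` and the mutation labels are hypothetical, and `is_accurate` stands for a user-supplied step that rebuilds the model with the given changes, runs it through TVM and PyTorch, and reports whether the results still agree.

```python
def first_breaking_change(changes, is_accurate):
    """Apply changes one at a time and return the first one that breaks accuracy.

    `changes` is an ordered list of labels for the mutations.
    `is_accurate(applied)` receives the list of changes applied so far and
    returns True while the results still agree.
    Returns None if no change breaks the check.
    """
    applied = []
    for change in changes:
        applied.append(change)
        if not is_accurate(list(applied)):
            return change
    return None


# Toy check standing in for a real build-and-evaluate step.
def toy_check(applied):
    return "mutation_c" not in applied


mutations = ["mutation_a", "mutation_b", "mutation_c", "mutation_d"]
print(first_breaking_change(mutations, toy_check))
```

Each call to the real check involves a full compile and evaluation, so the cost grows linearly with the number of mutations.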