Posted to commits@mxnet.apache.org by lx...@apache.org on 2017/07/07 15:58:16 UTC
[03/50] [abbrv] incubator-mxnet-test git commit: Better ssd (#6827)
Better ssd (#6827)
* adjust according to # ctx
increase default cpu threads
add ms coco dataset
fix det_aug_default when label is invalid
fix bracelets
revert grad_scale and lr
change lr-steps and end-epoch
rescale by num_ctx only
fix mxnet path
remove hard-coded class names
add coco utils for ground-truth
fix os.path
add new inception v3 ssd symbol
rename
fix rename
add resnet 50 ssd symbol
* rewrite symbol composing methods
fix name
add symbol to path
modify lr-steps and end-epoch
better names
rename conv_act_layer names
update demo, deploy, evaluate
remove handcrafted symbols
add doc
fix vgg16 fc->relu
modify inceptionv3-ssd
relax data_shape constraints
fix resnet101
parameter tuning
add back legacy models
change default lr
fix typo
* add model link, update readme, wrap up
fix path
* fix lost commit train_net.py
Project: http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/commit/cc62aded
Tree: http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/tree/cc62aded
Diff: http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/diff/cc62aded
Branch: refs/heads/master
Commit: cc62aded9ce8bd078458553dc6bcb5ee16472f51
Parents: 984aaa1
Author: Joshua Z. Zhang <ch...@gmail.com>
Authored: Mon Jun 26 20:40:00 2017 -0700
Committer: Eric Junyuan Xie <pi...@users.noreply.github.com>
Committed: Mon Jun 26 20:40:00 2017 -0700
----------------------------------------------------------------------
example/ssd/README.md | 35 +-
example/ssd/config/config.py | 3 +-
example/ssd/dataset/imdb.py | 32 +-
example/ssd/dataset/mscoco.py | 115 ++++++
example/ssd/dataset/names/mscoco.names | 80 +++++
example/ssd/dataset/names/pascal_voc.names | 20 ++
example/ssd/dataset/pascal_voc.py | 10 +-
example/ssd/dataset/pycocotools/README.md | 2 +
example/ssd/dataset/pycocotools/__init__.py | 1 +
example/ssd/dataset/pycocotools/coco.py | 435 +++++++++++++++++++++++
example/ssd/demo.py | 66 ++--
example/ssd/deploy.py | 25 +-
example/ssd/evaluate.py | 23 +-
example/ssd/evaluate/evaluate_net.py | 6 +-
example/ssd/symbol/README.md | 49 +++
example/ssd/symbol/common.py | 100 +++++-
example/ssd/symbol/inceptionv3.py | 168 +++++++++
example/ssd/symbol/legacy_vgg16_ssd_300.py | 191 ++++++++++
example/ssd/symbol/legacy_vgg16_ssd_512.py | 194 ++++++++++
example/ssd/symbol/resnet.py | 169 +++++++++
example/ssd/symbol/symbol_builder.py | 166 +++++++++
example/ssd/symbol/symbol_factory.py | 122 +++++++
example/ssd/symbol/symbol_vgg16_ssd_300.py | 189 ----------
example/ssd/symbol/symbol_vgg16_ssd_512.py | 194 ----------
example/ssd/symbol/vgg16_reduced.py | 86 +++++
example/ssd/tools/prepare_coco.sh | 4 +
example/ssd/tools/prepare_dataset.py | 30 ++
example/ssd/tools/visualize_net.py | 17 +-
example/ssd/train.py | 14 +-
example/ssd/train/train_net.py | Bin 9900 -> 9651 bytes
src/io/image_det_aug_default.cc | 6 +-
31 files changed, 2084 insertions(+), 468 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/README.md
----------------------------------------------------------------------
diff --git a/example/ssd/README.md b/example/ssd/README.md
index 8703a7c..5759fca 100644
--- a/example/ssd/README.md
+++ b/example/ssd/README.md
@@ -17,6 +17,8 @@ remarkable traits of MXNet.
Due to the permission issue, this example is maintained in this [repository](https://github.com/zhreshold/mxnet-ssd) separately. You can use the link regarding specific per example [issues](https://github.com/zhreshold/mxnet-ssd/issues).
### What's new
+* Added multiple trained models.
+* Added a much simpler way to compose network from mainstream classification networks (resnet, inception...) and [Guide](symbol/README.md).
* Update to the latest version according to caffe version, with 5% mAP increase.
* Use C++ record iterator based on back-end multi-thread engine to achieve huge speed up on multi-gpu environments.
* Monitor validation mAP during training.
@@ -30,11 +32,12 @@ Due to the permission issue, this example is maintained in this [repository](htt
![demo3](https://cloud.githubusercontent.com/assets/3307514/19171086/a9346842-8be0-11e6-8011-c17716b22ad3.png)
### mAP
-| Model | Training data | Test data | mAP |
-|:-----------------:|:----------------:|:---------:|:----:|
-| [VGG16_reduced 300x300](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_300_voc0712_trainval.zip) | VOC07+12 trainval| VOC07 test| 77.8|
-| [VGG16_reduced 512x512](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_512_voc0712_trainval.zip) | VOC07+12 trainval | VOC07 test| 79.9|
-*More to be added*
+| Model | Training data | Test data | mAP | Note |
+|:-----------------:|:----------------:|:---------:|:----:|:-----|
+| [VGG16_reduced 300x300](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_300_voc0712_trainval.zip) | VOC07+12 trainval| VOC07 test| 77.8| fast |
+| [VGG16_reduced 512x512](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_512_voc0712_trainval.zip) | VOC07+12 trainval | VOC07 test| 79.9| slow |
+| [Inception-v3 512x512](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.6/inceptionv3_ssd_512_voc0712_trainval.zip) | VOC07+12 trainval| VOC07 test| 78.9 | fastest |
+| [Resnet-50 512x512](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.6/resnet50_ssd_512_voc0712_trainval.zip) | VOC07+12 trainval| VOC07 test| 78.9 | fast |
### Speed
| Model | GPU | CUDNN | Batch-size | FPS* |
@@ -65,13 +68,14 @@ Remember to enable CUDA if you want to be able to train, since CPU training is
insanely slow. Using CUDNN is optional, but highly recommended.
### Try the demo
-* Download the pretrained model: [`ssd_300_voc_0712.zip`](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_300_voc0712_trainval.zip), and extract to `model/` directory.
+* Download the pretrained model: [`ssd_resnet50_0712.zip`](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.6/resnet50_ssd_512_voc0712_trainval.zip), and extract to `model/` directory.
* Run
```
-# cd /path/to/mxnet/example/ssd
-python demo.py
+# cd /path/to/mxnet-ssd
+python demo.py --gpu 0
# play with examples:
python demo.py --epoch 0 --images ./data/demo/dog.jpg --thresh 0.5
+python demo.py --cpu --network resnet50 --data-shape 512
# wait for library to load for the first time
```
* Check `python demo.py --help` for more options.
@@ -93,7 +97,7 @@ tar -xvf VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_06-Nov-2007.tar
tar -xvf VOCtest_06-Nov-2007.tar
```
-* We are goint to use `trainval` set in VOC2007/2012 as a common strategy.
+* We are going to use `trainval` set in VOC2007/2012 as a common strategy.
The suggested directory structure is to store `VOC2007` and `VOC2012` directories
in the same `VOCdevkit` folder.
* Then link `VOCdevkit` folder to `data/VOCdevkit` by default:
@@ -114,12 +118,12 @@ python tools/prepare_dataset.py --dataset pascal --year 2007 --set test --target
# cd /path/to/mxnet/example/ssd
python train.py
```
-* By default, this example will use `batch-size=32` and `learning_rate=0.004`.
+* By default, this example will use `batch-size=32` and `learning_rate=0.002`.
You might need to change the parameters a bit if you have different configurations.
Check `python train.py --help` for more training options. For example, if you have 4 GPUs, use:
```
# note that a perfect training parameter set is yet to be discovered for multi-GPUs
-python train.py --gpus 0,1,2,3 --batch-size 128 --lr 0.001
+python train.py --gpus 0,1,2,3 --batch-size 32
```
### Evaluate trained model
@@ -148,3 +152,12 @@ python convert_model.py deploy.prototxt name_of_pretrained_caffe_model.caffemode
python demo.py --prefix ssd_converted --epoch 1 --deploy
```
There is no guarantee that conversion will always work, but at least it's good for now.
+
+### Legacy models
+Since the new interface for composing networks was introduced, the old models have inconsistent names for weights.
+You can still load a previous model by renaming its symbol file to `legacy_xxx.py`
+and calling `train.py` or `demo.py` with `--network legacy_xxx`.
+For example:
+```
+python demo.py --network 'legacy_vgg16_ssd_300.py' --prefix model/ssd_300 --epoch 0
+```
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/config/config.py
----------------------------------------------------------------------
diff --git a/example/ssd/config/config.py b/example/ssd/config/config.py
index 931ad16..278b770 100644
--- a/example/ssd/config/config.py
+++ b/example/ssd/config/config.py
@@ -53,7 +53,7 @@ cfg.train.inter_method = 10 # random interpolation
cfg.train.rand_mirror_prob = 0.5
cfg.train.shuffle = True
cfg.train.seed = 233
-cfg.train.preprocess_threads = 6
+cfg.train.preprocess_threads = 48
cfg.train = config_as_dict(cfg.train) # convert to normal dict
# validation
@@ -64,4 +64,5 @@ cfg.valid.color_jitter = ColorJitter()
cfg.valid.rand_mirror_prob = 0
cfg.valid.shuffle = False
cfg.valid.seed = 0
+cfg.valid.preprocess_threads = 32
cfg.valid = config_as_dict(cfg.valid) # convert to normal dict
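The hunk above hard-codes the preprocessing thread counts (48 for training, 32 for validation). As a minimal sketch that is not part of this commit, a machine-dependent value could instead be derived from the number of cores. The helper name `pick_preprocess_threads` and its `cap` and `available` parameters are hypothetical and introduced here only for illustration.

```python
import multiprocessing


def pick_preprocess_threads(cap=48, available=None):
    """Return a thread count no larger than `cap` or the number of cores.

    `available` overrides core detection, which is mainly useful for testing.
    Hypothetical helper: the commit itself uses fixed values in config.py.
    """
    if available is None:
        available = multiprocessing.cpu_count()
    # Use at least one thread and never more than the cap.
    return max(1, min(cap, available))
```

The result could then be assigned to `cfg.train.preprocess_threads` before the config is converted to a dict; whether that helps throughput depends on the machine and was not measured here.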
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/imdb.py
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/imdb.py b/example/ssd/dataset/imdb.py
index 95b082d..279fe9c 100644
--- a/example/ssd/dataset/imdb.py
+++ b/example/ssd/dataset/imdb.py
@@ -14,7 +14,7 @@ class Imdb(object):
self.name = name
self.classes = []
self.num_classes = 0
- self.image_set_index = []
+ self.image_set_index = None
self.num_images = 0
self.labels = None
self.padding = 0
@@ -59,9 +59,22 @@ class Imdb(object):
fname : str
saved filename
"""
+ def progress_bar(count, total, suffix=''):
+ import sys
+ bar_len = 24
+ filled_len = int(round(bar_len * count / float(total)))
+
+ percents = round(100.0 * count / float(total), 1)
+ bar = '=' * filled_len + '-' * (bar_len - filled_len)
+ sys.stdout.write('[%s] %s%s ...%s\r' % (bar, percents, '%', suffix))
+ sys.stdout.flush()
+
str_list = []
for index in range(self.num_images):
+ progress_bar(index, self.num_images)
label = self.label_from_index(index)
+ if label.size < 1:
+ continue
path = self.image_path_from_index(index)
if root:
path = osp.relpath(path, root)
@@ -78,3 +91,20 @@ class Imdb(object):
f.write(line)
else:
raise RuntimeError("No image in imdb")
+
+ def _load_class_names(self, filename, dirname):
+ """
+ load class names from text file
+
+ Parameters:
+ ----------
+ filename: str
+ file stores class names
+ dirname: str
+ file directory
+ """
+ full_path = osp.join(dirname, filename)
+ classes = []
+ with open(full_path, 'r') as f:
+ classes = [l.strip() for l in f.readlines()]
+ return classes
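The two helpers added to `imdb.py` above can be tried outside the class. The following is a standalone sketch under stated assumptions: `parse_class_names`, `load_class_names`, and `format_progress` are illustrative names rather than the commit's API, blank lines are skipped (the committed `_load_class_names` keeps them as empty strings), and the progress text is returned instead of written to stdout.

```python
import os.path as osp


def parse_class_names(lines):
    """Strip each line and drop blank ones (a deviation from the commit)."""
    return [line.strip() for line in lines if line.strip()]


def load_class_names(filename, dirname):
    """Read one class name per line from dirname/filename."""
    full_path = osp.join(dirname, filename)
    with open(full_path, 'r') as f:
        return parse_class_names(f)


def format_progress(count, total, bar_len=24):
    """Build the progress-bar text using the same arithmetic as the diff."""
    filled_len = int(round(bar_len * count / float(total)))
    percents = round(100.0 * count / float(total), 1)
    bar = '=' * filled_len + '-' * (bar_len - filled_len)
    return '[%s] %s%%' % (bar, percents)
```

Returning the string keeps the formatting logic testable; a caller can still write it followed by `'\r'` to redraw the bar in place, as the committed version does.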
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/mscoco.py
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/mscoco.py b/example/ssd/dataset/mscoco.py
new file mode 100644
index 0000000..b46b227
--- /dev/null
+++ b/example/ssd/dataset/mscoco.py
@@ -0,0 +1,115 @@
+import os
+import numpy as np
+from imdb import Imdb
+from pycocotools.coco import COCO
+
+
+class Coco(Imdb):
+ """
+ Implementation of Imdb for MSCOCO dataset: http://mscoco.org
+
+ Parameters:
+ ----------
+ anno_file : str
+ annotation file for coco, a json file
+ image_dir : str
+ image directory for coco images
+ shuffle : bool
+ whether initially shuffle image list
+
+ """
+ def __init__(self, anno_file, image_dir, shuffle=True, names='mscoco.names'):
+ assert os.path.isfile(anno_file), "Invalid annotation file: " + anno_file
+ basename = os.path.splitext(os.path.basename(anno_file))[0]
+ super(Coco, self).__init__('coco_' + basename)
+ self.image_dir = image_dir
+
+ self.classes = self._load_class_names(names,
+ os.path.join(os.path.dirname(__file__), 'names'))
+
+ self.num_classes = len(self.classes)
+ self._load_all(anno_file, shuffle)
+ self.num_images = len(self.image_set_index)
+
+
+ def image_path_from_index(self, index):
+ """
+ given image index, find out full path
+
+ Parameters:
+ ----------
+ index: int
+ index of a specific image
+ Returns:
+ ----------
+ full path of this image
+ """
+ assert self.image_set_index is not None, "Dataset not initialized"
+ name = self.image_set_index[index]
+ image_file = os.path.join(self.image_dir, 'images', name)
+ assert os.path.isfile(image_file), 'Path does not exist: {}'.format(image_file)
+ return image_file
+
+ def label_from_index(self, index):
+ """
+ given image index, return preprocessed ground-truth
+
+ Parameters:
+ ----------
+ index: int
+ index of a specific image
+ Returns:
+ ----------
+ ground-truths of this image
+ """
+ assert self.labels is not None, "Labels not processed"
+ return self.labels[index]
+
+ def _load_all(self, anno_file, shuffle):
+ """
+ initialize all entries given annotation json file
+
+ Parameters:
+ ----------
+ anno_file: str
+ annotation json file
+ shuffle: bool
+ whether to shuffle image list
+ """
+ image_set_index = []
+ labels = []
+ coco = COCO(anno_file)
+ img_ids = coco.getImgIds()
+ for img_id in img_ids:
+ # filename
+ image_info = coco.loadImgs(img_id)[0]
+ filename = image_info["file_name"]
+ subdir = filename.split('_')[1]
+ height = image_info["height"]
+ width = image_info["width"]
+ # label
+ anno_ids = coco.getAnnIds(imgIds=img_id)
+ annos = coco.loadAnns(anno_ids)
+ label = []
+ for anno in annos:
+ cat_id = int(anno["category_id"])
+ bbox = anno["bbox"]
+ assert len(bbox) == 4
+ xmin = float(bbox[0]) / width
+ ymin = float(bbox[1]) / height
+ xmax = xmin + float(bbox[2]) / width
+ ymax = ymin + float(bbox[3]) / height
+ label.append([cat_id, xmin, ymin, xmax, ymax, 0])
+ if label:
+ labels.append(np.array(label))
+ image_set_index.append(os.path.join(subdir, filename))
+
+ if shuffle:
+ import random
+ indices = list(range(len(image_set_index)))
+ random.shuffle(indices)
+ image_set_index = [image_set_index[i] for i in indices]
+ labels = [labels[i] for i in indices]
+ # store the results
+ self.image_set_index = image_set_index
+ self.labels = labels
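The box arithmetic in `_load_all` above converts a COCO `[x, y, w, h]` box in pixels into corner coordinates normalized by the image size. A minimal standalone sketch of that step follows. The function name `coco_bbox_to_normalized_corners` is hypothetical, and the sample box and image size are made-up values for illustration.

```python
def coco_bbox_to_normalized_corners(bbox, width, height):
    """Convert a COCO [x, y, w, h] pixel box to normalized corners.

    Mirrors the arithmetic in the diff: the top-left corner is divided by the
    image size, and the normalized extent is added to get the opposite corner.
    """
    assert len(bbox) == 4
    xmin = float(bbox[0]) / width
    ymin = float(bbox[1]) / height
    xmax = xmin + float(bbox[2]) / width
    ymax = ymin + float(bbox[3]) / height
    return [xmin, ymin, xmax, ymax]


# Made-up example: a 128x96 box at (64, 48) in a 640x480 image.
box = coco_bbox_to_normalized_corners([64, 48, 128, 96], width=640, height=480)
# box is [0.1, 0.1, 0.3, 0.3] up to floating-point rounding
```

In the committed code each such box is stored together with the raw category id as one row of the per-image label array.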
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/names/mscoco.names
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/names/mscoco.names b/example/ssd/dataset/names/mscoco.names
new file mode 100644
index 0000000..ca76c80
--- /dev/null
+++ b/example/ssd/dataset/names/mscoco.names
@@ -0,0 +1,80 @@
+person
+bicycle
+car
+motorbike
+aeroplane
+bus
+train
+truck
+boat
+traffic light
+fire hydrant
+stop sign
+parking meter
+bench
+bird
+cat
+dog
+horse
+sheep
+cow
+elephant
+bear
+zebra
+giraffe
+backpack
+umbrella
+handbag
+tie
+suitcase
+frisbee
+skis
+snowboard
+sports ball
+kite
+baseball bat
+baseball glove
+skateboard
+surfboard
+tennis racket
+bottle
+wine glass
+cup
+fork
+knife
+spoon
+bowl
+banana
+apple
+sandwich
+orange
+broccoli
+carrot
+hot dog
+pizza
+donut
+cake
+chair
+sofa
+pottedplant
+bed
+diningtable
+toilet
+tvmonitor
+laptop
+mouse
+remote
+keyboard
+cell phone
+microwave
+oven
+toaster
+sink
+refrigerator
+book
+clock
+vase
+scissors
+teddy bear
+hair drier
+toothbrush
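The names file above lists 80 classes, while `_load_all` in `mscoco.py` stores the raw `category_id` from the annotation file. COCO category ids are, to our knowledge, not contiguous, so a raw id may not be a valid index into this list. The sketch below shows one way to map raw ids to 0-based indices. It is not what the commit does, `build_contiguous_id_map` is a hypothetical helper, and it assumes the names file follows ascending id order.

```python
def build_contiguous_id_map(categories):
    """Map raw category ids to 0-based indices in ascending id order.

    `categories` is a list of dicts with an 'id' key, shaped like the
    'categories' entry of a COCO annotation file.
    """
    raw_ids = sorted(cat['id'] for cat in categories)
    return {raw_id: index for index, raw_id in enumerate(raw_ids)}


# Toy categories with a gap in the ids (values are made up for illustration).
toy_categories = [
    {'id': 1, 'name': 'a'},
    {'id': 2, 'name': 'b'},
    {'id': 5, 'name': 'c'},
]
id_map = build_contiguous_id_map(toy_categories)
```

With such a map, `id_map[cat_id]` could replace the raw `cat_id` when a label row is built, provided the assumption about the names file order holds.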
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/names/pascal_voc.names
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/names/pascal_voc.names b/example/ssd/dataset/names/pascal_voc.names
new file mode 100644
index 0000000..8420ab3
--- /dev/null
+++ b/example/ssd/dataset/names/pascal_voc.names
@@ -0,0 +1,20 @@
+aeroplane
+bicycle
+bird
+boat
+bottle
+bus
+car
+cat
+chair
+cow
+diningtable
+dog
+horse
+motorbike
+person
+pottedplant
+sheep
+sofa
+train
+tvmonitor
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/pascal_voc.py
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/pascal_voc.py b/example/ssd/dataset/pascal_voc.py
index 2c61be7..31e287e 100644
--- a/example/ssd/dataset/pascal_voc.py
+++ b/example/ssd/dataset/pascal_voc.py
@@ -24,7 +24,8 @@ class PascalVoc(Imdb):
is_train : boolean
if true, will load annotations
"""
- def __init__(self, image_set, year, devkit_path, shuffle=False, is_train=False):
+ def __init__(self, image_set, year, devkit_path, shuffle=False, is_train=False,
+ names='pascal_voc.names'):
super(PascalVoc, self).__init__('voc_' + year + '_' + image_set)
self.image_set = image_set
self.year = year
@@ -33,11 +34,8 @@ class PascalVoc(Imdb):
self.extension = '.jpg'
self.is_train = is_train
- self.classes = ['aeroplane', 'bicycle', 'bird', 'boat',
- 'bottle', 'bus', 'car', 'cat', 'chair',
- 'cow', 'diningtable', 'dog', 'horse',
- 'motorbike', 'person', 'pottedplant',
- 'sheep', 'sofa', 'train', 'tvmonitor']
+ self.classes = self._load_class_names(names,
+ os.path.join(os.path.dirname(__file__), 'names'))
self.config = {'use_difficult': True,
'comp_id': 'comp4',}
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/pycocotools/README.md
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/pycocotools/README.md b/example/ssd/dataset/pycocotools/README.md
new file mode 100755
index 0000000..d358f53
--- /dev/null
+++ b/example/ssd/dataset/pycocotools/README.md
@@ -0,0 +1,2 @@
+This is a modified version of https://github.com/pdollar/coco python API.
+No `make` is required, but this will not support mask functions.
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/pycocotools/__init__.py
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/pycocotools/__init__.py b/example/ssd/dataset/pycocotools/__init__.py
new file mode 100755
index 0000000..3f7d85b
--- /dev/null
+++ b/example/ssd/dataset/pycocotools/__init__.py
@@ -0,0 +1 @@
+__author__ = 'tylin'
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/dataset/pycocotools/coco.py
----------------------------------------------------------------------
diff --git a/example/ssd/dataset/pycocotools/coco.py b/example/ssd/dataset/pycocotools/coco.py
new file mode 100755
index 0000000..a8939f6
--- /dev/null
+++ b/example/ssd/dataset/pycocotools/coco.py
@@ -0,0 +1,435 @@
+__author__ = 'tylin'
+__version__ = '2.0'
+# Interface for accessing the Microsoft COCO dataset.
+
+# Microsoft COCO is a large image dataset designed for object detection,
+# segmentation, and caption generation. pycocotools is a Python API that
+# assists in loading, parsing and visualizing the annotations in COCO.
+# Please visit http://mscoco.org/ for more information on COCO, including
+# for the data, paper, and tutorials. The exact format of the annotations
+# is also described on the COCO website. For example usage of the pycocotools
+# please see pycocotools_demo.ipynb. In addition to this API, please download both
+# the COCO images and annotations in order to run the demo.
+
+# An alternative to using the API is to load the annotations directly
+# into Python dictionary
+# Using the API provides additional utility functions. Note that this API
+# supports both *instance* and *caption* annotations. In the case of
+# captions not all functions are defined (e.g. categories are undefined).
+
+# The following API functions are defined:
+# COCO - COCO api class that loads COCO annotation file and prepare data structures.
+# decodeMask - Decode binary mask M encoded via run-length encoding.
+# encodeMask - Encode binary mask M using run-length encoding.
+# getAnnIds - Get ann ids that satisfy given filter conditions.
+# getCatIds - Get cat ids that satisfy given filter conditions.
+# getImgIds - Get img ids that satisfy given filter conditions.
+# loadAnns - Load anns with the specified ids.
+# loadCats - Load cats with the specified ids.
+# loadImgs - Load imgs with the specified ids.
+# annToMask - Convert segmentation in an annotation to binary mask.
+# showAnns - Display the specified annotations.
+# loadRes - Load algorithm results and create API for accessing them.
+# download - Download COCO images from mscoco.org server.
+# Throughout the API "ann"=annotation, "cat"=category, and "img"=image.
+# Help on each function can be accessed by: "help COCO>function".
+
+# See also COCO>decodeMask,
+# COCO>encodeMask, COCO>getAnnIds, COCO>getCatIds,
+# COCO>getImgIds, COCO>loadAnns, COCO>loadCats,
+# COCO>loadImgs, COCO>annToMask, COCO>showAnns
+
+# Microsoft COCO Toolbox. version 2.0
+# Data, paper, and tutorials available at: http://mscoco.org/
+# Code written by Piotr Dollar and Tsung-Yi Lin, 2014.
+# Licensed under the Simplified BSD License [see bsd.txt]
+
+import json
+import time
+import matplotlib.pyplot as plt
+from matplotlib.collections import PatchCollection
+from matplotlib.patches import Polygon
+import numpy as np
+import copy
+import itertools
+# from . import mask as maskUtils
+import os
+from collections import defaultdict
+import sys
+PYTHON_VERSION = sys.version_info[0]
+if PYTHON_VERSION == 2:
+ from urllib import urlretrieve
+elif PYTHON_VERSION == 3:
+ from urllib.request import urlretrieve
+
+class COCO:
+ def __init__(self, annotation_file=None):
+ """
+ Constructor of Microsoft COCO helper class for reading and visualizing annotations.
+ :param annotation_file (str): location of annotation file
+ :param image_folder (str): location to the folder that hosts images.
+ :return:
+ """
+ # load dataset
+ self.dataset,self.anns,self.cats,self.imgs = dict(),dict(),dict(),dict()
+ self.imgToAnns, self.catToImgs = defaultdict(list), defaultdict(list)
+ if not annotation_file == None:
+ print('loading annotations into memory...')
+ tic = time.time()
+ dataset = json.load(open(annotation_file, 'r'))
+ assert type(dataset)==dict, 'annotation file format {} not supported'.format(type(dataset))
+ print('Done (t={:0.2f}s)'.format(time.time()- tic))
+ self.dataset = dataset
+ self.createIndex()
+
+ def createIndex(self):
+ # create index
+ print('creating index...')
+ anns, cats, imgs = {}, {}, {}
+ imgToAnns,catToImgs = defaultdict(list),defaultdict(list)
+ if 'annotations' in self.dataset:
+ for ann in self.dataset['annotations']:
+ imgToAnns[ann['image_id']].append(ann)
+ anns[ann['id']] = ann
+
+ if 'images' in self.dataset:
+ for img in self.dataset['images']:
+ imgs[img['id']] = img
+
+ if 'categories' in self.dataset:
+ for cat in self.dataset['categories']:
+ cats[cat['id']] = cat
+
+ if 'annotations' in self.dataset and 'categories' in self.dataset:
+ for ann in self.dataset['annotations']:
+ catToImgs[ann['category_id']].append(ann['image_id'])
+
+ print('index created!')
+
+ # create class members
+ self.anns = anns
+ self.imgToAnns = imgToAnns
+ self.catToImgs = catToImgs
+ self.imgs = imgs
+ self.cats = cats
+
+ def info(self):
+ """
+ Print information about the annotation file.
+ :return:
+ """
+ for key, value in self.dataset['info'].items():
+ print('{}: {}'.format(key, value))
+
+ def getAnnIds(self, imgIds=[], catIds=[], areaRng=[], iscrowd=None):
+ """
+ Get ann ids that satisfy given filter conditions. default skips that filter
+ :param imgIds (int array) : get anns for given imgs
+ catIds (int array) : get anns for given cats
+ areaRng (float array) : get anns for given area range (e.g. [0 inf])
+ iscrowd (boolean) : get anns for given crowd label (False or True)
+ :return: ids (int array) : integer array of ann ids
+ """
+ imgIds = imgIds if type(imgIds) == list else [imgIds]
+ catIds = catIds if type(catIds) == list else [catIds]
+
+ if len(imgIds) == len(catIds) == len(areaRng) == 0:
+ anns = self.dataset['annotations']
+ else:
+ if not len(imgIds) == 0:
+ lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns]
+ anns = list(itertools.chain.from_iterable(lists))
+ else:
+ anns = self.dataset['annotations']
+ anns = anns if len(catIds) == 0 else [ann for ann in anns if ann['category_id'] in catIds]
+ anns = anns if len(areaRng) == 0 else [ann for ann in anns if ann['area'] > areaRng[0] and ann['area'] < areaRng[1]]
+ if not iscrowd == None:
+ ids = [ann['id'] for ann in anns if ann['iscrowd'] == iscrowd]
+ else:
+ ids = [ann['id'] for ann in anns]
+ return ids
+
+ def getCatIds(self, catNms=[], supNms=[], catIds=[]):
+ """
+ filtering parameters. default skips that filter.
+ :param catNms (str array) : get cats for given cat names
+ :param supNms (str array) : get cats for given supercategory names
+ :param catIds (int array) : get cats for given cat ids
+ :return: ids (int array) : integer array of cat ids
+ """
+ catNms = catNms if type(catNms) == list else [catNms]
+ supNms = supNms if type(supNms) == list else [supNms]
+ catIds = catIds if type(catIds) == list else [catIds]
+
+ if len(catNms) == len(supNms) == len(catIds) == 0:
+ cats = self.dataset['categories']
+ else:
+ cats = self.dataset['categories']
+ cats = cats if len(catNms) == 0 else [cat for cat in cats if cat['name'] in catNms]
+ cats = cats if len(supNms) == 0 else [cat for cat in cats if cat['supercategory'] in supNms]
+ cats = cats if len(catIds) == 0 else [cat for cat in cats if cat['id'] in catIds]
+ ids = [cat['id'] for cat in cats]
+ return ids
+
+ def getImgIds(self, imgIds=[], catIds=[]):
+ '''
+ Get img ids that satisfy given filter conditions.
+ :param imgIds (int array) : get imgs for given ids
+ :param catIds (int array) : get imgs with all given cats
+ :return: ids (int array) : integer array of img ids
+ '''
+ imgIds = imgIds if type(imgIds) == list else [imgIds]
+ catIds = catIds if type(catIds) == list else [catIds]
+
+ if len(imgIds) == len(catIds) == 0:
+ ids = self.imgs.keys()
+ else:
+ ids = set(imgIds)
+ for i, catId in enumerate(catIds):
+ if i == 0 and len(ids) == 0:
+ ids = set(self.catToImgs[catId])
+ else:
+ ids &= set(self.catToImgs[catId])
+ return list(ids)
+
+ def loadAnns(self, ids=[]):
+ """
+ Load anns with the specified ids.
+ :param ids (int array) : integer ids specifying anns
+ :return: anns (object array) : loaded ann objects
+ """
+ if type(ids) == list:
+ return [self.anns[id] for id in ids]
+ elif type(ids) == int:
+ return [self.anns[ids]]
+
+ def loadCats(self, ids=[]):
+ """
+ Load cats with the specified ids.
+ :param ids (int array) : integer ids specifying cats
+ :return: cats (object array) : loaded cat objects
+ """
+ if type(ids) == list:
+ return [self.cats[id] for id in ids]
+ elif type(ids) == int:
+ return [self.cats[ids]]
+
+ def loadImgs(self, ids=[]):
+ """
+ Load imgs with the specified ids.
+ :param ids (int array) : integer ids specifying img
+ :return: imgs (object array) : loaded img objects
+ """
+ if type(ids) == list:
+ return [self.imgs[id] for id in ids]
+ elif type(ids) == int:
+ return [self.imgs[ids]]
+
+ def showAnns(self, anns):
+ """
+ Display the specified annotations.
+ :param anns (array of object): annotations to display
+ :return: None
+ """
+ if len(anns) == 0:
+ return 0
+ if 'segmentation' in anns[0] or 'keypoints' in anns[0]:
+ datasetType = 'instances'
+ elif 'caption' in anns[0]:
+ datasetType = 'captions'
+ else:
+ raise Exception('datasetType not supported')
+ if datasetType == 'instances':
+ ax = plt.gca()
+ ax.set_autoscale_on(False)
+ polygons = []
+ color = []
+ for ann in anns:
+ c = (np.random.random((1, 3))*0.6+0.4).tolist()[0]
+ if 'segmentation' in ann:
+ if type(ann['segmentation']) == list:
+ # polygon
+ for seg in ann['segmentation']:
+ poly = np.array(seg).reshape((int(len(seg)/2), 2))
+ polygons.append(Polygon(poly))
+ color.append(c)
+ else:
+ # mask
+ t = self.imgs[ann['image_id']]
+ if type(ann['segmentation']['counts']) == list:
+ # rle = maskUtils.frPyObjects([ann['segmentation']], t['height'], t['width'])
+ raise NotImplementedError("maskUtils disabled!")
+ else:
+ rle = [ann['segmentation']]
+ # m = maskUtils.decode(rle)
+ raise NotImplementedError("maskUtils disabled!")
+ img = np.ones( (m.shape[0], m.shape[1], 3) )
+ if ann['iscrowd'] == 1:
+ color_mask = np.array([2.0,166.0,101.0])/255
+ if ann['iscrowd'] == 0:
+ color_mask = np.random.random((1, 3)).tolist()[0]
+ for i in range(3):
+ img[:,:,i] = color_mask[i]
+ ax.imshow(np.dstack( (img, m*0.5) ))
+ if 'keypoints' in ann and type(ann['keypoints']) == list:
+ # turn skeleton into zero-based index
+ sks = np.array(self.loadCats(ann['category_id'])[0]['skeleton'])-1
+ kp = np.array(ann['keypoints'])
+ x = kp[0::3]
+ y = kp[1::3]
+ v = kp[2::3]
+ for sk in sks:
+ if np.all(v[sk]>0):
+ plt.plot(x[sk],y[sk], linewidth=3, color=c)
+ plt.plot(x[v>0], y[v>0],'o',markersize=8, markerfacecolor=c, markeredgecolor='k',markeredgewidth=2)
+ plt.plot(x[v>1], y[v>1],'o',markersize=8, markerfacecolor=c, markeredgecolor=c, markeredgewidth=2)
+ p = PatchCollection(polygons, facecolor=color, linewidths=0, alpha=0.4)
+ ax.add_collection(p)
+ p = PatchCollection(polygons, facecolor='none', edgecolors=color, linewidths=2)
+ ax.add_collection(p)
+ elif datasetType == 'captions':
+ for ann in anns:
+ print(ann['caption'])
+
+ def loadRes(self, resFile):
+ """
+ Load result file and return a result api object.
+ :param resFile (str) : file name of result file
+ :return: res (obj) : result api object
+ """
+ res = COCO()
+ res.dataset['images'] = [img for img in self.dataset['images']]
+
+ print('Loading and preparing results...')
+ tic = time.time()
+ if type(resFile) == str or type(resFile) == unicode:
+ anns = json.load(open(resFile))
+ elif type(resFile) == np.ndarray:
+ anns = self.loadNumpyAnnotations(resFile)
+ else:
+ anns = resFile
+ assert type(anns) == list, 'results is not an array of objects'
+ annsImgIds = [ann['image_id'] for ann in anns]
+ assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
+ 'Results do not correspond to current coco set'
+ if 'caption' in anns[0]:
+ imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns])
+ res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds]
+ for id, ann in enumerate(anns):
+ ann['id'] = id+1
+ elif 'bbox' in anns[0] and not anns[0]['bbox'] == []:
+ res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
+ for id, ann in enumerate(anns):
+ bb = ann['bbox']
+ x1, x2, y1, y2 = [bb[0], bb[0]+bb[2], bb[1], bb[1]+bb[3]]
+ if not 'segmentation' in ann:
+ ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
+ ann['area'] = bb[2]*bb[3]
+ ann['id'] = id+1
+ ann['iscrowd'] = 0
+ elif 'segmentation' in anns[0]:
+ res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
+ for id, ann in enumerate(anns):
+ # now only support compressed RLE format as segmentation results
+ # ann['area'] = maskUtils.area(ann['segmentation'])
+ raise NotImplementedError("maskUtils disabled!")
+ if not 'bbox' in ann:
+ # ann['bbox'] = maskUtils.toBbox(ann['segmentation'])
+ raise NotImplementedError("maskUtils disabled!")
+ ann['id'] = id+1
+ ann['iscrowd'] = 0
+ elif 'keypoints' in anns[0]:
+ res.dataset['categories'] = copy.deepcopy(self.dataset['categories'])
+ for id, ann in enumerate(anns):
+ s = ann['keypoints']
+ x = s[0::3]
+ y = s[1::3]
+ x0,x1,y0,y1 = np.min(x), np.max(x), np.min(y), np.max(y)
+ ann['area'] = (x1-x0)*(y1-y0)
+ ann['id'] = id + 1
+ ann['bbox'] = [x0,y0,x1-x0,y1-y0]
+ print('DONE (t={:0.2f}s)'.format(time.time()- tic))
+
+ res.dataset['annotations'] = anns
+ res.createIndex()
+ return res
+
+ def download(self, tarDir = None, imgIds = [] ):
+ '''
+ Download COCO images from mscoco.org server.
+ :param tarDir (str): COCO results directory name
+ imgIds (list): images to be downloaded
+ :return:
+ '''
+ if tarDir is None:
+ print('Please specify target directory')
+ return -1
+ if len(imgIds) == 0:
+ imgs = self.imgs.values()
+ else:
+ imgs = self.loadImgs(imgIds)
+ N = len(imgs)
+ if not os.path.exists(tarDir):
+ os.makedirs(tarDir)
+ for i, img in enumerate(imgs):
+ tic = time.time()
+ fname = os.path.join(tarDir, img['file_name'])
+ if not os.path.exists(fname):
+ urlretrieve(img['coco_url'], fname)
+ print('downloaded {}/{} images (t={:0.1f}s)'.format(i, N, time.time()- tic))
+
+ def loadNumpyAnnotations(self, data):
+ """
+ Convert result data from a numpy array [Nx7] where each row contains {imageID,x1,y1,w,h,score,class}
+ :param data (numpy.ndarray)
+ :return: annotations (python nested list)
+ """
+ print('Converting ndarray to lists...')
+ assert(type(data) == np.ndarray)
+ print(data.shape)
+ assert(data.shape[1] == 7)
+ N = data.shape[0]
+ ann = []
+ for i in range(N):
+ if i % 1000000 == 0:
+ print('{}/{}'.format(i,N))
+ ann += [{
+ 'image_id' : int(data[i, 0]),
+ 'bbox' : [ data[i, 1], data[i, 2], data[i, 3], data[i, 4] ],
+ 'score' : data[i, 5],
+ 'category_id': int(data[i, 6]),
+ }]
+ return ann
+
+ def annToRLE(self, ann):
+ """
+ Convert annotation, which can be polygons or uncompressed RLE, to RLE.
+ :return: RLE-encoded mask
+ """
+ t = self.imgs[ann['image_id']]
+ h, w = t['height'], t['width']
+ segm = ann['segmentation']
+ if type(segm) == list:
+ # polygon -- a single object might consist of multiple parts
+ # we merge all parts into one mask rle code
+ # rles = maskUtils.frPyObjects(segm, h, w)
+ # rle = maskUtils.merge(rles)
+ raise NotImplementedError("maskUtils disabled!")
+ elif type(segm['counts']) == list:
+ # uncompressed RLE
+ # rle = maskUtils.frPyObjects(segm, h, w)
+ raise NotImplementedError("maskUtils disabled!")
+ else:
+ # rle
+ rle = ann['segmentation']
+ return rle
+
+ def annToMask(self, ann):
+ """
+ Convert annotation which can be polygons, uncompressed RLE, or RLE to binary mask.
+ :return: binary mask (numpy 2D array)
+ """
+ rle = self.annToRLE(ann)
+ # m = maskUtils.decode(rle)
+ raise NotImplementedError("maskUtils disabled!")
+ return m
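For detection results, the `bbox` branch of `loadRes` in the hunk above fills in a rectangular `segmentation` polygon and an `area` from each `[x, y, w, h]` box. A minimal standalone sketch of that step follows; the helper name `fill_bbox_fields` is hypothetical and is not part of the commit.

```python
def fill_bbox_fields(ann, ann_id):
    """Mirror the 'bbox' branch of loadRes for a single result dict.

    `fill_bbox_fields` is a hypothetical helper used only for illustration.
    """
    x, y, w, h = ann['bbox']  # boxes are [x, y, width, height]
    x1, x2, y1, y2 = x, x + w, y, y + h
    if 'segmentation' not in ann:
        # Describe the rectangle as a single four-corner polygon.
        ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]]
    ann['area'] = w * h
    ann['id'] = ann_id
    ann['iscrowd'] = 0
    return ann


result = fill_bbox_fields({'image_id': 1, 'bbox': [10, 20, 30, 40]}, ann_id=1)
```

An existing `segmentation` entry is left untouched, which matches the `if not 'segmentation' in ann` guard in the diff.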
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/demo.py
----------------------------------------------------------------------
diff --git a/example/ssd/demo.py b/example/ssd/demo.py
index ededbdb..bda4606 100644
--- a/example/ssd/demo.py
+++ b/example/ssd/demo.py
@@ -2,18 +2,12 @@ import argparse
import tools.find_mxnet
import mxnet as mx
import os
-import importlib
import sys
from detect.detector import Detector
+from symbol.symbol_factory import get_symbol
-CLASSES = ('aeroplane', 'bicycle', 'bird', 'boat',
- 'bottle', 'bus', 'car', 'cat', 'chair',
- 'cow', 'diningtable', 'dog', 'horse',
- 'motorbike', 'person', 'pottedplant',
- 'sheep', 'sofa', 'train', 'tvmonitor')
-
-def get_detector(net, prefix, epoch, data_shape, mean_pixels, ctx,
- nms_thresh=0.5, force_nms=True):
+def get_detector(net, prefix, epoch, data_shape, mean_pixels, ctx, num_class,
+ nms_thresh=0.5, force_nms=True, nms_topk=400):
"""
wrapper for initializing a detector
@@ -31,23 +25,25 @@ def get_detector(net, prefix, epoch, data_shape, mean_pixels, ctx,
mean pixel values (R, G, B)
ctx : mx.ctx
running context, mx.cpu() or mx.gpu(?)
+ num_class : int
+ number of classes
+ nms_thresh : float
+ non-maximum suppression threshold
force_nms : bool
force suppress different categories
"""
- sys.path.append(os.path.join(os.getcwd(), 'symbol'))
if net is not None:
- net = importlib.import_module("symbol_" + net) \
- .get_symbol(len(CLASSES), nms_thresh, force_nms)
- detector = Detector(net, prefix + "_" + str(data_shape), epoch, \
- data_shape, mean_pixels, ctx=ctx)
+ net = get_symbol(net, data_shape, num_classes=num_class, nms_thresh=nms_thresh,
+ force_nms=force_nms, nms_topk=nms_topk)
+ detector = Detector(net, prefix, epoch, data_shape, mean_pixels, ctx=ctx)
return detector
def parse_args():
parser = argparse.ArgumentParser(description='Single-shot detection network demo')
- parser.add_argument('--network', dest='network', type=str, default='vgg16_ssd_300',
- choices=['vgg16_ssd_300', 'vgg16_ssd_512'], help='which network to use')
+ parser.add_argument('--network', dest='network', type=str, default='resnet50',
+ help='which network to use')
parser.add_argument('--images', dest='images', type=str, default='./data/demo/dog.jpg',
- help='run demo with images, use comma(without space) to seperate multiple images')
+ help='run demo with images, use comma to separate multiple images')
parser.add_argument('--dir', dest='dir', nargs='?',
help='demo image directory, optional', type=str)
parser.add_argument('--ext', dest='extension', help='image extension, optional',
@@ -55,12 +51,13 @@ def parse_args():
parser.add_argument('--epoch', dest='epoch', help='epoch of trained model',
default=0, type=int)
parser.add_argument('--prefix', dest='prefix', help='trained model prefix',
- default=os.path.join(os.getcwd(), 'model', 'ssd'), type=str)
+ default=os.path.join(os.getcwd(), 'model', 'ssd_'),
+ type=str)
parser.add_argument('--cpu', dest='cpu', help='(override GPU) use CPU to detect',
action='store_true', default=False)
parser.add_argument('--gpu', dest='gpu_id', type=int, default=0,
help='GPU device id to detect with')
- parser.add_argument('--data-shape', dest='data_shape', type=int, default=300,
+ parser.add_argument('--data-shape', dest='data_shape', type=int, default=512,
help='set image shape')
parser.add_argument('--mean-r', dest='mean_r', type=float, default=123,
help='red mean value')
@@ -78,9 +75,29 @@ def parse_args():
help='show detection time')
parser.add_argument('--deploy', dest='deploy_net', action='store_true', default=False,
help='Load network from json file, rather than from symbol')
+ parser.add_argument('--class-names', dest='class_names', type=str,
+ default='aeroplane, bicycle, bird, boat, bottle, bus, \
+ car, cat, chair, cow, diningtable, dog, horse, motorbike, \
+ person, pottedplant, sheep, sofa, train, tvmonitor',
+ help='string of comma separated names, or text filename')
args = parser.parse_args()
return args
+def parse_class_names(class_names):
+ """ parse # classes and class_names if applicable """
+ if len(class_names) > 0:
+ if os.path.isfile(class_names):
+ # try to open it to read class names
+ with open(class_names, 'r') as f:
+ class_names = [l.strip() for l in f.readlines()]
+ else:
+ class_names = [c.strip() for c in class_names.split(',')]
+ for name in class_names:
+ assert len(name) > 0
+ else:
+ raise RuntimeError("No valid class_name provided...")
+ return class_names
+
if __name__ == '__main__':
args = parse_args()
if args.cpu:
@@ -93,10 +110,15 @@ if __name__ == '__main__':
assert len(image_list) > 0, "No valid image specified to detect"
network = None if args.deploy_net else args.network
- detector = get_detector(network, args.prefix, args.epoch,
+ class_names = parse_class_names(args.class_names)
+ if args.prefix.endswith('_'):
+ prefix = args.prefix + args.network + '_' + str(args.data_shape)
+ else:
+ prefix = args.prefix
+ detector = get_detector(network, prefix, args.epoch,
args.data_shape,
(args.mean_r, args.mean_g, args.mean_b),
- ctx, args.nms_thresh, args.force_nms)
+ ctx, len(class_names), args.nms_thresh, args.force_nms)
# run detection
detector.detect_and_visualize(image_list, args.dir, args.extension,
- CLASSES, args.thresh, args.show_timer)
+ class_names, args.thresh, args.show_timer)
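The demo.py changes above resolve two things at startup: the class list (from a comma-separated string) and the checkpoint prefix (expanded only when it ends with `_`). A small sketch of both rules, without the file-reading branch and without MXNet; `split_class_names` and `resolve_prefix` are hypothetical names chosen for this illustration.

```python
def split_class_names(class_names):
    """Split a comma-separated string into stripped, non-empty names."""
    names = [c.strip() for c in class_names.split(',')]
    for name in names:
        assert len(name) > 0, "empty class name"
    return names


def resolve_prefix(prefix, network, data_shape):
    """Expand a prefix ending in '_' to '<prefix><network>_<data_shape>'."""
    if prefix.endswith('_'):
        return prefix + network + '_' + str(data_shape)
    return prefix


# The backslash-continued default in the diff carries extra whitespace
# inside the string; strip() removes it.
names = split_class_names('aeroplane, bicycle, \
    bird')
prefix = resolve_prefix('model/ssd_', 'resnet50', 512)
explicit = resolve_prefix('model/my_model', 'resnet50', 512)
```

With the defaults in this diff, the demo would therefore look for a checkpoint named `model/ssd_resnet50_512`, while an explicit prefix is used unchanged.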
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/deploy.py
----------------------------------------------------------------------
diff --git a/example/ssd/deploy.py b/example/ssd/deploy.py
index 264314a..aa70cac 100644
--- a/example/ssd/deploy.py
+++ b/example/ssd/deploy.py
@@ -5,32 +5,41 @@ import mxnet as mx
import os
import importlib
import sys
+from symbol.symbol_factory import get_symbol
def parse_args():
parser = argparse.ArgumentParser(description='Convert a trained model to deploy model')
- parser.add_argument('--network', dest='network', type=str, default='vgg16_ssd_300',
- choices=['vgg16_ssd_300', 'vgg16_ssd_512'], help='which network to use')
+ parser.add_argument('--network', dest='network', type=str, default='vgg16_reduced',
+ help='which network to use')
parser.add_argument('--epoch', dest='epoch', help='epoch of trained model',
default=0, type=int)
parser.add_argument('--prefix', dest='prefix', help='trained model prefix',
- default=os.path.join(os.getcwd(), 'model', 'ssd_300'), type=str)
+ default=os.path.join(os.getcwd(), 'model', 'ssd_'), type=str)
+ parser.add_argument('--data-shape', dest='data_shape', type=int, default=300,
+ help='data shape')
parser.add_argument('--num-class', dest='num_classes', help='number of classes',
default=20, type=int)
parser.add_argument('--nms', dest='nms_thresh', type=float, default=0.5,
help='non-maximum suppression threshold, default 0.5')
parser.add_argument('--force', dest='force_nms', type=bool, default=True,
help='force non-maximum suppression on different class')
+ parser.add_argument('--topk', dest='nms_topk', type=int, default=400,
+ help='apply nms only to top k detections based on scores.')
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
- sys.path.append(os.path.join(os.getcwd(), 'symbol'))
- net = importlib.import_module("symbol_" + args.network) \
- .get_symbol(args.num_classes, args.nms_thresh, args.force_nms)
- _, arg_params, aux_params = mx.model.load_checkpoint(args.prefix, args.epoch)
+ net = get_symbol(args.network, args.data_shape,
+ num_classes=args.num_classes, nms_thresh=args.nms_thresh,
+ force_suppress=args.force_nms, nms_topk=args.nms_topk)
+ if args.prefix.endswith('_'):
+ prefix = args.prefix + args.network + '_' + str(args.data_shape)
+ else:
+ prefix = args.prefix
+ _, arg_params, aux_params = mx.model.load_checkpoint(prefix, args.epoch)
# new name
- tmp = args.prefix.rsplit('/', 1)
+ tmp = prefix.rsplit('/', 1)
save_prefix = '/deploy_'.join(tmp)
mx.model.save_checkpoint(save_prefix, args.epoch, net, arg_params, aux_params)
print("Saved model: {}-{:04d}.params".format(save_prefix, args.epoch))
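The deploy prefix in deploy.py is derived by splitting on the last `/` and rejoining with `/deploy_`. The sketch below isolates that expression to show what it yields; `deploy_prefix` is a hypothetical wrapper name, not a function in the commit.

```python
def deploy_prefix(prefix):
    """Insert 'deploy_' in front of the final path component."""
    tmp = prefix.rsplit('/', 1)
    return '/deploy_'.join(tmp)


with_dir = deploy_prefix('model/ssd_vgg16_reduced_300')
without_dir = deploy_prefix('ssd_vgg16_reduced_300')
```

Note that a prefix containing no `/` comes back unchanged, because `rsplit` returns a single element and `join` has nothing to insert. In that case the deploy checkpoint would presumably be saved under the same prefix as the one just loaded, so it may be worth passing a prefix that includes a directory.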
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/evaluate.py
----------------------------------------------------------------------
diff --git a/example/ssd/evaluate.py b/example/ssd/evaluate.py
index a38a7f6..65e0b30 100644
--- a/example/ssd/evaluate.py
+++ b/example/ssd/evaluate.py
@@ -5,30 +5,27 @@ import os
import sys
from evaluate.evaluate_net import evaluate_net
-CLASSES = ('aeroplane', 'bicycle', 'bird', 'boat',
- 'bottle', 'bus', 'car', 'cat', 'chair',
- 'cow', 'diningtable', 'dog', 'horse',
- 'motorbike', 'person', 'pottedplant',
- 'sheep', 'sofa', 'train', 'tvmonitor')
-
def parse_args():
parser = argparse.ArgumentParser(description='Evaluate a network')
parser.add_argument('--rec-path', dest='rec_path', help='which record file to use',
default=os.path.join(os.getcwd(), 'data', 'val.rec'), type=str)
parser.add_argument('--list-path', dest='list_path', help='which list file to use',
default="", type=str)
- parser.add_argument('--network', dest='network', type=str, default='vgg16_ssd_300',
- choices=['vgg16_ssd_300', 'vgg16_ssd_512'], help='which network to use')
+ parser.add_argument('--network', dest='network', type=str, default='vgg16_reduced',
+ help='which network to use')
parser.add_argument('--batch-size', dest='batch_size', type=int, default=32,
help='evaluation batch size')
parser.add_argument('--num-class', dest='num_class', type=int, default=20,
help='number of classes')
- parser.add_argument('--class-names', dest='class_names', type=str, default=",".join(CLASSES),
+ parser.add_argument('--class-names', dest='class_names', type=str,
+ default='aeroplane, bicycle, bird, boat, bottle, bus, \
+ car, cat, chair, cow, diningtable, dog, horse, motorbike, \
+ person, pottedplant, sheep, sofa, train, tvmonitor',
help='string of comma separated names, or text filename')
parser.add_argument('--epoch', dest='epoch', help='epoch of pretrained model',
default=0, type=int)
parser.add_argument('--prefix', dest='prefix', help='load model prefix',
- default=os.path.join(os.getcwd(), 'model', 'ssd'), type=str)
+ default=os.path.join(os.getcwd(), 'model', 'ssd_'), type=str)
parser.add_argument('--gpus', dest='gpu_id', help='GPU devices to evaluate with',
default='0', type=str)
parser.add_argument('--cpu', dest='cpu', help='use cpu to evaluate, this can be slow',
@@ -79,9 +76,13 @@ if __name__ == '__main__':
class_names = None
network = None if args.deploy_net else args.network
+ if args.prefix.endswith('_'):
+ prefix = args.prefix + args.network
+ else:
+ prefix = args.prefix
evaluate_net(network, args.rec_path, num_class,
(args.mean_r, args.mean_g, args.mean_b), args.data_shape,
- args.prefix, args.epoch, ctx, batch_size=args.batch_size,
+ prefix, args.epoch, ctx, batch_size=args.batch_size,
path_imglist=args.list_path, nms_thresh=args.nms_thresh,
force_nms=args.force_nms, ovp_thresh=args.overlap_thresh,
use_difficult=args.use_difficult, class_names=class_names,
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/evaluate/evaluate_net.py
----------------------------------------------------------------------
diff --git a/example/ssd/evaluate/evaluate_net.py b/example/ssd/evaluate/evaluate_net.py
index 8d86f8e..4c629f8 100644
--- a/example/ssd/evaluate/evaluate_net.py
+++ b/example/ssd/evaluate/evaluate_net.py
@@ -7,6 +7,7 @@ from dataset.iterator import DetRecordIter
from config.config import cfg
from evaluate.eval_metric import MApMetric, VOC07MApMetric
import logging
+from symbol.symbol_factory import get_symbol
def evaluate_net(net, path_imgrec, num_classes, mean_pixels, data_shape,
model_prefix, epoch, ctx=mx.cpu(), batch_size=1,
@@ -71,9 +72,8 @@ def evaluate_net(net, path_imgrec, num_classes, mean_pixels, data_shape,
if net is None:
net = load_net
else:
- sys.path.append(os.path.join(cfg.ROOT_DIR, 'symbol'))
- net = importlib.import_module("symbol_" + net) \
- .get_symbol(num_classes, nms_thresh, force_nms)
+ net = get_symbol(net, data_shape[1], num_classes=num_classes,
+ nms_thresh=nms_thresh, force_suppress=force_nms)
if not 'label' in net.list_arguments():
label = mx.sym.Variable(name='label')
net = mx.sym.Group([net, label])
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/symbol/README.md
----------------------------------------------------------------------
diff --git a/example/ssd/symbol/README.md b/example/ssd/symbol/README.md
new file mode 100644
index 0000000..8fee319
--- /dev/null
+++ b/example/ssd/symbol/README.md
@@ -0,0 +1,49 @@
+## How to compose SSD network on top of mainstream classification networks
+
+1. Have the base network ready in this directory as `name.py`, such as `inceptionv3.py`.
+2. Add a configuration to `symbol_factory.py`; an example would be:
+```
+if network == 'vgg16_reduced':
+ if data_shape >= 448:
+ from_layers = ['relu4_3', 'relu7', '', '', '', '', '']
+ num_filters = [512, -1, 512, 256, 256, 256, 256]
+ strides = [-1, -1, 2, 2, 2, 2, 1]
+ pads = [-1, -1, 1, 1, 1, 1, 1]
+ sizes = [[.07, .1025], [.15,.2121], [.3, .3674], [.45, .5196], [.6, .6708], \
+ [.75, .8216], [.9, .9721]]
+ ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
+ [1,2,.5,3,1./3], [1,2,.5], [1,2,.5]]
+ normalizations = [20, -1, -1, -1, -1, -1, -1]
+ steps = [] if data_shape != 512 else [x / 512.0 for x in
+ [8, 16, 32, 64, 128, 256, 512]]
+ else:
+ from_layers = ['relu4_3', 'relu7', '', '', '', '']
+ num_filters = [512, -1, 512, 256, 256, 256]
+ strides = [-1, -1, 2, 2, 1, 1]
+ pads = [-1, -1, 1, 1, 0, 0]
+ sizes = [[.1, .141], [.2,.272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
+ ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
+ [1,2,.5], [1,2,.5]]
+ normalizations = [20, -1, -1, -1, -1, -1]
+ steps = [] if data_shape != 300 else [x / 300.0 for x in [8, 16, 32, 64, 100, 300]]
+ return locals()
+elif network == 'inceptionv3':
+ from_layers = ['ch_concat_mixed_7_chconcat', 'ch_concat_mixed_10_chconcat', '', '', '', '']
+ num_filters = [-1, -1, 512, 256, 256, 128]
+ strides = [-1, -1, 2, 2, 2, 2]
+ pads = [-1, -1, 1, 1, 1, 1]
+ sizes = [[.1, .141], [.2,.272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
+ ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3], \
+ [1,2,.5], [1,2,.5]]
+ normalizations = -1
+ steps = []
+ return locals()
+```
+Here `from_layers` indicates the feature layers you would like to extract from the base network.
+`''` indicates that we want to add extra new layers on top of the last feature layer,
+and the number of filters must be specified in `num_filters`. Similarly, `strides` and `pads`
+are required to compose these new layers. `sizes` and `ratios` are the parameters controlling
+the anchor generation algorithm. `normalizations` is used to normalize and rescale a feature
+layer if not `-1`. `steps` is optional and is used to calculate the anchor sliding steps.
+
+3. Train or test with arguments `--network name --data-shape xxx --pretrained pretrained_model`
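The README does not say how the `sizes` pairs were chosen. The second value of each pair in the 300-input example appears consistent with the geometric mean of that layer's first size and the next layer's first size, with 1.05 assumed after the last layer. The sketch below checks that observation numerically; the rule and the 1.05 endpoint are inferences from the numbers, not something the commit states.

```python
import math

# First entry of each `sizes` pair from the 300-input vgg16_reduced example.
first_sizes = [.1, .2, .37, .54, .71, .88]
# Second entry of each pair, as listed in the README.
readme_second = [.141, .272, .447, .619, .79, .961]

# Assumption: the scale "after" the last layer is 1.05.
next_sizes = first_sizes[1:] + [1.05]

# Geometric mean of consecutive scales.
derived_second = [math.sqrt(s * s_next)
                  for s, s_next in zip(first_sizes, next_sizes)]

max_error = max(abs(d - r) for d, r in zip(derived_second, readme_second))
```

If the same rule is wanted for a new base network, generating the pairs this way keeps the list lengths aligned with `from_layers`, which the symbol-composing code requires.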
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/symbol/common.py
----------------------------------------------------------------------
diff --git a/example/ssd/symbol/common.py b/example/ssd/symbol/common.py
index 12ea718..474d3ea 100644
--- a/example/ssd/symbol/common.py
+++ b/example/ssd/symbol/common.py
@@ -29,6 +29,42 @@ def conv_act_layer(from_layer, name, num_filter, kernel=(1,1), pad=(0,0), \
----------
(conv, relu) mx.Symbols
"""
+ conv = mx.symbol.Convolution(data=from_layer, kernel=kernel, pad=pad, \
+ stride=stride, num_filter=num_filter, name="{}_conv".format(name))
+ if use_batchnorm:
+ conv = mx.symbol.BatchNorm(data=conv, name="{}_bn".format(name))
+ relu = mx.symbol.Activation(data=conv, act_type=act_type, \
+ name="{}_{}".format(name, act_type))
+ return relu
+
+def legacy_conv_act_layer(from_layer, name, num_filter, kernel=(1,1), pad=(0,0), \
+ stride=(1,1), act_type="relu", use_batchnorm=False):
+ """
+ wrapper for a small Convolution group
+
+ Parameters:
+ ----------
+ from_layer : mx.symbol
+ continue on which layer
+ name : str
+ base name of the new layers
+ num_filter : int
+ how many filters to use in Convolution layer
+ kernel : tuple (int, int)
+ kernel size (h, w)
+ pad : tuple (int, int)
+ padding size (h, w)
+ stride : tuple (int, int)
+ stride size (h, w)
+ act_type : str
+ activation type, can be relu...
+ use_batchnorm : bool
+ whether to use batch normalization
+
+ Returns:
+ ----------
+ (conv, relu) mx.Symbols
+ """
assert not use_batchnorm, "batchnorm not yet supported"
bias = mx.symbol.Variable(name="conv{}_bias".format(name),
init=mx.init.Constant(0.0), attr={'__lr_mult__': '2.0'})
@@ -40,9 +76,66 @@ def conv_act_layer(from_layer, name, num_filter, kernel=(1,1), pad=(0,0), \
relu = mx.symbol.BatchNorm(data=relu, name="bn{}".format(name))
return conv, relu
+def multi_layer_feature(body, from_layers, num_filters, strides, pads, min_filter=128):
+ """Wrapper function to extract features from base network, attaching extra
+ layers and SSD specific layers
+
+ Parameters
+ ----------
+ from_layers : list of str
+ feature extraction layers, use '' to add extra layers
+ For example:
+ from_layers = ['relu4_3', 'fc7', '', '', '', '']
+ which means extract feature from relu4_3 and fc7, adding 4 extra layers
+ on top of fc7
+ num_filters : list of int
+ number of filters for extra layers, you can use -1 for extracted features,
+ however, if normalization and scale is applied, the number of filter for
+ that layer must be provided.
+ For example:
+ num_filters = [512, -1, 512, 256, 256, 256]
+ strides : list of int
+ strides for the 3x3 convolution appended, -1 can be used for extracted
+ feature layers
+ pads : list of int
+ paddings for the 3x3 convolution, -1 can be used for extracted layers
+ min_filter : int
+ minimum number of filters used in 1x1 convolution
+
+ Returns
+ -------
+ list of mx.Symbols
+
+ """
+ # arguments check
+ assert len(from_layers) > 0
+ assert isinstance(from_layers[0], str) and len(from_layers[0].strip()) > 0
+ assert len(from_layers) == len(num_filters) == len(strides) == len(pads)
+
+ internals = body.get_internals()
+ layers = []
+ for k, params in enumerate(zip(from_layers, num_filters, strides, pads)):
+ from_layer, num_filter, s, p = params
+ if from_layer.strip():
+ # extract from base network
+ layer = internals[from_layer.strip() + '_output']
+ layers.append(layer)
+ else:
+ # attach from last feature layer
+ assert len(layers) > 0
+ assert num_filter > 0
+ layer = layers[-1]
+ num_1x1 = max(min_filter, num_filter // 2)
+ conv_1x1 = conv_act_layer(layer, 'multi_feat_%d_conv_1x1' % (k),
+ num_1x1, kernel=(1, 1), pad=(0, 0), stride=(1, 1), act_type='relu')
+ conv_3x3 = conv_act_layer(conv_1x1, 'multi_feat_%d_conv_3x3' % (k),
+ num_filter, kernel=(3, 3), pad=(p, p), stride=(s, s), act_type='relu')
+ layers.append(conv_3x3)
+ return layers
+
def multibox_layer(from_layers, num_classes, sizes=[.2, .95],
ratios=[1], normalization=-1, num_channels=[],
- clip=True, interm_layer=0, steps=[]):
+ clip=False, interm_layer=0, steps=[]):
"""
the basic aggregation module for SSD detection. Takes in multiple layers,
generate multiple object detection targets by customized layers
@@ -106,7 +199,7 @@ def multibox_layer(from_layers, num_classes, sizes=[.2, .95],
normalization = [normalization] * len(from_layers)
assert len(normalization) == len(from_layers)
- assert sum(x > 0 for x in normalization) == len(num_channels), \
+ assert sum(x > 0 for x in normalization) <= len(num_channels), \
"must provide number of channels for each normalized layer"
if steps:
@@ -125,7 +218,8 @@ def multibox_layer(from_layers, num_classes, sizes=[.2, .95],
mode="channel", name="{}_norm".format(from_name))
scale = mx.symbol.Variable(name="{}_scale".format(from_name),
shape=(1, num_channels.pop(0), 1, 1),
- init=mx.init.Constant(normalization[k]))
+ init=mx.init.Constant(normalization[k]),
+ attr={'__wd_mult__': '0.1'})
from_layer = mx.symbol.broadcast_mul(lhs=scale, rhs=from_layer)
if interm_layer > 0:
from_layer = mx.symbol.Convolution(data=from_layer, kernel=(3,3), \
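To see what `multi_layer_feature` in the common.py hunk above would build for a given configuration without constructing MXNet symbols, the same loop can be run as a dry plan. This is a sketch only: `plan_feature_layers` and the tuple layout are hypothetical, and only the branching and the `max(min_filter, num_filter // 2)` rule are taken from the diff.

```python
def plan_feature_layers(from_layers, num_filters, strides, pads, min_filter=128):
    """Return a description of each feature layer instead of mx.Symbols."""
    assert len(from_layers) > 0
    assert len(from_layers) == len(num_filters) == len(strides) == len(pads)
    plan = []
    for k, (name, num_filter, s, p) in enumerate(
            zip(from_layers, num_filters, strides, pads)):
        if name.strip():
            # Extracted from the base network by internal output name.
            plan.append(('extract', name.strip() + '_output'))
        else:
            # Extra layer: 1x1 bottleneck followed by a 3x3 convolution.
            assert len(plan) > 0 and num_filter > 0
            num_1x1 = max(min_filter, num_filter // 2)
            plan.append(('extra', k, num_1x1, num_filter, s, p))
    return plan


plan = plan_feature_layers(
    ['relu4_3', 'relu7', '', '', '', ''],
    [512, -1, 512, 256, 256, 256],
    [-1, -1, 2, 2, 1, 1],
    [-1, -1, 1, 1, 0, 0])
```

For the 300-input `vgg16_reduced` settings this gives two extracted layers and four extra layers, with the 1x1 width clamped to 128 for the 256-filter layers.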
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/symbol/inceptionv3.py
----------------------------------------------------------------------
diff --git a/example/ssd/symbol/inceptionv3.py b/example/ssd/symbol/inceptionv3.py
new file mode 100644
index 0000000..1c38ae6
--- /dev/null
+++ b/example/ssd/symbol/inceptionv3.py
@@ -0,0 +1,168 @@
+"""
+Inception V3, suitable for images with around 299 x 299
+
+Reference:
+
+Szegedy, Christian, et al. "Rethinking the Inception Architecture for Computer Vision." arXiv preprint arXiv:1512.00567 (2015).
+"""
+import mxnet as mx
+
+def Conv(data, num_filter, kernel=(1, 1), stride=(1, 1), pad=(0, 0), name=None, suffix=''):
+ conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad, no_bias=True, name='%s%s_conv2d' %(name, suffix))
+ bn = mx.sym.BatchNorm(data=conv, name='%s%s_batchnorm' %(name, suffix), fix_gamma=True)
+ act = mx.sym.Activation(data=bn, act_type='relu', name='%s%s_relu' %(name, suffix))
+ return act
+
+
+def Inception7A(data,
+ num_1x1,
+ num_3x3_red, num_3x3_1, num_3x3_2,
+ num_5x5_red, num_5x5,
+ pool, proj,
+ name):
+ tower_1x1 = Conv(data, num_1x1, name=('%s_conv' % name))
+ tower_5x5 = Conv(data, num_5x5_red, name=('%s_tower' % name), suffix='_conv')
+ tower_5x5 = Conv(tower_5x5, num_5x5, kernel=(5, 5), pad=(2, 2), name=('%s_tower' % name), suffix='_conv_1')
+ tower_3x3 = Conv(data, num_3x3_red, name=('%s_tower_1' % name), suffix='_conv')
+ tower_3x3 = Conv(tower_3x3, num_3x3_1, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='_conv_1')
+ tower_3x3 = Conv(tower_3x3, num_3x3_2, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='_conv_2')
+ pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
+ cproj = Conv(pooling, proj, name=('%s_tower_2' % name), suffix='_conv')
+ concat = mx.sym.Concat(*[tower_1x1, tower_5x5, tower_3x3, cproj], name='ch_concat_%s_chconcat' % name)
+ return concat
+
+# First Downsample
+def Inception7B(data,
+ num_3x3,
+ num_d3x3_red, num_d3x3_1, num_d3x3_2,
+ pool,
+ name):
+ tower_3x3 = Conv(data, num_3x3, kernel=(3, 3), pad=(0, 0), stride=(2, 2), name=('%s_conv' % name))
+ tower_d3x3 = Conv(data, num_d3x3_red, name=('%s_tower' % name), suffix='_conv')
+ tower_d3x3 = Conv(tower_d3x3, num_d3x3_1, kernel=(3, 3), pad=(1, 1), stride=(1, 1), name=('%s_tower' % name), suffix='_conv_1')
+ tower_d3x3 = Conv(tower_d3x3, num_d3x3_2, kernel=(3, 3), pad=(0, 0), stride=(2, 2), name=('%s_tower' % name), suffix='_conv_2')
+ pooling = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pad=(0,0), pool_type="max", name=('max_pool_%s_pool' % name))
+ concat = mx.sym.Concat(*[tower_3x3, tower_d3x3, pooling], name='ch_concat_%s_chconcat' % name)
+ return concat
+
+def Inception7C(data,
+ num_1x1,
+ num_d7_red, num_d7_1, num_d7_2,
+ num_q7_red, num_q7_1, num_q7_2, num_q7_3, num_q7_4,
+ pool, proj,
+ name):
+ tower_1x1 = Conv(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_conv' % name))
+ tower_d7 = Conv(data=data, num_filter=num_d7_red, name=('%s_tower' % name), suffix='_conv')
+ tower_d7 = Conv(data=tower_d7, num_filter=num_d7_1, kernel=(1, 7), pad=(0, 3), name=('%s_tower' % name), suffix='_conv_1')
+ tower_d7 = Conv(data=tower_d7, num_filter=num_d7_2, kernel=(7, 1), pad=(3, 0), name=('%s_tower' % name), suffix='_conv_2')
+ tower_q7 = Conv(data=data, num_filter=num_q7_red, name=('%s_tower_1' % name), suffix='_conv')
+ tower_q7 = Conv(data=tower_q7, num_filter=num_q7_1, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='_conv_1')
+ tower_q7 = Conv(data=tower_q7, num_filter=num_q7_2, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='_conv_2')
+ tower_q7 = Conv(data=tower_q7, num_filter=num_q7_3, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='_conv_3')
+ tower_q7 = Conv(data=tower_q7, num_filter=num_q7_4, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='_conv_4')
+ pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
+ cproj = Conv(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_tower_2' % name), suffix='_conv')
+ # concat
+ concat = mx.sym.Concat(*[tower_1x1, tower_d7, tower_q7, cproj], name='ch_concat_%s_chconcat' % name)
+ return concat
+
+def Inception7D(data,
+ num_3x3_red, num_3x3,
+ num_d7_3x3_red, num_d7_1, num_d7_2, num_d7_3x3,
+ pool,
+ name):
+ tower_3x3 = Conv(data=data, num_filter=num_3x3_red, name=('%s_tower' % name), suffix='_conv')
+ tower_3x3 = Conv(data=tower_3x3, num_filter=num_3x3, kernel=(3, 3), pad=(0,0), stride=(2, 2), name=('%s_tower' % name), suffix='_conv_1')
+ tower_d7_3x3 = Conv(data=data, num_filter=num_d7_3x3_red, name=('%s_tower_1' % name), suffix='_conv')
+ tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_1, kernel=(1, 7), pad=(0, 3), name=('%s_tower_1' % name), suffix='_conv_1')
+ tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_2, kernel=(7, 1), pad=(3, 0), name=('%s_tower_1' % name), suffix='_conv_2')
+ tower_d7_3x3 = Conv(data=tower_d7_3x3, num_filter=num_d7_3x3, kernel=(3, 3), stride=(2, 2), name=('%s_tower_1' % name), suffix='_conv_3')
+ pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
+ # concat
+ concat = mx.sym.Concat(*[tower_3x3, tower_d7_3x3, pooling], name='ch_concat_%s_chconcat' % name)
+ return concat
+
+def Inception7E(data,
+ num_1x1,
+ num_d3_red, num_d3_1, num_d3_2,
+ num_3x3_d3_red, num_3x3, num_3x3_d3_1, num_3x3_d3_2,
+ pool, proj,
+ name):
+ tower_1x1 = Conv(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_conv' % name))
+ tower_d3 = Conv(data=data, num_filter=num_d3_red, name=('%s_tower' % name), suffix='_conv')
+ tower_d3_a = Conv(data=tower_d3, num_filter=num_d3_1, kernel=(1, 3), pad=(0, 1), name=('%s_tower' % name), suffix='_mixed_conv')
+ tower_d3_b = Conv(data=tower_d3, num_filter=num_d3_2, kernel=(3, 1), pad=(1, 0), name=('%s_tower' % name), suffix='_mixed_conv_1')
+ tower_3x3_d3 = Conv(data=data, num_filter=num_3x3_d3_red, name=('%s_tower_1' % name), suffix='_conv')
+ tower_3x3_d3 = Conv(data=tower_3x3_d3, num_filter=num_3x3, kernel=(3, 3), pad=(1, 1), name=('%s_tower_1' % name), suffix='_conv_1')
+ tower_3x3_d3_a = Conv(data=tower_3x3_d3, num_filter=num_3x3_d3_1, kernel=(1, 3), pad=(0, 1), name=('%s_tower_1' % name), suffix='_mixed_conv')
+ tower_3x3_d3_b = Conv(data=tower_3x3_d3, num_filter=num_3x3_d3_2, kernel=(3, 1), pad=(1, 0), name=('%s_tower_1' % name), suffix='_mixed_conv_1')
+ pooling = mx.sym.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
+ cproj = Conv(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_tower_2' % name), suffix='_conv')
+ # concat
+ concat = mx.sym.Concat(*[tower_1x1, tower_d3_a, tower_d3_b, tower_3x3_d3_a, tower_3x3_d3_b, cproj], name='ch_concat_%s_chconcat' % name)
+ return concat
+
+# In[49]:
+
+def get_symbol(num_classes=1000, **kwargs):
+ data = mx.symbol.Variable(name="data")
+ # stage 1
+ conv = Conv(data, 32, kernel=(3, 3), stride=(2, 2), name="conv")
+ conv_1 = Conv(conv, 32, kernel=(3, 3), name="conv_1")
+ conv_2 = Conv(conv_1, 64, kernel=(3, 3), pad=(1, 1), name="conv_2")
+ pool = mx.sym.Pooling(data=conv_2, kernel=(3, 3), stride=(2, 2), pool_type="max", name="pool")
+ # stage 2
+ conv_3 = Conv(pool, 80, kernel=(1, 1), name="conv_3")
+ conv_4 = Conv(conv_3, 192, kernel=(3, 3), name="conv_4")
+ pool1 = mx.sym.Pooling(data=conv_4, kernel=(3, 3), stride=(2, 2), pool_type="max", name="pool1")
+ # stage 3
+ in3a = Inception7A(pool1, 64,
+ 64, 96, 96,
+ 48, 64,
+ "avg", 32, "mixed")
+ in3b = Inception7A(in3a, 64,
+ 64, 96, 96,
+ 48, 64,
+ "avg", 64, "mixed_1")
+ in3c = Inception7A(in3b, 64,
+ 64, 96, 96,
+ 48, 64,
+ "avg", 64, "mixed_2")
+ in3d = Inception7B(in3c, 384,
+ 64, 96, 96,
+ "max", "mixed_3")
+ # stage 4
+ in4a = Inception7C(in3d, 192,
+ 128, 128, 192,
+ 128, 128, 128, 128, 192,
+ "avg", 192, "mixed_4")
+ in4b = Inception7C(in4a, 192,
+ 160, 160, 192,
+ 160, 160, 160, 160, 192,
+ "avg", 192, "mixed_5")
+ in4c = Inception7C(in4b, 192,
+ 160, 160, 192,
+ 160, 160, 160, 160, 192,
+ "avg", 192, "mixed_6")
+ in4d = Inception7C(in4c, 192,
+ 192, 192, 192,
+ 192, 192, 192, 192, 192,
+ "avg", 192, "mixed_7")
+ in4e = Inception7D(in4d, 192, 320,
+ 192, 192, 192, 192,
+ "max", "mixed_8")
+ # stage 5
+ in5a = Inception7E(in4e, 320,
+ 384, 384, 384,
+ 448, 384, 384, 384,
+ "avg", 192, "mixed_9")
+ in5b = Inception7E(in5a, 320,
+ 384, 384, 384,
+ 448, 384, 384, 384,
+ "max", 192, "mixed_10")
+ # pool
+ pool = mx.sym.Pooling(data=in5b, kernel=(8, 8), stride=(1, 1), pool_type="avg", name="global_pool")
+ flatten = mx.sym.Flatten(data=pool, name="flatten")
+ fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=num_classes, name='fc1')
+ softmax = mx.symbol.SoftmaxOutput(data=fc1, name='softmax')
+ return softmax
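The docstring of inceptionv3.py says the network suits inputs of around 299 x 299, and the final pooling uses a fixed 8 x 8 kernel. The arithmetic below traces the spatial size through the stride and padding values in the file, assuming floor rounding for both convolution and pooling (believed to be MXNet's default `valid` pooling convention). `out_size` and `inceptionv3_feature_sizes` are hypothetical helper names.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial size after a conv or pool layer, rounding down."""
    return (n + 2 * pad - kernel) // stride + 1


def inceptionv3_feature_sizes(n=299):
    n = out_size(n, 3, stride=2)  # conv
    n = out_size(n, 3)            # conv_1
    n = out_size(n, 3, pad=1)     # conv_2
    n = out_size(n, 3, stride=2)  # pool
    n = out_size(n, 1)            # conv_3
    n = out_size(n, 3)            # conv_4
    n = out_size(n, 3, stride=2)  # pool1
    stage3 = n                    # mixed .. mixed_2 keep the size
    n = out_size(n, 3, stride=2)  # mixed_3 (Inception7B downsample)
    stage4 = n                    # mixed_4 .. mixed_7 keep the size
    n = out_size(n, 3, stride=2)  # mixed_8 (Inception7D downsample)
    stage5 = n                    # mixed_9, mixed_10 keep the size
    return stage3, stage4, stage5


sizes = inceptionv3_feature_sizes(299)
```

Under these assumptions the last stage is 8 x 8 at a 299 input, which matches the `kernel=(8, 8)` global pool. The SSD configuration in the symbol README extracts `ch_concat_mixed_7_chconcat` and `ch_concat_mixed_10_chconcat`, which correspond to the second and third values returned here.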
http://git-wip-us.apache.org/repos/asf/incubator-mxnet-test/blob/cc62aded/example/ssd/symbol/legacy_vgg16_ssd_300.py
----------------------------------------------------------------------
diff --git a/example/ssd/symbol/legacy_vgg16_ssd_300.py b/example/ssd/symbol/legacy_vgg16_ssd_300.py
new file mode 100644
index 0000000..257fdd6
--- /dev/null
+++ b/example/ssd/symbol/legacy_vgg16_ssd_300.py
@@ -0,0 +1,191 @@
+import mxnet as mx
+from common import legacy_conv_act_layer
+from common import multibox_layer
+
+def get_symbol_train(num_classes=20, nms_thresh=0.5, force_suppress=False,
+                     nms_topk=400, **kwargs):
+    """
+    Single-shot multi-box detection with VGG 16 layers ConvNet
+    This is a modified version, with fc6/fc7 layers replaced by conv layers
+    And the network is slightly smaller than the original VGG 16 network
+    This is a training network with losses
+
+    Parameters:
+    ----------
+    num_classes: int
+        number of object classes not including background
+    nms_thresh : float
+        non-maximum suppression threshold
+    force_suppress : boolean
+        whether to suppress overlapping detections of different classes
+    nms_topk : int
+        apply NMS to top K detections
+
+    Returns:
+    ----------
+    mx.Symbol
+    """
+    data = mx.symbol.Variable(name="data")
+    label = mx.symbol.Variable(name="label")
+
+    # group 1
+    conv1_1 = mx.symbol.Convolution(
+        data=data, kernel=(3, 3), pad=(1, 1), num_filter=64, name="conv1_1")
+    relu1_1 = mx.symbol.Activation(data=conv1_1, act_type="relu", name="relu1_1")
+    conv1_2 = mx.symbol.Convolution(
+        data=relu1_1, kernel=(3, 3), pad=(1, 1), num_filter=64, name="conv1_2")
+    relu1_2 = mx.symbol.Activation(data=conv1_2, act_type="relu", name="relu1_2")
+    pool1 = mx.symbol.Pooling(
+        data=relu1_2, pool_type="max", kernel=(2, 2), stride=(2, 2), name="pool1")
+    # group 2
+    conv2_1 = mx.symbol.Convolution(
+        data=pool1, kernel=(3, 3), pad=(1, 1), num_filter=128, name="conv2_1")
+    relu2_1 = mx.symbol.Activation(data=conv2_1, act_type="relu", name="relu2_1")
+    conv2_2 = mx.symbol.Convolution(
+        data=relu2_1, kernel=(3, 3), pad=(1, 1), num_filter=128, name="conv2_2")
+    relu2_2 = mx.symbol.Activation(data=conv2_2, act_type="relu", name="relu2_2")
+    pool2 = mx.symbol.Pooling(
+        data=relu2_2, pool_type="max", kernel=(2, 2), stride=(2, 2), name="pool2")
+    # group 3
+    conv3_1 = mx.symbol.Convolution(
+        data=pool2, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_1")
+    relu3_1 = mx.symbol.Activation(data=conv3_1, act_type="relu", name="relu3_1")
+    conv3_2 = mx.symbol.Convolution(
+        data=relu3_1, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_2")
+    relu3_2 = mx.symbol.Activation(data=conv3_2, act_type="relu", name="relu3_2")
+    conv3_3 = mx.symbol.Convolution(
+        data=relu3_2, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_3")
+    relu3_3 = mx.symbol.Activation(data=conv3_3, act_type="relu", name="relu3_3")
+    pool3 = mx.symbol.Pooling(
+        data=relu3_3, pool_type="max", kernel=(2, 2), stride=(2, 2),
+        pooling_convention="full", name="pool3")
+    # group 4
+    conv4_1 = mx.symbol.Convolution(
+        data=pool3, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_1")
+    relu4_1 = mx.symbol.Activation(data=conv4_1, act_type="relu", name="relu4_1")
+    conv4_2 = mx.symbol.Convolution(
+        data=relu4_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_2")
+    relu4_2 = mx.symbol.Activation(data=conv4_2, act_type="relu", name="relu4_2")
+    conv4_3 = mx.symbol.Convolution(
+        data=relu4_2, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_3")
+    relu4_3 = mx.symbol.Activation(data=conv4_3, act_type="relu", name="relu4_3")
+    pool4 = mx.symbol.Pooling(
+        data=relu4_3, pool_type="max", kernel=(2, 2), stride=(2, 2), name="pool4")
+    # group 5
+    conv5_1 = mx.symbol.Convolution(
+        data=pool4, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_1")
+    relu5_1 = mx.symbol.Activation(data=conv5_1, act_type="relu", name="relu5_1")
+    conv5_2 = mx.symbol.Convolution(
+        data=relu5_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_2")
+    relu5_2 = mx.symbol.Activation(data=conv5_2, act_type="relu", name="relu5_2")
+    conv5_3 = mx.symbol.Convolution(
+        data=relu5_2, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_3")
+    relu5_3 = mx.symbol.Activation(data=conv5_3, act_type="relu", name="relu5_3")
+    pool5 = mx.symbol.Pooling(
+        data=relu5_3, pool_type="max", kernel=(3, 3), stride=(1, 1),
+        pad=(1, 1), name="pool5")
+    # group 6
+    conv6 = mx.symbol.Convolution(
+        data=pool5, kernel=(3, 3), pad=(6, 6), dilate=(6, 6),
+        num_filter=1024, name="conv6")
+    relu6 = mx.symbol.Activation(data=conv6, act_type="relu", name="relu6")
+    # drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
+    # group 7
+    conv7 = mx.symbol.Convolution(
+        data=relu6, kernel=(1, 1), pad=(0, 0), num_filter=1024, name="conv7")
+    relu7 = mx.symbol.Activation(data=conv7, act_type="relu", name="relu7")
+    # drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
+
+    ### ssd extra layers ###
+    conv8_1, relu8_1 = legacy_conv_act_layer(relu7, "8_1", 256, kernel=(1,1), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+    conv8_2, relu8_2 = legacy_conv_act_layer(relu8_1, "8_2", 512, kernel=(3,3), pad=(1,1),
+        stride=(2,2), act_type="relu", use_batchnorm=False)
+    conv9_1, relu9_1 = legacy_conv_act_layer(relu8_2, "9_1", 128, kernel=(1,1), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+    conv9_2, relu9_2 = legacy_conv_act_layer(relu9_1, "9_2", 256, kernel=(3,3), pad=(1,1),
+        stride=(2,2), act_type="relu", use_batchnorm=False)
+    conv10_1, relu10_1 = legacy_conv_act_layer(relu9_2, "10_1", 128, kernel=(1,1), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+    conv10_2, relu10_2 = legacy_conv_act_layer(relu10_1, "10_2", 256, kernel=(3,3), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+    conv11_1, relu11_1 = legacy_conv_act_layer(relu10_2, "11_1", 128, kernel=(1,1), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+    conv11_2, relu11_2 = legacy_conv_act_layer(relu11_1, "11_2", 256, kernel=(3,3), pad=(0,0),
+        stride=(1,1), act_type="relu", use_batchnorm=False)
+
+    # specific parameters for VGG16 network
+    from_layers = [relu4_3, relu7, relu8_2, relu9_2, relu10_2, relu11_2]
+    sizes = [[.1, .141], [.2,.272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
+    ratios = [[1,2,.5], [1,2,.5,3,1./3], [1,2,.5,3,1./3], [1,2,.5,3,1./3],
+        [1,2,.5], [1,2,.5]]
+    normalizations = [20, -1, -1, -1, -1, -1]
+    steps = [x / 300.0 for x in [8, 16, 32, 64, 100, 300]]
+    num_channels = [512]
+
+    loc_preds, cls_preds, anchor_boxes = multibox_layer(from_layers,
+        num_classes, sizes=sizes, ratios=ratios, normalization=normalizations,
+        num_channels=num_channels, clip=False, interm_layer=0, steps=steps)
+
+    tmp = mx.contrib.symbol.MultiBoxTarget(
+        *[anchor_boxes, label, cls_preds], overlap_threshold=.5,
+        ignore_label=-1, negative_mining_ratio=3, minimum_negative_samples=0,
+        negative_mining_thresh=.5, variances=(0.1, 0.1, 0.2, 0.2),
+        name="multibox_target")
+    loc_target = tmp[0]
+    loc_target_mask = tmp[1]
+    cls_target = tmp[2]
+
+    cls_prob = mx.symbol.SoftmaxOutput(data=cls_preds, label=cls_target,
+        ignore_label=-1, use_ignore=True, grad_scale=1., multi_output=True,
+        normalization='valid', name="cls_prob")
+    loc_loss_ = mx.symbol.smooth_l1(name="loc_loss_",
+        data=loc_target_mask * (loc_preds - loc_target), scalar=1.0)
+    loc_loss = mx.symbol.MakeLoss(loc_loss_, grad_scale=1.,
+        normalization='valid', name="loc_loss")
+
+    # monitoring training status
+    cls_label = mx.symbol.MakeLoss(data=cls_target, grad_scale=0, name="cls_label")
+    det = mx.contrib.symbol.MultiBoxDetection(*[cls_prob, loc_preds, anchor_boxes],
+        name="detection", nms_threshold=nms_thresh, force_suppress=force_suppress,
+        variances=(0.1, 0.1, 0.2, 0.2), nms_topk=nms_topk)
+    det = mx.symbol.MakeLoss(data=det, grad_scale=0, name="det_out")
+
+    # group output
+    out = mx.symbol.Group([cls_prob, loc_loss, cls_label, det])
+    return out
+
+def get_symbol(num_classes=20, nms_thresh=0.5, force_suppress=False,
+               nms_topk=400, **kwargs):
+    """
+    Single-shot multi-box detection with VGG 16 layers ConvNet
+    This is a modified version, with fc6/fc7 layers replaced by conv layers
+    And the network is slightly smaller than the original VGG 16 network
+    This is the detection network
+
+    Parameters:
+    ----------
+    num_classes: int
+        number of object classes not including background
+    nms_thresh : float
+        threshold of overlap for non-maximum suppression
+    force_suppress : boolean
+        whether to suppress overlapping detections of different classes
+    nms_topk : int
+        apply NMS to top K detections
+
+    Returns:
+    ----------
+    mx.Symbol
+    """
+    net = get_symbol_train(num_classes)
+    cls_preds = net.get_internals()["multibox_cls_pred_output"]
+    loc_preds = net.get_internals()["multibox_loc_pred_output"]
+    anchor_boxes = net.get_internals()["multibox_anchors_output"]
+
+    cls_prob = mx.symbol.SoftmaxActivation(data=cls_preds, mode='channel',
+        name='cls_prob')
+    out = mx.contrib.symbol.MultiBoxDetection(*[cls_prob, loc_preds, anchor_boxes],
+        name="detection", nms_threshold=nms_thresh, force_suppress=force_suppress,
+        variances=(0.1, 0.1, 0.2, 0.2), nms_topk=nms_topk)
+    return out
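Note on the anchor configuration in legacy_vgg16_ssd_300.py: the sketch below is plain Python arithmetic, not MXNet code. It traces the spatial size of each entry in from_layers for a 300x300 input and counts the resulting anchors. It rests on two assumptions not shown in this diff: that multibox_layer passes each sizes/ratios pair through to MultiBoxPrior unchanged, giving len(sizes) + len(ratios) - 1 anchors per location, and that the input is 300x300, as the file name suggests. The helper names conv_out, pool_out and ssd300_feature_sizes are illustrative only.

```python
import math


def conv_out(size, kernel, stride=1, pad=0, dilate=1):
    """Output side length of a convolution (floor rounding)."""
    effective = dilate * (kernel - 1) + 1
    return (size + 2 * pad - effective) // stride + 1


def pool_out(size, kernel, stride, pad=0, full=False):
    """Output side length of pooling; full=True rounds up instead of down."""
    span = size + 2 * pad - kernel
    steps = math.ceil(span / stride) if full else span // stride
    return steps + 1


def ssd300_feature_sizes(data_size=300):
    """Side lengths of the six from_layers. The 3x3/pad-1 and 1x1 convolutions
    keep the size, so only the size-changing layers are traced here."""
    s = pool_out(data_size, 2, 2)          # pool1
    s = pool_out(s, 2, 2)                  # pool2
    relu4_3 = pool_out(s, 2, 2, full=True) # pool3, pooling_convention="full"
    s = pool_out(relu4_3, 2, 2)            # pool4
    s = pool_out(s, 3, 1, pad=1)           # pool5
    relu7 = conv_out(s, 3, pad=6, dilate=6)           # conv6 (conv7 is 1x1)
    relu8_2 = conv_out(relu7, 3, stride=2, pad=1)
    relu9_2 = conv_out(relu8_2, 3, stride=2, pad=1)
    relu10_2 = conv_out(relu9_2, 3)
    relu11_2 = conv_out(relu10_2, 3)
    return [relu4_3, relu7, relu8_2, relu9_2, relu10_2, relu11_2]


# Copied from the diff above.
sizes = [[.1, .141], [.2, .272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
ratios = [[1, 2, .5], [1, 2, .5, 3, 1. / 3], [1, 2, .5, 3, 1. / 3],
          [1, 2, .5, 3, 1. / 3], [1, 2, .5], [1, 2, .5]]

feature_sizes = ssd300_feature_sizes()
# Assumed MultiBoxPrior rule: len(sizes) + len(ratios) - 1 anchors per location.
per_location = [len(s) + len(r) - 1 for s, r in zip(sizes, ratios)]
total_anchors = sum(f * f * n for f, n in zip(feature_sizes, per_location))

print(feature_sizes)   # [38, 19, 10, 5, 3, 1]
print(per_location)    # [4, 6, 6, 6, 4, 4]
print(total_anchors)   # 8732
```

Under these assumptions, pooling_convention="full" on pool3 is what yields the 38x38 map at relu4_3. Floor rounding would give 37x37 and shift every later size.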