Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2020/05/18 08:11:10 UTC

[GitHub] [singa] joddiy opened a new pull request #703: Refactor sonnx, test cases and examples

joddiy opened a new pull request #703:
URL: https://github.com/apache/singa/pull/703


   To-do list:
   1. adapt sonnx to the new API (almost done),
   2. adapt sonnx to the new autograd API,
   3. test cases,
   4. examples,
   5. (if time permits) add more examples.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] nudles commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
nudles commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r426453900



##########
File path: examples/cnn/onnx/train.py
##########
@@ -0,0 +1,307 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import sys, os
+import json
+from singa import singa_wrap as singa
+from singa import opt
+from singa import device
+from singa import tensor
+from singa import sonnx
+from singa import autograd
+import numpy as np
+import time
+import argparse
+from PIL import Image
+import onnx
+from tqdm import tqdm
+from utils import download_model, update_batch_size, check_exist_or_download, softmax_loss
+import logging
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+sys.path.append(os.path.dirname(__file__) + '/..')
+
+# Data Augmentation
+def augmentation(x, batch_size):
+    xpad = np.pad(x, [[0, 0], [0, 0], [4, 4], [4, 4]], 'symmetric')
+    for data_num in range(0, batch_size):
+        offset = np.random.randint(8, size=2)
+        x[data_num, :, :, :] = xpad[data_num, :,
+                                    offset[0]:offset[0] + x.shape[2],
+                                    offset[1]:offset[1] + x.shape[2]]
+        if_flip = np.random.randint(2)
+        if (if_flip):
+            x[data_num, :, :, :] = x[data_num, :, :, ::-1]
+    return x
+
+
+# Calculate Accuracy
+def accuracy(pred, target):
+    # y is network output to be compared with ground truth (int)
+    y = np.argmax(pred, axis=1)
+    a = y == target
+    correct = np.array(a, "int").sum()
+    # print(correct)
+    return correct
+
+
+# Data partition according to the rank
+def partition(global_rank, world_size, train_x, train_y, val_x, val_y):
+    # Partition training data
+    data_per_rank = train_x.shape[0] // world_size
+    idx_start = global_rank * data_per_rank
+    idx_end = (global_rank + 1) * data_per_rank
+    train_x = train_x[idx_start:idx_end]
+    train_y = train_y[idx_start:idx_end]
+    # Partition evaluation data
+    data_per_rank = val_x.shape[0] // world_size
+    idx_start = global_rank * data_per_rank
+    idx_end = (global_rank + 1) * data_per_rank
+    val_x = val_x[idx_start:idx_end]
+    val_y = val_y[idx_start:idx_end]
+    return train_x, train_y, val_x, val_y
+
+
+# Function to all-reduce a numpy accuracy or loss value across multiple devices
+def reduce_variable(variable, dist_opt, reducer):
+    reducer.copy_from_numpy(variable)
+    dist_opt.all_reduce(reducer.data)
+    dist_opt.wait()
+    output = tensor.to_numpy(reducer)
+    return output
+
+
+def resize_dataset(x, image_size):
+    num_data = x.shape[0]
+    dim = x.shape[1]
+    X = np.zeros(shape=(num_data, dim, image_size, image_size),
+                 dtype=np.float32)
+    for n in range(0, num_data):
+        for d in range(0, dim):
+            X[n, d, :, :] = np.array(Image.fromarray(x[n, d, :, :]).resize(
+                (image_size, image_size), Image.BILINEAR),
+                                     dtype=np.float32)
+    return X
+
+
+def run(global_rank,
+        world_size,
+        local_rank,
+        max_epoch,
+        batch_size,
+        model,
+        data,
+        sgd,
+        graph,
+        dist_option='fp32',
+        spars=None):
+    dev = device.create_cuda_gpu_on(local_rank)
+    dev.SetRandSeed(0)
+    np.random.seed(0)
+
+    if data == 'cifar10':
+        from data import cifar10
+        train_x, train_y, val_x, val_y = cifar10.load()
+    elif data == 'cifar100':
+        from data import cifar100
+        train_x, train_y, val_x, val_y = cifar100.load()
+    elif data == 'mnist':
+        from data import mnist
+        train_x, train_y, val_x, val_y = mnist.load()
+
+    num_channels = train_x.shape[1]
+    image_size = train_x.shape[2]
+    data_size = np.prod(train_x.shape[1:train_x.ndim]).item()
+    num_classes = (np.max(train_y) + 1).item()
+
+    # load the model configuration
+    with open(os.path.join(os.path.dirname(__file__), 'models.json')) as json_file:
+        model_config = json.load(json_file)
+        model_config = model_config[model]
+
+    download_model(model_config['url'])
+    onnx_model = onnx.load(os.path.join('/tmp', model_config['path']))
+    onnx_model = update_batch_size(onnx_model, batch_size)
+    sg_ir = sonnx.prepare(onnx_model, device=dev)
+    model = sonnx.create_model(sg_ir, autograd.softmax_cross_entropy, sgd)

Review comment:
       are you going to change to the new SONNXModel API?
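
   For orientation, a hedged sketch (not code from this PR) of how the example might look if it moves to a subclass-based SONNXModel API; the class name sonnx.SONNXModel and the compile()/forward() signatures below are assumptions taken from the usage pattern in the arcface.py diff later in this thread.

       # Hypothetical sketch only: sonnx.SONNXModel and the compile()/forward()
       # signatures are assumed from this PR's example diffs, not confirmed API.
       import numpy as np
       import onnx
       from singa import device, sonnx, tensor

       class MyModel(sonnx.SONNXModel):  # assumed base class name
           def __init__(self, onnx_model):
               super(MyModel, self).__init__(onnx_model)

           def forward(self, *x):
               # run the wrapped ONNX graph and return its outputs
               return super(MyModel, self).forward(*x)

       onnx_model = onnx.load('/tmp/model.onnx')  # placeholder path
       dev = device.create_cuda_gpu()
       x = tensor.Tensor(device=dev,
                         data=np.zeros((1, 3, 224, 224), dtype=np.float32))
       m = MyModel(onnx_model)
       m.compile([x], is_train=False, use_graph=True, sequential=True)
       y = m.forward(x)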







[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639489100


   This pull request **fixes 1 alert** when merging 6af18ba4d4c527f970ff74407f7258b604bf4b5b into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-f7ba34c9a8e061b00dc440d521aff588975a2a69)
   
   **fixed alerts:**
   
   * 1 for Unused local variable





[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639397133


   This pull request **fixes 1 alert** when merging 6d46112253deef0edc64538f7e30eb929ba0fddf into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-82346d735639365a6c9f44b33210f20048b78fc0)
   
   **fixed alerts:**
   
   * 1 for Unused local variable





[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-632278304


   This pull request **introduces 7 alerts** and **fixes 1** when merging 54458cb1c949408a3221ab3edd7f952ee12a4fd4 into 84de1af8428d3796e3f99b215570a54d6f975a94 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-6a6acab2f8a9ff07863aa29c4e1aab6f3cb96d1a)
   
   **new alerts:**
   
   * 4 for Duplicate key in dict literal
   * 2 for Unused local variable
   * 1 for Wrong number of arguments in a class instantiation
   
   **fixed alerts:**
   
   * 1 for Unused local variable
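
   As a minimal illustration (not taken from this PR) of the pattern behind the "Duplicate key in dict literal" alert: when the same key appears twice in a literal, the later value silently overwrites the earlier one, which is usually a copy-paste slip.

       # Illustration only, not PR code: why a duplicate dict key gets flagged.
       rename = {
           'Relu': 'ReLU',
           'Relu': 'relu',  # duplicate key: silently replaces the entry above
       }
       print(rename)  # {'Relu': 'relu'} -- only the last value survives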





[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-635548145


   This pull request **introduces 2 alerts** and **fixes 1** when merging 70a0d7e8a657b71685776190f49f77068c8d7218 into bec19640aa7dae75cd12a44109bc319d2e30cfbf - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-0257e2ebeb99e27069d9a8162f88946fabdd7bc9)
   
   **new alerts:**
   
   * 2 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable





[GitHub] [singa] joddiy commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
joddiy commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r435866133



##########
File path: examples/onnx/arcface.py
##########
@@ -78,35 +76,30 @@ def forward(self, x):
     download_model(url)
     onnx_model = onnx.load(model_path)
 
-    # set batch size
-    onnx_model = update_batch_size(onnx_model, 2)
+    # inference demo
+    logging.info("preprocessing...")
+    img1, img2 = get_image()
+    img1 = preprocess(img1)
+    img2 = preprocess(img2)
+    # sg_ir = sonnx.prepare(onnx_model) # run without graph
+    # y = sg_ir.run([img1, img2])
 
-    # prepare the model
-    logging.info("prepare model...")
+    logging.info("model compiling...")
     dev = device.create_cuda_gpu()
-    sg_ir = sonnx.prepare(onnx_model, device=dev)
-    autograd.training = False
-    model = Infer(sg_ir)
+    x = tensor.Tensor(device=dev, data=np.concatenate((img1, img2), axis=0))
+    m = MyModel(onnx_model)
+    m.compile([x], is_train=False, use_graph=True, sequential=True)
 
-    # verifty the test dataset
+    # verify the test
     # from utils import load_dataset
-    # inputs, ref_outputs = load_dataset(
-    #     os.path.join('/tmp', 'resnet100', 'test_data_set_0'))
+    # inputs, ref_outputs = load_dataset(os.path.join('/tmp', 'resnet100', 'test_data_set_0'))
     # x_batch = tensor.Tensor(device=dev, data=inputs[0])
-    # outputs = model.forward(x_batch)
+    # outputs = sg_ir.run([x_batch])
     # for ref_o, o in zip(ref_outputs, outputs):
     #     np.testing.assert_almost_equal(ref_o, tensor.to_numpy(o), 4)
 
-    # inference demo
-    logging.info("preprocessing...")
-    img1, img2 = get_image()
-    img1 = preprocess(img1)
-    img2 = preprocess(img2)
-
-    x_batch = tensor.Tensor(device=dev,
-                            data=np.concatenate((img1, img2), axis=0))
     logging.info("model running...")
-    y = model.forward(x_batch)
+    y = m.forward(*[x])[0]

Review comment:
       - `*[x]` is the same as passing `x` directly, so it'd be fine to change it to `x`.
   
   - OK, I'll update the `y[0]`.
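
   To illustrate the first point with plain Python (not PR code): unpacking a one-element list with `*` passes exactly the same single positional argument as passing the element directly.

       # Illustration only: forward(*[x]) and forward(x) are equivalent calls.
       def forward(*inputs):
           return inputs

       x = 'tensor'
       assert forward(*[x]) == forward(x) == ('tensor',)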







[GitHub] [singa] nudles commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
nudles commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r433005372



##########
File path: python/singa/sonnx.py
##########
@@ -989,1150 +1035,1051 @@ def from_onnx(args):
 class SingaBackend(Backend):
 
     # This number indicates the onnx operator set version
-    _known_opset_version = 11
+    _opset_version = 11
+
+    _ir_version = 0x0000000000000006
 
     # because singa's operators are different from onnx's,
     # we define a dict for the name projection
     _rename_operators = {
-        'Relu': 'relu',
-        'Softmax': 'SoftMax',
-        'Sigmoid': 'sigmoid',
-        'Add': 'add',
-        'MatMul': 'matmul',
-        'Conv': '_Conv2d',
-        'MaxPool': '_Pooling2d',
-        'AveragePool': '_Pooling2d',
-        'BatchNormalization': 'batchnorm_2d',
-        'Concat': 'Concat',
-        'Flatten': 'Flatten',
-        'Gemm': 'Gemm',
-        'Reshape': 'Reshape',
-        'Sum': 'sum',
-        'Cos': 'cos',
-        'Cosh': 'cosh',
-        'Sin': 'sin',
-        'Sinh': 'sinh',
-        'Tan': 'tan',
-        'Tanh': 'tanh',
-        'Acos': 'acos',
-        'Acosh': 'acosh',
-        'Asin': 'asin',
-        'Asinh': 'asinh',
-        'Atan': 'atan',
-        'Atanh': 'atanh',
-        'Selu': 'SeLU',
-        'Elu': 'Elu',
-        'Equal': 'equal',
-        'Less': 'less',
-        'Sign': 'sign',
-        'Div': 'div',
-        'Sub': 'sub',
-        'Sqrt': 'sqrt',
-        'Log': 'log',
-        'Greater': 'greater',
-        'HardSigmoid': 'HardSigmoid',
-        'Identity': 'identity',
-        'Softplus': 'softplus',
-        'Softsign': 'softsign',
-        'Mean': 'mean',
-        'Pow': 'pow',
-        'Clip': 'Clip',
-        'PRelu': 'prelu',
-        'Mul': 'mul',
-        'Transpose': 'Transpose',
-        'Max': 'max',
-        'Min': 'min',
-        'Shape': 'shape',
-        'And': '_and',
-        'Or': '_or',
-        'Xor': '_xor',
-        'Not': '_not',
-        'Neg': 'negative',
-        'Reciprocal': 'reciprocal',
-        'ConstantOfShape': 'ConstantOfShape',
-        'Dropout': 'Dropout',
-        'ReduceSum': 'ReduceSum',
-        'ReduceMean': 'ReduceMean',
-        'LeakyRelu': 'LeakyRelu',
-        'GlobalAveragePool': 'GlobalAveragePool',
-        'Squeeze': 'Squeeze',
+        # common op
+        'Relu': 'ReLU',
+        'Sigmoid': 'Sigmoid',
+        'Add': 'Add',
+        'MatMul': 'Matmul',
+        'Sum': 'Sum',
+        'Cos': 'Cos',
+        'Cosh': 'Cosh',
+        'Sin': 'Sin',
+        'Sinh': 'Sinh',
+        'Tan': 'Tan',
+        'Tanh': 'Tanh',
+        'Acos': 'Acos',
+        'Acosh': 'Acosh',
+        'Asin': 'Asin',
+        'Asinh': 'Asinh',
+        'Atan': 'Atan',
+        'Atanh': 'Atanh',
+        'Equal': 'Equal',
+        'Less': 'Less',
+        'Sign': 'Sign',
+        'Div': 'Div',
+        'Sub': 'Sub',
+        'Sqrt': 'Sqrt',
+        'Log': 'Log',
+        'Greater': 'Greater',
+        'Identity': 'Identity',
+        'Softplus': 'SoftPlus',
+        'Softsign': 'SoftSign',
+        'Mean': 'Mean',
+        'Pow': 'Pow',
+        'PRelu': 'PRelu',
+        'Mul': 'Mul',
+        'Max': 'Max',
+        'Min': 'Min',
+        'Shape': 'Shape',
+        'And': 'And',
+        'Or': 'Or',
+        'Xor': 'Xor',
+        'Not': 'Not',
+        'Neg': 'Negative',
+        'Reciprocal': 'Reciprocal',
         'Unsqueeze': 'Unsqueeze',
-        'Slice': 'Slice',
+        'NonZero': 'NonZero',
         'Ceil': 'Ceil',
-        'Split': 'Split',
-        'Gather': 'Gather',
-        'Tile': 'Tile',
-        'NonZero': 'nonzero',
+        # special op
         'Cast': 'Cast',
+        'Split': 'Split',
+        'Squeeze': 'Squeeze',
+        'GlobalAveragePool': 'GlobalAveragePool',
+        'LeakyRelu': 'LeakyRelu',
+        'ReduceSum': 'ReduceSum',
+        'ReduceMean': 'ReduceMean',
+        'Dropout': 'Dropout',
+        'ConstantOfShape': 'ConstantOfShape',
+        'Transpose': 'Transpose',
+        'HardSigmoid': 'HardSigmoid',
+        'Elu': 'Elu',
+        'Selu': 'SeLU',
+        'Concat': 'Concat',
+        'Softmax': 'SoftMax',
+        'Flatten': 'Flatten',
         'OneHot': 'OneHot',
+        'Tile': 'Tile',
+        'Gather': 'Gather',
+        'Reshape': 'Reshape',
+        'Slice': 'Slice',
+        'Clip': 'Clip',
+        'Gemm': 'layer.Gemm',  # layer
+        'BatchNormalization': 'layer.BatchNorm2d',  # layer
+        'Conv': 'layer.Conv2d',  # layer
+        'MaxPool': 'layer.Pooling2d',  # layer
+        'AveragePool': 'layer.Pooling2d',  # layer
     }
 
     # this dict indicates the operators that need extra handle
     # each indicates a function name
     _special_operators = {
-        'Conv': '_create_conv',
-        'MaxPool': '_create_max_avg_pool',
-        'AveragePool': '_create_max_avg_pool',
-        'BatchNormalization': '_create_batchnorm',
+        'Cast': '_create_cast',
+        'Split': '_create_split',
+        'Squeeze': '_create_squeeze_unsqueeze',
+        'Unsqueeze': '_create_squeeze_unsqueeze',
+        'GlobalAveragePool': '_create_global_average_pool',
+        'LeakyRelu': '_create_leakyrelu',
+        'ReduceSum': '_create_reduce_ops',
+        'ReduceMean': '_create_reduce_ops',
+        'Dropout': '_create_dropout',
+        'ConstantOfShape': '_create_constant_of_shape',
+        'Transpose': '_create_transpose',
+        'HardSigmoid': '_create_hardsigmoid',
+        'Elu': '_create_elu',
+        'Selu': '_create_selu',
         'Concat': '_create_concat',
-        'Flatten': '_create_flatten',
+        'Softmax': '_create_softmax',
         'Gemm': '_create_gemm',
+        'Flatten': '_create_flatten',
+        'OneHot': '_create_onehot',
+        'Tile': '_create_tile',
+        'Gather': '_create_gather',
         'Reshape': '_create_reshape',
-        'Softmax': '_create_softmax',
-        'Selu': '_create_selu',
-        'Elu': '_create_elu',
-        'HardSigmoid': '_create_hardsigmoid',
-        'Clip': '_create_clip',
-        'Transpose': '_create_transpose',
-        'ConstantOfShape': '_create_constantOfShape',
-        'Dropout': '_create_dropout',
-        'ReduceSum': '_create_reduceOp',
-        'ReduceMean': '_create_reduceOp',
-        'LeakyRelu': '_create_leakyrelu',
-        'GlobalAveragePool': '_create_globalaveragepool',
-        'Squeeze': '_create_squeeze',
-        'Unsqueeze': '_create_squeeze',
         'Slice': '_create_slice',
-        'Split': '_create_split',
-        'Gather': '_create_gather',
-        'Tile': '_create_tile',
-        'Cast': '_create_cast',
-        'OneHot': '_create_onehot',
-        'Constant': "_create_constant"
+        'Clip': '_create_clip',
+        'BatchNormalization': '_create_batch_norm',
+        'Conv': '_create_conv',
+        'MaxPool': '_create_max_avg_pool',
+        'AveragePool': '_create_max_avg_pool',
     }
 
     @classmethod
-    def _create_constant(cls, onnx_node, inputs, opset_version):
-        """
-        parse onnx constatn node to weights
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        tmp_tensor = onnx_node.getattr('value')
-        np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[tmp_tensor.data_type]
-        np_tensor = np.frombuffer(tmp_tensor.raw_data, dtype=np_dtype)
-        if np_tensor.dtype == "int64":
-            np_tensor = np_tensor.astype(np.int32)
-        # todo, we cannot support scalar tensor
-        if np.ndim(np_tensor) == 0:
-            np_tensor = np.array(np_tensor, ndmin=1)
-        return None, np_tensor
-
-    @classmethod
-    def _create_onehot(cls, onnx_node, inputs, opset_version):
-        """
-        get the OneHot operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        axis = onnx_node.getattr("axis", -1)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        depth = tensor.to_numpy(inputs.pop(1)).astype(np.int32)
-        value = tensor.to_numpy(inputs.pop(1))
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, depth, value)
-
-    @classmethod
-    def _create_cast(cls, onnx_node, inputs, opset_version):
+    def _create_cast(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Cast operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        to = onnx_node.getattr("to")
-        # singa only supports float32 and int32
-        map_dict = {
-            TensorProto.FLOAT: tensor.float32,  # FLOAT to float32
-            TensorProto.UINT8: None,  # UINT8
-            TensorProto.INT8: tensor.int32,  # INT8 to int32
-            TensorProto.UINT16: None,  # UINT16
-            TensorProto.INT16: tensor.int32,  # INT16 to int32
-            TensorProto.INT32: tensor.int32,  # INT32 to int32
-            TensorProto.INT64: tensor.int32,  # INT64 to int32
-            TensorProto.STRING: None,  # stirng
-            TensorProto.BOOL: None,  # bool
-        }
-        to = map_dict[to]
-        assert to != None, "not support cast type: {}".format(to)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(to)
-
-    @classmethod
-    def _create_tile(cls, onnx_node, inputs, opset_version):
-        """
-        get the Tile operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        repeats = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(repeats)
-
-    @classmethod
-    def _create_gather(cls, onnx_node, inputs, opset_version):
-        """
-        get the Gather operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        axis = onnx_node.getattr("axis", 0)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        indices = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, indices)
+        to_type = onnx_type_to_singa_type(onnx_node.getattr("to"))
+        assert to_type != None, "not support cast type: {}".format(to_type)
+        if to_type == np.dtype('float32'):
+            return operator(tensor.float32)
+        else:
+            return operator(tensor.int32)
 
     @classmethod
-    def _create_split(cls, onnx_node, inputs, opset_version):
+    def _create_split(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Split operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axis = onnx_node.getattr("axis", 0)
         split = onnx_node.getattr("split", None)
         num_output = len(onnx_node.outputs)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, split, num_output)
-
-    @classmethod
-    def _create_slice(cls, onnx_node, inputs, opset_version):
-        """
-        get the Slice operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        starts = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        ends = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        # sometime onnx may ignore these two inputs, axes and step
-        if len(inputs) >= 2 and onnx_node.inputs[3] != '':
-            axes = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        else:
-            axes = None
-        steps = tensor.to_numpy(inputs.pop(1)).astype(
-            np.int32).tolist() if len(inputs) >= 2 else None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(starts, ends, axes, steps)
+        return operator(axis, split, num_output)
 
     @classmethod
-    def _create_squeeze(cls, onnx_node, inputs, opset_version):
+    def _create_squeeze_unsqueeze(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the Squeeze and Unsqueeze operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes")
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes)
+        return operator(axes)
 
     @classmethod
-    def _create_globalaveragepool(cls, onnx_node, inputs, opset_version):
+    def _create_global_average_pool(cls,
+                                    onnx_node,
+                                    operator,
+                                    opset_version=_opset_version):
         """
         get the GlobalAveragePool operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         data_format = onnx_node.getattr("data_format", 'channels_first')
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(data_format)
+        return operator(data_format)
 
     @classmethod
-    def _create_leakyrelu(cls, onnx_node, inputs, opset_version):
+    def _create_leakyrelu(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the LeakyRelu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.01)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_reduceOp(cls, onnx_node, inputs, opset_version):
+    def _create_reduce_ops(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
         """
         get the ReduceSum, ReduceMean, ReduceMax, ReduceMin, etc, operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes", None)
         keepdims = onnx_node.getattr("keepdims", 1)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes, keepdims)
+        return operator(axes, keepdims)
 
     @classmethod
-    def _create_dropout(cls, onnx_node, inputs, opset_version):
+    def _create_dropout(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Dropout operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         ratio = onnx_node.getattr("ratio", 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(ratio)
+        return operator(ratio)
 
     @classmethod
-    def _create_constantOfShape(cls, onnx_node, inputs, opset_version):
+    def _create_constant_of_shape(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the ConstantOfShape operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         value = onnx_node.getattr("value", 0)
         if isinstance(value, onnx.TensorProto):
             value = numpy_helper.to_array(value)[0].item()
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(value)
+        return operator(value)
 
     @classmethod
-    def _create_transpose(cls, onnx_node, inputs, opset_version):
+    def _create_transpose(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the Transpose operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        shape = inputs[0].shape
-        perm = onnx_node.getattr("perm", list(range(len(shape) - 1, -1, -1)))
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(perm)
-
-    @classmethod
-    def _create_clip(cls, onnx_node, inputs, opset_version):
-        """
-        get the clip operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        # sometime onnx may ignore these two inputs, min or max or both
-        if len(inputs) >= 2 and onnx_node.inputs[1] != '':
-            min_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            min_v = None
-        if len(inputs) >= 2 and onnx_node.inputs[2] != '':
-            max_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            max_v = None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(min_v, max_v)
+        perm = onnx_node.getattr("perm")
+        return operator(perm)
 
     @classmethod
-    def _create_hardsigmoid(cls, onnx_node, inputs, opset_version):
+    def _create_hardsigmoid(cls,
+                            onnx_node,
+                            operator,
+                            opset_version=_opset_version):
         """
-        get the HardSigmoid operator from onnx node
+        get the hardsigmoid operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.2)
         beta = onnx_node.getattr("beta", 0.5)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, beta)
+        return operator(alpha, beta)
 
     @classmethod
-    def _create_elu(cls, onnx_node, inputs, opset_version):
+    def _create_elu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the elu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_selu(cls, onnx_node, inputs, opset_version):
+    def _create_selu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the selu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.67326)
         gamma = onnx_node.getattr("gamma", 1.0507)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, gamma)
+        return operator(alpha, gamma)
 
     @classmethod
-    def _create_reshape(cls, onnx_node, inputs, opset_version):
+    def _create_concat(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the reshape operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the concat operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
-        Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        shape = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(shape)
+        factor = onnx_node.getattr('axis')
+        return operator(axis=factor)
 
     @classmethod
-    def _create_conv(cls, onnx_node, inputs, opset_version):
+    def _create_softmax(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the conv operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the softmax operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support dilation
-        dilation = onnx_node.getattr('dilations', 1)
-        if dilation != 1 and list(dilation) != [1, 1]:
-            raise ValueError("Not implemented yet for dilation")
-        group = onnx_node.getattr('group', 1)
-
-        # only support 1d or 2d
-        if len(kernel) > 2:
-            raise ValueError("Only implemented for 1d or 2d")
-
-        bias = len(inputs) == 3
-        x = inputs[0]
-        x_shape = inputs[0].shape
-        in_channels = x_shape[1]
-        w_shape = inputs[1].shape
-        out_channels = w_shape[0]
-        assert w_shape[1] == in_channels // group
-
-        if inputs[0].device.id() == -1:
-            if group != 1:
-                raise NotImplementedError
-            else:
-                handle = singa.ConvHandle(x.data, kernel, stride, padding,
-                                          in_channels, out_channels, bias,
-                                          group)
-        else:
-            handle = singa.CudnnConvHandle(x.data, kernel, stride, padding,
-                                           in_channels, out_channels, bias,
-                                           group)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_max_avg_pool(cls, onnx_node, inputs, opset_version):
+    def _create_gemm(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the max or avg pool operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the gemm operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support count_include_pad and auto_pad
-        if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:
-            raise ValueError(
-                "Not implemented yet for count_include_pad or ceil_mode")
-
-        # only support 2d
-        if len(kernel) != 2:
-            raise ValueError("Not implemented yet")
-
-        is_max = onnx_node.op_type == 'MaxPool'
-        x = inputs[0]
-        if x.device.id() == -1:
-            handle = singa.PoolingHandle(x.data, kernel, stride, padding,
-                                         is_max)
-        else:
-            handle = singa.CudnnPoolingHandle(x.data, kernel, stride, padding,
-                                              is_max)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        alpha = onnx_node.getattr('alpha', 1.)
+        beta = onnx_node.getattr('beta', 1.)
+        transA = onnx_node.getattr('transA', 0)
+        transB = onnx_node.getattr('transB', 0)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        alpha=alpha,
+                        beta=beta,
+                        transA=transA,
+                        transB=transB,
+                        bias=bias)
 
     @classmethod
-    def _create_batchnorm(cls, onnx_node, inputs, opset_version):
+    def _create_flatten(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the batch norm operator from onnx node
-        Args:onnx_node: a given onnx node
-        Args:inputs: the input tensor
-        Args:opset_version: the opset version
-        Returns: the handle of singa operator
-        Returns: the autograd of singa operator
+        get the flatten operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
         """
-        x = inputs[0]
-        factor = onnx_node.getattr('momentum', 0.9)
-        if x.device.id() == -1:
-            handle = singa.BatchNormHandle(factor, x.data)
-        else:
-            handle = singa.CudnnBatchNormHandle(factor, x.data)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return handle, forward
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_concat(cls, onnx_node, inputs, opset_version):
+    def _create_onehot(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the OneHot operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.attrs["axis"]
-        if factor < 0:
-            factor = len(inputs[0].shape
-                        ) + factor  # in order to support the negative axis
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        axis = onnx_node.getattr("axis", -1)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'depth')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'values')
+        return operator(axis, None, None)
 
     @classmethod
-    def _create_softmax(cls, onnx_node, inputs, opset_version):
+    def _create_tile(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the Tile operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'repeats')
+        return operator(None)
 
     @classmethod
-    def _create_gemm(cls, onnx_node, inputs, opset_version):
+    def _create_gather(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the gemm operator from onnx node
-        Args:
-            onnx_node: a given onnx node
+        get the Gather operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        axis = onnx_node.getattr("axis", 0)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'indices')
+        return operator(axis, None)
+
+    @classmethod
+    def _create_reshape(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the reshape operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'shape')
+        return operator(None)
+
+    @classmethod
+    def _create_slice(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the Slice operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        x = inputs[0]
-        alpha = onnx_node.getattr('alpha', 1.)
-        beta = onnx_node.getattr('beta', 1.)
-        transA = onnx_node.getattr('transA', 0)
-        transB = onnx_node.getattr('transB', 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(alpha=alpha,
-                             beta=beta,
-                             transA=transA,
-                             transB=transB)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'starts')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'ends')
+        if len(onnx_node.inputs) >= 4 and onnx_node.inputs[3] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[3], 'axes')
+        if len(onnx_node.inputs) == 5 and onnx_node.inputs[4] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[4], 'steps')
+        return operator(None, None, None, None)
 
     @classmethod
-    def _create_flatten(cls, onnx_node, inputs, opset_version):
+    def _create_clip(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the flatten operator from onnx node
+        get the clip operator from onnx node
         Args:
-            onnx_node: a given onnx node
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        if len(onnx_node.inputs) >= 2 and onnx_node.inputs[1] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[1], 'min')
+        if len(onnx_node.inputs) == 3 and onnx_node.inputs[2] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[2], 'max')
+        return operator(None, None)
+
+    @classmethod
+    def _create_batch_norm(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
+        """
+        get the batch norm operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        factor = onnx_node.getattr('momentum', 0.9)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'scale')
+        onnx_node.set_weight_inputs(onnx_node.inputs[2], 'bias')
+        onnx_node.set_weight_inputs(onnx_node.inputs[3], 'running_mean')
+        onnx_node.set_weight_inputs(onnx_node.inputs[4], 'running_var')
+        return operator(factor)
+
+    @classmethod
+    def _create_conv(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the conv operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
+
+        # not support dilation
+        dilation = onnx_node.getattr('dilations', 1)
+        if dilation != 1 and list(dilation) != [1, 1]:
+            raise ValueError("Not implemented yet for dilation")
+        group = onnx_node.getattr('group', 1)
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        kernel_size,
+                        stride=stride,
+                        padding=padding,
+                        dilation=dilation,
+                        group=group,
+                        bias=bias,
+                        pad_mode=auto_pad)
+
+    @classmethod
+    def _create_max_avg_pool(cls,
+                             onnx_node,
+                             operator,
+                             opset_version=_opset_version):
+        """
+        get the max or average pooling operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
 
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        # not support count_include_pad and auto_pad
+        ceil_mode = onnx_node.getattr('ceil_mode', 0)
+        count_include_pad = onnx_node.getattr('count_include_pad', 0)
+        if ceil_mode != 0 or count_include_pad != 0:
+            raise ValueError(
+                "Not implemented yet for count_include_pad or ceil_mode")
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        is_max = onnx_node.op_type == 'MaxPool'
+        return operator(kernel_size, stride, padding, is_max, auto_pad)
 
     @classmethod
-    def _common_onnx_node_to_singa_op(cls, onnx_node, inputs, opset_version):
+    def _onnx_constant_to_np(cls, onnx_node, opset_version):
         """
-        get a common singa operator(only autograd) from a onnx node
-        other special operators also can call this func to get autograd
-        Args:
-            onnx_node: a given onnx node
+        parse the onnx Constant node to a numpy array
         Args:
-            tensor_map: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            a dict of tensors
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            a numpy ndarray
         """
-        onnx_op_type = onnx_node.op_type
-        assert onnx_op_type in cls._rename_operators, "not support operator: {}".format(
-            onnx_op_type)
-        autograd_op = getattr(autograd, cls._rename_operators[onnx_op_type])
-        return None, autograd_op
+        onnx_tensor = onnx_node.getattr('value')
+        np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_tensor.data_type]
+        np_tensor = np.frombuffer(onnx_tensor.raw_data, dtype=np_dtype)
+        return np_tensor
 
     @classmethod
-    def _onnx_node_to_singa_op(cls,
-                               onnx_node,
-                               inputs,
-                               opset_version=_known_opset_version):
+    def _onnx_node_to_singa_op(cls, onnx_node, opset_version=_opset_version):
         """
-        get a singa operator(handle and autograd) from a onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input list
+        get the singa operator from an onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a dict of tensors
-        Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            singa operator instance
         """
+        onnx_op_type = onnx_node.op_type
+        assert onnx_op_type in cls._rename_operators, "unsupported operator: {}".format(
+            onnx_op_type)
+        renamed_op = cls._rename_operators[onnx_op_type]
+        if renamed_op.startswith('layer.'):
+            op_class = getattr(layer, renamed_op[6:])
+        else:
+            op_class = getattr(autograd, renamed_op)
         if onnx_node.op_type in cls._special_operators:
             translator = getattr(cls, cls._special_operators[onnx_node.op_type])
+            op = translator(onnx_node, op_class, opset_version)
         else:
-            translator = cls._common_onnx_node_to_singa_op
-        return translator(onnx_node, inputs, opset_version)
+            op = op_class()
+        # refine the ONNXNode: drop the empty optional inputs
+        onnx_node.inputs = [inp for inp in onnx_node.inputs if inp != '']
+        return op
 
     @classmethod
-    def run_node(cls, onnx_node, inputs, opset_version=_known_opset_version):
+    def run_node(cls, node, inputs, device='CPU', opset_version=_opset_version):
         """
         run a single singa operator from a onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the 
+            node (NodeProto): a given onnx node
+            inputs (ndarray[]): a list of numpy ndarray
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
+        Returns:
+            list, the output
         """
-        valid_inputs = [x for x in onnx_node.inputs if x != ""]
+        node = OnnxNode(node)
+        valid_inputs = [x for x in node.inputs if x != ""]
         assert len(valid_inputs) == len(
-            inputs), "{}: expected {} but got {}".format(
-                onnx_node.op_type, len(valid_inputs), len(inputs))
-
-        tmp_inputs = [inputs[x] for x in onnx_node.inputs if x != ""]
-        handle, forward = cls._onnx_node_to_singa_op(onnx_node, tmp_inputs,
-                                                     opset_version)
-        # only give the inputs it needs
-        # consumed_inputs are the inputs marked as attributes
-        # so we remove it here
-        tmp_inputs = [
-            inputs[x]
-            for x in onnx_node.inputs
-            if x not in onnx_node.consumed_inputs
-        ]
-        return cls._run_node(onnx_node, tmp_inputs, handle, forward,
-                             opset_version)
+            inputs), "{}: expected {} inputs, but got {}. ".format(
+                node.op_type, len(valid_inputs), len(inputs))
+
+        operator = cls._onnx_node_to_singa_op(node, opset_version)
+        # separate the weights from the inputs, and init the inputs as Tensor
+        weights = {}
+        _inputs = []
+        for (key, val) in zip(valid_inputs, inputs):
+            val = val.astype(onnx_type_to_singa_type(val.dtype))
+            if key in node.weight_inputs:
+                weights[key] = val
+            else:
+                x = tensor.from_numpy(val)
+                if device == 'CPU':
+                    dev = cpu_dev
+                else:
+                    assert singa.USE_CUDA, "Your SINGA doesn't compile GPU module."
+                    dev = gpu_dev
+                x.to_device(dev)
+                _inputs.append(x)
+        inputs = _inputs
+        # set params
+        params = {}
+        for key, name in node.weight_inputs.items():
+            params[name] = weights[key]
+        operator.set_params(params)
+        outputs = cls._run_node(operator, inputs)
+        outputs_dict = OrderedDict()
+        for (key, val) in zip(node.outputs, outputs):
+            outputs_dict[key] = val
+        return outputs_dict
 
     @classmethod
-    def _run_node(cls,
-                  onnx_node,
-                  inputs,
-                  handle,
-                  forward,
-                  opset_version=_known_opset_version):
+    def _run_node(cls, operator, inputs):
         """
-        run a single singa operator from a onnx node
-        Args:inputs: 
-            the input tensor
-        Args:handle: 
-            the handle of singa operator
-        Args:forward: 
-            the forward of singa operator
+        run the forward of a single singa operator instance
         Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the
+            operator (Operator): the Operator instance
+            inputs (Tensor[]): a list of SINGA Tensor
+        Returns:
+            list, the output
         """
-        outputs = forward(*inputs) if handle is None else forward(
-            handle, *inputs)
+        outputs = operator(*inputs)
         if not isinstance(outputs, collections.Iterable):
             outputs = [outputs]
-        outputs_dict = OrderedDict()
-        for (key, val) in zip(onnx_node.outputs, outputs):
-            outputs_dict[key] = val
-        return outputs_dict
+        return outputs
 
     @classmethod
-    def _init_graph_parameter(cls, graph, init_inputs, device):
+    def _parse_graph_params(cls, graph, device):
         """
-        init the singa tensor from onnx infos
+        parse the parameters from onnx graph
         Args:
-            graph: a given onnx graph
-        Args:
-            init_inputs: a list of inputs, which used to init the operators
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
+        Returns:
+            a dict of numpy ndarray
+        """
+        params = {}
+        for tp in graph.initializer:
+            val = numpy_helper.to_array(tp)
+            val = val.astype(onnx_type_to_singa_type(tp.data_type))
+            params[tp.name] = val
+        return params
+
+    @classmethod
+    def _parse_graph_inputs_outputs(cls, graph, params, device):
+        """
+        parse the inputs and outputs from the onnx graph
         Args:
-            device: the used device
+            graph (Graph): a given onnx graph
+            params (dict): the params parsed from the graph initializers
+            device (string): CPU or CUDA
         Returns:
-            a dict of tensors
+            a list of input info tuples (name, dtype, shape)
+            a list of output info tuples (name, dtype, shape)
         """
-        tensor_map = {}
-        # due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info
-        # sometimes, may not
-        all_inputs = OrderedDict()
+        inputs = []
+        outputs = []
+        info_tuple = namedtuple('info_tuple', ['name', 'dtype', 'shape'])
         for t in graph.input:
-            all_inputs[t.name] = t
-        # so we refresh the input by the initializer
-        for t in graph.initializer:
-            all_inputs[t.name] = t
-        initializers = {t.name for t in graph.initializer}
-        inp_idx = 0
-        for name, x in all_inputs.items():
-            if name in initializers:
-                # if it has initializer, we use its value as the input
-                np_tensor = numpy_helper.to_array(x)
-                if np_tensor.dtype == "int64":
-                    np_tensor = np_tensor.astype(np.int32)
-                # todo, we cannot support scalar tensor
-                if np.ndim(np_tensor) == 0:
-                    np_tensor = np.array(np_tensor, ndmin=1)
-            else:
-                # if not, means it's a input rather than a inner weight
-                # so if the user gives values, we use these values
-                # if not, we just use the shape of input gived by onnx to init a random value
-                # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-                # so if have operators, the user must give inputs
-                x_shape = tuple(
-                    dim.dim_value for dim in x.type.tensor_type.shape.dim)
-                if init_inputs is not None:
-                    np_tensor = init_inputs[inp_idx]
-                    inp_idx += 1
-                else:
-                    np_tensor = np.random.randn(*x_shape).astype(np.float32)
-            tmp_tensor = tensor.from_numpy(np_tensor)
-            tmp_tensor.to_device(device)
-            # todo, for backward
-            tmp_tensor.stores_grad = (name in initializers)
-            tensor_map[x.name] = tmp_tensor
-        return tensor_map
+            if t.name not in params:
+                dtype = t.type.tensor_type.elem_type
+                shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+                inputs.extend([info_tuple(t.name, dtype, shape)])
+        for t in graph.output:
+            dtype = t.type.tensor_type.elem_type
+            shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+            outputs.extend([info_tuple(t.name, dtype, shape)])
+        return inputs, outputs
 
     @classmethod
-    def _onnx_model_to_singa_net(cls, model, init_inputs, device,
-                                 opset_version):
+    def _onnx_model_to_singa_ops(cls, graph, device, opset_version):
         """
-        get all intermediate tensors and operators from onnx model
-        Args:
-            model: a given onnx model
+        get all the params, operators, and input/output info from the onnx graph
         Args:
-            init_inputs: a list of inputs, which used to init the operators
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns:
-            a dict of tensors
+            graph (Graph): the loaded ONNX graph
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
         Returns:
-            a list of SingaOps('name', 'op', 'handle', 'forward')
-        """
-        # init all tensor input and weight as a tensor map
-        tensor_map = cls._init_graph_parameter(model.graph, init_inputs, device)
-        # only weights tensor
-        weights = {x.name: tensor_map[x.name] for x in model.graph.initializer}
+            a dict of params (weights)
+            a list of input info tuples (name, dtype, shape)
+            a list of output info tuples (name, dtype, shape)
+            a list of operator_tuple('node', 'operator')
+        """
+        # parse the params and the input/output info from the graph
+        params = cls._parse_graph_params(graph, device)
+        inputs, outputs = cls._parse_graph_inputs_outputs(graph, params, device)
         # the parsed operators queue
-        singa_ops = []
-        singa_op = namedtuple('SingaOps', ['name', 'op', 'handle', 'forward'])
-        for node in model.graph.node:
+        operators = []
+        operator_tuple = namedtuple('operator_tuple', ['node', 'operator'])
+        for node in graph.node:
             node = OnnxNode(node)
-            # only give the inputs it needs
-            # consumed_inputs are the inputs marked as attributes
-            # so we remove it here
-            inputs = [
-                tensor_map[x]
-                for x in node.inputs
-                if x not in node.consumed_inputs
-            ]
-            handle, forward = cls._onnx_node_to_singa_op(
-                node, inputs, opset_version)
-            # if it is Constant, we hanlde it as a weight
-            # otherwise, we run it and add its output into map for being used by later operators
+            # convert Constant to param
             if node.op_type == 'Constant':
-                tmp_tensor = tensor.from_numpy(forward)
-                tmp_tensor.to_device(device)
-                tmp_name = node.outputs.pop(0)
-                weights[tmp_name] = tmp_tensor
-                tensor_map[tmp_name] = tmp_tensor
+                params[node.outputs[0]] = cls._onnx_constant_to_np(
+                    node, opset_version)
             else:
-                outputs = cls._run_node(node, inputs, handle, forward)
-                for key, val in outputs.items():
-                    tensor_map[key] = val
-                singa_ops.extend([singa_op(node.name, node, handle, forward)])
-        return weights, singa_ops
+                op = cls._onnx_node_to_singa_op(node, opset_version)
+                operators.extend([operator_tuple(node, op)])
+        return params, inputs, outputs, operators
 
     @classmethod
-    def prepare(cls, model, device, **kwargs):
+    def prepare(cls, model, device='CPU', **kwargs):
         """
-        get the batch norm operator from onnx node
-        Args:
-            model: a given onnx node
+        parse the ONNX model and create the layers
         Args:
-            device: the used device
-        Returns: 
-            a list of output values
+            model (ModelProto): the loaded ONNX model
+            device (string): CPU or CUDA
+        Returns:
+            a SingaRep instance that stores the layers and weights
         """
         super(SingaBackend, cls).prepare(model, device, **kwargs)
-        # when parsing graph, we use the shape of input gived by onnx to init a random value
-        # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-        # so if have operators, the user must give inputs
-        init_inputs = kwargs.get("init_inputs", None)
-        # whether initializers are moved into inputs, due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info, sometimes, may not
-        cls.keep_initializers_as_inputs = kwargs.get(
-            'keep_initializers_as_inputs', True)
         # optimize and infer the shape of the model
         try:
             model = onnx.utils.polish_model(model)
         except IndexError as err:
-            # due to https://github.com/onnx/onnx/issues/2417
             model = onnx.shape_inference.infer_shapes(model)
 
         # check the opset version and ir version
+        # SINGA supports opset version 11 and IR version 6 (ONNX 1.6.0)
         opset_version = None
         for imp in model.opset_import:
             if not imp.HasField("domain") or imp.domain == "":
                 opset_version = imp.version
-                if imp.version > cls._known_opset_version:
+                if imp.version > cls._opset_version:
                     warnings.warn(
-                        "This version of singa targets ONNX operator set version {}, but the model we are trying to import uses version {}.  We will try to import it anyway, but if the model uses operators which had BC-breaking changes in the intervening versions, import will fail."
-                        .format(cls._known_opset_version, imp.version))
+                        "The imported opertor set verion {} is larger than the supported version {}."
+                        .format(imp.version, cls._opset_version))
             else:
                 warnings.warn("Unrecognized operator set {}".format(imp.domain))
-        if opset_version is None:
-            if model.ir_version >= 0x00000003:
-                raise RuntimeError(
-                    "Model with IR version >= 3 did not specify ONNX operator set version (singa requires it)"
-                )
-            else:
-                opset_version = 1
-        weights, singa_ops = cls._onnx_model_to_singa_net(
-            model, init_inputs, device, opset_version)
-        return SingaRep(model, weights, singa_ops,
-                        cls.keep_initializers_as_inputs)
+
+        if model.ir_version > cls._ir_version:
+            warnings.warn(
+                "The imported ir verion {} is larger than the supported version {}."
+                .format(cls._ir_version, imp.version))
+
+        graph = model.graph
+        params, inputs, outputs, layers = cls._onnx_model_to_singa_ops(
+            graph, device, opset_version)
+        return SingaRep(params, inputs, outputs, layers, device)
 
 
 class SingaRep(BackendRep):
 
-    def __init__(self,
-                 model,
-                 weights,
-                 singa_ops,
-                 keep_initializers_as_inputs=True):
+    def __init__(self, params, inputs, outputs, layers, device):
         """
+        https://github.com/onnx/onnx/blob/master/docs/ImplementingAnOnnxBackend.md
         SingaRep provides the intermediate representation of Singa,
         the user can run the forward of the singa model by run func,
         or, the user can append more layers after the singa_ops to do
         the transfer learning
         Args:
-            model: a given operator
+            params (dict): a dict of params, each value is a numpy ndarray
+            inputs (ValueInfo[]): a list of input info tuples
+            outputs (ValueInfo[]): a list of output info tuples
+            layers (namedtuple('operator_tuple', ['node', 'operator'])[]): a list of singa operators
+            device (string): CPU or CUDA
+        """
+        super(SingaRep, self).__init__()
+        self.inputs = inputs
+        self.states = params
+        self.outputs = outputs
+        self.dev = cpu_dev if device == "CPU" else gpu_dev
+        self.layers = layers
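+        # tensor_count keeps a per-tensor reference count so that intermediate
+        # tensors can be released once no later layer needs them (see run())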
+        self.tensor_count = {}
+        self.has_initialized = False
+        self.is_graph = False
+
+    def initialize(self):
+        """
+        Init the instance
+        """
+        self.outputs_info = {outp.name: outp for outp in self.outputs}
+        _layers = []  # layers by topo order
+        for node, operator in self.layers:
+            for key, name in node.weight_inputs.items():
+                if key not in self.states:
+                    # cannot find the weights, try to find it from input
+                    node.set_attr_inputs(key, name)
+            self.__dict__[node.name] = operator
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+            _layers.append(node)
+        self._layers = _layers
+
+    def init_tensor_count(self):
+        """
+        Init the tensor count dict
+        """
+        self.tensor_count = {}
+        for node, operator in self.layers:
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+
+    def to_input_tensor(self, x):
+        """
+        convert the input to tensors
         Args:
-            weights: the tensor of weights
+            x (np.ndarray[]): a list of numpy ndarray as inputs
+        Returns: 
+            a dict of SINGA Tensors
+        """
+        tensor_dict = {}
+        # init inputs as Tensor
+        for (key, val) in zip(self.inputs, x):
+            if not self.is_graph:
+                val = val.astype(onnx_type_to_singa_type(key.dtype))
+                # todo, scalar
+                val = np.atleast_1d(val)
+                val = tensor.from_numpy(val)
+                val.to_device(self.dev)
+            tensor_dict[key.name] = val
+        return tensor_dict
+
+    def to_output_tensor(self, y, out_name):
+        """
+        convert the output tensor to a numpy array
         Args:
-            singa_ops: the tensor of the operator
+            y (Tensor): the output SINGA Tensor
+            out_name (str): the name of the output
+        Returns: 
+            a numpy ndarray
         """
-        super(SingaRep, self).__init__()
-        self.model = model
-        self.tensor_map = weights
-        self.keep_initializers_as_inputs = keep_initializers_as_inputs
-        # this each item of singa_ops is: ('name', 'op', 'handle', 'forward')
-        # the name is a string, op is OnnxNode,
-        # handle is Singa handle to store the tensor into singa operator
-        # the forward is singa autograd operator
-        self.singa_ops = singa_ops
+        if not self.is_graph:
+            y = tensor.to_numpy(y)
+            if out_name in self.outputs_info:
+                np_dtyp = mapping.TENSOR_TYPE_TO_NP_TYPE[
+                    self.outputs_info[out_name].dtype]
+                y = y.astype(np_dtyp)
+        return y
 
-    def run(self, inputs, **kwargs):
+    def get_s(self, name, node, tensor_dict):
+        """
+        get state from the node's weights or tensor_dict
+        Args:
+            name (str): name of the state
+            node (ONNXNode): ONNX node
+            tensor_dict ({}): tensor dict
+        Returns: 
+            the states
+        """
+        if name in node.attr_inputs:
+            return tensor_dict[name]
+        else:
+            return self.states[name]
+
+    def handle_special_ops(self, node, op, tensor_dict):
+        """
+        handle some special operators whose attributes depend on the inputs
+        Args:
+            node (ONNXNode): the ONNX node
+            op (Operator): the SINGA operator instance
+            tensor_dict (dict): the tensor dict
+        """
+        # todo, hard code
+        # Conv2d nb_kernels
+        if node.op_type == "Conv":
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[0]
+        # Gemm nb_kernels and bias_shape
+        elif node.op_type == "Gemm":
+            nb_kernels_flag = 0 if op.transB == 1 else 1
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[nb_kernels_flag]
+            if op.bias:
+                shape = self.get_s(node.inputs[2], node, tensor_dict).shape
+                op.bias_shape = shape
+
+    def run(self, *x, **kwargs):
         """
         run the forward of singa model
         Args:
-            inputs: a given operator
+            x (np.ndarray[]): a list of numpy ndarray as inputs
         Returns: 
-            the onnx node
+            a list of outputs
         """
-        graph = self.model.graph
+        if not self.has_initialized:
+            self.initialize()
+            if isinstance(x[0], tensor.Tensor):
+                self.dev = x[0].device
+            self.has_initialized = True
+
+        outputs_dict = OrderedDict([(outp.name, None) for outp in self.outputs])
+
         # last_layers means we run this model until the last #N layers
-        last_layers = kwargs.get('last_layers', len(self.singa_ops))
-        if last_layers != len(self.singa_ops):
-            final_outputs = self.singa_ops[last_layers-1].op.outputs
-        else:
-            final_outputs =  [outp.name for outp in graph.output]
-        # whether return all outputs
-        all_outputs = kwargs.get('all_outputs', False)
-        # get a specific op by its name
-        op_name = kwargs.get('op_name', None)
-        # record the tensor we added from input
-        tmp_tensor_map = {name: val for name, val in self.tensor_map.items()}
-
-        # the dict will be returned
-        ret_outputs = OrderedDict()
-        if self.keep_initializers_as_inputs:
-            require_input_len = len(graph.input) - len(graph.initializer)
-            actual_input_len = len(inputs)
-        else:
-            require_input_len = len(graph.input)
-            actual_input_len = len(inputs)
-        assert require_input_len == actual_input_len, "The length of graph input is different from the tensor input: %d, %d" % (
-            require_input_len, actual_input_len)
-        # run the handle by the order of the list(the list is Topological Sorting)
-        for inp in graph.input:
-            if inp.name not in tmp_tensor_map:
-                tmp_tensor_map[inp.name] = inputs.pop(0)
-
-        for _, op, handle, forward in self.singa_ops[:last_layers]:
-            if len(op.consumed_inputs) != 0:
-                # because if op has consumed_inputs, it means it moved some inputs into attributes
-                # so when running, we should update these attributes
-                handle, forward = get_op(op,
-                                         [tmp_tensor_map[x] for x in op.inputs])
-            inputs = [
-                tmp_tensor_map[x]
-                for x in op.inputs
-                if x not in op.consumed_inputs
-            ]
-            outputs = _run_node(op, inputs, handle, forward)
-            for key, val in outputs.items():
-                tmp_tensor_map[key] = val
-                ret_outputs[key] = val
-
-        if op_name is not None:
-            if op_name in outputs:
-                return outputs[op_name]
+        last_layers = kwargs.get('last_layers', len(self._layers))
+        if last_layers != len(self._layers):
+            for outp in self._layers[last_layers - 1].outputs:
+                outputs_dict[outp] = None
+
+        aux_output = kwargs.get('aux_output', ())
+        for outp in aux_output:
+            outputs_dict[outp] = None
+
+        tensor_dict = self.to_input_tensor(x)
+        self.init_tensor_count()
+
+        # run the layer by the topo order
+        for node in self._layers[:last_layers]:
+            op = self.__dict__[node.name]
+            self.handle_special_ops(node, op, tensor_dict)
+            # make input
+            inputs = []
+            for inp in node.inputs:
+                if inp not in node.weight_inputs and inp not in node.attr_inputs:
+                    if inp in tensor_dict:
+                        inputs.append(tensor_dict[inp])
+                    elif inp in self.states:
+                        # todo, scalar
+                        val = np.atleast_1d(self.states[inp])
+                        val = tensor.from_numpy(val)
+                        val.to_device(self.dev)
+                        inputs.append(val)
+                    else:
+                        raise KeyError("Cannot find the input {} for operator {}".format(inp, node.name))
+            states = {}
+            if callable(getattr(op, "initialize",
+                                None)) and not op._initialized:
+                # init the operator
+                op.initialize(*inputs)
+                op._initialized = True
+                for key, name in node.weight_inputs.items():
+                    if key not in node.attr_inputs:
+                        # find the weights and not in the inputs
+                        states[name] = self.states[key]
+
+            # replace attrs by inputs
+            for key, name in node.attr_inputs.items():
+                if key in tensor_dict:
+                    ts = tensor_dict[key]
+                    if isinstance(ts, tensor.Tensor):
+                        ts = tensor.to_numpy(ts)
+                    states[name] = ts
+                elif key in self.states:
+                    states[name] = self.states[key]
+            # set states
+            if callable(getattr(op, "set_states", None)):
+                op.set_states(**states)
             else:
-                raise RuntimeError(
-                    "The op_name {} does not exist, please check. The available op_names are: {}"
-                    .format(op_name, [val for key, val in op_name.items()]))
-
-        # return all outputs if all_outputs==True
-        # else return last outputs
-        if all_outputs:
-            return ret_outputs
-        else:
-            return [ret_outputs[outp] for outp in final_outputs]
+                for key, value in states.items():
+                    setattr(op, key, value)
+            # run the node
+            outputs = _run_node(op, inputs)
+            # release the input tensor
+            for inp in node.inputs:
+                if inp in self.tensor_count:
+                    self.tensor_count[inp] -= 1
+                if self.tensor_count[inp] == 0:
+                    if inp in tensor_dict:
+                        del tensor_dict[inp]
+                    del self.tensor_count[inp]
+            # store the output
+            for (outp, val) in zip(node.outputs, outputs):
+                tensor_dict[outp] = val
+                if outp in outputs_dict:
+                    outputs_dict[outp] = self.to_output_tensor(val, outp)
+        return list(outputs_dict.values())
+
+
+class SONNXModel(model.Model):
+
+    def __init__(self, onnx_model):
+        """
+        Init a SINGA Model
+        Args:
+            onnx_model (ModelProto): a loaded onnx model
+        """
+        super(SONNXModel, self).__init__()
+        self.sg_ir = prepare(onnx_model)
+        for node, operator in self.sg_ir.layers:
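+            # register each operator as an attribute so that the Model can
+            # discover it as a sub-layer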
+            self.__dict__[node.name] = operator

Review comment:
       if operator is not a Layer instance, then save_states and load_states cannot work, since they work by saving the states of the sub-layers.
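
       For illustration only (not part of this PR; `OperatorWrapper` and the use of `get_states`/`set_states` here are assumptions about the new layer API), a stateless operator could be wrapped in a `layer.Layer` subclass so that it shows up as a sub-layer:
   ```python
   from singa import layer

   class OperatorWrapper(layer.Layer):
       """Hypothetical sketch: make a plain autograd operator look like a Layer."""

       def __init__(self, op):
           super(OperatorWrapper, self).__init__()
           self.op = op  # the plain autograd operator

       def forward(self, *xs):
           return self.op(*xs)

       def get_states(self):
           # a stateless operator contributes nothing to the checkpoint
           return {}

       def set_states(self, **states):
           pass
   ```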




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636327193


   This pull request **introduces 11 alerts** and **fixes 1** when merging 223b9e3cb92c9c58df4f65bc3cec4244daa7de20 into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-a6cccb1b87e08705c401893c872331c60a384299)
   
   **new alerts:**
   
   * 10 for Unused import
   * 1 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-630371622


   This pull request **introduces 10 alerts** and **fixes 13** when merging 5fccbbab6c65fb8d563198bc99bbb95bedc09940 into db1846dd2c612950054f75b8125f40cd25a20f44 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-68df59fa63bec4645bbc1f9c06ea75ab409cb7d6)
   
   **new alerts:**
   
   * 3 for Unused local variable
   * 3 for Unreachable code
   * 1 for Unnecessary pass
   * 1 for Unused import
   * 1 for Missing call to \_\_init\_\_ during object initialization
   * 1 for Wrong number of arguments in a class instantiation
   
   **fixed alerts:**
   
   * 8 for Missing call to \_\_init\_\_ during object initialization
   * 2 for Unreachable code
   * 1 for Unnecessary pass
   * 1 for Unused local variable
   * 1 for Mismatch between signature and use of an overridden method


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639373166


   This pull request **fixes 1 alert** when merging de46a51b1e3acdba26fbb01f97a17ef3925f276d into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-c18d032bd5e80969ecaad40c12c1ce45cfd80e50)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] joddiy commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
joddiy commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r427584575



##########
File path: examples/cnn/onnx/train.py
##########
@@ -0,0 +1,307 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+import sys, os
+import json
+from singa import singa_wrap as singa
+from singa import opt
+from singa import device
+from singa import tensor
+from singa import sonnx
+from singa import autograd
+import numpy as np
+import time
+import argparse
+from PIL import Image
+import onnx
+from tqdm import tqdm
+from utils import download_model, update_batch_size, check_exist_or_download, softmax_loss
+import logging
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+sys.path.append(os.path.dirname(__file__) + '/..')
+
+# Data Augmentation
+def augmentation(x, batch_size):
+    xpad = np.pad(x, [[0, 0], [0, 0], [4, 4], [4, 4]], 'symmetric')
+    for data_num in range(0, batch_size):
+        offset = np.random.randint(8, size=2)
+        x[data_num, :, :, :] = xpad[data_num, :,
+                                    offset[0]:offset[0] + x.shape[2],
+                                    offset[1]:offset[1] + x.shape[2]]
+        if_flip = np.random.randint(2)
+        if (if_flip):
+            x[data_num, :, :, :] = x[data_num, :, :, ::-1]
+    return x
+
+
+# Calculate Accuracy
+def accuracy(pred, target):
+    # y is network output to be compared with ground truth (int)
+    y = np.argmax(pred, axis=1)
+    a = y == target
+    correct = np.array(a, "int").sum()
+    # print(correct)
+    return correct
+
+
+# Data partition according to the rank
+def partition(global_rank, world_size, train_x, train_y, val_x, val_y):
+    # Partition training data
+    data_per_rank = train_x.shape[0] // world_size
+    idx_start = global_rank * data_per_rank
+    idx_end = (global_rank + 1) * data_per_rank
+    train_x = train_x[idx_start:idx_end]
+    train_y = train_y[idx_start:idx_end]
+    # Partition evaluation data
+    data_per_rank = val_x.shape[0] // world_size
+    idx_start = global_rank * data_per_rank
+    idx_end = (global_rank + 1) * data_per_rank
+    val_x = val_x[idx_start:idx_end]
+    val_y = val_y[idx_start:idx_end]
+    return train_x, train_y, val_x, val_y
+
+
+# Function to all reduce NUMPY Accuracy and Loss from Multiple Devices
+def reduce_variable(variable, dist_opt, reducer):
+    reducer.copy_from_numpy(variable)
+    dist_opt.all_reduce(reducer.data)
+    dist_opt.wait()
+    output = tensor.to_numpy(reducer)
+    return output
+
+
+def resize_dataset(x, image_size):
+    num_data = x.shape[0]
+    dim = x.shape[1]
+    X = np.zeros(shape=(num_data, dim, image_size, image_size),
+                 dtype=np.float32)
+    for n in range(0, num_data):
+        for d in range(0, dim):
+            X[n, d, :, :] = np.array(Image.fromarray(x[n, d, :, :]).resize(
+                (image_size, image_size), Image.BILINEAR),
+                                     dtype=np.float32)
+    return X
+
+
+def run(global_rank,
+        world_size,
+        local_rank,
+        max_epoch,
+        batch_size,
+        model,
+        data,
+        sgd,
+        graph,
+        dist_option='fp32',
+        spars=None):
+    dev = device.create_cuda_gpu_on(local_rank)
+    dev.SetRandSeed(0)
+    np.random.seed(0)
+
+    if data == 'cifar10':
+        from data import cifar10
+        train_x, train_y, val_x, val_y = cifar10.load()
+    elif data == 'cifar100':
+        from data import cifar100
+        train_x, train_y, val_x, val_y = cifar100.load()
+    elif data == 'mnist':
+        from data import mnist
+        train_x, train_y, val_x, val_y = mnist.load()
+
+    num_channels = train_x.shape[1]
+    image_size = train_x.shape[2]
+    data_size = np.prod(train_x.shape[1:train_x.ndim]).item()
+    num_classes = (np.max(train_y) + 1).item()
+
+    # print content
+    with open(os.path.join(os.path.dirname(__file__), 'models.json')) as json_file:
+        model_config = json.load(json_file)
+        model_config = model_config[model]
+
+    download_model(model_config['url'])
+    onnx_model = onnx.load(os.path.join('/tmp', model_config['path']))
+    onnx_model = update_batch_size(onnx_model, batch_size)
+    sg_ir = sonnx.prepare(onnx_model, device=dev)
+    model = sonnx.create_model(sg_ir, autograd.softmax_cross_entropy, sgd)

Review comment:
       No, sorry for the confusion, please refer to the current code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] nudles commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
nudles commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r435858990



##########
File path: examples/onnx/arcface.py
##########
@@ -78,35 +76,30 @@ def forward(self, x):
     download_model(url)
     onnx_model = onnx.load(model_path)
 
-    # set batch size
-    onnx_model = update_batch_size(onnx_model, 2)
+    # inference demo
+    logging.info("preprocessing...")
+    img1, img2 = get_image()
+    img1 = preprocess(img1)
+    img2 = preprocess(img2)
+    # sg_ir = sonnx.prepare(onnx_model) # run without graph
+    # y = sg_ir.run([img1, img2])
 
-    # prepare the model
-    logging.info("prepare model...")
+    logging.info("model compling...")
     dev = device.create_cuda_gpu()
-    sg_ir = sonnx.prepare(onnx_model, device=dev)
-    autograd.training = False
-    model = Infer(sg_ir)
+    x = tensor.Tensor(device=dev, data=np.concatenate((img1, img2), axis=0))
+    m = MyModel(onnx_model)
+    m.compile([x], is_train=False, use_graph=True, sequential=True)
 
-    # verifty the test dataset
+    # verifty the test
     # from utils import load_dataset
-    # inputs, ref_outputs = load_dataset(
-    #     os.path.join('/tmp', 'resnet100', 'test_data_set_0'))
+    # inputs, ref_outputs = load_dataset(os.path.join('/tmp', 'resnet100', 'test_data_set_0'))
     # x_batch = tensor.Tensor(device=dev, data=inputs[0])
-    # outputs = model.forward(x_batch)
+    # outputs = sg_ir.run([x_batch])
     # for ref_o, o in zip(ref_outputs, outputs):
     #     np.testing.assert_almost_equal(ref_o, tensor.to_numpy(o), 4)
 
-    # inference demo
-    logging.info("preprocessing...")
-    img1, img2 = get_image()
-    img1 = preprocess(img1)
-    img2 = preprocess(img2)
-
-    x_batch = tensor.Tensor(device=dev,
-                            data=np.concatenate((img1, img2), axis=0))
     logging.info("model running...")
-    y = model.forward(x_batch)
+    y = m.forward(*[x])[0]

Review comment:
       why pass `*[x]` to the forward function?
   and the [0] should be done within module.forward
   ```python
   def forward(self, *x):
      y = super(MyModel, self).forward(*x)
      return y[0]
   ```
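
   With a varargs `forward` like that sketch (reusing `m` and `x` from this example), the call site could then simply be:
   ```python
   y = m.forward(x)  # no need to wrap the single input in a list and unpack it
   ```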




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-632895642


   This pull request **introduces 7 alerts** and **fixes 1** when merging 03c28cada62ee1acd3c5c9b17292a90d878ea98b into 84de1af8428d3796e3f99b215570a54d6f975a94 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-a824e28c19728987e20806abbf1456326ecc44f6)
   
   **new alerts:**
   
   * 4 for Duplicate key in dict literal
   * 2 for Unused local variable
   * 1 for Wrong number of arguments in a class instantiation
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636331522


   This pull request **fixes 1 alert** when merging fb765575f546aa7fec2d8dd8ad92b709a99adf02 into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-93242620f2bb2310bbe059a4db4e7a3e17144f1f)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-637638468


   This pull request **fixes 1 alert** when merging 00b65b21717b7768d456a5681f39ec383c39af79 into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-992b3f304f090250217c62f2edec141a080c457e)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-634916536


   This pull request **introduces 2 alerts** and **fixes 1** when merging 27051aef947b07e3dedb306ce373bbf08d6f170b into bec19640aa7dae75cd12a44109bc319d2e30cfbf - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-f9610f8d0805c66795711b7e5c5de2abd2f20be8)
   
   **new alerts:**
   
   * 2 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-637686401


   This pull request **fixes 1 alert** when merging e93770b3adca3828e60fd621d26261716ce078b8 into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-10f1d03c11a1c14f258b6bc974636c0db774df79)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636043906


   This pull request **introduces 1 alert** and **fixes 1** when merging 486e1d620421baf942a999cbcbc37cc771095c37 into bec19640aa7dae75cd12a44109bc319d2e30cfbf - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-dce09b59ca624530dc016ec343b90ae3dd3e9059)
   
   **new alerts:**
   
   * 1 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-632892891


   This pull request **introduces 7 alerts** and **fixes 1** when merging cfa8878316e69cb16939995980d04babc27af8c2 into 84de1af8428d3796e3f99b215570a54d6f975a94 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-52b892dfd7c8f23ce3a1253d1e23bd6fef568789)
   
   **new alerts:**
   
   * 4 for Duplicate key in dict literal
   * 2 for Unused local variable
   * 1 for Wrong number of arguments in a class instantiation
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] joddiy commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
joddiy commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r432953461



##########
File path: python/singa/sonnx.py
##########
@@ -989,1150 +1035,1051 @@ def from_onnx(args):
 class SingaBackend(Backend):
 
     # This number indicates the onnx operator set version
-    _known_opset_version = 11
+    _opset_version = 11
+
+    _ir_version = 0x0000000000000006
 
     # because singa's operators are different from onnx.
     # we define a dict for the name projection
     _rename_operators = {
-        'Relu': 'relu',
-        'Softmax': 'SoftMax',
-        'Sigmoid': 'sigmoid',
-        'Add': 'add',
-        'MatMul': 'matmul',
-        'Conv': '_Conv2d',
-        'MaxPool': '_Pooling2d',
-        'AveragePool': '_Pooling2d',
-        'BatchNormalization': 'batchnorm_2d',
-        'Concat': 'Concat',
-        'Flatten': 'Flatten',
-        'Gemm': 'Gemm',
-        'Reshape': 'Reshape',
-        'Sum': 'sum',
-        'Cos': 'cos',
-        'Cosh': 'cosh',
-        'Sin': 'sin',
-        'Sinh': 'sinh',
-        'Tan': 'tan',
-        'Tanh': 'tanh',
-        'Acos': 'acos',
-        'Acosh': 'acosh',
-        'Asin': 'asin',
-        'Asinh': 'asinh',
-        'Atan': 'atan',
-        'Atanh': 'atanh',
-        'Selu': 'SeLU',
-        'Elu': 'Elu',
-        'Equal': 'equal',
-        'Less': 'less',
-        'Sign': 'sign',
-        'Div': 'div',
-        'Sub': 'sub',
-        'Sqrt': 'sqrt',
-        'Log': 'log',
-        'Greater': 'greater',
-        'HardSigmoid': 'HardSigmoid',
-        'Identity': 'identity',
-        'Softplus': 'softplus',
-        'Softsign': 'softsign',
-        'Mean': 'mean',
-        'Pow': 'pow',
-        'Clip': 'Clip',
-        'PRelu': 'prelu',
-        'Mul': 'mul',
-        'Transpose': 'Transpose',
-        'Max': 'max',
-        'Min': 'min',
-        'Shape': 'shape',
-        'And': '_and',
-        'Or': '_or',
-        'Xor': '_xor',
-        'Not': '_not',
-        'Neg': 'negative',
-        'Reciprocal': 'reciprocal',
-        'ConstantOfShape': 'ConstantOfShape',
-        'Dropout': 'Dropout',
-        'ReduceSum': 'ReduceSum',
-        'ReduceMean': 'ReduceMean',
-        'LeakyRelu': 'LeakyRelu',
-        'GlobalAveragePool': 'GlobalAveragePool',
-        'Squeeze': 'Squeeze',
+        # common op
+        'Relu': 'ReLU',
+        'Sigmoid': 'Sigmoid',
+        'Add': 'Add',
+        'MatMul': 'Matmul',
+        'Sum': 'Sum',
+        'Cos': 'Cos',
+        'Cosh': 'Cosh',
+        'Sin': 'Sin',
+        'Sinh': 'Sinh',
+        'Tan': 'Tan',
+        'Tanh': 'Tanh',
+        'Acos': 'Acos',
+        'Acosh': 'Acosh',
+        'Asin': 'Asin',
+        'Asinh': 'Asinh',
+        'Atan': 'Atan',
+        'Atanh': 'Atanh',
+        'Equal': 'Equal',
+        'Less': 'Less',
+        'Sign': 'Sign',
+        'Div': 'Div',
+        'Sub': 'Sub',
+        'Sqrt': 'Sqrt',
+        'Log': 'Log',
+        'Greater': 'Greater',
+        'Identity': 'Identity',
+        'Softplus': 'SoftPlus',
+        'Softsign': 'SoftSign',
+        'Mean': 'Mean',
+        'Pow': 'Pow',
+        'PRelu': 'PRelu',
+        'Mul': 'Mul',
+        'Max': 'Max',
+        'Min': 'Min',
+        'Shape': 'Shape',
+        'And': 'And',
+        'Or': 'Or',
+        'Xor': 'Xor',
+        'Not': 'Not',
+        'Neg': 'Negative',
+        'Reciprocal': 'Reciprocal',
         'Unsqueeze': 'Unsqueeze',
-        'Slice': 'Slice',
+        'NonZero': 'NonZero',
         'Ceil': 'Ceil',
-        'Split': 'Split',
-        'Gather': 'Gather',
-        'Tile': 'Tile',
-        'NonZero': 'nonzero',
+        # special op
         'Cast': 'Cast',
+        'Split': 'Split',
+        'Squeeze': 'Squeeze',
+        'GlobalAveragePool': 'GlobalAveragePool',
+        'LeakyRelu': 'LeakyRelu',
+        'ReduceSum': 'ReduceSum',
+        'ReduceMean': 'ReduceMean',
+        'Dropout': 'Dropout',
+        'ConstantOfShape': 'ConstantOfShape',
+        'Transpose': 'Transpose',
+        'HardSigmoid': 'HardSigmoid',
+        'Elu': 'Elu',
+        'Selu': 'SeLU',
+        'Concat': 'Concat',
+        'Softmax': 'SoftMax',
+        'Flatten': 'Flatten',
         'OneHot': 'OneHot',
+        'Tile': 'Tile',
+        'Gather': 'Gather',
+        'Reshape': 'Reshape',
+        'Slice': 'Slice',
+        'Clip': 'Clip',
+        'Gemm': 'layer.Gemm',  # layer
+        'BatchNormalization': 'layer.BatchNorm2d',  # layer
+        'Conv': 'layer.Conv2d',  # layer
+        'MaxPool': 'layer.Pooling2d',  # layer
+        'AveragePool': 'layer.Pooling2d',  # layer
     }
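
A minimal sketch (assuming the singa.autograd, singa.layer and singa.sonnx modules are importable) of how a renamed entry is resolved: plain names are looked up in autograd, while entries prefixed with 'layer.' are looked up in layer, mirroring the lookup in _onnx_node_to_singa_op further down.

    from singa import autograd, layer, sonnx

    renamed_op = sonnx.SingaBackend._rename_operators['Conv']  # 'layer.Conv2d'
    if renamed_op.startswith('layer.'):
        op_class = getattr(layer, renamed_op[6:])   # resolves to layer.Conv2d
    else:
        op_class = getattr(autograd, renamed_op)    # e.g. autograd.ReLU for 'Relu'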
 
     # this dict indicates the operators that need extra handle
     # each indicates a function name
     _special_operators = {
-        'Conv': '_create_conv',
-        'MaxPool': '_create_max_avg_pool',
-        'AveragePool': '_create_max_avg_pool',
-        'BatchNormalization': '_create_batchnorm',
+        'Cast': '_create_cast',
+        'Split': '_create_split',
+        'Squeeze': '_create_squeeze_unsqueeze',
+        'Unsqueeze': '_create_squeeze_unsqueeze',
+        'GlobalAveragePool': '_create_global_average_pool',
+        'LeakyRelu': '_create_leakyrelu',
+        'ReduceSum': '_create_reduce_ops',
+        'ReduceMean': '_create_reduce_ops',
+        'Dropout': '_create_dropout',
+        'ConstantOfShape': '_create_constant_of_shape',
+        'Transpose': '_create_transpose',
+        'HardSigmoid': '_create_hardsigmoid',
+        'Elu': '_create_elu',
+        'Selu': '_create_selu',
         'Concat': '_create_concat',
-        'Flatten': '_create_flatten',
+        'Softmax': '_create_softmax',
         'Gemm': '_create_gemm',
+        'Flatten': '_create_flatten',
+        'OneHot': '_create_onehot',
+        'Tile': '_create_tile',
+        'Gather': '_create_gather',
         'Reshape': '_create_reshape',
-        'Softmax': '_create_softmax',
-        'Selu': '_create_selu',
-        'Elu': '_create_elu',
-        'HardSigmoid': '_create_hardsigmoid',
-        'Clip': '_create_clip',
-        'Transpose': '_create_transpose',
-        'ConstantOfShape': '_create_constantOfShape',
-        'Dropout': '_create_dropout',
-        'ReduceSum': '_create_reduceOp',
-        'ReduceMean': '_create_reduceOp',
-        'LeakyRelu': '_create_leakyrelu',
-        'GlobalAveragePool': '_create_globalaveragepool',
-        'Squeeze': '_create_squeeze',
-        'Unsqueeze': '_create_squeeze',
         'Slice': '_create_slice',
-        'Split': '_create_split',
-        'Gather': '_create_gather',
-        'Tile': '_create_tile',
-        'Cast': '_create_cast',
-        'OneHot': '_create_onehot',
-        'Constant': "_create_constant"
+        'Clip': '_create_clip',
+        'BatchNormalization': '_create_batch_norm',
+        'Conv': '_create_conv',
+        'MaxPool': '_create_max_avg_pool',
+        'AveragePool': '_create_max_avg_pool',
     }
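
For operators listed in _special_operators, the backend calls the named classmethod to read the node's attributes before instantiating the class; all other operators are constructed with no arguments. A minimal sketch of that dispatch (assuming onnx_node is an OnnxNode wrapper and op_class was resolved as in the sketch above):

    op_type = onnx_node.op_type                          # e.g. 'Conv'
    if op_type in sonnx.SingaBackend._special_operators:
        name = sonnx.SingaBackend._special_operators[op_type]
        translator = getattr(sonnx.SingaBackend, name)
        op = translator(onnx_node, op_class)             # e.g. _create_conv
    else:
        op = op_class()                                  # e.g. autograd.ReLU()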
 
     @classmethod
-    def _create_constant(cls, onnx_node, inputs, opset_version):
-        """
-        parse onnx constatn node to weights
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        tmp_tensor = onnx_node.getattr('value')
-        np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[tmp_tensor.data_type]
-        np_tensor = np.frombuffer(tmp_tensor.raw_data, dtype=np_dtype)
-        if np_tensor.dtype == "int64":
-            np_tensor = np_tensor.astype(np.int32)
-        # todo, we cannot support scalar tensor
-        if np.ndim(np_tensor) == 0:
-            np_tensor = np.array(np_tensor, ndmin=1)
-        return None, np_tensor
-
-    @classmethod
-    def _create_onehot(cls, onnx_node, inputs, opset_version):
-        """
-        get the OneHot operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        axis = onnx_node.getattr("axis", -1)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        depth = tensor.to_numpy(inputs.pop(1)).astype(np.int32)
-        value = tensor.to_numpy(inputs.pop(1))
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, depth, value)
-
-    @classmethod
-    def _create_cast(cls, onnx_node, inputs, opset_version):
+    def _create_cast(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Cast operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        to = onnx_node.getattr("to")
-        # singa only supports float32 and int32
-        map_dict = {
-            TensorProto.FLOAT: tensor.float32,  # FLOAT to float32
-            TensorProto.UINT8: None,  # UINT8
-            TensorProto.INT8: tensor.int32,  # INT8 to int32
-            TensorProto.UINT16: None,  # UINT16
-            TensorProto.INT16: tensor.int32,  # INT16 to int32
-            TensorProto.INT32: tensor.int32,  # INT32 to int32
-            TensorProto.INT64: tensor.int32,  # INT64 to int32
-            TensorProto.STRING: None,  # stirng
-            TensorProto.BOOL: None,  # bool
-        }
-        to = map_dict[to]
-        assert to != None, "not support cast type: {}".format(to)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(to)
-
-    @classmethod
-    def _create_tile(cls, onnx_node, inputs, opset_version):
-        """
-        get the Tile operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        repeats = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(repeats)
-
-    @classmethod
-    def _create_gather(cls, onnx_node, inputs, opset_version):
-        """
-        get the Gather operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        axis = onnx_node.getattr("axis", 0)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        indices = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, indices)
+        to_type = onnx_type_to_singa_type(onnx_node.getattr("to"))
+        assert to_type is not None, "unsupported cast type: {}".format(to_type)
+        if to_type == np.dtype('float32'):
+            return operator(tensor.float32)
+        else:
+            return operator(tensor.int32)
 
     @classmethod
-    def _create_split(cls, onnx_node, inputs, opset_version):
+    def _create_split(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Split operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axis = onnx_node.getattr("axis", 0)
         split = onnx_node.getattr("split", None)
         num_output = len(onnx_node.outputs)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, split, num_output)
-
-    @classmethod
-    def _create_slice(cls, onnx_node, inputs, opset_version):
-        """
-        get the Slice operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        starts = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        ends = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        # sometime onnx may ignore these two inputs, axes and step
-        if len(inputs) >= 2 and onnx_node.inputs[3] != '':
-            axes = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        else:
-            axes = None
-        steps = tensor.to_numpy(inputs.pop(1)).astype(
-            np.int32).tolist() if len(inputs) >= 2 else None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(starts, ends, axes, steps)
+        return operator(axis, split, num_output)
 
     @classmethod
-    def _create_squeeze(cls, onnx_node, inputs, opset_version):
+    def _create_squeeze_unsqueeze(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the Squeeze and Unsqueeze operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes")
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes)
+        return operator(axes)
 
     @classmethod
-    def _create_globalaveragepool(cls, onnx_node, inputs, opset_version):
+    def _create_global_average_pool(cls,
+                                    onnx_node,
+                                    operator,
+                                    opset_version=_opset_version):
         """
         get the GlobalAveragePool operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         data_format = onnx_node.getattr("data_format", 'channels_first')
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(data_format)
+        return operator(data_format)
 
     @classmethod
-    def _create_leakyrelu(cls, onnx_node, inputs, opset_version):
+    def _create_leakyrelu(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the LeakyRelu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.01)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_reduceOp(cls, onnx_node, inputs, opset_version):
+    def _create_reduce_ops(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
         """
         get the ReduceSum, ReduceMean, ReduceMax, ReduceMin, etc, operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes", None)
         keepdims = onnx_node.getattr("keepdims", 1)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes, keepdims)
+        return operator(axes, keepdims)
 
     @classmethod
-    def _create_dropout(cls, onnx_node, inputs, opset_version):
+    def _create_dropout(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Dropout operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         ratio = onnx_node.getattr("ratio", 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(ratio)
+        return operator(ratio)
 
     @classmethod
-    def _create_constantOfShape(cls, onnx_node, inputs, opset_version):
+    def _create_constant_of_shape(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the ConstantOfShape operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         value = onnx_node.getattr("value", 0)
         if isinstance(value, onnx.TensorProto):
             value = numpy_helper.to_array(value)[0].item()
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(value)
+        return operator(value)
 
     @classmethod
-    def _create_transpose(cls, onnx_node, inputs, opset_version):
+    def _create_transpose(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the Transpose operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        shape = inputs[0].shape
-        perm = onnx_node.getattr("perm", list(range(len(shape) - 1, -1, -1)))
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(perm)
-
-    @classmethod
-    def _create_clip(cls, onnx_node, inputs, opset_version):
-        """
-        get the clip operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        # sometime onnx may ignore these two inputs, min or max or both
-        if len(inputs) >= 2 and onnx_node.inputs[1] != '':
-            min_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            min_v = None
-        if len(inputs) >= 2 and onnx_node.inputs[2] != '':
-            max_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            max_v = None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(min_v, max_v)
+        perm = onnx_node.getattr("perm")
+        return operator(perm)
 
     @classmethod
-    def _create_hardsigmoid(cls, onnx_node, inputs, opset_version):
+    def _create_hardsigmoid(cls,
+                            onnx_node,
+                            operator,
+                            opset_version=_opset_version):
         """
-        get the HardSigmoid operator from onnx node
+        get the hardsigmoid operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.2)
         beta = onnx_node.getattr("beta", 0.5)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, beta)
+        return operator(alpha, beta)
 
     @classmethod
-    def _create_elu(cls, onnx_node, inputs, opset_version):
+    def _create_elu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the elu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_selu(cls, onnx_node, inputs, opset_version):
+    def _create_selu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the selu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.67326)
         gamma = onnx_node.getattr("gamma", 1.0507)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, gamma)
+        return operator(alpha, gamma)
 
     @classmethod
-    def _create_reshape(cls, onnx_node, inputs, opset_version):
+    def _create_concat(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the reshape operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the concat operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
-        Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        shape = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(shape)
+        factor = onnx_node.getattr('axis')
+        return operator(axis=factor)
 
     @classmethod
-    def _create_conv(cls, onnx_node, inputs, opset_version):
+    def _create_softmax(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the conv operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the softmax operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support dilation
-        dilation = onnx_node.getattr('dilations', 1)
-        if dilation != 1 and list(dilation) != [1, 1]:
-            raise ValueError("Not implemented yet for dilation")
-        group = onnx_node.getattr('group', 1)
-
-        # only support 1d or 2d
-        if len(kernel) > 2:
-            raise ValueError("Only implemented for 1d or 2d")
-
-        bias = len(inputs) == 3
-        x = inputs[0]
-        x_shape = inputs[0].shape
-        in_channels = x_shape[1]
-        w_shape = inputs[1].shape
-        out_channels = w_shape[0]
-        assert w_shape[1] == in_channels // group
-
-        if inputs[0].device.id() == -1:
-            if group != 1:
-                raise NotImplementedError
-            else:
-                handle = singa.ConvHandle(x.data, kernel, stride, padding,
-                                          in_channels, out_channels, bias,
-                                          group)
-        else:
-            handle = singa.CudnnConvHandle(x.data, kernel, stride, padding,
-                                           in_channels, out_channels, bias,
-                                           group)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_max_avg_pool(cls, onnx_node, inputs, opset_version):
+    def _create_gemm(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the max or avg pool operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the gemm operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support count_include_pad and auto_pad
-        if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:
-            raise ValueError(
-                "Not implemented yet for count_include_pad or ceil_mode")
-
-        # only support 2d
-        if len(kernel) != 2:
-            raise ValueError("Not implemented yet")
-
-        is_max = onnx_node.op_type == 'MaxPool'
-        x = inputs[0]
-        if x.device.id() == -1:
-            handle = singa.PoolingHandle(x.data, kernel, stride, padding,
-                                         is_max)
-        else:
-            handle = singa.CudnnPoolingHandle(x.data, kernel, stride, padding,
-                                              is_max)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        alpha = onnx_node.getattr('alpha', 1.)
+        beta = onnx_node.getattr('beta', 1.)
+        transA = onnx_node.getattr('transA', 0)
+        transB = onnx_node.getattr('transB', 0)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        alpha=alpha,
+                        beta=beta,
+                        transA=transA,
+                        transB=transB,
+                        bias=bias)
 
     @classmethod
-    def _create_batchnorm(cls, onnx_node, inputs, opset_version):
+    def _create_flatten(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the batch norm operator from onnx node
-        Args:onnx_node: a given onnx node
-        Args:inputs: the input tensor
-        Args:opset_version: the opset version
-        Returns: the handle of singa operator
-        Returns: the autograd of singa operator
+        get the flatten operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
         """
-        x = inputs[0]
-        factor = onnx_node.getattr('momentum', 0.9)
-        if x.device.id() == -1:
-            handle = singa.BatchNormHandle(factor, x.data)
-        else:
-            handle = singa.CudnnBatchNormHandle(factor, x.data)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return handle, forward
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_concat(cls, onnx_node, inputs, opset_version):
+    def _create_onehot(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the OneHot operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.attrs["axis"]
-        if factor < 0:
-            factor = len(inputs[0].shape
-                        ) + factor  # in order to support the negative axis
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        axis = onnx_node.getattr("axis", -1)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'depth')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'values')
+        return operator(axis, None, None)
 
     @classmethod
-    def _create_softmax(cls, onnx_node, inputs, opset_version):
+    def _create_tile(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the Tile operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'repeats')
+        return operator(None)
 
     @classmethod
-    def _create_gemm(cls, onnx_node, inputs, opset_version):
+    def _create_gather(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the gemm operator from onnx node
-        Args:
-            onnx_node: a given onnx node
+        get the Gather operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        axis = onnx_node.getattr("axis", 0)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'indices')
+        return operator(axis, None)
+
+    @classmethod
+    def _create_reshape(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the reshape operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'shape')
+        return operator(None)
+
+    @classmethod
+    def _create_slice(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the Slice operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        x = inputs[0]
-        alpha = onnx_node.getattr('alpha', 1.)
-        beta = onnx_node.getattr('beta', 1.)
-        transA = onnx_node.getattr('transA', 0)
-        transB = onnx_node.getattr('transB', 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(alpha=alpha,
-                             beta=beta,
-                             transA=transA,
-                             transB=transB)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'starts')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'ends')
+        if len(onnx_node.inputs) >= 4 and onnx_node.inputs[3] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[3], 'axes')
+        if len(onnx_node.inputs) == 5 and onnx_node.inputs[4] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[4], 'steps')
+        return operator(None, None, None, None)
 
     @classmethod
-    def _create_flatten(cls, onnx_node, inputs, opset_version):
+    def _create_clip(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the flatten operator from onnx node
+        get the clip operator from onnx node
         Args:
-            onnx_node: a given onnx node
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        if len(onnx_node.inputs) >= 2 and onnx_node.inputs[1] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[1], 'min')
+        if len(onnx_node.inputs) == 3 and onnx_node.inputs[2] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[2], 'max')
+        return operator(None, None)
+
+    @classmethod
+    def _create_batch_norm(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
+        """
+        get the batch norm operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        factor = onnx_node.getattr('momentum', 0.9)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'scale')
+        onnx_node.set_weight_inputs(onnx_node.inputs[2], 'bias')
+        onnx_node.set_weight_inputs(onnx_node.inputs[3], 'running_mean')
+        onnx_node.set_weight_inputs(onnx_node.inputs[4], 'running_var')
+        return operator(factor)
+
+    @classmethod
+    def _create_conv(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the conv operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
+
+        # dilation other than 1 is not supported
+        dilation = onnx_node.getattr('dilations', 1)
+        if dilation != 1 and list(dilation) != [1, 1]:
+            raise ValueError("Not implemented yet for dilation")
+        group = onnx_node.getattr('group', 1)
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        kernel_size,
+                        stride=stride,
+                        padding=padding,
+                        dilation=dilation,
+                        group=group,
+                        bias=bias,
+                        pad_mode=auto_pad)
+
+    @classmethod
+    def _create_max_avg_pool(cls,
+                             onnx_node,
+                             operator,
+                             opset_version=_opset_version):
+        """
+        get the max or avg pool operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
 
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        # count_include_pad and ceil_mode are not supported
+        ceil_mode = onnx_node.getattr('ceil_mode', 0)
+        count_include_pad = onnx_node.getattr('count_include_pad', 0)
+        if ceil_mode != 0 or count_include_pad != 0:
+            raise ValueError(
+                "Not implemented yet for count_include_pad or ceil_mode")
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        is_max = onnx_node.op_type == 'MaxPool'
+        return operator(kernel_size, stride, padding, is_max, auto_pad)
 
     @classmethod
-    def _common_onnx_node_to_singa_op(cls, onnx_node, inputs, opset_version):
+    def _onnx_constant_to_np(cls, onnx_node, opset_version):
         """
-        get a common singa operator(only autograd) from a onnx node
-        other special operators also can call this func to get autograd
-        Args:
-            onnx_node: a given onnx node
+        parse onnx constant node to numpy array
         Args:
-            tensor_map: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            a dict of tensors
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            a numpy ndarray
         """
-        onnx_op_type = onnx_node.op_type
-        assert onnx_op_type in cls._rename_operators, "not support operator: {}".format(
-            onnx_op_type)
-        autograd_op = getattr(autograd, cls._rename_operators[onnx_op_type])
-        return None, autograd_op
+        onnx_tensor = onnx_node.getattr('value')
+        np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_tensor.data_type]
+        np_tensor = np.frombuffer(onnx_tensor.raw_data, dtype=np_dtype)
+        return tensor.from_numpy(np_tensor)
 
     @classmethod
-    def _onnx_node_to_singa_op(cls,
-                               onnx_node,
-                               inputs,
-                               opset_version=_known_opset_version):
+    def _onnx_node_to_singa_op(cls, onnx_node, opset_version=_opset_version):
         """
-        get a singa operator(handle and autograd) from a onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input list
+        get singa operator from a onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a dict of tensors
-        Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            singa operator instance
         """
+        onnx_op_type = onnx_node.op_type
+        assert onnx_op_type in cls._rename_operators, "unsupported operator: {}".format(
+            onnx_op_type)
+        renamed_op = cls._rename_operators[onnx_op_type]
+        if renamed_op.startswith('layer.'):
+            op_class = getattr(layer, renamed_op[6:])
+        else:
+            op_class = getattr(autograd, renamed_op)
         if onnx_node.op_type in cls._special_operators:
             translator = getattr(cls, cls._special_operators[onnx_node.op_type])
+            op = translator(onnx_node, op_class, opset_version)
         else:
-            translator = cls._common_onnx_node_to_singa_op
-        return translator(onnx_node, inputs, opset_version)
+            op = op_class()
+        # refine the ONNXNode
+        onnx_node.inputs = [inp for inp in onnx_node.inputs if inp != '']
+        return op
 
     @classmethod
-    def run_node(cls, onnx_node, inputs, opset_version=_known_opset_version):
+    def run_node(cls, node, inputs, device='CPU', opset_version=_opset_version):
         """
         run a single singa operator from a onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the 
+            node (NodeProto): a given onnx node
+            inputs (ndarray[]): a list of numpy ndarray
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
+        Returns:
+            list, the output
         """
-        valid_inputs = [x for x in onnx_node.inputs if x != ""]
+        node = OnnxNode(node)
+        valid_inputs = [x for x in node.inputs if x != ""]
         assert len(valid_inputs) == len(
-            inputs), "{}: expected {} but got {}".format(
-                onnx_node.op_type, len(valid_inputs), len(inputs))
-
-        tmp_inputs = [inputs[x] for x in onnx_node.inputs if x != ""]
-        handle, forward = cls._onnx_node_to_singa_op(onnx_node, tmp_inputs,
-                                                     opset_version)
-        # only give the inputs it needs
-        # consumed_inputs are the inputs marked as attributes
-        # so we remove it here
-        tmp_inputs = [
-            inputs[x]
-            for x in onnx_node.inputs
-            if x not in onnx_node.consumed_inputs
-        ]
-        return cls._run_node(onnx_node, tmp_inputs, handle, forward,
-                             opset_version)
+            inputs), "{}: expected {} inputs, but got {}. ".format(
+                node.op_type, len(valid_inputs), len(inputs))
+
+        operator = cls._onnx_node_to_singa_op(node, opset_version)
+        # separate weights from inputs, and init inputs as Tensor
+        weights = {}
+        _inputs = []
+        for (key, val) in zip(valid_inputs, inputs):
+            val = val.astype(onnx_type_to_singa_type(val.dtype))
+            if key in node.weight_inputs:
+                weights[key] = val
+            else:
+                x = tensor.from_numpy(val)
+                if device != 'CPU':
+                    assert singa.USE_CUDA, "Your SINGA doesn't compile GPU module."
+                    dev = device.create_cuda_gpu(set_default=False)
+                else:
+                    dev = device.get_default_device()
+                x.to_device(dev)
+                _inputs.append(x)
+        inputs = _inputs
+        # set params
+        params = {}
+        for key, name in node.weight_inputs.items():
+            params[name] = weights[key]
+        operator.set_params(params)
+        outputs = cls._run_node(operator, inputs)
+        outputs_dict = OrderedDict()
+        for (key, val) in zip(node.outputs, outputs):
+            outputs_dict[key] = val
+        return outputs_dict
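
A hedged usage sketch of run_node with a single ReLU node (assuming a CPU build of SINGA; the tensor names 'x' and 'y' are illustrative, and the result is an OrderedDict keyed by the node's output names):

    import numpy as np
    import onnx
    from singa import sonnx, tensor

    node = onnx.helper.make_node('Relu', inputs=['x'], outputs=['y'])
    x = np.array([[-1.0, 2.0]], dtype=np.float32)
    out = sonnx.SingaBackend.run_node(node, [x])   # runs on CPU by default
    y = tensor.to_numpy(out['y'])                  # expected [[0., 2.]]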
 
     @classmethod
-    def _run_node(cls,
-                  onnx_node,
-                  inputs,
-                  handle,
-                  forward,
-                  opset_version=_known_opset_version):
+    def _run_node(cls, operator, inputs):
         """
-        run a single singa operator from a onnx node
-        Args:inputs: 
-            the input tensor
-        Args:handle: 
-            the handle of singa operator
-        Args:forward: 
-            the forward of singa operator
+        run a single singa operator from singa operator
         Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the
+            operator (Operator): the Operator instance
+            inputs (Tensor[]): a list of SINGA Tensor
+        Returns:
+            list, the output
         """
-        outputs = forward(*inputs) if handle is None else forward(
-            handle, *inputs)
+        outputs = operator(*inputs)
         if not isinstance(outputs, collections.Iterable):
             outputs = [outputs]
-        outputs_dict = OrderedDict()
-        for (key, val) in zip(onnx_node.outputs, outputs):
-            outputs_dict[key] = val
-        return outputs_dict
+        return outputs
 
     @classmethod
-    def _init_graph_parameter(cls, graph, init_inputs, device):
+    def _parse_graph_params(cls, graph, device):
         """
-        init the singa tensor from onnx infos
+        parse the parameters from onnx graph
         Args:
-            graph: a given onnx graph
-        Args:
-            init_inputs: a list of inputs, which used to init the operators
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
+        Returns:
+            a dict of numpy ndarray
+        """
+        params = {}
+        for tp in graph.initializer:
+            val = numpy_helper.to_array(tp)
+            val = val.astype(onnx_type_to_singa_type(tp.data_type))
+            params[tp.name] = val
+        return params
+
+    @classmethod
+    def _parse_graph_inputs_outputs(cls, graph, params, device):
+        """
+        parse the inputs and outputs from the onnx graph
         Args:
-            device: the used device
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
         Returns:
-            a dict of tensors
+            a list of input info (name, dtype, shape)
+            a list of output info (name, dtype, shape)
         """
-        tensor_map = {}
-        # due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info
-        # sometimes, may not
-        all_inputs = OrderedDict()
+        inputs = []
+        outputs = []
+        info_tuple = namedtuple('info_tuple', ['name', 'dtype', 'shape'])
         for t in graph.input:
-            all_inputs[t.name] = t
-        # so we refresh the input by the initializer
-        for t in graph.initializer:
-            all_inputs[t.name] = t
-        initializers = {t.name for t in graph.initializer}
-        inp_idx = 0
-        for name, x in all_inputs.items():
-            if name in initializers:
-                # if it has initializer, we use its value as the input
-                np_tensor = numpy_helper.to_array(x)
-                if np_tensor.dtype == "int64":
-                    np_tensor = np_tensor.astype(np.int32)
-                # todo, we cannot support scalar tensor
-                if np.ndim(np_tensor) == 0:
-                    np_tensor = np.array(np_tensor, ndmin=1)
-            else:
-                # if not, means it's a input rather than a inner weight
-                # so if the user gives values, we use these values
-                # if not, we just use the shape of input gived by onnx to init a random value
-                # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-                # so if have operators, the user must give inputs
-                x_shape = tuple(
-                    dim.dim_value for dim in x.type.tensor_type.shape.dim)
-                if init_inputs is not None:
-                    np_tensor = init_inputs[inp_idx]
-                    inp_idx += 1
-                else:
-                    np_tensor = np.random.randn(*x_shape).astype(np.float32)
-            tmp_tensor = tensor.from_numpy(np_tensor)
-            tmp_tensor.to_device(device)
-            # todo, for backward
-            tmp_tensor.stores_grad = (name in initializers)
-            tensor_map[x.name] = tmp_tensor
-        return tensor_map
+            if t.name not in params:
+                dtype = t.type.tensor_type.elem_type
+                shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+                inputs.extend([info_tuple(t.name, dtype, shape)])
+        for t in graph.output:
+            dtype = t.type.tensor_type.elem_type
+            shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+            outputs.extend([info_tuple(t.name, dtype, shape)])
+        return inputs, outputs
 
     @classmethod
-    def _onnx_model_to_singa_net(cls, model, init_inputs, device,
-                                 opset_version):
+    def _onnx_model_to_singa_ops(cls, graph, device, opset_version):
         """
-        get all intermediate tensors and operators from onnx model
-        Args:
-            model: a given onnx model
+        get all intermediate params, operators, and input info from onnx model
         Args:
-            init_inputs: a list of inputs, which used to init the operators
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns:
-            a dict of tensors
+            graph (Graph): the loaded ONNX graph
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
         Returns:
-            a list of SingaOps('name', 'op', 'handle', 'forward')
-        """
-        # init all tensor input and weight as a tensor map
-        tensor_map = cls._init_graph_parameter(model.graph, init_inputs, device)
-        # only weights tensor
-        weights = {x.name: tensor_map[x.name] for x in model.graph.initializer}
+            a dict of params (numpy ndarray)
+            a list of input info (name, dtype, shape)
+            a list of output info (name, dtype, shape)
+            a list of operator_tuple('node', 'operator')
+        """
+        # init all tensor input and params as a tensor map
+        params = cls._parse_graph_params(graph, device)
+        inputs, outputs = cls._parse_graph_inputs_outputs(graph, params, device)
         # the parsed operators queue
-        singa_ops = []
-        singa_op = namedtuple('SingaOps', ['name', 'op', 'handle', 'forward'])
-        for node in model.graph.node:
+        operators = []
+        operator_tuple = namedtuple('operator_tuple', ['node', 'operator'])
+        for node in graph.node:
             node = OnnxNode(node)
-            # only give the inputs it needs
-            # consumed_inputs are the inputs marked as attributes
-            # so we remove it here
-            inputs = [
-                tensor_map[x]
-                for x in node.inputs
-                if x not in node.consumed_inputs
-            ]
-            handle, forward = cls._onnx_node_to_singa_op(
-                node, inputs, opset_version)
-            # if it is Constant, we hanlde it as a weight
-            # otherwise, we run it and add its output into map for being used by later operators
+            # convert Constant to param
             if node.op_type == 'Constant':
-                tmp_tensor = tensor.from_numpy(forward)
-                tmp_tensor.to_device(device)
-                tmp_name = node.outputs.pop(0)
-                weights[tmp_name] = tmp_tensor
-                tensor_map[tmp_name] = tmp_tensor
+                params[node.outputs[0]] = cls._onnx_constant_to_np(node)
             else:
-                outputs = cls._run_node(node, inputs, handle, forward)
-                for key, val in outputs.items():
-                    tensor_map[key] = val
-                singa_ops.extend([singa_op(node.name, node, handle, forward)])
-        return weights, singa_ops
+                op = cls._onnx_node_to_singa_op(node, opset_version)
+                operators.extend([operator_tuple(node, op)])
+        return params, inputs, outputs, operators
 
     @classmethod
-    def prepare(cls, model, device, **kwargs):
+    def prepare(cls, model, device='CPU', **kwargs):
         """
-        get the batch norm operator from onnx node
-        Args:
-            model: a given onnx node
+        parse the ONNX model and create the layers
         Args:
-            device: the used device
-        Returns: 
-            a list of output values
+            model (ModelProto): the loaded ONNX model
+            device (string): CPU or CUDA
+        Returns:
+            a SingaRep instance that stores the layers and weights
         """
         super(SingaBackend, cls).prepare(model, device, **kwargs)
-        # when parsing graph, we use the shape of input gived by onnx to init a random value
-        # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-        # so if have operators, the user must give inputs
-        init_inputs = kwargs.get("init_inputs", None)
-        # whether initializers are moved into inputs, due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info, sometimes, may not
-        cls.keep_initializers_as_inputs = kwargs.get(
-            'keep_initializers_as_inputs', True)
         # optimize and infer the shape of the model
         try:
             model = onnx.utils.polish_model(model)
         except IndexError as err:
-            # due to https://github.com/onnx/onnx/issues/2417
             model = onnx.shape_inference.infer_shapes(model)
 
         # check the opset version and ir version
+        # SINGA supports opset version (11) and ir version (1.6.0 -> 6)
         opset_version = None
         for imp in model.opset_import:
             if not imp.HasField("domain") or imp.domain == "":
                 opset_version = imp.version
-                if imp.version > cls._known_opset_version:
+                if imp.version > cls._opset_version:
                     warnings.warn(
-                        "This version of singa targets ONNX operator set version {}, but the model we are trying to import uses version {}.  We will try to import it anyway, but if the model uses operators which had BC-breaking changes in the intervening versions, import will fail."
-                        .format(cls._known_opset_version, imp.version))
+                        "The imported opertor set verion {} is larger than the supported version {}."
+                        .format(imp.version, cls._opset_version))
             else:
                 warnings.warn("Unrecognized operator set {}".format(imp.domain))
-        if opset_version is None:
-            if model.ir_version >= 0x00000003:
-                raise RuntimeError(
-                    "Model with IR version >= 3 did not specify ONNX operator set version (singa requires it)"
-                )
-            else:
-                opset_version = 1
-        weights, singa_ops = cls._onnx_model_to_singa_net(
-            model, init_inputs, device, opset_version)
-        return SingaRep(model, weights, singa_ops,
-                        cls.keep_initializers_as_inputs)
+
+        if model.ir_version > cls._ir_version:
+            warnings.warn(
+                "The imported ir verion {} is larger than the supported version {}."
+                .format(cls._ir_version, imp.version))
+
+        graph = model.graph
+        params, inputs, outputs, layers = cls._onnx_model_to_singa_ops(
+            graph, device, opset_version)
+        return SingaRep(params, inputs, outputs, layers, device)
 
 
 class SingaRep(BackendRep):
 
-    def __init__(self,
-                 model,
-                 weights,
-                 singa_ops,
-                 keep_initializers_as_inputs=True):
+    def __init__(self, params, inputs, outputs, layers, device):
         """
+        https://github.com/onnx/onnx/blob/master/docs/ImplementingAnOnnxBackend.md
         SingaRep provides the intermediate representation of SINGA;
         the user can run the forward pass of the singa model via the run
         function, or append more layers after the singa_ops to do
         transfer learning
         Args:
-            model: a given operator
+            params (dict{}): a dict of params, data type is numpy ndarray
+            inputs (ValueInfo[]): a list of ValueInfo for the graph inputs
+            outputs (ValueInfo[]): a list of ValueInfo for the graph outputs
+            layers (namedtuple('operator_tuple', ['node', 'operator'])[]): a list of singa operators
+            device (string): CPU or CUDA
+        """
+        super(SingaRep, self).__init__()
+        self.inputs = inputs
+        self.states = params
+        self.outputs = outputs
+        self.dev = cpu_dev if device == "CPU" else gpu_dev
+        self.layers = layers
+        self.tensor_count = {}
+        self.has_initialized = False
+        self.is_graph = False
+
+    def initialize(self):
+        """
+        Init the instance
+        """
+        self.outputs_info = {outp.name: outp for outp in self.outputs}
+        _layers = []  # layers by topo order
+        for node, operator in self.layers:
+            for key, name in node.weight_inputs.items():
+                if key not in self.states:
+                    # cannot find the weights, try to find it from input
+                    node.set_attr_inputs(key, name)
+            self.__dict__[node.name] = operator
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+            _layers.append(node)
+        self._layers = _layers
+
+    def init_tensor_count(self):
+        """
+        Init the tensor count dict
+        """
+        self.tensor_count = {}
+        for node, operator in self.layers:
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+
+    def to_input_tensor(self, x):
+        """
+        convert the input to tensors
         Args:
-            weights: the tensor of weights
+            x (np.ndarray[]): a list of numpy ndarray as inputs
+        Returns: 
+            a dict of SINGA Tensors
+        """
+        tensor_dict = {}
+        # init inputs as Tensor
+        for (key, val) in zip(self.inputs, x):
+            if not self.is_graph:
+                val = val.astype(onnx_type_to_singa_type(key.dtype))
+                # todo, scalar
+                val = np.atleast_1d(val)
+                val = tensor.from_numpy(val)
+                val.to_device(self.dev)
+            tensor_dict[key.name] = val
+        return tensor_dict
+
+    def to_output_tensor(self, y, out_name):
+        """
+        convert the output tensor to a numpy ndarray
         Args:
-            singa_ops: the tensor of the operator
+            y (Tensor): the output SINGA Tensor
+            out_name (str): the name of the output
+        Returns: 
+            a numpy ndarray cast to the declared output dtype
         """
-        super(SingaRep, self).__init__()
-        self.model = model
-        self.tensor_map = weights
-        self.keep_initializers_as_inputs = keep_initializers_as_inputs
-        # this each item of singa_ops is: ('name', 'op', 'handle', 'forward')
-        # the name is a string, op is OnnxNode,
-        # handle is Singa handle to store the tensor into singa operator
-        # the forward is singa autograd operator
-        self.singa_ops = singa_ops
+        if not self.is_graph:
+            y = tensor.to_numpy(y)
+            if out_name in self.outputs_info:
+                np_dtyp = mapping.TENSOR_TYPE_TO_NP_TYPE[
+                    self.outputs_info[out_name].dtype]
+                y = y.astype(np_dtyp)
+        return y
 
-    def run(self, inputs, **kwargs):
+    def get_s(self, name, node, tensor_dict):
+        """
+        get state from the node's weights or tensor_dict
+        Args:
+            name (str): name of the state
+            node (ONNXNode): ONNX node
+            tensor_dict ({}): tensor dict
+        Returns: 
+            the states
+        """
+        if name in node.attr_inputs:
+            return tensor_dict[name]
+        else:
+            return self.states[name]
+
+    def handle_special_ops(self, node, op, tensor_dict):
+        """
+        handle some special operations which need extra info from the inputs,
+        e.g. Conv needs nb_kernels, Gemm needs nb_kernels and bias_shape
+        Args:
+            node (ONNXNode): ONNX node
+            op (Operator): the singa operator instance
+            tensor_dict ({}): tensor dict
+        Returns: 
+            None, the operator is updated in place
+        """
+        # todo, hard code
+        # Conv2d nb_kernels
+        if node.op_type == "Conv":
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[0]
+        # Gemm nb_kernels and bias_shape
+        elif node.op_type == "Gemm":
+            nb_kernels_flag = 0 if op.transB == 1 else 1
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[nb_kernels_flag]
+            if op.bias:
+                shape = self.get_s(node.inputs[2], node, tensor_dict).shape
+                op.bias_shape = shape
+
+    def run(self, *x, **kwargs):
         """
         run the forward of singa model
         Args:
-            inputs: a given operator
+            x (np.ndarray[]): a list of numpy ndarray as inputs
         Returns: 
-            the onnx node
+            a list of outputs
         """
-        graph = self.model.graph
+        if not self.has_initialized:
+            self.initialize()
+            if isinstance(x[0], tensor.Tensor):
+                self.dev = x[0].device
+            self.has_initialized = True
+
+        outputs_dict = OrderedDict([(outp.name, None) for outp in self.outputs])
+
         # last_layers means we run this model until the last #N layers
-        last_layers = kwargs.get('last_layers', len(self.singa_ops))
-        if last_layers != len(self.singa_ops):
-            final_outputs = self.singa_ops[last_layers-1].op.outputs
-        else:
-            final_outputs =  [outp.name for outp in graph.output]
-        # whether return all outputs
-        all_outputs = kwargs.get('all_outputs', False)
-        # get a specific op by its name
-        op_name = kwargs.get('op_name', None)
-        # record the tensor we added from input
-        tmp_tensor_map = {name: val for name, val in self.tensor_map.items()}
-
-        # the dict will be returned
-        ret_outputs = OrderedDict()
-        if self.keep_initializers_as_inputs:
-            require_input_len = len(graph.input) - len(graph.initializer)
-            actual_input_len = len(inputs)
-        else:
-            require_input_len = len(graph.input)
-            actual_input_len = len(inputs)
-        assert require_input_len == actual_input_len, "The length of graph input is different from the tensor input: %d, %d" % (
-            require_input_len, actual_input_len)
-        # run the handle by the order of the list(the list is Topological Sorting)
-        for inp in graph.input:
-            if inp.name not in tmp_tensor_map:
-                tmp_tensor_map[inp.name] = inputs.pop(0)
-
-        for _, op, handle, forward in self.singa_ops[:last_layers]:
-            if len(op.consumed_inputs) != 0:
-                # because if op has consumed_inputs, it means it moved some inputs into attributes
-                # so when running, we should update these attributes
-                handle, forward = get_op(op,
-                                         [tmp_tensor_map[x] for x in op.inputs])
-            inputs = [
-                tmp_tensor_map[x]
-                for x in op.inputs
-                if x not in op.consumed_inputs
-            ]
-            outputs = _run_node(op, inputs, handle, forward)
-            for key, val in outputs.items():
-                tmp_tensor_map[key] = val
-                ret_outputs[key] = val
-
-        if op_name is not None:
-            if op_name in outputs:
-                return outputs[op_name]
+        last_layers = kwargs.get('last_layers', len(self._layers))
+        if last_layers != len(self._layers):
+            for outp in self._layers[last_layers - 1].outputs:
+                outputs_dict[outp] = None
+
+        aux_output = kwargs.get('aux_output', ())
+        for outp in aux_output:
+            outputs_dict[outp] = None
+
+        tensor_dict = self.to_input_tensor(x)
+        self.init_tensor_count()
+
+        # run the layer by the topo order
+        for node in self._layers[:last_layers]:
+            op = self.__dict__[node.name]
+            self.handle_special_ops(node, op, tensor_dict)
+            # make input
+            inputs = []
+            for inp in node.inputs:
+                if inp not in node.weight_inputs and inp not in node.attr_inputs:
+                    if inp in tensor_dict:
+                        inputs.append(tensor_dict[inp])
+                    elif inp in self.states:
+                        # todo, scalar
+                        val = np.atleast_1d(self.states[inp])
+                        val = tensor.from_numpy(val)
+                        val.to_device(self.dev)
+                        inputs.append(val)
+                    else:
+                        raise KeyError("Not found the input {} for operation {}".format(inp, node.name))
+            states = {}
+            if callable(getattr(op, "initialize",
+                                None)) and not op._initialized:
+                # init the operator
+                op.initialize(*inputs)
+                op._initialized = True
+                for key, name in node.weight_inputs.items():
+                    if key not in node.attr_inputs:
+                        # find the weights and not in the inputs
+                        states[name] = self.states[key]
+
+            # replace attrs by inputs
+            for key, name in node.attr_inputs.items():
+                if key in tensor_dict:
+                    ts = tensor_dict[key]
+                    if isinstance(ts, tensor.Tensor):
+                        ts = tensor.to_numpy(ts)
+                    states[name] = ts
+                elif key in self.states:
+                    states[name] = self.states[key]
+            # set states
+            if callable(getattr(op, "set_states", None)):
+                op.set_states(**states)
             else:
-                raise RuntimeError(
-                    "The op_name {} does not exist, please check. The available op_names are: {}"
-                    .format(op_name, [val for key, val in op_name.items()]))
-
-        # return all outputs if all_outputs==True
-        # else return last outputs
-        if all_outputs:
-            return ret_outputs
-        else:
-            return [ret_outputs[outp] for outp in final_outputs]
+                for key, value in states.items():
+                    setattr(op, key, value)
+            # run the node
+            outputs = _run_node(op, inputs)
+            # release the input tensor
+            for inp in node.inputs:
+                if inp in self.tensor_count:
+                    self.tensor_count[inp] -= 1
+                if self.tensor_count[inp] == 0:
+                    if inp in tensor_dict:
+                        del tensor_dict[inp]
+                    del self.tensor_count[inp]
+            # store the output
+            for (outp, val) in zip(node.outputs, outputs):
+                tensor_dict[outp] = val
+                if outp in outputs_dict:
+                    outputs_dict[outp] = self.to_output_tensor(val, outp)
+        return list(outputs_dict.values())
+
+
+class SONNXModel(model.Model):
+
+    def __init__(self, onnx_model):
+        """
+        Init a SINGA Model
+        Args:
+            onnx_model (ModelProto): a loaded onnx model
+        """
+        super(SONNXModel, self).__init__()
+        self.sg_ir = prepare(onnx_model)
+        for node, operator in self.sg_ir.layers:
+            self.__dict__[node.name] = operator

Review comment:
       No, you can check the dict `_rename_operators`: for a normal operator, it creates the operator from `autograd.py`, and for a layer operator, it creates it from `layer.py`.
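
       For reference, a minimal sketch of how that name projection could be resolved
       (the helper `resolve_operator_class` below is hypothetical; the actual lookup
       in `sonnx.py` may differ):

           from singa import autograd, layer

           # subset of the projection dict shown in the diff above
           _rename_operators = {
               'Relu': 'ReLU',          # normal operator -> autograd.py
               'Conv': 'layer.Conv2d',  # layer operator  -> layer.py
           }

           def resolve_operator_class(onnx_op_type):
               renamed = _rename_operators[onnx_op_type]
               if renamed.startswith('layer.'):
                   # layer operators are looked up in singa.layer
                   return getattr(layer, renamed.split('.', 1)[1])
               # normal operators are looked up in singa.autograd
               return getattr(autograd, renamed)

           # e.g. resolve_operator_class('Conv') -> layer.Conv2d
           #      resolve_operator_class('Relu') -> autograd.ReLU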




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636328317


   This pull request **introduces 3 alerts** and **fixes 1** when merging 7e5bf2686ad608aa963657e5273daf9d094bcbaa into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-4935c4c51e7520ead532f5ba42a7f65b45b44ec2)
   
   **new alerts:**
   
   * 3 for Unused import
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-630334827


   This pull request **introduces 10 alerts** and **fixes 13** when merging 7e47512775933cd33d980e3321d57a9f45055cae into db1846dd2c612950054f75b8125f40cd25a20f44 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-c0db4b8a230c6ca2f56d44a7219d3611ecda7122)
   
   **new alerts:**
   
   * 3 for Unused local variable
   * 3 for Unreachable code
   * 1 for Unnecessary pass
   * 1 for Unused import
   * 1 for Missing call to \_\_init\_\_ during object initialization
   * 1 for Wrong number of arguments in a class instantiation
   
   **fixed alerts:**
   
   * 8 for Missing call to \_\_init\_\_ during object initialization
   * 2 for Unreachable code
   * 1 for Unnecessary pass
   * 1 for Unused local variable
   * 1 for Mismatch between signature and use of an overridden method


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-631072077


   This pull request **introduces 5 alerts** and **fixes 1** when merging 884998226268d222b1bf2ad1e76c6774479f68e9 into db1846dd2c612950054f75b8125f40cd25a20f44 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-92ab63b680c18acf5e0a2f1ba24a0b293edcf461)
   
   **new alerts:**
   
   * 4 for Duplicate key in dict literal
   * 1 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-630023072


   This pull request **introduces 7 alerts** when merging b6a8754e048e6447ccdfd9d359692143e6c967cb into db1846dd2c612950054f75b8125f40cd25a20f44 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-f213db032a5d6a71907b8a64b0fb520467054dfc)
   
   **new alerts:**
   
   * 3 for Unused local variable
   * 3 for Unreachable code
   * 1 for Unused import


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636111115


   This pull request **introduces 15 alerts** and **fixes 1** when merging f83b286db1aa1f0e17ac55365f3e409b65e0dfa6 into bec19640aa7dae75cd12a44109bc319d2e30cfbf - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-d379d4e2618d1049912a3fb1aae2d3798476f307)
   
   **new alerts:**
   
   * 14 for Unused import
   * 1 for Unused local variable
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639520003


   This pull request **fixes 1 alert** when merging 43addc7d003d17edd8933c008d410e63ed414d64 into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-6b5b70beb389ef940691bd5183cafa51692bc125)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] joddiy commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
joddiy commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r433032119



##########
File path: python/singa/sonnx.py
##########
@@ -989,1150 +1035,1051 @@ def from_onnx(args):
 class SingaBackend(Backend):
 
     # This number indicates the onnx operator set version
-    _known_opset_version = 11
+    _opset_version = 11
+
+    _ir_version = 0x0000000000000006
 
     # because singa's operators are different from onnx.
     # we define a dict for the name projection
     _rename_operators = {
-        'Relu': 'relu',
-        'Softmax': 'SoftMax',
-        'Sigmoid': 'sigmoid',
-        'Add': 'add',
-        'MatMul': 'matmul',
-        'Conv': '_Conv2d',
-        'MaxPool': '_Pooling2d',
-        'AveragePool': '_Pooling2d',
-        'BatchNormalization': 'batchnorm_2d',
-        'Concat': 'Concat',
-        'Flatten': 'Flatten',
-        'Gemm': 'Gemm',
-        'Reshape': 'Reshape',
-        'Sum': 'sum',
-        'Cos': 'cos',
-        'Cosh': 'cosh',
-        'Sin': 'sin',
-        'Sinh': 'sinh',
-        'Tan': 'tan',
-        'Tanh': 'tanh',
-        'Acos': 'acos',
-        'Acosh': 'acosh',
-        'Asin': 'asin',
-        'Asinh': 'asinh',
-        'Atan': 'atan',
-        'Atanh': 'atanh',
-        'Selu': 'SeLU',
-        'Elu': 'Elu',
-        'Equal': 'equal',
-        'Less': 'less',
-        'Sign': 'sign',
-        'Div': 'div',
-        'Sub': 'sub',
-        'Sqrt': 'sqrt',
-        'Log': 'log',
-        'Greater': 'greater',
-        'HardSigmoid': 'HardSigmoid',
-        'Identity': 'identity',
-        'Softplus': 'softplus',
-        'Softsign': 'softsign',
-        'Mean': 'mean',
-        'Pow': 'pow',
-        'Clip': 'Clip',
-        'PRelu': 'prelu',
-        'Mul': 'mul',
-        'Transpose': 'Transpose',
-        'Max': 'max',
-        'Min': 'min',
-        'Shape': 'shape',
-        'And': '_and',
-        'Or': '_or',
-        'Xor': '_xor',
-        'Not': '_not',
-        'Neg': 'negative',
-        'Reciprocal': 'reciprocal',
-        'ConstantOfShape': 'ConstantOfShape',
-        'Dropout': 'Dropout',
-        'ReduceSum': 'ReduceSum',
-        'ReduceMean': 'ReduceMean',
-        'LeakyRelu': 'LeakyRelu',
-        'GlobalAveragePool': 'GlobalAveragePool',
-        'Squeeze': 'Squeeze',
+        # common op
+        'Relu': 'ReLU',
+        'Sigmoid': 'Sigmoid',
+        'Add': 'Add',
+        'MatMul': 'Matmul',
+        'Sum': 'Sum',
+        'Cos': 'Cos',
+        'Cosh': 'Cosh',
+        'Sin': 'Sin',
+        'Sinh': 'Sinh',
+        'Tan': 'Tan',
+        'Tanh': 'Tanh',
+        'Acos': 'Acos',
+        'Acosh': 'Acosh',
+        'Asin': 'Asin',
+        'Asinh': 'Asinh',
+        'Atan': 'Atan',
+        'Atanh': 'Atanh',
+        'Equal': 'Equal',
+        'Less': 'Less',
+        'Sign': 'Sign',
+        'Div': 'Div',
+        'Sub': 'Sub',
+        'Sqrt': 'Sqrt',
+        'Log': 'Log',
+        'Greater': 'Greater',
+        'Identity': 'Identity',
+        'Softplus': 'SoftPlus',
+        'Softsign': 'SoftSign',
+        'Mean': 'Mean',
+        'Pow': 'Pow',
+        'PRelu': 'PRelu',
+        'Mul': 'Mul',
+        'Max': 'Max',
+        'Min': 'Min',
+        'Shape': 'Shape',
+        'And': 'And',
+        'Or': 'Or',
+        'Xor': 'Xor',
+        'Not': 'Not',
+        'Neg': 'Negative',
+        'Reciprocal': 'Reciprocal',
         'Unsqueeze': 'Unsqueeze',
-        'Slice': 'Slice',
+        'NonZero': 'NonZero',
         'Ceil': 'Ceil',
-        'Split': 'Split',
-        'Gather': 'Gather',
-        'Tile': 'Tile',
-        'NonZero': 'nonzero',
+        # special op
         'Cast': 'Cast',
+        'Split': 'Split',
+        'Squeeze': 'Squeeze',
+        'GlobalAveragePool': 'GlobalAveragePool',
+        'LeakyRelu': 'LeakyRelu',
+        'ReduceSum': 'ReduceSum',
+        'ReduceMean': 'ReduceMean',
+        'Dropout': 'Dropout',
+        'ConstantOfShape': 'ConstantOfShape',
+        'Transpose': 'Transpose',
+        'HardSigmoid': 'HardSigmoid',
+        'Elu': 'Elu',
+        'Selu': 'SeLU',
+        'Concat': 'Concat',
+        'Softmax': 'SoftMax',
+        'Flatten': 'Flatten',
         'OneHot': 'OneHot',
+        'Tile': 'Tile',
+        'Gather': 'Gather',
+        'Reshape': 'Reshape',
+        'Slice': 'Slice',
+        'Clip': 'Clip',
+        'Gemm': 'layer.Gemm',  # layer
+        'BatchNormalization': 'layer.BatchNorm2d',  # layer
+        'Conv': 'layer.Conv2d',  # layer
+        'MaxPool': 'layer.Pooling2d',  # layer
+        'AveragePool': 'layer.Pooling2d',  # layer
     }
 
     # this dict indicates the operators that need extra handle
     # each indicates a function name
     _special_operators = {
-        'Conv': '_create_conv',
-        'MaxPool': '_create_max_avg_pool',
-        'AveragePool': '_create_max_avg_pool',
-        'BatchNormalization': '_create_batchnorm',
+        'Cast': '_create_cast',
+        'Split': '_create_split',
+        'Squeeze': '_create_squeeze_unsqueeze',
+        'Unsqueeze': '_create_squeeze_unsqueeze',
+        'GlobalAveragePool': '_create_global_average_pool',
+        'LeakyRelu': '_create_leakyrelu',
+        'ReduceSum': '_create_reduce_ops',
+        'ReduceMean': '_create_reduce_ops',
+        'Dropout': '_create_dropout',
+        'ConstantOfShape': '_create_constant_of_shape',
+        'Transpose': '_create_transpose',
+        'HardSigmoid': '_create_hardsigmoid',
+        'Elu': '_create_elu',
+        'Selu': '_create_selu',
         'Concat': '_create_concat',
-        'Flatten': '_create_flatten',
+        'Softmax': '_create_softmax',
         'Gemm': '_create_gemm',
+        'Flatten': '_create_flatten',
+        'OneHot': '_create_onehot',
+        'Tile': '_create_tile',
+        'Gather': '_create_gather',
         'Reshape': '_create_reshape',
-        'Softmax': '_create_softmax',
-        'Selu': '_create_selu',
-        'Elu': '_create_elu',
-        'HardSigmoid': '_create_hardsigmoid',
-        'Clip': '_create_clip',
-        'Transpose': '_create_transpose',
-        'ConstantOfShape': '_create_constantOfShape',
-        'Dropout': '_create_dropout',
-        'ReduceSum': '_create_reduceOp',
-        'ReduceMean': '_create_reduceOp',
-        'LeakyRelu': '_create_leakyrelu',
-        'GlobalAveragePool': '_create_globalaveragepool',
-        'Squeeze': '_create_squeeze',
-        'Unsqueeze': '_create_squeeze',
         'Slice': '_create_slice',
-        'Split': '_create_split',
-        'Gather': '_create_gather',
-        'Tile': '_create_tile',
-        'Cast': '_create_cast',
-        'OneHot': '_create_onehot',
-        'Constant': "_create_constant"
+        'Clip': '_create_clip',
+        'BatchNormalization': '_create_batch_norm',
+        'Conv': '_create_conv',
+        'MaxPool': '_create_max_avg_pool',
+        'AveragePool': '_create_max_avg_pool',
     }
 
     @classmethod
-    def _create_constant(cls, onnx_node, inputs, opset_version):
-        """
-        parse onnx constatn node to weights
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        tmp_tensor = onnx_node.getattr('value')
-        np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[tmp_tensor.data_type]
-        np_tensor = np.frombuffer(tmp_tensor.raw_data, dtype=np_dtype)
-        if np_tensor.dtype == "int64":
-            np_tensor = np_tensor.astype(np.int32)
-        # todo, we cannot support scalar tensor
-        if np.ndim(np_tensor) == 0:
-            np_tensor = np.array(np_tensor, ndmin=1)
-        return None, np_tensor
-
-    @classmethod
-    def _create_onehot(cls, onnx_node, inputs, opset_version):
-        """
-        get the OneHot operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        axis = onnx_node.getattr("axis", -1)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        depth = tensor.to_numpy(inputs.pop(1)).astype(np.int32)
-        value = tensor.to_numpy(inputs.pop(1))
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, depth, value)
-
-    @classmethod
-    def _create_cast(cls, onnx_node, inputs, opset_version):
+    def _create_cast(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Cast operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        to = onnx_node.getattr("to")
-        # singa only supports float32 and int32
-        map_dict = {
-            TensorProto.FLOAT: tensor.float32,  # FLOAT to float32
-            TensorProto.UINT8: None,  # UINT8
-            TensorProto.INT8: tensor.int32,  # INT8 to int32
-            TensorProto.UINT16: None,  # UINT16
-            TensorProto.INT16: tensor.int32,  # INT16 to int32
-            TensorProto.INT32: tensor.int32,  # INT32 to int32
-            TensorProto.INT64: tensor.int32,  # INT64 to int32
-            TensorProto.STRING: None,  # stirng
-            TensorProto.BOOL: None,  # bool
-        }
-        to = map_dict[to]
-        assert to != None, "not support cast type: {}".format(to)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(to)
-
-    @classmethod
-    def _create_tile(cls, onnx_node, inputs, opset_version):
-        """
-        get the Tile operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        repeats = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(repeats)
-
-    @classmethod
-    def _create_gather(cls, onnx_node, inputs, opset_version):
-        """
-        get the Gather operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        axis = onnx_node.getattr("axis", 0)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        indices = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, indices)
+        to_type = onnx_type_to_singa_type(onnx_node.getattr("to"))
+        assert to_type is not None, "unsupported cast type: {}".format(to_type)
+        if to_type == np.dtype('float32'):
+            return operator(tensor.float32)
+        else:
+            return operator(tensor.int32)
 
     @classmethod
-    def _create_split(cls, onnx_node, inputs, opset_version):
+    def _create_split(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Split operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axis = onnx_node.getattr("axis", 0)
         split = onnx_node.getattr("split", None)
         num_output = len(onnx_node.outputs)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, split, num_output)
-
-    @classmethod
-    def _create_slice(cls, onnx_node, inputs, opset_version):
-        """
-        get the Slice operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        starts = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        ends = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        # sometime onnx may ignore these two inputs, axes and step
-        if len(inputs) >= 2 and onnx_node.inputs[3] != '':
-            axes = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        else:
-            axes = None
-        steps = tensor.to_numpy(inputs.pop(1)).astype(
-            np.int32).tolist() if len(inputs) >= 2 else None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(starts, ends, axes, steps)
+        return operator(axis, split, num_output)
 
     @classmethod
-    def _create_squeeze(cls, onnx_node, inputs, opset_version):
+    def _create_squeeze_unsqueeze(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the Squeeze and Unsqueeze operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes")
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes)
+        return operator(axes)
 
     @classmethod
-    def _create_globalaveragepool(cls, onnx_node, inputs, opset_version):
+    def _create_global_average_pool(cls,
+                                    onnx_node,
+                                    operator,
+                                    opset_version=_opset_version):
         """
         get the GlobalAveragePool operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         data_format = onnx_node.getattr("data_format", 'channels_first')
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(data_format)
+        return operator(data_format)
 
     @classmethod
-    def _create_leakyrelu(cls, onnx_node, inputs, opset_version):
+    def _create_leakyrelu(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the LeakyRelu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.01)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_reduceOp(cls, onnx_node, inputs, opset_version):
+    def _create_reduce_ops(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
         """
         get the ReduceSum, ReduceMean, ReduceMax, ReduceMin, etc, operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes", None)
         keepdims = onnx_node.getattr("keepdims", 1)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes, keepdims)
+        return operator(axes, keepdims)
 
     @classmethod
-    def _create_dropout(cls, onnx_node, inputs, opset_version):
+    def _create_dropout(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Dropout operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         ratio = onnx_node.getattr("ratio", 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(ratio)
+        return operator(ratio)
 
     @classmethod
-    def _create_constantOfShape(cls, onnx_node, inputs, opset_version):
+    def _create_constant_of_shape(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the ConstantOfShape operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         value = onnx_node.getattr("value", 0)
         if isinstance(value, onnx.TensorProto):
             value = numpy_helper.to_array(value)[0].item()
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(value)
+        return operator(value)
 
     @classmethod
-    def _create_transpose(cls, onnx_node, inputs, opset_version):
+    def _create_transpose(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the Transpose operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        shape = inputs[0].shape
-        perm = onnx_node.getattr("perm", list(range(len(shape) - 1, -1, -1)))
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(perm)
-
-    @classmethod
-    def _create_clip(cls, onnx_node, inputs, opset_version):
-        """
-        get the clip operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        # sometime onnx may ignore these two inputs, min or max or both
-        if len(inputs) >= 2 and onnx_node.inputs[1] != '':
-            min_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            min_v = None
-        if len(inputs) >= 2 and onnx_node.inputs[2] != '':
-            max_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            max_v = None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(min_v, max_v)
+        perm = onnx_node.getattr("perm")
+        return operator(perm)
 
     @classmethod
-    def _create_hardsigmoid(cls, onnx_node, inputs, opset_version):
+    def _create_hardsigmoid(cls,
+                            onnx_node,
+                            operator,
+                            opset_version=_opset_version):
         """
-        get the HardSigmoid operator from onnx node
+        get the hardsigmoid operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.2)
         beta = onnx_node.getattr("beta", 0.5)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, beta)
+        return operator(alpha, beta)
 
     @classmethod
-    def _create_elu(cls, onnx_node, inputs, opset_version):
+    def _create_elu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the elu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_selu(cls, onnx_node, inputs, opset_version):
+    def _create_selu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the selu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.67326)
         gamma = onnx_node.getattr("gamma", 1.0507)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, gamma)
+        return operator(alpha, gamma)
 
     @classmethod
-    def _create_reshape(cls, onnx_node, inputs, opset_version):
+    def _create_concat(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the reshape operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the concat operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
-        Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        shape = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(shape)
+        factor = onnx_node.getattr('axis')
+        return operator(axis=factor)
 
     @classmethod
-    def _create_conv(cls, onnx_node, inputs, opset_version):
+    def _create_softmax(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the conv operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the softmax operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support dilation
-        dilation = onnx_node.getattr('dilations', 1)
-        if dilation != 1 and list(dilation) != [1, 1]:
-            raise ValueError("Not implemented yet for dilation")
-        group = onnx_node.getattr('group', 1)
-
-        # only support 1d or 2d
-        if len(kernel) > 2:
-            raise ValueError("Only implemented for 1d or 2d")
-
-        bias = len(inputs) == 3
-        x = inputs[0]
-        x_shape = inputs[0].shape
-        in_channels = x_shape[1]
-        w_shape = inputs[1].shape
-        out_channels = w_shape[0]
-        assert w_shape[1] == in_channels // group
-
-        if inputs[0].device.id() == -1:
-            if group != 1:
-                raise NotImplementedError
-            else:
-                handle = singa.ConvHandle(x.data, kernel, stride, padding,
-                                          in_channels, out_channels, bias,
-                                          group)
-        else:
-            handle = singa.CudnnConvHandle(x.data, kernel, stride, padding,
-                                           in_channels, out_channels, bias,
-                                           group)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_max_avg_pool(cls, onnx_node, inputs, opset_version):
+    def _create_gemm(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the max or avg pool operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the gemm operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support count_include_pad and auto_pad
-        if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:
-            raise ValueError(
-                "Not implemented yet for count_include_pad or ceil_mode")
-
-        # only support 2d
-        if len(kernel) != 2:
-            raise ValueError("Not implemented yet")
-
-        is_max = onnx_node.op_type == 'MaxPool'
-        x = inputs[0]
-        if x.device.id() == -1:
-            handle = singa.PoolingHandle(x.data, kernel, stride, padding,
-                                         is_max)
-        else:
-            handle = singa.CudnnPoolingHandle(x.data, kernel, stride, padding,
-                                              is_max)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        alpha = onnx_node.getattr('alpha', 1.)
+        beta = onnx_node.getattr('beta', 1.)
+        transA = onnx_node.getattr('transA', 0)
+        transB = onnx_node.getattr('transB', 0)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        alpha=alpha,
+                        beta=beta,
+                        transA=transA,
+                        transB=transB,
+                        bias=bias)
 
     @classmethod
-    def _create_batchnorm(cls, onnx_node, inputs, opset_version):
+    def _create_flatten(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the batch norm operator from onnx node
-        Args:onnx_node: a given onnx node
-        Args:inputs: the input tensor
-        Args:opset_version: the opset version
-        Returns: the handle of singa operator
-        Returns: the autograd of singa operator
+        get the flatten operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
         """
-        x = inputs[0]
-        factor = onnx_node.getattr('momentum', 0.9)
-        if x.device.id() == -1:
-            handle = singa.BatchNormHandle(factor, x.data)
-        else:
-            handle = singa.CudnnBatchNormHandle(factor, x.data)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return handle, forward
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_concat(cls, onnx_node, inputs, opset_version):
+    def _create_onehot(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the OneHot operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.attrs["axis"]
-        if factor < 0:
-            factor = len(inputs[0].shape
-                        ) + factor  # in order to support the negative axis
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        axis = onnx_node.getattr("axis", -1)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'depth')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'values')
+        return operator(axis, None, None)
 
     @classmethod
-    def _create_softmax(cls, onnx_node, inputs, opset_version):
+    def _create_tile(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the Tile operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'repeats')
+        return operator(None)
 
     @classmethod
-    def _create_gemm(cls, onnx_node, inputs, opset_version):
+    def _create_gather(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the gemm operator from onnx node
-        Args:
-            onnx_node: a given onnx node
+        get the Gather operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        axis = onnx_node.getattr("axis", 0)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'indices')
+        return operator(axis, None)
+
+    @classmethod
+    def _create_reshape(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the reshape operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'shape')
+        return operator(None)
+
+    @classmethod
+    def _create_slice(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the Slice operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        x = inputs[0]
-        alpha = onnx_node.getattr('alpha', 1.)
-        beta = onnx_node.getattr('beta', 1.)
-        transA = onnx_node.getattr('transA', 0)
-        transB = onnx_node.getattr('transB', 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(alpha=alpha,
-                             beta=beta,
-                             transA=transA,
-                             transB=transB)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'starts')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'ends')
+        if len(onnx_node.inputs) >= 4 and onnx_node.inputs[3] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[3], 'axes')
+        if len(onnx_node.inputs) == 5 and onnx_node.inputs[4] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[4], 'steps')
+        return operator(None, None, None, None)
 
     @classmethod
-    def _create_flatten(cls, onnx_node, inputs, opset_version):
+    def _create_clip(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the flatten operator from onnx node
+        get the clip operator from onnx node
         Args:
-            onnx_node: a given onnx node
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        if len(onnx_node.inputs) >= 2 and onnx_node.inputs[1] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[1], 'min')
+        if len(onnx_node.inputs) == 3 and onnx_node.inputs[2] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[2], 'max')
+        return operator(None, None)
+
+    @classmethod
+    def _create_batch_norm(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
+        """
+        get the batch norm operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        factor = onnx_node.getattr('momentum', 0.9)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'scale')
+        onnx_node.set_weight_inputs(onnx_node.inputs[2], 'bias')
+        onnx_node.set_weight_inputs(onnx_node.inputs[3], 'running_mean')
+        onnx_node.set_weight_inputs(onnx_node.inputs[4], 'running_var')
+        return operator(factor)
+
+    @classmethod
+    def _create_conv(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the conv operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
+
+        # not support dilation
+        dilation = onnx_node.getattr('dilations', 1)
+        if dilation != 1 and list(dilation) != [1, 1]:
+            raise ValueError("Not implemented yet for dilation")
+        group = onnx_node.getattr('group', 1)
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        kernel_size,
+                        stride=stride,
+                        padding=padding,
+                        dilation=dilation,
+                        group=group,
+                        bias=bias,
+                        pad_mode=auto_pad)
+
+    @classmethod
+    def _create_max_avg_pool(cls,
+                             onnx_node,
+                             operator,
+                             opset_version=_opset_version):
+        """
+        get the max or avg pool operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
 
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        # not support count_include_pad and auto_pad
+        ceil_mode = onnx_node.getattr('ceil_mode', 0)
+        count_include_pad = onnx_node.getattr('count_include_pad', 0)
+        if ceil_mode != 0 or count_include_pad != 0:
+            raise ValueError(
+                "Not implemented yet for count_include_pad or ceil_mode")
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        is_max = onnx_node.op_type == 'MaxPool'
+        return operator(kernel_size, stride, padding, is_max, auto_pad)
 
     @classmethod
-    def _common_onnx_node_to_singa_op(cls, onnx_node, inputs, opset_version):
+    def _onnx_constant_to_np(cls, onnx_node, opset_version):
         """
-        get a common singa operator(only autograd) from a onnx node
-        other special operators also can call this func to get autograd
-        Args:
-            onnx_node: a given onnx node
+        parse onnx constant node to numpy array
         Args:
-            tensor_map: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            a dict of tensors
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            a numpy ndarray
         """
-        onnx_op_type = onnx_node.op_type
-        assert onnx_op_type in cls._rename_operators, "not support operator: {}".format(
-            onnx_op_type)
-        autograd_op = getattr(autograd, cls._rename_operators[onnx_op_type])
-        return None, autograd_op
+        onnx_tensor = onnx_node.getattr('value')
+        np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_tensor.data_type]
+        np_tensor = np.frombuffer(onnx_tensor.raw_data, dtype=np_dtype)
+        return tensor.from_numpy(np_tensor)
 
     @classmethod
-    def _onnx_node_to_singa_op(cls,
-                               onnx_node,
-                               inputs,
-                               opset_version=_known_opset_version):
+    def _onnx_node_to_singa_op(cls, onnx_node, opset_version=_opset_version):
         """
-        get a singa operator(handle and autograd) from a onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input list
+        get singa operator from a onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a dict of tensors
-        Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            singa operator instance
         """
+        onnx_op_type = onnx_node.op_type
+        assert onnx_op_type in cls._rename_operators, "not support operator: {}".format(
+            onnx_op_type)
+        renamed_op = cls._rename_operators[onnx_op_type]
+        if renamed_op.startswith('layer.'):
+            op_class = getattr(layer, renamed_op[6:])
+        else:
+            op_class = getattr(autograd, renamed_op)
         if onnx_node.op_type in cls._special_operators:
             translator = getattr(cls, cls._special_operators[onnx_node.op_type])
+            op = translator(onnx_node, op_class, opset_version)
         else:
-            translator = cls._common_onnx_node_to_singa_op
-        return translator(onnx_node, inputs, opset_version)
+            op = op_class()
+        # refine the ONNXNode
+        onnx_node.inputs = [inp for inp in onnx_node.inputs if inp != '']
+        return op
 
     @classmethod
-    def run_node(cls, onnx_node, inputs, opset_version=_known_opset_version):
+    def run_node(cls, node, inputs, device='CPU', opset_version=_opset_version):
         """
         run a single singa operator from a onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the 
+            node (NodeProto): a given onnx node
+            inputs (ndarray[]): a list of numpy ndarray
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
+        Returns:
+            list, the output
         """
-        valid_inputs = [x for x in onnx_node.inputs if x != ""]
+        node = OnnxNode(node)
+        valid_inputs = [x for x in node.inputs if x != ""]
         assert len(valid_inputs) == len(
-            inputs), "{}: expected {} but got {}".format(
-                onnx_node.op_type, len(valid_inputs), len(inputs))
-
-        tmp_inputs = [inputs[x] for x in onnx_node.inputs if x != ""]
-        handle, forward = cls._onnx_node_to_singa_op(onnx_node, tmp_inputs,
-                                                     opset_version)
-        # only give the inputs it needs
-        # consumed_inputs are the inputs marked as attributes
-        # so we remove it here
-        tmp_inputs = [
-            inputs[x]
-            for x in onnx_node.inputs
-            if x not in onnx_node.consumed_inputs
-        ]
-        return cls._run_node(onnx_node, tmp_inputs, handle, forward,
-                             opset_version)
+            inputs), "{}: expected {} inputs, but got {}. ".format(
+                node.op_type, len(valid_inputs), len(inputs))
+
+        operator = cls._onnx_node_to_singa_op(node, opset_version)
+        # separate weights from inputs, and init inputs as Tensor
+        weights = {}
+        _inputs = []
+        for (key, val) in zip(valid_inputs, inputs):
+            val = val.astype(onnx_type_to_singa_type(val.dtype))
+            if key in node.weight_inputs:
+                weights[key] = val
+            else:
+                x = tensor.from_numpy(val)
+                if device != 'CPU':
+                    assert singa.USE_CUDA, "Your SINGA doesn't compile GPU module."
+                    dev = device.create_cuda_gpu(set_default=False)
+                else:
+                    dev = device.get_default_device()
+                x.to_device(dev)
+                _inputs.append(x)
+        inputs = _inputs
+        # set params
+        params = {}
+        for key, name in node.weight_inputs.items():
+            params[name] = weights[key]
+        operator.set_params(params)
+        outputs = cls._run_node(operator, inputs)
+        outputs_dict = OrderedDict()
+        for (key, val) in zip(node.outputs, outputs):
+            outputs_dict[key] = val
+        return outputs_dict
 
     @classmethod
-    def _run_node(cls,
-                  onnx_node,
-                  inputs,
-                  handle,
-                  forward,
-                  opset_version=_known_opset_version):
+    def _run_node(cls, operator, inputs):
         """
-        run a single singa operator from a onnx node
-        Args:inputs: 
-            the input tensor
-        Args:handle: 
-            the handle of singa operator
-        Args:forward: 
-            the forward of singa operator
+        run a single singa operator from singa operator
         Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the
+            operator (Operator): the Operator instance
+            inputs (Tensor[]): a list of SINGA Tensor
+        Returns:
+            list, the output
         """
-        outputs = forward(*inputs) if handle is None else forward(
-            handle, *inputs)
+        outputs = operator(*inputs)
         if not isinstance(outputs, collections.Iterable):
             outputs = [outputs]
-        outputs_dict = OrderedDict()
-        for (key, val) in zip(onnx_node.outputs, outputs):
-            outputs_dict[key] = val
-        return outputs_dict
+        return outputs
 
     @classmethod
-    def _init_graph_parameter(cls, graph, init_inputs, device):
+    def _parse_graph_params(cls, graph, device):
         """
-        init the singa tensor from onnx infos
+        parse the parameters from onnx graph
         Args:
-            graph: a given onnx graph
-        Args:
-            init_inputs: a list of inputs, which used to init the operators
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
+        Returns:
+            a dict of numpy ndarray
+        """
+        params = {}
+        for tp in graph.initializer:
+            val = numpy_helper.to_array(tp)
+            val = val.astype(onnx_type_to_singa_type(tp.data_type))
+            params[tp.name] = val
+        return params
+
+    @classmethod
+    def _parse_graph_inputs_outputs(cls, graph, params, device):
+        """
+        parse the inputs and outputs from onnx graph
         Args:
-            device: the used device
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
         Returns:
-            a dict of tensors
+            a list of input ValueInfo
+            a list of output ValueInfo
         """
-        tensor_map = {}
-        # due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info
-        # sometimes, may not
-        all_inputs = OrderedDict()
+        inputs = []
+        outputs = []
+        info_tuple = namedtuple('info_tuple', ['name', 'dtype', 'shape'])
         for t in graph.input:
-            all_inputs[t.name] = t
-        # so we refresh the input by the initializer
-        for t in graph.initializer:
-            all_inputs[t.name] = t
-        initializers = {t.name for t in graph.initializer}
-        inp_idx = 0
-        for name, x in all_inputs.items():
-            if name in initializers:
-                # if it has initializer, we use its value as the input
-                np_tensor = numpy_helper.to_array(x)
-                if np_tensor.dtype == "int64":
-                    np_tensor = np_tensor.astype(np.int32)
-                # todo, we cannot support scalar tensor
-                if np.ndim(np_tensor) == 0:
-                    np_tensor = np.array(np_tensor, ndmin=1)
-            else:
-                # if not, means it's a input rather than a inner weight
-                # so if the user gives values, we use these values
-                # if not, we just use the shape of input gived by onnx to init a random value
-                # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-                # so if have operators, the user must give inputs
-                x_shape = tuple(
-                    dim.dim_value for dim in x.type.tensor_type.shape.dim)
-                if init_inputs is not None:
-                    np_tensor = init_inputs[inp_idx]
-                    inp_idx += 1
-                else:
-                    np_tensor = np.random.randn(*x_shape).astype(np.float32)
-            tmp_tensor = tensor.from_numpy(np_tensor)
-            tmp_tensor.to_device(device)
-            # todo, for backward
-            tmp_tensor.stores_grad = (name in initializers)
-            tensor_map[x.name] = tmp_tensor
-        return tensor_map
+            if t.name not in params:
+                dtype = t.type.tensor_type.elem_type
+                shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+                inputs.extend([info_tuple(t.name, dtype, shape)])
+        for t in graph.output:
+            dtype = t.type.tensor_type.elem_type
+            shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+            outputs.extend([info_tuple(t.name, dtype, shape)])
+        return inputs, outputs
 
     @classmethod
-    def _onnx_model_to_singa_net(cls, model, init_inputs, device,
-                                 opset_version):
+    def _onnx_model_to_singa_ops(cls, graph, device, opset_version):
         """
-        get all intermediate tensors and operators from onnx model
-        Args:
-            model: a given onnx model
+        get all intermediate params, operators, and input info from onnx model
         Args:
-            init_inputs: a list of inputs, which used to init the operators
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns:
-            a dict of tensors
+            graph (Graph): the loaded ONNX graph
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
         Returns:
-            a list of SingaOps('name', 'op', 'handle', 'forward')
-        """
-        # init all tensor input and weight as a tensor map
-        tensor_map = cls._init_graph_parameter(model.graph, init_inputs, device)
-        # only weights tensor
-        weights = {x.name: tensor_map[x.name] for x in model.graph.initializer}
+            a dict of weights
+            a list of input ValueInfo
+            a list of output ValueInfo
+            a list of operator_tuple('node', 'operator')
+        """
+        # init all tensor input and params as a tensor map
+        params = cls._parse_graph_params(graph, device)
+        inputs, outputs = cls._parse_graph_inputs_outputs(graph, params, device)
         # the parsed operators queue
-        singa_ops = []
-        singa_op = namedtuple('SingaOps', ['name', 'op', 'handle', 'forward'])
-        for node in model.graph.node:
+        operators = []
+        operator_tuple = namedtuple('operator_tuple', ['node', 'operator'])
+        for node in graph.node:
             node = OnnxNode(node)
-            # only give the inputs it needs
-            # consumed_inputs are the inputs marked as attributes
-            # so we remove it here
-            inputs = [
-                tensor_map[x]
-                for x in node.inputs
-                if x not in node.consumed_inputs
-            ]
-            handle, forward = cls._onnx_node_to_singa_op(
-                node, inputs, opset_version)
-            # if it is Constant, we hanlde it as a weight
-            # otherwise, we run it and add its output into map for being used by later operators
+            # convert Constant to param
             if node.op_type == 'Constant':
-                tmp_tensor = tensor.from_numpy(forward)
-                tmp_tensor.to_device(device)
-                tmp_name = node.outputs.pop(0)
-                weights[tmp_name] = tmp_tensor
-                tensor_map[tmp_name] = tmp_tensor
+                params[node.outputs[0]] = cls._onnx_constant_to_np(node, opset_version)
             else:
-                outputs = cls._run_node(node, inputs, handle, forward)
-                for key, val in outputs.items():
-                    tensor_map[key] = val
-                singa_ops.extend([singa_op(node.name, node, handle, forward)])
-        return weights, singa_ops
+                op = cls._onnx_node_to_singa_op(node, opset_version)
+                operators.extend([operator_tuple(node, op)])
+        return params, inputs, outputs, operators
 
     @classmethod
-    def prepare(cls, model, device, **kwargs):
+    def prepare(cls, model, device='CPU', **kwargs):
         """
-        get the batch norm operator from onnx node
-        Args:
-            model: a given onnx node
+        parse the ONNX model and create layers
         Args:
-            device: the used device
-        Returns: 
-            a list of output values
+            model (ModelProto): the loaded ONNX model
+            device (string): CPU or CUDA
+        Returns:
+            a SingaRep instance that stores the layers and weights
         """
         super(SingaBackend, cls).prepare(model, device, **kwargs)
-        # when parsing graph, we use the shape of input gived by onnx to init a random value
-        # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-        # so if have operators, the user must give inputs
-        init_inputs = kwargs.get("init_inputs", None)
-        # whether initializers are moved into inputs, due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info, sometimes, may not
-        cls.keep_initializers_as_inputs = kwargs.get(
-            'keep_initializers_as_inputs', True)
         # optimize and infer the shape of the model
         try:
             model = onnx.utils.polish_model(model)
         except IndexError as err:
-            # due to https://github.com/onnx/onnx/issues/2417
             model = onnx.shape_inference.infer_shapes(model)
 
         # check the opset version and ir version
+        # SINGA supports opset version 11 and ir version 6 (ONNX 1.6.0)
         opset_version = None
         for imp in model.opset_import:
             if not imp.HasField("domain") or imp.domain == "":
                 opset_version = imp.version
-                if imp.version > cls._known_opset_version:
+                if imp.version > cls._opset_version:
                     warnings.warn(
-                        "This version of singa targets ONNX operator set version {}, but the model we are trying to import uses version {}.  We will try to import it anyway, but if the model uses operators which had BC-breaking changes in the intervening versions, import will fail."
-                        .format(cls._known_opset_version, imp.version))
+                        "The imported opertor set verion {} is larger than the supported version {}."
+                        .format(imp.version, cls._opset_version))
             else:
                 warnings.warn("Unrecognized operator set {}".format(imp.domain))
-        if opset_version is None:
-            if model.ir_version >= 0x00000003:
-                raise RuntimeError(
-                    "Model with IR version >= 3 did not specify ONNX operator set version (singa requires it)"
-                )
-            else:
-                opset_version = 1
-        weights, singa_ops = cls._onnx_model_to_singa_net(
-            model, init_inputs, device, opset_version)
-        return SingaRep(model, weights, singa_ops,
-                        cls.keep_initializers_as_inputs)
+
+        if model.ir_version > cls._ir_version:
+            warnings.warn(
+                "The imported ir verion {} is larger than the supported version {}."
+                .format(cls._ir_version, imp.version))
+
+        graph = model.graph
+        params, inputs, outputs, layers = cls._onnx_model_to_singa_ops(
+            graph, device, opset_version)
+        return SingaRep(params, inputs, outputs, layers, device)
 
 
 class SingaRep(BackendRep):
 
-    def __init__(self,
-                 model,
-                 weights,
-                 singa_ops,
-                 keep_initializers_as_inputs=True):
+    def __init__(self, params, inputs, outputs, layers, device):
         """
+        https://github.com/onnx/onnx/blob/master/docs/ImplementingAnOnnxBackend.md
         SingaRep provides the intermediate representation of Singa,
         the user can run the forward of the singa model by run func,
         or, the user can append more layers after the singa_ops to do
         the transfer learning
         Args:
-            model: a given operator
+            params (dict{}): a dict of params, data type is numpy ndarray
+            inputs (ValueInfo): a dict of inputs
+            outputs (ValueInfo): a dict of outputs
+            layers (namedtuple('operator_tuple', ['node', 'operator'])[]): a list of singa operator
+            device (string): CPU or CUDA
+        """
+        super(SingaRep, self).__init__()
+        self.inputs = inputs
+        self.states = params
+        self.outputs = outputs
+        self.dev = cpu_dev if device == "CPU" else gpu_dev
+        self.layers = layers
+        self.tensor_count = {}
+        self.has_initialized = False
+        self.is_graph = False
+
+    def initialize(self):
+        """
+        Init the instance
+        """
+        self.outputs_info = {outp.name: outp for outp in self.outputs}
+        _layers = []  # layers by topo order
+        for node, operator in self.layers:
+            for key, name in node.weight_inputs.items():
+                if key not in self.states:
+                    # cannot find the weights, try to find it from input
+                    node.set_attr_inputs(key, name)
+            self.__dict__[node.name] = operator
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+            _layers.append(node)
+        self._layers = _layers
+
+    def init_tensor_count(self):
+        """
+        Init the tensor count dict
+        """
+        self.tensor_count = {}
+        for node, operator in self.layers:
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+
+    def to_input_tensor(self, x):
+        """
+        convert the input to tensors
         Args:
-            weights: the tensor of weights
+            x (np.ndarray[]): a list of numpy ndarray as inputs
+        Returns: 
+            a dict of SINGA Tensors
+        """
+        tensor_dict = {}
+        # init inputs as Tensor
+        for (key, val) in zip(self.inputs, x):
+            if not self.is_graph:
+                val = val.astype(onnx_type_to_singa_type(key.dtype))
+                # todo, scalar
+                val = np.atleast_1d(val)
+                val = tensor.from_numpy(val)
+                val.to_device(self.dev)
+            tensor_dict[key.name] = val
+        return tensor_dict
+
+    def to_output_tensor(self, y, out_name):
+        """
+        convert the output tensor to a numpy ndarray
         Args:
-            singa_ops: the tensor of the operator
+            y (Tensor): the output SINGA Tensor
+            out_name (str): the name of the output
+        Returns: 
+            a numpy ndarray
         """
-        super(SingaRep, self).__init__()
-        self.model = model
-        self.tensor_map = weights
-        self.keep_initializers_as_inputs = keep_initializers_as_inputs
-        # this each item of singa_ops is: ('name', 'op', 'handle', 'forward')
-        # the name is a string, op is OnnxNode,
-        # handle is Singa handle to store the tensor into singa operator
-        # the forward is singa autograd operator
-        self.singa_ops = singa_ops
+        if not self.is_graph:
+            y = tensor.to_numpy(y)
+            if out_name in self.outputs_info:
+                np_dtyp = mapping.TENSOR_TYPE_TO_NP_TYPE[
+                    self.outputs_info[out_name].dtype]
+                y = y.astype(np_dtyp)
+        return y
 
-    def run(self, inputs, **kwargs):
+    def get_s(self, name, node, tensor_dict):
+        """
+        get state from the node's weights or tensor_dict
+        Args:
+            name (str): name of the state
+            node (ONNXNode): ONNX node
+            tensor_dict ({}): tensor dict
+        Returns: 
+            the states
+        """
+        if name in node.attr_inputs:
+            return tensor_dict[name]
+        else:
+            return self.states[name]
+
+    def handle_special_ops(self, node, op, tensor_dict):
+        """
+        handle some special operations
+        Args:
+            node (ONNXNode): the ONNX node
+            op (Operator): the operator instance
+            tensor_dict ({}): tensor dict
+        """
+        # todo, hard code
+        # Conv2d nb_kernels
+        if node.op_type == "Conv":
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[0]
+        # Gemm nb_kernels and bias_shape
+        elif node.op_type == "Gemm":
+            nb_kernels_flag = 0 if op.transB == 1 else 1
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[nb_kernels_flag]
+            if op.bias:
+                shape = self.get_s(node.inputs[2], node, tensor_dict).shape
+                op.bias_shape = shape
+
+    def run(self, *x, **kwargs):
         """
         run the forward of singa model
         Args:
-            inputs: a given operator
+            x (np.ndarray[]): a list of numpy ndarray as inputs
         Returns: 
-            the onnx node
+            a list of outputs
         """
-        graph = self.model.graph
+        if not self.has_initialized:
+            self.initialize()
+            if isinstance(x[0], tensor.Tensor):
+                self.dev = x[0].device
+            self.has_initialized = True
+
+        outputs_dict = OrderedDict([(outp.name, None) for outp in self.outputs])
+
         # last_layers means we run this model until the last #N layers
-        last_layers = kwargs.get('last_layers', len(self.singa_ops))
-        if last_layers != len(self.singa_ops):
-            final_outputs = self.singa_ops[last_layers-1].op.outputs
-        else:
-            final_outputs =  [outp.name for outp in graph.output]
-        # whether return all outputs
-        all_outputs = kwargs.get('all_outputs', False)
-        # get a specific op by its name
-        op_name = kwargs.get('op_name', None)
-        # record the tensor we added from input
-        tmp_tensor_map = {name: val for name, val in self.tensor_map.items()}
-
-        # the dict will be returned
-        ret_outputs = OrderedDict()
-        if self.keep_initializers_as_inputs:
-            require_input_len = len(graph.input) - len(graph.initializer)
-            actual_input_len = len(inputs)
-        else:
-            require_input_len = len(graph.input)
-            actual_input_len = len(inputs)
-        assert require_input_len == actual_input_len, "The length of graph input is different from the tensor input: %d, %d" % (
-            require_input_len, actual_input_len)
-        # run the handle by the order of the list(the list is Topological Sorting)
-        for inp in graph.input:
-            if inp.name not in tmp_tensor_map:
-                tmp_tensor_map[inp.name] = inputs.pop(0)
-
-        for _, op, handle, forward in self.singa_ops[:last_layers]:
-            if len(op.consumed_inputs) != 0:
-                # because if op has consumed_inputs, it means it moved some inputs into attributes
-                # so when running, we should update these attributes
-                handle, forward = get_op(op,
-                                         [tmp_tensor_map[x] for x in op.inputs])
-            inputs = [
-                tmp_tensor_map[x]
-                for x in op.inputs
-                if x not in op.consumed_inputs
-            ]
-            outputs = _run_node(op, inputs, handle, forward)
-            for key, val in outputs.items():
-                tmp_tensor_map[key] = val
-                ret_outputs[key] = val
-
-        if op_name is not None:
-            if op_name in outputs:
-                return outputs[op_name]
+        last_layers = kwargs.get('last_layers', len(self._layers))
+        if last_layers != len(self._layers):
+            for outp in self._layers[last_layers - 1].outputs:
+                outputs_dict[outp] = None
+
+        aux_output = kwargs.get('aux_output', ())
+        for outp in aux_output:
+            outputs_dict[outp] = None
+
+        tensor_dict = self.to_input_tensor(x)
+        self.init_tensor_count()
+
+        # run the layer by the topo order
+        for node in self._layers[:last_layers]:
+            op = self.__dict__[node.name]
+            self.handle_special_ops(node, op, tensor_dict)
+            # make input
+            inputs = []
+            for inp in node.inputs:
+                if inp not in node.weight_inputs and inp not in node.attr_inputs:
+                    if inp in tensor_dict:
+                        inputs.append(tensor_dict[inp])
+                    elif inp in self.states:
+                        # todo, scalar
+                        val = np.atleast_1d(self.states[inp])
+                        val = tensor.from_numpy(val)
+                        val.to_device(self.dev)
+                        inputs.append(val)
+                    else:
+                        raise KeyError("Cannot find the input {} for operation {}".format(inp, node.name))
+            states = {}
+            if callable(getattr(op, "initialize",
+                                None)) and not op._initialized:
+                # init the operator
+                op.initialize(*inputs)
+                op._initialized = True
+                for key, name in node.weight_inputs.items():
+                    if key not in node.attr_inputs:
+                        # find the weights and not in the inputs
+                        states[name] = self.states[key]
+
+            # replace attrs by inputs
+            for key, name in node.attr_inputs.items():
+                if key in tensor_dict:
+                    ts = tensor_dict[key]
+                    if isinstance(ts, tensor.Tensor):
+                        ts = tensor.to_numpy(ts)
+                    states[name] = ts
+                elif key in self.states:
+                    states[name] = self.states[key]
+            # set states
+            if callable(getattr(op, "set_states", None)):
+                op.set_states(**states)
             else:
-                raise RuntimeError(
-                    "The op_name {} does not exist, please check. The available op_names are: {}"
-                    .format(op_name, [val for key, val in op_name.items()]))
-
-        # return all outputs if all_outputs==True
-        # else return last outputs
-        if all_outputs:
-            return ret_outputs
-        else:
-            return [ret_outputs[outp] for outp in final_outputs]
+                for key, value in states.items():
+                    setattr(op, key, value)
+            # run the node
+            outputs = _run_node(op, inputs)
+            # release the input tensor
+            for inp in node.inputs:
+                if inp in self.tensor_count:
+                    self.tensor_count[inp] -= 1
+                    if self.tensor_count[inp] == 0:
+                        if inp in tensor_dict:
+                            del tensor_dict[inp]
+                        del self.tensor_count[inp]
+            # store the output
+            for (outp, val) in zip(node.outputs, outputs):
+                tensor_dict[outp] = val
+                if outp in outputs_dict:
+                    outputs_dict[outp] = self.to_output_tensor(val, outp)
+        return list(outputs_dict.values())
+
+
+class SONNXModel(model.Model):
+
+    def __init__(self, onnx_model):
+        """
+        Init a SINGA Model
+        Args:
+            onnx_model (ModelProto): a loaded onnx model
+        """
+        super(SONNXModel, self).__init__()
+        self.sg_ir = prepare(onnx_model)
+        for node, operator in self.sg_ir.layers:
+            self.__dict__[node.name] = operator

Review comment:
       At [Line 2035-2040](https://github.com/apache/singa/pull/703/files/44796e63f66f1010ac08ff2c7ba8e94160c490c4#diff-4839d7350844248bf25a18751ed06062R2035-R2040), the code checks the type of the instance: if it comes from autograd, it calls `setattr(op, key, value)`; otherwise it calls `op.set_states(**states)`.
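
       A minimal, self-contained sketch of that dispatch (DummyLayer and DummyOperator are hypothetical stand-ins for the layer.* and autograd operator classes; it mirrors the set_states/setattr branch in SingaRep.run above):

           # sketch only: prefer set_states when the operator provides it,
           # otherwise fall back to setting attributes directly
           class DummyLayer:
               def set_states(self, **states):
                   # layer classes receive their weights through set_states
                   for name, value in states.items():
                       setattr(self, name, value)

           class DummyOperator:
               pass  # plain autograd operators expose no set_states

           def assign_states(op, states):
               if callable(getattr(op, "set_states", None)):
                   op.set_states(**states)
               else:
                   for key, value in states.items():
                       setattr(op, key, value)

           assign_states(DummyLayer(), {"W": [[1.0, 2.0]]})  # goes through set_states
           assign_states(DummyOperator(), {"min": 0.0})      # falls back to setattr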




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] nudles commented on a change in pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
nudles commented on a change in pull request #703:
URL: https://github.com/apache/singa/pull/703#discussion_r432952001



##########
File path: python/singa/sonnx.py
##########
@@ -989,1150 +1035,1051 @@ def from_onnx(args):
 class SingaBackend(Backend):
 
     # This number indicates the onnx operator set version
-    _known_opset_version = 11
+    _opset_version = 11
+
+    _ir_version = 0x0000000000000006
 
     # because singa's operators are different from onnx.
     # we define a dict for the name projection
     _rename_operators = {
-        'Relu': 'relu',
-        'Softmax': 'SoftMax',
-        'Sigmoid': 'sigmoid',
-        'Add': 'add',
-        'MatMul': 'matmul',
-        'Conv': '_Conv2d',
-        'MaxPool': '_Pooling2d',
-        'AveragePool': '_Pooling2d',
-        'BatchNormalization': 'batchnorm_2d',
-        'Concat': 'Concat',
-        'Flatten': 'Flatten',
-        'Gemm': 'Gemm',
-        'Reshape': 'Reshape',
-        'Sum': 'sum',
-        'Cos': 'cos',
-        'Cosh': 'cosh',
-        'Sin': 'sin',
-        'Sinh': 'sinh',
-        'Tan': 'tan',
-        'Tanh': 'tanh',
-        'Acos': 'acos',
-        'Acosh': 'acosh',
-        'Asin': 'asin',
-        'Asinh': 'asinh',
-        'Atan': 'atan',
-        'Atanh': 'atanh',
-        'Selu': 'SeLU',
-        'Elu': 'Elu',
-        'Equal': 'equal',
-        'Less': 'less',
-        'Sign': 'sign',
-        'Div': 'div',
-        'Sub': 'sub',
-        'Sqrt': 'sqrt',
-        'Log': 'log',
-        'Greater': 'greater',
-        'HardSigmoid': 'HardSigmoid',
-        'Identity': 'identity',
-        'Softplus': 'softplus',
-        'Softsign': 'softsign',
-        'Mean': 'mean',
-        'Pow': 'pow',
-        'Clip': 'Clip',
-        'PRelu': 'prelu',
-        'Mul': 'mul',
-        'Transpose': 'Transpose',
-        'Max': 'max',
-        'Min': 'min',
-        'Shape': 'shape',
-        'And': '_and',
-        'Or': '_or',
-        'Xor': '_xor',
-        'Not': '_not',
-        'Neg': 'negative',
-        'Reciprocal': 'reciprocal',
-        'ConstantOfShape': 'ConstantOfShape',
-        'Dropout': 'Dropout',
-        'ReduceSum': 'ReduceSum',
-        'ReduceMean': 'ReduceMean',
-        'LeakyRelu': 'LeakyRelu',
-        'GlobalAveragePool': 'GlobalAveragePool',
-        'Squeeze': 'Squeeze',
+        # common op
+        'Relu': 'ReLU',
+        'Sigmoid': 'Sigmoid',
+        'Add': 'Add',
+        'MatMul': 'Matmul',
+        'Sum': 'Sum',
+        'Cos': 'Cos',
+        'Cosh': 'Cosh',
+        'Sin': 'Sin',
+        'Sinh': 'Sinh',
+        'Tan': 'Tan',
+        'Tanh': 'Tanh',
+        'Acos': 'Acos',
+        'Acosh': 'Acosh',
+        'Asin': 'Asin',
+        'Asinh': 'Asinh',
+        'Atan': 'Atan',
+        'Atanh': 'Atanh',
+        'Equal': 'Equal',
+        'Less': 'Less',
+        'Sign': 'Sign',
+        'Div': 'Div',
+        'Sub': 'Sub',
+        'Sqrt': 'Sqrt',
+        'Log': 'Log',
+        'Greater': 'Greater',
+        'Identity': 'Identity',
+        'Softplus': 'SoftPlus',
+        'Softsign': 'SoftSign',
+        'Mean': 'Mean',
+        'Pow': 'Pow',
+        'PRelu': 'PRelu',
+        'Mul': 'Mul',
+        'Max': 'Max',
+        'Min': 'Min',
+        'Shape': 'Shape',
+        'And': 'And',
+        'Or': 'Or',
+        'Xor': 'Xor',
+        'Not': 'Not',
+        'Neg': 'Negative',
+        'Reciprocal': 'Reciprocal',
         'Unsqueeze': 'Unsqueeze',
-        'Slice': 'Slice',
+        'NonZero': 'NonZero',
         'Ceil': 'Ceil',
-        'Split': 'Split',
-        'Gather': 'Gather',
-        'Tile': 'Tile',
-        'NonZero': 'nonzero',
+        # special op
         'Cast': 'Cast',
+        'Split': 'Split',
+        'Squeeze': 'Squeeze',
+        'GlobalAveragePool': 'GlobalAveragePool',
+        'LeakyRelu': 'LeakyRelu',
+        'ReduceSum': 'ReduceSum',
+        'ReduceMean': 'ReduceMean',
+        'Dropout': 'Dropout',
+        'ConstantOfShape': 'ConstantOfShape',
+        'Transpose': 'Transpose',
+        'HardSigmoid': 'HardSigmoid',
+        'Elu': 'Elu',
+        'Selu': 'SeLU',
+        'Concat': 'Concat',
+        'Softmax': 'SoftMax',
+        'Flatten': 'Flatten',
         'OneHot': 'OneHot',
+        'Tile': 'Tile',
+        'Gather': 'Gather',
+        'Reshape': 'Reshape',
+        'Slice': 'Slice',
+        'Clip': 'Clip',
+        'Gemm': 'layer.Gemm',  # layer
+        'BatchNormalization': 'layer.BatchNorm2d',  # layer
+        'Conv': 'layer.Conv2d',  # layer
+        'MaxPool': 'layer.Pooling2d',  # layer
+        'AveragePool': 'layer.Pooling2d',  # layer
     }
 
     # this dict indicates the operators that need extra handle
     # each indicates a function name
     _special_operators = {
-        'Conv': '_create_conv',
-        'MaxPool': '_create_max_avg_pool',
-        'AveragePool': '_create_max_avg_pool',
-        'BatchNormalization': '_create_batchnorm',
+        'Cast': '_create_cast',
+        'Split': '_create_split',
+        'Squeeze': '_create_squeeze_unsqueeze',
+        'Unsqueeze': '_create_squeeze_unsqueeze',
+        'GlobalAveragePool': '_create_global_average_pool',
+        'LeakyRelu': '_create_leakyrelu',
+        'ReduceSum': '_create_reduce_ops',
+        'ReduceMean': '_create_reduce_ops',
+        'Dropout': '_create_dropout',
+        'ConstantOfShape': '_create_constant_of_shape',
+        'Transpose': '_create_transpose',
+        'HardSigmoid': '_create_hardsigmoid',
+        'Elu': '_create_elu',
+        'Selu': '_create_selu',
         'Concat': '_create_concat',
-        'Flatten': '_create_flatten',
+        'Softmax': '_create_softmax',
         'Gemm': '_create_gemm',
+        'Flatten': '_create_flatten',
+        'OneHot': '_create_onehot',
+        'Tile': '_create_tile',
+        'Gather': '_create_gather',
         'Reshape': '_create_reshape',
-        'Softmax': '_create_softmax',
-        'Selu': '_create_selu',
-        'Elu': '_create_elu',
-        'HardSigmoid': '_create_hardsigmoid',
-        'Clip': '_create_clip',
-        'Transpose': '_create_transpose',
-        'ConstantOfShape': '_create_constantOfShape',
-        'Dropout': '_create_dropout',
-        'ReduceSum': '_create_reduceOp',
-        'ReduceMean': '_create_reduceOp',
-        'LeakyRelu': '_create_leakyrelu',
-        'GlobalAveragePool': '_create_globalaveragepool',
-        'Squeeze': '_create_squeeze',
-        'Unsqueeze': '_create_squeeze',
         'Slice': '_create_slice',
-        'Split': '_create_split',
-        'Gather': '_create_gather',
-        'Tile': '_create_tile',
-        'Cast': '_create_cast',
-        'OneHot': '_create_onehot',
-        'Constant': "_create_constant"
+        'Clip': '_create_clip',
+        'BatchNormalization': '_create_batch_norm',
+        'Conv': '_create_conv',
+        'MaxPool': '_create_max_avg_pool',
+        'AveragePool': '_create_max_avg_pool',
     }
 
     @classmethod
-    def _create_constant(cls, onnx_node, inputs, opset_version):
-        """
-        parse onnx constatn node to weights
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        tmp_tensor = onnx_node.getattr('value')
-        np_dtype = onnx.mapping.TENSOR_TYPE_TO_NP_TYPE[tmp_tensor.data_type]
-        np_tensor = np.frombuffer(tmp_tensor.raw_data, dtype=np_dtype)
-        if np_tensor.dtype == "int64":
-            np_tensor = np_tensor.astype(np.int32)
-        # todo, we cannot support scalar tensor
-        if np.ndim(np_tensor) == 0:
-            np_tensor = np.array(np_tensor, ndmin=1)
-        return None, np_tensor
-
-    @classmethod
-    def _create_onehot(cls, onnx_node, inputs, opset_version):
-        """
-        get the OneHot operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        axis = onnx_node.getattr("axis", -1)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        depth = tensor.to_numpy(inputs.pop(1)).astype(np.int32)
-        value = tensor.to_numpy(inputs.pop(1))
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, depth, value)
-
-    @classmethod
-    def _create_cast(cls, onnx_node, inputs, opset_version):
+    def _create_cast(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Cast operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        to = onnx_node.getattr("to")
-        # singa only supports float32 and int32
-        map_dict = {
-            TensorProto.FLOAT: tensor.float32,  # FLOAT to float32
-            TensorProto.UINT8: None,  # UINT8
-            TensorProto.INT8: tensor.int32,  # INT8 to int32
-            TensorProto.UINT16: None,  # UINT16
-            TensorProto.INT16: tensor.int32,  # INT16 to int32
-            TensorProto.INT32: tensor.int32,  # INT32 to int32
-            TensorProto.INT64: tensor.int32,  # INT64 to int32
-            TensorProto.STRING: None,  # stirng
-            TensorProto.BOOL: None,  # bool
-        }
-        to = map_dict[to]
-        assert to != None, "not support cast type: {}".format(to)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(to)
-
-    @classmethod
-    def _create_tile(cls, onnx_node, inputs, opset_version):
-        """
-        get the Tile operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        repeats = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(repeats)
-
-    @classmethod
-    def _create_gather(cls, onnx_node, inputs, opset_version):
-        """
-        get the Gather operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        axis = onnx_node.getattr("axis", 0)
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        indices = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, indices)
+        to_type = onnx_type_to_singa_type(onnx_node.getattr("to"))
+        assert to_type is not None, "unsupported cast type: {}".format(to_type)
+        if to_type == np.dtype('float32'):
+            return operator(tensor.float32)
+        else:
+            return operator(tensor.int32)
 
     @classmethod
-    def _create_split(cls, onnx_node, inputs, opset_version):
+    def _create_split(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Split operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axis = onnx_node.getattr("axis", 0)
         split = onnx_node.getattr("split", None)
         num_output = len(onnx_node.outputs)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axis, split, num_output)
-
-    @classmethod
-    def _create_slice(cls, onnx_node, inputs, opset_version):
-        """
-        get the Slice operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        # we move several inputs to singa's attribuates
-        # and mark them so we don't use them when we run this operator
-        starts = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        ends = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        # sometime onnx may ignore these two inputs, axes and step
-        if len(inputs) >= 2 and onnx_node.inputs[3] != '':
-            axes = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        else:
-            axes = None
-        steps = tensor.to_numpy(inputs.pop(1)).astype(
-            np.int32).tolist() if len(inputs) >= 2 else None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(starts, ends, axes, steps)
+        return operator(axis, split, num_output)
 
     @classmethod
-    def _create_squeeze(cls, onnx_node, inputs, opset_version):
+    def _create_squeeze_unsqueeze(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the Squeeze and Unsqueeze operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes")
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes)
+        return operator(axes)
 
     @classmethod
-    def _create_globalaveragepool(cls, onnx_node, inputs, opset_version):
+    def _create_global_average_pool(cls,
+                                    onnx_node,
+                                    operator,
+                                    opset_version=_opset_version):
         """
         get the GlobalAveragePool operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         data_format = onnx_node.getattr("data_format", 'channels_first')
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(data_format)
+        return operator(data_format)
 
     @classmethod
-    def _create_leakyrelu(cls, onnx_node, inputs, opset_version):
+    def _create_leakyrelu(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the LeakyRelu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.01)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_reduceOp(cls, onnx_node, inputs, opset_version):
+    def _create_reduce_ops(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
         """
         get the ReduceSum, ReduceMean, ReduceMax, ReduceMin, etc, operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         axes = onnx_node.getattr("axes", None)
         keepdims = onnx_node.getattr("keepdims", 1)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(axes, keepdims)
+        return operator(axes, keepdims)
 
     @classmethod
-    def _create_dropout(cls, onnx_node, inputs, opset_version):
+    def _create_dropout(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the Dropout operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         ratio = onnx_node.getattr("ratio", 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(ratio)
+        return operator(ratio)
 
     @classmethod
-    def _create_constantOfShape(cls, onnx_node, inputs, opset_version):
+    def _create_constant_of_shape(cls,
+                                  onnx_node,
+                                  operator,
+                                  opset_version=_opset_version):
         """
         get the ConstantOfShape operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         value = onnx_node.getattr("value", 0)
         if isinstance(value, onnx.TensorProto):
             value = numpy_helper.to_array(value)[0].item()
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(value)
+        return operator(value)
 
     @classmethod
-    def _create_transpose(cls, onnx_node, inputs, opset_version):
+    def _create_transpose(cls,
+                          onnx_node,
+                          operator,
+                          opset_version=_opset_version):
         """
         get the Transpose operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
-        """
-        shape = inputs[0].shape
-        perm = onnx_node.getattr("perm", list(range(len(shape) - 1, -1, -1)))
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(perm)
-
-    @classmethod
-    def _create_clip(cls, onnx_node, inputs, opset_version):
-        """
-        get the clip operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        # sometime onnx may ignore these two inputs, min or max or both
-        if len(inputs) >= 2 and onnx_node.inputs[1] != '':
-            min_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            min_v = None
-        if len(inputs) >= 2 and onnx_node.inputs[2] != '':
-            max_v = tensor.to_numpy(inputs.pop(1)).tolist()[0]
-        else:
-            max_v = None
-        onnx_node.consumed_inputs.extend(onnx_node.inputs[1:])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(min_v, max_v)
+        perm = onnx_node.getattr("perm")
+        return operator(perm)
 
     @classmethod
-    def _create_hardsigmoid(cls, onnx_node, inputs, opset_version):
+    def _create_hardsigmoid(cls,
+                            onnx_node,
+                            operator,
+                            opset_version=_opset_version):
         """
-        get the HardSigmoid operator from onnx node
+        get the hardsigmoid operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 0.2)
         beta = onnx_node.getattr("beta", 0.5)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, beta)
+        return operator(alpha, beta)
 
     @classmethod
-    def _create_elu(cls, onnx_node, inputs, opset_version):
+    def _create_elu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the elu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha)
+        return operator(alpha)
 
     @classmethod
-    def _create_selu(cls, onnx_node, inputs, opset_version):
+    def _create_selu(cls, onnx_node, operator, opset_version=_opset_version):
         """
         get the selu operator from onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
         alpha = onnx_node.getattr("alpha", 1.67326)
         gamma = onnx_node.getattr("gamma", 1.0507)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(alpha, gamma)
+        return operator(alpha, gamma)
 
     @classmethod
-    def _create_reshape(cls, onnx_node, inputs, opset_version):
+    def _create_concat(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the reshape operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the concat operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
-        Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        shape = tensor.to_numpy(inputs.pop(1)).astype(np.int32).tolist()
-        onnx_node.consumed_inputs.append(onnx_node.inputs[1])
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(shape)
+        factor = onnx_node.getattr('axis')
+        return operator(axis=factor)
 
     @classmethod
-    def _create_conv(cls, onnx_node, inputs, opset_version):
+    def _create_softmax(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the conv operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the softmax operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            handle, the handle of singa operator
-        Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support dilation
-        dilation = onnx_node.getattr('dilations', 1)
-        if dilation != 1 and list(dilation) != [1, 1]:
-            raise ValueError("Not implemented yet for dilation")
-        group = onnx_node.getattr('group', 1)
-
-        # only support 1d or 2d
-        if len(kernel) > 2:
-            raise ValueError("Only implemented for 1d or 2d")
-
-        bias = len(inputs) == 3
-        x = inputs[0]
-        x_shape = inputs[0].shape
-        in_channels = x_shape[1]
-        w_shape = inputs[1].shape
-        out_channels = w_shape[0]
-        assert w_shape[1] == in_channels // group
-
-        if inputs[0].device.id() == -1:
-            if group != 1:
-                raise NotImplementedError
-            else:
-                handle = singa.ConvHandle(x.data, kernel, stride, padding,
-                                          in_channels, out_channels, bias,
-                                          group)
-        else:
-            handle = singa.CudnnConvHandle(x.data, kernel, stride, padding,
-                                           in_channels, out_channels, bias,
-                                           group)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_max_avg_pool(cls, onnx_node, inputs, opset_version):
+    def _create_gemm(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the max or avg pool operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the gemm operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            handle, the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            forward, the autograd of singa operator
+            singa operator instance
         """
-        kernel = tuple(onnx_node.attrs["kernel_shape"])
-        padding = tuple(
-            onnx_node.attrs["pads"]) if "pads" in onnx_node.attrs else (0, 0)
-        stride = tuple(onnx_node.getattr('strides', (1, 1)))
-        # default the odd_padding is 0, once there are same pad mode, we modify it
-        # for odd_padding, please refer the autegrade.py
-        odd_padding = (0, 0, 0, 0)
-        if "auto_pad" in onnx_node.attrs:
-            auto_pad = utils.force_unicode(onnx_node.attrs['auto_pad'])
-            if auto_pad in ('SAME_UPPER', 'SAME_LOWER'):
-                padding, odd_padding = utils.get_padding_shape(
-                    auto_pad, inputs[0].shape[2:], kernel, stride)
-
-        # not support count_include_pad and auto_pad
-        if "count_include_pad" in onnx_node.attrs or "ceil_mode" in onnx_node.attrs:
-            raise ValueError(
-                "Not implemented yet for count_include_pad or ceil_mode")
-
-        # only support 2d
-        if len(kernel) != 2:
-            raise ValueError("Not implemented yet")
-
-        is_max = onnx_node.op_type == 'MaxPool'
-        x = inputs[0]
-        if x.device.id() == -1:
-            handle = singa.PoolingHandle(x.data, kernel, stride, padding,
-                                         is_max)
-        else:
-            handle = singa.CudnnPoolingHandle(x.data, kernel, stride, padding,
-                                              is_max)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return _, forward(handle, odd_padding)
+        alpha = onnx_node.getattr('alpha', 1.)
+        beta = onnx_node.getattr('beta', 1.)
+        transA = onnx_node.getattr('transA', 0)
+        transB = onnx_node.getattr('transB', 0)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        alpha=alpha,
+                        beta=beta,
+                        transA=transA,
+                        transB=transB,
+                        bias=bias)
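+    # Illustrative note: ONNX Gemm computes Y = alpha * A' * B' + beta * C,
+    # where A' is A transposed when transA == 1 (likewise B' for transB); the
+    # optional third input C is registered above as the bias weight 'b'.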
 
     @classmethod
-    def _create_batchnorm(cls, onnx_node, inputs, opset_version):
+    def _create_flatten(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the batch norm operator from onnx node
-        Args:onnx_node: a given onnx node
-        Args:inputs: the input tensor
-        Args:opset_version: the opset version
-        Returns: the handle of singa operator
-        Returns: the autograd of singa operator
+        get the flatten operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
         """
-        x = inputs[0]
-        factor = onnx_node.getattr('momentum', 0.9)
-        if x.device.id() == -1:
-            handle = singa.BatchNormHandle(factor, x.data)
-        else:
-            handle = singa.CudnnBatchNormHandle(factor, x.data)
-
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return handle, forward
+        factor = onnx_node.getattr('axis', 1)
+        return operator(axis=factor)
 
     @classmethod
-    def _create_concat(cls, onnx_node, inputs, opset_version):
+    def _create_onehot(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the OneHot operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.attrs["axis"]
-        if factor < 0:
-            factor = len(inputs[0].shape
-                        ) + factor  # in order to support the negative axis
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        axis = onnx_node.getattr("axis", -1)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'depth')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'values')
+        return operator(axis, None, None)
 
     @classmethod
-    def _create_softmax(cls, onnx_node, inputs, opset_version):
+    def _create_tile(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the concat operator from onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
+        get the Tile operator from onnx node
         Args:
-            opset_version: the opset version
-        Returns: 
-            the handle of singa operator
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'repeats')
+        return operator(None)
 
     @classmethod
-    def _create_gemm(cls, onnx_node, inputs, opset_version):
+    def _create_gather(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the gemm operator from onnx node
-        Args:
-            onnx_node: a given onnx node
+        get the Gather operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        axis = onnx_node.getattr("axis", 0)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'indices')
+        return operator(axis, None)
+
+    @classmethod
+    def _create_reshape(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the reshape operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'shape')
+        return operator(None)
+
+    @classmethod
+    def _create_slice(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the Slice operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        x = inputs[0]
-        alpha = onnx_node.getattr('alpha', 1.)
-        beta = onnx_node.getattr('beta', 1.)
-        transA = onnx_node.getattr('transA', 0)
-        transB = onnx_node.getattr('transB', 0)
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(alpha=alpha,
-                             beta=beta,
-                             transA=transA,
-                             transB=transB)
+        onnx_node.set_attr_inputs(onnx_node.inputs[1], 'starts')
+        onnx_node.set_attr_inputs(onnx_node.inputs[2], 'ends')
+        if len(onnx_node.inputs) >= 4 and onnx_node.inputs[3] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[3], 'axes')
+        if len(onnx_node.inputs) == 5 and onnx_node.inputs[4] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[4], 'steps')
+        return operator(None, None, None, None)
 
     @classmethod
-    def _create_flatten(cls, onnx_node, inputs, opset_version):
+    def _create_clip(cls, onnx_node, operator, opset_version=_opset_version):
         """
-        get the flatten operator from onnx node
+        get the clip operator from onnx node
         Args:
-            onnx_node: a given onnx node
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        if len(onnx_node.inputs) >= 2 and onnx_node.inputs[1] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[1], 'min')
+        if len(onnx_node.inputs) == 3 and onnx_node.inputs[2] != '':
+            onnx_node.set_attr_inputs(onnx_node.inputs[2], 'max')
+        return operator(None, None)
+
+    @classmethod
+    def _create_batch_norm(cls,
+                           onnx_node,
+                           operator,
+                           opset_version=_opset_version):
+        """
+        get the batch norm operator from onnx node
         Args:
-            inputs: the input tensor
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
+        Returns: 
+            singa operator instance
+        """
+        factor = onnx_node.getattr('momentum', 0.9)
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'scale')
+        onnx_node.set_weight_inputs(onnx_node.inputs[2], 'bias')
+        onnx_node.set_weight_inputs(onnx_node.inputs[3], 'running_mean')
+        onnx_node.set_weight_inputs(onnx_node.inputs[4], 'running_var')
+        return operator(factor)
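+    # Illustrative note: the ONNX BatchNormalization inputs are ordered as
+    # (X, scale, B, mean, var), which is why inputs[1..4] above are registered
+    # as the scale, bias, running_mean and running_var weights.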
+
+    @classmethod
+    def _create_conv(cls, onnx_node, operator, opset_version=_opset_version):
+        """
+        get the conv operator from onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the handle of singa operator
+            singa operator instance
+        """
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
+
+        # not support dilation
+        dilation = onnx_node.getattr('dilations', 1)
+        if dilation != 1 and list(dilation) != [1, 1]:
+            raise ValueError("Not implemented yet for dilation")
+        group = onnx_node.getattr('group', 1)
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        onnx_node.set_weight_inputs(onnx_node.inputs[1], 'W')
+        bias = False
+        if len(onnx_node.inputs) == 3:
+            onnx_node.set_weight_inputs(onnx_node.inputs[2], 'b')
+            bias = True
+        return operator(None,
+                        kernel_size,
+                        stride=stride,
+                        padding=padding,
+                        dilation=dilation,
+                        group=group,
+                        bias=bias,
+                        pad_mode=auto_pad)
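+    # Note: the first positional argument (nb_kernels) is passed as None here;
+    # it is filled in later from the weight tensor's shape (see
+    # handle_special_ops in SingaRep below).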
+
+    @classmethod
+    def _create_max_avg_pool(cls,
+                             onnx_node,
+                             operator,
+                             opset_version=_opset_version):
+        """
+        get the max or average pooling operator from onnx node
+        Args:
+            onnx_node (OnnxNode): a given onnx node
+            operator (Operator Class): a singa operator class
+            opset_version (int): the opset version
         Returns: 
-            the autograd of singa operator
+            singa operator instance
         """
-        factor = onnx_node.getattr('axis', 1)
-        if factor < 0:
-            # in order to support the negative axis
-            factor = len(inputs[0].shape) + factor
+        kernel_size = tuple(onnx_node.getattr('kernel_shape'))
+        padding = tuple(onnx_node.getattr('pads', (0, 0)))
+        stride = tuple(onnx_node.getattr('strides', (1, 1)))
+        auto_pad = utils.force_unicode(onnx_node.getattr('auto_pad', 'NOTSET'))
 
-        _, forward = cls._common_onnx_node_to_singa_op(onnx_node, inputs,
-                                                       opset_version)
-        return None, forward(axis=factor)
+        # count_include_pad and ceil_mode are not supported yet
+        ceil_mode = onnx_node.getattr('ceil_mode', 0)
+        count_include_pad = onnx_node.getattr('count_include_pad', 0)
+        if ceil_mode != 0 or count_include_pad != 0:
+            raise ValueError(
+                "Not implemented yet for count_include_pad or ceil_mode")
+
+        # only support 1d or 2d
+        if len(kernel_size) > 2:
+            raise ValueError("Only implemented for 1d or 2d")
+
+        is_max = onnx_node.op_type == 'MaxPool'
+        return operator(kernel_size, stride, padding, is_max, auto_pad)
 
     @classmethod
-    def _common_onnx_node_to_singa_op(cls, onnx_node, inputs, opset_version):
+    def _onnx_constant_to_np(cls, onnx_node, opset_version):
         """
-        get a common singa operator(only autograd) from a onnx node
-        other special operators also can call this func to get autograd
-        Args:
-            onnx_node: a given onnx node
+        parse onnx constant node to numpy array
         Args:
-            tensor_map: the input tensor
-        Args:
-            opset_version: the opset version
-        Returns: 
-            a dict of tensors
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            a numpy ndarray
         """
-        onnx_op_type = onnx_node.op_type
-        assert onnx_op_type in cls._rename_operators, "not support operator: {}".format(
-            onnx_op_type)
-        autograd_op = getattr(autograd, cls._rename_operators[onnx_op_type])
-        return None, autograd_op
+        onnx_tensor = onnx_node.getattr('value')
+        np_dtype = mapping.TENSOR_TYPE_TO_NP_TYPE[onnx_tensor.data_type]
+        np_tensor = np.frombuffer(onnx_tensor.raw_data, dtype=np_dtype)
+        return tensor.from_numpy(np_tensor)
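+    # Rough usage sketch (illustrative only): a Constant node carries its value
+    # as a TensorProto attribute whose raw bytes are decoded above.
+    #
+    #   import numpy as np
+    #   from onnx import helper, TensorProto
+    #   raw = np.array([1.0, 2.0], dtype=np.float32).tobytes()
+    #   node = helper.make_node(
+    #       'Constant', inputs=[], outputs=['c'],
+    #       value=helper.make_tensor('c', TensorProto.FLOAT, [2], raw, raw=True))
+    #   c = SingaBackend._onnx_constant_to_np(OnnxNode(node), opset_version=11)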
 
     @classmethod
-    def _onnx_node_to_singa_op(cls,
-                               onnx_node,
-                               inputs,
-                               opset_version=_known_opset_version):
+    def _onnx_node_to_singa_op(cls, onnx_node, opset_version=_opset_version):
         """
-        get a singa operator(handle and autograd) from a onnx node
-        Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input list
+        get the singa operator from an onnx node
         Args:
-            opset_version: the opset version
+            onnx_node (OnnxNode): a given onnx node
+            opset_version (int): the opset version
         Returns: 
-            a dict of tensors
-        Returns: 
-            a list of SingaOps('name', 'op', 'handle', 'forward')
+            singa operator instance
         """
+        onnx_op_type = onnx_node.op_type
+        assert onnx_op_type in cls._rename_operators, "unsupported operator: {}".format(
+            onnx_op_type)
+        renamed_op = cls._rename_operators[onnx_op_type]
+        if renamed_op.startswith('layer.'):
+            op_class = getattr(layer, renamed_op[6:])
+        else:
+            op_class = getattr(autograd, renamed_op)
         if onnx_node.op_type in cls._special_operators:
             translator = getattr(cls, cls._special_operators[onnx_node.op_type])
+            op = translator(onnx_node, op_class, opset_version)
         else:
-            translator = cls._common_onnx_node_to_singa_op
-        return translator(onnx_node, inputs, opset_version)
+            op = op_class()
+        # refine the ONNXNode
+        onnx_node.inputs = [inp for inp in onnx_node.inputs if inp != '']
+        return op
 
     @classmethod
-    def run_node(cls, onnx_node, inputs, opset_version=_known_opset_version):
+    def run_node(cls, node, inputs, device='CPU', opset_version=_opset_version):
         """
         run a single singa operator from a onnx node
         Args:
-            onnx_node: a given onnx node
-        Args:
-            inputs: the input tensor
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the 
+            node (NodeProto): a given onnx node
+            inputs (ndarray[]): a list of numpy ndarray
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
+        Returns:
+            list, the output
         """
-        valid_inputs = [x for x in onnx_node.inputs if x != ""]
+        node = OnnxNode(node)
+        valid_inputs = [x for x in node.inputs if x != ""]
         assert len(valid_inputs) == len(
-            inputs), "{}: expected {} but got {}".format(
-                onnx_node.op_type, len(valid_inputs), len(inputs))
-
-        tmp_inputs = [inputs[x] for x in onnx_node.inputs if x != ""]
-        handle, forward = cls._onnx_node_to_singa_op(onnx_node, tmp_inputs,
-                                                     opset_version)
-        # only give the inputs it needs
-        # consumed_inputs are the inputs marked as attributes
-        # so we remove it here
-        tmp_inputs = [
-            inputs[x]
-            for x in onnx_node.inputs
-            if x not in onnx_node.consumed_inputs
-        ]
-        return cls._run_node(onnx_node, tmp_inputs, handle, forward,
-                             opset_version)
+            inputs), "{}: expected {} inputs, but got {}. ".format(
+                node.op_type, len(valid_inputs), len(inputs))
+
+        operator = cls._onnx_node_to_singa_op(node, opset_version)
+        # separate the weights from the inputs, and init the inputs as Tensors
+        weights = {}
+        _inputs = []
+        for (key, val) in zip(valid_inputs, inputs):
+            val = val.astype(onnx_type_to_singa_type(val.dtype))
+            if key in node.weight_inputs:
+                weights[key] = val
+            else:
+                x = tensor.from_numpy(val)
+                if device == 'CPU':
+                    dev = cpu_dev
+                else:
+                    assert singa.USE_CUDA, "Your SINGA is not compiled with the GPU module."
+                    dev = gpu_dev
+                x.to_device(dev)
+                _inputs.append(x)
+        inputs = _inputs
+        # set params
+        params = {}
+        for key, name in node.weight_inputs.items():
+            params[name] = weights[key]
+        operator.set_params(params)
+        outputs = cls._run_node(operator, inputs)
+        outputs_dict = OrderedDict()
+        for (key, val) in zip(node.outputs, outputs):
+            outputs_dict[key] = val
+        return outputs_dict
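+    # Rough usage sketch (illustrative only): run a single ONNX node through
+    # its SINGA counterpart on CPU.
+    #
+    #   import numpy as np
+    #   from onnx import helper
+    #   node = helper.make_node('Relu', inputs=['x'], outputs=['y'])
+    #   x = np.random.randn(2, 3).astype(np.float32)
+    #   y = SingaBackend.run_node(node, [x], device='CPU')['y']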
 
     @classmethod
-    def _run_node(cls,
-                  onnx_node,
-                  inputs,
-                  handle,
-                  forward,
-                  opset_version=_known_opset_version):
+    def _run_node(cls, operator, inputs):
         """
-        run a single singa operator from a onnx node
-        Args:inputs: 
-            the input tensor
-        Args:handle: 
-            the handle of singa operator
-        Args:forward: 
-            the forward of singa operator
+        run a single singa operator with the given inputs
         Args:
-            opset_version: the opset version
-        Returns: 
-            list, the output of the
+            operator (Operator): the Operator instance
+            inputs (Tensor[]): a list of SINGA Tensor
+        Returns:
+            list, the output
         """
-        outputs = forward(*inputs) if handle is None else forward(
-            handle, *inputs)
+        outputs = operator(*inputs)
         if not isinstance(outputs, collections.Iterable):
             outputs = [outputs]
-        outputs_dict = OrderedDict()
-        for (key, val) in zip(onnx_node.outputs, outputs):
-            outputs_dict[key] = val
-        return outputs_dict
+        return outputs
 
     @classmethod
-    def _init_graph_parameter(cls, graph, init_inputs, device):
+    def _parse_graph_params(cls, graph, device):
         """
-        init the singa tensor from onnx infos
+        parse the parameters from onnx graph
         Args:
-            graph: a given onnx graph
-        Args:
-            init_inputs: a list of inputs, which used to init the operators
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
+        Returns:
+            a dict of numpy ndarray
+        """
+        params = {}
+        for tp in graph.initializer:
+            val = numpy_helper.to_array(tp)
+            val = val.astype(onnx_type_to_singa_type(tp.data_type))
+            params[tp.name] = val
+        return params
+
+    @classmethod
+    def _parse_graph_inputs_outputs(cls, graph, params, device):
+        """
+        parse the inputs and outputs from the onnx graph
         Args:
-            device: the used device
+            graph (Graph): a given onnx graph
+            device (string): CPU or CUDA
         Returns:
-            a dict of tensors
+            a list of input info tuples (name, dtype, shape)
+            a list of output info tuples (name, dtype, shape)
         """
-        tensor_map = {}
-        # due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info
-        # sometimes, may not
-        all_inputs = OrderedDict()
+        inputs = []
+        outputs = []
+        info_tuple = namedtuple('info_tuple', ['name', 'dtype', 'shape'])
         for t in graph.input:
-            all_inputs[t.name] = t
-        # so we refresh the input by the initializer
-        for t in graph.initializer:
-            all_inputs[t.name] = t
-        initializers = {t.name for t in graph.initializer}
-        inp_idx = 0
-        for name, x in all_inputs.items():
-            if name in initializers:
-                # if it has initializer, we use its value as the input
-                np_tensor = numpy_helper.to_array(x)
-                if np_tensor.dtype == "int64":
-                    np_tensor = np_tensor.astype(np.int32)
-                # todo, we cannot support scalar tensor
-                if np.ndim(np_tensor) == 0:
-                    np_tensor = np.array(np_tensor, ndmin=1)
-            else:
-                # if not, means it's a input rather than a inner weight
-                # so if the user gives values, we use these values
-                # if not, we just use the shape of input gived by onnx to init a random value
-                # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-                # so if have operators, the user must give inputs
-                x_shape = tuple(
-                    dim.dim_value for dim in x.type.tensor_type.shape.dim)
-                if init_inputs is not None:
-                    np_tensor = init_inputs[inp_idx]
-                    inp_idx += 1
-                else:
-                    np_tensor = np.random.randn(*x_shape).astype(np.float32)
-            tmp_tensor = tensor.from_numpy(np_tensor)
-            tmp_tensor.to_device(device)
-            # todo, for backward
-            tmp_tensor.stores_grad = (name in initializers)
-            tensor_map[x.name] = tmp_tensor
-        return tensor_map
+            if t.name not in params:
+                dtype = t.type.tensor_type.elem_type
+                shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+                inputs.extend([info_tuple(t.name, dtype, shape)])
+        for t in graph.output:
+            dtype = t.type.tensor_type.elem_type
+            shape = [dim.dim_value for dim in t.type.tensor_type.shape.dim]
+            outputs.extend([info_tuple(t.name, dtype, shape)])
+        return inputs, outputs
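+    # Illustrative example: a graph input declared as
+    #
+    #   from onnx import helper, TensorProto
+    #   x_info = helper.make_tensor_value_info('x', TensorProto.FLOAT, [2, 3])
+    #
+    # is parsed above into info_tuple(name='x', dtype=TensorProto.FLOAT,
+    # shape=[2, 3]), as long as 'x' does not already appear in params.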
 
     @classmethod
-    def _onnx_model_to_singa_net(cls, model, init_inputs, device,
-                                 opset_version):
+    def _onnx_model_to_singa_ops(cls, graph, device, opset_version):
         """
-        get all intermediate tensors and operators from onnx model
-        Args:
-            model: a given onnx model
+        get all intermediate params, operators, and input info from onnx model
         Args:
-            init_inputs: a list of inputs, which used to init the operators
-        Args:
-            device: the used device
-        Args:
-            opset_version: the opset version
-        Returns:
-            a dict of tensors
+            graph (Graph): the loaded ONNX graph
+            device (string): CPU or CUDA
+            opset_version (int): the opset version
         Returns:
-            a list of SingaOps('name', 'op', 'handle', 'forward')
-        """
-        # init all tensor input and weight as a tensor map
-        tensor_map = cls._init_graph_parameter(model.graph, init_inputs, device)
-        # only weights tensor
-        weights = {x.name: tensor_map[x.name] for x in model.graph.initializer}
+            a dict of params (numpy ndarray)
+            a list of input info tuples
+            a list of output info tuples
+            a list of operator_tuple('node', 'operator')
+        """
+        # parse all params and the graph input/output info
+        params = cls._parse_graph_params(graph, device)
+        inputs, outputs = cls._parse_graph_inputs_outputs(graph, params, device)
         # the parsed operators queue
-        singa_ops = []
-        singa_op = namedtuple('SingaOps', ['name', 'op', 'handle', 'forward'])
-        for node in model.graph.node:
+        operators = []
+        operator_tuple = namedtuple('operator_tuple', ['node', 'operator'])
+        for node in graph.node:
             node = OnnxNode(node)
-            # only give the inputs it needs
-            # consumed_inputs are the inputs marked as attributes
-            # so we remove it here
-            inputs = [
-                tensor_map[x]
-                for x in node.inputs
-                if x not in node.consumed_inputs
-            ]
-            handle, forward = cls._onnx_node_to_singa_op(
-                node, inputs, opset_version)
-            # if it is Constant, we hanlde it as a weight
-            # otherwise, we run it and add its output into map for being used by later operators
+            # convert Constant to param
             if node.op_type == 'Constant':
-                tmp_tensor = tensor.from_numpy(forward)
-                tmp_tensor.to_device(device)
-                tmp_name = node.outputs.pop(0)
-                weights[tmp_name] = tmp_tensor
-                tensor_map[tmp_name] = tmp_tensor
+                params[node.outputs[0]] = cls._onnx_constant_to_np(
+                    node, opset_version)
             else:
-                outputs = cls._run_node(node, inputs, handle, forward)
-                for key, val in outputs.items():
-                    tensor_map[key] = val
-                singa_ops.extend([singa_op(node.name, node, handle, forward)])
-        return weights, singa_ops
+                op = cls._onnx_node_to_singa_op(node, opset_version)
+                operators.extend([operator_tuple(node, op)])
+        return params, inputs, outputs, operators
 
     @classmethod
-    def prepare(cls, model, device, **kwargs):
+    def prepare(cls, model, device='CPU', **kwargs):
         """
-        get the batch norm operator from onnx node
-        Args:
-            model: a given onnx node
+        parse the ONNX model and create the layers
         Args:
-            device: the used device
-        Returns: 
-            a list of output values
+            model (ModelProto): the loaded ONNX model
+            device (string): CPU or CUDA
+        Returns:
+            a SingaRep instance that stores the layers and weights
         """
         super(SingaBackend, cls).prepare(model, device, **kwargs)
-        # when parsing graph, we use the shape of input gived by onnx to init a random value
-        # HOWEVER, the random value may not be correct for some inputs, such as gather which needs indices
-        # so if have operators, the user must give inputs
-        init_inputs = kwargs.get("init_inputs", None)
-        # whether initializers are moved into inputs, due to https://github.com/onnx/onnx/issues/2417
-        # sometimes, input contains all initializer's info, sometimes, may not
-        cls.keep_initializers_as_inputs = kwargs.get(
-            'keep_initializers_as_inputs', True)
         # optimize and infer the shape of the model
         try:
             model = onnx.utils.polish_model(model)
         except IndexError as err:
-            # due to https://github.com/onnx/onnx/issues/2417
             model = onnx.shape_inference.infer_shapes(model)
 
         # check the opset version and ir version
+        # SINGA supports opset version(11), ir version(1.6.0 -> 6)
         opset_version = None
         for imp in model.opset_import:
             if not imp.HasField("domain") or imp.domain == "":
                 opset_version = imp.version
-                if imp.version > cls._known_opset_version:
+                if imp.version > cls._opset_version:
                     warnings.warn(
-                        "This version of singa targets ONNX operator set version {}, but the model we are trying to import uses version {}.  We will try to import it anyway, but if the model uses operators which had BC-breaking changes in the intervening versions, import will fail."
-                        .format(cls._known_opset_version, imp.version))
+                        "The imported opertor set verion {} is larger than the supported version {}."
+                        .format(imp.version, cls._opset_version))
             else:
                 warnings.warn("Unrecognized operator set {}".format(imp.domain))
-        if opset_version is None:
-            if model.ir_version >= 0x00000003:
-                raise RuntimeError(
-                    "Model with IR version >= 3 did not specify ONNX operator set version (singa requires it)"
-                )
-            else:
-                opset_version = 1
-        weights, singa_ops = cls._onnx_model_to_singa_net(
-            model, init_inputs, device, opset_version)
-        return SingaRep(model, weights, singa_ops,
-                        cls.keep_initializers_as_inputs)
+
+        if model.ir_version > cls._ir_version:
+            warnings.warn(
+                "The imported ir verion {} is larger than the supported version {}."
+                .format(cls._ir_version, imp.version))
+
+        graph = model.graph
+        params, inputs, outputs, layers = cls._onnx_model_to_singa_ops(
+            graph, device, opset_version)
+        return SingaRep(params, inputs, outputs, layers, device)
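+    # Rough usage sketch (file name is illustrative): load an ONNX model and
+    # run a forward pass through the resulting SingaRep.
+    #
+    #   import onnx
+    #   model = onnx.load('model.onnx')
+    #   rep = SingaBackend.prepare(model, device='CPU')
+    #   outputs = rep.run([x])  # x: a numpy array matching the graph input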
 
 
 class SingaRep(BackendRep):
 
-    def __init__(self,
-                 model,
-                 weights,
-                 singa_ops,
-                 keep_initializers_as_inputs=True):
+    def __init__(self, params, inputs, outputs, layers, device):
         """
+        https://github.com/onnx/onnx/blob/master/docs/ImplementingAnOnnxBackend.md
         SingaRep provides the intermediate representation of Singa,
         the user can run the forward of the singa model by run func,
         or, the user can append more layers after the singa_ops to do
         the transfer learning
         Args:
-            model: a given operator
+            params (dict{}): a dict of params, data type is numpy ndarray
+            inputs (list): a list of input info tuples (name, dtype, shape)
+            outputs (list): a list of output info tuples (name, dtype, shape)
+            layers (operator_tuple[]): a list of (node, operator) tuples
+            device (string): CPU or CUDA
+        """
+        super(SingaRep, self).__init__()
+        self.inputs = inputs
+        self.states = params
+        self.outputs = outputs
+        self.dev = cpu_dev if device == "CPU" else gpu_dev
+        self.layers = layers
+        self.tensor_count = {}
+        self.has_initialized = False
+        self.is_graph = False
+
+    def initialize(self):
+        """
+        Init the instance
+        """
+        self.outputs_info = {outp.name: outp for outp in self.outputs}
+        _layers = []  # layers by topo order
+        for node, operator in self.layers:
+            for key, name in node.weight_inputs.items():
+                if key not in self.states:
+                    # cannot find the weights, try to find it from input
+                    node.set_attr_inputs(key, name)
+            self.__dict__[node.name] = operator
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+            _layers.append(node)
+        self._layers = _layers
+
+    def init_tensor_count(self):
+        """
+        Init the tensor count dict
+        """
+        self.tensor_count = {}
+        for node, operator in self.layers:
+            # init the tensor count
+            all_possible_inputs = node.inputs + list(
+                node.attr_inputs.keys()) + list(node.weight_inputs.keys())
+            for inp in all_possible_inputs:
+                if inp not in self.tensor_count:
+                    self.tensor_count[inp] = 1
+                else:
+                    self.tensor_count[inp] += 1
+
+    def to_input_tensor(self, x):
+        """
+        convert the input to tensors
         Args:
-            weights: the tensor of weights
+            x (np.ndarray[]): a list of numpy ndarray as inputs
+        Returns: 
+            a dict of SINGA Tensors
+        """
+        tensor_dict = {}
+        # init inputs as Tensor
+        for (key, val) in zip(self.inputs, x):
+            if not self.is_graph:
+                val = val.astype(onnx_type_to_singa_type(key.dtype))
+                # todo, scalar
+                val = np.atleast_1d(val)
+                val = tensor.from_numpy(val)
+                val.to_device(self.dev)
+            tensor_dict[key.name] = val
+        return tensor_dict
+
+    def to_output_tensor(self, y, out_name):
+        """
+        convert the output tensor back to a numpy ndarray
         Args:
-            singa_ops: the tensor of the operator
+            y (Tensor): the output SINGA Tensor
+            out_name (str): the name of the output
+        Returns:
+            a numpy ndarray cast to the declared output dtype
         """
-        super(SingaRep, self).__init__()
-        self.model = model
-        self.tensor_map = weights
-        self.keep_initializers_as_inputs = keep_initializers_as_inputs
-        # this each item of singa_ops is: ('name', 'op', 'handle', 'forward')
-        # the name is a string, op is OnnxNode,
-        # handle is Singa handle to store the tensor into singa operator
-        # the forward is singa autograd operator
-        self.singa_ops = singa_ops
+        if not self.is_graph:
+            y = tensor.to_numpy(y)
+            if out_name in self.outputs_info:
+                np_dtyp = mapping.TENSOR_TYPE_TO_NP_TYPE[
+                    self.outputs_info[out_name].dtype]
+                y = y.astype(np_dtyp)
+        return y
 
-    def run(self, inputs, **kwargs):
+    def get_s(self, name, node, tensor_dict):
+        """
+        get state from the node's weights or tensor_dict
+        Args:
+            name (str): name of the state
+            node (ONNXNode): ONNX node
+            tensor_dict ({}): tensor dict
+        Returns: 
+            the states
+        """
+        if name in node.attr_inputs:
+            return tensor_dict[name]
+        else:
+            return self.states[name]
+
+    def handle_special_ops(self, node, op, tensor_dict):
+        """
+        handle some special operations which need extra shape information
+        Args:
+            node (ONNXNode): ONNX node
+            op (Operator): the singa operator created for the node
+            tensor_dict ({}): tensor dict
+        """
+        # todo, hard code
+        # Conv2d nb_kernels
+        if node.op_type == "Conv":
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[0]
+        # Gemm nb_kernels and bias_shape
+        elif node.op_type == "Gemm":
+            nb_kernels_flag = 0 if op.transB == 1 else 1
+            shape = self.get_s(node.inputs[1], node, tensor_dict).shape
+            op.nb_kernels = shape[nb_kernels_flag]
+            if op.bias:
+                shape = self.get_s(node.inputs[2], node, tensor_dict).shape
+                op.bias_shape = shape
+
+    def run(self, *x, **kwargs):
         """
         run the forward of singa model
         Args:
-            inputs: a given operator
+            x (np.ndarray[]): a list of numpy ndarray as inputs
         Returns: 
-            the onnx node
+            a list of outputs
         """
-        graph = self.model.graph
+        if not self.has_initialized:
+            self.initialize()
+            if isinstance(x[0], tensor.Tensor):
+                self.dev = x[0].device
+            self.has_initialized = True
+
+        outputs_dict = OrderedDict([(outp.name, None) for outp in self.outputs])
+
         # last_layers means we run this model until the last #N layers
-        last_layers = kwargs.get('last_layers', len(self.singa_ops))
-        if last_layers != len(self.singa_ops):
-            final_outputs = self.singa_ops[last_layers-1].op.outputs
-        else:
-            final_outputs =  [outp.name for outp in graph.output]
-        # whether return all outputs
-        all_outputs = kwargs.get('all_outputs', False)
-        # get a specific op by its name
-        op_name = kwargs.get('op_name', None)
-        # record the tensor we added from input
-        tmp_tensor_map = {name: val for name, val in self.tensor_map.items()}
-
-        # the dict will be returned
-        ret_outputs = OrderedDict()
-        if self.keep_initializers_as_inputs:
-            require_input_len = len(graph.input) - len(graph.initializer)
-            actual_input_len = len(inputs)
-        else:
-            require_input_len = len(graph.input)
-            actual_input_len = len(inputs)
-        assert require_input_len == actual_input_len, "The length of graph input is different from the tensor input: %d, %d" % (
-            require_input_len, actual_input_len)
-        # run the handle by the order of the list(the list is Topological Sorting)
-        for inp in graph.input:
-            if inp.name not in tmp_tensor_map:
-                tmp_tensor_map[inp.name] = inputs.pop(0)
-
-        for _, op, handle, forward in self.singa_ops[:last_layers]:
-            if len(op.consumed_inputs) != 0:
-                # because if op has consumed_inputs, it means it moved some inputs into attributes
-                # so when running, we should update these attributes
-                handle, forward = get_op(op,
-                                         [tmp_tensor_map[x] for x in op.inputs])
-            inputs = [
-                tmp_tensor_map[x]
-                for x in op.inputs
-                if x not in op.consumed_inputs
-            ]
-            outputs = _run_node(op, inputs, handle, forward)
-            for key, val in outputs.items():
-                tmp_tensor_map[key] = val
-                ret_outputs[key] = val
-
-        if op_name is not None:
-            if op_name in outputs:
-                return outputs[op_name]
+        last_layers = kwargs.get('last_layers', len(self._layers))
+        if last_layers != len(self._layers):
+            for outp in self._layers[last_layers - 1].outputs:
+                outputs_dict[outp] = None
+
+        aux_output = kwargs.get('aux_output', ())
+        for outp in aux_output:
+            outputs_dict[outp] = None
+
+        tensor_dict = self.to_input_tensor(x)
+        self.init_tensor_count()
+
+        # run the layer by the topo order
+        for node in self._layers[:last_layers]:
+            op = self.__dict__[node.name]
+            self.handle_special_ops(node, op, tensor_dict)
+            # make input
+            inputs = []
+            for inp in node.inputs:
+                if inp not in node.weight_inputs and inp not in node.attr_inputs:
+                    if inp in tensor_dict:
+                        inputs.append(tensor_dict[inp])
+                    elif inp in self.states:
+                        # todo, scalar
+                        val = np.atleast_1d(self.states[inp])
+                        val = tensor.from_numpy(val)
+                        val.to_device(self.dev)
+                        inputs.append(val)
+                    else:
+                        raise KeyError("Cannot find the input {} for operation {}".format(inp, node.name))
+            states = {}
+            if callable(getattr(op, "initialize",
+                                None)) and not op._initialized:
+                # init the operator
+                op.initialize(*inputs)
+                op._initialized = True
+                for key, name in node.weight_inputs.items():
+                    if key not in node.attr_inputs:
+                        # find the weights and not in the inputs
+                        states[name] = self.states[key]
+
+            # replace attrs by inputs
+            for key, name in node.attr_inputs.items():
+                if key in tensor_dict:
+                    ts = tensor_dict[key]
+                    if isinstance(ts, tensor.Tensor):
+                        ts = tensor.to_numpy(ts)
+                    states[name] = ts
+                elif key in self.states:
+                    states[name] = self.states[key]
+            # set states
+            if callable(getattr(op, "set_states", None)):
+                op.set_states(**states)
             else:
-                raise RuntimeError(
-                    "The op_name {} does not exist, please check. The available op_names are: {}"
-                    .format(op_name, [val for key, val in op_name.items()]))
-
-        # return all outputs if all_outputs==True
-        # else return last outputs
-        if all_outputs:
-            return ret_outputs
-        else:
-            return [ret_outputs[outp] for outp in final_outputs]
+                for key, value in states.items():
+                    setattr(op, key, value)
+            # run the node
+            outputs = _run_node(op, inputs)
+            # release the input tensor
+            for inp in node.inputs:
+                if inp in self.tensor_count:
+                    self.tensor_count[inp] -= 1
+                    if self.tensor_count[inp] == 0:
+                        if inp in tensor_dict:
+                            del tensor_dict[inp]
+                        del self.tensor_count[inp]
+            # store the output
+            for (outp, val) in zip(node.outputs, outputs):
+                tensor_dict[outp] = val
+                if outp in outputs_dict:
+                    outputs_dict[outp] = self.to_output_tensor(val, outp)
+        return list(outputs_dict.values())
+
+
+class SONNXModel(model.Model):
+
+    def __init__(self, onnx_model):
+        """
+        Init a SINGA Model
+        Args:
+            onnx_model (ModelProto): a loaded onnx model
+        """
+        super(SONNXModel, self).__init__()
+        self.sg_ir = prepare(onnx_model)
+        for node, operator in self.sg_ir.layers:
+            self.__dict__[node.name] = operator

Review comment:
       is each operator here a Layer instance?
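
       For illustration only, not part of the PR diff: a minimal sketch of how this could be checked, assuming prepare() is the module-level alias used by SONNXModel above and that singa.layer provides the Layer base class; the model path is hypothetical.

           import onnx
           from singa import layer
           from singa import sonnx

           # load an ONNX model and convert it into SINGA's intermediate representation
           model = onnx.load("model.onnx")  # hypothetical path
           sg_ir = sonnx.prepare(model)     # same call as SONNXModel.__init__ above

           # inspect each converted operator and report whether it is a Layer instance
           for node, operator in sg_ir.layers:
               print(node.name, type(operator).__name__,
                     isinstance(operator, layer.Layer))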




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639393194


   This pull request **fixes 1 alert** when merging 61e424e5813c51f555072d990242eee72c6b15cc into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-899d09ab0ce211a3ac6047eb0427b0113162f7a6)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] joddiy commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
joddiy commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639395227


   @nudles  ready to merge


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] nudles merged pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
nudles merged pull request #703:
URL: https://github.com/apache/singa/pull/703


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-639439480


   This pull request **fixes 1 alert** when merging c563ed3d80711f140b0ba50827b02f4af59dca45 into ede4a3ed0e29e4ef488e76e37f6c020c44508ea0 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-a3acaf88c7b20a71ff4465d80b42be413018ad17)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #703: Refactor sonnx, test cases and examples

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #703:
URL: https://github.com/apache/singa/pull/703#issuecomment-636429098


   This pull request **fixes 1 alert** when merging 44796e63f66f1010ac08ff2c7ba8e94160c490c4 into dd18aff58aafe29b2984c884fad7e453bfe2d507 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-dc8c82584e5ab00b4ee3971b553fc31e79c049db)
   
   **fixed alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org