Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2020/05/14 10:38:55 UTC

[GitHub] [singa] joddiy commented on issue #696: Refactor autograd module

joddiy commented on issue #696:
URL: https://github.com/apache/singa/issues/696#issuecomment-628548737


   > Shall we go with the following APIs?
   > @joddiy @dcslin @XJDKC
   > They should be compatible with the current APIs.
   > 
   > ```python
   > class Module:
   >     def compile(self, inputs, is_train, use_graph, graph_alg):
   >         set train, graph etc config
   >         turn off graph
   >         if inputs are not filled, print warnings and fill inputs according to data type.
   >         self.forward(*inputs)
   >     
   >      def load(self, ckp_path, include_state=False):
   >        load onnx model and copy the params to each layer; 
   >        generate warnings for mismatched layers/params.
   >        restore the states and return it as a dict
   >      
   >      def save(self, ckp_path, state={}):
   >        save the model as onnx format
   >        save the states
   >     
   >      def forward(self, x):    # turn on graph if necessary
   >         pass
   > 
   >      def train_one_batch(self, x, y):  # turn on graph if necessary
   >         pass   
   >    
   >      @deprecated 
   >      def loss(self, ):
   >         pass
   > 
   >       @deprecated 
   >       def optim(self,):
   >           pass      
   > 
   > 
   > class Layer:
   >     def __init__(self, name=None):
   >       self.init = False
   >       
   >     def __call__(self, x):
   >        if self.init == False:
   >            init layer states
   >        else:
   >           # do the forward propagation 
   > 
   > 
   > class MyLayer(Layer):
   >      def __init__(self):
   >           self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0, kernel_init='he_uniform') 
   >           self.layer2 = layer.MaxPool2d(kernel=3, stride=2)
   > 
   >       def forward(self, x):
   >           return self.layer2(self.layer1(x))
   > 
   > 
   > 
   > class MyModule(Module):
   >      def __init__(self):
   >            self.blk1 = MyLayer()
   >            self.blk2 = MyLayer()
   >            self.optim = SGD()
   >            self.loss = CrossEntropyLoss()
   > 
   >       def forward(self, x):
   >            return self.blk2(self.blk1(x))    
   > 
   >       def train_one_batch(self, x, y): 
   >            y_ = self.forward(x)
   >            l = self.loss(y_, y)
   >            self.optim.backward_and_update(l)
   >            return l
   > 
   > x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
   > fill x with values
   > m = MyModule()
   > 
   > # compatible with existing code which does not have the following two statements.
   > m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
   > for pname, ptensor in m.get_params():
   >     ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are configured.
   > 
   > y = Placeholder((2,), device = gpu)
   > for npx, npy in data:
   >    x.copy_from(npx)
   >    y.copy_from(npy)
   >    m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.
   > 
   > m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
   > ```
   
   This approach still postpones the operation initialization until the training phase, right? Once the user has a batch of samples, they call `train_one_batch`, which calls `forward`, which in turn calls `__call__`:
   ```py
   def __call__(self, x):
       if self.init == False:
           init layer states
   ```
   It is still strange to delay the graph initialization until the user already has the data.
   
   In my opinion, the current problems are:
   1. we don't have the shape of the input -> so we use a Placeholder as the input;
   2. even if we have the shape of the input data, we cannot compute the shapes of all the intermediate tensors, because we cannot call `forward` with a Placeholder -> we could fill it with random data, but that may incur errors.
   
   So the key point is that we bind the graph construction to the `forward` function: only when we call `forward` do we construct the graph, but to call `forward` we must have the real data.
   
   I am therefore thinking about separating the graph construction from the `forward` function. We define two classes, `Graph` and `Node`: the `Graph` stores the relationships between `Node`s, and each `Node` stores an `Operation` as well as its input and output.
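   
   For concreteness, here is a minimal sketch of what `Node` and `Graph` could look like (the class and field names are hypothetical, chosen only to match the `Operation.__call__` pseudocode below, not existing SINGA APIs):
   ```py
   class Endpoint:
       """One side of a Node: the neighbouring Node and the shape flowing through it."""
       def __init__(self):
           self.node = None    # the connected Node (or the input placeholder)
           self.shape = None   # filled in by infer_shape, no real data needed


   class Node:
       """Wraps one Operation together with its input/output links."""
       def __init__(self, operation=None):
           self.operation = operation
           self.input = Endpoint()
           self.output = Endpoint()


   class Graph:
       """Stores the relationship between Nodes, from the input placeholder to the output Node."""
       def __init__(self, input_x, output_node):
           self.input_x = input_x          # the Placeholder the graph was built from
           self.output_node = output_node  # the Node produced by the last Operation
   ```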
   
   In the `__call__` function of an `Operation`, we don't call the `forward` function; instead, we create a `Node`, store the operation itself within this `Node`, set its input and output, and then return the newly created `Node`, as in the following code:
   ```py
   class Operation(object):
       def __init__(self):
           pass
   
       def __call__(self, previous_node):  # the multiple-input case is similar
           # create a Node, link it with the previous node, and use infer_shape
           # to set the output shape of the current node -- no real data is needed
           current_node = Node()
           current_node.input.node = previous_node
           current_node.operation = self
           current_node.output.shape = self.infer_shape()
           if isinstance(previous_node, Node):  # a Placeholder input has no output link
               previous_node.output.node = current_node
           return current_node

       def forward(self, x):
           pass

       def backward(self, dy):
           pass

       def infer_shape(self):
           pass
   ```
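   
   Execution then becomes a separate step: once the graph is built from a placeholder, a `Graph.forward` method can walk the stored `Node`s and call each operation's `forward` on real data. A rough sketch, continuing the hypothetical `Graph` class above and assuming a single-input chain of nodes:
   ```py
   class Graph:
       def __init__(self, input_x, output_node):
           self.input_x = input_x
           self.output_node = output_node

       def forward(self, x):
           # collect the chain of Nodes built by Operation.__call__, walking from
           # the output back to the input placeholder, then run every stored
           # Operation's forward on the real data in order
           chain, node = [], self.output_node
           while isinstance(node, Node):
               chain.append(node)
               node = node.input.node
           value = x
           for n in reversed(chain):
               value = n.operation.forward(value)
           return value
   ```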
   
   
   With this, we actually construct a `Graph` of linked `Node`s using the following code:
   ```py
   class MyModule(Module):
       def __init__(self):
           super(MyModule, self).__init__()
   
           self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
           self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
   
           self.sgd = opt.SGD(lr=0.01)
   
       def construct_graph(self, x):
           # x is a placeholder; build the Graph of linked Nodes without real data
           y = self.conv1(x)
           y = self.conv2(y)
           self.graph = Graph(x, y)
   
       def train(self, x, y): 
           y_ = self.graph.forward(x)
           l = self.loss(y_, y)
           self.optim(l)
           return l
   
       def loss(self, out, y):
           return autograd.softmax_cross_entropy(out, y)
   
       def optim(self, loss):
           self.sgd.backward_and_update(loss)
   
   model = MyModule()
   x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
   model.construct_graph(x)
   ```
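   
   Training then only executes the already-built graph; a hypothetical training loop (mirroring the loop in the quoted proposal, with `data` yielding numpy batches) could look like:
   ```py
   y = Placeholder((2,), device = gpu)
   for npx, npy in data:
       x.copy_from(npx)     # fill the placeholders with real values
       y.copy_from(npy)
       model.train(x, y)    # no operation/graph init happens here any more
   ```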


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org