Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/13 15:13:13 UTC

[GitHub] analog-cbarber opened a new issue #9411: gluon.data.vision datasets should have layout option

URL: https://github.com/apache/incubator-mxnet/issues/9411
 
 
   (mxnet 1.0.0)
   
   The datasets provided in mxnet.gluon.data.vision do not provide an option for the layout of the data they return; they simply put the channel dimension last (HWC for a single image, NHWC for a batch) for no stated reason. Unfortunately, this choice clashes with the default NCHW layout of the Conv2D block, which is frequently the first layer in models that use these datasets. It also clashes with the native layout of the datasets themselves. In the case of CIFAR, which has three channels, the loader has to actually reorder all of the data, only for the user to reorder it back again. What is the rationale for putting the channel last?
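To make the cost concrete, here is a minimal NumPy sketch of that round trip for CIFAR-10. It assumes only the standard CIFAR-10 binary record format, where the 3072 pixel bytes of each image are stored channel-first (1024 red, then 1024 green, then 1024 blue values); the pixel data is random stand-in data with the label bytes already stripped, and NumPy arrays stand in for NDArray.

~~~python
import numpy as np

rng = np.random.default_rng(0)
# Two fake CIFAR-10 records (pixel bytes only, labels stripped).
raw = rng.integers(0, 256, size=(2, 3072), dtype=np.uint8)

# Native order of the stored bytes: N x C x H x W.
native_nchw = raw.reshape(-1, 3, 32, 32)

# Channel-last data as the dataset hands it back (N x H x W x C).
# ascontiguousarray forces a real copy, i.e. the whole dataset is reordered.
dataset_nhwc = np.ascontiguousarray(native_nchw.transpose(0, 2, 3, 1))

# Conv2D's default NCHW layout needs it transposed straight back (another copy).
model_input = np.ascontiguousarray(dataset_nhwc.transpose(0, 3, 1, 2))

# The two reorderings cancel out exactly.
assert np.array_equal(model_input, native_nchw)
print(dataset_nhwc.shape, model_input.shape)  # (2, 32, 32, 3) (2, 3, 32, 32)
~~~

Both copies touch every pixel, and the second one only undoes the first.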
   
   There should be an option, independent of the transform keyword, to specify the initial layout of the dataset, at least with respect to the location of the channel dimension, e.g.:
   
   ~~~python
   dataset = MNIST(layout='NCHW', transform=my_transform)
   ~~~
   
   It would probably be sufficient to only implement layouts 'NCHW' and 'NHWC'.
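One way a loader might honour such a keyword is sketched below, assuming it knows the native layout of its stored data and converts only when the requested layout differs. `convert_layout` and `SUPPORTED_LAYOUTS` are hypothetical names, not part of mxnet, and NumPy arrays stand in for NDArray.

~~~python
import numpy as np

# Hypothetical: only the two layouts suggested above are supported.
SUPPORTED_LAYOUTS = ("NCHW", "NHWC")


def convert_layout(batch: np.ndarray, src: str, dst: str) -> np.ndarray:
    """Reorder a 4-D image batch from layout `src` to layout `dst`.

    When src == dst the array is returned untouched, so a dataset whose
    native storage already matches the requested layout pays nothing.
    """
    for layout in (src, dst):
        if layout not in SUPPORTED_LAYOUTS:
            raise ValueError(f"unsupported layout {layout!r}")
    if src == dst:
        return batch
    # For each axis letter of the target layout, find its position in the source.
    perm = tuple(src.index(axis) for axis in dst)
    return np.ascontiguousarray(batch.transpose(perm))


cifar_native = np.zeros((4, 3, 32, 32), dtype=np.uint8)
print(convert_layout(cifar_native, "NCHW", "NHWC").shape)  # (4, 32, 32, 3)
print(convert_layout(cifar_native, "NCHW", "NCHW") is cifar_native)  # True
~~~

Deriving the permutation from the layout strings keeps the two supported layouts symmetric, and it means CIFAR requested as NCHW would never be transposed at all.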
   
   Note that writing a transform function that reorders the data is not as easy as it might seem, since it may need to handle varying batch sizes, or even a dropped batch dimension when you take a single element of the dataset.
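The dropped dimension is easy to reproduce. In the following NumPy sketch (NumPy standing in for NDArray; `channels_first` is a hypothetical helper name), an integer index yields an HWC array while a slice yields NHWC. Counting axes from the end with `np.moveaxis` is one way to cover both ranks with a single call.

~~~python
import numpy as np

# Stand-in for a channel-last dataset: 10 RGB images of 32x32 (NHWC).
images = np.zeros((10, 32, 32, 3), dtype=np.uint8)

single = images[0]    # integer index drops the batch axis -> (32, 32, 3)
batch = images[2:6]   # slice keeps it                    -> (4, 32, 32, 3)


def channels_first(data: np.ndarray) -> np.ndarray:
    """Move the trailing channel axis in front of H and W.

    Counting from the end (-1 is C, -3 is H) makes the same call work
    for a single HWC image and for an NHWC batch.
    """
    return np.moveaxis(data, -1, -3)


print(channels_first(single).shape)  # (3, 32, 32)
print(channels_first(batch).shape)   # (4, 3, 32, 32)
~~~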
   
   For instance, here is the transform I used, which is intended to work with both MNIST and CIFAR and with varying batch sizes:
   
   ~~~python
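   from typing import Tuple

   import numpy as np
   from mxnet.ndarray import NDArray

   # Illustrative values: in my code the dtype came from the enclosing
   # object (self.dtype) and the scale/offset were local parameters.
   dtype = np.float32
   pixel_scale = 1.0 / 255
   pixel_offset = 0.0

   def data_transform(data: NDArray, label: NDArray) -> Tuple[NDArray, NDArray]:
       """Move the channel axis of HWC/NHWC data to CHW/NCHW and rescale."""
       shape = data.shape
       if shape[-1] == 1:
           # If there is only one channel, data is already correctly ordered
           # in memory and we can just reshape.
           data2 = data.reshape(shape[:-3] + (1,) + shape[-3:-1])
       elif len(shape) == 4:
           # Batch of multi-channel images: NHWC -> NCHW.
           data2 = data.transpose((0, 3, 1, 2))
       else:
           # Single multi-channel image (batch axis dropped): HWC -> CHW.
           assert len(shape) == 3
           data2 = data.transpose((2, 0, 1))

       return data2.astype(dtype) * pixel_scale + pixel_offset, label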
   ~~~
   
   If the underlying dataset had the channel in the right position, the transform could just be a one-liner.
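For comparison, here is roughly what that one-liner could look like, assuming a hypothetical dataset option that already yields channel-first (CHW or NCHW) data. `simple_transform` is an illustrative name and the dtype/scale/offset defaults are placeholders; NumPy arrays stand in for NDArray, which also supports `astype` and scalar `*` and `+`.

~~~python
import numpy as np


def simple_transform(data, label, dtype=np.float32, scale=1.0 / 255, offset=0.0):
    # With the channel already in place, only the dtype and pixel range change.
    return data.astype(dtype) * scale + offset, label


image = np.full((1, 28, 28), 255, dtype=np.uint8)  # one MNIST-sized CHW image
x, y = simple_transform(image, 7)
print(x.shape, x.dtype, y)  # (1, 28, 28) float32 7
~~~

No shape inspection is needed, so single images and batches go through the same line unchanged.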
