Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/18 15:53:37 UTC

[GitHub] [incubator-mxnet] Neutron3529 opened a new issue #17870: batch transform with dataloader
URL: https://github.com/apache/incubator-mxnet/issues/17870
 
 
   ## Description
   I found a [performance regression](https://github.com/apache/incubator-mxnet/issues/15655) last year, which is caused by the current strategy of applying the `transform_first` transform to each sample individually.
   
   The current behavior is to apply the `transform_first` transform to each sample **before** collecting the data into a batch. With batch_size=500, the transform has to be executed 500 times per batch, which is extremely inefficient.
   
   If we collected the data into a batch first and then applied the transform to the whole batch at once, processing could be much faster.
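   The claim above can be illustrated outside of MXNet with plain NumPy: treating a "transform" as a dense layer (a matrix multiply), applying it per sample means hundreds of small matmuls, while batching first means a single large one. This is a minimal sketch; the matrix `W` and sample shapes are made up to mirror the benchmark below, not taken from the issue.
   ```python
   import numpy as np

   # Hypothetical "transform": a dense layer, i.e. one matrix multiply.
   W = np.random.rand(60, 10)
   samples = [np.random.rand(60) for _ in range(500)]

   # transform_first style: 500 small matmuls, then stack into a batch.
   per_sample = np.stack([s @ W for s in samples])

   # Batchify-first style: stack once, then a single large matmul.
   batched = np.stack(samples) @ W

   # Both orders produce the same result; only the cost differs.
   assert np.allclose(per_sample, batched)
   ```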
   
   Here is the test result; two nets are provided to simulate the two orders of applying the transform:
   ```
   Python 3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020, 10:41:24) [MSC v.1900 64 bit (AMD64)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet as mx
   >>> from mxnet.gluon.nn import Dense
   >>> import time
   >>> net=mx.gluon.nn.HybridSequential()#calculate before batchify
   >>> with net.name_scope():
   ...  net.add(Dense(10))
   ...  net.add(Dense(35))
   ...  net.add(Dense(10))
   ...
   >>> n2=mx.gluon.nn.HybridSequential()#calculate after batchify
   >>> with n2.name_scope():
   ...  n2.add(Dense(10))
   ...  n2.add(Dense(35))
   ...  n2.add(Dense(10))
   ...
   >>> a=[mx.nd.random.uniform(shape=(60,)) for i in range(500)]#data is mx.nd.random.uniform(shape=(60,)) with  batch_size=500
   >>> ctx=mx.cpu(0)
   >>> n2.initialize(mx.init.Uniform(), ctx=ctx,force_reinit=True)
   >>> net.initialize(mx.init.Uniform(), ctx=ctx,force_reinit=True)
   >>> for i in range(10):
   ...  ii=time.time()
   ...  _=mx.nd.stack(*(net(x) for x in a))#calculate before batchify, which is the default order MXNet applies
   ...  jj=time.time()-ii
   ...  ii=time.time()
   ...  b=mx.nd.stack(*a)#batchify
   ...  _=mx.nd.stack(n2(b))#calculate, which will be faster.
   ...  kk=time.time()-ii
   ...  print((jj,kk))
   ...
   (0.44376635551452637, 0.008976459503173828)
   (0.34108686447143555, 0.0029942989349365234)
   (0.48074865341186523, 0.002991914749145508)
   (0.3600330352783203, 0.0029861927032470703)
   (0.35801076889038086, 0.003988027572631836)
   (0.3530876636505127, 0.0029916763305664062)
   (0.4198474884033203, 0.0019931793212890625)
   (0.41489434242248535, 0.0019941329956054688)
   (0.3879544734954834, 0.003995418548583984)
   (0.419874906539917, 0.0029935836791992188)
   >>> # which means, transforming each sample before batchifying is a total waste of time.
   ```
   I tried to force the transform to run after batchify, but I cannot find any convenient approach, since most of the built-in transform functions are not designed to work on a whole batch; if I want to batchify before transforming, I have to write an equivalent batch-level transform myself.
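   One possible workaround is a collate function that stacks the samples first and only then runs a batch-level transform; Gluon's `DataLoader` does accept a `batchify_fn` parameter that could host such a function. The sketch below is framework-agnostic (NumPy stands in for `mx.nd`), and the `normalize` transform is a hypothetical example, not one of MXNet's built-in transforms.
   ```python
   import numpy as np

   def make_batch_transform(transform):
       """Wrap a batch-level transform as a collate/batchify function:
       stack the samples into one array, then transform the batch once."""
       def batchify_fn(samples):
           batch = np.stack(samples)   # batchify first
           return transform(batch)     # one transform call per batch
       return batchify_fn

   # Hypothetical batch-level transform: normalize the whole batch.
   normalize = lambda b: (b - b.mean()) / (b.std() + 1e-8)

   fn = make_batch_transform(normalize)
   out = fn([np.random.rand(60) for _ in range(8)])
   ```
   With MXNet, the same shape of function (using `mx.nd.stack` and a net call) could be passed as `batchify_fn` to the `DataLoader`, so the transform runs once per batch instead of once per sample.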
   
   Actually I've already submitted [what I found](https://github.com/apache/incubator-mxnet/issues/15655#issuecomment-515744051), but the problem still exists. Training CIFAR-10 takes an extremely long time for me.
   (With num_workers=3 it merely takes a while, but with the default num_workers=8 an OOM occurs and crashes the training step, since each python.exe eats ~2 GB of my physical memory; loading libmxnet.dll takes a long time and roughly 2 GB of memory.)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] Neutron3529 closed issue #17870: batch transform with dataloader

Posted by GitBox <gi...@apache.org>.
Neutron3529 closed issue #17870:
URL: https://github.com/apache/incubator-mxnet/issues/17870


   



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org