Posted to dev@mxnet.apache.org by Lai Wei <ro...@gmail.com> on 2019/06/20 06:36:22 UTC

[VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Dear MXNet community,

This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
Voting on dev@ will start June 19, 23:59:59 (PST) and close on June 22,
23:59:59.

1) Link to release notes:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes


2) Link to release candidate:

https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1


3) Link to source and signatures on apache dist server:

https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/


Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)
-- 
Best Regards

Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Hi all,

An update on the regression issue:

There is no significant regression on operators between 1.4.1 and 1.5.0
according to the latest findings here [1].
The possible regression observed earlier was due to a profiler change between
1.4.1 and 1.5.0, so it was not an apples-to-apples comparison. Please refer to
the performance results measured with the time and timeit modules in this
comment [2].
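
As a rough illustration of timing with the standard timeit module rather than a profiler, here is a minimal sketch. The helper names (time_callable, summarize, workload) are made up for this example, and the workload is a pure-Python stand-in, not an MXNet operator.

```python
import statistics
import timeit


def time_callable(fn, repeat=5, number=100):
    """Return per-call wall-clock times in seconds, one value per repeat."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return [total / number for total in totals]


def summarize(samples):
    """Best and median per-call time; the minimum is least affected by noise."""
    return {"best": min(samples), "median": statistics.median(samples)}


def workload():
    # Stand-in for an operator call (assumption for this sketch). With MXNet,
    # the callable would also need to block until the asynchronous engine has
    # finished, for example by calling wait_to_read() on the result.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(summarize(time_callable(workload)))
```

Running the same callable under both versions with identical repeat and number settings keeps the comparison independent of any change in the profiler.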

With that, let's restart voting on 1.5.0.rc2, as there is no code change
required.

[1]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508865398
[2]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508831150





Best Regards

Lai


On Sat, Jun 29, 2019 at 12:35 PM Chris Olivier <cj...@gmail.com>
wrote:

> for batch norm, I mean. max*
>
> On Sat, Jun 29, 2019 at 12:34 PM Chris Olivier <cj...@gmail.com>
> wrote:
>
> > what’s with the mac memory usage being 2x in 1.4? As I am not sure where
> > the number is coming from (if it’s my profiler code, I wouldn’t consider it
> > terribly meaningful), but it is the same everywhere else, so it kind of
> > sticks out.
> >
> > On Thu, Jun 27, 2019 at 3:36 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> >> Hello Ciyong/Pedro,
> >>
> >> Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
> >> cover all MXNet operators, not presented in best possible way, still WIP)
> >>
> >> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
> >>
> >> The following operators look slower in 1.5 compared to 1.4.1:
> >> - BatchNorm
> >> - Pooling
> >> - FullyConnected
> >> - batch_dot
> >> - Dot
> >> - broadcast_mul
> >> - log_softmax
> >> and a few other operators
> >>
> >> Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
> >> example Convolution, flatten, and elementwise operators. So it is likely
> >> that a few operators have regressed noticeably; however, because other
> >> operators got faster, the end-to-end effect is not that significant and
> >> hides a lot of the regression. We need a more detailed per-operator
> >> performance analysis. We will not be able to do this for the current
> >> release, but we should have a more concrete way of detecting such
> >> performance regressions before the next release.
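
One possible shape for such a per-operator check is sketched below. It is not part of the opperf tool. The function name, the 5% threshold, and the operator names and timings in the example are placeholders invented for illustration, not measurements.

```python
def find_regressions(baseline, candidate, threshold=0.05):
    """Return {op: relative slowdown} for ops slower than `threshold`.

    `baseline` and `candidate` map operator names to a time per call.
    """
    regressions = {}
    for op, base_time in baseline.items():
        new_time = candidate.get(op)
        if new_time is None or base_time <= 0:
            continue  # op missing from the candidate, or unusable baseline
        change = (new_time - base_time) / base_time
        if change > threshold:
            regressions[op] = change
    return regressions


if __name__ == "__main__":
    # Placeholder numbers only; real inputs would come from a benchmark run.
    old = {"op_a": 1.0, "op_b": 2.0, "op_c": 0.5}
    new = {"op_a": 1.2, "op_b": 1.0, "op_c": 0.51}
    for name, change in find_regressions(old, new).items():
        print(f"{name}: {change:+.1%}")
```

Reporting each operator separately avoids the masking effect described above, where speedups in some operators hide slowdowns in others.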
> >>
> >> Setup:
> >> 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> >> 1.4.1 => PyPi mxnet-mkl==1.4.1
> >> Machine: C5.18X
> >> No explicit environment variables were set
> >> Operator benchmark code -
> >> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
> >>
> >> Best,
> >> Sandeep
> >>
> >>
> >> On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> >> pedro.larroy.lists@gmail.com>
> >> wrote:
> >>
> >> > I will try to run a few benchmarks in a bare metal instance tonight to
> >> > remove virtualization variance for the measurements and provide some
> >> > numbers.
> >> >
> >> > Please propose a set of models / examples that would be desirable to
> >> > run before the release and provide a link to an easy to run script
> >> > with instructions so we can validate the release better.
> >> >
> >> > Thank you.
> >> >
> >> > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> >> > >
> >> > > Dear @dev,
> >> > >
> >> > > I'm cancelling the vote for the cached op fix:
> >> > >
> >> > > https://github.com/apache/incubator-mxnet/pull/15298
> >> > >
> >> > > As for the possible CPU training regression, it does not look like a
> >> > > blocker for now.
> >> > >
> >> > > I will start a new rc2 vote, please help to validate.
> >> > >
> >> > > Thanks!
> >> > >
> >> > >
> >> > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
> ciyong.chen@intel.com>
> >> > wrote:
> >> > >
> >> > > > Hi Pedro,
> >> > > >
> >> > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
> >> > > > v1.4; I was using 18 cores for computing) with your script on
> >> > > > C5.18xlarge. But I needed to bind the cores with the commands below
> >> > > > when running the script (without setting the env variables, I got
> >> > > > nearly the same time (<1% difference) with v1.5 and v1.4):
> >> > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> >> > > >         export OMP_NUM_THREADS=18
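
For anyone driving the benchmark from Python instead of a shell, the same two variables can be passed to a child process. This is a minimal sketch. The helper names pinned_env and run_pinned are invented here, and the child command is a placeholder that only echoes one variable.

```python
import os
import subprocess
import sys


def pinned_env(num_threads=18):
    """Copy of the current environment with the OpenMP settings quoted above."""
    env = dict(os.environ)
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(command, num_threads=18):
    """Run `command` (a list of arguments) with the pinned environment."""
    return subprocess.run(command, env=pinned_env(num_threads),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Placeholder child process; a benchmark script would be launched the
    # same way, with its own argument list.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_pinned(probe).stdout.strip())
```

Setting the variables per child process keeps the pinned and unpinned runs separate without changing the parent shell.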
> >> > > >
> >> > > > Did you set any env variables during running?
> >> > > >
> >> > > > The performance results I got are below:
> >> > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >> > > > real    12m10.856s
> >> > > > user    234m49.576s
> >> > > > sys     4m38.044s
> >> > > >
> >> > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >> > > > real    12m52.140s
> >> > > > user    246m30.740s
> >> > > > sys     5m8.188s
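
The roughly 5.6% figure can be checked from the two `real` values above. A small sketch follows; the function names are illustrative and the parser assumes the `NmS.SSSs` format printed by `time`.

```python
import re


def parse_real(text):
    """Convert a `time` duration such as '12m10.856s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline, candidate):
    """Percentage by which `candidate` takes longer than `baseline`."""
    base = parse_real(baseline)
    return (parse_real(candidate) - base) / base * 100.0


if __name__ == "__main__":
    # `real` values reported above for 1.4.1.rc0 and 1.5.0.rc1.
    print(f"{slowdown_percent('12m10.856s', '12m52.140s'):.2f}%")
```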
> >> > > >
> >> > > > Looking at the profiling data, most of the ops have the same perf in
> >> > > > v1.4 and v1.5, but some ops such as "_backward_BatchNorm" and
> >> > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
> >> > > > Will do further analysis on these ops.
> >> > > >
> >> > > > Here's the hardware/OS info from my side:
> >> > > > ----------Python Info----------
> >> > > > Version      : 3.6.8
> >> > > > Compiler     : GCC 7.3.0
> >> > > > Build        : ('default', 'Dec 30 2018 01:22:34')
> >> > > > Arch         : ('64bit', '')
> >> > > > ------------Pip Info-----------
> >> > > > Version      : 19.0.3
> >> > > > Directory    :
> >> > > >
> >> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> >> > > > ----------MXNet Info-----------
> >> > > > Version      : 1.5.0
> >> > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> >> > > > Hashtag not found. Not installed from pre-built package.
> >> > > > ----------System Info----------
> >> > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> >> > > > system       : Linux
> >> > > > node         : ip-172-31-32-129
> >> > > > release      : 4.4.0-1085-aws
> >> > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> >> > > > ----------Hardware Info----------
> >> > > > machine      : x86_64
> >> > > > processor    : x86_64
> >> > > > Architecture:          x86_64
> >> > > > CPU op-mode(s):        32-bit, 64-bit
> >> > > > Byte Order:            Little Endian
> >> > > > CPU(s):                72
> >> > > > On-line CPU(s) list:   0-71
> >> > > > Thread(s) per core:    2
> >> > > > Core(s) per socket:    18
> >> > > > Socket(s):             2
> >> > > > NUMA node(s):          2
> >> > > > Vendor ID:             GenuineIntel
> >> > > > CPU family:            6
> >> > > > Model:                 85
> >> > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @
> 3.00GHz
> >> > > > Stepping:              3
> >> > > > CPU MHz:               3000.000
> >> > > > BogoMIPS:              6000.00
> >> > > > Hypervisor vendor:     KVM
> >> > > > Virtualization type:   full
> >> > > > L1d cache:             32K
> >> > > > L1i cache:             32K
> >> > > > L2 cache:              1024K
> >> > > > L3 cache:              25344K
> >> > > > NUMA node0 CPU(s):     0-17,36-53
> >> > > > NUMA node1 CPU(s):     18-35,54-71
> >> > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> >> mtrr
> >> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> >> > pdpe1gb
> >> > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
> >> nonstop_tsc
> >> > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16
> pcid
> >> > sse4_1
> >> > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> >> rdrand
> >> > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser
> fsgsbase
> >> > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
> >> rdseed
> >> > adx
> >> > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> >> > > > ----------Network Test----------
> >> > > >
> >> > > >
> >> > > > -Ciyong
> >> > > >
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> >> > > > Sent: Thursday, June 27, 2019 9:55 AM
> >> > > > To: dev@mxnet.incubator.apache.org
> >> > > > Cc: dev@mxnet.apache.org
> >> > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> >> 1.5.0.rc1
> >> > > >
> >> > > > Could we run more epochs to see the performance difference, or profile
> >> > > > the difference between the good and bad runs?
> >> > > >
> >> > > > > -----Original Message-----
> >> > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >> > > > > Sent: Thursday, June 27, 2019 9:35 AM
> >> > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > Cc: dev@mxnet.apache.org
> >> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> >> > > > > 1.5.0.rc1
> >> > > > >
> >> > > > > I ran it again and the gap is bigger again; I guess we need to
> >> > > > > average out the times across several runs:
> >> > > > >
> >> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >> > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
> >> --epochs 5
> >> > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > ImageRecordIOParser2:
> >> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >> > threads
> >> > > > > for decoding..
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> completed
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > ImageRecordIOParser2:
> >> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >> threads
> >> > > > > for decoding..
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> completed
> >> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >> > > > > Epoch 0, Batch 199, Speed=384.149839
> >> > > > > Epoch 0, Duration=140.919567
> >> > > > > Epoch 0, Training accuracy=0.115169
> >> > > > > Epoch 0, Validation accuracy=0.141317
> >> > > > > Epoch 1, Batch 199, Speed=433.380512
> >> > > > > Epoch 1, Duration=119.553233
> >> > > > > Epoch 1, Training accuracy=0.170956
> >> > > > > Epoch 1, Validation accuracy=0.216146
> >> > > > > Epoch 2, Batch 199, Speed=434.864699
> >> > > > > Epoch 2, Duration=123.278490
> >> > > > > Epoch 2, Training accuracy=0.209455
> >> > > > > Epoch 2, Validation accuracy=0.247296
> >> > > > > Epoch 3, Batch 199, Speed=433.401854
> >> > > > > Epoch 3, Duration=118.327797
> >> > > > > Epoch 3, Training accuracy=0.248701
> >> > > > > Epoch 3, Validation accuracy=0.302083
> >> > > > > Epoch 4, Batch 199, Speed=419.713707
> >> > > > > Epoch 4, Duration=126.468409
> >> > > > > Epoch 4, Training accuracy=0.260949
> >> > > > > Epoch 4, Validation accuracy=0.269030
> >> > > > >
> >> > > > > real    10m55.796s
> >> > > > > user    399m33.567s
> >> > > > > sys     13m55.904s
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > ImageRecordIOParser2:
> >> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >> > threads
> >> > > > > for decoding..
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> completed
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > ImageRecordIOParser2:
> >> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >> threads
> >> > > > > for decoding..
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> >> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> completed
> >> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > Epoch 0, Batch 199, Speed=419.039188
> >> > > > > Epoch 0, Duration=143.934903
> >> > > > > Epoch 0, Training accuracy=0.122542
> >> > > > > Epoch 0, Validation accuracy=0.164359
> >> > > > > Epoch 1, Batch 199, Speed=445.257048
> >> > > > > Epoch 1, Duration=135.248399
> >> > > > > Epoch 1, Training accuracy=0.178828
> >> > > > > Epoch 1, Validation accuracy=0.199419
> >> > > > > Epoch 2, Batch 199, Speed=447.115215
> >> > > > > Epoch 2, Duration=132.003770
> >> > > > > Epoch 2, Training accuracy=0.217808
> >> > > > > Epoch 2, Validation accuracy=0.233073
> >> > > > > Epoch 3, Batch 199, Speed=441.079477
> >> > > > > Epoch 3, Duration=126.543316
> >> > > > > Epoch 3, Training accuracy=0.248102
> >> > > > > Epoch 3, Validation accuracy=0.293870
> >> > > > > Epoch 4, Batch 199, Speed=449.329787
> >> > > > > Epoch 4, Duration=138.398325
> >> > > > > Epoch 4, Training accuracy=0.270021
> >> > > > > Epoch 4, Validation accuracy=0.311498
> >> > > > >
> >> > > > > real    11m45.329s
> >> > > > > user    426m13.908s
> >> > > > > sys     16m45.093s
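
Averaging across runs, as suggested at the top of this message, could look like the sketch below. The timings are the `real` values from the two back-to-back runs posted in this thread (the 22:49 run quoted further down and the 23:17 run directly above). In each run the first timing is 1.4 and the second is 1.5, following the order of the command line. The names RUNS, to_seconds, mean_seconds, and gap_percent are invented for this example.

```python
from statistics import mean

# Wall-clock `real` times as (minutes, seconds), copied from the two runs
# in this thread: the 22:49 run first, then the 23:17 run.
RUNS = {
    "1.4": [(11, 7.356), (10, 55.796)],
    "1.5": [(11, 21.930), (11, 45.329)],
}


def to_seconds(duration):
    """Convert a (minutes, seconds) pair into seconds."""
    minutes, seconds = duration
    return minutes * 60 + seconds


def mean_seconds(durations):
    """Mean wall-clock time in seconds over several runs."""
    return mean(to_seconds(d) for d in durations)


def gap_percent(runs, baseline="1.4", candidate="1.5"):
    """Percentage by which the candidate's mean time exceeds the baseline's."""
    base = mean_seconds(runs[baseline])
    return (mean_seconds(runs[candidate]) - base) / base * 100.0


if __name__ == "__main__":
    print(f"mean 1.4: {mean_seconds(RUNS['1.4']):.1f}s")
    print(f"mean 1.5: {mean_seconds(RUNS['1.5']):.1f}s")
    print(f"gap: {gap_percent(RUNS):.2f}%")
```

Two runs are too few to say much about variance; the same structure extends to longer lists as more runs are collected.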
> >> > > > >
> >> > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> >> > > > > <pe...@gmail.com> wrote:
> >> > > > > >
> >> > > > > > The difference looks smaller now, more like your numbers. I
> >> wonder
> >> > > > > > if something happened during the previous benchmark like a
> >> system
> >> > > > > > update...
> >> > > > > >
> >> > > > > >
> >> > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >> > > > > (master)+$
> >> > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 &&
> >> time
> >> > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> [22:49:41]
> >> > > > > > ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > > ImageRecordIOParser2:
> >> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >> > > > > > threads for decoding..
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > completed
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > > ImageRecordIOParser2:
> >> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >> > > > > > threads for decoding..
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > completed
> >> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >> > > > > > Epoch 0, Batch 199, Speed=426.182733
> >> > > > > > Epoch 0, Duration=134.868458
> >> > > > > > Epoch 0, Training accuracy=0.127238
> >> > > > > > Epoch 0, Validation accuracy=0.206388
> >> > > > > > Epoch 1, Batch 199, Speed=313.127156
> >> > > > > > Epoch 1, Duration=128.041775
> >> > > > > > Epoch 1, Training accuracy=0.182065
> >> > > > > > Epoch 1, Validation accuracy=0.202524
> >> > > > > > Epoch 2, Batch 199, Speed=410.931187
> >> > > > > > Epoch 2, Duration=124.920588
> >> > > > > > Epoch 2, Training accuracy=0.202584
> >> > > > > > Epoch 2, Validation accuracy=0.245693
> >> > > > > > Epoch 3, Batch 199, Speed=419.119335
> >> > > > > > Epoch 3, Duration=120.948349
> >> > > > > > Epoch 3, Training accuracy=0.235854
> >> > > > > > Epoch 3, Validation accuracy=0.291066
> >> > > > > > Epoch 4, Batch 199, Speed=430.473733
> >> > > > > > Epoch 4, Duration=130.181724
> >> > > > > > Epoch 4, Training accuracy=0.257773
> >> > > > > > Epoch 4, Validation accuracy=0.304988
> >> > > > > >
> >> > > > > > real    11m7.356s
> >> > > > > > user    406m9.910s
> >> > > > > > sys     14m18.349s
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > > ImageRecordIOParser2:
> >> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >> > > > > > threads for decoding..
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > completed
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >> > > > > > ImageRecordIOParser2:
> >> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >> > > > > > threads for decoding..
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> >> image
> >> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > completed
> >> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > > Epoch 0, Batch 199, Speed=348.618154
> >> > > > > > Epoch 0, Duration=146.469352
> >> > > > > > Epoch 0, Training accuracy=0.124121
> >> > > > > > Epoch 0, Validation accuracy=0.167227
> >> > > > > > Epoch 1, Batch 199, Speed=452.790825
> >> > > > > > Epoch 1, Duration=130.199421
> >> > > > > > Epoch 1, Training accuracy=0.183863
> >> > > > > > Epoch 1, Validation accuracy=0.237079
> >> > > > > > Epoch 2, Batch 199, Speed=451.406559
> >> > > > > > Epoch 2, Duration=126.320823
> >> > > > > > Epoch 2, Training accuracy=0.214844
> >> > > > > > Epoch 2, Validation accuracy=0.244692
> >> > > > > > Epoch 3, Batch 199, Speed=403.161873
> >> > > > > > Epoch 3, Duration=125.331660
> >> > > > > > Epoch 3, Training accuracy=0.243506
> >> > > > > > Epoch 3, Validation accuracy=0.301182
> >> > > > > > Epoch 4, Batch 199, Speed=450.826598
> >> > > > > > Epoch 4, Duration=126.426253
> >> > > > > > Epoch 4, Training accuracy=0.266424
> >> > > > > > Epoch 4, Validation accuracy=0.311899
> >> > > > > >
> >> > > > > > real    11m21.930s
> >> > > > > > user    415m3.855s
> >> > > > > > sys     13m53.975s
> >> > > > > >
> >> > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> >> > > > > > <pe...@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > Hi Ciyong, thanks for trying to reproduce:
> >> > > > > > >
> >> > > > > > > I used this one:
> >> > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >> > > > > > >
> >> > > > > > > Could you provide hardware and OS details?
> >> > > > > > >
> >> > > > > > > I will rerun and repost numbers in a few minutes.
> >> > > > > > >
> >> > > > > > > Pedro.
> >> > > > > > >
> >> > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> >> > > > > > > <ci...@intel.com>
> >> > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > Hi Pedro,
> >> > > > > > > >
> >> > > > > > > > I'm looking at this case, using the script
> >> > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> >> > > > > > > > to get the timing data, but there seems to be little difference
> >> > > > > > > > between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> >> > > > > > > >
> >> > > > > > > > I'm not sure whether the Python scripts differ; can you point me
> >> > > > > > > > to the link for your script (cifar10.py)? Or you could also try
> >> > > > > > > > MXNet's script (train_cifar10.py) and check the performance.
> >> > > > > > > >
> >> > > > > > > > Here's the command I used to collect the time:
> >> > > > > > > >         python train_cifar10.py --num-epoch=5
> >> > > > > > > >
> >> > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >> > > > > > > >         real    9m4.880s
> >> > > > > > > >         user    333m13.340s
> >> > > > > > > >         sys     14m36.100s
> >> > > > > > > >
> >> > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >> > > > > > > >         real    9m2.155s
> >> > > > > > > >         user    329m37.092s
> >> > > > > > > >         sys     16m8.668s
> >> > > > > > > >
> >> > > > > > > > -Ciyong
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > -----Original Message-----
> >> > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >> > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> >> > > > > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > > > > Cc: dev@mxnet.apache.org
> >> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> >> version
> >> > > > > > > > 1.5.0.rc1
> >> > > > > > > >
> >> > > > > > > > Hi these were my build flags and system info:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --- # CMake configuration
> >> > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> >> > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> >> > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> >> > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> >> > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> >> > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could
> set
> >> > > > > > > > CUDNN_ROOT for search path
> >> > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF
> >> NOT
> >> > > > > > > > ARM
> >> > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support)
> #
> >> > > > > autodetects support if "ON"
> >> > > > > > > > USE_LAPACK: "ON" # Build with lapack support
> >> > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> >> > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL
> >> found)
> >> > > > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >> > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL
> found)
> >> IF
> >> > > > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >> > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of
> operators
> >> IF
> >> > > > > NOT
> >> > > > > > > > MSVC
> >> > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if
> >> found)
> >> > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> >> > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> >> > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> >> > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> >> > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> >> > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> >> > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> >> > > > > conventions.
> >> > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> >> > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the
> >> > compiler
> >> > > > > > > > supports it
> >> > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE
> >> (VTune)) #
> >> > > > > > > > one could set VTUNE_ROOT for search path
> >> > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime
> compilation
> >> > > > > > > > support
> >> > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> >> > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source
> files.
> >> > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on
> segfaults.
> >> > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
> >> > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> >> > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
> >> > > > > > > > coverage metric output
> >> > > > > > > > CMAKE_BUILD_TYPE: "Release"
> >> > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > >
> >> > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD,
> tag:
> >> > > > > > > > 1.5.0.rc1,
> >> > > > > > > > upstream/v1.5.x)
> >> > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD,
> tag:
> >> > > > > > > > 1.4.1.rc0,
> >> > > > > > > > upstream/v1.4.x)
> >> > > > > > > >
> >> > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> >> > > > > > > > c5d.18xlarge
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Version      : 3.6.7
> >> > > > > > > > Compiler     : GCC 8.2.0
> >> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >> > > > > > > > Arch         : ('64bit', 'ELF')
> >> > > > > > > > ------------Pip Info-----------
> >> > > > > > > > Version      : 19.1.1
> >> > > > > > > > Directory    :
> >> > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> >> > > > > packages/pip
> >> > > > > > > > ----------MXNet Info-----------
> >> > > > > > > > Version      : 1.5.0
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> >> > > > > > > > Hashtag not found. Not installed from pre-built package.
> >> > > > > > > > ----------System Info----------
> >> > > > > > > > Platform     :
> >> > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >> > > > > > > > system       : Linux
> >> > > > > > > > node         : ip-172-31-63-171
> >> > > > > > > > release      : 4.15.0-1035-aws
> >> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >> > > > > > > > ----------Hardware Info----------
> >> > > > > > > > machine      : x86_64
> >> > > > > > > > processor    : x86_64
> >> > > > > > > > Architecture:        x86_64
> >> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >> > > > > > > > Byte Order:          Little Endian
> >> > > > > > > > CPU(s):              72
> >> > > > > > > > On-line CPU(s) list: 0-71
> >> > > > > > > > Thread(s) per core:  2
> >> > > > > > > > Core(s) per socket:  18
> >> > > > > > > > Socket(s):           2
> >> > > > > > > > NUMA node(s):        2
> >> > > > > > > > Vendor ID:           GenuineIntel
> >> > > > > > > > CPU family:          6
> >> > > > > > > > Model:               85
> >> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @
> >> > 3.00GHz
> >> > > > > > > > Stepping:            4
> >> > > > > > > > CPU MHz:             1326.446
> >> > > > > > > > BogoMIPS:            6000.00
> >> > > > > > > > Hypervisor vendor:   KVM
> >> > > > > > > > Virtualization type: full
> >> > > > > > > > L1d cache:           32K
> >> > > > > > > > L1i cache:           32K
> >> > > > > > > > L2 cache:            1024K
> >> > > > > > > > L3 cache:            25344K
> >> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> apic
> >> > sep
> >> > > > mtrr
> >> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> >> syscall
> >> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> nopl
> >> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> monitor
> >> > > > > > > > ssse3 fma cx16 pcid
> >> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> >> xsave
> >> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep
> >> bmi2
> >> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> >> > clflushopt
> >> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> >> xsaves
> >> > > > > > > > ida arat pku ospke ----------Network Test----------
> >> > > > > > > >
> >> > > > > > > > ----------Python Info----------
> >> > > > > > > > Version      : 3.6.7
> >> > > > > > > > Compiler     : GCC 8.2.0
> >> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >> > > > > > > > Arch         : ('64bit', 'ELF')
> >> > > > > > > > ------------Pip Info-----------
> >> > > > > > > > Version      : 19.1.1
> >> > > > > > > > Directory    :
> >> > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> >> > > > > packages/pip
> >> > > > > > > > ----------MXNet Info-----------
> >> > > > > > > > Version      : 1.4.1
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> >> > > > > > > > Hashtag not found. Not installed from pre-built package.
> >> > > > > > > > ----------System Info----------
> >> > > > > > > > Platform     :
> >> > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >> > > > > > > > system       : Linux
> >> > > > > > > > node         : ip-172-31-63-171
> >> > > > > > > > release      : 4.15.0-1035-aws
> >> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >> > > > > > > > ----------Hardware Info----------
> >> > > > > > > > machine      : x86_64
> >> > > > > > > > processor    : x86_64
> >> > > > > > > > Architecture:        x86_64
> >> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >> > > > > > > > Byte Order:          Little Endian
> >> > > > > > > > CPU(s):              72
> >> > > > > > > > On-line CPU(s) list: 0-71
> >> > > > > > > > Thread(s) per core:  2
> >> > > > > > > > Core(s) per socket:  18
> >> > > > > > > > Socket(s):           2
> >> > > > > > > > NUMA node(s):        2
> >> > > > > > > > Vendor ID:           GenuineIntel
> >> > > > > > > > CPU family:          6
> >> > > > > > > > Model:               85
> >> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> >> > > > > > > > Stepping:            4
> >> > > > > > > > CPU MHz:             1223.344
> >> > > > > > > > BogoMIPS:            6000.00
> >> > > > > > > > Hypervisor vendor:   KVM
> >> > > > > > > > Virtualization type: full
> >> > > > > > > > L1d cache:           32K
> >> > > > > > > > L1i cache:           32K
> >> > > > > > > > L2 cache:            1024K
> >> > > > > > > > L3 cache:            25344K
> >> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> >> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> >> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> >> > > > > > > > ssse3 fma cx16 pcid
> >> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> >> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> >> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> >> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> >> > > > > > > > ida arat pku ospke
> >> > > > > > > > ----------Network Test----------
> >> > > > > > > >
> >> > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> >> > > > > <pe...@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > I trained cifar10 on CPU and there seems to be a
> >> > > > > > > > > regression of roughly 7% in training time against
> >> > > > > > > > > 1.4.1:
> >> > > > > > > > >
> >> > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >> > > > > > > > > (master)+$ time python cifar10.py --epochs 5
> >> > > > > > > > > real    11m30.388s
> >> > > > > > > > > user    417m7.766s
> >> > > > > > > > > sys     16m57.315s
> >> > > > > > > > >
> >> > > > > > > > > VS 1.4.1:
> >> > > > > > > > > real    10m41.994s
> >> > > > > > > > > user    392m40.646s
> >> > > > > > > > > sys     12m30.601s
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> >> > royweilai@gmail.com>
> >> > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > Hi Anirudh,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for jumping into this quickly, I followed up on
> >> > > > > > > > > > the issue.
> >> > > > > > > > > >
> >> > > > > > > > > > I meant for sockeye developers/maintainers to help set
> >> > > > > > > > > > up nightly tests and raise issues early.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks!
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> >> > > > > > > > > > <ha...@gmail.com>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > In GluonNLP we test against the MXNet nightly build
> >> > > > > > > > > > > for each PR, and the CI did catch some MXNet-related
> >> > > > > > > > > > > issues.
> >> > > > > > > > > > > I recommend that other toolkits also add integration
> >> > > > > > > > > > > tests with MXNet nightly.
> >> > > > > > > > > > > It helps identify issues early.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Haibin
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> >> > > > > > > > > > > <pa...@intel.com>
> >> > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for raising the issue; we will take a look
> >> > > > > > > > > > > > ASAP.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The downstream cases are not in the MXNet CI, so
> >> > > > > > > > > > > > it's hard for MXNet developers to catch potential
> >> > > > > > > > > > > > bugs or performance degradation.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > In the future, I suggest adding the major
> >> > > > > > > > > > > > downstream test cases, like those from sockeye,
> >> > > > > > > > > > > > GluonNLP, GluonCV, DGL, Gluon-TS, to the nightly
> >> > > > > > > > > > > > test.
> >> > > > > > > > > > > > If that's still too heavy, maybe test them weekly
> >> > > > > > > > > > > > or monthly :)
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > --Patric
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > -----Original Message-----
> >> > > > > > > > > > > > > From: Anirudh Subramanian
> >> > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> >> > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> >> > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > > > > > > > > > Cc: dev@mxnet.apache.org
> >> > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> >> (incubating)
> >> > > > > > > > > > > > > version
> >> > > > > > > > > > > > > 1.5.0.rc1
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi Lai,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > I have opened an issue:
> >> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> >> > > > > > > > > > > > > I came to know about this issue only today and I
> >> > > > > > > > > > > > > have not been monitoring sockeye.
> >> > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
> >> > > > > > > > > > > > > caused by the dlpack changes.
> >> > > > > > > > > > > > > Also, I don't think sockeye CI checks against
> >> > > > > > > > > > > > > master; it is using 1.4.1.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Anirudh
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> >> > > > > > > > > > > > > <ro...@gmail.com>
> >> > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Could you share which test failed and what the
> >> > > > > > > > > > > > > > crash is? How can it be reproduced?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I was able to install sockeye, and all tests
> >> > > > > > > > > > > > > > passed using
> >> > > > > > > > > > > > > > python setup.py test
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I have tested both the nightly pip package and
> >> > > > > > > > > > > > > > 1.5.0.rc1.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > It would be great to create an issue with
> >> > > > > > > > > > > > > > reproducible steps and move the discussion
> >> > > > > > > > > > > > > > there.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has
> >> > > > > > > > > > > > > > been failing for some time. If it's due to an
> >> > > > > > > > > > > > > > MXNet change, please raise this early so we can
> >> > > > > > > > > > > > > > track and solve it in time rather than blocking
> >> > > > > > > > > > > > > > the release during vote time.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> >> Subramanian
> >> > > > > > > > > > > > > > <anirudh2290@gmail.com
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
> >> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but
> >> > > > > > > > > > > > > > > not with the commit
> >> > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Anirudh
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> >> > > > > > > > > > > > > > > <ro...@gmail.com>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi Przemyslaw,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Is there an issue with more details to
> >> > > > > > > > > > > > > > > > track the problem?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> >> > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> >> > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > -1
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests
> >> > > > > > > > > > > > > > > > > (python setup.py test), observed starting
> >> > > > > > > > > > > > > > > > > with the nightly 1.5 build from 6/13 and
> >> > > > > > > > > > > > > > > > > still occurring in 1.5rc1. I don't yet have
> >> > > > > > > > > > > > > > > > > the exact commit that is responsible for it,
> >> > > > > > > > > > > > > > > > > but it is either
> >> > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> >> > > > > > > > > > > > > > > > > (dlpack related) or
> >> > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> >> > > > > > > > > > > > > > > > > (cached op optimization).
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> >> > > > > > > > > > > > > > > > > <ro...@gmail.com>
> >> > > > > wrote:
> >> > > > > > > > > > > > > > > > > > Dear MXNet community,
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache
> >> > > > > > > > > > > > > > > > > > MXNet (incubating) version 1.5.0.
> >> > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> >> > > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
> >> > > > > > > > > > > > > > > > > > 23:59:59.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 1) Link to release notes:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 2) Link to release candidate:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 3) Link to source and signatures on
> >> > > > > > > > > > > > > > > > > > apache dist server:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Please remember to TEST first before
> >> > > > > > > > > > > > > > > > > > voting accordingly:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > +1 = approve
> >> > > > > > > > > > > > > > > > > > +0 = no opinion
> >> > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> >> > > > > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > --
> >> > > > > > > > > > Best Regards
> >> > > > > > > > > >
> >> > > > > > > > > > Lai
> >> > > >
> >> > > --
> >> > > Best Regards
> >> > >
> >> > > Lai
> >> >
> >> >
> >
> >
> >>
> >> --
> >> Sandeep Krishnamurthy
> >>
> >
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Chris Olivier <cj...@gmail.com>.
for batch norm, I mean. max*

On Sat, Jun 29, 2019 at 12:34 PM Chris Olivier <cj...@gmail.com>
wrote:

> what’s with the mac memory usage being 2x in 1.4? As I am not sure where
> the number is coming from (if it’s my profiler code, I wouldn’t consider it
> terribly meaningful), but it is the same everywhere else, so it kind of
> sticks out.
>
> On Thu, Jun 27, 2019 at 3:36 PM sandeep krishnamurthy <
> sandeep.krishna98@gmail.com> wrote:
>
>> Hello Ciyong/Pedro,
>>
>> Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
>> cover all MXNet operators, not presented in best possible way, still WIP)
>>
>> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>>
>> The following operators look slower in 1.5 compared to 1.4.1:
>> - BatchNorm
>> - Pooling
>> - FullyConnected
>> - batch_dot
>> - Dot
>> - broadcast_mul
>> - log_softmax
>> and a few other operators
>>
>> Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
>> example Convolution, flatten, and elementwise operators. So it is likely
>> that a few operators have regressed noticeably, but improvements in other
>> operators hide much of that regression, so the end-to-end effect is not
>> that significant. We need a more detailed per-operator performance
>> analysis. We will not be able to do this for the current release, but we
>> should have a more concrete way of determining such performance
>> regressions before the next release.
>>
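One possible shape for the "more concrete way" of spotting per-operator
regressions mentioned above is sketched below. It is only an illustration:
the function name find_regressions, the 5% threshold, and the operator names
and timings are all hypothetical placeholders, not output of the opperf
benchmark linked in this message.

```python
def find_regressions(baseline, candidate, threshold=0.05):
    """Return {op: relative_change} for ops slower than baseline by > threshold.

    baseline and candidate map an operator name to a time measurement
    (any unit, as long as both runs use the same one).
    """
    regressions = {}
    for op, base_time in baseline.items():
        new_time = candidate.get(op)
        if new_time is None or base_time <= 0:
            continue  # operator missing from one run, or unusable baseline
        change = (new_time - base_time) / base_time
        if change > threshold:
            regressions[op] = change
    return regressions


# Placeholder timings in milliseconds. These are NOT measured results.
baseline = {"op_a": 1.00, "op_b": 2.00, "op_c": 4.00}
candidate = {"op_a": 1.30, "op_b": 2.02, "op_c": 3.00}

for op, change in sorted(find_regressions(baseline, candidate).items()):
    print(f"{op}: {change:+.1%} slower than baseline")
```

Reporting each operator separately keeps a speedup in one operator from
masking a slowdown in another, which is the effect described above.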
>> Setup:
>> 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
>> 1.4.1 => PyPi mxnet-mkl==1.4.1
>> Machine: C5.18X
>> No explicit environment variables were set
>> Operator benchmark code -
>> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>>
>> Best,
>> Sandeep
>>
>>
>> On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
>> pedro.larroy.lists@gmail.com>
>> wrote:
>>
>> > I will try to run a few benchmarks in a bare metal instance tonight to
>> > remove virtualization variance for the measurements and provide some
>> > numbers.
>> >
>> > Please propose a set of models / examples that would be desirable to
>> > run before the release and provide a link to an easy to run script
>> > with instructions so we can validate the release better.
>> >
>> > Thank you.
>> >
>> > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
>> > >
>> > > Dear @dev,
>> > >
>> > > I'm cancelling the vote for the cached op fix:
>> > >
>> > > https://github.com/apache/incubator-mxnet/pull/15298
>> > >
>> > > As for the possible CPU training regression, it does not look like a
>> > > blocker for now.
>> > >
>> > > I will start a new rc2 vote, please help to validate.
>> > >
>> > > Thanks!
>> > >
>> > >
>> > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com>
>> > wrote:
>> > >
>> > > > Hi Pedro,
>> > > >
>> > > > I was able to reproduce a similar result with your script on
>> > > > C5.18xlarge (v1.5 is ~5.6% slower than v1.4; I was using 18 cores for
>> > > > computing). But I needed to bind the cores with the commands below
>> > > > when running the script (without setting the env variables, I got
>> > > > close times (<1% difference) with v1.5 and v1.4):
>> > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>> > > >         export OMP_NUM_THREADS=18
>> > > >
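For anyone re-running the comparison from a script, the two exports above can
be applied per run rather than to the whole shell. This is a minimal sketch
under assumptions: run_pinned is a hypothetical helper, and the command timed
at the bottom is a stand-in, not the cifar10.py benchmark itself.

```python
import os
import subprocess
import sys
import time


def run_pinned(cmd, num_threads=18):
    """Run cmd with the thread-binding variables set; return wall-clock seconds."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command: print the variable the child process actually sees.
    child = "import os; print(os.environ['OMP_NUM_THREADS'])"
    elapsed = run_pinned([sys.executable, "-c", child])
    print(f"wall-clock: {elapsed:.3f}s")
```

Passing env to subprocess.run keeps the settings attached to each measured
run, so a pinned and an unpinned run can be compared from the same session.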
>> > > > Did you set any env variables when running?
>> > > >
>> > > > The performance results I got are below:
>> > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>> > > > real    12m10.856s
>> > > > user    234m49.576s
>> > > > sys     4m38.044s
>> > > >
>> > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>> > > > real    12m52.140s
>> > > > user    246m30.740s
>> > > > sys     5m8.188s
>> > > >
>> > > > Looking at the profiling data, most of the ops have the same perf
>> > > > between v1.4 and v1.5. But some ops like "_backward_BatchNorm" and
>> > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
>> > > > I will do further analysis on these ops.
>> > > >
>> > > > Here's the hardware/OS info from my side:
>> > > > ----------Python Info----------
>> > > > Version      : 3.6.8
>> > > > Compiler     : GCC 7.3.0
>> > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>> > > > Arch         : ('64bit', '')
>> > > > ------------Pip Info-----------
>> > > > Version      : 19.0.3
>> > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>> > > > ----------MXNet Info-----------
>> > > > Version      : 1.5.0
>> > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
>> > > > Hashtag not found. Not installed from pre-built package.
>> > > > ----------System Info----------
>> > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>> > > > system       : Linux
>> > > > node         : ip-172-31-32-129
>> > > > release      : 4.4.0-1085-aws
>> > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
>> > > > ----------Hardware Info----------
>> > > > machine      : x86_64
>> > > > processor    : x86_64
>> > > > Architecture:          x86_64
>> > > > CPU op-mode(s):        32-bit, 64-bit
>> > > > Byte Order:            Little Endian
>> > > > CPU(s):                72
>> > > > On-line CPU(s) list:   0-71
>> > > > Thread(s) per core:    2
>> > > > Core(s) per socket:    18
>> > > > Socket(s):             2
>> > > > NUMA node(s):          2
>> > > > Vendor ID:             GenuineIntel
>> > > > CPU family:            6
>> > > > Model:                 85
>> > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>> > > > Stepping:              3
>> > > > CPU MHz:               3000.000
>> > > > BogoMIPS:              6000.00
>> > > > Hypervisor vendor:     KVM
>> > > > Virtualization type:   full
>> > > > L1d cache:             32K
>> > > > L1i cache:             32K
>> > > > L2 cache:              1024K
>> > > > L3 cache:              25344K
>> > > > NUMA node0 CPU(s):     0-17,36-53
>> > > > NUMA node1 CPU(s):     18-35,54-71
>> > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
>> > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
>> > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
>> > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
>> > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
>> > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
>> > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
>> > > > ----------Network Test----------
>> > > >
>> > > >
>> > > > -Ciyong
>> > > >
>> > > >
>> > > > -----Original Message-----
>> > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
>> > > > Sent: Thursday, June 27, 2019 9:55 AM
>> > > > To: dev@mxnet.incubator.apache.org
>> > > > Cc: dev@mxnet.apache.org
>> > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
>> 1.5.0.rc1
>> > > >
>> > > > Could we run more epochs to see the performance difference, or
>> > > > profile the difference between a good and a bad run?
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>> > > > > Sent: Thursday, June 27, 2019 9:35 AM
>> > > > > To: dev@mxnet.incubator.apache.org
>> > > > > Cc: dev@mxnet.apache.org
>> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
>> > > > > 1.5.0.rc1
>> > > > >
>> > > > > I ran it again and the gap is bigger again; I guess we need to
>> > > > > average out the times across several runs:
>> > > > >
>> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>> > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
>> > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > ImageRecordIOParser2:
>> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>> > threads
>> > > > > for decoding..
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> completed
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > ImageRecordIOParser2:
>> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>> threads
>> > > > > for decoding..
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> completed
>> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>> > > > > Epoch 0, Changed learning rate to 0.05
>> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > 147456 bytes with malloc directly
>> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > 589824 bytes with malloc directly
>> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > 2359296 bytes with malloc directly
>> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > 9437184 bytes with malloc directly
>> > > > > Epoch 0, Batch 199, Speed=384.149839
>> > > > > Epoch 0, Duration=140.919567
>> > > > > Epoch 0, Training accuracy=0.115169
>> > > > > Epoch 0, Validation accuracy=0.141317
>> > > > > Epoch 1, Batch 199, Speed=433.380512
>> > > > > Epoch 1, Duration=119.553233
>> > > > > Epoch 1, Training accuracy=0.170956
>> > > > > Epoch 1, Validation accuracy=0.216146
>> > > > > Epoch 2, Batch 199, Speed=434.864699
>> > > > > Epoch 2, Duration=123.278490
>> > > > > Epoch 2, Training accuracy=0.209455
>> > > > > Epoch 2, Validation accuracy=0.247296
>> > > > > Epoch 3, Batch 199, Speed=433.401854
>> > > > > Epoch 3, Duration=118.327797
>> > > > > Epoch 3, Training accuracy=0.248701
>> > > > > Epoch 3, Validation accuracy=0.302083
>> > > > > Epoch 4, Batch 199, Speed=419.713707
>> > > > > Epoch 4, Duration=126.468409
>> > > > > Epoch 4, Training accuracy=0.260949
>> > > > > Epoch 4, Validation accuracy=0.269030
>> > > > >
>> > > > > real    10m55.796s
>> > > > > user    399m33.567s
>> > > > > sys     13m55.904s
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > ImageRecordIOParser2:
>> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>> > threads
>> > > > > for decoding..
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> completed
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > ImageRecordIOParser2:
>> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>> threads
>> > > > > for decoding..
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
>> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> completed
>> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>> > > > > Epoch 0, Changed learning rate to 0.05
>> > > > > Epoch 0, Batch 199, Speed=419.039188
>> > > > > Epoch 0, Duration=143.934903
>> > > > > Epoch 0, Training accuracy=0.122542
>> > > > > Epoch 0, Validation accuracy=0.164359
>> > > > > Epoch 1, Batch 199, Speed=445.257048
>> > > > > Epoch 1, Duration=135.248399
>> > > > > Epoch 1, Training accuracy=0.178828
>> > > > > Epoch 1, Validation accuracy=0.199419
>> > > > > Epoch 2, Batch 199, Speed=447.115215
>> > > > > Epoch 2, Duration=132.003770
>> > > > > Epoch 2, Training accuracy=0.217808
>> > > > > Epoch 2, Validation accuracy=0.233073
>> > > > > Epoch 3, Batch 199, Speed=441.079477
>> > > > > Epoch 3, Duration=126.543316
>> > > > > Epoch 3, Training accuracy=0.248102
>> > > > > Epoch 3, Validation accuracy=0.293870
>> > > > > Epoch 4, Batch 199, Speed=449.329787
>> > > > > Epoch 4, Duration=138.398325
>> > > > > Epoch 4, Training accuracy=0.270021
>> > > > > Epoch 4, Validation accuracy=0.311498
>> > > > >
>> > > > > real    11m45.329s
>> > > > > user    426m13.908s
>> > > > > sys     16m45.093s
>> > > > >
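The suggestion above to average the times across several runs can be sketched
as follows. The `real` values are the wall-clock times quoted in this thread
for `cifar10.py --epochs 5`; the helper names parse_real and mean are
hypothetical, and three runs per version is too few to say anything about
variance.

```python
import re


def parse_real(value):
    """Convert a `time` value such as '10m55.796s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def mean(values):
    return sum(values) / len(values)


# 'real' times reported in this thread, one pair per benchmark run.
runs_1_4 = ["10m41.994s", "11m7.356s", "10m55.796s"]
runs_1_5 = ["11m30.388s", "11m21.930s", "11m45.329s"]

mean_1_4 = mean([parse_real(r) for r in runs_1_4])
mean_1_5 = mean([parse_real(r) for r in runs_1_5])
slowdown = mean_1_5 / mean_1_4 - 1.0

print(f"1.4.1 mean: {mean_1_4:.1f}s, 1.5.0 mean: {mean_1_5:.1f}s")
print(f"relative slowdown: {slowdown:.1%}")
```

Averaging first and comparing afterwards smooths out the run-to-run swings
that made the individual pairs look so different from each other.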
>> > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>> > > > > <pe...@gmail.com> wrote:
>> > > > > >
>> > > > > > The difference looks smaller now, more like your numbers. I
>> > > > > > wonder if something happened during the previous benchmark,
>> > > > > > like a system update...
>> > > > > >
>> > > > > >
>> > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>> > > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
>> > > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > > ImageRecordIOParser2:
>> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>> > > > > > threads for decoding..
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > completed
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > > ImageRecordIOParser2:
>> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>> > > > > > threads for decoding..
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > completed
>> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>> > > > > > Epoch 0, Changed learning rate to 0.05
>> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > > 147456 bytes with malloc directly
>> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > > 589824 bytes with malloc directly
>> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > > 2359296 bytes with malloc directly
>> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>> > > > > > 9437184 bytes with malloc directly
>> > > > > > Epoch 0, Batch 199, Speed=426.182733
>> > > > > > Epoch 0, Duration=134.868458
>> > > > > > Epoch 0, Training accuracy=0.127238
>> > > > > > Epoch 0, Validation accuracy=0.206388
>> > > > > > Epoch 1, Batch 199, Speed=313.127156
>> > > > > > Epoch 1, Duration=128.041775
>> > > > > > Epoch 1, Training accuracy=0.182065
>> > > > > > Epoch 1, Validation accuracy=0.202524
>> > > > > > Epoch 2, Batch 199, Speed=410.931187
>> > > > > > Epoch 2, Duration=124.920588
>> > > > > > Epoch 2, Training accuracy=0.202584
>> > > > > > Epoch 2, Validation accuracy=0.245693
>> > > > > > Epoch 3, Batch 199, Speed=419.119335
>> > > > > > Epoch 3, Duration=120.948349
>> > > > > > Epoch 3, Training accuracy=0.235854
>> > > > > > Epoch 3, Validation accuracy=0.291066
>> > > > > > Epoch 4, Batch 199, Speed=430.473733
>> > > > > > Epoch 4, Duration=130.181724
>> > > > > > Epoch 4, Training accuracy=0.257773
>> > > > > > Epoch 4, Validation accuracy=0.304988
>> > > > > >
>> > > > > > real    11m7.356s
>> > > > > > user    406m9.910s
>> > > > > > sys     14m18.349s
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > > ImageRecordIOParser2:
>> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>> > > > > > threads for decoding..
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > completed
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>> > > > > > ImageRecordIOParser2:
>> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>> > > > > > threads for decoding..
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
>> image
>> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>> > > > > completed
>> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>> > > > > > Epoch 0, Changed learning rate to 0.05
>> > > > > > Epoch 0, Batch 199, Speed=348.618154
>> > > > > > Epoch 0, Duration=146.469352
>> > > > > > Epoch 0, Training accuracy=0.124121
>> > > > > > Epoch 0, Validation accuracy=0.167227
>> > > > > > Epoch 1, Batch 199, Speed=452.790825
>> > > > > > Epoch 1, Duration=130.199421
>> > > > > > Epoch 1, Training accuracy=0.183863
>> > > > > > Epoch 1, Validation accuracy=0.237079
>> > > > > > Epoch 2, Batch 199, Speed=451.406559
>> > > > > > Epoch 2, Duration=126.320823
>> > > > > > Epoch 2, Training accuracy=0.214844
>> > > > > > Epoch 2, Validation accuracy=0.244692
>> > > > > > Epoch 3, Batch 199, Speed=403.161873
>> > > > > > Epoch 3, Duration=125.331660
>> > > > > > Epoch 3, Training accuracy=0.243506
>> > > > > > Epoch 3, Validation accuracy=0.301182
>> > > > > > Epoch 4, Batch 199, Speed=450.826598
>> > > > > > Epoch 4, Duration=126.426253
>> > > > > > Epoch 4, Training accuracy=0.266424
>> > > > > > Epoch 4, Validation accuracy=0.311899
>> > > > > >
>> > > > > > real    11m21.930s
>> > > > > > user    415m3.855s
>> > > > > > sys     13m53.975s
>> > > > > >
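Besides the overall `real` time, the per-epoch "Duration" entries in the
training output above can be compared directly. The sketch below is a rough
illustration: epoch_durations is a hypothetical helper, and the two strings
are excerpts of the quoted output (only the Speed and Duration entries are
kept), first for the 1.4 run and then for the 1.5 run.

```python
import re

# Tolerate arbitrary whitespace, since the mail wrapped the log mid-entry.
DURATION = re.compile(r"Epoch\s+(\d+),\s+Duration=([\d.]+)")


def epoch_durations(log_text):
    """Return {epoch: seconds} for every 'Epoch N, Duration=X' entry."""
    return {int(epoch): float(secs) for epoch, secs in DURATION.findall(log_text)}


LOG_1_4 = (
    "Epoch 0, Batch 199, Speed=426.182733 Epoch 0, Duration=134.868458 "
    "Epoch 1, Batch 199, Speed=313.127156 Epoch 1, Duration=128.041775 "
    "Epoch 2, Batch 199, Speed=410.931187 Epoch 2, Duration=124.920588 "
    "Epoch 3, Batch 199, Speed=419.119335 Epoch 3, Duration=120.948349 "
    "Epoch 4, Batch 199, Speed=430.473733 Epoch 4, Duration=130.181724"
)
LOG_1_5 = (
    "Epoch 0, Batch 199, Speed=348.618154 Epoch 0, Duration=146.469352 "
    "Epoch 1, Batch 199, Speed=452.790825 Epoch 1, Duration=130.199421 "
    "Epoch 2, Batch 199, Speed=451.406559 Epoch 2, Duration=126.320823 "
    "Epoch 3, Batch 199, Speed=403.161873 Epoch 3, Duration=125.331660 "
    "Epoch 4, Batch 199, Speed=450.826598 Epoch 4, Duration=126.426253"
)

for label, log in (("1.4", LOG_1_4), ("1.5", LOG_1_5)):
    durations = epoch_durations(log)
    average = sum(durations.values()) / len(durations)
    print(f"{label}: {len(durations)} epochs, mean duration {average:.2f}s")
```

Epoch 0 includes warm-up work in both runs, so dropping it before averaging
may give a steadier comparison.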
>> > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>> > > > > > <pe...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > Hi Ciyong, thanks for trying to reproduce:
>> > > > > > >
>> > > > > > > I used this one:
>> > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>> > > > > > >
>> > > > > > > Could you provide hardware and OS details?
>> > > > > > >
>> > > > > > > I will rerun and repost numbers in a few minutes.
>> > > > > > >
>> > > > > > > Pedro.
>> > > > > > >
>> > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>> > > > > > > <ci...@intel.com>
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > Hi Pedro,
>> > > > > > > >
>> > > > > > > > I'm looking at this case, using the script
>> > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>> > > > > > > > to get the timing data, but there seems to be not much difference
>> > > > > > > > between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
>> > > > > > > >
>> > > > > > > > I'm not sure if there's any difference in the Python script; can you
>> > > > > > > > point me to the link to get your script (cifar10.py)?
>> > > > > > > > You could also try MXNet's script (train_cifar10.py) and check
>> > > > > > > > the performance.
>> > > > > > > >
>> > > > > > > > Here's the command I used to collect the time:
>> > > > > > > >         python train_cifar10.py --num-epoch=5
>> > > > > > > >
>> > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>> > > > > > > >         real    9m4.880s
>> > > > > > > >         user    333m13.340s
>> > > > > > > >         sys     14m36.100s
>> > > > > > > >
>> > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>> > > > > > > >         real    9m2.155s
>> > > > > > > >         user    329m37.092s
>> > > > > > > >         sys     16m8.668s
>> > > > > > > >
>> > > > > > > > -Ciyong
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > -----Original Message-----
>> > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>> > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>> > > > > > > > To: dev@mxnet.incubator.apache.org
>> > > > > > > > Cc: dev@mxnet.apache.org
>> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>> > > > > > > >
>> > > > > > > > Hi these were my build flags and system info:
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --- # CMake configuration
>> > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>> > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>> > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>> > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>> > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>> > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
>> > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>> > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
>> > > > > > > > USE_LAPACK: "ON" # Build with lapack support
>> > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>> > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>> > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>> > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>> > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>> > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>> > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>> > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>> > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>> > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>> > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>> > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
>> > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>> > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>> > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
>> > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>> > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>> > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>> > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>> > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
>> > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>> > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>> > > > > > > > CMAKE_BUILD_TYPE: "Release"
>> > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>> > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>> > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>> > > > > > > >
>> > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>> > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>> > > > > > > >
>> > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>> > > > > > > > c5d.18xlarge
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Version      : 3.6.7
>> > > > > > > > Compiler     : GCC 8.2.0
>> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>> > > > > > > > Arch         : ('64bit', 'ELF')
>> > > > > > > > ------------Pip Info-----------
>> > > > > > > > Version      : 19.1.1
>> > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
>> > > > > > > > ----------MXNet Info-----------
>> > > > > > > > Version      : 1.5.0
>> > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>> > > > > > > > Hashtag not found. Not installed from pre-built package.
>> > > > > > > > ----------System Info----------
>> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>> > > > > > > > system       : Linux
>> > > > > > > > node         : ip-172-31-63-171
>> > > > > > > > release      : 4.15.0-1035-aws
>> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>> > > > > > > > ----------Hardware Info----------
>> > > > > > > > machine      : x86_64
>> > > > > > > > processor    : x86_64
>> > > > > > > > Architecture:        x86_64
>> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>> > > > > > > > Byte Order:          Little Endian
>> > > > > > > > CPU(s):              72
>> > > > > > > > On-line CPU(s) list: 0-71
>> > > > > > > > Thread(s) per core:  2
>> > > > > > > > Core(s) per socket:  18
>> > > > > > > > Socket(s):           2
>> > > > > > > > NUMA node(s):        2
>> > > > > > > > Vendor ID:           GenuineIntel
>> > > > > > > > CPU family:          6
>> > > > > > > > Model:               85
>> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>> > > > > > > > Stepping:            4
>> > > > > > > > CPU MHz:             1326.446
>> > > > > > > > BogoMIPS:            6000.00
>> > > > > > > > Hypervisor vendor:   KVM
>> > > > > > > > Virtualization type: full
>> > > > > > > > L1d cache:           32K
>> > > > > > > > L1i cache:           32K
>> > > > > > > > L2 cache:            1024K
>> > > > > > > > L3 cache:            25344K
>> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>> > > > > > > > ssse3 fma cx16 pcid
>> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
>> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
>> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
>> > > > > > > > ida arat pku ospke
>> > > > > > > > ----------Network Test----------
>> > > > > > > >
>> > > > > > > > ----------Python Info----------
>> > > > > > > > Version      : 3.6.7
>> > > > > > > > Compiler     : GCC 8.2.0
>> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>> > > > > > > > Arch         : ('64bit', 'ELF')
>> > > > > > > > ------------Pip Info-----------
>> > > > > > > > Version      : 19.1.1
>> > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
>> > > > > > > > ----------MXNet Info-----------
>> > > > > > > > Version      : 1.4.1
>> > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>> > > > > > > > Hashtag not found. Not installed from pre-built package.
>> > > > > > > > ----------System Info----------
>> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>> > > > > > > > system       : Linux
>> > > > > > > > node         : ip-172-31-63-171
>> > > > > > > > release      : 4.15.0-1035-aws
>> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>> > > > > > > > ----------Hardware Info----------
>> > > > > > > > machine      : x86_64
>> > > > > > > > processor    : x86_64
>> > > > > > > > Architecture:        x86_64
>> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>> > > > > > > > Byte Order:          Little Endian
>> > > > > > > > CPU(s):              72
>> > > > > > > > On-line CPU(s) list: 0-71
>> > > > > > > > Thread(s) per core:  2
>> > > > > > > > Core(s) per socket:  18
>> > > > > > > > Socket(s):           2
>> > > > > > > > NUMA node(s):        2
>> > > > > > > > Vendor ID:           GenuineIntel
>> > > > > > > > CPU family:          6
>> > > > > > > > Model:               85
>> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>> > > > > > > > Stepping:            4
>> > > > > > > > CPU MHz:             1223.344
>> > > > > > > > BogoMIPS:            6000.00
>> > > > > > > > Hypervisor vendor:   KVM
>> > > > > > > > Virtualization type: full
>> > > > > > > > L1d cache:           32K
>> > > > > > > > L1i cache:           32K
>> > > > > > > > L2 cache:            1024K
>> > > > > > > > L3 cache:            25344K
>> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>> > > > > > > > ssse3 fma cx16 pcid
>> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
>> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
>> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
>> > > > > > > > ida arat pku ospke
>> > > > > > > > ----------Network Test----------
>> > > > > > > >
>> > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>> > > > > <pe...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > I trained cifar10 on CPU and there seems to be a regression
>> > > > > > > > > of about 7% in training time against 1.4.1:
>> > > > > > > > >
>> > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>> > > > > > > > > (master)+$ time python cifar10.py --epochs 5
>> > > > > > > > > real    11m30.388s
>> > > > > > > > > user    417m7.766s
>> > > > > > > > > sys     16m57.315s
>> > > > > > > > >
>> > > > > > > > > VS 1.4.1:
>> > > > > > > > > real    10m41.994s
>> > > > > > > > > user    392m40.646s
>> > > > > > > > > sys     12m30.601s
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
>> > royweilai@gmail.com>
>> > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > Hi Anirudh,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
>> > > > > > > > > >
>> > > > > > > > > > It was meant for sockeye developers/maintainers, to help set up
>> > > > > > > > > > nightly tests and raise issues early.
>> > > > > > > > > >
>> > > > > > > > > > Thanks!
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>> > > > > > > > > > <ha...@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
>> > > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
>> > > > > > > > > > > I recommend that other toolkits also add integration tests
>> > > > > > > > > > > against MXNet nightly.
>> > > > > > > > > > > It helps identify issues early.
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Haibin
>> > > > > > > > > > >
>> > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>> > > > > > > > > > > <pa...@intel.com>
>> > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
>> > > > > > > > > > > >
>> > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's
>> > > > > > > > > > > > hard for MXNet developers to catch potential bugs or
>> > > > > > > > > > > > performance degradation.
>> > > > > > > > > > > >
>> > > > > > > > > > > > In the future, I suggest adding the major downstream
>> > > > > > > > > > > > test cases, e.g. from sockeye, GluonNLP, GluonCV, DGL,
>> > > > > > > > > > > > and Gluon-TS, to the nightly test.
>> > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
>> > > > > > > > > > > > monthly :)
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > >
>> > > > > > > > > > > > --Patric
>> > > > > > > > > > > >
>> > > > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > > > From: Anirudh Subramanian
>> > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
>> > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>> > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>> > > > > > > > > > > > > Cc: dev@mxnet.apache.org
>> > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
>> (incubating)
>> > > > > > > > > > > > > version
>> > > > > > > > > > > > > 1.5.0.rc1
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Hi Lai,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I have opened an issue:
>> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>> > > > > > > > > > > > > I came to know about this issue only today and I have
>> > > > > > > > > > > > > not been monitoring sockeye.
>> > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
>> > > > > > > > > > > > > by the dlpack changes.
>> > > > > > > > > > > > > Also, I don't think sockeye CI checks against master;
>> > > > > > > > > > > > > it is using 1.4.1.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Anirudh
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>> > > > > > > > > > > > > <ro...@gmail.com>
>> > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Could you share which test failed and what the
>> > > > > > > > > > > > > > crash is? How can it be reproduced?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I was able to install sockeye and all tests passed,
>> > > > > > > > > > > > > > using python setup.py test.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > It would be great to create an issue with
>> > > > > > > > > > > > > > reproducible steps and move the discussion there.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
>> > > > > > > > > > > > > > failing for some time. If it’s due to an MXNet change,
>> > > > > > > > > > > > > > please raise this early so we can track and solve it
>> > > > > > > > > > > > > > in time rather than block the release during vote time.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
>> Subramanian
>> > > > > > > > > > > > > > <anirudh2290@gmail.com
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
>> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
>> > > > > > > > > > > > > > > with the commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Anirudh
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
>> > > > > > > > > > > > > > > <ro...@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Przemyslaw,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Is there an issue with more details to track the problem?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
>> > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > -1
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests
>> > > > > > > > > > > > > > > > > (python setup.py test), observed starting with
>> > > > > > > > > > > > > > > > > the nightly 1.5 build from 6/13 and still
>> > > > > > > > > > > > > > > > > occurring in 1.5rc1. I don't yet have the exact
>> > > > > > > > > > > > > > > > > commit that is responsible for it, but it is either
>> > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
>> > > > > > > > > > > > > > > > > (dlpack related) or
>> > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
>> > > > > > > > > > > > > > > > > (cached op optimization).
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
>> > > > > > > > > > > > > > > > > <ro...@gmail.com>
>> > > > > wrote:
>> > > > > > > > > > > > > > > > > > Dear MXNet community,
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
>> > > > > > > > > > > > > > > > > > (incubating) version 1.5.0.
>> > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)
>> > > > > > > > > > > > > > > > > > and close on June 22, 23:59:59.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > 1) Link to release notes:
>> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > 2) Link to release candidate:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > +1 = approve
>> > > > > > > > > > > > > > > > > > +0 = no opinion
>> > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
>> > > > > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > > > > Best Regards
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Lai
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > > Best Regards
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Lai
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > Best Regards
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Lai
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > --
>> > > > > > > > > > Best Regards
>> > > > > > > > > >
>> > > > > > > > > > Lai
>> > > >
>> > > --
>> > > Best Regards
>> > >
>> > > Lai
>> >
>> >
>
>
>>
>> --
>> Sandeep Krishnamurthy
>>
>
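
For reference, the relative slowdowns discussed in this thread can be recomputed from the wall-clock ("real") values of the `time` output. The following is a minimal Python sketch, not something posted by the participants; the helper names parse_time and slowdown_percent are hypothetical, and only the three pairs of timings are taken from the messages.

```python
import re


def parse_time(value: str) -> float:
    """Convert a shell `time` value such as '11m30.388s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline: str, candidate: str) -> float:
    """Relative increase of `candidate` over `baseline`, in percent."""
    base = parse_time(baseline)
    return (parse_time(candidate) - base) / base * 100.0


# (label, 1.4.1 real time, 1.5.0 real time), copied from the thread.
comparisons = [
    ("Pedro, cifar10.py, no env variables", "10m41.994s", "11m30.388s"),
    ("Ciyong, train_cifar10.py, no env variables", "9m2.155s", "9m4.880s"),
    ("Ciyong, cifar10.py, cores bound", "12m10.856s", "12m52.140s"),
]

for label, old, new in comparisons:
    print(f"{label}: {slowdown_percent(old, new):+.2f}%")
```

With these inputs the three comparisons come out at roughly 7.5%, 0.5% and 5.6%, in line with the figures quoted in the thread.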

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Chris Olivier <cj...@gmail.com>.
What’s with the max memory usage being 2x in 1.4? I am not sure where
the number is coming from (if it’s my profiler code, I wouldn’t consider it
terribly meaningful), but it is the same everywhere else, so it kind of
sticks out.
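
The quoted messages below note two things about measuring this regression: the gap only shows up when cores are bound through KMP_AFFINITY and OMP_NUM_THREADS, and single runs vary enough that times should be averaged over several runs. A minimal Python sketch of such a harness follows. The function names and the placeholder command are assumptions for illustration; the two environment variable values are the ones quoted in the thread.

```python
import os
import statistics
import subprocess
import sys
import time


def pinned_env(num_threads: int) -> dict:
    """Copy the current environment and add the core-binding settings."""
    env = dict(os.environ)
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def time_runs(command: list, repeats: int, env: dict) -> list:
    """Run `command` `repeats` times and return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, env=env, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list) -> tuple:
    """Return (mean, sample standard deviation) of the measured durations."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


# Placeholder command so the sketch runs anywhere; it is NOT the benchmark.
command = [sys.executable, "-c", "pass"]
mean, spread = summarize(time_runs(command, repeats=3, env=pinned_env(18)))
print(f"mean={mean:.3f}s stdev={spread:.3f}s over 3 runs")
```

To benchmark for real, the placeholder command would be replaced by something like the cifar10.py invocation quoted below, run once per MXNet virtualenv, and the means compared together with their spread.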

On Thu, Jun 27, 2019 at 3:36 PM sandeep krishnamurthy <
sandeep.krishna98@gmail.com> wrote:

> Hello Ciyong/Pedro,
>
> Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
> cover all MXNet operators, not presented in best possible way, still WIP)
>
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
> The following operators look slower in 1.5 compared to 1.4.1:
> - BatchNorm
> - Pooling
> - FullyConnected
> - batch_dot
> - Dot
> - broadcast_mul
> - log_softmax
> and a few other operators
>
> Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
> example Convolution, flatten, and elementwise operators. So it is likely
> that a few operators have regressed noticeably, but improvements in other
> operators hide much of that regression, so the end-to-end effect is not
> that significant. We need a more detailed per-operator performance
> analysis. We will not be able to do this for the current release, but we
> should have a more concrete way of detecting such performance regressions
> before the next release.
>
> Setup:
> 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> 1.4.1 => PyPi mxnet-mkl==1.4.1
> Machine: C5.18X
> No explicit environment variables were set
> Operator benchmark code -
> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>
> Best,
> Sandeep
>
>
> On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
> wrote:
>
> > I will try to run a few benchmarks on a bare-metal instance tonight to
> > remove virtualization variance from the measurements and provide some
> > numbers.
> >
> > Please propose a set of models / examples that would be desirable to
> > run before the release and provide a link to an easy to run script
> > with instructions so we can validate the release better.
> >
> > Thank you.
> >
> > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> > >
> > > Dear @dev,
> > >
> > > I'm cancelling the vote for the cached op fix:
> > >
> > > https://github.com/apache/incubator-mxnet/pull/15298
> > >
> > > As for the possible CPU training regression, it does not look like a
> > > blocker for now.
> > >
> > > I will start a new rc2 vote, please help to validate.
> > >
> > > Thanks!
> > >
> > >
> > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com>
> > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > I was able to reproduce a similar result with your script on
> > > > C5.18xlarge (v1.5 is ~5.6% slower than v1.4; I was using 18 cores
> > > > for computing). However, I needed to bind the cores with the
> > > > commands below when running the script. Without setting the env
> > > > variables, the times for v1.5 and v1.4 were close (<1% apart).
> > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> > > >         export OMP_NUM_THREADS=18
> > > >
> > > > Did you set any env variables when running?
> > > >
> > > > The performance results I got are below:
> > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > real    12m10.856s
> > > > user    234m49.576s
> > > > sys     4m38.044s
> > > >
> > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > real    12m52.140s
> > > > user    246m30.740s
> > > > sys     5m8.188s
> > > >
> > > > Looking at the profiling data, most of the ops have the same perf on
> > > > v1.4 and v1.5, but some ops like "_backward_BatchNorm" and "Pooling"
> > > > are ~1.37x slower on v1.5 compared with v1.4.
> > > > I will do further analysis on these ops.
> > > >
> > > > Here's the hardware/OS info from my side:
> > > > ----------Python Info----------
> > > > Version      : 3.6.8
> > > > Compiler     : GCC 7.3.0
> > > > Build        : ('default', 'Dec 30 2018 01:22:34')
> > > > Arch         : ('64bit', '')
> > > > ------------Pip Info-----------
> > > > Version      : 19.0.3
> > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> > > > ----------MXNet Info-----------
> > > > Version      : 1.5.0
> > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > ----------System Info----------
> > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> > > > system       : Linux
> > > > node         : ip-172-31-32-129
> > > > release      : 4.4.0-1085-aws
> > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> > > > ----------Hardware Info----------
> > > > machine      : x86_64
> > > > processor    : x86_64
> > > > Architecture:          x86_64
> > > > CPU op-mode(s):        32-bit, 64-bit
> > > > Byte Order:            Little Endian
> > > > CPU(s):                72
> > > > On-line CPU(s) list:   0-71
> > > > Thread(s) per core:    2
> > > > Core(s) per socket:    18
> > > > Socket(s):             2
> > > > NUMA node(s):          2
> > > > Vendor ID:             GenuineIntel
> > > > CPU family:            6
> > > > Model:                 85
> > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:              3
> > > > CPU MHz:               3000.000
> > > > BogoMIPS:              6000.00
> > > > Hypervisor vendor:     KVM
> > > > Virtualization type:   full
> > > > L1d cache:             32K
> > > > L1i cache:             32K
> > > > L2 cache:              1024K
> > > > L3 cache:              25344K
> > > > NUMA node0 CPU(s):     0-17,36-53
> > > > NUMA node1 CPU(s):     18-35,54-71
> > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
> > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
> > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
> > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
> > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> > > > ----------Network Test----------
> > > >
> > > >
> > > > -Ciyong
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> > > > Sent: Thursday, June 27, 2019 9:55 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > >
> > > > Could we run more epochs to see the performance difference, or profile
> > > > the difference between a good and a bad run?
> > > >
> > > > > -----Original Message-----
> > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > > Sent: Thursday, June 27, 2019 9:35 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: dev@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > 1.5.0.rc1
> > > > >
> > > > > I ran it again and the gap is bigger again; I guess we need to average
> > > > > out the times across several runs:
> > > > >
> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > > Epoch 0, Changed learning rate to 0.05
> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> > > > > Epoch 0, Batch 199, Speed=384.149839
> > > > > Epoch 0, Duration=140.919567
> > > > > Epoch 0, Training accuracy=0.115169
> > > > > Epoch 0, Validation accuracy=0.141317
> > > > > Epoch 1, Batch 199, Speed=433.380512
> > > > > Epoch 1, Duration=119.553233
> > > > > Epoch 1, Training accuracy=0.170956
> > > > > Epoch 1, Validation accuracy=0.216146
> > > > > Epoch 2, Batch 199, Speed=434.864699
> > > > > Epoch 2, Duration=123.278490
> > > > > Epoch 2, Training accuracy=0.209455
> > > > > Epoch 2, Validation accuracy=0.247296
> > > > > Epoch 3, Batch 199, Speed=433.401854
> > > > > Epoch 3, Duration=118.327797
> > > > > Epoch 3, Training accuracy=0.248701
> > > > > Epoch 3, Validation accuracy=0.302083
> > > > > Epoch 4, Batch 199, Speed=419.713707
> > > > > Epoch 4, Duration=126.468409
> > > > > Epoch 4, Training accuracy=0.260949
> > > > > Epoch 4, Validation accuracy=0.269030
> > > > >
> > > > > real    10m55.796s
> > > > > user    399m33.567s
> > > > > sys     13m55.904s
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > threads
> > > > > for decoding..
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
> > > > > for decoding..
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300:
> > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 Epoch 0, Batch 199,
> > > > > Speed=419.039188 Epoch 0, Duration=143.934903 Epoch 0, Training
> > > > > accuracy=0.122542 Epoch 0, Validation accuracy=0.164359 Epoch 1,
> > Batch
> > > > > 199, Speed=445.257048 Epoch 1, Duration=135.248399 Epoch 1,
> Training
> > > > > accuracy=0.178828 Epoch 1, Validation accuracy=0.199419 Epoch 2,
> > Batch
> > > > > 199, Speed=447.115215 Epoch 2, Duration=132.003770 Epoch 2,
> Training
> > > > > accuracy=0.217808 Epoch 2, Validation accuracy=0.233073 Epoch 3,
> > Batch
> > > > > 199, Speed=441.079477 Epoch 3, Duration=126.543316 Epoch 3,
> Training
> > > > > accuracy=0.248102 Epoch 3, Validation accuracy=0.293870 Epoch 4,
> > Batch
> > > > > 199, Speed=449.329787 Epoch 4, Duration=138.398325 Epoch 4,
> Training
> > > > > accuracy=0.270021 Epoch 4, Validation accuracy=0.311498
> > > > >
> > > > > real    11m45.329s
> > > > > user    426m13.908s
> > > > > sys     16m45.093s
> > > > >
> > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> > > > > <pe...@gmail.com> wrote:
> > > > > >
> > > > > > > The difference looks smaller now, more like your numbers. I wonder
> > > > > > > if something happened during the previous benchmark, like a system
> > > > > > > update...
> > > > > >
> > > > > >
> > > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > > > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > > > > ImageRecordIOParser2:
> > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > > > > threads for decoding..
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > > completed
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > > > > ImageRecordIOParser2:
> > > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > > > > threads for decoding..
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > > completed
> > > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300:
> > > > > > > 0.0001}
> > > > > > > Epoch 0, Changed learning rate to 0.05
> > > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > > > > 147456 bytes with malloc directly
> > > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > > > > 589824 bytes with malloc directly
> > > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > > > > 2359296 bytes with malloc directly
> > > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > > > > 9437184 bytes with malloc directly
> > > > > > > Epoch 0, Batch 199, Speed=426.182733
> > > > > > > Epoch 0, Duration=134.868458
> > > > > > > Epoch 0, Training accuracy=0.127238
> > > > > > > Epoch 0, Validation accuracy=0.206388
> > > > > > > Epoch 1, Batch 199, Speed=313.127156
> > > > > > > Epoch 1, Duration=128.041775
> > > > > > > Epoch 1, Training accuracy=0.182065
> > > > > > > Epoch 1, Validation accuracy=0.202524
> > > > > > > Epoch 2, Batch 199, Speed=410.931187
> > > > > > > Epoch 2, Duration=124.920588
> > > > > > > Epoch 2, Training accuracy=0.202584
> > > > > > > Epoch 2, Validation accuracy=0.245693
> > > > > > > Epoch 3, Batch 199, Speed=419.119335
> > > > > > > Epoch 3, Duration=120.948349
> > > > > > > Epoch 3, Training accuracy=0.235854
> > > > > > > Epoch 3, Validation accuracy=0.291066
> > > > > > > Epoch 4, Batch 199, Speed=430.473733
> > > > > > > Epoch 4, Duration=130.181724
> > > > > > > Epoch 4, Training accuracy=0.257773
> > > > > > > Epoch 4, Validation accuracy=0.304988
> > > > > >
> > > > > > real    11m7.356s
> > > > > > user    406m9.910s
> > > > > > sys     14m18.349s
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > > > ImageRecordIOParser2:
> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > > > threads for decoding..
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > completed
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > > > ImageRecordIOParser2:
> > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > > > threads for decoding..
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
> > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > completed
> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005,
> 300:
> > > > > > 0.0001} Epoch 0, Changed learning rate to 0.05 Epoch 0, Batch
> 199,
> > > > > > Speed=348.618154 Epoch 0, Duration=146.469352 Epoch 0, Training
> > > > > > accuracy=0.124121 Epoch 0, Validation accuracy=0.167227 Epoch 1,
> > > > > > Batch 199, Speed=452.790825 Epoch 1, Duration=130.199421 Epoch 1,
> > > > > > Training
> > > > > > accuracy=0.183863 Epoch 1, Validation accuracy=0.237079 Epoch 2,
> > > > > > Batch 199, Speed=451.406559 Epoch 2, Duration=126.320823 Epoch 2,
> > > > > > Training
> > > > > > accuracy=0.214844 Epoch 2, Validation accuracy=0.244692 Epoch 3,
> > > > > > Batch 199, Speed=403.161873 Epoch 3, Duration=125.331660 Epoch 3,
> > > > > > Training
> > > > > > accuracy=0.243506 Epoch 3, Validation accuracy=0.301182 Epoch 4,
> > > > > > Batch 199, Speed=450.826598 Epoch 4, Duration=126.426253 Epoch 4,
> > > > > > Training
> > > > > > accuracy=0.266424 Epoch 4, Validation accuracy=0.311899
> > > > > >
> > > > > > real    11m21.930s
> > > > > > user    415m3.855s
> > > > > > sys     13m53.975s
> > > > > >
> > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > > > > > <pe...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Ciyong, thanks for trying to reproduce:
> > > > > > >
> > > > > > > I used this one:
> > > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > > > > > >
> > > > > > > Could you provide hardware and OS details?
> > > > > > >
> > > > > > > I will rerun and repost numbers in a few minutes.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> > > > > > > <ci...@intel.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Pedro,
> > > > > > > >
> > > > > > > > > I'm looking at this case, using the script
> > > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > > > > > > > > to get the timing data, but there doesn't seem to be much
> > > > > > > > > difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > > > > > > >
> > > > > > > > > I'm not sure whether there's any difference in the Python script;
> > > > > > > > > can you point me to the link for your script (cifar10.py)?
> > > > > > > > > Or you could also try MXNet's script (train_cifar10.py) and see
> > > > > > > > > the performance.
> > > > > > > >
> > > > > > > > Here's the command I used to collect the time:
> > > > > > > >         python train_cifar10.py --num-epoch=5
> > > > > > > >
> > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > > > > >         real    9m4.880s
> > > > > > > >         user    333m13.340s
> > > > > > > >         sys     14m36.100s
> > > > > > > >
> > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > > > > >         real    9m2.155s
> > > > > > > >         user    329m37.092s
> > > > > > > >         sys     16m8.668s
> > > > > > > >
> > > > > > > > -Ciyong
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > > > > 1.5.0.rc1
> > > > > > > >
> > > > > > > > Hi these were my build flags and system info:
> > > > > > > >
> > > > > > > >
> > > > > > > > --- # CMake configuration
> > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> > > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > > > > > > > > USE_LAPACK: "ON" # Build with lapack support
> > > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> > > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > > > > > > > CMAKE_BUILD_TYPE: "Release"
> > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > > > > > >
> > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> > > > > > > > 1.5.0.rc1,
> > > > > > > > upstream/v1.5.x)
> > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> > > > > > > > 1.4.1.rc0,
> > > > > > > > upstream/v1.4.x)
> > > > > > > >
> > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > > > > > c5d.18xlarge
> > > > > > > >
> > > > > > > >
> > > > > > > > Version      : 3.6.7
> > > > > > > > Compiler     : GCC 8.2.0
> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > > > Arch         : ('64bit', 'ELF')
> > > > > > > > ------------Pip Info-----------
> > > > > > > > Version      : 19.1.1
> > > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > > > > > > ----------MXNet Info-----------
> > > > > > > > Version      : 1.5.0
> > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > > > ----------System Info----------
> > > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > > > system       : Linux
> > > > > > > > node         : ip-172-31-63-171
> > > > > > > > release      : 4.15.0-1035-aws
> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > > > ----------Hardware Info----------
> > > > > > > > machine      : x86_64
> > > > > > > > processor    : x86_64
> > > > > > > > Architecture:        x86_64
> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > > > Byte Order:          Little Endian
> > > > > > > > CPU(s):              72
> > > > > > > > On-line CPU(s) list: 0-71
> > > > > > > > Thread(s) per core:  2
> > > > > > > > Core(s) per socket:  18
> > > > > > > > Socket(s):           2
> > > > > > > > NUMA node(s):        2
> > > > > > > > Vendor ID:           GenuineIntel
> > > > > > > > CPU family:          6
> > > > > > > > Model:               85
> > > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > > > Stepping:            4
> > > > > > > > CPU MHz:             1326.446
> > > > > > > > BogoMIPS:            6000.00
> > > > > > > > Hypervisor vendor:   KVM
> > > > > > > > Virtualization type: full
> > > > > > > > L1d cache:           32K
> > > > > > > > L1i cache:           32K
> > > > > > > > L2 cache:            1024K
> > > > > > > > L3 cache:            25344K
> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> > > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
> > > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle
> > > > > > > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx
> > > > > > > > > smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
> > > > > > > > > xgetbv1 xsaves ida arat pku ospke
> > > > > > > > > ----------Network Test----------
> > > > > > > >
> > > > > > > > ----------Python Info----------
> > > > > > > > Version      : 3.6.7
> > > > > > > > Compiler     : GCC 8.2.0
> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > > > Arch         : ('64bit', 'ELF')
> > > > > > > > ------------Pip Info-----------
> > > > > > > > Version      : 19.1.1
> > > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > > > > > > ----------MXNet Info-----------
> > > > > > > > Version      : 1.4.1
> > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > > > ----------System Info----------
> > > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > > > system       : Linux
> > > > > > > > node         : ip-172-31-63-171
> > > > > > > > release      : 4.15.0-1035-aws
> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > > > ----------Hardware Info----------
> > > > > > > > machine      : x86_64
> > > > > > > > processor    : x86_64
> > > > > > > > Architecture:        x86_64
> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > > > Byte Order:          Little Endian
> > > > > > > > CPU(s):              72
> > > > > > > > On-line CPU(s) list: 0-71
> > > > > > > > Thread(s) per core:  2
> > > > > > > > Core(s) per socket:  18
> > > > > > > > Socket(s):           2
> > > > > > > > NUMA node(s):        2
> > > > > > > > Vendor ID:           GenuineIntel
> > > > > > > > CPU family:          6
> > > > > > > > Model:               85
> > > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > > > Stepping:            4
> > > > > > > > CPU MHz:             1223.344
> > > > > > > > BogoMIPS:            6000.00
> > > > > > > > Hypervisor vendor:   KVM
> > > > > > > > Virtualization type: full
> > > > > > > > L1d cache:           32K
> > > > > > > > L1i cache:           32K
> > > > > > > > L2 cache:            1024K
> > > > > > > > L3 cache:            25344K
> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> > > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
> > > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle
> > > > > > > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx
> > > > > > > > > smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
> > > > > > > > > xgetbv1 xsaves ida arat pku ospke
> > > > > > > > > ----------Network Test----------
> > > > > > > >
> > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> > > > > <pe...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > I trained cifar10 on CPU and there seems to be a regression
> > > > > > > > > > of about 7% in training time compared with 1.4.1:
> > > > > > > > > >
> > > > > > > > > > (py3_venv)
> > > > > > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > > > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > > > > > real    11m30.388s
> > > > > > > > > user    417m7.766s
> > > > > > > > > sys     16m57.315s
> > > > > > > > >
> > > > > > > > > VS 1.4.1:
> > > > > > > > > real    10m41.994s
> > > > > > > > > user    392m40.646s
> > > > > > > > > sys     12m30.601s
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> > royweilai@gmail.com>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Anirudh,
> > > > > > > > > >
> > > > > > > > > > > Thanks for jumping into this quickly; I followed up on the
> > > > > > > > > > > issue.
> > > > > > > > > > >
> > > > > > > > > > > I meant for the sockeye developers/maintainers to help set up
> > > > > > > > > > > nightly tests and raise issues early.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > > > > > <ha...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
> > > > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
> > > > > > > > > > > > I recommend that other toolkits also add integration tests
> > > > > > > > > > > > with MXNet nightly.
> > > > > > > > > > > > It helps identify issues early.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Haibin
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> > > > > > > > > > > <pa...@intel.com>
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's
> > > > > > > > > > > > > hard for MXNet developers to catch potential bugs or
> > > > > > > > > > > > > performance degradation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In the future, I suggest adding the major downstream
> > > > > > > > > > > > > test cases, such as those from sockeye, GluonNLP,
> > > > > > > > > > > > > GluonCV, DGL, and Gluon-TS, to the nightly test.
> > > > > > > > > > > > > If that's still too heavy, maybe test weekly or
> > > > > > > > > > > > > monthly :)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > --Patric
> > > > > > > > > > > >
> > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > From: Anirudh Subramanian
> > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
> > > > > > > > > > > > > version
> > > > > > > > > > > > > 1.5.0.rc1
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Lai,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I have opened an issue:
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > > > > > > I came to know about this issue only today, and I
> > > > > > > > > > > > > > have not been monitoring sockeye.
> > > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
> > > > > > > > > > > > > > caused by the dlpack changes.
> > > > > > > > > > > > > > Also, I don't think sockeye CI checks against
> > > > > > > > > > > > > > master; it is using 1.4.1.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Anirudh
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> > > > > > > > > > > > > <ro...@gmail.com>
> > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could you share which test failed and what the
> > > > > > > > > > > > > > > crash is? How can it be reproduced?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I was able to install sockeye, and all tests
> > > > > > > > > > > > > > > passed using python setup.py test.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have tested both the nightly pip package and
> > > > > > > > > > > > > > > 1.5.0.rc1.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be great to create an issue with
> > > > > > > > > > > > > > > reproducible steps and move the discussion there.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
> > > > > > > > > > > > > > > failing for some time. If it's due to an MXNet
> > > > > > > > > > > > > > > change, please raise this early so we can track
> > > > > > > > > > > > > > > and solve it in time rather than block the release
> > > > > > > > > > > > > > > during vote time.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> Subramanian
> > > > > > > > > > > > > > <anirudh2290@gmail.com
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > > > > > > with the commit
> > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Anirudh
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > > > > > > > <ro...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is there an issue with more details to track
> > > > > > > > > > > > > > > > > the problem?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -1
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > There is a crash in the sockeye unit test
> > > > > > > > > > > > > > > > > > (python setup.py test) observed starting
> > > > > > > > > > > > > > > > > > with the nightly 1.5 build from 6/13 and
> > > > > > > > > > > > > > > > > > still occurring in 1.5rc1. I
> > > > > > > > > > > > > > > > > > don't yet have the exact commit that is
> > > > > > > > > > > > > > > > > > responsible for it, but it is either
> > > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> > > > > > > > > > > > > > > > > > (dlpack related) or
> > > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > > > > > > > > > > > > > > > > > (cached op optimization).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> > > > > > > > > > > > > > > > > <ro...@gmail.com>
> > > > > wrote:
> > > > > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache
> > > > > > > > > > > > > > > > > > > MXNet (incubating) version 1.5.0.
> > > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> > > > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
> > > > > > > > > > > > > > > > > > > 23:59:59.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache
> > > > > > > > > > > > > > > > > > > dist server:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please remember to TEST first before voting
> > > > > > > > > > > > > > > > > > > accordingly:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > >
> > > --
> > > Best Regards
> > >
> > > Lai
> >
> >
>
> --
> Sandeep Krishnamurthy
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by sandeep krishnamurthy <sa...@gmail.com>.
I ran operator benchmarks on GPU with CUDA 10.1, comparing MXNet 1.4.1 and
1.5.0.RC2.
Results -
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_gpu-md

GPU operator benchmark summary:
1. Most ops are stable across MXNet 1.4.1 and 1.5.0.RC2. No
regressions.
2. Nice improvements on Dropout and FC backward.

As shared before, CPU results for individual operator performance,
comparing MXNet 1.4.1 and 1.5.0.RC2 -
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50#file-mxnet_opperf_cpu-md

CPU operator benchmark Summary:
1. Many operators have improved very well - dropout, Convolution, reduction
ops  like max, mean, min, prod, sum
2. Many operators have regressed - batchnorm, pooling, batch_dot with
transpose, all broadcast_* ops
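
For anyone who wants to redo this kind of improved/regressed split on their
own numbers, here is a minimal sketch. It assumes each benchmark run has been
reduced to a plain mapping from operator name to average run time. The
function names, the 5% threshold, and the sample timings are all hypothetical
and are not taken from the gists or from the opperf utility.

```python
def percent_change(old_ms, new_ms):
    """Relative change in run time; positive means the new version is slower."""
    return (new_ms - old_ms) / old_ms * 100.0


def classify(baseline, candidate, threshold_pct=5.0):
    """Bucket operators present in both result sets by run-time change."""
    buckets = {"regressed": [], "improved": [], "stable": []}
    for op in sorted(baseline.keys() & candidate.keys()):
        change = round(percent_change(baseline[op], candidate[op]), 1)
        if change >= threshold_pct:
            buckets["regressed"].append((op, change))
        elif change <= -threshold_pct:
            buckets["improved"].append((op, change))
        else:
            buckets["stable"].append((op, change))
    return buckets


# Hypothetical timings in milliseconds, for illustration only.
v141 = {"BatchNorm": 1.00, "Convolution": 2.00, "Dropout": 0.50, "sum": 0.40}
v150 = {"BatchNorm": 1.30, "Convolution": 1.50, "Dropout": 0.49, "sum": 0.30}

result = classify(v141, v150)
for name, ops in result.items():
    print(name, ops)
```

Operators that appear in only one of the two runs are skipped, so a missing
result is never counted as a regression or an improvement.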

Best,
Sandeep


On Fri, Jun 28, 2019 at 10:46 AM Davydenko, Denis <dzianis.davydzenka@gmail.com> wrote:

> Just to re-iterate, postponing the release until we have a strong hold on
> perf regression is my #1 choice as well. I am just trying to consider
> alternatives where we can release 1.5.0 and manage potential perf impact...
>
> On 6/28/19, 10:04 AM, "Marco de Abreu" <ma...@gmail.com> wrote:
>
>     Hey Denis,
>
>     I don't think something like an experimental release is something that
>     the Apache release process supports. Also, I would be afraid of automated
>     systems consuming MXNet by simply fetching the latest release version.
>     These users would then get the experimental version without being aware.
>
>     For the sake of the best user experience, I'd prefer if we could take a
>     few days to track down the root causes for all these regressions. While I
>     agree that releasing the new features and optimizations is certainly
>     overdue, I think that the most important point is to keep up with the
>     existing users and their trust. If a new release performs worse for the
>     same kind of workload, they might lose trust in our release process and
>     in future might be less willing to adopt a new release early on.
>
>     -Marco
>
>     Davydenko, Denis <dz...@gmail.com> wrote on Fri, June 28, 2019, 18:55:
>
>     > According to Sandeep's evaluation of perf regression at the operator
>     > level [1], out of 290 tests we have 77 op/input combinations for the
>     > forward pass and 50 for the backward pass where the regression is 5%+
>     > (the biggest regressions observed are about 86% and 84% respectively).
>     > If I raise the degradation threshold to 10%+, the corresponding numbers
>     > are 70 for forward and 42 for backward. This, from my perspective,
>     > constitutes a performance impact of significant scale, at least at the
>     > individual operator level. In light of keeping every next release as
>     > performant as the previous one (at least to a feasible extent), I
>     > suggest we can only move forward with the 1.5.0 release if we call it
>     > experimental. The current landscape of operators having a potentially
>     > negative performance impact on customers could (and I consider it will)
>     > put MXNet one step behind its current market position of being a choice
>     > for performance-optimized DL workloads. Tagging it as experimental, from
>     > my point of view, would help to release new features so that customers
>     > could enjoy them while being explicit about performance optimizations
>     > going on.
>     >
>     > [1]
>     > https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     >
>     >
>     >
>     > On 6/28/19, 9:38 AM, "Lai Wei" <ro...@gmail.com> wrote:
>     >
>     >     Hi,
>     >
>     >     Some more data points:
>     >
>     >     I ran the same cifar10.py script with the same setup, BUT added a
>     >     fixed seed.
>     >
>     >     I ran 50 epochs, treating the first 10 epochs as warmup.
>     >     I have the following average time per epoch:
>     >     1.4.1: 164.95 s
>     >     1.5.0: 170.44 s
>     >     Detailed data at [1]
>     >     This is about a 3% regression, less than Manu’s result but closer
>     >     to the Gluon result.
>     >
>     >     As for the operator benchmarks from Sandeep [2], I have calculated
>     >     the percentage of speed increase/regression here [1]. It looks like
>     >     not all operators mentioned before slowed down. Should it be treated
>     >     as a separate issue, as it’s testing on fake data with a different
>     >     shape than the CIFAR10 dataset? For example, batch norm has no
>     >     regression in the report but it’s slowed down in the cifar10.py
>     >     script profiling.
>     >
>     >     [1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
>     >     [2] https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     >
>     >
>     >     On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>     >
>     >     > Thanks Manu.
>     >     >
>     >     > @all: I observed other strange stuff that I don't understand at
>     >     > the moment:
>     >     >
>     >     > I installed the rc for 1.5 from pip to check that I'm not doing
>     >     > something wrong when building. I found out that the CPU usage is
>     >     > quite subpar ( https://imgur.com/fRmbQNc ) compared to a version
>     >     > compiled from source. The pip package is using 4-5 cores of the 32.
>     >     > When I compile from source I get good core utilization
>     >     > ( https://imgur.com/e8BB425 ). I verified this also on a
>     >     > c5d.18xlarge and a 32 core AMD bare metal machine.
>     >     >
>     >     > It also seems to me that the version from pip is using gomp instead
>     >     > of llvm's omp. I'm not sure why.
>     >     >
>     >     > pip install mxnet==1.5.0b20190627
>     >     > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
>     >     > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
>     >     >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)
>     >     >
>     >     > I tried cifar10 on a bare metal 32 core AMD Zen machine and it is
>     >     > extremely slow and doesn't seem to make much progress compared to a
>     >     > c5d.18xlarge. I couldn't even do 1 epoch. I tried with and without
>     >     > MKL without much success. Will continue digging into this when
>     >     > possible.
>     >     >
>     >     >
>     >     > Pedro.
>     >     >
>     >     > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <manuseth1010@gmail.com> wrote:
>     >     > >
>     >     > > Hi all,
>     >     > >
>     >     > > I ran the same cifar10.py script as Pedro, but for 20 epochs.
>     >     > > Treating the first 10 epochs as warm-up, I averaged the time per
>     >     > > epoch over the last 10 epochs.
>     >     > >
>     >     > > With MXNet 1.4.1 average time is 164.23 s
>     >     > > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
>     >     > >
>     >     > >
>     >     > > For a second data point, I ran the Gluon speed test benchmark
>     >     > > script
>     >     > > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
>     >     > > using the following command:
>     >     > > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128 --num-batches 200 --type 'training'
>     >     > >
>     >     > > I got the following speeds:
>     >     > > With MXNet 1.4.1, average speed is 25.677534 img/s
>     >     > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
>     >     > >
>     >     > > Note:
>     >     > > For the 1.4.1 version, I used pip install mxnet-mkl==1.4.1
>     >     > > For the 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619,
>     >     > > which corresponds to commit
>     >     > > ccbbf6b4b76ea536a6583c99497c83b65a20817b, which is 4 commits
>     >     > > behind the 1.5.x branch
>     >     > >
>     >     > >
>     >     > > Best,
>     >     > > Manu
>     >     > >
>     >     > >
>     >     > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <sandeep.krishna98@gmail.com> wrote:
>     >     > >
>     >     > >     Hello Ciyong/Pedro,
>     >     > >
>     >     > >     I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete,
>     >     > >     doesn’t cover all MXNet operators, not presented in the best
>     >     > >     possible way, still WIP)
>     >     > >
>     >     > >     https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     >     > >
>     >     > >     The following operators look slower in 1.5 compared to 1.4.1:
>     >     > >     - BatchNorm
>     >     > >     - Pooling
>     >     > >     - FullyConnected
>     >     > >     - batch_dot
>     >     > >     - Dot
>     >     > >     - broadcast_mul
>     >     > >     - log_softmax
>     >     > >     and a few other operators
>     >     > >
>     >     > >     Also, several operators run a lot faster on 1.5 compared to
>     >     > >     1.4.1, for example Convolution, flatten, elementwise operators,
>     >     > >     etc. So I see that a few operators have likely regressed
>     >     > >     noticeably; however, due to other operators' performance
>     >     > >     improvements, the end effect is not that significant, which
>     >     > >     hides a lot of regression. We need a more detailed analysis of
>     >     > >     per-operator performance. We will not be able to do this for
>     >     > >     the current release; we should have a more concrete way of
>     >     > >     determining such performance regressions before the next
>     >     > >     release.
>     >     > >
>     >     > >     Setup:
>     >     > >     1.5 => Built from source (head of 1.5.rc2 tag), built with MKLDNN
>     >     > >     1.4.1 => PyPi mxnet-mkl==1.4.1
>     >     > >     Machine: C5.18X
>     >     > >     No explicit environment variables were set
>     >     > >     Operator benchmark code -
>     >     > >     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>     >     > >
>     >     > >     Best,
>     >     > >     Sandeep
>     >     > >
>     >     > >
>     >     > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>     >     > >
>     >     > >     > I will try to run a few benchmarks on a bare metal instance
>     >     > >     > tonight to remove virtualization variance from the
>     >     > >     > measurements and provide some numbers.
>     >     > >     >
>     >     > >     > Please propose a set of models / examples that would be
>     >     > >     > desirable to run before the release, and provide a link to an
>     >     > >     > easy-to-run script with instructions so we can validate the
>     >     > >     > release better.
>     >     > >     >
>     >     > >     > Thank you.
>     >     > >     >
>     >     > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <royweilai@gmail.com> wrote:
>     >     > >     > >
>     >     > >     > > Dear @dev,
>     >     > >     > >
>     >     > >     > > I'm cancelling the vote for the cached op fix:
>     >     > >     > >
>     >     > >     > > https://github.com/apache/incubator-mxnet/pull/15298
>     >     > >     > >
>     >     > >     > > As for the possible CPU training regression, it does not
>     >     > >     > > look like a blocker for now.
>     >     > >     > >
>     >     > >     > > I will start a new rc2 vote; please help to validate.
>     >     > >     > >
>     >     > >     > > Thanks!
>     >     > >     > >
>     >     > >     > >
>     >     > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com> wrote:
>     >     > >     > >
>     >     > >     > > > Hi Pedro,
>     >     > >     > > >
>     >     > >     > > > I was able to reproduce a similar result (v1.5 is ~5.6%
>     >     > >     > > > slower than v1.4; I was using 18 cores for computing) with
>     >     > >     > > > your script on C5.18xlarge. But I needed to bind the cores
>     >     > >     > > > with the commands below when running the script (without
>     >     > >     > > > setting the env variables, I got a close time (<1%) with
>     >     > >     > > > v1.5 and v1.4):
>     >     > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     >     > >     > > >         export OMP_NUM_THREADS=18
>     >     > >     > > >
>     >     > >     > > > Did you set any env variables during running?
>     >     > >     > > >
>     >     > >     > > > The performance results I got are below:
>     >     > >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     >     > >     > > > real    12m10.856s
>     >     > >     > > > user    234m49.576s
>     >     > >     > > > sys     4m38.044s
>     >     > >     > > >
>     >     > >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     >     > >     > > > real    12m52.140s
>     >     > >     > > > user    246m30.740s
>     >     > >     > > > sys     5m8.188s
>     >     > >     > > >
>     >     > >     > > > Looking at the profiling data, most of the ops have the
>     >     > >     > > > same perf between v1.4 and v1.5. But some ops like
>     >     > >     > > > "_backward_BatchNorm" and "Pooling" are ~1.37x slower on
>     >     > >     > > > v1.5 compared with v1.4.
>     >     > >     > > > I will do further analysis on these ops.
>     >     > >     > > >
>     >     > >     > > > Here's the hardware/OS info from my side:
>     >     > >     > > > ----------Python Info----------
>     >     > >     > > > Version      : 3.6.8
>     >     > >     > > > Compiler     : GCC 7.3.0
>     >     > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     >     > >     > > > Arch         : ('64bit', '')
>     >     > >     > > > ------------Pip Info-----------
>     >     > >     > > > Version      : 19.0.3
>     >     > >     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     >     > >     > > > ----------MXNet Info-----------
>     >     > >     > > > Version      : 1.5.0
>     >     > >     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     >     > >     > > > Hashtag not found. Not installed from pre-built package.
>     >     > >     > > > ----------System Info----------
>     >     > >     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     >     > >     > > > system       : Linux
>     >     > >     > > > node         : ip-172-31-32-129
>     >     > >     > > > release      : 4.4.0-1085-aws
>     >     > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
>     >     > >     > > > ----------Hardware Info----------
>     >     > >     > > > machine      : x86_64
>     >     > >     > > > processor    : x86_64
>     >     > >     > > > Architecture:          x86_64
>     >     > >     > > > CPU op-mode(s):        32-bit, 64-bit
>     >     > >     > > > Byte Order:            Little Endian
>     >     > >     > > > CPU(s):                72
>     >     > >     > > > On-line CPU(s) list:   0-71
>     >     > >     > > > Thread(s) per core:    2
>     >     > >     > > > Core(s) per socket:    18
>     >     > >     > > > Socket(s):             2
>     >     > >     > > > NUMA node(s):          2
>     >     > >     > > > Vendor ID:             GenuineIntel
>     >     > >     > > > CPU family:            6
>     >     > >     > > > Model:                 85
>     >     > >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     >     > >     > > > Stepping:              3
>     >     > >     > > > CPU MHz:               3000.000
>     >     > >     > > > BogoMIPS:              6000.00
>     >     > >     > > > Hypervisor vendor:     KVM
>     >     > >     > > > Virtualization type:   full
>     >     > >     > > > L1d cache:             32K
>     >     > >     > > > L1i cache:             32K
>     >     > >     > > > L2 cache:              1024K
>     >     > >     > > > L3 cache:              25344K
>     >     > >     > > > NUMA node0 CPU(s):     0-17,36-53
>     >     > >     > > > NUMA node1 CPU(s):     18-35,54-71
>     >     > >     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     >     > >     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
>     >     > >     > > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
>     >     > >     > > > nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3
>     >     > >     > > > fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
>     >     > >     > > > aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>     >     > >     > > > invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>     >     > >     > > > erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb
>     >     > >     > > > avx512cd xsaveopt xsavec xgetbv1 ida arat pku
>     >     > >     > > > ----------Network Test----------
>     >     > >     > > >
>     >     > >     > > >
>     >     > >     > > > -Ciyong
>     >     > >     > > >
>     >     > >     > > >
>     >     > >     > > > -----Original Message-----
>     >     > >     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
>     >     > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     >     > >     > > > To: dev@mxnet.incubator.apache.org
>     >     > >     > > > Cc: dev@mxnet.apache.org
>     >     > >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>     >     > >     > > >
>     >     > >     > > > Could we run more epochs to see the performance
>     >     > >     > > > difference, or profile the difference between a good and
>     >     > >     > > > a bad run?
>     >     > >     > > >
>     >     > >     > > > > -----Original Message-----
>     >     > >     > > > > From: Pedro Larroy [mailto:
>     > pedro.larroy.lists@gmail.com]
>     >     > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     >     > >     > > > > To: dev@mxnet.incubator.apache.org
>     >     > >     > > > > Cc: dev@mxnet.apache.org
>     >     > >     > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
>     > version
>     >     > >     > > > > 1.5.0.rc1
>     >     > >     > > > >
>     >     > >     > > > > I ran it again and the gap is bigger again. I guess we
>     >     > >     > > > > need to average out the times across several runs:
>     >     > >     > > > >
>     >     > >     > > > > piotr@ip-172-31-63-171
>     > :0:~/deeplearning-benchmark/dawnbench
>     >     > >     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python
>     > cifar10.py
>     >     > > --epochs 5
>     >     > >     > > > > && time ~/mxnet_1.5/py3_venv/bin/python
> cifar10.py
>     > --epochs 5
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     >     > 4
>     >     > >     > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     >     > > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:17:09]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001,
> 123:
>     > 0.0005,
>     >     > > 300:
>     >     > >     > > > > 0.0001} Epoch 0, Changed learning rate to 0.05
>     > [23:17:09]
>     >     > >     > > > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
> Allocate
>     >     > >     > > > > 147456 bytes with malloc directly
>     >     > >     > > > > [23:17:09]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     >     > Allocate
>     >     > >     > > > > 589824 bytes with malloc directly
>     >     > >     > > > > [23:17:09]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     >     > Allocate
>     >     > >     > > > > 2359296 bytes with malloc directly
>     >     > >     > > > > [23:17:09]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     >     > Allocate
>     >     > >     > > > > 9437184 bytes with malloc directly
>     >     > >     > > > > Epoch 0, Batch 199, Speed=384.149839
>     >     > >     > > > > Epoch 0, Duration=140.919567
>     >     > >     > > > > Epoch 0, Training accuracy=0.115169
>     >     > >     > > > > Epoch 0, Validation accuracy=0.141317
>     >     > >     > > > > Epoch 1, Batch 199, Speed=433.380512
>     >     > >     > > > > Epoch 1, Duration=119.553233
>     >     > >     > > > > Epoch 1, Training accuracy=0.170956
>     >     > >     > > > > Epoch 1, Validation accuracy=0.216146
>     >     > >     > > > > Epoch 2, Batch 199, Speed=434.864699
>     >     > >     > > > > Epoch 2, Duration=123.278490
>     >     > >     > > > > Epoch 2, Training accuracy=0.209455
>     >     > >     > > > > Epoch 2, Validation accuracy=0.247296
>     >     > >     > > > > Epoch 3, Batch 199, Speed=433.401854
>     >     > >     > > > > Epoch 3, Duration=118.327797
>     >     > >     > > > > Epoch 3, Training accuracy=0.248701
>     >     > >     > > > > Epoch 3, Validation accuracy=0.302083
>     >     > >     > > > > Epoch 4, Batch 199, Speed=419.713707
>     >     > >     > > > > Epoch 4, Duration=126.468409
>     >     > >     > > > > Epoch 4, Training accuracy=0.260949
>     >     > >     > > > > Epoch 4, Validation accuracy=0.269030
>     >     > >     > > > >
>     >     > >     > > > > real    10m55.796s
>     >     > >     > > > > user    399m33.567s
>     >     > >     > > > > sys     13m55.904s
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     >     > 4
>     >     > >     > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > ImageRecordIOParser2:
>     >     > >     > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     >     > > threads
>     >     > >     > > > > for decoding..
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > [23:28:04]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load mean
>     >     > > image
>     >     > >     > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > > completed
>     >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > Epoch 0, Batch 199, Speed=419.039188
>     >     > >     > > > > Epoch 0, Duration=143.934903
>     >     > >     > > > > Epoch 0, Training accuracy=0.122542
>     >     > >     > > > > Epoch 0, Validation accuracy=0.164359
>     >     > >     > > > > Epoch 1, Batch 199, Speed=445.257048
>     >     > >     > > > > Epoch 1, Duration=135.248399
>     >     > >     > > > > Epoch 1, Training accuracy=0.178828
>     >     > >     > > > > Epoch 1, Validation accuracy=0.199419
>     >     > >     > > > > Epoch 2, Batch 199, Speed=447.115215
>     >     > >     > > > > Epoch 2, Duration=132.003770
>     >     > >     > > > > Epoch 2, Training accuracy=0.217808
>     >     > >     > > > > Epoch 2, Validation accuracy=0.233073
>     >     > >     > > > > Epoch 3, Batch 199, Speed=441.079477
>     >     > >     > > > > Epoch 3, Duration=126.543316
>     >     > >     > > > > Epoch 3, Training accuracy=0.248102
>     >     > >     > > > > Epoch 3, Validation accuracy=0.293870
>     >     > >     > > > > Epoch 4, Batch 199, Speed=449.329787
>     >     > >     > > > > Epoch 4, Duration=138.398325
>     >     > >     > > > > Epoch 4, Training accuracy=0.270021
>     >     > >     > > > > Epoch 4, Validation accuracy=0.311498
>     >     > >     > > > >
>     >     > >     > > > > real    11m45.329s
>     >     > >     > > > > user    426m13.908s
>     >     > >     > > > > sys     16m45.093s
>     >     > >     > > > >
>     >     > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     >     > >     > > > > <pe...@gmail.com> wrote:
>     >     > >     > > > > >
>     >     > >     > > > > > The difference looks smaller now, more like your
>     >     > >     > > > > > numbers. I wonder if something happened during the
>     >     > >     > > > > > previous benchmark like a system update...
>     >     > >     > > > > >
>     >     > >     > > > > >
>     >     > >     > > > > > piotr@ip-172-31-63-171
>     >     > :0:~/deeplearning-benchmark/dawnbench
>     >     > >     > > > > (master)+$
>     >     > >     > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
>     > --epochs 5
>     >     > &&
>     >     > > time
>     >     > >     > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py
> --epochs 5
>     >     > > [22:49:41]
>     >     > >     > > > > > ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > > ImageRecordIOParser2:
>     >     > >     > > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     >     > use 4
>     >     > >     > > > > > threads for decoding..
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > completed
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:172:
>     >     > >     > > > > > ImageRecordIOParser2:
>     >     > >     > > > > >
>     > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     >     > use 4
>     >     > >     > > > > > threads for decoding..
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:230:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [22:49:41]
> ../src/io/iter_image_recordio_2.cc:248:
>     > Load
>     >     > mean
>     >     > > image
>     >     > >     > > > > > from
>     > /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > completed
>     >     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     >     > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     >     > >     > > > > > Epoch 0, Duration=134.868458
>     >     > >     > > > > > Epoch 0, Training accuracy=0.127238
>     >     > >     > > > > > Epoch 0, Validation accuracy=0.206388
>     >     > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     >     > >     > > > > > Epoch 1, Duration=128.041775
>     >     > >     > > > > > Epoch 1, Training accuracy=0.182065
>     >     > >     > > > > > Epoch 1, Validation accuracy=0.202524
>     >     > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     >     > >     > > > > > Epoch 2, Duration=124.920588
>     >     > >     > > > > > Epoch 2, Training accuracy=0.202584
>     >     > >     > > > > > Epoch 2, Validation accuracy=0.245693
>     >     > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     >     > >     > > > > > Epoch 3, Duration=120.948349
>     >     > >     > > > > > Epoch 3, Training accuracy=0.235854
>     >     > >     > > > > > Epoch 3, Validation accuracy=0.291066
>     >     > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     >     > >     > > > > > Epoch 4, Duration=130.181724
>     >     > >     > > > > > Epoch 4, Training accuracy=0.257773
>     >     > >     > > > > > Epoch 4, Validation accuracy=0.304988
>     >     > >     > > > > >
>     >     > >     > > > > > real    11m7.356s
>     >     > >     > > > > > user    406m9.910s
>     >     > >     > > > > > sys     14m18.349s
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
>     >     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     >     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     >     > >     > > > > > Epoch 0, Batch 199, Speed=348.618154
>     >     > >     > > > > > Epoch 0, Duration=146.469352
>     >     > >     > > > > > Epoch 0, Training accuracy=0.124121
>     >     > >     > > > > > Epoch 0, Validation accuracy=0.167227
>     >     > >     > > > > > Epoch 1, Batch 199, Speed=452.790825
>     >     > >     > > > > > Epoch 1, Duration=130.199421
>     >     > >     > > > > > Epoch 1, Training accuracy=0.183863
>     >     > >     > > > > > Epoch 1, Validation accuracy=0.237079
>     >     > >     > > > > > Epoch 2, Batch 199, Speed=451.406559
>     >     > >     > > > > > Epoch 2, Duration=126.320823
>     >     > >     > > > > > Epoch 2, Training accuracy=0.214844
>     >     > >     > > > > > Epoch 2, Validation accuracy=0.244692
>     >     > >     > > > > > Epoch 3, Batch 199, Speed=403.161873
>     >     > >     > > > > > Epoch 3, Duration=125.331660
>     >     > >     > > > > > Epoch 3, Training accuracy=0.243506
>     >     > >     > > > > > Epoch 3, Validation accuracy=0.301182
>     >     > >     > > > > > Epoch 4, Batch 199, Speed=450.826598
>     >     > >     > > > > > Epoch 4, Duration=126.426253
>     >     > >     > > > > > Epoch 4, Training accuracy=0.266424
>     >     > >     > > > > > Epoch 4, Validation accuracy=0.311899
>     >     > >     > > > > >
>     >     > >     > > > > > real    11m21.930s
>     >     > >     > > > > > user    415m3.855s
>     >     > >     > > > > > sys     13m53.975s
>     >     > >     > > > > >
>     >     > >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     >     > >     > > > > > <pe...@gmail.com> wrote:
>     >     > >     > > > > > >
>     >     > >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
>     >     > >     > > > > > >
>     >     > >     > > > > > > I used this one:
>     >     > >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     >     > >     > > > > > >
>     >     > >     > > > > > > Could you provide hardware and OS details?
>     >     > >     > > > > > >
>     >     > >     > > > > > > I will rerun and repost numbers in a few minutes.
>     >     > >     > > > > > >
>     >     > >     > > > > > > Pedro.
>     >     > >     > > > > > >
>     >     > >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     >     > >     > > > > > > <ci...@intel.com>
>     >     > >     > > > > wrote:
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > Hi Pedro,
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > I'm looking at this case, using the script
>     >     > >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     >     > >     > > > > > > > to get the timing data, but there seems to be not much difference
>     >     > >     > > > > > > > between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > I'm not sure whether there's any difference in the Python script. Can you
>     >     > >     > > > > > > > point me to the link for your script (cifar10.py)?
>     >     > >     > > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
>     >     > >     > > > > > > > the performance.
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > Here's the command I used to collect the time:
>     >     > >     > > > > > > >         python train_cifar10.py --num-epoch=5
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     >     > >     > > > > > > >         real    9m4.880s
>     >     > >     > > > > > > >         user    333m13.340s
>     >     > >     > > > > > > >         sys     14m36.100s
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     >     > >     > > > > > > >         real    9m2.155s
>     >     > >     > > > > > > >         user    329m37.092s
>     >     > >     > > > > > > >         sys     16m8.668s
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > -Ciyong
>     >     > >     > > > > > > >
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > -----Original Message-----
>     >     > >     > > > > > > > From: Pedro Larroy [mailto:
>     >     > pedro.larroy.lists@gmail.com]
>     >     > >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     >     > >     > > > > > > > To: dev@mxnet.incubator.apache.org
>     >     > >     > > > > > > > Cc: dev@mxnet.apache.org
>     >     > >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
>     > (incubating)
>     >     > > version
>     >     > >     > > > > > > > 1.5.0.rc1
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > Hi, these were my build flags and system info:
>     >     > >     > > > > > > >
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > --- # CMake configuration
>     >     > >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     >     > >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     >     > >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     >     > >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     >     > >     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>     >     > >     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
>     >     > >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>     >     > >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
>     >     > >     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
>     >     > >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     >     > >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     >     > >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     >     > >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>     >     > >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>     >     > >     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>     >     > >     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>     >     > >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     >     > >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>     >     > >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>     >     > >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>     >     > >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
>     >     > >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     >     > >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>     >     > >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
>     >     > >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>     >     > >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     >     > >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>     >     > >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>     >     > >     > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
>     >     > >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>     >     > >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>     >     > >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
>     >     > >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     >     > >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     >     > >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>     >     > >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     >     > >     > > > > > > > c5d.18xlarge
>     >     > >     > > > > > > >
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > Version      : 3.6.7
>     >     > >     > > > > > > > Compiler     : GCC 8.2.0
>     >     > >     > > > > > > > Build        : ('default', 'Oct 22 2018
>     > 11:32:17')
>     >     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     >     > >     > > > > > > > ------------Pip Info-----------
>     >     > >     > > > > > > > Version      : 19.1.1
>     >     > >     > > > > > > > Directory    :
>     >     > >     > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
>     >     > >     > > > > packages/pip
>     >     > >     > > > > > > > ----------MXNet Info-----------
>     >     > >     > > > > > > > Version      : 1.5.0
>     >     > >     > > > > > > > Directory    :
> /home/piotr/mxnet_1.5/python/mxnet
>     >     > >     > > > > > > > Hashtag not found. Not installed from
> pre-built
>     >     > package.
>     >     > >     > > > > > > > ----------System Info----------
>     >     > >     > > > > > > > Platform     :
>     >     > >     > > >
> Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     >     > >     > > > > > > > system       : Linux
>     >     > >     > > > > > > > node         : ip-172-31-63-171
>     >     > >     > > > > > > > release      : 4.15.0-1035-aws
>     >     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18
>     > 16:15:14 UTC
>     >     > 2019
>     >     > >     > > > > > > > ----------Hardware Info----------
>     >     > >     > > > > > > > machine      : x86_64
>     >     > >     > > > > > > > processor    : x86_64
>     >     > >     > > > > > > > Architecture:        x86_64
>     >     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     >     > >     > > > > > > > Byte Order:          Little Endian
>     >     > >     > > > > > > > CPU(s):              72
>     >     > >     > > > > > > > On-line CPU(s) list: 0-71
>     >     > >     > > > > > > > Thread(s) per core:  2
>     >     > >     > > > > > > > Core(s) per socket:  18
>     >     > >     > > > > > > > Socket(s):           2
>     >     > >     > > > > > > > NUMA node(s):        2
>     >     > >     > > > > > > > Vendor ID:           GenuineIntel
>     >     > >     > > > > > > > CPU family:          6
>     >     > >     > > > > > > > Model:               85
>     >     > >     > > > > > > > Model name:          Intel(R) Xeon(R)
> Platinum
>     > 8124M
>     >     > CPU @
>     >     > >     > 3.00GHz
>     >     > >     > > > > > > > Stepping:            4
>     >     > >     > > > > > > > CPU MHz:             1326.446
>     >     > >     > > > > > > > BogoMIPS:            6000.00
>     >     > >     > > > > > > > Hypervisor vendor:   KVM
>     >     > >     > > > > > > > Virtualization type: full
>     >     > >     > > > > > > > L1d cache:           32K
>     >     > >     > > > > > > > L1i cache:           32K
>     >     > >     > > > > > > > L2 cache:            1024K
>     >     > >     > > > > > > > L3 cache:            25344K
>     >     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     >     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     >     > >     > > > > > > > Flags:               fpu vme de pse tsc
> msr pae
>     > mce cx8
>     >     > > apic
>     >     > >     > sep
>     >     > >     > > > mtrr
>     >     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr
> sse sse2
>     > ss ht
>     >     > > syscall
>     >     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc
> arch_perfmon
>     > rep_good
>     >     > > nopl
>     >     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni
>     > pclmulqdq
>     >     > > monitor
>     >     > >     > > > > > > > ssse3 fma cx16 pcid
>     >     > >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt
>     > tsc_deadline_timer
>     >     > aes
>     >     > > xsave
>     >     > >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm
>     > 3dnowprefetch
>     >     > >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust
> bmi1 hle
>     > avx2
>     >     > smep
>     >     > > bmi2
>     >     > >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq
> rdseed adx
>     > smap
>     >     > >     > clflushopt
>     >     > >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt
> xsavec
>     > xgetbv1
>     >     > > xsaves
>     >     > >     > > > > > > > ida arat pku ospke ----------Network
>     > Test----------
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > ----------Python Info----------
>     >     > >     > > > > > > > Version      : 3.6.7
>     >     > >     > > > > > > > Compiler     : GCC 8.2.0
>     >     > >     > > > > > > > Build        : ('default', 'Oct 22 2018
>     > 11:32:17')
>     >     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     >     > >     > > > > > > > ------------Pip Info-----------
>     >     > >     > > > > > > > Version      : 19.1.1
>     >     > >     > > > > > > > Directory    :
>     >     > >     > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
>     >     > >     > > > > packages/pip
>     >     > >     > > > > > > > ----------MXNet Info-----------
>     >     > >     > > > > > > > Version      : 1.4.1
>     >     > >     > > > > > > > Directory    :
> /home/piotr/mxnet_1.4/python/mxnet
>     >     > >     > > > > > > > Hashtag not found. Not installed from
> pre-built
>     >     > package.
>     >     > >     > > > > > > > ----------System Info----------
>     >     > >     > > > > > > > Platform     :
>     >     > >     > > >
> Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     >     > >     > > > > > > > system       : Linux
>     >     > >     > > > > > > > node         : ip-172-31-63-171
>     >     > >     > > > > > > > release      : 4.15.0-1035-aws
>     >     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18
>     > 16:15:14 UTC
>     >     > 2019
>     >     > >     > > > > > > > ----------Hardware Info----------
>     >     > >     > > > > > > > machine      : x86_64
>     >     > >     > > > > > > > processor    : x86_64
>     >     > >     > > > > > > > Architecture:        x86_64
>     >     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     >     > >     > > > > > > > Byte Order:          Little Endian
>     >     > >     > > > > > > > CPU(s):              72
>     >     > >     > > > > > > > On-line CPU(s) list: 0-71
>     >     > >     > > > > > > > Thread(s) per core:  2
>     >     > >     > > > > > > > Core(s) per socket:  18
>     >     > >     > > > > > > > Socket(s):           2
>     >     > >     > > > > > > > NUMA node(s):        2
>     >     > >     > > > > > > > Vendor ID:           GenuineIntel
>     >     > >     > > > > > > > CPU family:          6
>     >     > >     > > > > > > > Model:               85
>     >     > >     > > > > > > > Model name:          Intel(R) Xeon(R)
> Platinum
>     > 8124M
>     >     > CPU @
>     >     > >     > 3.00GHz
>     >     > >     > > > > > > > Stepping:            4
>     >     > >     > > > > > > > CPU MHz:             1223.344
>     >     > >     > > > > > > > BogoMIPS:            6000.00
>     >     > >     > > > > > > > Hypervisor vendor:   KVM
>     >     > >     > > > > > > > Virtualization type: full
>     >     > >     > > > > > > > L1d cache:           32K
>     >     > >     > > > > > > > L1i cache:           32K
>     >     > >     > > > > > > > L2 cache:            1024K
>     >     > >     > > > > > > > L3 cache:            25344K
>     >     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     >     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     >     > >     > > > > > > > Flags:               fpu vme de pse tsc
> msr pae
>     > mce cx8
>     >     > > apic
>     >     > >     > sep
>     >     > >     > > > mtrr
>     >     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr
> sse sse2
>     > ss ht
>     >     > > syscall
>     >     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc
> arch_perfmon
>     > rep_good
>     >     > > nopl
>     >     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni
>     > pclmulqdq
>     >     > > monitor
>     >     > >     > > > > > > > ssse3 fma cx16 pcid
>     >     > >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt
>     > tsc_deadline_timer
>     >     > aes
>     >     > > xsave
>     >     > >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm
>     > 3dnowprefetch
>     >     > >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust
> bmi1 hle
>     > avx2
>     >     > smep
>     >     > > bmi2
>     >     > >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq
> rdseed adx
>     > smap
>     >     > >     > clflushopt
>     >     > >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt
> xsavec
>     > xgetbv1
>     >     > > xsaves
>     >     > >     > > > > > > > ida arat pku ospke ----------Network
>     > Test----------
>     >     > >     > > > > > > >
>     >     > >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro
> Larroy
>     >     > >     > > > > <pe...@gmail.com> wrote:
>     >     > >     > > > > > > > >
>     >     > >     > > > > > > > > I trained cifar10 on CPU, and there seems to be a regression
>     >     > >     > > > > > > > > of roughly 7% in training time against 1.4.1:
>     >     > >     > > > > > > > >
>     >     > >     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
>     >     > >     > > > > > > > > real    11m30.388s
>     >     > >     > > > > > > > > user    417m7.766s
>     >     > >     > > > > > > > > sys     16m57.315s
>     >     > >     > > > > > > > >
>     >     > >     > > > > > > > > VS 1.4.1:
>     >     > >     > > > > > > > > real    10m41.994s
>     >     > >     > > > > > > > > user    392m40.646s
>     >     > >     > > > > > > > > sys     12m30.601s
>     >     > >     > > > > > > > >
>     >     > >     > > > > > > > >
>     >     > >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei
> <
>     >     > >     > royweilai@gmail.com>
>     >     > >     > > > > wrote:
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > Hi Anirudh,
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > Thanks for jumping on this quickly; I followed up on the issue.
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > What I meant was for the sockeye developers/maintainers to help
>     >     > >     > > > > > > > > > set up nightly tests and raise issues early.
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > Thanks!
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM
> Haibin Lin
>     >     > >     > > > > > > > > > <ha...@gmail.com>
>     >     > >     > > > > > > > > > wrote:
>     >     > >     > > > > > > > > >
>     >     > >     > > > > > > > > > > In GluonNLP we test each PR against the MXNet nightly build,
>     >     > >     > > > > > > > > > > and the CI did catch some MXNet-related issues.
>     >     > >     > > > > > > > > > > I recommend other toolkits also add integration tests with the
>     >     > >     > > > > > > > > > > MXNet nightly build.
>     >     > >     > > > > > > > > > > It helps identify issues early.
>     >     > >     > > > > > > > > > >
>     >     > >     > > > > > > > > > > Best,
>     >     > >     > > > > > > > > > > Haibin
>     >     > >     > > > > > > > > > >
>     >     > >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao,
> Patric
>     >     > >     > > > > > > > > > > <pa...@intel.com>
>     >     > >     > > > > wrote:
>     >     > >     > > > > > > > > > >
>     >     > >     > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
>     >     > >     > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard for
>     >     > >     > > > > > > > > > > > MXNet developers to catch potential bugs or performance degradation.
>     >     > >     > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > In the future, I suggest adding the major downstream test cases,
>     >     > >     > > > > > > > > > > > such as those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS,
>     >     > >     > > > > > > > > > > > into the nightly test.
>     >     > >     > > > > > > > > > > > If that's still too heavy, maybe test them weekly or monthly :)
>     >     > >     > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > Thanks,
>     >     > >     > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > --Patric
>     >     > >     > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > -----Original Message-----
>     >     > >     > > > > > > > > > > > > From: Anirudh Subramanian
>     >     > >     > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
>     >     > >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31
> AM
>     >     > >     > > > > > > > > > > > > To:
> dev@mxnet.incubator.apache.org
>     >     > >     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
>     >     > >     > > > > > > > > > > > > Subject: Re: [VOTE] Release
> Apache
>     > MXNet
>     >     > > (incubating)
>     >     > >     > > > > > > > > > > > > version
>     >     > >     > > > > > > > > > > > > 1.5.0.rc1
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > Hi Lai,
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > I have opened an issue:
>     >     > >     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>     >     > >     > > > > > > > > > > > > I came to know about this issue only today, and I have not been
>     >     > >     > > > > > > > > > > > > monitoring sockeye.
>     >     > >     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused by the
>     >     > >     > > > > > > > > > > > > dlpack changes.
>     >     > >     > > > > > > > > > > > > Also, I don't think sockeye CI checks against master; it is
>     >     > >     > > > > > > > > > > > > using 1.4.1.
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > Anirudh
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM
> Lai Wei
>     >     > >     > > > > > > > > > > > > <ro...@gmail.com>
>     >     > >     > > > > wrote:
>     >     > >     > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > Hi,
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > Could you share which test failed and what the crash is? How can
>     >     > >     > > > > > > > > > > > > > it be reproduced?
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > I was able to install sockeye, and all tests passed when run
>     >     > >     > > > > > > > > > > > > > using python setup.py test
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > It would be great to create an issue with reproducible steps and
>     >     > >     > > > > > > > > > > > > > move the discussion there.
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has been failing for some
>     >     > >     > > > > > > > > > > > > > time. If it’s due to an MXNet change, please raise this early so we
>     >     > >     > > > > > > > > > > > > > can track and solve it in time rather than block the release
>     >     > >     > > > > > > > > > > > > > during vote time.
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > >
>     >     > >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM
>     > Anirudh
>     >     > > Subramanian
>     >     > >     > > > > > > > > > > > >
>
>
>
>

-- 
Sandeep Krishnamurthy

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Davydenko, Denis" <dz...@gmail.com>.
Just to reiterate, postponing the release until we have a firm handle on the perf regression is my #1 choice as well. I am just trying to consider alternatives where we can release 1.5.0 and manage the potential perf impact...

On 6/28/19, 10:04 AM, "Marco de Abreu" <ma...@gmail.com> wrote:

    Hey Denis,
    
    I don't think the Apache release process supports something like an
    experimental release. Also, I would be afraid of automated
    systems consuming MXNet by simply fetching the latest release version.
    These users would then get the experimental version without being aware of it.
    
    For the sake of the best user experience, I'd prefer that we take a few
    days to track down the root causes of all these regressions. While I agree
    that releasing the new features and optimizations is certainly overdue, I
    think that the most important point is to keep our existing users
    and their trust. If a new release performs worse for the same kind of
    workload, they might lose trust in our release process and, in the future,
    be less willing to adopt a new release early on.
    
    -Marco
    
    Davydenko, Denis <dz...@gmail.com> wrote on Fri, June 28,
    2019, 18:55:
    
    > According to Sandeep's evaluation of perf regression at the operator level [1],
    > we have 77 op/input combinations for the forward pass and 50 for the backward
    > pass where the regression is 5%+ (the biggest regressions observed are about 86%
    > and 84% respectively) out of 290 tests. If I raise the degradation threshold to
    > 10%+, the corresponding numbers are 70 for forward and 42 for backward. This, from
    > my perspective, constitutes a significant performance impact, at least at the
    > individual operator level. In light of keeping every next release as
    > performant as the previous one (at least to a feasible extent), I suggest we can
    > only move forward with the 1.5.0 release if we call it experimental. The current
    > landscape of operators having a potentially negative performance impact on
    > customers could (and I believe it will) put MXNet one step behind its
    > current market position as a choice for performance-optimized DL
    > workloads. Tagging it as experimental, from my point of view, would help us
    > release new features so that customers could enjoy them while being
    > explicit about the performance optimizations still going on.
    >
    > [1]
    > https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
    >
    >
    >
    > On 6/28/19, 9:38 AM, "Lai Wei" <ro...@gmail.com> wrote:
    >
    >     Hi,
    >
    >     Some more data points:
    >
    >     I ran the same cifar10.py script with the same setup, but added a
    >     fixed seed.
    >
    >     I ran 50 epochs and treated the first 10 epochs as warm-up.
    >     I have the following average time per epoch:
    >     1.4.1: 164.95 s
    >     1.5.0: 170.44 s
    >     Detailed data at [1]
    >     This is about a 3% regression, less than Manu’s result but closer to
    >     the Gluon result.
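For reference, the ~3% figure follows directly from the two averages quoted in this message. A minimal sketch of the arithmetic (the function name `slowdown_pct` is made up for illustration and is not part of any benchmark script):

```python
def slowdown_pct(baseline_s: float, candidate_s: float) -> float:
    """Relative increase in time per epoch, in percent (positive = slower)."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Average seconds per epoch reported in this message.
t_141 = 164.95
t_150 = 170.44
print(f"1.5.0 vs 1.4.1: {slowdown_pct(t_141, t_150):+.2f}%")
```

Note that the sign convention matters: for time per epoch a positive value means slower, while for a throughput metric such as img/s the comparison would have to be inverted.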
    >
    >     As for the operator benchmarks from Sandeep [2], I have calculated the
    >     percentage of speed increase/regression here [1]. It looks like not all
    >     of the operators mentioned before slowed down. Should this be treated
    >     as a separate issue, since it tests on fake data with shapes different
    >     from the CIFAR10 dataset? For example, batch norm shows no regression
    >     in the report, but it is slower in the cifar10.py script profiling.
    >
    >     [1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
    >     [2] https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
    >
    >
    >     On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
    >
    >     > Thanks Manu.
    >     >
    >     > @all: I observed other strange behavior that I don't understand at
    >     > the moment:
    >     >
    >     > I installed the 1.5 rc from pip to check that I'm not doing something
    >     > wrong when building, and I found that CPU utilization is quite
    >     > subpar ( https://imgur.com/fRmbQNc ) compared to a version compiled
    >     > from source. The pip package is using 4-5 cores of the 32. When I
    >     > compile from source I get good core utilization (
    >     > https://imgur.com/e8BB425 ). I verified this also on a c5d.18xlarge
    >     > and a 32-core AMD bare metal machine.
    >     >
    >     > Seems to me also that the version from pip is using gomp instead of
    >     > llvm's omp. I'm not sure why.
    >     >
    >     > pip install mxnet==1.5.0b20190627
    >     > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
    >     > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
    >     >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)
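The `ldd | grep omp` check can also be scripted when several builds need to be compared. A rough sketch, assuming the usual shared-object names for the GNU, LLVM and Intel OpenMP runtimes; the function name `openmp_runtimes` and the lookup table are illustrative only:

```python
# Shared-object base names mapped to a human-readable runtime label.
OPENMP_RUNTIMES = {
    "libgomp": "GNU OpenMP (gomp)",
    "libomp": "LLVM OpenMP",
    "libiomp5": "Intel OpenMP",
}


def openmp_runtimes(ldd_output):
    """Return the set of OpenMP runtimes named in `ldd` output."""
    found = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # "libgomp.so.1" -> "libgomp"; exact lookup avoids substring mix-ups.
        base = fields[0].split(".so")[0]
        if base in OPENMP_RUNTIMES:
            found.add(OPENMP_RUNTIMES[base])
    return found


# The line reported in this message for the pip-installed libmxnet.so.
sample = "    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)\n"
print(openmp_runtimes(sample))
```

In practice the text would come from running `ldd` on the library, for example via `subprocess.run(["ldd", path], capture_output=True, text=True).stdout`.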
    >     >
    >     > I tried cifar10 on a bare metal 32-core AMD Zen machine and it is
    >     > extremely slow compared to a c5d.18xlarge: it doesn't seem to make
    >     > much progress, and I couldn't even finish 1 epoch. I tried with and
    >     > without MKL without much success. Will continue digging into this
    >     > when possible.
    >     >
    >     >
    >     > Pedro.
    >     >
    >     > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <ma...@gmail.com> wrote:
    >     > >
    >     > > Hi all,
    >     > >
    >     > > I ran the same cifar10.py script as Pedro, but for 20 epochs.
    >     > > Treating the first 10 epochs as warm-up, I averaged the time per
    >     > > epoch over the last 10 epochs.
    >     > >
    >     > > With MXNet 1.4.1 average time is 164.23 s
    >     > > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
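The warm-up handling described here (discard the first epochs, average the rest) can be sketched as follows. The helper name `steady_state_mean` is hypothetical, and the durations are invented for illustration; they are not measurements from this thread:

```python
from statistics import mean


def steady_state_mean(epoch_times_s, warmup_epochs=10):
    """Average epoch time after dropping the first `warmup_epochs` entries."""
    if warmup_epochs < 0:
        raise ValueError("warmup_epochs must be non-negative")
    measured = epoch_times_s[warmup_epochs:]
    if not measured:
        raise ValueError("no epochs left after discarding warm-up")
    return mean(measured)


# Hypothetical per-epoch durations in seconds: 10 slower warm-up epochs
# followed by 10 steady-state epochs.
times = [180.0] * 10 + [164.0, 165.0] * 5
print(steady_state_mean(times))
```

Dropping the warm-up epochs keeps one-off costs such as data caching and memory allocation out of the comparison between versions.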
    >     > >
    >     > >
    >     > > For a second data point, I ran the Gluon speed test benchmark script
    >     > > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
    >     > > using the following command:
    >     > > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128 --num-batches 200 --type 'training'
    >     > >
    >     > > I got the following speeds:
    >     > > With MXNet 1.4.1, average speed is 25.677534 img/s
    >     > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
    >     > >
    >     > > Note:
    >     > > For the 1.4.1 version, I used pip install mxnet-mkl==1.4.1
    >     > > For the 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619,
    >     > > which corresponds to commit ccbbf6b4b76ea536a6583c99497c83b65a20817b,
    >     > > which is 4 commits behind the 1.5.x branch
    >     > >
    >     > >
    >     > > Best,
    >     > > Manu
    >     > >
    >     > >
    >     > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <sandeep.krishna98@gmail.com> wrote:
    >     > >
    >     > >     Hello Ciyong/Pedro,
    >     > >
    >     > >     I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete:
    >     > >     they don’t cover all MXNet operators and are not presented in
    >     > >     the best possible way; still WIP.)
    >     > >
    >     > >     https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
    >     > >
    >     > >     The following operators look slower in 1.5 compared to 1.4.1:
    >     > >     - BatchNorm
    >     > >     - Pooling
    >     > >     - FullyConnected
    >     > >     - batch_dot
    >     > >     - Dot
    >     > >     - broadcast_mul
    >     > >     - log_softmax
    >     > >     and a few other operators
    >     > >
    >     > >     Also, several operators run a lot faster on 1.5 compared to 1.4.1,
    >     > >     for example Convolution, flatten, and elementwise operators. So it
    >     > >     looks like a few operators have regressed noticeably; however,
    >     > >     because of performance improvements in other operators, the end
    >     > >     effect is not that significant, which hides a lot of the
    >     > >     regression. We need a more detailed per-operator performance
    >     > >     analysis. We will not be able to do this for the current release,
    >     > >     but we should have a more concrete way of determining such
    >     > >     performance regressions before the next release.
    >     > >
    >     > >     Setup:
    >     > >     1.5 => Built from source (head of 1.5.rc2 tag), with MKLDNN
    >     > >     1.4.1 => PyPi mxnet-mkl==1.4.1
    >     > >     Machine: C5.18X
    >     > >     No explicit environment variables were set
    >     > >     Operator benchmark code -
    >     > >     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
    >     > >
    >     > >     Best,
    >     > >     Sandeep
    >     > >
    >     > >
    >     > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
    >     > >
    >     > >     > I will try to run a few benchmarks on a bare metal instance
    >     > >     > tonight to remove virtualization variance from the measurements
    >     > >     > and provide some numbers.
    >     > >     >
    >     > >     > Please propose a set of models / examples that would be desirable
    >     > >     > to run before the release, and provide a link to an easy-to-run
    >     > >     > script with instructions so we can validate the release better.
    >     > >     >
    >     > >     > Thank you.
    >     > >     >
    >     > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <royweilai@gmail.com> wrote:
    >     > >     > >
    >     > >     > > Dear @dev,
    >     > >     > >
    >     > >     > > I'm cancelling the vote for the cached op fix:
    >     > >     > >
    >     > >     > > https://github.com/apache/incubator-mxnet/pull/15298
    >     > >     > >
    >     > >     > > As for the possible CPU training regression, it does not look
    >     > >     > > like a blocker for now.
    >     > >     > >
    >     > >     > > I will start a new rc2 vote, please help to validate.
    >     > >     > >
    >     > >     > > Thanks!
    >     > >     > >
    >     > >     > >
    >     > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com> wrote:
    >     > >     > >
    >     > >     > > > Hi Pedro,
    >     > >     > > >
    >     > >     > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower
    >     > >     > > > than v1.4; I was using 18 cores for computing) with your
    >     > >     > > > script on C5.18xlarge. But I needed to bind the cores with the
    >     > >     > > > commands below when running the script (without setting the
    >     > >     > > > env variables, I got close times (<1% apart) for v1.5 and
    >     > >     > > > v1.4):
    >     > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
    >     > >     > > >         export OMP_NUM_THREADS=18
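A hedged sketch of applying the same two variables from Python instead of the shell, for example when a harness launches the benchmark itself. The helper name `pinned_env` is hypothetical, and the child process is only a stand-in that echoes one variable. `KMP_AFFINITY` is honored by the Intel/LLVM OpenMP runtimes, so whether it has any effect depends on which runtime the MXNet build links against.

```python
import os
import subprocess
import sys

# Values quoted in this message.
PINNING = {
    "KMP_AFFINITY": "granularity=fine,noduplicates,compact,1,0",
    "OMP_NUM_THREADS": "18",
}


def pinned_env(base=None):
    """Return a copy of the environment with the pinning variables applied."""
    env = dict(os.environ if base is None else base)
    env.update(PINNING)
    return env


# Stand-in child process that only echoes one variable; a real run would
# launch the benchmark script here instead.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=pinned_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

Passing the environment explicitly keeps the settings attached to the benchmark invocation, so a run with and a run without pinning can be compared from the same session.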
    >     > >     > > >
    >     > >     > > > Did you set any env variables when running?
    >     > >     > > >
    >     > >     > > > The performance results I got are below:
    >     > >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    >     > >     > > > real    12m10.856s
    >     > >     > > > user    234m49.576s
    >     > >     > > > sys     4m38.044s
    >     > >     > > >
    >     > >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    >     > >     > > > real    12m52.140s
    >     > >     > > > user    246m30.740s
    >     > >     > > > sys     5m8.188s
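The ~5.6% figure can be checked from the two `real` values in this message. A small sketch that parses bash-style `time` output into seconds; the function name `parse_time_output` is made up for illustration:

```python
import re

# Matches lines such as "real    12m10.856s".
_TIME_RE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$", re.MULTILINE)


def parse_time_output(text):
    """Map 'real'/'user'/'sys' to seconds from bash `time` output."""
    return {
        kind: int(minutes) * 60 + float(seconds)
        for kind, minutes, seconds in _TIME_RE.findall(text)
    }


# Timings copied from this message.
run_141 = """real    12m10.856s
user    234m49.576s
sys     4m38.044s"""
run_150 = """real    12m52.140s
user    246m30.740s
sys     5m8.188s"""

real_141 = parse_time_output(run_141)["real"]
real_150 = parse_time_output(run_150)["real"]
slowdown = (real_150 - real_141) / real_141 * 100
print(f"wall-clock slowdown: {slowdown:.2f}%")
```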
    >     > >     > > >
    >     > >     > > > Looking at the profiling data, most of the ops have the same
    >     > >     > > > perf between v1.4 and v1.5. But some ops like
    >     > >     > > > "_backward_BatchNorm" and "Pooling" are ~1.37x slower on v1.5
    >     > >     > > > compared with v1.4.
    >     > >     > > > Will do further analysis on these ops.
    >     > >     > > >
    >     > >     > > > Here's the hardware/OS info from my side:
    >     > >     > > > ----------Python Info----------
    >     > >     > > > Version      : 3.6.8
    >     > >     > > > Compiler     : GCC 7.3.0
    >     > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
    >     > >     > > > Arch         : ('64bit', '')
    >     > >     > > > ------------Pip Info-----------
    >     > >     > > > Version      : 19.0.3
    >     > >     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
    >     > >     > > > ----------MXNet Info-----------
    >     > >     > > > Version      : 1.5.0
    >     > >     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
    >     > >     > > > Hashtag not found. Not installed from pre-built package.
    >     > >     > > > ----------System Info----------
    >     > >     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
    >     > >     > > > system       : Linux
    >     > >     > > > node         : ip-172-31-32-129
    >     > >     > > > release      : 4.4.0-1085-aws
    >     > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
    >     > >     > > > ----------Hardware Info----------
    >     > >     > > > machine      : x86_64
    >     > >     > > > processor    : x86_64
    >     > >     > > > Architecture:          x86_64
    >     > >     > > > CPU op-mode(s):        32-bit, 64-bit
    >     > >     > > > Byte Order:            Little Endian
    >     > >     > > > CPU(s):                72
    >     > >     > > > On-line CPU(s) list:   0-71
    >     > >     > > > Thread(s) per core:    2
    >     > >     > > > Core(s) per socket:    18
    >     > >     > > > Socket(s):             2
    >     > >     > > > NUMA node(s):          2
    >     > >     > > > Vendor ID:             GenuineIntel
    >     > >     > > > CPU family:            6
    >     > >     > > > Model:                 85
    >     > >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    >     > >     > > > Stepping:              3
    >     > >     > > > CPU MHz:               3000.000
    >     > >     > > > BogoMIPS:              6000.00
    >     > >     > > > Hypervisor vendor:     KVM
    >     > >     > > > Virtualization type:   full
    >     > >     > > > L1d cache:             32K
    >     > >     > > > L1i cache:             32K
    >     > >     > > > L2 cache:              1024K
    >     > >     > > > L3 cache:              25344K
    >     > >     > > > NUMA node0 CPU(s):     0-17,36-53
    >     > >     > > > NUMA node1 CPU(s):     18-35,54-71
    >     > >     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
    >     > >     > > > ----------Network Test----------
    >     > >     > > >
    >     > >     > > >
    >     > >     > > > -Ciyong
    >     > >     > > >
    >     > >     > > >
    >     > >     > > > -----Original Message-----
    >     > >     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
    >     > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
    >     > >     > > > To: dev@mxnet.incubator.apache.org
    >     > >     > > > Cc: dev@mxnet.apache.org
    >     > >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    >     > >     > > >
    >     > >     > > > Could we run more epochs to see the performance difference, or
    >     > >     > > > profile the difference between a good and a bad run?
    >     > >     > > >
    >     > >     > > > > -----Original Message-----
    >     > >     > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    >     > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
    >     > >     > > > > To: dev@mxnet.incubator.apache.org
    >     > >     > > > > Cc: dev@mxnet.apache.org
    >     > >     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    >     > >     > > > >
    >     > >     > > > > I ran it again and the gap is bigger again. I guess we need
    >     > >     > > > > to average the times across several runs:
    >     > >     > > > >
    >     > >     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    >     > >     > > > > Epoch 0, Changed learning rate to 0.05
    >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    >     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    >     > >     > > > > Epoch 0, Batch 199, Speed=384.149839
    >     > >     > > > > Epoch 0, Duration=140.919567
    >     > >     > > > > Epoch 0, Training accuracy=0.115169
    >     > >     > > > > Epoch 0, Validation accuracy=0.141317
    >     > >     > > > > Epoch 1, Batch 199, Speed=433.380512
    >     > >     > > > > Epoch 1, Duration=119.553233
    >     > >     > > > > Epoch 1, Training accuracy=0.170956
    >     > >     > > > > Epoch 1, Validation accuracy=0.216146
    >     > >     > > > > Epoch 2, Batch 199, Speed=434.864699
    >     > >     > > > > Epoch 2, Duration=123.278490
    >     > >     > > > > Epoch 2, Training accuracy=0.209455
    >     > >     > > > > Epoch 2, Validation accuracy=0.247296
    >     > >     > > > > Epoch 3, Batch 199, Speed=433.401854
    >     > >     > > > > Epoch 3, Duration=118.327797
    >     > >     > > > > Epoch 3, Training accuracy=0.248701
    >     > >     > > > > Epoch 3, Validation accuracy=0.302083
    >     > >     > > > > Epoch 4, Batch 199, Speed=419.713707
    >     > >     > > > > Epoch 4, Duration=126.468409
    >     > >     > > > > Epoch 4, Training accuracy=0.260949
    >     > >     > > > > Epoch 4, Validation accuracy=0.269030
    >     > >     > > > >
    >     > >     > > > > real    10m55.796s
    >     > >     > > > > user    399m33.567s
    >     > >     > > > > sys     13m55.904s
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    >     > >     > > > > Epoch 0, Changed learning rate to 0.05
    >     > >     > > > > Epoch 0, Batch 199, Speed=419.039188
    >     > >     > > > > Epoch 0, Duration=143.934903
    >     > >     > > > > Epoch 0, Training accuracy=0.122542
    >     > >     > > > > Epoch 0, Validation accuracy=0.164359
    >     > >     > > > > Epoch 1, Batch 199, Speed=445.257048
    >     > >     > > > > Epoch 1, Duration=135.248399
    >     > >     > > > > Epoch 1, Training accuracy=0.178828
    >     > >     > > > > Epoch 1, Validation accuracy=0.199419
    >     > >     > > > > Epoch 2, Batch 199, Speed=447.115215
    >     > >     > > > > Epoch 2, Duration=132.003770
    >     > >     > > > > Epoch 2, Training accuracy=0.217808
    >     > >     > > > > Epoch 2, Validation accuracy=0.233073
    >     > >     > > > > Epoch 3, Batch 199, Speed=441.079477
    >     > >     > > > > Epoch 3, Duration=126.543316
    >     > >     > > > > Epoch 3, Training accuracy=0.248102
    >     > >     > > > > Epoch 3, Validation accuracy=0.293870
    >     > >     > > > > Epoch 4, Batch 199, Speed=449.329787
    >     > >     > > > > Epoch 4, Duration=138.398325
    >     > >     > > > > Epoch 4, Training accuracy=0.270021
    >     > >     > > > > Epoch 4, Validation accuracy=0.311498
    >     > >     > > > >
    >     > >     > > > > real    11m45.329s
    >     > >     > > > > user    426m13.908s
    >     > >     > > > > sys     16m45.093s
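Averaging across runs, as suggested in this message, first needs the per-epoch durations out of the training log. A minimal sketch, assuming each `Epoch N, Duration=...` entry sits whole on one line; the function name `epoch_durations` is made up for illustration:

```python
import re
from statistics import mean

_DURATION_RE = re.compile(r"Epoch (\d+), Duration=([\d.]+)")


def epoch_durations(log_text):
    """Return {epoch: seconds} for every 'Epoch N, Duration=...' entry."""
    return {int(epoch): float(secs) for epoch, secs in _DURATION_RE.findall(log_text)}


# Duration lines copied from the first run in this message (the mxnet_1.4
# virtualenv); all other log fields are trimmed.
log_14 = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""
durations = epoch_durations(log_14)
print(f"mean epoch time: {mean(list(durations.values())):.2f} s")
```

Because the pattern is searched anywhere in the text, it also works on quoted mail lines, as long as a reflowed line does not split an entry between `Epoch` and `Duration=`.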
    >     > >     > > > >
    >     > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
    >     > >     > > > > <pe...@gmail.com> wrote:
    >     > >     > > > > >
    >     > >     > > > > > The difference looks smaller now, more like your numbers. I
    >     > >     > > > > > wonder if something happened during the previous benchmark,
    >     > >     > > > > > like a system update...
    >     > >     > > > > >
    >     > >     > > > > >
    >     > >     > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    >     > >     > > > > > Epoch 0, Changed learning rate to 0.05
    >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    >     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    >     > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
    >     > >     > > > > > Epoch 0, Duration=134.868458
    >     > >     > > > > > Epoch 0, Training accuracy=0.127238
    >     > >     > > > > > Epoch 0, Validation accuracy=0.206388
    >     > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
    >     > >     > > > > > Epoch 1, Duration=128.041775
    >     > >     > > > > > Epoch 1, Training accuracy=0.182065
    >     > >     > > > > > Epoch 1, Validation accuracy=0.202524
    >     > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
    >     > >     > > > > > Epoch 2, Duration=124.920588
    >     > >     > > > > > Epoch 2, Training accuracy=0.202584
    >     > >     > > > > > Epoch 2, Validation accuracy=0.245693
    >     > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
    >     > >     > > > > > Epoch 3, Duration=120.948349
    >     > >     > > > > > Epoch 3, Training accuracy=0.235854
    >     > >     > > > > > Epoch 3, Validation accuracy=0.291066
    >     > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
    >     > >     > > > > > Epoch 4, Duration=130.181724
    >     > >     > > > > > Epoch 4, Training accuracy=0.257773
    >     > >     > > > > > Epoch 4, Validation accuracy=0.304988
    >     > >     > > > > >
    >     > >     > > > > > real    11m7.356s
    >     > >     > > > > > user    406m9.910s
    >     > >     > > > > > sys     14m18.349s
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    >     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    >     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    >     > >     > > > > > Epoch 0, Changed learning rate to 0.05
    >     > >     > > > > > Epoch 0, Batch 199, Speed=348.618154
    >     > >     > > > > > Epoch 0, Duration=146.469352
    >     > >     > > > > > Epoch 0, Training accuracy=0.124121
    >     > >     > > > > > Epoch 0, Validation accuracy=0.167227
    >     > >     > > > > > Epoch 1, Batch 199, Speed=452.790825
    >     > >     > > > > > Epoch 1, Duration=130.199421
    >     > >     > > > > > Epoch 1, Training accuracy=0.183863
    >     > >     > > > > > Epoch 1, Validation accuracy=0.237079
    >     > >     > > > > > Epoch 2, Batch 199, Speed=451.406559
    >     > >     > > > > > Epoch 2, Duration=126.320823
    >     > >     > > > > > Epoch 2, Training accuracy=0.214844
    >     > >     > > > > > Epoch 2, Validation accuracy=0.244692
    >     > >     > > > > > Epoch 3, Batch 199, Speed=403.161873
    >     > >     > > > > > Epoch 3, Duration=125.331660
    >     > >     > > > > > Epoch 3, Training accuracy=0.243506
    >     > >     > > > > > Epoch 3, Validation accuracy=0.301182
    >     > >     > > > > > Epoch 4, Batch 199, Speed=450.826598
    >     > >     > > > > > Epoch 4, Duration=126.426253
    >     > >     > > > > > Epoch 4, Training accuracy=0.266424
    >     > >     > > > > > Epoch 4, Validation accuracy=0.311899
    >     > >     > > > > >
    >     > >     > > > > > real    11m21.930s
    >     > >     > > > > > user    415m3.855s
    >     > >     > > > > > sys     13m53.975s
    >     > >     > > > > >
    >     > >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
    >     > >     > > > > > <pe...@gmail.com> wrote:
    >     > >     > > > > > >
    >     > >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
    >     > >     > > > > > >
    >     > >     > > > > > > I used this one:
    >     > >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
    >     > >     > > > > > >
    >     > >     > > > > > > Could you provide hardware and OS details?
    >     > >     > > > > > >
    >     > >     > > > > > > I will rerun and repost numbers in a few minutes.
    >     > >     > > > > > >
    >     > >     > > > > > > Pedro.
    >     > >     > > > > > >
    >     > >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
    >     > >     > > > > > > >
    >     > >     > > > > > > > Hi Pedro,
    >     > >     > > > > > > >
    >     > >     > > > > > > > I'm looking at this case, using the script
    >     > >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
    >     > >     > > > > > > > to get the timing data, but there seems to be not much
    >     > >     > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on
    >     > >     > > > > > > > C5.18xlarge.
    >     > >     > > > > > > >
    >     > >     > > > > > > > I'm not sure whether there's any difference in the python
    >     > >     > > > > > > > script. Can you point me to the link for your script
    >     > >     > > > > > > > (cifar10.py)?
    >     > >     > > > > > > > Or you can also try MXNet's script (train_cifar10.py) and
    >     > >     > > > > > > > see the performance.
    >     > >     > > > > > > >
    >     > >     > > > > > > > Here's the command I used to collect the time:
    >     > >     > > > > > > >         python train_cifar10.py --num-epoch=5
    >     > >     > > > > > > >
    >     > >     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    >     > >     > > > > > > >         real    9m4.880s
    >     > >     > > > > > > >         user    333m13.340s
    >     > >     > > > > > > >         sys     14m36.100s
    >     > >     > > > > > > >
    >     > >     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    >     > >     > > > > > > >         real    9m2.155s
    >     > >     > > > > > > >         user    329m37.092s
    >     > >     > > > > > > >         sys     16m8.668s
    >     > >     > > > > > > >
    >     > >     > > > > > > > -Ciyong
    >     > >     > > > > > > >
    >     > >     > > > > > > >
    >     > >     > > > > > > > -----Original Message-----
    >     > >     > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    >     > >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
    >     > >     > > > > > > > To: dev@mxnet.incubator.apache.org
    >     > >     > > > > > > > Cc: dev@mxnet.apache.org
    >     > >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    >     > >     > > > > > > >
    >     > >     > > > > > > > Hi these were my build flags and system info:
    >     > >     > > > > > > >
    >     > >     > > > > > > >
    >     > >     > > > > > > > --- # CMake configuration
    >     > >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
    >     > >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
    >     > >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
    >     > >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
    >     > >     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
    >     > >     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
    >     > >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
    >     > >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
    >     > >     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
    >     > >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
    >     > >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    >     > >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    >     > >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
    >     > >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
    >     > >     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
    >     > >     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
    >     > >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
    >     > >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
    >     > >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
    >     > >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
    >     > >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
    >     > >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
    >     > >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
    >     > >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
    >     > >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
    >     > >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
    >     > >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
    >     > >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
    >     > >     > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
    >     > >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
    >     > >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
    >     > >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
    >     > >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
    >     > >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
    >     > >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
    >     > >     > > > > > > >
    >     > >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
    >     > >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
    >     > >     > > > > > > >
    >     > >     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
    >     > >     > > > > > > > c5d.18xlarge
    >     > >     > > > > > > >
    >     > >     > > > > > > >
    >     > >     > > > > > > > Version      : 3.6.7
    >     > >     > > > > > > > Compiler     : GCC 8.2.0
    >     > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    >     > >     > > > > > > > Arch         : ('64bit', 'ELF')
    >     > >     > > > > > > > ------------Pip Info-----------
    >     > >     > > > > > > > Version      : 19.1.1
    >     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
    >     > >     > > > > > > > ----------MXNet Info-----------
    >     > >     > > > > > > > Version      : 1.5.0
    >     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
    >     > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
    >     > >     > > > > > > > ----------System Info----------
    >     > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    >     > >     > > > > > > > system       : Linux
    >     > >     > > > > > > > node         : ip-172-31-63-171
    >     > >     > > > > > > > release      : 4.15.0-1035-aws
    >     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    >     > >     > > > > > > > ----------Hardware Info----------
    >     > >     > > > > > > > machine      : x86_64
    >     > >     > > > > > > > processor    : x86_64
    >     > >     > > > > > > > Architecture:        x86_64
    >     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    >     > >     > > > > > > > Byte Order:          Little Endian
    >     > >     > > > > > > > CPU(s):              72
    >     > >     > > > > > > > On-line CPU(s) list: 0-71
    >     > >     > > > > > > > Thread(s) per core:  2
    >     > >     > > > > > > > Core(s) per socket:  18
    >     > >     > > > > > > > Socket(s):           2
    >     > >     > > > > > > > NUMA node(s):        2
    >     > >     > > > > > > > Vendor ID:           GenuineIntel
    >     > >     > > > > > > > CPU family:          6
    >     > >     > > > > > > > Model:               85
    >     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    >     > >     > > > > > > > Stepping:            4
    >     > >     > > > > > > > CPU MHz:             1326.446
    >     > >     > > > > > > > BogoMIPS:            6000.00
    >     > >     > > > > > > > Hypervisor vendor:   KVM
    >     > >     > > > > > > > Virtualization type: full
    >     > >     > > > > > > > L1d cache:           32K
    >     > >     > > > > > > > L1i cache:           32K
    >     > >     > > > > > > > L2 cache:            1024K
    >     > >     > > > > > > > L3 cache:            25344K
    >     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    >     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    >     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    >     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    >     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    >     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    >     > >     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
    >     > >     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
    >     > >     > > > > > > > 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
    >     > >     > > > > > > > smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
    >     > >     > > > > > > > clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
    >     > >     > > > > > > > xsaves ida arat pku ospke
    >     > >     > > > > > > > ----------Network Test----------
    >     > >     > > > > > > >
    >     > >     > > > > > > > ----------Python Info----------
    >     > >     > > > > > > > Version      : 3.6.7
    >     > >     > > > > > > > Compiler     : GCC 8.2.0
    >     > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    >     > >     > > > > > > > Arch         : ('64bit', 'ELF')
    >     > >     > > > > > > > ------------Pip Info-----------
    >     > >     > > > > > > > Version      : 19.1.1
    >     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
    >     > >     > > > > > > > ----------MXNet Info-----------
    >     > >     > > > > > > > Version      : 1.4.1
    >     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
    >     > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
    >     > >     > > > > > > > ----------System Info----------
    >     > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    >     > >     > > > > > > > system       : Linux
    >     > >     > > > > > > > node         : ip-172-31-63-171
    >     > >     > > > > > > > release      : 4.15.0-1035-aws
    >     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    >     > >     > > > > > > > ----------Hardware Info----------
    >     > >     > > > > > > > machine      : x86_64
    >     > >     > > > > > > > processor    : x86_64
    >     > >     > > > > > > > Architecture:        x86_64
    >     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    >     > >     > > > > > > > Byte Order:          Little Endian
    >     > >     > > > > > > > CPU(s):              72
    >     > >     > > > > > > > On-line CPU(s) list: 0-71
    >     > >     > > > > > > > Thread(s) per core:  2
    >     > >     > > > > > > > Core(s) per socket:  18
    >     > >     > > > > > > > Socket(s):           2
    >     > >     > > > > > > > NUMA node(s):        2
    >     > >     > > > > > > > Vendor ID:           GenuineIntel
    >     > >     > > > > > > > CPU family:          6
    >     > >     > > > > > > > Model:               85
    >     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    >     > >     > > > > > > > Stepping:            4
    >     > >     > > > > > > > CPU MHz:             1223.344
    >     > >     > > > > > > > BogoMIPS:            6000.00
    >     > >     > > > > > > > Hypervisor vendor:   KVM
    >     > >     > > > > > > > Virtualization type: full
    >     > >     > > > > > > > L1d cache:           32K
    >     > >     > > > > > > > L1i cache:           32K
    >     > >     > > > > > > > L2 cache:            1024K
    >     > >     > > > > > > > L3 cache:            25344K
    >     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    >     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    >     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    >     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    >     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    >     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    >     > >     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
    >     > >     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
    >     > >     > > > > > > > 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
    >     > >     > > > > > > > smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
    >     > >     > > > > > > > clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
    >     > >     > > > > > > > xsaves ida arat pku ospke
    >     > >     > > > > > > > ----------Network Test----------
    >     > >     > > > > > > >
    >     > >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
    >     > >     > > > > <pe...@gmail.com> wrote:
    >     > >     > > > > > > > >
    >     > >     > > > > > > > > I trained cifar10 on CPU, and there seems to be a regression of
    >     > >     > > > > > > > > roughly 7% in training time compared to 1.4.1:
    >     > >     > > > > > > > >
    >     > >     > > > > > > > > (py3_venv)
    >     > >     > > > > > > > > piotr@ip-172-31-63-171
    >     > > :0:~/deeplearning-benchmark/dawnbench
    >     > >     > > > > > > > > (master)+$ time python cifar10.py --epochs 5
    >     > >     > > > > > > > > real    11m30.388s
    >     > >     > > > > > > > > user    417m7.766s
    >     > >     > > > > > > > > sys     16m57.315s
    >     > >     > > > > > > > >
    >     > >     > > > > > > > > VS 1.4.1:
    >     > >     > > > > > > > > real    10m41.994s
    >     > >     > > > > > > > > user    392m40.646s
    >     > >     > > > > > > > > sys     12m30.601s
    >     > >     > > > > > > > >
    >     > >     > > > > > > > >
    >     > >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
    >     > >     > royweilai@gmail.com>
    >     > >     > > > > wrote:
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > Hi Anirudh,
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > What I meant was for sockeye developers/maintainers to help set
    >     > >     > > > > > > > > > up nightly tests and raise issues early.
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > Thanks!
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
    >     > >     > > > > > > > > > <ha...@gmail.com>
    >     > >     > > > > > > > > > wrote:
    >     > >     > > > > > > > > >
    >     > >     > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for each PR,
    >     > >     > > > > > > > > > > and the CI did catch some MXNet-related issues.
    >     > >     > > > > > > > > > > I recommend that other toolkits also add integration tests
    >     > >     > > > > > > > > > > against the MXNet nightly build.
    >     > >     > > > > > > > > > > It helps identify issues early.
    >     > >     > > > > > > > > > >
    >     > >     > > > > > > > > > > Best,
    >     > >     > > > > > > > > > > Haibin
    >     > >     > > > > > > > > > >
    >     > >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
    >     > >     > > > > > > > > > > <pa...@intel.com>
    >     > >     > > > > wrote:
    >     > >     > > > > > > > > > >
    >     > >     > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
    >     > >     > > > > > > > > > > >
    >     > >     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard for
    >     > >     > > > > > > > > > > > MXNet developers to catch potential bugs or performance
    >     > >     > > > > > > > > > > > degradation.
    >     > >     > > > > > > > > > > >
    >     > >     > > > > > > > > > > > In the future, I suggest adding the major downstream test cases,
    >     > >     > > > > > > > > > > > like those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS,
    >     > >     > > > > > > > > > > > into the nightly test.
    >     > >     > > > > > > > > > > > If that's still too heavy, maybe test them weekly or monthly :)
    >     > >     > > > > > > > > > > >
    >     > >     > > > > > > > > > > > Thanks,
    >     > >     > > > > > > > > > > >
    >     > >     > > > > > > > > > > > --Patric
    >     > >     > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > -----Original Message-----
    >     > >     > > > > > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
    >     > >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
    >     > >     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
    >     > >     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
    >     > >     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > Hi Lai,
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > I have opened an issue:
    >     > >     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
    >     > >     > > > > > > > > > > > > I came to know about this issue only today and I have not been
    >     > >     > > > > > > > > > > > > monitoring sockeye.
    >     > >     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused by the
    >     > >     > > > > > > > > > > > > dlpack changes.
    >     > >     > > > > > > > > > > > > Also, I don't think sockeye CI checks against master; it is
    >     > >     > > > > > > > > > > > > using 1.4.1.
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > Anirudh
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
    >     > >     > > > > > > > > > > > > <ro...@gmail.com>
    >     > >     > > > > wrote:
    >     > >     > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > Hi,
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > Could you share which test failed, what the crash is, and how
    >     > >     > > > > > > > > > > > > > to reproduce it?
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > I was able to install sockeye, and all tests passed when run
    >     > >     > > > > > > > > > > > > > with python setup.py test
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > It would be great to create an issue with reproducible steps
    >     > >     > > > > > > > > > > > > > and move the discussion there.
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been failing for
    >     > >     > > > > > > > > > > > > > some time. If it's due to an MXNet change, please raise this
    >     > >     > > > > > > > > > > > > > early so we can track and solve it in time rather than block
    >     > >     > > > > > > > > > > > > > the release during vote time.
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > >
    >     > >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM
    > Anirudh
    >     > > Subramanian
    >     > >     > > > > > > > > > > > >
    



Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Marco de Abreu <ma...@gmail.com>.
Hey Denis,

I don't think something like an experimental release is something that the
Apache release process supports. Also, I would be afraid of automated
systems consuming MXNet by simply fetching the latest release version.
These users would then get the experimental version without being aware.

For the sake of the best user experience, I'd prefer if we could take a few
days to track down the root causes for all these regressions. While I agree
that releasing the new features and optimizations is certainly overdue, I
think that the most important point is to keep up with the existing users
and their trust. If a new release performs worse for the same kind of
workload, they might lose trust into our release process and in future
might be less willing to adopt a new release early-on.

-Marco

Davydenko, Denis <dz...@gmail.com> schrieb am Fr., 28. Juni
2019, 18:55:

> According to Sandeep's evaluation of perf regression on operator level [1]
> we have 77 op/input combinations for forward pass and 50 for backward pass
> where regression is 5%+ (biggest regressions observed are about 86% and 84%
> respectively) out of 290 tests. If I raise threshold of degradation to 10%+
> corresponding numbers are 70 for forward and 42 for backward. This, from my
> perspective, constitutes significant scale performance impact, at least on
> individual operator level. In light of keeping every next release as
> performant as previous (at least to feasible extent) I suggest we can only
> move forward with 1.5.0 release if we call it experimental. Current
> landscape of operators having potentially negative performance impact on
> customers could (and I consider it will) put MXNet one step behind its
> current market position of being a choice for performance optimized DL
> workloads. Tagging it as experimental, from my point of view, would help to
> release new features so that customers could enjoy them while being
> explicit about performance optimizations going on.
>
> [1]
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
>
>
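
The threshold counts quoted above (operators regressing by 5%+ versus 10%+) can be reproduced from any table of per-operator percentage changes. A minimal sketch, assuming a plain dictionary of slowdowns; the operator names come from this thread, but the numbers are hypothetical placeholders and are not taken from the linked gist:

```python
def count_regressions(pct_slowdowns, threshold_pct):
    """Count entries whose slowdown (in percent) meets or exceeds the threshold."""
    return sum(1 for pct in pct_slowdowns.values() if pct >= threshold_pct)


# Hypothetical slowdowns in percent (positive = slower in the newer release).
# These values are illustrative only, not measurements from the thread.
slowdowns = {
    "BatchNorm": 12.0,
    "Pooling": 7.5,
    "Dot": 4.0,
    "Convolution": -20.0,  # negative = faster in the newer release
}

at_5 = count_regressions(slowdowns, 5.0)
at_10 = count_regressions(slowdowns, 10.0)
print(f"regressions >= 5%: {at_5}, >= 10%: {at_10}")
```

Raising the threshold can only shrink the count, which is why the 10%+ figures in the message are smaller than the 5%+ figures.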
> On 6/28/19, 9:38 AM, "Lai Wei" <ro...@gmail.com> wrote:
>
>     Hi,
>
>     Some more data points:
>
>     I ran the same cifar10.py script with the same setup, but added a fixed
>     seed.
>
>     I ran 50 epochs, treating the first 10 epochs as warm-up.
>     I got the following average time per epoch:
>     1.4.1: 164.95 s
>     1.5.0: 170.44 s
>     Detailed data at [1]
>     This is about a 3% regression, less than Manu’s result but closer to
>     the Gluon result.
>
>     As for the operator benchmarks from Sandeep [2], I have calculated the
>     percentage of speed increase/regression here [1]. It looks like not all
>     operators mentioned before slowed down. Should it be treated as a
>     separate issue, as it’s testing on fake data with a different shape than
>     the CIFAR10 dataset? For example, batch norm has no regression in the
>     report, but it slowed down in the cifar10.py script profiling.
>
>     [1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
>     [2] https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
>
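
The warm-up handling described above (discard the first epochs, average the rest) can be sketched as follows. The per-epoch list is hypothetical and only illustrates the method; the two averages passed to `slowdown_pct` are the ones reported in the message. Function names are illustrative, not part of any MXNet tooling:

```python
from statistics import mean, stdev


def steady_state(epoch_times, warmup):
    """Return (mean, sample stdev) of epoch times after dropping warm-up epochs."""
    steady = epoch_times[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two post-warm-up epochs")
    return mean(steady), stdev(steady)


def slowdown_pct(baseline_s, candidate_s):
    """Relative increase in time, in percent (positive = candidate is slower)."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Hypothetical per-epoch times in seconds; the first two epochs are warm-up.
epoch_times = [190.0, 171.0, 166.0, 165.0, 164.0, 166.0]
avg, spread = steady_state(epoch_times, warmup=2)
print(f"steady-state mean {avg:.2f} s, stdev {spread:.2f} s")

# Averages reported in the message: 1.4.1 = 164.95 s, 1.5.0 = 170.44 s.
reported = slowdown_pct(164.95, 170.44)
print(f"reported slowdown: {reported:.2f}%")
```

Reporting the spread next to the mean helps judge whether a difference of a few percent is larger than run-to-run noise.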
>     On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <
> pedro.larroy.lists@gmail.com>
>     wrote:
>
>     > Thanks Manu.
>     >
>     > @all: I observed other strange stuff that I don't understand at the
> moment:
>     >
>     > I installed rc for 1.5 from pip to check that I'm not doing something
>     > wrong when building. And I found out that the usage of CPU is quite
>     > subpar ( https://imgur.com/fRmbQNc ) compared to a version compiled
>     > from source. The pip package is using 4-5 cores of the 32. When I
>     > compile from source I get good core utilization. (
>     > https://imgur.com/e8BB425 ). I verified this also on a c5d.18xlarge
>     > and a 32 core AMD bare metal machine.
>     >
>     > Seems to me also that the version from pip is using gomp instead of
>     > llvm's omp. I'm not sure why.
>     >
>     > pip install mxnet==1.5.0b20190627
>     > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
>     > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
>     >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>     > (0x00007f99d1832000)
>     >
>     > I tried cifar10 on a bare metal 32 core AMD Zen machine and it is
>     > extremely slow and doesn't seem to make much progress compared to a
>     > c5d.18xlarge. I couldn't even finish 1 epoch; I tried with and without
>     > MKL without much success. Will continue digging into this when possible.
>     >
>     >
>     > Pedro.
>     >
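
The `ldd libmxnet.so | grep omp` check above can be scripted so it is easy to repeat across builds. A minimal sketch that only parses text already captured from `ldd`; the helper name is hypothetical, and the sample line is the one shown in the message:

```python
def openmp_runtimes(ldd_output):
    """Return the shared-library names in ldd output that contain 'omp'."""
    libs = []
    for line in ldd_output.splitlines():
        # The library name is the first whitespace-delimited token on each line.
        name = line.strip().split(" ", 1)[0]
        if "omp" in name:
            libs.append(name)
    return libs


# Sample taken from the message above (pip build of 1.5.0b20190627).
sample = (
    "    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 "
    "(0x00007f99d1832000)"
)
found = openmp_runtimes(sample)
print(found)
```

A substring match on "omp" also catches names such as `libomp` or `libiomp5`, so the same helper distinguishes GNU from LLVM or Intel OpenMP runtimes.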
>     > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <ma...@gmail.com>
> wrote:
>     > >
>     > > Hi all,
>     > >
>     > > I ran the same cifar10.py script as Pedro, but for 20 epochs.
>     > > Considering the first 10 epochs for warm-up, I averaged time per epoch
>     > > for the last 10 epochs.
>     > >
>     > > With MXNet 1.4.1 average time is 164.23 s
>     > > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
>     > >
>     > >
>     > > For a second data point, I ran Gluon speed test benchmark script -
>     > >
>     >
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
>     > > using the following command:
>     > > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
>     > > --num-batches 200 --type 'training'
>     > >
>     > > I got the following speeds:
>     > > With MXNet 1.4.1, average speed is 25.677534 img/s
>     > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
>     > >
>     > > Note:
>     > > For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
>     > > For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619, which
>     > > corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b, which
>     > > is behind 1.5.x branch by 4 commits
>     > >
>     > >
>     > > Best,
>     > > Manu
>     > >
>     > >
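
The two percentages above are computed on metrics that point in opposite directions: epoch time is better when lower, throughput when higher. A minimal sketch of a direction-aware comparison, using the numbers reported in the message; the function name and the `higher_is_better` flag are illustrative:

```python
def regression_pct(baseline, candidate, higher_is_better):
    """Percent regression of candidate vs. baseline (positive = worse)."""
    change = (candidate - baseline) / baseline * 100.0
    # For throughput-style metrics a drop is the regression, so flip the sign.
    return -change if higher_is_better else change


# Average epoch time in seconds (lower is better): 1.4.1 vs 1.5.0.
epoch_time = regression_pct(164.23, 174.59, higher_is_better=False)

# Gluon benchmark throughput in img/s (higher is better): 1.4.1 vs 1.5.0.
throughput = regression_pct(25.677534, 25.082130, higher_is_better=True)

print(f"epoch time: {epoch_time:.1f}% regression")
print(f"throughput: {throughput:.1f}% regression")
```

Keeping the sign convention in one place avoids reporting a throughput drop as an improvement when results from different benchmarks are collected in one table.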
>     > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <
>     > sandeep.krishna98@gmail.com>
>     > > wrote:
>     > >
>     > >     Hello Ciyong/Pedro,
>     > >
>     > >     Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete: it
>     > >     doesn’t cover all MXNet operators and isn't presented in the best
>     > >     possible way; still WIP)
>     > >
>     > >
>     >
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     > >
>     > >     The following operators look slower in 1.5 compared to 1.4.1:
>     > >     - BatchNorm
>     > >     - Pooling
>     > >     - FullyConnected
>     > >     - batch_dot
>     > >     - Dot
>     > >     - broadcast_mul
>     > >     - log_softmax
>     > >     and a few other operators
>     > >
>     > >     Also, several operators run a lot faster on 1.5 compared to 1.4.1,
>     > >     for example Convolution, flatten, and elementwise operators. So it
>     > >     looks like a few operators have regressed noticeably; however,
>     > >     improvements in other operators hide much of that regression, so
>     > >     the end-to-end effect is not that significant. We need a more
>     > >     detailed per-operator performance analysis. We will not be able to
>     > >     do this for the current release; we should have a more concrete way
>     > >     of detecting such performance regressions before the next release.
>     > >
>     > >     Setup:
>     > >     1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
>     > >     1.4.1 => PyPi mxnet-mkl==1.4.1
>     > >     Machine: C5.18X
>     > >     No explicit environment variables were set
>     > >     Operator benchmark code -
>     > >     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>     > >
>     > >     Best,
>     > >     Sandeep
>     > >
>     > >
>     > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
>     > > pedro.larroy.lists@gmail.com>
>     > >     wrote:
>     > >
>     > >     > I will try to run a few benchmarks on a bare metal instance tonight
>     > >     > to remove virtualization variance from the measurements and provide
>     > >     > some numbers.
>     > >     >
>     > >     > Please propose a set of models / examples that would be desirable to
>     > >     > run before the release, and provide a link to an easy-to-run script
>     > >     > with instructions so we can validate the release better.
>     > >     >
>     > >     > Thank you.
>     > >     >
>     > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <
> royweilai@gmail.com>
>     > wrote:
>     > >     > >
>     > >     > > Dear @dev,
>     > >     > >
>     > >     > > I'm cancelling the vote for the cached op fix:
>     > >     > >
>     > >     > > https://github.com/apache/incubator-mxnet/pull/15298
>     > >     > >
>     > >     > > As for the possible CPU training regression, it does not look like
>     > >     > > a blocker for now.
>     > >     > >
>     > >     > > I will start a new rc2 vote, please help to validate.
>     > >     > >
>     > >     > > Thanks!
>     > >     > >
>     > >     > >
>     > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
>     > ciyong.chen@intel.com
>     > > >
>     > >     > wrote:
>     > >     > >
>     > >     > > > Hi Pedro,
>     > >     > > >
>     > >     > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
>     > >     > > > v1.4; I was using 18 cores for computing) with your script on
>     > >     > > > C5.18xlarge. However, I needed to bind the cores with the commands
>     > >     > > > below when running the script (without setting the env variables,
>     > >     > > > the v1.5 and v1.4 times were close, within 1%):
>     > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     > >     > > >         export OMP_NUM_THREADS=18
>     > >     > > >
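Since the gap only shows up with the two exports above, it can help to run the bound and unbound configurations from one driver so nothing else differs between them. The following is a minimal sketch under that assumption; the function names are hypothetical, and the interpreter and script paths passed to run_benchmark are whatever the local setup uses.

```python
import os
import subprocess

# Values copied from the exports quoted above
BOUND_ENV = {
    "KMP_AFFINITY": "granularity=fine,noduplicates,compact,1,0",
    "OMP_NUM_THREADS": "18",
}


def benchmark_env(bind_cores, base=None):
    """Build the environment for one run (hypothetical helper).

    With bind_cores=True the two variables are set; otherwise they are
    removed so an inherited shell setting cannot leak into the run.
    """
    env = dict(os.environ if base is None else base)
    if bind_cores:
        env.update(BOUND_ENV)
    else:
        for name in BOUND_ENV:
            env.pop(name, None)
    return env


def run_benchmark(python_bin, script, bind_cores, extra_args=()):
    """Run `script` under `python_bin` with the chosen environment."""
    cmd = [python_bin, script, *extra_args]
    return subprocess.run(cmd, env=benchmark_env(bind_cores), check=True)
```

Calling run_benchmark once with bind_cores=True and once with bind_cores=False for each MXNet virtualenv gives the four timings needed for a like-for-like comparison.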
>     > >     > > > Did you set any env variables during running?
>     > >     > > >
>     > >     > > > The performance results I got are below:
>     > >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > >     > > > real    12m10.856s
>     > >     > > > user    234m49.576s
>     > >     > > > sys     4m38.044s
>     > >     > > >
>     > >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > >     > > > real    12m52.140s
>     > >     > > > user    246m30.740s
>     > >     > > > sys     5m8.188s
>     > >     > > >
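The ~5.6% figure follows directly from the two "real" times above. A small sketch of the arithmetic, assuming the usual `time` output format of minutes and seconds (the helper names are made up for illustration):

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '12m10.856s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline, candidate):
    """Percentage by which `candidate` is slower than `baseline`."""
    base = parse_time(baseline)
    return (parse_time(candidate) - base) / base * 100.0


# "real" times reported above: 1.4.1.rc0 first, then 1.5.0.rc1
print(f"{slowdown_percent('12m10.856s', '12m52.140s'):.1f}% slower")
```

That is 730.856 s against 772.140 s, a difference of 41.284 s on the baseline.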
>     > >     > > > Looking at the profiling data, most of the ops have the same perf
>     > >     > > > on v1.4 and v1.5, but some ops such as "_backward_BatchNorm" and
>     > >     > > > "Pooling" are ~1.37x slower on v1.5 than on v1.4.
>     > >     > > > I will do further analysis on these ops.
>     > >     > > >
>     > >     > > > Here's the hardware/OS info from my side:
>     > >     > > > ----------Python Info----------
>     > >     > > > Version      : 3.6.8
>     > >     > > > Compiler     : GCC 7.3.0
>     > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     > >     > > > Arch         : ('64bit', '')
>     > >     > > > ------------Pip Info-----------
>     > >     > > > Version      : 19.0.3
>     > >     > > > Directory    :
>     > >     > > >
>     > >
> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     > >     > > > ----------MXNet Info-----------
>     > >     > > > Version      : 1.5.0
>     > >     > > > Directory    :
> /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     > >     > > > Hashtag not found. Not installed from pre-built package.
>     > >     > > > ----------System Info----------
>     > >     > > > Platform     :
>     > Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     > >     > > > system       : Linux
>     > >     > > > node         : ip-172-31-32-129
>     > >     > > > release      : 4.4.0-1085-aws
>     > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC
> 2019
>     > >     > > > ----------Hardware Info----------
>     > >     > > > machine      : x86_64
>     > >     > > > processor    : x86_64
>     > >     > > > Architecture:          x86_64
>     > >     > > > CPU op-mode(s):        32-bit, 64-bit
>     > >     > > > Byte Order:            Little Endian
>     > >     > > > CPU(s):                72
>     > >     > > > On-line CPU(s) list:   0-71
>     > >     > > > Thread(s) per core:    2
>     > >     > > > Core(s) per socket:    18
>     > >     > > > Socket(s):             2
>     > >     > > > NUMA node(s):          2
>     > >     > > > Vendor ID:             GenuineIntel
>     > >     > > > CPU family:            6
>     > >     > > > Model:                 85
>     > >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M
> CPU @
>     > > 3.00GHz
>     > >     > > > Stepping:              3
>     > >     > > > CPU MHz:               3000.000
>     > >     > > > BogoMIPS:              6000.00
>     > >     > > > Hypervisor vendor:     KVM
>     > >     > > > Virtualization type:   full
>     > >     > > > L1d cache:             32K
>     > >     > > > L1i cache:             32K
>     > >     > > > L2 cache:              1024K
>     > >     > > > L3 cache:              25344K
>     > >     > > > NUMA node0 CPU(s):     0-17,36-53
>     > >     > > > NUMA node1 CPU(s):     18-35,54-71
>     > >     > > > Flags:                 fpu vme de pse tsc msr pae mce
> cx8 apic
>     > > sep mtrr
>     > >     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> syscall
>     > nx
>     > >     > pdpe1gb
>     > >     > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl
> xtopology
>     > > nonstop_tsc
>     > >     > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3
> fma cx16
>     > > pcid
>     > >     > sse4_1
>     > >     > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> avx
>     > f16c
>     > > rdrand
>     > >     > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single
> kaiser
>     > > fsgsbase
>     > >     > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
> avx512f
>     > > rdseed
>     > >     > adx
>     > >     > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1
> ida arat
>     > pku
>     > >     > > > ----------Network Test----------
>     > >     > > >
>     > >     > > >
>     > >     > > > -Ciyong
>     > >     > > >
>     > >     > > >
>     > >     > > > -----Original Message-----
>     > >     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
>     > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     > >     > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > Cc: dev@mxnet.apache.org
>     > >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating)
> version
>     > > 1.5.0.rc1
>     > >     > > >
>     > >     > > > Could we run more epochs to see the performance difference, or
>     > >     > > > profile the difference between a good and a bad run?
>     > >     > > >
>     > >     > > > > -----Original Message-----
>     > >     > > > > From: Pedro Larroy [mailto:
> pedro.larroy.lists@gmail.com]
>     > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     > >     > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > Cc: dev@mxnet.apache.org
>     > >     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> version
>     > >     > > > > 1.5.0.rc1
>     > >     > > > >
>     > >     > > > > I ran it again and the gap is bigger again; I guess we need to
>     > >     > > > > average the times across several runs:
>     > >     > > > >
>     > >     > > > > piotr@ip-172-31-63-171
> :0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python
> cifar10.py
>     > > --epochs 5
>     > >     > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py
> --epochs 5
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     > 4
>     > >     > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     > >     > > > > Epoch 0, Batch 199, Speed=384.149839
>     > >     > > > > Epoch 0, Duration=140.919567
>     > >     > > > > Epoch 0, Training accuracy=0.115169
>     > >     > > > > Epoch 0, Validation accuracy=0.141317
>     > >     > > > > Epoch 1, Batch 199, Speed=433.380512
>     > >     > > > > Epoch 1, Duration=119.553233
>     > >     > > > > Epoch 1, Training accuracy=0.170956
>     > >     > > > > Epoch 1, Validation accuracy=0.216146
>     > >     > > > > Epoch 2, Batch 199, Speed=434.864699
>     > >     > > > > Epoch 2, Duration=123.278490
>     > >     > > > > Epoch 2, Training accuracy=0.209455
>     > >     > > > > Epoch 2, Validation accuracy=0.247296
>     > >     > > > > Epoch 3, Batch 199, Speed=433.401854
>     > >     > > > > Epoch 3, Duration=118.327797
>     > >     > > > > Epoch 3, Training accuracy=0.248701
>     > >     > > > > Epoch 3, Validation accuracy=0.302083
>     > >     > > > > Epoch 4, Batch 199, Speed=419.713707
>     > >     > > > > Epoch 4, Duration=126.468409
>     > >     > > > > Epoch 4, Training accuracy=0.260949
>     > >     > > > > Epoch 4, Validation accuracy=0.269030
>     > >     > > > >
>     > >     > > > > real    10m55.796s
>     > >     > > > > user    399m33.567s
>     > >     > > > > sys     13m55.904s
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     > 4
>     > >     > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > Epoch 0, Batch 199, Speed=419.039188
>     > >     > > > > Epoch 0, Duration=143.934903
>     > >     > > > > Epoch 0, Training accuracy=0.122542
>     > >     > > > > Epoch 0, Validation accuracy=0.164359
>     > >     > > > > Epoch 1, Batch 199, Speed=445.257048
>     > >     > > > > Epoch 1, Duration=135.248399
>     > >     > > > > Epoch 1, Training accuracy=0.178828
>     > >     > > > > Epoch 1, Validation accuracy=0.199419
>     > >     > > > > Epoch 2, Batch 199, Speed=447.115215
>     > >     > > > > Epoch 2, Duration=132.003770
>     > >     > > > > Epoch 2, Training accuracy=0.217808
>     > >     > > > > Epoch 2, Validation accuracy=0.233073
>     > >     > > > > Epoch 3, Batch 199, Speed=441.079477
>     > >     > > > > Epoch 3, Duration=126.543316
>     > >     > > > > Epoch 3, Training accuracy=0.248102
>     > >     > > > > Epoch 3, Validation accuracy=0.293870
>     > >     > > > > Epoch 4, Batch 199, Speed=449.329787
>     > >     > > > > Epoch 4, Duration=138.398325
>     > >     > > > > Epoch 4, Training accuracy=0.270021
>     > >     > > > > Epoch 4, Validation accuracy=0.311498
>     > >     > > > >
>     > >     > > > > real    11m45.329s
>     > >     > > > > user    426m13.908s
>     > >     > > > > sys     16m45.093s
>     > >     > > > >
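To average the times across runs, as suggested at the top of this message, the per-epoch Duration lines can be pulled out of the training logs and averaged. This is a rough sketch that assumes the raw, unwrapped log text is available; the helper names are hypothetical, and dropping the first epoch as warm-up is a design choice rather than something the script prescribes.

```python
import re
from statistics import mean

# Matches lines such as "Epoch 3, Duration=118.327797"
DURATION_RE = re.compile(r"Epoch (\d+), Duration=([0-9.]+)")


def epoch_durations(log_text):
    """Return every epoch duration (seconds) found in the log text."""
    return [float(sec) for _, sec in DURATION_RE.findall(log_text)]


def mean_duration(log_text, skip_first=True):
    """Mean epoch duration; optionally drop epoch 0 as warm-up."""
    durations = epoch_durations(log_text)
    if skip_first and len(durations) > 1:
        durations = durations[1:]
    return mean(durations)


# Duration lines copied from the two runs in this message
LOG_1_4 = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""
LOG_1_5 = """
Epoch 0, Duration=143.934903
Epoch 1, Duration=135.248399
Epoch 2, Duration=132.003770
Epoch 3, Duration=126.543316
Epoch 4, Duration=138.398325
"""

ratio = mean_duration(LOG_1_5) / mean_duration(LOG_1_4)
print(f"1.5 / 1.4 mean epoch time: {ratio:.3f}")
```

Concatenating the logs of several runs per version before calling mean_duration averages over all of them, which smooths out run-to-run swings like the ones seen between this run and the previous one.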
>     > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     > >     > > > > <pe...@gmail.com> wrote:
>     > >     > > > > >
>     > >     > > > > > The difference looks smaller now, more like your
> numbers. I
>     > > wonder
>     > >     > > > > > if something happened during the previous benchmark
> like a
>     > > system
>     > >     > > > > > update...
>     > >     > > > > >
>     > >     > > > > >
>     > >     > > > > > piotr@ip-172-31-63-171
>     > :0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > (master)+$
>     > >     > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
> --epochs 5
>     > &&
>     > > time
>     > >     > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > [22:49:41]
>     > >     > > > > > ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     > >     > > > > > Epoch 0, Duration=134.868458
>     > >     > > > > > Epoch 0, Training accuracy=0.127238
>     > >     > > > > > Epoch 0, Validation accuracy=0.206388
>     > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     > >     > > > > > Epoch 1, Duration=128.041775
>     > >     > > > > > Epoch 1, Training accuracy=0.182065
>     > >     > > > > > Epoch 1, Validation accuracy=0.202524
>     > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     > >     > > > > > Epoch 2, Duration=124.920588
>     > >     > > > > > Epoch 2, Training accuracy=0.202584
>     > >     > > > > > Epoch 2, Validation accuracy=0.245693
>     > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     > >     > > > > > Epoch 3, Duration=120.948349
>     > >     > > > > > Epoch 3, Training accuracy=0.235854
>     > >     > > > > > Epoch 3, Validation accuracy=0.291066
>     > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     > >     > > > > > Epoch 4, Duration=130.181724
>     > >     > > > > > Epoch 4, Training accuracy=0.257773
>     > >     > > > > > Epoch 4, Validation accuracy=0.304988
>     > >     > > > > >
>     > >     > > > > > real    11m7.356s
>     > >     > > > > > user    406m9.910s
>     > >     > > > > > sys     14m18.349s
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > > Epoch 0, Batch 199, Speed=348.618154
>     > >     > > > > > Epoch 0, Duration=146.469352
>     > >     > > > > > Epoch 0, Training accuracy=0.124121
>     > >     > > > > > Epoch 0, Validation accuracy=0.167227
>     > >     > > > > > Epoch 1, Batch 199, Speed=452.790825
>     > >     > > > > > Epoch 1, Duration=130.199421
>     > >     > > > > > Epoch 1, Training accuracy=0.183863
>     > >     > > > > > Epoch 1, Validation accuracy=0.237079
>     > >     > > > > > Epoch 2, Batch 199, Speed=451.406559
>     > >     > > > > > Epoch 2, Duration=126.320823
>     > >     > > > > > Epoch 2, Training accuracy=0.214844
>     > >     > > > > > Epoch 2, Validation accuracy=0.244692
>     > >     > > > > > Epoch 3, Batch 199, Speed=403.161873
>     > >     > > > > > Epoch 3, Duration=125.331660
>     > >     > > > > > Epoch 3, Training accuracy=0.243506
>     > >     > > > > > Epoch 3, Validation accuracy=0.301182
>     > >     > > > > > Epoch 4, Batch 199, Speed=450.826598
>     > >     > > > > > Epoch 4, Duration=126.426253
>     > >     > > > > > Epoch 4, Training accuracy=0.266424
>     > >     > > > > > Epoch 4, Validation accuracy=0.311899
>     > >     > > > > >
>     > >     > > > > > real    11m21.930s
>     > >     > > > > > user    415m3.855s
>     > >     > > > > > sys     13m53.975s
>     > >     > > > > >
>     > >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     > >     > > > > > <pe...@gmail.com> wrote:
>     > >     > > > > > >
>     > >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
>     > >     > > > > > >
>     > >     > > > > > > I used this one:
>     > >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     > >     > > > > > >
>     > >     > > > > > > Could you provide hardware and OS details?
>     > >     > > > > > >
>     > >     > > > > > > I will rerun and repost numbers in a few minutes.
>     > >     > > > > > >
>     > >     > > > > > > Pedro.
>     > >     > > > > > >
>     > >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     > >     > > > > > > <ci...@intel.com>
>     > >     > > > > wrote:
>     > >     > > > > > > >
>     > >     > > > > > > > Hi Pedro,
>     > >     > > > > > > >
>     > >     > > > > > > > I'm looking at this case, using the script
>     > >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     > >     > > > > > > > to get the timing data, but there seems to be little
>     > >     > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on
>     > >     > > > > > > > C5.18xlarge.
>     > >     > > > > > > >
>     > >     > > > > > > > I'm not sure whether the Python scripts differ; can you
>     > >     > > > > > > > point me to the link for your script (cifar10.py)?
>     > >     > > > > > > > Or you could try MXNet's script (train_cifar10.py) and
>     > >     > > > > > > > see how it performs.
>     > >     > > > > > > >
>     > >     > > > > > > > Here's the command I used to collect the time:
>     > >     > > > > > > >         python train_cifar10.py --num-epoch=5
>     > >     > > > > > > >
>     > >     > > > > > > > 1) 1.5.0.rc1
> (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > >     > > > > > > >         real    9m4.880s
>     > >     > > > > > > >         user    333m13.340s
>     > >     > > > > > > >         sys     14m36.100s
>     > >     > > > > > > >
>     > >     > > > > > > > 2) 1.4.1.rc0
> (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > >     > > > > > > >         real    9m2.155s
>     > >     > > > > > > >         user    329m37.092s
>     > >     > > > > > > >         sys     16m8.668s
>     > >     > > > > > > >
>     > >     > > > > > > > -Ciyong
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > -----Original Message-----
>     > >     > > > > > > > From: Pedro Larroy [mailto:
>     > pedro.larroy.lists@gmail.com]
>     > >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     > >     > > > > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > > > > Cc: dev@mxnet.apache.org
>     > >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
>     > > version
>     > >     > > > > > > > 1.5.0.rc1
>     > >     > > > > > > >
>     > >     > > > > > > > Hi these were my build flags and system info:
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > --- # CMake configuration
>     > >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     > >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     > >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     > >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     > >     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>     > >     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
>     > >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>     > >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
>     > >     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
>     > >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     > >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>     > >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>     > >     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>     > >     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>     > >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     > >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>     > >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>     > >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>     > >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
>     > >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     > >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>     > >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
>     > >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>     > >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     > >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>     > >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>     > >     > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
>     > >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>     > >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>     > >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
>     > >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     > >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     > >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>     > >     > > > > > > >
>     > >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>     > >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>     > >     > > > > > > >
>     > >     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     > >     > > > > > > > c5d.18xlarge
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > Version      : 3.6.7
>     > >     > > > > > > > Compiler     : GCC 8.2.0
>     > >     > > > > > > > Build        : ('default', 'Oct 22 2018
> 11:32:17')
>     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     > >     > > > > > > > ------------Pip Info-----------
>     > >     > > > > > > > Version      : 19.1.1
>     > >     > > > > > > > Directory    :
>     > >     > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
>     > >     > > > > packages/pip
>     > >     > > > > > > > ----------MXNet Info-----------
>     > >     > > > > > > > Version      : 1.5.0
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>     > >     > > > > > > > Hashtag not found. Not installed from pre-built
>     > package.
>     > >     > > > > > > > ----------System Info----------
>     > >     > > > > > > > Platform     :
>     > >     > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > >     > > > > > > > system       : Linux
>     > >     > > > > > > > node         : ip-172-31-63-171
>     > >     > > > > > > > release      : 4.15.0-1035-aws
>     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18
> 16:15:14 UTC
>     > 2019
>     > >     > > > > > > > ----------Hardware Info----------
>     > >     > > > > > > > machine      : x86_64
>     > >     > > > > > > > processor    : x86_64
>     > >     > > > > > > > Architecture:        x86_64
>     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > >     > > > > > > > Byte Order:          Little Endian
>     > >     > > > > > > > CPU(s):              72
>     > >     > > > > > > > On-line CPU(s) list: 0-71
>     > >     > > > > > > > Thread(s) per core:  2
>     > >     > > > > > > > Core(s) per socket:  18
>     > >     > > > > > > > Socket(s):           2
>     > >     > > > > > > > NUMA node(s):        2
>     > >     > > > > > > > Vendor ID:           GenuineIntel
>     > >     > > > > > > > CPU family:          6
>     > >     > > > > > > > Model:               85
>     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum
> 8124M
>     > CPU @
>     > >     > 3.00GHz
>     > >     > > > > > > > Stepping:            4
>     > >     > > > > > > > CPU MHz:             1326.446
>     > >     > > > > > > > BogoMIPS:            6000.00
>     > >     > > > > > > > Hypervisor vendor:   KVM
>     > >     > > > > > > > Virtualization type: full
>     > >     > > > > > > > L1d cache:           32K
>     > >     > > > > > > > L1i cache:           32K
>     > >     > > > > > > > L2 cache:            1024K
>     > >     > > > > > > > L3 cache:            25344K
>     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > >     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>     > >     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
>     > >     > > > > > > > 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
>     > >     > > > > > > > smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
>     > >     > > > > > > > clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
>     > >     > > > > > > > xsaves ida arat pku ospke
>     > >     > > > > > > > ----------Network Test----------
>     > >     > > > > > > >
>     > >     > > > > > > > ----------Python Info----------
>     > >     > > > > > > > Version      : 3.6.7
>     > >     > > > > > > > Compiler     : GCC 8.2.0
>     > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     > >     > > > > > > > ------------Pip Info-----------
>     > >     > > > > > > > Version      : 19.1.1
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
>     > >     > > > > > > > ----------MXNet Info-----------
>     > >     > > > > > > > Version      : 1.4.1
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>     > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > >     > > > > > > > ----------System Info----------
>     > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > >     > > > > > > > system       : Linux
>     > >     > > > > > > > node         : ip-172-31-63-171
>     > >     > > > > > > > release      : 4.15.0-1035-aws
>     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > >     > > > > > > > ----------Hardware Info----------
>     > >     > > > > > > > machine      : x86_64
>     > >     > > > > > > > processor    : x86_64
>     > >     > > > > > > > Architecture:        x86_64
>     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > >     > > > > > > > Byte Order:          Little Endian
>     > >     > > > > > > > CPU(s):              72
>     > >     > > > > > > > On-line CPU(s) list: 0-71
>     > >     > > > > > > > Thread(s) per core:  2
>     > >     > > > > > > > Core(s) per socket:  18
>     > >     > > > > > > > Socket(s):           2
>     > >     > > > > > > > NUMA node(s):        2
>     > >     > > > > > > > Vendor ID:           GenuineIntel
>     > >     > > > > > > > CPU family:          6
>     > >     > > > > > > > Model:               85
>     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > >     > > > > > > > Stepping:            4
>     > >     > > > > > > > CPU MHz:             1223.344
>     > >     > > > > > > > BogoMIPS:            6000.00
>     > >     > > > > > > > Hypervisor vendor:   KVM
>     > >     > > > > > > > Virtualization type: full
>     > >     > > > > > > > L1d cache:           32K
>     > >     > > > > > > > L1i cache:           32K
>     > >     > > > > > > > L2 cache:            1024K
>     > >     > > > > > > > L3 cache:            25344K
>     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > >     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>     > >     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
>     > >     > > > > > > > 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
>     > >     > > > > > > > smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
>     > >     > > > > > > > clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
>     > >     > > > > > > > xsaves ida arat pku ospke
>     > >     > > > > > > > ----------Network Test----------
>     > >     > > > > > > >
>     > >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
>     > >     > > > > > > > >
>     > >     > > > > > > > > I trained cifar10 on CPU and there seems to be a regression of
>     > >     > > > > > > > > roughly 7% in training time against 1.4.1:
>     > >     > > > > > > > >
>     > >     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > > > > > (master)+$ time python cifar10.py --epochs 5
>     > >     > > > > > > > > real    11m30.388s
>     > >     > > > > > > > > user    417m7.766s
>     > >     > > > > > > > > sys     16m57.315s
>     > >     > > > > > > > >
>     > >     > > > > > > > > VS 1.4.1:
>     > >     > > > > > > > > real    10m41.994s
>     > >     > > > > > > > > user    392m40.646s
>     > >     > > > > > > > > sys     12m30.601s
>     > >     > > > > > > > >
>     > >     > > > > > > > >
>     > >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <royweilai@gmail.com> wrote:
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Hi Anirudh,
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > My comment was meant for the sockeye developers/maintainers,
>     > >     > > > > > > > > > to help set up nightly tests and raise issues early.
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Thanks!
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>     > >     > > > > > > > > > <ha...@gmail.com>
>     > >     > > > > > > > > > wrote:
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
>     > >     > > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
>     > >     > > > > > > > > > > I recommend that other toolkits also add integration tests
>     > >     > > > > > > > > > > with MXNet nightly.
>     > >     > > > > > > > > > > It helps identify issues early.
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > Best,
>     > >     > > > > > > > > > > Haibin
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>     > >     > > > > > > > > > > <pa...@intel.com>
>     > >     > > > > wrote:
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's
>     > >     > > > > > > > > > > > hard for MXNet developers to catch potential bugs or
>     > >     > > > > > > > > > > > performance degradation.
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > In the future, I suggest adding the major downstream
>     > >     > > > > > > > > > > > test cases, such as those from sockeye, GluonNLP,
>     > >     > > > > > > > > > > > GluonCV, DGL, and Gluon-TS, to the nightly tests.
>     > >     > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
>     > >     > > > > > > > > > > > monthly :)
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > Thanks,
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > --Patric
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > > -----Original Message-----
>     > >     > > > > > > > > > > > > From: Anirudh Subramanian
>     > >     > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
>     > >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>     > >     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
>     > >     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
>     > >     > > > > > > > > > > > > version 1.5.0.rc1
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > Hi Lai,
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > I have opened an issue:
>     > >     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>     > >     > > > > > > > > > > > > I came to know about this issue only today, and I have
>     > >     > > > > > > > > > > > > not been monitoring sockeye.
>     > >     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
>     > >     > > > > > > > > > > > > by the dlpack changes.
>     > >     > > > > > > > > > > > > Also, I don't think the sockeye CI checks against
>     > >     > > > > > > > > > > > > master; it is using 1.4.1.
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > Anirudh
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>     > >     > > > > > > > > > > > > <ro...@gmail.com>
>     > >     > > > > wrote:
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Hi,
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Could you share which test failed and what the
>     > >     > > > > > > > > > > > > > crash is? How can we reproduce it?
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > I was able to install sockeye, and all tests passed
>     > >     > > > > > > > > > > > > > when run with python setup.py test.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > I have tested both the nightly pip package and
>     > >     > > > > > > > > > > > > > 1.5.0.rc1.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > It would be great to create an issue with
>     > >     > > > > > > > > > > > > > reproducible steps and move the discussion there.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
>     > >     > > > > > > > > > > > > > failing for some time. If it's due to an MXNet
>     > >     > > > > > > > > > > > > > change, please raise this early so we can track and
>     > >     > > > > > > > > > > > > > solve it in time, rather than blocking the release
>     > >     > > > > > > > > > > > > > during vote time.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM
> Anirudh
>     > > Subramanian
>     > >     > > > > > > > > > > > >

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Davydenko, Denis" <dz...@gmail.com>.
According to Sandeep's operator-level evaluation of the performance regression [1], out of 290 tests we have 77 op/input combinations in the forward pass and 50 in the backward pass where the regression is 5% or more (the largest regressions observed are about 86% and 84%, respectively). If I raise the degradation threshold to 10% or more, the corresponding numbers are 70 for forward and 42 for backward. From my perspective, this constitutes a significant performance impact, at least at the individual operator level. To keep every release as performant as the previous one (at least to the extent feasible), I suggest we move forward with the 1.5.0 release only if we call it experimental. The current set of operators with a potentially negative performance impact on customers could (and I think will) put MXNet a step behind its current market position as a choice for performance-optimized DL workloads. Tagging the release as experimental would, from my point of view, let us ship new features for customers to enjoy while being explicit that performance optimization is still ongoing.

[1] https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50 
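
To make the threshold counting above concrete, here is a minimal sketch of how such counts could be derived from per-test timings. The test names and timing values are purely hypothetical placeholders, not data from the gist, and the helper names (pct_slowdown, count_regressions) are my own for illustration.

```python
from typing import Dict, Tuple


def pct_slowdown(old_ms: float, new_ms: float) -> float:
    """Percent slowdown of the new timing relative to the old one.

    Positive means the new release is slower, negative means faster.
    """
    return (new_ms - old_ms) / old_ms * 100.0


def count_regressions(
    timings: Dict[str, Tuple[float, float]],
    thresholds: Tuple[float, ...] = (5.0, 10.0),
) -> Dict[float, int]:
    """Count tests whose slowdown is at or above each threshold (in percent)."""
    slowdowns = [pct_slowdown(old, new) for old, new in timings.values()]
    return {t: sum(1 for s in slowdowns if s >= t) for t in thresholds}


# Hypothetical (old_ms, new_ms) pairs, for illustration only.
example_timings = {
    "op_a/forward": (1.00, 1.86),  # much slower
    "op_b/forward": (2.00, 2.14),  # slightly slower
    "op_c/forward": (0.50, 0.45),  # faster
    "op_d/forward": (4.00, 4.10),  # within noise
}

if __name__ == "__main__":
    print(count_regressions(example_timings))
```

Running the same counting separately on forward and backward timings would give the two sets of numbers discussed above.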



On 6/28/19, 9:38 AM, "Lai Wei" <ro...@gmail.com> wrote:

    Hi,
    
    Some more data points:
    
    I ran the same cifar10.py script with the same setup, BUT added a fixed seed.
    
    I ran 50 epochs, treating the first 10 epochs as warmup.
    I got the following average time per epoch:
    1.4.1: 164.95 s
    1.5.0: 170.44 s
    Detailed data is at [1].
    This is about a 3% regression, less than Manu's result but closer to the
    Gluon result.
    
    As for the operator benchmarks from Sandeep [2], I have calculated the
    percentage of speed increase/regression here [1]. It looks like not all of
    the operators mentioned before slowed down. Should this be treated as a
    separate issue, since it tests on fake data with different shapes than the
    CIFAR10 dataset? For example, batch norm shows no regression in the report,
    but it slowed down in the cifar10.py script profiling.
    
    [1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
    [2]
    https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
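
The per-epoch comparison above (drop the warmup epochs, average the rest, compare releases) can be sketched as follows. This is only an illustration of the arithmetic; the function names are my own, and the only real numbers used are the two averages reported in this message.

```python
from statistics import mean
from typing import Sequence


def mean_epoch_time(durations: Sequence[float], warmup: int) -> float:
    """Average epoch duration after discarding the first `warmup` epochs."""
    steady = list(durations[warmup:])
    if not steady:
        raise ValueError("no epochs left after discarding warmup")
    return mean(steady)


def relative_slowdown_pct(baseline_s: float, candidate_s: float) -> float:
    """Percent increase of the candidate time over the baseline time."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Averages reported above: 1.4.1 -> 164.95 s, 1.5.0 -> 170.44 s per epoch.
    slowdown = relative_slowdown_pct(164.95, 170.44)
    print(f"1.5.0 vs 1.4.1: {slowdown:.1f}% slower per epoch")
```

Averaging only the post-warmup epochs keeps one-time costs such as data loader start-up and memory allocation out of the comparison.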
    
    
    On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <pe...@gmail.com>
    wrote:
    
    > Thanks Manu.
    >
    > @all: I observed other strange behavior that I don't understand at the moment:
    >
    > I installed the 1.5 rc from pip to check that I'm not doing something
    > wrong when building, and I found that CPU usage is quite
    > subpar ( https://imgur.com/fRmbQNc ) compared to a version compiled
    > from source. The pip package is using 4-5 cores of the 32. When I
    > compile from source I get good core utilization (
    > https://imgur.com/e8BB425 ). I verified this also on a c5d.18xlarge
    > and a 32 core AMD bare metal machine.
    >
    > It also seems to me that the version from pip is using gomp instead of
    > llvm's omp. I'm not sure why.
    >
    > pip install mxnet==1.5.0b20190627
    > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
    > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
    >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
    > (0x00007f99d1832000)
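
The ldd check above could be scripted roughly as below, for comparing the pip build with a source build. This is a sketch that only parses text shaped like the ldd output shown; the mapping of library names to runtimes covers the three common OpenMP runtimes I am aware of (libgomp, libomp, libiomp5) and is an assumption, not an exhaustive list.

```python
from typing import List, Tuple

# Assumed mapping from shared-library stem to OpenMP runtime.
KNOWN_OPENMP = {
    "libgomp": "GNU OpenMP (gomp)",
    "libomp": "LLVM OpenMP",
    "libiomp5": "Intel OpenMP",
}


def openmp_runtimes(ldd_output: str) -> List[Tuple[str, str]]:
    """Return (library file name, runtime) pairs found in ldd-style output."""
    found = []
    for line in ldd_output.splitlines():
        # Each ldd line starts with the library name, e.g. "libgomp.so.1 => ...".
        name = line.strip().partition(" ")[0]
        stem = name.partition(".so")[0]
        if stem in KNOWN_OPENMP:
            found.append((name, KNOWN_OPENMP[stem]))
    return found


# The line reported above for the pip package.
sample = "    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)"

if __name__ == "__main__":
    print(openmp_runtimes(sample))
```

In practice the input would come from running ldd on libmxnet.so for each build and passing the captured text to openmp_runtimes.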
    >
    > I tried cifar10 on a bare-metal 32-core AMD Zen machine and it is
    > extremely slow; it doesn't seem to make much progress compared to a
    > c5d.18xlarge. I couldn't even finish 1 epoch. I tried with and without
    > MKL, without much success. I will continue digging into this when possible.
    >
    >
    > Pedro.
    >
    > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <ma...@gmail.com> wrote:
    > >
    > > Hi all,
    > >
    > > I ran the same cifar10.py script as Pedro, but for 20 epochs. Considering
    > > the first 10 epochs for warm-up, I averaged time per epoch for the last
    > > 10 epochs.
    > >
    > > With MXNet 1.4.1 average time is 164.23 s
    > > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
    > >
    > >
    > > For a second data point, I ran Gluon speed test benchmark script -
    > >
    > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
    > > using the following command:
    > > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
    > > --num-batches 200 --type 'training'
    > >
    > > I got the following speeds:
    > > With MXNet 1.4.1, average speed is 25.677534 img/s
    > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
    > >
    > > Note:
    > > For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
    > > For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
    > > corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
    > > behind 1.5.x branch by 4 commits
    > >
    > >
    > > Best,
    > > Manu
    > >
    > >
    > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <
    > sandeep.krishna98@gmail.com>
    > > wrote:
    > >
    > >     Hello Ciyong/Pedro,
    > >
    > >     Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn't
    > >     cover all MXNet operators, not presented in the best possible way, still
    > >     WIP)
    > >
    > >     https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
    > >
    > >     The following operators look slower in 1.5 compared to 1.4.1:
    > >     - BatchNorm
    > >     - Pooling
    > >     - FullyConnected
    > >     - batch_dot
    > >     - Dot
    > >     - broadcast_mul
    > >     - log_softmax
    > >     and a few other operators
    > >
    > >     Also, several operators run a lot faster on 1.5 compared to 1.4.1,
    > >     for example Convolution, flatten, elementwise operators, etc. So it
    > >     looks likely that a few operators have regressed noticeably; however,
    > >     because of performance improvements in other operators, the end effect
    > >     is not that significant, which hides a lot of the regression. We need a
    > >     more detailed per-operator performance analysis. We will not be able to
    > >     do this for the current release; we should have a more concrete way of
    > >     determining such performance regressions before the next release.
    > >
    > >     Setup:
    > >     1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
    > >     1.4.1 => PyPi mxnet-mkl==1.4.1
    > >     Machine: C5.18X
    > >     No explicit environment variables were set
    > >     Operator benchmark code -
    > >     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
    > >
    > >     Best,
    > >     Sandeep
    > >
    > >
    > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
    > > pedro.larroy.lists@gmail.com>
    > >     wrote:
    > >
    > >     > I will try to run a few benchmarks in a bare metal instance tonight
    > >     > to remove virtualization variance for the measurements and provide
    > >     > some numbers.
    > >     >
    > >     > Please propose a set of models / examples that would be desirable to
    > >     > run before the release and provide a link to an easy to run script
    > >     > with instructions so we can validate the release better.
    > >     >
    > >     > Thank you.
    > >     >
    > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com>
    > wrote:
    > >     > >
    > >     > > Dear @dev,
    > >     > >
    > >     > > I'm cancelling the vote for the cached op fix:
    > >     > >
    > >     > > https://github.com/apache/incubator-mxnet/pull/15298
    > >     > >
    > >     > > As for the possible CPU training regression, it does not look like
    > >     > > a blocker for now.
    > >     > >
    > >     > > I will start a new rc2 vote, please help to validate.
    > >     > >
    > >     > > Thanks!
    > >     > >
    > >     > >
    > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com>
    > >     > > wrote:
    > >     > >
    > >     > > > Hi Pedro,
    > >     > > >
    > >     > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
    > >     > > > v1.4; I was using 18 cores for computing) with your script on
    > >     > > > C5.18xlarge. However, I needed to bind the cores with the commands
    > >     > > > below when running the script (without setting the env variables,
    > >     > > > v1.5 and v1.4 ran within <1% of each other):
    > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
    > >     > > >         export OMP_NUM_THREADS=18
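
For repeated runs, the two exports above can also be applied per process from a small launcher, so that runs with and without core binding are easy to compare. This is a sketch only: pinned_env and run_pinned are hypothetical helper names, and the command passed to run_pinned is whatever benchmark script is being timed.

```python
import os
import subprocess
from typing import Dict, List, Optional


def pinned_env(num_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and add the OpenMP settings quoted above."""
    env = dict(os.environ if base is None else base)
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(cmd: List[str], num_threads: int) -> int:
    """Run `cmd` in a child process with the pinned environment; return its exit code."""
    return subprocess.run(cmd, env=pinned_env(num_threads)).returncode


if __name__ == "__main__":
    # Only prints the settings; call run_pinned([...], 18) to launch a benchmark.
    env = pinned_env(18)
    print(env["KMP_AFFINITY"], env["OMP_NUM_THREADS"])
```

Setting the variables in the child's environment, rather than inside an already running Python process, avoids any dependence on when the OpenMP runtime reads them.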
    > >     > > >
    > >     > > > Did you set any env variables when running?
    > >     > > >
    > >     > > > The performance results I got are below:
    > >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    > >     > > > real    12m10.856s
    > >     > > > user    234m49.576s
    > >     > > > sys     4m38.044s
    > >     > > >
    > >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    > >     > > > real    12m52.140s
    > >     > > > user    246m30.740s
    > >     > > > sys     5m8.188s
    > >     > > >
    > >     > > > Looking at the profiling data, most of the ops have the same
    > >     > > > perf between v1.4 and v1.5. But some ops, like "_backward_BatchNorm"
    > >     > > > and "Pooling", are ~1.37x slower on v1.5 compared with v1.4.
    > >     > > > I will do further analysis on these ops.
    > >     > > >
    > >     > > > Here's the hardware/OS info from my side:
    > >     > > > ----------Python Info----------
    > >     > > > Version      : 3.6.8
    > >     > > > Compiler     : GCC 7.3.0
    > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
    > >     > > > Arch         : ('64bit', '')
    > >     > > > ------------Pip Info-----------
    > >     > > > Version      : 19.0.3
    > >     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
    > >     > > > ----------MXNet Info-----------
    > >     > > > Version      : 1.5.0
    > >     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
    > >     > > > Hashtag not found. Not installed from pre-built package.
    > >     > > > ----------System Info----------
    > >     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
    > >     > > > system       : Linux
    > >     > > > node         : ip-172-31-32-129
    > >     > > > release      : 4.4.0-1085-aws
    > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
    > >     > > > ----------Hardware Info----------
    > >     > > > machine      : x86_64
    > >     > > > processor    : x86_64
    > >     > > > Architecture:          x86_64
    > >     > > > CPU op-mode(s):        32-bit, 64-bit
    > >     > > > Byte Order:            Little Endian
    > >     > > > CPU(s):                72
    > >     > > > On-line CPU(s) list:   0-71
    > >     > > > Thread(s) per core:    2
    > >     > > > Core(s) per socket:    18
    > >     > > > Socket(s):             2
    > >     > > > NUMA node(s):          2
    > >     > > > Vendor ID:             GenuineIntel
    > >     > > > CPU family:            6
    > >     > > > Model:                 85
    > >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > >     > > > Stepping:              3
    > >     > > > CPU MHz:               3000.000
    > >     > > > BogoMIPS:              6000.00
    > >     > > > Hypervisor vendor:     KVM
    > >     > > > Virtualization type:   full
    > >     > > > L1d cache:             32K
    > >     > > > L1i cache:             32K
    > >     > > > L2 cache:              1024K
    > >     > > > L3 cache:              25344K
    > >     > > > NUMA node0 CPU(s):     0-17,36-53
    > >     > > > NUMA node1 CPU(s):     18-35,54-71
    > >     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > >     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
    > >     > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
    > >     > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
    > >     > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
    > >     > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
    > >     > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
    > >     > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
    > >     > > > ----------Network Test----------
    > >     > > >
    > >     > > >
    > >     > > > -Ciyong
    > >     > > >
    > >     > > >
    > >     > > > -----Original Message-----
    > >     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
    > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
    > >     > > > To: dev@mxnet.incubator.apache.org
    > >     > > > Cc: dev@mxnet.apache.org
    > >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
    > >     > > > 1.5.0.rc1
    > >     > > >
    > >     > > > Could we run more epochs to see the performance difference, or
    > >     > > > profile the difference between a good and a bad run?
    > >     > > >
    > >     > > > > -----Original Message-----
    > >     > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
    > >     > > > > To: dev@mxnet.incubator.apache.org
    > >     > > > > Cc: dev@mxnet.apache.org
    > >     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
    > >     > > > > 1.5.0.rc1
    > >     > > > >
    > >     > > > > I ran it again and the gap is bigger again; I guess we need to
    > >     > > > > average the times across several runs:
    > >     > > > >
    > >     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
    > >     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
    > >     > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > >     > > > > Epoch 0, Changed learning rate to 0.05
    > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    > >     > > > > Epoch 0, Batch 199, Speed=384.149839
    > >     > > > > Epoch 0, Duration=140.919567
    > >     > > > > Epoch 0, Training accuracy=0.115169
    > >     > > > > Epoch 0, Validation accuracy=0.141317
    > >     > > > > Epoch 1, Batch 199, Speed=433.380512
    > >     > > > > Epoch 1, Duration=119.553233
    > >     > > > > Epoch 1, Training accuracy=0.170956
    > >     > > > > Epoch 1, Validation accuracy=0.216146
    > >     > > > > Epoch 2, Batch 199, Speed=434.864699
    > >     > > > > Epoch 2, Duration=123.278490
    > >     > > > > Epoch 2, Training accuracy=0.209455
    > >     > > > > Epoch 2, Validation accuracy=0.247296
    > >     > > > > Epoch 3, Batch 199, Speed=433.401854
    > >     > > > > Epoch 3, Duration=118.327797
    > >     > > > > Epoch 3, Training accuracy=0.248701
    > >     > > > > Epoch 3, Validation accuracy=0.302083
    > >     > > > > Epoch 4, Batch 199, Speed=419.713707
    > >     > > > > Epoch 4, Duration=126.468409
    > >     > > > > Epoch 4, Training accuracy=0.260949
    > >     > > > > Epoch 4, Validation accuracy=0.269030
    > >     > > > >
    > >     > > > > real    10m55.796s
    > >     > > > > user    399m33.567s
    > >     > > > > sys     13m55.904s
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > >     > > > > Epoch 0, Changed learning rate to 0.05
    > >     > > > > Epoch 0, Batch 199, Speed=419.039188
    > >     > > > > Epoch 0, Duration=143.934903
    > >     > > > > Epoch 0, Training accuracy=0.122542
    > >     > > > > Epoch 0, Validation accuracy=0.164359
    > >     > > > > Epoch 1, Batch 199, Speed=445.257048
    > >     > > > > Epoch 1, Duration=135.248399
    > >     > > > > Epoch 1, Training accuracy=0.178828
    > >     > > > > Epoch 1, Validation accuracy=0.199419
    > >     > > > > Epoch 2, Batch 199, Speed=447.115215
    > >     > > > > Epoch 2, Duration=132.003770
    > >     > > > > Epoch 2, Training accuracy=0.217808
    > >     > > > > Epoch 2, Validation accuracy=0.233073
    > >     > > > > Epoch 3, Batch 199, Speed=441.079477
    > >     > > > > Epoch 3, Duration=126.543316
    > >     > > > > Epoch 3, Training accuracy=0.248102
    > >     > > > > Epoch 3, Validation accuracy=0.293870
    > >     > > > > Epoch 4, Batch 199, Speed=449.329787
    > >     > > > > Epoch 4, Duration=138.398325
    > >     > > > > Epoch 4, Training accuracy=0.270021
    > >     > > > > Epoch 4, Validation accuracy=0.311498
    > >     > > > >
    > >     > > > > real    11m45.329s
    > >     > > > > user    426m13.908s
    > >     > > > > sys     16m45.093s
    > >     > > > >
    > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
    > >     > > > > <pe...@gmail.com> wrote:
    > >     > > > > >
    > >     > > > > > The difference looks smaller now, more like your numbers.
    > >     > > > > > I wonder if something happened during the previous
    > >     > > > > > benchmark, like a system update...
    > >     > > > > >
    > >     > > > > >
    > >     > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > >     > > > > > Epoch 0, Changed learning rate to 0.05
    > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
    > >     > > > > > Epoch 0, Duration=134.868458
    > >     > > > > > Epoch 0, Training accuracy=0.127238
    > >     > > > > > Epoch 0, Validation accuracy=0.206388
    > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
    > >     > > > > > Epoch 1, Duration=128.041775
    > >     > > > > > Epoch 1, Training accuracy=0.182065
    > >     > > > > > Epoch 1, Validation accuracy=0.202524
    > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
    > >     > > > > > Epoch 2, Duration=124.920588
    > >     > > > > > Epoch 2, Training accuracy=0.202584
    > >     > > > > > Epoch 2, Validation accuracy=0.245693
    > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
    > >     > > > > > Epoch 3, Duration=120.948349
    > >     > > > > > Epoch 3, Training accuracy=0.235854
    > >     > > > > > Epoch 3, Validation accuracy=0.291066
    > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
    > >     > > > > > Epoch 4, Duration=130.181724
    > >     > > > > > Epoch 4, Training accuracy=0.257773
    > >     > > > > > Epoch 4, Validation accuracy=0.304988
    > >     > > > > >
    > >     > > > > > real    11m7.356s
    > >     > > > > > user    406m9.910s
    > >     > > > > > sys     14m18.349s
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > >     > > > > > Epoch 0, Changed learning rate to 0.05
    > >     > > > > > Epoch 0, Batch 199, Speed=348.618154
    > >     > > > > > Epoch 0, Duration=146.469352
    > >     > > > > > Epoch 0, Training accuracy=0.124121
    > >     > > > > > Epoch 0, Validation accuracy=0.167227
    > >     > > > > > Epoch 1, Batch 199, Speed=452.790825
    > >     > > > > > Epoch 1, Duration=130.199421
    > >     > > > > > Epoch 1, Training accuracy=0.183863
    > >     > > > > > Epoch 1, Validation accuracy=0.237079
    > >     > > > > > Epoch 2, Batch 199, Speed=451.406559
    > >     > > > > > Epoch 2, Duration=126.320823
    > >     > > > > > Epoch 2, Training accuracy=0.214844
    > >     > > > > > Epoch 2, Validation accuracy=0.244692
    > >     > > > > > Epoch 3, Batch 199, Speed=403.161873
    > >     > > > > > Epoch 3, Duration=125.331660
    > >     > > > > > Epoch 3, Training accuracy=0.243506
    > >     > > > > > Epoch 3, Validation accuracy=0.301182
    > >     > > > > > Epoch 4, Batch 199, Speed=450.826598
    > >     > > > > > Epoch 4, Duration=126.426253
    > >     > > > > > Epoch 4, Training accuracy=0.266424
    > >     > > > > > Epoch 4, Validation accuracy=0.311899
    > >     > > > > >
    > >     > > > > > real    11m21.930s
    > >     > > > > > user    415m3.855s
    > >     > > > > > sys     13m53.975s
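Editorial sketch (not part of the original message): the two runs above each report one Speed= figure per epoch, and a quick way to compare them is to pull those figures out of the pasted log and average them. The helper name `speeds` and the variables `log_1_4` and `log_1_5` are hypothetical; the sample strings are fragments copied from the logs above, quote markers included. The thread does not state the unit of Speed, and each figure is a single reading at batch 199, so the mean is only a rough indicator.

```python
import re
from statistics import mean

# Matches the numeric value after "Speed=" wherever it occurs, so the
# pattern still works on text that the mail client re-wrapped.
SPEED_RE = re.compile(r"Speed=(\d+(?:\.\d+)?)")


def speeds(log_text):
    """Return every Speed= value found in log_text as a float."""
    return [float(value) for value in SPEED_RE.findall(log_text)]


# Fragments of the MXNet 1.4 run, as they appear in the quoted log.
log_1_4 = """
Epoch 0, Batch 199, Speed=426.182733 Epoch 0,
accuracy=0.206388 Epoch 1, Batch 199, Speed=313.127156
> > Speed=410.931187
Speed=419.119335 Epoch 3, Duration=120.948349 Epoch 3,
Batch 199, Speed=430.473733 Epoch 4, Duration=130.181724
"""

# Fragments of the MXNet 1.5 run.
log_1_5 = """
Speed=348.618154 Epoch 0, Duration=146.469352 Epoch 0,
Batch 199, Speed=452.790825 Epoch 1, Duration=130.199421
Batch 199, Speed=451.406559 Epoch 2, Duration=126.320823
Batch 199, Speed=403.161873 Epoch 3, Duration=125.331660
Batch 199, Speed=450.826598 Epoch 4, Duration=126.426253
"""

mean_1_4 = mean(speeds(log_1_4))
mean_1_5 = mean(speeds(log_1_5))
print(f"mean Speed, 1.4 run: {mean_1_4:.2f}")
print(f"mean Speed, 1.5 run: {mean_1_5:.2f}")
```

Because only five readings per run are available, a single slow epoch (such as epoch 1 of the 1.4 run) moves the mean noticeably; the wall-clock figures from `time` remain the more direct comparison.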
    > >     > > > > >
    > >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
    > >     > > > > > <pe...@gmail.com> wrote:
    > >     > > > > > >
    > >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
    > >     > > > > > >
    > >     > > > > > > I used this one:
    > >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
    > >     > > > > > >
    > >     > > > > > > Could you provide hardware and OS details?
    > >     > > > > > >
    > >     > > > > > > I will rerun and repost numbers in a few minutes.
    > >     > > > > > >
    > >     > > > > > > Pedro.
    > >     > > > > > >
    > >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
    > >     > > > > > > <ci...@intel.com>
    > >     > > > > wrote:
    > >     > > > > > > >
    > >     > > > > > > > Hi Pedro,
    > >     > > > > > > >
    > >     > > > > > > > I'm looking at this case, using the script
    > >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
    > >     > > > > > > > to get the timing data, but there seems to be not much
    > >     > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on
    > >     > > > > > > > C5.18xlarge.
    > >     > > > > > > >
    > >     > > > > > > > Not sure if there's any difference in the python script;
    > >     > > > > > > > can you point me to the link to get your script
    > >     > > > > > > > (cifar10.py)? Or you can also try MXNet's script
    > >     > > > > > > > (train_cifar10.py) and see the performance.
    > >     > > > > > > >
    > >     > > > > > > > Here's the command I used to collect the time:
    > >     > > > > > > >         python train_cifar10.py --num-epoch=5
    > >     > > > > > > >
    > >     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    > >     > > > > > > >         real    9m4.880s
    > >     > > > > > > >         user    333m13.340s
    > >     > > > > > > >         sys     14m36.100s
    > >     > > > > > > >
    > >     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    > >     > > > > > > >         real    9m2.155s
    > >     > > > > > > >         user    329m37.092s
    > >     > > > > > > >         sys     16m8.668s
    > >     > > > > > > >
    > >     > > > > > > > -Ciyong
    > >     > > > > > > >
    > >     > > > > > > >
    > >     > > > > > > > -----Original Message-----
    > >     > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    > >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
    > >     > > > > > > > To: dev@mxnet.incubator.apache.org
    > >     > > > > > > > Cc: dev@mxnet.apache.org
    > >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    > >     > > > > > > >
    > >     > > > > > > > Hi these were my build flags and system info:
    > >     > > > > > > >
    > >     > > > > > > >
    > >     > > > > > > > --- # CMake configuration
    > >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
    > >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
    > >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
    > >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
    > >     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
    > >     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
    > >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
    > >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
    > >     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
    > >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
    > >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    > >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    > >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
    > >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
    > >     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
    > >     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
    > >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
    > >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
    > >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
    > >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
    > >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
    > >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
    > >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
    > >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
    > >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
    > >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
    > >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
    > >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
    > >     > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
    > >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
    > >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
    > >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
    > >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
    > >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
    > >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
    > >     > > > > > > >
    > >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
    > >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
    > >     > > > > > > >
    > >     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
    > >     > > > > > > > c5d.18xlarge
    > >     > > > > > > >
    > >     > > > > > > >
    > >     > > > > > > > Version      : 3.6.7
    > >     > > > > > > > Compiler     : GCC 8.2.0
    > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    > >     > > > > > > > Arch         : ('64bit', 'ELF')
    > >     > > > > > > > ------------Pip Info-----------
    > >     > > > > > > > Version      : 19.1.1
    > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
    > >     > > > > > > > ----------MXNet Info-----------
    > >     > > > > > > > Version      : 1.5.0
    > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
    > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
    > >     > > > > > > > ----------System Info----------
    > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    > >     > > > > > > > system       : Linux
    > >     > > > > > > > node         : ip-172-31-63-171
    > >     > > > > > > > release      : 4.15.0-1035-aws
    > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    > >     > > > > > > > ----------Hardware Info----------
    > >     > > > > > > > machine      : x86_64
    > >     > > > > > > > processor    : x86_64
    > >     > > > > > > > Architecture:        x86_64
    > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    > >     > > > > > > > Byte Order:          Little Endian
    > >     > > > > > > > CPU(s):              72
    > >     > > > > > > > On-line CPU(s) list: 0-71
    > >     > > > > > > > Thread(s) per core:  2
    > >     > > > > > > > Core(s) per socket:  18
    > >     > > > > > > > Socket(s):           2
    > >     > > > > > > > NUMA node(s):        2
    > >     > > > > > > > Vendor ID:           GenuineIntel
    > >     > > > > > > > CPU family:          6
    > >     > > > > > > > Model:               85
    > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > >     > > > > > > > Stepping:            4
    > >     > > > > > > > CPU MHz:             1326.446
    > >     > > > > > > > BogoMIPS:            6000.00
    > >     > > > > > > > Hypervisor vendor:   KVM
    > >     > > > > > > > Virtualization type: full
    > >     > > > > > > > L1d cache:           32K
    > >     > > > > > > > L1i cache:           32K
    > >     > > > > > > > L2 cache:            1024K
    > >     > > > > > > > L3 cache:            25344K
    > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    > >     > > > > > > > ssse3 fma cx16 pcid
    > >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
    > >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
    > >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
    > >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
    > >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
    > >     > > > > > > > ida arat pku ospke
    > >     > > > > > > > ----------Network Test----------
    > >     > > > > > > >
    > >     > > > > > > > ----------Python Info----------
    > >     > > > > > > > Version      : 3.6.7
    > >     > > > > > > > Compiler     : GCC 8.2.0
    > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    > >     > > > > > > > Arch         : ('64bit', 'ELF')
    > >     > > > > > > > ------------Pip Info-----------
    > >     > > > > > > > Version      : 19.1.1
    > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
    > >     > > > > > > > ----------MXNet Info-----------
    > >     > > > > > > > Version      : 1.4.1
    > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
    > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
    > >     > > > > > > > ----------System Info----------
    > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    > >     > > > > > > > system       : Linux
    > >     > > > > > > > node         : ip-172-31-63-171
    > >     > > > > > > > release      : 4.15.0-1035-aws
    > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    > >     > > > > > > > ----------Hardware Info----------
    > >     > > > > > > > machine      : x86_64
    > >     > > > > > > > processor    : x86_64
    > >     > > > > > > > Architecture:        x86_64
    > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    > >     > > > > > > > Byte Order:          Little Endian
    > >     > > > > > > > CPU(s):              72
    > >     > > > > > > > On-line CPU(s) list: 0-71
    > >     > > > > > > > Thread(s) per core:  2
    > >     > > > > > > > Core(s) per socket:  18
    > >     > > > > > > > Socket(s):           2
    > >     > > > > > > > NUMA node(s):        2
    > >     > > > > > > > Vendor ID:           GenuineIntel
    > >     > > > > > > > CPU family:          6
    > >     > > > > > > > Model:               85
    > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > >     > > > > > > > Stepping:            4
    > >     > > > > > > > CPU MHz:             1223.344
    > >     > > > > > > > BogoMIPS:            6000.00
    > >     > > > > > > > Hypervisor vendor:   KVM
    > >     > > > > > > > Virtualization type: full
    > >     > > > > > > > L1d cache:           32K
    > >     > > > > > > > L1i cache:           32K
    > >     > > > > > > > L2 cache:            1024K
    > >     > > > > > > > L3 cache:            25344K
    > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    > >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    > >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    > >     > > > > > > > ssse3 fma cx16 pcid
    > >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
    > >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
    > >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
    > >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
    > >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
    > >     > > > > > > > ida arat pku ospke
    > >     > > > > > > > ----------Network Test----------
    > >     > > > > > > >
    > >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
    > >     > > > > <pe...@gmail.com> wrote:
    > >     > > > > > > > >
    > >     > > > > > > > > I trained cifar10 on CPU and there seems to be a
    > >     > > > > > > > > regression: training time is about 7% higher than
    > >     > > > > > > > > with 1.4.1:
    > >     > > > > > > > >
    > >     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
    > >     > > > > > > > > real    11m30.388s
    > >     > > > > > > > > user    417m7.766s
    > >     > > > > > > > > sys     16m57.315s
    > >     > > > > > > > >
    > >     > > > > > > > > VS 1.4.1:
    > >     > > > > > > > > real    10m41.994s
    > >     > > > > > > > > user    392m40.646s
    > >     > > > > > > > > sys     12m30.601s
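Editorial sketch (not part of the original message): the "about 7%" figure can be checked from the two `real` values quoted above by converting the shell `time` format to seconds and taking the relative difference. The function names `time_to_seconds` and `relative_increase` are hypothetical helpers introduced only for this illustration.

```python
import re


def time_to_seconds(stamp):
    """Convert a shell `time` figure such as '11m30.388s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_increase(baseline, candidate):
    """Fractional increase of candidate over baseline (0.05 means 5% slower)."""
    base = time_to_seconds(baseline)
    return (time_to_seconds(candidate) - base) / base


# `real` figures copied from the message above.
real_1_4_1 = "10m41.994s"
real_1_5_0 = "11m30.388s"

slowdown = relative_increase(real_1_4_1, real_1_5_0)
print(f"wall-clock increase of 1.5.0 over 1.4.1: {slowdown:.1%}")
```

With these inputs the increase works out to roughly 7.5%, consistent with the estimate in the message. A single pair of runs does not say how much of that gap is run-to-run noise.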
    > >     > > > > > > > >
    > >     > > > > > > > >
    > >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
    > >     > royweilai@gmail.com>
    > >     > > > > wrote:
    > >     > > > > > > > > >
    > >     > > > > > > > > > Hi Anirudh,
    > >     > > > > > > > > >
    > >     > > > > > > > > > Thanks for jumping into this quickly, I followed up
    > >     > > > > > > > > > on the issue.
    > >     > > > > > > > > >
    > >     > > > > > > > > > It was meant for sockeye developers/maintainers to
    > >     > > > > > > > > > help set up nightly tests and raise issues early.
    > >     > > > > > > > > >
    > >     > > > > > > > > > Thanks!
    > >     > > > > > > > > >
    > >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
    > >     > > > > > > > > > <ha...@gmail.com>
    > >     > > > > > > > > > wrote:
    > >     > > > > > > > > >
    > >     > > > > > > > > > > In GluonNLP we are testing with the MXNet nightly
    > >     > > > > > > > > > > build for each PR, and we did find some
    > >     > > > > > > > > > > MXNet-related issues caught by the CI.
    > >     > > > > > > > > > > I recommend other toolkits also add integration
    > >     > > > > > > > > > > tests with MXNet nightly.
    > >     > > > > > > > > > > It helps identify issues early.
    > >     > > > > > > > > > >
    > >     > > > > > > > > > > Best,
    > >     > > > > > > > > > > Haibin
    > >     > > > > > > > > > >
    > >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
    > >     > > > > > > > > > > <pa...@intel.com>
    > >     > > > > wrote:
    > >     > > > > > > > > > >
    > >     > > > > > > > > > > > Thanks for raising the issue; we will take a
    > >     > > > > > > > > > > > look ASAP.
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so
    > >     > > > > > > > > > > > it's hard for MXNet developers to catch
    > >     > > > > > > > > > > > potential bugs or performance degradation.
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > > > In the future, I suggest adding the major
    > >     > > > > > > > > > > > downstream test cases, like those from sockeye,
    > >     > > > > > > > > > > > GluonNLP, GluonCV, DGL, Gluon-TS, into the
    > >     > > > > > > > > > > > nightly test.
    > >     > > > > > > > > > > > If it's still too heavy, maybe testing it weekly
    > >     > > > > > > > > > > > or monthly :)
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > > > Thanks,
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > > > --Patric
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > > > > -----Original Message-----
    > >     > > > > > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
    > >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
    > >     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
    > >     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
    > >     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > > Hi Lai,
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > > I have opened an issue:
    > >     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
    > >     > > > > > > > > > > > > I came to know about this issue only today and
    > >     > > > > > > > > > > > > I have not been monitoring sockeye.
    > >     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
    > >     > > > > > > > > > > > > caused by the dlpack changes.
    > >     > > > > > > > > > > > > Also, I don't think sockeye CI checks against
    > >     > > > > > > > > > > > > master, it is using 1.4.1.
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > > Anirudh
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
    > >     > > > > > > > > > > > > <ro...@gmail.com>
    > >     > > > > wrote:
    > >     > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > Hi,
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > Could you share which test failed and what
    > >     > > > > > > > > > > > > > the crash is? How can it be reproduced?
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > I was able to install sockeye, and all tests
    > >     > > > > > > > > > > > > > passed using python setup.py test.
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > I have tested both the nightly pip package
    > >     > > > > > > > > > > > > > and 1.5.0.rc1.
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > It would be great to create an issue with
    > >     > > > > > > > > > > > > > reproducible steps and move the discussion
    > >     > > > > > > > > > > > > > there.
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has
    > >     > > > > > > > > > > > > > been failing for some time. If it’s due to an
    > >     > > > > > > > > > > > > > MXNet change, please raise this early so we
    > >     > > > > > > > > > > > > > can track and solve it in time rather than
    > >     > > > > > > > > > > > > > block the release during vote time.
    > >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
    > > Subramanian
    > >     > > > > > > > > > > > > > <anirudh2290@gmail.com
    > >     > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > wrote:
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
    > >     > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
    > >     > > > > > > > > > > > > > > with the commit
    > >     > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
    > >     > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > Anirudh
    > >     > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
    > >     > > > > > > > > > > > > > > <ro...@gmail.com>
    > >     > > > > > > > > > > wrote:
    > >     > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > Hi Przemyslaw,
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > Is there an issue with more details to
    > >     > > > > > > > > > > > > > > > track the problem?
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM
    > Przemysław
    > >     > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
    > >     > > > > > > > > > > > > > > > wrote:
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > -1
    > >     > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > There is a crash in the sockeye unit test
    > >     > > > > > > > > > > > > > > > > (python setup.py test) observed starting
    > >     > > > > > > > > > > > > > > > > with the nightly 1.5 build from 6/13 and
    > >     > > > > > > > > > > > > > > > > still occurring in 1.5rc1. I don't yet
    > >     > > > > > > > > > > > > > > > > have the exact commit that is responsible
    > >     > > > > > > > > > > > > > > > > for it, but it is either
    > >     > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
    > >     > > > > > > > > > > > > > > > > (dlpack related) or
    > >     > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
    > >     > > > > > > > > > > > > > > > > (cached op optimization).
    > >     > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
    > >     > > > > > > > > > > > > > > > > <ro...@gmail.com>
    > >     > > > > wrote:
    > >     > > > > > > > > > > > > > > > > > Dear MXNet community,
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version
    > >     > > > > > > > > > > > > > > > > > 1.5.0.
    > >     > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST) and close on June 22,
    > >     > > > > > > > > > > > > > > > > > 23:59:59.
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > 1) Link to release notes:
    > >     > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > 2) Link to release candidate:
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > +1 = approve
    > >     > > > > > > > > > > > > > > > > > +0 = no opinion
    > >     > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
    > >     > > > > > > > > > > > > > > > > > --
    > >     > > > > > > > > > > > > > > > > > Best Regards
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > > > Lai
    > >     > > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > --
    > >     > > > > > > > > > > > > > > > Best Regards
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > > > Lai
    > >     > > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > --
    > >     > > > > > > > > > > > > > Best Regards
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > > > > Lai
    > >     > > > > > > > > > > > > >
    > >     > > > > > > > > > > >
    > >     > > > > > > > > > >
    > >     > > > > > > > > > --
    > >     > > > > > > > > > Best Regards
    > >     > > > > > > > > >
    > >     > > > > > > > > > Lai
    > >     > > >
    > >     > > --
    > >     > > Best Regards
    > >     > >
    > >     > > Lai
    > >     >
    > >     >
    > >
    > >     --
    > >     Sandeep Krishnamurthy
    >
    > --
    Best Regards
    
    Lai
    



Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Hi,

Some more data points:

I ran the same cifar10.py script with the same setup, but added a fixed
seed.

I ran 50 epochs and treated the first 10 epochs as warm-up.
The average time per epoch is:
1.4.1: 164.95 s
1.5.0: 170.44 s
Detailed data is at [1].
This is about a 3% regression, smaller than Manu's result but closer to
the Gluon result.
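
For anyone who wants to repeat the averaging, here is a minimal sketch of
how the per-epoch times can be pulled out of the training log and averaged
after dropping the warm-up epochs. It assumes the log contains lines of
the form "Epoch N, Duration=X" as printed by cifar10.py in Pedro's runs
quoted below; the function name and the sample values are only for
illustration and are not part of any script in this thread.

```python
import re
from statistics import mean

# Matches entries such as "Epoch 3, Duration=118.327797" in the training log.
DURATION_RE = re.compile(r"Epoch (\d+), Duration=([0-9.]+)")


def average_epoch_time(log_text, warmup_epochs):
    """Average the per-epoch duration, skipping the first `warmup_epochs` epochs."""
    durations = {int(epoch): float(sec) for epoch, sec in DURATION_RE.findall(log_text)}
    measured = [sec for epoch, sec in sorted(durations.items()) if epoch >= warmup_epochs]
    if not measured:
        raise ValueError("no epochs left after discarding warm-up")
    return mean(measured)


# Three durations copied from one of the 1.4 runs quoted below.
sample_log = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
"""

# Treat epoch 0 as warm-up and average the remaining two epochs.
print(average_epoch_time(sample_log, warmup_epochs=1))
```

Using a regular expression with findall keeps this working even when
several log entries end up on the same line.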

As for the operator benchmarks from Sandeep [2], I have calculated the
percentage speed increase/regression per operator here [1]. It looks like
not all of the operators mentioned before slowed down. Should this be
treated as a separate issue, since it tests on fake data with different
shapes than the CIFAR10 dataset? For example, batch norm shows no
regression in the report, but it is slower in the cifar10.py script
profiling.

[1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
[2]
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
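
The percentages in this thread can be reproduced with a small sketch like
the one below. It uses the average epoch times above plus Manu's epoch
times and Gluon img/s numbers quoted below; the helper name is
hypothetical. Note that for times a larger value is worse, while for
throughput (img/s) a smaller value is worse.

```python
def slowdown_pct(baseline, candidate, higher_is_better=False):
    """Percent by which `candidate` is worse than `baseline`."""
    if higher_is_better:
        # Throughput-style metric (e.g. img/s): a drop is a regression.
        return (baseline - candidate) / baseline * 100.0
    # Time-style metric (e.g. seconds per epoch): an increase is a regression.
    return (candidate - baseline) / baseline * 100.0


# Average seconds per epoch, 1.4.1 vs 1.5.0 (this mail).
print(f"{slowdown_pct(164.95, 170.44):.1f}%")  # prints 3.3%
# Average seconds per epoch, 1.4.1 vs 1.5.0 (Manu's run).
print(f"{slowdown_pct(164.23, 174.59):.1f}%")  # prints 6.3%
# Gluon benchmark, img/s, 1.4.1 vs 1.5.0 (Manu's run).
print(f"{slowdown_pct(25.677534, 25.082130, higher_is_better=True):.1f}%")  # prints 2.3%
```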


On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <pe...@gmail.com>
wrote:

> Thanks Manu.
>
> @all: I observed other strange stuff that I don't understand at the moment:
>
> I installed rc for 1.5 from pip to check that I'm not doing something
> wrong when building. And I found out that the usage of CPU is quite
> subpar ( https://imgur.com/fRmbQNc ) compared to a version compiled
> from source. The pip package is using 4-5 cores of the 32. When I
> compile from source I get good core utilization. (
> https://imgur.com/e8BB425 ). I verified this also on a c5d.18xlarge
> and a 32 core AMD bare metal machine.
>
> Seems to me also that the version from pip is using gomp instead of
> llvm's omp. I'm not sure why.
>
> pip install mxnet==1.5.0b20190627
> /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
> piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> (0x00007f99d1832000)
>
> I tried cifar10 on a bare metal 32 core AMD Zen machine and it is
> extremely slow; it doesn't seem to make much progress compared to a
> c5d.18xlarge. I couldn't even finish 1 epoch. I tried with and without
> MKL without much success. Will continue digging into this when possible.
>
>
> Pedro.
>
> On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <ma...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I ran the same cifar10.py script as Pedro, but for 20 epochs. Treating
> > the first 10 epochs as warm-up, I averaged the time per epoch over the
> > last 10 epochs.
> >
> > With MXNet 1.4.1 average time is 164.23 s
> > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
> >
> >
> > For a second data point, I ran the Gluon speed test benchmark script
> > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
> > using the following command:
> > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
> > --num-batches 200 --type 'training'
> >
> > I got the following speeds:
> > With MXNet 1.4.1, average speed is 25.677534 img/s
> > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
> >
> > Note:
> > For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
> > For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
> > corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
> > behind 1.5.x branch by 4 commits
> >
> >
> > Best,
> > Manu
> >
> >
> > On 6/27/19, 3:37 PM, "sandeep krishnamurthy"
> > <sandeep.krishna98@gmail.com> wrote:
> >
> >     Hello Ciyong/Pedro,
> >
> >     I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete: this
> >     doesn't cover all MXNet operators and is not presented in the best
> >     possible way; still WIP.)
> >
> >     https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
> >
> >     The following operators look slower in 1.5 compared to 1.4.1:
> >     - BatchNorm
> >     - Pooling
> >     - FullyConnected
> >     - batch_dot
> >     - Dot
> >     - broadcast_mul
> >     - log_softmax
> >     and a few other operators
> >
> >     Also, several operators run a lot faster on 1.5 compared to 1.4.1,
> >     for example Convolution, flatten, and elementwise operators. So it
> >     looks likely that a few operators have regressed noticeably, but
> >     improvements in other operators hide much of that regression, so
> >     the end-to-end effect is not that significant. We need a more
> >     detailed per-operator performance analysis. We will not be able to
> >     do this for the current release, but we should have a more concrete
> >     way of detecting such performance regressions before the next
> >     release.
> >
> >     Setup:
> >     1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> >     1.4.1 => PyPi mxnet-mkl==1.4.1
> >     Machine: C5.18X
> >     No explicit environment variables were set
> >     Operator benchmark code -
> >     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
> >
> >     Best,
> >     Sandeep
> >
> >
> >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy
> >     <pedro.larroy.lists@gmail.com> wrote:
> >
> >     > I will try to run a few benchmarks on a bare metal instance tonight
> >     > to remove virtualization variance from the measurements and provide
> >     > some numbers.
> >     >
> >     > Please propose a set of models / examples that would be desirable to
> >     > run before the release and provide a link to an easy-to-run script
> >     > with instructions so we can validate the release better.
> >     >
> >     > Thank you.
> >     >
> >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> >     > >
> >     > > Dear @dev,
> >     > >
> >     > > I'm cancelling the vote for the cached op fix:
> >     > >
> >     > > https://github.com/apache/incubator-mxnet/pull/15298
> >     > >
> >     > > As for the possible CPU training regression, it does not look like
> >     > > a blocker for now.
> >     > >
> >     > > I will start a new rc2 vote; please help to validate.
> >     > >
> >     > > Thanks!
> >     > >
> >     > >
> >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com>
> >     > > wrote:
> >     > >
> >     > > > Hi Pedro,
> >     > > >
> >     > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
> >     > > > v1.4, using 18 cores for compute) with your script on C5.18xlarge.
> >     > > > However, I needed to bind the cores with the commands below when
> >     > > > running the script. (Without setting the env variables, the times
> >     > > > for v1.5 and v1.4 were within 1% of each other.)
> >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> >     > > >         export OMP_NUM_THREADS=18
> >     > > >
> >     > > > Did you set any env variables when running?
> >     > > >
> >     > > > The performance results I got are below:
> >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >     > > > real    12m10.856s
> >     > > > user    234m49.576s
> >     > > > sys     4m38.044s
> >     > > >
> >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >     > > > real    12m52.140s
> >     > > > user    246m30.740s
> >     > > > sys     5m8.188s
> >     > > >
> >     > > > Looking at the profiling data, most of the ops have the same perf
> >     > > > in v1.4 and v1.5. But some ops, like "_backward_BatchNorm" and
> >     > > > "Pooling", are ~1.37x slower on v1.5 compared with v1.4.
> >     > > > I will do further analysis on these ops.
> >     > > >
> >     > > > Here's the hardware/OS info from my side:
> >     > > > ----------Python Info----------
> >     > > > Version      : 3.6.8
> >     > > > Compiler     : GCC 7.3.0
> >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
> >     > > > Arch         : ('64bit', '')
> >     > > > ------------Pip Info-----------
> >     > > > Version      : 19.0.3
> >     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> >     > > > ----------MXNet Info-----------
> >     > > > Version      : 1.5.0
> >     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> >     > > > Hashtag not found. Not installed from pre-built package.
> >     > > > ----------System Info----------
> >     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> >     > > > system       : Linux
> >     > > > node         : ip-172-31-32-129
> >     > > > release      : 4.4.0-1085-aws
> >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> >     > > > ----------Hardware Info----------
> >     > > > machine      : x86_64
> >     > > > processor    : x86_64
> >     > > > Architecture:          x86_64
> >     > > > CPU op-mode(s):        32-bit, 64-bit
> >     > > > Byte Order:            Little Endian
> >     > > > CPU(s):                72
> >     > > > On-line CPU(s) list:   0-71
> >     > > > Thread(s) per core:    2
> >     > > > Core(s) per socket:    18
> >     > > > Socket(s):             2
> >     > > > NUMA node(s):          2
> >     > > > Vendor ID:             GenuineIntel
> >     > > > CPU family:            6
> >     > > > Model:                 85
> >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> >     > > > Stepping:              3
> >     > > > CPU MHz:               3000.000
> >     > > > BogoMIPS:              6000.00
> >     > > > Hypervisor vendor:     KVM
> >     > > > Virtualization type:   full
> >     > > > L1d cache:             32K
> >     > > > L1i cache:             32K
> >     > > > L2 cache:              1024K
> >     > > > L3 cache:              25344K
> >     > > > NUMA node0 CPU(s):     0-17,36-53
> >     > > > NUMA node1 CPU(s):     18-35,54-71
> >     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> >     > > > mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> >     > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> >     > > > xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq
> >     > > > monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> >     > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm
> >     > > > 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle
> >     > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap
> >     > > > clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> >     > > > ----------Network Test----------
> >     > > >
> >     > > >
> >     > > > -Ciyong
> >     > > >
> >     > > >
> >     > > > -----Original Message-----
> >     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> >     > > > Sent: Thursday, June 27, 2019 9:55 AM
> >     > > > To: dev@mxnet.incubator.apache.org
> >     > > > Cc: dev@mxnet.apache.org
> >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> >     > > > 1.5.0.rc1
> >     > > >
> >     > > > Could we run more epochs to see the performance difference, or
> >     > > > profile the difference between a good and a bad run?
> >     > > >
> >     > > > > -----Original Message-----
> >     > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
> >     > > > > To: dev@mxnet.incubator.apache.org
> >     > > > > Cc: dev@mxnet.apache.org
> >     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> >     > > > > 1.5.0.rc1
> >     > > > >
> >     > > > > I ran it again and the gap is bigger again. I guess we need to
> >     > > > > average the times across several runs:
> >     > > > >
> >     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
> 4
> >     > threads
> >     > > > > for decoding..
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads
> >     > > > > for decoding..
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >     > > > > Epoch 0, Batch 199, Speed=384.149839
> >     > > > > Epoch 0, Duration=140.919567
> >     > > > > Epoch 0, Training accuracy=0.115169
> >     > > > > Epoch 0, Validation accuracy=0.141317
> >     > > > > Epoch 1, Batch 199, Speed=433.380512
> >     > > > > Epoch 1, Duration=119.553233
> >     > > > > Epoch 1, Training accuracy=0.170956
> >     > > > > Epoch 1, Validation accuracy=0.216146
> >     > > > > Epoch 2, Batch 199, Speed=434.864699
> >     > > > > Epoch 2, Duration=123.278490
> >     > > > > Epoch 2, Training accuracy=0.209455
> >     > > > > Epoch 2, Validation accuracy=0.247296
> >     > > > > Epoch 3, Batch 199, Speed=433.401854
> >     > > > > Epoch 3, Duration=118.327797
> >     > > > > Epoch 3, Training accuracy=0.248701
> >     > > > > Epoch 3, Validation accuracy=0.302083
> >     > > > > Epoch 4, Batch 199, Speed=419.713707
> >     > > > > Epoch 4, Duration=126.468409
> >     > > > > Epoch 4, Training accuracy=0.260949
> >     > > > > Epoch 4, Validation accuracy=0.269030
> >     > > > >
> >     > > > > real    10m55.796s
> >     > > > > user    399m33.567s
> >     > > > > sys     13m55.904s
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
> 4
> >     > threads
> >     > > > > for decoding..
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads
> >     > > > > for decoding..
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > Epoch 0, Batch 199, Speed=419.039188
> >     > > > > Epoch 0, Duration=143.934903
> >     > > > > Epoch 0, Training accuracy=0.122542
> >     > > > > Epoch 0, Validation accuracy=0.164359
> >     > > > > Epoch 1, Batch 199, Speed=445.257048
> >     > > > > Epoch 1, Duration=135.248399
> >     > > > > Epoch 1, Training accuracy=0.178828
> >     > > > > Epoch 1, Validation accuracy=0.199419
> >     > > > > Epoch 2, Batch 199, Speed=447.115215
> >     > > > > Epoch 2, Duration=132.003770
> >     > > > > Epoch 2, Training accuracy=0.217808
> >     > > > > Epoch 2, Validation accuracy=0.233073
> >     > > > > Epoch 3, Batch 199, Speed=441.079477
> >     > > > > Epoch 3, Duration=126.543316
> >     > > > > Epoch 3, Training accuracy=0.248102
> >     > > > > Epoch 3, Validation accuracy=0.293870
> >     > > > > Epoch 4, Batch 199, Speed=449.329787
> >     > > > > Epoch 4, Duration=138.398325
> >     > > > > Epoch 4, Training accuracy=0.270021
> >     > > > > Epoch 4, Validation accuracy=0.311498
> >     > > > >
> >     > > > > real    11m45.329s
> >     > > > > user    426m13.908s
> >     > > > > sys     16m45.093s
> >     > > > >
> >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> >     > > > > <pe...@gmail.com> wrote:
> >     > > > > >
> >     > > > > > The difference looks smaller now, more like your numbers. I
> >     > > > > > wonder if something happened during the previous benchmark,
> >     > > > > > like a system update...
> >     > > > > >
> >     > > > > >
> >     > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > completed
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > > ImageRecordIOParser2:
> >     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
> use 4
> >     > > > > > threads for decoding..
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > completed
> >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >     > > > > > Epoch 0, Batch 199, Speed=426.182733
> >     > > > > > Epoch 0, Duration=134.868458
> >     > > > > > Epoch 0, Training accuracy=0.127238
> >     > > > > > Epoch 0, Validation accuracy=0.206388
> >     > > > > > Epoch 1, Batch 199, Speed=313.127156
> >     > > > > > Epoch 1, Duration=128.041775
> >     > > > > > Epoch 1, Training accuracy=0.182065
> >     > > > > > Epoch 1, Validation accuracy=0.202524
> >     > > > > > Epoch 2, Batch 199, Speed=410.931187
> >     > > > > > Epoch 2, Duration=124.920588
> >     > > > > > Epoch 2, Training accuracy=0.202584
> >     > > > > > Epoch 2, Validation accuracy=0.245693
> >     > > > > > Epoch 3, Batch 199, Speed=419.119335
> >     > > > > > Epoch 3, Duration=120.948349
> >     > > > > > Epoch 3, Training accuracy=0.235854
> >     > > > > > Epoch 3, Validation accuracy=0.291066
> >     > > > > > Epoch 4, Batch 199, Speed=430.473733
> >     > > > > > Epoch 4, Duration=130.181724
> >     > > > > > Epoch 4, Training accuracy=0.257773
> >     > > > > > Epoch 4, Validation accuracy=0.304988
> >     > > > > >
> >     > > > > > real    11m7.356s
> >     > > > > > user    406m9.910s
> >     > > > > > sys     14m18.349s
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > > ImageRecordIOParser2:
> >     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
> use 4
> >     > > > > > threads for decoding..
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > completed
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > > ImageRecordIOParser2:
> >     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
> use 4
> >     > > > > > threads for decoding..
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > image
> >     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > completed
> >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > > Epoch 0, Batch 199, Speed=348.618154
> >     > > > > > Epoch 0, Duration=146.469352
> >     > > > > > Epoch 0, Training accuracy=0.124121
> >     > > > > > Epoch 0, Validation accuracy=0.167227
> >     > > > > > Epoch 1, Batch 199, Speed=452.790825
> >     > > > > > Epoch 1, Duration=130.199421
> >     > > > > > Epoch 1, Training accuracy=0.183863
> >     > > > > > Epoch 1, Validation accuracy=0.237079
> >     > > > > > Epoch 2, Batch 199, Speed=451.406559
> >     > > > > > Epoch 2, Duration=126.320823
> >     > > > > > Epoch 2, Training accuracy=0.214844
> >     > > > > > Epoch 2, Validation accuracy=0.244692
> >     > > > > > Epoch 3, Batch 199, Speed=403.161873
> >     > > > > > Epoch 3, Duration=125.331660
> >     > > > > > Epoch 3, Training accuracy=0.243506
> >     > > > > > Epoch 3, Validation accuracy=0.301182
> >     > > > > > Epoch 4, Batch 199, Speed=450.826598
> >     > > > > > Epoch 4, Duration=126.426253
> >     > > > > > Epoch 4, Training accuracy=0.266424
> >     > > > > > Epoch 4, Validation accuracy=0.311899
> >     > > > > >
> >     > > > > > real    11m21.930s
> >     > > > > > user    415m3.855s
> >     > > > > > sys     13m53.975s
> >     > > > > >
> >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> >     > > > > > <pe...@gmail.com> wrote:
> >     > > > > > >
> >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
> >     > > > > > >
> >     > > > > > > I used this one:
> >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >     > > > > > >
> >     > > > > > > Could you provide hardware and OS details?
> >     > > > > > >
> >     > > > > > > I will rerun and repost numbers in a few minutes.
> >     > > > > > >
> >     > > > > > > Pedro.
> >     > > > > > >
> >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> >     > > > > > > <ci...@intel.com> wrote:
> >     > > > > > > >
> >     > > > > > > > Hi Pedro,
> >     > > > > > > >
> >     > > > > > > > I'm looking at this case, using the script
> >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> >     > > > > > > > to get the timing data, but there does not seem to be much
> >     > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on
> >     > > > > > > > C5.18xlarge.
> >     > > > > > > >
> >     > > > > > > > I'm not sure if there's any difference in the Python script.
> >     > > > > > > > Can you point me to the link for your script (cifar10.py)?
> >     > > > > > > > Or you could also try MXNet's script (train_cifar10.py) and
> >     > > > > > > > see the performance.
> >     > > > > > > >
> >     > > > > > > > Here's the command I used to collect the time:
> >     > > > > > > >         python train_cifar10.py --num-epoch=5
> >     > > > > > > >
> >     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >     > > > > > > >         real    9m4.880s
> >     > > > > > > >         user    333m13.340s
> >     > > > > > > >         sys     14m36.100s
> >     > > > > > > >
> >     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >     > > > > > > >         real    9m2.155s
> >     > > > > > > >         user    329m37.092s
> >     > > > > > > >         sys     16m8.668s
> >     > > > > > > >
> >     > > > > > > > -Ciyong
> >     > > > > > > >
> >     > > > > > > >
> >     > > > > > > > -----Original Message-----
> >     > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> >     > > > > > > > To: dev@mxnet.incubator.apache.org
> >     > > > > > > > Cc: dev@mxnet.apache.org
> >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> >     > > > > > > > 1.5.0.rc1
> >     > > > > > > >
> >     > > > > > > > Hi these were my build flags and system info:
> >     > > > > > > >
> >     > > > > > > >
> >     > > > > > > > --- # CMake configuration
> >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> >     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> >     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> >     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
> >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> >     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> >     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> >     > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
> >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
> >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> >     > > > > > > >
> >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD,
> > tag:
> >     > > > > > > > 1.5.0.rc1,
> >     > > > > > > > upstream/v1.5.x)
> >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD,
> > tag:
> >     > > > > > > > 1.4.1.rc0,
> >     > > > > > > > upstream/v1.4.x)
> >     > > > > > > >
> >     > > > > > > > curl
> http://169.254.169.254/latest/meta-data/instance-type
> >     > > > > > > > c5d.18xlarge
> >     > > > > > > >
> >     > > > > > > >
> >     > > > > > > > Version      : 3.6.7
> >     > > > > > > > Compiler     : GCC 8.2.0
> >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >     > > > > > > > Arch         : ('64bit', 'ELF')
> >     > > > > > > > ------------Pip Info-----------
> >     > > > > > > > Version      : 19.1.1
> >     > > > > > > > Directory    :
> >     > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> >     > > > > packages/pip
> >     > > > > > > > ----------MXNet Info-----------
> >     > > > > > > > Version      : 1.5.0
> >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> >     > > > > > > > Hashtag not found. Not installed from pre-built
> package.
> >     > > > > > > > ----------System Info----------
> >     > > > > > > > Platform     :
> >     > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >     > > > > > > > system       : Linux
> >     > > > > > > > node         : ip-172-31-63-171
> >     > > > > > > > release      : 4.15.0-1035-aws
> >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC
> 2019
> >     > > > > > > > ----------Hardware Info----------
> >     > > > > > > > machine      : x86_64
> >     > > > > > > > processor    : x86_64
> >     > > > > > > > Architecture:        x86_64
> >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >     > > > > > > > Byte Order:          Little Endian
> >     > > > > > > > CPU(s):              72
> >     > > > > > > > On-line CPU(s) list: 0-71
> >     > > > > > > > Thread(s) per core:  2
> >     > > > > > > > Core(s) per socket:  18
> >     > > > > > > > Socket(s):           2
> >     > > > > > > > NUMA node(s):        2
> >     > > > > > > > Vendor ID:           GenuineIntel
> >     > > > > > > > CPU family:          6
> >     > > > > > > > Model:               85
> >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M
> CPU @
> >     > 3.00GHz
> >     > > > > > > > Stepping:            4
> >     > > > > > > > CPU MHz:             1326.446
> >     > > > > > > > BogoMIPS:            6000.00
> >     > > > > > > > Hypervisor vendor:   KVM
> >     > > > > > > > Virtualization type: full
> >     > > > > > > > L1d cache:           32K
> >     > > > > > > > L1i cache:           32K
> >     > > > > > > > L2 cache:            1024K
> >     > > > > > > > L3 cache:            25344K
> >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> > apic
> >     > sep
> >     > > > mtrr
> >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> > syscall
> >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> > nopl
> >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > monitor
> >     > > > > > > > ssse3 fma cx16 pcid
> >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes
> > xsave
> >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
> smep
> > bmi2
> >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> >     > clflushopt
> >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> > xsaves
> >     > > > > > > > ida arat pku ospke
> >     > > > > > > > ----------Network Test----------
> >     > > > > > > >
> >     > > > > > > > ----------Python Info----------
> >     > > > > > > > Version      : 3.6.7
> >     > > > > > > > Compiler     : GCC 8.2.0
> >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >     > > > > > > > Arch         : ('64bit', 'ELF')
> >     > > > > > > > ------------Pip Info-----------
> >     > > > > > > > Version      : 19.1.1
> >     > > > > > > > Directory    :
> >     > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> >     > > > > packages/pip
> >     > > > > > > > ----------MXNet Info-----------
> >     > > > > > > > Version      : 1.4.1
> >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> >     > > > > > > > Hashtag not found. Not installed from pre-built
> package.
> >     > > > > > > > ----------System Info----------
> >     > > > > > > > Platform     :
> >     > > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >     > > > > > > > system       : Linux
> >     > > > > > > > node         : ip-172-31-63-171
> >     > > > > > > > release      : 4.15.0-1035-aws
> >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC
> 2019
> >     > > > > > > > ----------Hardware Info----------
> >     > > > > > > > machine      : x86_64
> >     > > > > > > > processor    : x86_64
> >     > > > > > > > Architecture:        x86_64
> >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >     > > > > > > > Byte Order:          Little Endian
> >     > > > > > > > CPU(s):              72
> >     > > > > > > > On-line CPU(s) list: 0-71
> >     > > > > > > > Thread(s) per core:  2
> >     > > > > > > > Core(s) per socket:  18
> >     > > > > > > > Socket(s):           2
> >     > > > > > > > NUMA node(s):        2
> >     > > > > > > > Vendor ID:           GenuineIntel
> >     > > > > > > > CPU family:          6
> >     > > > > > > > Model:               85
> >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M
> CPU @
> >     > 3.00GHz
> >     > > > > > > > Stepping:            4
> >     > > > > > > > CPU MHz:             1223.344
> >     > > > > > > > BogoMIPS:            6000.00
> >     > > > > > > > Hypervisor vendor:   KVM
> >     > > > > > > > Virtualization type: full
> >     > > > > > > > L1d cache:           32K
> >     > > > > > > > L1i cache:           32K
> >     > > > > > > > L2 cache:            1024K
> >     > > > > > > > L3 cache:            25344K
> >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> > apic
> >     > sep
> >     > > > mtrr
> >     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> > syscall
> >     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> > nopl
> >     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > monitor
> >     > > > > > > > ssse3 fma cx16 pcid
> >     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes
> > xsave
> >     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >     > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
> smep
> > bmi2
> >     > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> >     > clflushopt
> >     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> > xsaves
> >     > > > > > > > ida arat pku ospke
> >     > > > > > > > ----------Network Test----------
> >     > > > > > > >
> >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> >     > > > > <pe...@gmail.com> wrote:
> >     > > > > > > > >
> >     > > > > > > > > I trained cifar10 on CPU and it seems there is a
> >     > > > > > > > > regression in the range of a 7% increase in training
> >     > > > > > > > > time against 1.4.1:
> >     > > > > > > > >
> >     > > > > > > > > (py3_venv)
> >     > > > > > > > > piotr@ip-172-31-63-171
> > :0:~/deeplearning-benchmark/dawnbench
> >     > > > > > > > > (master)+$ time python cifar10.py --epochs 5
> >     > > > > > > > > real    11m30.388s
> >     > > > > > > > > user    417m7.766s
> >     > > > > > > > > sys     16m57.315s
> >     > > > > > > > >
> >     > > > > > > > > VS 1.4.1:
> >     > > > > > > > > real    10m41.994s
> >     > > > > > > > > user    392m40.646s
> >     > > > > > > > > sys     12m30.601s
> >     > > > > > > > >
> >     > > > > > > > >
> >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> >     > royweilai@gmail.com>
> >     > > > > wrote:
> >     > > > > > > > > >
> >     > > > > > > > > > Hi Anirudh,
> >     > > > > > > > > >
> >     > > > > > > > > > Thanks for jumping into this quickly, I followed up
> > on the
> >     > > > issue.
> >     > > > > > > > > >
> >     > > > > > > > > > I was meant for sockeye developer/maintainers to
> help
> > setup
> >     > > > > > > > > > nightly tests and raise issues early.
> >     > > > > > > > > >
> >     > > > > > > > > > Thanks!
> >     > > > > > > > > >
> >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> >     > > > > > > > > > <ha...@gmail.com>
> >     > > > > > > > > > wrote:
> >     > > > > > > > > >
> >     > > > > > > > > > > In GluonNLP we are testing with MXNET nightly
> build
> > for
> >     > > > > > > > > > > each PR, and we did find some MXNet related issue
> > caught
> >     > by
> >     > > > the CI.
> >     > > > > > > > > > > I recommend other toolkits also add integration
> > tests
> >     > with
> >     > > > > > > > > > > MXNet
> >     > > > > nightly.
> >     > > > > > > > > > > It helps identify issues early.
> >     > > > > > > > > > >
> >     > > > > > > > > > > Best,
> >     > > > > > > > > > > Haibin
> >     > > > > > > > > > >
> >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> >     > > > > > > > > > > <pa...@intel.com>
> >     > > > > wrote:
> >     > > > > > > > > > >
> >     > > > > > > > > > > > Thanks to raise the issue and we will take a
> look
> > ASAP.
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > The downstream cases is not in the MXNet CI so
> > it's
> >     > hard
> >     > > > > > > > > > > > to catch the potential bugs or performance
> > degradation
> >     > > > > > > > > > > > for
> >     > > > > MXNet developers.
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > In the future, I suggest adding the major
> > downstream
> >     > > > > > > > > > > > test cases, like
> >     > > > > > > > > > > from
> >     > > > > > > > > > > > sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into
> > the
> >     > > > > > > > > > > > nightly
> >     > > > > test.
> >     > > > > > > > > > > > If it's still too heavy,  maybe testing it
> weekly
> > or
> >     > > > > > > > > > > > monthly :)
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > Thanks,
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > --Patric
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > > -----Original Message-----
> >     > > > > > > > > > > > > From: Anirudh Subramanian
> >     > > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> >     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> >     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
> >     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> > (incubating)
> >     > > > > > > > > > > > > version
> >     > > > > > > > > > > > > 1.5.0.rc1
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > Hi Lai,
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > I have opened an issue:
> >     > > > > > > > > > > > >
> >     > https://github.com/apache/incubator-mxnet/issues/15297
> >     > > > > > > > > > > > > I came to know about this issue only today
> and
> > I have
> >     > > > > > > > > > > > > not been
> >     > > > > > > > > > > monitoring
> >     > > > > > > > > > > > > sockeye.
> >     > > > > > > > > > > > > I jumped onto this issue to make sure it
> wasn't
> >     > caused
> >     > > > > > > > > > > > > by the dlpack
> >     > > > > > > > > > > > changes.
> >     > > > > > > > > > > > > Also, I don't  think sockeye CI checks
> against
> >     > master,
> >     > > > > > > > > > > > > it is using
> >     > > > > > > > > > > 1.4.1.
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > Anirudh
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> >     > > > > > > > > > > > > <ro...@gmail.com>
> >     > > > > wrote:
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > > Hi,
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > Could you share which test failed and
> what’s
> > the
> >     > > > > > > > > > > > > > crash? How to reproduce it?
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > I was able to install sockeye and run all
> > tests
> >     > passed.
> >     > > > > > > > > > > > > > Using python setup.py test
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > I have tested both nightly pip package and
> >     > 1.5.0.rc1
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > It would be great to create an issue with
> >     > > > > > > > > > > > > > reproducible steps and move the discussion
> > there.
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > Also I see sockeye nightly build[1] has
> been
> >     > failing
> >     > > > > > > > > > > > > > for some time,
> >     > > > > > > > > > > if
> >     > > > > > > > > > > > > > it’s due to MXNet change, please raise this
> > early
> >     > so
> >     > > > > > > > > > > > > > we can track and solve it in time rather
> than
> > block
> >     > > > > > > > > > > > > > the release
> >     > > > > during vote time.
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> > Subramanian
> >     > > > > > > > > > > > > > <anirudh2290@gmail.com
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > wrote:
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > I was able to reproduce a crash with the
> > commit
> >     > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > but not
> >     > > > > > > > > > > > > > > with the commit
> >     > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > Anirudh
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> >     > > > > > > > > > > > > > > <ro...@gmail.com>
> >     > > > > > > > > > > wrote:
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > Hi Przemyslaw,
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > Is there an issue with more details to
> > track
> >     > the
> >     > > > problem?
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM
> Przemysław
> >     > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> >     > > > > > > > > > > > > > > > wrote:
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > -1
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > There is a crash in sockeye unit test
> > (python
> >     > > > > > > > > > > > > > > > > setup.py
> >     > > > > > > > > > > > > > > > > test) observed starting with nightly
> 1.5
> >     > build
> >     > > > > > > > > > > > > > > > > from
> >     > > > > > > > > > > > > > > > > 6/13 and still occuring in
> >     > > > > > > > > > > > > > > 1.5rc1. I
> >     > > > > > > > > > > > > > > > > don't yet have the exact commit that
> is
> >     > > > > > > > > > > > > > > > > responsible for it, but it is either
> >     > > > > > > > > > > > > > > > >
> a862270beb2d796c1ba311183f7f4a766a18ad6c
> >     > > > > > > > > > > > > > > > > (dlpack
> >     > > > > > > > > > > > > > > > > related) or
> >     > > > > > > > > > > > > > > > >
> 09202f7f261954383aa387144524d38f83f18d06
> >     > > > > > > > > > > > > > > > > (cached op
> >     > > > > > > > > > > > > optimization).
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> >     > > > > > > > > > > > > > > > > <ro...@gmail.com>
> >     > > > > wrote:
> >     > > > > > > > > > > > > > > > > > Dear MXNet community,
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > This is the 3-day vote to release
> > Apache
> >     > > > > > > > > > > > > > > > > > MXNet
> >     > > > > > > > > > > > > > > > > > (incubating) version
> >     > > > > > > > > > > > > > > > > 1.5.0.
> >     > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> >     > > > > > > > > > > > > > > > > > 23:59:59(PST) and close
> >     > > > > > > > > > > on
> >     > > > > > > > > > > > > > June
> >     > > > > > > > > > > > > > > > 22,
> >     > > > > > > > > > > > > > > > > > 23:59:59.
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > 1) Link to release notes:
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > >
> >     > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Re
> >     > > > > > > > > > > le
> >     > > > > > > > > > > ase+No
> >     > > > > > > > > > > te
> >     > > > > > > > > > > > > > > s
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > 2) Link to release candidate:
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > >
> >     > https://github.com/apache/incubator-mxnet/releases/tag/1.5
> >     > > > > > > > > > > .0
> >     > > > > > > > > > > .r
> >     > > > > > > > > > > > > > > > > > c1
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > 3) Link to source and signatures on
> > apache
> >     > > > dist server:
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > >
> >     > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5
> >     > > > > > > > > > > .0
> >     > > > > > > > > > > .r
> >     > > > > > > > > > > > > > > > > > c1/
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > Please remember to TEST first
> before
> > voting
> >     > > > > accordingly:
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > +1 = approve
> >     > > > > > > > > > > > > > > > > > +0 = no opinion
> >     > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> >     > > > > > > > > > > > > > > > > > --
> >     > > > > > > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > --
> >     > > > > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > --
> >     > > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > >
> >     > > > > > > > > > >
> >     > > > > > > > > > --
> >     > > > > > > > > > Best Regards
> >     > > > > > > > > >
> >     > > > > > > > > > Lai
> >     > > >
> >     > > --
> >     > > Best Regards
> >     > >
> >     > > Lai
> >     >
> >     >
> >
> >     --
> >     Sandeep Krishnamurthy
>
> --
Best Regards

Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
Thanks Manu.

@all: I observed some other strange behavior that I don't understand at the moment:

I installed the 1.5 rc from pip to check that I'm not doing something
wrong when building, and found that CPU utilization is quite subpar
( https://imgur.com/fRmbQNc ) compared to a version compiled from
source. The pip package uses 4-5 of the 32 cores, whereas when I
compile from source I get good core utilization
( https://imgur.com/e8BB425 ). I verified this on both a c5d.18xlarge
and a 32-core AMD bare metal machine.
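
One way to make runs like these easier to compare is to pin the OpenMP
thread settings explicitly, as Ciyong did in the message quoted further
down. A minimal sketch, assuming the benchmark is launched as a
subprocess. The helper name pinned_env is hypothetical, and KMP_AFFINITY
is only honoured by the Intel/LLVM OpenMP runtimes, so it would have no
effect on a build linked against libgomp:

```python
import os


def pinned_env(num_threads, base_env=None):
    """Return a copy of the environment with OpenMP thread settings applied.

    `pinned_env` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    # Value copied from the settings quoted in this thread; libgomp ignores it.
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    return env


if __name__ == "__main__":
    env = pinned_env(18, base_env={})
    print(env["OMP_NUM_THREADS"], env["KMP_AFFINITY"])
```

The resulting dict could then be passed as env= to subprocess.run when
launching cifar10.py, so both MXNet versions run with the same settings.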

It also seems to me that the pip version is using gomp instead of
LLVM's omp. I'm not sure why.

pip install mxnet==1.5.0b20190627
/home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)
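
The same check can be scripted so it is easy to repeat across builds. A
rough sketch, assuming the ldd output has already been captured as text.
The function name openmp_runtimes and the set of library names it looks
for (libgomp, libomp, libiomp5) are my own choices for illustration:

```python
import re


def openmp_runtimes(ldd_output):
    """Return the OpenMP runtime library names found in `ldd` output.

    Hypothetical helper: looks for libgomp (GNU), libomp (LLVM) and
    libiomp5 (Intel) at the start of each line.
    """
    pattern = re.compile(
        r"^\s*(lib(?:gomp|omp|iomp5)\.so[.\d]*)\s", re.MULTILINE
    )
    return pattern.findall(ldd_output)


if __name__ == "__main__":
    sample = (
        "    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 "
        "(0x00007f99d1832000)\n"
    )
    print(openmp_runtimes(sample))
```

Running this against the ldd output of both the pip libmxnet.so and a
source build would show directly whether they link different runtimes.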

I tried cifar10 on a bare metal 32-core AMD Zen machine and it is
extremely slow and doesn't seem to make much progress compared to a
c5d.18xlarge. I couldn't even finish 1 epoch. I tried with and without
MKL, without much success. I will continue digging into this when possible.
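
For comparing timings like the ones quoted below on a common footing,
here is a small sketch of the arithmetic: average the per-epoch times
after a warm-up period, then express the difference as a percentage of
the baseline. The function names are hypothetical; the input numbers in
the example are the averages and speeds reported by Manu in this thread:

```python
def mean_after_warmup(epoch_seconds, warmup):
    """Average the per-epoch durations, skipping the first `warmup` epochs."""
    measured = epoch_seconds[warmup:]
    if not measured:
        raise ValueError("no epochs left after warm-up")
    return sum(measured) / len(measured)


def slowdown_percent(baseline_seconds, candidate_seconds):
    """Percentage increase in time of the candidate relative to the baseline."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


def throughput_drop_percent(baseline_rate, candidate_rate):
    """Percentage decrease in throughput relative to the baseline."""
    return (baseline_rate - candidate_rate) / baseline_rate * 100.0


if __name__ == "__main__":
    # cifar10 average epoch time in seconds: 1.4.1 vs 1.5.0
    print(round(slowdown_percent(164.23, 174.59), 1))
    # benchmark_gluon.py speed in img/s: 1.4.1 vs 1.5.0
    print(round(throughput_drop_percent(25.677534, 25.082130), 1))
```

Since individual runs vary noticeably, it is probably worth feeding this
the mean over several runs rather than a single measurement.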


Pedro.

On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <ma...@gmail.com> wrote:
>
> Hi all,
>
> I ran the same cifar10.py script as Pedro, but for 20 epochs. Considering
> the first 10 epochs for warm-up, I averaged time per epoch for the last 10
> epochs.
>
> With MXNet 1.4.1 average time is 164.23 s
> With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
>
>
> For a second data point, I ran Gluon speed test benchmark script -
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
> using the following command:
> python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
> --num-batches 200 --type 'training'
>
> I got the following speeds:
> With MXNet 1.4.1, average speed is 25.677534 img/s
> With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
>
> Note:
> For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
> For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
> corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
> behind 1.5.x branch by 4 commits
>
>
> Best,
> Manu
>
>
> On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <sa...@gmail.com>
> wrote:
>
>     Hello Ciyong/Pedro,
>
>     Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
>     cover all MXNet operators, not presented in the best possible way,
>     still WIP)
>
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
>     The following operators look slower in 1.5 compared to 1.4.1:
>     - BatchNorm
>     - Pooling
>     - FullyConnected
>     - batch_dot
>     - Dot
>     - broadcast_mul
>     - log_softmax
>     and a few other operators
>
>     Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
>     example Convolution, flatten, and elementwise operators. So it looks
>     likely that a few operators have regressed noticeably, but improvements
>     in other operators hide much of that regression, so the end effect is
>     not that significant. We need a more detailed per-operator performance
>     analysis. We will not be able to do this for the current release, but
>     we should have a more concrete way of determining such performance
>     regressions before the next release.
>
>     Setup:
>     1.5 => Built from source (head of 1.5.rc2 tag), with MKLDNN
>     1.4.1 => PyPi mxnet-mkl==1.4.1
>     Machine: C5.18X
>     No explicit environment variables were set
>     Operator benchmark code -
>     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>
>     Best,
>     Sandeep
>
>
>     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
>     wrote:
>
>     > I will try to run a few benchmarks in a bare metal instance tonight to
>     > remove virtualization variance for the measurements and provide some
>     > numbers.
>     >
>     > Please propose a set of models / examples that would be desirable to
>     > run before the release and provide a link to an easy to run script
>     > with instructions so we can validate the release better.
>     >
>     > Thank you.
>     >
>     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
>     > >
>     > > Dear @dev,
>     > >
>     > > I'm cancelling the vote for the cached op fix:
>     > >
>     > > https://github.com/apache/incubator-mxnet/pull/15298
>     > >
>     > > As for the possible CPU training regression, it does not look like a
>     > > blocker for now.
>     > >
>     > > I will start a new rc2 vote, please help to validate.
>     > >
>     > > Thanks!
>     > >
>     > >
>     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com>
>     > > wrote:
>     > >
>     > > > Hi Pedro,
>     > > >
>     > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
>     > > > v1.4; I was using 18 cores for computing) with your script on
>     > > > C5.18xlarge. But I needed to bind the cores with the commands below
>     > > > when running the script (without setting the env variables, I got
>     > > > close times (<1% difference) with v1.5 and v1.4):
>     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     > > >         export OMP_NUM_THREADS=18
>     > > >
>     > > > Did you set any env variables when running?
>     > > >
>     > > > The performance results I got are below:
>     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > > real    12m10.856s
>     > > > user    234m49.576s
>     > > > sys     4m38.044s
>     > > >
>     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > > real    12m52.140s
>     > > > user    246m30.740s
>     > > > sys     5m8.188s
>     > > >
>     > > > Looking at the profiling data, most of the ops have the same perf
>     > > > between v1.4 and v1.5, but some ops like "_backward_BatchNorm" and
>     > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
>     > > > Will do further analysis on these ops.
>     > > >
>     > > > Here's the hardware/OS info from my side:
>     > > > ----------Python Info----------
>     > > > Version      : 3.6.8
>     > > > Compiler     : GCC 7.3.0
>     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     > > > Arch         : ('64bit', '')
>     > > > ------------Pip Info-----------
>     > > > Version      : 19.0.3
>     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     > > > ----------MXNet Info-----------
>     > > > Version      : 1.5.0
>     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     > > > Hashtag not found. Not installed from pre-built package.
>     > > > ----------System Info----------
>     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     > > > system       : Linux
>     > > > node         : ip-172-31-32-129
>     > > > release      : 4.4.0-1085-aws
>     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
>     > > > ----------Hardware Info----------
>     > > > machine      : x86_64
>     > > > processor    : x86_64
>     > > > Architecture:          x86_64
>     > > > CPU op-mode(s):        32-bit, 64-bit
>     > > > Byte Order:            Little Endian
>     > > > CPU(s):                72
>     > > > On-line CPU(s) list:   0-71
>     > > > Thread(s) per core:    2
>     > > > Core(s) per socket:    18
>     > > > Socket(s):             2
>     > > > NUMA node(s):          2
>     > > > Vendor ID:             GenuineIntel
>     > > > CPU family:            6
>     > > > Model:                 85
>     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > Stepping:              3
>     > > > CPU MHz:               3000.000
>     > > > BogoMIPS:              6000.00
>     > > > Hypervisor vendor:     KVM
>     > > > Virtualization type:   full
>     > > > L1d cache:             32K
>     > > > L1i cache:             32K
>     > > > L2 cache:              1024K
>     > > > L3 cache:              25344K
>     > > > NUMA node0 CPU(s):     0-17,36-53
>     > > > NUMA node1 CPU(s):     18-35,54-71
>     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
>     > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
>     > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
>     > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
>     > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
>     > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
>     > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
>     > > > ----------Network Test----------
>     > > >
>     > > >
>     > > > -Ciyong
>     > > >
>     > > >
>     > > > -----Original Message-----
>     > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
>     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     > > > To: dev@mxnet.incubator.apache.org
>     > > > Cc: dev@mxnet.apache.org
>     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> 1.5.0.rc1
>     > > >
>     > > > Could we run more epochs to see the performance difference, or
>     > > > profile the difference between a good and a bad run?
>     > > >
>     > > > > -----Original Message-----
>     > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     > > > > To: dev@mxnet.incubator.apache.org
>     > > > > Cc: dev@mxnet.apache.org
>     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
>     > > > > 1.5.0.rc1
>     > > > >
>     > > > > I ran it again and the gap is bigger again. I guess we need to
>     > > > > average out the times across several runs:
>     > > > >
>     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
> --epochs 5
>     > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > threads
>     > > > > for decoding..
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > > for decoding..
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     > > > > Epoch 0, Batch 199, Speed=384.149839
>     > > > > Epoch 0, Duration=140.919567
>     > > > > Epoch 0, Training accuracy=0.115169
>     > > > > Epoch 0, Validation accuracy=0.141317
>     > > > > Epoch 1, Batch 199, Speed=433.380512
>     > > > > Epoch 1, Duration=119.553233
>     > > > > Epoch 1, Training accuracy=0.170956
>     > > > > Epoch 1, Validation accuracy=0.216146
>     > > > > Epoch 2, Batch 199, Speed=434.864699
>     > > > > Epoch 2, Duration=123.278490
>     > > > > Epoch 2, Training accuracy=0.209455
>     > > > > Epoch 2, Validation accuracy=0.247296
>     > > > > Epoch 3, Batch 199, Speed=433.401854
>     > > > > Epoch 3, Duration=118.327797
>     > > > > Epoch 3, Training accuracy=0.248701
>     > > > > Epoch 3, Validation accuracy=0.302083
>     > > > > Epoch 4, Batch 199, Speed=419.713707
>     > > > > Epoch 4, Duration=126.468409
>     > > > > Epoch 4, Training accuracy=0.260949
>     > > > > Epoch 4, Validation accuracy=0.269030
>     > > > >
>     > > > > real    10m55.796s
>     > > > > user    399m33.567s
>     > > > > sys     13m55.904s
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > threads
>     > > > > for decoding..
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > > for decoding..
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > Epoch 0, Batch 199, Speed=419.039188
>     > > > > Epoch 0, Duration=143.934903
>     > > > > Epoch 0, Training accuracy=0.122542
>     > > > > Epoch 0, Validation accuracy=0.164359
>     > > > > Epoch 1, Batch 199, Speed=445.257048
>     > > > > Epoch 1, Duration=135.248399
>     > > > > Epoch 1, Training accuracy=0.178828
>     > > > > Epoch 1, Validation accuracy=0.199419
>     > > > > Epoch 2, Batch 199, Speed=447.115215
>     > > > > Epoch 2, Duration=132.003770
>     > > > > Epoch 2, Training accuracy=0.217808
>     > > > > Epoch 2, Validation accuracy=0.233073
>     > > > > Epoch 3, Batch 199, Speed=441.079477
>     > > > > Epoch 3, Duration=126.543316
>     > > > > Epoch 3, Training accuracy=0.248102
>     > > > > Epoch 3, Validation accuracy=0.293870
>     > > > > Epoch 4, Batch 199, Speed=449.329787
>     > > > > Epoch 4, Duration=138.398325
>     > > > > Epoch 4, Training accuracy=0.270021
>     > > > > Epoch 4, Validation accuracy=0.311498
>     > > > >
>     > > > > real    11m45.329s
>     > > > > user    426m13.908s
>     > > > > sys     16m45.093s
>     > > > >
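The suggestion above to average the times across several runs could be sketched roughly as follows. This is only an illustration, not the procedure used in this thread: the helper names `time_command` and `summarize`, the repeat count, and the stand-in command are all assumptions.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation) of a list of timings."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Stand-in workload (hypothetical). For a real comparison this would be
    # the benchmark invocation, once per MXNet virtualenv.
    demo_cmd = [sys.executable, "-c", "pass"]
    mean, stdev = summarize(time_command(demo_cmd, repeats=3))
    print(f"mean={mean:.3f}s stdev={stdev:.3f}s")
```

Reporting the standard deviation next to the mean makes it easier to judge whether a gap between two versions is larger than the run-to-run noise.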
>     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     > > > > <pe...@gmail.com> wrote:
>     > > > > >
>     > > > > > The difference looks smaller now, more like your numbers. I
>     > > > > > wonder if something happened during the previous benchmark,
>     > > > > > like a system update...
>     > > > > >
>     > > > > >
>     > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
>     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     > > > > > Epoch 0, Duration=134.868458
>     > > > > > Epoch 0, Training accuracy=0.127238
>     > > > > > Epoch 0, Validation accuracy=0.206388
>     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     > > > > > Epoch 1, Duration=128.041775
>     > > > > > Epoch 1, Training accuracy=0.182065
>     > > > > > Epoch 1, Validation accuracy=0.202524
>     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     > > > > > Epoch 2, Duration=124.920588
>     > > > > > Epoch 2, Training accuracy=0.202584
>     > > > > > Epoch 2, Validation accuracy=0.245693
>     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     > > > > > Epoch 3, Duration=120.948349
>     > > > > > Epoch 3, Training accuracy=0.235854
>     > > > > > Epoch 3, Validation accuracy=0.291066
>     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     > > > > > Epoch 4, Duration=130.181724
>     > > > > > Epoch 4, Training accuracy=0.257773
>     > > > > > Epoch 4, Validation accuracy=0.304988
>     > > > > >
>     > > > > > real    11m7.356s
>     > > > > > user    406m9.910s
>     > > > > > sys     14m18.349s
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > > Epoch 0, Batch 199, Speed=348.618154
>     > > > > > Epoch 0, Duration=146.469352
>     > > > > > Epoch 0, Training accuracy=0.124121
>     > > > > > Epoch 0, Validation accuracy=0.167227
>     > > > > > Epoch 1, Batch 199, Speed=452.790825
>     > > > > > Epoch 1, Duration=130.199421
>     > > > > > Epoch 1, Training accuracy=0.183863
>     > > > > > Epoch 1, Validation accuracy=0.237079
>     > > > > > Epoch 2, Batch 199, Speed=451.406559
>     > > > > > Epoch 2, Duration=126.320823
>     > > > > > Epoch 2, Training accuracy=0.214844
>     > > > > > Epoch 2, Validation accuracy=0.244692
>     > > > > > Epoch 3, Batch 199, Speed=403.161873
>     > > > > > Epoch 3, Duration=125.331660
>     > > > > > Epoch 3, Training accuracy=0.243506
>     > > > > > Epoch 3, Validation accuracy=0.301182
>     > > > > > Epoch 4, Batch 199, Speed=450.826598
>     > > > > > Epoch 4, Duration=126.426253
>     > > > > > Epoch 4, Training accuracy=0.266424
>     > > > > > Epoch 4, Validation accuracy=0.311899
>     > > > > >
>     > > > > > real    11m21.930s
>     > > > > > user    415m3.855s
>     > > > > > sys     13m53.975s
>     > > > > >
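As a rough way to compare the per-epoch durations in the two logs above, rather than only the total `real` time, the "Epoch N, Duration=..." entries can be pulled out with a regular expression. This is a minimal sketch: the function names and the `skip_first` option are assumptions made for illustration, and the sample strings simply repeat the durations quoted above (1.4 run first, then the 1.5 run).

```python
import re
import statistics

# Matches entries such as "Epoch 3, Duration=120.948349", even when several
# entries have been fused onto one line by a mail client.
DURATION_RE = re.compile(r"Epoch (\d+), Duration=([0-9.]+)")


def epoch_durations(log_text):
    """Return {epoch: seconds} for every 'Epoch N, Duration=...' entry."""
    return {int(e): float(d) for e, d in DURATION_RE.findall(log_text)}


def mean_duration(log_text, skip_first=False):
    """Mean epoch duration; optionally drop epoch 0 as a warm-up epoch."""
    durations = epoch_durations(log_text)
    values = [d for e, d in sorted(durations.items())
              if not (skip_first and e == 0)]
    return statistics.mean(values)


log_14 = ("Epoch 0, Duration=134.868458 Epoch 1, Duration=128.041775 "
          "Epoch 2, Duration=124.920588 Epoch 3, Duration=120.948349 "
          "Epoch 4, Duration=130.181724")
log_15 = ("Epoch 0, Duration=146.469352 Epoch 1, Duration=130.199421 "
          "Epoch 2, Duration=126.320823 Epoch 3, Duration=125.331660 "
          "Epoch 4, Duration=126.426253")

for name, log in (("1.4", log_14), ("1.5", log_15)):
    print(name, f"{mean_duration(log):.3f}s per epoch")
```

Dropping epoch 0 with `skip_first=True` is one way to keep start-up costs out of the comparison. Whether that is appropriate here is a judgment call.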
>     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     > > > > > <pe...@gmail.com> wrote:
>     > > > > > >
>     > > > > > > Hi Ciyong, thanks for trying to reproduce:
>     > > > > > >
>     > > > > > > I used this one:
>     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     > > > > > >
>     > > > > > > Could you provide hardware and OS details?
>     > > > > > >
>     > > > > > > I will rerun and repost numbers in a few minutes.
>     > > > > > >
>     > > > > > > Pedro.
>     > > > > > >
>     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     > > > > > > <ci...@intel.com>
>     > > > > wrote:
>     > > > > > > >
>     > > > > > > > Hi Pedro,
>     > > > > > > >
>     > > > > > > > I'm looking at this case, using the script
>     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     > > > > > > > to get the timing data, but there seems to be little
>     > > > > > > > difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on
>     > > > > > > > C5.18xlarge.
>     > > > > > > >
>     > > > > > > > I'm not sure whether the Python scripts differ. Can you
>     > > > > > > > point me to the link for your script (cifar10.py)?
>     > > > > > > > Or you could also try MXNet's script (train_cifar10.py)
>     > > > > > > > and see the performance.
>     > > > > > > >
>     > > > > > > > Here's the command I used to collect the time:
>     > > > > > > >         python train_cifar10.py --num-epoch=5
>     > > > > > > >
>     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > > > > > >         real    9m4.880s
>     > > > > > > >         user    333m13.340s
>     > > > > > > >         sys     14m36.100s
>     > > > > > > >
>     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > > > > > >         real    9m2.155s
>     > > > > > > >         user    329m37.092s
>     > > > > > > >         sys     16m8.668s
>     > > > > > > >
>     > > > > > > > -Ciyong
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > -----Original Message-----
>     > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > Cc: dev@mxnet.apache.org
>     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> version
>     > > > > > > > 1.5.0.rc1
>     > > > > > > >
>     > > > > > > > Hi these were my build flags and system info:
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > --- # CMake configuration
>     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
>     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
>     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
>     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
>     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
>     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>     > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
>     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>     > > > > > > > CMAKE_BUILD_TYPE: "Release"
>     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>     > > > > > > >
>     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>     > > > > > > >
>     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     > > > > > > > c5d.18xlarge
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > Version      : 3.6.7
>     > > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > > ------------Pip Info-----------
>     > > > > > > > Version      : 19.1.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
>     > > > > > > > ----------MXNet Info-----------
>     > > > > > > > Version      : 1.5.0
>     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > > ----------System Info----------
>     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > > system       : Linux
>     > > > > > > > node         : ip-172-31-63-171
>     > > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > > ----------Hardware Info----------
>     > > > > > > > machine      : x86_64
>     > > > > > > > processor    : x86_64
>     > > > > > > > Architecture:        x86_64
>     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > > Byte Order:          Little Endian
>     > > > > > > > CPU(s):              72
>     > > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > > Thread(s) per core:  2
>     > > > > > > > Core(s) per socket:  18
>     > > > > > > > Socket(s):           2
>     > > > > > > > NUMA node(s):        2
>     > > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > > CPU family:          6
>     > > > > > > > Model:               85
>     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > > > > > Stepping:            4
>     > > > > > > > CPU MHz:             1326.446
>     > > > > > > > BogoMIPS:            6000.00
>     > > > > > > > Hypervisor vendor:   KVM
>     > > > > > > > Virtualization type: full
>     > > > > > > > L1d cache:           32K
>     > > > > > > > L1i cache:           32K
>     > > > > > > > L2 cache:            1024K
>     > > > > > > > L3 cache:            25344K
>     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
>     > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1
>     > > > > > > > hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed
>     > > > > > > > adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt
>     > > > > > > > xsavec xgetbv1 xsaves ida arat pku ospke
>     > > > > > > > ----------Network Test----------
>     > > > > > > >
>     > > > > > > > ----------Python Info----------
>     > > > > > > > Version      : 3.6.7
>     > > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > > ------------Pip Info-----------
>     > > > > > > > Version      : 19.1.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
>     > > > > > > > ----------MXNet Info-----------
>     > > > > > > > Version      : 1.4.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > > ----------System Info----------
>     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > > system       : Linux
>     > > > > > > > node         : ip-172-31-63-171
>     > > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > > ----------Hardware Info----------
>     > > > > > > > machine      : x86_64
>     > > > > > > > processor    : x86_64
>     > > > > > > > Architecture:        x86_64
>     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > > Byte Order:          Little Endian
>     > > > > > > > CPU(s):              72
>     > > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > > Thread(s) per core:  2
>     > > > > > > > Core(s) per socket:  18
>     > > > > > > > Socket(s):           2
>     > > > > > > > NUMA node(s):        2
>     > > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > > CPU family:          6
>     > > > > > > > Model:               85
>     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > > > > > Stepping:            4
>     > > > > > > > CPU MHz:             1223.344
>     > > > > > > > BogoMIPS:            6000.00
>     > > > > > > > Hypervisor vendor:   KVM
>     > > > > > > > Virtualization type: full
>     > > > > > > > L1d cache:           32K
>     > > > > > > > L1i cache:           32K
>     > > > > > > > L2 cache:            1024K
>     > > > > > > > L3 cache:            25344K
>     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
>     > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1
>     > > > > > > > hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed
>     > > > > > > > adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt
>     > > > > > > > xsavec xgetbv1 xsaves ida arat pku ospke
>     > > > > > > > ----------Network Test----------
>     > > > > > > >
>     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>     > > > > <pe...@gmail.com> wrote:
>     > > > > > > > >
>     > > > > > > > > I trained cifar10 on CPU and there seems to be a
>     > > > > > > > > regression: roughly a 7% increase in training time
>     > > > > > > > > against 1.4.1:
>     > > > > > > > >
>     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
>     > > > > > > > > real    11m30.388s
>     > > > > > > > > user    417m7.766s
>     > > > > > > > > sys     16m57.315s
>     > > > > > > > >
>     > > > > > > > > VS 1.4.1:
>     > > > > > > > > real    10m41.994s
>     > > > > > > > > user    392m40.646s
>     > > > > > > > > sys     12m30.601s
>     > > > > > > > >
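The "range of 7%" figure can be checked directly from the two `real` values quoted above. The snippet below is a small sketch: the helper names `to_seconds` and `percent_change` are made up for this example, and it only handles the `NmS.SSSs` form that appears in this thread.

```python
import re

# Matches the minutes/seconds form printed by the shell, e.g. "11m30.388s".
TIME_RE = re.compile(r"(\d+)m([0-9.]+)s")


def to_seconds(stamp):
    """Convert a value such as '11m30.388s' to seconds."""
    match = TIME_RE.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_change(baseline, candidate):
    """Relative change of candidate against baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


real_14 = to_seconds("10m41.994s")  # 1.4.1 wall-clock time from the message
real_15 = to_seconds("11m30.388s")  # 1.5.0 wall-clock time from the message
print(f"{percent_change(real_14, real_15):.1f}% change in wall-clock time")
```

This compares a single run of each version, so the result carries whatever run-to-run variance the machine has.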
>     > > > > > > > >
>     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
>     > royweilai@gmail.com>
>     > > > > wrote:
>     > > > > > > > > >
>     > > > > > > > > > Hi Anirudh,
>     > > > > > > > > >
>     > > > > > > > > > Thanks for jumping into this quickly. I followed up on
>     > > > > > > > > > the issue.
>     > > > > > > > > >
>     > > > > > > > > > I meant for the sockeye developers/maintainers to help
>     > > > > > > > > > set up nightly tests and raise issues early.
>     > > > > > > > > >
>     > > > > > > > > > Thanks!
>     > > > > > > > > >
>     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>     > > > > > > > > > <ha...@gmail.com>
>     > > > > > > > > > wrote:
>     > > > > > > > > >
>     > > > > > > > > > > In GluonNLP we test with the MXNet nightly build for
>     > > > > > > > > > > each PR, and the CI did catch some MXNet-related
>     > > > > > > > > > > issues.
>     > > > > > > > > > > I recommend that other toolkits also add integration
>     > > > > > > > > > > tests with the MXNet nightly build.
>     > > > > > > > > > > It helps identify issues early.
>     > > > > > > > > > >
>     > > > > > > > > > > Best,
>     > > > > > > > > > > Haibin
>     > > > > > > > > > >
>     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>     > > > > > > > > > > <pa...@intel.com>
>     > > > > wrote:
>     > > > > > > > > > >
>     > > > > > > > > > > > Thanks for raising the issue; we will take a look
>     > > > > > > > > > > > ASAP.
>     > > > > > > > > > > >
>     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so
>     > > > > > > > > > > > it's hard for MXNet developers to catch potential
>     > > > > > > > > > > > bugs or performance degradation.
>     > > > > > > > > > > >
>     > > > > > > > > > > > In the future, I suggest adding the major downstream
>     > > > > > > > > > > > test cases, such as those from sockeye, GluonNLP,
>     > > > > > > > > > > > GluonCV, DGL, and Gluon-TS, to the nightly tests.
>     > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
>     > > > > > > > > > > > monthly :)
>     > > > > > > > > > > >
>     > > > > > > > > > > > Thanks,
>     > > > > > > > > > > >
>     > > > > > > > > > > > --Patric
>     > > > > > > > > > > >
>     > > > > > > > > > > > > -----Original Message-----
>     > > > > > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
>     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > > > > > > Cc: dev@mxnet.apache.org
>     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Hi Lai,
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > I have opened an issue:
>     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>     > > > > > > > > > > > > I came to know about this issue only today, and I
>     > > > > > > > > > > > > have not been monitoring sockeye.
>     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
>     > > > > > > > > > > > > caused by the dlpack changes.
>     > > > > > > > > > > > > Also, I don't think sockeye CI checks against
>     > > > > > > > > > > > > master; it is using 1.4.1.
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Anirudh
>     > > > > > > > > > > > >
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>     > > > > > > > > > > > > <ro...@gmail.com>
>     > > > > wrote:
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > > Hi,
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Could you share which test failed and what the
>     > > > > > > > > > > > > > crash is? How can it be reproduced?
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > I was able to install sockeye, and all tests
>     > > > > > > > > > > > > > passed when run with python setup.py test
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > I have tested both the nightly pip package and
>     > > > > > > > > > > > > > 1.5.0.rc1.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > It would be great to create an issue with
>     > > > > > > > > > > > > > reproducible steps and move the discussion there.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
>     > > > > > > > > > > > > > failing for some time. If it's due to an MXNet
>     > > > > > > > > > > > > > change, please raise this early so we can track
>     > > > > > > > > > > > > > and solve it in time rather than block the release
>     > > > > > > > > > > > > > during vote time.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2290@gmail.com> wrote:
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
>     > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
>     > > > > > > > > > > > > > > with the commit
>     > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > Anirudh
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
>     > > > > > > > > > > > > > > <ro...@gmail.com>
>     > > > > > > > > > > wrote:
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Hi Przemyslaw,
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Is there an issue with more details to track the
>     > > > > > > > > > > > > > > > problem?
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
>     > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
>     > > > > > > > > > > > > > > > wrote:
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > -1
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > There is a crash in the sockeye unit test
>     > > > > > > > > > > > > > > > > (python setup.py test), observed starting with
>     > > > > > > > > > > > > > > > > the nightly 1.5 build from 6/13 and still
>     > > > > > > > > > > > > > > > > occurring in 1.5rc1. I don't yet have the exact
>     > > > > > > > > > > > > > > > > commit that is responsible for it, but it is
>     > > > > > > > > > > > > > > > > either
>     > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
>     > > > > > > > > > > > > > > > > (dlpack related) or
>     > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
>     > > > > > > > > > > > > > > > > (cached op optimization).
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
>     > > > > > > > > > > > > > > > > <ro...@gmail.com>
>     > > > > wrote:
>     > > > > > > > > > > > > > > > > > Dear MXNet community,
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version
>     > > > > > > > > > > > > > > > > > 1.5.0. Voting on dev@ will start June 19, 23:59:59(PST) and close on
>     > > > > > > > > > > > > > > > > > June 22, 23:59:59.
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 1) Link to release notes:
>     > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 2) Link to release candidate:
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > +1 = approve
>     > > > > > > > > > > > > > > > > > +0 = no opinion
>     > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
>     > > > > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > >
>     > > > > > > > > > > >
>     > > > > > > > > > >
>     > > > > > > > > > --
>     > > > > > > > > > Best Regards
>     > > > > > > > > >
>     > > > > > > > > > Lai
>     > > >
>     > > --
>     > > Best Regards
>     > >
>     > > Lai
>     >
>     >
>
>     --
>     Sandeep Krishnamurthy


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Manu Seth <ma...@gmail.com>.
Hi all,

I ran the same cifar10.py script as Pedro, but for 20 epochs. Treating
the first 10 epochs as warm-up, I averaged the time per epoch over the
last 10 epochs.

With MXNet 1.4.1 average time is 164.23 s
With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)


For a second data point, I ran the Gluon speed test benchmark script
(https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py)
using the following command:
python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128 --num-batches 200 --type 'training'

I got the following speeds:
With MXNet 1.4.1, average speed is 25.677534 img/s
With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)

Note:
For version 1.4.1, I used pip install mxnet-mkl==1.4.1
For version 1.5.0, I used pip install mxnet-mkl==1.5.0b20190619, which
corresponds to commit ccbbf6b4b76ea536a6583c99497c83b65a20817b and is
4 commits behind the 1.5.x branch
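
For anyone who wants to redo the arithmetic on their own runs, here is a minimal sketch of how the two percentages above can be derived. The helper names (mean_after_warmup, percent_regression) and the short example list of durations are made up for illustration; only the four averages (164.23 s, 174.59 s, 25.677534 img/s, 25.082130 img/s) come from the runs reported above.

```python
def mean_after_warmup(durations, warmup):
    """Average the per-epoch durations, skipping the first `warmup` epochs."""
    measured = durations[warmup:]
    if not measured:
        raise ValueError("no epochs left after discarding warm-up")
    return sum(measured) / len(measured)


def percent_regression(baseline, candidate, higher_is_better=False):
    """Return how much worse `candidate` is than `baseline`, in percent.

    For times (lower is better) a larger candidate is a regression;
    for throughput (higher is better) a smaller candidate is a regression.
    """
    if higher_is_better:
        return (baseline - candidate) / baseline * 100.0
    return (candidate - baseline) / baseline * 100.0


# Hypothetical durations, only to show the warm-up handling.
example_mean = mean_after_warmup([10.0, 2.0, 4.0], warmup=1)

# Averages reported in this message.
epoch_time = percent_regression(164.23, 174.59)
throughput = percent_regression(25.677534, 25.082130, higher_is_better=True)

print(f"epoch time regression: {epoch_time:.1f}%")
print(f"throughput regression: {throughput:.1f}%")
```

Passing higher_is_better=True for img/s keeps the sign convention the same for both metrics, so a positive number always means 1.5.0 is slower.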


Best,
Manu


On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <sa...@gmail.com>
wrote:

    Hello Ciyong/Pedro,

    Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
    cover all MXNet operators, not presented in the best possible way, still
    WIP)

    https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50

    The following operators look slower in 1.5 compared to 1.4.1:
    - BatchNorm
    - Pooling
    - FullyConnected
    - batch_dot
    - Dot
    - broadcast_mul
    - log_softmax
    and a few other operators

    Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
    example Convolution, flatten, and elementwise operators. So it looks
    likely that a few operators have regressed noticeably; however, because
    of performance improvements in other operators, the end-to-end effect is
    not that significant, which hides a lot of the regression. We need a
    more detailed analysis of per-operator performance. We will not be able
    to do this for the current release, but we should have a more concrete
    way of determining such performance regressions before the next release.

    Setup:
    1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
    1.4.1 => PyPi mxnet-mkl==1.4.1
    Machine: C5.18X
    No explicit environment variables were set
    Operator benchmark code -
    https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
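
A per-operator comparison of this kind can also be approximated without the profiler, using only Python's timeit module. The sketch below is not the opperf tool linked above; time_callable, slowdown_ratio, and stand_in_op are hypothetical names, and the stand-in workload is plain Python so the snippet runs without MXNet installed. With MXNet, the callable would invoke the operator and then block on the result (for example with mx.nd.waitall()), since execution is asynchronous.

```python
import statistics
import timeit


def time_callable(fn, warmup=3, repeat=5, number=10):
    """Return the median wall-clock seconds per call of `fn`.

    `warmup` calls are made first and discarded; then `number` calls are
    timed as one batch, and the batch is repeated `repeat` times.
    """
    for _ in range(warmup):
        fn()
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.median(per_call)


def slowdown_ratio(old_seconds, new_seconds):
    """Ratio above 1.0 means the new version is slower than the old one."""
    return new_seconds / old_seconds


def stand_in_op():
    # Placeholder workload; with MXNet this would call one operator and
    # wait for the result before returning.
    return sum(i * i for i in range(1000))


median_seconds = time_callable(stand_in_op)
print(f"median per call: {median_seconds * 1e6:.1f} us")
```

Running the same script once in each environment (1.4.1 and 1.5) and feeding the two medians to slowdown_ratio gives one number per operator that is independent of any profiler changes between releases.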

    Best,
    Sandeep


    On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
    wrote:

    > I will try to run a few benchmarks in a bare metal instance tonight to
    > remove virtualization variance for the measurements and provide some
    > numbers.
    >
    > Please propose a set of models / examples that would be desirable to
    > run before the release and provide a link to an easy to run script
    > with instructions so we can validate the release better.
    >
    > Thank you.
    >
    > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
    > >
    > > Dear @dev,
    > >
    > > I'm cancelling the vote for the cached op fix:
    > >
    > > https://github.com/apache/incubator-mxnet/pull/15298
    > >
    > > As for the possible CPU training regression, it does not look like a
    > > blocker for now.
    > >
    > > I will start a new rc2 vote, please help to validate.
    > >
    > > Thanks!
    > >
    > >
    > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com>
    > > wrote:
    > >
    > > > Hi Pedro,
    > > >
    > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
    > > > v1.4; I was using 18 cores for computing) with your script on
    > > > C5.18xlarge. But I needed to bind the cores with the commands below
    > > > when running the script (without setting the env variables, I got
    > > > close times (<1% difference) with v1.5 and v1.4):
    > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
    > > >         export OMP_NUM_THREADS=18
    > > >
    > > > Did you set any env variables when running?
    > > >
    > > > The performance results I got are below:
    > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    > > > real    12m10.856s
    > > > user    234m49.576s
    > > > sys     4m38.044s
    > > >
    > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    > > > real    12m52.140s
    > > > user    246m30.740s
    > > > sys     5m8.188s
    > > >
    > > > Looking at the profiling data, most of the ops have the same perf
    > > > between v1.4 and v1.5. But some ops like "_backward_BatchNorm" and
    > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
    > > > Will do further analysis on these ops.
    > > >
    > > > Here's the hardware/OS info from my side:
    > > > ----------Python Info----------
    > > > Version      : 3.6.8
    > > > Compiler     : GCC 7.3.0
    > > > Build        : ('default', 'Dec 30 2018 01:22:34')
    > > > Arch         : ('64bit', '')
    > > > ------------Pip Info-----------
    > > > Version      : 19.0.3
    > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
    > > > ----------MXNet Info-----------
    > > > Version      : 1.5.0
    > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
    > > > Hashtag not found. Not installed from pre-built package.
    > > > ----------System Info----------
    > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
    > > > system       : Linux
    > > > node         : ip-172-31-32-129
    > > > release      : 4.4.0-1085-aws
    > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
    > > > ----------Hardware Info----------
    > > > machine      : x86_64
    > > > processor    : x86_64
    > > > Architecture:          x86_64
    > > > CPU op-mode(s):        32-bit, 64-bit
    > > > Byte Order:            Little Endian
    > > > CPU(s):                72
    > > > On-line CPU(s) list:   0-71
    > > > Thread(s) per core:    2
    > > > Core(s) per socket:    18
    > > > Socket(s):             2
    > > > NUMA node(s):          2
    > > > Vendor ID:             GenuineIntel
    > > > CPU family:            6
    > > > Model:                 85
    > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > > > Stepping:              3
    > > > CPU MHz:               3000.000
    > > > BogoMIPS:              6000.00
    > > > Hypervisor vendor:     KVM
    > > > Virtualization type:   full
    > > > L1d cache:             32K
    > > > L1i cache:             32K
    > > > L2 cache:              1024K
    > > > L3 cache:              25344K
    > > > NUMA node0 CPU(s):     0-17,36-53
    > > > NUMA node1 CPU(s):     18-35,54-71
    > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
    > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
    > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
    > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
    > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
    > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
    > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
    > > > ----------Network Test----------
    > > >
    > > >
    > > > -Ciyong
    > > >
    > > >
    > > > -----Original Message-----
    > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
    > > > Sent: Thursday, June 27, 2019 9:55 AM
    > > > To: dev@mxnet.incubator.apache.org
    > > > Cc: dev@mxnet.apache.org
    > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    > > >
    > > > Could we run more epochs to see the performance difference, or profile
    > > > the difference between the good and bad runs?
    > > >
    > > > > -----Original Message-----
    > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    > > > > Sent: Thursday, June 27, 2019 9:35 AM
    > > > > To: dev@mxnet.incubator.apache.org
    > > > > Cc: dev@mxnet.apache.org
    > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
    > > > > 1.5.0.rc1
    > > > >
    > > > > I ran it again and the gap is bigger again. I guess we need to
    > > > > average out the times across several runs:
    > > > >
    > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > > > > Epoch 0, Changed learning rate to 0.05
    > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    > > > > Epoch 0, Batch 199, Speed=384.149839
    > > > > Epoch 0, Duration=140.919567
    > > > > Epoch 0, Training accuracy=0.115169
    > > > > Epoch 0, Validation accuracy=0.141317
    > > > > Epoch 1, Batch 199, Speed=433.380512
    > > > > Epoch 1, Duration=119.553233
    > > > > Epoch 1, Training accuracy=0.170956
    > > > > Epoch 1, Validation accuracy=0.216146
    > > > > Epoch 2, Batch 199, Speed=434.864699
    > > > > Epoch 2, Duration=123.278490
    > > > > Epoch 2, Training accuracy=0.209455
    > > > > Epoch 2, Validation accuracy=0.247296
    > > > > Epoch 3, Batch 199, Speed=433.401854
    > > > > Epoch 3, Duration=118.327797
    > > > > Epoch 3, Training accuracy=0.248701
    > > > > Epoch 3, Validation accuracy=0.302083
    > > > > Epoch 4, Batch 199, Speed=419.713707
    > > > > Epoch 4, Duration=126.468409
    > > > > Epoch 4, Training accuracy=0.260949
    > > > > Epoch 4, Validation accuracy=0.269030
    > > > >
    > > > > real    10m55.796s
    > > > > user    399m33.567s
    > > > > sys     13m55.904s
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > > > > Epoch 0, Changed learning rate to 0.05
    > > > > Epoch 0, Batch 199, Speed=419.039188
    > > > > Epoch 0, Duration=143.934903
    > > > > Epoch 0, Training accuracy=0.122542
    > > > > Epoch 0, Validation accuracy=0.164359
    > > > > Epoch 1, Batch 199, Speed=445.257048
    > > > > Epoch 1, Duration=135.248399
    > > > > Epoch 1, Training accuracy=0.178828
    > > > > Epoch 1, Validation accuracy=0.199419
    > > > > Epoch 2, Batch 199, Speed=447.115215
    > > > > Epoch 2, Duration=132.003770
    > > > > Epoch 2, Training accuracy=0.217808
    > > > > Epoch 2, Validation accuracy=0.233073
    > > > > Epoch 3, Batch 199, Speed=441.079477
    > > > > Epoch 3, Duration=126.543316
    > > > > Epoch 3, Training accuracy=0.248102
    > > > > Epoch 3, Validation accuracy=0.293870
    > > > > Epoch 4, Batch 199, Speed=449.329787
    > > > > Epoch 4, Duration=138.398325
    > > > > Epoch 4, Training accuracy=0.270021
    > > > > Epoch 4, Validation accuracy=0.311498
    > > > >
    > > > > real    11m45.329s
    > > > > user    426m13.908s
    > > > > sys     16m45.093s
    > > > >
    > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
    > > > > <pe...@gmail.com> wrote:
    > > > > >
    > > > > > The difference looks smaller now, more like your numbers. I wonder
    > > > > > if something happened during the previous benchmark, like a system
    > > > > > update...
    > > > > >
    > > > > >
    > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > > > > > Epoch 0, Changed learning rate to 0.05
    > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
    > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
    > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
    > > > > > Epoch 0, Batch 199, Speed=426.182733
    > > > > > Epoch 0, Duration=134.868458
    > > > > > Epoch 0, Training accuracy=0.127238
    > > > > > Epoch 0, Validation accuracy=0.206388
    > > > > > Epoch 1, Batch 199, Speed=313.127156
    > > > > > Epoch 1, Duration=128.041775
    > > > > > Epoch 1, Training accuracy=0.182065
    > > > > > Epoch 1, Validation accuracy=0.202524
    > > > > > Epoch 2, Batch 199, Speed=410.931187
    > > > > > Epoch 2, Duration=124.920588
    > > > > > Epoch 2, Training accuracy=0.202584
    > > > > > Epoch 2, Validation accuracy=0.245693
    > > > > > Epoch 3, Batch 199, Speed=419.119335
    > > > > > Epoch 3, Duration=120.948349
    > > > > > Epoch 3, Training accuracy=0.235854
    > > > > > Epoch 3, Validation accuracy=0.291066
    > > > > > Epoch 4, Batch 199, Speed=430.473733
    > > > > > Epoch 4, Duration=130.181724
    > > > > > Epoch 4, Training accuracy=0.257773
    > > > > > Epoch 4, Validation accuracy=0.304988
    > > > > >
    > > > > > real    11m7.356s
    > > > > > user    406m9.910s
    > > > > > sys     14m18.349s
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
    > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
    > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
    > > > > > Epoch 0, Changed learning rate to 0.05
    > > > > > Epoch 0, Batch 199, Speed=348.618154
    > > > > > Epoch 0, Duration=146.469352
    > > > > > Epoch 0, Training accuracy=0.124121
    > > > > > Epoch 0, Validation accuracy=0.167227
    > > > > > Epoch 1, Batch 199, Speed=452.790825
    > > > > > Epoch 1, Duration=130.199421
    > > > > > Epoch 1, Training accuracy=0.183863
    > > > > > Epoch 1, Validation accuracy=0.237079
    > > > > > Epoch 2, Batch 199, Speed=451.406559
    > > > > > Epoch 2, Duration=126.320823
    > > > > > Epoch 2, Training accuracy=0.214844
    > > > > > Epoch 2, Validation accuracy=0.244692
    > > > > > Epoch 3, Batch 199, Speed=403.161873
    > > > > > Epoch 3, Duration=125.331660
    > > > > > Epoch 3, Training accuracy=0.243506
    > > > > > Epoch 3, Validation accuracy=0.301182
    > > > > > Epoch 4, Batch 199, Speed=450.826598
    > > > > > Epoch 4, Duration=126.426253
    > > > > > Epoch 4, Training accuracy=0.266424
    > > > > > Epoch 4, Validation accuracy=0.311899
    > > > > >
    > > > > > real    11m21.930s
    > > > > > user    415m3.855s
    > > > > > sys     13m53.975s
    > > > > >
    > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
    > > > > > <pe...@gmail.com> wrote:
    > > > > > >
    > > > > > > Hi Ciyong, thanks for trying to reproduce:
    > > > > > >
    > > > > > > I used this one:
    > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
    > > > > > >
    > > > > > > Could you provide hardware and OS details?
    > > > > > >
    > > > > > > I will rerun and repost numbers in a few minutes.
    > > > > > >
    > > > > > > Pedro.
    > > > > > >
    > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
    > > > > > > <ci...@intel.com>
    > > > > wrote:
    > > > > > > >
    > > > > > > > Hi Pedro,
    > > > > > > >
    > > > > > > > I'm looking at this case, using the script
    > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
    > > > > > > > to get the timing data, but it seems there's not much difference
    > > > > > > > between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
    > > > > > > >
    > > > > > > > I'm not sure whether there's any difference in the Python script;
    > > > > > > > can you point me to the link to get your script (cifar10.py)?
    > > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see the
    > > > > > > > performance.
    > > > > > > > Here's the command I used to collect the time:
    > > > > > > >         python train_cifar10.py --num-epoch=5
    > > > > > > >
    > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
    > > > > > > >         real    9m4.880s
    > > > > > > >         user    333m13.340s
    > > > > > > >         sys     14m36.100s
    > > > > > > >
    > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
    > > > > > > >         real    9m2.155s
    > > > > > > >         user    329m37.092s
    > > > > > > >         sys     16m8.668s
    > > > > > > >
    > > > > > > > -Ciyong
    > > > > > > >
    > > > > > > >
    > > > > > > > -----Original Message-----
    > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
    > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
    > > > > > > > To: dev@mxnet.incubator.apache.org
    > > > > > > > Cc: dev@mxnet.apache.org
    > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
    > > > > > > >
    > > > > > > > Hi these were my build flags and system info:
    > > > > > > >
    > > > > > > >
    > > > > > > > --- # CMake configuration
    > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
    > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
    > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
    > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
    > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
    > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
    > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
    > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
    > > > > > > > USE_LAPACK: "ON" # Build with lapack support
    > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
    > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
    > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
    > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
    > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
    > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
    > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
    > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
    > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
    > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
    > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
    > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
    > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
    > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
    > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
    > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
    > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
    > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
    > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
    > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
    > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
    > > > > > > > CMAKE_BUILD_TYPE: "Release"
    > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
    > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
    > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
    > > > > > > >
    > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
    > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
    > > > > > > >
    > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
    > > > > > > > c5d.18xlarge
    > > > > > > >
    > > > > > > >
    > > > > > > > Version      : 3.6.7
    > > > > > > > Compiler     : GCC 8.2.0
    > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    > > > > > > > Arch         : ('64bit', 'ELF')
    > > > > > > > ------------Pip Info-----------
    > > > > > > > Version      : 19.1.1
    > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
    > > > > > > > ----------MXNet Info-----------
    > > > > > > > Version      : 1.5.0
    > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
    > > > > > > > Hashtag not found. Not installed from pre-built package.
    > > > > > > > ----------System Info----------
    > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    > > > > > > > system       : Linux
    > > > > > > > node         : ip-172-31-63-171
    > > > > > > > release      : 4.15.0-1035-aws
    > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    > > > > > > > ----------Hardware Info----------
    > > > > > > > machine      : x86_64
    > > > > > > > processor    : x86_64
    > > > > > > > Architecture:        x86_64
    > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    > > > > > > > Byte Order:          Little Endian
    > > > > > > > CPU(s):              72
    > > > > > > > On-line CPU(s) list: 0-71
    > > > > > > > Thread(s) per core:  2
    > > > > > > > Core(s) per socket:  18
    > > > > > > > Socket(s):           2
    > > > > > > > NUMA node(s):        2
    > > > > > > > Vendor ID:           GenuineIntel
    > > > > > > > CPU family:          6
    > > > > > > > Model:               85
    > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > > > > > > > Stepping:            4
    > > > > > > > CPU MHz:             1326.446
    > > > > > > > BogoMIPS:            6000.00
    > > > > > > > Hypervisor vendor:   KVM
    > > > > > > > Virtualization type: full
    > > > > > > > L1d cache:           32K
    > > > > > > > L1i cache:           32K
    > > > > > > > L2 cache:            1024K
    > > > > > > > L3 cache:            25344K
    > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
    > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
    > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle
    > > > > > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx
    > > > > > > > smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
    > > > > > > > xgetbv1 xsaves ida arat pku ospke
    > > > > > > > ----------Network Test----------
    > > > > > > >
    > > > > > > > ----------Python Info----------
    > > > > > > > Version      : 3.6.7
    > > > > > > > Compiler     : GCC 8.2.0
    > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
    > > > > > > > Arch         : ('64bit', 'ELF')
    > > > > > > > ------------Pip Info-----------
    > > > > > > > Version      : 19.1.1
    > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
    > > > > > > > ----------MXNet Info-----------
    > > > > > > > Version      : 1.4.1
    > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
    > > > > > > > Hashtag not found. Not installed from pre-built package.
    > > > > > > > ----------System Info----------
    > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
    > > > > > > > system       : Linux
    > > > > > > > node         : ip-172-31-63-171
    > > > > > > > release      : 4.15.0-1035-aws
    > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
    > > > > > > > ----------Hardware Info----------
    > > > > > > > machine      : x86_64
    > > > > > > > processor    : x86_64
    > > > > > > > Architecture:        x86_64
    > > > > > > > CPU op-mode(s):      32-bit, 64-bit
    > > > > > > > Byte Order:          Little Endian
    > > > > > > > CPU(s):              72
    > > > > > > > On-line CPU(s) list: 0-71
    > > > > > > > Thread(s) per core:  2
    > > > > > > > Core(s) per socket:  18
    > > > > > > > Socket(s):           2
    > > > > > > > NUMA node(s):        2
    > > > > > > > Vendor ID:           GenuineIntel
    > > > > > > > CPU family:          6
    > > > > > > > Model:               85
    > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
    > > > > > > > Stepping:            4
    > > > > > > > CPU MHz:             1223.344
    > > > > > > > BogoMIPS:            6000.00
    > > > > > > > Hypervisor vendor:   KVM
    > > > > > > > Virtualization type: full
    > > > > > > > L1d cache:           32K
    > > > > > > > L1i cache:           32K
    > > > > > > > L2 cache:            1024K
    > > > > > > > L3 cache:            25344K
    > > > > > > > NUMA node0 CPU(s):   0-17,36-53
    > > > > > > > NUMA node1 CPU(s):   18-35,54-71
    > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
    > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
    > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
    > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
    > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
    > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1
    > > > > > > > hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed
    > > > > > > > adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt
    > > > > > > > xsavec xgetbv1 xsaves ida arat pku ospke
    > > > > > > > ----------Network Test----------
    > > > > > > >
    > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
    > > > > > > > <pe...@gmail.com> wrote:
    > > > > > > > >
    > > > > > > > > I trained cifar10 on CPU and there seems to be a regression
    > > > > > > > > of roughly 7% in training time against 1.4.1:
    > > > > > > > >
    > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
    > > > > > > > > (master)+$ time python cifar10.py --epochs 5
    > > > > > > > > real    11m30.388s
    > > > > > > > > user    417m7.766s
    > > > > > > > > sys     16m57.315s
    > > > > > > > >
    > > > > > > > > VS 1.4.1:
    > > > > > > > > real    10m41.994s
    > > > > > > > > user    392m40.646s
    > > > > > > > > sys     12m30.601s
    > > > > > > > >
    > > > > > > > >
    > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei
    > > > > > > > > <royweilai@gmail.com> wrote:
    > > > > > > > > >
    > > > > > > > > > Hi Anirudh,
    > > > > > > > > >
    > > > > > > > > > Thanks for jumping into this quickly. I followed up on the
    > > > > > > > > > issue.
    > > > > > > > > >
    > > > > > > > > > I meant for the sockeye developers/maintainers to help set up
    > > > > > > > > > nightly tests and raise issues early.
    > > > > > > > > >
    > > > > > > > > > Thanks!
    > > > > > > > > >
    > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
    > > > > > > > > > <ha...@gmail.com>
    > > > > > > > > > wrote:
    > > > > > > > > >
    > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
    > > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
    > > > > > > > > > > I recommend that other toolkits also add integration tests
    > > > > > > > > > > against MXNet nightly. It helps identify issues early.
    > > > > > > > > > >
    > > > > > > > > > > Best,
    > > > > > > > > > > Haibin
    > > > > > > > > > >
    > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
    > > > > > > > > > > <pa...@intel.com> wrote:
    > > > > > > > > > >
    > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
    > > > > > > > > > > >
    > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
    > > > > > > > > > > > for MXNet developers to catch potential bugs or
    > > > > > > > > > > > performance degradation.
    > > > > > > > > > > >
    > > > > > > > > > > > In the future, I suggest adding the major downstream test
    > > > > > > > > > > > cases, such as those from sockeye, GluonNLP, GluonCV, DGL,
    > > > > > > > > > > > and Gluon-TS, to the nightly tests.
    > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
    > > > > > > > > > > > monthly :)
    > > > > > > > > > > >
    > > > > > > > > > > > Thanks,
    > > > > > > > > > > >
    > > > > > > > > > > > --Patric
    > > > > > > > > > > >
    > > > > > > > > > > > > -----Original Message-----
    > > > > > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
    > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
    > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
    > > > > > > > > > > > > Cc: dev@mxnet.apache.org
    > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
    > > > > > > > > > > > > version 1.5.0.rc1
    > > > > > > > > > > > >
    > > > > > > > > > > > > Hi Lai,
    > > > > > > > > > > > >
    > > > > > > > > > > > > I have opened an issue:
    > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
    > > > > > > > > > > > > I came to know about this issue only today, and I have
    > > > > > > > > > > > > not been monitoring sockeye.
    > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
    > > > > > > > > > > > > by the dlpack changes.
    > > > > > > > > > > > > Also, I don't think sockeye CI checks against master;
    > > > > > > > > > > > > it is using 1.4.1.
    > > > > > > > > > > > >
    > > > > > > > > > > > > Anirudh
    > > > > > > > > > > > >
    > > > > > > > > > > > >
    > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
    > > > > > > > > > > > > <ro...@gmail.com> wrote:
    > > > > > > > > > > > >
    > > > > > > > > > > > > > Hi,
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > Could you share which test failed and what the crash
    > > > > > > > > > > > > > is? How can it be reproduced?
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > I was able to install sockeye, and all tests passed
    > > > > > > > > > > > > > using python setup.py test.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > I have tested both the nightly pip package and
    > > > > > > > > > > > > > 1.5.0.rc1.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > It would be great to create an issue with
    > > > > > > > > > > > > > reproducible steps and move the discussion there.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
    > > > > > > > > > > > > > failing for some time. If it's due to an MXNet change,
    > > > > > > > > > > > > > please raise this early so we can track and solve it
    > > > > > > > > > > > > > in time rather than block the release during vote
    > > > > > > > > > > > > > time.
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
    > > > > > > > > > > > > >
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
    > > > > > > > > > > > > > <anirudh2290@gmail.com> wrote:
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
    > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
    > > > > > > > > > > > > > > with the commit
    > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
    > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > Anirudh
    > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
    > > > > > > > > > > > > > > <ro...@gmail.com> wrote:
    > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > Hi Przemyslaw,
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > Is there an issue with more details to track the
    > > > > > > > > > > > > > > > problem?
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
    > > > > > > > > > > > > > > > Trędak <pt...@apache.org>
    > > > > > > > > > > > > > > > wrote:
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > -1
    > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > There is a crash in the sockeye unit test (python
    > > > > > > > > > > > > > > > > setup.py test), observed starting with the nightly
    > > > > > > > > > > > > > > > > 1.5 build from 6/13 and still occurring in 1.5rc1.
    > > > > > > > > > > > > > > > > I don't yet have the exact commit that is
    > > > > > > > > > > > > > > > > responsible for it, but it is either
    > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
    > > > > > > > > > > > > > > > > (dlpack related) or
    > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
    > > > > > > > > > > > > > > > > (cached op optimization).
    > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
    > > > > > > > > > > > > > > > > <ro...@gmail.com> wrote:
    > > > > > > > > > > > > > > > > > Dear MXNet community,
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
    > > > > > > > > > > > > > > > > > (incubating) version 1.5.0.
    > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
    > > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22, 23:59:59.
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > 1) Link to release notes:
    > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > 2) Link to release candidate:
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist
    > > > > > > > > > > > > > > > > > server:
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > Please remember to TEST first before voting
    > > > > > > > > > > > > > > > > > accordingly:
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > +1 = approve
    > > > > > > > > > > > > > > > > > +0 = no opinion
    > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
    > > > > > > > > > > > > > > > > > --
    > > > > > > > > > > > > > > > > > Best Regards
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > > > Lai
    > > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > --
    > > > > > > > > > > > > > > > Best Regards
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > > > Lai
    > > > > > > > > > > > > > > >
    > > > > > > > > > > > > > >
    > > > > > > > > > > > > > --
    > > > > > > > > > > > > > Best Regards
    > > > > > > > > > > > > >
    > > > > > > > > > > > > > Lai
    > > > > > > > > > > > > >
    > > > > > > > > > > >
    > > > > > > > > > >
    > > > > > > > > > --
    > > > > > > > > > Best Regards
    > > > > > > > > >
    > > > > > > > > > Lai
    > > >
    > > --
    > > Best Regards
    > >
    > > Lai
    >
    >

    --
    Sandeep Krishnamurthy

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by sandeep krishnamurthy <sa...@gmail.com>.
Hello Ciyong/Pedro,

I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (The results are incomplete:
they don't cover all MXNet operators and are not presented in the best
possible way; still WIP.)
https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50

The following operators look slower in 1.5 compared to 1.4.1:
- BatchNorm
- Pooling
- FullyConnected
- batch_dot
- Dot
- broadcast_mul
- log_softmax
and a few other operators
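
A list like the one above could be produced mechanically from two sets of
per-operator timings. The sketch below is only an illustration: the function
name flag_regressions, the 5% threshold, and the timing values are
hypothetical placeholders, not measurements from the gist or output of the
opperf tool.

```python
def flag_regressions(baseline_ms, candidate_ms, threshold=0.05):
    """Return {operator: relative_change} for operators that got slower.

    baseline_ms / candidate_ms map operator name -> average time in ms.
    An operator is flagged when it is more than `threshold` (as a
    fraction) slower in the candidate than in the baseline.
    """
    flagged = {}
    for op, base in baseline_ms.items():
        new = candidate_ms.get(op)
        if new is None or base <= 0:
            continue  # operator missing in candidate, or unusable baseline
        change = (new - base) / base
        if change > threshold:
            flagged[op] = change
    return flagged


# Placeholder timings in milliseconds. These are NOT measured results.
baseline = {"BatchNorm": 1.00, "Pooling": 2.00, "Convolution": 4.00}
candidate = {"BatchNorm": 1.30, "Pooling": 2.02, "Convolution": 3.00}

regressions = flag_regressions(baseline, candidate)
for op, change in sorted(regressions.items(), key=lambda kv: -kv[1]):
    print(f"{op}: {change:+.1%}")
```

With these placeholder numbers only BatchNorm is flagged; Pooling stays under
the threshold and Convolution is faster, so neither is reported.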

Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
example Convolution, flatten, and elementwise operators. So it is likely
that a few operators have regressed noticeably; however, because other
operators got faster, the end-to-end effect is small and hides much of the
regression. We need a more detailed per-operator performance analysis. We
will not be able to do this for the current release, but we should have a
more concrete way of detecting such performance regressions before the
next release.
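
To put a number on the end-to-end effect, the wall-clock ("real") values
reported by `time` elsewhere in this thread can be converted to a percentage.
This is a small sketch; the helper names parse_time and slowdown_pct are
made up for illustration, and the inputs are the figures Pedro and Ciyong
posted.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '11m30.388s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown_pct(new, old):
    """Percentage by which `new` is slower than `old` (negative if faster)."""
    return (parse_time(new) / parse_time(old) - 1.0) * 100.0


# "real" times quoted in this thread: 1.5.0.rc1 first, then 1.4.1.
pedro = slowdown_pct("11m30.388s", "10m41.994s")
ciyong = slowdown_pct("12m52.140s", "12m10.856s")

print(f"Pedro's run:  {pedro:.1f}% slower")   # 7.5% slower
print(f"Ciyong's run: {ciyong:.1f}% slower")  # 5.6% slower
```

These are single runs, so the percentages carry run-to-run variance and are
only a rough indication.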

Setup:
1.5 => Built from source (head of the 1.5.rc2 tag), with MKLDNN
1.4.1 => PyPi mxnet-mkl==1.4.1
Machine: C5.18X
No explicit environment variables were set
Operator benchmark code -
https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
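
Since single runs are noisy, one way to compare two builds is to repeat each
measurement and look at the spread. The sketch below uses only the Python
standard library (timeit and statistics); time_callable and workload are
hypothetical names, and workload is a pure-Python stand-in rather than an
MXNet operator.

```python
import statistics
import timeit


def time_callable(fn, repeat=5, number=10):
    """Time `fn` and return per-call statistics in seconds.

    Each of the `repeat` samples runs `fn` `number` times; dividing by
    `number` gives the average time of one call within that sample.
    """
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    samples = [total / number for total in totals]
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def workload():
    # Stand-in for the code under test (assumption: replace with the
    # operator call being measured).
    return sum(i * i for i in range(10_000))


stats = time_callable(workload)
print(
    f"min={stats['min']:.6f}s median={stats['median']:.6f}s "
    f"stdev={stats['stdev']:.6f}s"
)
```

When timing MXNet operators this way, the callable would need to block until
the result is ready (for example by calling wait_to_read() on the output or
mx.nd.waitall()), because operators are executed asynchronously; otherwise
only the time to enqueue the operation is measured.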

Best,
Sandeep


On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <pe...@gmail.com>
wrote:

> I will try to run a few benchmarks in a bare metal instance tonight to
> remove virtualization variance for the measurements and provide some
> numbers.
>
> Please propose a set of models / examples that would be desirable to
> run before the release and provide a link to an easy to run script
> with instructions so we can validate the release better.
>
> Thank you.
>
> On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> >
> > Dear @dev,
> >
> > I'm cancelling the vote for the cached op fix:
> >
> > https://github.com/apache/incubator-mxnet/pull/15298
> >
> > As for the possible CPU training regression, it does not look like a
> > blocker for now.
> >
> > I will start a new rc2 vote; please help validate it.
> >
> > Thanks!
> >
> >
> > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com> wrote:
> >
> > > Hi Pedro,
> > >
> > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
> > > v1.4; I was using 18 cores for computing) with your script on
> > > C5.18xlarge. However, I needed to bind the cores with the commands below
> > > when running the script. Without setting the env variables, the times
> > > for v1.5 and v1.4 were close (<1% apart).
> > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> > >         export OMP_NUM_THREADS=18
> > >
> > > Did you set any env variables when running?
> > >
> > > The performance results I got are below:
> > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > real    12m10.856s
> > > user    234m49.576s
> > > sys     4m38.044s
> > >
> > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > real    12m52.140s
> > > user    246m30.740s
> > > sys     5m8.188s
> > >
> > > Looking at the profiling data, most of the ops have the same perf on
> > > v1.4 and v1.5, but some ops such as "_backward_BatchNorm" and "Pooling"
> > > are ~1.37x slower on v1.5 compared with v1.4.
> > > I will do further analysis on these ops.
> > >
> > > Here's the hardware/OS info from my side:
> > > ----------Python Info----------
> > > Version      : 3.6.8
> > > Compiler     : GCC 7.3.0
> > > Build        : ('default', 'Dec 30 2018 01:22:34')
> > > Arch         : ('64bit', '')
> > > ------------Pip Info-----------
> > > Version      : 19.0.3
> > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> > > ----------MXNet Info-----------
> > > Version      : 1.5.0
> > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> > > Hashtag not found. Not installed from pre-built package.
> > > ----------System Info----------
> > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> > > system       : Linux
> > > node         : ip-172-31-32-129
> > > release      : 4.4.0-1085-aws
> > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> > > ----------Hardware Info----------
> > > machine      : x86_64
> > > processor    : x86_64
> > > Architecture:          x86_64
> > > CPU op-mode(s):        32-bit, 64-bit
> > > Byte Order:            Little Endian
> > > CPU(s):                72
> > > On-line CPU(s) list:   0-71
> > > Thread(s) per core:    2
> > > Core(s) per socket:    18
> > > Socket(s):             2
> > > NUMA node(s):          2
> > > Vendor ID:             GenuineIntel
> > > CPU family:            6
> > > Model:                 85
> > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > Stepping:              3
> > > CPU MHz:               3000.000
> > > BogoMIPS:              6000.00
> > > Hypervisor vendor:     KVM
> > > Virtualization type:   full
> > > L1d cache:             32K
> > > L1i cache:             32K
> > > L2 cache:              1024K
> > > L3 cache:              25344K
> > > NUMA node0 CPU(s):     0-17,36-53
> > > NUMA node1 CPU(s):     18-35,54-71
> > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
> > > nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma
> > > cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> > > xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms
> > > invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd
> > > xsaveopt xsavec xgetbv1 ida arat pku
> > > ----------Network Test----------
> > >
> > >
> > > -Ciyong
> > >
> > >
> > > -----Original Message-----
> > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> > > Sent: Thursday, June 27, 2019 9:55 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: dev@mxnet.apache.org
> > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > >
> > > Could we run more epochs to see the performance difference, or profile
> > > the difference between the good and bad runs?
> > >
> > > > -----Original Message-----
> > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > Sent: Thursday, June 27, 2019 9:35 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > 1.5.0.rc1
> > > >
> > > > I ran it again and the gap is bigger again. I guess we need to average
> > > > out the times across several runs:
> > > >
> > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> threads
> > > > for decoding..
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > > > for decoding..
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > Epoch 0, Changed learning rate to 0.05
> > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> > > > Epoch 0, Batch 199, Speed=384.149839
> > > > Epoch 0, Duration=140.919567
> > > > Epoch 0, Training accuracy=0.115169
> > > > Epoch 0, Validation accuracy=0.141317
> > > > Epoch 1, Batch 199, Speed=433.380512
> > > > Epoch 1, Duration=119.553233
> > > > Epoch 1, Training accuracy=0.170956
> > > > Epoch 1, Validation accuracy=0.216146
> > > > Epoch 2, Batch 199, Speed=434.864699
> > > > Epoch 2, Duration=123.278490
> > > > Epoch 2, Training accuracy=0.209455
> > > > Epoch 2, Validation accuracy=0.247296
> > > > Epoch 3, Batch 199, Speed=433.401854
> > > > Epoch 3, Duration=118.327797
> > > > Epoch 3, Training accuracy=0.248701
> > > > Epoch 3, Validation accuracy=0.302083
> > > > Epoch 4, Batch 199, Speed=419.713707
> > > > Epoch 4, Duration=126.468409
> > > > Epoch 4, Training accuracy=0.260949
> > > > Epoch 4, Validation accuracy=0.269030
> > > >
> > > > real    10m55.796s
> > > > user    399m33.567s
> > > > sys     13m55.904s
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> threads
> > > > for decoding..
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > > > for decoding..
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > Epoch 0, Changed learning rate to 0.05
> > > > Epoch 0, Batch 199, Speed=419.039188
> > > > Epoch 0, Duration=143.934903
> > > > Epoch 0, Training accuracy=0.122542
> > > > Epoch 0, Validation accuracy=0.164359
> > > > Epoch 1, Batch 199, Speed=445.257048
> > > > Epoch 1, Duration=135.248399
> > > > Epoch 1, Training accuracy=0.178828
> > > > Epoch 1, Validation accuracy=0.199419
> > > > Epoch 2, Batch 199, Speed=447.115215
> > > > Epoch 2, Duration=132.003770
> > > > Epoch 2, Training accuracy=0.217808
> > > > Epoch 2, Validation accuracy=0.233073
> > > > Epoch 3, Batch 199, Speed=441.079477
> > > > Epoch 3, Duration=126.543316
> > > > Epoch 3, Training accuracy=0.248102
> > > > Epoch 3, Validation accuracy=0.293870
> > > > Epoch 4, Batch 199, Speed=449.329787
> > > > Epoch 4, Duration=138.398325
> > > > Epoch 4, Training accuracy=0.270021
> > > > Epoch 4, Validation accuracy=0.311498
> > > >
> > > > real    11m45.329s
> > > > user    426m13.908s
> > > > sys     16m45.093s
> > > >
> > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> > > > <pe...@gmail.com> wrote:
> > > > >
> > > > > The difference looks smaller now, more like your numbers. I wonder
> > > > > if something happened during the previous benchmark like a system
> > > > > update...
> > > > >
> > > > >
> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > > threads for decoding..
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > completed
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > > threads for decoding..
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > completed
> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > > Epoch 0, Changed learning rate to 0.05
> > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> > > > > Epoch 0, Batch 199, Speed=426.182733
> > > > > Epoch 0, Duration=134.868458
> > > > > Epoch 0, Training accuracy=0.127238
> > > > > Epoch 0, Validation accuracy=0.206388
> > > > > Epoch 1, Batch 199, Speed=313.127156
> > > > > Epoch 1, Duration=128.041775
> > > > > Epoch 1, Training accuracy=0.182065
> > > > > Epoch 1, Validation accuracy=0.202524
> > > > > Epoch 2, Batch 199, Speed=410.931187
> > > > > Epoch 2, Duration=124.920588
> > > > > Epoch 2, Training accuracy=0.202584
> > > > > Epoch 2, Validation accuracy=0.245693
> > > > > Epoch 3, Batch 199, Speed=419.119335
> > > > > Epoch 3, Duration=120.948349
> > > > > Epoch 3, Training accuracy=0.235854
> > > > > Epoch 3, Validation accuracy=0.291066
> > > > > Epoch 4, Batch 199, Speed=430.473733
> > > > > Epoch 4, Duration=130.181724
> > > > > Epoch 4, Training accuracy=0.257773
> > > > > Epoch 4, Validation accuracy=0.304988
> > > > >
> > > > > real    11m7.356s
> > > > > user    406m9.910s
> > > > > sys     14m18.349s
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > > threads for decoding..
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > completed
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > > ImageRecordIOParser2:
> > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > > threads for decoding..
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > completed
> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > > Epoch 0, Changed learning rate to 0.05
> > > > > Epoch 0, Batch 199, Speed=348.618154
> > > > > Epoch 0, Duration=146.469352
> > > > > Epoch 0, Training accuracy=0.124121
> > > > > Epoch 0, Validation accuracy=0.167227
> > > > > Epoch 1, Batch 199, Speed=452.790825
> > > > > Epoch 1, Duration=130.199421
> > > > > Epoch 1, Training accuracy=0.183863
> > > > > Epoch 1, Validation accuracy=0.237079
> > > > > Epoch 2, Batch 199, Speed=451.406559
> > > > > Epoch 2, Duration=126.320823
> > > > > Epoch 2, Training accuracy=0.214844
> > > > > Epoch 2, Validation accuracy=0.244692
> > > > > Epoch 3, Batch 199, Speed=403.161873
> > > > > Epoch 3, Duration=125.331660
> > > > > Epoch 3, Training accuracy=0.243506
> > > > > Epoch 3, Validation accuracy=0.301182
> > > > > Epoch 4, Batch 199, Speed=450.826598
> > > > > Epoch 4, Duration=126.426253
> > > > > Epoch 4, Training accuracy=0.266424
> > > > > Epoch 4, Validation accuracy=0.311899
> > > > >
> > > > > real    11m21.930s
> > > > > user    415m3.855s
> > > > > sys     13m53.975s
> > > > >
> > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > > > > <pe...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Ciyong, thanks for trying to reproduce:
> > > > > >
> > > > > > I used this one:
> > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > > > > >
> > > > > > Could you provide hardware and OS details?
> > > > > >
> > > > > > I will rerun and repost numbers in a few minutes.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> > > > > > <ci...@intel.com> wrote:
> > > > > > >
> > > > > > > Hi Pedro,
> > > > > > >
> > > > > > > I'm looking at this case, using the script
> > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > > > > > > to get the timing data, but there seems to be not much difference
> > > > > > > between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > > > > >
> > > > > > > I'm not sure if there's any difference in the python script. Can
> > > > > > > you point me to the link to get your script (cifar10.py)?
> > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
> > > > > > > the performance.
> > > > > > >
> > > > > > > Here's the command I used to collect the time:
> > > > > > >         python train_cifar10.py --num-epoch=5
> > > > > > >
> > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > > > >         real    9m4.880s
> > > > > > >         user    333m13.340s
> > > > > > >         sys     14m36.100s
> > > > > > >
> > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > > > >         real    9m2.155s
> > > > > > >         user    329m37.092s
> > > > > > >         sys     16m8.668s
> > > > > > >
> > > > > > > -Ciyong
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > > > 1.5.0.rc1
> > > > > > >
> > > > > > > Hi these were my build flags and system info:
> > > > > > >
> > > > > > >
> > > > > > > > --- # CMake configuration
> > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > > > > > > > USE_LAPACK: "ON" # Build with lapack support
> > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > > > > > > > CMAKE_BUILD_TYPE: "Release"
> > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > > > > >
> > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
> > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
> > > > > > >
> > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > > > > c5d.18xlarge
> > > > > > >
> > > > > > >
> > > > > > > Version      : 3.6.7
> > > > > > > Compiler     : GCC 8.2.0
> > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > > Arch         : ('64bit', 'ELF')
> > > > > > > ------------Pip Info-----------
> > > > > > > Version      : 19.1.1
> > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > > > > > ----------MXNet Info-----------
> > > > > > > Version      : 1.5.0
> > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > > ----------System Info----------
> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > > system       : Linux
> > > > > > > node         : ip-172-31-63-171
> > > > > > > release      : 4.15.0-1035-aws
> > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > > ----------Hardware Info----------
> > > > > > > machine      : x86_64
> > > > > > > processor    : x86_64
> > > > > > > Architecture:        x86_64
> > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > > Byte Order:          Little Endian
> > > > > > > CPU(s):              72
> > > > > > > On-line CPU(s) list: 0-71
> > > > > > > Thread(s) per core:  2
> > > > > > > Core(s) per socket:  18
> > > > > > > Socket(s):           2
> > > > > > > NUMA node(s):        2
> > > > > > > Vendor ID:           GenuineIntel
> > > > > > > CPU family:          6
> > > > > > > Model:               85
> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > > Stepping:            4
> > > > > > > CPU MHz:             1326.446
> > > > > > > BogoMIPS:            6000.00
> > > > > > > Hypervisor vendor:   KVM
> > > > > > > Virtualization type: full
> > > > > > > L1d cache:           32K
> > > > > > > L1i cache:           32K
> > > > > > > L2 cache:            1024K
> > > > > > > L3 cache:            25344K
> > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > > > ssse3 fma cx16 pcid
> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > > > > ida arat pku ospke
> > > > > > > > ----------Network Test----------
> > > > > > >
> > > > > > > ----------Python Info----------
> > > > > > > Version      : 3.6.7
> > > > > > > Compiler     : GCC 8.2.0
> > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > > Arch         : ('64bit', 'ELF')
> > > > > > > ------------Pip Info-----------
> > > > > > > Version      : 19.1.1
> > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > > > > > ----------MXNet Info-----------
> > > > > > > Version      : 1.4.1
> > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > > ----------System Info----------
> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > > system       : Linux
> > > > > > > node         : ip-172-31-63-171
> > > > > > > release      : 4.15.0-1035-aws
> > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > > ----------Hardware Info----------
> > > > > > > machine      : x86_64
> > > > > > > processor    : x86_64
> > > > > > > Architecture:        x86_64
> > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > > Byte Order:          Little Endian
> > > > > > > CPU(s):              72
> > > > > > > On-line CPU(s) list: 0-71
> > > > > > > Thread(s) per core:  2
> > > > > > > Core(s) per socket:  18
> > > > > > > Socket(s):           2
> > > > > > > NUMA node(s):        2
> > > > > > > Vendor ID:           GenuineIntel
> > > > > > > CPU family:          6
> > > > > > > Model:               85
> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > > Stepping:            4
> > > > > > > CPU MHz:             1223.344
> > > > > > > BogoMIPS:            6000.00
> > > > > > > Hypervisor vendor:   KVM
> > > > > > > Virtualization type: full
> > > > > > > L1d cache:           32K
> > > > > > > L1i cache:           32K
> > > > > > > L2 cache:            1024K
> > > > > > > L3 cache:            25344K
> > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > > > ssse3 fma cx16 pcid
> > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > > > > ida arat pku ospke
> > > > > > > > ----------Network Test----------
> > > > > > >
> > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> > > > <pe...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I trained cifar10 on CPU and there seems to be a regression:
> > > > > > > > > training time increased by roughly 7% compared with 1.4.1:
> > > > > > > >
> > > > > > > > (py3_venv)
> > > > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > > > > real    11m30.388s
> > > > > > > > user    417m7.766s
> > > > > > > > sys     16m57.315s
> > > > > > > >
> > > > > > > > VS 1.4.1:
> > > > > > > > real    10m41.994s
> > > > > > > > user    392m40.646s
> > > > > > > > sys     12m30.601s
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> royweilai@gmail.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Anirudh,
> > > > > > > > >
> > > > > > > > > > Thanks for jumping into this quickly, I followed up on the
> > > > > > > > > > issue.
> > > > > > > > > >
> > > > > > > > > > It was meant for the sockeye developers/maintainers, to help
> > > > > > > > > > set up nightly tests and raise issues early.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > > > > <ha...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
> > > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
> > > > > > > > > > > I recommend that other toolkits also add integration tests
> > > > > > > > > > > against MXNet nightly.
> > > > > > > > > > > It helps identify issues early.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Haibin
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> > > > > > > > > > <pa...@intel.com>
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > > > > > >
> > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's
> > > > > > > > > > > > hard for MXNet developers to catch potential bugs or
> > > > > > > > > > > > performance degradation.
> > > > > > > > > > > >
> > > > > > > > > > > > In the future, I suggest adding the major downstream
> > > > > > > > > > > > test cases, such as those from sockeye, GluonNLP,
> > > > > > > > > > > > GluonCV, DGL, and Gluon-TS, to the nightly test.
> > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
> > > > > > > > > > > > monthly :)
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > --Patric
> > > > > > > > > > >
> > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: Anirudh Subramanian
> > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> > > > > > > > > > > > version
> > > > > > > > > > > > 1.5.0.rc1
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Lai,
> > > > > > > > > > > >
> > > > > > > > > > > > > I have opened an issue:
> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > > > > > I came to know about this issue only today and I have
> > > > > > > > > > > > > not been monitoring sockeye.
> > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> > > > > > > > > > > > > by the dlpack changes.
> > > > > > > > > > > > > Also, I don't think sockeye CI checks against master;
> > > > > > > > > > > > > it is using 1.4.1.
> > > > > > > > > > > >
> > > > > > > > > > > > Anirudh
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> > > > > > > > > > > > <ro...@gmail.com>
> > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Could you share which test failed and what the crash
> > > > > > > > > > > > > > is? How can it be reproduced?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I was able to install sockeye, and all tests passed
> > > > > > > > > > > > > > when run with python setup.py test
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have tested both the nightly pip package and
> > > > > > > > > > > > > > 1.5.0.rc1.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be great to create an issue with
> > > > > > > > > > > > > > reproducible steps and move the discussion there.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
> > > > > > > > > > > > > > failing for some time. If it's due to an MXNet
> > > > > > > > > > > > > > change, please raise this early so we can track and
> > > > > > > > > > > > > > solve it in time rather than block the release
> > > > > > > > > > > > > > during vote time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > > > > > > > > <anirudh2290@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > > > > > with the commit
> > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Anirudh
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > > > > > > <ro...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is there an issue with more details to track the
> > > > > > > > > > > > > > > > problem?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests
> > > > > > > > > > > > > > > > > (python setup.py test), observed starting with
> > > > > > > > > > > > > > > > > the nightly 1.5 build from 6/13 and still
> > > > > > > > > > > > > > > > > occurring in 1.5rc1. I don't yet have the exact
> > > > > > > > > > > > > > > > > commit that is responsible for it, but it is
> > > > > > > > > > > > > > > > > either
> > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> > > > > > > > > > > > > > > > > (dlpack related) or
> > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > > > > > > > > > > > > > > > > (cached op optimization).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> > > > > > > > > > > > > > > > <ro...@gmail.com>
> > > > wrote:
> > > > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache
> > > > > > > > > > > > > > > > > > MXNet (incubating) version 1.5.0.
> > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> > > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
> > > > > > > > > > > > > > > > > > 23:59:59.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache
> > > > > > > > > > > > > > > > > > dist server:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please remember to TEST first before voting
> > > > > > > > > > > > > > > > > > accordingly:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lai
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > Lai
> > >
> > --
> > Best Regards
> >
> > Lai
>
>

-- 
Sandeep Krishnamurthy

Re: FW: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Manu Seth <ma...@gmail.com>.
Sorry that last email got sent again. I tried to send it yesterday but I
guess Apache approved it today.

Hi Pedro,
Good question. You can run the following commands in Python to get the
commit id corresponding to the pip build.

import os
import mxnet as mx

# COMMIT_HASH sits in the installed mxnet package directory (pip builds).
path = os.path.join(mx.__path__[0], 'COMMIT_HASH')
with open(path) as f:
    print(f.read())

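A slightly more defensive variant of the snippet above, as a minimal sketch.
It assumes only that a pip build places a COMMIT_HASH file in the package
directory, and it returns None when the file is missing (for example in a
source build, where the diagnose output quoted in this thread reports
"Hashtag not found"). The helper name read_commit_hash is made up for
illustration and is not part of MXNet.

```python
import os


def read_commit_hash(package_dir):
    """Return the text of COMMIT_HASH under package_dir, or None if absent."""
    path = os.path.join(package_dir, 'COMMIT_HASH')
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return f.read().strip()


if __name__ == '__main__':
    try:
        import mxnet as mx
    except ImportError:
        # Keep the sketch usable on machines without MXNet installed.
        print('mxnet is not installed')
    else:
        print(read_commit_hash(mx.__path__[0]) or 'COMMIT_HASH not found')
```

Taking the directory as an argument keeps the lookup testable without
importing mxnet.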

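For the timing comparisons quoted further down in this thread (average time
per epoch after a warm-up, and the percentage slowdown between two builds),
here is a rough sketch of the arithmetic. It assumes log entries of the form
"Epoch N, Duration=S" as they appear in the cifar10.py output in this thread;
the function names are hypothetical and the sample values are copied from
the quoted logs.

```python
import re

# Matches entries such as "Epoch 3, Duration=125.331660".
DURATION_RE = re.compile(r'Epoch (\d+), Duration=([0-9.]+)')


def epoch_durations(log_text):
    """Return {epoch: seconds} parsed from 'Epoch N, Duration=S' entries."""
    return {int(e): float(s) for e, s in DURATION_RE.findall(log_text)}


def mean_after_warmup(durations, warmup):
    """Average the per-epoch durations, skipping epochs below `warmup`."""
    kept = [s for e, s in sorted(durations.items()) if e >= warmup]
    if not kept:
        raise ValueError('no epochs left after warm-up')
    return sum(kept) / len(kept)


def slowdown_percent(baseline, candidate):
    """Percent increase of the candidate time over the baseline time."""
    return (candidate - baseline) / baseline * 100.0


sample = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
"""

# Treat epoch 0 as warm-up and average the remaining epochs.
print(mean_after_warmup(epoch_durations(sample), warmup=1))
# Compare the two per-epoch averages reported in this thread.
print(round(slowdown_percent(164.23, 174.59), 1))
```

Averaging over several runs, as suggested elsewhere in the thread, would just
mean feeding more logs through the same functions.
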
On Fri, Jun 28, 2019 at 4:56 PM Pedro Larroy <pe...@gmail.com>
wrote:

> Thanks Manu, the warm-up is important; also, the first run downloads
> a bunch of data, which will affect the measurement. That's a good idea.
>
> How can I find which commit corresponds to a pip build myself?
>
> Pedro.
>
> On Fri, Jun 28, 2019 at 4:48 PM Manu Seth <ma...@gmail.com> wrote:
> >
> > I ran the same cifar10.py script as Pedro, but for 20 epochs. Treating
> > the first 10 epochs as warm-up, I averaged the time per epoch over the
> > last 10 epochs.
> >
> > With MXNet 1.4.1 average time is 164.23 s
> > With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
> >
> >
> > For a second data point, I ran the Gluon speed test benchmark script
> > https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
> > using the following command:
> > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128 --num-batches 200 --type 'training'
> >
> > I got the following speeds:
> > With MXNet 1.4.1, average speed is 25.677534 img/s
> > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
> >
> > Note:
> > For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
> > For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
> > corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
> > behind 1.5.x branch by 4 commits
> >
> >
> > Best,
> > Manu
> >
> >
> > On 6/27/19, 10:44 AM, "Pedro Larroy" <pe...@gmail.com>
> wrote:
> > >
> > >     I will try to run a few benchmarks on a bare metal instance tonight
> > >     to remove virtualization variance from the measurements and provide
> > >     some numbers.
> > >
> > >     Please propose a set of models / examples that would be desirable to
> > >     run before the release and provide a link to an easy-to-run script
> > >     with instructions so we can validate the release better.
> > >
> > >     Thank you.
> > >
> > >     On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com>
> wrote:
> > >     >
> > >     > Dear @dev,
> > >     >
> > >     > I'm cancelling the vote for the cached op fix:
> > >     >
> > >     > https://github.com/apache/incubator-mxnet/pull/15298
> > >     >
> > >     > As for the possible CPU training regression, it does not look like
> > >     > a blocker for now.
> > >     >
> > >     > I will start a new rc2 vote; please help to validate.
> > >     >
> > >     > Thanks!
> > >     >
> > >     >
> > >     > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
> ciyong.chen@intel.com>
> > > wrote:
> > >     >
> > >     > > Hi Pedro,
> > >     > >
> > >     > > I was able to reproduce a similar result (v1.5 is ~5.6% slower
> > >     > > than v1.4; I was using 18 cores for computing) with your script
> > >     > > on C5.18xlarge. However, I needed to bind the cores with the
> > >     > > commands below when running the script (without setting the env
> > >     > > variables, the times for v1.5 and v1.4 were within 1% of each
> > >     > > other):
> > >     > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> > >     > >         export OMP_NUM_THREADS=18
> > >     > >
> > >     > > Did you set any env variables when running?
> > >     > >
> > >     > > The performance results I got are below:
> > >     > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > >     > > real    12m10.856s
> > >     > > user    234m49.576s
> > >     > > sys     4m38.044s
> > >     > >
> > >     > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > >     > > real    12m52.140s
> > >     > > user    246m30.740s
> > >     > > sys     5m8.188s
> > >     > >
> > >     > > Looking at the profiling data, most of the ops have the same perf
> > >     > > in v1.4 and v1.5, but some ops, like "_backward_BatchNorm" and
> > >     > > "Pooling", are ~1.37x slower on v1.5 than on v1.4.
> > >     > > I will do further analysis on these ops.
> > >     > >
> > >     > > Here's the hardware/OS info from my side:
> > >     > > ----------Python Info----------
> > >     > > Version      : 3.6.8
> > >     > > Compiler     : GCC 7.3.0
> > >     > > Build        : ('default', 'Dec 30 2018 01:22:34')
> > >     > > Arch         : ('64bit', '')
> > >     > > ------------Pip Info-----------
> > >     > > Version      : 19.0.3
> > >     > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> > >     > > ----------MXNet Info-----------
> > >     > > Version      : 1.5.0
> > >     > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> > >     > > Hashtag not found. Not installed from pre-built package.
> > >     > > ----------System Info----------
> > >     > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> > >     > > system       : Linux
> > >     > > node         : ip-172-31-32-129
> > >     > > release      : 4.4.0-1085-aws
> > >     > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> > >     > > ----------Hardware Info----------
> > >     > > machine      : x86_64
> > >     > > processor    : x86_64
> > >     > > Architecture:          x86_64
> > >     > > CPU op-mode(s):        32-bit, 64-bit
> > >     > > Byte Order:            Little Endian
> > >     > > CPU(s):                72
> > >     > > On-line CPU(s) list:   0-71
> > >     > > Thread(s) per core:    2
> > >     > > Core(s) per socket:    18
> > >     > > Socket(s):             2
> > >     > > NUMA node(s):          2
> > >     > > Vendor ID:             GenuineIntel
> > >     > > CPU family:            6
> > >     > > Model:                 85
> > >     > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > >     > > Stepping:              3
> > >     > > CPU MHz:               3000.000
> > >     > > BogoMIPS:              6000.00
> > >     > > Hypervisor vendor:     KVM
> > >     > > Virtualization type:   full
> > >     > > L1d cache:             32K
> > >     > > L1i cache:             32K
> > >     > > L2 cache:              1024K
> > >     > > L3 cache:              25344K
> > >     > > NUMA node0 CPU(s):     0-17,36-53
> > >     > > NUMA node1 CPU(s):     18-35,54-71
> > >     > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > >     > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
> > >     > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
> > >     > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
> > >     > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> > >     > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> > >     > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
> > >     > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> > >     > > ----------Network Test----------
> > >     > >
> > >     > >
> > >     > > -Ciyong
> > >     > >
> > >     > >
> > >     > > -----Original Message-----
> > >     > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> > >     > > Sent: Thursday, June 27, 2019 9:55 AM
> > >     > > To: dev@mxnet.incubator.apache.org
> > >     > > Cc: dev@mxnet.apache.org
> > >     > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> > > 1.5.0.rc1
> > >     > >
> > >     > > Could we run more epochs to see the performance difference, or
> > >     > > profile the difference between the good and bad runs?
> > >     > >
> > >     > > > -----Original Message-----
> > >     > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > >     > > > Sent: Thursday, June 27, 2019 9:35 AM
> > >     > > > To: dev@mxnet.incubator.apache.org
> > >     > > > Cc: dev@mxnet.apache.org
> > >     > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > >     > > > 1.5.0.rc1
> > >     > > >
> > >     > > > I ran it again and the gap is bigger again; I guess we need to
> > >     > > > average the times across several runs:
> > >     > > >
> > >     > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > >     > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > >     > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > >     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > >     > > > Epoch 0, Changed learning rate to 0.05
> > >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> > >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> > >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> > >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> > >     > > > Epoch 0, Batch 199, Speed=384.149839
> > >     > > > Epoch 0, Duration=140.919567
> > >     > > > Epoch 0, Training accuracy=0.115169
> > >     > > > Epoch 0, Validation accuracy=0.141317
> > >     > > > Epoch 1, Batch 199, Speed=433.380512
> > >     > > > Epoch 1, Duration=119.553233
> > >     > > > Epoch 1, Training accuracy=0.170956
> > >     > > > Epoch 1, Validation accuracy=0.216146
> > >     > > > Epoch 2, Batch 199, Speed=434.864699
> > >     > > > Epoch 2, Duration=123.278490
> > >     > > > Epoch 2, Training accuracy=0.209455
> > >     > > > Epoch 2, Validation accuracy=0.247296
> > >     > > > Epoch 3, Batch 199, Speed=433.401854
> > >     > > > Epoch 3, Duration=118.327797
> > >     > > > Epoch 3, Training accuracy=0.248701
> > >     > > > Epoch 3, Validation accuracy=0.302083
> > >     > > > Epoch 4, Batch 199, Speed=419.713707
> > >     > > > Epoch 4, Duration=126.468409
> > >     > > > Epoch 4, Training accuracy=0.260949
> > >     > > > Epoch 4, Validation accuracy=0.269030
> > >     > > >
> > >     > > > real    10m55.796s
> > >     > > > user    399m33.567s
> > >     > > > sys     13m55.904s
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > ImageRecordIOParser2:
> > >     > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
> 4
> > > threads
> > >     > > > for decoding..
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > > image
> > >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > > image
> > >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > ImageRecordIOParser2:
> > >     > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > threads
> > >     > > > for decoding..
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > > image
> > >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > > image
> > >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > >     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > >     > > > Epoch 0, Changed learning rate to 0.05
> > >     > > > Epoch 0, Batch 199, Speed=419.039188
> > >     > > > Epoch 0, Duration=143.934903
> > >     > > > Epoch 0, Training accuracy=0.122542
> > >     > > > Epoch 0, Validation accuracy=0.164359
> > >     > > > Epoch 1, Batch 199, Speed=445.257048
> > >     > > > Epoch 1, Duration=135.248399
> > >     > > > Epoch 1, Training accuracy=0.178828
> > >     > > > Epoch 1, Validation accuracy=0.199419
> > >     > > > Epoch 2, Batch 199, Speed=447.115215
> > >     > > > Epoch 2, Duration=132.003770
> > >     > > > Epoch 2, Training accuracy=0.217808
> > >     > > > Epoch 2, Validation accuracy=0.233073
> > >     > > > Epoch 3, Batch 199, Speed=441.079477
> > >     > > > Epoch 3, Duration=126.543316
> > >     > > > Epoch 3, Training accuracy=0.248102
> > >     > > > Epoch 3, Validation accuracy=0.293870
> > >     > > > Epoch 4, Batch 199, Speed=449.329787
> > >     > > > Epoch 4, Duration=138.398325
> > >     > > > Epoch 4, Training accuracy=0.270021
> > >     > > > Epoch 4, Validation accuracy=0.311498
> > >     > > >
> > >     > > > real    11m45.329s
> > >     > > > user    426m13.908s
> > >     > > > sys     16m45.093s
> > >     > > >
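The per-epoch Speed fields in the two runs quoted above can be summarised alongside the wall-clock numbers. The following is only a minimal sketch: the helper name epoch_speeds and the constants RUN_1_4 / RUN_1_5 are hypothetical and not part of MXNet or the benchmark script. The values are copied from the logs above, with the mxnet_1.4 run first and the mxnet_1.5 run second, following the order of the command line.

```python
import re
from statistics import mean

# Matches the "Speed=<float>" field printed once per epoch in the logs above.
SPEED_FIELD = re.compile(r"Speed=(\d+(?:\.\d+)?)")


def epoch_speeds(log_text):
    """Return every Speed value found in a training log, in order of appearance."""
    return [float(value) for value in SPEED_FIELD.findall(log_text)]


# Speed fields copied from the two quoted runs (hypothetical constant names).
RUN_1_4 = (
    "Speed=384.149839 Speed=433.380512 Speed=434.864699 "
    "Speed=433.401854 Speed=419.713707"
)
RUN_1_5 = (
    "Speed=419.039188 Speed=445.257048 Speed=447.115215 "
    "Speed=441.079477 Speed=449.329787"
)

for label, log_text in (("1.4", RUN_1_4), ("1.5", RUN_1_5)):
    values = epoch_speeds(log_text)
    print(f"mxnet_{label}: {len(values)} epochs, mean Speed={mean(values):.2f}")
```

In these logs the Speed field appears only once per epoch (at batch 199), so it is a sample rather than a full-epoch average; the real/user/sys figures from time remain the primary comparison.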
> > >     > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> > >     > > > <pe...@gmail.com> wrote:
> > >     > > > >
> > >     > > > > The difference looks smaller now, more like your numbers. I
> > >     > > > > wonder if something happened during the previous benchmark,
> > >     > > > > like a system update...
> > >     > > > >
> > >     > > > >
> > >     > > > > piotr@ip-172-31-63-171 :0:~/deeplearning-benchmark/dawnbench (master)+$
> > >     > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 &&
> > >     > > > > time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > > ImageRecordIOParser2:
> > >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > >     > > > > threads for decoding..
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > completed
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > > ImageRecordIOParser2:
> > >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
> use 4
> > >     > > > > threads for decoding..
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > completed
> > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > >     > > > > Epoch 0, Changed learning rate to 0.05
> > >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > >     > > > > 147456 bytes with malloc directly
> > >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > >     > > > > 589824 bytes with malloc directly
> > >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > >     > > > > 2359296 bytes with malloc directly
> > >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > >     > > > > 9437184 bytes with malloc directly
> > >     > > > > Epoch 0, Batch 199, Speed=426.182733
> > >     > > > > Epoch 0, Duration=134.868458
> > >     > > > > Epoch 0, Training accuracy=0.127238
> > >     > > > > Epoch 0, Validation accuracy=0.206388
> > >     > > > > Epoch 1, Batch 199, Speed=313.127156
> > >     > > > > Epoch 1, Duration=128.041775
> > >     > > > > Epoch 1, Training accuracy=0.182065
> > >     > > > > Epoch 1, Validation accuracy=0.202524
> > >     > > > > Epoch 2, Batch 199, Speed=410.931187
> > >     > > > > Epoch 2, Duration=124.920588
> > >     > > > > Epoch 2, Training accuracy=0.202584
> > >     > > > > Epoch 2, Validation accuracy=0.245693
> > >     > > > > Epoch 3, Batch 199, Speed=419.119335
> > >     > > > > Epoch 3, Duration=120.948349
> > >     > > > > Epoch 3, Training accuracy=0.235854
> > >     > > > > Epoch 3, Validation accuracy=0.291066
> > >     > > > > Epoch 4, Batch 199, Speed=430.473733
> > >     > > > > Epoch 4, Duration=130.181724
> > >     > > > > Epoch 4, Training accuracy=0.257773
> > >     > > > > Epoch 4, Validation accuracy=0.304988
> > >     > > > >
> > >     > > > > real    11m7.356s
> > >     > > > > user    406m9.910s
> > >     > > > > sys     14m18.349s
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > > ImageRecordIOParser2:
> > >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
> use 4
> > >     > > > > threads for decoding..
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > completed
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > >     > > > > ImageRecordIOParser2:
> > >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
> use 4
> > >     > > > > threads for decoding..
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load
> mean
> > > image
> > >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > >     > > > completed
> > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > >     > > > > Epoch 0, Changed learning rate to 0.05
> > >     > > > > Epoch 0, Batch 199, Speed=348.618154
> > >     > > > > Epoch 0, Duration=146.469352
> > >     > > > > Epoch 0, Training accuracy=0.124121
> > >     > > > > Epoch 0, Validation accuracy=0.167227
> > >     > > > > Epoch 1, Batch 199, Speed=452.790825
> > >     > > > > Epoch 1, Duration=130.199421
> > >     > > > > Epoch 1, Training accuracy=0.183863
> > >     > > > > Epoch 1, Validation accuracy=0.237079
> > >     > > > > Epoch 2, Batch 199, Speed=451.406559
> > >     > > > > Epoch 2, Duration=126.320823
> > >     > > > > Epoch 2, Training accuracy=0.214844
> > >     > > > > Epoch 2, Validation accuracy=0.244692
> > >     > > > > Epoch 3, Batch 199, Speed=403.161873
> > >     > > > > Epoch 3, Duration=125.331660
> > >     > > > > Epoch 3, Training accuracy=0.243506
> > >     > > > > Epoch 3, Validation accuracy=0.301182
> > >     > > > > Epoch 4, Batch 199, Speed=450.826598
> > >     > > > > Epoch 4, Duration=126.426253
> > >     > > > > Epoch 4, Training accuracy=0.266424
> > >     > > > > Epoch 4, Validation accuracy=0.311899
> > >     > > > >
> > >     > > > > real    11m21.930s
> > >     > > > > user    415m3.855s
> > >     > > > > sys     13m53.975s
> > >     > > > >
> > >     > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > >     > > > > <pe...@gmail.com> wrote:
> > >     > > > > >
> > >     > > > > > Hi Ciyong, thanks for trying to reproduce:
> > >     > > > > >
> > >     > > > > > I used this one:
> > >     > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > >     > > > > >
> > >     > > > > > Could you provide hardware and OS details?
> > >     > > > > >
> > >     > > > > > I will rerun and repost numbers in a few minutes.
> > >     > > > > >
> > >     > > > > > Pedro.
> > >     > > > > >
> > >     > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> > >     > > > > > <ci...@intel.com>
> > >     > > > wrote:
> > >     > > > > > >
> > >     > > > > > > Hi Pedro,
> > >     > > > > > >
> > >     > > > > > > I'm looking at this case, using the script
> > >     > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > >     > > > > > > to get the timing data, but there doesn't seem to be much
> > >     > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > >     > > > > > >
> > >     > > > > > > I'm not sure whether the Python scripts differ; can you point
> > >     > > > > > > me to the link for your script (cifar10.py)?
> > >     > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
> > >     > > > > > > the performance.
> > >     > > > > > >
> > >     > > > > > > Here's the command I used to collect the time:
> > >     > > > > > >         python train_cifar10.py --num-epoch=5
> > >     > > > > > >
> > >     > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > >     > > > > > >         real    9m4.880s
> > >     > > > > > >         user    333m13.340s
> > >     > > > > > >         sys     14m36.100s
> > >     > > > > > >
> > >     > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > >     > > > > > >         real    9m2.155s
> > >     > > > > > >         user    329m37.092s
> > >     > > > > > >         sys     16m8.668s
> > >     > > > > > >
> > >     > > > > > > -Ciyong
> > >     > > > > > >
> > >     > > > > > >
> > >     > > > > > > -----Original Message-----
> > >     > > > > > > From: Pedro Larroy [mailto:
> pedro.larroy.lists@gmail.com]
> > >     > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > >     > > > > > > To: dev@mxnet.incubator.apache.org
> > >     > > > > > > Cc: dev@mxnet.apache.org
> > >     > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> > > version
> > >     > > > > > > 1.5.0.rc1
> > >     > > > > > >
> > >     > > > > > > Hi these were my build flags and system info:
> > >     > > > > > >
> > >     > > > > > >
> > >     > > > > > > --- # CMake configuration
> > >     > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> > >     > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > >     > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > >     > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> > >     > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> > >     > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one
> could set
> > >     > > > > > > CUDNN_ROOT for search path
> > >     > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support
> IF
> > > NOT
> > >     > > > > > > ARM
> > >     > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction
> support) #
> > >     > > > autodetects support if "ON"
> > >     > > > > > > USE_LAPACK: "ON" # Build with lapack support
> > >     > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > >     > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL
> > > found)
> > >     > > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > >     > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL
> > > found) IF
> > >     > > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > >     > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of
> > > operators IF
> > >     > > > NOT
> > >     > > > > > > MSVC
> > >     > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support
> (if
> > > found)
> > >     > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > >     > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> > >     > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE
> support
> > >     > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > >     > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > >     > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > >     > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> > >     > > > conventions.
> > >     > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > >     > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the
> > > compiler
> > >     > > > > > > supports it
> > >     > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE
> > > (VTune)) #
> > >     > > > > > > one could set VTUNE_ROOT for search path
> > >     > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime
> compilation
> > >     > > > > > > support
> > >     > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > >     > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source
> files.
> > >     > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on
> segfaults.
> > >     > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization
> with
> > >     > > TensorRT.
> > >     > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > >     > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with
> test
> > >     > > > > > > coverage metric output
> > >     > > > > > > CMAKE_BUILD_TYPE: "Release"
> > >     > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > >     > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > >     > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > >     > > > > > >
> > >     > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD,
> tag:
> > >     > > > > > > 1.5.0.rc1,
> > >     > > > > > > upstream/v1.5.x)
> > >     > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD,
> tag:
> > >     > > > > > > 1.4.1.rc0,
> > >     > > > > > > upstream/v1.4.x)
> > >     > > > > > >
> > >     > > > > > > curl
> http://169.254.169.254/latest/meta-data/instance-type
> > >     > > > > > > c5d.18xlarge
> > >     > > > > > >
> > >     > > > > > >
> > >     > > > > > > Version      : 3.6.7
> > >     > > > > > > Compiler     : GCC 8.2.0
> > >     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > >     > > > > > > Arch         : ('64bit', 'ELF')
> > >     > > > > > > ------------Pip Info-----------
> > >     > > > > > > Version      : 19.1.1
> > >     > > > > > > Directory    :
> > > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> > >     > > > packages/pip
> > >     > > > > > > ----------MXNet Info-----------
> > >     > > > > > > Version      : 1.5.0
> > >     > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > >     > > > > > > Hashtag not found. Not installed from pre-built
> package.
> > >     > > > > > > ----------System Info----------
> > >     > > > > > > Platform     :
> > >     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > >     > > > > > > system       : Linux
> > >     > > > > > > node         : ip-172-31-63-171
> > >     > > > > > > release      : 4.15.0-1035-aws
> > >     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC
> 2019
> > >     > > > > > > ----------Hardware Info----------
> > >     > > > > > > machine      : x86_64
> > >     > > > > > > processor    : x86_64
> > >     > > > > > > Architecture:        x86_64
> > >     > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > >     > > > > > > Byte Order:          Little Endian
> > >     > > > > > > CPU(s):              72
> > >     > > > > > > On-line CPU(s) list: 0-71
> > >     > > > > > > Thread(s) per core:  2
> > >     > > > > > > Core(s) per socket:  18
> > >     > > > > > > Socket(s):           2
> > >     > > > > > > NUMA node(s):        2
> > >     > > > > > > Vendor ID:           GenuineIntel
> > >     > > > > > > CPU family:          6
> > >     > > > > > > Model:               85
> > >     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M
> CPU @
> > > 3.00GHz
> > >     > > > > > > Stepping:            4
> > >     > > > > > > CPU MHz:             1326.446
> > >     > > > > > > BogoMIPS:            6000.00
> > >     > > > > > > Hypervisor vendor:   KVM
> > >     > > > > > > Virtualization type: full
> > >     > > > > > > L1d cache:           32K
> > >     > > > > > > L1i cache:           32K
> > >     > > > > > > L2 cache:            1024K
> > >     > > > > > > L3 cache:            25344K
> > >     > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > >     > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > >     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> > > apic sep
> > >     > > mtrr
> > >     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> > > syscall
> > >     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> > > nopl
> > >     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > > monitor
> > >     > > > > > > ssse3 fma cx16 pcid
> > >     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes
> > > xsave
> > >     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > >     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
> smep
> > > bmi2
> > >     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> > > clflushopt
> > >     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> > > xsaves
> > >     > > > > > > ida arat pku ospke ----------Network Test----------
> > >     > > > > > >
> > >     > > > > > > ----------Python Info----------
> > >     > > > > > > Version      : 3.6.7
> > >     > > > > > > Compiler     : GCC 8.2.0
> > >     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > >     > > > > > > Arch         : ('64bit', 'ELF')
> > >     > > > > > > ------------Pip Info-----------
> > >     > > > > > > Version      : 19.1.1
> > >     > > > > > > Directory    :
> > > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> > >     > > > packages/pip
> > >     > > > > > > ----------MXNet Info-----------
> > >     > > > > > > Version      : 1.4.1
> > >     > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > >     > > > > > > Hashtag not found. Not installed from pre-built
> package.
> > >     > > > > > > ----------System Info----------
> > >     > > > > > > Platform     :
> > >     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > >     > > > > > > system       : Linux
> > >     > > > > > > node         : ip-172-31-63-171
> > >     > > > > > > release      : 4.15.0-1035-aws
> > >     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC
> 2019
> > >     > > > > > > ----------Hardware Info----------
> > >     > > > > > > machine      : x86_64
> > >     > > > > > > processor    : x86_64
> > >     > > > > > > Architecture:        x86_64
> > >     > > > > > > CPU op-mode(s):      32-bit, 64-bit
> > >     > > > > > > Byte Order:          Little Endian
> > >     > > > > > > CPU(s):              72
> > >     > > > > > > On-line CPU(s) list: 0-71
> > >     > > > > > > Thread(s) per core:  2
> > >     > > > > > > Core(s) per socket:  18
> > >     > > > > > > Socket(s):           2
> > >     > > > > > > NUMA node(s):        2
> > >     > > > > > > Vendor ID:           GenuineIntel
> > >     > > > > > > CPU family:          6
> > >     > > > > > > Model:               85
> > >     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M
> CPU @
> > > 3.00GHz
> > >     > > > > > > Stepping:            4
> > >     > > > > > > CPU MHz:             1223.344
> > >     > > > > > > BogoMIPS:            6000.00
> > >     > > > > > > Hypervisor vendor:   KVM
> > >     > > > > > > Virtualization type: full
> > >     > > > > > > L1d cache:           32K
> > >     > > > > > > L1i cache:           32K
> > >     > > > > > > L2 cache:            1024K
> > >     > > > > > > L3 cache:            25344K
> > >     > > > > > > NUMA node0 CPU(s):   0-17,36-53
> > >     > > > > > > NUMA node1 CPU(s):   18-35,54-71
> > >     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> > > apic sep
> > >     > > mtrr
> > >     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> > > syscall
> > >     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> > > nopl
> > >     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > > monitor
> > >     > > > > > > ssse3 fma cx16 pcid
> > >     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
> aes
> > > xsave
> > >     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > >     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2
> smep
> > > bmi2
> > >     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> > > clflushopt
> > >     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> > > xsaves
> > >     > > > > > > ida arat pku ospke ----------Network Test----------
> > >     > > > > > >
> > >     > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> > >     > > > <pe...@gmail.com> wrote:
> > >     > > > > > > >
> > >     > > > > > > > I trained cifar10 on CPU and there seems to be a regression
> > >     > > > > > > > of roughly 7% in training time against 1.4.1:
> > >     > > > > > > >
> > >     > > > > > > > (py3_venv)
> > >     > > > > > > > piotr@ip-172-31-63-171
> > > :0:~/deeplearning-benchmark/dawnbench
> > >     > > > > > > > (master)+$ time python cifar10.py --epochs 5
> > >     > > > > > > > real    11m30.388s
> > >     > > > > > > > user    417m7.766s
> > >     > > > > > > > sys     16m57.315s
> > >     > > > > > > >
> > >     > > > > > > > VS 1.4.1:
> > >     > > > > > > > real    10m41.994s
> > >     > > > > > > > user    392m40.646s
> > >     > > > > > > > sys     12m30.601s
> > >     > > > > > > >
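The "7%" figure above can be checked directly from the two real values reported by time. A minimal sketch follows; the function names wall_seconds and relative_increase are hypothetical helpers introduced only for illustration, and the parser assumes the "<minutes>m<seconds>s" layout shown in this thread.

```python
import re

# Layout assumed from the output quoted in this thread, e.g. "11m30.388s".
TIME_VALUE = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def wall_seconds(value):
    """Convert a value such as '11m30.388s' to seconds."""
    match = TIME_VALUE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def relative_increase(baseline, candidate):
    """Percentage by which candidate exceeds baseline (negative if faster)."""
    base = wall_seconds(baseline)
    return (wall_seconds(candidate) - base) / base * 100.0


# real for 1.4.1 (baseline) and for the 1.5 build, as quoted above.
print(f"{relative_increase('10m41.994s', '11m30.388s'):.1f}%")  # prints 7.5%
```

The same helper applies to any other pair of real values reported in this thread, so each rerun can be compared on the same footing.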
> > >     > > > > > > >
> > >     > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> > > royweilai@gmail.com>
> > >     > > > wrote:
> > >     > > > > > > > >
> > >     > > > > > > > > Hi Anirudh,
> > >     > > > > > > > >
> > >     > > > > > > > > Thanks for jumping into this quickly. I followed up on the
> > >     > > > > > > > > issue.
> > >     > > > > > > > >
> > >     > > > > > > > > I meant for the sockeye developers/maintainers to help set
> > >     > > > > > > > > up nightly tests and raise issues early.
> > >     > > > > > > > >
> > >     > > > > > > > > Thanks!
> > >     > > > > > > > >
> > >     > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > >     > > > > > > > > <ha...@gmail.com>
> > >     > > > > > > > > wrote:
> > >     > > > > > > > >
> > >     > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
> > >     > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
> > >     > > > > > > > > > I recommend that other toolkits also add integration
> > >     > > > > > > > > > tests with MXNet nightly.
> > >     > > > > > > > > > It helps identify issues early.
> > >     > > > > > > > > >
> > >     > > > > > > > > > Best,
> > >     > > > > > > > > > Haibin
> > >     > > > > > > > > >
> > >     > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> > >     > > > > > > > > > <pa...@intel.com>
> > >     > > > wrote:
> > >     > > > > > > > > >
> > >     > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > >     > > > > > > > > > >
> > >     > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's
> > >     > > > > > > > > > > hard for MXNet developers to catch potential bugs or
> > >     > > > > > > > > > > performance degradation.
> > >     > > > > > > > > > >
> > >     > > > > > > > > > > In the future, I suggest adding the major downstream
> > >     > > > > > > > > > > test cases, such as those from sockeye, GluonNLP,
> > >     > > > > > > > > > > GluonCV, DGL, and Gluon-TS, to the nightly test.
> > >     > > > > > > > > > > If that's still too heavy, maybe test them weekly or
> > >     > > > > > > > > > > monthly :)
> > >     > > > > > > > > > >
> > >     > > > > > > > > > > Thanks,
> > >     > > > > > > > > > >
> > >     > > > > > > > > > > --Patric
> > >     > > > > > > > > > >
> > >     > > > > > > > > > > > -----Original Message-----
> > >     > > > > > > > > > > > From: Anirudh Subramanian
> > >     > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> > >     > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > >     > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > >     > > > > > > > > > > > Cc: dev@mxnet.apache.org
> > >     > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> > > (incubating)
> > >     > > > > > > > > > > > version
> > >     > > > > > > > > > > > 1.5.0.rc1
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > > Hi Lai,
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > > I have opened an issue:
> > >     > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > >     > > > > > > > > > > > I came to know about this issue only today, and I
> > >     > > > > > > > > > > > have not been monitoring sockeye.
> > >     > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
> > >     > > > > > > > > > > > caused by the dlpack changes.
> > >     > > > > > > > > > > > Also, I don't think the sockeye CI checks against
> > >     > > > > > > > > > > > master; it is using 1.4.1.
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > > Anirudh
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> > >     > > > > > > > > > > > <ro...@gmail.com>
> > >     > > > wrote:
> > >     > > > > > > > > > > >
> > >     > > > > > > > > > > > > Hi,
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > Could you share which test failed and what the
> > >     > > > > > > > > > > > > crash is? How can it be reproduced?
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > I was able to install sockeye, and all tests
> > >     > > > > > > > > > > > > passed using python setup.py test.
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > I have tested both the nightly pip package and
> > >     > > > > > > > > > > > > 1.5.0.rc1.
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > It would be great to create an issue with
> > >     > > > > > > > > > > > > reproducible steps and move the discussion there.
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has been
> > >     > > > > > > > > > > > > failing for some time. If it's due to an MXNet
> > >     > > > > > > > > > > > > change, please raise this early so we can track
> > >     > > > > > > > > > > > > and solve it in time rather than blocking the
> > >     > > > > > > > > > > > > release during vote time.
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> > > Subramanian
> > >     > > > > > > > > > > > > <anirudh2290@gmail.com
> > >     > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > wrote:
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > I was able to reproduce a crash with the commit
> > >     > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > >     > > > > > > > > > > > > > with the commit
> > >     > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > >     > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > Anirudh
> > >     > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > >     > > > > > > > > > > > > > <ro...@gmail.com>
> > >     > > > > > > > > > wrote:
> > >     > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > Hi Przemyslaw,
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > Is there an issue with more details to
> > > track the
> > >     > > problem?
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM
> Przemysław
> > >     > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> > >     > > > > > > > > > > > > > > wrote:
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > -1
> > >     > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > There is a crash in the sockeye unit test
> > >     > > > > > > > > > > > > > > > (python setup.py test), observed starting
> > >     > > > > > > > > > > > > > > > with the nightly 1.5 build from 6/13 and
> > >     > > > > > > > > > > > > > > > still occurring in 1.5rc1. I don't yet have
> > >     > > > > > > > > > > > > > > > the exact commit that is responsible for it,
> > >     > > > > > > > > > > > > > > > but it is either
> > >     > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> > >     > > > > > > > > > > > > > > > (dlpack related) or
> > >     > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > >     > > > > > > > > > > > > > > > (cached op optimization).
> > >     > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> > >     > > > > > > > > > > > > > > > <ro...@gmail.com>
> > >     > > > wrote:
> > >     > > > > > > > > > > > > > > > > Dear MXNet community,
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache
> > >     > > > > > > > > > > > > > > > > MXNet (incubating) version 1.5.0.
> > >     > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> > >     > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
> > >     > > > > > > > > > > > > > > > > 23:59:59.
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > 1) Link to release notes:
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > 2) Link to release candidate:
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > 3) Link to source and signatures on
> > > apache
> > >     > > dist server:
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > Please remember to TEST first
> before
> > > voting
> > >     > > > accordingly:
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > +1 = approve
> > >     > > > > > > > > > > > > > > > > +0 = no opinion
> > >     > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > >     > > > > > > > > > > > > > > > > --
> > >     > > > > > > > > > > > > > > > > Best Regards
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > > > Lai
> > >     > > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > --
> > >     > > > > > > > > > > > > > > Best Regards
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > > > Lai
> > >     > > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > >
> > >     > > > > > > > > > > > > --
> > >     > > > > > > > > > > > > Best Regards
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > > > > Lai
> > >     > > > > > > > > > > > >
> > >     > > > > > > > > > >
> > >     > > > > > > > > >
> > >     > > > > > > > > --
> > >     > > > > > > > > Best Regards
> > >     > > > > > > > >
> > >     > > > > > > > > Lai
> > >     > >
> > >     > --
> > >     > Best Regards
> > >     >
> > >     > Lai
> > >
> > >
> > >
> > >
>
>

Re: FW: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
Thanks Manu. The warm-up is important; also, the first run downloads
a bunch of data, which will affect the measurement. That's a good idea.
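
For what it's worth, a minimal sketch of the warm-up-aware comparison Manu
describes below might look like this. The helper names and the regular
expression are my own assumptions, based on the "Epoch N, Duration=S" lines
that cifar10.py prints; the two averages at the end are the ones from his
message.

```python
import re
from statistics import mean

# Assumed log format: "Epoch <n>, Duration=<seconds>" as printed by cifar10.py.
DURATION_RE = re.compile(r"Epoch (\d+), Duration=([0-9.]+)")


def epoch_durations(log_text):
    """Return {epoch: seconds} parsed from 'Epoch N, Duration=S' entries."""
    return {int(e): float(s) for e, s in DURATION_RE.findall(log_text)}


def mean_after_warmup(durations, warmup):
    """Average the durations of all epochs with index >= warmup."""
    kept = [s for e, s in sorted(durations.items()) if e >= warmup]
    if not kept:
        raise ValueError("no epochs left after warm-up")
    return mean(kept)


def slowdown_pct(baseline, candidate):
    """Relative increase of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Per-epoch averages (seconds) reported for 1.4.1 and 1.5.0.
    print(round(slowdown_pct(164.23, 174.59), 1))  # 6.3
```

Dropping the warm-up epochs before averaging should also hide the one-off
data download in the first run.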

How can I find which commit corresponds to a pip build myself?

Pedro.

On Fri, Jun 28, 2019 at 4:48 PM Manu Seth <ma...@gmail.com> wrote:
>
> I ran the same cifar10.py script as Pedro, but for 20 epochs. Treating
> the first 10 epochs as warm-up, I averaged the time per epoch over the
> last 10 epochs.
>
> With MXNet 1.4.1 average time is 164.23 s
> With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
>
>
> For a second data point, I ran Gluon speed test benchmark script -
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
> using the following command:
> python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
> --num-batches 200 --type 'training'
>
> I got the following speeds:
> With MXNet 1.4.1, average speed is 25.677534 img/s
> With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
>
> Note:
> For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
> For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
> corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
> behind 1.5.x branch by 4 commits
>
>
> Best,
> Manu
>
>
> On 6/27/19, 10:44 AM, "Pedro Larroy" <pe...@gmail.com> wrote:
> >
> >     I will try to run a few benchmarks in a bare metal instance tonight to
> >     remove virtualization variance for the measurements and provide some
> >     numbers.
> >
> >     Please propose a set of models / examples that would be desirable to
> >     run before the release and provide a link to an easy to run script
> >     with instructions so we can validate the release better.
> >
> >     Thank you.
> >
> >     On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> >     >
> >     > Dear @dev,
> >     >
> >     > I'm cancelling the vote for the cached op fix:
> >     >
> >     > https://github.com/apache/incubator-mxnet/pull/15298
> >     >
> >     > As for the possible CPU training regression, it does not look like a
> >     > blocker for now.
> >     >
> >     > I will start a new rc2 vote; please help validate it.
> >     >
> >     > Thanks!
> >     >
> >     >
> >     > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com>
> > wrote:
> >     >
> >     > > Hi Pedro,
> >     > >
> >     > > I was able to reproduce a similar result with your script on
> >     > > C5.18xlarge: v1.5 is ~5.6% slower than v1.4 when using 18 cores.
> >     > > To see the gap, I had to bind the cores with the commands below
> >     > > when running the script; without these env variables, v1.5 and
> >     > > v1.4 were within 1% of each other.
> >     > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> >     > >         export OMP_NUM_THREADS=18
> >     > >
> >     > > Did you set any env variables when running?
> >     > >
> >     > > The performance results I got are below:
> >     > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >     > > real    12m10.856s
> >     > > user    234m49.576s
> >     > > sys     4m38.044s
> >     > >
> >     > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >     > > real    12m52.140s
> >     > > user    246m30.740s
> >     > > sys     5m8.188s
> >     > >
> >     > > Looking at the profiling data, most of the ops have the same perf
> >     > > on v1.4 and v1.5, but some ops such as "_backward_BatchNorm" and
> >     > > "Pooling" are ~1.37x slower on v1.5 than on v1.4.
> >     > > I will do further analysis on these ops.
> >     > >
> >     > > Here's the hardware/OS info from my side:
> >     > > ----------Python Info----------
> >     > > Version      : 3.6.8
> >     > > Compiler     : GCC 7.3.0
> >     > > Build        : ('default', 'Dec 30 2018 01:22:34')
> >     > > Arch         : ('64bit', '')
> >     > > ------------Pip Info-----------
> >     > > Version      : 19.0.3
> >     > > Directory    :
> >     > >
> > /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> >     > > ----------MXNet Info-----------
> >     > > Version      : 1.5.0
> >     > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> >     > > Hashtag not found. Not installed from pre-built package.
> >     > > ----------System Info----------
> >     > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> >     > > system       : Linux
> >     > > node         : ip-172-31-32-129
> >     > > release      : 4.4.0-1085-aws
> >     > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> >     > > ----------Hardware Info----------
> >     > > machine      : x86_64
> >     > > processor    : x86_64
> >     > > Architecture:          x86_64
> >     > > CPU op-mode(s):        32-bit, 64-bit
> >     > > Byte Order:            Little Endian
> >     > > CPU(s):                72
> >     > > On-line CPU(s) list:   0-71
> >     > > Thread(s) per core:    2
> >     > > Core(s) per socket:    18
> >     > > Socket(s):             2
> >     > > NUMA node(s):          2
> >     > > Vendor ID:             GenuineIntel
> >     > > CPU family:            6
> >     > > Model:                 85
> >     > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @
> > 3.00GHz
> >     > > Stepping:              3
> >     > > CPU MHz:               3000.000
> >     > > BogoMIPS:              6000.00
> >     > > Hypervisor vendor:     KVM
> >     > > Virtualization type:   full
> >     > > L1d cache:             32K
> >     > > L1i cache:             32K
> >     > > L2 cache:              1024K
> >     > > L3 cache:              25344K
> >     > > NUMA node0 CPU(s):     0-17,36-53
> >     > > NUMA node1 CPU(s):     18-35,54-71
> >     > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> > mtrr
> >     > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> > pdpe1gb
> >     > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
> > nonstop_tsc
> >     > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16
> > pcid sse4_1
> >     > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> > rdrand
> >     > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> >     > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
> > rdseed adx
> >     > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> >     > > ----------Network Test----------
> >     > >
> >     > >
> >     > > -Ciyong
> >     > >
> >     > >
> >     > > -----Original Message-----
> >     > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> >     > > Sent: Thursday, June 27, 2019 9:55 AM
> >     > > To: dev@mxnet.incubator.apache.org
> >     > > Cc: dev@mxnet.apache.org
> >     > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> > 1.5.0.rc1
> >     > >
> >     > > Could we run more epochs to see the performance difference, or
> >     > > profile the difference between a good and a bad run?
> >     > >
> >     > > > -----Original Message-----
> >     > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >     > > > Sent: Thursday, June 27, 2019 9:35 AM
> >     > > > To: dev@mxnet.incubator.apache.org
> >     > > > Cc: dev@mxnet.apache.org
> >     > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> >     > > > 1.5.0.rc1
> >     > > >
> >     > > > I ran it again and the gap is bigger again; I guess we need to
> >     > > > average out the times across several runs:
> >     > > >
> >     > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >     > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> >     > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > ImageRecordIOParser2:
> >     > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > threads
> >     > > > for decoding..
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > ImageRecordIOParser2:
> >     > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads
> >     > > > for decoding..
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > Epoch 0, Changed learning rate to 0.05
> >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > 147456 bytes with malloc directly
> >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > 589824 bytes with malloc directly
> >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > 2359296 bytes with malloc directly
> >     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > 9437184 bytes with malloc directly
> >     > > > Epoch 0, Batch 199, Speed=384.149839
> >     > > > Epoch 0, Duration=140.919567
> >     > > > Epoch 0, Training accuracy=0.115169
> >     > > > Epoch 0, Validation accuracy=0.141317
> >     > > > Epoch 1, Batch 199, Speed=433.380512
> >     > > > Epoch 1, Duration=119.553233
> >     > > > Epoch 1, Training accuracy=0.170956
> >     > > > Epoch 1, Validation accuracy=0.216146
> >     > > > Epoch 2, Batch 199, Speed=434.864699
> >     > > > Epoch 2, Duration=123.278490
> >     > > > Epoch 2, Training accuracy=0.209455
> >     > > > Epoch 2, Validation accuracy=0.247296
> >     > > > Epoch 3, Batch 199, Speed=433.401854
> >     > > > Epoch 3, Duration=118.327797
> >     > > > Epoch 3, Training accuracy=0.248701
> >     > > > Epoch 3, Validation accuracy=0.302083
> >     > > > Epoch 4, Batch 199, Speed=419.713707
> >     > > > Epoch 4, Duration=126.468409
> >     > > > Epoch 4, Training accuracy=0.260949
> >     > > > Epoch 4, Validation accuracy=0.269030
> >     > > >
> >     > > > real    10m55.796s
> >     > > > user    399m33.567s
> >     > > > sys     13m55.904s
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > ImageRecordIOParser2:
> >     > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > threads
> >     > > > for decoding..
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > ImageRecordIOParser2:
> >     > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads
> >     > > > for decoding..
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> >     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > Epoch 0, Changed learning rate to 0.05
> >     > > > Epoch 0, Batch 199, Speed=419.039188
> >     > > > Epoch 0, Duration=143.934903
> >     > > > Epoch 0, Training accuracy=0.122542
> >     > > > Epoch 0, Validation accuracy=0.164359
> >     > > > Epoch 1, Batch 199, Speed=445.257048
> >     > > > Epoch 1, Duration=135.248399
> >     > > > Epoch 1, Training accuracy=0.178828
> >     > > > Epoch 1, Validation accuracy=0.199419
> >     > > > Epoch 2, Batch 199, Speed=447.115215
> >     > > > Epoch 2, Duration=132.003770
> >     > > > Epoch 2, Training accuracy=0.217808
> >     > > > Epoch 2, Validation accuracy=0.233073
> >     > > > Epoch 3, Batch 199, Speed=441.079477
> >     > > > Epoch 3, Duration=126.543316
> >     > > > Epoch 3, Training accuracy=0.248102
> >     > > > Epoch 3, Validation accuracy=0.293870
> >     > > > Epoch 4, Batch 199, Speed=449.329787
> >     > > > Epoch 4, Duration=138.398325
> >     > > > Epoch 4, Training accuracy=0.270021
> >     > > > Epoch 4, Validation accuracy=0.311498
> >     > > >
> >     > > > real    11m45.329s
> >     > > > user    426m13.908s
> >     > > > sys     16m45.093s
> >     > > >
> >     > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> >     > > > <pe...@gmail.com> wrote:
> >     > > > >
> >     > > > > The difference looks smaller now, more like your numbers. I wonder
> >     > > > > if something happened during the previous benchmark, like a system
> >     > > > > update...
> >     > > > >
> >     > > > >
> >     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> >     > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> >     > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >     > > > > threads for decoding..
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > completed
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >     > > > > threads for decoding..
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > completed
> >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > > 147456 bytes with malloc directly
> >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > > 589824 bytes with malloc directly
> >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > > 2359296 bytes with malloc directly
> >     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> >     > > > > 9437184 bytes with malloc directly
> >     > > > > Epoch 0, Batch 199, Speed=426.182733
> >     > > > > Epoch 0, Duration=134.868458
> >     > > > > Epoch 0, Training accuracy=0.127238
> >     > > > > Epoch 0, Validation accuracy=0.206388
> >     > > > > Epoch 1, Batch 199, Speed=313.127156
> >     > > > > Epoch 1, Duration=128.041775
> >     > > > > Epoch 1, Training accuracy=0.182065
> >     > > > > Epoch 1, Validation accuracy=0.202524
> >     > > > > Epoch 2, Batch 199, Speed=410.931187
> >     > > > > Epoch 2, Duration=124.920588
> >     > > > > Epoch 2, Training accuracy=0.202584
> >     > > > > Epoch 2, Validation accuracy=0.245693
> >     > > > > Epoch 3, Batch 199, Speed=419.119335
> >     > > > > Epoch 3, Duration=120.948349
> >     > > > > Epoch 3, Training accuracy=0.235854
> >     > > > > Epoch 3, Validation accuracy=0.291066
> >     > > > > Epoch 4, Batch 199, Speed=430.473733
> >     > > > > Epoch 4, Duration=130.181724
> >     > > > > Epoch 4, Training accuracy=0.257773
> >     > > > > Epoch 4, Validation accuracy=0.304988
> >     > > > >
> >     > > > > real    11m7.356s
> >     > > > > user    406m9.910s
> >     > > > > sys     14m18.349s
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> >     > > > > threads for decoding..
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > completed
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> >     > > > > ImageRecordIOParser2:
> >     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> >     > > > > threads for decoding..
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> > image
> >     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >     > > > completed
> >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >     > > > > Epoch 0, Changed learning rate to 0.05
> >     > > > > Epoch 0, Batch 199, Speed=348.618154
> >     > > > > Epoch 0, Duration=146.469352
> >     > > > > Epoch 0, Training accuracy=0.124121
> >     > > > > Epoch 0, Validation accuracy=0.167227
> >     > > > > Epoch 1, Batch 199, Speed=452.790825
> >     > > > > Epoch 1, Duration=130.199421
> >     > > > > Epoch 1, Training accuracy=0.183863
> >     > > > > Epoch 1, Validation accuracy=0.237079
> >     > > > > Epoch 2, Batch 199, Speed=451.406559
> >     > > > > Epoch 2, Duration=126.320823
> >     > > > > Epoch 2, Training accuracy=0.214844
> >     > > > > Epoch 2, Validation accuracy=0.244692
> >     > > > > Epoch 3, Batch 199, Speed=403.161873
> >     > > > > Epoch 3, Duration=125.331660
> >     > > > > Epoch 3, Training accuracy=0.243506
> >     > > > > Epoch 3, Validation accuracy=0.301182
> >     > > > > Epoch 4, Batch 199, Speed=450.826598
> >     > > > > Epoch 4, Duration=126.426253
> >     > > > > Epoch 4, Training accuracy=0.266424
> >     > > > > Epoch 4, Validation accuracy=0.311899
> >     > > > >
> >     > > > > real    11m21.930s
> >     > > > > user    415m3.855s
> >     > > > > sys     13m53.975s
> >     > > > >
> >     > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> >     > > > > <pe...@gmail.com> wrote:
> >     > > > > >
> >     > > > > > Hi Ciyong, thanks for trying to reproduce:
> >     > > > > >
> >     > > > > > I used this one:
> >     > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >     > > > > >
> >     > > > > > Could you provide hardware and OS details?
> >     > > > > >
> >     > > > > > I will rerun and repost numbers in a few minutes.
> >     > > > > >
> >     > > > > > Pedro.
> >     > > > > >
> >     > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> >     > > > > > <ci...@intel.com>
> >     > > > wrote:
> >     > > > > > >
> >     > > > > > > Hi Pedro,
> >     > > > > > >
> >     > > > > > > I'm looking at this case, using the script
> >     > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> >     > > > > > > to get the timing data, but there seems to be little
> >     > > > > > > difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on
> >     > > > > > > C5.18xlarge.
> >     > > > > > >
> >     > > > > > > I'm not sure whether the Python scripts differ; can you point
> >     > > > > > > me to the link for your script (cifar10.py)?
> >     > > > > > > Or you can also try MXNet's script (train_cifar10.py) and
> >     > > > > > > check the performance.
> >     > > > > > >
> >     > > > > > > Here's the command I used to collect the time:
> >     > > > > > >         python train_cifar10.py --num-epoch=5
> >     > > > > > >
> >     > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >     > > > > > >         real    9m4.880s
> >     > > > > > >         user    333m13.340s
> >     > > > > > >         sys     14m36.100s
> >     > > > > > >
> >     > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >     > > > > > >         real    9m2.155s
> >     > > > > > >         user    329m37.092s
> >     > > > > > >         sys     16m8.668s
> >     > > > > > >
> >     > > > > > > -Ciyong
> >     > > > > > >
> >     > > > > > >
> >     > > > > > > -----Original Message-----
> >     > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >     > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> >     > > > > > > To: dev@mxnet.incubator.apache.org
> >     > > > > > > Cc: dev@mxnet.apache.org
> >     > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> > version
> >     > > > > > > 1.5.0.rc1
> >     > > > > > >
> >     > > > > > > Hi these were my build flags and system info:
> >     > > > > > >
> >     > > > > > >
> >     > > > > > > --- # CMake configuration
> >     > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> >     > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> >     > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> >     > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> >     > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> >     > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set
> >     > > > > > > CUDNN_ROOT for search path
> >     > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF
> > NOT
> >     > > > > > > ARM
> >     > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) #
> >     > > > autodetects support if "ON"
> >     > > > > > > USE_LAPACK: "ON" # Build with lapack support
> >     > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> >     > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL
> > found)
> >     > > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >     > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL
> > found) IF
> >     > > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >     > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of
> > operators IF
> >     > > > NOT
> >     > > > > > > MSVC
> >     > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if
> > found)
> >     > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> >     > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> >     > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> >     > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> >     > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> >     > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> >     > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> >     > > > conventions.
> >     > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> >     > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the
> > compiler
> >     > > > > > > supports it
> >     > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE
> > (VTune)) #
> >     > > > > > > one could set VTUNE_ROOT for search path
> >     > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation
> >     > > > > > > support
> >     > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> >     > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> >     > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> >     > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with
> >     > > > > > > TensorRT.
> >     > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> >     > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
> >     > > > > > > coverage metric output
> >     > > > > > > CMAKE_BUILD_TYPE: "Release"
> >     > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> >     > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> >     > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> >     > > > > > >
> >     > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> >     > > > > > > 1.5.0.rc1,
> >     > > > > > > upstream/v1.5.x)
> >     > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> >     > > > > > > 1.4.1.rc0,
> >     > > > > > > upstream/v1.4.x)
> >     > > > > > >
> >     > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> >     > > > > > > c5d.18xlarge
> >     > > > > > >
> >     > > > > > >
> >     > > > > > > Version      : 3.6.7
> >     > > > > > > Compiler     : GCC 8.2.0
> >     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >     > > > > > > Arch         : ('64bit', 'ELF')
> >     > > > > > > ------------Pip Info-----------
> >     > > > > > > Version      : 19.1.1
> >     > > > > > > Directory    :
> > /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> >     > > > packages/pip
> >     > > > > > > ----------MXNet Info-----------
> >     > > > > > > Version      : 1.5.0
> >     > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> >     > > > > > > Hashtag not found. Not installed from pre-built package.
> >     > > > > > > ----------System Info----------
> >     > > > > > > Platform     :
> >     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >     > > > > > > system       : Linux
> >     > > > > > > node         : ip-172-31-63-171
> >     > > > > > > release      : 4.15.0-1035-aws
> >     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >     > > > > > > ----------Hardware Info----------
> >     > > > > > > machine      : x86_64
> >     > > > > > > processor    : x86_64
> >     > > > > > > Architecture:        x86_64
> >     > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >     > > > > > > Byte Order:          Little Endian
> >     > > > > > > CPU(s):              72
> >     > > > > > > On-line CPU(s) list: 0-71
> >     > > > > > > Thread(s) per core:  2
> >     > > > > > > Core(s) per socket:  18
> >     > > > > > > Socket(s):           2
> >     > > > > > > NUMA node(s):        2
> >     > > > > > > Vendor ID:           GenuineIntel
> >     > > > > > > CPU family:          6
> >     > > > > > > Model:               85
> >     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @
> > 3.00GHz
> >     > > > > > > Stepping:            4
> >     > > > > > > CPU MHz:             1326.446
> >     > > > > > > BogoMIPS:            6000.00
> >     > > > > > > Hypervisor vendor:   KVM
> >     > > > > > > Virtualization type: full
> >     > > > > > > L1d cache:           32K
> >     > > > > > > L1i cache:           32K
> >     > > > > > > L2 cache:            1024K
> >     > > > > > > L3 cache:            25344K
> >     > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >     > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> > apic sep
> >     > > mtrr
> >     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> > syscall
> >     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> > nopl
> >     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > monitor
> >     > > > > > > ssse3 fma cx16 pcid
> >     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> > xsave
> >     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep
> > bmi2
> >     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> > clflushopt
> >     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> > xsaves
> >     > > > > > > ida arat pku ospke ----------Network Test----------
> >     > > > > > >
> >     > > > > > > ----------Python Info----------
> >     > > > > > > Version      : 3.6.7
> >     > > > > > > Compiler     : GCC 8.2.0
> >     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >     > > > > > > Arch         : ('64bit', 'ELF')
> >     > > > > > > ------------Pip Info-----------
> >     > > > > > > Version      : 19.1.1
> >     > > > > > > Directory    :
> > /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> >     > > > packages/pip
> >     > > > > > > ----------MXNet Info-----------
> >     > > > > > > Version      : 1.4.1
> >     > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> >     > > > > > > Hashtag not found. Not installed from pre-built package.
> >     > > > > > > ----------System Info----------
> >     > > > > > > Platform     :
> >     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >     > > > > > > system       : Linux
> >     > > > > > > node         : ip-172-31-63-171
> >     > > > > > > release      : 4.15.0-1035-aws
> >     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >     > > > > > > ----------Hardware Info----------
> >     > > > > > > machine      : x86_64
> >     > > > > > > processor    : x86_64
> >     > > > > > > Architecture:        x86_64
> >     > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >     > > > > > > Byte Order:          Little Endian
> >     > > > > > > CPU(s):              72
> >     > > > > > > On-line CPU(s) list: 0-71
> >     > > > > > > Thread(s) per core:  2
> >     > > > > > > Core(s) per socket:  18
> >     > > > > > > Socket(s):           2
> >     > > > > > > NUMA node(s):        2
> >     > > > > > > Vendor ID:           GenuineIntel
> >     > > > > > > CPU family:          6
> >     > > > > > > Model:               85
> >     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @
> > 3.00GHz
> >     > > > > > > Stepping:            4
> >     > > > > > > CPU MHz:             1223.344
> >     > > > > > > BogoMIPS:            6000.00
> >     > > > > > > Hypervisor vendor:   KVM
> >     > > > > > > Virtualization type: full
> >     > > > > > > L1d cache:           32K
> >     > > > > > > L1i cache:           32K
> >     > > > > > > L2 cache:            1024K
> >     > > > > > > L3 cache:            25344K
> >     > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >     > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> >     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> >     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> >     > > > > > > ssse3 fma cx16 pcid
> >     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> >     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> >     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> >     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> >     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> >     > > > > > > ida arat pku ospke
> >     > > > > > > ----------Network Test----------
> >     > > > > > >
> >     > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> >     > > > <pe...@gmail.com> wrote:
> >     > > > > > > >
> >     > > > > > > > I trained cifar10 on CPU and there seems to be a regression
> >     > > > > > > > of about 7% in training time against 1.4.1:
> >     > > > > > > >
> >     > > > > > > > (py3_venv)
> >     > > > > > > > piotr@ip-172-31-63-171
> > :0:~/deeplearning-benchmark/dawnbench
> >     > > > > > > > (master)+$ time python cifar10.py --epochs 5
> >     > > > > > > > real    11m30.388s
> >     > > > > > > > user    417m7.766s
> >     > > > > > > > sys     16m57.315s
> >     > > > > > > >
> >     > > > > > > > VS 1.4.1:
> >     > > > > > > > real    10m41.994s
> >     > > > > > > > user    392m40.646s
> >     > > > > > > > sys     12m30.601s
> >     > > > > > > >
> >     > > > > > > >
> >     > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
> > royweilai@gmail.com>
> >     > > > wrote:
> >     > > > > > > > >
> >     > > > > > > > > Hi Anirudh,
> >     > > > > > > > >
> >     > > > > > > > > Thanks for jumping into this quickly, I followed up on the issue.
> >     > > > > > > > >
> >     > > > > > > > > I meant for the sockeye developers/maintainers to help set up
> >     > > > > > > > > nightly tests and raise issues early.
> >     > > > > > > > >
> >     > > > > > > > > Thanks!
> >     > > > > > > > >
> >     > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> >     > > > > > > > > <ha...@gmail.com>
> >     > > > > > > > > wrote:
> >     > > > > > > > >
> >     > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
> >     > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
> >     > > > > > > > > > I recommend other toolkits also add integration tests with
> >     > > > > > > > > > MXNet nightly.
> >     > > > > > > > > > It helps identify issues early.
> >     > > > > > > > > >
> >     > > > > > > > > > Best,
> >     > > > > > > > > > Haibin
> >     > > > > > > > > >
> >     > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> >     > > > > > > > > > <pa...@intel.com>
> >     > > > wrote:
> >     > > > > > > > > >
> >     > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> >     > > > > > > > > > >
> >     > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
> >     > > > > > > > > > > for MXNet developers to catch potential bugs or performance
> >     > > > > > > > > > > degradation.
> >     > > > > > > > > > >
> >     > > > > > > > > > > In the future, I suggest adding the major downstream
> >     > > > > > > > > > > test cases, like those from sockeye, GluonNLP, GluonCV, DGL,
> >     > > > > > > > > > > and Gluon-TS, to the nightly test.
> >     > > > > > > > > > > If that's still too heavy, maybe test them weekly or
> >     > > > > > > > > > > monthly :)
> >     > > > > > > > > > >
> >     > > > > > > > > > > Thanks,
> >     > > > > > > > > > >
> >     > > > > > > > > > > --Patric
> >     > > > > > > > > > >
> >     > > > > > > > > > > > -----Original Message-----
> >     > > > > > > > > > > > From: Anirudh Subramanian
> >     > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> >     > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> >     > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> >     > > > > > > > > > > > Cc: dev@mxnet.apache.org
> >     > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> > (incubating)
> >     > > > > > > > > > > > version
> >     > > > > > > > > > > > 1.5.0.rc1
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > Hi Lai,
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > I have opened an issue:
> >     > > > > > > > > > > >
> > https://github.com/apache/incubator-mxnet/issues/15297
> >     > > > > > > > > > > > I came to know about this issue only today and I
> > have
> >     > > > > > > > > > > > not been
> >     > > > > > > > > > monitoring
> >     > > > > > > > > > > > sockeye.
> >     > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
> > caused
> >     > > > > > > > > > > > by the dlpack
> >     > > > > > > > > > > changes.
> >     > > > > > > > > > > > Also, I don't  think sockeye CI checks against
> > master,
> >     > > > > > > > > > > > it is using
> >     > > > > > > > > > 1.4.1.
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > Anirudh
> >     > > > > > > > > > > >
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> >     > > > > > > > > > > > <ro...@gmail.com>
> >     > > > wrote:
> >     > > > > > > > > > > >
> >     > > > > > > > > > > > > Hi,
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > Could you share which test failed and what’s
> > the
> >     > > > > > > > > > > > > crash? How to reproduce it?
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > I was able to install sockeye, and all tests passed
> >     > > > > > > > > > > > > using python setup.py test.
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > I have tested both nightly pip package and
> > 1.5.0.rc1
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > It would be great to create an issue with
> >     > > > > > > > > > > > > reproducible steps and move the discussion
> > there.
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > Also I see sockeye nightly build[1] has been
> > failing
> >     > > > > > > > > > > > > for some time,
> >     > > > > > > > > > if
> >     > > > > > > > > > > > > it’s due to MXNet change, please raise this
> > early so
> >     > > > > > > > > > > > > we can track and solve it in time rather than
> > block
> >     > > > > > > > > > > > > the release
> >     > > > during vote time.
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> > Subramanian
> >     > > > > > > > > > > > > <anirudh2290@gmail.com
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > wrote:
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > > I was able to reproduce a crash with the
> > commit
> >     > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but
> > not
> >     > > > > > > > > > > > > > with the commit
> >     > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > Anirudh
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> >     > > > > > > > > > > > > > <ro...@gmail.com>
> >     > > > > > > > > > wrote:
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > Hi Przemyslaw,
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > Is there an issue with more details to
> > track the
> >     > > problem?
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> >     > > > > > > > > > > > > > > Trędak <pt...@apache.org>
> >     > > > > > > > > > > > > > > wrote:
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > -1
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests (python setup.py test),
> >     > > > > > > > > > > > > > > > observed starting with the nightly 1.5 build from 6/13 and still
> >     > > > > > > > > > > > > > > > occurring in 1.5rc1. I don't yet have the exact commit that is
> >     > > > > > > > > > > > > > > > responsible for it, but it is either
> >     > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> >     > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> >     > > > > > > > > > > > > > > > <ro...@gmail.com>
> >     > > > wrote:
> >     > > > > > > > > > > > > > > > > Dear MXNet community,
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> >     > > > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST) and close on June 22,
> >     > > > > > > > > > > > > > > > > 23:59:59.
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > 1) Link to release notes:
> >     > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > 2) Link to release candidate:
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > Please remember to TEST first before
> > voting
> >     > > > accordingly:
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > +1 = approve
> >     > > > > > > > > > > > > > > > > +0 = no opinion
> >     > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> >     > > > > > > > > > > > > > > > > --
> >     > > > > > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > --
> >     > > > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > > > >
> >     > > > > > > > > > > > > >
> >     > > > > > > > > > > > > --
> >     > > > > > > > > > > > > Best Regards
> >     > > > > > > > > > > > >
> >     > > > > > > > > > > > > Lai
> >     > > > > > > > > > > > >
> >     > > > > > > > > > >
> >     > > > > > > > > >
> >     > > > > > > > > --
> >     > > > > > > > > Best Regards
> >     > > > > > > > >
> >     > > > > > > > > Lai
> >     > >
> >     > --
> >     > Best Regards
> >     >
> >     > Lai
> >
> >
> >
> >


Re: FW: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Manu Seth <ma...@gmail.com>.
I ran the same cifar10.py script as Pedro, but for 20 epochs. Treating the
first 10 epochs as warm-up, I averaged the time per epoch over the last 10
epochs.

With MXNet 1.4.1 average time is 164.23 s
With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
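
For anyone who wants to repeat this kind of averaging, here is a minimal sketch in Python. It assumes the training log prints lines of the form "Epoch N, Duration=X", as in the cifar10.py output quoted further down in this thread. The function name and the sample log are only for illustration; the sample durations are copied from Pedro's quoted 5-epoch 1.4 run, not from my 20-epoch runs.

```python
import re

# Matches log lines such as "Epoch 3, Duration=118.327797".
# The format is an assumption based on the cifar10.py logs quoted in this thread.
DURATION_RE = re.compile(r"Epoch\s+(\d+),\s+Duration=([0-9.]+)")


def mean_epoch_duration(log_text, warmup_epochs):
    """Average the per-epoch duration, skipping the first `warmup_epochs` epochs."""
    durations = {}
    for match in DURATION_RE.finditer(log_text):
        durations[int(match.group(1))] = float(match.group(2))
    measured = [d for epoch, d in sorted(durations.items()) if epoch >= warmup_epochs]
    if not measured:
        raise ValueError("no epochs left after discarding warm-up")
    return sum(measured) / len(measured)


# Illustrative sample: durations from the quoted 5-epoch run with MXNet 1.4.
SAMPLE_LOG = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""

if __name__ == "__main__":
    # Treat epoch 0 as warm-up and average the remaining four epochs.
    print("mean epoch time: %.2f s" % mean_epoch_duration(SAMPLE_LOG, warmup_epochs=1))
```

Because the pattern is applied with finditer over the whole text, it should also work on logs where several "Epoch" entries were flowed onto one line by a mail client.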


For a second data point, I ran the Gluon speed test benchmark script
https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
using the following command:
python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128 --num-batches 200 --type 'training'

I got the following speeds:
With MXNet 1.4.1, average speed is 25.677534 img/s
With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
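
The two percentages in this mail are computed in opposite directions (a higher time is worse, a higher speed is better), so here is a small sketch of the arithmetic. The helper names are mine, chosen for illustration; the inputs are the averages reported above.

```python
def slowdown_from_time(baseline_s, candidate_s):
    """Percent regression when the metric is a duration (larger is worse)."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


def slowdown_from_throughput(baseline_rate, candidate_rate):
    """Percent regression when the metric is a rate such as img/s (larger is better)."""
    return (baseline_rate - candidate_rate) / baseline_rate * 100.0


if __name__ == "__main__":
    # cifar10.py: average seconds per epoch, 1.4.1 vs 1.5.0
    print("epoch time regression: %.1f%%" % slowdown_from_time(164.23, 174.59))
    # benchmark_gluon.py: average img/s, 1.4.1 vs 1.5.0
    print("throughput regression: %.1f%%" % slowdown_from_throughput(25.677534, 25.082130))
```

Both are relative to the 1.4.1 baseline, so the figures are comparable in meaning even though one is a time and the other a rate.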

Note:
For the 1.4.1 version, I used pip install mxnet-mkl==1.4.1
For the 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619, which
corresponds to commit ccbbf6b4b76ea536a6583c99497c83b65a20817b, which is
behind the 1.5.x branch by 4 commits


Best,
Manu


On 6/27/19, 10:44 AM, "Pedro Larroy" <pe...@gmail.com> wrote:
>
>     I will try to run a few benchmarks in a bare metal instance tonight to
>     remove virtualization variance for the measurements and provide some
>     numbers.
>
>     Please propose a set of models / examples that would be desirable to
>     run before the release and provide a link to an easy to run script
>     with instructions so we can validate the release better.
>
>     Thank you.
>
>     On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
>     >
>     > Dear @dev,
>     >
>     > I'm cancelling the vote for the cached op fix:
>     >
>     > https://github.com/apache/incubator-mxnet/pull/15298
>     >
>     > As for the possible CPU training regression, it does not look like a
>     > blocker for now.
>     >
>     > I will start a new rc2 vote, please help to validate.
>     >
>     > Thanks!
>     >
>     >
>     > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com>
> wrote:
>     >
>     > > Hi Pedro,
>     > >
>     > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
>     > > v1.4; I was using 18 cores for computing) with your script on C5.18xlarge.
>     > > However, the cores need to be bound with the commands below when running the
>     > > script (without setting the env variables, the v1.5 and v1.4 times were
>     > > within 1% of each other):
>     > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     > >         export OMP_NUM_THREADS=18
>     > >
>     > > Did you set any env variables during running?
>     > >
>     > > The performance results I got are below:
>     > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > real    12m10.856s
>     > > user    234m49.576s
>     > > sys     4m38.044s
>     > >
>     > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > real    12m52.140s
>     > > user    246m30.740s
>     > > sys     5m8.188s
>     > >
>     > > Looking at the profiling data, most of the ops have the same perf between
>     > > v1.4 and v1.5, but some ops like "_backward_BatchNorm" and "Pooling" are
>     > > ~1.37x slower on v1.5 compared with v1.4.
>     > > I will do further analysis on these ops.
>     > >
>     > > Here's the hardware/OS info from my side:
>     > > ----------Python Info----------
>     > > Version      : 3.6.8
>     > > Compiler     : GCC 7.3.0
>     > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     > > Arch         : ('64bit', '')
>     > > ------------Pip Info-----------
>     > > Version      : 19.0.3
>     > > Directory    :
>     > >
> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     > > ----------MXNet Info-----------
>     > > Version      : 1.5.0
>     > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     > > Hashtag not found. Not installed from pre-built package.
>     > > ----------System Info----------
>     > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     > > system       : Linux
>     > > node         : ip-172-31-32-129
>     > > release      : 4.4.0-1085-aws
>     > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
>     > > ----------Hardware Info----------
>     > > machine      : x86_64
>     > > processor    : x86_64
>     > > Architecture:          x86_64
>     > > CPU op-mode(s):        32-bit, 64-bit
>     > > Byte Order:            Little Endian
>     > > CPU(s):                72
>     > > On-line CPU(s) list:   0-71
>     > > Thread(s) per core:    2
>     > > Core(s) per socket:    18
>     > > Socket(s):             2
>     > > NUMA node(s):          2
>     > > Vendor ID:             GenuineIntel
>     > > CPU family:            6
>     > > Model:                 85
>     > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @
> 3.00GHz
>     > > Stepping:              3
>     > > CPU MHz:               3000.000
>     > > BogoMIPS:              6000.00
>     > > Hypervisor vendor:     KVM
>     > > Virtualization type:   full
>     > > L1d cache:             32K
>     > > L1i cache:             32K
>     > > L2 cache:              1024K
>     > > L3 cache:              25344K
>     > > NUMA node0 CPU(s):     0-17,36-53
>     > > NUMA node1 CPU(s):     18-35,54-71
>     > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr
>     > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> pdpe1gb
>     > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
> nonstop_tsc
>     > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16
> pcid sse4_1
>     > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> rdrand
>     > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
>     > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
> rdseed adx
>     > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
>     > > ----------Network Test----------
>     > >
>     > >
>     > > -Ciyong
>     > >
>     > >
>     > > -----Original Message-----
>     > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
>     > > Sent: Thursday, June 27, 2019 9:55 AM
>     > > To: dev@mxnet.incubator.apache.org
>     > > Cc: dev@mxnet.apache.org
>     > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> 1.5.0.rc1
>     > >
>     > > Could we run more epochs to see the performance difference, or profile
>     > > the difference between the good and bad runs?
>     > >
>     > > > -----Original Message-----
>     > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>     > > > Sent: Thursday, June 27, 2019 9:35 AM
>     > > > To: dev@mxnet.incubator.apache.org
>     > > > Cc: dev@mxnet.apache.org
>     > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
>     > > > 1.5.0.rc1
>     > > >
>     > > > I ran it again and the gap is bigger again; I guess we need to average
>     > > > the times across several runs:
>     > > >
>     > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
> --epochs 5
>     > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > ImageRecordIOParser2:
>     > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> threads
>     > > > for decoding..
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > ImageRecordIOParser2:
>     > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > for decoding..
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > Epoch 0, Changed learning rate to 0.05
>     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > 147456 bytes with malloc directly
>     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > 589824 bytes with malloc directly
>     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > 2359296 bytes with malloc directly
>     > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > 9437184 bytes with malloc directly
>     > > > Epoch 0, Batch 199, Speed=384.149839
>     > > > Epoch 0, Duration=140.919567
>     > > > Epoch 0, Training accuracy=0.115169
>     > > > Epoch 0, Validation accuracy=0.141317
>     > > > Epoch 1, Batch 199, Speed=433.380512
>     > > > Epoch 1, Duration=119.553233
>     > > > Epoch 1, Training accuracy=0.170956
>     > > > Epoch 1, Validation accuracy=0.216146
>     > > > Epoch 2, Batch 199, Speed=434.864699
>     > > > Epoch 2, Duration=123.278490
>     > > > Epoch 2, Training accuracy=0.209455
>     > > > Epoch 2, Validation accuracy=0.247296
>     > > > Epoch 3, Batch 199, Speed=433.401854
>     > > > Epoch 3, Duration=118.327797
>     > > > Epoch 3, Training accuracy=0.248701
>     > > > Epoch 3, Validation accuracy=0.302083
>     > > > Epoch 4, Batch 199, Speed=419.713707
>     > > > Epoch 4, Duration=126.468409
>     > > > Epoch 4, Training accuracy=0.260949
>     > > > Epoch 4, Validation accuracy=0.269030
>     > > >
>     > > > real    10m55.796s
>     > > > user    399m33.567s
>     > > > sys     13m55.904s
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > ImageRecordIOParser2:
>     > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> threads
>     > > > for decoding..
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > ImageRecordIOParser2:
>     > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > for decoding..
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > Epoch 0, Changed learning rate to 0.05
>     > > > Epoch 0, Batch 199, Speed=419.039188
>     > > > Epoch 0, Duration=143.934903
>     > > > Epoch 0, Training accuracy=0.122542
>     > > > Epoch 0, Validation accuracy=0.164359
>     > > > Epoch 1, Batch 199, Speed=445.257048
>     > > > Epoch 1, Duration=135.248399
>     > > > Epoch 1, Training accuracy=0.178828
>     > > > Epoch 1, Validation accuracy=0.199419
>     > > > Epoch 2, Batch 199, Speed=447.115215
>     > > > Epoch 2, Duration=132.003770
>     > > > Epoch 2, Training accuracy=0.217808
>     > > > Epoch 2, Validation accuracy=0.233073
>     > > > Epoch 3, Batch 199, Speed=441.079477
>     > > > Epoch 3, Duration=126.543316
>     > > > Epoch 3, Training accuracy=0.248102
>     > > > Epoch 3, Validation accuracy=0.293870
>     > > > Epoch 4, Batch 199, Speed=449.329787
>     > > > Epoch 4, Duration=138.398325
>     > > > Epoch 4, Training accuracy=0.270021
>     > > > Epoch 4, Validation accuracy=0.311498
>     > > >
>     > > > real    11m45.329s
>     > > > user    426m13.908s
>     > > > sys     16m45.093s
>     > > >
>     > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     > > > <pe...@gmail.com> wrote:
>     > > > >
>     > > > > The difference looks smaller now, more like your numbers. I wonder
>     > > > > if something happened during the previous benchmark, like a system
>     > > > > update...
>     > > > >
>     > > > >
>     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > (master)+$
>     > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 &&
> time
>     > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> [22:49:41]
>     > > > > ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > threads for decoding..
>     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > completed
>     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > threads for decoding..
>     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 147456 bytes with malloc directly
>     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 589824 bytes with malloc directly
>     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 2359296 bytes with malloc directly
>     > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 9437184 bytes with malloc directly
>     > > > > Epoch 0, Batch 199, Speed=426.182733
>     > > > > Epoch 0, Duration=134.868458
>     > > > > Epoch 0, Training accuracy=0.127238
>     > > > > Epoch 0, Validation accuracy=0.206388
>     > > > > Epoch 1, Batch 199, Speed=313.127156
>     > > > > Epoch 1, Duration=128.041775
>     > > > > Epoch 1, Training accuracy=0.182065
>     > > > > Epoch 1, Validation accuracy=0.202524
>     > > > > Epoch 2, Batch 199, Speed=410.931187
>     > > > > Epoch 2, Duration=124.920588
>     > > > > Epoch 2, Training accuracy=0.202584
>     > > > > Epoch 2, Validation accuracy=0.245693
>     > > > > Epoch 3, Batch 199, Speed=419.119335
>     > > > > Epoch 3, Duration=120.948349
>     > > > > Epoch 3, Training accuracy=0.235854
>     > > > > Epoch 3, Validation accuracy=0.291066
>     > > > > Epoch 4, Batch 199, Speed=430.473733
>     > > > > Epoch 4, Duration=130.181724
>     > > > > Epoch 4, Training accuracy=0.257773
>     > > > > Epoch 4, Validation accuracy=0.304988
>     > > > >
>     > > > > real    11m7.356s
>     > > > > user    406m9.910s
>     > > > > sys     14m18.349s
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > threads for decoding..
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > completed
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > threads for decoding..
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > Epoch 0, Batch 199, Speed=348.618154
>     > > > > Epoch 0, Duration=146.469352
>     > > > > Epoch 0, Training accuracy=0.124121
>     > > > > Epoch 0, Validation accuracy=0.167227
>     > > > > Epoch 1, Batch 199, Speed=452.790825
>     > > > > Epoch 1, Duration=130.199421
>     > > > > Epoch 1, Training accuracy=0.183863
>     > > > > Epoch 1, Validation accuracy=0.237079
>     > > > > Epoch 2, Batch 199, Speed=451.406559
>     > > > > Epoch 2, Duration=126.320823
>     > > > > Epoch 2, Training accuracy=0.214844
>     > > > > Epoch 2, Validation accuracy=0.244692
>     > > > > Epoch 3, Batch 199, Speed=403.161873
>     > > > > Epoch 3, Duration=125.331660
>     > > > > Epoch 3, Training accuracy=0.243506
>     > > > > Epoch 3, Validation accuracy=0.301182
>     > > > > Epoch 4, Batch 199, Speed=450.826598
>     > > > > Epoch 4, Duration=126.426253
>     > > > > Epoch 4, Training accuracy=0.266424
>     > > > > Epoch 4, Validation accuracy=0.311899
>     > > > >
>     > > > > real    11m21.930s
>     > > > > user    415m3.855s
>     > > > > sys     13m53.975s
>     > > > >
>     > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     > > > > <pe...@gmail.com> wrote:
>     > > > > >
>     > > > > > Hi Ciyong, thanks for trying to reproduce:
>     > > > > >
>     > > > > > I used this one:
>     > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     > > > > >
>     > > > > > Could you provide hardware and OS details?
>     > > > > >
>     > > > > > I will rerun and repost numbers in a few minutes.
>     > > > > >
>     > > > > > Pedro.
>     > > > > >
>     > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     > > > > > <ci...@intel.com>
>     > > > wrote:
>     > > > > > >
>     > > > > > > Hi Pedro,
>     > > > > > >
>     > > > > > > I'm looking at this case, using the script
>     > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     > > > > > > to get the timing data, but there doesn't seem to be much
>     > > > > > > difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
>     > > > > > >
>     > > > > > > I'm not sure whether there's any difference in the Python script.
>     > > > > > > Can you point me to the link for your script (cifar10.py)?
>     > > > > > > Or you could also try MXNet's script (train_cifar10.py) and
>     > > > > > > see the performance.
>     > > > > > >
>     > > > > > > Here's the command I used to collect the time:
>     > > > > > >         python train_cifar10.py --num-epoch=5
>     > > > > > >
>     > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > > > > >         real    9m4.880s
>     > > > > > >         user    333m13.340s
>     > > > > > >         sys     14m36.100s
>     > > > > > >
>     > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > > > > >         real    9m2.155s
>     > > > > > >         user    329m37.092s
>     > > > > > >         sys     16m8.668s
>     > > > > > >
>     > > > > > > -Ciyong
>     > > > > > >
>     > > > > > >
>     > > > > > > -----Original Message-----
>     > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
>     > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > Cc: dev@mxnet.apache.org
>     > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> version
>     > > > > > > 1.5.0.rc1
>     > > > > > >
>     > > > > > > Hi these were my build flags and system info:
>     > > > > > >
>     > > > > > >
>     > > > > > > --- # CMake configuration
>     > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>     > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set
>     > > > > > > CUDNN_ROOT for search path
>     > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF
> NOT
>     > > > > > > ARM
>     > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) #
>     > > > autodetects support if "ON"
>     > > > > > > USE_LAPACK: "ON" # Build with lapack support
>     > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL
> found)
>     > > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL
> found) IF
>     > > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of
> operators IF
>     > > > NOT
>     > > > > > > MSVC
>     > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if
> found)
>     > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>     > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>     > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>     > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>     > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>     > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
>     > > > conventions.
>     > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the
> compiler
>     > > > > > > supports it
>     > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE
> (VTune)) #
>     > > > > > > one could set VTUNE_ROOT for search path
>     > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation
>     > > > > > > support
>     > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>     > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>     > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with
>     > > TensorRT.
>     > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>     > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
>     > > > > > > coverage metric output
>     > > > > > > CMAKE_BUILD_TYPE: "Release"
>     > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>     > > > > > >
>     > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
>     > > > > > > 1.5.0.rc1,
>     > > > > > > upstream/v1.5.x)
>     > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
>     > > > > > > 1.4.1.rc0,
>     > > > > > > upstream/v1.4.x)
>     > > > > > >
>     > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     > > > > > > c5d.18xlarge
>     > > > > > >
>     > > > > > >
>     > > > > > > Version      : 3.6.7
>     > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > ------------Pip Info-----------
>     > > > > > > Version      : 19.1.1
>     > > > > > > Directory    :
> /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
>     > > > packages/pip
>     > > > > > > ----------MXNet Info-----------
>     > > > > > > Version      : 1.5.0
>     > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>     > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > ----------System Info----------
>     > > > > > > Platform     :
>     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > system       : Linux
>     > > > > > > node         : ip-172-31-63-171
>     > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > ----------Hardware Info----------
>     > > > > > > machine      : x86_64
>     > > > > > > processor    : x86_64
>     > > > > > > Architecture:        x86_64
>     > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > Byte Order:          Little Endian
>     > > > > > > CPU(s):              72
>     > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > Thread(s) per core:  2
>     > > > > > > Core(s) per socket:  18
>     > > > > > > Socket(s):           2
>     > > > > > > NUMA node(s):        2
>     > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > CPU family:          6
>     > > > > > > Model:               85
>     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @
> 3.00GHz
>     > > > > > > Stepping:            4
>     > > > > > > CPU MHz:             1326.446
>     > > > > > > BogoMIPS:            6000.00
>     > > > > > > Hypervisor vendor:   KVM
>     > > > > > > Virtualization type: full
>     > > > > > > L1d cache:           32K
>     > > > > > > L1i cache:           32K
>     > > > > > > L2 cache:            1024K
>     > > > > > > L3 cache:            25344K
>     > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> apic sep
>     > > mtrr
>     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> syscall
>     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> nopl
>     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> monitor
>     > > > > > > ssse3 fma cx16 pcid
>     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> xsave
>     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep
> bmi2
>     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> clflushopt
>     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> xsaves
>     > > > > > > ida arat pku ospke ----------Network Test----------
>     > > > > > >
>     > > > > > > ----------Python Info----------
>     > > > > > > Version      : 3.6.7
>     > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > ------------Pip Info-----------
>     > > > > > > Version      : 19.1.1
>     > > > > > > Directory    :
> /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
>     > > > packages/pip
>     > > > > > > ----------MXNet Info-----------
>     > > > > > > Version      : 1.4.1
>     > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>     > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > ----------System Info----------
>     > > > > > > Platform     :
>     > > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > system       : Linux
>     > > > > > > node         : ip-172-31-63-171
>     > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > ----------Hardware Info----------
>     > > > > > > machine      : x86_64
>     > > > > > > processor    : x86_64
>     > > > > > > Architecture:        x86_64
>     > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > Byte Order:          Little Endian
>     > > > > > > CPU(s):              72
>     > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > Thread(s) per core:  2
>     > > > > > > Core(s) per socket:  18
>     > > > > > > Socket(s):           2
>     > > > > > > NUMA node(s):        2
>     > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > CPU family:          6
>     > > > > > > Model:               85
>     > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @
> 3.00GHz
>     > > > > > > Stepping:            4
>     > > > > > > CPU MHz:             1223.344
>     > > > > > > BogoMIPS:            6000.00
>     > > > > > > Hypervisor vendor:   KVM
>     > > > > > > Virtualization type: full
>     > > > > > > L1d cache:           32K
>     > > > > > > L1i cache:           32K
>     > > > > > > L2 cache:            1024K
>     > > > > > > L3 cache:            25344K
>     > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8
> apic sep
>     > > mtrr
>     > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> syscall
>     > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good
> nopl
>     > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> monitor
>     > > > > > > ssse3 fma cx16 pcid
>     > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
> xsave
>     > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>     > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep
> bmi2
>     > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> clflushopt
>     > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> xsaves
>     > > > > > > ida arat pku ospke ----------Network Test----------
>     > > > > > >
>     > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>     > > > <pe...@gmail.com> wrote:
>     > > > > > > >
>     > > > > > > > I trained cifar10 on CPU, and there seems to be a regression
>     > > > > > > > of roughly 7% in training time against 1.4.1:
>     > > > > > > >
>     > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > > > > > (master)+$ time python cifar10.py --epochs 5
>     > > > > > > > real    11m30.388s
>     > > > > > > > user    417m7.766s
>     > > > > > > > sys     16m57.315s
>     > > > > > > >
>     > > > > > > > VS 1.4.1:
>     > > > > > > > real    10m41.994s
>     > > > > > > > user    392m40.646s
>     > > > > > > > sys     12m30.601s
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <royweilai@gmail.com> wrote:
>     > > > > > > > >
>     > > > > > > > > Hi Anirudh,
>     > > > > > > > >
>     > > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
>     > > > > > > > >
>     > > > > > > > > I meant for sockeye developers/maintainers to help set up
>     > > > > > > > > nightly tests and raise issues early.
>     > > > > > > > >
>     > > > > > > > > Thanks!
>     > > > > > > > >
>     > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>     > > > > > > > > <ha...@gmail.com>
>     > > > > > > > > wrote:
>     > > > > > > > >
>     > > > > > > > > > In GluonNLP we test with the MXNet nightly build for
>     > > > > > > > > > each PR, and the CI did catch some MXNet-related issues.
>     > > > > > > > > > I recommend other toolkits also add integration tests with
>     > > > > > > > > > MXNet nightly. It helps identify issues early.
>     > > > > > > > > >
>     > > > > > > > > > Best,
>     > > > > > > > > > Haibin
>     > > > > > > > > >
>     > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>     > > > > > > > > > <pa...@intel.com>
>     > > > wrote:
>     > > > > > > > > >
>     > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
>     > > > > > > > > > >
>     > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
>     > > > > > > > > > > for MXNet developers to catch potential bugs or
>     > > > > > > > > > > performance degradation.
>     > > > > > > > > > >
>     > > > > > > > > > > In the future, I suggest adding the major downstream
>     > > > > > > > > > > test cases, like those from sockeye, GluonNLP, GluonCV,
>     > > > > > > > > > > DGL and Gluon-TS, into the nightly test.
>     > > > > > > > > > > If that's still too heavy, maybe test them weekly or
>     > > > > > > > > > > monthly :)
>     > > > > > > > > > >
>     > > > > > > > > > > Thanks,
>     > > > > > > > > > >
>     > > > > > > > > > > --Patric
>     > > > > > > > > > >
>     > > > > > > > > > > > -----Original Message-----
>     > > > > > > > > > > > From: Anirudh Subramanian
>     > > > > > > > > > > > [mailto:anirudh2290@gmail.com]
>     > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>     > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > > > > > Cc: dev@mxnet.apache.org
>     > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
>     > > > > > > > > > > > version
>     > > > > > > > > > > > 1.5.0.rc1
>     > > > > > > > > > > >
>     > > > > > > > > > > > Hi Lai,
>     > > > > > > > > > > >
>     > > > > > > > > > > > I have opened an issue:
>     > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>     > > > > > > > > > > > I came to know about this issue only today and I have
>     > > > > > > > > > > > not been monitoring sockeye.
>     > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
>     > > > > > > > > > > > by the dlpack changes.
>     > > > > > > > > > > > Also, I don't think sockeye CI checks against master;
>     > > > > > > > > > > > it is using 1.4.1.
>     > > > > > > > > > > >
>     > > > > > > > > > > > Anirudh
>     > > > > > > > > > > >
>     > > > > > > > > > > >
>     > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>     > > > > > > > > > > > <ro...@gmail.com>
>     > > > wrote:
>     > > > > > > > > > > >
>     > > > > > > > > > > > > Hi,
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Could you share which test failed and what the
>     > > > > > > > > > > > > crash is? How can it be reproduced?
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > I was able to install sockeye, and all tests passed
>     > > > > > > > > > > > > using python setup.py test
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > It would be great to create an issue with
>     > > > > > > > > > > > > reproducible steps and move the discussion there.
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has been
>     > > > > > > > > > > > > failing for some time. If it's due to an MXNet change,
>     > > > > > > > > > > > > please raise this early so we can track and solve it
>     > > > > > > > > > > > > in time rather than block the release during vote time.
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     > > > > > > > > > > > >
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh
> Subramanian
>     > > > > > > > > > > > > <anirudh2290@gmail.com
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > wrote:
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > > I was able to reproduce a crash with the commit
>     > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
>     > > > > > > > > > > > > > with the commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Anirudh
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
>     > > > > > > > > > > > > > <ro...@gmail.com>
>     > > > > > > > > > wrote:
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > Hi Przemyslaw,
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > Is there an issue with more details to
> track the
>     > > problem?
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
>     > > > > > > > > > > > > > > Trędak <pt...@apache.org>
>     > > > > > > > > > > > > > > wrote:
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > -1
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > There is a crash in the sockeye unit test (python
>     > > > > > > > > > > > > > > > setup.py test), observed starting with the nightly
>     > > > > > > > > > > > > > > > 1.5 build from 6/13 and still occurring in 1.5rc1.
>     > > > > > > > > > > > > > > > I don't yet have the exact commit that is
>     > > > > > > > > > > > > > > > responsible for it, but it is either
>     > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
>     > > > > > > > > > > > > > > > (dlpack related) or
>     > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
>     > > > > > > > > > > > > > > > (cached op optimization).
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
>     > > > > > > > > > > > > > > > <ro...@gmail.com>
>     > > > wrote:
>     > > > > > > > > > > > > > > > > Dear MXNet community,
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
>     > > > > > > > > > > > > > > > > (incubating) version 1.5.0.
>     > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
>     > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
>     > > > > > > > > > > > > > > > > 23:59:59.
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > 1) Link to release notes:
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > 2) Link to release candidate:
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > +1 = approve
>     > > > > > > > > > > > > > > > > +0 = no opinion
>     > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
>     > > > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > --
>     > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Lai
>     > > > > > > > > > > > >
>     > > > > > > > > > >
>     > > > > > > > > >
>     > > > > > > > > --
>     > > > > > > > > Best Regards
>     > > > > > > > >
>     > > > > > > > > Lai
>     > >
>     > --
>     > Best Regards
>     >
>     > Lai
>
>
>
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
I will try to run a few benchmarks on a bare-metal instance tonight to
remove virtualization variance from the measurements and provide some
numbers.

Please propose a set of models / examples that would be desirable to
run before the release and provide a link to an easy-to-run script
with instructions so we can validate the release better.

Thank you.

On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
>
> Dear @dev,
>
> I'm cancelling the vote to include the cached op fix:
>
> https://github.com/apache/incubator-mxnet/pull/15298
>
> As for the possible CPU training regression, it does not look like a
> blocker for now.
>
> I will start a new rc2 vote; please help validate it.
>
> Thanks!
>
>
> On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com> wrote:
>
> > Hi Pedro,
> >
> > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
> > v1.4; I was using 18 cores for computing) with your script on C5.18xlarge.
> > But I needed to bind the cores with the commands below when running the
> > script (without setting the env variables, I got close times (<1%
> > difference) for v1.5 and v1.4):
> >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> >         export OMP_NUM_THREADS=18
> >
> > Did you set any env variables when running?
> >
> > The performance results I got are below:
> > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > real    12m10.856s
> > user    234m49.576s
> > sys     4m38.044s
> >
> > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > real    12m52.140s
> > user    246m30.740s
> > sys     5m8.188s
> >
> > Looking at the profiling data, most of the ops have the same perf on
> > v1.4 and v1.5. But some ops, like "_backward_BatchNorm" and "Pooling", are
> > ~1.37x slower on v1.5 compared with v1.4.
> > I will do further analysis on these ops.
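
As a rough check of the ~5.6% figure, the `real` times quoted above can be
converted to seconds and compared. This is only a minimal sketch and not part
of the original benchmark; the helper names `parse_time` and `slowdown_pct`
are made up here for illustration.

```python
import re


def parse_time(text):
    """Convert a `time`-style string such as '12m10.856s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_pct(baseline, candidate):
    """Percentage by which `candidate` is slower than `baseline`."""
    return (parse_time(candidate) / parse_time(baseline) - 1.0) * 100.0


# Wall-clock ("real") times quoted in the message above.
real_1_4_1 = "12m10.856s"  # 1.4.1.rc0
real_1_5_0 = "12m52.140s"  # 1.5.0.rc1

print(f"1.5.0.rc1 is {slowdown_pct(real_1_4_1, real_1_5_0):.2f}% slower")
```

The result is consistent with the ~5.6% quoted above. It is computed from a
single run per version, so it says nothing about run-to-run variance.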
> >
> > Here's the hardware/OS info from my side:
> > ----------Python Info----------
> > Version      : 3.6.8
> > Compiler     : GCC 7.3.0
> > Build        : ('default', 'Dec 30 2018 01:22:34')
> > Arch         : ('64bit', '')
> > ------------Pip Info-----------
> > Version      : 19.0.3
> > Directory    :
> > /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> > ----------MXNet Info-----------
> > Version      : 1.5.0
> > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> > Hashtag not found. Not installed from pre-built package.
> > ----------System Info----------
> > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> > system       : Linux
> > node         : ip-172-31-32-129
> > release      : 4.4.0-1085-aws
> > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> > ----------Hardware Info----------
> > machine      : x86_64
> > processor    : x86_64
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                72
> > On-line CPU(s) list:   0-71
> > Thread(s) per core:    2
> > Core(s) per socket:    18
> > Socket(s):             2
> > NUMA node(s):          2
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 85
> > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > Stepping:              3
> > CPU MHz:               3000.000
> > BogoMIPS:              6000.00
> > Hypervisor vendor:     KVM
> > Virtualization type:   full
> > L1d cache:             32K
> > L1i cache:             32K
> > L2 cache:              1024K
> > L3 cache:              25344K
> > NUMA node0 CPU(s):     0-17,36-53
> > NUMA node1 CPU(s):     18-35,54-71
> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
> > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
> > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
> > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
> > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> > ----------Network Test----------
> >
> >
> > -Ciyong
> >
> >
> > -----Original Message-----
> > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> > Sent: Thursday, June 27, 2019 9:55 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: dev@mxnet.apache.org
> > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >
> > Could we run more epochs to see the performance difference, or profile
> > the difference between the good and bad runs?
> >
> > > -----Original Message-----
> > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > Sent: Thursday, June 27, 2019 9:35 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: dev@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > 1.5.0.rc1
> > >
> > > I ran it again and the gap is bigger again; I guess we need to average
> > > out the times across several runs:
> > >
> > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > > for decoding..
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > > for decoding..
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > Epoch 0, Changed learning rate to 0.05
> > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> > > Epoch 0, Batch 199, Speed=384.149839
> > > Epoch 0, Duration=140.919567
> > > Epoch 0, Training accuracy=0.115169
> > > Epoch 0, Validation accuracy=0.141317
> > > Epoch 1, Batch 199, Speed=433.380512
> > > Epoch 1, Duration=119.553233
> > > Epoch 1, Training accuracy=0.170956
> > > Epoch 1, Validation accuracy=0.216146
> > > Epoch 2, Batch 199, Speed=434.864699
> > > Epoch 2, Duration=123.278490
> > > Epoch 2, Training accuracy=0.209455
> > > Epoch 2, Validation accuracy=0.247296
> > > Epoch 3, Batch 199, Speed=433.401854
> > > Epoch 3, Duration=118.327797
> > > Epoch 3, Training accuracy=0.248701
> > > Epoch 3, Validation accuracy=0.302083
> > > Epoch 4, Batch 199, Speed=419.713707
> > > Epoch 4, Duration=126.468409
> > > Epoch 4, Training accuracy=0.260949
> > > Epoch 4, Validation accuracy=0.269030
> > >
> > > real    10m55.796s
> > > user    399m33.567s
> > > sys     13m55.904s
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > > for decoding..
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > > for decoding..
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > Epoch 0, Changed learning rate to 0.05
> > > Epoch 0, Batch 199, Speed=419.039188
> > > Epoch 0, Duration=143.934903
> > > Epoch 0, Training accuracy=0.122542
> > > Epoch 0, Validation accuracy=0.164359
> > > Epoch 1, Batch 199, Speed=445.257048
> > > Epoch 1, Duration=135.248399
> > > Epoch 1, Training accuracy=0.178828
> > > Epoch 1, Validation accuracy=0.199419
> > > Epoch 2, Batch 199, Speed=447.115215
> > > Epoch 2, Duration=132.003770
> > > Epoch 2, Training accuracy=0.217808
> > > Epoch 2, Validation accuracy=0.233073
> > > Epoch 3, Batch 199, Speed=441.079477
> > > Epoch 3, Duration=126.543316
> > > Epoch 3, Training accuracy=0.248102
> > > Epoch 3, Validation accuracy=0.293870
> > > Epoch 4, Batch 199, Speed=449.329787
> > > Epoch 4, Duration=138.398325
> > > Epoch 4, Training accuracy=0.270021
> > > Epoch 4, Validation accuracy=0.311498
> > >
> > > real    11m45.329s
> > > user    426m13.908s
> > > sys     16m45.093s
> > >
> > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> > > <pe...@gmail.com> wrote:
> > > >
> > > > The difference looks smaller now, more like your numbers. I wonder
> > > > if something happened during the previous benchmark like a system
> > > > update...
> > > >
> > > >
> > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > threads for decoding..
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > threads for decoding..
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > Epoch 0, Changed learning rate to 0.05
> > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > 147456 bytes with malloc directly
> > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > 589824 bytes with malloc directly
> > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > 2359296 bytes with malloc directly
> > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > > 9437184 bytes with malloc directly
> > > > Epoch 0, Batch 199, Speed=426.182733
> > > > Epoch 0, Duration=134.868458
> > > > Epoch 0, Training accuracy=0.127238
> > > > Epoch 0, Validation accuracy=0.206388
> > > > Epoch 1, Batch 199, Speed=313.127156
> > > > Epoch 1, Duration=128.041775
> > > > Epoch 1, Training accuracy=0.182065
> > > > Epoch 1, Validation accuracy=0.202524
> > > > Epoch 2, Batch 199, Speed=410.931187
> > > > Epoch 2, Duration=124.920588
> > > > Epoch 2, Training accuracy=0.202584
> > > > Epoch 2, Validation accuracy=0.245693
> > > > Epoch 3, Batch 199, Speed=419.119335
> > > > Epoch 3, Duration=120.948349
> > > > Epoch 3, Training accuracy=0.235854
> > > > Epoch 3, Validation accuracy=0.291066
> > > > Epoch 4, Batch 199, Speed=430.473733
> > > > Epoch 4, Duration=130.181724
> > > > Epoch 4, Training accuracy=0.257773
> > > > Epoch 4, Validation accuracy=0.304988
> > > >
> > > > real    11m7.356s
> > > > user    406m9.910s
> > > > sys     14m18.349s
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > > threads for decoding..
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > > ImageRecordIOParser2:
> > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > > threads for decoding..
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > completed
> > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > > Epoch 0, Changed learning rate to 0.05
> > > > Epoch 0, Batch 199, Speed=348.618154
> > > > Epoch 0, Duration=146.469352
> > > > Epoch 0, Training accuracy=0.124121
> > > > Epoch 0, Validation accuracy=0.167227
> > > > Epoch 1, Batch 199, Speed=452.790825
> > > > Epoch 1, Duration=130.199421
> > > > Epoch 1, Training accuracy=0.183863
> > > > Epoch 1, Validation accuracy=0.237079
> > > > Epoch 2, Batch 199, Speed=451.406559
> > > > Epoch 2, Duration=126.320823
> > > > Epoch 2, Training accuracy=0.214844
> > > > Epoch 2, Validation accuracy=0.244692
> > > > Epoch 3, Batch 199, Speed=403.161873
> > > > Epoch 3, Duration=125.331660
> > > > Epoch 3, Training accuracy=0.243506
> > > > Epoch 3, Validation accuracy=0.301182
> > > > Epoch 4, Batch 199, Speed=450.826598
> > > > Epoch 4, Duration=126.426253
> > > > Epoch 4, Training accuracy=0.266424
> > > > Epoch 4, Validation accuracy=0.311899
> > > >
> > > > real    11m21.930s
> > > > user    415m3.855s
> > > > sys     13m53.975s
> > > >
> > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > > > <pe...@gmail.com> wrote:
> > > > >
> > > > > Hi Ciyong, thanks for trying to reproduce:
> > > > >
> > > > > I used this one:
> > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > > > >
> > > > > Could you provide hardware and OS details?
> > > > >
> > > > > I will rerun and repost numbers in a few minutes.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> > > > > <ci...@intel.com>
> > > wrote:
> > > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > I'm looking at this case, using the script
> > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > > > > > to get the timing data, but there seems to be little difference
> > > > > > between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > > > >
> > > > > > I'm not sure whether the Python scripts differ; can you point me
> > > > > > to the link to your script (cifar10.py)?
> > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
> > > > > > the performance.
> > > > > >
> > > > > > Here's the command I used to collect the time:
> > > > > >         python train_cifar10.py --num-epoch=5
> > > > > >
> > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > > >         real    9m4.880s
> > > > > >         user    333m13.340s
> > > > > >         sys     14m36.100s
> > > > > >
> > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > > >         real    9m2.155s
> > > > > >         user    329m37.092s
> > > > > >         sys     16m8.668s
> > > > > >
> > > > > > -Ciyong
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > Cc: dev@mxnet.apache.org
> > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > > 1.5.0.rc1
> > > > > >
> > > > > > Hi these were my build flags and system info:
> > > > > >
> > > > > >
> > > > > > --- # CMake configuration
> > > > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set
> > > > > > CUDNN_ROOT for search path
> > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT
> > > > > > ARM
> > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) #
> > > autodetects support if "ON"
> > > > > > USE_LAPACK: "ON" # Build with lapack support
> > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found)
> > > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF
> > > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF
> > > NOT
> > > > > > MSVC
> > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> > > conventions.
> > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler
> > > > > > supports it
> > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) #
> > > > > > one could set VTUNE_ROOT for search path
> > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation
> > > > > > support
> > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with
> > TensorRT.
> > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
> > > > > > coverage metric output
> > > > > > CMAKE_BUILD_TYPE: "Release"
> > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > > > >
> > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> > > > > > 1.5.0.rc1,
> > > > > > upstream/v1.5.x)
> > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> > > > > > 1.4.1.rc0,
> > > > > > upstream/v1.4.x)
> > > > > >
> > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > > > c5d.18xlarge
> > > > > >
> > > > > >
> > > > > > Version      : 3.6.7
> > > > > > Compiler     : GCC 8.2.0
> > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > Arch         : ('64bit', 'ELF')
> > > > > > ------------Pip Info-----------
> > > > > > Version      : 19.1.1
> > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> > > packages/pip
> > > > > > ----------MXNet Info-----------
> > > > > > Version      : 1.5.0
> > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > ----------System Info----------
> > > > > > Platform     :
> > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > system       : Linux
> > > > > > node         : ip-172-31-63-171
> > > > > > release      : 4.15.0-1035-aws
> > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > ----------Hardware Info----------
> > > > > > machine      : x86_64
> > > > > > processor    : x86_64
> > > > > > Architecture:        x86_64
> > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > Byte Order:          Little Endian
> > > > > > CPU(s):              72
> > > > > > On-line CPU(s) list: 0-71
> > > > > > Thread(s) per core:  2
> > > > > > Core(s) per socket:  18
> > > > > > Socket(s):           2
> > > > > > NUMA node(s):        2
> > > > > > Vendor ID:           GenuineIntel
> > > > > > CPU family:          6
> > > > > > Model:               85
> > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > Stepping:            4
> > > > > > CPU MHz:             1326.446
> > > > > > BogoMIPS:            6000.00
> > > > > > Hypervisor vendor:   KVM
> > > > > > Virtualization type: full
> > > > > > L1d cache:           32K
> > > > > > L1i cache:           32K
> > > > > > L2 cache:            1024K
> > > > > > L3 cache:            25344K
> > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep
> > mtrr
> > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > ssse3 fma cx16 pcid
> > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > > ida arat pku ospke ----------Network Test----------
> > > > > >
> > > > > > ----------Python Info----------
> > > > > > Version      : 3.6.7
> > > > > > Compiler     : GCC 8.2.0
> > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > > Arch         : ('64bit', 'ELF')
> > > > > > ------------Pip Info-----------
> > > > > > Version      : 19.1.1
> > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> > > packages/pip
> > > > > > ----------MXNet Info-----------
> > > > > > Version      : 1.4.1
> > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > > > Hashtag not found. Not installed from pre-built package.
> > > > > > ----------System Info----------
> > > > > > Platform     :
> > Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > > system       : Linux
> > > > > > node         : ip-172-31-63-171
> > > > > > release      : 4.15.0-1035-aws
> > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > > ----------Hardware Info----------
> > > > > > machine      : x86_64
> > > > > > processor    : x86_64
> > > > > > Architecture:        x86_64
> > > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > > Byte Order:          Little Endian
> > > > > > CPU(s):              72
> > > > > > On-line CPU(s) list: 0-71
> > > > > > Thread(s) per core:  2
> > > > > > Core(s) per socket:  18
> > > > > > Socket(s):           2
> > > > > > NUMA node(s):        2
> > > > > > Vendor ID:           GenuineIntel
> > > > > > CPU family:          6
> > > > > > Model:               85
> > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > > Stepping:            4
> > > > > > CPU MHz:             1223.344
> > > > > > BogoMIPS:            6000.00
> > > > > > Hypervisor vendor:   KVM
> > > > > > Virtualization type: full
> > > > > > L1d cache:           32K
> > > > > > L1i cache:           32K
> > > > > > L2 cache:            1024K
> > > > > > L3 cache:            25344K
> > > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep
> > mtrr
> > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > > ssse3 fma cx16 pcid
> > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > > ida arat pku ospke ----------Network Test----------
> > > > > >
> > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> > > <pe...@gmail.com> wrote:
> > > > > > >
> > > > > > > I trained cifar10 on CPU, and there seems to be a regression:
> > > > > > > training time increased by roughly 7% compared with 1.4.1:
> > > > > > >
> > > > > > > (py3_venv)
> > > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > > > real    11m30.388s
> > > > > > > user    417m7.766s
> > > > > > > sys     16m57.315s
> > > > > > >
> > > > > > > VS 1.4.1:
> > > > > > > real    10m41.994s
> > > > > > > user    392m40.646s
> > > > > > > sys     12m30.601s
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > Hi Anirudh,
> > > > > > > >
> > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
> > > > > > > >
> > > > > > > > I meant for the sockeye developers/maintainers to help set up
> > > > > > > > nightly tests and raise issues early.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > > > <ha...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > In GluonNLP we test with the MXNet nightly build for each PR,
> > > > > > > > > and the CI did catch some MXNet-related issues.
> > > > > > > > > I recommend that other toolkits also add integration tests
> > > > > > > > > with MXNet nightly.
> > > > > > > > > It helps identify issues early.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Haibin
> > > > > > > > >
> > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> > > > > > > > > <pa...@intel.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > > > >
> > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
> > > > > > > > > > for MXNet developers to catch potential bugs or
> > > > > > > > > > performance degradation.
> > > > > > > > > >
> > > > > > > > > > In the future, I suggest adding the major downstream test
> > > > > > > > > > cases, like those from sockeye, GluonNLP, GluonCV, DGL,
> > > > > > > > > > and Gluon-TS, into the nightly test.
> > > > > > > > > > If it's still too heavy, maybe test them weekly or
> > > > > > > > > > monthly :)
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > --Patric
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Anirudh Subramanian
> > > > > > > > > > > [mailto:anirudh2290@gmail.com]
> > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> > > > > > > > > > > version
> > > > > > > > > > > 1.5.0.rc1
> > > > > > > > > > >
> > > > > > > > > > > Hi Lai,
> > > > > > > > > > >
> > > > > > > > > > > I have opened an issue:
> > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > > > I came to know about this issue only today, and I have
> > > > > > > > > > > not been monitoring sockeye.
> > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> > > > > > > > > > > by the dlpack changes.
> > > > > > > > > > > Also, I don't think sockeye CI checks against master;
> > > > > > > > > > > it is using 1.4.1.
> > > > > > > > > > >
> > > > > > > > > > > Anirudh
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> > > > > > > > > > > <ro...@gmail.com>
> > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > Could you share which test failed and what’s the
> > > > > > > > > > > > crash? How to reproduce it?
> > > > > > > > > > > >
> > > > > > > > > > > > I was able to install sockeye, and all tests passed
> > > > > > > > > > > > when run with python setup.py test.
> > > > > > > > > > > >
> > > > > > > > > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > > > > > > > > >
> > > > > > > > > > > > It would be great to create an issue with
> > > > > > > > > > > > reproducible steps and move the discussion there.
> > > > > > > > > > > >
> > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
> > > > > > > > > > > > failing for some time. If it's due to an MXNet
> > > > > > > > > > > > change, please raise this early so we can track and
> > > > > > > > > > > > solve it in time, rather than blocking the release
> > > > > > > > > > > > during the vote.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > > > > > > > <anirudh2290@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > > > with the commit
> > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Anirudh
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > > > > > <ro...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is there an issue with more details to track the
> > problem?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> > > > > > > > > > > > > > Trędak <pt...@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There is a crash in the sockeye unit test (python
> > > > > > > > > > > > > > > setup.py test), observed starting with the nightly
> > > > > > > > > > > > > > > 1.5 build from 6/13 and still occurring in 1.5rc1.
> > > > > > > > > > > > > > > I don't yet have the exact commit that is
> > > > > > > > > > > > > > > responsible for it, but it is either
> > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > > > > > > > related) or
> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > > > > > > > > > > > optimization).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> > > > > > > > > > > > > > > <ro...@gmail.com>
> > > wrote:
> > > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
> > > > > > > > > > > > > > > > (incubating) version 1.5.0.
> > > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)
> > > > > > > > > > > > > > > > and close on June 22, 23:59:59.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > >
> > > > > > > > > > > > Lai
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> >
> --
> Best Regards
>
> Lai
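
The `real` figures in the quoted `time` outputs above can be turned into a relative slowdown directly. The following is a minimal sketch, assuming the values use the `XmY.YYYs` form printed by the shell's `time`; the helper names `parse_real` and `slowdown_pct` are illustrative and do not come from the thread.

```python
import re


def parse_real(value):
    """Convert a shell `time` value such as '11m7.356s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown_pct(baseline, candidate):
    """Percentage by which `candidate` is slower than `baseline`."""
    base = parse_real(baseline)
    cand = parse_real(candidate)
    return (cand - base) / base * 100.0


if __name__ == "__main__":
    # (1.4 real, 1.5 real) pairs copied from the two quoted transcripts above.
    pairs = [("11m7.356s", "11m21.930s"), ("10m55.796s", "11m45.329s")]
    for base, cand in pairs:
        print("%s -> %s: %.2f%% slower" % (base, cand, slowdown_pct(base, cand)))
```

For these two pairs the gap comes out at roughly 2% and roughly 7.5%, which illustrates how much a single pair of runs can differ from the next.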


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Dear @dev,

I'm cancelling this vote because of the cached op fix:

https://github.com/apache/incubator-mxnet/pull/15298

As for the possible CPU training regression, it does not look like a blocker
for now.

I will start a new rc2 vote, please help to validate.

Thanks!
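
Since the quoted measurements below vary from run to run and depend on environment variables such as `OMP_NUM_THREADS`, repeating each benchmark under a fixed environment may give a steadier comparison. This is only a sketch under stated assumptions: the helper `time_command` is hypothetical, and the command shown is a placeholder to be replaced with the real benchmark invocation.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3, extra_env=None):
    """Run `cmd` `runs` times; return each run's wall-clock seconds.

    `extra_env` is merged over the current environment, so settings
    such as OMP_NUM_THREADS are identical for every run.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command (an assumption); substitute something like
    # ["python", "cifar10.py", "--epochs", "5"] for a real measurement.
    cmd = [sys.executable, "-c", "pass"]
    samples = time_command(cmd, runs=3, extra_env={"OMP_NUM_THREADS": "18"})
    print("mean %.3fs, stdev %.3fs"
          % (statistics.mean(samples), statistics.stdev(samples)))
```

Running the same helper once per interpreter (for example the 1.4 and 1.5 virtualenvs) and comparing the means and standard deviations would show whether a gap exceeds the run-to-run noise.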


On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ci...@intel.com> wrote:

> Hi Pedro,
>
> I was able to reproduce a similar result with your script on C5.18xlarge
> (v1.5 is ~5.6% slower than v1.4; I was using 18 cores for computing).
> However, I needed to bind the cores with the commands below when running
> the script. Without setting these env variables, I got nearly the same
> time (<1% difference) with v1.5 and v1.4.
>         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>         export OMP_NUM_THREADS=18
>
> Did you set any env variables when running?
>
> The performance results I got are below:
> 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> real    12m10.856s
> user    234m49.576s
> sys     4m38.044s
>
> 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> real    12m52.140s
> user    246m30.740s
> sys     5m8.188s
>
> Looking at the profiling data, most of the ops have the same perf in
> v1.4 and v1.5. But some ops, like "_backward_BatchNorm" and "Pooling", are
> ~1.37x slower on v1.5 compared with v1.4.
> I will do further analysis on these ops.
>
> Here's the hardware/OS info from my side:
> ----------Python Info----------
> Version      : 3.6.8
> Compiler     : GCC 7.3.0
> Build        : ('default', 'Dec 30 2018 01:22:34')
> Arch         : ('64bit', '')
> ------------Pip Info-----------
> Version      : 19.0.3
> Directory    :
> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> ----------MXNet Info-----------
> Version      : 1.5.0
> Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> Hashtag not found. Not installed from pre-built package.
> ----------System Info----------
> Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> system       : Linux
> node         : ip-172-31-32-129
> release      : 4.4.0-1085-aws
> version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> ----------Hardware Info----------
> machine      : x86_64
> processor    : x86_64
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                72
> On-line CPU(s) list:   0-71
> Thread(s) per core:    2
> Core(s) per socket:    18
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 85
> Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> Stepping:              3
> CPU MHz:               3000.000
> BogoMIPS:              6000.00
> Hypervisor vendor:     KVM
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              1024K
> L3 cache:              25344K
> NUMA node0 CPU(s):     0-17,36-53
> NUMA node1 CPU(s):     18-35,54-71
> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
> rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
> aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
> smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> ----------Network Test----------
>
>
> -Ciyong
>
>
> -----Original Message-----
> From: Zhao, Patric [mailto:patric.zhao@intel.com]
> Sent: Thursday, June 27, 2019 9:55 AM
> To: dev@mxnet.incubator.apache.org
> Cc: dev@mxnet.apache.org
> Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>
> Could we run more epochs to see the performance difference, or profile
> the difference between a good and a bad run?
>
> > -----Original Message-----
> > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > Sent: Thursday, June 27, 2019 9:35 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: dev@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > 1.5.0.rc1
> >
> > I ran it again and the gap is bigger again; I guess we need to average
> > out the times across several runs:
> >
> > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > for decoding..
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > for decoding..
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300:
> > 0.0001} Epoch 0, Changed learning rate to 0.05 [23:17:09]
> > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 147456 bytes with malloc directly
> > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 589824 bytes with malloc directly
> > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 2359296 bytes with malloc directly
> > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 9437184 bytes with malloc directly
> > Epoch 0, Batch 199, Speed=384.149839
> > Epoch 0, Duration=140.919567
> > Epoch 0, Training accuracy=0.115169
> > Epoch 0, Validation accuracy=0.141317
> > Epoch 1, Batch 199, Speed=433.380512
> > Epoch 1, Duration=119.553233
> > Epoch 1, Training accuracy=0.170956
> > Epoch 1, Validation accuracy=0.216146
> > Epoch 2, Batch 199, Speed=434.864699
> > Epoch 2, Duration=123.278490
> > Epoch 2, Training accuracy=0.209455
> > Epoch 2, Validation accuracy=0.247296
> > Epoch 3, Batch 199, Speed=433.401854
> > Epoch 3, Duration=118.327797
> > Epoch 3, Training accuracy=0.248701
> > Epoch 3, Validation accuracy=0.302083
> > Epoch 4, Batch 199, Speed=419.713707
> > Epoch 4, Duration=126.468409
> > Epoch 4, Training accuracy=0.260949
> > Epoch 4, Validation accuracy=0.269030
> >
> > real    10m55.796s
> > user    399m33.567s
> > sys     13m55.904s
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > for decoding..
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > for decoding..
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > Epoch 0, Changed learning rate to 0.05
> > Epoch 0, Batch 199, Speed=419.039188
> > Epoch 0, Duration=143.934903
> > Epoch 0, Training accuracy=0.122542
> > Epoch 0, Validation accuracy=0.164359
> > Epoch 1, Batch 199, Speed=445.257048
> > Epoch 1, Duration=135.248399
> > Epoch 1, Training accuracy=0.178828
> > Epoch 1, Validation accuracy=0.199419
> > Epoch 2, Batch 199, Speed=447.115215
> > Epoch 2, Duration=132.003770
> > Epoch 2, Training accuracy=0.217808
> > Epoch 2, Validation accuracy=0.233073
> > Epoch 3, Batch 199, Speed=441.079477
> > Epoch 3, Duration=126.543316
> > Epoch 3, Training accuracy=0.248102
> > Epoch 3, Validation accuracy=0.293870
> > Epoch 4, Batch 199, Speed=449.329787
> > Epoch 4, Duration=138.398325
> > Epoch 4, Training accuracy=0.270021
> > Epoch 4, Validation accuracy=0.311498
> >
> > real    11m45.329s
> > user    426m13.908s
> > sys     16m45.093s
> >
> > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> > <pe...@gmail.com> wrote:
> > >
> > > The difference looks smaller now, more like your numbers. I wonder
> > > if something happened during the previous benchmark like a system
> > > update...
> > >
> > >
> > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > (master)+$
> > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5 [22:49:41]
> > > ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > threads for decoding..
> > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > threads for decoding..
> > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300:
> > > 0.0001} Epoch 0, Changed learning rate to 0.05 [22:49:42]
> > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > 147456 bytes with malloc directly
> > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > 589824 bytes with malloc directly
> > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > 2359296 bytes with malloc directly
> > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > > 9437184 bytes with malloc directly
> > > Epoch 0, Batch 199, Speed=426.182733
> > > Epoch 0, Duration=134.868458
> > > Epoch 0, Training accuracy=0.127238
> > > Epoch 0, Validation accuracy=0.206388
> > > Epoch 1, Batch 199, Speed=313.127156
> > > Epoch 1, Duration=128.041775
> > > Epoch 1, Training accuracy=0.182065
> > > Epoch 1, Validation accuracy=0.202524
> > > Epoch 2, Batch 199, Speed=410.931187
> > > Epoch 2, Duration=124.920588
> > > Epoch 2, Training accuracy=0.202584
> > > Epoch 2, Validation accuracy=0.245693
> > > Epoch 3, Batch 199, Speed=419.119335
> > > Epoch 3, Duration=120.948349
> > > Epoch 3, Training accuracy=0.235854
> > > Epoch 3, Validation accuracy=0.291066
> > > Epoch 4, Batch 199, Speed=430.473733
> > > Epoch 4, Duration=130.181724
> > > Epoch 4, Training accuracy=0.257773
> > > Epoch 4, Validation accuracy=0.304988
> > >
> > > real    11m7.356s
> > > user    406m9.910s
> > > sys     14m18.349s
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > > threads for decoding..
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > > ImageRecordIOParser2:
> > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > > threads for decoding..
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > completed
> > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > > Epoch 0, Changed learning rate to 0.05
> > > Epoch 0, Batch 199, Speed=348.618154
> > > Epoch 0, Duration=146.469352
> > > Epoch 0, Training accuracy=0.124121
> > > Epoch 0, Validation accuracy=0.167227
> > > Epoch 1, Batch 199, Speed=452.790825
> > > Epoch 1, Duration=130.199421
> > > Epoch 1, Training accuracy=0.183863
> > > Epoch 1, Validation accuracy=0.237079
> > > Epoch 2, Batch 199, Speed=451.406559
> > > Epoch 2, Duration=126.320823
> > > Epoch 2, Training accuracy=0.214844
> > > Epoch 2, Validation accuracy=0.244692
> > > Epoch 3, Batch 199, Speed=403.161873
> > > Epoch 3, Duration=125.331660
> > > Epoch 3, Training accuracy=0.243506
> > > Epoch 3, Validation accuracy=0.301182
> > > Epoch 4, Batch 199, Speed=450.826598
> > > Epoch 4, Duration=126.426253
> > > Epoch 4, Training accuracy=0.266424
> > > Epoch 4, Validation accuracy=0.311899
> > >
> > > real    11m21.930s
> > > user    415m3.855s
> > > sys     13m53.975s
> > >
> > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > > <pe...@gmail.com> wrote:
> > > >
> > > > Hi Ciyong, thanks for trying to reproduce:
> > > >
> > > > I used this one:
> > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > > >
> > > > Could you provide hardware and OS details?
> > > >
> > > > I will rerun and repost numbers in a few minutes.
> > > >
> > > > Pedro.
> > > >
> > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
> > > > <ci...@intel.com>
> > wrote:
> > > > >
> > > > > Hi Pedro,
> > > > >
> > > > > > I'm looking at this case, and using the script
> > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > > > > > to get the timing data, but it seems there's not much difference
> > > > > > between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > > > >
> > > > > > I'm not sure if there's any difference in the python script; can
> > > > > > you point me to the link to get your script (cifar10.py)?
> > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
> > > > > > the performance.
> > > > >
> > > > > Here's the command I used to collect the time:
> > > > >         python train_cifar10.py --num-epoch=5
> > > > >
> > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > >         real    9m4.880s
> > > > >         user    333m13.340s
> > > > >         sys     14m36.100s
> > > > >
> > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > >         real    9m2.155s
> > > > >         user    329m37.092s
> > > > >         sys     16m8.668s
> > > > >
> > > > > -Ciyong
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: dev@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > 1.5.0.rc1
> > > > >
> > > > > Hi these were my build flags and system info:
> > > > >
> > > > >
> > > > > --- # CMake configuration
> > > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set
> > > > > CUDNN_ROOT for search path
> > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT
> > > > > ARM
> > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) #
> > autodetects support if "ON"
> > > > > USE_LAPACK: "ON" # Build with lapack support
> > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found)
> > > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF
> > > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF
> > NOT
> > > > > MSVC
> > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> > conventions.
> > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler
> > > > > supports it
> > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) #
> > > > > one could set VTUNE_ROOT for search path
> > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation
> > > > > support
> > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with
> TensorRT.
> > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
> > > > > coverage metric output
> > > > > CMAKE_BUILD_TYPE: "Release"
> > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > > >
> > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> > > > > 1.5.0.rc1,
> > > > > upstream/v1.5.x)
> > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> > > > > 1.4.1.rc0,
> > > > > upstream/v1.4.x)
> > > > >
> > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > > c5d.18xlarge
> > > > >
> > > > >
> > > > > Version      : 3.6.7
> > > > > Compiler     : GCC 8.2.0
> > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > Arch         : ('64bit', 'ELF')
> > > > > ------------Pip Info-----------
> > > > > Version      : 19.1.1
> > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> > packages/pip
> > > > > ----------MXNet Info-----------
> > > > > Version      : 1.5.0
> > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > > Hashtag not found. Not installed from pre-built package.
> > > > > ----------System Info----------
> > > > > Platform     :
> Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > system       : Linux
> > > > > node         : ip-172-31-63-171
> > > > > release      : 4.15.0-1035-aws
> > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > ----------Hardware Info----------
> > > > > machine      : x86_64
> > > > > processor    : x86_64
> > > > > Architecture:        x86_64
> > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > Byte Order:          Little Endian
> > > > > CPU(s):              72
> > > > > On-line CPU(s) list: 0-71
> > > > > Thread(s) per core:  2
> > > > > Core(s) per socket:  18
> > > > > Socket(s):           2
> > > > > NUMA node(s):        2
> > > > > Vendor ID:           GenuineIntel
> > > > > CPU family:          6
> > > > > Model:               85
> > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > Stepping:            4
> > > > > CPU MHz:             1326.446
> > > > > BogoMIPS:            6000.00
> > > > > Hypervisor vendor:   KVM
> > > > > Virtualization type: full
> > > > > L1d cache:           32K
> > > > > L1i cache:           32K
> > > > > L2 cache:            1024K
> > > > > L3 cache:            25344K
> > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr
> > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > ssse3 fma cx16 pcid
> > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > ida arat pku ospke ----------Network Test----------
> > > > >
> > > > > ----------Python Info----------
> > > > > Version      : 3.6.7
> > > > > Compiler     : GCC 8.2.0
> > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > > Arch         : ('64bit', 'ELF')
> > > > > ------------Pip Info-----------
> > > > > Version      : 19.1.1
> > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> > packages/pip
> > > > > ----------MXNet Info-----------
> > > > > Version      : 1.4.1
> > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > > Hashtag not found. Not installed from pre-built package.
> > > > > ----------System Info----------
> > > > > Platform     :
> Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > system       : Linux
> > > > > node         : ip-172-31-63-171
> > > > > release      : 4.15.0-1035-aws
> > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > ----------Hardware Info----------
> > > > > machine      : x86_64
> > > > > processor    : x86_64
> > > > > Architecture:        x86_64
> > > > > CPU op-mode(s):      32-bit, 64-bit
> > > > > Byte Order:          Little Endian
> > > > > CPU(s):              72
> > > > > On-line CPU(s) list: 0-71
> > > > > Thread(s) per core:  2
> > > > > Core(s) per socket:  18
> > > > > Socket(s):           2
> > > > > NUMA node(s):        2
> > > > > Vendor ID:           GenuineIntel
> > > > > CPU family:          6
> > > > > Model:               85
> > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > Stepping:            4
> > > > > CPU MHz:             1223.344
> > > > > BogoMIPS:            6000.00
> > > > > Hypervisor vendor:   KVM
> > > > > Virtualization type: full
> > > > > L1d cache:           32K
> > > > > L1i cache:           32K
> > > > > L2 cache:            1024K
> > > > > L3 cache:            25344K
> > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr
> > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> > > > > ssse3 fma cx16 pcid
> > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
> > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
> > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> > > > > ida arat pku ospke ----------Network Test----------
> > > > >
> > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> > <pe...@gmail.com> wrote:
> > > > > >
> > > > > > > I did a training run of cifar10 on CPU and there seems to be a
> > > > > > > regression of around 7% in training time against 1.4.1:
> > > > > >
> > > > > > (py3_venv)
> > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > > real    11m30.388s
> > > > > > user    417m7.766s
> > > > > > sys     16m57.315s
> > > > > >
> > > > > > VS 1.4.1:
> > > > > > real    10m41.994s
> > > > > > user    392m40.646s
> > > > > > sys     12m30.601s
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com>
> > wrote:
> > > > > > >
> > > > > > > Hi Anirudh,
> > > > > > >
> > > > > > > > Thanks for jumping into this quickly; I followed up on the issue.
> > > > > > > >
> > > > > > > > I meant for sockeye developers/maintainers to help set up
> > > > > > > > nightly tests and raise issues early.
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > > <ha...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > In GluonNLP we are testing with MXNET nightly build for
> > > > > > > > each PR, and we did find some MXNet related issue caught by
> the CI.
> > > > > > > > I recommend other toolkits also add integration tests with
> > > > > > > > MXNet
> > nightly.
> > > > > > > > It helps identify issues early.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Haibin
> > > > > > > >
> > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
> > > > > > > > <pa...@intel.com>
> > wrote:
> > > > > > > >
> > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > > > >
> > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
> > > > > > > > > > for MXNet developers to catch potential bugs or
> > > > > > > > > > performance degradation.
> > > > > > > > > >
> > > > > > > > > > In the future, I suggest adding the major downstream
> > > > > > > > > > test cases, like those from sockeye, GluonNLP, GluonCV,
> > > > > > > > > > DGL, and Gluon-TS, into the nightly test.
> > > > > > > > > > If it's still too heavy, maybe test them weekly or
> > > > > > > > > > monthly :)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > --Patric
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Anirudh Subramanian
> > > > > > > > > > [mailto:anirudh2290@gmail.com]
> > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> > > > > > > > > > version
> > > > > > > > > > 1.5.0.rc1
> > > > > > > > > >
> > > > > > > > > > Hi Lai,
> > > > > > > > > >
> > > > > > > > > > I have opened an issue:
> > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > > I came to know about this issue only today and I have
> > > > > > > > > > not been
> > > > > > > > monitoring
> > > > > > > > > > sockeye.
> > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> > > > > > > > > > by the dlpack
> > > > > > > > > changes.
> > > > > > > > > > Also, I don't  think sockeye CI checks against master,
> > > > > > > > > > it is using
> > > > > > > > 1.4.1.
> > > > > > > > > >
> > > > > > > > > > Anirudh
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
> > > > > > > > > > <ro...@gmail.com>
> > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Could you share which test failed and what’s the
> > > > > > > > > > > crash? How to reproduce it?
> > > > > > > > > > >
> > > > > > > > > > > I was able to install sockeye and run all tests passed.
> > > > > > > > > > > Using python setup.py test
> > > > > > > > > > >
> > > > > > > > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > > > > > > > >
> > > > > > > > > > > It would be great to create an issue with
> > > > > > > > > > > reproducible steps and move the discussion there.
> > > > > > > > > > >
> > > > > > > > > > > Also I see sockeye nightly build[1] has been failing
> > > > > > > > > > > for some time,
> > > > > > > > if
> > > > > > > > > > > it’s due to MXNet change, please raise this early so
> > > > > > > > > > > we can track and solve it in time rather than block
> > > > > > > > > > > the release
> > during vote time.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > > > > > > <anirudh2290@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > > with the commit
> > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > > >
> > > > > > > > > > > > Anirudh
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > > > > <ro...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is there an issue with more details to track the
> problem?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
> > > > > > > > > > > > > Trędak <pt...@apache.org>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > -1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There is a crash in the sockeye unit test (python
> > > > > > > > > > > > > > > setup.py test), observed starting with the nightly
> > > > > > > > > > > > > > > 1.5 build from 6/13 and still occurring in 1.5rc1.
> > > > > > > > > > > > > > > I don't yet have the exact commit that is
> > > > > > > > > > > > > > > responsible for it, but it is either
> > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> > > > > > > > > > > > > > > (dlpack related) or
> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > > > > > > > > > > > > > > (cached op optimization).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
> > > > > > > > > > > > > > <ro...@gmail.com>
> > wrote:
> > > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is the 3-day vote to release Apache
> > > > > > > > > > > > > > > MXNet
> > > > > > > > > > > > > > > (incubating) version
> > > > > > > > > > > > > > 1.5.0.
> > > > > > > > > > > > > > > Voting on dev@ will start June 19,
> > > > > > > > > > > > > > > 23:59:59(PST) and close
> > > > > > > > on
> > > > > > > > > > > June
> > > > > > > > > > > > > 22,
> > > > > > > > > > > > > > > 23:59:59.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 3) Link to source and signatures on apache
> dist server:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please remember to TEST first before voting
> > accordingly:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lai
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best Regards
> > > > > > > > > > >
> > > > > > > > > > > Lai
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
>
-- 
Best Regards

Lai

RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Chen, Ciyong" <ci...@intel.com>.
Hi Pedro,

I was able to reproduce a similar result (v1.5 is ~5.6% slower than v1.4; I was using 18 cores for computing) with your script on C5.18xlarge.
However, I needed to bind the cores with the commands below when running the script (without setting the env variables, the v1.5 and v1.4 times were within 1% of each other):
	export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
	export OMP_NUM_THREADS=18
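
As a rough sketch, the same two variables could also be set from Python before launching the benchmark. The helper name `pinned_env` is hypothetical and not part of MXNet or the benchmark script; only the two variable values are taken from the commands above.

```python
import os


def pinned_env(num_threads=18, base=None):
    """Return a copy of the environment with the OpenMP pinning variables set.

    `pinned_env` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base is None else base)
    # Values copied from the export commands above.
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


env = pinned_env()
# Assumed usage (script path as in the thread):
# subprocess.run(["python", "cifar10.py", "--epochs", "5"], env=env)
```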

Did you set any env variables when running?

The performance results I got are below:
1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
real	12m10.856s
user	234m49.576s
sys	4m38.044s

2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
real	12m52.140s
user	246m30.740s
sys	5m8.188s

Looking at the profiling data, most of the ops have the same perf between v1.4 and v1.5, but some ops like "_backward_BatchNorm" and "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
I will do further analysis on these ops.
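
For reference, the ~5.6% figure follows from the two "real" times above. A minimal sketch of that arithmetic, assuming the `XmY.YYYs` format printed by the shell `time` command (the function names are illustrative only):

```python
import re


def to_seconds(real):
    """Convert a `time` value such as '12m10.856s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", real)
    if m is None:
        raise ValueError(f"unexpected time format: {real!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def slowdown_pct(baseline, candidate):
    """Percentage by which `candidate` wall-clock time exceeds `baseline`."""
    b, c = to_seconds(baseline), to_seconds(candidate)
    return (c - b) / b * 100.0


# "real" times reported above: 1.4.1.rc0 vs 1.5.0.rc1
pct = slowdown_pct("12m10.856s", "12m52.140s")
print(f"{pct:.1f}% slower")  # prints "5.6% slower"
```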

Here's the hardware/OS info from my side:
----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 7.3.0
Build        : ('default', 'Dec 30 2018 01:22:34')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-32-129
release      : 4.4.0-1085-aws
version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:              3
CPU MHz:               3000.000
BogoMIPS:              6000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------


-Ciyong


-----Original Message-----
From: Zhao, Patric [mailto:patric.zhao@intel.com] 
Sent: Thursday, June 27, 2019 9:55 AM
To: dev@mxnet.incubator.apache.org
Cc: dev@mxnet.apache.org
Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Could we run more epochs to see the performance difference, or profile the difference between the good and bad runs?
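
One possible way to compare runs beyond a single wall-clock number is to average the per-epoch `Duration=` values printed in the training log. The sketch below assumes the log format shown in the quoted runs that follow; the function name is hypothetical, and the sample values are the five epoch durations from each of the two runs in the 23:17 / 23:28 log.

```python
import re
from statistics import mean

# Matches lines such as "Epoch 0, Duration=140.919567"
DURATION_RE = re.compile(r"Epoch (\d+), Duration=([\d.]+)")


def epoch_durations(log_text):
    """Extract per-epoch durations (seconds) from a training log."""
    return [float(d) for _, d in DURATION_RE.findall(log_text)]


# Durations copied from the quoted logs (1.4 run first, then 1.5 run).
log_1_4 = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""
log_1_5 = """
Epoch 0, Duration=143.934903 Epoch 1, Duration=135.248399
Epoch 2, Duration=132.003770 Epoch 3, Duration=126.543316
Epoch 4, Duration=138.398325
"""

mean_1_4 = mean(epoch_durations(log_1_4))
mean_1_5 = mean(epoch_durations(log_1_5))
print(f"1.4: {mean_1_4:.2f}s/epoch, 1.5: {mean_1_5:.2f}s/epoch")
```

Averaging over several runs, rather than the epochs of one run, would work the same way by concatenating the logs before parsing.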

> -----Original Message-----
> From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> Sent: Thursday, June 27, 2019 9:35 AM
> To: dev@mxnet.incubator.apache.org
> Cc: dev@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 
> 1.5.0.rc1
> 
> I ran again and the gap is bigger again; I guess we need to average
> out the times across several runs:
> 
> piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 
> && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5 
> [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads 
> for decoding..
> [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed 
> [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads 
> for decoding..
> [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 147456 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 589824 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 2359296 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 9437184 bytes with malloc directly
> Epoch 0, Batch 199, Speed=384.149839
> Epoch 0, Duration=140.919567
> Epoch 0, Training accuracy=0.115169
> Epoch 0, Validation accuracy=0.141317
> Epoch 1, Batch 199, Speed=433.380512
> Epoch 1, Duration=119.553233
> Epoch 1, Training accuracy=0.170956
> Epoch 1, Validation accuracy=0.216146
> Epoch 2, Batch 199, Speed=434.864699
> Epoch 2, Duration=123.278490
> Epoch 2, Training accuracy=0.209455
> Epoch 2, Validation accuracy=0.247296
> Epoch 3, Batch 199, Speed=433.401854
> Epoch 3, Duration=118.327797
> Epoch 3, Training accuracy=0.248701
> Epoch 3, Validation accuracy=0.302083
> Epoch 4, Batch 199, Speed=419.713707
> Epoch 4, Duration=126.468409
> Epoch 4, Training accuracy=0.260949
> Epoch 4, Validation accuracy=0.269030
> 
> real    10m55.796s
> user    399m33.567s
> sys     13m55.904s
> [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads 
> for decoding..
> [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed 
> [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads 
> for decoding..
> [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image 
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> Epoch 0, Batch 199, Speed=419.039188
> Epoch 0, Duration=143.934903
> Epoch 0, Training accuracy=0.122542
> Epoch 0, Validation accuracy=0.164359
> Epoch 1, Batch 199, Speed=445.257048
> Epoch 1, Duration=135.248399
> Epoch 1, Training accuracy=0.178828
> Epoch 1, Validation accuracy=0.199419
> Epoch 2, Batch 199, Speed=447.115215
> Epoch 2, Duration=132.003770
> Epoch 2, Training accuracy=0.217808
> Epoch 2, Validation accuracy=0.233073
> Epoch 3, Batch 199, Speed=441.079477
> Epoch 3, Duration=126.543316
> Epoch 3, Training accuracy=0.248102
> Epoch 3, Validation accuracy=0.293870
> Epoch 4, Batch 199, Speed=449.329787
> Epoch 4, Duration=138.398325
> Epoch 4, Training accuracy=0.270021
> Epoch 4, Validation accuracy=0.311498
> 
> real    11m45.329s
> user    426m13.908s
> sys     16m45.093s
> 
> On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy 
> <pe...@gmail.com> wrote:
> >
> > The difference looks smaller now, more like your numbers. I wonder 
> > if something happened during the previous benchmark like a system 
> > update...
> >
> >
> > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > threads for decoding..
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads for decoding..
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > Epoch 0, Changed learning rate to 0.05
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 147456 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 589824 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 2359296 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 9437184 bytes with malloc directly
> > Epoch 0, Batch 199, Speed=426.182733
> > Epoch 0, Duration=134.868458
> > Epoch 0, Training accuracy=0.127238
> > Epoch 0, Validation accuracy=0.206388
> > Epoch 1, Batch 199, Speed=313.127156
> > Epoch 1, Duration=128.041775
> > Epoch 1, Training accuracy=0.182065
> > Epoch 1, Validation accuracy=0.202524
> > Epoch 2, Batch 199, Speed=410.931187
> > Epoch 2, Duration=124.920588
> > Epoch 2, Training accuracy=0.202584
> > Epoch 2, Validation accuracy=0.245693
> > Epoch 3, Batch 199, Speed=419.119335
> > Epoch 3, Duration=120.948349
> > Epoch 3, Training accuracy=0.235854
> > Epoch 3, Validation accuracy=0.291066
> > Epoch 4, Batch 199, Speed=430.473733
> > Epoch 4, Duration=130.181724
> > Epoch 4, Training accuracy=0.257773
> > Epoch 4, Validation accuracy=0.304988
> >
> > real    11m7.356s
> > user    406m9.910s
> > sys     14m18.349s
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
> > threads for decoding..
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> > threads for decoding..
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > Epoch 0, Changed learning rate to 0.05
> > Epoch 0, Batch 199, Speed=348.618154
> > Epoch 0, Duration=146.469352
> > Epoch 0, Training accuracy=0.124121
> > Epoch 0, Validation accuracy=0.167227
> > Epoch 1, Batch 199, Speed=452.790825
> > Epoch 1, Duration=130.199421
> > Epoch 1, Training accuracy=0.183863
> > Epoch 1, Validation accuracy=0.237079
> > Epoch 2, Batch 199, Speed=451.406559
> > Epoch 2, Duration=126.320823
> > Epoch 2, Training accuracy=0.214844
> > Epoch 2, Validation accuracy=0.244692
> > Epoch 3, Batch 199, Speed=403.161873
> > Epoch 3, Duration=125.331660
> > Epoch 3, Training accuracy=0.243506
> > Epoch 3, Validation accuracy=0.301182
> > Epoch 4, Batch 199, Speed=450.826598
> > Epoch 4, Duration=126.426253
> > Epoch 4, Training accuracy=0.266424
> > Epoch 4, Validation accuracy=0.311899
> >
> > real    11m21.930s
> > user    415m3.855s
> > sys     13m53.975s
> >
> > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy 
> > <pe...@gmail.com> wrote:
> > >
> > > Hi Ciyong, thanks for trying to reproduce:
> > >
> > > I used this one:
> > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > >
> > > Could you provide hardware and OS details?
> > >
> > > I will rerun and repost numbers in a few minutes.
> > >
> > > Pedro.
> > >
> > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong 
> > > <ci...@intel.com>
> wrote:
> > > >
> > > > Hi Pedro,
> > > >
> > > > I'm looking at this case, using the script
> > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> > > > to get the timing data, but there seems to be little difference
> > > > between MXNet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > >
> > > > I'm not sure whether the Python scripts differ; can you point me
> > > > to the link for your script (cifar10.py)?
> > > > Or you could try MXNet's script (train_cifar10.py) and see the
> > > > performance.
> > > > Here's the command I used to collect the time:
> > > >         python train_cifar10.py --num-epoch=5
> > > >
> > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > >         real    9m4.880s
> > > >         user    333m13.340s
> > > >         sys     14m36.100s
> > > >
> > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > >         real    9m2.155s
> > > >         user    329m37.092s
> > > >         sys     16m8.668s
> > > >
> > > > -Ciyong
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > 1.5.0.rc1
> > > >
> > > > Hi, these were my build flags and system info:
> > > >
> > > >
> > > > --- # CMake configuration
> > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set 
> > > > CUDNN_ROOT for search path
> > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT 
> > > > ARM
> > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > > > USE_LAPACK: "ON" # Build with lapack support
> > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) 
> > > > IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF 
> > > > USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler 
> > > > supports it
> > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # 
> > > > one could set VTUNE_ROOT for search path
> > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation 
> > > > support
> > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test 
> > > > coverage metric output
> > > > CMAKE_BUILD_TYPE: "Release"
> > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > >
> > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> > > > 1.5.0.rc1,
> > > > upstream/v1.5.x)
> > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> > > > 1.4.1.rc0,
> > > > upstream/v1.4.x)
> > > >
> > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > c5d.18xlarge
> > > >
> > > >
> > > > Version      : 3.6.7
> > > > Compiler     : GCC 8.2.0
> > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > Arch         : ('64bit', 'ELF')
> > > > ------------Pip Info-----------
> > > > Version      : 19.1.1
> > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > > ----------MXNet Info-----------
> > > > Version      : 1.5.0
> > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > ----------System Info----------
> > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > system       : Linux
> > > > node         : ip-172-31-63-171
> > > > release      : 4.15.0-1035-aws
> > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > ----------Hardware Info----------
> > > > machine      : x86_64
> > > > processor    : x86_64
> > > > Architecture:        x86_64
> > > > CPU op-mode(s):      32-bit, 64-bit
> > > > Byte Order:          Little Endian
> > > > CPU(s):              72
> > > > On-line CPU(s) list: 0-71
> > > > Thread(s) per core:  2
> > > > Core(s) per socket:  18
> > > > Socket(s):           2
> > > > NUMA node(s):        2
> > > > Vendor ID:           GenuineIntel
> > > > CPU family:          6
> > > > Model:               85
> > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:            4
> > > > CPU MHz:             1326.446
> > > > BogoMIPS:            6000.00
> > > > Hypervisor vendor:   KVM
> > > > Virtualization type: full
> > > > L1d cache:           32K
> > > > L1i cache:           32K
> > > > L2 cache:            1024K
> > > > L3 cache:            25344K
> > > > NUMA node0 CPU(s):   0-17,36-53
> > > > NUMA node1 CPU(s):   18-35,54-71
> > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall 
> > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl 
> > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor 
> > > > ssse3 fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
> > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch 
> > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 
> > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt 
> > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves 
> > > > ida arat pku ospke
> > > > ----------Network Test----------
> > > >
> > > > ----------Python Info----------
> > > > Version      : 3.6.7
> > > > Compiler     : GCC 8.2.0
> > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > Arch         : ('64bit', 'ELF')
> > > > ------------Pip Info-----------
> > > > Version      : 19.1.1
> > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > > ----------MXNet Info-----------
> > > > Version      : 1.4.1
> > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > ----------System Info----------
> > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > system       : Linux
> > > > node         : ip-172-31-63-171
> > > > release      : 4.15.0-1035-aws
> > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > ----------Hardware Info----------
> > > > machine      : x86_64
> > > > processor    : x86_64
> > > > Architecture:        x86_64
> > > > CPU op-mode(s):      32-bit, 64-bit
> > > > Byte Order:          Little Endian
> > > > CPU(s):              72
> > > > On-line CPU(s) list: 0-71
> > > > Thread(s) per core:  2
> > > > Core(s) per socket:  18
> > > > Socket(s):           2
> > > > NUMA node(s):        2
> > > > Vendor ID:           GenuineIntel
> > > > CPU family:          6
> > > > Model:               85
> > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:            4
> > > > CPU MHz:             1223.344
> > > > BogoMIPS:            6000.00
> > > > Hypervisor vendor:   KVM
> > > > Virtualization type: full
> > > > L1d cache:           32K
> > > > L1i cache:           32K
> > > > L2 cache:            1024K
> > > > L3 cache:            25344K
> > > > NUMA node0 CPU(s):   0-17,36-53
> > > > NUMA node1 CPU(s):   18-35,54-71
> > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall 
> > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl 
> > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor 
> > > > ssse3 fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 
> > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch 
> > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 
> > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt 
> > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves 
> > > > ida arat pku ospke
> > > > ----------Network Test----------
> > > >
> > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
> <pe...@gmail.com> wrote:
> > > > >
> > > > > I trained cifar10 on CPU and there seems to be a regression of
> > > > > roughly 7% in training time compared to 1.4.1:
> > > > >
> > > > > (py3_venv)
> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > real    11m30.388s
> > > > > user    417m7.766s
> > > > > sys     16m57.315s
> > > > >
> > > > > VS 1.4.1:
> > > > > real    10m41.994s
> > > > > user    392m40.646s
> > > > > sys     12m30.601s
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com>
> wrote:
> > > > > >
> > > > > > Hi Anirudh,
> > > > > >
> > > > > > Thanks for jumping into this quickly, I followed up on the issue.
> > > > > >
> > > > > > I meant for the sockeye developers/maintainers to help set up
> > > > > > nightly tests and raise issues early.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin 
> > > > > > <ha...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > In GluonNLP we test against the MXNet nightly build for
> > > > > > > each PR, and the CI did catch some MXNet-related issues.
> > > > > > > I recommend other toolkits also add integration tests with
> > > > > > > MXNet nightly.
> > > > > > > It helps identify issues early.
> > > > > > >
> > > > > > > Best,
> > > > > > > Haibin
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric 
> > > > > > > <pa...@intel.com>
> wrote:
> > > > > > >
> > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > >
> > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
> > > > > > > > for MXNet developers to catch potential bugs or
> > > > > > > > performance degradation.
> > > > > > > >
> > > > > > > > In the future, I suggest adding the major downstream
> > > > > > > > test cases, like those from sockeye, GluonNLP, GluonCV,
> > > > > > > > DGL and Gluon-TS, to the nightly test.
> > > > > > > > If that's still too heavy, maybe test them weekly or
> > > > > > > > monthly :)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > --Patric
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Anirudh Subramanian 
> > > > > > > > > [mailto:anirudh2290@gmail.com]
> > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) 
> > > > > > > > > version
> > > > > > > > > 1.5.0.rc1
> > > > > > > > >
> > > > > > > > > Hi Lai,
> > > > > > > > >
> > > > > > > > > I have opened an issue:
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > I came to know about this issue only today and I have
> > > > > > > > > not been monitoring sockeye.
> > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> > > > > > > > > by the dlpack changes.
> > > > > > > > > Also, I don't think sockeye CI checks against master;
> > > > > > > > > it is using 1.4.1.
> > > > > > > > >
> > > > > > > > > Anirudh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei 
> > > > > > > > > <ro...@gmail.com>
> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Could you share which test failed and what the crash
> > > > > > > > > > is? How can it be reproduced?
> > > > > > > > > >
> > > > > > > > > > I was able to install sockeye, and all tests passed
> > > > > > > > > > using python setup.py test.
> > > > > > > > > >
> > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > > > > > > >
> > > > > > > > > > It would be great to create an issue with
> > > > > > > > > > reproducible steps and move the discussion there.
> > > > > > > > > >
> > > > > > > > > > Also, I see the sockeye nightly build [1] has been
> > > > > > > > > > failing for some time. If it's due to an MXNet change,
> > > > > > > > > > please raise this early so we can track and solve it
> > > > > > > > > > in time rather than block the release during vote time.
> > > > > > > > > >
> > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian 
> > > > > > > > > > <anirudh2290@gmail.com
> > > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > with the commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > >
> > > > > > > > > > > Anirudh
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei 
> > > > > > > > > > > <ro...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > >
> > > > > > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław 
> > > > > > > > > > > > Trędak <pt...@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > -1
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is a crash in the sockeye unit tests
> > > > > > > > > > > > > (python setup.py test), observed starting with
> > > > > > > > > > > > > the nightly 1.5 build from 6/13 and still
> > > > > > > > > > > > > occurring in 1.5rc1. I don't yet have the exact
> > > > > > > > > > > > > commit that is responsible for it, but it is either
> > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
> > > > > > > > > > > > > (dlpack related) or
> > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
> > > > > > > > > > > > > (cached op optimization).
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei 
> > > > > > > > > > > > > <ro...@gmail.com>
> wrote:
> > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
> > > > > > > > > > > > > > (incubating) version 1.5.0.
> > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59 (PST)
> > > > > > > > > > > > > > and close on June 22, 23:59:59.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > >
> > > > > > > > > > > > Lai
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai

RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Zhao, Patric" <pa...@intel.com>.
Could we run more epochs to see the performance difference, or profile the difference between a good and a bad run?
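
To compare per-epoch timings rather than only total wall-clock time, the `Epoch N, Duration=...` entries in the quoted logs can be averaged. A minimal sketch, assuming the log format shown in this thread; the names `epoch_durations` and `mean_duration` are hypothetical, and dropping epoch 0 as warm-up is an assumption made here, not something the benchmark script does.

```python
import re
from statistics import mean

# Matches entries such as "Epoch 3, Duration=118.327797", even when
# several entries were wrapped onto the same line by the mail client.
DURATION_RE = re.compile(r"Epoch (\d+), Duration=(\d+(?:\.\d+)?)")


def epoch_durations(log_text: str) -> dict:
    """Map epoch number -> duration in seconds found in a training log."""
    return {int(e): float(d) for e, d in DURATION_RE.findall(log_text)}


def mean_duration(log_text: str, skip_warmup: bool = True) -> float:
    """Mean epoch duration; optionally ignore epoch 0 (assumed warm-up)."""
    durations = epoch_durations(log_text)
    values = [d for e, d in sorted(durations.items())
              if not (skip_warmup and e == 0)]
    return mean(values)


# Durations copied from the 23:17 (1.4) and 23:28 (1.5) runs in this thread.
LOG_1_4 = """
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""
# Abridged, in the run-together form in which the 1.5 log was quoted.
LOG_1_5 = (
    "Epoch 0, Duration=143.934903 Epoch 0, Training accuracy=0.122542 "
    "Epoch 1, Duration=135.248399 Epoch 2, Duration=132.003770 "
    "Epoch 3, Duration=126.543316 Epoch 4, Duration=138.398325"
)

for name, log in (("1.4", LOG_1_4), ("1.5", LOG_1_5)):
    print(f"MXNet {name}: mean epoch duration {mean_duration(log):.2f}s")
```

With only four epochs per run after warm-up, the means are still noisy; the same functions can be pointed at logs from longer runs.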

> -----Original Message-----
> From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> Sent: Thursday, June 27, 2019 9:35 AM
> To: dev@mxnet.incubator.apache.org
> Cc: dev@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> 
> I ran it again and the gap is bigger again. I guess we need to average out
> the times across several runs:
> 
> piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for
> decoding..
> [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for
> decoding..
> [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 147456 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 589824 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 2359296 bytes with malloc directly
> [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 9437184 bytes with malloc directly
> Epoch 0, Batch 199, Speed=384.149839
> Epoch 0, Duration=140.919567
> Epoch 0, Training accuracy=0.115169
> Epoch 0, Validation accuracy=0.141317
> Epoch 1, Batch 199, Speed=433.380512
> Epoch 1, Duration=119.553233
> Epoch 1, Training accuracy=0.170956
> Epoch 1, Validation accuracy=0.216146
> Epoch 2, Batch 199, Speed=434.864699
> Epoch 2, Duration=123.278490
> Epoch 2, Training accuracy=0.209455
> Epoch 2, Validation accuracy=0.247296
> Epoch 3, Batch 199, Speed=433.401854
> Epoch 3, Duration=118.327797
> Epoch 3, Training accuracy=0.248701
> Epoch 3, Validation accuracy=0.302083
> Epoch 4, Batch 199, Speed=419.713707
> Epoch 4, Duration=126.468409
> Epoch 4, Training accuracy=0.260949
> Epoch 4, Validation accuracy=0.269030
> 
> real    10m55.796s
> user    399m33.567s
> sys     13m55.904s
> [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for
> decoding..
> [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for
> decoding..
> [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> Epoch 0, Batch 199, Speed=419.039188
> Epoch 0, Duration=143.934903
> Epoch 0, Training accuracy=0.122542
> Epoch 0, Validation accuracy=0.164359
> Epoch 1, Batch 199, Speed=445.257048
> Epoch 1, Duration=135.248399
> Epoch 1, Training accuracy=0.178828
> Epoch 1, Validation accuracy=0.199419
> Epoch 2, Batch 199, Speed=447.115215
> Epoch 2, Duration=132.003770
> Epoch 2, Training accuracy=0.217808
> Epoch 2, Validation accuracy=0.233073
> Epoch 3, Batch 199, Speed=441.079477
> Epoch 3, Duration=126.543316
> Epoch 3, Training accuracy=0.248102
> Epoch 3, Validation accuracy=0.293870
> Epoch 4, Batch 199, Speed=449.329787
> Epoch 4, Duration=138.398325
> Epoch 4, Training accuracy=0.270021
> Epoch 4, Validation accuracy=0.311498
> 
> real    11m45.329s
> user    426m13.908s
> sys     16m45.093s
> 
> On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> <pe...@gmail.com> wrote:
> >
> > The difference looks smaller now, more like your numbers. I wonder if
> > something happened during the previous benchmark like a system
> > update...
> >
> >
> > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > for decoding..
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > for decoding..
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > Epoch 0, Changed learning rate to 0.05
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 147456 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 589824 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 2359296 bytes with malloc directly
> > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> > 9437184 bytes with malloc directly
> > Epoch 0, Batch 199, Speed=426.182733
> > Epoch 0, Duration=134.868458
> > Epoch 0, Training accuracy=0.127238
> > Epoch 0, Validation accuracy=0.206388
> > Epoch 1, Batch 199, Speed=313.127156
> > Epoch 1, Duration=128.041775
> > Epoch 1, Training accuracy=0.182065
> > Epoch 1, Validation accuracy=0.202524
> > Epoch 2, Batch 199, Speed=410.931187
> > Epoch 2, Duration=124.920588
> > Epoch 2, Training accuracy=0.202584
> > Epoch 2, Validation accuracy=0.245693
> > Epoch 3, Batch 199, Speed=419.119335
> > Epoch 3, Duration=120.948349
> > Epoch 3, Training accuracy=0.235854
> > Epoch 3, Validation accuracy=0.291066
> > Epoch 4, Batch 199, Speed=430.473733
> > Epoch 4, Duration=130.181724
> > Epoch 4, Training accuracy=0.257773
> > Epoch 4, Validation accuracy=0.304988
> >
> > real    11m7.356s
> > user    406m9.910s
> > sys     14m18.349s
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> > for decoding..
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> > ImageRecordIOParser2:
> > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> > for decoding..
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> > Epoch 0, Changed learning rate to 0.05
> > Epoch 0, Batch 199, Speed=348.618154
> > Epoch 0, Duration=146.469352
> > Epoch 0, Training accuracy=0.124121
> > Epoch 0, Validation accuracy=0.167227
> > Epoch 1, Batch 199, Speed=452.790825
> > Epoch 1, Duration=130.199421
> > Epoch 1, Training accuracy=0.183863
> > Epoch 1, Validation accuracy=0.237079
> > Epoch 2, Batch 199, Speed=451.406559
> > Epoch 2, Duration=126.320823
> > Epoch 2, Training accuracy=0.214844
> > Epoch 2, Validation accuracy=0.244692
> > Epoch 3, Batch 199, Speed=403.161873
> > Epoch 3, Duration=125.331660
> > Epoch 3, Training accuracy=0.243506
> > Epoch 3, Validation accuracy=0.301182
> > Epoch 4, Batch 199, Speed=450.826598
> > Epoch 4, Duration=126.426253
> > Epoch 4, Training accuracy=0.266424
> > Epoch 4, Validation accuracy=0.311899
> >
> > real    11m21.930s
> > user    415m3.855s
> > sys     13m53.975s
> >
> > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> > <pe...@gmail.com> wrote:
> > >
> > > Hi Ciyong, thanks for trying to reproduce:
> > >
> > > I used this one:
> > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> > >
> > > Could you provide hardware and OS details?
> > >
> > > I will rerun and repost numbers in a few minutes.
> > >
> > > Pedro.
> > >
> > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
> > > >
> > > > Hi Pedro,
> > > >
> > > > I'm looking at this case, and using the script of
> > > > "incubator-mxnet/example/image-classification/train_cifar10.py" to get
> > > > the timing data, but seems there's not much difference between mxnet
> > > > 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > >
> > > > Not sure if there's any difference in the python script, can you point me
> > > > the link to get your script (cifar10.py)?
> > > > Or you can also have a try with MXNet's script (train_cifar10.py) and see
> > > > the performance.
> > > >
> > > > Here's the command I used to collect the time:
> > > >         python train_cifar10.py --num-epoch=5
> > > >
> > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > >         real    9m4.880s
> > > >         user    333m13.340s
> > > >         sys     14m36.100s
> > > >
> > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > >         real    9m2.155s
> > > >         user    329m37.092s
> > > >         sys     16m8.668s
> > > >
> > > > -Ciyong
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > 1.5.0.rc1
> > > >
> > > > Hi these were my build flags and system info:
> > > >
> > > >
> > > > --- # CMake configuration
> > > > USE_CUDA: "OFF" # Build with CUDA support
> > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > > USE_OPENCV: "ON" # Build with OpenCV support
> > > > USE_OPENMP: "ON" # Build with Openmp support
> > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > > > USE_LAPACK: "ON" # Build with lapack support
> > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > > USE_PROFILER: "ON" # Build with Profiler support
> > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > > > CMAKE_BUILD_TYPE: "Release"
> > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > >
> > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
> > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
> > > >
> > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > c5d.18xlarge
> > > >
> > > >
> > > > Version      : 3.6.7
> > > > Compiler     : GCC 8.2.0
> > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > Arch         : ('64bit', 'ELF')
> > > > ------------Pip Info-----------
> > > > Version      : 19.1.1
> > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > > ----------MXNet Info-----------
> > > > Version      : 1.5.0
> > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > ----------System Info----------
> > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > system       : Linux
> > > > node         : ip-172-31-63-171
> > > > release      : 4.15.0-1035-aws
> > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > ----------Hardware Info----------
> > > > machine      : x86_64
> > > > processor    : x86_64
> > > > Architecture:        x86_64
> > > > CPU op-mode(s):      32-bit, 64-bit
> > > > Byte Order:          Little Endian
> > > > CPU(s):              72
> > > > On-line CPU(s) list: 0-71
> > > > Thread(s) per core:  2
> > > > Core(s) per socket:  18
> > > > Socket(s):           2
> > > > NUMA node(s):        2
> > > > Vendor ID:           GenuineIntel
> > > > CPU family:          6
> > > > Model:               85
> > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:            4
> > > > CPU MHz:             1326.446
> > > > BogoMIPS:            6000.00
> > > > Hypervisor vendor:   KVM
> > > > Virtualization type: full
> > > > L1d cache:           32K
> > > > L1i cache:           32K
> > > > L2 cache:            1024K
> > > > L3 cache:            25344K
> > > > NUMA node0 CPU(s):   0-17,36-53
> > > > NUMA node1 CPU(s):   18-35,54-71
> > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> > > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3
> > > > fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
> > > > f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single
> > > > pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
> > > > mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
> > > > avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > > ----------Network Test----------
> > > >
> > > > ----------Python Info----------
> > > > Version      : 3.6.7
> > > > Compiler     : GCC 8.2.0
> > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > > Arch         : ('64bit', 'ELF')
> > > > ------------Pip Info-----------
> > > > Version      : 19.1.1
> > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > > ----------MXNet Info-----------
> > > > Version      : 1.4.1
> > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > ----------System Info----------
> > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > system       : Linux
> > > > node         : ip-172-31-63-171
> > > > release      : 4.15.0-1035-aws
> > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > ----------Hardware Info----------
> > > > machine      : x86_64
> > > > processor    : x86_64
> > > > Architecture:        x86_64
> > > > CPU op-mode(s):      32-bit, 64-bit
> > > > Byte Order:          Little Endian
> > > > CPU(s):              72
> > > > On-line CPU(s) list: 0-71
> > > > Thread(s) per core:  2
> > > > Core(s) per socket:  18
> > > > Socket(s):           2
> > > > NUMA node(s):        2
> > > > Vendor ID:           GenuineIntel
> > > > CPU family:          6
> > > > Model:               85
> > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:            4
> > > > CPU MHz:             1223.344
> > > > BogoMIPS:            6000.00
> > > > Hypervisor vendor:   KVM
> > > > Virtualization type: full
> > > > L1d cache:           32K
> > > > L1i cache:           32K
> > > > L2 cache:            1024K
> > > > L3 cache:            25344K
> > > > NUMA node0 CPU(s):   0-17,36-53
> > > > NUMA node1 CPU(s):   18-35,54-71
> > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> > > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3
> > > > fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
> > > > f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single
> > > > pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
> > > > mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
> > > > avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > > ----------Network Test----------
> > > >
> > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
> > > > >
> > > > > I trained cifar10 on CPU, and there seems to be a regression of
> > > > > roughly 7% in training time compared with 1.4.1:
> > > > >
> > > > > (py3_venv)
> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > real    11m30.388s
> > > > > user    417m7.766s
> > > > > sys     16m57.315s
> > > > >
> > > > > VS 1.4.1:
> > > > > real    10m41.994s
> > > > > user    392m40.646s
> > > > > sys     12m30.601s
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Anirudh,
> > > > > >
> > > > > > Thanks for jumping into this quickly, I followed up on the issue.
> > > > > >
> > > > > > My comment was meant for the sockeye developers/maintainers, to
> > > > > > help set up nightly tests and raise issues early.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > <ha...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > In GluonNLP we test against the MXNet nightly build for each
> > > > > > > PR, and the CI did catch some MXNet-related issues.
> > > > > > > I recommend that other toolkits also add integration tests with
> > > > > > > MXNet nightly. It helps identify issues early.
> > > > > > >
> > > > > > > Best,
> > > > > > > Haibin
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> > > > > > > >
> > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > >
> > > > > > > > The downstream cases are not in the MXNet CI, so it's hard for
> > > > > > > > MXNet developers to catch potential bugs or performance
> > > > > > > > degradation.
> > > > > > > >
> > > > > > > > In the future, I suggest adding the major downstream test
> > > > > > > > cases, such as those from sockeye, GluonNLP, GluonCV, DGL and
> > > > > > > > Gluon-TS, to the nightly tests.
> > > > > > > > If that's still too heavy, maybe test them weekly or
> > > > > > > > monthly :)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > --Patric
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > > > > > > >
> > > > > > > > > Hi Lai,
> > > > > > > > >
> > > > > > > > > I have opened an issue:
> > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > I came to know about this issue only today and I have
> > > > > > > > > not been monitoring sockeye.
> > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> > > > > > > > > by the dlpack changes.
> > > > > > > > > Also, I don't think sockeye CI checks against master,
> > > > > > > > > it is using 1.4.1.
> > > > > > > > >
> > > > > > > > > Anirudh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Could you share which test failed and what the crash
> > > > > > > > > > is? How can it be reproduced?
> > > > > > > > > >
> > > > > > > > > > I was able to install sockeye, and all tests passed
> > > > > > > > > > using python setup.py test
> > > > > > > > > >
> > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1
> > > > > > > > > >
> > > > > > > > > > It would be great to create an issue with reproducible
> > > > > > > > > > steps and move the discussion there.
> > > > > > > > > >
> > > > > > > > > > Also I see the sockeye nightly build[1] has been failing
> > > > > > > > > > for some time. If it's due to an MXNet change, please
> > > > > > > > > > raise this early so we can track and solve it in time
> > > > > > > > > > rather than block the release during vote time.
> > > > > > > > > >
> > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > > > > > <anirudh2290@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
> > > > > > > > > > > with the commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > > >
> > > > > > > > > > > Anirudh
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > > > <ro...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > > >
> > > > > > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > > > > > > <pt...@apache.org>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > -1
> > > > > > > > > > > > >
> > > > > > > > > > > > > There is a crash in sockeye unit test (python
> > > > > > > > > > > > > setup.py test) observed starting with nightly 1.5
> > > > > > > > > > > > > build from 6/13 and still occurring in 1.5rc1. I
> > > > > > > > > > > > > don't yet have the exact commit that is
> > > > > > > > > > > > > responsible for it, but it is either
> > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > > > > > related) or
> > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached
> > > > > > > > > > > > > op optimization).
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
> > > > > > > > > > > > > > (incubating) version 1.5.0.
> > > > > > > > > > > > > > Voting on dev@ will start June 19,
> > > > > > > > > > > > > > 23:59:59(PST) and close on June 22, 23:59:59.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lai
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > >
> > > > > > > > > > > > Lai
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
I ran it again and the gap is bigger again. I guess we need to average
out the times across several runs:

piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
[23:17:09] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
for decoding..
[23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
[23:17:09] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
for decoding..
[23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
Epoch 0, Changed learning rate to 0.05
[23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
147456 bytes with malloc directly
[23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
589824 bytes with malloc directly
[23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
2359296 bytes with malloc directly
[23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
9437184 bytes with malloc directly
Epoch 0, Batch 199, Speed=384.149839
Epoch 0, Duration=140.919567
Epoch 0, Training accuracy=0.115169
Epoch 0, Validation accuracy=0.141317
Epoch 1, Batch 199, Speed=433.380512
Epoch 1, Duration=119.553233
Epoch 1, Training accuracy=0.170956
Epoch 1, Validation accuracy=0.216146
Epoch 2, Batch 199, Speed=434.864699
Epoch 2, Duration=123.278490
Epoch 2, Training accuracy=0.209455
Epoch 2, Validation accuracy=0.247296
Epoch 3, Batch 199, Speed=433.401854
Epoch 3, Duration=118.327797
Epoch 3, Training accuracy=0.248701
Epoch 3, Validation accuracy=0.302083
Epoch 4, Batch 199, Speed=419.713707
Epoch 4, Duration=126.468409
Epoch 4, Training accuracy=0.260949
Epoch 4, Validation accuracy=0.269030

real    10m55.796s
user    399m33.567s
sys     13m55.904s
[23:28:04] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
for decoding..
[23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
[23:28:04] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
for decoding..
[23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
Epoch 0, Changed learning rate to 0.05
Epoch 0, Batch 199, Speed=419.039188
Epoch 0, Duration=143.934903
Epoch 0, Training accuracy=0.122542
Epoch 0, Validation accuracy=0.164359
Epoch 1, Batch 199, Speed=445.257048
Epoch 1, Duration=135.248399
Epoch 1, Training accuracy=0.178828
Epoch 1, Validation accuracy=0.199419
Epoch 2, Batch 199, Speed=447.115215
Epoch 2, Duration=132.003770
Epoch 2, Training accuracy=0.217808
Epoch 2, Validation accuracy=0.233073
Epoch 3, Batch 199, Speed=441.079477
Epoch 3, Duration=126.543316
Epoch 3, Training accuracy=0.248102
Epoch 3, Validation accuracy=0.293870
Epoch 4, Batch 199, Speed=449.329787
Epoch 4, Duration=138.398325
Epoch 4, Training accuracy=0.270021
Epoch 4, Validation accuracy=0.311498

real    11m45.329s
user    426m13.908s
sys     16m45.093s
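
The averaging suggested above could be sketched roughly as follows. This is
only an illustration, not a script anyone in this thread ran: the helper name
parse_real and the real_times table are made up for the example, and the
values are the "real" lines copied by hand from the three runs posted in this
thread (June 25 and the two runs on June 26). Three runs is a small sample, so
the result is indicative at best.

```python
import re
from statistics import mean


def parse_real(value: str) -> float:
    """Convert a bash `time` value such as '10m55.796s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Hand-copied "real" wall-clock times from the runs quoted in this thread.
real_times = {
    "1.4.1": ["10m41.994s", "11m7.356s", "10m55.796s"],
    "1.5.0": ["11m30.388s", "11m21.930s", "11m45.329s"],
}

# Mean wall-clock time per version, in seconds.
averages = {
    version: mean(parse_real(t) for t in times)
    for version, times in real_times.items()
}

# Relative slowdown of 1.5.0 with respect to 1.4.1 (0.05 means 5% slower).
slowdown = averages["1.5.0"] / averages["1.4.1"] - 1.0

for version, avg in averages.items():
    print(f"{version}: mean real time {avg:.1f}s")
print(f"relative slowdown of 1.5.0 vs 1.4.1: {slowdown:.1%}")
```

With only these three pairs the mean gap comes out at a bit under 6%, which
sits between the smallest and largest gaps seen in the individual runs. More
runs, ideally alternating the order of the two versions, would be needed
before reading much into it.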

On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
<pe...@gmail.com> wrote:
>
> The difference looks smaller now, more like your numbers. I wonder if
> something happened during the previous benchmark like a system
> update...
>
>
> piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
> time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
> ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> for decoding..
> [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> for decoding..
> [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 147456 bytes with malloc directly
> [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 589824 bytes with malloc directly
> [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 2359296 bytes with malloc directly
> [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
> 9437184 bytes with malloc directly
> Epoch 0, Batch 199, Speed=426.182733
> Epoch 0, Duration=134.868458
> Epoch 0, Training accuracy=0.127238
> Epoch 0, Validation accuracy=0.206388
> Epoch 1, Batch 199, Speed=313.127156
> Epoch 1, Duration=128.041775
> Epoch 1, Training accuracy=0.182065
> Epoch 1, Validation accuracy=0.202524
> Epoch 2, Batch 199, Speed=410.931187
> Epoch 2, Duration=124.920588
> Epoch 2, Training accuracy=0.202584
> Epoch 2, Validation accuracy=0.245693
> Epoch 3, Batch 199, Speed=419.119335
> Epoch 3, Duration=120.948349
> Epoch 3, Training accuracy=0.235854
> Epoch 3, Validation accuracy=0.291066
> Epoch 4, Batch 199, Speed=430.473733
> Epoch 4, Duration=130.181724
> Epoch 4, Training accuracy=0.257773
> Epoch 4, Validation accuracy=0.304988
>
> real    11m7.356s
> user    406m9.910s
> sys     14m18.349s
> [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
> for decoding..
> [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
> ImageRecordIOParser2:
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
> for decoding..
> [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
> from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> Epoch 0, Changed learning rate to 0.05
> Epoch 0, Batch 199, Speed=348.618154
> Epoch 0, Duration=146.469352
> Epoch 0, Training accuracy=0.124121
> Epoch 0, Validation accuracy=0.167227
> Epoch 1, Batch 199, Speed=452.790825
> Epoch 1, Duration=130.199421
> Epoch 1, Training accuracy=0.183863
> Epoch 1, Validation accuracy=0.237079
> Epoch 2, Batch 199, Speed=451.406559
> Epoch 2, Duration=126.320823
> Epoch 2, Training accuracy=0.214844
> Epoch 2, Validation accuracy=0.244692
> Epoch 3, Batch 199, Speed=403.161873
> Epoch 3, Duration=125.331660
> Epoch 3, Training accuracy=0.243506
> Epoch 3, Validation accuracy=0.301182
> Epoch 4, Batch 199, Speed=450.826598
> Epoch 4, Duration=126.426253
> Epoch 4, Training accuracy=0.266424
> Epoch 4, Validation accuracy=0.311899
>
> real    11m21.930s
> user    415m3.855s
> sys     13m53.975s
>
> On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> <pe...@gmail.com> wrote:
> >
> > Hi Ciyong, thanks for trying to reproduce:
> >
> > I used this one:
> > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >
> > Could you provide hardware and OS details?
> >
> > I will rerun and repost numbers in a few minutes.
> >
> > Pedro.
> >
> > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
> > >
> > > Hi Pedro,
> > >
> > > > I'm looking at this case, using the script "incubator-mxnet/example/image-classification/train_cifar10.py" to get
> > > > the timing data, but there doesn't seem to be much difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > > >
> > > > I'm not sure whether the Python scripts differ; can you point me to your script (cifar10.py)?
> > > > Or you can also try MXNet's script (train_cifar10.py) and see the performance.
> > >
> > > Here's the command I used to collect the time:
> > >         python train_cifar10.py --num-epoch=5
> > >
> > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > >         real    9m4.880s
> > >         user    333m13.340s
> > >         sys     14m36.100s
> > >
> > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > >         real    9m2.155s
> > >         user    329m37.092s
> > >         sys     16m8.668s
> > >
> > > -Ciyong
> > >
> > >
> > > -----Original Message-----
> > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > > Sent: Wednesday, June 26, 2019 6:28 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: dev@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > >
> > > Hi these were my build flags and system info:
> > >
> > >
> > > --- # CMake configuration
> > > USE_CUDA: "OFF" # Build with CUDA support
> > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > > USE_OPENCV: "ON" # Build with OpenCV support
> > > USE_OPENMP: "ON" # Build with Openmp support
> > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > > USE_LAPACK: "ON" # Build with lapack support
> > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > > USE_PROFILER: "ON" # Build with Profiler support
> > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > > CMAKE_BUILD_TYPE: "Release"
> > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > >
> > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1,
> > > upstream/v1.5.x)
> > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0,
> > > upstream/v1.4.x)
> > >
> > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > c5d.18xlarge
> > >
> > >
> > > Version      : 3.6.7
> > > Compiler     : GCC 8.2.0
> > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > Arch         : ('64bit', 'ELF')
> > > ------------Pip Info-----------
> > > Version      : 19.1.1
> > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > > ----------MXNet Info-----------
> > > Version      : 1.5.0
> > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > > Hashtag not found. Not installed from pre-built package.
> > > ----------System Info----------
> > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > system       : Linux
> > > node         : ip-172-31-63-171
> > > release      : 4.15.0-1035-aws
> > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > ----------Hardware Info----------
> > > machine      : x86_64
> > > processor    : x86_64
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > CPU(s):              72
> > > On-line CPU(s) list: 0-71
> > > Thread(s) per core:  2
> > > Core(s) per socket:  18
> > > Socket(s):           2
> > > NUMA node(s):        2
> > > Vendor ID:           GenuineIntel
> > > CPU family:          6
> > > Model:               85
> > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > Stepping:            4
> > > CPU MHz:             1326.446
> > > BogoMIPS:            6000.00
> > > Hypervisor vendor:   KVM
> > > Virtualization type: full
> > > L1d cache:           32K
> > > L1i cache:           32K
> > > L2 cache:            1024K
> > > L3 cache:            25344K
> > > NUMA node0 CPU(s):   0-17,36-53
> > > NUMA node1 CPU(s):   18-35,54-71
> > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > > ----------Network Test----------
> > >
> > > ----------Python Info----------
> > > Version      : 3.6.7
> > > Compiler     : GCC 8.2.0
> > > Build        : ('default', 'Oct 22 2018 11:32:17')
> > > Arch         : ('64bit', 'ELF')
> > > ------------Pip Info-----------
> > > Version      : 19.1.1
> > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > > ----------MXNet Info-----------
> > > Version      : 1.4.1
> > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > > Hashtag not found. Not installed from pre-built package.
> > > ----------System Info----------
> > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > system       : Linux
> > > node         : ip-172-31-63-171
> > > release      : 4.15.0-1035-aws
> > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > ----------Hardware Info----------
> > > machine      : x86_64
> > > processor    : x86_64
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > CPU(s):              72
> > > On-line CPU(s) list: 0-71
> > > Thread(s) per core:  2
> > > Core(s) per socket:  18
> > > Socket(s):           2
> > > NUMA node(s):        2
> > > Vendor ID:           GenuineIntel
> > > CPU family:          6
> > > Model:               85
> > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > Stepping:            4
> > > CPU MHz:             1223.344
> > > BogoMIPS:            6000.00
> > > Hypervisor vendor:   KVM
> > > Virtualization type: full
> > > L1d cache:           32K
> > > L1i cache:           32K
> > > L2 cache:            1024K
> > > L3 cache:            25344K
> > > NUMA node0 CPU(s):   0-17,36-53
> > > NUMA node1 CPU(s):   18-35,54-71
> > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > > ----------Network Test----------
> > >
> > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
> > > >
> > > > > I trained cifar10 on CPU and it seems there's a regression of
> > > > > roughly 7% in training time compared to 1.4.1:
> > > >
> > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > (master)+$ time python cifar10.py --epochs 5
> > > > real    11m30.388s
> > > > user    417m7.766s
> > > > sys     16m57.315s
> > > >
> > > > VS 1.4.1:
> > > > real    10m41.994s
> > > > user    392m40.646s
> > > > sys     12m30.601s
> > > >
> > > >
> > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
> > > > >
> > > > > Hi Anirudh,
> > > > >
> > > > > Thanks for jumping into this quickly, I followed up on the issue.
> > > > >
> > > > > > It was meant for sockeye developers/maintainers to help set up nightly
> > > > > > tests and raise issues early.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > <ha...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > In GluonNLP we are testing with MXNET nightly build for each PR,
> > > > > > and we did find some MXNet related issue caught by the CI.
> > > > > > I recommend other toolkits also add integration tests with MXNet nightly.
> > > > > > It helps identify issues early.
> > > > > >
> > > > > > Best,
> > > > > > Haibin
> > > > > >
> > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> > > > > >
> > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > > >
> > > > > > > > The downstream cases are not in the MXNet CI, so it's hard for
> > > > > > > > MXNet developers to catch potential bugs or performance degradation.
> > > > > > > >
> > > > > > > > In the future, I suggest adding the major downstream test cases,
> > > > > > > > like those from sockeye, GluonNLP, GluonCV, DGL, Gluon-TS, into the
> > > > > > > > nightly test.
> > > > > > > > If it's still too heavy, maybe testing it weekly or monthly :)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > --Patric
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > > > > 1.5.0.rc1
> > > > > > > >
> > > > > > > > Hi Lai,
> > > > > > > >
> > > > > > > > I have opened an issue:
> > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > > > I came to know about this issue only today and I have not been
> > > > > > > > > monitoring sockeye.
> > > > > > > > > I jumped onto this issue to make sure it wasn't caused by the
> > > > > > > > > dlpack changes.
> > > > > > > > > Also, I don't think sockeye CI checks against master, it is
> > > > > > > > > using 1.4.1.
> > > > > > > >
> > > > > > > > Anirudh
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Could you share which test failed and what’s the crash? How
> > > > > > > > > to reproduce it?
> > > > > > > > >
> > > > > > > > > > I was able to install sockeye, and all tests passed
> > > > > > > > > > using python setup.py test
> > > > > > > > >
> > > > > > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > > > > > >
> > > > > > > > > It would be great to create an issue with reproducible steps
> > > > > > > > > and move the discussion there.
> > > > > > > > >
> > > > > > > > > > Also I see sockeye nightly build[1] has been failing for
> > > > > > > > > > some time. If it’s due to an MXNet change, please raise this early so we can
> > > > > > > > > > track and solve it in time rather than block the release during vote time.
> > > > > > > > >
> > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2290@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the
> > > > > > > > > > commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > > >
> > > > > > > > > > Anirudh
> > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > > >
> > > > > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > > > > > <pt...@apache.org>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > -1
> > > > > > > > > > > >
> > > > > > > > > > > > > There is a crash in sockeye unit test (python setup.py test) observed
> > > > > > > > > > > > > starting with nightly 1.5 build from 6/13 and still occurring in 1.5rc1. I
> > > > > > > > > > > > > don't yet have the exact commit that is responsible for it, but it is either
> > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> > > > > > > > > > > >
> > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST) and close on June 22,
> > > > > > > > > > > > > > > 23:59:59.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > > >
> > > > > > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+No
> > > > > > te
> > > > > > > > > > s
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > > >
> > > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lai
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Best Regards
> > > > > > > > > > >
> > > > > > > > > > > Lai
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > Lai
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
The difference looks smaller now, more in line with your numbers. I wonder if
something happened during the previous benchmark, such as a system
update...


piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
[22:49:41] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
for decoding..
[22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
[22:49:41] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
for decoding..
[22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
Epoch 0, Changed learning rate to 0.05
[22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
147456 bytes with malloc directly
[22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
589824 bytes with malloc directly
[22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
2359296 bytes with malloc directly
[22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
9437184 bytes with malloc directly
Epoch 0, Batch 199, Speed=426.182733
Epoch 0, Duration=134.868458
Epoch 0, Training accuracy=0.127238
Epoch 0, Validation accuracy=0.206388
Epoch 1, Batch 199, Speed=313.127156
Epoch 1, Duration=128.041775
Epoch 1, Training accuracy=0.182065
Epoch 1, Validation accuracy=0.202524
Epoch 2, Batch 199, Speed=410.931187
Epoch 2, Duration=124.920588
Epoch 2, Training accuracy=0.202584
Epoch 2, Validation accuracy=0.245693
Epoch 3, Batch 199, Speed=419.119335
Epoch 3, Duration=120.948349
Epoch 3, Training accuracy=0.235854
Epoch 3, Validation accuracy=0.291066
Epoch 4, Batch 199, Speed=430.473733
Epoch 4, Duration=130.181724
Epoch 4, Training accuracy=0.257773
Epoch 4, Validation accuracy=0.304988

real    11m7.356s
user    406m9.910s
sys     14m18.349s
[23:00:49] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads
for decoding..
[23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
[23:00:49] ../src/io/iter_image_recordio_2.cc:172:
ImageRecordIOParser2:
/home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads
for decoding..
[23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
[23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image
from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
Epoch 0, Changed learning rate to 0.05
Epoch 0, Batch 199, Speed=348.618154
Epoch 0, Duration=146.469352
Epoch 0, Training accuracy=0.124121
Epoch 0, Validation accuracy=0.167227
Epoch 1, Batch 199, Speed=452.790825
Epoch 1, Duration=130.199421
Epoch 1, Training accuracy=0.183863
Epoch 1, Validation accuracy=0.237079
Epoch 2, Batch 199, Speed=451.406559
Epoch 2, Duration=126.320823
Epoch 2, Training accuracy=0.214844
Epoch 2, Validation accuracy=0.244692
Epoch 3, Batch 199, Speed=403.161873
Epoch 3, Duration=125.331660
Epoch 3, Training accuracy=0.243506
Epoch 3, Validation accuracy=0.301182
Epoch 4, Batch 199, Speed=450.826598
Epoch 4, Duration=126.426253
Epoch 4, Training accuracy=0.266424
Epoch 4, Validation accuracy=0.311899

real    11m21.930s
user    415m3.855s
sys     13m53.975s
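
For reference, the relative slowdown can be worked out from the wall-clock
("real") times reported by `time`. Below is a minimal sketch in Python; the
helper names to_seconds and slowdown_percent are made up for illustration and
are not part of MXNet or the benchmark script. It compares only single runs,
so it says nothing about run-to-run noise.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '11m7.356s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline: str, candidate: str) -> float:
    """Relative increase of candidate over baseline, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100.0


# "real" times copied from this thread: 1.4.1 is the baseline, 1.5.0 the candidate.
rerun = slowdown_percent("11m7.356s", "11m21.930s")
earlier = slowdown_percent("10m41.994s", "11m30.388s")

print(f"rerun:   1.5.0 is {rerun:.1f}% slower than 1.4.1")
print(f"earlier: 1.5.0 is {earlier:.1f}% slower than 1.4.1")
```

On the numbers in this thread, that works out to roughly 2% for the rerun
versus roughly 7.5% for the earlier run.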

On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
<pe...@gmail.com> wrote:
>
> Hi Ciyong, thanks for trying to reproduce:
>
> I used this one:
> https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>
> Could you provide hardware and OS details?
>
> I will rerun and repost numbers in a few minutes.
>
> Pedro.
>
> On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
> >
> > Hi Pedro,
> >
> > > I'm looking at this case, using the script "incubator-mxnet/example/image-classification/train_cifar10.py" to get
> > > the timing data, but there doesn't seem to be much difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> > >
> > > I'm not sure whether the Python scripts differ; can you point me to your script (cifar10.py)?
> > > Or you can also try MXNet's script (train_cifar10.py) and see the performance.
> >
> > Here's the command I used to collect the time:
> >         python train_cifar10.py --num-epoch=5
> >
> > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >         real    9m4.880s
> >         user    333m13.340s
> >         sys     14m36.100s
> >
> > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >         real    9m2.155s
> >         user    329m37.092s
> >         sys     16m8.668s
> >
> > -Ciyong
> >
> >
> > -----Original Message-----
> > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> > Sent: Wednesday, June 26, 2019 6:28 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: dev@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >
> > Hi these were my build flags and system info:
> >
> >
> > --- # CMake configuration
> > USE_CUDA: "OFF" # Build with CUDA support
> > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> > USE_OPENCV: "ON" # Build with OpenCV support
> > USE_OPENMP: "ON" # Build with Openmp support
> > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> > USE_LAPACK: "ON" # Build with lapack support
> > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> > USE_JEMALLOC: "ON" # Build with Jemalloc support
> > USE_PROFILER: "ON" # Build with Profiler support
> > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> > CMAKE_BUILD_TYPE: "Release"
> > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> >
> > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1,
> > upstream/v1.5.x)
> > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0,
> > upstream/v1.4.x)
> >
> > curl http://169.254.169.254/latest/meta-data/instance-type
> > c5d.18xlarge
> >
> >
> > Version      : 3.6.7
> > Compiler     : GCC 8.2.0
> > Build        : ('default', 'Oct 22 2018 11:32:17')
> > Arch         : ('64bit', 'ELF')
> > ------------Pip Info-----------
> > Version      : 19.1.1
> > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> > ----------MXNet Info-----------
> > Version      : 1.5.0
> > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> > Hashtag not found. Not installed from pre-built package.
> > ----------System Info----------
> > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > system       : Linux
> > node         : ip-172-31-63-171
> > release      : 4.15.0-1035-aws
> > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > ----------Hardware Info----------
> > machine      : x86_64
> > processor    : x86_64
> > Architecture:        x86_64
> > CPU op-mode(s):      32-bit, 64-bit
> > Byte Order:          Little Endian
> > CPU(s):              72
> > On-line CPU(s) list: 0-71
> > Thread(s) per core:  2
> > Core(s) per socket:  18
> > Socket(s):           2
> > NUMA node(s):        2
> > Vendor ID:           GenuineIntel
> > CPU family:          6
> > Model:               85
> > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > Stepping:            4
> > CPU MHz:             1326.446
> > BogoMIPS:            6000.00
> > Hypervisor vendor:   KVM
> > Virtualization type: full
> > L1d cache:           32K
> > L1i cache:           32K
> > L2 cache:            1024K
> > L3 cache:            25344K
> > NUMA node0 CPU(s):   0-17,36-53
> > NUMA node1 CPU(s):   18-35,54-71
> > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > ----------Network Test----------
> >
> > ----------Python Info----------
> > Version      : 3.6.7
> > Compiler     : GCC 8.2.0
> > Build        : ('default', 'Oct 22 2018 11:32:17')
> > Arch         : ('64bit', 'ELF')
> > ------------Pip Info-----------
> > Version      : 19.1.1
> > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> > ----------MXNet Info-----------
> > Version      : 1.4.1
> > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> > Hashtag not found. Not installed from pre-built package.
> > ----------System Info----------
> > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > system       : Linux
> > node         : ip-172-31-63-171
> > release      : 4.15.0-1035-aws
> > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > ----------Hardware Info----------
> > machine      : x86_64
> > processor    : x86_64
> > Architecture:        x86_64
> > CPU op-mode(s):      32-bit, 64-bit
> > Byte Order:          Little Endian
> > CPU(s):              72
> > On-line CPU(s) list: 0-71
> > Thread(s) per core:  2
> > Core(s) per socket:  18
> > Socket(s):           2
> > NUMA node(s):        2
> > Vendor ID:           GenuineIntel
> > CPU family:          6
> > Model:               85
> > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > Stepping:            4
> > CPU MHz:             1223.344
> > BogoMIPS:            6000.00
> > Hypervisor vendor:   KVM
> > Virtualization type: full
> > L1d cache:           32K
> > L1i cache:           32K
> > L2 cache:            1024K
> > L3 cache:            25344K
> > NUMA node0 CPU(s):   0-17,36-53
> > NUMA node1 CPU(s):   18-35,54-71
> > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> > > ----------Network Test----------
> >
> > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
> > >
> > > > I trained cifar10 on CPU and it seems there's a regression of
> > > > roughly 7% in training time compared to 1.4.1:
> > >
> > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > (master)+$ time python cifar10.py --epochs 5
> > > real    11m30.388s
> > > user    417m7.766s
> > > sys     16m57.315s
> > >
> > > VS 1.4.1:
> > > real    10m41.994s
> > > user    392m40.646s
> > > sys     12m30.601s
> > >
> > >
> > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
> > > >
> > > > Hi Anirudh,
> > > >
> > > > Thanks for jumping into this quickly, I followed up on the issue.
> > > >
> > > > > It was meant for sockeye developers/maintainers to help set up nightly
> > > > > tests and raise issues early.
> > > >
> > > > Thanks!
> > > >
> > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > <ha...@gmail.com>
> > > > wrote:
> > > >
> > > > > In GluonNLP we are testing with MXNET nightly build for each PR,
> > > > > and we did find some MXNet related issue caught by the CI.
> > > > > I recommend other toolkits also add integration tests with MXNet nightly.
> > > > > It helps identify issues early.
> > > > >
> > > > > Best,
> > > > > Haibin
> > > > >
> > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> > > > >
> > > > > > > Thanks for raising the issue; we will take a look ASAP.
> > > > > > >
> > > > > > > The downstream cases are not in the MXNet CI, so it's hard for
> > > > > > > MXNet developers to catch potential bugs or performance degradation.
> > > > > > >
> > > > > > > In the future, I suggest adding the major downstream test cases,
> > > > > > > like those from sockeye, GluonNLP, GluonCV, DGL, Gluon-TS, into the
> > > > > > > nightly test.
> > > > > > > If it's still too heavy, maybe testing it weekly or monthly :)
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --Patric
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Cc: dev@mxnet.apache.org
> > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > > > > > 1.5.0.rc1
> > > > > > >
> > > > > > > Hi Lai,
> > > > > > >
> > > > > > > I have opened an issue:
> > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > > > I came to know about this issue only today, and I have not been
> > > > > > > monitoring sockeye.
> > > > > > > I jumped onto this issue to make sure it wasn't caused by the
> > > > > > > dlpack changes.
> > > > > > > Also, I don't think sockeye CI checks against master; it is
> > > > > > > using 1.4.1.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Could you share which test failed and what the crash is? How
> > > > > > > > can it be reproduced?
> > > > > > > >
> > > > > > > > I was able to install sockeye, and all tests passed when run
> > > > > > > > with python setup.py test.
> > > > > > > >
> > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > > > > >
> > > > > > > > It would be great to create an issue with reproducible steps
> > > > > > > > and move the discussion there.
> > > > > > > >
> > > > > > > > Also, I see the sockeye nightly build [1] has been failing for
> > > > > > > > some time. If it's due to an MXNet change, please raise this
> > > > > > > > early so we can track and solve it in time, rather than block
> > > > > > > > the release during the vote.
> > > > > > > >
> > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > > > <anirudh2290@gmail.com
> > > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I was able to reproduce a crash with the commit
> > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the
> > > > > > > > > commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > > > >
> > > > > > > > > Anirudh
> > > > > > > > >
> > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
> > > > > > > > > <ro...@gmail.com>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Przemyslaw,
> > > > > > > > > >
> > > > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > > > > <pt...@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > -1
> > > > > > > > > > >
> > > > > > > > > > > There is a crash in the sockeye unit tests (python setup.py
> > > > > > > > > > > test), observed starting with the nightly 1.5 build from
> > > > > > > > > > > 6/13 and still occurring in 1.5rc1. I don't yet have the
> > > > > > > > > > > exact commit that is responsible for it, but it is either
> > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > > > > > > > optimization).
> > > > > > > > > > >
> > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > >
> > > > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > > > > > > version 1.5.0.
> > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59 (PST) and close
> > > > > > > > > > > > on June 22, 23:59:59.
> > > > > > > > > > > >
> > > > > > > > > > > > 1) Link to release notes:
> > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > > > >
> > > > > > > > > > > > +1 = approve
> > > > > > > > > > > > +0 = no opinion
> > > > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > > > --
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > >
> > > > > > > > > > > > Lai
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > >
> > > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
Hi Ciyong, thanks for trying to reproduce this.

I used this script:
https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py

Could you provide hardware and OS details?

I will rerun and repost numbers in a few minutes.

Pedro.

On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
>
> Hi Pedro,
>
> I'm looking into this case. I used the script "incubator-mxnet/example/image-classification/train_cifar10.py" to collect
> timing data, but there seems to be little difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on a C5.18xlarge.
>
> I'm not sure whether the Python scripts differ. Could you point me to a link for your script (cifar10.py)?
> Alternatively, you could try MXNet's script (train_cifar10.py) and compare the performance.
>
> Here's the command I used to collect the timings:
>         python train_cifar10.py --num-epoch=5
>
> 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>         real    9m4.880s
>         user    333m13.340s
>         sys     14m36.100s
>
> 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>         real    9m2.155s
>         user    329m37.092s
>         sys     16m8.668s
>
> -Ciyong
>
>
> -----Original Message-----
> From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> Sent: Wednesday, June 26, 2019 6:28 AM
> To: dev@mxnet.incubator.apache.org
> Cc: dev@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>
> Hi, these were my build flags and system info:
>
>
> --- # CMake configuration
> USE_CUDA: "OFF" # Build with CUDA support
> USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> USE_OPENCV: "ON" # Build with OpenCV support
> USE_OPENMP: "ON" # Build with Openmp support
> USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> USE_LAPACK: "ON" # Build with lapack support
> USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> USE_JEMALLOC: "ON" # Build with Jemalloc support
> USE_PROFILER: "ON" # Build with Profiler support
> USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> USE_CPP_PACKAGE: "OFF" # Build C++ Package
> USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> INSTALL_EXAMPLES: "OFF" # Install the example source files.
> USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> CMAKE_BUILD_TYPE: "Release"
> CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> CMAKE_C_COMPILER_LAUNCHER: "ccache"
> CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>
> commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1,
> upstream/v1.5.x)
> commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0,
> upstream/v1.4.x)
>
> curl http://169.254.169.254/latest/meta-data/instance-type
> c5d.18xlarge
>
>
> ----------Python Info----------
> Version      : 3.6.7
> Compiler     : GCC 8.2.0
> Build        : ('default', 'Oct 22 2018 11:32:17')
> Arch         : ('64bit', 'ELF')
> ------------Pip Info-----------
> Version      : 19.1.1
> Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> ----------MXNet Info-----------
> Version      : 1.5.0
> Directory    : /home/piotr/mxnet_1.5/python/mxnet
> Hashtag not found. Not installed from pre-built package.
> ----------System Info----------
> Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> system       : Linux
> node         : ip-172-31-63-171
> release      : 4.15.0-1035-aws
> version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> ----------Hardware Info----------
> machine      : x86_64
> processor    : x86_64
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              72
> On-line CPU(s) list: 0-71
> Thread(s) per core:  2
> Core(s) per socket:  18
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> Stepping:            4
> CPU MHz:             1326.446
> BogoMIPS:            6000.00
> Hypervisor vendor:   KVM
> Virtualization type: full
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            25344K
> NUMA node0 CPU(s):   0-17,36-53
> NUMA node1 CPU(s):   18-35,54-71
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> ----------Network Test----------
>
> ----------Python Info----------
> Version      : 3.6.7
> Compiler     : GCC 8.2.0
> Build        : ('default', 'Oct 22 2018 11:32:17')
> Arch         : ('64bit', 'ELF')
> ------------Pip Info-----------
> Version      : 19.1.1
> Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> ----------MXNet Info-----------
> Version      : 1.4.1
> Directory    : /home/piotr/mxnet_1.4/python/mxnet
> Hashtag not found. Not installed from pre-built package.
> ----------System Info----------
> Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> system       : Linux
> node         : ip-172-31-63-171
> release      : 4.15.0-1035-aws
> version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> ----------Hardware Info----------
> machine      : x86_64
> processor    : x86_64
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              72
> On-line CPU(s) list: 0-71
> Thread(s) per core:  2
> Core(s) per socket:  18
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               85
> Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> Stepping:            4
> CPU MHz:             1223.344
> BogoMIPS:            6000.00
> Hypervisor vendor:   KVM
> Virtualization type: full
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            1024K
> L3 cache:            25344K
> NUMA node0 CPU(s):   0-17,36-53
> NUMA node1 CPU(s):   18-35,54-71
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
> sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
> ----------Network Test----------
>

RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Chen, Ciyong" <ci...@intel.com>.
Hi Pedro,

I'm looking into this case. I used the script "incubator-mxnet/example/image-classification/train_cifar10.py" to collect
timing data, but there seems to be little difference between MXNet 1.4.1.rc0 and 1.5.0.rc1 on a C5.18xlarge.

I'm not sure whether the Python scripts differ. Could you point me to a link for your script (cifar10.py)?
Alternatively, you could try MXNet's script (train_cifar10.py) and compare the performance.

Here's the command I used to collect the timings:
	python train_cifar10.py --num-epoch=5

1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
	real	9m4.880s
	user	333m13.340s
	sys	14m36.100s

2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
	real	9m2.155s
	user	329m37.092s
	sys	16m8.668s
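
To put the wall-clock numbers in this thread side by side, here is a minimal sketch (not part of either benchmark script) that converts the "real" values printed by `time` into seconds and computes the relative change. The helper names parse_time and relative_change are made up for illustration, and a single run per version says nothing about run-to-run variance.

```python
import re


def parse_time(value):
    """Convert a shell `time` duration such as '9m4.880s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_change(new, old):
    """Return the percentage change of `new` relative to `old`."""
    old_seconds = parse_time(old)
    return (parse_time(new) - old_seconds) / old_seconds * 100.0


# "real" times reported in this thread: (1.5.0.rc1, 1.4.1)
runs = {
    "train_cifar10.py on C5.18xlarge": ("9m4.880s", "9m2.155s"),
    "cifar10.py on c5d.18xlarge": ("11m30.388s", "10m41.994s"),
}

for name, (new, old) in runs.items():
    print(f"{name}: {relative_change(new, old):+.1f}% (1.5.0.rc1 vs 1.4.1)")
```

With the numbers above this prints roughly +0.5% for train_cifar10.py and +7.5% for cifar10.py, so the two scripts disagree on whether there is a slowdown.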

-Ciyong


-----Original Message-----
From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com] 
Sent: Wednesday, June 26, 2019 6:28 AM
To: dev@mxnet.incubator.apache.org
Cc: dev@mxnet.apache.org
Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Hi, these were my build flags and system info:


--- # CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "ON" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_PROFILER: "ON" # Build with Profiler support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Release"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
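
When two builds are compared like this, a quick check that both were configured with the same flags can rule out one source of difference. A minimal sketch, assuming each dump uses the KEY: "VALUE" # comment layout shown above; the helper names and the second build's values are hypothetical and are not taken from either real build:

```python
import re

# Matches lines such as: USE_MKLDNN: "ON" # Use MKLDNN variant of MKL
FLAG_LINE = re.compile(r'^\s*([A-Z0-9_]+):\s*"([^"]*)"')


def parse_flags(text):
    """Parse KEY: "VALUE" lines into a dict, ignoring anything else."""
    flags = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags


def diff_flags(a, b):
    """Return {key: (value_in_a, value_in_b)} for every flag that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


build_a = parse_flags('''
--- # CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL
CMAKE_BUILD_TYPE: "Release"
''')

# Hypothetical second build, for illustration only.
build_b = parse_flags('''
--- # CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_MKLDNN: "OFF" # Use MKLDNN variant of MKL
CMAKE_BUILD_TYPE: "Release"
''')

print(diff_flags(build_a, build_b))
```

An empty result would mean the two dumps list identical flags; it would not cover differences in compiler, libraries, or hardware.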

commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1,
upstream/v1.5.x)
commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0,
upstream/v1.4.x)

curl http://169.254.169.254/latest/meta-data/instance-type
c5d.18xlarge


----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/piotr/mxnet_1.5/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1326.446
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.4.1
Directory    : /home/piotr/mxnet_1.4/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1223.344
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
>
> I trained cifar10 on CPU, and there seems to be a regression: training
> time increased by around 7% compared to 1.4.1:
>
> (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> (master)+$ time python cifar10.py --epochs 5
> real    11m30.388s
> user    417m7.766s
> sys     16m57.315s
>
> VS 1.4.1:
> real    10m41.994s
> user    392m40.646s
> sys     12m30.601s
>
>
> On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
> >
> > Hi Anirudh,
> >
> > Thanks for jumping into this quickly, I followed up on the issue.
> >
> > It was meant for sockeye developers/maintainers to help set up nightly
> > tests and raise issues early.
> >
> > Thanks!
> >
> > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin 
> > <ha...@gmail.com>
> > wrote:
> >
> > > In GluonNLP we test against the MXNet nightly build for each PR,
> > > and the CI has caught some MXNet-related issues.
> > > I recommend other toolkits also add integration tests with MXNet nightly.
> > > It helps identify issues early.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> > >
> > > > Thanks for raising the issue; we will take a look ASAP.
> > > >
> > > > The downstream cases are not in the MXNet CI, so it's hard for MXNet
> > > > developers to catch potential bugs or performance degradation.
> > > >
> > > > In the future, I suggest adding the major downstream test cases, such as
> > > > those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, to the nightly
> > > > test. If that's still too heavy, maybe test them weekly or monthly :)
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: dev@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 
> > > > > 1.5.0.rc1
> > > > >
> > > > > Hi Lai,
> > > > >
> > > > > I have opened an issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > I came to know about this issue only today, and I have not been
> > > > > monitoring sockeye.
> > > > > I jumped onto this issue to make sure it wasn't caused by the
> > > > > dlpack changes.
> > > > > Also, I don't think sockeye CI checks against master; it is
> > > > > using 1.4.1.
> > > > >
> > > > > Anirudh
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Could you share which test failed and what the crash is? How
> > > > > > can it be reproduced?
> > > > > >
> > > > > > I was able to install sockeye, and all tests passed when run
> > > > > > with python setup.py test.
> > > > > >
> > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> > > > > >
> > > > > > It would be great to create an issue with reproducible steps
> > > > > > and move the discussion there.
> > > > > >
> > > > > > Also, I see the sockeye nightly build [1] has been failing for
> > > > > > some time. If it's due to an MXNet change, please raise this
> > > > > > early so we can track and solve it in time, rather than block
> > > > > > the release during the vote.
> > > > > >
> > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian 
> > > > > > <anirudh2290@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I was able to reproduce a crash with the commit
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the 
> > > > > > > commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei 
> > > > > > > <ro...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Przemyslaw,
> > > > > > > >
> > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak 
> > > > > > > > <pt...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > -1
> > > > > > > > >
> > > > > > > > > There is a crash in sockeye unit test (python setup.py 
> > > > > > > > > test) observed starting with nightly 1.5 build from 
> > > > > > > > > 6/13 and still occuring in
> > > > > > > 1.5rc1. I
> > > > > > > > > don't yet have the exact commit that is responsible 
> > > > > > > > > for it, but it is either 
> > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > related) or
> > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > optimization).
> > > > > > > > >
> > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > > Dear MXNet community,
> > > > > > > > > >
> > > > > > > > > > This is the 3-day vote to release Apache MXNet 
> > > > > > > > > > (incubating) version
> > > > > > > > > 1.5.0.
> > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  
> > > > > > > > > > and close
> > > on
> > > > > > June
> > > > > > > > 22,
> > > > > > > > > > 23:59:59.
> > > > > > > > > >
> > > > > > > > > > 1) Link to release notes:
> > > > > > > > > >
> > > > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+No
> > > te
> > > > > > > s
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > >
> > > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > > > > c1
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > >
> > > > > > > > > >
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > > > > c1/
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > >
> > > > > > > > > > +1 = approve
> > > > > > > > > > +0 = no opinion
> > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
Hi, these were my build flags and system info:


--- # CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "ON" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_PROFILER: "ON" # Build with Profiler support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Release"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
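
For anyone trying to reproduce this build, here is a minimal sketch of how a flag listing like the one above could be turned into cmake -D arguments. The helper name cmake_args, the subset of flags shown, and the ".." source directory are assumptions for illustration only; they are not taken from the build scripts.

```python
# Hypothetical helper: turn a flag mapping into cmake -D arguments.
# Only a few of the flags listed above are repeated here.
flags = {
    "USE_CUDA": "OFF",
    "USE_OPENCV": "ON",
    "USE_MKLDNN": "ON",
    "CMAKE_BUILD_TYPE": "Release",
}


def cmake_args(flags):
    """Return a list of -DNAME=VALUE arguments in the mapping's order."""
    return [f"-D{name}={value}" for name, value in flags.items()]


# Assumes cmake is run from a build directory one level below the source tree.
command = ["cmake", *cmake_args(flags), ".."]
print(" ".join(command))
```

The printed line can be pasted into a shell; the remaining flags would be added to the mapping in the same way.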

commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1,
upstream/v1.5.x)
commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0,
upstream/v1.4.x)

curl http://169.254.169.254/latest/meta-data/instance-type
c5d.18xlarge


----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/piotr/mxnet_1.5/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1326.446
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw
avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

----------Python Info----------
Version      : 3.6.7
Compiler     : GCC 8.2.0
Build        : ('default', 'Oct 22 2018 11:32:17')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.4.1
Directory    : /home/piotr/mxnet_1.4/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-63-171
release      : 4.15.0-1035-aws
version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1223.344
BogoMIPS:            6000.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid
sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti
fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw
avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
<pe...@gmail.com> wrote:
>
> I did a training of cifar10 in CPU and seems there's some regressions
> in the range of 7% increase of training time against 1.4.1:
>
> (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> (master)+$ time python cifar10.py --epochs 5
> real    11m30.388s
> user    417m7.766s
> sys     16m57.315s
>
> VS 1.4.1:
> real    10m41.994s
> user    392m40.646s
> sys     12m30.601s
>
>
> On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
> >
> > Hi Anirudh,
> >
> > Thanks for jumping into this quickly, I followed up on the issue.
> >
> > I was meant for sockeye developer/maintainers to help setup nightly tests
> > and raise issues early.
> >
> > Thanks!
> >
> > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <ha...@gmail.com>
> > wrote:
> >
> > > In GluonNLP we are testing with MXNET nightly build for each PR, and we did
> > > find some MXNet related issue caught by the CI.
> > > I recommend other toolkits also add integration tests with MXNet nightly.
> > > It helps identify issues early.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> > >
> > > > Thanks to raise the issue and we will take a look ASAP.
> > > >
> > > > The downstream cases is not in the MXNet CI so it's hard to catch the
> > > > potential bugs or performance degradation for MXNet developers.
> > > >
> > > > In the future, I suggest adding the major downstream test cases, like
> > > from
> > > > sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into the nightly test.
> > > > If it's still too heavy,  maybe testing it weekly or monthly :)
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: dev@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > > >
> > > > > Hi Lai,
> > > > >
> > > > > I have opened an issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > I came to know about this issue only today and I have not been
> > > monitoring
> > > > > sockeye.
> > > > > I jumped onto this issue to make sure it wasn't caused by the dlpack
> > > > changes.
> > > > > Also, I don't  think sockeye CI checks against master, it is using
> > > 1.4.1.
> > > > >
> > > > > Anirudh
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Could you share which test failed and what’s the crash? How to
> > > > > > reproduce it?
> > > > > >
> > > > > > I was able to install sockeye and run all tests passed. Using python
> > > > > > setup.py test
> > > > > >
> > > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > > >
> > > > > > It would be great to create an issue with reproducible steps and move
> > > > > > the discussion there.
> > > > > >
> > > > > > Also I see sockeye nightly build[1] has been failing for some time,
> > > if
> > > > > > it’s due to MXNet change, please raise this early so we can track and
> > > > > > solve it in time rather than block the release during vote time.
> > > > > >
> > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > > <anirudh2290@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I was able to reproduce a crash with the commit
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Przemyslaw,
> > > > > > > >
> > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > > <pt...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > -1
> > > > > > > > >
> > > > > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > > > > occuring in
> > > > > > > 1.5rc1. I
> > > > > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > related) or
> > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > optimization).
> > > > > > > > >
> > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > > Dear MXNet community,
> > > > > > > > > >
> > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > > > > version
> > > > > > > > > 1.5.0.
> > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close
> > > on
> > > > > > June
> > > > > > > > 22,
> > > > > > > > > > 23:59:59.
> > > > > > > > > >
> > > > > > > > > > 1) Link to release notes:
> > > > > > > > > >
> > > > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > > > > > s
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > >
> > > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > > > > c1
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > >
> > > > > > > > > >
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > > > > c1/
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > >
> > > > > > > > > > +1 = approve
> > > > > > > > > > +0 = no opinion
> > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Pedro Larroy <pe...@gmail.com>.
I trained cifar10 on CPU and there seems to be a regression: training
time increased by roughly 7% against 1.4.1:

(py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
(master)+$ time python cifar10.py --epochs 5
real    11m30.388s
user    417m7.766s
sys     16m57.315s

VS 1.4.1:
real    10m41.994s
user    392m40.646s
sys     12m30.601s
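
As a quick check of the percentage, the wall-clock ("real") times above can be compared directly. This is only a sketch: it assumes the first run is the 1.5.0 candidate and the second is 1.4.1, and the helper name to_seconds is made up for illustration. On these numbers it works out to roughly 7.5%.

```python
import re


def to_seconds(stamp):
    """Convert a `time` stamp such as '11m30.388s' into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


real_new = to_seconds("11m30.388s")  # first run above (assumed to be 1.5.0)
real_141 = to_seconds("10m41.994s")  # run labelled "VS 1.4.1"

# Relative increase in wall-clock time, in percent.
increase = (real_new - real_141) / real_141 * 100
print(f"wall-clock increase: {increase:.1f}%")
```

A single run per version does not say much about run-to-run variance, so the figure is indicative rather than precise.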


On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <ro...@gmail.com> wrote:
>
> Hi Anirudh,
>
> Thanks for jumping into this quickly, I followed up on the issue.
>
> I was meant for sockeye developer/maintainers to help setup nightly tests
> and raise issues early.
>
> Thanks!
>
> On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <ha...@gmail.com>
> wrote:
>
> > In GluonNLP we are testing with MXNET nightly build for each PR, and we did
> > find some MXNet related issue caught by the CI.
> > I recommend other toolkits also add integration tests with MXNet nightly.
> > It helps identify issues early.
> >
> > Best,
> > Haibin
> >
> > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> >
> > > Thanks to raise the issue and we will take a look ASAP.
> > >
> > > The downstream cases is not in the MXNet CI so it's hard to catch the
> > > potential bugs or performance degradation for MXNet developers.
> > >
> > > In the future, I suggest adding the major downstream test cases, like
> > from
> > > sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into the nightly test.
> > > If it's still too heavy,  maybe testing it weekly or monthly :)
> > >
> > > Thanks,
> > >
> > > --Patric
> > >
> > > > -----Original Message-----
> > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: dev@mxnet.apache.org
> > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > >
> > > > Hi Lai,
> > > >
> > > > I have opened an issue:
> > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > I came to know about this issue only today and I have not been
> > monitoring
> > > > sockeye.
> > > > I jumped onto this issue to make sure it wasn't caused by the dlpack
> > > changes.
> > > > Also, I don't  think sockeye CI checks against master, it is using
> > 1.4.1.
> > > >
> > > > Anirudh
> > > >
> > > >
> > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Could you share which test failed and what’s the crash? How to
> > > > > reproduce it?
> > > > >
> > > > > I was able to install sockeye and run all tests passed. Using python
> > > > > setup.py test
> > > > >
> > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > >
> > > > > It would be great to create an issue with reproducible steps and move
> > > > > the discussion there.
> > > > >
> > > > > Also I see sockeye nightly build[1] has been failing for some time,
> > if
> > > > > it’s due to MXNet change, please raise this early so we can track and
> > > > > solve it in time rather than block the release during vote time.
> > > > >
> > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > >
> > > > >
> > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > > <anirudh2290@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I was able to reproduce a crash with the commit
> > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > >
> > > > > > Anirudh
> > > > > >
> > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi Przemyslaw,
> > > > > > >
> > > > > > > Is there an issue with more details to track the problem?
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > > <pt...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > -1
> > > > > > > >
> > > > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > > > occuring in
> > > > > > 1.5rc1. I
> > > > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > related) or
> > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > optimization).
> > > > > > > >
> > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > > Dear MXNet community,
> > > > > > > > >
> > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > > > version
> > > > > > > > 1.5.0.
> > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close
> > on
> > > > > June
> > > > > > > 22,
> > > > > > > > > 23:59:59.
> > > > > > > > >
> > > > > > > > > 1) Link to release notes:
> > > > > > > > >
> > > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > > > > s
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2) Link to release candidate:
> > > > > > > > >
> > > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > > > c1
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > >
> > > > > > > > >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > > > c1/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > >
> > > > > > > > > +1 = approve
> > > > > > > > > +0 = no opinion
> > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > --
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > Lai
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > >
> >
> --
> Best Regards
>
> Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Hi Anirudh,

Thanks for jumping on this quickly; I followed up on the issue.

My earlier comment was meant for the sockeye developers/maintainers, to
help set up nightly tests and raise issues early.

Thanks!

On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <ha...@gmail.com>
wrote:

> In GluonNLP we are testing with MXNET nightly build for each PR, and we did
> find some MXNet related issue caught by the CI.
> I recommend other toolkits also add integration tests with MXNet nightly.
> It helps identify issues early.
>
> Best,
> Haibin
>
> On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
>
> > Thanks to raise the issue and we will take a look ASAP.
> >
> > The downstream cases is not in the MXNet CI so it's hard to catch the
> > potential bugs or performance degradation for MXNet developers.
> >
> > In the future, I suggest adding the major downstream test cases, like
> from
> > sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into the nightly test.
> > If it's still too heavy,  maybe testing it weekly or monthly :)
> >
> > Thanks,
> >
> > --Patric
> >
> > > -----Original Message-----
> > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > > Sent: Friday, June 21, 2019 9:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: dev@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > >
> > > Hi Lai,
> > >
> > > I have opened an issue:
> > > https://github.com/apache/incubator-mxnet/issues/15297
> > > I came to know about this issue only today and I have not been
> monitoring
> > > sockeye.
> > > I jumped onto this issue to make sure it wasn't caused by the dlpack
> > changes.
> > > Also, I don't  think sockeye CI checks against master, it is using
> 1.4.1.
> > >
> > > Anirudh
> > >
> > >
> > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Could you share which test failed and what’s the crash? How to
> > > > reproduce it?
> > > >
> > > > I was able to install sockeye and run all tests passed. Using python
> > > > setup.py test
> > > >
> > > > I have tested both nightly pip package and 1.5.0.rc1
> > > >
> > > > It would be great to create an issue with reproducible steps and move
> > > > the discussion there.
> > > >
> > > > Also I see sockeye nightly build[1] has been failing for some time,
> if
> > > > it’s due to MXNet change, please raise this early so we can track and
> > > > solve it in time rather than block the release during vote time.
> > > >
> > > > [1] https://travis-ci.org/awslabs/sockeye
> > > >
> > > >
> > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > > <anirudh2290@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > I was able to reproduce a crash with the commit
> > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com>
> wrote:
> > > > >
> > > > > > Hi Przemyslaw,
> > > > > >
> > > > > > Is there an issue with more details to track the problem?
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > <pt...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > -1
> > > > > > >
> > > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > > occuring in
> > > > > 1.5rc1. I
> > > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > related) or
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > optimization).
> > > > > > >
> > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > > Dear MXNet community,
> > > > > > > >
> > > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > > version
> > > > > > > 1.5.0.
> > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close
> on
> > > > June
> > > > > > 22,
> > > > > > > > 23:59:59.
> > > > > > > >
> > > > > > > > 1) Link to release notes:
> > > > > > > >
> > > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > > > s
> > > > > > > >
> > > > > > > >
> > > > > > > > 2) Link to release candidate:
> > > > > > > >
> > > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > > c1
> > > > > > > >
> > > > > > > >
> > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > >
> > > > > > > >
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > > c1/
> > > > > > > >
> > > > > > > >
> > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > >
> > > > > > > > +1 = approve
> > > > > > > > +0 = no opinion
> > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> >
>
-- 
Best Regards

Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Haibin Lin <ha...@gmail.com>.
In GluonNLP we test against the MXNet nightly build for each PR, and the CI
has caught some MXNet-related issues.
I recommend that other toolkits also add integration tests against MXNet
nightly. It helps identify issues early.

Best,
Haibin

On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:

> Thanks to raise the issue and we will take a look ASAP.
>
> The downstream cases is not in the MXNet CI so it's hard to catch the
> potential bugs or performance degradation for MXNet developers.
>
> In the future, I suggest adding the major downstream test cases, like from
> sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into the nightly test.
> If it's still too heavy,  maybe testing it weekly or monthly :)
>
> Thanks,
>
> --Patric
>
> > -----Original Message-----
> > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> > Sent: Friday, June 21, 2019 9:31 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: dev@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >
> > Hi Lai,
> >
> > I have opened an issue:
> > https://github.com/apache/incubator-mxnet/issues/15297
> > I came to know about this issue only today and I have not been monitoring
> > sockeye.
> > I jumped onto this issue to make sure it wasn't caused by the dlpack
> changes.
> > Also, I don't  think sockeye CI checks against master, it is using 1.4.1.
> >
> > Anirudh
> >
> >
> > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Could you share which test failed and what’s the crash? How to
> > > reproduce it?
> > >
> > > I was able to install sockeye and run all tests passed. Using python
> > > setup.py test
> > >
> > > I have tested both nightly pip package and 1.5.0.rc1
> > >
> > > It would be great to create an issue with reproducible steps and move
> > > the discussion there.
> > >
> > > Also I see sockeye nightly build[1] has been failing for some time, if
> > > it’s due to MXNet change, please raise this early so we can track and
> > > solve it in time rather than block the release during vote time.
> > >
> > > [1] https://travis-ci.org/awslabs/sockeye
> > >
> > >
> > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > <anirudh2290@gmail.com
> > > >
> > > wrote:
> > >
> > > > I was able to reproduce a crash with the commit
> > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
> > > >
> > > > > Hi Przemyslaw,
> > > > >
> > > > > Is there an issue with more details to track the problem?
> > > > >
> > > > >
> > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > <pt...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > -1
> > > > > >
> > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > occuring in
> > > > 1.5rc1. I
> > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > related) or
> > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > optimization).
> > > > > >
> > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > > Dear MXNet community,
> > > > > > >
> > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > version
> > > > > > 1.5.0.
> > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on
> > > June
> > > > > 22,
> > > > > > > 23:59:59.
> > > > > > >
> > > > > > > 1) Link to release notes:
> > > > > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > > s
> > > > > > >
> > > > > > >
> > > > > > > 2) Link to release candidate:
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > c1
> > > > > > >
> > > > > > >
> > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > >
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > c1/
> > > > > > >
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > >
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > > --
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > > >
> > > --
> > > Best Regards
> > >
> > > Lai
> > >
>

RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by "Zhao, Patric" <pa...@intel.com>.
Thanks for raising the issue; we will take a look ASAP.

The downstream test cases are not in the MXNet CI, so it's hard for MXNet developers to catch potential bugs or performance degradation.

In the future, I suggest adding the major downstream test cases, such as those from sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, to the nightly tests.
If that is still too heavy, we could run them weekly or monthly :)
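
The nightly/weekly/monthly idea above could be sketched roughly as follows. This is only an illustration, not an existing MXNet CI feature: the CADENCE table, the due_projects helper, and the cadence assigned to each project are hypothetical placeholders. Only the project names come from this message.

```python
import datetime

# Hypothetical cadence table. The project names are taken from the message
# above; the cadence given to each one is a placeholder for illustration.
CADENCE = {
    "sockeye": "nightly",
    "GluonNLP": "nightly",
    "GluonCV": "weekly",
    "DGL": "weekly",
    "Gluon-TS": "monthly",
}


def due_projects(day, cadence=CADENCE):
    """Return the downstream projects whose tests should run on `day`.

    Assumed schedule: nightly runs every day, weekly runs on Sundays,
    monthly runs on the first day of the month.
    """
    due = []
    for name, how_often in cadence.items():
        if how_often == "nightly":
            due.append(name)
        elif how_often == "weekly" and day.weekday() == 6:  # 6 == Sunday
            due.append(name)
        elif how_often == "monthly" and day.day == 1:
            due.append(name)
    return due
```

A scheduled job could call due_projects(datetime.date.today()) and run each returned project's own test command against the latest MXNet nightly build, so the heavier suites only run on their assigned days.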

Thanks,

--Patric

> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> Sent: Friday, June 21, 2019 9:31 AM
> To: dev@mxnet.incubator.apache.org
> Cc: dev@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> 
> Hi Lai,
> 
> I have opened an issue:
> https://github.com/apache/incubator-mxnet/issues/15297
> I came to know about this issue only today and I have not been monitoring
> sockeye.
> I jumped onto this issue to make sure it wasn't caused by the dlpack changes.
> Also, I don't think sockeye CI checks against master; it is using 1.4.1.
> 
> Anirudh
> 
> 
> On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> 
> > Hi,
> >
> > Could you share which test failed and what’s the crash? How to
> > reproduce it?
> >
> > I was able to install sockeye and run all tests passed. Using python
> > setup.py test
> >
> > I have tested both nightly pip package and 1.5.0.rc1
> >
> > It would be great to create an issue with reproducible steps and move
> > the discussion there.
> >
> > Also I see sockeye nightly build[1] has been failing for some time, if
> > it’s due to MXNet change, please raise this early so we can track and
> > solve it in time rather than block the release during vote time.
> >
> > [1] https://travis-ci.org/awslabs/sockeye
> >
> >
> > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > <anirudh2290@gmail.com
> > >
> > wrote:
> >
> > > I was able to reproduce a crash with the commit
> > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > >
> > > Anirudh
> > >
> > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
> > >
> > > > Hi Przemyslaw,
> > > >
> > > > Is there an issue with more details to track the problem?
> > > >
> > > >
> > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > <pt...@apache.org>
> > > > wrote:
> > > >
> > > > > -1
> > > > >
> > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > occurring in 1.5rc1. I don't yet have the exact commit that is
> > > > > > responsible for it, but it is either
> > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > > optimization).
> > > > >
> > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > version
> > > > > 1.5.0.
> > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on
> > June
> > > > 22,
> > > > > > 23:59:59.
> > > > > >
> > > > > > 1) Link to release notes:
> > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > >
> > > > > >
> > > > > > 2) Link to release candidate:
> > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > >
> > > > > >
> > > > > > 3) Link to source and signatures on apache dist server:
> > > > > >
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > >
> > > > > >
> > > > > > Please remember to TEST first before voting accordingly:
> > > > > >
> > > > > > +1 = approve
> > > > > > +0 = no opinion
> > > > > > -1 = disapprove (provide reason)
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai
> >

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Anirudh Subramanian <an...@gmail.com>.
Hi Lai,

I have opened an issue:
https://github.com/apache/incubator-mxnet/issues/15297
I came to know about this issue only today and I have not been monitoring
sockeye.
I jumped onto this issue to make sure it wasn't caused by the dlpack
changes.
Also, I don't think sockeye CI checks against master; it is using 1.4.1.

Anirudh


On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:

> Hi,
>
> Could you share which test failed and what’s the crash? How to reproduce
> it?
>
> I was able to install sockeye and run all tests passed. Using
> python setup.py test
>
> I have tested both nightly pip package and 1.5.0.rc1
>
> It would be great to create an issue with reproducible steps and move the
> discussion there.
>
> Also I see sockeye nightly build[1] has been failing for some time, if it’s
> due to MXNet change, please raise this early so we can track and solve it
> in time rather than block the release during vote time.
>
> [1] https://travis-ci.org/awslabs/sockeye
>
>
> On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2290@gmail.com
> >
> wrote:
>
> > I was able to reproduce a crash with the commit
> > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> >
> > Anirudh
> >
> > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
> >
> > > Hi Przemyslaw,
> > >
> > > Is there an issue with more details to track the problem?
> > >
> > >
> > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <pt...@apache.org>
> > > wrote:
> > >
> > > > -1
> > > >
> > > > > There is a crash in sockeye unit test (python setup.py test) observed
> > > > > starting with nightly 1.5 build from 6/13 and still occurring in
> > > > > 1.5rc1. I don't yet have the exact commit that is responsible for it,
> > > > > but it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > related) or 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > > optimization).
> > > >
> > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > > Dear MXNet community,
> > > > >
> > > > > This is the 3-day vote to release Apache MXNet (incubating) version
> > > > 1.5.0.
> > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on
> June
> > > 22,
> > > > > 23:59:59.
> > > > >
> > > > > 1) Link to release notes:
> > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > >
> > > > >
> > > > > 2) Link to release candidate:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > >
> > > > >
> > > > > 3) Link to source and signatures on apache dist server:
> > > > >
> > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > >
> > > > >
> > > > > Please remember to TEST first before voting accordingly:
> > > > >
> > > > > +1 = approve
> > > > > +0 = no opinion
> > > > > -1 = disapprove (provide reason)
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > > >
> > > --
> > > Best Regards
> > >
> > > Lai
> > >
> >
> --
> Best Regards
>
> Lai
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Hi,

Could you share which test failed, what the crash is, and how to reproduce it?

I was able to install sockeye, and all tests passed using
python setup.py test

I have tested both the nightly pip package and 1.5.0.rc1.

It would be great to create an issue with reproducible steps and move the
discussion there.

Also, I see the sockeye nightly build [1] has been failing for some time. If
that is due to an MXNet change, please raise it early so we can track and solve
it in time rather than block the release during the vote.

[1] https://travis-ci.org/awslabs/sockeye


On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <an...@gmail.com>
wrote:

> I was able to reproduce a crash with the commit
> 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> a862270beb2d796c1ba311183f7f4a766a18ad6c.
>
> Anirudh
>
> On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
>
> > Hi Przemyslaw,
> >
> > Is there an issue with more details to track the problem?
> >
> >
> > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <pt...@apache.org>
> > wrote:
> >
> > > -1
> > >
> > > > There is a crash in sockeye unit test (python setup.py test) observed
> > > > starting with nightly 1.5 build from 6/13 and still occurring in
> > > > 1.5rc1. I don't yet have the exact commit that is responsible for it,
> > > > but it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > related) or 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > > optimization).
> > >
> > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > > Dear MXNet community,
> > > >
> > > > This is the 3-day vote to release Apache MXNet (incubating) version
> > > 1.5.0.
> > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on June
> > 22,
> > > > 23:59:59.
> > > >
> > > > 1) Link to release notes:
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > >
> > > >
> > > > 2) Link to release candidate:
> > > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > >
> > > >
> > > > 3) Link to source and signatures on apache dist server:
> > > >
> > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > >
> > > >
> > > > Please remember to TEST first before voting accordingly:
> > > >
> > > > +1 = approve
> > > > +0 = no opinion
> > > > -1 = disapprove (provide reason)
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai
> >
>
-- 
Best Regards

Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Anirudh Subramanian <an...@gmail.com>.
I was able to reproduce a crash with the commit
09202f7f261954383aa387144524d38f83f18d06 but not with the commit
a862270beb2d796c1ba311183f7f4a766a18ad6c.
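
For anyone repeating this check, a rough sketch of how the two candidate commits could be classified is below. It is not the procedure actually used here. It assumes the crash kills the test process with a signal (for example a segfault), which Python's subprocess module reports on POSIX as a negative return code. The helper names run_tests, terminated_by_signal and classify are hypothetical, and building MXNet at each commit is left to a caller-supplied function.

```python
import subprocess


def run_tests(cmd, cwd=None):
    """Run a test command (e.g. ["python", "setup.py", "test"]) and
    return its exit status."""
    return subprocess.run(cmd, cwd=cwd).returncode


def terminated_by_signal(returncode):
    """On POSIX, subprocess reports death-by-signal as a negative code.

    An ordinary test failure exits with a positive code instead, so this
    separates a hard crash from a failing assertion.
    """
    return returncode < 0


def classify(candidates, run_tests_at):
    """Map each candidate commit to True if the test run crashed on it.

    `run_tests_at` is a caller-supplied (hypothetical) function that
    builds MXNet at the given commit, runs the downstream tests, and
    returns the exit status, for instance via run_tests above.
    """
    return {sha: terminated_by_signal(run_tests_at(sha)) for sha in candidates}
```

Passing the two hashes named in this thread as candidates would give one True/False entry per commit, which is the same kind of answer as the manual check reported above.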

Anirudh

On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:

> Hi Przemyslaw,
>
> Is there an issue with more details to track the problem?
>
>
> On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <pt...@apache.org>
> wrote:
>
> > -1
> >
> > There is a crash in sockeye unit test (python setup.py test) observed
> > starting with nightly 1.5 build from 6/13 and still occurring in 1.5rc1. I
> > don't yet have the exact commit that is responsible for it, but it is
> > either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> >
> > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > > Dear MXNet community,
> > >
> > > This is the 3-day vote to release Apache MXNet (incubating) version
> > 1.5.0.
> > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on June
> 22,
> > > 23:59:59.
> > >
> > > 1) Link to release notes:
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > >
> > >
> > > 2) Link to release candidate:
> > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > >
> > >
> > > 3) Link to source and signatures on apache dist server:
> > >
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > >
> > >
> > > Please remember to TEST first before voting accordingly:
> > >
> > > +1 = approve
> > > +0 = no opinion
> > > -1 = disapprove (provide reason)
> > > --
> > > Best Regards
> > >
> > > Lai
> > >
> >
> --
> Best Regards
>
> Lai
>

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Lai Wei <ro...@gmail.com>.
Hi Przemyslaw,

Is there an issue with more details to track the problem?


On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <pt...@apache.org>
wrote:

> -1
>
> There is a crash in sockeye unit test (python setup.py test) observed
> starting with nightly 1.5 build from 6/13 and still occurring in 1.5rc1. I
> don't yet have the exact commit that is responsible for it, but it is
> either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
>
> On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> > Dear MXNet community,
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version
> 1.5.0.
> > Voting on dev@ will start June 19, 23:59:59(PST)  and close on June 22,
> > 23:59:59.
> >
> > 1) Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >
> >
> > 2) Link to release candidate:
> >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> >
> >
> > 3) Link to source and signatures on apache dist server:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> >
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> > --
> > Best Regards
> >
> > Lai
> >
>
-- 
Best Regards

Lai

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Posted by Przemysław Trędak <pt...@apache.org>.
-1

There is a crash in sockeye unit test (python setup.py test) observed starting with nightly 1.5 build from 6/13 and still occurring in 1.5rc1. I don't yet have the exact commit that is responsible for it, but it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).

On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote: 
> Dear MXNet community,
> 
> This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> Voting on dev@ will start June 19, 23:59:59(PST)  and close on June 22,
> 23:59:59.
> 
> 1) Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> 
> 
> 2) Link to release candidate:
> 
> https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> 
> 
> 3) Link to source and signatures on apache dist server:
> 
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> 
> 
> Please remember to TEST first before voting accordingly:
> 
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
> -- 
> Best Regards
> 
> Lai
>