Posted to dev@mxnet.apache.org by Lai Wei <ro...@gmail.com> on 2019/07/05 21:39:54 UTC

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Hi all,

An update on the regression issue:

There is no significant regression on operators between 1.4.1 and 1.5.0
according to the latest findings here [1].
The possible regression observed previously was due to a profiler change between
1.4.1 and 1.5.0, so it was not an apples-to-apples comparison. Please refer to
the performance results using the time and timeit modules in this comment. [2]
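
For anyone who wants to repeat this kind of measurement without the profiler, here is a minimal sketch of timing a callable with Python's timeit module. The workload function is a hypothetical stand-in, not an MXNet API; with MXNet the timed callable would presumably also need to wait for the asynchronous engine to finish (for example with mx.nd.waitall()) so that the full execution time is measured.

```python
import statistics
import timeit


def time_callable(fn, repeat=5, number=100):
    """Return the median per-call wall time of fn, in seconds."""
    # timeit.repeat returns one total time per repetition,
    # each covering `number` calls of fn.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return statistics.median(totals) / number


def stand_in_workload():
    # Hypothetical placeholder for an operator call.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    per_call = time_callable(stand_in_workload)
    print(f"median per-call time: {per_call * 1e6:.1f} us")
```

Taking the median over several repetitions makes the number less sensitive to a single noisy run.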

With that, let's restart voting on 1.5.0.rc2, as there is no code change
required.

[1]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508865398
[2]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508831150





Best Regards

Lai


On Sat, Jun 29, 2019 at 12:35 PM Chris Olivier <cj...@gmail.com>
wrote:

> for batch norm, I mean. max*
>
> On Sat, Jun 29, 2019 at 12:34 PM Chris Olivier <cj...@gmail.com>
> wrote:
>
> > what’s with the mac memory usage being 2x in 1.4? As I am not sure where
> > the number is coming from (if it’s my profiler code, I wouldn’t consider it
> > terribly meaningful), but it is the same everywhere else, so it kind of
> > sticks out.
> >
> > On Thu, Jun 27, 2019 at 3:36 PM sandeep krishnamurthy <
> > sandeep.krishna98@gmail.com> wrote:
> >
> >> Hello Ciyong/Pedro,
> >>
> >> Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
> >> cover all MXNet operators, not presented in best possible way, still WIP)
> >>
> >> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
> >>
> >> The following operators look slower in 1.5 compared to 1.4.1:
> >> - BatchNorm
> >> - Pooling
> >> - FullyConnected
> >> - batch_dot
> >> - Dot
> >> - broadcast_mul
> >> - log_softmax
> >> and a few other operators
> >>
> >> Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
> >> example Convolution, flatten, elementwise operators, etc. So it is likely
> >> that a few operators have regressed noticeably; however, due to other
> >> operator performance improvements, the end effect is not that significant,
> >> which hides a lot of the regression. We need a more detailed analysis of
> >> per-operator performance. We will not be able to do this for the current
> >> release, but we should have a more concrete way of determining such
> >> performance regressions before the next release.
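
One possible shape for such a per-operator check, as a rough sketch only: compare two sets of per-operator timings and flag anything slower than a chosen threshold. The function name, the 5% threshold, and the sample numbers are all assumptions for illustration; they are not measurements from this thread and not part of the opperf tool.

```python
def find_regressions(baseline, candidate, threshold=0.05):
    """Return {op: relative_change} for ops slower than `threshold`.

    baseline and candidate map operator names to a timing (any unit,
    as long as both use the same one).
    """
    regressions = {}
    for op, old in baseline.items():
        new = candidate.get(op)
        if new is None or old <= 0:
            continue  # operator missing in candidate, or unusable baseline
        change = (new - old) / old
        if change > threshold:
            regressions[op] = change
    return regressions


# Illustrative numbers only (hypothetical, in milliseconds).
baseline_ms = {"BatchNorm": 1.00, "Pooling": 2.00, "Convolution": 4.00}
candidate_ms = {"BatchNorm": 1.30, "Pooling": 2.04, "Convolution": 3.00}

if __name__ == "__main__":
    for op, change in find_regressions(baseline_ms, candidate_ms).items():
        print(f"{op}: {change:+.1%}")
```

Reporting the relative change per operator keeps speedups in other operators from masking a regression in the aggregate number.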
> >>
> >> Setup:
> >> 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> >> 1.4.1 => PyPi mxnet-mkl==1.4.1
> >> Machine: C5.18X
> >> No explicit environment variables were set
> >> Operator benchmark code -
> >> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
> >>
> >> Best,
> >> Sandeep
> >>
> >>
> >> On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> >> pedro.larroy.lists@gmail.com>
> >> wrote:
> >>
> >> > I will try to run a few benchmarks in a bare metal instance tonight to
> >> > remove virtualization variance for the measurements and provide some
> >> > numbers.
> >> >
> >> > Please propose a set of models / examples that would be desirable to
> >> > run before the release and provide a link to an easy to run script
> >> > with instructions so we can validate the release better.
> >> >
> >> > Thank you.
> >> >
> >> > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <ro...@gmail.com> wrote:
> >> > >
> >> > > Dear @dev,
> >> > >
> >> > > I'm cancelling the vote for the cached op fix:
> >> > >
> >> > > https://github.com/apache/incubator-mxnet/pull/15298
> >> > >
> >> > > As for the possible CPU training regression, it does not look like a
> >> > > blocker for now.
> >> > >
> >> > > I will start a new rc2 vote, please help to validate.
> >> > >
> >> > > Thanks!
> >> > >
> >> > >
> >> > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.chen@intel.com> wrote:
> >> > >
> >> > > > Hi Pedro,
> >> > > >
> >> > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower than
> >> > > > v1.4; I was using 18 cores for computing) with your script on
> >> > > > C5.18xlarge. However, I needed to bind the cores with the commands below
> >> > > > when running the script (without setting the env variables, I got
> >> > > > nearly the same time, <1% difference, with v1.5 and v1.4):
> >> > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> >> > > >         export OMP_NUM_THREADS=18
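
If it helps to make runs repeatable, the same two variables can be set from a small Python launcher instead of the shell. This is only a sketch: the helper names are made up for illustration, and the values are simply the ones from the export lines above.

```python
import os
import subprocess
import sys


def pinned_env(num_threads,
               affinity="granularity=fine,noduplicates,compact,1,0"):
    """Return a copy of the current environment with the OpenMP settings added."""
    env = dict(os.environ)
    env["KMP_AFFINITY"] = affinity
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(cmd, num_threads):
    """Run cmd (a list of arguments) with the pinned environment."""
    return subprocess.run(cmd, env=pinned_env(num_threads), check=True)


# Example call (assumes cifar10.py is in the current directory):
# run_pinned([sys.executable, "cifar10.py", "--epochs", "5"], 18)
```

Passing the environment explicitly to the child process avoids depending on whatever happens to be exported in the calling shell.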
> >> > > >
> >> > > > Did you set any env variables when running?
> >> > > >
> >> > > > The performance results I got are below:
> >> > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >> > > > real    12m10.856s
> >> > > > user    234m49.576s
> >> > > > sys     4m38.044s
> >> > > >
> >> > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >> > > > real    12m52.140s
> >> > > > user    246m30.740s
> >> > > > sys     5m8.188s
> >> > > >
> >> > > > Looking at the profiling data, most of the ops have the same perf
> >> > > > between v1.4 and v1.5, but some ops such as "_backward_BatchNorm" and
> >> > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
> >> > > > Will do further analysis on these ops.
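
For reference, the ~5.6% figure follows from the two "real" times quoted above. A small sketch of the arithmetic (the helper names are illustrative, and the parser only handles the "NmS.SSSs" form printed by the shell's time builtin):

```python
import re


def parse_time(value):
    """Convert a `time`-style string such as '12m10.856s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown_percent(baseline, candidate):
    """Relative increase of candidate over baseline, in percent."""
    old = parse_time(baseline)
    new = parse_time(candidate)
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Real times for 1.4.1.rc0 and 1.5.0.rc1 from the results above.
    print(f"{slowdown_percent('12m10.856s', '12m52.140s'):.1f}%")  # prints 5.6%
```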
> >> > > >
> >> > > > Here's the hardware/OS info from my side:
> >> > > > ----------Python Info----------
> >> > > > Version      : 3.6.8
> >> > > > Compiler     : GCC 7.3.0
> >> > > > Build        : ('default', 'Dec 30 2018 01:22:34')
> >> > > > Arch         : ('64bit', '')
> >> > > > ------------Pip Info-----------
> >> > > > Version      : 19.0.3
> >> > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> >> > > > ----------MXNet Info-----------
> >> > > > Version      : 1.5.0
> >> > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
> >> > > > Hashtag not found. Not installed from pre-built package.
> >> > > > ----------System Info----------
> >> > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
> >> > > > system       : Linux
> >> > > > node         : ip-172-31-32-129
> >> > > > release      : 4.4.0-1085-aws
> >> > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
> >> > > > ----------Hardware Info----------
> >> > > > machine      : x86_64
> >> > > > processor    : x86_64
> >> > > > Architecture:          x86_64
> >> > > > CPU op-mode(s):        32-bit, 64-bit
> >> > > > Byte Order:            Little Endian
> >> > > > CPU(s):                72
> >> > > > On-line CPU(s) list:   0-71
> >> > > > Thread(s) per core:    2
> >> > > > Core(s) per socket:    18
> >> > > > Socket(s):             2
> >> > > > NUMA node(s):          2
> >> > > > Vendor ID:             GenuineIntel
> >> > > > CPU family:            6
> >> > > > Model:                 85
> >> > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> >> > > > Stepping:              3
> >> > > > CPU MHz:               3000.000
> >> > > > BogoMIPS:              6000.00
> >> > > > Hypervisor vendor:     KVM
> >> > > > Virtualization type:   full
> >> > > > L1d cache:             32K
> >> > > > L1i cache:             32K
> >> > > > L2 cache:              1024K
> >> > > > L3 cache:              25344K
> >> > > > NUMA node0 CPU(s):     0-17,36-53
> >> > > > NUMA node1 CPU(s):     18-35,54-71
> >> > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb
> >> > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc
> >> > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
> >> > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> >> > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase
> >> > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx
> >> > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
> >> > > > ----------Network Test----------
> >> > > >
> >> > > >
> >> > > > -Ciyong
> >> > > >
> >> > > >
> >> > > > -----Original Message-----
> >> > > > From: Zhao, Patric [mailto:patric.zhao@intel.com]
> >> > > > Sent: Thursday, June 27, 2019 9:55 AM
> >> > > > To: dev@mxnet.incubator.apache.org
> >> > > > Cc: dev@mxnet.apache.org
> >> > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >> > > >
> >> > > > Could we run more epochs to see the performance difference, or profile
> >> > > > the difference between a good and a bad run?
> >> > > >
> >> > > > > -----Original Message-----
> >> > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >> > > > > Sent: Thursday, June 27, 2019 9:35 AM
> >> > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > Cc: dev@mxnet.apache.org
> >> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> >> > > > > 1.5.0.rc1
> >> > > > >
> >> > > > > I ran it again and the gap is bigger again; I guess we need to
> >> > > > > average out the times across several runs:
> >> > > > >
> >> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >> > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> >> > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >> > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >> > > > > Epoch 0, Batch 199, Speed=384.149839
> >> > > > > Epoch 0, Duration=140.919567
> >> > > > > Epoch 0, Training accuracy=0.115169
> >> > > > > Epoch 0, Validation accuracy=0.141317
> >> > > > > Epoch 1, Batch 199, Speed=433.380512
> >> > > > > Epoch 1, Duration=119.553233
> >> > > > > Epoch 1, Training accuracy=0.170956
> >> > > > > Epoch 1, Validation accuracy=0.216146
> >> > > > > Epoch 2, Batch 199, Speed=434.864699
> >> > > > > Epoch 2, Duration=123.278490
> >> > > > > Epoch 2, Training accuracy=0.209455
> >> > > > > Epoch 2, Validation accuracy=0.247296
> >> > > > > Epoch 3, Batch 199, Speed=433.401854
> >> > > > > Epoch 3, Duration=118.327797
> >> > > > > Epoch 3, Training accuracy=0.248701
> >> > > > > Epoch 3, Validation accuracy=0.302083
> >> > > > > Epoch 4, Batch 199, Speed=419.713707
> >> > > > > Epoch 4, Duration=126.468409
> >> > > > > Epoch 4, Training accuracy=0.260949
> >> > > > > Epoch 4, Validation accuracy=0.269030
> >> > > > >
> >> > > > > real    10m55.796s
> >> > > > > user    399m33.567s
> >> > > > > sys     13m55.904s
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > Epoch 0, Batch 199, Speed=419.039188
> >> > > > > Epoch 0, Duration=143.934903
> >> > > > > Epoch 0, Training accuracy=0.122542
> >> > > > > Epoch 0, Validation accuracy=0.164359
> >> > > > > Epoch 1, Batch 199, Speed=445.257048
> >> > > > > Epoch 1, Duration=135.248399
> >> > > > > Epoch 1, Training accuracy=0.178828
> >> > > > > Epoch 1, Validation accuracy=0.199419
> >> > > > > Epoch 2, Batch 199, Speed=447.115215
> >> > > > > Epoch 2, Duration=132.003770
> >> > > > > Epoch 2, Training accuracy=0.217808
> >> > > > > Epoch 2, Validation accuracy=0.233073
> >> > > > > Epoch 3, Batch 199, Speed=441.079477
> >> > > > > Epoch 3, Duration=126.543316
> >> > > > > Epoch 3, Training accuracy=0.248102
> >> > > > > Epoch 3, Validation accuracy=0.293870
> >> > > > > Epoch 4, Batch 199, Speed=449.329787
> >> > > > > Epoch 4, Duration=138.398325
> >> > > > > Epoch 4, Training accuracy=0.270021
> >> > > > > Epoch 4, Validation accuracy=0.311498
> >> > > > >
> >> > > > > real    11m45.329s
> >> > > > > user    426m13.908s
> >> > > > > sys     16m45.093s
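
To average across runs as suggested, the per-epoch durations can be pulled out of the training log and summarized. The following is a sketch under the assumption that the log keeps the "Epoch N, Duration=X" format shown above; the function names are illustrative, and the sample text is the set of duration lines from the 1.4 run in this message.

```python
import re
import statistics

# Matches lines such as "Epoch 3, Duration=118.327797".
DURATION = re.compile(r"Epoch (\d+), Duration=([0-9.]+)")


def epoch_durations(log_text):
    """Return the epoch durations (seconds) found in log_text, in order."""
    return [float(m.group(2)) for m in DURATION.finditer(log_text)]


def summarize(log_text, skip_first=False):
    """Return (mean, population standard deviation) of the epoch durations.

    skip_first drops epoch 0, which may include one-off warm-up cost
    (an assumption, not something established in this thread).
    """
    durations = epoch_durations(log_text)
    if skip_first and len(durations) > 1:
        durations = durations[1:]
    return statistics.mean(durations), statistics.pstdev(durations)


LOG_1_4 = """\
Epoch 0, Duration=140.919567
Epoch 1, Duration=119.553233
Epoch 2, Duration=123.278490
Epoch 3, Duration=118.327797
Epoch 4, Duration=126.468409
"""

if __name__ == "__main__":
    mean, spread = summarize(LOG_1_4)
    print(f"mean epoch duration: {mean:.2f}s (stdev {spread:.2f}s)")
```

Comparing the spread within one version against the gap between versions gives a rough sense of whether the difference is larger than run-to-run noise.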
> >> > > > >
> >> > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
> >> > > > > <pe...@gmail.com> wrote:
> >> > > > > >
> >> > > > > > The difference looks smaller now, more like your numbers. I wonder
> >> > > > > > if something happened during the previous benchmark like a system
> >> > > > > > update...
> >> > > > > >
> >> > > > > >
> >> > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> >> > > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
> >> > > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 589824 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 2359296 bytes with malloc directly
> >> > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
> >> > > > > > Epoch 0, Batch 199, Speed=426.182733
> >> > > > > > Epoch 0, Duration=134.868458
> >> > > > > > Epoch 0, Training accuracy=0.127238
> >> > > > > > Epoch 0, Validation accuracy=0.206388
> >> > > > > > Epoch 1, Batch 199, Speed=313.127156
> >> > > > > > Epoch 1, Duration=128.041775
> >> > > > > > Epoch 1, Training accuracy=0.182065
> >> > > > > > Epoch 1, Validation accuracy=0.202524
> >> > > > > > Epoch 2, Batch 199, Speed=410.931187
> >> > > > > > Epoch 2, Duration=124.920588
> >> > > > > > Epoch 2, Training accuracy=0.202584
> >> > > > > > Epoch 2, Validation accuracy=0.245693
> >> > > > > > Epoch 3, Batch 199, Speed=419.119335
> >> > > > > > Epoch 3, Duration=120.948349
> >> > > > > > Epoch 3, Training accuracy=0.235854
> >> > > > > > Epoch 3, Validation accuracy=0.291066
> >> > > > > > Epoch 4, Batch 199, Speed=430.473733
> >> > > > > > Epoch 4, Duration=130.181724
> >> > > > > > Epoch 4, Training accuracy=0.257773
> >> > > > > > Epoch 4, Validation accuracy=0.304988
> >> > > > > >
> >> > > > > > real    11m7.356s
> >> > > > > > user    406m9.910s
> >> > > > > > sys     14m18.349s
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4 threads for decoding..
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172: ImageRecordIOParser2: /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4 threads for decoding..
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> >> > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean image from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin completed
> >> > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
> >> > > > > > Epoch 0, Changed learning rate to 0.05
> >> > > > > > Epoch 0, Batch 199, Speed=348.618154
> >> > > > > > Epoch 0, Duration=146.469352
> >> > > > > > Epoch 0, Training accuracy=0.124121
> >> > > > > > Epoch 0, Validation accuracy=0.167227
> >> > > > > > Epoch 1, Batch 199, Speed=452.790825
> >> > > > > > Epoch 1, Duration=130.199421
> >> > > > > > Epoch 1, Training accuracy=0.183863
> >> > > > > > Epoch 1, Validation accuracy=0.237079
> >> > > > > > Epoch 2, Batch 199, Speed=451.406559
> >> > > > > > Epoch 2, Duration=126.320823
> >> > > > > > Epoch 2, Training accuracy=0.214844
> >> > > > > > Epoch 2, Validation accuracy=0.244692
> >> > > > > > Epoch 3, Batch 199, Speed=403.161873
> >> > > > > > Epoch 3, Duration=125.331660
> >> > > > > > Epoch 3, Training accuracy=0.243506
> >> > > > > > Epoch 3, Validation accuracy=0.301182
> >> > > > > > Epoch 4, Batch 199, Speed=450.826598
> >> > > > > > Epoch 4, Duration=126.426253
> >> > > > > > Epoch 4, Training accuracy=0.266424
> >> > > > > > Epoch 4, Validation accuracy=0.311899
> >> > > > > >
> >> > > > > > real    11m21.930s
> >> > > > > > user    415m3.855s
> >> > > > > > sys     13m53.975s
> >> > > > > >
> >> > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
> >> > > > > > <pe...@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > Hi Ciyong, thanks for trying to reproduce:
> >> > > > > > >
> >> > > > > > > I used this one:
> >> > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
> >> > > > > > >
> >> > > > > > > Could you provide hardware and OS details?
> >> > > > > > >
> >> > > > > > > I will rerun and repost numbers in a few minutes.
> >> > > > > > >
> >> > > > > > > Pedro.
> >> > > > > > >
> >> > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong <ci...@intel.com> wrote:
> >> > > > > > > >
> >> > > > > > > > Hi Pedro,
> >> > > > > > > >
> >> > > > > > > > I'm looking at this case, using the script
> >> > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
> >> > > > > > > > to get the timing data, but there does not seem to be much
> >> > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
> >> > > > > > > >
> >> > > > > > > > I'm not sure if there's any difference in the Python script; can
> >> > > > > > > > you point me to the link for your script (cifar10.py)?
> >> > > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
> >> > > > > > > > the performance.
> >> > > > > > > >
> >> > > > > > > > Here's the command I used to collect the time:
> >> > > > > > > >         python train_cifar10.py --num-epoch=5
> >> > > > > > > >
> >> > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> >> > > > > > > >         real    9m4.880s
> >> > > > > > > >         user    333m13.340s
> >> > > > > > > >         sys     14m36.100s
> >> > > > > > > >
> >> > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> >> > > > > > > >         real    9m2.155s
> >> > > > > > > >         user    329m37.092s
> >> > > > > > > >         sys     16m8.668s
> >> > > > > > > >
> >> > > > > > > > -Ciyong
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > -----Original Message-----
> >> > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.lists@gmail.com]
> >> > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
> >> > > > > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > > > > Cc: dev@mxnet.apache.org
> >> > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >> > > > > > > >
> >> > > > > > > > Hi these were my build flags and system info:
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --- # CMake configuration
> >> > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
> >> > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
> >> > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
> >> > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
> >> > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
> >> > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
> >> > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
> >> > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
> >> > > > > > > > USE_LAPACK: "ON" # Build with lapack support
> >> > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
> >> > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >> > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
> >> > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
> >> > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
> >> > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
> >> > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
> >> > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> >> > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> >> > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> >> > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> >> > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
> >> > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> >> > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
> >> > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
> >> > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
> >> > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> >> > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> >> > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> >> > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
> >> > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> >> > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
> >> > > > > > > > CMAKE_BUILD_TYPE: "Release"
> >> > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> >> > > > > > > >
> >> > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
> >> > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
> >> > > > > > > >
> >> > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> >> > > > > > > > c5d.18xlarge
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Version      : 3.6.7
> >> > > > > > > > Compiler     : GCC 8.2.0
> >> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >> > > > > > > > Arch         : ('64bit', 'ELF')
> >> > > > > > > > ------------Pip Info-----------
> >> > > > > > > > Version      : 19.1.1
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
> >> > > > > > > > ----------MXNet Info-----------
> >> > > > > > > > Version      : 1.5.0
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
> >> > > > > > > > Hashtag not found. Not installed from pre-built package.
> >> > > > > > > > ----------System Info----------
> >> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >> > > > > > > > system       : Linux
> >> > > > > > > > node         : ip-172-31-63-171
> >> > > > > > > > release      : 4.15.0-1035-aws
> >> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >> > > > > > > > ----------Hardware Info----------
> >> > > > > > > > machine      : x86_64
> >> > > > > > > > processor    : x86_64
> >> > > > > > > > Architecture:        x86_64
> >> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >> > > > > > > > Byte Order:          Little Endian
> >> > > > > > > > CPU(s):              72
> >> > > > > > > > On-line CPU(s) list: 0-71
> >> > > > > > > > Thread(s) per core:  2
> >> > > > > > > > Core(s) per socket:  18
> >> > > > > > > > Socket(s):           2
> >> > > > > > > > NUMA node(s):        2
> >> > > > > > > > Vendor ID:           GenuineIntel
> >> > > > > > > > CPU family:          6
> >> > > > > > > > Model:               85
> >> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> >> > > > > > > > Stepping:            4
> >> > > > > > > > CPU MHz:             1326.446
> >> > > > > > > > BogoMIPS:            6000.00
> >> > > > > > > > Hypervisor vendor:   KVM
> >> > > > > > > > Virtualization type: full
> >> > > > > > > > L1d cache:           32K
> >> > > > > > > > L1i cache:           32K
> >> > > > > > > > L2 cache:            1024K
> >> > > > > > > > L3 cache:            25344K
> >> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> >> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> >> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> >> > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> >> > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
> >> > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle
> >> > > > > > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx
> >> > > > > > > > smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
> >> > > > > > > > xgetbv1 xsaves ida arat pku ospke
> >> > > > > > > > ----------Network Test----------
> >> > > > > > > >
> >> > > > > > > > ----------Python Info----------
> >> > > > > > > > Version      : 3.6.7
> >> > > > > > > > Compiler     : GCC 8.2.0
> >> > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
> >> > > > > > > > Arch         : ('64bit', 'ELF')
> >> > > > > > > > ------------Pip Info-----------
> >> > > > > > > > Version      : 19.1.1
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
> >> > > > > > > > ----------MXNet Info-----------
> >> > > > > > > > Version      : 1.4.1
> >> > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
> >> > > > > > > > Hashtag not found. Not installed from pre-built package.
> >> > > > > > > > ----------System Info----------
> >> > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> >> > > > > > > > system       : Linux
> >> > > > > > > > node         : ip-172-31-63-171
> >> > > > > > > > release      : 4.15.0-1035-aws
> >> > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> >> > > > > > > > ----------Hardware Info----------
> >> > > > > > > > machine      : x86_64
> >> > > > > > > > processor    : x86_64
> >> > > > > > > > Architecture:        x86_64
> >> > > > > > > > CPU op-mode(s):      32-bit, 64-bit
> >> > > > > > > > Byte Order:          Little Endian
> >> > > > > > > > CPU(s):              72
> >> > > > > > > > On-line CPU(s) list: 0-71
> >> > > > > > > > Thread(s) per core:  2
> >> > > > > > > > Core(s) per socket:  18
> >> > > > > > > > Socket(s):           2
> >> > > > > > > > NUMA node(s):        2
> >> > > > > > > > Vendor ID:           GenuineIntel
> >> > > > > > > > CPU family:          6
> >> > > > > > > > Model:               85
> >> > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> >> > > > > > > > Stepping:            4
> >> > > > > > > > CPU MHz:             1223.344
> >> > > > > > > > BogoMIPS:            6000.00
> >> > > > > > > > Hypervisor vendor:   KVM
> >> > > > > > > > Virtualization type: full
> >> > > > > > > > L1d cache:           32K
> >> > > > > > > > L1i cache:           32K
> >> > > > > > > > L2 cache:            1024K
> >> > > > > > > > L3 cache:            25344K
> >> > > > > > > > NUMA node0 CPU(s):   0-17,36-53
> >> > > > > > > > NUMA node1 CPU(s):   18-35,54-71
> >> > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> >> > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
> >> > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> >> > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
> >> > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> >> > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
> >> > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle
> >> > > > > > > > avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx
> >> > > > > > > > smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec
> >> > > > > > > > xgetbv1 xsaves ida arat pku ospke
> >> > > > > > > > ----------Network Test----------
> >> > > > > > > >
> >> > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pe...@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > I trained cifar10 on CPU, and there seems to be a regression
> >> > > > > > > > > of about 7% in training time against 1.4.1:
> >> > > > > > > > >
> >> > > > > > > > > (py3_venv)
> >> > > > > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
> >> > > > > > > > > real    11m30.388s
> >> > > > > > > > > user    417m7.766s
> >> > > > > > > > > sys     16m57.315s
> >> > > > > > > > >
> >> > > > > > > > > VS 1.4.1:
> >> > > > > > > > > real    10m41.994s
> >> > > > > > > > > user    392m40.646s
> >> > > > > > > > > sys     12m30.601s
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <royweilai@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > Hi Anirudh,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for jumping into this quickly. I followed up on the issue.
> >> > > > > > > > > >
> >> > > > > > > > > > That request was meant for sockeye developers/maintainers, to
> >> > > > > > > > > > help set up nightly tests and raise issues early.
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks!
> >> > > > > > > > > >
> >> > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <ha...@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > In GluonNLP we test against the MXNet nightly build for
> >> > > > > > > > > > > each PR, and the CI has caught some MXNet-related issues.
> >> > > > > > > > > > > I recommend other toolkits also add integration tests with
> >> > > > > > > > > > > MXNet nightly. It helps identify issues early.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Best,
> >> > > > > > > > > > > Haibin
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <pa...@intel.com> wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The downstream cases are not in the MXNet CI, so it's hard
> >> > > > > > > > > > > > for MXNet developers to catch potential bugs or performance
> >> > > > > > > > > > > > degradation.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > In the future, I suggest adding the major downstream test
> >> > > > > > > > > > > > cases, such as those from sockeye, GluonNLP, GluonCV, DGL,
> >> > > > > > > > > > > > and Gluon-TS, to the nightly test.
> >> > > > > > > > > > > > If that's still too heavy, maybe test them weekly or
> >> > > > > > > > > > > > monthly :)
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks,
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > --Patric
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > > -----Original Message-----
> >> > > > > > > > > > > > > From: Anirudh Subramanian [mailto:anirudh2290@gmail.com]
> >> > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
> >> > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> >> > > > > > > > > > > > > Cc: dev@mxnet.apache.org
> >> > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Hi Lai,
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > I have opened an issue:
> >> > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
> >> > > > > > > > > > > > > I came to know about this issue only today, and I have
> >> > > > > > > > > > > > > not been monitoring sockeye.
> >> > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't caused
> >> > > > > > > > > > > > > by the dlpack changes.
> >> > > > > > > > > > > > > Also, I don't think sockeye CI checks against master;
> >> > > > > > > > > > > > > it is using 1.4.1.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Anirudh
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <ro...@gmail.com> wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Hi,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Could you share which test failed, what the crash is,
> >> > > > > > > > > > > > > > and how to reproduce it?
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I was able to install sockeye, and all tests passed
> >> > > > > > > > > > > > > > using python setup.py test
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I have tested both the nightly pip package and 1.5.0.rc1.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > It would be great to create an issue with
> >> > > > > > > > > > > > > > reproducible steps and move the discussion there.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has been
> >> > > > > > > > > > > > > > failing for some time. If it’s due to an MXNet change,
> >> > > > > > > > > > > > > > please raise this early so we can track and solve it
> >> > > > > > > > > > > > > > in time, rather than block the release during vote time.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2290@gmail.com> wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
> >> > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with
> >> > > > > > > > > > > > > > > the commit a862270beb2d796c1ba311183f7f4a766a18ad6c.
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Anirudh
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <ro...@gmail.com> wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi Przemyslaw,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Is there an issue with more details to track the problem?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <pt...@apache.org> wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > -1
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests (python
> >> > > > > > > > > > > > > > > > > setup.py test), observed starting with the nightly 1.5
> >> > > > > > > > > > > > > > > > > build from 6/13 and still occurring in 1.5rc1. I
> >> > > > > > > > > > > > > > > > > don't yet have the exact commit that is
> >> > > > > > > > > > > > > > > > > responsible for it, but it is either
> >> > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related)
> >> > > > > > > > > > > > > > > > > or 09202f7f261954383aa387144524d38f83f18d06 (cached op
> >> > > > > > > > > > > > > > > > > optimization).
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <ro...@gmail.com> wrote:
> >> > > > > > > > > > > > > > > > > > Dear MXNet community,
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache MXNet
> >> > > > > > > > > > > > > > > > > > (incubating) version 1.5.0.
> >> > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19, 23:59:59 (PST)
> >> > > > > > > > > > > > > > > > > > and close on June 22, 23:59:59.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 1) Link to release notes:
> >> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 2) Link to release candidate:
> >> > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
> >> > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Please remember to TEST first before voting accordingly:
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > +1 = approve
> >> > > > > > > > > > > > > > > > > > +0 = no opinion
> >> > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
> >> > > > > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > --
> >> > > > > > > > > > > > > > Best Regards
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Lai
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > --
> >> > > > > > > > > > Best Regards
> >> > > > > > > > > >
> >> > > > > > > > > > Lai
> >> > > >
> >> > > --
> >> > > Best Regards
> >> > >
> >> > > Lai
> >> >
> >> >
> >
> >
> >>
> >> --
> >> Sandeep Krishnamurthy
> >>
> >
>