Posted to issues@systemml.apache.org by "Berthold Reinwald (JIRA)" <ji...@apache.org> on 2017/12/21 06:05:03 UTC

[jira] [Updated] (SYSTEMML-1650) GPU cudnn produces worrisome amount of numerical instability

     [ https://issues.apache.org/jira/browse/SYSTEMML-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berthold Reinwald updated SYSTEMML-1650:
----------------------------------------
    Fix Version/s:     (was: SystemML 1.0)
                   SystemML 1.1

> GPU cudnn produces worrisome amount of numerical instability
> ------------------------------------------------------------
>
>                 Key: SYSTEMML-1650
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1650
>             Project: SystemML
>          Issue Type: Bug
>          Components: Runtime
>    Affects Versions: SystemML 0.14
>            Reporter: Nakul Jindal
>             Fix For: SystemML 1.1
>
>
> When running the GPU tests (Mike's run_tests.dml in the nn directory), the following warnings and errors are produced; the relative-error check behind them is sketched after the log.
> {code}
> 17/05/30 17:24:19 INFO api.DMLScript: BEGIN DML run 05/30/2017 17:24:19
> 17/05/30 17:24:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 17/05/30 17:24:21 INFO context.GPUContext: Initializing CUDA
> Starting grad checks.
> ---
> 17/05/30 17:24:22 INFO context.GPUContext:  GPU memory - Total: 2096.300032 MB, Available: 1295.9743999999998 MB on GPUContext{deviceNum=0}
> 17/05/30 17:24:22 INFO context.GPUContext: Total number of GPUs on the machine: 1
> Grad checking the cross-entropy loss function.
> Grad checking the L1 loss function.
> Grad checking the L1 regularization function.
> Grad checking the L2 loss function.
> Grad checking the L2 regularization function.
> Grad checking the log loss function.
> Grad checking the affine layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the 1D batch normalization layer with L2 loss.
>  - Grad checking the 'train' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
>  - Grad checking the 'test' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
> Grad checking the 2D (spatial) batch normalization layer with L2 loss.
>  - Grad checking the 'train' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
>  - Grad checking the 'test' mode.
>    - Grad checking X.
>    - Grad checking gamma.
>    - Grad checking beta.
> Grad checking the `im2col` 2D convolutional layer with L2 loss.
> 17/05/30 17:24:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 17/05/30 17:24:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the built-in 2D convolutional layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
> WARNING: Relative error 3.063931109511093E-4 > 1.0E-4 & <= 0.01 with -11.682479557456533 analytical vs -11.689640614065409 numerical, with lossph 40.510115394324195 and lossmh 40.51034918713648
> WARNING: Relative error 6.785572589631694E-4 > 1.0E-4 & <= 0.01 with -14.363880156229683 analytical vs -14.383386822913733 numerical, with lossph 40.510088543924184 and lossmh 40.51037621166064
> WARNING: Relative error 8.117464157218959E-4 > 1.0E-4 & <= 0.01 with -13.400658690617757 analytical vs -13.378920463225084 numerical, with lossph 40.51009898805432 and lossmh 40.51036656646358
> WARNING: Relative error 6.785567321010216E-4 > 1.0E-4 & <= 0.01 with -14.37300870216048 analytical vs -14.39252775057298 numerical, with lossph 40.510088452456074 and lossmh 40.510376303011085
> WARNING: Relative error 0.0023065358169588085 > 1.0E-4 & <= 0.01 with -15.081214796672182 analytical vs -15.011804170583785 numerical, with lossph 40.510081360786614 and lossmh 40.510381596870026
> WARNING: Relative error 1.2020843619724922E-4 > 1.0E-4 & <= 0.01 with -14.602099111310885 analytical vs -14.60561012436301 numerical, with lossph 40.51008637609418 and lossmh 40.510378488296666
> WARNING: Relative error 3.063921242335014E-4 > 1.0E-4 & <= 0.01 with -11.654549775926586 analytical vs -11.66169368929104 numerical, with lossph 40.51011567395115 and lossmh 40.510348907824934
>  - Grad checking b.
> Grad checking the simple reference 2D convolutional layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
> Grad checking the 2D convolution transpose layer with L2 loss.
>  - Grad checking X.
> WARNING: Relative error 6.785553488451468E-4 > 1.0E-4 & <= 0.01 with 0.2480096633484488 analytical vs 0.2483464684566172 numerical, with lossph 8.25432627928163 and lossmh 8.25432131235226
> WARNING: Relative error 8.117342148227497E-4 > 1.0E-4 & <= 0.01 with 0.46178385247729725 analytical vs 0.4610347690281457 numerical, with lossph 8.254328419943578 and lossmh 8.254319199248197
> WARNING: Relative error 6.78555370922306E-4 > 1.0E-4 & <= 0.01 with 0.5511303465906289 analytical vs 0.5518787993707974 numerical, with lossph 8.254329314621874 and lossmh 8.254318277045886
> WARNING: Relative error 8.117020436730868E-4 > 1.0E-4 & <= 0.01 with 0.13829553169194606 analytical vs 0.13807120424758068 numerical, with lossph 8.254325180655963 and lossmh 8.254322419231878
> WARNING: Relative error 8.117328203683862E-4 > 1.0E-4 & <= 0.01 with 0.5055309144436196 analytical vs 0.5047108680322765 numerical, with lossph 8.25432885801433 and lossmh 8.25431876379697
> WARNING: Relative error 8.116455308945274E-4 > 1.0E-4 & <= 0.01 with 0.06899396037823916 analytical vs 0.06888205392741042 numerical, with lossph 8.254324486699112 and lossmh 8.254323109058033
> WARNING: Relative error 6.785554871822532E-4 > 1.0E-4 & <= 0.01 with -0.13350593809809497 analytical vs -0.1336872434976044 numerical, with lossph 8.254322458935555 and lossmh 8.254325132680425
> WARNING: Relative error 6.785552242935504E-4 > 1.0E-4 & <= 0.01 with -0.2724052650635402 analytical vs -0.2727752001163708 numerical, with lossph 8.254321068071604 and lossmh 8.254326523575607
> WARNING: Relative error 6.785555175584701E-4 > 1.0E-4 & <= 0.01 with -0.2904759680044567 analytical vs -0.29087044381981286 numerical, with lossph 8.254320887119167 and lossmh 8.254326704528044
> WARNING: Relative error 8.117365268798438E-4 > 1.0E-4 & <= 0.01 with 0.3720728122335215 analytical vs 0.37146925198072717 numerical, with lossph 8.254327521604598 and lossmh 8.254320092219558
> WARNING: Relative error 8.117996956316842E-4 > 1.0E-4 & <= 0.01 with 0.14788594799412863 analytical vs 0.1476460352201059 numerical, with lossph 8.254325267869831 and lossmh 8.254322314949126
> WARNING: Relative error 8.119012708962542E-4 > 1.0E-4 & <= 0.01 with -0.07795973031927872 analytical vs -0.07783324180721252 numerical, with lossph 8.254323015177858 and lossmh 8.254324571842695
> WARNING: Relative error 8.117353519435853E-4 > 1.0E-4 & <= 0.01 with 0.48348549268368723 analytical vs 0.48270120478477446 numerical, with lossph 8.254328637254696 and lossmh 8.2543189832306
> WARNING: Relative error 6.785553268852095E-4 > 1.0E-4 & <= 0.01 with 0.4883649684844016 analytical vs 0.48902818381435503 numerical, with lossph 8.254328686121752 and lossmh 8.254318905558076
> WARNING: Relative error 8.117788275473475E-4 > 1.0E-4 & <= 0.01 with 0.3804938800778617 analytical vs 0.37987662739880074 numerical, with lossph 8.254327583249065 and lossmh 8.254319985716517
> WARNING: Relative error 6.785547468313855E-4 > 1.0E-4 & <= 0.01 with 0.07631309322132188 analytical vs 0.07641672876701477 numerical, with lossph 8.254324560007202 and lossmh 8.254323031672627
> WARNING: Relative error 6.785553971200803E-4 > 1.0E-4 & <= 0.01 with -0.342925836087173 analytical vs -0.34339154044715764 numerical, with lossph 8.254320361892585 and lossmh 8.254327229723394
> WARNING: Relative error 6.785551565868137E-4 > 1.0E-4 & <= 0.01 with -0.02491798838300011 analytical vs -0.02495182780393179 numerical, with lossph 8.254323546295183 and lossmh 8.254324045331739
> WARNING: Relative error 8.117758660292993E-4 > 1.0E-4 & <= 0.01 with -0.29017371666857017 analytical vs -0.2897029867554579 numerical, with lossph 8.254320890135642 and lossmh 8.254326684195377
> WARNING: Relative error 6.785553229289772E-4 > 1.0E-4 & <= 0.01 with -0.5072646368526688 analytical vs -0.5079535185359418 numerical, with lossph 8.254318716279274 and lossmh 8.254328875349644
> WARNING: Relative error 6.78554778315526E-4 > 1.0E-4 & <= 0.01 with 0.03126357863101518 analytical vs 0.031306035541689425 numerical, with lossph 8.254324108883962 and lossmh 8.254323482763251
> WARNING: Relative error 6.785553099932536E-4 > 1.0E-4 & <= 0.01 with -0.2942630860026224 analytical vs -0.2946627047251127 numerical, with lossph 8.254320849187412 and lossmh 8.254326742441506
> WARNING: Relative error 6.785551200008403E-4 > 1.0E-4 & <= 0.01 with 0.08135480331223598 analytical vs 0.08146528571728595 numerical, with lossph 8.254324610476463 and lossmh 8.254322981170748
>  - Grad checking W.
> WARNING: Relative error 1.2021043996600452E-4 > 1.0E-4 & <= 0.01 with -0.6822178752109117 analytical vs -0.6823819143519926 numerical, with lossph 8.254316974560194 and lossmh 8.25433062219848
> WARNING: Relative error 3.0638425696629187E-4 > 1.0E-4 & <= 0.01 with -1.4508166976784973 analytical vs -1.4517059849339373 numerical, with lossph 8.254309268100457 and lossmh 8.254338302220155
> WARNING: Relative error 8.117943113692949E-4 > 1.0E-4 & <= 0.01 with 0.7308501955876874 analytical vs 0.7296645580190385 numerical, with lossph 8.254331070748734 and lossmh 8.254316477457573
> WARNING: Relative error 8.117065202066276E-4 > 1.0E-4 & <= 0.01 with 0.9946502709906734 analytical vs 0.9930368523924925 numerical, with lossph 8.254333755980372 and lossmh 8.254313895243325
> WARNING: Relative error 3.063898473026446E-4 > 1.0E-4 & <= 0.01 with -1.9666799496457117 analytical vs -1.967885460540941 numerical, with lossph 8.254304102377066 and lossmh 8.254343460086277
> WARNING: Relative error 3.063873162245538E-4 > 1.0E-4 & <= 0.01 with -1.4461219772325682 analytical vs -1.4470083956830135 numerical, with lossph 8.254309315051604 and lossmh 8.254338255219517
> WARNING: Relative error 1.2020330860452122E-4 > 1.0E-4 & <= 0.01 with -1.2286387562580745 analytical vs -1.228934164654305 numerical, with lossph 8.254311502036288 and lossmh 8.25433608071958
> WARNING: Relative error 6.785553661541439E-4 > 1.0E-4 & <= 0.01 with 0.08818615423020833 analytical vs 0.08830591387010144 numerical, with lossph 8.254324678957262 and lossmh 8.254322912838985
>  - Grad checking b.
> Grad checking the (inverted) dropout layer with L2 loss.
> Grad checking the LSTM layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
>  - Grad checking out0.
>  - Grad checking c0.
> Grad checking the 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the built-in 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the simple reference 2D max pooling layer with L2 loss.
>  - Grad checking w/ pad=0.
>  - Grad checking w/ pad=1.
> Grad checking the ReLU nonlinearity layer with L2 loss.
> Grad checking the simple RNN layer with L2 loss.
>  - Grad checking X.
>  - Grad checking W.
>  - Grad checking b.
>  - Grad checking out0.
> Grad checking the 1D scale & shift layer with L2 loss.
>  - Grad checking X.
>  - Grad checking gamma.
>  - Grad checking beta.
> Grad checking the 2D scale & shift layer with L2 loss.
>  - Grad checking X.
>  - Grad checking gamma.
>  - Grad checking beta.
> Grad checking the sigmoid nonlinearity layer with L2 loss.
> Grad checking the softmax layer with L2 loss.
> Grad checking the tanh nonlinearity layer with L2 loss.
> ---
> Grad checks complete -- look for any ERRORs or WARNINGs.
> If any tests involving ReLUs failed, try a few times to ensure that they were not false negatives due to kinks being crossed.
> Starting other tests.
> ---
> Testing the 1D batch normalization function.
> Testing the 2D (spatial) batch normalization function.
> Testing the 2D convolution functions.
> ERROR: Relative error 1.4275242179409038E-8 > 1.0E-10 with 0.2613148690102816 vs 0.26131486154961564.
> ERROR: Relative error 5.19998536442815E-10 > 1.0E-10 with -1.0332757339265042 vs -1.033275735001108.
> ERROR: Relative error 1.6690477019584457E-9 > 1.0E-10 with -0.2796655022366367 vs -0.2796655013030866.
> ERROR: Relative error 4.026622598319469E-8 > 1.0E-10 with -0.5810464384666573 vs -0.5810463916735648.
> ERROR: Relative error 1.5925147093041632E-8 > 1.0E-10 with 0.3443370985342769 vs 0.34433710950151497.
> ERROR: Relative error 1.527092464737425E-8 > 1.0E-10 with 0.29907123360418936 vs 0.29907122447000095.
> ERROR: Relative error 1.6981187364236183E-8 > 1.0E-10 with -1.3172908215225392 vs -1.3172908662608644.
> ERROR: Relative error 1.1249863733341583E-9 > 1.0E-10 with 2.372276123264216 vs 2.3722761179266594.
> ERROR: Relative error 2.0761911589432542E-8 > 1.0E-10 with 1.2022032376875469 vs 1.2022031877674733.
> ERROR: Relative error 1.565994385459594E-8 > 1.0E-10 with -3.2028287262088755 vs -3.202828826521113.
> ERROR: Relative error 1.6864676187009944E-8 > 1.0E-10 with -1.1617410552745209 vs -1.1617410160897481.
> ERROR: Relative error 6.7621761573523795E-9 > 1.0E-10 with -2.357698874691257 vs -2.3576989065776073.
> ERROR: Relative error 8.077058791206047E-9 > 1.0E-10 with -0.826672150112067 vs -0.8266721634662262.
> ERROR: Relative error 3.126862662452838E-7 > 1.0E-10 with 0.13533522445012097 vs 0.13533530908507949.
> ERROR: Relative error 1.4938685572516403E-8 > 1.0E-10 with 1.6979405686913527 vs 1.697940619421354.
> ERROR: Relative error 1.5016745260435074E-9 > 1.0E-10 with -1.1265715161920746 vs -1.1265715128085871.
> ERROR: Relative error 1.57421411011751E-8 > 1.0E-10 with -1.4288550242468203 vs -1.4288549792603462.
> ERROR: Relative error 5.967223900831169E-9 > 1.0E-10 with 3.3608783218897167 vs 3.360878361999944.
> ERROR: Relative error 2.6680716508589268E-8 > 1.0E-10 with -1.017766223123622 vs -1.0177662774330876.
> ERROR: Relative error 2.982949129497961E-8 > 1.0E-10 with -0.36147647765070573 vs -0.36147649921602526.
> ERROR: Relative error 3.657840826605735E-8 > 1.0E-10 with 0.45134768685895427 vs 0.4513477198781154.
> ERROR: Relative error 1.4907379969827675E-8 > 1.0E-10 with -1.5573016894495448 vs -1.5573017358801216.
> ERROR: Relative error 1.5786845592838222E-8 > 1.0E-10 with 0.14299706440454166 vs 0.14299706891948688.
> ERROR: Relative error 5.200703379619866E-10 > 1.0E-10 with 1.9947644445978157 vs 1.99476444252298.
> ERROR: Relative error 6.52328960308163E-9 > 1.0E-10 with -0.9699901535166611 vs -0.9699901661717145.
> ERROR: Relative error 1.1700565471480131E-8 > 1.0E-10 with 0.6822547438763443 vs 0.6822547598418771.
> ERROR: Relative error 1.7446876565463836E-10 > 1.0E-10 with 1.0078137191134946 vs 1.0078137194651586.
> ERROR: Relative error 1.1440123355883945E-8 > 1.0E-10 with 0.2932107414338165 vs 0.29321074814255066.
> ERROR: Relative error 3.136300157201714E-8 > 1.0E-10 with -0.12095057939678663 vs -0.1209505869835333.
> ERROR: Relative error 6.045675521257672E-9 > 1.0E-10 with -1.6284907233105383 vs -1.6284907036198855.
> ERROR: Relative error 1.9960135534777884E-7 > 1.0E-10 with 0.05796215971397543 vs 0.05796218285263132.
> ERROR: Relative error 1.4599148103004017E-8 > 1.0E-10 with -1.5695035918656892 vs -1.5695036376925207.
> ERROR: Relative error 6.543527908682189E-9 > 1.0E-10 with -1.0879498840683728 vs -1.0879498983064337.
> ERROR: Relative error 2.863818344889719E-8 > 1.0E-10 with 0.986026344823579 vs 0.9860264012995873.
> ERROR: Relative error 1.1056654568999266E-8 > 1.0E-10 with 0.24707866293721786 vs 0.24707866840094478.
> ERROR: Relative error 6.455137924885195E-8 > 1.0E-10 with -0.40055946263778364 vs -0.4005594109244554.
> ERROR: Relative error 1.639718543077429E-8 > 1.0E-10 with -2.1781136614834082 vs -2.178113590053542.
> ERROR: Relative error 4.8798430636206827E-8 > 1.0E-10 with 0.05076196924826959 vs 0.0507619742024787.
> ERROR: Relative error 9.165908471957055E-9 > 1.0E-10 with 1.8794282546494758 vs 1.8794282201961414.
> ERROR: Relative error 2.978816706402979E-8 > 1.0E-10 with 0.3975604976144726 vs 0.39756052129967034.
> ERROR: Relative error 2.963621786230762E-8 > 1.0E-10 with 0.6415698417229411 vs 0.6415698797503494.
> ERROR: Relative error 2.3109709555514415E-8 > 1.0E-10 with -1.1982869673393794 vs -1.1982870227235083.
> ERROR: Relative error 3.2115462610775645E-10 > 1.0E-10 with -2.0857041042357167 vs -2.0857041055753838.
> ERROR: Relative error 9.948630979193247E-8 > 1.0E-10 with -0.10622623825604993 vs -0.10622625939216493.
> ERROR: Relative error 6.778314445963566E-8 > 1.0E-10 with 0.2196231226007214 vs 0.21962309282723172.
> ERROR: Relative error 2.5809976150965856E-9 > 1.0E-10 with 2.588105950679918 vs 2.5881059640397086.
> ERROR: Relative error 9.087554361084105E-8 > 1.0E-10 with 0.09544286378739902 vs 0.09544284644055634.
> ERROR: Relative error 1.8858614351612048E-8 > 1.0E-10 with -0.5589644431314431 vs -0.5589644220488538.
> ERROR: Relative error 7.736763250891607E-8 > 1.0E-10 with 0.04561424549587588 vs 0.04561425255400879.
> ERROR: Relative error 2.4394587006452004E-8 > 1.0E-10 with 0.8068575646006609 vs 0.806857603966576.
> ERROR: Relative error 1.2147537773849554E-8 > 1.0E-10 with -1.2798666770117564 vs -1.2798666459172992.
> ERROR: Relative error 4.395188501268004E-9 > 1.0E-10 with 0.77094736022815 vs 0.7709473534512321.
> ERROR: Relative error 4.273504230193166E-8 > 1.0E-10 with -0.04529170878696899 vs -0.04529171265805534.
> ERROR: Relative error 5.193072042217285E-9 > 1.0E-10 with 0.26683412238923976 vs 0.26683411961786213.
> ERROR: Relative error 5.044924487623104E-9 > 1.0E-10 with -0.743458732252303 vs -0.7434587247509167.
> ERROR: Relative error 1.7148401756786993E-9 > 1.0E-10 with 0.8849261094606061 vs 0.8849261064255924.
> ERROR: Relative error 5.823286681596099E-10 > 1.0E-10 with -2.926710388969381 vs -2.9267103855607663.
> ERROR: Relative error 4.8325979184277686E-9 > 1.0E-10 with 1.8593052545877329 vs 1.8593052725582824.
> ERROR: Relative error 6.601979216797429E-8 > 1.0E-10 with 0.5235158749686739 vs 0.523515944093497.
> ERROR: Relative error 3.434129104490014E-8 > 1.0E-10 with -1.062474946283946 vs -1.0624750192574712.
> ERROR: Relative error 1.8295696384999767E-8 > 1.0E-10 with -0.2937149434414278 vs -0.29371493269398913.
> ERROR: Relative error 1.0063200936207417E-9 > 1.0E-10 with 1.119269021693839 vs 1.1192690239465248.
> ERROR: Relative error 4.809906625420993E-9 > 1.0E-10 with 1.5742089463824267 vs 1.5742089615260229.
> ERROR: Relative error 9.943604666704722E-9 > 1.0E-10 with 0.8775827136936241 vs 0.8775827311462954.
> ERROR: Relative error 6.3483134114491796E-9 > 1.0E-10 with -0.698009708220039 vs -0.6980096993576703.
> ERROR: Relative error 4.535157418604792E-8 > 1.0E-10 with -0.1749102977141979 vs -0.17491028184928392.
> ERROR: Relative error 2.2360276379519686E-9 > 1.0E-10 with -0.8108452879638077 vs -0.8108452843376628.
> ERROR: Relative error 1.0374420129323381E-8 > 1.0E-10 with -0.8834414065663451 vs -0.8834413882359606.
> ERROR: Relative error 1.1878019489316737E-9 > 1.0E-10 with 2.532993506117655 vs 2.5329935121350444.
> ERROR: Relative error 2.70754303169682E-7 > 1.0E-10 with 0.0717457323097893 vs 0.07174577116073133.
> ERROR: Relative error 9.53853266322461E-10 > 1.0E-10 with -0.6740828968349293 vs -0.6740828955489769.
> ERROR: Relative error 5.7021836168127075E-9 > 1.0E-10 with -1.2861467707448162 vs -1.2861467854125064.
> ERROR: Relative error 7.72738725803068E-9 > 1.0E-10 with 0.6463938706474603 vs 0.6463938806373319.
> ERROR: Relative error 1.6274098482424904E-8 > 1.0E-10 with 0.3306564558987715 vs 0.33065646666104315.
> ERROR: Relative error 3.103682986912765E-8 > 1.0E-10 with -0.6970667742796319 vs -0.697066817549119.
> ERROR: Relative error 1.535686164274709E-8 > 1.0E-10 with 0.701665021104026 vs 0.701664999553281.
> ERROR: Relative error 4.100056878916496E-9 > 1.0E-10 with -2.1103616009334836 vs -2.110361618238689.
> ERROR: Relative error 8.123120975873666E-9 > 1.0E-10 with 1.2604178829093742 vs 1.2604179033864282.
> ERROR: Relative error 1.0941209833039406E-8 > 1.0E-10 with 1.2266289837112507 vs 1.2266290105528612.
> ERROR: Relative error 5.541442492773271E-7 > 1.0E-10 with -0.012777216573308391 vs -0.012777202412474067.
> ERROR: Relative error 4.2253762431563665E-9 > 1.0E-10 with -1.339045610669553 vs -1.339045621985496.
> ERROR: Relative error 8.360566732946797E-8 > 1.0E-10 with 0.1561570755476005 vs 0.1561571016588357.
> ERROR: Relative error 2.767267007771319E-9 > 1.0E-10 with -2.6167992977337984 vs -2.616799312216563.
> ERROR: Relative error 1.7289233542875815E-8 > 1.0E-10 with 1.0579964137565983 vs 1.057996450340493.
> ERROR: Relative error 1.7154762078446898E-8 > 1.0E-10 with -0.5334469910041575 vs -0.5334469727018454.
> ERROR: Relative error 3.110409084496228E-9 > 1.0E-10 with 1.7670386817474821 vs 1.7670386707550558.
> ERROR: Relative error 1.1173482982155575E-8 > 1.0E-10 with -1.1253843105032257 vs -1.125384335652151.
> ERROR: Relative error 9.285717485307979E-9 > 1.0E-10 with -1.1188404636483114 vs -1.1188404428698386.
> ERROR: Relative error 2.1555251644095785E-9 > 1.0E-10 with 1.1254590576532526 vs 1.1254590625051633.
> ERROR: Relative error 2.319234552374325E-8 > 1.0E-10 with 0.9556977590880147 vs 0.955697803417761.
> ERROR: Relative error 4.4763326839679915E-9 > 1.0E-10 with 1.591009326384972 vs 1.5910093406287462.
> ERROR: Relative error 9.487915016929957E-9 > 1.0E-10 with 0.5884221509707697 vs 0.588422139804971.
> ERROR: Relative error 1.7764401531913022E-8 > 1.0E-10 with 0.916673465359319 vs 0.9166734979276306.
> ERROR: Relative error 2.3563316249488676E-8 > 1.0E-10 with -0.13562099666658628 vs -0.13562099027522556.
> ERROR: Relative error 4.394744814951474E-9 > 1.0E-10 with 2.0302516382886835 vs 2.0302516561335593.
> ERROR: Relative error 6.460001556392016E-10 > 1.0E-10 with 1.3949844117223749 vs 1.3949844099200546.
> ERROR: Relative error 8.778931914804406E-9 > 1.0E-10 with -1.2093561879421602 vs -1.2093562091758716.
> ERROR: Relative error 1.3404298361036346E-9 > 1.0E-10 with -1.7649810877732333 vs -1.7649810830415666.
> ERROR: Relative error 8.428228195211393E-9 > 1.0E-10 with 2.497793466520835 vs 2.497793508624782.
> ERROR: Relative error 1.0284037808289523E-8 > 1.0E-10 with 0.8600743385970706 vs 0.8600743209069968.
> Testing the 2D convolution transpose function.
> Testing the cross-entropy loss function with zero-valued predictions.
> Testing the im2col and col2im functions.
> Testing the 2D max pooling functions.
>  - Testing w/ padh=0 & padw=0.
>  - Testing w/ padh=0 & padw=1.
>  - Testing w/ padh=0 & padw=2.
>  - Testing w/ padh=0 & padw=3.
>  - Testing w/ padh=1 & padw=0.
>  - Testing w/ padh=1 & padw=1.
>  - Testing w/ padh=1 & padw=2.
>  - Testing w/ padh=1 & padw=3.
>  - Testing w/ padh=2 & padw=0.
>  - Testing w/ padh=2 & padw=1.
>  - Testing w/ padh=2 & padw=2.
>  - Testing w/ padh=2 & padw=3.
>  - Testing w/ padh=3 & padw=0.
>  - Testing w/ padh=3 & padw=1.
>  - Testing w/ padh=3 & padw=2.
>  - Testing w/ padh=3 & padw=3.
>  - Testing for correct behavior against known answer w/ pad=0.
>  - Testing for correct behavior against known answer w/ pad=1.
>  - Testing for correct behavior against known answer w/ all negative matrix w/ pad=0.
>  - Testing for correct behavior against known answer w/ all negative matrix w/ pad=1.
> Testing the padding and unpadding functions.
> Testing the tanh forward function.
> ---
> Other tests complete -- look for any ERRORs or WARNINGs.
> 17/05/30 17:26:25 INFO api.DMLScript: END DML run 05/30/2017 17:26:25
> SystemML Statistics:
> Total elapsed time:		126.751 sec.
> Total compilation time:		2.136 sec.
> Total execution time:		124.615 sec.
> Number of compiled MR Jobs:	0.
> Number of executed MR Jobs:	0.
> CUDA/CuLibraries init time:	1.086/0.985 sec.
> Number of executed GPU inst:	552273.
> GPU mem tx time  (alloc/dealloc/set0/toDev/fromDev):	0.032/0.002/6.738/29.418/16.843 sec.
> GPU mem tx count (alloc/dealloc/set0/toDev/fromDev/evict):	221/221/972795/532/402390/237544/0.
> GPU conversion time  (sparseConv/sp2dense/dense2sp):	0.001/0.037/0.000 sec.
> GPU conversion count (sparseConv/sp2dense/dense2sp):	532/561/0.
> Cache hits (Mem, WB, FS, HDFS):	2296853/0/0/0.
> Cache writes (WB, FS, HDFS):	23912/0/0.
> Cache times (ACQr/m, RLS, EXP):	18.229/0.952/0.666/0.000 sec.
> HOP DAGs recompiled (PRED, SB):	0/0.
> HOP DAGs recompile time:	0.053 sec.
> Functions recompiled:		6501.
> Functions recompile time:	12.265 sec.
> ParFor loops optimized:		1235.
> ParFor optimize time:		2.541 sec.
> ParFor initialize time:		0.092 sec.
> ParFor result merge time:	0.003 sec.
> ParFor total update in-place:	0/288348/367740
> Total JIT compile time:		39.733 sec.
> Total JVM GC count:		75.
> Total JVM GC time:		0.248 sec.
> LibMatrixDNN dense count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
> LibMatrixDNN sparse count (conv/bwdF/bwdD/im2col/maxBwd):	0/0/0/0/0.
> LibMatrixDNN conv(im2col/matmult), bwdF (im2col/matmult), bwdD (col2im/matmult) time:	0.000/0.000/0.000/0.000/0.000/0.000 sec.
> Heavy hitter instructions:
>    #  Instruction           Time(s)    Count  GPU
>    1  forward               111.231     7027  
>    2  lstm                   56.371        1  
>    3  gpu_*                  34.008   208803  s2d[0.000s,2], mmck[5.535s,75808], msk[9.984s,132995], ao[1.938s,208803], H2D[15.177s,209727]
>    4  conv2d_simple          19.747        1  
>    5  rnn                    14.719        1  
>    6  max_pool2d             12.343        2  
>    7  gpu_ba+*               12.131    71493  H2D[7.586s,99457], Mdmdm[0.623s,43613], ao[0.721s,71493], Mddot[2.555s,27880]
>    8  leftIndex              10.275   367740  
>    9  conv2d                  7.951        2  
>   10  gpu_+                   7.352    98170  s2d[0.035s,542], msk[0.196s,2787], ddgeaml[0.319s,24089], D2D[0.383s,39732], ao[0.975s,98170], H2D[2.294s,29303], mmck[2.367s,31562]
>   11  gpu_-                   6.977    80792  mmck[0.153s,2067], ddgeaml[0.179s,13471], msk[4.702s,65254], H2D[0.482s,6076], ao[0.896s,80792]
>   12  sigmoid                 5.261    88549  
>   13  backward                4.320       44  
>   14  gpu_uamax               4.054    15628  r[0.002s,15628], az[0.093s,15628], D2H[1.046s,15628], rallk[1.313s,15628], H2D[1.469s,15628]
>   15  max_pool2d_simple       4.010        1  
>   16  rmvar                   3.585  5694332  
>   17  gpu_r'                  3.118    25395  ao[0.255s,25394], ddgeaml[0.303s,25394], H2D[2.411s,25395]
>   18  batch_norm1d            2.906        2  
>   19  rangeReIndex            2.716   507414  
>   20  gpu_uarsqk+             2.296    12662  a[0.000s,1], r[0.003s,12661], az[0.058s,12662], ao[0.139s,12662], rrowk[0.883s,12662], msk[1.052s,12662]
>   21  gpu_uak+                2.164    13551  a[0.000s,1], s2d[0.001s,9], H2D[0.004s,18], r[0.005s,13550], az[0.105s,13551], D2H[0.946s,13551], rallk[0.966s,13551]
>   22  rshape                  2.057   345346  
>   23  affine                  1.425        1  
>   24  batch_norm2d            1.350        2  
>   25  gpu_+*                  1.229     8949  daxpymv[0.000s,2], s2d[0.000s,2], D2D[0.078s,8947], daxpy[0.088s,8947], ao[0.088s,8949], H2D[0.907s,12024]
>   26  createvar               0.922  1945781  
>   27  scale_shift1d           0.773        1  
>   28  rand                    0.759    67235  
>   29  dropout                 0.546        1  
>   30  gpu_uacvar              0.493     1312  r[0.000s,3929], a[0.001s,7], ao[0.014s,1312], az[0.019s,3936], mmck[0.091s,1312], rcolk[0.170s,2624], msk[0.173s,2624]
>   31  gpu_/                   0.304     3325  H2D[0.000s,1], ao[0.037s,3325], msk[0.096s,1252], mmck[0.147s,2073]
>   32  scale_shift2d           0.242        1  
>   33  *                       0.238   978652  
>   34  conv2d_builtin          0.232        1  
>   35  gpu_sqrt                0.216     2626  ao[0.028s,2626], sqrtk[0.168s,2626]
>   36  max_pool2d_builtin      0.204        1  
>   37  gpu_bias_add            0.202     1700  s2d[0.000s,4], ao[0.019s,1700], H2D[0.021s,323], nnrbk[0.126s,1700]
>   38  +                       0.190   753849  
>   39  gpu_uacmean             0.177     1312  ao[0.015s,1312], H2D[0.060s,906], rcolk[0.092s,1312]
>   40  check_rel_grad_error    0.177     6001  
>   41  gpu_bias_multiply       0.175     1553  ao[0.017s,1553], H2D[0.025s,319], nnrbk[0.120s,1553]
>   42  col2im_t259             0.165        2  
>   43  ncol                    0.152   404715  
>   44  gpu_uacmax              0.149     1144  ao[0.012s,1144], rcolk[0.119s,1144]
>   45  -                       0.147   539042  
>   46  cpvar                   0.138   563847  
>   47  append                  0.127    40461  
>   48  gpu_uark+               0.123      920  H2D[0.004s,15], ao[0.014s,920], rrowk[0.091s,920]
>   49  im2col                  0.120        2  
>   50  cross_entropy_loss      0.119        2  
>   51  check_rel_error         0.118    18454  
>   52  gpu_uarvar              0.115      310  r[0.000s,925], a[0.001s,5], ao[0.003s,310], az[0.004s,930], mmck[0.020s,310], msk[0.040s,620], rrowk[0.040s,620]
>   53  conv2d_transpose        0.105        2  
>   54  gpu_^2                  0.104      814  H2D[0.002s,30], ao[0.008s,814], msk[0.086s,814]
>   55  gpu_uarmean             0.095      620  ao[0.006s,620], rrowk[0.041s,620], H2D[0.044s,620]
>   56  col2im                  0.084        1  
>   57  nrow                    0.081   162537  
>   58  gpu_conv2d_bias_add     0.081      278  s2d[0.000s,2], nnc[0.000s,278], nni[0.003s,278], ao[0.003s,278], nncf[0.009s,278], H2D[0.020s,281], nnrbk[0.034s,278]
>   59  tanh                    0.065        2  
>   60  log_loss                0.064        1  
>   61  softmax                 0.062        1  
>   62  gpu_maxpooling          0.052      278  nnc[0.000s,278], nni[0.002s,278], ao[0.004s,278], nnmf[0.004s,278], H2D[0.037s,260]
>   63  im2col_t26284           0.051        3  
>   64  im2col_t26282           0.045        3  
>   65  im2col_t228             0.043        2  
>   66  im2col_t259             0.038        2  
>   67  im2col_t25918           0.037        3  
>   68  ==                      0.036     9393  
>   69  im2col_t25920           0.035        3  
>   70  im2col_t26193           0.035        3  
>   71  castdts                 0.034    55103  
>   72  im2col_t26191           0.034        3  
>   73  relu                    0.033        1  
>   74  im2col_t342             0.032        2  
>   75  im2col_t26102           0.031        3  
>   76  im2col_t26100           0.031        3  
>   77  im2col_t25827           0.030        3  
>   78  im2col_t25829           0.030        3  
>   79  im2col_t26011           0.030        3  
>   80  im2col_t434             0.030        2  
>   81  im2col_t25192           0.029        3  
>   82  im2col_t25554           0.029        3  
>   83  im2col_t457             0.029        2  
>   84  assignvar               0.029   144209  
>   85  im2col_t17008           0.029        2  
>   86  l2_reg                  0.029        1  
>   87  im2col_t595             0.029        2  
>   88  im2col_t25465           0.028        3  
>   89  im2col_t4804            0.028        2  
>   90  im2col_t388             0.028        2  
>   91  im2col_t25556           0.027        3  
>   92  im2col_t24923           0.027        3  
>   93  im2col_t25190           0.027        3  
>   94  l1_reg                  0.027        1  
>   95  im2col_t25738           0.027        3  
>   96  im2col_t25463           0.027        3  
>   97  im2col_t25736           0.026        3  
>   98  im2col_t26009           0.026        3  
>   99  im2col_t365             0.026        2  
>  100  im2col_t411             0.026        2  
> {code}
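> The relative errors printed in the WARNING and ERROR lines above are consistent with a symmetric formula, abs(x1 - x2) / (abs(x1) + abs(x2)), applied to the analytical vs. central-difference numerical gradient (WARNINGs, threshold 1.0E-4) and to the GPU output vs. a reference implementation (ERRORs, threshold 1.0E-10). Below is a minimal standalone DML sketch of that check, using the values from the first built-in conv2d W WARNING above; the formula and the perturbation h = 1e-5 are inferred from the reported numbers, not copied from nn/test/grad_check.dml.
> {code}
> # Minimal DML sketch of the relative-error check behind the WARNING lines above.
> # Assumptions: symmetric relative error and h = 1e-5, both inferred from the
> # reported numbers rather than taken from nn/test/grad_check.dml.
> rel_error = function(double x1, double x2) return (double rel) {
>   # Symmetric relative error between two scalars.
>   rel = abs(x1 - x2) / (abs(x1) + abs(x2))
> }
>
> h = 1e-5
> lossph = 40.510115394324195   # loss with one weight perturbed by +h (from the log)
> lossmh = 40.51034918713648    # loss with the same weight perturbed by -h (from the log)
> dW_num = (lossph - lossmh) / (2 * h)   # central-difference numerical gradient
> dW_ana = -11.682479557456533           # analytical gradient reported for the GPU run
>
> rel = rel_error(dW_ana, dW_num)
> print("Relative error " + rel + " with " + dW_ana + " analytical vs " + dW_num + " numerical")
> # Prints a relative error of ~3.06E-4: above the 1.0E-4 warning threshold, below the 0.01 error cutoff.
> {code}
> The ERROR lines under "Testing the 2D convolution functions" apply the same formula element-wise to the GPU result vs. the reference output, with the much tighter 1.0E-10 threshold, which is why discrepancies on the order of 1E-9 to 1E-7 show up as ERRORs there.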
> Ping [~mwdusenb@us.ibm.com], [~niketanpansare]


