Posted to issues@systemml.apache.org by "Mike Dusenberry (JIRA)" <ji...@apache.org> on 2017/08/08 17:34:00 UTC

[jira] [Updated] (SYSTEMML-1814) Improve slide distribution of the image dataset via improved sampling policy

     [ https://issues.apache.org/jira/browse/SYSTEMML-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry updated SYSTEMML-1814:
--------------------------------------
    Sprint: Sprint 4

> Improve slide distribution of the image dataset via improved sampling policy
> ----------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1814
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1814
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>            Assignee: Mike Dusenberry
>
> Currently, our models are heavily overfitting on the training dataset. However, further evaluation has shown that this is not the usual overfitting caused by an over-expressive model: we are already employing heavy model freezing (in the most extreme case, unfreezing only the final softmax classifier of a pretrained ResNet50). My evaluation has therefore led me to believe that the overfitting is likely due to batch effects in the data, and an examination of the original slide distribution in the sample images dataset has shown a severe imbalance. Note that this is the distribution of images over the slides from which they originated, which is distinctly different from the class distribution; the latter is much more reasonably dispersed.
> {code}
>      slide_num  count
> 0          436      1
> 1          116      1
> 2          468      2
> 3           38      3
> 4          195      4
> 5          173      5
> 6           13      7
> 7          481      8
> 8           83      9
> 9          349     11
> 10         490     15
> 11         292     17
> 12         281     22
> 13         387     26
> 14         326     32
> 15         286     32
> 16          88     39
> 17         477     48
> 18         205     57
> 19         135     58
> 20         127     58
> 21          16     61
> 22         245     66
> 23           5     81
> 24         306     83
> 25         284     91
> 26         263    100
> 27          15    120
> 28         345    124
> 29         380    128
> 30          24    137
> 31         382    150
> 32           1    154
> 33         421    164
> 34         163    169
> 35         278    171
> 36         235    197
> 37         332    197
> 38         343    207
> 39          43    237
> 40         249    246
> 41         113    256
> 42         496    262
> 43         482    264
> 44          86    269
> 45         415    269
> 46         472    326
> 47         422    329
> 48         450    340
> 49         108    348
> 50           3    390
> 51         191    402
> 52         272    474
> 53          85    483
> 54          97    484
> 55         210    508
> 56         293    544
> 57          41    595
> 58         452    613
> 59         220    613
> 60         406    651
> 61          67    665
> 62         260    666
> 63         361    673
> 64         269    684
> 65          50    684
> 66         304    753
> 67         101    769
> 68         433    868
> 69           4    898
> 70         499    915
> 71         145    917
> 72         357    918
> 73         365    940
> 74          82    951
> 75         126    965
> 76         185    965
> 77         164   1077
> 78         221   1086
> 79         165   1111
> 80         316   1129
> 81         350   1132
> 82          89   1162
> 83          19   1169
> 84          74   1206
> 85         132   1248
> 86          47   1278
> 87         188   1297
> 88         459   1312
> 89         368   1337
> 90         335   1368
> 91         225   1373
> 92         234   1378
> 93         487   1385
> 94         247   1464
> 95         427   1476
> 96          65   1492
> 97         402   1500
> 98         315   1557
> 99         201   1604
> 100        344   1607
> 101        273   1616
> 102        146   1623
> 103        341   1636
> 104        425   1640
> 105        182   1681
> 106        403   1682
> 107        275   1690
> 108        457   1717
> 109        448   1724
> 110        277   1729
> 111         70   1740
> 112        141   1747
> 113        264   1777
> 114        122   1880
> 115        319   1915
> 116        449   1951
> 117        104   1988
> 118        377   1993
> 119        285   2008
> 120        107   2084
> 121        410   2141
> 122         11   2148
> 123        367   2153
> 124        416   2162
> 125        311   2183
> 126        338   2206
> 127         51   2233
> 128        153   2255
> 129        144   2285
> 130        497   2358
> 131        218   2364
> 132        330   2376
> 133        308   2392
> 134        213   2480
> 135        454   2512
> 136        103   2567
> 137        446   2569
> 138         40   2622
> 139        251   2629
> 140        149   2632
> 141        455   2633
> 142        430   2669
> 143        262   2715
> 144         76   2737
> 145         18   2748
> 146        178   2763
> 147        383   2864
> 148         54   2871
> 149        223   2908
> 150        207   2931
> 151        486   3043
> 152        391   3099
> 153        342   3104
> 154        390   3116
> 155        276   3136
> 156         75   3141
> 157        181   3171
> 158        142   3213
> 159        414   3255
> 160        137   3276
> 161        295   3285
> 162        358   3315
> 163          7   3322
> 164        323   3327
> 165         71   3334
> 166        243   3344
> 167        120   3359
> 168         48   3371
> 169        434   3387
> 170        206   3404
> 171          9   3460
> 172        476   3467
> 173         32   3472
> 174        491   3496
> 175        444   3502
> 176        279   3530
> 177         59   3546
> 178        174   3556
> 179        464   3595
> 180        392   3633
> 181         99   3677
> 182         72   3682
> 183        347   3779
> 184         28   3804
> 185        314   3807
> 186        322   3809
> 187        492   3823
> 188        258   3824
> 189        230   3831
> 190        354   3887
> 191        346   3951
> 192        445   3963
> 193        209   3969
> 194          8   3986
> 195        443   3988
> 196        290   3993
> 197        118   4025
> 198        152   4026
> 199         56   4078
> 200        170   4131
> 201         84   4146
> 202        413   4150
> 203        447   4171
> 204        417   4193
> 205         60   4210
> 206         92   4265
> 207        374   4281
> 208         94   4307
> 209        161   4360
> 210        320   4408
> 211        114   4451
> 212        219   4480
> 213         90   4518
> 214        233   4528
> 215        396   4596
> 216        157   4661
> 217        117   4696
> 218        337   4724
> 219        202   4819
> 220         34   4827
> 221        105   4840
> 222        155   4841
> 223        176   4895
> 224        166   4966
> 225        456   5031
> 226        254   5085
> 227        475   5184
> 228         42   5221
> 229        172   5330
> 230        299   5358
> 231        473   5364
> 232        131   5369
> 233         61   5382
> 234        379   5470
> 235        355   5488
> 236        372   5496
> 237         53   5503
> 238         17   5523
> 239        495   5529
> 240        190   5536
> 241        451   5583
> 242        177   5630
> 243        123   5649
> 244        231   5686
> 245        217   5692
> 246         33   5742
> 247         55   5767
> 248        388   5786
> 249        318   5819
> 250         81   5838
> 251         62   5846
> 252        255   5854
> 253        485   5890
> 254        375   5928
> 255        156   5938
> 256        224   5945
> 257        267   5970
> 258        412   5987
> 259        136   6038
> 260        160   6055
> 261        240   6084
> 262         39   6093
> 263        469   6100
> 264        300   6167
> 265        183   6178
> 266        250   6195
> 267         49   6231
> 268        471   6251
> 269        334   6283
> 270        265   6422
> 271        407   6468
> 272        252   6472
> 273        466   6478
> 274        227   6528
> 275        102   6550
> 276        458   6653
> 277        140   6667
> 278        133   6668
> 279        493   6716
> 280        465   6729
> 281        370   6751
> 282        244   6772
> 283        216   6772
> 284        488   6773
> 285         95   6777
> 286         52   6788
> 287         57   6821
> 288        289   6846
> 289        362   6939
> 290        180   6944
> 291        324   6961
> 292        211   7012
> 293         73   7034
> 294        301   7094
> 295         23   7106
> 296         64   7169
> 297        420   7182
> 298         36   7219
> 299        376   7257
> 300        484   7265
> 301        253   7275
> 302        470   7312
> 303        460   7405
> 304         98   7425
> 305        302   7427
> 306        393   7435
> 307        159   7554
> 308        237   7564
> 309        274   7701
> 310        359   7769
> 311         68   7779
> 312        483   7829
> 313        151   7910
> 314        186   7948
> 315        442   7952
> 316        259   8049
> 317        246   8128
> 318         96   8129
> 319        271   8176
> 320        438   8190
> 321         87   8197
> 322        162   8226
> 323        489   8260
> 324        418   8312
> 325         31   8504
> 326        179   8532
> 327         79   8578
> 328        226   8600
> 329         27   8719
> 330        479   8862
> 331        268   8883
> 332        404   8908
> 333         46   8913
> 334        437   8961
> 335        147   9047
> 336        189   9164
> 337         20   9242
> 338        386   9356
> 339        435   9376
> 340        432   9495
> 341        408   9505
> 342        248   9509
> 343        462   9619
> 344        229   9774
> 345        193   9835
> 346        167   9871
> 347         69   9894
> 348        130   9954
> 349        327  10072
> 350        369  10078
> 351        106  10180
> 352        194  10212
> 353        325  10306
> 354        312  10344
> 355        303  10502
> 356        184  10655
> 357        463  10916
> 358        426  11055
> 359        283  11334
> 360        328  11450
> 361        129  11467
> 362        288  11806
> 363        124  12010
> 364        171  12250
> 365        121  12257
> 366         22  12276
> 367        423  12310
> 368        192  12313
> 369        378  12358
> 370        307  12366
> 371        143  12678
> 372         80  12899
> 373         66  12920
> 374        208  12970
> 375        158  13131
> 376        148  13423
> 377        119  13723
> 378        317  13830
> 379        395  13834
> 380        187  14003
> 381         25  14856
> 382        399  14905
> 383        478  16145
> 384         93  20009
> 385        215  20723
> {code}
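To put numbers on the imbalance in the table above (386 slides, with per-slide counts ranging from 1 image up to 20723 images for slide 215), here is a minimal stdlib-only Python sketch that counts images per slide and summarizes the spread. It assumes the slide number of each image is already available as a flat list of integers; how those numbers are extracted from the dataset is not described in this issue, and the function names `slide_distribution` and `imbalance_summary` are hypothetical.

```python
from collections import Counter
from statistics import median


def slide_distribution(slide_nums):
    """Count images per slide, sorted by ascending count.

    `slide_nums` is assumed (hypothetically) to hold one slide number per image.
    Ties are broken by slide number, so rows with equal counts may be ordered
    differently than in a pandas value_counts() printout.
    """
    counts = Counter(slide_nums)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


def imbalance_summary(distribution, top_k=10):
    """Summarize how unevenly images are spread across source slides."""
    counts = [count for _, count in distribution]
    total = sum(counts)
    # Share of all images contributed by the `top_k` largest slides.
    top = sorted(counts, reverse=True)[:top_k]
    return {
        "num_slides": len(counts),
        "num_images": total,
        "min": min(counts),
        "median": median(counts),
        "max": max(counts),
        "max_to_min_ratio": max(counts) / min(counts),
        "top_k_fraction": sum(top) / total,
    }


if __name__ == "__main__":
    # Toy input for illustration only, not the real dataset.
    dist = slide_distribution([436, 116, 468, 468, 38, 38, 38])
    print(dist)
    print(imbalance_summary(dist, top_k=1))
```

A large `max_to_min_ratio` or a high `top_k_fraction` is a quick signal that a handful of slides dominate the dataset, which is the situation the table shows.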
> This task aims to improve the sampling policy so that the final image dataset has a more even slide distribution, which should hopefully reduce the batch effects and improve the model's performance metrics.
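The issue does not specify what the improved sampling policy should be. One simple option, shown here as a hypothetical sketch rather than the policy actually adopted, is to cap the number of images drawn from any single slide, subsampling uniformly at random within the over-represented slides. The function `cap_per_slide`, its `(slide_num, sample_id)` input format, and the `max_per_slide` parameter are assumptions for illustration.

```python
import random
from collections import defaultdict


def cap_per_slide(samples, max_per_slide, seed=0):
    """Keep at most `max_per_slide` samples from each slide (hypothetical policy).

    `samples` is an iterable of (slide_num, sample_id) pairs. Slides with at
    most `max_per_slide` samples keep all of them; larger slides are
    subsampled uniformly at random without replacement. A fixed `seed`
    makes the selection reproducible.
    """
    rng = random.Random(seed)
    by_slide = defaultdict(list)
    for slide_num, sample_id in samples:
        by_slide[slide_num].append(sample_id)

    kept = []
    for slide_num in sorted(by_slide):
        ids = by_slide[slide_num]
        if len(ids) > max_per_slide:
            ids = rng.sample(ids, max_per_slide)
        kept.extend((slide_num, sample_id) for sample_id in ids)
    return kept


if __name__ == "__main__":
    # Toy data: slide 1 has 2 images, slide 2 has 10 images.
    samples = [(1, "a"), (1, "b")] + [(2, f"t{i}") for i in range(10)]
    print(cap_per_slide(samples, max_per_slide=3, seed=42))
```

A hard cap discards data from the largest slides; weighting each image by the inverse of its slide's count is an alternative that keeps every image. With either approach, the class distribution is worth rechecking after resampling, since evening out slides can shift the class balance.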



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)