Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/05/01 18:42:00 UTC

[jira] [Created] (MADLIB-1334) Mini-batch preprocessor for DL running very slowly

Frank McQuillan created MADLIB-1334:
---------------------------------------

             Summary: Mini-batch preprocessor for DL running very slowly
                 Key: MADLIB-1334
                 URL: https://issues.apache.org/jira/browse/MADLIB-1334
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Utilities
            Reporter: Frank McQuillan
             Fix For: v1.16


Observed on a 2-segment Greenplum 5.x cluster using the latest build from MASTER:

current `minibatch_preprocessor`
1) 60K MNIST training examples = 28.1 sec
2) 10K MNIST test examples = 5.9 sec

new `minibatch_preprocessor_dl`
3) 60K MNIST training examples = 1912.3 sec
4) 10K MNIST test examples = 24.2 sec

Is there a bug here, or at least a performance issue? I thought `minibatch_preprocessor_dl` was supposed to be faster than `minibatch_preprocessor`.
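
To put the four timings above in proportion, here is a minimal Python sketch that computes the slowdown of the new function and how each function's run time grows between the 10K and 60K row inputs. The numbers are copied from the list above; the variable names are only illustrative, and this is arithmetic on the reported timings, not a profile of MADlib itself.

```python
# Timings copied from the report above, in seconds.
timings = {
    "minibatch_preprocessor": {"train_60k": 28.1, "test_10k": 5.9},
    "minibatch_preprocessor_dl": {"train_60k": 1912.3, "test_10k": 24.2},
}

old = timings["minibatch_preprocessor"]
new = timings["minibatch_preprocessor_dl"]

# How much slower the new function is than the current one on each data set.
slowdown_train = new["train_60k"] / old["train_60k"]
slowdown_test = new["test_10k"] / old["test_10k"]

# How each function's run time grows when the input grows from 10K to 60K rows.
rows_ratio = 60_000 / 10_000
growth_old = old["train_60k"] / old["test_10k"]
growth_new = new["train_60k"] / new["test_10k"]

print(f"slowdown on 60K rows: {slowdown_train:.1f}x")
print(f"slowdown on 10K rows: {slowdown_test:.1f}x")
print(f"rows grew {rows_ratio:.0f}x; current function time grew {growth_old:.1f}x")
print(f"rows grew {rows_ratio:.0f}x; new function time grew {growth_new:.1f}x")
```

By this arithmetic the new function is roughly 68 times slower on the training set and about 4 times slower on the test set. Its run time grows about 79 times for 6 times as many rows, while the current function grows a little under 5 times. Two data points cannot establish the cause, but the gap suggests the new function may scale worse than linearly with input size.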

(1)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_train',         -- Source table
madlib(#                                      'mnist_train_packed',  -- Output table
madlib(#                                      'y',                   -- Dependent variable
madlib(#                                      'x',                   -- Independent variables
madlib(#                                      NULL,                  -- Grouping 
madlib(#                                      NULL,                  -- Buffer size
madlib(#                                      TRUE                   -- One-hot encode integer dependent var
madlib(#                                      );
 minibatch_preprocessor 
------------------------
 
(1 row)

Time: 28093.977 ms
{code}

(2)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_test',          -- Source table
madlib(#                                      'mnist_test_packed',   -- Output table
madlib(#                                      'y',                   -- Dependent variable
madlib(#                                      'x',                   -- Independent variables
madlib(#                                      NULL,                  -- Grouping 
madlib(#                                      NULL,                  -- Buffer size
madlib(#                                      TRUE                   -- One-hot encode integer dependent var
madlib(#                                      );
 minibatch_preprocessor 
------------------------
 
(1 row)

Time: 5934.194 ms
{code}

(3)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train',         -- Source table
madlib(#                                         'mnist_train_packed',  -- Output table
madlib(#                                         'y',                   -- Dependent variable
madlib(#                                         'x',                   -- Independent variable
madlib(#                                         NULL,                  -- Buffer size
madlib(#                                         255,                   -- Normalizing constant
madlib(#                                         NULL
madlib(#                                         ); 
 minibatch_preprocessor_dl 
---------------------------
 
(1 row)

Time: 1912268.396 ms
{code}

(4)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test',          -- Source table
madlib(#                                         'mnist_test_packed',   -- Output table
madlib(#                                         'y',                   -- Dependent variable
madlib(#                                         'x',                   -- Independent variable
madlib(#                                         NULL,                  -- Buffer size
madlib(#                                         255,                   -- Normalizing constant
madlib(#                                         NULL
madlib(#                                         ); 
 minibatch_preprocessor_dl 
---------------------------
 
(1 row)

Time: 24192.195 ms
{code}

