Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/05/01 18:42:00 UTC
[jira] [Created] (MADLIB-1334) Mini-batch preprocessor for DL running very slowly
Frank McQuillan created MADLIB-1334:
---------------------------------------
Summary: Mini-batch preprocessor for DL running very slowly
Key: MADLIB-1334
URL: https://issues.apache.org/jira/browse/MADLIB-1334
Project: Apache MADlib
Issue Type: Bug
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.16
Observed on a 2-segment Greenplum 5.x cluster using the latest build from MASTER:
current `minibatch_preprocessor`
1) 60K MNIST training examples = 28.1 sec
2) 10K MNIST test examples = 5.9 sec
new `minibatch_preprocessor_dl`
3) 60K MNIST training examples = 1912.3 sec
4) 10K MNIST test examples = 24.2 sec
Wonder if there is a bug here, or at least a performance issue? I thought `minibatch_preprocessor_dl` was supposed to be faster than `minibatch_preprocessor`.
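To put the four timings above in perspective, here is a rough sketch of the arithmetic in plain Python (not part of MADlib; the helper names `slowdown` and `ms_per_row` are made up for illustration). It uses only the reported numbers and computes how much slower the DL variant is on each data set, plus the approximate time spent per input row.

```python
# Wall-clock timings in seconds, copied from the four runs reported above.
timings = {
    "minibatch_preprocessor": {"train_60k": 28.1, "test_10k": 5.9},
    "minibatch_preprocessor_dl": {"train_60k": 1912.3, "test_10k": 24.2},
}

# Number of input rows in each MNIST table, as stated in the report.
rows = {"train_60k": 60_000, "test_10k": 10_000}


def slowdown(dataset):
    """Ratio of DL preprocessor time to current preprocessor time (hypothetical helper)."""
    return (timings["minibatch_preprocessor_dl"][dataset]
            / timings["minibatch_preprocessor"][dataset])


def ms_per_row(func, dataset):
    """Average milliseconds spent per input row (hypothetical helper)."""
    return 1000.0 * timings[func][dataset] / rows[dataset]


for dataset in rows:
    print(f"{dataset}: DL variant is {slowdown(dataset):.1f}x slower")
    for func in timings:
        print(f"  {func}: {ms_per_row(func, dataset):.2f} ms/row")
```

By this arithmetic the DL variant is roughly 68x slower on the 60K training set but only about 4x slower on the 10K test set. Its per-row cost also rises sharply between the two tables while the current preprocessor stays roughly flat, which might hint at something scaling worse than linearly with row count. That is only a guess from two data points.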
(1)
{code}
madlib=#
madlib=# SELECT madlib.minibatch_preprocessor('mnist_train', -- Source table
madlib(# 'mnist_train_packed', -- Output table
madlib(# 'y', -- Dependent variable
madlib(# 'x', -- Independent variables
madlib(# NULL, -- Grouping
madlib(# NULL, -- Buffer size
madlib(# TRUE -- One-hot encode integer dependent var
madlib(# );
minibatch_preprocessor
------------------------
(1 row)
Time: 28093.977 ms
{code}
(2)
{code}
madlib=# SELECT madlib.minibatch_preprocessor('mnist_test', -- Source table
madlib(# 'mnist_test_packed', -- Output table
madlib(# 'y', -- Dependent variable
madlib(# 'x', -- Independent variables
madlib(# NULL, -- Grouping
madlib(# NULL, -- Buffer size
madlib(# TRUE -- One-hot encode integer dependent var
madlib(# );
minibatch_preprocessor
------------------------
(1 row)
Time: 5934.194 ms
{code}
(3)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train', -- Source table
madlib(# 'mnist_train_packed', -- Output table
madlib(# 'y', -- Dependent variable
madlib(# 'x', -- Independent variable
madlib(# NULL, -- Buffer size
madlib(# 255, -- Normalizing constant
madlib(# NULL
madlib(# );
minibatch_preprocessor_dl
---------------------------
(1 row)
Time: 1912268.396 ms
{code}
(4)
{code}
madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test', -- Source table
madlib(# 'mnist_test_packed', -- Output table
madlib(# 'y', -- Dependent variable
madlib(# 'x', -- Independent variable
madlib(# NULL, -- Buffer size
madlib(# 255, -- Normalizing constant
madlib(# NULL
madlib(# );
minibatch_preprocessor_dl
---------------------------
(1 row)
Time: 24192.195 ms
{code}