Posted to issues@madlib.apache.org by "Ekta Khanna (JIRA)" <ji...@apache.org> on 2019/05/13 18:10:00 UTC
[jira] [Commented] (MADLIB-1334) Mini-batch preprocessor for DL running very slowly
[ https://issues.apache.org/jira/browse/MADLIB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838769#comment-16838769 ]
Ekta Khanna commented on MADLIB-1334:
-------------------------------------
w/ [~okislal]
Below are our observations:
1. The current {{minibatch_preprocessor()}} uses {{madlib.matrix_agg()}}, which expects only a one-dimensional array. For deep learning, we can expect multi-dimensional arrays.
2. The current code concatenates arrays with {{$1 || $2}}. We instead tried the Postgres {{array_cat()}} function and saw better performance than with the current implementation.
Below are the test runs:
*On local Mac - single node, 3 segments*
mnist test (10K)
|*Function*|*time(ms)*|*commit*|
|minibatch_preprocessor() |3221.892|master|
|training_preprocessor_dl() |89256.729|master|
|training_preprocessor_dl() |73038.055|with array_cat optimization|
*On cluster - 5 nodes, 20 segments*
|*Function*|*time for mnist (10K) (ms)*|*time for mnist train (60K) (ms)*|*time for places10 (50K) (ms)*|*commit*|
|minibatch_preprocessor()|2178.928|10000.903|2848488.244|master|
|training_preprocessor_dl()|1489.133|32669.168|1622776.845|master|
|training_preprocessor_dl()|1908.26|25077.881|1415855.433|with array_cat optimization|
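From the above runs, we see that with the {{array_cat()}} optimization, the current minibatch preprocessor for DL ({{training_preprocessor_dl()}}) performs better on a larger multi-node cluster for images with higher resolution: on places10 (50K) it took 1415855.433 ms, versus 1622776.845 ms on master and 2848488.244 ms for {{minibatch_preprocessor()}}.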
To improve performance on single-node clusters, a follow-up JIRA has been created: https://issues.apache.org/jira/browse/MADLIB-1342
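As a rough check on the cluster table above, the sketch below computes the percent change in {{training_preprocessor_dl()}} run time when moving from master to the array_cat build. The timings are copied from that table; the dictionary keys and the helper name percent_change are hypothetical labels chosen only for this illustration.

```python
# Timings in milliseconds for training_preprocessor_dl(), copied from the
# 5-node, 20-segment cluster table. Keys are illustrative labels only.
timings_ms = {
    "mnist_10k": {"master": 1489.133, "array_cat": 1908.26},
    "mnist_train_60k": {"master": 32669.168, "array_cat": 25077.881},
    "places10_50k": {"master": 1622776.845, "array_cat": 1415855.433},
}


def percent_change(before, after):
    """Return the relative change from before to after, in percent.

    Negative values mean the run got faster; positive values mean slower.
    """
    return (after - before) / before * 100.0


# Percent change per dataset when switching from master to array_cat.
changes = {
    name: percent_change(t["master"], t["array_cat"])
    for name, t in timings_ms.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

A negative value means the array_cat build was faster on that dataset. Only the two larger cluster runs come out negative, which matches the observation that the gain shows up on larger workloads.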
> Mini-batch preprocessor for DL running very slowly
> --------------------------------------------------
>
> Key: MADLIB-1334
> URL: https://issues.apache.org/jira/browse/MADLIB-1334
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.16
>
>
> Observed on 2-segment Greenplum 5.x cluster using latest build from MASTER:
> current `minibatch_preprocessor`
> 1) 60K MNIST training examples = 28.1 sec
> 2) 10K MNIST test examples = 5.9 sec
> new `minibatch_preprocessor_dl`
> 3) 60K MNIST training examples = 1912.3 sec
> 4) 10K MNIST test examples = 24.2 sec
> Wonder if there is a bug here, or at least a performance issue? I thought `minibatch_preprocessor_dl` was supposed to be faster than `minibatch_preprocessor`.
> (1)
> {code}
> madlib=#
> madlib=# SELECT madlib.minibatch_preprocessor('mnist_train', -- Source table
> madlib(# 'mnist_train_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variables
> madlib(# NULL, -- Grouping
> madlib(# NULL, -- Buffer size
> madlib(# TRUE -- One-hot encode integer dependent var
> madlib(# );
> minibatch_preprocessor
> ------------------------
>
> (1 row)
> Time: 28093.977 ms
> {code}
> (2)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor('mnist_test', -- Source table
> madlib(# 'mnist_test_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variables
> madlib(# NULL, -- Grouping
> madlib(# NULL, -- Buffer size
> madlib(# TRUE -- One-hot encode integer dependent var
> madlib(# );
> minibatch_preprocessor
> ------------------------
>
> (1 row)
> Time: 5934.194 ms
> {code}
> (3)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train', -- Source table
> madlib(# 'mnist_train_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variable
> madlib(# NULL, -- Buffer size
> madlib(# 255, -- Normalizing constant
> madlib(# NULL
> madlib(# );
> minibatch_preprocessor_dl
> ---------------------------
>
> (1 row)
> Time: 1912268.396 ms
> {code}
> (4)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test', -- Source table
> madlib(# 'mnist_test_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variable
> madlib(# NULL, -- Buffer size
> madlib(# 255, -- Normalizing constant
> madlib(# NULL
> madlib(# );
> minibatch_preprocessor_dl
> ---------------------------
>
> (1 row)
> Time: 24192.195 ms
> {code}