Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2020/04/01 18:40:00 UTC

[jira] [Closed] (MADLIB-1406) DL: fit multiple takes up unnecessary disk space

     [ https://issues.apache.org/jira/browse/MADLIB-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-1406.
-----------------------------------
    Resolution: Fixed

> DL: fit multiple takes up unnecessary disk space
> ------------------------------------------------
>
>                 Key: MADLIB-1406
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1406
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Deep Learning
>            Reporter: Nikhil Kak
>            Assignee: Nikhil Kak
>            Priority: Major
>             Fix For: v1.17
>
>
> While testing places10 with fit multiple (GPDB 5, 10 iterations and 20 MSTs), we ran out of disk space even though at least 1.5 TB was free when the query started. The query should not need anywhere near this much space, which suggests a bug in the code.
> Here is the query and the failure:
> {code:sql}
> DROP TABLE IF EXISTS mst_table, mst_table_summary;
> SELECT load_model_selection_table(
>     'model_arch_places10',
>     'mst_table',
>     ARRAY[1],
>     ARRAY[
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.1, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.01, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.0001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=False)', metrics=['accuracy']$$
>     ],
>     ARRAY[
>         $$batch_size=16, epochs=1, verbose=0$$,
>         $$batch_size=20, epochs=1, verbose=0$$,
>         $$batch_size=32, epochs=1, verbose=0$$,
>         $$batch_size=40, epochs=1, verbose=0$$
>     ]
> );
> DROP TABLE if exists places10_train_mult_model, places10_train_mult_model_summary, places10_train_mult_model_info;
> SELECT madlib_keras_fit_multiple_model(
>     'places10_train_bytea_batched',
>     'places10_train_mult_model',
>     'mst_table',
>     10,
>     TRUE
> );
> -- failed in the 7th iteration
> ....
> Time for training in iteration 6: 6403.70687222 sec
> ERROR:  plpy.SPIError: could not extend relation 1663/3721274/1121877: No space left on device  (seg1)
> {code}
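The "20 MSTs" figure in the description comes from the selection grid in the query: 1 model id × 5 compile-parameter strings × 4 fit-parameter strings. The following is a minimal Python sketch of that arithmetic. It assumes that load_model_selection_table builds the Cartesian product of its three lists and that each of the 10 iterations trains every MST once. The dict layout and the mst_key numbering are illustrative only and are not taken from MADlib's source.

```python
from itertools import product

# Lists copied from the load_model_selection_table() call in the report.
model_ids = [1]
compile_params = [
    "loss='categorical_crossentropy', optimizer='SGD(lr=0.1, decay=1e-6, nesterov=True)', metrics=['accuracy']",
    "loss='categorical_crossentropy', optimizer='SGD(lr=0.01, decay=1e-6, nesterov=True)', metrics=['accuracy']",
    "loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=True)', metrics=['accuracy']",
    "loss='categorical_crossentropy', optimizer='SGD(lr=0.0001, decay=1e-6, nesterov=True)', metrics=['accuracy']",
    "loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=False)', metrics=['accuracy']",
]
fit_params = [f"batch_size={bs}, epochs=1, verbose=0" for bs in (16, 20, 32, 40)]

# Assumption: one model selection tuple (MST) per (model_id, compile, fit)
# combination. The mst_key numbering here is hypothetical.
msts = [
    {"mst_key": key, "model_id": m, "compile_params": c, "fit_params": f}
    for key, (m, c, f) in enumerate(product(model_ids, compile_params, fit_params), start=1)
]

# Assumption: every iteration trains each MST once (epochs=1 in fit_params).
num_iterations = 10
total_passes = num_iterations * len(msts)

print(len(msts), total_passes)
```

Under these assumptions, the full run trains 20 model configurations 10 times each. The failure in the 7th iteration therefore happened after 6 × 20 = 120 of those training passes had completed.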



--
This message was sent by Atlassian Jira
(v8.3.4#803005)