Posted to issues@madlib.apache.org by "Ekta Khanna (JIRA)" <ji...@apache.org> on 2019/06/14 20:25:00 UTC

[jira] [Commented] (MADLIB-1326) DL: Dev-check fails when keras_fit is called after array_scalar_mult

    [ https://issues.apache.org/jira/browse/MADLIB-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16864409#comment-16864409 ] 

Ekta Khanna commented on MADLIB-1326:
-------------------------------------

Merged as part of PR: https://github.com/apache/madlib/pull/405

> DL: Dev-check fails when keras_fit is called after array_scalar_mult
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1326
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1326
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Deep Learning
>            Reporter: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.16
>
>
> In madlib_keras dev-check, we create the input data to fit using {{minibatch_preprocessor_dl()}}. This function internally calls {{array_scalar_mult()}}. If we call either of these functions followed by {{madlib_keras_fit()}}, then the following error pops up:
> {code:java}
> NOTICE:  Releasing segworker groups to finish aborting the transaction.
> ERROR:  could not connect to segment: initialization of segworker group failed (cdbgang.c:237)
> {code}
> Digging further into Postgres logs suggests that there was a segmentation fault, and it seems like it's happening the moment {{import keras}} is called in {{madlib_keras_fit()}}.
> This issue was first noticed while working on MADLIB-1304 (which was closed with [this commit|https://github.com/apache/madlib/commit/241074ae68cb8e15437f98abf1c2e3c7bb3471ae]), as the comment [in this line|https://github.com/apache/madlib/commit/241074ae68cb8e15437f98abf1c2e3c7bb3471ae#diff-f89c193e163bfe0e7e3821445e38fa97R29] suggests. At that time it happened on Greenplum, and Postgres did not yet support deep learning. It was noticed again while working on MADLIB-1311, which added Postgres support. At that point, the failure happened on Postgres and there were no failures on Greenplum.
> While working on MADLIB-1311, we tried a couple of things and observed an odd behavior. We created a dummy function:
> {code:java}
> create function dummy()
> returns void as
> $$
> import keras
> $$
> language plpythonu;
> {code}
> If we ran {{select dummy()}} *before* running {{minibatch_preprocessor_dl()}} or {{array_scalar_mult()}}, the whole dev-check passed. But running the same function right after calling either of those functions caused a failure.
> So it looks like any UDF that calls {{import keras}} *must* be run *before* calling {{minibatch_preprocessor_dl()}} or {{array_scalar_mult()}}.
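> A minimal sketch of the session ordering described above. It assumes MADlib is installed in a schema named {{madlib}} and that the {{dummy()}} function from this report already exists; the array literal and scalar are illustrative values only, and the {{madlib_keras_fit()}} arguments are deliberately left out:
> {code:sql}
> -- Warm-up: trigger "import keras" in this session before anything else.
> select dummy();
> -- Per the observation above, either of the two functions can then be called.
> -- The inputs here are made-up sample values.
> select madlib.array_scalar_mult(array[1.0, 2.0]::float8[], 2.0::float8);
> -- madlib_keras_fit() would follow here with its usual arguments.
> {code}
> Reversing the first two statements corresponds to the failing order reported here.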



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)