Posted to issues@madlib.apache.org by "Nikhil (JIRA)" <ji...@apache.org> on 2017/12/08 18:43:00 UTC

[jira] [Commented] (MADLIB-1185) Postgres 10 support for MADlib with large tables

    [ https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284009#comment-16284009 ] 

Nikhil commented on MADLIB-1185:
--------------------------------

The problem seems to be that calling a MADlib C UDF and then running count(*) on foo causes the database to crash.

Even a simple C UDF like poisson_random crashes the database. We even tried commenting out the entire implementation of poisson_random so that it just returns NULL, but it still crashes the database:
{code}
AnyType
poisson_random::run(AnyType &args) {
    return NULL;
}
{code}

Here is the Postgres log:
{code}
libc++abi.dylib: terminating with uncaught exception of type madlib::dbconnector::postgres::PGException: The backend raised an exception.
2017-12-08 10:36:04.632 PST [72270] LOG:  worker process: parallel worker for PID 86327 (PID 86328) was terminated by signal 6: Abort trap
2017-12-08 10:36:04.632 PST [72270] LOG:  terminating any other active server processes
2017-12-08 10:36:04.632 PST [86327] WARNING:  terminating connection because of crash of another server process
2017-12-08 10:36:04.632 PST [86327] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-12-08 10:36:04.632 PST [86327] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-12-08 10:36:04.632 PST [86216] WARNING:  terminating connection because of crash of another server process
2017-12-08 10:36:04.632 PST [86216] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-12-08 10:36:04.632 PST [86216] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-12-08 10:36:04.633 PST [72270] LOG:  all server processes terminated; reinitializing
2017-12-08 10:36:04.642 PST [86330] LOG:  database system was interrupted; last known up at 2017-12-08 10:32:44 PST
2017-12-08 10:36:04.642 PST [86331] FATAL:  the database system is in recovery mode
2017-12-08 10:36:04.729 PST [86330] LOG:  database system was not properly shut down; automatic recovery in progress
2017-12-08 10:36:04.730 PST [86330] LOG:  redo starts at 0/6F83E460
2017-12-08 10:36:04.730 PST [86330] LOG:  invalid record length at 0/6F83E498: wanted 24, got 0
2017-12-08 10:36:04.730 PST [86330] LOG:  redo done at 0/6F83E460
2017-12-08 10:36:04.734 PST [72270] LOG:  database system is ready to accept connections
{code}

Notes
1. We suspect that the root cause lies in the underlying DB abstraction layer.
2. The database crashes only when foo has 98000 or more rows (count(*) of foo >= 98000).
3. A table without the double precision array column does not crash the database, even with 98000+ rows.
4. We also tried altering the storage type for column x of table foo to each of PLAIN, EXTENDED, MAIN and EXTERNAL, but the database still crashed.
5. We also tried taking MADlib out of the equation by writing our own extension in C and then calling count(*) of foo, but this did not crash the database. Hence our suspicion of the DB abstraction layer.

> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
>                 Key: MADLIB-1185
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1185
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: DB Abstraction Layer
>            Reporter: Nikhil
>             Fix For: v1.13
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double precision array column) causes the database to crash.
> Repro Steps
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert 98000 rows like these
>   id   |            x            | y
> -------+-------------------------+---
>  97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Now running any C madlib UDF followed by a count(*) of foo will cause the database to crash
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}
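
The repro steps quoted above can be scripted roughly as follows. This is only a sketch: the file name repro.sql is arbitrary, it assumes table foo does not already exist, and it assumes that filling every row with the sample array from step 2 is close enough to the original data (the issue only shows one example row). The script writes the SQL to a file and does not connect to any database itself.

```shell
#!/bin/sh
# Sketch: write the repro steps from the issue description to repro.sql.
# Assumption: every row reuses the single sample array shown in the issue.
set -eu

cat > repro.sql <<'SQL'
-- Step 1: table definition from the issue description.
CREATE TABLE foo (id integer, x double precision[], y integer);

-- Step 2: 98000 rows shaped like the sample row.
INSERT INTO foo
SELECT i, '{1,0.2,0,1,0,1,0,0,0,0}'::double precision[], 1
FROM generate_series(1, 98000) AS i;

-- Step 3: any MADlib C UDF followed by count(*) of foo.
SELECT madlib.poisson_random(1);
SELECT count(*) FROM foo;
SQL

echo "wrote repro.sql"
```

The file can then be fed to psql against a scratch Postgres 10 database with MADlib installed, for example `psql -d <your_db> -f repro.sql`, where the database name is a placeholder. Expect the server to go into crash recovery if the bug reproduces, so this should not be run against a database that matters.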



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)