Posted to issues@madlib.apache.org by "Nikhil (JIRA)" <ji...@apache.org> on 2018/03/27 00:38:00 UTC

[jira] [Comment Edited] (MADLIB-1136) Getting "ERROR: plpy.SPIError: Function" when calling linregr_train function with big data

    [ https://issues.apache.org/jira/browse/MADLIB-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414823#comment-16414823 ] 

Nikhil edited comment on MADLIB-1136 at 3/27/18 12:37 AM:
----------------------------------------------------------

[~david.chen@wavein.com.tw]
 Hi David,

I recently saw the same problem when running linear regression on a big dataset (500 million rows) and found that increasing *statement_mem* from the default *125 MB* to *175 MB* fixed it. If you still have this issue, try increasing *statement_mem*.

{code}
psql (8.3.23)
Type "help" for help.

testdb=> show statement_mem;
 statement_mem
---------------
 125MB
(1 row)

testdb=> set statement_mem = '175MB';
SET

{code}
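
A minimal sketch of how the setting might be applied to the statement from the issue description, assuming it is changed only for the current session. The 175MB value is the one reported above and may need tuning for other datasets; on Greenplum the value is also expected to stay within *max_statement_mem*, so it is worth checking that limit first.

{code:sql}
-- Check the current value and the upper limit (Greenplum settings).
SHOW statement_mem;
SHOW max_statement_mem;

-- Raise the per-statement memory for this session only.
SET statement_mem = '175MB';

-- Re-run the training call from the issue description.
-- Assumes the output table does not already exist.
SELECT madlib.linregr_train(
    'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
    'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp',
    'average_cqi', 'array[1, prb_utilization]',
    'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');

-- Restore the previous setting afterwards.
RESET statement_mem;
{code}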



was (Author: nkak):
[~david.chen@wavein.com.tw]
Hi David,

I recently saw the same problem with linear regression for a big dataset (500 million rows) and found out that increasing `statement_mem` from the default `125 MB` to `175 MB` fixed the problem. If you still have this issue, try increasing `statement_mem`.

> Getting "ERROR: plpy.SPIError: Function" when calling linregr_train function with big data 
> -------------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1136
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1136
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Linear Regression
>            Reporter: David Chen
>            Priority: Major
>
> Hi MADlib developers,
> We have been trying to use MADlib on Greenplum to run an in-database linear regression on a large dataset (789,626,243 rows, split into ~475,000 groups). However, after the following SQL statement has run for a little more than ten minutes, the error message below appears:
> SQL statement: 
> SELECT madlib.linregr_train(
>     'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
>     'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp', 
>     'average_cqi', 'array[1, prb_utilization]',
>     'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');
> Error message:
> ERROR: plpy.SPIError: Function "madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString improperly aligned for alignment request in seek(). (UDF_impl.hpp:210)  (seg2 59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)
> If we downsize the input data to 269,837,688 rows, the same SQL statement completes successfully.
> We are not sure whether this is a bug or a problem with how we are using the MADlib linear regression function, and we would greatly appreciate any pointers.
> We are willing to provide more information about the input data (e.g. the data schema) for further investigation if needed.
> Thank you very much for looking into this issue.
> David



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)