Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2021/01/16 02:24:00 UTC
[jira] [Commented] (MADLIB-1460) Prevent an "integer out of range" exception in linear regression train
[ https://issues.apache.org/jira/browse/MADLIB-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266483#comment-17266483 ]
Frank McQuillan commented on MADLIB-1460:
-----------------------------------------
(1) BIGINT independent and dependent variables, with values beyond the INTEGER range
Train
{code}
DROP TABLE IF EXISTS tab1;
CREATE TABLE tab1(
    indep_var BIGINT,
    dep_var   BIGINT
);
INSERT INTO tab1 VALUES(100000000000, 100000000000);
INSERT INTO tab1 VALUES(200000000000, 200000000000);
INSERT INTO tab1 VALUES(300000000000, 300000000000);
INSERT INTO tab1 VALUES(400000000000, 400000000000);
INSERT INTO tab1 VALUES(500000000000, 500000000000);
DROP TABLE IF EXISTS test_linregr, test_linregr_summary;
SELECT madlib.linregr_train( 'tab1',
                             'test_linregr',
                             'dep_var',
                             'ARRAY[1, indep_var]'
                           );
{code}
{code}
madlib=# select * from test_linregr_summary;
-[ RECORD 1 ]------------+--------------------
method                   | linregr
source_table             | tab1
out_table                | test_linregr
dependent_varname        | dep_var
independent_varname      | ARRAY[1, indep_var]
num_rows_processed       | 5
num_missing_rows_skipped | 0
grouping_col             |
{code}
{code}
madlib=# select * from test_linregr;
-[ RECORD 1 ]------------+-------------------------
coef                     | {2.72727272727273e-12,1}
r2                       | 1
std_err                  | {0,0}
t_stats                  | {Infinity,Infinity}
p_values                 | {NaN,NaN}
condition_no             | 777817459305.202
num_rows_processed       | 5
num_missing_rows_skipped | 0
variance_covariance      | {{0,0},{0,0}}
{code}
Predict
{code}
madlib=# SELECT madlib.linregr_predict( m.coef, ARRAY[1,indep_var]) as predict FROM tab1, test_linregr m;
   predict
--------------
 300000000000
 500000000000
 100000000000
 200000000000
 400000000000
(5 rows)
{code}
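To check which integer type the summary table's row-count columns use on a given install, the catalog can be queried directly. This is only a sketch: it assumes the test_linregr_summary table created above is the only table of that name visible to the session, and that num_rows_processed is the column the issue refers to. With the proposed change in place, that column would be expected to report bigint.
{code}
-- Inspect the declared types of the row-count columns in the summary table.
-- Assumption: 'test_linregr_summary' is the table created by the Train step above.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'test_linregr_summary'
  AND column_name IN ('num_rows_processed', 'num_missing_rows_skipped');
{code}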
> Prevent an "integer out of range" exception in linear regression train
> ----------------------------------------------------------------------
>
> Key: MADLIB-1460
> URL: https://issues.apache.org/jira/browse/MADLIB-1460
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Linear Regression
> Reporter: Daniel Daniel
> Priority: Minor
> Fix For: v1.18.0
>
>
> Linear regression training produces 2 output tables (*neither is optional*):
> * The *primary* output table, which includes the computed coefficients.
> * A *summary* output table, which contains a single row.
> +Scenario+
> Running linear regression training in PostgreSQL on an input table that has *2^31 or more records* (even if a grouping column is specified) fails with an "*integer out of range*" exception.
> +Source+
> *The summary table* has a column that stores *the total number of records* involved in the computation. The column's data type is a *signed integer*. However, the total number of records is computed as a *BIGINT*. Therefore, when the total number of records in the input table exceeds the maximum value of a signed 32-bit integer (i.e., 2^31 - 1), an "integer out of range" exception is thrown.
> +Solution+
> A simple solution is to change the data type of the column from a *signed integer* to *BIGINT*.
> +Test+
> We have executed the linear regression training function with and without the suggested modification on an input table having between 2^31 and 2^32 records. Without the modification, an "integer out of range" exception was thrown. With the modification applied, training completed successfully.
>
>
>
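The failure mode described in the quoted issue can be reproduced in isolation, without a 2^31-row input table. This is a minimal sketch in plain PostgreSQL: the demo_summary_int and demo_summary_big tables and their num_rows column are hypothetical stand-ins, not MADlib's actual summary table definition.
{code}
-- 2^31 = 2147483648 is one past the INTEGER maximum (2147483647).

-- Hypothetical stand-in for a summary table with an INTEGER row-count column.
CREATE TEMP TABLE demo_summary_int (num_rows INTEGER);
-- Storing a BIGINT count of 2^31 fails with: ERROR: integer out of range
INSERT INTO demo_summary_int VALUES (2147483648::BIGINT);

-- Same stand-in with the column widened to BIGINT, as the issue proposes.
CREATE TEMP TABLE demo_summary_big (num_rows BIGINT);
-- The same count now fits.
INSERT INTO demo_summary_big VALUES (2147483648::BIGINT);
{code}
Run the statements one at a time, or outside a single transaction block, so that the expected error on the first INSERT does not abort the remaining statements.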