You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/08/17 20:32:00 UTC
[jira] [Commented] (MADLIB-1262) Error for elastic net when the
group column contains no-numeric characters
[ https://issues.apache.org/jira/browse/MADLIB-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584353#comment-16584353 ]
Frank McQuillan commented on MADLIB-1262:
-----------------------------------------
Based on commit https://github.com/apache/madlib/commit/ec328dba6853d31df5b1bd6bbdcd35933596fe78
this seems to work now.
For data set:
{code}
DROP TABLE IF EXISTS houses;
CREATE TABLE houses ( id INT,
tax INT,
bedroom INT,
bath FLOAT,
price INT,
size INT,
lot INT,
zipcode TEXT);
INSERT INTO houses (id, tax, bedroom, bath, price, size, lot, zipcode) VALUES
(1 , 590 , 2 , 1 , 50000 , 770 , 22100 , 'x94301'),
(2 , 1050 , 3 , 2 , 85000 , 1410 , 12000 , 'x94301'),
(3 , 20 , 3 , 1 , 22500 , 1060 , 3500 , 'x94301'),
(4 , 870 , 2 , 2 , 90000 , 1300 , 17500 , 'x94301'),
(5 , 1320 , 3 , 2 , 133000 , 1500 , 30000 , 'x94301'),
(6 , 1350 , 2 , 1 , 90500 , 820 , 25700 , 'x94301'),
(7 , 2790 , 3 , 2.5 , 260000 , 2130 , 25000 , 'x94301'),
(8 , 680 , 2 , 1 , 142500 , 1170 , 22000 , 'x94301'),
(9 , 1840 , 3 , 2 , 160000 , 1500 , 19000 , 'x94301'),
(10 , 3680 , 4 , 2 , 240000 , 2790 , 20000 , 'x94301'),
(11 , 1660 , 3 , 1 , 87000 , 1030 , 17500 , 'x94301'),
(12 , 1620 , 3 , 2 , 118600 , 1250 , 20000 , 'x94301'),
(13 , 3100 , 3 , 2 , 140000 , 1760 , 38000 , 'x94301'),
(14 , 2070 , 2 , 3 , 148000 , 1550 , 14000 , 'x94301'),
(15 , 650 , 3 , 1.5 , 65000 , 1450 , 12000 , 'x94301'),
(16 , 770 , 2 , 2 , 91000 , 1300 , 17500 , 'x76010'),
(17 , 1220 , 3 , 2 , 132300 , 1500 , 30000 , 'x76010'),
(18 , 1150 , 2 , 1 , 91100 , 820 , 25700 , 'x76010'),
(19 , 2690 , 3 , 2.5 , 260011 , 2130 , 25000 , 'x76010'),
(20 , 780 , 2 , 1 , 141800 , 1170 , 22000 , 'x76010'),
(21 , 1910 , 3 , 2 , 160900 , 1500 , 19000 , 'x76010'),
(22 , 3600 , 4 , 2 , 239000 , 2790 , 20000 , 'x76010'),
(23 , 1600 , 3 , 1 , 81010 , 1030 , 17500 , 'x76010'),
(24 , 1590 , 3 , 2 , 117910 , 1250 , 20000 , 'x76010'),
(25 , 3200 , 3 , 2 , 141100 , 1760 , 38000 , 'x76010'),
(26 , 2270 , 2 , 3 , 148011 , 1550 , 14000 , 'x76010'),
(27 , 750 , 3 , 1.5 , 66000 , 1450 , 12000 , 'x76010');
{code}
For query:
{code}
DROP TABLE IF EXISTS houses_en1, houses_en1_summary;
SELECT madlib.elastic_net_train( 'houses', -- Source table
'houses_en1', -- Result table
'price', -- Dependent variable
'array[tax, bath, size]', -- Independent variable
'gaussian', -- Regression family
0.5, -- Alpha value
0.1, -- Lambda value
TRUE, -- Standardize
'zipcode', -- Grouping column(s)
'fista', -- Optimizer
'', -- Optimizer parameters
NULL, -- Excluded columns
10000, -- Maximum iterations
1e-6 -- Tolerance value
);
SELECT * FROM houses_en1;
{code}
In 1.15 the error is:
{code}
Done.
(psycopg2.ProgrammingError) spiexceptions.UndefinedColumn: column "x94301" does not exist
LINE 3: (x94301, 'gaussian', '{tax,bath,size}'::text...
^
QUERY:
INSERT INTO houses_en1 VALUES
(x94301, 'gaussian', '{tax,bath,size}'::text[], '{tax,bath,size}'::text[],
'{27.6945611116,11509.0099645,49.094555881}'::double precision[], '{27.6945611116,11509.0099645,49.094555881}'::double precision[],
-11145.5062166, -520358787.006, True, 10000)
CONTEXT: Traceback (most recent call last):
PL/Python function "elastic_net_train", line 24, in <module>
excluded, max_iter, tolerance)
PL/Python function "elastic_net_train", line 469, in elastic_net_train
PL/Python function "elastic_net_train", line 510, in _internal_elastic_net_train
PL/Python function "elastic_net_train", line 24, in _elastic_net_gaussian_fista_train
PL/Python function "elastic_net_train", line 171, in _elastic_net_fista_train
PL/Python function "elastic_net_train", line 297, in _elastic_net_fista_train_compute
PL/Python function "elastic_net_train", line 120, in _elastic_net_generate_result
PL/Python function "elastic_net_train", line 191, in build_output_table
PL/Python function "elastic_net_train"
[SQL: "SELECT madlib.elastic_net_train( 'houses', -- Source table\n 'houses_en1', -- Result table\n 'price', -- Dependent variable\n 'array[tax, bath, size]', -- Independent variable\n 'gaussian', -- Regression family\n 0.5, -- Alpha value\n 0.1, -- Lambda value\n TRUE, -- Standardize\n 'zipcode', -- Grouping column(s)\n 'fista', -- Optimizer\n '', -- Optimizer parameters\n NULL, -- Excluded columns\n 10000, -- Maximum iterations\n 1e-6 -- Tolerance value\n );"]
{code}
But for 1.15.1 with the commit applied, I get
{code}
-[ RECORD 1 ]-----+--------------------------------------------
zipcode | x94301
family | gaussian
features | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero | {27.6945595076,11509.0096164,49.0945593644}
coef_all | {27.6945595076,11509.0096164,49.0945593644}
intercept | -11145.5081132
log_likelihood | -520358783.294
standardize | t
iteration_run | 10000
-[ RECORD 2 ]-----+--------------------------------------------
zipcode | x76010
family | gaussian
features | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero | {14.9802170998,9133.17395416,62.8225354177}
coef_all | {14.9802170998,9133.17395416,62.8225354177}
intercept | 14.7356231281
log_likelihood | -525667072.871
standardize | t
iteration_run | 10000
{code}
> Error for elastic net when the group column contains no-numeric characters
> --------------------------------------------------------------------------
>
> Key: MADLIB-1262
> URL: https://issues.apache.org/jira/browse/MADLIB-1262
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Regularized Regression
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.15.1
>
>
> Minor error for elastic net when the group column contains no-numeric characters.
> It can be fixed by changing the below code in elastic_net_generate_result.py
> {code}
> grouping_info = "\'"+ '\',\''.join([str(res[grp_col.strip()])
> for grp_col in grouping_str1.split(',')
> if grp_col.strip() in res.keys()]) + "\',"
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)