You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/08/17 20:32:00 UTC

[jira] [Commented] (MADLIB-1262) Error for elastic net when the group column contains no-numeric characters

    [ https://issues.apache.org/jira/browse/MADLIB-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584353#comment-16584353 ] 

Frank McQuillan commented on MADLIB-1262:
-----------------------------------------

Based on commit https://github.com/apache/madlib/commit/ec328dba6853d31df5b1bd6bbdcd35933596fe78
this seems to work now.

For data set:

{code}
DROP TABLE IF EXISTS houses;
CREATE TABLE houses ( id INT,
                      tax INT,
                      bedroom INT,
                      bath FLOAT,
                      price INT,
                      size INT,
                      lot INT,
                      zipcode TEXT);

INSERT INTO houses (id, tax, bedroom, bath, price, size, lot, zipcode) VALUES
(1  ,  590 ,       2 ,    1 ,  50000 ,  770 , 22100  , 'x94301'),
(2  , 1050 ,       3 ,    2 ,  85000 , 1410 , 12000  , 'x94301'),
(3  ,   20 ,       3 ,    1 ,  22500 , 1060 ,  3500  , 'x94301'),
(4  ,  870 ,       2 ,    2 ,  90000 , 1300 , 17500  , 'x94301'),
(5  , 1320 ,       3 ,    2 , 133000 , 1500 , 30000  , 'x94301'),
(6  , 1350 ,       2 ,    1 ,  90500 ,  820 , 25700  , 'x94301'),
(7  , 2790 ,       3 ,  2.5 , 260000 , 2130 , 25000  , 'x94301'),
(8  ,  680 ,       2 ,    1 , 142500 , 1170 , 22000  , 'x94301'),
(9  , 1840 ,       3 ,    2 , 160000 , 1500 , 19000  , 'x94301'),
(10 , 3680 ,       4 ,    2 , 240000 , 2790 , 20000  , 'x94301'),
(11 , 1660 ,       3 ,    1 ,  87000 , 1030 , 17500  , 'x94301'),
(12 , 1620 ,       3 ,    2 , 118600 , 1250 , 20000  , 'x94301'),
(13 , 3100 ,       3 ,    2 , 140000 , 1760 , 38000  , 'x94301'),
(14 , 2070 ,       2 ,    3 , 148000 , 1550 , 14000  , 'x94301'),
(15 ,  650 ,       3 ,  1.5 ,  65000 , 1450 , 12000  , 'x94301'),
(16 ,  770 ,       2 ,    2 ,  91000 , 1300 , 17500  , 'x76010'),
(17 , 1220 ,       3 ,    2 , 132300 , 1500 , 30000  , 'x76010'),
(18 , 1150 ,       2 ,    1 ,  91100 ,  820 , 25700  , 'x76010'),
(19 , 2690 ,       3 ,  2.5 , 260011 , 2130 , 25000  , 'x76010'),
(20 ,  780 ,       2 ,    1 , 141800 , 1170 , 22000  , 'x76010'),
(21 , 1910 ,       3 ,    2 , 160900 , 1500 , 19000  , 'x76010'),
(22 , 3600 ,       4 ,    2 , 239000 , 2790 , 20000  , 'x76010'),
(23 , 1600 ,       3 ,    1 ,  81010 , 1030 , 17500  , 'x76010'),
(24 , 1590 ,       3 ,    2 , 117910 , 1250 , 20000  , 'x76010'),
(25 , 3200 ,       3 ,    2 , 141100 , 1760 , 38000  , 'x76010'),
(26 , 2270 ,       2 ,    3 , 148011 , 1550 , 14000  , 'x76010'),
(27 ,  750 ,       3 ,  1.5 ,  66000 , 1450 , 12000  , 'x76010');
{code}

For query:

{code}
DROP TABLE IF EXISTS houses_en1, houses_en1_summary;
SELECT madlib.elastic_net_train( 'houses',                  -- Source table
                                 'houses_en1',              -- Result table
                                 'price',                   -- Dependent variable
                                 'array[tax, bath, size]',  -- Independent variable
                                 'gaussian',                -- Regression family
                                 0.5,                       -- Alpha value
                                 0.1,                       -- Lambda value
                                 TRUE,                      -- Standardize
                                 'zipcode',                 -- Grouping column(s)
                                 'fista',                   -- Optimizer
                                 '',                        -- Optimizer parameters
                                 NULL,                      -- Excluded columns
                                 10000,                     -- Maximum iterations
                                 1e-6                       -- Tolerance value
                               );
SELECT * FROM houses_en1;
{code}

In 1.15 the error is:

{code}
Done.
(psycopg2.ProgrammingError) spiexceptions.UndefinedColumn: column "x94301" does not exist
LINE 3:                 (x94301, 'gaussian', '{tax,bath,size}'::text...
                         ^
QUERY:  
            INSERT INTO houses_en1 VALUES
                (x94301, 'gaussian', '{tax,bath,size}'::text[], '{tax,bath,size}'::text[],
                '{27.6945611116,11509.0099645,49.094555881}'::double precision[], '{27.6945611116,11509.0099645,49.094555881}'::double precision[],
                -11145.5062166, -520358787.006, True, 10000)
            
CONTEXT:  Traceback (most recent call last):
  PL/Python function "elastic_net_train", line 24, in <module>
    excluded, max_iter, tolerance)
  PL/Python function "elastic_net_train", line 469, in elastic_net_train
  PL/Python function "elastic_net_train", line 510, in _internal_elastic_net_train
  PL/Python function "elastic_net_train", line 24, in _elastic_net_gaussian_fista_train
  PL/Python function "elastic_net_train", line 171, in _elastic_net_fista_train
  PL/Python function "elastic_net_train", line 297, in _elastic_net_fista_train_compute
  PL/Python function "elastic_net_train", line 120, in _elastic_net_generate_result
  PL/Python function "elastic_net_train", line 191, in build_output_table
PL/Python function "elastic_net_train"
 [SQL: "SELECT madlib.elastic_net_train( 'houses',                  -- Source table\n                                 'houses_en1',              -- Result table\n                                 'price',                   -- Dependent variable\n                                 'array[tax, bath, size]',  -- Independent variable\n                                 'gaussian',                -- Regression family\n                                 0.5,                       -- Alpha value\n                                 0.1,                       -- Lambda value\n                                 TRUE,                      -- Standardize\n                                 'zipcode',                 -- Grouping column(s)\n                                 'fista',                   -- Optimizer\n                                 '',                        -- Optimizer parameters\n                                 NULL,                      -- Excluded columns\n                                 10000,                     -- Maximum iterations\n                                 1e-6                       -- Tolerance value\n                               );"]
{code}

But for 1.15.1 with the commit applied, I get

{code}
-[ RECORD 1 ]-----+--------------------------------------------
zipcode           | x94301
family            | gaussian
features          | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero      | {27.6945595076,11509.0096164,49.0945593644}
coef_all          | {27.6945595076,11509.0096164,49.0945593644}
intercept         | -11145.5081132
log_likelihood    | -520358783.294
standardize       | t
iteration_run     | 10000
-[ RECORD 2 ]-----+--------------------------------------------
zipcode           | x76010
family            | gaussian
features          | {tax,bath,size}
features_selected | {tax,bath,size}
coef_nonzero      | {14.9802170998,9133.17395416,62.8225354177}
coef_all          | {14.9802170998,9133.17395416,62.8225354177}
intercept         | 14.7356231281
log_likelihood    | -525667072.871
standardize       | t
iteration_run     | 10000
{code}

> Error for elastic net when the group column contains no-numeric characters
> --------------------------------------------------------------------------
>
>                 Key: MADLIB-1262
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1262
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Regularized Regression
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.15.1
>
>
> Minor error for elastic net when the group column contains no-numeric characters.  
> It can be fixed by changing the below code in elastic_net_generate_result.py
> {code}
>     grouping_info = "\'"+ '\',\''.join([str(res[grp_col.strip()])
>                                       for grp_col in grouping_str1.split(',')
>                                       if grp_col.strip() in res.keys()]) + "\',"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)