Posted to issues@madlib.apache.org by "Rahul Iyer (JIRA)" <ji...@apache.org> on 2016/08/09 16:52:20 UTC
[jira] [Created] (MADLIB-1013) Add array output to create_indicator_variables
Rahul Iyer created MADLIB-1013:
----------------------------------
Summary: Add array output to create_indicator_variables
Key: MADLIB-1013
URL: https://issues.apache.org/jira/browse/MADLIB-1013
Project: Apache MADlib
Issue Type: Improvement
Components: Module: Utilities
Reporter: Rahul Iyer
Feature request from Satoshi Nagayasu <sn...@uptime.jp>
---------------------------------------------------------------------------------------
I'm trying to use create_indicator_variables() to encode categorical variables.
https://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
I found that PostgreSQL limits the number of entries in a SELECT list
(called the target list in PostgreSQL) to 1664.
You may see this error when a variable has more than 1664 categories:
spiexceptions.ProgramLimitExceeded: target lists can have at most 1664 entries
I'm now considering using PostgreSQL arrays to hold the indicators instead of
allocating a single column per category.
If create_indicator_variables() supported arrays as its output, it would
allow us to handle categorical variables that have more than 1664 categories.
I would also like to use the sparse vector type to compress them.
https://madlib.incubator.apache.org/docs/latest/group__grp__svec.html
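To illustrate the idea, here is a rough sketch in plain Python (not MADlib code, and not a proposed API). It encodes each value as a single 0/1 array rather than one column per category, then compresses the array with run-length encoding, which is the general approach sparse vectors rely on. The function names, the synthetic category labels, and the (run lengths, run values) layout are all assumptions made for this example.

```python
from itertools import groupby

# Limit quoted in the PostgreSQL error message above.
MAX_TARGET_LIST = 1664


def indicator_array(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`.

    `categories` is assumed to be an ordered list of distinct labels.
    """
    row = [0] * len(categories)
    row[categories.index(value)] = 1
    return row


def run_length_encode(row):
    """Compress a list into (run_lengths, run_values)."""
    runs = [(len(list(group)), key) for key, group in groupby(row)]
    return [n for n, _ in runs], [v for _, v in runs]


# Hypothetical variable with 2000 distinct categories: too many for one
# column per category, but no problem for a single array-valued column.
categories = [f"cat_{i:04d}" for i in range(2000)]
assert len(categories) > MAX_TARGET_LIST

row = indicator_array("cat_0005", categories)
lengths, values = run_length_encode(row)
print(len(row), lengths, values)
```

With an array output, the width of the result table no longer grows with the number of categories, and the run-length form stores only three runs per row regardless of how many categories exist.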