Posted to issues@madlib.apache.org by "Rahul Iyer (JIRA)" <ji...@apache.org> on 2016/08/09 16:52:20 UTC

[jira] [Updated] (MADLIB-1013) Add array output to create_indicator_variables

     [ https://issues.apache.org/jira/browse/MADLIB-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Iyer updated MADLIB-1013:
-------------------------------
         Assignee: Rahul Iyer
    Fix Version/s: v1.9.2

> Add array output to create_indicator_variables
> ----------------------------------------------
>
>                 Key: MADLIB-1013
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1013
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Utilities
>            Reporter: Rahul Iyer
>            Assignee: Rahul Iyer
>             Fix For: v1.9.2
>
>
> Feature request from Satoshi Nagayasu <sn...@uptime.jp>
> ---------------------------------------------------------------------------------------
> I'm trying to use create_indicator_variables() to encode categorical variables.
> https://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
> I found that PostgreSQL limits the number of entries in a SELECT list
> (called the target list in PostgreSQL) to at most 1664.
> You may see the following error when a variable has more than 1664 categories:
> spiexceptions.ProgramLimitExceeded: target lists can have at most 1664 entries
> Now, I'm considering using PostgreSQL arrays to hold the indicators instead of
> allocating a single column per category.
> If create_indicator_variables() supported arrays as its output, we could
> handle categorical variables with more than 1664 categories. Ideally, I would also like to store the indicators as sparse vectors (svec) to compress them.
> https://madlib.incubator.apache.org/docs/latest/group__grp__svec.html
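
One way to picture the proposal: instead of emitting one indicator column per category, each row carries a single array whose length equals the number of categories, so the whole encoding occupies one target-list entry. The Python sketch below illustrates that idea under stated assumptions. build_category_index and encode_as_arrays are hypothetical names, not MADlib functions, and the sorted category order is an arbitrary choice made here for determinism.

```python
# Sketch only: array-valued indicator encoding for one categorical column.
# build_category_index / encode_as_arrays are hypothetical names, not MADlib API.
from typing import Dict, List, Sequence, Tuple


def build_category_index(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to a fixed array position (sorted order)."""
    return {cat: pos for pos, cat in enumerate(sorted(set(values)))}


def encode_as_arrays(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the category order plus one 0/1 indicator array per input row.

    Each row gets a single array, so the output needs one column (one
    target-list entry) no matter how many categories there are.
    """
    index = build_category_index(values)
    categories = sorted(index, key=index.__getitem__)
    rows: List[List[int]] = []
    for v in values:
        indicators = [0] * len(index)
        indicators[index[v]] = 1
        rows.append(indicators)
    return categories, rows


if __name__ == "__main__":
    cats, rows = encode_as_arrays(["red", "green", "red", "blue"])
    print(cats)     # ['blue', 'green', 'red']
    print(rows[0])  # [0, 0, 1]
    # 2000 categories: still one array per row, not 2000 separate columns.
    _, wide = encode_as_arrays([f"c{i}" for i in range(2000)])
    print(len(wide[0]))  # 2000
```

In SQL terms, the per-row array would live in a single column (for example a PostgreSQL integer[] or an svec), which sidesteps the 1664-entry target-list limit. The category order has to be kept alongside the arrays, because an array position means nothing without it.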

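The sparse-vector part of the request relies on the fact that a one-hot row is almost entirely zeros. As we read the svec documentation linked above, MADlib's svec type compresses runs of repeated values and prints them as "{counts}:{values}". The Python sketch below imitates that run-length idea for a single indicator row. It does not call MADlib, the function names are hypothetical, and the exact text format is an assumption based on the svec docs.

```python
# Sketch only: run-length encoding of a 0/1 indicator array, printed in the
# "{counts}:{values}" style of MADlib's svec text form (our reading of the
# svec docs). Function names are hypothetical; nothing here calls MADlib.
from typing import List, Sequence, Tuple


def run_length_encode(values: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Collapse runs of equal adjacent values into parallel (counts, values) lists."""
    counts: List[int] = []
    distinct: List[int] = []
    for v in values:
        if distinct and distinct[-1] == v:
            counts[-1] += 1  # extend the current run
        else:
            distinct.append(v)  # start a new run
            counts.append(1)
    return counts, distinct


def run_length_decode(counts: Sequence[int], distinct: Sequence[int]) -> List[int]:
    """Expand (counts, values) back into the dense array."""
    return [v for c, v in zip(counts, distinct) for _ in range(c)]


def to_svec_style_text(values: Sequence[int]) -> str:
    """Format the run-length encoding as "{counts}:{values}"."""
    counts, distinct = run_length_encode(values)
    return "{%s}:{%s}" % (",".join(map(str, counts)), ",".join(map(str, distinct)))


if __name__ == "__main__":
    dense = [0] * 2000
    dense[1500] = 1  # row whose category sits at position 1500
    print(to_svec_style_text(dense))  # {1500,1,499}:{0,1,0}
```

However many categories there are, a one-hot row has at most three runs (zeros, the single 1, zeros), so its encoded size stays constant while the dense array grows linearly with the number of categories.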


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)