Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/05/25 20:12:00 UTC

[jira] [Assigned] (MADLIB-1240) Vector to Columns

     [ https://issues.apache.org/jira/browse/MADLIB-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan reassigned MADLIB-1240:
---------------------------------------

    Assignee: Himanshu Pandey

> Vector to Columns
> -----------------
>
>                 Key: MADLIB-1240
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1240
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Major
>             Fix For: v1.15
>
>
> Related to https://issues.apache.org/jira/browse/MADLIB-1239
> Vector to Columns
> Converts a feature array held in a single column of a table into multiple columns, one per array element.  This can be used to reverse the function cols2vec.
> {code}
> vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     dictionary,
>     cols_to_output
>     )
> source_table
> TEXT. Name of the table containing the source data.
> out_table
> TEXT. Name of the generated table containing the output. If a table with the same name already exists, an error will be returned. 
> vector_col
> TEXT.  Name of the column containing the feature array.  Must be a one-dimensional array.
> dictionary (optional)
> TEXT. Name of the table containing the array of names associated with the feature array.  This table is created by the function 'cols2vec'.  If the dictionary table is not specified, column names are generated automatically in the form 'feature_1', 'feature_2', ..., 'feature_n'.
> cols_to_output (optional)
> TEXT, default NULL. Comma-separated string of column names from the source table to keep in the output table, in addition to the feature columns.  To keep all columns from the source table, use '*'.
> Output
> The output table produced by the vec2cols function contains the following columns:
> <...>
> Columns from source table, depending on which ones are kept (if any).
> feature columns
> One column for each feature in 'vector_col'.  The column type depends on the feature array type in the source table.  The column names depend on whether the parameter 'dictionary' is used.
> {code}
> Notes
> (1)
> The function at
> http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
> is similar, but the proposed MADlib function has more options.  The equivalent of the PDL Tools call in MADlib would be:
> {code}
> vec2cols(
>     table_name,
>     output_table,
>     vector_column,
>     NULL,
>     '*'
>     )
> {code}
> (2)
> Please put the generated feature columns on the right side of the output table, i.e., they will be the last columns on the right.  Maintain the order of the array.
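
The behaviour requested above (default 'feature_n' naming, the optional cols_to_output list, and feature columns placed last in array order) can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the MADlib implementation: rows are plain Python dicts, the helper name vec2cols_sketch and its feature_names argument (standing in for the contents of the dictionary table) are hypothetical, and '*' is read literally as "keep every source column", including the vector column itself.

```python
# Hypothetical in-memory model of the proposed vec2cols behaviour.
# Nothing here is MADlib code; names are chosen for illustration only.

def vec2cols_sketch(rows, vector_col, feature_names=None, cols_to_output=None):
    """Expand rows[i][vector_col] (a 1-D list) into one column per element."""
    if not rows:
        return []
    width = len(rows[0][vector_col])

    # Without a dictionary, generate feature_1 ... feature_n.
    if feature_names is None:
        feature_names = [f"feature_{i}" for i in range(1, width + 1)]
    if len(feature_names) != width:
        raise ValueError("need exactly one name per array element")

    # Decide which source columns to carry over.
    if cols_to_output is None:
        keep = []
    elif cols_to_output.strip() == "*":
        keep = list(rows[0].keys())  # assumption: '*' keeps all columns
    else:
        keep = [c.strip() for c in cols_to_output.split(",") if c.strip()]

    out = []
    for row in rows:
        vec = row[vector_col]
        if len(vec) != width:
            raise ValueError("all feature arrays must have the same length")
        new_row = {c: row[c] for c in keep}
        # Feature columns go last (rightmost), in array order.
        new_row.update(zip(feature_names, vec))
        out.append(new_row)
    return out


rows = [
    {"id": 1, "label": "a", "feat": [1.0, 2.0, 3.0]},
    {"id": 2, "label": "b", "feat": [4.0, 5.0, 6.0]},
]
result = vec2cols_sketch(rows, "feat", cols_to_output="id")
print(result[0])
# {'id': 1, 'feature_1': 1.0, 'feature_2': 2.0, 'feature_3': 3.0}
```

Passing feature_names=["x", "y", "z"] in place of the default mimics supplying a dictionary table, and cols_to_output="*" mirrors the PDL Tools equivalent call shown in note (1).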



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)