Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2016/11/21 20:34:59 UTC

[jira] [Comment Edited] (MADLIB-947) Support grouping for PCA

    [ https://issues.apache.org/jira/browse/MADLIB-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684648#comment-15684648 ] 

Frank McQuillan edited comment on MADLIB-947 at 11/21/16 8:34 PM:
------------------------------------------------------------------

1) Interface

grouping_cols (optional)
{code}
TEXT, default: NULL. An expression list used to group the input dataset into discrete groups, running one model per group. Similar to the SQL "GROUP BY" clause. When this value is NULL, no grouping is used and a single model is generated.
{code}
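
To make the "one model per group" behaviour concrete, here is a minimal Python sketch. It is an illustration only, not MADlib code: split_by_group, train_per_group and the fit callable are hypothetical names, and the sketch handles plain column names rather than a full SQL expression list.

```python
from collections import defaultdict


def split_by_group(rows, grouping_cols=None):
    """Partition rows (dicts) by the values of grouping_cols.

    grouping_cols=None mimics the NULL default: every row lands in a
    single group keyed by the empty tuple, so one model is produced.
    """
    if not grouping_cols:
        return {(): list(rows)}
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in grouping_cols)
        groups[key].append(row)
    return dict(groups)


def train_per_group(rows, fit, grouping_cols=None):
    """Call fit once per group; fit is a stand-in for PCA training."""
    groups = split_by_group(rows, grouping_cols)
    return {key: fit(members) for key, members in groups.items()}


rows = [
    {"row_id": 1, "region": "east", "row_vec": [1.0, 2.0]},
    {"row_id": 2, "region": "east", "row_vec": [3.0, 4.0]},
    {"row_id": 3, "region": "west", "row_vec": [5.0, 6.0]},
]

# len is a placeholder "model" that just counts the rows in each group.
models = train_per_group(rows, fit=len, grouping_cols=["region"])
single_model = train_per_group(rows, fit=len)
```

With grouping_cols set, models has one entry per distinct region; without it, single_model has exactly one entry covering all rows.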

IF dense 1

* specified: row_id and grouping_cols
* inferred:  row_vec is a numeric array defining the matrix values (to be cast to FLOAT8[])
* errors: throw an error if the input table contains more than one numeric array column
* ignore any other columns that do not affect above logic

IF dense 2

* specified: row_id and grouping_cols
* inferred:  numeric columns become matrix values (each to be cast to FLOAT8)
* ignore any other columns that do not affect above logic

IF sparse

* specified:  everything that is needed - row_id, col_id, val_id, grouping_cols
* ignore any other columns that do not affect above logic
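
The inference rules for the two dense formats could be sketched roughly as follows. This is a hedged illustration in Python, not the MADlib implementation: the function names, the type-name sets and the schema dictionary (column name mapped to type name) are all assumptions, and a real implementation would read column types from the database catalog. Raising an error when no numeric array column is found is also an assumption; the comment above only covers the "more than one" case.

```python
# Hypothetical type-name sets used only for this sketch.
NUMERIC_TYPES = {
    "smallint", "integer", "bigint", "real", "double precision", "numeric",
}
NUMERIC_ARRAY_TYPES = {t + "[]" for t in NUMERIC_TYPES}


def infer_dense1_row_vec(schema, row_id, grouping_cols=()):
    """Dense 1: find the single numeric array column holding the matrix rows."""
    specified = {row_id, *grouping_cols}
    arrays = [
        col for col, col_type in schema.items()
        if col not in specified and col_type in NUMERIC_ARRAY_TYPES
    ]
    if len(arrays) > 1:
        raise ValueError("more than one numeric array column: %s" % arrays)
    if not arrays:
        # Assumption: an input with no candidate column is also rejected.
        raise ValueError("no numeric array column found")
    return arrays[0]


def infer_dense2_columns(schema, row_id, grouping_cols=()):
    """Dense 2: every remaining numeric scalar column becomes a matrix column."""
    specified = {row_id, *grouping_cols}
    return [
        col for col, col_type in schema.items()
        if col not in specified and col_type in NUMERIC_TYPES
    ]


dense1_schema = {
    "row_id": "integer", "grp": "text",
    "row_vec": "double precision[]", "note": "text",
}
dense2_schema = {
    "row_id": "integer", "grp": "text",
    "x": "double precision", "y": "integer", "label": "text",
}

row_vec_col = infer_dense1_row_vec(dense1_schema, "row_id", ["grp"])
matrix_cols = infer_dense2_columns(dense2_schema, "row_id", ["grp"])
```

In both cases the specified columns (row_id and grouping_cols) are excluded before inference, and non-numeric columns such as note and label are ignored.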


2) Performance

Please use the group iteration controller so that grouping is handled by the query processor, rather than in a straight for-loop over the groups, which would be slow.
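
The difference between the two approaches can be illustrated with a small Python sketch. It does not use the group iteration controller itself: SQLite from the Python standard library stands in for the database, the table mat and its columns are made up, and per-group column means stand in for a single step of the per-group computation. The loop version issues one query per group, while the grouped version lets the query processor handle all groups in one pass.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table with a grouping column and two value columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mat (row_id INTEGER, grp TEXT, x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO mat VALUES (?, ?, ?, ?)",
        [
            (1, "a", 1.0, 2.0),
            (2, "a", 3.0, 4.0),
            (3, "b", 10.0, 20.0),
            (4, "b", 30.0, 40.0),
        ],
    )
    return conn


def means_with_loop(conn):
    """Straight for-loop: one extra query per group."""
    groups = [r[0] for r in conn.execute("SELECT DISTINCT grp FROM mat")]
    result = {}
    for g in groups:
        result[g] = conn.execute(
            "SELECT AVG(x), AVG(y) FROM mat WHERE grp = ?", (g,)
        ).fetchone()
    return result


def means_grouped(conn):
    """Single query: the database engine does the grouping."""
    query = "SELECT grp, AVG(x), AVG(y) FROM mat GROUP BY grp"
    return {g: (mx, my) for g, mx, my in conn.execute(query)}


conn = make_demo_db()
loop_result = means_with_loop(conn)
grouped_result = means_grouped(conn)
```

Both functions return the same per-group values; the grouped version needs one statement regardless of how many groups exist, whereas the loop version needs one statement per group plus one to list the groups.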





was (Author: fmcquillan):

1) Interface

{code}
grouping_cols (optional)
TEXT, default: NULL. An expression list used to group the input dataset into discrete groups, running one model per group. Similar to the SQL "GROUP BY" clause. When this value is NULL, no grouping is used and a single model is generated.
{code}

IF dense 1

* specified: row_id and grouping_cols
* inferred:  row_vec is a numeric array defining the matrix values (to be cast to FLOAT8[])
* errors: throw an error if the input table contains more than one numeric array column
* ignore any other columns that do not affect above logic

IF dense 2

* specified: row_id and grouping_cols
* inferred:  numeric columns become matrix values (each to be cast to FLOAT8)
* ignore any other columns that do not affect above logic

IF sparse

* specified:  everything that is needed - row_id, col_id, val_id, grouping_cols
* ignore any other columns that do not affect above logic


2) Performance

Please use the group iteration controller so that grouping is handled by the query processor, rather than in a straight for-loop over the groups, which would be slow.




> Support grouping for PCA
> ------------------------
>
>                 Key: MADLIB-947
>                 URL: https://issues.apache.org/jira/browse/MADLIB-947
>             Project: Apache MADlib
>          Issue Type: New Feature
>            Reporter: Frank McQuillan
>             Fix For: v1.10
>
>
> Implement grouping support in PCA
> http://doc.madlib.net/latest/group__grp__pca__train.html#train



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)