You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@madlib.apache.org by "Rashmi Raghu (JIRA)" <ji...@apache.org> on 2017/04/07 00:31:41 UTC

[jira] [Comment Edited] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

    [ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960062#comment-15960062 ] 

Rashmi Raghu edited comment on MADLIB-1086 at 4/7/17 12:30 AM:
---------------------------------------------------------------

The proposed approach for unnesting 2D arrays into a set of 1D array is a SQL function is similar to one of the approaches here [http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998] : 
{code:sql}
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
  SELECT array_agg($1[series2.i][series2.x]) FROM
    (SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
     FROM 
     (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1 
    ) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}

However, this does not preserve element ordering within the resulting 1D arrays in Greenplum. E.g.:
{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}
{code:sql}
 unnest_multidim 
-----------------
 {2,1}
 {4,3}
 {5,6}
(3 rows)
{code}

Including 'order by' within the array_agg function call fixes this issue:
{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x)
{code}

One other aspect that needs to be added is an ID / bookkeeping column to note which position in the 2D array each 1D array corresponds to.

-- Rashmi


was (Author: rashmi.raghu@gmail.com):
The proposed approach for unnesting 2D arrays into a set of 1D array is a SQL function is similar to one of the approaches here [http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998] : 
{code:sql}
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
  SELECT array_agg($1[series2.i][series2.x]) FROM
    (SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
     FROM 
     (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1 
    ) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}

However, this does not preserve element ordering within the resulting 1D arrays in Greenplum. E.g.:
{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}
{code:sql}
 unnest_multidim 
-----------------
 {2,1}
 {4,3}
 {5,6}
(3 rows)
{code}

Including 'order by' within the array_agg function call fixes this issue:
{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x
{code}

One other aspect that needs to be added is an ID / bookkeeping column to note which position in the 2D array each 1D array corresponds to.

-- Rashmi

> Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
> ------------------------------------------------------------
>
>                 Key: MADLIB-1086
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1086
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Rashmi Raghu
>            Priority: Minor
>             Fix For: v1.11
>
>
> Context
> Currently k-means returns the following
> {code}
> centroids        | {{13.7533333333333,1.905,2.425,16.0666666666667,90.3333333333333,2.805,2.98,0.29,2.005,5.40663333333333,1.04166666666667, 3.31833333333333,1020.83333333333},
>                    {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
> cluster_variance | {122999.110416013,30561.74805}
> objective_fn     | 153560.858466013
> frac_reassigned  | 0
> num_iterations   | 3
> {code}
> Story
> As a data scientist, I want to unnest 2-D array by one level (i.e. into rows of 1-D arrays) in K-means, so that I can get one centroid per row for follow on operations.
> Acceptance
> 1) Add function to array operations
> http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
> 2) Add an example in k-means
>  http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> to demonstrate usage



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)