You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@madlib.apache.org by "Rashmi Raghu (JIRA)" <ji...@apache.org> on 2017/04/07 00:31:41 UTC
[jira] [Comment Edited] (MADLIB-1086) Unnest 2-D array by one level
(i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960062#comment-15960062 ]
Rashmi Raghu edited comment on MADLIB-1086 at 4/7/17 12:30 AM:
---------------------------------------------------------------
The proposed approach for unnesting 2D arrays into a set of 1D array is a SQL function is similar to one of the approaches here [http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998] :
{code:sql}
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
SELECT array_agg($1[series2.i][series2.x]) FROM
(SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
FROM
(SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1
) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}
However, this does not preserve element ordering within the resulting 1D arrays in Greenplum. E.g.:
{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}
{code:sql}
unnest_multidim
-----------------
{2,1}
{4,3}
{5,6}
(3 rows)
{code}
Including 'order by' within the array_agg function call fixes this issue:
{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x)
{code}
One other aspect that needs to be added is an ID / bookkeeping column to note which position in the 2D array each 1D array corresponds to.
-- Rashmi
was (Author: rashmi.raghu@gmail.com):
The proposed approach for unnesting 2D arrays into a set of 1D array is a SQL function is similar to one of the approaches here [http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998] :
{code:sql}
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
SELECT array_agg($1[series2.i][series2.x]) FROM
(SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
FROM
(SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1
) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}
However, this does not preserve element ordering within the resulting 1D arrays in Greenplum. E.g.:
{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}
{code:sql}
unnest_multidim
-----------------
{2,1}
{4,3}
{5,6}
(3 rows)
{code}
Including 'order by' within the array_agg function call fixes this issue:
{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x
{code}
One other aspect that needs to be added is an ID / bookkeeping column to note which position in the 2D array each 1D array corresponds to.
-- Rashmi
> Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
> ------------------------------------------------------------
>
> Key: MADLIB-1086
> URL: https://issues.apache.org/jira/browse/MADLIB-1086
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Rashmi Raghu
> Priority: Minor
> Fix For: v1.11
>
>
> Context
> Currently k-means returns the following
> {code}
> centroids | {{13.7533333333333,1.905,2.425,16.0666666666667,90.3333333333333,2.805,2.98,0.29,2.005,5.40663333333333,1.04166666666667, 3.31833333333333,1020.83333333333},
> {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
> cluster_variance | {122999.110416013,30561.74805}
> objective_fn | 153560.858466013
> frac_reassigned | 0
> num_iterations | 3
> {code}
> Story
> As a data scientist, I want to unnest 2-D array by one level (i.e. into rows of 1-D arrays) in K-means, so that I can get one centroid per row for follow on operations.
> Acceptance
> 1) Add function to array operations
> http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
> 2) Add an example in k-means
> http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> to demonstrate usage
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)