You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/03/16 08:23:14 UTC

[jira] [Resolved] (PHOENIX-309) Support ARRAY data type

     [ https://issues.apache.org/jira/browse/PHOENIX-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid resolved PHOENIX-309.
----------------------------------

    Resolution: Fixed

Bulk resolve of closed issues imported from GitHub. This status was reached by first re-opening all closed imported issues and then resolving them in bulk.

> Support ARRAY data type
> -----------------------
>
>                 Key: PHOENIX-309
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-309
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0-Release
>            Reporter: James Taylor
>            Assignee: ramkrishna
>              Labels: enhancement
>
> Benchmark outline from Sudarshan Kadambi which describes a good use case for supporting an ARRAY data type. Supporting an ARRAY data type, an UNNEST built-in function, plus derived tables would allow the following standard query to be used (as opposed to creating custom aggregate functions):
> select avg(v) 
> from (select unnest(value) v from t
>          where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1')
> It'd be nice if you could just do an average over an array directly, but this would be non standard.
> On 04/26/2013 11:17 AM, Sudarshan Kadambi (BLOOMBERG) wrote:
> Hi James:
> Yes, I saw the email. Thank you for this generous offer. I wanted some time to make sure the benchmark correctly represents my use case. 
> If you wish, here's a benchmark setup you could use:
> 1. 1 Billion keys in table
> 2. Query contains 250K randomly chosen object_ids and a single field_type (F1)
> 3. Aggregation done on a single attribute (A1)
> 4. Query:
>    select avg(value) where object_id in (O1,O2,...O250K) and field_type = 'F1' and attrib_id = 'A1'
> We would want the test done with and without the skip scan filter for the purpose of comparison.
> The reason why I wanted some time to think about it is that the values within each attribute is a JSON number array. So an avg across 2 values is a average of the averages.
> For e.g. Object_id=O1, Field_type=F1, Attrib_id=A1, Value 1: {1,2,3,4,5,}
>          Object_id=O2, Field_type=F1, Attrib_id=A1, Value 2: {1,2,3,4,5}
> The query should produce: Avg{Avg{Value1}, Avg{Value2}} = Avg{3,3} = 3
> If we were doing a sum, the query would produce: Sum{Avg{Value1}, Avg{Value2} = Sum{3,3} = 6.
> This might require customization of the aggregate function code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)