Posted to user@hive.apache.org by Eric Jiang <Er...@autodesk.com> on 2012/10/29 11:01:10 UTC

UDAF issue - how to combine array data into one array

Hi All,

I am researching some ways to combine array data with a UDAF. The raw data table schema is listed here:

CREATE TABLE IF NOT EXISTS array_data (session_id string, properties array<struct<name : string, value : string>>);

I would like to aggregate this table with a UDAF named "array_combine":

SELECT session_id, array_combine(properties) as combined_properties
FROM array_data
GROUP BY session_id;

For example, suppose the array_data table has two records:

session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}]
session_id1, [{"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]

After combining, the result should be a single record:

session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}, {"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]
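
To make the intended semantics concrete, here is a minimal plain-Java sketch that does not depend on Hive's classes. The names Property, CombineBuffer and ArrayCombineSketch are hypothetical stand-ins for the struct element, the aggregation buffer and the evaluator; they are not Hive APIs. It only shows what I expect "iterate" and "merge" to do with already-materialized lists:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; none of these types come from Hive.
final class ArrayCombineSketch {

    // Hypothetical stand-in for one struct<name:string, value:string> element.
    static final class Property {
        final String name;
        final String value;

        Property(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return "{\"name\":\"" + name + "\",\"value\":\"" + value + "\"}";
        }
    }

    // Hypothetical stand-in for the UDAF's aggregation buffer.
    static final class CombineBuffer {
        final List<Property> combined = new ArrayList<>();
    }

    // Called once per input row: append that row's array to the buffer.
    static void iterate(CombineBuffer agg, List<Property> row) {
        if (row != null) {
            agg.combined.addAll(row);
        }
    }

    // Called with a partial result from another task: append it as well.
    static void merge(CombineBuffer agg, List<Property> partial) {
        if (partial != null) {
            agg.combined.addAll(partial);
        }
    }

    // Return a copy so callers cannot mutate the buffer afterwards.
    static List<Property> terminate(CombineBuffer agg) {
        return new ArrayList<>(agg.combined);
    }

    // Simulates the two example records being aggregated by two separate tasks.
    static List<Property> demo() {
        CombineBuffer first = new CombineBuffer();
        iterate(first, Arrays.asList(new Property("aaa", "111"), new Property("bbb", "222")));

        CombineBuffer second = new CombineBuffer();
        iterate(second, Arrays.asList(new Property("ccc", "333"), new Property("ddd", "444")));

        merge(first, terminate(second));
        return terminate(first);
    }

    public static void main(String[] args) {
        System.out.println("session_id1, " + demo());
    }
}
```

In the real UDAF the inputs are not java.util.List instances, which is exactly where I am stuck.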


However, when I debug the UDAF, the "iterate" and "merge" functions receive LazyArray objects as their parameters:

public void iterate(AggregationBuffer agg, Object[] parameters)
public void merge(AggregationBuffer agg, Object partial)


I have two questions:


(1)    Why is the object not an ArrayList? In the "init" function I checked the input ObjectInspector, and it is a StandardListObjectInspector:

public ObjectInspector init(Mode m, ObjectInspector[] parameters)


(2)    Is there an easy way to combine two LazyArray objects into one in the "iterate" and "merge" functions? It seems that I have to create a new LazyArray object, but I don't know the separator, nullSequence, and escapeChar values of the original LazyArray object, and I also don't know enough to build a LazyArray for a complex type such as array<struct<name : string, value : string>>.

Can anyone help? Thanks in advance.


Best Regards,
Eric