Posted to user@hive.apache.org by Eric Jiang <Er...@autodesk.com> on 2012/10/29 11:01:10 UTC
UDAF issue - how to combine array data into one array
Hi All,
I am looking for a way to combine array data using a UDAF. The raw data table schema is:
CREATE TABLE IF NOT EXISTS array_data (session_id string, properties array<struct<name : string, value : string>>);
I would like to run the following query with a UDAF named "array_combine":
SELECT session_id, array_combine(properties) as combined_properties
FROM array_data
GROUP BY session_id;
For example, suppose the array_data table has two records:
session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}]
session_id1, [{"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]
After combining, the result should be a single record:
session_id1, [{"name":"aaa","value":"111"}, {"name":"bbb","value":"222"}, {"name":"ccc","value":"333"}, {"name":"ddd","value":"444"}]
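To pin down the intended semantics, here is a plain-Java sketch (no Hive dependencies) of what array_combine should compute: concatenate the properties arrays of all rows that share a session_id. Property and Row are hypothetical stand-ins for struct<name : string, value : string> and for one row of array_data. The sketch keeps elements in the order the rows are seen; Hive's GROUP BY does not in general guarantee row order within a group, so a real UDAF may emit the elements in a different order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArrayCombineSemantics {
    // Hypothetical stand-in for struct<name : string, value : string>.
    record Property(String name, String value) {}

    // Hypothetical stand-in for one row of array_data: (session_id, properties).
    record Row(String sessionId, List<Property> properties) {}

    // Concatenates the properties arrays of all rows sharing a session_id,
    // keeping the order in which rows are encountered.
    static Map<String, List<Property>> combine(List<Row> rows) {
        Map<String, List<Property>> out = new LinkedHashMap<>();
        for (Row row : rows) {
            out.computeIfAbsent(row.sessionId(), k -> new ArrayList<>())
               .addAll(row.properties());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("session_id1", List.of(new Property("aaa", "111"), new Property("bbb", "222"))),
            new Row("session_id1", List.of(new Property("ccc", "333"), new Property("ddd", "444"))));
        List<Property> combined = combine(rows).get("session_id1");
        System.out.println(combined.size()); // prints 4
    }
}
```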
However, when I debug the UDAF, the "iterate" and "merge" functions receive LazyArray objects as their parameters:
public void iterate(AggregationBuffer agg, Object[] parameters)
public void merge(AggregationBuffer agg, Object partial)
I have two questions:
(1) Why is the object not an ArrayList? In the "init" function I checked the input ObjectInspector, and it is a StandardListObjectInspector:
public ObjectInspector init(Mode m, ObjectInspector[] parameters)
(2) Is there an easy way to combine two LazyArray objects into one in the "iterate" and "merge" functions? It seems that I would have to create a new LazyArray object, but I don't know the separator, nullSequence, and escapeChar values used by the original LazyArray object, and I also don't know enough to build a LazyArray for a complex type (array<struct<name : string, value : string>>).
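One possible way around both questions, sketched here under assumptions, is to never cast the argument to a concrete class and never build a LazyArray at all. Instead, read the incoming list through the ListObjectInspector received in init (its getListLength and getListElement methods), copy each element into a standard Java object (Hive provides ObjectInspectorUtils.copyToStandardObject for this), and keep the copies in a plain ArrayList inside the AggregationBuffer. Hive's built-in collect_set evaluator uses a similar pattern. In PARTIAL2 and FINAL modes, the inspector passed to init describes the partial result, so merge reads through that one instead. terminate could then return the ArrayList, with init returning a standard list inspector (for example from ObjectInspectorFactory.getStandardListObjectInspector) as the output type. To stay runnable without Hive on the classpath, the sketch below uses hypothetical names throughout (ListInspector, DELIMITED, STANDARD, CombineBuffer, appendAll): the delimited string only imitates a lazily parsed array, and the copy function stands in for copyToStandardObject.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class InspectorCombineSketch {

    // Hypothetical stand-in for Hive's ListObjectInspector: the aggregation
    // only touches the list through these calls, never through its Java class.
    interface ListInspector {
        int getListLength(Object data);
        Object getListElement(Object data, int index);
    }

    // Imitates a "lazy" list: the raw data is one comma-delimited string.
    static final ListInspector DELIMITED = new ListInspector() {
        public int getListLength(Object data) {
            String s = (String) data;
            return s.isEmpty() ? 0 : s.split(",", -1).length;
        }
        public Object getListElement(Object data, int index) {
            return ((String) data).split(",", -1)[index];
        }
    };

    // Inspector for an already materialized java.util.List (the "standard" form).
    static final ListInspector STANDARD = new ListInspector() {
        public int getListLength(Object data) { return ((List<?>) data).size(); }
        public Object getListElement(Object data, int index) { return ((List<?>) data).get(index); }
    };

    // Aggregation buffer: always a plain ArrayList, whatever class the input had.
    static final class CombineBuffer {
        final ArrayList<Object> items = new ArrayList<>();
    }

    // Shared by iterate() and merge(): read each element through the inspector
    // and store a copy (in Hive, copyToStandardObject), so the buffer never keeps
    // references into input objects that may be reused.
    static void appendAll(CombineBuffer buf, Object list, ListInspector oi,
                          UnaryOperator<Object> copy) {
        if (list == null) {
            return; // a NULL array contributes nothing
        }
        int n = oi.getListLength(list);
        for (int i = 0; i < n; i++) {
            buf.items.add(copy.apply(oi.getListElement(list, i)));
        }
    }

    public static void main(String[] args) {
        CombineBuffer buf = new CombineBuffer();
        // "iterate": a raw row arrives in the lazy representation.
        appendAll(buf, "aaa=111,bbb=222", DELIMITED, UnaryOperator.identity());
        // "merge": a partial result arrives as a standard list.
        appendAll(buf, Arrays.asList("ccc=333", "ddd=444"), STANDARD, UnaryOperator.identity());
        System.out.println(buf.items); // prints [aaa=111, bbb=222, ccc=333, ddd=444]
    }
}
```

Because appendAll depends only on the inspector it is given, the same loop can serve both iterate (raw rows) and merge (partial results), whichever concrete class each one happens to receive.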
Could anyone help? Thanks in advance.
Best Regards,
Eric