Posted to user@hive.apache.org by Rathish A M <ra...@gmail.com> on 2014/11/03 16:24:55 UTC
help with hive -UDAF function - AggregationBuffer overwritten
Hi Team,
I am trying to build a parent-child hierarchy using a Hive UDAF.
The mapper output is as expected.
In the reducer, however, my AggregationBuffer object is overwritten inside
merge() when ((LazyBinaryMap) ).getMap() is called. This happens
intermittently.
Please let me know whether this is a technical limitation of Hive UDAFs.
The step where the issue occurs is shown below. When getMap() executes, the
value of ((Components) agg).buffer is overwritten:
public void merge(AggregationBuffer agg, Object partial)
        throws HiveException {
    Map<Text, List> map2 = ((Components) agg).buffer;
    Map<Text, Object> map1 = getMap(partial); // ((LazyBinaryMap) ).getMap()
    Set<Text> set1 = map1.keySet();
    for (Text key1 : set1) {
        if (map2.containsKey(key1)) {
            addDistinct(map2.get(key1), getList(map1, key1));
        } else {
            map2.put(key1, getList(map1, key1));
        }
    }
}
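One possible cause, stated here as an assumption rather than a confirmed diagnosis: Hive commonly reuses deserialized lazy objects and Writable instances such as Text between calls, so a buffer that stores the partial map's keys and lists directly may end up pointing at data that changes later. The plain-Java sketch below only illustrates that aliasing effect and a defensive-copy merge. MutableKey, mergeByReference and mergeWithCopy are hypothetical names standing in for Text and the merge logic above; no Hive classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeCopySketch {

    /** Hypothetical stand-in for a mutable, reusable key such as Hadoop's Text. */
    public static final class MutableKey {
        private String value;

        public MutableKey(String value) { this.value = value; }

        /** Simulates the framework reusing this object for different data. */
        public void set(String value) { this.value = value; }

        public MutableKey copy() { return new MutableKey(value); }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    /** Mirrors the posted merge: keeps references to the partial's keys and lists. */
    public static void mergeByReference(Map<MutableKey, List<String>> buffer,
                                        Map<MutableKey, List<String>> partial) {
        for (Map.Entry<MutableKey, List<String>> e : partial.entrySet()) {
            List<String> existing = buffer.get(e.getKey());
            if (existing != null) {
                addDistinct(existing, e.getValue());
            } else {
                buffer.put(e.getKey(), e.getValue()); // aliases caller-owned objects
            }
        }
    }

    /** Defensive variant: the buffer only ever holds its own copies. */
    public static void mergeWithCopy(Map<MutableKey, List<String>> buffer,
                                     Map<MutableKey, List<String>> partial) {
        for (Map.Entry<MutableKey, List<String>> e : partial.entrySet()) {
            List<String> existing = buffer.get(e.getKey());
            if (existing == null) {
                existing = new ArrayList<>();
                buffer.put(e.getKey().copy(), existing); // copy the key
            }
            addDistinct(existing, e.getValue());         // copy the elements
        }
    }

    private static void addDistinct(List<String> target, List<String> source) {
        for (String s : source) {
            if (!target.contains(s)) {
                target.add(s);
            }
        }
    }

    public static void main(String[] args) {
        MutableKey key = new MutableKey("101");
        List<String> children = new ArrayList<>(Arrays.asList("102", "103"));
        Map<MutableKey, List<String>> partial = new HashMap<>();
        partial.put(key, children);

        Map<MutableKey, List<String>> byRef = new HashMap<>();
        Map<MutableKey, List<String>> copied = new HashMap<>();
        mergeByReference(byRef, partial);
        mergeWithCopy(copied, partial);

        // Simulate the same objects being reused for the next partial result.
        key.set("102");
        children.clear();
        children.add("105");

        System.out.println("by reference still has 101: "
                + byRef.containsKey(new MutableKey("101")));
        System.out.println("copy still has 101 -> "
                + copied.get(new MutableKey("101")));
    }
}
```

If this is what is happening, the equivalent copy in a real UDAF would presumably go through the partial's ObjectInspector (for example ObjectInspectorUtils.copyToStandardObject) before anything is stored in the buffer. That has not been verified against the code in this post.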
Steps to execute:
ADD JAR <JARNAME.jar>;
CREATE TEMPORARY FUNCTION pch as 'pch_support';
select pch(empid, mgrid, empname, 101) from employee;
Employee table:
employee.empid  employee.empname  employee.mgrid  employee.designation
101             Browne            0               CEO
102             Steve             101             Development Manager
103             Nicole            101             QA Manager
104             Brian             101             Architect
105             John              102             SSE
106             Suzan             102             SE
107             Dave              102             Test Manager
108             Allen             103             Functional Tester
109             Marry             103             Performance Tester
110             Patty             103             Acceptance Tester
111             Caroline          107             Integration Tester
112             Peggy             107             Regression Tester
113             Albert            107             Stress tester
show create table employee;
OK
createtab_stmt
CREATE EXTERNAL TABLE `employee`(
`empid` int,
`empname` string,
`mgrid` int,
`designation` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'maprfs:/poc/employee'
TBLPROPERTIES (
'transient_lastDdlTime'='1412051068')
Thanks & Regards,
Rathish A M