Posted to user@hive.apache.org by Rathish A M <ra...@gmail.com> on 2014/11/03 16:24:55 UTC

Help with Hive UDAF function - AggregationBuffer overwritten

Hi Team,



I am trying to build a parent-child hierarchy using a Hive UDAF.

The mapper output is as expected.

However, in the reducer's merge() function, my AggregationBuffer object is
overwritten when ((LazyBinaryMap) ).getMap() is called. This happens
intermittently.

Please let me know whether this is a technical limitation of Hive UDAFs.



The problem step is in the merge function below: when getMap() executes, the
value of ((Components) agg).buffer is overwritten.





public void merge(AggregationBuffer agg, Object partial)
        throws HiveException {
    Map<Text, List> map2 = ((Components) agg).buffer;
    Map<Text, Object> map1 = getMap(partial);    // ((LazyBinaryMap) ).getMap()
    Set<Text> set1 = map1.keySet();
    for (Text key1 : set1) {
        if (map2.containsKey(key1)) {
            addDistinct(map2.get(key1), getList(map1, key1));
        } else {
            map2.put(key1, getList(map1, key1));
        }
    }
}
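
One possible explanation, which I have not confirmed: the lazy
deserializer may reuse the same Text and list objects from one call to the
next, so storing references taken from getMap(partial) directly in the buffer
would let a later call overwrite them. The sketch below reproduces only that
aliasing effect, without any Hive classes. MutableKey, mergeByReference,
mergeByCopy and snapshotAfterReuse are hypothetical names; MutableKey is a
stand-in for a mutable key type such as Text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReusedObjectDemo {

    // Hypothetical stand-in for a mutable, reusable key such as Hadoop's Text.
    static final class MutableKey {
        private String value;
        MutableKey(String value) { this.value = value; }
        MutableKey(MutableKey other) { this.value = other.value; } // copy constructor
        void set(String value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    // Stores the incoming key and list references as-is (same pattern as merge() above).
    static void mergeByReference(Map<MutableKey, List<String>> buffer,
                                 Map<MutableKey, List<String>> partial) {
        for (Map.Entry<MutableKey, List<String>> e : partial.entrySet()) {
            buffer.put(e.getKey(), e.getValue());
        }
    }

    // Copies each key and list before it enters the buffer.
    static void mergeByCopy(Map<MutableKey, List<String>> buffer,
                            Map<MutableKey, List<String>> partial) {
        for (Map.Entry<MutableKey, List<String>> e : partial.entrySet()) {
            buffer.put(new MutableKey(e.getKey()), new ArrayList<>(e.getValue()));
        }
    }

    // Merges one partial, then recycles the source objects the way a
    // reusing reader might, and returns what the buffer looks like afterwards.
    static String snapshotAfterReuse(boolean copyFirst) {
        MutableKey reusedKey = new MutableKey("101");
        List<String> reusedList = new ArrayList<>(Arrays.asList("102", "103"));
        Map<MutableKey, List<String>> partial = new HashMap<>();
        partial.put(reusedKey, reusedList);

        Map<MutableKey, List<String>> buffer = new HashMap<>();
        if (copyFirst) {
            mergeByCopy(buffer, partial);
        } else {
            mergeByReference(buffer, partial);
        }

        // Assumption for illustration: the source objects are refilled in place.
        reusedKey.set("102");
        reusedList.clear();
        reusedList.add("105");
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(snapshotAfterReuse(false)); // prints {102=[105]}
        System.out.println(snapshotAfterReuse(true));  // prints {101=[102, 103]}
    }
}
```

If this is the cause, copying each key and list before it enters the buffer
(for example with the Text copy constructor, or with
ObjectInspectorUtils.copyToStandardObject on the partial) might avoid the
overwrite. I have not verified this against the real job.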



Steps to execute:

ADD JAR <JARNAME.jar>;
CREATE TEMPORARY FUNCTION pch as 'pch_support';
select pch(empid, mgrid, empname, 101) from employee;



Employee table:

employee.empid  employee.empname  employee.mgrid  employee.designation
101             Browne            0               CEO
102             Steve             101             Development Manager
103             Nicole            101             QA Manager
104             Brian             101             Architect
105             John              102             SSE
106             Suzan             102             SE
107             Dave              102             Test Manager
108             Allen             103             Functional Tester
109             Marry             103             Performance Tester
110             Patty             103             Acceptance Tester
111             Caroline          107             Integration Tester
112             Peggy             107             Regression Tester
113             Albert            107             Stress tester



show create table employee;
OK
createtab_stmt
CREATE EXTERNAL TABLE `employee`(
  `empid` int,
  `empname` string,
  `mgrid` int,
  `designation` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'maprfs:/poc/employee'
TBLPROPERTIES (
  'transient_lastDdlTime'='1412051068')





Thanks & Regards,

Rathish A M