Posted to user@hive.apache.org by wz...@gmail.com on 2013/08/11 09:50:46 UTC

Re: hive 0.11 auto convert join bug report

Hi all:
when I change the table alias dim_pay_date to A, the query passes in hive 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):

use test;
create table if not exists src ( `key` int,`val` string);
load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
drop table if exists orderpayment_small;
create table orderpayment_small (`dealid` int,`date` string,`time` string, `cityid` int, `userid` int);
insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55 ,5372613 from src limit 1;
drop table if exists user_small;
create table user_small( userid int);
insert overwrite table user_small select key from src limit 100;
set hive.auto.convert.join.noconditionaltask.size = 200;
SELECT
`A`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;



It's quite strange and interesting now. I will keep searching for the answer to this issue.




On Friday, August 9, 2013 at 3:32 AM, wzc1989@gmail.com wrote:

> Hi all:  
> I'm currently testing hive 0.11 and have encountered a bug with hive.auto.convert.join. I've constructed a testcase so everyone can reproduce it (you can also reach the testcase here: https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>  
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string, `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>  
>  
> You should replace the path to kv1.txt with your own. If you run the above query in hive 0.11, it fails with an ArrayIndexOutOfBoundsException. You can see the explain result and the console output of the query here: https://gist.github.com/code6/6187569
>  
> I compiled the trunk code but it doesn't work with this query either. I can run this query in hive 0.9 with hive.auto.convert.join turned on.
>  
> I tried to dig into this problem and I think it may be caused by the map join optimization: some adjacent operators don't match on their input/output table info (the column positions differ).
>  
> I'm not able to fix this bug myself, and I would appreciate it if someone could look into it.
>  
> Thanks.  


Re: hive 0.11 auto convert join bug report

Posted by Steven Wong <sw...@netflix.com>.
Sorry, I meant to post this stack trace instead.

java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row [Error getting row data with exception
java.lang.ArrayIndexOutOfBoundsException: 385986740
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:663)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
 ]
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:167)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row [Error getting row data with
exception java.lang.ArrayIndexOutOfBoundsException: 385986740
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:663)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:441)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
 ]
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:149)
	... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 385986740
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:180)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:234)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
	... 9 more



On Wed, Sep 25, 2013 at 2:16 PM, Steven Wong <sw...@netflix.com> wrote:

> For me, the bug exhibits itself in Hive 0.11 as the following stack trace.
> I'm putting it here so that people searching on a similar problem can find
> this discussion thread in a web search. The discussion thread contains a
> workaround and a patch.
>
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 175
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
> 	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
>  ]
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) [Error getting row data with exception java.lang.ArrayIndexOutOfBoundsException: 175
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
> 	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
> 	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:260)
>  ]
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
> 	... 7 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException: 175
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:131)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
> 	... 7 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 175
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
> 	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
> 	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:102)
> 	at org.apache.hadoop.hive.ql.exec.JoinUtil.computeValues(JoinUtil.java:243)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:82)
> 	... 9 more
>
>
>
> On Mon, Sep 16, 2013 at 5:20 AM, Sun, Rui <ru...@intel.com> wrote:
>
>> Hi, Amit,
>>
>> You can see the description of HIVE-5256 for a more detailed explanation.
>>
>> Both table aliases and table names (when no alias is given) can run into this issue.
>>
>> This issue happened to be masked by the XML
>> serialization/deserialization of the MapredWork containing the join
>> operator (HashMap serialization/deserialization will reverse the order of
>> key-value pairs within the same bucket), and it was exposed by HIVE-4078
>> because the copy of the MapredWork in the noconditionaltask optimization
>> case was optimized away.
>>
>> *From:* Amit Sharma [mailto:amsharma@netflix.com]
>> *Sent:* Friday, September 13, 2013 6:05 AM
>> *To:* user@hive.apache.org
>> *Subject:* Re: hive 0.11 auto convert join bug report
>>
>> Hi Navis,
>>
>> I was trying to look at this email thread as well as the jira to
>> understand the scope of this issue. Does this get triggered only in cases
>> where aliases end up mapping to the same value upon hashing? Or can it be
>> triggered under other conditions as well? What if aliases are not used
>> and the table names somehow map to the same hashcode values?
>>
>> Also, is changing the alias the only workaround for this problem, or is
>> there another workaround possible?
>>
>> Thanks,
>> Amit
>>
>> On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <na...@nexr.com> wrote:
>>
>> Hi,
>>
>> Hive is notorious for producing different results with different aliases.
>> Changing the alias has been a last-resort way to avoid bugs in desperate
>> situations.
>>
>> I think the patch in the issue is ready; I hope it's helpful.
>>
>> Thanks.
>>
>> 2013/8/11 <wz...@gmail.com>:
>>
>> > Hi Navis,
>> >
>> > My colleague chenchun found that the hashcodes of 'deal' and
>> > 'dim_pay_date' collide, and that the code in MapJoinProcessor.java
>> > ignores the order of the row schema.
>> > I looked at your patch and it's exactly the same place we were working on.
>> > Thanks for your patch.
>> >
>> > On Sunday, August 11, 2013 at 9:38 PM, Navis류승우 wrote:
>> >
>> > Hi,
>> >
>> > I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
>> > and attached patch for it.
>> >
>> > It needs full test for confirmation but you can try it.
>> >
>> > Thanks.
>> >
>> > 2013/8/11 <wz...@gmail.com>:
>> >
>> > Hi all:
>> > when I change the table alias dim_pay_date to A, the query passes in hive
>> > 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>> >
>> > use test;
>> > create table if not exists src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
>> overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time`
>> string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small( userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT
>> > `A`.`date`
>> > , `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
>> > `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
>> > `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> >
>> > It's quite strange and interesting now. I will keep searching for the
>> > answer to this issue.
>> >
>> >
>> >
>> > On Friday, August 9, 2013 at 3:32 AM, wzc1989@gmail.com wrote:
>> >
>> > Hi all:
>> > I'm currently testing hive11 and encounter one bug with
>> > hive.auto.convert.join, I construct a testcase so everyone can reproduce
>> > it(or you can reach the testcase
>> > here:
>> https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>> >
>> > use test;
>> > create table src ( `key` int,`val` string);
>> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
>> overwrite
>> > into table src;
>> > drop table if exists orderpayment_small;
>> > create table orderpayment_small (`dealid` int,`date` string,`time`
>> string,
>> > `cityid` int, `userid` int);
>> > insert overwrite table orderpayment_small select 748, '2011-03-24',
>> > '2011-03-24', 55 ,5372613 from src limit 1;
>> > drop table if exists user_small;
>> > create table user_small( userid int);
>> > insert overwrite table user_small select key from src limit 100;
>> > set hive.auto.convert.join.noconditionaltask.size = 200;
>> > SELECT
>> > `dim_pay_date`.`date`
>> > , `deal`.`dealid`
>> > FROM `orderpayment_small` `orderpayment`
>> > JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
>> > `orderpayment`.`date`
>> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
>> > `orderpayment`.`dealid`
>> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
>> > `orderpayment`.`cityid`
>> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
>> > limit 5;
>> >
>> >
>> > You should replace the path of kv1.txt by yourself. You can run the
>> above
>> > query in hive 0.11 and it will fail with
>> ArrayIndexOutOfBoundsException, You
>> > can see the explain result and the console output of the query here :
>> > https://gist.github.com/code6/6187569
>> >
>> > I compile the trunk code but it doesn't work with this query. I can run
>> this
>> > query in hive 0.9 with hive.auto.convert.join turns on.
>> >
>> > I try to dig into this problem and I think it may be caused by the map
>> join
>> > optimization. Some adjacent operators aren't match for the input/output
>> > tableinfo(column positions diff).
>> >
>> > I'm not able to fix this bug and I would appreciate it if someone would
>> like
>> > to look into this problem.
>> >
>> > Thanks.
>> >
>> >
>>
>
>

Re: hive 0.11 auto convert join bug report

Posted by Steven Wong <sw...@netflix.com>.
For me, the bug exhibits itself in Hive 0.11 as the following stack trace.
I'm putting it here so that people searching for a similar problem can find
this discussion thread in a web search. The discussion thread contains a
workaround and a patch.

java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row (tag=0) [Error getting row data with exception
java.lang.ArrayIndexOutOfBoundsException: 175
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
 ]
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row (tag=0) [Error getting row data
with exception java.lang.ArrayIndexOutOfBoundsException: 175
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
	at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:343)
	at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:251)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:423)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
	at org.apache.hadoop.mapred.Child.main(Child.java:260)
 ]
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
	... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ArrayIndexOutOfBoundsException: 175
	at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:131)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
	... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 175
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:287)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:188)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:138)
	at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:195)
	at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:102)
	at org.apache.hadoop.hive.ql.exec.JoinUtil.computeValues(JoinUtil.java:243)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:82)
	... 9 more



On Mon, Sep 16, 2013 at 5:20 AM, Sun, Rui <ru...@intel.com> wrote:

>  Hi, Amit,
>
> You can see the description of HIVE-5256 for a more detailed explanation.
>
> Both table aliases and table names (when no alias is given) can run into this issue.
>
> This issue happened to be masked by the XML serialization/deserialization
> of the MapredWork containing the join operator (HashMap
> serialization/deserialization will reverse the order of key-value pairs
> within the same bucket), and it was exposed by HIVE-4078 because the copy
> of the MapredWork in the noconditionaltask optimization case was optimized
> away.
>
> *From:* Amit Sharma [mailto:amsharma@netflix.com]
> *Sent:* Friday, September 13, 2013 6:05 AM
> *To:* user@hive.apache.org
> *Subject:* Re: hive 0.11 auto convert join bug report
>
> Hi Navis,
>
> I was trying to look at this email thread as well as the jira to
> understand the scope of this issue. Does this get triggered only in cases
> where aliases end up mapping to the same value upon hashing? Or can it be
> triggered under other conditions as well? What if aliases are not used
> and the table names somehow map to the same hashcode values?
>
> Also, is changing the alias the only workaround for this problem, or is
> there another workaround possible?
>
> Thanks,
> Amit
>
> On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <na...@nexr.com> wrote:
>
> Hi,
>
> Hive is notorious for producing different results with different aliases.
> Changing the alias has been a last-resort way to avoid bugs in desperate
> situations.
>
> I think the patch in the issue is ready; I hope it's helpful.
>
> Thanks.
>
> 2013/8/11 <wz...@gmail.com>:
>
> > Hi Navis,
> >
> > My colleague chenchun found that the hashcodes of 'deal' and
> > 'dim_pay_date' collide, and that the code in MapJoinProcessor.java
> > ignores the order of the row schema.
> > I looked at your patch and it's exactly the same place we were working on.
> > Thanks for your patch.
> >
> > On Sunday, August 11, 2013 at 9:38 PM, Navis류승우 wrote:
> >
> > Hi,
> >
> > I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
> > and attached patch for it.
> >
> > It needs full test for confirmation but you can try it.
> >
> > Thanks.
> >
> > 2013/8/11 <wz...@gmail.com>:
> >
> > Hi all:
> > when I change the table alias dim_pay_date to A, the query passes in hive
> > 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
> >
> > use test;
> > create table if not exists src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
> overwrite
> > into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int,`date` string,`time`
> string,
> > `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55 ,5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `A`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >
> >
> > It's quite strange and interesting now. I will keep searching for the
> > answer to this issue.
> >
> >
> >
> > On Friday, August 9, 2013 at 3:32 AM, wzc1989@gmail.com wrote:
> >
> > Hi all:
> > I'm currently testing hive11 and encounter one bug with
> > hive.auto.convert.join, I construct a testcase so everyone can reproduce
> > it(or you can reach the testcase
> > here:
> https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
> >
> > use test;
> > create table src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
> overwrite
> > into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int,`date` string,`time`
> string,
> > `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55 ,5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `dim_pay_date`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> > `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >
> >
> > You should replace the path of kv1.txt by yourself. You can run the above
> > query in hive 0.11 and it will fail with ArrayIndexOutOfBoundsException,
> You
> > can see the explain result and the console output of the query here :
> > https://gist.github.com/code6/6187569
> >
> > I compile the trunk code but it doesn't work with this query. I can run
> this
> > query in hive 0.9 with hive.auto.convert.join turns on.
> >
> > I try to dig into this problem and I think it may be caused by the map
> join
> > optimization. Some adjacent operators aren't match for the input/output
> > tableinfo(column positions diff).
> >
> > I'm not able to fix this bug and I would appreciate it if someone would
> like
> > to look into this problem.
> >
> > Thanks.
> >
> >
>

RE: hive 0.11 auto convert join bug report

Posted by "Sun, Rui" <ru...@intel.com>.
Hi, Amit,

You can see the description of HIVE-5256 for a more detailed explanation.

Both table aliases and table names (when no alias is given) can run into this issue.

This issue happened to be masked by the XML serialization/deserialization of the MapredWork containing the join operator (HashMap serialization/deserialization will reverse the order of key-value pairs within the same bucket), and it was exposed by HIVE-4078 because the copy of the MapredWork in the noconditionaltask optimization case was optimized away.
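The bucket collision the thread describes can be sketched in plain Java. The class below is a hypothetical standalone reconstruction, not Hive or JDK source: it re-implements by hand the supplemental hash that java.util.HashMap used in Java 6/7, and shows how two alias strings whose raw String.hashCode() values differ can still land in the same bucket of a default 16-slot table, which is exactly where bucket-order sensitivity becomes visible.

```java
// Sketch: illustrate the alias bucket collision described above.
// AliasBucketDemo, spread, and bucket are illustrative names, not Hive APIs.
public class AliasBucketDemo {

    // Supplemental hash applied by java.util.HashMap in Java 6/7
    // before masking the hash down to a bucket index.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Bucket index for a key in a table with a power-of-two capacity.
    static int bucket(String key, int capacity) {
        return spread(key.hashCode()) & (capacity - 1);
    }

    public static void main(String[] args) {
        // Raw hash codes differ, but the two aliases can share a bucket in a
        // default 16-slot HashMap, so any code that depends on per-bucket
        // entry order sees them as order-dependent neighbors.
        System.out.println("deal         hashCode=" + "deal".hashCode()
                + " bucket=" + bucket("deal", 16));
        System.out.println("dim_pay_date hashCode=" + "dim_pay_date".hashCode()
                + " bucket=" + bucket("dim_pay_date", 16));
    }
}
```

Compile and run with `javac AliasBucketDemo.java && java AliasBucketDemo` to see the two aliases' hash codes and bucket indices side by side.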




Re: Re: hive 0.11 auto convert join bug report

Posted by Navis류승우 <na...@nexr.com>.
Hi, sorry for the late reply.

As Chun Chen said, identical hashcodes make this problem visible, but it can
happen whenever the order in which tables appear in the JOIN expression
differs from the order of their parent operators.

Thanks.
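To make the order-mismatch failure mode concrete, here is a hedged sketch (not Hive's actual code; the column names are simply borrowed from the repro query): a child operator resolves a column against a schema rebuilt in one order, but indexes into a row laid out in the parents' order, so positional references land on the wrong slot -- or past the end of a shorter row, which would surface as an ArrayIndexOutOfBoundsException.

```python
# Order fixed by the JOIN expression (the "parents"):
parent_columns = ["date", "dealid", "cityid"]
row = ["2011-03-24", 748, 55]      # values laid out in parent order

# Order as recovered from an unordered map during map-join conversion:
rebuilt_schema = ["dealid", "cityid", "date"]

# The child resolves "date" against the rebuilt schema ...
pos = rebuilt_schema.index("date")  # 2
# ... but indexes into a row laid out in parent order:
print(row[pos])                     # 55 -- the cityid, not the date
```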






Re: Re: hive 0.11 auto convert join bug report

Posted by Amit Sharma <am...@netflix.com>.
Hi Navis,

I was trying to look at this email thread as well as the JIRA to understand
the scope of this issue. Does this get triggered only when aliases end up
mapping to the same value upon hashing? Or can it be triggered under other
conditions as well? What if no aliases are used and the table names somehow
map to the same hashcode values?

Also, is changing the alias the only workaround for this problem, or is
another workaround possible?

Thanks,
Amit



Re: Re: hive 0.11 auto convert join bug report

Posted by Navis류승우 <na...@nexr.com>.
Hi,

Hive is notorious for producing different results with different aliases.
Changing the alias has been a last-resort workaround in desperate situations.

I think the patch in the issue is ready; I hope it's helpful.

Thanks.


Re: hive 0.11 auto convert join bug report

Posted by wz...@gmail.com.
Hi Navis,

My colleague chenchun found that the hashcodes of 'deal' and 'dim_pay_date' collide, and that the code in MapJoinProcessor.java ignores the order of the row schema.
I looked at your patch, and it's exactly the same place we were working on.
Thanks for your patch.
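The observation above can be checked in a few lines. Strictly speaking, the full Java hashCodes of 'deal' and 'dim_pay_date' differ; what coincides is the bucket index in a default 16-slot JDK HashMap, which is what drives iteration order. A small sketch:

```python
def java_string_hashcode(s):
    """Java's String.hashCode, reduced to an unsigned 32-bit value."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


h_deal = java_string_hashcode("deal")          # 3079276
h_dim = java_string_hashcode("dim_pay_date")   # 1837131452

print(h_deal == h_dim)          # False: the raw hashCodes differ
print(h_deal & 15, h_dim & 15)  # 12 12 -- same bucket in a 16-slot table
```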




Re: Re: hive 0.11 auto convert join bug report

Posted by Navis류승우 <na...@nexr.com>.
Hi,

I've booked this on https://issues.apache.org/jira/browse/HIVE-5056
and attached a patch for it.

It needs a full test run for confirmation, but you can try it.

Thanks.
