Posted to issues@hive.apache.org by "Johan Gustavsson (JIRA)" <ji...@apache.org> on 2016/01/04 02:57:39 UTC
[jira] [Commented] (HIVE-12664) Bug in reduce deduplication optimization causing ArrayOutOfBoundException
[ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080617#comment-15080617 ]
Johan Gustavsson commented on HIVE-12664:
-----------------------------------------
[~ashutoshc], sorry for the late response, but below is a stack trace:
{code}
15/12/22 03:09:13 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.merge(ReduceSinkDeDuplication.java:212)
at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$JoinReducerProc.process(ReduceSinkDeDuplication.java:547)
at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.process(ReduceSinkDeDuplication.java:164)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication.transform(ReduceSinkDeDuplication.java:107)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9423)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:427)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:323)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:980)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1045)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at com.treasure_data.hadoop.hive.runner.QueryRunner.processQueryCmd(QueryRunner.java:453)
at com.treasure_data.hadoop.hive.runner.QueryRunner.processCmd(QueryRunner.java:394)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
at com.treasure_data.hadoop.hive.runner.QueryRunner.run(QueryRunner.java:313)
at com.treasure_data.hadoop.hive.runner.QueryRunner$1.run(QueryRunner.java:192)
at com.treasure_data.hadoop.hive.runner.QueryRunner$1.run(QueryRunner.java:190)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at com.treasure_data.hadoop.util.TDUtil.doAs(TDUtil.java:226)
at com.treasure_data.hadoop.hive.runner.QueryRunner.main(QueryRunner.java:190)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
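The first line of the trace is what an unguarded `ArrayList.get(1)` reports when the list holds a single element. The following is a minimal sketch of that failure mode and of a bounds-checked alternative. It is not the Hive code: the class `IndexCheckSketch` and the helpers `unguardedGetThrows` and `getOrNull` are hypothetical names used only for illustration, and the exact wording of the exception message varies between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexCheckSketch {
    // Hypothetical helper (not part of Hive): returns the element at `index`,
    // or null when the list is too short to contain it.
    public static <T> T getOrNull(List<T> items, int index) {
        return (index >= 0 && index < items.size()) ? items.get(index) : null;
    }

    // Hypothetical helper: reports whether a plain get(index) throws
    // IndexOutOfBoundsException for the given list.
    public static boolean unguardedGetThrows(List<?> items, int index) {
        try {
            items.get(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A list with exactly one entry, mirroring "Index: 1, Size: 1".
        List<String> entries = new ArrayList<>();
        entries.add("only-entry");

        System.out.println(unguardedGetThrows(entries, 1)); // asking for a second entry
        System.out.println(getOrNull(entries, 1));          // guarded access to the same index
        System.out.println(getOrNull(entries, 0));          // the entry that does exist
    }
}
```

Checking the size before indexing, as `getOrNull` does, is one way to avoid the exception; whether that is the right fix for the optimizer depends on the attached patches, which are not reproduced here.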
> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
> Key: HIVE-12664
> URL: https://issues.apache.org/jira/browse/HIVE-12664
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 1.2.1
> Reporter: Johan Gustavsson
> Assignee: Johan Gustavsson
> Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
>
>
> The optimization check for reduce deduplication only inspects the first child node for a join, and the check itself also contains a major bug, so it throws an IndexOutOfBoundsException (the ArrayOutOfBoundException of the title) no matter what.
> Sample data table form:
> ||time||user||host||path||referer||code||agent||size||method||
> |int|string|string|string|string|bigint|string|bigint|string|
> Sample query:
> {code:sql}
> SELECT
>   t1.host,
>   COUNT(DISTINCT t1.`date`) AS login_count,
>   MAX(t2.code) AS code,
>   unix_timestamp() AS time
> FROM (
>   SELECT
>     HOST,
>     MIN(time) AS DATE
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t1
> JOIN (
>   SELECT
>     HOST,
>     MIN(time) AS code
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t2
> ON t1.host = t2.host
> GROUP BY
>   t1.host
> {code}