You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2017/12/13 18:45:00 UTC
[jira] [Commented] (HIVE-18270) count(distinct) using join and
group by produce incorrect output when hive.auto.convert.join=false and
hive.auto.convert.join.noconditionaltask=false
[ https://issues.apache.org/jira/browse/HIVE-18270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289695#comment-16289695 ]
Ashutosh Chauhan commented on HIVE-18270:
-----------------------------------------
[~yuan_zac] Can you please also add a testcase?
> count(distinct) using join and group by produce incorrect output when hive.auto.convert.join=false and hive.auto.convert.join.noconditionaltask=false
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-18270
> URL: https://issues.apache.org/jira/browse/HIVE-18270
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.2.1, 2.1.1, 2.2.0, 2.3.0
> Reporter: Zac Zhou
> Assignee: Zac Zhou
> Attachments: HIVE-18270.1.patch
>
>
> When I run the following query:
> explain
> SELECT foo.id, count(distinct foo.line_id) as factor from
> foo JOIN bar ON (foo.id = bar.id)
> WHERE foo.orders != 'blah'
> group by foo.id;
> The following error is got:
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.merge(ReduceSinkDeDuplication.java:216)
> at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$JoinReducerProc.process(ReduceSinkDeDuplication.java:557)
> at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication$AbsctractReducerReducerProc.process(ReduceSinkDeDuplication.java:166)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
> at org.apache.hadoop.hive.ql.optimizer.correlation.ReduceSinkDeDuplication.transform(ReduceSinkDeDuplication.java:108)
> at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10201)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> It looks like it is a bug of ReduceSinkDeDuplication optimizer.
> Since the columns of count distinct need to be added into reduce key for sorting, the reducesink of group can't be replaced with the ones of join.
> In the case of count distinct query, reducesink of group should not be merged
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)