Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/12/14 09:43:00 UTC

[jira] [Commented] (HIVE-26846) Exception at non-vectorized map join execution

    [ https://issues.apache.org/jira/browse/HIVE-26846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647016#comment-17647016 ] 

Stamatis Zampetakis commented on HIVE-26846:
--------------------------------------------

We noticed this exception as well while working on HIVE-26653 but forgot to log a ticket; thanks for following up on this, [~ghanko]. I agree with you that this is probably a different problem.

> Exception at non-vectorized map join execution
> ----------------------------------------------
>
>                 Key: HIVE-26846
>                 URL: https://issues.apache.org/jira/browse/HIVE-26846
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hankó Gergely
>            Priority: Major
>
> How to reproduce (the CSV files are attached to HIVE-26653):
> {code:java}
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=false;
> CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile;
> CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string) row format delimited fields terminated by ',' stored as textfile;
> load data local inpath 'table_a.csv' into table table_a;
> load data local inpath 'table_b.csv' into table table_b;
> SELECT a.p_dt FROM ((SELECT p_dt FROM table_b GROUP BY p_dt) a JOIN (SELECT p_dt FROM table_a) b ON a.p_dt = b.p_dt) WHERE a.p_dt = translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', ''); {code}
> Result:
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception from MapJoinOperator : Index: 0, Size: 0
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:595)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
>         at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
>         ... 19 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>         at java.util.ArrayList.rangeCheck(ArrayList.java:659)
>         at java.util.ArrayList.get(ArrayList.java:435)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:918)
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:1013)
>         at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:582)
>         ... 23 more
>  {code}
> Expected:
> {code:java}
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731
> 20220731 {code}
> Additional info:
> The exception occurs because _col1 is pruned at ColumnPrunerProcFactory:1152. If it is not pruned, the query runs fine.
> In vectorized mode the query instead returns all NULLs (HIVE-26653), and keeping _col1 does not fix that problem. So I'm not entirely sure this is the same issue as HIVE-26653, but the root causes are probably similar.
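One plausible reading of the stack trace is that the join row is assembled by reading element 0 of a per-input value list, and pruning the column that would have filled that list leaves it empty. The toy model below illustrates only that failure mode; `PrunedJoinSketch` and `buildJoinRow` are hypothetical names, and this is not Hive's `CommonJoinOperator.genAllOneUniqueJoinObject` code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT Hive's implementation: each join input
// contributes a list of column values, and the output row is built by
// reading the first value of every input.
class PrunedJoinSketch {

    static List<Object> buildJoinRow(List<List<Object>> inputs) {
        List<Object> row = new ArrayList<>();
        for (List<Object> values : inputs) {
            // Assumes every input kept at least one column. An input whose
            // columns were all pruned is empty, so get(0) throws
            // IndexOutOfBoundsException, as in the reported trace.
            row.add(values.get(0));
        }
        return row;
    }

    public static void main(String[] args) {
        List<Object> left = Arrays.asList("20220731");
        List<Object> right = Arrays.asList("20220731");
        System.out.println(buildJoinRow(Arrays.asList(left, right))); // [20220731, 20220731]

        // Same join after a hypothetical pruning step removed right's only column.
        List<Object> pruned = new ArrayList<>();
        try {
            buildJoinRow(Arrays.asList(left, pruned));
        } catch (IndexOutOfBoundsException e) {
            // Message text differs by JDK: "Index: 0, Size: 0" on Java 8,
            // "Index 0 out of bounds for length 0" on newer releases.
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

The exception message in the report ("Index: 0, Size: 0") matches the Java 8 `ArrayList` wording, which is why the sketch uses an `ArrayList` rather than `Collections.emptyList()`.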

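The filter constant in the reproduction query explains why every expected row is 20220731: `date_sub('2022-08-01', 1)` gives 2022-07-31, the cast to string keeps the yyyy-MM-dd form, and `translate(..., '-', '')` strips the dashes. A minimal sketch of the same arithmetic with `java.time`, assuming Hive's date functions behave like `LocalDate` for this input (`FilterValueSketch` and `partitionKey` are hypothetical names):

```java
import java.time.LocalDate;

// Hypothetical helper mirroring
// translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), '-', '')
class FilterValueSketch {

    static String partitionKey(String isoDate, int daysBack) {
        // date_sub: subtract whole days from a yyyy-MM-dd date
        LocalDate d = LocalDate.parse(isoDate).minusDays(daysBack);
        // LocalDate.toString() yields yyyy-MM-dd; removing '-' gives yyyyMMdd
        return d.toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(partitionKey("2022-08-01", 1)); // prints 20220731
    }
}
```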

