Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/10 11:05:00 UTC

[jira] [Work logged] (HIVE-25410) CommonMergeJoin fails for ARRAY join keys with varying size

     [ https://issues.apache.org/jira/browse/HIVE-25410?focusedWorklogId=636396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636396 ]

ASF GitHub Bot logged work on HIVE-25410:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Aug/21 11:04
            Start Date: 10/Aug/21 11:04
    Worklog Time Spent: 10m 
      Work Description: okumin commented on a change in pull request #2551:
URL: https://github.com/apache/hive/pull/2551#discussion_r685915956



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveStructComparator.java
##########
@@ -45,16 +48,14 @@ public int compare(Object key1, Object key2) {
         if (a1.size() == 0) {
             return 0;
         }
-        if (comparator == null) {
-            comparator = new WritableComparator[a1.size()];
-            // For struct all elements may not be of same type, so create comparator for each entry.
-            for (int i = 0; i < a1.size(); i++) {
-                comparator[i] = WritableComparatorFactory.get(a1.get(i), nullSafe, nullOrdering);
-            }
+        // For array, the length may not be fixed, so extend comparators on demand
+        for (int i = comparators.size(); i < a1.size(); i++) {
+            // For struct, all elements may not be of same type, so create comparator for each entry.
+            comparators.add(i, WritableComparatorFactory.get(a1.get(i), nullSafe, nullOrdering));
         }
         result = 0;
         for (int i = 0; i < a1.size(); i++) {
-            result = comparator[i].compare(a1.get(i), a2.get(i));
+            result = comparators.get(i).compare(a1.get(i), a2.get(i));

Review comment:
       @zabetak Perhaps; I have the same feeling. STRUCT and ARRAY are fundamentally different data structures, so we can take a different approach for each. That would also make the implementation more straightforward.
   My idea is to let WritableComparatorFactory identify more precise types so that it can distinguish STRUCT, ARRAY, and so on. This will require some effort, since WritableComparatorFactory would have to accept an ObjectInspector or type information, but it sounds more robust than inferring data types from `Object`.
   Anyway, I agree with creating a follow-up ticket, and I will do that if you have nothing more to discuss here.
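The follow-up idea above (choosing comparators from declared type information rather than from runtime `Object`s) could look roughly like the sketch below. It is a hypothetical illustration, not Hive code: `TypeDesc`, `IntType`, `ArrayType`, `StructType`, and `comparatorFor` are invented stand-ins for Hive's ObjectInspector/TypeInfo machinery, and the rule that a shorter array sorts before a longer one sharing its prefix is an assumption. The point it shows is that an ARRAY needs only one element comparator regardless of length, while a STRUCT gets one comparator per field with the field count fixed by its type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: TypeDesc and its subclasses are illustrative stand-ins,
// not Hive's TypeInfo/ObjectInspector API.
public class TypeDrivenComparators {

    interface TypeDesc {}

    static final class IntType implements TypeDesc {}

    static final class ArrayType implements TypeDesc {
        final TypeDesc element;
        ArrayType(TypeDesc element) { this.element = element; }
    }

    static final class StructType implements TypeDesc {
        final List<TypeDesc> fields;
        StructType(List<TypeDesc> fields) { this.fields = fields; }
    }

    // Builds a comparator from the declared type, so nothing depends on the
    // size or runtime class of the first value that happens to be compared.
    @SuppressWarnings("unchecked")
    static Comparator<Object> comparatorFor(TypeDesc type) {
        if (type instanceof IntType) {
            return (a, b) -> Integer.compare((Integer) a, (Integer) b);
        }
        if (type instanceof ArrayType) {
            // Every element shares one type, so a single comparator covers any length.
            Comparator<Object> elem = comparatorFor(((ArrayType) type).element);
            return (a, b) -> {
                List<Object> x = (List<Object>) a;
                List<Object> y = (List<Object>) b;
                int common = Math.min(x.size(), y.size());
                for (int i = 0; i < common; i++) {
                    int r = elem.compare(x.get(i), y.get(i));
                    if (r != 0) return r;
                }
                // Assumption: on a shared prefix, the shorter array sorts first.
                return Integer.compare(x.size(), y.size());
            };
        }
        if (type instanceof StructType) {
            // A struct's field count is fixed by its type: one comparator per field.
            List<Comparator<Object>> fieldComparators = new ArrayList<>();
            for (TypeDesc f : ((StructType) type).fields) {
                fieldComparators.add(comparatorFor(f));
            }
            return (a, b) -> {
                List<Object> x = (List<Object>) a;
                List<Object> y = (List<Object>) b;
                for (int i = 0; i < fieldComparators.size(); i++) {
                    int r = fieldComparators.get(i).compare(x.get(i), y.get(i));
                    if (r != 0) return r;
                }
                return 0;
            };
        }
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    public static void main(String[] args) {
        Comparator<Object> arrayCmp = comparatorFor(new ArrayType(new IntType()));
        System.out.println(arrayCmp.compare(List.of(1, 2), List.of(1, 2, 3)));    // -1
        System.out.println(arrayCmp.compare(List.of(1, 2, 3), List.of(1, 2, 3))); // 0

        Comparator<Object> structCmp = comparatorFor(
            new StructType(List.of(new IntType(), new ArrayType(new IntType()))));
        System.out.println(structCmp.compare(List.of(1, List.of(5)), List.of(1, List.of(5, 6)))); // -1
    }
}
```

Because the comparator tree is built once from the schema, keys of varying length never index past a cached comparator array, which is the robustness the comment refers to.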




-- 
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 636396)
    Time Spent: 1h 20m  (was: 1h 10m)

> CommonMergeJoin fails for ARRAY join keys with varying size
> -----------------------------------------------------------
>
>                 Key: HIVE-25410
>                 URL: https://issues.apache.org/jira/browse/HIVE-25410
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Thanks to HIVE-24883, CommonMergeJoinOperator can handle ARRAY or STRUCT types as a JOIN key.
> There are corner cases where CommonMergeJoinOperator fails with `ArrayIndexOutOfBoundsException`.
>  
> Here is a simple reproduction:
> {code:sql}
> SET hive.auto.convert.join=false;
> CREATE TABLE table_list_types (id int, key array<int>);
> INSERT INTO table_list_types VALUES (1, array(1, 2)), (2, array(1, 2)), (3, array(1, 2, 3)), (4, array(1, 2, 3));
> SELECT * FROM table_list_types t1 INNER JOIN table_list_types t2 ON t1.key = t2.key; {code}
> With commit 69c97c26ac68a245f4d327cc2f7b3a2333f8fa84, the following error occurs:
> {code:java}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> 	at org.apache.hadoop.hive.ql.exec.HiveStructComparator.compare(HiveStructComparator.java:57)
> 	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.compareKey(CommonMergeJoinOperator.java:629)
> 	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.compareKeys(CommonMergeJoinOperator.java:597)
> 	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:566)
> 	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:249)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
> 	... 26 more {code}
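The failure mode can be modeled outside Hive with a short sketch. This is a simplified, hypothetical model rather than the actual `HiveStructComparator`: `comparatorFor` stands in for `WritableComparatorFactory.get(...)` and only handles `Integer`, and the length tie-break in the growing variant is an assumption. `FixedSizeComparator` sizes its per-position comparator array from the first key it sees (as the pre-patch code did), so after comparing `[1, 2]` it throws `ArrayIndexOutOfBoundsException` for index 2 on `[1, 2, 3]`, matching the trace above. `GrowingComparator` extends a list of comparators on demand, in the spirit of the patch's diff.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the bug; not Hive's actual HiveStructComparator.
public class LazyComparatorDemo {

    // Stand-in for WritableComparatorFactory.get(...): chooses a comparator
    // from the runtime class of a sample element (assumption: Integer only).
    static Comparator<Object> comparatorFor(Object sample) {
        if (sample instanceof Integer) {
            return (a, b) -> Integer.compare((Integer) a, (Integer) b);
        }
        throw new IllegalArgumentException("unsupported element: " + sample);
    }

    // Pre-patch behavior: the comparator array is sized from the first key only.
    static final class FixedSizeComparator {
        private Comparator<Object>[] comparators;

        @SuppressWarnings("unchecked")
        int compare(List<Object> k1, List<Object> k2) {
            if (comparators == null) {
                comparators = new Comparator[k1.size()];
                for (int i = 0; i < k1.size(); i++) {
                    comparators[i] = comparatorFor(k1.get(i));
                }
            }
            for (int i = 0; i < k1.size(); i++) {
                // Throws ArrayIndexOutOfBoundsException once k1 is longer than the first key.
                int result = comparators[i].compare(k1.get(i), k2.get(i));
                if (result != 0) return result;
            }
            return 0;
        }
    }

    // Patched idea: extend the cached comparators whenever a longer key arrives.
    static final class GrowingComparator {
        private final List<Comparator<Object>> comparators = new ArrayList<>();

        int compare(List<Object> k1, List<Object> k2) {
            int common = Math.min(k1.size(), k2.size());
            for (int i = comparators.size(); i < common; i++) {
                comparators.add(comparatorFor(k1.get(i)));
            }
            for (int i = 0; i < common; i++) {
                int result = comparators.get(i).compare(k1.get(i), k2.get(i));
                if (result != 0) return result;
            }
            // Assumption: on a shared prefix, the shorter key sorts first.
            return Integer.compare(k1.size(), k2.size());
        }
    }

    public static void main(String[] args) {
        List<Object> shortKey = List.of(1, 2);
        List<Object> longKey = List.of(1, 2, 3);

        GrowingComparator growing = new GrowingComparator();
        System.out.println(growing.compare(shortKey, shortKey)); // 0
        System.out.println(growing.compare(longKey, longKey));   // 0
        System.out.println(growing.compare(shortKey, longKey));  // -1

        FixedSizeComparator fixed = new FixedSizeComparator();
        fixed.compare(shortKey, shortKey);
        try {
            fixed.compare(longKey, longKey);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("fixed-size cache failed: " + e.getMessage());
        }
    }
}
```

The key order mirrors the SQL reproduction: the two-element arrays reach the comparator first and fix the cache size, and the three-element arrays then overrun it.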



--
This message was sent by Atlassian Jira
(v8.3.4#803005)