Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/04/08 12:18:00 UTC

[jira] [Work logged] (HIVE-24883) Add support for complex types columns in Hive Joins

     [ https://issues.apache.org/jira/browse/HIVE-24883?focusedWorklogId=579135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-579135 ]

ASF GitHub Bot logged work on HIVE-24883:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Apr/21 12:17
            Start Date: 08/Apr/21 12:17
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on a change in pull request #2071:
URL: https://github.com/apache/hive/pull/2071#discussion_r609622698



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java
##########
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {

Review comment:
       Having multiple top-level classes in a single source file provides no real advantage and can cause problems (see Item 25, "Limit source files to a single top-level class", in Effective Java).

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java
##########
@@ -0,0 +1,152 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {
+    // For List, all elements will have same type, so only one comparator is sufficient.
+    HiveWritableComparator comparator = null;
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        ArrayList a1 = (ArrayList) key1;
+        ArrayList a2 = (ArrayList) key2;
+        if (a1.size() != a2.size()) {
+            return a1.size() > a2.size() ? 1 : -1;
+        }
+        if (a1.size() == 0) {
+            return 0;
+        }
+
+        if (comparator == null) {
+            // For List, all elements should be of same type.
+            comparator = HiveWritableComparator.get(a1.get(0));
+        }
+
+        int result = 0;
+        for (int i = 0; i < a1.size(); i++) {
+            result = comparator.compare(a1.get(i), a2.get(i));
+            if (result != 0) {
+                return result;
+            }
+        }
+        return result;
+    }
+}
+
+class HiveStructComparator extends HiveWritableComparator {
+    HiveWritableComparator[] comparator = null;
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        ArrayList a1 = (ArrayList) key1;
+        ArrayList a2 = (ArrayList) key2;
+        if (a1.size() != a2.size()) {
+            return a1.size() > a2.size() ? 1 : -1;
+        }
+        if (a1.size() == 0) {
+            return 0;
+        }
+        if (comparator == null) {
+            comparator = new HiveWritableComparator[a1.size()];
+            // For struct all elements may not be of same type, so create comparator for each entry.
+            for (int i = 0; i < a1.size(); i++) {
+                comparator[i] = HiveWritableComparator.get(a1.get(i));
+            }
+        }
+        int result = 0;
+        for (int i = 0; i < a1.size(); i++) {
+            result = comparator[i].compare(a1.get(i), a2.get(i));
+            if (result != 0) {
+                return result;
+            }
+        }
+        return result;
+    }
+}
+
+class HiveMapComparator extends HiveWritableComparator {
+    HiveWritableComparator comparatorValue = null;
+    HiveWritableComparator comparatorKey = null;
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        LinkedHashMap map1 = (LinkedHashMap) key1;
+        LinkedHashMap map2 = (LinkedHashMap) key2;
+        if (map1.entrySet().size() != map2.entrySet().size()) {
+            return map1.entrySet().size() > map2.entrySet().size() ? 1 : -1;
+        }
+        if (map1.entrySet().size() == 0) {
+            return 0;
+        }
+
+        if (comparatorKey == null) {
+            comparatorKey = HiveWritableComparator.get(map1.keySet().iterator().next());
+            comparatorValue = HiveWritableComparator.get(map1.values().iterator().next());
+        }
+
+        int result = comparatorKey.compare(map1.keySet().iterator().next(),
+                map2.keySet().iterator().next());
+        if (result != 0) {
+            return result;
+        }
+        return comparatorValue.compare(map1.values().iterator().next(), map2.values().iterator().next());
+    }
+}
+
+public class HiveWritableComparator extends WritableComparator {

Review comment:
       Why is it necessary to introduce a new API? As far as I can see, `HiveWritableComparator` does not add any new behavior to `WritableComparator`. It only contains some factory methods, and these would fit much better in a final, immutable `ComplexWritableComparatorFactory` class.
   
   The non-public top-level classes above could become private static member classes of the factory class.
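
   A minimal sketch of that shape, under stated assumptions: `java.util.Comparator` stands in for Hadoop's `WritableComparator` so the example is self-contained, only the list case is shown, and the method name `listComparator` and the nested class `ListComparator` are hypothetical rather than taken from the patch.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: a final, stateless factory whose comparators are
// private static member classes instead of extra top-level classes.
// java.util.Comparator is a stand-in for Hadoop's WritableComparator.
final class ComplexWritableComparatorFactory {

    private ComplexWritableComparatorFactory() {
        // No instances: the class only exposes static factory methods.
    }

    // Hypothetical factory method; name and signature are assumptions.
    static <T> Comparator<List<T>> listComparator(Comparator<? super T> elementComparator) {
        return new ListComparator<>(elementComparator);
    }

    // Private static member class: not visible outside the factory.
    private static final class ListComparator<T> implements Comparator<List<T>> {
        // Supplied at construction time, so the comparator has no lazily
        // initialized mutable state.
        private final Comparator<? super T> elementComparator;

        ListComparator(Comparator<? super T> elementComparator) {
            this.elementComparator = elementComparator;
        }

        @Override
        public int compare(List<T> a1, List<T> a2) {
            // Same ordering as the patch: lists of different size are
            // ordered by size, otherwise elements are compared in order.
            if (a1.size() != a2.size()) {
                return a1.size() > a2.size() ? 1 : -1;
            }
            for (int i = 0; i < a1.size(); i++) {
                int result = elementComparator.compare(a1.get(i), a2.get(i));
                if (result != 0) {
                    return result;
                }
            }
            return 0;
        }
    }
}
```

   Passing the element comparator to the constructor is one possible design choice; it is what lets the nested class keep only final fields.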

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
##########
@@ -2594,7 +2594,9 @@ private boolean validateMapJoinDesc(MapJoinDesc desc) {
       return false;
     }
     List<ExprNodeDesc> keyExprs = desc.getKeys().get(posBigTable);
-    if (!validateExprNodeDesc(keyExprs, "Key")) {
+    if (!validateExprNodeDescNoComplex(keyExprs, "Key")) {

Review comment:
       A few lines up (2592-2593) there is a method that seems to control whether complex types are allowed. How do we choose which one to use?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HiveWritableComparator.java
##########
@@ -0,0 +1,152 @@
+package org.apache.hadoop.hive.ql.exec;
+
+import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.apache.hadoop.io.WritableComparable;
+import org.apache.hadoop.io.WritableComparator;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+
+class HiveListComparator extends HiveWritableComparator {
+    // For List, all elements will have same type, so only one comparator is sufficient.
+    HiveWritableComparator comparator = null;
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        ArrayList a1 = (ArrayList) key1;
+        ArrayList a2 = (ArrayList) key2;
+        if (a1.size() != a2.size()) {
+            return a1.size() > a2.size() ? 1 : -1;
+        }
+        if (a1.size() == 0) {
+            return 0;
+        }
+
+        if (comparator == null) {
+            // For List, all elements should be of same type.
+            comparator = HiveWritableComparator.get(a1.get(0));
+        }
+
+        int result = 0;
+        for (int i = 0; i < a1.size(); i++) {
+            result = comparator.compare(a1.get(i), a2.get(i));
+            if (result != 0) {
+                return result;
+            }
+        }
+        return result;
+    }
+}
+
+class HiveStructComparator extends HiveWritableComparator {
+    HiveWritableComparator[] comparator = null;
+
+    @Override
+    public int compare(Object key1, Object key2) {
+        ArrayList a1 = (ArrayList) key1;
+        ArrayList a2 = (ArrayList) key2;
+        if (a1.size() != a2.size()) {

Review comment:
       Is it possible to get an NPE? 
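
   If either key can be null at this point (the diff does not show the callers, so this is an open question), a guard ahead of the size check would avoid the NPE. A minimal sketch, assuming a nulls-first ordering chosen purely for illustration; the helper `NullSafeKeys.compareNullAndSize` is hypothetical and not part of the patch, and whether two null keys should compare as equal is a join-semantics decision this sketch does not settle.

```java
import java.util.List;

// Hypothetical helper, not part of the patch under review.
final class NullSafeKeys {

    private NullSafeKeys() {
        // Static helper only.
    }

    // Handles the null and size checks that precede the element-wise loop.
    // A return value of 0 for two non-null lists means "same size, continue
    // with the element-wise comparison".
    static int compareNullAndSize(List<?> a1, List<?> a2) {
        if (a1 == null || a2 == null) {
            if (a1 == a2) {
                return 0; // both null (assumption: treated as equal here)
            }
            return a1 == null ? -1 : 1; // assumption: nulls sort first
        }
        if (a1.size() != a2.size()) {
            return a1.size() > a2.size() ? 1 : -1;
        }
        return 0;
    }
}
```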





Issue Time Tracking
-------------------

    Worklog Id:     (was: 579135)
    Time Spent: 20m  (was: 10m)

> Add support for complex types columns in Hive Joins
> ---------------------------------------------------
>
>                 Key: HIVE-24883
>                 URL: https://issues.apache.org/jira/browse/HIVE-24883
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive fails to execute joins on array-type columns because the comparison functions cannot handle them.


