You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/10/04 06:25:33 UTC

[GitHub] [hive] sankarh commented on a change in pull request #2689: HIVE-25553: Support Map data-type natively in Arrow format

sankarh commented on a change in pull request #2689:
URL: https://github.com/apache/hive/pull/2689#discussion_r721068047



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -170,7 +171,7 @@ private static Field toField(String name, TypeInfo typeInfo) {
         for (int i = 0; i < structSize; i++) {
           structFields.add(toField(fieldNames.get(i), fieldTypeInfos.get(i)));
         }
-        return new Field(name, FieldType.nullable(MinorType.STRUCT.getType()), structFields);
+        return new Field(name, new FieldType(false, new ArrowType.Struct(), null), structFields);

Review comment:
       I think, this point holds good only if Struct is a child element type of Map column. If the column type is STRUCT, then it should be nullable. Can we make this change only for first case?

##########
File path: ql/src/java/org/apache/hadoop/hive/llap/WritableByteChannelAdapter.java
##########
@@ -93,7 +93,7 @@ public int write(ByteBuffer src) throws IOException {
     int size = src.remaining();
     //Down the semaphore or block until available
     takeWriteResources(1);
-    ByteBuf buf = allocator.buffer(size);
+    ByteBuf buf = allocator.getAsByteBufAllocator().buffer(size);

Review comment:
       Why this change relevant to this jira?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##########
@@ -226,7 +226,7 @@ public ArrowWrapperWritable serializeBatch(VectorizedRowBatch vectorizedRowBatch
   }
 
   private static FieldType toFieldType(TypeInfo typeInfo) {
-    return new FieldType(true, toArrowType(typeInfo), null);
+    return new FieldType(false, toArrowType(typeInfo), null);

Review comment:
       I think, even this change should come only if list or struct are elements of other complex data types. We cannot make it global for all data types.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##########
@@ -185,7 +186,7 @@ private static Field toField(String name, TypeInfo typeInfo) {
         final TypeInfo keyTypeInfo = mapTypeInfo.getMapKeyTypeInfo();
         final TypeInfo valueTypeInfo = mapTypeInfo.getMapValueTypeInfo();
         final StructTypeInfo mapStructTypeInfo = new StructTypeInfo();
-        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("keys", "values"));
+        mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("key", "value"));

Review comment:
       Make sense.

##########
File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java
##########
@@ -123,8 +123,8 @@ public static void afterTest() {
     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and parquet) for the llap-ext-client.

Review comment:
       This comment doesn't make sense here as this test is not specific to llap-ext-client. Instead mention the Hive JIRA which would solve this issue.

##########
File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapArrow.java
##########
@@ -123,8 +123,8 @@ public static void afterTest() {
     return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and parquet) for the llap-ext-client.

Review comment:
       Ideally, anything that works for ORC should work for non-ORC as well. Pls enable the tests on Map data type and see if it works after Arrow upgrade.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org