Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/14 14:49:49 UTC

[GitHub] [iceberg] pvary commented on a change in pull request #2078: Hive: Fix Deserializer to use source deserializer instead of the Iceberg ones

pvary commented on a change in pull request #2078:
URL: https://github.com/apache/iceberg/pull/2078#discussion_r557449942



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/Deserializer.java
##########
@@ -78,62 +82,55 @@ Record deserialize(Object data) {
     return (Record) fieldDeserializer.value(data);
   }
 
-  private Deserializer(Schema schema, ObjectInspector fieldInspector) {
-    this.fieldDeserializer = DeserializerVisitor.visit(schema, fieldInspector);
+  private Deserializer(Schema schema, ObjectInspectorPair pair) {
+    this.fieldDeserializer = DeserializerVisitor.visit(schema, pair);
   }
 
-  private static class DeserializerVisitor extends SchemaWithPartnerVisitor<ObjectInspector, FieldDeserializer> {
+  private static class DeserializerVisitor extends SchemaWithPartnerVisitor<ObjectInspectorPair, FieldDeserializer> {
 
-    public static FieldDeserializer visit(Schema schema, ObjectInspector objectInspector) {
-      return visit(schema, new FixNameMappingObjectInspector(schema, objectInspector), new DeserializerVisitor(),
+    public static FieldDeserializer visit(Schema schema, ObjectInspectorPair pair) {
+      return visit(schema, new FixNameMappingObjectInspectorPair(schema, pair), new DeserializerVisitor(),
           new PartnerObjectInspectorByNameAccessors());
     }
 
     @Override
-    public FieldDeserializer schema(Schema schema, ObjectInspector inspector, FieldDeserializer deserializer) {
+    public FieldDeserializer schema(Schema schema, ObjectInspectorPair pair, FieldDeserializer deserializer) {
       return deserializer;
     }
 
     @Override
-    public FieldDeserializer field(NestedField field, ObjectInspector inspector, FieldDeserializer deserializer) {
+    public FieldDeserializer field(NestedField field, ObjectInspectorPair pair, FieldDeserializer deserializer) {
       return deserializer;
     }
 
     @Override
-    public FieldDeserializer primitive(PrimitiveType type, ObjectInspector inspector) {
-      switch (type.typeId()) {
-        case BOOLEAN:
-        case INTEGER:
-        case LONG:
-        case FLOAT:
-        case DOUBLE:
-        case STRING:
-          // Generic conversions where Iceberg and Hive are using the same java object
-          return o -> ((PrimitiveObjectInspector) inspector).getPrimitiveJavaObject(o);
-        case UUID:
-          // TODO: This will not work with Parquet. Parquet UUID expect byte[], others are expecting UUID
-          return o -> UUID.fromString(((StringObjectInspector) inspector).getPrimitiveJavaObject(o));
-        case DATE:
-        case TIMESTAMP:
-        case FIXED:
-        case BINARY:
-        case DECIMAL:
-          // Iceberg specific conversions
-          return o -> ((WriteObjectInspector) inspector).convert(o);
-        case TIME:
-        default:
-          throw new IllegalArgumentException("Unsupported column type: " + type);
-      }
+    public FieldDeserializer primitive(PrimitiveType type, ObjectInspectorPair pair) {
+      return o -> {
+        if (o == null) {
+          return null;
+        }
+
+        ObjectInspector writerFieldInspector = pair.writerInspector();
+        ObjectInspector sourceFieldInspector = pair.sourceInspector();
+
+        Object result = ((PrimitiveObjectInspector) sourceFieldInspector).getPrimitiveJavaObject(o);
+        if (writerFieldInspector instanceof WriteObjectInspector) {

Review comment:
       The original version passed the Writable to `convert()` as its input. In this new version, `convert()` receives the native Hive object (the result of `getPrimitiveJavaObject()`) instead.
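
       To make the difference concrete, here is a minimal, self-contained sketch of the two-step flow described above: the source side first unwraps the raw value into a native object, and only then does the writer side convert it. Everything in it is a hypothetical stand-in, not the Hive or Iceberg API: `SourceInspector`, `ConvertingInspector`, and `TextLike` are illustration-only names, and the body of the `instanceof` branch is an assumption, because the hunk above is cut off before it.

```java
import java.util.UUID;
import java.util.function.Function;

final class DeserializerSketch {
  // Hypothetical stand-in for a source-side inspector: unwraps the raw value
  // (for example a Writable) into a native Java object.
  interface SourceInspector {
    Object getPrimitiveJavaObject(Object data);
  }

  // Hypothetical stand-in for a writer-side inspector that has a convert() step.
  interface ConvertingInspector {
    Object convert(Object nativeObject);
  }

  // Hypothetical Writable-like wrapper around a String.
  static final class TextLike {
    final String value;

    TextLike(String value) {
      this.value = value;
    }
  }

  // Builds a per-field deserializer. The writer is typed as Object so that
  // inspectors without a convert() step fall through unchanged.
  static Function<Object, Object> primitive(SourceInspector source, Object writer) {
    return o -> {
      if (o == null) {
        return null;
      }

      // Step 1: the source inspector produces the native object.
      Object result = source.getPrimitiveJavaObject(o);

      // Step 2 (assumed): convert() sees the native object, not the wrapper.
      if (writer instanceof ConvertingInspector) {
        result = ((ConvertingInspector) writer).convert(result);
      }
      return result;
    };
  }

  public static void main(String[] args) {
    SourceInspector source = data -> ((TextLike) data).value;
    ConvertingInspector writer = nativeObject -> UUID.fromString((String) nativeObject);

    Function<Object, Object> deserializer = primitive(source, writer);
    Object converted = deserializer.apply(new TextLike("123e4567-e89b-12d3-a456-426614174000"));
    System.out.println(converted.getClass().getSimpleName() + ": " + converted);
  }
}
```

       In this sketch the converter never sees `TextLike`, only the `String` extracted from it, so the same `convert()` works no matter which wrapper type the source side produced. That is the behavior change the comment points out.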



