You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/15 17:31:53 UTC

[GitHub] [iceberg] edgarRd opened a new pull request #1208: ORC: Add name mapping support

edgarRd opened a new pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208


   This PR fixes #1204 implementing name mapping for ORC.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r467673301



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestExpressionToSearchArgument.java
##########
@@ -262,4 +264,132 @@ public void testEvolvedSchema() {
     actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
     Assert.assertEquals(expected.toString(), actual.toString());
   }
+
+  @Test
+  public void testOriginalSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long", Types.LongType.get())
+    );
+
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(originalSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(originalSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(equal("int", 1), equal("long", 1));
+    Expression boundFilter = Binder.bind(originalSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .equals("`long`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedSimpleSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long_to_be_dropped", Types.LongType.get())
+    );
+    Schema mappingSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(3, "new_float_field", Types.FloatType.get())
+    );
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(mappingSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(mappingSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = equal("int", 1);
+    Expression boundFilter = Binder.bind(mappingSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+
+    // for columns not in the file, buildOrcProjection will append field names with _r<ID>
+    // this will be passed down to ORC, but ORC will handle such cases and return a TruthValue during evaluation
+    expr = equal("new_float_field", 1);
+    boundFilter = Binder.bind(mappingSchema.asStruct(), expr, true);
+    expected = SearchArgumentFactory.newBuilder()
+        .equals("`new_float_field_r3`", Type.FLOAT, 1.0)
+        .build();
+
+    actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedComplexSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        optional(1, "struct", Types.StructType.of(
+            required(2, "long", Types.LongType.get())
+        )),
+        optional(3, "list", Types.ListType.ofRequired(4, Types.LongType.get())),
+        optional(5, "map", Types.MapType.ofRequired(6, 7, Types.LongType.get(), Types.LongType.get())),
+        optional(8, "listOfStruct", Types.ListType.ofRequired(9, Types.StructType.of(
+            required(10, "long", Types.LongType.get())))),
+        optional(11, "listOfPeople", Types.ListType.ofRequired(12, Types.StructType.of(
+            required(13, "name", Types.StringType.get()),
+            required(14, "birth_date", Types.DateType.get()))))
+    );
+    Schema mappingSchema = new Schema(
+        optional(1, "struct", Types.StructType.of(
+            required(2, "int", Types.LongType.get())
+        )),
+        optional(3, "list", Types.ListType.ofRequired(4, Types.LongType.get())),
+        optional(5, "newMap", Types.MapType.ofRequired(6, 7, Types.StringType.get(), Types.LongType.get())),
+        optional(8, "listOfStruct", Types.ListType.ofRequired(9, Types.StructType.of(
+            required(10, "newLong", Types.LongType.get())))),
+        optional(11, "listOfPeople", Types.ListType.ofRequired(12, Types.StructType.of(
+            required(13, "name", Types.StringType.get()),
+            required(14, "age", Types.IntegerType.get()))))
+    );
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(mappingSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(mappingSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(
+        and(
+            equal("struct.int", 1), and(
+                lessThanOrEqual("list.element", 5),
+                equal("newMap.key", "country")
+            ),
+            and(
+                equal("listOfStruct.newLong", 100L),
+                notEqual("listOfPeople.name", "Bob")
+            )
+
+        ),
+        lessThan("listOfPeople.age", 30)
+    );
+    Expression boundFilter = Binder.bind(mappingSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .startAnd()
+        // Drops strict.long

Review comment:
       Nit: Typo `struct.long`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-675701942


   Thanks @edgarRd . I just merged this!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460375966



##########
File path: orc/src/main/java/org/apache/iceberg/orc/ApplyNameMapping.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class ApplyNameMapping extends OrcSchemaVisitor<TypeDescription> {
+  private final NameMapping nameMapping;
+
+  ApplyNameMapping(NameMapping nameMapping) {
+    this.nameMapping = nameMapping;
+  }
+
+  @Override
+  public String elementName() {
+    return "element";
+  }
+
+  @Override
+  public String keyName() {
+    return "key";
+  }
+
+  @Override
+  public String valueName() {
+    return "value";
+  }
+
+  TypeDescription setId(TypeDescription type, MappedField mappedField) {
+    if (mappedField != null) {

Review comment:
       I think this works as we are using the Iceberg read schema for building the projection.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r459805494



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestORCSchemaUtil.java
##########
@@ -269,4 +278,68 @@ public void testSkipNonIcebergColumns() {
     );
     assertEquals("Schemas must match.", expectedSchema2.asStruct(), icebergSchema2.asStruct());
   }
+
+  @Test
+  public void testHasIds() {
+    Schema schema = new Schema(
+        optional(1, "data", Types.StructType.of(
+            optional(10, "entries", Types.MapType.ofOptional(11, 12, Types.StringType.get(), Types.DateType.get()))
+        )),
+        optional(2, "intCol", Types.IntegerType.get()),
+        optional(3, "longCol", Types.LongType.get()),
+        optional(4, "listCol", Types.ListType.ofOptional(40, Types.DoubleType.get()))
+    );
+
+    TypeDescription orcSchema = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(schema));
+    assertFalse("Should not have Ids", ORCSchemaUtil.hasIds(orcSchema));
+
+    TypeDescription map2Col = TypeDescription.createMap(TypeDescription.createString(), TypeDescription.createBinary());
+    map2Col.setAttribute(ICEBERG_ID_ATTRIBUTE, "4");
+    orcSchema.addField("map2Col", map2Col);
+    assertTrue("Should have Ids after adding one type with Id", ORCSchemaUtil.hasIds(orcSchema));
+  }
+
+  @Test
+  public void tetsAssignIdsByNameMapping() {
+    Types.StructType structType = Types.StructType.of(
+        required(0, "id", Types.LongType.get()),
+        optional(1, "list_of_maps",
+            Types.ListType.ofOptional(2, Types.MapType.ofOptional(3, 4,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))),
+        optional(5, "map_of_lists",
+            Types.MapType.ofOptional(6, 7,
+                Types.StringType.get(),
+                Types.ListType.ofOptional(8, SUPPORTED_PRIMITIVES))),
+        required(9, "list_of_lists",
+            Types.ListType.ofOptional(10, Types.ListType.ofOptional(11, SUPPORTED_PRIMITIVES))),
+        required(12, "map_of_maps",
+            Types.MapType.ofOptional(13, 14,
+                Types.StringType.get(),
+                Types.MapType.ofOptional(15, 16,
+                    Types.StringType.get(),
+                    SUPPORTED_PRIMITIVES))),
+        required(17, "list_of_struct_of_nested_types", Types.ListType.ofOptional(19, Types.StructType.of(
+            Types.NestedField.required(20, "m1", Types.MapType.ofOptional(21, 22,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(23, "l1", Types.ListType.ofRequired(24, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.required(25, "l2", Types.ListType.ofRequired(26, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(27, "m2", Types.MapType.ofOptional(28, 29,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))
+        )))
+    );
+
+    Schema schema = new Schema(TypeUtil.assignFreshIds(structType, new AtomicInteger(0)::incrementAndGet)
+        .asStructType().fields());
+
+    NameMapping nameMapping = MappingUtil.create(schema);
+    TypeDescription typeDescriptionWithIds = ORCSchemaUtil.convert(schema);
+    TypeDescription typeDescriptionWithIdsFromNameMapping = ORCSchemaUtil
+        .applyNameMapping(ORCSchemaUtil.removeIds(typeDescriptionWithIds), nameMapping);
+
+    assertTrue("TypeDescription schemas should be equal, not comparing Attributes",
+        typeDescriptionWithIds.equals(typeDescriptionWithIdsFromNameMapping, false /* checkAttributes */));

Review comment:
       Name mapping - using an Iceberg schema - will select the columns from an ORC schema that match the column names from Iceberg, skipping anything that does not match and adds the ids found to produce the reading schema. So, I think the effect of name mapping is still being tested in this case since we're checking the the `TypeDescription` generated with name mapping is the expected one. That being said, I've added a check for the IDs as well. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460376124



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
##########
@@ -73,7 +78,15 @@
   public CloseableIterator<T> iterator() {
     Reader orcFileReader = ORC.newFileReader(file, config);
     addCloseable(orcFileReader);
-    TypeDescription readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, orcFileReader.getSchema());
+
+    TypeDescription fileSchema = orcFileReader.getSchema();
+    final TypeDescription readOrcSchema;
+    if (ORCSchemaUtil.hasIds(fileSchema)) {
+      readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, fileSchema);
+    } else {
+      TypeDescription typeWithIds = ORCSchemaUtil.applyNameMapping(fileSchema, nameMapping);

Review comment:
       Seems like `buildOrcProjection` is handling it by traversing iceberg read schema to build the read projection




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-665856176


   @rdsr @shardulm94 I think this is ready for another review. PTAL when you have a chance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r461305817



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestExpressionToSearchArgument.java
##########
@@ -262,4 +264,65 @@ public void testEvolvedSchema() {
     actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
     Assert.assertEquals(expected.toString(), actual.toString());
   }
+
+  @Test
+  public void testOriginalSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long", Types.LongType.get())
+    );
+
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(originalSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(originalSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(equal("int", 1), equal("long", 1));
+    Expression boundFilter = Binder.bind(originalSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .equals("`long`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedSchemaNameMapping() {

Review comment:
       I've added a couple tests to address this.

##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
##########
@@ -51,14 +54,16 @@
   private final boolean caseSensitive;
   private final Function<TypeDescription, OrcBatchReader<?>> batchReaderFunction;
   private final int recordsPerBatch;
+  private final NameMapping nameMapping;
 
   OrcIterable(InputFile file, Configuration config, Schema schema,
-              Long start, Long length,
+              NameMapping nameMapping, Long start, Long length,
               Function<TypeDescription, OrcRowReader<?>> readerFunction, boolean caseSensitive, Expression filter,
               Function<TypeDescription, OrcBatchReader<?>> batchReaderFunction, int recordsPerBatch) {
     this.schema = schema;
     this.readerFunction = readerFunction;
     this.file = file;
+    this.nameMapping = Optional.ofNullable(nameMapping).orElse(MappingUtil.create(schema));

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460356094



##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
##########
@@ -51,14 +54,16 @@
   private final boolean caseSensitive;
   private final Function<TypeDescription, OrcBatchReader<?>> batchReaderFunction;
   private final int recordsPerBatch;
+  private final NameMapping nameMapping;
 
   OrcIterable(InputFile file, Configuration config, Schema schema,
-              Long start, Long length,
+              NameMapping nameMapping, Long start, Long length,
               Function<TypeDescription, OrcRowReader<?>> readerFunction, boolean caseSensitive, Expression filter,
               Function<TypeDescription, OrcBatchReader<?>> batchReaderFunction, int recordsPerBatch) {
     this.schema = schema;
     this.readerFunction = readerFunction;
     this.file = file;
+    this.nameMapping = Optional.ofNullable(nameMapping).orElse(MappingUtil.create(schema));

Review comment:
       Should we also check if the file schema does not have ids before initializing nameMapping?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-658902179


   @rdblue @rdsr @shardulm94 PTAL when you have a chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r459805494



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestORCSchemaUtil.java
##########
@@ -269,4 +278,68 @@ public void testSkipNonIcebergColumns() {
     );
     assertEquals("Schemas must match.", expectedSchema2.asStruct(), icebergSchema2.asStruct());
   }
+
+  @Test
+  public void testHasIds() {
+    Schema schema = new Schema(
+        optional(1, "data", Types.StructType.of(
+            optional(10, "entries", Types.MapType.ofOptional(11, 12, Types.StringType.get(), Types.DateType.get()))
+        )),
+        optional(2, "intCol", Types.IntegerType.get()),
+        optional(3, "longCol", Types.LongType.get()),
+        optional(4, "listCol", Types.ListType.ofOptional(40, Types.DoubleType.get()))
+    );
+
+    TypeDescription orcSchema = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(schema));
+    assertFalse("Should not have Ids", ORCSchemaUtil.hasIds(orcSchema));
+
+    TypeDescription map2Col = TypeDescription.createMap(TypeDescription.createString(), TypeDescription.createBinary());
+    map2Col.setAttribute(ICEBERG_ID_ATTRIBUTE, "4");
+    orcSchema.addField("map2Col", map2Col);
+    assertTrue("Should have Ids after adding one type with Id", ORCSchemaUtil.hasIds(orcSchema));
+  }
+
+  @Test
+  public void tetsAssignIdsByNameMapping() {
+    Types.StructType structType = Types.StructType.of(
+        required(0, "id", Types.LongType.get()),
+        optional(1, "list_of_maps",
+            Types.ListType.ofOptional(2, Types.MapType.ofOptional(3, 4,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))),
+        optional(5, "map_of_lists",
+            Types.MapType.ofOptional(6, 7,
+                Types.StringType.get(),
+                Types.ListType.ofOptional(8, SUPPORTED_PRIMITIVES))),
+        required(9, "list_of_lists",
+            Types.ListType.ofOptional(10, Types.ListType.ofOptional(11, SUPPORTED_PRIMITIVES))),
+        required(12, "map_of_maps",
+            Types.MapType.ofOptional(13, 14,
+                Types.StringType.get(),
+                Types.MapType.ofOptional(15, 16,
+                    Types.StringType.get(),
+                    SUPPORTED_PRIMITIVES))),
+        required(17, "list_of_struct_of_nested_types", Types.ListType.ofOptional(19, Types.StructType.of(
+            Types.NestedField.required(20, "m1", Types.MapType.ofOptional(21, 22,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(23, "l1", Types.ListType.ofRequired(24, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.required(25, "l2", Types.ListType.ofRequired(26, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(27, "m2", Types.MapType.ofOptional(28, 29,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))
+        )))
+    );
+
+    Schema schema = new Schema(TypeUtil.assignFreshIds(structType, new AtomicInteger(0)::incrementAndGet)
+        .asStructType().fields());
+
+    NameMapping nameMapping = MappingUtil.create(schema);
+    TypeDescription typeDescriptionWithIds = ORCSchemaUtil.convert(schema);
+    TypeDescription typeDescriptionWithIdsFromNameMapping = ORCSchemaUtil
+        .applyNameMapping(ORCSchemaUtil.removeIds(typeDescriptionWithIds), nameMapping);
+
+    assertTrue("TypeDescription schemas should be equal, not comparing Attributes",
+        typeDescriptionWithIds.equals(typeDescriptionWithIdsFromNameMapping, false /* checkAttributes */));

Review comment:
       Name mapping - using an Iceberg schema - will select the columns from an ORC schema that match the columns skipping anything that does not match and adds the ids found to produce the reading schema. So, I think the effect of name mapping is still being tested in this case since we're checking the the `TypeDescription` generated with name mapping is the expected one. That being said, I've added a check for the IDs as well. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-668138126


   @rdblue @rdsr Let me know if there's any comment. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460364692



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestExpressionToSearchArgument.java
##########
@@ -262,4 +264,65 @@ public void testEvolvedSchema() {
     actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
     Assert.assertEquals(expected.toString(), actual.toString());
   }
+
+  @Test
+  public void testOriginalSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long", Types.LongType.get())
+    );
+
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(originalSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(originalSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(equal("int", 1), equal("long", 1));
+    Expression boundFilter = Binder.bind(originalSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .equals("`long`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedSchemaNameMapping() {

Review comment:
       can we have a test case for where complex fields are also pruned? For instance if all fields of a struct are pruned, the struct is pruned. Similarly for maps and lists?
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r459193057



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestORCSchemaUtil.java
##########
@@ -269,4 +278,68 @@ public void testSkipNonIcebergColumns() {
     );
     assertEquals("Schemas must match.", expectedSchema2.asStruct(), icebergSchema2.asStruct());
   }
+
+  @Test
+  public void testHasIds() {
+    Schema schema = new Schema(
+        optional(1, "data", Types.StructType.of(
+            optional(10, "entries", Types.MapType.ofOptional(11, 12, Types.StringType.get(), Types.DateType.get()))
+        )),
+        optional(2, "intCol", Types.IntegerType.get()),
+        optional(3, "longCol", Types.LongType.get()),
+        optional(4, "listCol", Types.ListType.ofOptional(40, Types.DoubleType.get()))
+    );
+
+    TypeDescription orcSchema = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(schema));
+    assertFalse("Should not have Ids", ORCSchemaUtil.hasIds(orcSchema));
+
+    TypeDescription map2Col = TypeDescription.createMap(TypeDescription.createString(), TypeDescription.createBinary());
+    map2Col.setAttribute(ICEBERG_ID_ATTRIBUTE, "4");
+    orcSchema.addField("map2Col", map2Col);
+    assertTrue("Should have Ids after adding one type with Id", ORCSchemaUtil.hasIds(orcSchema));
+  }
+
+  @Test
+  public void tetsAssignIdsByNameMapping() {
+    Types.StructType structType = Types.StructType.of(
+        required(0, "id", Types.LongType.get()),
+        optional(1, "list_of_maps",
+            Types.ListType.ofOptional(2, Types.MapType.ofOptional(3, 4,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))),
+        optional(5, "map_of_lists",
+            Types.MapType.ofOptional(6, 7,
+                Types.StringType.get(),
+                Types.ListType.ofOptional(8, SUPPORTED_PRIMITIVES))),
+        required(9, "list_of_lists",
+            Types.ListType.ofOptional(10, Types.ListType.ofOptional(11, SUPPORTED_PRIMITIVES))),
+        required(12, "map_of_maps",
+            Types.MapType.ofOptional(13, 14,
+                Types.StringType.get(),
+                Types.MapType.ofOptional(15, 16,
+                    Types.StringType.get(),
+                    SUPPORTED_PRIMITIVES))),
+        required(17, "list_of_struct_of_nested_types", Types.ListType.ofOptional(19, Types.StructType.of(
+            Types.NestedField.required(20, "m1", Types.MapType.ofOptional(21, 22,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(23, "l1", Types.ListType.ofRequired(24, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.required(25, "l2", Types.ListType.ofRequired(26, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(27, "m2", Types.MapType.ofOptional(28, 29,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))
+        )))
+    );
+
+    Schema schema = new Schema(TypeUtil.assignFreshIds(structType, new AtomicInteger(0)::incrementAndGet)
+        .asStructType().fields());
+
+    NameMapping nameMapping = MappingUtil.create(schema);
+    TypeDescription typeDescriptionWithIds = ORCSchemaUtil.convert(schema);
+    TypeDescription typeDescriptionWithIdsFromNameMapping = ORCSchemaUtil
+        .applyNameMapping(ORCSchemaUtil.removeIds(typeDescriptionWithIds), nameMapping);
+
+    assertTrue("TypeDescription schemas should be equal, not comparing Attributes",
+        typeDescriptionWithIds.equals(typeDescriptionWithIdsFromNameMapping, false /* checkAttributes */));

Review comment:
       If we are not comparing the attributes here, is there any value to adding this assert? Seems like name mapping's effect (assign ids) is not actually being tested in this case. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r460360925



##########
File path: orc/src/main/java/org/apache/iceberg/orc/ApplyNameMapping.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class ApplyNameMapping extends OrcSchemaVisitor<TypeDescription> {
+  private final NameMapping nameMapping;
+
+  ApplyNameMapping(NameMapping nameMapping) {
+    this.nameMapping = nameMapping;
+  }
+
+  @Override
+  public String elementName() {
+    return "element";
+  }
+
+  @Override
+  public String keyName() {
+    return "key";
+  }
+
+  @Override
+  public String valueName() {
+    return "value";
+  }
+
+  TypeDescription setId(TypeDescription type, MappedField mappedField) {
+    if (mappedField != null) {
+      type.setAttribute(ORCSchemaUtil.ICEBERG_ID_ATTRIBUTE, mappedField.id().toString());
+    }
+    return type;
+  }
+
+  @Override
+  public TypeDescription record(TypeDescription record, List<String> names, List<TypeDescription> fields) {
+    Preconditions.checkArgument(names.size() == fields.size(), "All fields must have names");
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription structType = TypeDescription.createStruct();
+
+    for (int i = 0; i < fields.size(); i++) {
+      String fieldName = names.get(i);
+      TypeDescription fieldType = fields.get(i);
+      if (fieldType != null) {
+        structType.addField(fieldName, fieldType);
+      }
+    }
+    return setId(structType, field);
+  }
+
+  @Override
+  public TypeDescription list(TypeDescription array, TypeDescription element) {
+    Preconditions.checkArgument(element != null, "List type must have element type");
+
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription listType = TypeDescription.createList(element);
+    return setId(listType, field);
+  }
+
+  @Override
+  public TypeDescription map(TypeDescription map, TypeDescription key, TypeDescription value) {
+    Preconditions.checkArgument(key != null && value != null, "Map type must have both key and value types");
+
+    MappedField field = nameMapping.find(currentPath());
+    TypeDescription mapType = TypeDescription.createMap(key, value);
+    return setId(mapType, field);
+  }
+
+  @Override
+  public TypeDescription primitive(TypeDescription primitive) {
+    MappedField field = nameMapping.find(currentPath());
+    return setId(primitive.clone(), field);

Review comment:
       Why do we need to call clone here?

##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestExpressionToSearchArgument.java
##########
@@ -262,4 +264,65 @@ public void testEvolvedSchema() {
     actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
     Assert.assertEquals(expected.toString(), actual.toString());
   }
+
+  @Test
+  public void testOriginalSchemaNameMapping() {
+    Schema originalSchema = new Schema(
+        required(1, "int", Types.IntegerType.get()),
+        optional(2, "long", Types.LongType.get())
+    );
+
+    TypeDescription orcSchemaWithoutIds = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(originalSchema));
+    NameMapping nameMapping = MappingUtil.create(originalSchema);
+
+    TypeDescription readSchema = ORCSchemaUtil.buildOrcProjection(originalSchema,
+        ORCSchemaUtil.applyNameMapping(orcSchemaWithoutIds, nameMapping));
+
+    Expression expr = and(equal("int", 1), equal("long", 1));
+    Expression boundFilter = Binder.bind(originalSchema.asStruct(), expr, true);
+    SearchArgument expected = SearchArgumentFactory.newBuilder()
+        .equals("`int`", Type.LONG, 1L)
+        .equals("`long`", Type.LONG, 1L)
+        .build();
+
+    SearchArgument actual = ExpressionToSearchArgument.convert(boundFilter, readSchema);
+    Assert.assertEquals(expected.toString(), actual.toString());
+  }
+
+  @Test
+  public void testModifiedSchemaNameMapping() {

Review comment:
       can we have aa test case for where complex fields are also pruned? For instance if all fields of a struct are pruned, the struct is pruned. Similarly for maps and lists?
   
   

##########
File path: orc/src/main/java/org/apache/iceberg/orc/ApplyNameMapping.java
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.mapping.MappedField;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class ApplyNameMapping extends OrcSchemaVisitor<TypeDescription> {
+  private final NameMapping nameMapping;
+
+  ApplyNameMapping(NameMapping nameMapping) {
+    this.nameMapping = nameMapping;
+  }
+
+  @Override
+  public String elementName() {
+    return "element";
+  }
+
+  @Override
+  public String keyName() {
+    return "key";
+  }
+
+  @Override
+  public String valueName() {
+    return "value";
+  }
+
+  TypeDescription setId(TypeDescription type, MappedField mappedField) {
+    if (mappedField != null) {

Review comment:
       shouldn't a 'type' be pruned if a mappedField is missing or null?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcSchemaVisitor.java
##########
@@ -83,9 +112,53 @@
     return visitor.record(record, names, visitFields(fields, names, visitor));
   }
 
-  public void beforeField(String name, TypeDescription type) {}
+  public String elementName() {
+    return "_elem";

Review comment:
       Why underscores?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/RemoveIds.java
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.orc;
+
+import java.util.List;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.orc.TypeDescription;
+
+class RemoveIds extends OrcSchemaVisitor<TypeDescription> {
+
+  @Override
+  public TypeDescription record(TypeDescription record, List<String> names, List<TypeDescription> fields) {
+    Preconditions.checkArgument(names.size() == fields.size(), "All fields must have names.");
+    TypeDescription struct = TypeDescription.createStruct();
+
+    for (int i = 0; i < fields.size(); i++) {
+      struct.addField(names.get(i), fields.get(i));
+    }
+    return struct;
+  }
+
+  @Override
+  public TypeDescription list(TypeDescription array, TypeDescription element) {
+    return TypeDescription.createList(element);
+  }
+
+  @Override
+  public TypeDescription map(TypeDescription map, TypeDescription key, TypeDescription value) {
+    return TypeDescription.createMap(key, value);
+  }
+
+  @Override
+  public TypeDescription primitive(TypeDescription primitive) {
+    return removeIcebergAttributes(primitive.clone());
+  }
+
+  private static TypeDescription removeIcebergAttributes(TypeDescription orcType) {
+    orcType.removeAttribute(ORCSchemaUtil.ICEBERG_ID_ATTRIBUTE);

Review comment:
       Why don't we only remove the ID attribute?

##########
File path: orc/src/main/java/org/apache/iceberg/orc/OrcIterable.java
##########
@@ -73,7 +78,15 @@
   public CloseableIterator<T> iterator() {
     Reader orcFileReader = ORC.newFileReader(file, config);
     addCloseable(orcFileReader);
-    TypeDescription readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, orcFileReader.getSchema());
+
+    TypeDescription fileSchema = orcFileReader.getSchema();
+    final TypeDescription readOrcSchema;
+    if (ORCSchemaUtil.hasIds(fileSchema)) {
+      readOrcSchema = ORCSchemaUtil.buildOrcProjection(schema, fileSchema);
+    } else {
+      TypeDescription typeWithIds = ORCSchemaUtil.applyNameMapping(fileSchema, nameMapping);

Review comment:
       IIRC, nameMapping may not have all the fields which maybe present in the file schema. I think we need a way to prune columns which does not have an associated mapping defined. I can't say whether `buildOrcProjection` is handling that case or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-675746766


   Thanks @rdsr and @shardulm94 for the reviews and merging!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-663321442


   I'll try to get to this over the weekend. Sorry for not getting back to you sooner, I'm a bit behind with reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r459805494



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestORCSchemaUtil.java
##########
@@ -269,4 +278,68 @@ public void testSkipNonIcebergColumns() {
     );
     assertEquals("Schemas must match.", expectedSchema2.asStruct(), icebergSchema2.asStruct());
   }
+
+  @Test
+  public void testHasIds() {
+    Schema schema = new Schema(
+        optional(1, "data", Types.StructType.of(
+            optional(10, "entries", Types.MapType.ofOptional(11, 12, Types.StringType.get(), Types.DateType.get()))
+        )),
+        optional(2, "intCol", Types.IntegerType.get()),
+        optional(3, "longCol", Types.LongType.get()),
+        optional(4, "listCol", Types.ListType.ofOptional(40, Types.DoubleType.get()))
+    );
+
+    TypeDescription orcSchema = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(schema));
+    assertFalse("Should not have Ids", ORCSchemaUtil.hasIds(orcSchema));
+
+    TypeDescription map2Col = TypeDescription.createMap(TypeDescription.createString(), TypeDescription.createBinary());
+    map2Col.setAttribute(ICEBERG_ID_ATTRIBUTE, "4");
+    orcSchema.addField("map2Col", map2Col);
+    assertTrue("Should have Ids after adding one type with Id", ORCSchemaUtil.hasIds(orcSchema));
+  }
+
+  @Test
+  public void tetsAssignIdsByNameMapping() {
+    Types.StructType structType = Types.StructType.of(
+        required(0, "id", Types.LongType.get()),
+        optional(1, "list_of_maps",
+            Types.ListType.ofOptional(2, Types.MapType.ofOptional(3, 4,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))),
+        optional(5, "map_of_lists",
+            Types.MapType.ofOptional(6, 7,
+                Types.StringType.get(),
+                Types.ListType.ofOptional(8, SUPPORTED_PRIMITIVES))),
+        required(9, "list_of_lists",
+            Types.ListType.ofOptional(10, Types.ListType.ofOptional(11, SUPPORTED_PRIMITIVES))),
+        required(12, "map_of_maps",
+            Types.MapType.ofOptional(13, 14,
+                Types.StringType.get(),
+                Types.MapType.ofOptional(15, 16,
+                    Types.StringType.get(),
+                    SUPPORTED_PRIMITIVES))),
+        required(17, "list_of_struct_of_nested_types", Types.ListType.ofOptional(19, Types.StructType.of(
+            Types.NestedField.required(20, "m1", Types.MapType.ofOptional(21, 22,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(23, "l1", Types.ListType.ofRequired(24, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.required(25, "l2", Types.ListType.ofRequired(26, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(27, "m2", Types.MapType.ofOptional(28, 29,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))
+        )))
+    );
+
+    Schema schema = new Schema(TypeUtil.assignFreshIds(structType, new AtomicInteger(0)::incrementAndGet)
+        .asStructType().fields());
+
+    NameMapping nameMapping = MappingUtil.create(schema);
+    TypeDescription typeDescriptionWithIds = ORCSchemaUtil.convert(schema);
+    TypeDescription typeDescriptionWithIdsFromNameMapping = ORCSchemaUtil
+        .applyNameMapping(ORCSchemaUtil.removeIds(typeDescriptionWithIds), nameMapping);
+
+    assertTrue("TypeDescription schemas should be equal, not comparing Attributes",
+        typeDescriptionWithIds.equals(typeDescriptionWithIdsFromNameMapping, false /* checkAttributes */));

Review comment:
       I've added a check for the IDs as well. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-673150699


   @rdblue do you have any additional comment? Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdsr merged pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
rdsr merged pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#discussion_r459193057



##########
File path: orc/src/test/java/org/apache/iceberg/orc/TestORCSchemaUtil.java
##########
@@ -269,4 +278,68 @@ public void testSkipNonIcebergColumns() {
     );
     assertEquals("Schemas must match.", expectedSchema2.asStruct(), icebergSchema2.asStruct());
   }
+
+  @Test
+  public void testHasIds() {
+    Schema schema = new Schema(
+        optional(1, "data", Types.StructType.of(
+            optional(10, "entries", Types.MapType.ofOptional(11, 12, Types.StringType.get(), Types.DateType.get()))
+        )),
+        optional(2, "intCol", Types.IntegerType.get()),
+        optional(3, "longCol", Types.LongType.get()),
+        optional(4, "listCol", Types.ListType.ofOptional(40, Types.DoubleType.get()))
+    );
+
+    TypeDescription orcSchema = ORCSchemaUtil.removeIds(ORCSchemaUtil.convert(schema));
+    assertFalse("Should not have Ids", ORCSchemaUtil.hasIds(orcSchema));
+
+    TypeDescription map2Col = TypeDescription.createMap(TypeDescription.createString(), TypeDescription.createBinary());
+    map2Col.setAttribute(ICEBERG_ID_ATTRIBUTE, "4");
+    orcSchema.addField("map2Col", map2Col);
+    assertTrue("Should have Ids after adding one type with Id", ORCSchemaUtil.hasIds(orcSchema));
+  }
+
+  @Test
+  public void tetsAssignIdsByNameMapping() {
+    Types.StructType structType = Types.StructType.of(
+        required(0, "id", Types.LongType.get()),
+        optional(1, "list_of_maps",
+            Types.ListType.ofOptional(2, Types.MapType.ofOptional(3, 4,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))),
+        optional(5, "map_of_lists",
+            Types.MapType.ofOptional(6, 7,
+                Types.StringType.get(),
+                Types.ListType.ofOptional(8, SUPPORTED_PRIMITIVES))),
+        required(9, "list_of_lists",
+            Types.ListType.ofOptional(10, Types.ListType.ofOptional(11, SUPPORTED_PRIMITIVES))),
+        required(12, "map_of_maps",
+            Types.MapType.ofOptional(13, 14,
+                Types.StringType.get(),
+                Types.MapType.ofOptional(15, 16,
+                    Types.StringType.get(),
+                    SUPPORTED_PRIMITIVES))),
+        required(17, "list_of_struct_of_nested_types", Types.ListType.ofOptional(19, Types.StructType.of(
+            Types.NestedField.required(20, "m1", Types.MapType.ofOptional(21, 22,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(23, "l1", Types.ListType.ofRequired(24, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.required(25, "l2", Types.ListType.ofRequired(26, SUPPORTED_PRIMITIVES)),
+            Types.NestedField.optional(27, "m2", Types.MapType.ofOptional(28, 29,
+                Types.StringType.get(),
+                SUPPORTED_PRIMITIVES))
+        )))
+    );
+
+    Schema schema = new Schema(TypeUtil.assignFreshIds(structType, new AtomicInteger(0)::incrementAndGet)
+        .asStructType().fields());
+
+    NameMapping nameMapping = MappingUtil.create(schema);
+    TypeDescription typeDescriptionWithIds = ORCSchemaUtil.convert(schema);
+    TypeDescription typeDescriptionWithIdsFromNameMapping = ORCSchemaUtil
+        .applyNameMapping(ORCSchemaUtil.removeIds(typeDescriptionWithIds), nameMapping);
+
+    assertTrue("TypeDescription schemas should be equal, not comparing Attributes",
+        typeDescriptionWithIds.equals(typeDescriptionWithIdsFromNameMapping, false /* checkAttributes */));

Review comment:
       If we are not comparing the attributes here, is there any value to adding this assert? Seems like name mapping's effect (assigning ids) is not actually being tested in this case. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd edited a comment on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd edited a comment on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-662589950


   @rdsr @shardulm94 @rdblue I've updated this branch to latest master, PTAL when you have a chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] edgarRd commented on pull request #1208: ORC: Add name mapping support

Posted by GitBox <gi...@apache.org>.
edgarRd commented on pull request #1208:
URL: https://github.com/apache/iceberg/pull/1208#issuecomment-662589950


   @rdsr @shardulm94 @rdblue I'm updated this branch to latest master, PTAL when you have a chance. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org