Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/20 00:42:14 UTC

[GitHub] [iceberg] rdblue commented on a change in pull request #1346: Flink: Introduce Flink InputFormat

rdblue commented on a change in pull request #1346:
URL: https://github.com/apache/iceberg/pull/1346#discussion_r473480966



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkSchemaUtil.java
##########
@@ -98,4 +102,22 @@ public static TableSchema toSchema(RowType rowType) {
     }
     return builder.build();
   }
+
+  /**
+   * Prune columns from a {@link Schema} using a list of projected fields.
+   *
+   * @param schema a Schema
+   * @param projectedFields projected fields from Flink
+   * @return a Schema corresponding to the Flink projection
+   * @throws IllegalArgumentException if the Flink type does not match the Schema
+   */
+  public static Schema pruneWithoutReordering(Schema schema, List<String> projectedFields) {
+    if (projectedFields == null) {
+      return schema;
+    }
+
+    Map<String, Integer> indexByName = TypeUtil.indexByName(schema.asStruct());
+    Set<Integer> projectedIds = projectedFields.stream().map(indexByName::get).collect(Collectors.toSet());
+    return TypeUtil.select(schema, projectedIds);

Review comment:
       You should be able to use the expected/projection schema. All readers should reorder columns to produce the requested column order.
   
   `AvroSchemaWithTypeVisitor` traverses the file schema to create the reader structure, but only because fields in Avro must be read in the file's order. When that reader adds data columns to records, the values still end up in the requested order, because the [`ResolvingDecoder` returns the correct position](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/avro/ValueReaders.java#L645-L648) in the projection schema.
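
To illustrate the point about reordering, here is a minimal, dependency-free sketch of the idea. It is not the Iceberg implementation: the class `ProjectionOrderSketch` and the method `readRecord` are hypothetical names, and the name-to-position map only stands in for the position lookup that the `ResolvingDecoder` performs. Values are consumed in file order, but each one is written to its slot in the projection, so no separate "prune without reordering" step is needed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Iceberg.
public class ProjectionOrderSketch {

  /**
   * Consumes values in file order and places each projected value at its
   * position in the projection. Fields absent from the projection are skipped.
   */
  public static Object[] readRecord(
      List<String> fileFields, Object[] fileValues, List<String> projection) {
    // Assumed stand-in for the decoder's position lookup: field name -> projection slot.
    Map<String, Integer> posInProjection = new HashMap<>();
    for (int i = 0; i < projection.size(); i++) {
      posInProjection.put(projection.get(i), i);
    }

    Object[] record = new Object[projection.size()];
    for (int i = 0; i < fileFields.size(); i++) {
      // The loop follows the file's field order, as an Avro reader must.
      Integer pos = posInProjection.get(fileFields.get(i));
      if (pos != null) {
        // The write goes to the projection's slot, not the file's index.
        record[pos] = fileValues[i];
      }
    }
    return record;
  }

  public static void main(String[] args) {
    List<String> fileFields = Arrays.asList("id", "data", "ts");
    Object[] fileValues = {1L, "a", 1000L};

    // Request columns in a different order than the file stores them.
    Object[] record = readRecord(fileFields, fileValues, Arrays.asList("ts", "id"));
    System.out.println(Arrays.toString(record)); // prints [1000, 1]
  }
}
```

Under these assumptions, the caller can pass the expected/projection schema directly and receive columns in the requested order, whatever order the file stores them in.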



