Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/15 17:33:19 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6163: Core: Method for building grouping key type

aokolnychyi commented on code in PR #6163:
URL: https://github.com/apache/iceberg/pull/6163#discussion_r1023081485


##########
core/src/main/java/org/apache/iceberg/Partitioning.java:
##########
@@ -195,41 +198,75 @@ public Void alwaysNull(int fieldId, String sourceName, int sourceId) {
   }
 
   /**
-   * Builds a common partition type for all specs in a table.
+   * Builds a grouping key type considering all provided specs.
    *
-   * <p>Whenever a table has multiple specs, the partition type is a struct containing all columns
-   * that have ever been a part of any spec in the table.
+   * <p>A grouping key defines how data is split between files and consists of partition fields with
+   * non-void transforms that are present in each provided spec. Iceberg guarantees that records
+   * with different values for the grouping key are disjoint and are stored in separate files.
+   *
+   * <p>If there is only one spec, the grouping key will include all partition fields with non-void
+   * transforms from that spec. Whenever there are multiple specs, the grouping key will represent
+   * an intersection of all partition fields with non-void transforms. If a partition field is
+   * present only in a subset of specs, Iceberg cannot guarantee data distribution on that field.
+   * That's why it will not be part of the grouping key. Unpartitioned tables or tables with
+   * non-overlapping specs have empty grouping keys.
+   *
+   * <p>When partition fields are dropped in v1 tables, they are replaced with new partition fields
+   * that have the same field ID but use a void transform under the hood. Such fields cannot be part
+   * of the grouping key as void transforms always return null.
+   *
+   * @param specs one or many specs
+   * @return the constructed grouping key type
+   */
+  public static StructType groupingKeyType(Collection<PartitionSpec> specs) {

Review Comment:
   @sunchao, the idea is that a scan covers only a subset of files, which may mean we will only query particular specs.
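
   To illustrate why the method takes a collection of specs rather than a whole table, here is a minimal standalone sketch of the intersection rule described in the Javadoc above. It does not use Iceberg classes and is not the implementation in this PR: `GroupingKeySketch`, `Field`, `groupingKeyNames`, and the sample field names and IDs are all hypothetical, and a spec is modeled simply as a list of fields.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupingKeySketch {

  /** Hypothetical stand-in for a partition field; NOT Iceberg's PartitionField. */
  static final class Field {
    final int fieldId;
    final String name;
    final String transform;

    Field(int fieldId, String name, String transform) {
      this.fieldId = fieldId;
      this.name = name;
      this.transform = transform;
    }

    boolean isVoid() {
      return "void".equals(transform);
    }
  }

  /** Returns names of non-void fields whose IDs are present in every provided spec. */
  static List<String> groupingKeyNames(Collection<List<Field>> specs) {
    Map<Integer, String> common = null;
    for (List<Field> spec : specs) {
      // Collect the non-void fields of this spec, keyed by field ID.
      Map<Integer, String> current = new LinkedHashMap<>();
      for (Field field : spec) {
        if (!field.isVoid()) {
          current.put(field.fieldId, field.name);
        }
      }
      if (common == null) {
        common = current; // the first spec seeds the candidate set
      } else {
        common.keySet().retainAll(current.keySet()); // keep only shared field IDs
      }
    }
    return common == null ? new ArrayList<>() : new ArrayList<>(common.values());
  }

  public static void main(String[] args) {
    // Assumed sample specs: spec2 models a v1 table where id_bucket was dropped (void).
    List<Field> spec0 =
        List.of(new Field(1000, "id_bucket", "bucket[8]"), new Field(1001, "ts_day", "day"));
    List<Field> spec1 = List.of(new Field(1000, "id_bucket", "bucket[8]"));
    List<Field> spec2 =
        List.of(new Field(1000, "id_bucket", "void"), new Field(1001, "ts_day", "day"));

    System.out.println(groupingKeyNames(List.of(spec0)));               // [id_bucket, ts_day]
    System.out.println(groupingKeyNames(List.of(spec0, spec1)));        // [id_bucket]
    System.out.println(groupingKeyNames(List.of(spec0, spec1, spec2))); // []
  }
}
```

   In this sketch, a scan that only touches files written with `spec0` and `spec1` still gets a usable grouping key, while intersecting all three specs would leave the key empty.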



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

