You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/13 00:27:43 UTC

[GitHub] [iceberg] jackye1995 commented on a change in pull request #2061: Improve partition spec builder

jackye1995 commented on a change in pull request #2061:
URL: https://github.com/apache/iceberg/pull/2061#discussion_r556182365



##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -389,7 +392,7 @@ public Builder year(String sourceName, String targetName) {
       Types.NestedField sourceColumn = findSourceColumn(sourceName);
       PartitionField field = new PartitionField(
           sourceColumn.fieldId(), nextFieldId(), targetName, Transforms.year(sourceColumn.type()));
-      checkForRedundantPartitions(field);
+      checkForRedundantPartitions(field, "dates");

Review comment:
       nit: "dates" has shown up multiple times, move to static variable

##########
File path: api/src/test/java/org/apache/iceberg/TestPartitionSpecValidation.java
##########
@@ -123,6 +123,21 @@ public void testMultipleDatePartitionsWithDifferentSourceColumns() {
     PartitionSpec.builderFor(SCHEMA).hour("d").hour("another_d").build();
   }
 
+  @Test
+  public void testMultipleIdentityPartitions() {
+    PartitionSpec.builderFor(SCHEMA).year("d").identity("id").identity("d").identity("s").build();

Review comment:
       I think I understand the goal this PR tires to achieve, but what is the use case for this? Because they are now not deduped, there are now 2 partition columns "d_year" and "d". Is this actually the right behavior for users?

##########
File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
##########
@@ -319,7 +319,7 @@ public static Builder builderFor(Schema schema) {
     private final Schema schema;
     private final List<PartitionField> fields = Lists.newArrayList();
     private final Set<String> partitionNames = Sets.newHashSet();
-    private Map<Integer, PartitionField> timeFields = Maps.newHashMap();
+    private Map<String, PartitionField> partitionFields = Maps.newHashMap();

Review comment:
       +1, maybe some name like a `dedupPartitionFields`. And if we are doing this for datetime transforms, we should probably also do it for bucket and truncate and alwaysNull.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org