You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/10 23:32:30 UTC

[GitHub] [iceberg] rdblue opened a new pull request #1317: Add specId to DataFile

rdblue opened a new pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317


   Utilities that compare partitions need to also track the partition spec that a partition tuple belongs to because the same set of partition values can be valid for multiple specs, but identify different partitions. Many classes track the partitions of data and delete files, and the easiest way to update those utilities is to pass the spec ID along with the `DataFile` instance. Otherwise, getting the correct spec ID would require updating several public APIs to add a spec ID argument.
   
   This PR adds spec ID to `DataFile` and `DeleteFile`, and adds it to metadata that is inherited from `ManifestFile`, where the spec ID of a manifest is tracked.
   
   This also cleans up unnecessary factory methods in `DataFile` that were used only in tests and were missing spec ID. Now, all data file creation uses the builder.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317#issuecomment-672339916


   Thanks for reviewing @rdsr and @shardulm94!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317#issuecomment-671851848


   https://github.com/apache/iceberg/pull/1307#discussion_r468201047 mentions that a workaround in `StructLikeWrapper` will be removed after adding `specId` to `DataFile`. Is that supposed to happen as part of this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317#issuecomment-672172504


   @shardulm94, that is done in #1308. That PR includes the addition of `specId` to `DataFile`, but I think it is cleaner to put that change in a separate PR, which is why I opened this one.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317#discussion_r705775959



##########
File path: spark/src/main/java/org/apache/iceberg/spark/SparkDataFile.java
##########
@@ -87,6 +87,11 @@ public SparkDataFile wrap(Row row) {
     return this;
   }
 
+  @Override
+  public int specId() {
+    return -1;

Review comment:
       Shouldn't this be handled like the other columns?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1317: Add specId to DataFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1317:
URL: https://github.com/apache/iceberg/pull/1317#discussion_r705775407



##########
File path: core/src/test/java/org/apache/iceberg/TestMergeAppend.java
##########
@@ -1129,8 +1133,12 @@ public void testManifestEntryFieldIdsForChangedPartitionSpecForV1Table() {
     V2Assert.assertEquals("Last sequence number should be 1", 1, readMetadata().lastSequenceNumber());
     V1Assert.assertEquals("Table should end with last-sequence-number 0", 0, readMetadata().lastSequenceNumber());
 
+    // create a new with the table's current spec
     DataFile newFile = DataFiles.builder(table.spec())
-        .copy(FILE_B)
+        .withPath("/path/to/data-x.parquet")
+        .withFileSizeInBytes(10)
+        .withPartitionPath("id_bucket=1/data_bucket=1")
+        .withRecordCount(1)

Review comment:
       Why did these tests change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org