You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/08 17:41:16 UTC

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #675: Inherit snapshot ids for manifest entries

rdblue commented on a change in pull request #675: Inherit snapshot ids for manifest entries
URL: https://github.com/apache/incubator-iceberg/pull/675#discussion_r364357342
 
 

 ##########
 File path: core/src/main/java/org/apache/iceberg/FastAppend.java
 ##########
 @@ -81,15 +83,21 @@ public FastAppend appendFile(DataFile file) {
 
   @Override
   public FastAppend appendManifest(ManifestFile manifest) {
-    // the manifest must be rewritten with this update's snapshot ID
-    try (ManifestReader reader = ManifestReader.read(
-        ops.io().newInputFile(manifest.path()), ops.current().specsById())) {
-      OutputFile newManifestPath = manifestPath(manifestCount.getAndIncrement());
-      appendManifests.add(ManifestWriter.copyAppendManifest(reader, newManifestPath, snapshotId(), summaryBuilder));
+    Preconditions.checkArgument(!manifest.hasExistingFiles(), "Cannot append manifest with existing files");
+    Preconditions.checkArgument(!manifest.hasDeletedFiles(), "Cannot append manifest with deleted files");
+    Preconditions.checkArgument(manifest.snapshotId() == null, "Snapshot id must be assigned during commit");
+
+    // TODO: avoid reading manifests to simply get stats
+    try (ManifestReader reader = ManifestReader.read(manifest, ops.io(), ops.current().specsById())) {
 
 Review comment:
   Ideally, we would produce all of the summary stats, but the most important ones are `total-data-files`, `total-records`, and the `added-` or `deleted-` properties that are used to produce totals. I think it's okay to not write the `changed-partition-count` metrics if they require scanning the appended manifest.
   
   I think my response was confusing because we keep additional summary information about each partition in our version. I can move that upstream if everyone wants it, but it can make the metadata files quite large. Without a use case for doing this upstream, I didn't think it was a good idea to make everyone's metadata significantly larger.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org