You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/08 05:29:15 UTC

[GitHub] [iceberg] wypoon commented on pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

wypoon commented on pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#issuecomment-705338838


   > I think you need to check `.snapshots()` not `.snapshotLog()`. The latter seems to only retain the history of changes since the table was created/replaced. You can see in `TableMetadata.buildReplacement` that the prior history entries are not retained in the replacement.
   > 
   > Also, from what I can tell, the metadata is always refreshed after the snapshot has been created (e.g., `SnapshotProducer.commit()`, `BaseTransaction.commitTransaction()`, etc... each refresh the metadata as part of the commit protocol). If that is always the case, then the loop should match the `Snapshot` with the first `MetadataLogEntry` where `MetadataLogEntry.timestampMillis() >= Snapshot.timestampMillis()`, and not the last `MetadataLogEntry.timestampMillis() <= Snapshot.timestampMillis()`.
   > 
   > As a sanity check, the code could check the expected snapshot-id against the `TableMetadata` that is read.
   
   My understanding from @rdblue's replies is that one should use the snapshot log, so I have stuck with that. Also, from what I see, the metadata timestamp is always the same as the snapshot timestamp when the metadata is written for a new snapshot. I changed the loop to look for the metadata log entry whose timestamp == the snapshot timestamp (and break out of the loop). (I could also use >= as you suggest, but it should make no difference.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org