Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/19 04:05:44 UTC

[GitHub] [iceberg] edwinchoi edited a comment on pull request #1508: Use schema at the time of the snapshot when reading a snapshot.

edwinchoi edited a comment on pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#issuecomment-711506424


   > Also, from what I see, the metadata timestamp is always the same as the snapshot timestamp when the metadata is written for a new snapshot.
   
   If you use Spark 3's catalog API, you'll see that the snapshot timestamp and the metadata timestamp are _not guaranteed_ to be the same. You can trace the call from `SparkCatalog.stageCreateOrReplace`. RTAS applies the changes in a transaction, which uses independent calls to `System.currentTimeMillis()` for the two timestamps.
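
   Roughly, the pattern inside that transaction looks like this (a minimal illustrative sketch, not the actual Iceberg commit code; the variable names are made up):

   ```java
   // Minimal illustrative sketch, not the actual commit path.
   long snapshotTimestampMillis = System.currentTimeMillis(); // recorded on the new snapshot
   // ... manifests are written and the rest of the commit work happens here ...
   long metadataTimestampMillis = System.currentTimeMillis(); // recorded as the metadata's last-updated time
   // Two independent readings: nothing forces them to be equal, and with a
   // clock step in between the second can even be smaller than the first.
   ```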
   
   Try adding tests to `TestCreateTableAsSelect` that do CTAS/RTAS, and you'll see that the timestamps are not the same.
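
   Something along these lines is enough to see it (a rough sketch only; the cast assumes the loaded table is a `BaseTable`, and the method name is just illustrative, not an existing test helper):

   ```java
   import org.apache.iceberg.BaseTable;
   import org.apache.iceberg.Table;
   import org.apache.iceberg.TableMetadata;

   // Rough sketch: given a table just created by CTAS/RTAS, compare the current
   // snapshot's timestamp with the metadata's last-updated timestamp.
   static void printTimestamps(Table table) {
     TableMetadata metadata = ((BaseTable) table).operations().current();
     long snapshotTs = metadata.currentSnapshot().timestampMillis();
     long metadataTs = metadata.lastUpdatedMillis();
     // The two values come from separate System.currentTimeMillis() calls in the
     // transaction, so they are not guaranteed to match.
     System.out.printf("snapshot-ts=%d metadata-ts=%d equal=%b%n",
         snapshotTs, metadataTs, snapshotTs == metadataTs);
   }
   ```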
   
   Also, after giving this some more thought, you can't rely on any consistent ordering between the snapshot and metadata update timestamps. `System.currentTimeMillis()` is not monotonic: clock adjustments via NTP can make a later reading smaller than an earlier one. The only safe option, then, is to scan the metadata files and find the one whose current-snapshot-id matches the target snapshot id.
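
   For example, something like this (a rough sketch of that scan, assuming the previous metadata files are still listed in the table's metadata log; `findMetadataForSnapshot` is just an illustrative name):

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import org.apache.iceberg.BaseTable;
   import org.apache.iceberg.TableMetadata;
   import org.apache.iceberg.TableMetadataParser;
   import org.apache.iceberg.io.FileIO;

   // Rough sketch: scan the metadata files recorded in the metadata log, newest
   // first, and return the first one whose current-snapshot-id is the target.
   static TableMetadata findMetadataForSnapshot(BaseTable table, long targetSnapshotId) {
     TableMetadata current = table.operations().current();
     if (current.currentSnapshot() != null
         && current.currentSnapshot().snapshotId() == targetSnapshotId) {
       return current;
     }

     FileIO io = table.operations().io();
     // Assumes previousFiles() lists entries oldest-first, so reverse a copy
     // to check the most recent metadata files first.
     List<TableMetadata.MetadataLogEntry> log = new ArrayList<>(current.previousFiles());
     Collections.reverse(log);
     for (TableMetadata.MetadataLogEntry entry : log) {
       TableMetadata past = TableMetadataParser.read(io, entry.file());
       if (past.currentSnapshot() != null
           && past.currentSnapshot().snapshotId() == targetSnapshotId) {
         return past;
       }
     }

     throw new IllegalArgumentException(
         "no metadata file has snapshot " + targetSnapshotId + " as its current snapshot");
   }
   ```

   One caveat with this approach: it only reaches as far back as previous metadata files are retained, since old entries can be expired.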

