Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/24 11:25:11 UTC

[GitHub] [iceberg] yuzhaojing opened a new issue #2371: RowDelta validate data file exist should start from recent rewrite snapshot

yuzhaojing opened a new issue #2371:
URL: https://github.com/apache/iceberg/issues/2371


   Currently, MergingSnapshotProducer#validateDataFilesExist validates all snapshots, but it doesn't account for the rewrite and expire actions. If we rewrite a snapshot and then expire snapshots older than the rewrite with `cleanExpiredFiles(true)`, this check throws an exception:
   `ValidationException.check(currentSnapshot != null,
             "Cannot determine history between starting snapshot %s and current %s",
             startingSnapshotId, currentSnapshotId);`
   This happens because expiring snapshots (here with cleanExpiredFiles) removes them from the table metadata, while MergingSnapshotProducer#validateDataFilesExist walks every snapshot in the metadata from the current one back to the first. The walk therefore reaches a parent-snapshot-id that no longer has a corresponding snapshot.
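
   The traversal described above can be modeled with a minimal, self-contained Java sketch. This is not Iceberg's implementation: the class `SnapshotHistorySketch`, its `Snap` type, and the `history` method are hypothetical, and `IllegalStateException` stands in for `ValidationException`. It assumes, as the issue states, that the walk follows parent IDs from the current snapshot and, when no starting snapshot is given, continues to the first snapshot. Bounding the walk at a known snapshot, as the issue title suggests, stops it before any expired parent is looked up.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Objects;

   public class SnapshotHistorySketch {
     // Hypothetical, simplified snapshot: only its own ID and its parent's ID.
     static final class Snap {
       final long id;
       final Long parentId; // null for the table's first snapshot

       Snap(long id, Long parentId) {
         this.id = id;
         this.parentId = parentId;
       }
     }

     // Follows parent IDs from the current snapshot back toward startingId.
     // A null startingId means "walk to the beginning of the table's history".
     static List<Long> history(Map<Long, Snap> snapshots, long currentId, Long startingId) {
       List<Long> visited = new ArrayList<>();
       Long nextId = currentId;
       while (nextId != null && !nextId.equals(startingId)) {
         Snap snap = snapshots.get(nextId);
         if (snap == null) {
           break; // the parent was expired and removed from table metadata
         }
         visited.add(snap.id);
         nextId = snap.parentId;
       }
       // The walk is valid only if it ended exactly at startingId (or at the
       // start of history when startingId is null). This plays the role of
       // the ValidationException check quoted in the issue.
       if (!Objects.equals(nextId, startingId)) {
         throw new IllegalStateException(String.format(
             "Cannot determine history between starting snapshot %s and current %s",
             startingId, currentId));
       }
       return visited;
     }

     public static void main(String[] args) {
       // History 1 <- 2 <- 3 <- 4, where 3 is a rewrite. Snapshots 1 and 2 were
       // expired, so only 3 and 4 remain, but 3 still records parent ID 2.
       Map<Long, Snap> snapshots = new HashMap<>();
       snapshots.put(3L, new Snap(3L, 2L));
       snapshots.put(4L, new Snap(4L, 3L));

       // Walk bounded at the rewrite snapshot: succeeds.
       System.out.println(history(snapshots, 4L, 3L));

       // Unbounded walk: reaches parent ID 2, which no longer exists.
       try {
         history(snapshots, 4L, null);
       } catch (IllegalStateException e) {
         System.out.println(e.getMessage());
       }
     }
   }
   ```

   Run as-is, the bounded call prints `[4]`, and the unbounded call prints "Cannot determine history between starting snapshot null and current 4", because snapshot 3 still records parent ID 2 after snapshot 2 was expired.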


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed issue #2371: RowDelta validate data file exist should start from recent rewrite snapshot

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #2371:
URL: https://github.com/apache/iceberg/issues/2371


   




[GitHub] [iceberg] rdblue commented on issue #2371: RowDelta validate data file exist should start from recent rewrite snapshot

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #2371:
URL: https://github.com/apache/iceberg/issues/2371#issuecomment-939050446


   I don't think that this is a problem. `validateDataFilesExist` is written correctly; I think it is just used incorrectly in the CDC use case (see https://github.com/apache/iceberg/issues/2482#issuecomment-939049135). In that case, you don't even need to run the validation. And when a validation _is_ used, it needs to be configured by calling `validateFromSnapshot` with the snapshot used for all reads.
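
   To make the recommended configuration concrete, here is a minimal, self-contained Java model. It is not Iceberg's API or implementation: `RowDeltaValidationSketch`, its `Snap` type, and `removedSinceRead` are hypothetical names. It assumes, as the comment implies, that validation should only inspect commits made after the snapshot the writer read (the ID that would be passed to `validateFromSnapshot`), and it reports referenced data files that such a commit, for example a concurrent rewrite, removed.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeSet;

   public class RowDeltaValidationSketch {
     // Hypothetical, simplified snapshot: its ID, its parent's ID, and the
     // data file paths this commit removed (for example, by a rewrite).
     static final class Snap {
       final long id;
       final Long parentId;
       final Set<String> removedFiles;

       Snap(long id, Long parentId, Set<String> removedFiles) {
         this.id = id;
         this.parentId = parentId;
         this.removedFiles = removedFiles;
       }
     }

     // Returns the referenced data files removed by commits made after the read
     // snapshot. The walk stops as soon as it reaches readSnapshotId, so older
     // (possibly expired) snapshots are never looked up.
     static Set<String> removedSinceRead(Map<Long, Snap> snapshots, long currentId,
                                         long readSnapshotId, Set<String> referenced) {
       Set<String> conflicts = new TreeSet<>();
       Long nextId = currentId;
       while (nextId == null || nextId.longValue() != readSnapshotId) {
         // A null ID or a missing snapshot means the read snapshot is not
         // reachable from the current snapshot through retained metadata.
         Snap snap = nextId == null ? null : snapshots.get(nextId);
         if (snap == null) {
           throw new IllegalStateException("Cannot find read snapshot " + readSnapshotId
               + " in the history of current snapshot " + currentId);
         }
         for (String path : snap.removedFiles) {
           if (referenced.contains(path)) {
             conflicts.add(path);
           }
         }
         nextId = snap.parentId;
       }
       return conflicts;
     }

     public static void main(String[] args) {
       Map<Long, Snap> snapshots = new HashMap<>();
       // Snapshot 4 is the one the writer read; its parent 3 was expired.
       snapshots.put(4L, new Snap(4L, 3L, Set.of()));
       // Snapshot 5 is a concurrent rewrite that removed a.parquet.
       snapshots.put(5L, new Snap(5L, 4L, Set.of("a.parquet")));

       Set<String> referenced = Set.of("a.parquet", "b.parquet");
       System.out.println(removedSinceRead(snapshots, 5L, 4L, referenced));
       System.out.println(removedSinceRead(snapshots, 4L, 4L, referenced));
     }
   }
   ```

   With these inputs, the first call prints `[a.parquet]` because snapshot 5 removed a referenced file after the read, and the second prints `[]` because nothing was committed after the read; neither call looks up the expired snapshot 3.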

