
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/14 02:21:21 UTC

[GitHub] [iceberg] hameizi commented on pull request #2867: Flink: Auto compact file

hameizi commented on pull request #2867:
URL: https://github.com/apache/iceberg/pull/2867#issuecomment-918739609

> I think the parallel commit proposal that @rdblue proposed could work

In this PR, Flink's rewrite action runs in parallel, so it does not slow down data processing. Once the snapshot function succeeds, Flink continues processing data and does not wait for the result of notifyCheckpointComplete.
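
To illustrate the point, here is a minimal sketch in plain Java with no Flink dependency. It is not the code from this PR: the class `AsyncRewriteSketch`, the method `onCheckpointComplete`, and the latch that stands in for a long-running rewrite are all hypothetical names chosen for the example. It only shows that handing the rewrite to a background thread lets record processing continue while the rewrite is still pending.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncRewriteSketch {
    // Background thread that runs rewrites; hypothetical stand-in for the PR's rewrite action.
    private final ExecutorService rewritePool = Executors.newSingleThreadExecutor();
    private final AtomicInteger finishedRewrites = new AtomicInteger();
    private int processedRecords = 0;

    // Hypothetical stand-in for the checkpoint-complete callback: it only schedules work.
    public void onCheckpointComplete(CountDownLatch rewriteMayFinish) {
        rewritePool.execute(() -> {
            try {
                rewriteMayFinish.await(); // simulates a rewrite that takes a long time
                finishedRewrites.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public void processRecord() {
        processedRecords++;
    }

    private void shutdownAndWait() {
        rewritePool.shutdown();
        try {
            if (!rewritePool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("rewrite did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns {records processed while the rewrite was pending,
    //          rewrites finished at that moment, rewrites finished at the end}.
    public static int[] demo() {
        AsyncRewriteSketch op = new AsyncRewriteSketch();
        CountDownLatch rewriteMayFinish = new CountDownLatch(1);
        op.onCheckpointComplete(rewriteMayFinish); // returns immediately
        for (int i = 0; i < 1000; i++) {
            op.processRecord();                    // processing is not blocked
        }
        int processedDuring = op.processedRecords;
        int finishedDuring = op.finishedRewrites.get();
        rewriteMayFinish.countDown();              // let the simulated rewrite complete
        op.shutdownAndWait();
        return new int[] {processedDuring, finishedDuring, op.finishedRewrites.get()};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " records processed while " + r[1]
                + " rewrites had finished; " + r[2] + " finished at the end");
    }
}
```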

> I wonder what is the initial drive behind this implementation.

Automatically compacting files at every Flink checkpoint solves several problems.
1. It keeps queries on the Iceberg table fast at all times. In our scenario, queries were slow even though we had scheduled a file compaction every day; that was not enough.
2. It fixes the bug where duplicate rows appear in an Iceberg primary-key table when files are compacted (https://github.com/apache/iceberg/issues/2308). Because we strictly commit one snapshot and then compact files, in that order, no additional snapshot can be committed while compaction is running.
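
The ordering argument in point 2 can be sketched as follows, again in plain Java rather than with the PR's actual classes. The names `OrderedCommitCompactSketch`, `onCheckpointComplete`, and the `commit-N` / `compact-N` log entries are hypothetical. The assumption is that every commit and every compaction goes through one single-threaded executor, so a compaction always finishes before the next commit can start.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedCommitCompactSketch {
    // A single thread serializes all table changes (assumption made for this sketch).
    private final ExecutorService tableWriter = Executors.newSingleThreadExecutor();
    private final List<String> tableLog = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical callback: commit the checkpoint's snapshot, then compact, as one unit.
    public void onCheckpointComplete(long checkpointId) {
        tableWriter.execute(() -> {
            tableLog.add("commit-" + checkpointId);
            tableLog.add("compact-" + checkpointId);
        });
    }

    // Waits for queued work and returns the order in which table changes were applied.
    public List<String> finish() {
        tableWriter.shutdown();
        try {
            if (!tableWriter.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("table writer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(tableLog);
    }

    public static List<String> demo(int checkpoints) {
        OrderedCommitCompactSketch sketch = new OrderedCommitCompactSketch();
        for (long id = 1; id <= checkpoints; id++) {
            sketch.onCheckpointComplete(id);
        }
        return sketch.finish();
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

In this model no `commit-N+1` entry can appear between `commit-N` and `compact-N`, which is the property the comment relies on to avoid the duplicate rows.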

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: issues-help@iceberg.apache.org