Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/05 08:30:45 UTC

[GitHub] [flink] gaoyunhaii edited a comment on pull request #18805: [FLINK-26173][streaming] Recover GlobalCommittables with Sink V1 GlobalCommittable serializer

gaoyunhaii edited a comment on pull request #18805:
URL: https://github.com/apache/flink/pull/18805#issuecomment-1059720777


   Hi Fabian~ thanks for the clarification!
   
   For the `filterRecoveredCommittables`, the requirement should come from data lake systems like Iceberg: if there are global committables from more than one checkpoint that were not committed before a failover, they should be mergeable into one global committable on restore, so that the restore process can be accelerated [1].
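   
   To illustrate the merging idea, here is a minimal sketch of a data-lake-style Sink V1 `GlobalCommitter` that folds the global committables recovered from several checkpoints into a single one (the `Lake*` types are made up for this example; this is not code from Iceberg or any existing connector):
   
   ```java
   // Illustrative sketch only: merge everything recovered from state into one
   // global committable so that restore triggers a single commit instead of N.
   import java.io.IOException;
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   import org.apache.flink.api.connector.sink.GlobalCommitter;
   
   // Hypothetical per-subtask and global committables carrying data file paths.
   class LakeCommittable {
       final List<String> dataFiles;
       LakeCommittable(List<String> dataFiles) { this.dataFiles = dataFiles; }
   }
   
   class LakeGlobalCommittable {
       final List<String> dataFiles;
       LakeGlobalCommittable(List<String> dataFiles) { this.dataFiles = dataFiles; }
   }
   
   class LakeGlobalCommitter
           implements GlobalCommitter<LakeCommittable, LakeGlobalCommittable> {
   
       @Override
       public List<LakeGlobalCommittable> filterRecoveredCommittables(
               List<LakeGlobalCommittable> globalCommittables) throws IOException {
           if (globalCommittables.size() <= 1) {
               return globalCommittables;
           }
           // Committables left over from several checkpoints are merged into a
           // single global committable, so the restore path commits only once.
           List<String> allFiles = new ArrayList<>();
           for (LakeGlobalCommittable c : globalCommittables) {
               allFiles.addAll(c.dataFiles);
           }
           return Collections.singletonList(new LakeGlobalCommittable(allFiles));
       }
   
       @Override
       public LakeGlobalCommittable combine(List<LakeCommittable> committables) {
           List<String> files = new ArrayList<>();
           for (LakeCommittable c : committables) {
               files.addAll(c.dataFiles);
           }
           return new LakeGlobalCommittable(files);
       }
   
       @Override
       public List<LakeGlobalCommittable> commit(List<LakeGlobalCommittable> globalCommittables) {
           // Commit the (already merged) snapshots to the table and return the
           // committables that need to be retried; omitted in this sketch.
           return Collections.emptyList();
       }
   
       @Override
       public void endOfInput() {}
   
       @Override
       public void close() {}
   }
   ```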
   
   For the `combine`, sorry, I overlooked the issue and the current implementation; currently it is indeed called right before committing. The issue says that the committables should be combined on `snapshotState` and the resulting global committables should be persisted, so that if there are failovers and re-commits, `combine` does not need to be executed again.
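   
   As a rough sketch of the difference (a hypothetical adapter class, not actual Flink operator code), the two orderings would roughly look like this:
   
   ```java
   // Illustrative sketch only: combining right before commit (current behavior as
   // described above) vs. combining on snapshotState and persisting the result
   // (behavior requested by the issue).
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   import org.apache.flink.api.connector.sink.GlobalCommitter;
   
   class GlobalCommitterAdapterSketch<CommT, GlobalCommT> {
   
       private final GlobalCommitter<CommT, GlobalCommT> globalCommitter;
       private final List<CommT> pendingCommittables = new ArrayList<>();
       // Stand-in for checkpointed operator state (hypothetical).
       private final List<GlobalCommT> persistedGlobalCommittables = new ArrayList<>();
   
       GlobalCommitterAdapterSketch(GlobalCommitter<CommT, GlobalCommT> globalCommitter) {
           this.globalCommitter = globalCommitter;
       }
   
       // Current behavior: combine runs just before commit, so a re-commit after
       // a failover has to run combine again.
       void commitAsCurrentlyImplemented() throws Exception {
           GlobalCommT combined = globalCommitter.combine(new ArrayList<>(pendingCommittables));
           globalCommitter.commit(Collections.singletonList(combined));
           pendingCommittables.clear();
       }
   
       // Requested behavior: combine on snapshotState and persist the combined
       // result, so a restored job can re-commit it without combining again.
       void snapshotStateAsRequested() throws Exception {
           GlobalCommT combined = globalCommitter.combine(new ArrayList<>(pendingCommittables));
           persistedGlobalCommittables.add(combined);
           pendingCommittables.clear();
       }
   
       void commitOnCheckpointCompleteAsRequested() throws Exception {
           // No combine here: the persisted global committables are committed as-is.
           globalCommitter.commit(new ArrayList<>(persistedGlobalCommittables));
           persistedGlobalCommittables.clear();
       }
   }
   ```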
   
   If we want to call `filterRecoveredCommittables` for each snapshot, we might need to add an `isRestored` parameter to `CommittableManager#commit`. If we also want to keep the same behavior for the `combine` method, we might need to make some more modifications to the `CommittableManager`, or not reuse the current implementation.
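   
   As a rough illustration of that parameter (the interface below is hypothetical and does not mirror the real `CommittableManager` signatures):
   
   ```java
   // Hypothetical sketch only: an isRestored flag on the commit call would tell
   // a Sink V1 GlobalCommitter adapter whether the committables it is about to
   // commit were recovered from state, and hence whether
   // filterRecoveredCommittables should be applied before combining/committing.
   import java.io.IOException;
   import java.util.Collection;
   
   interface IllustrativeCommittableManager<CommT> {
   
       // isRestored = true means the pending committables were restored from a
       // previous execution attempt rather than produced in this one.
       Collection<CommT> commit(boolean isRestored) throws IOException, InterruptedException;
   }
   ```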
   
   However, the sink implementations in `iceberg`, `hudi` and `flink-table-store` currently use customized operators / the legacy sink API instead of sink API v1. In addition, the two issues should mainly affect performance rather than correctness. Finally, users are still able to add their own global committer implementation with the topology API. So I think it is still useful to solve the two issues, but if we do not have the capacity it would also be OK to postpone them.
   
   If so, perhaps we could create a new JIRA issue for state compatibility with sink v1 and attach this PR to that issue, and downgrade FLINK-26173 and leave it open? Since we have not fully fixed the issues listed in FLINK-26173~
   
   [1] https://lists.apache.org/thread/3tqgnhclbr8ptb69b2ro74535dtbd608

