Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/17 12:54:34 UTC

[GitHub] [incubator-doris] weizuo93 opened a new issue #7141: [Feature] [Stream Load] Two-Phase Commit for stream load

weizuo93 opened a new issue #7141:
URL: https://github.com/apache/incubator-doris/issues/7141


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Description
   
   ### Background
   In a typical Doris application, data flows as follows:
   * Read streaming data from Kafka.
   * Execute ETL in Flink.
   * Sink data to Doris in batches via `stream load`.
   
   Flink takes checkpoints at a regular, configurable interval and writes each checkpoint to a persistent storage system such as HDFS. A checkpoint in Flink is a consistent snapshot of:
   * The current state of the application
   * The consumption progress of the data stream (`offset`)
   
   ![Screenshot 2021-11-17 20-43-16](https://user-images.githubusercontent.com/68884553/142202896-1402fe7f-adb2-42fb-9ac9-9f7dfe7e1acc.png)
   
   If a machine or the Flink software fails, the restarted Flink application resumes processing from the most recent successfully completed checkpoint. Any data that was loaded into Doris after that checkpoint is therefore loaded a second time, which produces duplicate data.
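
   To make the failure mode concrete, here is a minimal Python simulation. It uses no Flink or Doris APIs; `run` and its parameters are hypothetical. The sink makes every write visible immediately, the job crashes, and the source rewinds to the offset stored in the last completed checkpoint:

```python
def run(records, checkpoint_every, fail_after):
    """Simulate a job whose sink makes every write visible immediately.

    A checkpoint stores the source offset after every `checkpoint_every`
    records. The job crashes after writing `fail_after` records, restarts
    from the last completed checkpoint, and processes the rest of the input.
    Returns the rows visible in the (simulated) target table.
    """
    table = []               # rows visible in the target table
    checkpointed_offset = 0  # source offset stored in the last checkpoint

    # First attempt: write records until the simulated crash.
    for offset in range(fail_after):
        table.append(records[offset])
        if (offset + 1) % checkpoint_every == 0:
            checkpointed_offset = offset + 1

    # Restart: the source rewinds to the checkpointed offset, but the rows
    # written after that checkpoint are already visible in the table.
    for offset in range(checkpointed_offset, len(records)):
        table.append(records[offset])

    return table


rows = run(list("abcdefgh"), checkpoint_every=3, fail_after=5)
print(rows)  # ['a', 'b', 'c', 'd', 'e', 'd', 'e', 'f', 'g', 'h']
```

   Records `d` and `e` were written after the checkpoint at offset 3, so they appear twice once the job replays from that checkpoint.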
   
   To provide exactly-once semantics, Doris must offer a way to commit or roll back loads in coordination with Flink's checkpoints. Therefore, stream load should support `Two-Phase Commit (2PC)`.
   
   For the data sink to provide exactly-once guarantees, it must:
   * Write all data between two checkpoints to Doris through one or more stream load tasks (the data is not yet visible).
   * Commit all stream load tasks between the two checkpoints (the data becomes visible).
   
   If a machine or the Flink software fails, then upon restart the sink commits all stream load tasks between the two most recent checkpoints again (committing the same stream load task more than once is allowed).
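
   The sink-side protocol described above can be sketched as follows. This is a minimal, hypothetical Python model, not Flink or Doris code: `MockDoris`, `TwoPhaseCommitSink`, and the `load-N` label scheme are illustrative assumptions. The method names loosely echo Flink's `CheckpointedFunction.snapshotState` and `CheckpointListener.notifyCheckpointComplete`, but the signatures are simplified.

```python
class MockDoris:
    """Hypothetical stand-in for Doris: pre-committed data stays invisible."""

    def __init__(self):
        self.prepared = {}      # label -> rows written but not yet visible
        self.visible = []       # rows visible to queries
        self.committed = set()  # labels that have already been committed

    def pre_commit(self, label, rows):
        self.prepared[label] = list(rows)

    def commit(self, label):
        # Idempotent: committing an already-committed label is a no-op.
        if label in self.committed:
            return
        self.visible.extend(self.prepared.pop(label))
        self.committed.add(label)


class TwoPhaseCommitSink:
    """Hypothetical sink: one stream load per batch, committed per checkpoint."""

    def __init__(self, doris):
        self.doris = doris
        self.pending = []  # labels pre-committed since the last checkpoint
        self.next_id = 0

    def write_batch(self, rows):
        label = f"load-{self.next_id}"  # hypothetical label scheme
        self.next_id += 1
        self.doris.pre_commit(label, rows)  # phase 1: data is not visible
        self.pending.append(label)

    def snapshot_state(self):
        # On checkpoint: the pending labels become part of checkpointed state.
        labels, self.pending = self.pending, []
        return labels

    def notify_checkpoint_complete(self, labels):
        for label in labels:  # phase 2: data becomes visible
            self.doris.commit(label)


doris = MockDoris()
sink = TwoPhaseCommitSink(doris)
sink.write_batch(["a", "b"])
sink.write_batch(["c"])
state = sink.snapshot_state()
assert doris.visible == []              # nothing visible before the commit
sink.notify_checkpoint_complete(state)
# After a restart, the labels restored from the checkpoint are committed again.
sink.notify_checkpoint_complete(state)
print(doris.visible)  # ['a', 'b', 'c']
```

   Because `commit` ignores labels that are already committed, replaying the commit step after a restart does not duplicate rows, which is the property the recovery rule above depends on.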
   
   ### Design
   
   The two phases of the stream load design are as follows:
   
   * First Phase:
   
   ![Screenshot 2021-11-02 15-55-26](https://user-images.githubusercontent.com/68884553/142198985-18bf0b3a-eb36-4ee1-bb37-fc79c6f70ab5.png)
   
   
   * Second Phase:

   ![Screenshot 2021-11-02 15-57-47](https://user-images.githubusercontent.com/68884553/142199038-4c36a277-cdcc-4de8-bcb1-4d0ffcde88f8.png)
   
   Once the `pre-commit` is complete, we must guarantee that the subsequent `commit` succeeds.
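
   One way to read the requirement that a `commit` must succeed once `pre-commit` has completed: `pre-commit` does all the work that can fail, and `commit` only changes transaction state. The minimal sketch below models that bookkeeping, assuming three states named `PREPARE`, `COMMITTED`, and `ABORTED`. The state names and the `TxnManager` class are hypothetical and are not taken from the Doris code base.

```python
from enum import Enum


class TxnState(Enum):
    PREPARE = "PREPARE"      # pre-committed: data written but invisible
    COMMITTED = "COMMITTED"  # visible to queries
    ABORTED = "ABORTED"      # rolled back


class TxnError(Exception):
    pass


class TxnManager:
    """Hypothetical transaction bookkeeping for a 2PC stream load."""

    def __init__(self):
        self.txns = {}  # txn_id -> TxnState

    def pre_commit(self, txn_id):
        # Everything that can fail (writing and persisting the data) is
        # assumed to be done before this call, so commit only flips state.
        if txn_id in self.txns:
            raise TxnError(f"txn {txn_id} already exists")
        self.txns[txn_id] = TxnState.PREPARE

    def commit(self, txn_id):
        state = self.txns.get(txn_id)
        if state is TxnState.COMMITTED:
            return state  # a repeated commit is harmless
        if state is not TxnState.PREPARE:
            raise TxnError(f"cannot commit txn {txn_id} in state {state}")
        self.txns[txn_id] = TxnState.COMMITTED
        return TxnState.COMMITTED

    def abort(self, txn_id):
        state = self.txns.get(txn_id)
        if state is None:
            raise TxnError(f"unknown txn {txn_id}")
        if state is TxnState.COMMITTED:
            raise TxnError(f"txn {txn_id} is already committed")
        self.txns[txn_id] = TxnState.ABORTED
        return TxnState.ABORTED


mgr = TxnManager()
mgr.pre_commit(1)
mgr.commit(1)
mgr.commit(1)  # e.g. a re-commit after a Flink restart: no error
mgr.pre_commit(2)
mgr.abort(2)
print(mgr.txns)
```

   Making `commit` idempotent and forbidding `abort` after `commit` keeps a committed load from being rolled back or applied twice.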
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed issue #7141: [Feature] [Stream Load] Two-Phase Commit for stream load

Posted by GitBox <gi...@apache.org>.
morningman closed issue #7141:
URL: https://github.com/apache/incubator-doris/issues/7141


   




[GitHub] [incubator-doris] morningman commented on issue #7141: [Feature] [Stream Load] Two-Phase Commit for stream load

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #7141:
URL: https://github.com/apache/incubator-doris/issues/7141#issuecomment-973723783


   Looking forward to your PR!
   And as we discussed before, the `pre-commit` status has to be cleared eventually somehow.
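
   One possible way to clear stale `pre-commit` state is a periodic sweep that aborts transactions left in the pre-committed state longer than a timeout. The sketch below assumes such a timeout exists; `PreparedTxnCleaner`, `timeout_s`, and `sweep` are hypothetical names, not Doris configuration or APIs. The timeout would need to exceed the checkpoint interval plus the expected recovery time. Otherwise a transaction that Flink is about to re-commit after a restart could be aborted first, and its data would be lost.

```python
from dataclasses import dataclass, field


@dataclass
class PreparedTxn:
    txn_id: int
    prepared_at: float  # seconds, from whatever clock the caller uses
    state: str = "PREPARE"


@dataclass
class PreparedTxnCleaner:
    """Hypothetical: abort pre-committed txns that were never committed."""

    timeout_s: float  # hypothetical timeout parameter
    txns: dict = field(default_factory=dict)

    def add(self, txn_id, now):
        self.txns[txn_id] = PreparedTxn(txn_id, now)

    def commit(self, txn_id):
        self.txns[txn_id].state = "COMMITTED"

    def sweep(self, now):
        """Abort every txn that has stayed in PREPARE longer than timeout_s."""
        aborted = []
        for txn in self.txns.values():
            if txn.state == "PREPARE" and now - txn.prepared_at > self.timeout_s:
                txn.state = "ABORTED"
                aborted.append(txn.txn_id)
        return aborted


cleaner = PreparedTxnCleaner(timeout_s=60)
cleaner.add(1, now=0)
cleaner.add(2, now=0)
cleaner.add(3, now=50)
cleaner.commit(2)
aborted = cleaner.sweep(now=100)
print(aborted)  # [1]
```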

