Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/01/10 05:39:40 UTC

[GitHub] [incubator-seatunnel] leo65535 opened a new issue #988: [DISCUSS][Feature][core] Add dirty data management

leo65535 opened a new issue #988:
URL: https://github.com/apache/incubator-seatunnel/issues/988


   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   We may encounter dirty records when transmitting data, so we may need a dirty data management mechanism to handle them.
   This issue is under discussion; feel free to share your opinions.
   
   ### Usage Scenario
   
   -
   
   ### Related issues
   
   -
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-seatunnel] zhaomin1423 commented on issue #988: [DISCUSS][Feature][core] Add dirty data management

Posted by GitBox <gi...@apache.org>.
zhaomin1423 commented on issue #988:
URL: https://github.com/apache/incubator-seatunnel/issues/988#issuecomment-1049481572


   Dirty data management has two aspects. First, we need to handle records one by one. When a batch that contains a few dirty records is written, the database must roll back, so the database must support transactions. After the rollback, we can write the batch record by record to catch the dirty ones. In Spark, we can add a data source strategy that transforms WriteToDataSourceV2 into an extended WriteToDataSourceV2Exec, so that records can be handled one by one to manage dirty data. Then, we can implement a JDBC connector based on the DataSourceV2 API.
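   
   The batch-then-fallback idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed Spark implementation: it uses Python's standard `sqlite3` module as a stand-in for a transactional JDBC sink, and the table name `sink` and the function name `write_batch_with_dirty_fallback` are hypothetical.

```python
import sqlite3


def write_batch_with_dirty_fallback(conn, rows):
    """Write a batch in one transaction; if it fails, roll back and
    retry row by row, returning the rows the database rejected.

    Hypothetical helper for illustration; `sink` is an assumed table.
    """
    insert_sql = "INSERT INTO sink (id, name) VALUES (?, ?)"

    # Fast path: the whole batch in a single transaction.
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(insert_sql, rows)
        return []
    except sqlite3.Error:
        pass  # the batch was rolled back; fall through to the slow path

    # Slow path: one transaction per row, so a dirty row only rejects itself.
    dirty = []
    for row in rows:
        try:
            with conn:
                conn.execute(insert_sql, row)
        except sqlite3.Error as exc:
            dirty.append((row, str(exc)))
    return dirty


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sink (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    batch = [(1, "a"), (2, None), (3, "c"), (1, "dup")]
    for row, reason in write_batch_with_dirty_fallback(conn, batch):
        print("dirty:", row, reason)
```

   The sketch relies on the sink rolling back the failed batch completely, which is why transaction support is a precondition. The clean rows are committed on the retry, and the rejected rows are returned to the caller, who could log them or route them elsewhere.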
   
   Welcome to comment.





[GitHub] [incubator-seatunnel] zhaomin1423 commented on issue #988: [DISCUSS][Feature][core] Add dirty data management

Posted by GitBox <gi...@apache.org>.
zhaomin1423 commented on issue #988:
URL: https://github.com/apache/incubator-seatunnel/issues/988#issuecomment-1048971611


   How is this work going? I am interested in it, and I am willing to submit a PR.





[GitHub] [incubator-seatunnel] leo65535 commented on issue #988: [DISCUSS][Feature][core] Add dirty data management

Posted by GitBox <gi...@apache.org>.
leo65535 commented on issue #988:
URL: https://github.com/apache/incubator-seatunnel/issues/988#issuecomment-1049384127


   > How is this work going? I am interested in it, and I am willing to submit a PR.
   
   Welcome @zhaomin1423.

