Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/10/06 03:49:48 UTC

[GitHub] [incubator-seatunnel] liugddx opened a new issue, #3002: [Umbrella] [Dirty data] This is new feature for dirty data

liugddx opened a new issue, #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### Describe the proposal
   
   Dirty data is data that is meaningless to the business, has an invalid format, or is out of sync. A record is dirty if an exception occurs while it is being written to the target data source, so any record that fails to write is classified as dirty. For example, if a VARCHAR value from the source is written to an INT column on the target and cannot be converted, the write fails. When configuring a synchronization task, users can control whether dirty data is allowed and can limit the number of dirty records: when the number of dirty records exceeds the specified threshold, the task fails and exits.
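A minimal Java sketch of the rule above, assuming "dirty" means exactly "the write threw an exception". The hypothetical `IntColumnWriter` stands in for a sink whose target column is INT, and the hypothetical `tryWrite` classifies each record by catching the exception. Neither class is a SeaTunnel API.

```java
import java.util.ArrayList;
import java.util.List;

class DirtyDataClassifier {
    // A record is dirty if, and only if, writing it throws.
    static boolean tryWrite(IntColumnWriter writer, String value) {
        try {
            writer.write(value);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        IntColumnWriter writer = new IntColumnWriter();
        List<String> dirty = new ArrayList<>();
        for (String value : List.of("1", "42", "abc", "7")) {
            if (!tryWrite(writer, value)) {
                dirty.add(value);
            }
        }
        // Prints: written=[1, 42, 7] dirty=[abc]
        System.out.println("written=" + writer.written() + " dirty=" + dirty);
    }
}

// Hypothetical stand-in for a sink whose target column has type INT.
class IntColumnWriter {
    private final List<Integer> written = new ArrayList<>();

    // Converting the source VARCHAR value throws NumberFormatException
    // when the string is not a valid integer.
    void write(String varcharValue) {
        written.add(Integer.parseInt(varcharValue.trim()));
    }

    List<Integer> written() {
        return written;
    }
}
```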
   
   ### Task list
   
   Support for defining dirty data and its impact on tasks
   
   - [ ] When dirty data is not allowed, the task fails if any dirty data is produced while the synchronization task is running.
   - [ ] When dirty data is allowed and a threshold is set, the synchronization task ignores the dirty data (that is, does not write it to the target side) and continues executing normally.
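Both task-list behaviors can be expressed as one policy object. In this sketch, `DirtyDataPolicy` and `SyncTask` are hypothetical names and `Integer.parseInt` stands in for the real sink write. With `allowDirty = false` the first dirty record fails the task; with `allowDirty = true` dirty records are skipped until their count exceeds `maxDirtyRecords`, following the "exceeds the specified number" wording in the proposal.

```java
import java.util.List;

class SyncTask {
    // Writes each value; Integer.parseInt stands in for the real sink write.
    // Dirty values are not written and are reported to the policy instead.
    static int run(List<String> values, DirtyDataPolicy policy) {
        int written = 0;
        for (String value : values) {
            try {
                Integer.parseInt(value);
                written++;
            } catch (NumberFormatException e) {
                policy.onDirtyRecord(e); // may throw, failing the task
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // Dirty data allowed, at most 1 dirty record tolerated: prints 2.
        System.out.println(run(List.of("1", "x", "2"), new DirtyDataPolicy(true, 1)));
        // Dirty data not allowed: the first dirty record fails the task.
        try {
            run(List.of("1", "x"), new DirtyDataPolicy(false, 0));
        } catch (IllegalStateException e) {
            System.out.println("task failed: " + e.getMessage());
        }
    }
}

// Hypothetical policy; names and semantics are illustrative assumptions.
class DirtyDataPolicy {
    private final boolean allowDirty;
    private final long maxDirtyRecords;
    private long dirtyCount = 0;

    DirtyDataPolicy(boolean allowDirty, long maxDirtyRecords) {
        if (maxDirtyRecords < 0) {
            throw new IllegalArgumentException("maxDirtyRecords must be >= 0");
        }
        this.allowDirty = allowDirty;
        this.maxDirtyRecords = maxDirtyRecords;
    }

    // Called once per failed write: returns if the record may be skipped,
    // throws IllegalStateException if the task must fail.
    void onDirtyRecord(Exception cause) {
        dirtyCount++;
        if (!allowDirty) {
            throw new IllegalStateException("dirty data is not allowed", cause);
        }
        if (dirtyCount > maxDirtyRecords) {
            throw new IllegalStateException(
                "dirty records (" + dirtyCount + ") exceeded threshold " + maxDirtyRecords, cause);
        }
    }

    long dirtyCount() {
        return dirtyCount;
    }
}
```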
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1272470964

   Can you show me how to implement this feature?



[GitHub] [incubator-seatunnel] liugddx commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
liugddx commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1321377983

   > > When dirty data is allowed and a threshold is set, the synchronization task ignores the dirty data (that is, does not write it to the target side) and continues executing normally.
   > 
   > If we want to achieve this feature, there are two ways: 1. the sink connector must make sure it does not throw an exception when it meets dirty data; 2. or we recreate the sink connector. Both have problems: 1 will be hard to do for every connector, and 2 will create and close many connections.
   
   I think you can use the first way. Could the exceptions be caught at a higher level and then counted with an accumulator?
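One possible reading of the suggestion above: leave connectors throwing as they do now, catch the exception in a framework-level wrapper, and add it to a shared counter. The sketch uses `java.util.concurrent.atomic.LongAdder` as a stand-in for an engine accumulator (such as a Flink metric counter or a Spark accumulator); `RecordSink`, `DirtyCountingSink`, and `AccumulatorDemo` are hypothetical, not SeaTunnel interfaces. Each parallel subtask owns its sink instance, but all of them add to the same counter. This only works if a connector is still usable after a failed write, which is the assumption the first way relies on.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class AccumulatorDemo {
    // Runs one sink per partition in parallel and returns the job-wide dirty count.
    static long runParallel(List<List<String>> partitions) {
        LongAdder dirtyAccumulator = new LongAdder(); // shared by all subtasks
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        for (List<String> partition : partitions) {
            pool.submit(() -> {
                // Integer.parseInt stands in for a connector write that throws on bad input.
                DirtyCountingSink sink = new DirtyCountingSink(Integer::parseInt, dirtyAccumulator);
                for (String record : partition) {
                    sink.write(record);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("subtasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subtasks", e);
        }
        return dirtyAccumulator.sum();
    }

    public static void main(String[] args) {
        long dirty = runParallel(List.of(List.of("1", "a"), List.of("b", "2", "c")));
        System.out.println("dirty records: " + dirty); // dirty records: 3
    }
}

// Hypothetical minimal sink contract; connectors may throw on bad records.
interface RecordSink {
    void write(String record) throws Exception;
}

// Framework-level wrapper: turns any write exception into an accumulator increment.
class DirtyCountingSink implements RecordSink {
    private final RecordSink delegate;
    private final LongAdder dirtyAccumulator;

    DirtyCountingSink(RecordSink delegate, LongAdder dirtyAccumulator) {
        this.delegate = delegate;
        this.dirtyAccumulator = dirtyAccumulator;
    }

    @Override
    public void write(String record) {
        try {
            delegate.write(record);
        } catch (Exception e) {
            dirtyAccumulator.increment();
        }
    }
}
```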



[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1308023954

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.



[GitHub] [incubator-seatunnel] liugddx commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
liugddx commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1316196841

   #3431 



[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1475454138

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.



[GitHub] [incubator-seatunnel] liugddx commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
liugddx commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1272471065

   > Can you show me how to implement this feature?
   
   I will post the detailed design later.
   
   



[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1362248226

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.



[GitHub] [incubator-seatunnel] liugddx commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
liugddx commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1308524198

   > It would be better if it could support a specified tolerance ratio for dirty data.
   
   This is not well supported yet; the collector feature needs to be provided first.
   



[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1421692455

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.



[GitHub] [incubator-seatunnel] iture123 commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
iture123 commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1308511195

   It would be better if it could support a specified tolerance ratio for dirty data.
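A tolerance ratio needs the total record count as well as the dirty count. The sketch below assumes both counts are available and uses a hypothetical `DirtyRatioCheck` class with an assumed `maxRatio` parameter in [0, 1]. Evaluating the ratio only at job end or at a checkpoint, rather than after every record, is an assumed design choice so that a single dirty record early in the job does not read as a 100% ratio.

```java
class DirtyRatioCheck {
    private final double maxRatio; // e.g. 0.01 tolerates up to 1% dirty records
    private long total = 0;
    private long dirty = 0;

    DirtyRatioCheck(double maxRatio) {
        if (maxRatio < 0.0 || maxRatio > 1.0) {
            throw new IllegalArgumentException("maxRatio must be in [0, 1]");
        }
        this.maxRatio = maxRatio;
    }

    void recordSuccess() {
        total++;
    }

    void recordDirty() {
        total++;
        dirty++;
    }

    double ratio() {
        return total == 0 ? 0.0 : (double) dirty / total;
    }

    // Meant to be evaluated at job end or at a checkpoint, not after every record.
    boolean withinTolerance() {
        return ratio() <= maxRatio;
    }

    public static void main(String[] args) {
        DirtyRatioCheck check = new DirtyRatioCheck(0.25);
        for (int i = 0; i < 3; i++) {
            check.recordSuccess();
        }
        check.recordDirty();
        System.out.println(check.ratio() + " " + check.withinTolerance()); // 0.25 true
        check.recordDirty();
        System.out.println(check.ratio() + " " + check.withinTolerance()); // 0.4 false
    }
}
```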
   
   



[GitHub] [incubator-seatunnel] Hisoka-X commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1321360867

   > When dirty data is allowed and a threshold is set, the synchronization task ignores the dirty data (that is, does not write it to the target side) and continues executing normally.
   
   If we want to achieve this feature, there are two ways: 1. the sink connector must make sure it does not throw an exception when it meets dirty data; 2. or we recreate the sink connector.
   Both have problems: 1 will be hard to do for every connector, and 2 will create and close many connections.
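To make the cost of the second way concrete, the sketch below assumes a connector that cannot be reused after a failed write (for instance, because the failure aborted its transaction), so the wrapper closes it and builds a new one. `FragileSink`, `RecreatingWriter`, and `ConnectionChurnDemo` are hypothetical names; the point is only that `sinksCreated()` grows by one for every dirty record.

```java
import java.util.List;
import java.util.function.Supplier;

class ConnectionChurnDemo {
    public static void main(String[] args) {
        RecreatingWriter writer = new RecreatingWriter(FragileSink::new);
        for (String record : List.of("1", "x", "2", "y", "z")) {
            writer.write(record);
        }
        // Prints: dirty=3 sinksCreated=4
        System.out.println("dirty=" + writer.dirty() + " sinksCreated=" + writer.sinksCreated());
    }
}

// Hypothetical sink that cannot be reused after a failed write.
class FragileSink implements AutoCloseable {
    private boolean broken = false;
    private boolean closed = false;

    void write(String record) {
        if (broken || closed) {
            throw new IllegalStateException("sink is unusable and must be recreated");
        }
        try {
            Integer.parseInt(record); // stands in for the real write
        } catch (NumberFormatException e) {
            broken = true;
            throw e;
        }
    }

    @Override
    public void close() {
        closed = true; // a real sink would release its connection here
    }
}

// Second way: on any write failure, count the record as dirty,
// close the broken sink, and open a fresh one.
class RecreatingWriter {
    private final Supplier<FragileSink> factory;
    private FragileSink sink;
    private int sinksCreated = 0;
    private int dirty = 0;

    RecreatingWriter(Supplier<FragileSink> factory) {
        this.factory = factory;
        this.sink = create();
    }

    private FragileSink create() {
        sinksCreated++;
        return factory.get();
    }

    void write(String record) {
        try {
            sink.write(record);
        } catch (RuntimeException e) {
            dirty++;
            sink.close();
            sink = create(); // one extra connection per dirty record
        }
    }

    int dirty() {
        return dirty;
    }

    int sinksCreated() {
        return sinksCreated;
    }
}
```

With the input `1, x, 2, y, z`, three records are dirty and four sinks are created in total: the initial one plus one per dirty record.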



[GitHub] [incubator-seatunnel] liugddx commented on issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by GitBox <gi...@apache.org>.
liugddx commented on issue #3002:
URL: https://github.com/apache/incubator-seatunnel/issues/3002#issuecomment-1321358056

   - Flink: https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/ops/metrics/
   - Spark: https://spark.apache.org/docs/2.4.5/rdd-programming-guide.html#shared-variables


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] liugddx closed issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data

Posted by "liugddx (via GitHub)" <gi...@apache.org>.
liugddx closed issue #3002: [Umbrella] [Dirty data] This is new feature for dirty data
URL: https://github.com/apache/incubator-seatunnel/issues/3002

