Posted to issues@flink.apache.org by "Jiabao-Sun (via GitHub)" <gi...@apache.org> on 2023/03/30 19:10:25 UTC

[GitHub] [flink-connector-mongodb] Jiabao-Sun opened a new pull request, #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Jiabao-Sun opened a new pull request, #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4

   Fix the CI failure at https://github.com/apache/flink-connector-mongodb/actions/runs/4435527099/jobs/7782784066.
   
   > 2023-03-16T09:46:59.8097110Z 09:46:52,687 [Source: Sequence Source -> Map -> Map -> Sink: Writer (1/1)#7] ERROR org.apache.flink.connector.mongodb.sink.writer.MongoWriter   [] - Bulk Write to MongoDB failed
   2023-03-16T09:46:59.8098540Z com.mongodb.MongoBulkWriteException: Bulk write operation error on server localhost:32771. Write errors: [BulkWriteError{index=0, code=11000, message='E11000 duplicate key error collection: test_sink.test-recovery-mongo-sink index: _id_ dup key: { : 1 }', details={}}]. 
   
   This test case uses non-idempotent writes, and the writer may flush some records before they are covered by a checkpoint.
   In that case, replaying those records after recovery causes a duplicate key error.
   Set `batchIntervalMs` and `batchSize` to -1 so that writes happen only at checkpoint, which makes the test stable.
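
   To illustrate the failure mode, here is a minimal, self-contained simulation. It does not use the Flink or MongoDB APIs: `FakeCollection`, `run`, and all parameters are hypothetical stand-ins, and the sketch assumes a single failure that never lands between the checkpoint flush and the completion of the checkpoint.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlushTimingSketch {

    /** Hypothetical stand-in for a collection with a unique _id index. */
    static final class FakeCollection {
        private final Set<Integer> ids = new HashSet<>();

        /** Non-idempotent insert: fails if an _id is already present. */
        void insertMany(List<Integer> batch) {
            for (int id : batch) {
                if (!ids.add(id)) {
                    throw new IllegalStateException("E11000 duplicate key: _id=" + id);
                }
            }
        }
    }

    /**
     * Writes records 1..total, fails once right after record failAfter,
     * then replays from the last checkpoint.
     * batchSize == -1 means "flush only on checkpoint".
     * Returns true if the run finishes without a duplicate key error.
     */
    static boolean run(int total, int checkpointEvery, int batchSize, int failAfter) {
        FakeCollection collection = new FakeCollection();
        List<Integer> buffer = new ArrayList<>();
        int lastCheckpointed = 0;
        boolean failed = false;
        try {
            int record = 1;
            while (record <= total) {
                buffer.add(record);
                // Size-based flush, independent of checkpoints.
                if (batchSize > 0 && buffer.size() >= batchSize) {
                    collection.insertMany(buffer);
                    buffer.clear();
                }
                // Checkpoint: flush whatever is buffered and remember the position.
                if (record % checkpointEvery == 0) {
                    collection.insertMany(buffer);
                    buffer.clear();
                    lastCheckpointed = record;
                }
                // Simulated crash: buffered data is lost, the source rewinds.
                if (!failed && record == failAfter) {
                    failed = true;
                    buffer.clear();
                    record = lastCheckpointed + 1;
                    continue;
                }
                record++;
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Flushing every record: records 6 and 7 are written before the crash
        // and written again on replay, so the insert fails.
        System.out.println("batchSize=1 succeeds:  " + run(10, 5, 1, 7));
        // Flushing only on checkpoint: nothing past the checkpoint was written.
        System.out.println("batchSize=-1 succeeds: " + run(10, 5, -1, 7));
    }
}
```

   With a size-based flush, the replay re-inserts records that already reached the collection; with checkpoint-only flushing, this particular scenario replays cleanly.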
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "Jiabao-Sun (via GitHub)" <gi...@apache.org>.
Jiabao-Sun commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491579741

   > So...would this also fail with this error in production?
   > 
   > The sink supports AT_LEAST_ONCE semantics, which should imply that duplicate writes are fine and don't cause errors. But now they seemingly do?
   
   The `MongoRowDataSerializationSchema` uses upsert writes, which are idempotent, so there will be no write conflict.
   
   If we set `batchIntervalMs != -1` and `batchSize != -1`, we may write some data that has not been checkpointed yet. In that case users are required to ensure idempotent writes. I think this problem also exists in the elasticsearch-connector: the `BulkProcessor` periodically flushes data through scheduled tasks, but that data has not been checkpointed.
   
   So do we need to only allow writing at checkpoint time under AT_LEAST_ONCE semantics?
   @zentol, what do you think?




[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "Jiabao-Sun (via GitHub)" <gi...@apache.org>.
Jiabao-Sun commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1490804554

   Hi @dannycranmer.
   Could you help review it when you have time?




[GitHub] [flink-connector-mongodb] zentol commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491623124

   So this is more an issue of the `AppendOnlySerializationSchema` used in the test, because it uses the `InsertOneModel` which fails on duplicates?




[GitHub] [flink-connector-mongodb] zentol commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491627450

   I assume this was compounded by us passing `Documents` to the sink which already have an ID, while without that an ID would be auto-generated?




[GitHub] [flink-connector-mongodb] zentol commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491630637

   If so, then it may be a good idea to explicitly say that using the `InsertOneModel` while also writing outside of checkpoints will likely fail at some point.




[GitHub] [flink-connector-mongodb] Jiabao-Sun commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "Jiabao-Sun (via GitHub)" <gi...@apache.org>.
Jiabao-Sun commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491660847

   > I guess the `InsertOneModel` shouldn't be used at all actually because even only writing on checkpoint won't prevent us from attempting to write multiple times due to the lack of transactions.
   
   Yes, we can change `InsertOneModel` to `UpdateOneModel` with the upsert option.
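
   As a rough sketch of why upsert helps here, the following simulation writes the same batch twice, as would happen if a flush reached the database but the job failed before the checkpoint completed. It does not use the MongoDB driver: `FakeCollection`, `insertOne`, `upsertOne`, and `writeTwice` are hypothetical stand-ins for the behaviour of `InsertOneModel` and of `UpdateOneModel` with upsert.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpsertReplaySketch {

    /** Hypothetical stand-in for a collection keyed by a unique _id. */
    static final class FakeCollection {
        private final Map<Integer, String> docs = new HashMap<>();

        /** Insert semantics: a second write of the same _id is an error. */
        void insertOne(int id, String value) {
            if (docs.containsKey(id)) {
                throw new IllegalStateException("E11000 duplicate key: _id=" + id);
            }
            docs.put(id, value);
        }

        /** Upsert semantics: create the document or overwrite it. */
        void upsertOne(int id, String value) {
            docs.put(id, value);
        }

        int count() {
            return docs.size();
        }
    }

    /**
     * Writes the same batch twice, mimicking a replay after recovery,
     * and returns the number of documents stored afterwards.
     */
    static int writeTwice(List<Integer> ids, boolean upsert) {
        FakeCollection collection = new FakeCollection();
        for (int attempt = 0; attempt < 2; attempt++) {
            for (int id : ids) {
                if (upsert) {
                    collection.upsertOne(id, "v" + id);
                } else {
                    collection.insertOne(id, "v" + id);
                }
            }
        }
        return collection.count();
    }

    public static void main(String[] args) {
        List<Integer> batch = Arrays.asList(1, 2, 3);
        // The replayed upserts overwrite the same three documents.
        System.out.println("upsert count: " + writeTwice(batch, true));
        try {
            writeTwice(batch, false);
        } catch (IllegalStateException e) {
            // The replayed insert hits the unique _id constraint.
            System.out.println("insert failed: " + e.getMessage());
        }
    }
}
```

   The upsert path ends in the same state no matter how many times the batch is replayed, which is the property an at-least-once sink relies on.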




[GitHub] [flink-connector-mongodb] zentol merged pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol merged PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4




[GitHub] [flink-connector-mongodb] zentol commented on pull request #4: [hotfix] Fix unstable test of MongoSinkITCase.testRecovery

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on PR #4:
URL: https://github.com/apache/flink-connector-mongodb/pull/4#issuecomment-1491631381

   I guess the `InsertOneModel` shouldn't be used at all actually because even only writing on checkpoint won't prevent us from attempting to write multiple times due to the lack of transactions.

