Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/10/28 04:11:18 UTC

[GitHub] [pulsar] sijie opened a new issue #5476: Message deduplication is not well handled when batching is enabled with external provided sequenceId

URL: https://github.com/apache/pulsar/issues/5476
 
 
   **Describe the bug**
   
   The current implementation of the Pulsar producer doesn't check the sequenceId when adding messages to a batch container. This breaks idempotent producing when an externally provided sequenceId is used.
   
   **To Reproduce**
   
   - send 10 messages with sequenceIds 0-9
   - send 10 messages with sequenceIds 0-9 again
   - flush the producer
   - all 20 messages are received by the consumer
   
   **Expected behavior**
   
   The second 10 messages should not be added to the batch container, because they are duplicates. The producer could throw an exception to tell the client that it is adding out-of-order sequence ids.
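   A minimal sketch of the expected behavior, written in plain Java rather than against the real Pulsar client. `DedupBatchContainer` and its `add` method are hypothetical; the `lastPushSequenceId` field borrows the name proposed under "Additional context", and message payloads are left out so only sequence ids are tracked. It replays the reproduction steps: the first run of ids 0-9 is batched, and the repeated run is rejected with an exception instead of being batched again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch container: not Pulsar's BatchMessageContainer.
// It only accepts external sequence ids that are strictly greater than
// the last id it has accepted.
class DedupBatchContainer {
    // Last sequence id accepted by this producer; -1 means none yet.
    private long lastPushSequenceId = -1L;
    private final List<Long> batchedSequenceIds = new ArrayList<>();

    /** Adds a message, rejecting duplicate or out-of-order sequence ids. */
    public void add(long sequenceId) {
        if (sequenceId <= lastPushSequenceId) {
            throw new IllegalArgumentException(
                "Duplicate or out-of-order sequenceId " + sequenceId
                    + " (last accepted: " + lastPushSequenceId + ")");
        }
        batchedSequenceIds.add(sequenceId);
        lastPushSequenceId = sequenceId;
    }

    public int size() {
        return batchedSequenceIds.size();
    }

    public static void main(String[] args) {
        DedupBatchContainer batch = new DedupBatchContainer();
        // First run: sequence ids 0-9 are all accepted.
        for (long id = 0; id < 10; id++) {
            batch.add(id);
        }
        // Second run: the same ids are rejected rather than batched again.
        int rejected = 0;
        for (long id = 0; id < 10; id++) {
            try {
                batch.add(id);
            } catch (IllegalArgumentException e) {
                rejected++;
            }
        }
        System.out.println("batched=" + batch.size() + ", rejected=" + rejected);
    }
}
```

   Running `main` prints `batched=10, rejected=10`. Throwing from `add` follows the suggestion above; since Pulsar's `sendAsync` returns a `CompletableFuture`, a real producer could instead complete that future exceptionally.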
   
   **Additional context**
   
   A few places need attention when handling batched messages with an external sequenceId.
   
   1) The logic that maintains `lastPublishedSequenceId` is incorrect when an external sequenceId is used: `lastSequenceIdPublished = op.sequenceId + op.numMessagesInBatch - 1;`. The last sequence id of a batch is an external sequence id, so it can't be computed by adding the number of messages in the batch to the first sequence id.
   
   2) We only maintain `lastPublishedSequenceId` (the acknowledged sequence id). We also need to maintain a `lastPushSequenceId` that records the last sequence id the producer has sent to the broker, so that messages which have been sent but not yet acknowledged are also taken into account.
   
   3) The broker needs to handle both the first and the last sequence id in a message batch.
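   The three points above can be modeled together in a small, self-contained Java sketch (Java 16+, since it uses a record). It is not Pulsar's code: `Batch`, `ProducerState`, `BrokerDedup`, `Decision`, and `highestSequenceIdPersisted` are hypothetical names, and the sketch assumes the external ids inside a batch are strictly increasing (as the expected behavior requires), so a batch's last id is also its highest. It contrasts the current `op.sequenceId + op.numMessagesInBatch - 1` formula with taking the batch's last id, keeps a pushed id next to the acknowledged one, and has the broker compare both ends of a batch against the highest id it has persisted.

```java
import java.util.List;

// Hypothetical model of points 1) to 3); not Pulsar's implementation.
class ExternalSequenceTracking {

    /** A batch, identified by the external sequence ids of its messages in send order. */
    record Batch(List<Long> sequenceIds) {
        long firstSequenceId() { return sequenceIds.get(0); }
        long lastSequenceId() { return sequenceIds.get(sequenceIds.size() - 1); }
    }

    /** Producer-side state (points 1 and 2). */
    static final class ProducerState {
        long lastPushSequenceId = -1L;      // last id sent to the broker, acknowledged or not
        long lastPublishedSequenceId = -1L; // last id acknowledged by the broker

        void onSend(Batch batch) {
            lastPushSequenceId = Math.max(lastPushSequenceId, batch.lastSequenceId());
        }

        void onAck(Batch batch) {
            // Point 1: external ids need not be contiguous, so use the batch's
            // last id instead of "first + count - 1".
            lastPublishedSequenceId = Math.max(lastPublishedSequenceId, batch.lastSequenceId());
        }
    }

    /** Outcome of the broker-side duplicate check. */
    enum Decision { ACCEPT, DUPLICATE, PARTIAL_OVERLAP }

    /** Broker-side state (point 3): looks at both ends of each batch. */
    static final class BrokerDedup {
        long highestSequenceIdPersisted = -1L;

        Decision check(Batch batch) {
            if (batch.lastSequenceId() <= highestSequenceIdPersisted) {
                return Decision.DUPLICATE;       // every message was seen before
            }
            if (batch.firstSequenceId() <= highestSequenceIdPersisted) {
                return Decision.PARTIAL_OVERLAP; // only some messages were seen before
            }
            highestSequenceIdPersisted = batch.lastSequenceId();
            return Decision.ACCEPT;
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch(List.of(100L, 205L, 310L));

        // The current formula assumes contiguous ids: 100 + 3 - 1 = 102, not 310.
        long naive = batch.firstSequenceId() + batch.sequenceIds().size() - 1;

        ProducerState producer = new ProducerState();
        producer.onSend(batch);
        producer.onAck(batch);

        BrokerDedup broker = new BrokerDedup();
        Decision firstSend = broker.check(batch);
        Decision resend = broker.check(batch);
        Decision overlapping = broker.check(new Batch(List.of(300L, 400L)));

        System.out.println("naive=" + naive + ", pushed=" + producer.lastPushSequenceId
            + ", published=" + producer.lastPublishedSequenceId);
        System.out.println(firstSend + " " + resend + " " + overlapping);
    }
}
```

   For the batch `[100, 205, 310]`, `main` prints `naive=102, pushed=310, published=310` followed by `ACCEPT DUPLICATE PARTIAL_OVERLAP`. How the broker should treat a partially overlapping batch is left open here; the sketch only reports it.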
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services