You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/27 02:12:29 UTC

[GitHub] [pulsar] BewareMyPower edited a comment on pull request #12195: When the producer or client is normally closed, data will be lost

BewareMyPower edited a comment on pull request #12195:
URL: https://github.com/apache/pulsar/pull/12195#issuecomment-927451267


   > The flushing on close was already the pre-existing behavior
   
   @merlimat I'm afraid not, in current `ProducerImpl#closeAsync`, before sending `CommandClose`, producer only cancels all timers, then call `clearPendingMessagesWhenClose` to complete all pending messages' callbacks with `AlreadyClosedException`.
   
   > Of course, we can also provide a close method with a timeout to satisfy some users who do not want the close method to take too much time.
   
   It's not so easy like you might think. We need to handle more corner cases if you added the flush semantics to `close`.
   
   First, we cannot assume pending messages are sent quickly. If your buffer memory is large enough, it might take long time to close. Assuming you have 100000 pending messages and in the timeout, only 20000 messages are persisted. What will you do now?
   1. Continue sending the left 80000 messages. It means your application continues after `close`, but the background thread is still running. To ensure messages are not lost, you need to keep the connection open. Then it's interesting that a client is closed while the resources are still active, including the connection, pending messages, etc.
   2. Discard the left 80000 messages. Then data is lost. In addition, what's different from Kafka, Pulsar producer will send a `CommandCloseProducer` to notify the broker to close the producer. What's more, it make your guarantee that messages are not lost after calling synchronous `close` not work.
   
   At any case, when you choose `sendAsync`, you should always make use of the returned future to confirm the result of all messages. In Kafka, it's the send callback.
   
   Assuming now `close` will wait until all messages are persisted. Then you might write following pseudo code.
   
   ```java
   for (int i = 0; i < N; i++) {
       producer.sendAsync("message-i");
   }
   producer.close();
   ```
   
   What if `close()` failed? Then how would you check which messages are failed to send? You still need to do something with the future that `sendAsync` returns. Kafka's `close` only looks good for those don't care about any error and think all API calls should always succeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org