You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/30 08:04:49 UTC

[GitHub] [pulsar] eolivelli commented on a change in pull request #8725: Issue 8677: Cannot get lastMessageId for an empty topic due to message retention

eolivelli commented on a change in pull request #8725:
URL: https://github.com/apache/pulsar/pull/8725#discussion_r532406502



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
##########
@@ -2089,6 +2089,10 @@ void internalTrimConsumedLedgers(CompletableFuture<?> promise) {
                     log.debug("[{}] Ledger {} skipped for deletion as it is currently being written to", name,
                             ls.getLedgerId());
                     break;
+                } else if (currentLastConfirmedEntry != null && ls.getLedgerId() == currentLastConfirmedEntry.getLedgerId()) {

Review comment:
       @sijie thanks for checking 
   Yes, I don't know what is the expected value for "lastMessageId" in this case.
   
   It is true that this is a behaviour change in the rollover but my rationale is that if we send an "empty" messageId, the consumer (or the "Reader" in the case of the user that reported the issue) won't see a consistent sequence of "lastMessageId"
   You can have:
   - time 1: last messageId:   X:Y:Z:K
   - time 2 (broker restart): messageId: empty
   - time 3 (after one write): messageId:  X+1:Y:Z:K
   so if you do not restart the broker and everything goes well you do not see that hole in the sequence
   
   In the case of the user the lastMessageId is used to compute an approximate number of messages to be consumed by the Reader. as the topic is basically "empty" the effect is the same, that is: there are no more messages to consume. So your suggestion would make the user happy anyhow.
   
   Regarding the trim: we are slightly changing the semantics, but as soon as you write a new message, the behavior is the same. Currently (AFAIK but I still have limited knowledge so I may be wrong) there is not strict contract about when a ledger is deleted (and also you won't know when it will be really deleted from the bookies).
   
   All in all I would prefer this new behaviour:
   - the sequence of lastMessageId does not contain "holes"
   - the trim behavior changes but is it already unpredictable
   
   You point "data won't ever be deleted if there is no more write to the topic" is valid, but I am not sure how much impact could have on production users. Is it about privacy laws/data retention rules ?
   
   but I am open to change the patch 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org