Posted to dev@pulsar.apache.org by Xiangying Meng <xi...@apache.org> on 2024/03/27 06:41:02 UTC

[DISCUSS] Optimizing the Method of Estimating Message Backlog Size in Pulsar

Dear Pulsar Community,

I would like to initiate a discussion regarding the optimization of
the method used for estimating the message backlog size.

In the current implementation, the backlog size is estimated from the
mark delete position to the last confirmed position, whereas the backlog
message count is the number of messages from the mark delete position
to the last confirmed position, minus the count of individually
acknowledged messages. The inconsistency between these two metrics
could confuse users.

For instance, suppose a topic holds 3,000 messages and a subscription
has acknowledged all of them except messages 1:0, 1:998, and 3:999
(written as ledgerId:entryId). When users retrieve the stats of the
subscription, they will find that `msgBacklog` is 3, while
`backlogSize` is 3000 * entry size.

    |1:0|...|1:998|...|3:999|
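
To make the numbers concrete, here is a minimal sketch of the scenario
above. It assumes three ledgers (1, 2, 3) holding 1,000 single-message
entries each, a fixed entry size of 1,024 bytes, and a mark delete
position just before 1:0. None of these details are stated above, and
the names are illustrative rather than Pulsar APIs.

```python
# Assumptions (not from the thread): 3 ledgers x 1000 entries, fixed entry size.
ENTRIES_PER_LEDGER = 1000
ENTRY_SIZE_BYTES = 1024

# Every position in the topic as (ledger_id, entry_id), in order.
all_positions = [
    (ledger, entry)
    for ledger in (1, 2, 3)
    for entry in range(ENTRIES_PER_LEDGER)
]

# The three messages the subscription has NOT acknowledged.
unacked = {(1, 0), (1, 998), (3, 999)}

# Nothing before 1:0 exists, so the range from the mark delete position
# to the last confirmed position covers the whole topic.
first_unacked = min(unacked)
in_range = [p for p in all_positions if p >= first_unacked]

# Positions inside that range that were acknowledged individually.
individually_acked = [p for p in in_range if p not in unacked]

# Message count subtracts individual acks; size does not.
msg_backlog = len(in_range) - len(individually_acked)
backlog_size = len(in_range) * ENTRY_SIZE_BYTES

print(msg_backlog)   # 3
print(backlog_size)  # 3072000
```

Under these assumptions the count reports 3 messages while the size
reports the bytes of all 3,000 entries.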

When it comes to the value of `backlogSize`, there seem to be two
different opinions:
1. The backlog size should be consistent with the message backlog, and
it should not include the messages that have been individually
acknowledged.
2. Only the messages before the mark delete position can be deleted,
so we should calculate the backlog size from the mark delete position,
and individual acknowledgments should not affect the calculation of
the backlog size.
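
The two opinions can be sketched side by side. This is only an
illustration under simplifying assumptions: entries are indexed 0..n-1
in a flat list, `mark_delete_index` is the index of the last entry
covered by the mark delete position, and both function names are
hypothetical, not part of Pulsar.

```python
def backlog_size_option1(entry_sizes, mark_delete_index, individually_acked):
    """Opinion 1: count only entries that are still unacknowledged."""
    return sum(
        size
        for index, size in enumerate(entry_sizes)
        if index > mark_delete_index and index not in individually_acked
    )


def backlog_size_option2(entry_sizes, mark_delete_index):
    """Opinion 2: count every entry after the mark delete position."""
    return sum(entry_sizes[mark_delete_index + 1:])


# Four entries; entry 0 is behind the mark delete position and
# entry 2 has been acknowledged individually.
sizes = [100, 200, 300, 400]
option1 = backlog_size_option1(sizes, mark_delete_index=0, individually_acked={2})
option2 = backlog_size_option2(sizes, mark_delete_index=0)

print(option1)  # 600
print(option2)  # 900
```

Opinion 1 tracks `msgBacklog`; opinion 2 tracks the data that cannot
yet be removed.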

I'm interested in hearing how others view this issue. I look forward
to your response.

Best Regards,
Xiangying

Re: [DISCUSS] Optimizing the Method of Estimating Message Backlog Size in Pulsar

Posted by Xiangying Meng <xi...@apache.org>.
Agreed. While the name might be misleading, it does accurately
reflect the actual disk usage.


BR

On Wed, Mar 27, 2024 at 3:48 PM Girish Sharma <sc...@gmail.com> wrote:
>
> Hi Xiangying,
>
>
> > In the current implementation, the backlog size is estimated from the
> > mark delete position to the last confirm position, whereas the backlog
> > message count is the number of messages from the mark delete position
> > to the last confirm position, minus the count of individually
> > acknowledged messages. The inconsistency between these two could
> > potentially confuse users.
> >
>
> While confusing, it is somewhat accurate. Since the messages can be part of
> the same ledger where some messages are acked, some aren't, we can't delete
> that entire ledger until all messages of the ledger are acked - so it does
> contribute towards size of the backlog from a disk perspective.
> There might be some optimization possible - in a way that we try to figure
> out all completely acked ledgers from markDeletePosition to latest offset
> and remove their size, but what's the ROI there?
>
> So I would say that in your proposal, option 2 (current) is more accurate
> (while not being the best) than option 1.
>
> Regards
> --
> Girish Sharma

Re: [DISCUSS] Optimizing the Method of Estimating Message Backlog Size in Pulsar

Posted by Girish Sharma <sc...@gmail.com>.
Hi Xiangying,


> In the current implementation, the backlog size is estimated from the
> mark delete position to the last confirm position, whereas the backlog
> message count is the number of messages from the mark delete position
> to the last confirm position, minus the count of individually
> acknowledged messages. The inconsistency between these two could
> potentially confuse users.
>

While confusing, it is somewhat accurate. A single ledger can contain both
acked and unacked messages, and we can't delete that entire ledger until all
of its messages are acked, so it does contribute towards the size of the
backlog from a disk perspective.
There might be some optimization possible, in a way that we try to figure
out all completely acked ledgers from markDeletePosition to latest offset
and remove their size, but what's the ROI there?
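
A rough sketch of that optimization, assuming we already know each
ledger's size, its entry count, and how many of its entries are acked
(the function and parameter names here are hypothetical, not Pulsar's
actual API):

```python
def backlog_size_skipping_acked_ledgers(ledger_sizes, ledger_entries, acked_counts):
    """Sum ledger sizes after markDeletePosition, skipping fully acked ledgers.

    ledger_sizes:   {ledger_id: size in bytes}
    ledger_entries: {ledger_id: number of entries in the ledger}
    acked_counts:   {ledger_id: number of acked entries in the ledger}
    """
    total = 0
    for ledger_id, size in ledger_sizes.items():
        fully_acked = acked_counts.get(ledger_id, 0) == ledger_entries[ledger_id]
        if not fully_acked:
            # A partially acked ledger cannot be deleted, so it still
            # occupies its full size on disk.
            total += size
    return total


# Example from the original mail, assuming 1000 entries of 1024 bytes
# per ledger: ledger 2 is fully acked, ledgers 1 and 3 are not.
sizes = {1: 1024000, 2: 1024000, 3: 1024000}
entries = {1: 1000, 2: 1000, 3: 1000}
acked = {1: 998, 2: 1000, 3: 999}

print(backlog_size_skipping_acked_ledgers(sizes, entries, acked))  # 2048000
```

In this example only one of three ledgers drops out, which gives a
feel for how much the estimate would change.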

So I would say that in your proposal, option 2 (current) is more accurate
(while not being the best) than option 1.

Regards
-- 
Girish Sharma