You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Yubiao Feng <yu...@streamnative.io.INVALID> on 2023/01/16 15:36:24 UTC

[DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Hi community

I am starting a DISCUSS for making the default value of the parameter
"--get-subscription-backlog-size" of admin API "topics stats" true.

In the PR https://github.com/apache/pulsar/pull/9302, the property backlog
size of each subscription returned in the response of the API topics stats,
by default this property is always equal to 0 in response, and this will
confuse users. Since the calculation of backlog size is done in broker
memory, there is no significant overhead(the process is described in the
following section), so I think the correct values should be displayed by
default.

### The following two APIs should be affected:

In Pulsar admin API
```
pulsar-admin topics stats persistent://test-tenant/ns1/tp1
--get-subscription-backlog-size
pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
```
the default value of parameter `--get-subscription-backlog-size` will be
`true`

In Pulsar Rest API
```
curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
"?subscriptionBacklogSize=true
```
the default value of parameter `subscriptionBacklogSize ` will be `true`


### The following is the process of calculating backlog size:
- Divide `PersistentTopc.ledgers` into two parts according to the ledgerId
of the mark delete position of the cursor. The second part is ledgers
indicating the messages still need to be consumed, aka backlogSizeInLedgers.
- Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
mark delete position of the cursor, and we can also divide the ledger into
two parts, the second part is entries indicating the messages still need to
be consumed, multiply the average size of each entry in metrics by the
number of still need to be consumed entries we can get the backlog size in
this ledger. aka backlogSizeInEntries.
- `backlogSizeInLe dgers` + `backlogSizeInEntries`

Thanks
Yubiao Feng

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by guo jiwei <te...@apache.org>.
+1


Regards
Jiwei Guo (Tboy)

On Sat, Jan 28, 2023 at 1:28 AM Yubiao Feng
<yu...@streamnative.io.invalid> wrote:
>
> Hi Michael Marshall
>
> > My primary concern with making this the new default is that the cost of
> calculating the size is in the synchronization on the managed ledger, which
> interrupts writes and reads through that managed ledger and potentially
> others in some cases.
>
> Yes, it affects producing, but it doesn't affect consuming. When
> calculating the backlogSize of each subscription, it fights for an
> exclusive lock(see:
> https://github.com/apache/pulsar/blob/644be5fdd6d080accffd72229b2e21e35a27d722/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1270),
> this lock is also used when sending messages. If a topic has N
> subscriptions, then the time of impact is multiplied by N. I will write a
> PR to reduce the amplification of influence on producing caused by multiple
> subscriptions.
>
> > we should consider making the JSON result easier to interpret.
>
> Yes, I have set the attribute `backlogSize` in response to be `-1` if
> `--get-subscription-backlog-size` is false to distinguish between the two
> cases:
> - There is no backlog, and the parameter `--get-subscription-backlog-size`
> is true. After changes, the attribute `backlogSize` in response will be `0`.
> - The parameter `--get-subscription-backlog-size` is false. After changes,
> the attribute `backlogSize` in response will be `-1`.
>
> On Sat, Jan 21, 2023 at 3:56 AM Michael Marshall <mm...@apache.org>
> wrote:
>
> > My primary concern with making this the new default is that the cost
> > of calculating the size is in the synchronization on the managed
> > ledger, which interrupts writes and reads through that managed ledger
> > and potentially others in some cases.
> >
> > If the primary motivation for this PR is to avoid confusing users (and
> > I agree that the 0 value confuses users), we should consider making
> > the JSON result easier to interpret. It'd be just as easy to only
> > include the subscription backlog size when a client requests it.
> >
> > What do you think about changing the result?
> >
> > Thanks,
> > Michael
> >
> > On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <ji...@gmail.com>
> > wrote:
> > >
> > > +1
> > >
> > > Haiting
> > >
> > > On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
> > > <yu...@streamnative.io.invalid> wrote:
> > > >
> > > > Hi Asaf
> > > >
> > > > > It's worth noting that the estimated backlog size of a subscription
> > is
> > > > estimated since it doesn't consider any acknowledged messages between
> > the
> > > > mark delete position and the last message. It simply assumes all
> > messages
> > > > between the mark delete position and the last message have not been
> > > > acknowledged.
> > > >
> > > > Yes, it's not the exact value of the backlog. There are two reasons
> > for the
> > > > loss of accuracy:
> > > > - Whether the Entry size is closer to the `averageSize`.
> > > > - The number of messages after the mark deleted position has been
> > > > acknowledged.
> > > >
> > > > Thanks
> > > > Yubiao Feng
> > > >
> > > > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <as...@gmail.com>
> > wrote:
> > > >
> > > > > Small question regarding this:
> > > > >
> > > > > The code for calculation is:
> > > > >
> > > > > long estimateBacklogFromPosition(PositionImpl pos) {
> > > > >     synchronized (this) {
> > > > >         long sizeBeforePosLedger =
> > > > > ledgers.headMap(pos.getLedgerId()).values()
> > > > >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> > > > >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> > > > >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> > > > >         if (ledgerInfo == null) {
> > > > >             return sizeAfter;
> > > > >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> > > > >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > > > > currentLedgerEntries, pos.getEntryId());
> > > > >         } else {
> > > > >             return sizeAfter -
> > > > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > > > > pos.getEntryId());
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > and
> > > > >
> > > > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > > > > long consumedEntries) {
> > > > >     if (ledgerEntries <= 0) {
> > > > >         return 0;
> > > > >     }
> > > > >     if (ledgerEntries <= (consumedEntries + 1)) {
> > > > >         return ledgerSize;
> > > > >     } else {
> > > > >         long averageSize = ledgerSize / ledgerEntries;
> > > > >         return consumedEntries >= 0 ? (consumedEntries + 1) *
> > averageSize
> > > > > : 0;
> > > > >     }
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > It's worth noting that the estimated backlog size of a subscription
> > is
> > > > > estimated since it doesn't consider any acknowledged messages
> > between the
> > > > > mark delete position and the last message. It simply assumes all
> > messages
> > > > > between the mark delete position and the last message have not been
> > > > > acknowledged.
> > > > >
> > > > > Good idea - +1
> > > > >
> > > > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Penghui
> > > > > >
> > > > > > > On Jan 16, 2023, at 23:36, Yubiao Feng <
> > yubiao.feng@streamnative.io
> > > > > .INVALID>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi community
> > > > > > >
> > > > > > > I am starting a DISCUSS for making the default value of the
> > parameter
> > > > > > > "--get-subscription-backlog-size" of admin API "topics stats"
> > true.
> > > > > > >
> > > > > > > In the PR https://github.com/apache/pulsar/pull/9302, the
> > property
> > > > > > backlog
> > > > > > > size of each subscription returned in the response of the API
> > topics
> > > > > > stats,
> > > > > > > by default this property is always equal to 0 in response, and
> > this
> > > > > will
> > > > > > > confuse users. Since the calculation of backlog size is done in
> > broker
> > > > > > > memory, there is no significant overhead(the process is
> > described in
> > > > > the
> > > > > > > following section), so I think the correct values should be
> > displayed
> > > > > by
> > > > > > > default.
> > > > > > >
> > > > > > > ### The following two APIs should be affected:
> > > > > > >
> > > > > > > In Pulsar admin API
> > > > > > > ```
> > > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > > > > --get-subscription-backlog-size
> > > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > > > > ```
> > > > > > > the default value of parameter `--get-subscription-backlog-size`
> > will
> > > > > be
> > > > > > > `true`
> > > > > > >
> > > > > > > In Pulsar Rest API
> > > > > > > ```
> > > > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > > > > "?subscriptionBacklogSize=true
> > > > > > > ```
> > > > > > > the default value of parameter `subscriptionBacklogSize ` will be
> > > > > `true`
> > > > > > >
> > > > > > >
> > > > > > > ### The following is the process of calculating backlog size:
> > > > > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > > > > ledgerId
> > > > > > > of the mark delete position of the cursor. The second part is
> > ledgers
> > > > > > > indicating the messages still need to be consumed, aka
> > > > > > backlogSizeInLedgers.
> > > > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId
> > of the
> > > > > > > mark delete position of the cursor, and we can also divide the
> > ledger
> > > > > > into
> > > > > > > two parts, the second part is entries indicating the messages
> > still
> > > > > need
> > > > > > to
> > > > > > > be consumed, multiply the average size of each entry in metrics
> > by the
> > > > > > > number of still need to be consumed entries we can get the
> > backlog size
> > > > > > in
> > > > > > > this ledger. aka backlogSizeInEntries.
> > > > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > > > > >
> > > > > > > Thanks
> > > > > > > Yubiao Feng
> > > > > >
> > > > > >
> > > > >
> >

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.
Hi Michael Marshall

> My primary concern with making this the new default is that the cost of
calculating the size is in the synchronization on the managed ledger, which
interrupts writes and reads through that managed ledger and potentially
others in some cases.

Yes, it affects producing, but it doesn't affect consuming. When
calculating the backlogSize of each subscription, it fights for an
exclusive lock(see:
https://github.com/apache/pulsar/blob/644be5fdd6d080accffd72229b2e21e35a27d722/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L1270),
this lock is also used when sending messages. If a topic has N
subscriptions, then the time of impact is multiplied by N. I will write a
PR to reduce the amplification of influence on producing caused by multiple
subscriptions.

> we should consider making the JSON result easier to interpret.

Yes, I have set the attribute `backlogSize` in response to be `-1` if
`--get-subscription-backlog-size` is false to distinguish between the two
cases:
- There is no backlog, and the parameter `--get-subscription-backlog-size`
is true. After changes, the attribute `backlogSize` in response will be `0`.
- The parameter `--get-subscription-backlog-size` is false. After changes,
the attribute `backlogSize` in response will be `-1`.

On Sat, Jan 21, 2023 at 3:56 AM Michael Marshall <mm...@apache.org>
wrote:

> My primary concern with making this the new default is that the cost
> of calculating the size is in the synchronization on the managed
> ledger, which interrupts writes and reads through that managed ledger
> and potentially others in some cases.
>
> If the primary motivation for this PR is to avoid confusing users (and
> I agree that the 0 value confuses users), we should consider making
> the JSON result easier to interpret. It'd be just as easy to only
> include the subscription backlog size when a client requests it.
>
> What do you think about changing the result?
>
> Thanks,
> Michael
>
> On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <ji...@gmail.com>
> wrote:
> >
> > +1
> >
> > Haiting
> >
> > On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
> > <yu...@streamnative.io.invalid> wrote:
> > >
> > > Hi Asaf
> > >
> > > > It's worth noting that the estimated backlog size of a subscription
> is
> > > estimated since it doesn't consider any acknowledged messages between
> the
> > > mark delete position and the last message. It simply assumes all
> messages
> > > between the mark delete position and the last message have not been
> > > acknowledged.
> > >
> > > Yes, it's not the exact value of the backlog. There are two reasons
> for the
> > > loss of accuracy:
> > > - Whether the Entry size is closer to the `averageSize`.
> > > - The number of messages after the mark deleted position has been
> > > acknowledged.
> > >
> > > Thanks
> > > Yubiao Feng
> > >
> > > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <as...@gmail.com>
> wrote:
> > >
> > > > Small question regarding this:
> > > >
> > > > The code for calculation is:
> > > >
> > > > long estimateBacklogFromPosition(PositionImpl pos) {
> > > >     synchronized (this) {
> > > >         long sizeBeforePosLedger =
> > > > ledgers.headMap(pos.getLedgerId()).values()
> > > >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> > > >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> > > >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> > > >         if (ledgerInfo == null) {
> > > >             return sizeAfter;
> > > >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> > > >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > > > currentLedgerEntries, pos.getEntryId());
> > > >         } else {
> > > >             return sizeAfter -
> > > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > > > pos.getEntryId());
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > and
> > > >
> > > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > > > long consumedEntries) {
> > > >     if (ledgerEntries <= 0) {
> > > >         return 0;
> > > >     }
> > > >     if (ledgerEntries <= (consumedEntries + 1)) {
> > > >         return ledgerSize;
> > > >     } else {
> > > >         long averageSize = ledgerSize / ledgerEntries;
> > > >         return consumedEntries >= 0 ? (consumedEntries + 1) *
> averageSize
> > > > : 0;
> > > >     }
> > > > }
> > > >
> > > >
> > > >
> > > > It's worth noting that the estimated backlog size of a subscription
> is
> > > > estimated since it doesn't consider any acknowledged messages
> between the
> > > > mark delete position and the last message. It simply assumes all
> messages
> > > > between the mark delete position and the last message have not been
> > > > acknowledged.
> > > >
> > > > Good idea - +1
> > > >
> > > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Penghui
> > > > >
> > > > > > On Jan 16, 2023, at 23:36, Yubiao Feng <
> yubiao.feng@streamnative.io
> > > > .INVALID>
> > > > > wrote:
> > > > > >
> > > > > > Hi community
> > > > > >
> > > > > > I am starting a DISCUSS for making the default value of the
> parameter
> > > > > > "--get-subscription-backlog-size" of admin API "topics stats"
> true.
> > > > > >
> > > > > > In the PR https://github.com/apache/pulsar/pull/9302, the
> property
> > > > > backlog
> > > > > > size of each subscription returned in the response of the API
> topics
> > > > > stats,
> > > > > > by default this property is always equal to 0 in response, and
> this
> > > > will
> > > > > > confuse users. Since the calculation of backlog size is done in
> broker
> > > > > > memory, there is no significant overhead(the process is
> described in
> > > > the
> > > > > > following section), so I think the correct values should be
> displayed
> > > > by
> > > > > > default.
> > > > > >
> > > > > > ### The following two APIs should be affected:
> > > > > >
> > > > > > In Pulsar admin API
> > > > > > ```
> > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > > > --get-subscription-backlog-size
> > > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > > > ```
> > > > > > the default value of parameter `--get-subscription-backlog-size`
> will
> > > > be
> > > > > > `true`
> > > > > >
> > > > > > In Pulsar Rest API
> > > > > > ```
> > > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > > > "?subscriptionBacklogSize=true
> > > > > > ```
> > > > > > the default value of parameter `subscriptionBacklogSize ` will be
> > > > `true`
> > > > > >
> > > > > >
> > > > > > ### The following is the process of calculating backlog size:
> > > > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > > > ledgerId
> > > > > > of the mark delete position of the cursor. The second part is
> ledgers
> > > > > > indicating the messages still need to be consumed, aka
> > > > > backlogSizeInLedgers.
> > > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId
> of the
> > > > > > mark delete position of the cursor, and we can also divide the
> ledger
> > > > > into
> > > > > > two parts, the second part is entries indicating the messages
> still
> > > > need
> > > > > to
> > > > > > be consumed, multiply the average size of each entry in metrics
> by the
> > > > > > number of still need to be consumed entries we can get the
> backlog size
> > > > > in
> > > > > > this ledger. aka backlogSizeInEntries.
> > > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > > > >
> > > > > > Thanks
> > > > > > Yubiao Feng
> > > > >
> > > > >
> > > >
>

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by Michael Marshall <mm...@apache.org>.
My primary concern with making this the new default is that the cost
of calculating the size is in the synchronization on the managed
ledger, which interrupts writes and reads through that managed ledger
and potentially others in some cases.

If the primary motivation for this PR is to avoid confusing users (and
I agree that the 0 value confuses users), we should consider making
the JSON result easier to interpret. It'd be just as easy to only
include the subscription backlog size when a client requests it.

What do you think about changing the result?

Thanks,
Michael

On Thu, Jan 19, 2023 at 4:03 AM Haiting Jiang <ji...@gmail.com> wrote:
>
> +1
>
> Haiting
>
> On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
> <yu...@streamnative.io.invalid> wrote:
> >
> > Hi Asaf
> >
> > > It's worth noting that the estimated backlog size of a subscription is
> > estimated since it doesn't consider any acknowledged messages between the
> > mark delete position and the last message. It simply assumes all messages
> > between the mark delete position and the last message have not been
> > acknowledged.
> >
> > Yes, it's not the exact value of the backlog. There are two reasons for the
> > loss of accuracy:
> > - Whether the Entry size is closer to the `averageSize`.
> > - The number of messages after the mark deleted position has been
> > acknowledged.
> >
> > Thanks
> > Yubiao Feng
> >
> > On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > > Small question regarding this:
> > >
> > > The code for calculation is:
> > >
> > > long estimateBacklogFromPosition(PositionImpl pos) {
> > >     synchronized (this) {
> > >         long sizeBeforePosLedger =
> > > ledgers.headMap(pos.getLedgerId()).values()
> > >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> > >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> > >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> > >         if (ledgerInfo == null) {
> > >             return sizeAfter;
> > >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> > >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > > currentLedgerEntries, pos.getEntryId());
> > >         } else {
> > >             return sizeAfter -
> > > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > > pos.getEntryId());
> > >         }
> > >     }
> > > }
> > >
> > > and
> > >
> > > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > > long consumedEntries) {
> > >     if (ledgerEntries <= 0) {
> > >         return 0;
> > >     }
> > >     if (ledgerEntries <= (consumedEntries + 1)) {
> > >         return ledgerSize;
> > >     } else {
> > >         long averageSize = ledgerSize / ledgerEntries;
> > >         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize
> > > : 0;
> > >     }
> > > }
> > >
> > >
> > >
> > > It's worth noting that the estimated backlog size of a subscription is
> > > estimated since it doesn't consider any acknowledged messages between the
> > > mark delete position and the last message. It simply assumes all messages
> > > between the mark delete position and the last message have not been
> > > acknowledged.
> > >
> > > Good idea - +1
> > >
> > > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Penghui
> > > >
> > > > > On Jan 16, 2023, at 23:36, Yubiao Feng <yubiao.feng@streamnative.io
> > > .INVALID>
> > > > wrote:
> > > > >
> > > > > Hi community
> > > > >
> > > > > I am starting a DISCUSS for making the default value of the parameter
> > > > > "--get-subscription-backlog-size" of admin API "topics stats" true.
> > > > >
> > > > > In the PR https://github.com/apache/pulsar/pull/9302, the property
> > > > backlog
> > > > > size of each subscription returned in the response of the API topics
> > > > stats,
> > > > > by default this property is always equal to 0 in response, and this
> > > will
> > > > > confuse users. Since the calculation of backlog size is done in broker
> > > > > memory, there is no significant overhead(the process is described in
> > > the
> > > > > following section), so I think the correct values should be displayed
> > > by
> > > > > default.
> > > > >
> > > > > ### The following two APIs should be affected:
> > > > >
> > > > > In Pulsar admin API
> > > > > ```
> > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > > --get-subscription-backlog-size
> > > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > > ```
> > > > > the default value of parameter `--get-subscription-backlog-size` will
> > > be
> > > > > `true`
> > > > >
> > > > > In Pulsar Rest API
> > > > > ```
> > > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > > "?subscriptionBacklogSize=true
> > > > > ```
> > > > > the default value of parameter `subscriptionBacklogSize ` will be
> > > `true`
> > > > >
> > > > >
> > > > > ### The following is the process of calculating backlog size:
> > > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > > ledgerId
> > > > > of the mark delete position of the cursor. The second part is ledgers
> > > > > indicating the messages still need to be consumed, aka
> > > > backlogSizeInLedgers.
> > > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
> > > > > mark delete position of the cursor, and we can also divide the ledger
> > > > into
> > > > > two parts, the second part is entries indicating the messages still
> > > need
> > > > to
> > > > > be consumed, multiply the average size of each entry in metrics by the
> > > > > number of still need to be consumed entries we can get the backlog size
> > > > in
> > > > > this ledger. aka backlogSizeInEntries.
> > > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > > >
> > > > > Thanks
> > > > > Yubiao Feng
> > > >
> > > >
> > >

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by Haiting Jiang <ji...@gmail.com>.
+1

Haiting

On Tue, Jan 17, 2023 at 5:50 PM Yubiao Feng
<yu...@streamnative.io.invalid> wrote:
>
> Hi Asaf
>
> > It's worth noting that the estimated backlog size of a subscription is
> estimated since it doesn't consider any acknowledged messages between the
> mark delete position and the last message. It simply assumes all messages
> between the mark delete position and the last message have not been
> acknowledged.
>
> Yes, it's not the exact value of the backlog. There are two reasons for the
> loss of accuracy:
> - Whether the Entry size is closer to the `averageSize`.
> - The number of messages after the mark deleted position has been
> acknowledged.
>
> Thanks
> Yubiao Feng
>
> On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <as...@gmail.com> wrote:
>
> > Small question regarding this:
> >
> > The code for calculation is:
> >
> > long estimateBacklogFromPosition(PositionImpl pos) {
> >     synchronized (this) {
> >         long sizeBeforePosLedger =
> > ledgers.headMap(pos.getLedgerId()).values()
> >                 .stream().mapToLong(LedgerInfo::getSize).sum();
> >         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
> >         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
> >         if (ledgerInfo == null) {
> >             return sizeAfter;
> >         } else if (pos.getLedgerId() == currentLedger.getId()) {
> >             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> > currentLedgerEntries, pos.getEntryId());
> >         } else {
> >             return sizeAfter -
> > consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> > pos.getEntryId());
> >         }
> >     }
> > }
> >
> > and
> >
> > private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> > long consumedEntries) {
> >     if (ledgerEntries <= 0) {
> >         return 0;
> >     }
> >     if (ledgerEntries <= (consumedEntries + 1)) {
> >         return ledgerSize;
> >     } else {
> >         long averageSize = ledgerSize / ledgerEntries;
> >         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize
> > : 0;
> >     }
> > }
> >
> >
> >
> > It's worth noting that the estimated backlog size of a subscription is
> > estimated since it doesn't consider any acknowledged messages between the
> > mark delete position and the last message. It simply assumes all messages
> > between the mark delete position and the last message have not been
> > acknowledged.
> >
> > Good idea - +1
> >
> > On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com>
> > wrote:
> >
> > > +1
> > >
> > > Penghui
> > >
> > > > On Jan 16, 2023, at 23:36, Yubiao Feng <yubiao.feng@streamnative.io
> > .INVALID>
> > > wrote:
> > > >
> > > > Hi community
> > > >
> > > > I am starting a DISCUSS for making the default value of the parameter
> > > > "--get-subscription-backlog-size" of admin API "topics stats" true.
> > > >
> > > > In the PR https://github.com/apache/pulsar/pull/9302, the property
> > > backlog
> > > > size of each subscription returned in the response of the API topics
> > > stats,
> > > > by default this property is always equal to 0 in response, and this
> > will
> > > > confuse users. Since the calculation of backlog size is done in broker
> > > > memory, there is no significant overhead(the process is described in
> > the
> > > > following section), so I think the correct values should be displayed
> > by
> > > > default.
> > > >
> > > > ### The following two APIs should be affected:
> > > >
> > > > In Pulsar admin API
> > > > ```
> > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > > --get-subscription-backlog-size
> > > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > > ```
> > > > the default value of parameter `--get-subscription-backlog-size` will
> > be
> > > > `true`
> > > >
> > > > In Pulsar Rest API
> > > > ```
> > > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > > "?subscriptionBacklogSize=true
> > > > ```
> > > > the default value of parameter `subscriptionBacklogSize ` will be
> > `true`
> > > >
> > > >
> > > > ### The following is the process of calculating backlog size:
> > > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > > ledgerId
> > > > of the mark delete position of the cursor. The second part is ledgers
> > > > indicating the messages still need to be consumed, aka
> > > backlogSizeInLedgers.
> > > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
> > > > mark delete position of the cursor, and we can also divide the ledger
> > > into
> > > > two parts, the second part is entries indicating the messages still
> > need
> > > to
> > > > be consumed, multiply the average size of each entry in metrics by the
> > > > number of still need to be consumed entries we can get the backlog size
> > > in
> > > > this ledger. aka backlogSizeInEntries.
> > > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > > >
> > > > Thanks
> > > > Yubiao Feng
> > >
> > >
> >

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.
Hi Asaf

> It's worth noting that the estimated backlog size of a subscription is
estimated since it doesn't consider any acknowledged messages between the
mark delete position and the last message. It simply assumes all messages
between the mark delete position and the last message have not been
acknowledged.

Yes, it's not the exact value of the backlog. There are two reasons for the
loss of accuracy:
- Whether the Entry size is closer to the `averageSize`.
- The number of messages after the mark deleted position has been
acknowledged.

Thanks
Yubiao Feng

On Tue, Jan 17, 2023 at 3:31 PM Asaf Mesika <as...@gmail.com> wrote:

> Small question regarding this:
>
> The code for calculation is:
>
> long estimateBacklogFromPosition(PositionImpl pos) {
>     synchronized (this) {
>         long sizeBeforePosLedger =
> ledgers.headMap(pos.getLedgerId()).values()
>                 .stream().mapToLong(LedgerInfo::getSize).sum();
>         LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
>         long sizeAfter = getTotalSize() - sizeBeforePosLedger;
>         if (ledgerInfo == null) {
>             return sizeAfter;
>         } else if (pos.getLedgerId() == currentLedger.getId()) {
>             return sizeAfter - consumedLedgerSize(currentLedgerSize,
> currentLedgerEntries, pos.getEntryId());
>         } else {
>             return sizeAfter -
> consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
> pos.getEntryId());
>         }
>     }
> }
>
> and
>
> private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
> long consumedEntries) {
>     if (ledgerEntries <= 0) {
>         return 0;
>     }
>     if (ledgerEntries <= (consumedEntries + 1)) {
>         return ledgerSize;
>     } else {
>         long averageSize = ledgerSize / ledgerEntries;
>         return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize
> : 0;
>     }
> }
>
>
>
> It's worth noting that the estimated backlog size of a subscription is
> estimated since it doesn't consider any acknowledged messages between the
> mark delete position and the last message. It simply assumes all messages
> between the mark delete position and the last message have not been
> acknowledged.
>
> Good idea - +1
>
> On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com>
> wrote:
>
> > +1
> >
> > Penghui
> >
> > > On Jan 16, 2023, at 23:36, Yubiao Feng <yubiao.feng@streamnative.io
> .INVALID>
> > wrote:
> > >
> > > Hi community
> > >
> > > I am starting a DISCUSS for making the default value of the parameter
> > > "--get-subscription-backlog-size" of admin API "topics stats" true.
> > >
> > > In the PR https://github.com/apache/pulsar/pull/9302, the property
> > backlog
> > > size of each subscription returned in the response of the API topics
> > stats,
> > > by default this property is always equal to 0 in response, and this
> will
> > > confuse users. Since the calculation of backlog size is done in broker
> > > memory, there is no significant overhead(the process is described in
> the
> > > following section), so I think the correct values should be displayed
> by
> > > default.
> > >
> > > ### The following two APIs should be affected:
> > >
> > > In Pulsar admin API
> > > ```
> > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > > --get-subscription-backlog-size
> > > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > > ```
> > > the default value of parameter `--get-subscription-backlog-size` will
> be
> > > `true`
> > >
> > > In Pulsar Rest API
> > > ```
> > > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > > "?subscriptionBacklogSize=true
> > > ```
> > > the default value of parameter `subscriptionBacklogSize ` will be
> `true`
> > >
> > >
> > > ### The following is the process of calculating backlog size:
> > > - Divide `PersistentTopc.ledgers` into two parts according to the
> > ledgerId
> > > of the mark delete position of the cursor. The second part is ledgers
> > > indicating the messages still need to be consumed, aka
> > backlogSizeInLedgers.
> > > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
> > > mark delete position of the cursor, and we can also divide the ledger
> > into
> > > two parts, the second part is entries indicating the messages still
> need
> > to
> > > be consumed, multiply the average size of each entry in metrics by the
> > > number of still need to be consumed entries we can get the backlog size
> > in
> > > this ledger. aka backlogSizeInEntries.
> > > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> > >
> > > Thanks
> > > Yubiao Feng
> >
> >
>

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by Asaf Mesika <as...@gmail.com>.
Small question regarding this:

The code for calculation is:

long estimateBacklogFromPosition(PositionImpl pos) {
    synchronized (this) {
        long sizeBeforePosLedger = ledgers.headMap(pos.getLedgerId()).values()
                .stream().mapToLong(LedgerInfo::getSize).sum();
        LedgerInfo ledgerInfo = ledgers.get(pos.getLedgerId());
        long sizeAfter = getTotalSize() - sizeBeforePosLedger;
        if (ledgerInfo == null) {
            return sizeAfter;
        } else if (pos.getLedgerId() == currentLedger.getId()) {
            return sizeAfter - consumedLedgerSize(currentLedgerSize,
currentLedgerEntries, pos.getEntryId());
        } else {
            return sizeAfter -
consumedLedgerSize(ledgerInfo.getSize(), ledgerInfo.getEntries(),
pos.getEntryId());
        }
    }
}

and

private long consumedLedgerSize(long ledgerSize, long ledgerEntries,
long consumedEntries) {
    if (ledgerEntries <= 0) {
        return 0;
    }
    if (ledgerEntries <= (consumedEntries + 1)) {
        return ledgerSize;
    } else {
        long averageSize = ledgerSize / ledgerEntries;
        return consumedEntries >= 0 ? (consumedEntries + 1) * averageSize : 0;
    }
}



It's worth noting that the estimated backlog size of a subscription is
estimated since it doesn't consider any acknowledged messages between the
mark delete position and the last message. It simply assumes all messages
between the mark delete position and the last message have not been
acknowledged.

Good idea - +1

On Tue, Jan 17, 2023 at 4:12 AM PengHui Li <co...@gmail.com> wrote:

> +1
>
> Penghui
>
> > On Jan 16, 2023, at 23:36, Yubiao Feng <yu...@streamnative.io.INVALID>
> wrote:
> >
> > Hi community
> >
> > I am starting a DISCUSS for making the default value of the parameter
> > "--get-subscription-backlog-size" of admin API "topics stats" true.
> >
> > In the PR https://github.com/apache/pulsar/pull/9302, the property
> backlog
> > size of each subscription returned in the response of the API topics
> stats,
> > by default this property is always equal to 0 in response, and this will
> > confuse users. Since the calculation of backlog size is done in broker
> > memory, there is no significant overhead(the process is described in the
> > following section), so I think the correct values should be displayed by
> > default.
> >
> > ### The following two APIs should be affected:
> >
> > In Pulsar admin API
> > ```
> > pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> > --get-subscription-backlog-size
> > pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> > ```
> > the default value of parameter `--get-subscription-backlog-size` will be
> > `true`
> >
> > In Pulsar Rest API
> > ```
> > curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> > "?subscriptionBacklogSize=true
> > ```
> > the default value of parameter `subscriptionBacklogSize ` will be `true`
> >
> >
> > ### The following is the process of calculating backlog size:
> > - Divide `PersistentTopc.ledgers` into two parts according to the
> ledgerId
> > of the mark delete position of the cursor. The second part is ledgers
> > indicating the messages still need to be consumed, aka
> backlogSizeInLedgers.
> > - Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
> > mark delete position of the cursor, and we can also divide the ledger
> into
> > two parts, the second part is entries indicating the messages still need
> to
> > be consumed, multiply the average size of each entry in metrics by the
> > number of still need to be consumed entries we can get the backlog size
> in
> > this ledger. aka backlogSizeInEntries.
> > - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> >
> > Thanks
> > Yubiao Feng
>
>

Re: [DISCUSS] Make the default value of param "--get-subscription-backlog-size" of admin API "topics stats" true

Posted by PengHui Li <co...@gmail.com>.
+1

Penghui

> On Jan 16, 2023, at 23:36, Yubiao Feng <yu...@streamnative.io.INVALID> wrote:
> 
> Hi community
> 
> I am starting a DISCUSS for making the default value of the parameter
> "--get-subscription-backlog-size" of admin API "topics stats" true.
> 
> In the PR https://github.com/apache/pulsar/pull/9302, the property backlog
> size of each subscription returned in the response of the API topics stats,
> by default this property is always equal to 0 in response, and this will
> confuse users. Since the calculation of backlog size is done in broker
> memory, there is no significant overhead(the process is described in the
> following section), so I think the correct values should be displayed by
> default.
> 
> ### The following two APIs should be affected:
> 
> In Pulsar admin API
> ```
> pulsar-admin topics stats persistent://test-tenant/ns1/tp1
> --get-subscription-backlog-size
> pulsar-admin topics stats persistent://test-tenant/ns1/tp1 -sbs
> ```
> the default value of parameter `--get-subscription-backlog-size` will be
> `true`
> 
> In Pulsar Rest API
> ```
> curl GET "http://127.0.0.1:8080/test-tenant/ns1/tp1/stats
> "?subscriptionBacklogSize=true
> ```
> the default value of parameter `subscriptionBacklogSize ` will be `true`
> 
> 
> ### The following is the process of calculating backlog size:
> - Divide `PersistentTopc.ledgers` into two parts according to the ledgerId
> of the mark delete position of the cursor. The second part is ledgers
> indicating the messages still need to be consumed, aka backlogSizeInLedgers.
> - Find the LedgerInfo whose ledgerId is the same as the ledgerId of the
> mark delete position of the cursor, and we can also divide the ledger into
> two parts, the second part is entries indicating the messages still need to
> be consumed, multiply the average size of each entry in metrics by the
> number of still need to be consumed entries we can get the backlog size in
> this ledger. aka backlogSizeInEntries.
> - `backlogSizeInLe dgers` + `backlogSizeInEntries`
> 
> Thanks
> Yubiao Feng