Posted to dev@pulsar.apache.org by Joe F <jo...@apache.org> on 2017/09/07 15:07:45 UTC

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

I don't think we should attempt to store "regular" ack holes. What I mean
there is that all shared (queue) consumers will have ack holes in the
regular order of business. So if we store this unconditionally, we will
end up doing this for all queues.

Ideally we want the mark delete to keep pace with the dispatch rate (more
or less). When that does not happen, we end up with a large set of
duplicate deliveries on topic reload.

So one way to specify the problem would be to set a size N (configurable,
with a sensible default) that would be the operating gap between mark
delete and the read cursor. Anything that falls beyond N is an ack hole.
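
A rough sketch of that idea, just to make it concrete (the names here are
invented for illustration, and positions are simplified to plain longs; this
is not actual broker code):

    // Only unacked positions beyond markDelete + N would be treated as
    // "ack holes" that need to be persisted with the cursor.
    public class OperatingGap {
        private final long operatingGapN;   // configurable, sensible default
        private long markDeletePosition;    // everything up to here is acked

        public OperatingGap(long n) {
            this.operatingGapN = n;
        }

        public void advanceMarkDelete(long pos) {
            this.markDeletePosition = pos;
        }

        // True if this unacked position falls beyond the operating gap,
        // i.e. it would count as an ack hole under this scheme.
        public boolean isAckHole(long unackedPos) {
            return unackedPos > markDeletePosition + operatingGapN;
        }
    }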

The other comment I have is that the "ack hole" is a concept that is not
easy for end users to understand. Holes could vary in size. There could be
10K messages in a single ack hole, or 1000 messages in 1000 ack holes (one
in each). So in one case you could have 10,000 unacked messages and have no
issues, but in the other case you could have 1,001 messages and have
issues. This would be difficult to explain, without users grasping what
happens under the covers. I would rather we use the concept and
terminology of the "number of unacked messages".
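
To illustrate the mismatch with a toy example (inclusive [start, end]
ranges standing in for holes; not broker code):

    import java.util.ArrayList;
    import java.util.List;

    public class HoleCounting {
        public static void main(String[] args) {
            // Case 1: one contiguous hole of 10,000 unacked messages
            // -> a single range for the broker to track.
            List<long[]> singleHole = List.of(new long[]{1, 10_000});
            System.out.println("unacked=" + count(singleHole)
                    + " holes=" + singleHole.size());

            // Case 2: 1,000 unacked messages spread over 1,000
            // single-message holes -> 1,000 ranges to track.
            List<long[]> manyHoles = new ArrayList<>();
            for (long i = 0; i < 1_000; i++) {
                long pos = i * 10;   // every 10th message left unacked
                manyHoles.add(new long[]{pos, pos});
            }
            System.out.println("unacked=" + count(manyHoles)
                    + " holes=" + manyHoles.size());
        }

        static long count(List<long[]> holes) {
            return holes.stream().mapToLong(h -> h[1] - h[0] + 1).sum();
        }
    }

The broker's storage cost tracks the number of disjoint ranges, while the
user's mental model tracks the unacked-message count, and the two can
differ by orders of magnitude.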

This is an area where we have faced, and continue to face, significant user
education and support cases, hence my emphasis on user friendliness.

Joe

On Wed, Sep 6, 2017 at 5:02 PM, <gi...@git.apache.org> wrote:

> merlimat commented on issue #742: Avoid huge backlog on topic reloading:
> due to large gap between markDelete-offset and read-position of cursor.
> URL: https://github.com/apache/incubator-pulsar/issues/742#issuecomment-327644681
>
>
>    >  where large ack-holes can create backlog
>
>    large *number* of ack-holes
>
>    > So, I think having option to restrict distance between markDelete and
> readPosition is something useful for broker.
>
>    It's not related at all to the distance. I think the better metric is
> indeed the number of "holes".
>
> ----------------------------------------------------------------
> This is an automated message from the Apache Git Service.
> To respond to the message, please log on GitHub and use the
> URL above to go to the specific comment.
>
> For queries about this service, please contact Infrastructure at:
> users@infra.apache.org
>
>
> With regards,
> Apache Git Services
>

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Sep 7, 2017 at 3:18 PM Rajan Dhabalia <rd...@apache.org> wrote:

> > and leave normal queue consumption out of this mechanism (to reduce
> > the ZK writes)
> >> To be precise, these are BookKeeper writes that would be happening
> >> anyway
>
> Just to clarify: the broker also stores ack-holes in ZK along with BK
> (the cursor-ledger). But it only writes them to ZK when it unloads the
> topic gracefully and deletes the cursor-ledger.
>

That's correct. The cursor state (along with info about messages deleted
individually) is snapshotted into the ZK z-node.

That doesn't increase the rate of ZK writes, just the size of each write
when it happens (and with an upper bound).



> > We could also make it so that once you reach the max number of "holes",
> > delivery stops
> The only problem I see in restricting based on the ack-holes metric is
> that ack-holes don't follow any pattern and might not be in sequence.
> *For example:*
> If the max number of ack-holes is 1K, and the consumer acks every
> alternate consumed message, then 1K ack-holes will build up within 2K
> consumed messages, and the broker will stop message delivery.
>

I don't think that acknowledging every other message for an extended period
should be considered a "valid" use case.

There are multiple reasons to acknowledge out of order, but we cannot keep
an arbitrarily big state indefinitely.

> Consumer should not suffer in this use case, where the consumer is blocked
> after consuming only 2K messages.
>

There are not many options here, since I think we all agree that we
shouldn't store more than N "holes" in any case.

Currently we are not storing more than 1K "holes" and we just keep the rest
of them in memory only.

So the options would be:
 1. Continue as today: if you have more than 1K "holes" (10K would be a
better default), you can have duplicates after a broker restart. The cursor
will not know of the "holes" after the first 1K (whether they are big or
small).
 2. Stop delivery after 1K holes.
 3. Have alerts that get triggered at 2/3 of the max number of "holes", so
that the user can be notified of an abnormal acknowledging pattern.

Stopping delivery based on read position vs mark-delete position doesn't
really address this problem.


My personal preference would be to do 1 & 3: same behavior as today, with
clear, actionable metrics getting reported.
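
For option 3, something along these lines could work (a rough sketch; the
class and method names are invented here, not existing broker code):

    // Fire an operator alert once the number of disjoint unacked ranges
    // ("holes") for a subscription crosses 2/3 of the storage limit.
    public class AckHoleAlert {
        private final int maxHoles;        // e.g. 10_000, the storage limit
        private final int alertThreshold;  // 2/3 of the max

        public AckHoleAlert(int maxHoles) {
            this.maxHoles = maxHoles;
            this.alertThreshold = maxHoles * 2 / 3;
        }

        // Called whenever the cursor's count of disjoint ranges changes.
        public void onHoleCountUpdate(String subscription, int currentHoles) {
            if (currentHoles >= alertThreshold) {
                // Surfaced as a metric/log so someone can investigate the
                // ack pattern before holes past the limit get dropped.
                System.err.printf("ALERT: subscription %s has %d/%d ack holes%n",
                        subscription, currentHoles, maxHoles);
            }
        }
    }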



-- 
Matteo Merli
<mm...@apache.org>

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

Posted by Rajan Dhabalia <rd...@apache.org>.
> and leave normal queue consumption out of this mechanism (to reduce the
> ZK writes)
>> To be precise, these are BookKeeper writes that would be happening anyway

Just to clarify: the broker also stores ack-holes in ZK along with BK
(the cursor-ledger). But it only writes them to ZK when it unloads the
topic gracefully and deletes the cursor-ledger.


> We could also make it so that once you reach the max number of "holes",
> delivery stops
The only problem I see in restricting based on the ack-holes metric is that
ack-holes don't follow any pattern and might not be in sequence.
*For example:*
If the max number of ack-holes is 1K, and the consumer acks every alternate
consumed message, then 1K ack-holes will build up within 2K consumed
messages, and the broker will stop message delivery. The consumer should
not suffer in this use case, being blocked after consuming only 2K messages.
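
A quick simulation of this pattern (a toy example, not broker code):

    // Acking every other message opens one new hole per two consumed
    // messages, so a 1K hole limit is hit after only ~2K messages.
    public class AlternateAckDemo {
        public static void main(String[] args) {
            int maxHoles = 1_000;
            int consumed = 0;
            int holes = 0;
            while (holes < maxHoles) {
                consumed++;   // odd position: left unacked -> opens a hole
                holes++;
                consumed++;   // even position: acked individually
            }
            System.out.println("Delivery stops after " + consumed
                    + " consumed messages, with " + holes + " holes");
            // -> Delivery stops after 2000 consumed messages, with 1000 holes
        }
    }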

Thanks,
Rajan



On Thu, Sep 7, 2017 at 11:29 AM, Matteo Merli <ma...@gmail.com>
wrote:

> On Thu, Sep 7, 2017 at 11:10 AM Joe F <jo...@apache.org> wrote:
>
> > Sure. Set N=0 (or 1) and you will get this behavior. In a large (1M
> > topics) system, I could, say, set N = 1000, and leave normal queue
> > consumption out of this mechanism (to reduce the ZK writes). So this
> > approach is flexible enough to handle both situations.
> >
>
> To be precise, these are BookKeeper writes that would be happening anyway
> (when the cursor is updated). The only difference is that the size of that
> BK entry increases when there are "holes", up to 154 KB for 10K holes.
>
> Also, these BK writes for cursor updates are already throttled (by default
> 1 per sec).
>
>
> > Ack holes are not a useful user metric. They are useful as a system
> > limit - only so many holes can be stored - and as a system metric we can
> > use that.
> >
>
> Absolutely agree that they are not useful to the user, though they are the
> best indicator for triggering a system-level alert, so that someone can
> take a look and explain it to the user :)
>
>
> > But to me as a user, letting 1M messages go unacked is a problem - even
> > though Pulsar can store it super-efficiently. I see the core user
> > problem as the unexpected duplicate delivery of a large set of messages
> > on certain broker failures. As distinct from how Pulsar can store a
> > large set of holes, and to what limit.
> >
>
> > Even if we can store a large set of messages super-efficiently, it does
> > not resolve the end user pain experienced today - which is a large set
> > of old messages getting delivered on broker failure. Why did you dump 1M
> > messages on me all of a sudden after a week? -- that is the question we
> > have run into many times. So I would prefer that once a certain set
> > limit is crossed, delivery is stopped or an alert is generated, and the
> > user fixes the problem. In essence, deal with unacked messages as a
> > first-class user resource limit, like backlog quota.
> >
>
> We could also make it so that once you reach the max number of "holes",
> delivery stops.
>
> --
> Matteo Merli
> <mm...@apache.org>
>

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Sep 7, 2017 at 11:10 AM Joe F <jo...@apache.org> wrote:

> Sure. Set N=0 (or 1) and you will get this behavior. In a large (1M
> topics) system, I could, say, set N = 1000, and leave normal queue
> consumption out of this mechanism (to reduce the ZK writes). So this
> approach is flexible enough to handle both situations.
>

To be precise, these are BookKeeper writes that would be happening anyway
(when the cursor is updated). The only difference is that the size of that
BK entry increases when there are "holes", up to 154 KB for 10K holes.

Also, these BK writes for cursor updates are already throttled (by default
1 per sec).
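
As a back-of-envelope check on that figure (assuming 154 KB means roughly
154,000 bytes; the exact on-disk encoding is not shown here):

    // 154 KB for 10K holes works out to ~15 bytes per stored range, which
    // suggests the (start, end) position pairs are compactly encoded
    // rather than stored as raw 8-byte longs.
    public class EntrySizeEstimate {
        public static void main(String[] args) {
            long entryBytes = 154_000;
            long holes = 10_000;
            System.out.printf("~%.1f bytes per range%n",
                    entryBytes / (double) holes);   // prints ~15.4
        }
    }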


> Ack holes are not a useful user metric. They are useful as a system
> limit - only so many holes can be stored - and as a system metric we can
> use that.
>

Absolutely agree that they are not useful to the user, though they are the
best indicator for triggering a system-level alert, so that someone can take
a look and explain it to the user :)


> But to me as a user, letting 1M messages go unacked is a problem - even
> though Pulsar can store it super-efficiently. I see the core user problem
> as the unexpected duplicate delivery of a large set of messages on certain
> broker failures. As distinct from how Pulsar can store a large set of
> holes, and to what limit.
>

> Even if we can store a large set of messages super-efficiently, it does
> not resolve the end user pain experienced today - which is a large set of
> old messages getting delivered on broker failure. Why did you dump 1M
> messages on me all of a sudden after a week? -- that is the question we
> have run into many times. So I would prefer that once a certain set limit
> is crossed, delivery is stopped or an alert is generated, and the user
> fixes the problem. In essence, deal with unacked messages as a first-class
> user resource limit, like backlog quota.
>

We could also make it so that once you reach the max number of "holes",
delivery stops.

-- 
Matteo Merli
<mm...@apache.org>

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

Posted by Joe F <jo...@apache.org>.
> The broker needs to remember message by message (or better said, entry by
> entry in case of batches).

Sure. Set N=0 (or 1) and you will get this behavior. In a large (1M
topics) system, I could, say, set N = 1000, and leave normal queue
consumption out of this mechanism (to reduce the ZK writes). So this
approach is flexible enough to handle both situations.

Ack holes are not a useful user metric. They are useful as a system limit
- only so many holes can be stored - and as a system metric we can use that.


But to me as a user, letting 1M messages go unacked is a problem - even
though Pulsar can store it super-efficiently. I see the core user problem as
the unexpected duplicate delivery of a large set of messages on certain
broker failures. As distinct from how Pulsar can store a large set of
holes, and to what limit.

Even if we can store a large set of messages super-efficiently, it does not
resolve the end user pain experienced today - which is a large set of old
messages getting delivered on broker failure. Why did you dump 1M messages
on me all of a sudden after a week? -- that is the question we have run
into many times. So I would prefer that once a certain set limit is
crossed, delivery is stopped or an alert is generated, and the user fixes
the problem. In essence, deal with unacked messages as a first-class user
resource limit, like backlog quota. A sketch of what such a limit could
look like is below.
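
A purely hypothetical sketch of that idea (the policy name and fields are
invented here by analogy with backlog quota; this is not an existing API):

    // A per-subscription quota on unacked messages, with a configurable
    // action once it is crossed, mirroring how backlog quotas work.
    public class UnackedMessageQuota {
        public enum Action { STOP_DELIVERY, ALERT_ONLY }

        private final long maxUnackedMessages;
        private final Action action;

        public UnackedMessageQuota(long max, Action action) {
            this.maxUnackedMessages = max;
            this.action = action;
        }

        // Decide whether delivery should pause for this subscription.
        public boolean shouldStopDelivery(long unackedCount) {
            return action == Action.STOP_DELIVERY
                    && unackedCount > maxUnackedMessages;
        }
    }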


Joe



On Thu, Sep 7, 2017 at 10:06 AM, Matteo Merli <ma...@gmail.com>
wrote:

> > Ideally we want the mark delete to keep pace with the dispatch rate
> > (more or less). When that does not happen, we end up with a large set
> > of duplicate deliveries on topic reload.
>
> That's not exactly right. If you delay the ack for a particular message,
> that information is stored. After a crash/restart, only that message will
> be re-delivered.
>
> > So one way to specify the problem would be to set a size N
> (configurable, with a sensible default) that would be the operating gap
> between mark delete and the read cursor. Anything that falls beyond N is an
> ack hole.
>
> The broker needs to remember message by message (or better said, entry by
> entry in case of batches).
>
> > This would be difficult to explain, without users grasping what happens
> under the covers.  I would rather we use the concept and terminology  of
> the  "number of unacked messages".
>
> But that is a bad metric for preventing issues in the broker.
>
> Even if you don't ack 1M messages, if they are consecutive, they won't
> pose any issue. We just need to store the range info:
> Acked messages: [0, 100) [100000, 2000000)
> This info is very compact.
>
> The limit on the *number* of ack-holes is to limit the number of ranges we
> need to store. If (per Rajan's test) we can store 10K disjoint ranges in
> 154 KB, I think that's a pretty good trade-off.
>
> You can acknowledge out-of-order and the broker will safely store that
> information, up to a certain limit. The 10K limit (for ranges) should be
> high enough not to be a concern under regular consumer operations, while
> preventing a misbehaving consumer from causing problems in the system.
>
> The "alert" I was mentioning earlier should be based on this number. If
> it goes high, it would be good to check with the user on how they're using
> acknowledgements.
>
>
>
>
> On Thu, Sep 7, 2017 at 8:07 AM Joe F <jo...@apache.org> wrote:
>
>> I don't think we should attempt to store "regular" ack holes. What I
>> mean there is that all shared (queue) consumers will have ack holes in
>> the regular order of business. So if we store this unconditionally, we
>> will end up doing this for all queues.
>>
>> Ideally we want the mark delete to keep pace with the dispatch rate (more
>> or less). When that does not happen, we end up with a large set of
>> duplicate deliveries on topic reload.
>>
>> So one way to specify the problem would be to set a size N (configurable,
>> with a sensible default) that would be the operating gap between mark
>> delete and the read cursor. Anything that falls beyond N is an ack hole.
>>
>> The other comment I have is that the "ack hole" is a concept that is
>> not easy for end users to understand. Holes could vary in size. There
>> could be 10K messages in a single ack hole, or 1000 messages in 1000 ack
>> holes (one in each). So in one case you could have 10,000 unacked
>> messages and have no issues, but in the other case you could have 1,001
>> messages and have issues. This would be difficult to explain, without
>> users grasping what happens under the covers. I would rather we use the
>> concept and terminology of the "number of unacked messages".
>>
>> This is an area where we have faced, and continue to face, significant
>> user education and support cases, hence my emphasis on user friendliness.
>>
>> Joe
>>
>> On Wed, Sep 6, 2017 at 5:02 PM, <gi...@git.apache.org> wrote:
>>
>>> merlimat commented on issue #742: Avoid huge backlog on topic reloading:
>>> due to large gap between markDelete-offset and read-position of cursor.
>>> URL: https://github.com/apache/incubator-pulsar/issues/742#issuecomment-327644681
>>>
>>>
>>>    >  where large ack-holes can create backlog
>>>
>>>    large *number* of ack-holes
>>>
>>>    > So, I think having option to restrict distance between markDelete
>>> and readPosition is something useful for broker.
>>>
>>>    It's not related at all to the distance. I think the better metric
>>> is indeed the number of "holes".
>>>
>>> ----------------------------------------------------------------
>>> This is an automated message from the Apache Git Service.
>>> To respond to the message, please log on GitHub and use the
>>> URL above to go to the specific comment.
>>>
>>> For queries about this service, please contact Infrastructure at:
>>> users@infra.apache.org
>>>
>>>
>>> With regards,
>>> Apache Git Services
>>>
>>
>> --
> Matteo Merli
> <mm...@apache.org>
>

Re: [GitHub] merlimat commented on issue #742: Avoid huge backlog on topic reloading: due to large gap between markDelete-offset and read-position of cursor.

Posted by Matteo Merli <ma...@gmail.com>.
> Ideally we want the mark delete to keep pace with the dispatch rate (more
> or less). When that does not happen, we end up with a large set of
> duplicate deliveries on topic reload.

That's not exactly right. If you delay the ack for a particular message,
that information is stored. After a crash/restart, only that message will
be re-delivered.

> So one way to specify the problem would be to set a size N (configurable,
> with a sensible default) that would be the operating gap between mark
> delete and the read cursor. Anything that falls beyond N is an ack hole.

The broker needs to remember message by message (or better said, entry by
entry in case of batches).

> This would be difficult to explain, without users grasping what happens
> under the covers. I would rather we use the concept and terminology of
> the "number of unacked messages".

But that is a bad metric for preventing issues in the broker.

Even if you don't ack 1M messages, if they are consecutive, they won't
pose any issue. We just need to store the range info:
Acked messages: [0, 100) [100000, 2000000)
This info is very compact.
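
To make the range bookkeeping concrete, here is a minimal sketch using
Guava's RangeSet (an illustration of the data structure only, not the
actual cursor internals):

    import com.google.common.collect.Range;
    import com.google.common.collect.RangeSet;
    import com.google.common.collect.TreeRangeSet;

    public class AckedRanges {
        public static void main(String[] args) {
            RangeSet<Long> acked = TreeRangeSet.create();

            // Consecutive acks coalesce: still a single range to store.
            acked.add(Range.closedOpen(0L, 100L));
            acked.add(Range.closedOpen(100L, 1_000L));
            System.out.println(acked.asRanges().size());   // 1

            // An out-of-order ack opens a disjoint range: one more entry,
            // and everything between the two ranges is an "ack hole".
            acked.add(Range.closedOpen(100_000L, 2_000_000L));
            System.out.println(acked.asRanges().size());   // 2
        }
    }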

The limit on the *number* of ack-holes is to limit the number of ranges we
need to store. If (per Rajan's test) we can store 10K disjoint ranges in
154 KB, I think that's a pretty good trade-off.

You can acknowledge out-of-order and the broker will safely store that
information, up to a certain limit. The 10K limit (for ranges) should be
high enough not to be a concern under regular consumer operations, while
preventing a misbehaving consumer from causing problems in the system.

The "alert" I was mentioning earlier should be based on this number. If
it goes high, it would be good to check with the user on how they're using
acknowledgements.




On Thu, Sep 7, 2017 at 8:07 AM Joe F <jo...@apache.org> wrote:

> I don't think we should attempt to store "regular" ack holes. What I mean
> there is that all shared (queue) consumers will have ack holes in the
> regular order of business. So if we store this unconditionally, we will
> end up doing this for all queues.
>
> Ideally we want the mark delete to keep pace with the dispatch rate (more
> or less). When that does not happen, we end up with a large set of
> duplicate deliveries on topic reload.
>
> So one way to specify the problem would be to set a size N (configurable,
> with a sensible default) that would be the operating gap between mark
> delete and the read cursor. Anything that falls beyond N is an ack hole.
>
> The other comment I have is that the "ack hole" is a concept that is not
> easy for end users to understand. Holes could vary in size. There could be
> 10K messages in a single ack hole, or 1000 messages in 1000 ack holes (one
> in each). So in one case you could have 10,000 unacked messages and have
> no issues, but in the other case you could have 1,001 messages and have
> issues. This would be difficult to explain, without users grasping what
> happens under the covers. I would rather we use the concept and
> terminology of the "number of unacked messages".
>
> This is an area where we have faced, and continue to face, significant
> user education and support cases, hence my emphasis on user friendliness.
>
> Joe
>
> On Wed, Sep 6, 2017 at 5:02 PM, <gi...@git.apache.org> wrote:
>
>> merlimat commented on issue #742: Avoid huge backlog on topic reloading:
>> due to large gap between markDelete-offset and read-position of cursor.
>> URL:
>> https://github.com/apache/incubator-pulsar/issues/742#issuecomment-327644681
>>
>>
>>    >  where large ack-holes can create backlog
>>
>>    large *number* of ack-holes
>>
>>    > So, I think having option to restrict distance between markDelete
>> and readPosition is something useful for broker.
>>
>>    It's not related at all to the distance. I think the better metric
>> is indeed the number of "holes".
>>
>> ----------------------------------------------------------------
>> This is an automated message from the Apache Git Service.
>> To respond to the message, please log on GitHub and use the
>> URL above to go to the specific comment.
>>
>> For queries about this service, please contact Infrastructure at:
>> users@infra.apache.org
>>
>>
>> With regards,
>> Apache Git Services
>>
>
> --
Matteo Merli
<mm...@apache.org>
