Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/21 16:32:04 UTC

[GitHub] [pulsar] zbentley opened a new issue, #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

zbentley opened a new issue, #15705:
URL: https://github.com/apache/pulsar/issues/15705

   **Describe the bug**
   
   If a KeyShared consumer has any unacknowledged messages, new KeyShared consumers on the same subscription after that point will not get any new messages (even messages with brand new keys) until the original consumer either disconnects or acks/nacks some indeterminate number of messages.
   
   This is a really bad bug!
   
   **Bug scenario 1**
   
   1. Create a **non partitioned** topic.
   2. Create a KeyShared subscription on that topic.
   3. Produce some number of messages on the topic with a given key, say `key1`, using a KeyBased batching strategy.
   4. Start a consumer on the topic with the below code and ensure it prints `Press a key to acknowledge messages`.
   5. Start a second consumer on the same topic, ensure it does not print `Press a key to acknowledge messages` (first consumer owns the key).
   6. Produce some number (10 should be sufficient) of messages on the topic with unique keys not equal to `key1`; say `key2`, `key3`, and so on. The goal here is to get a key that hashes to the second consumer's range.
   7. Observe that the second consumer never gets a message.
   8. In the first consumer's terminal, press enter.
   9. As the first consumer acks messages, observe that only then does the second consumer get any messages.
   
   **Bug scenario 2**
   1. Create a **non partitioned** topic.
   2. Create a KeyShared subscription on that topic.
   3. Produce 100 messages on that topic, each with a distinct partition key (e.g. `key1`, `key2` through `key100`).
   4. Start a consumer on the topic with the below code and ensure it prints `Press a key to acknowledge messages`.
   5. Start a second consumer on the topic with the below code.
   6. Observe that the second consumer does not receive any messages (i.e. it does not print `Press a key to acknowledge messages`), even though [hash range redistribution](https://medium.com/@ankushkhanna1988/apache-pulsar-key-shared-mode-sticky-consistent-hashing-a4ee7133930a) should have allocated at least some of the 100 keys to the new consumer.
   7. In the first consumer's terminal, press enter.
   8. As the first consumer acks messages, observe that only then does the second consumer get any messages.
   
   **Expected behavior**
   In scenario 1, the second consumer should receive at least some messages in step 6.
   In scenario 2, the second consumer should receive messages as soon as it starts.
   
   In short, **I think hash range redistribution is not working right, or is not triggering message re-routing:** when new KeyShared consumers arrive, two things should happen: 
   1. New consumers should be allocated part of the hash range of their subscription.
   2. Any backlog messages for keys in that range should be sent to the new consumer.
   
   Part 1 is working, but I think part 2 is not.
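
The two-part expectation above can be made concrete with a toy model of hash range allocation. This is only a sketch under stated assumptions, not broker code: it assumes an auto-split scheme in which the first consumer owns the whole hash space and each later consumer takes the upper half of the largest existing range, it assumes a hash space of 65536 slots, and it uses `zlib.crc32` as a stand-in for whatever hash the broker really applies. The names `split_ranges` and `owner_of` are hypothetical.

```python
import zlib

RANGE_SIZE = 65536  # assumed size of the hash space, for illustration only


def split_ranges(consumers):
    """Toy auto-split: the first consumer gets everything, each later
    consumer takes the upper half of the largest existing range."""
    ranges = []  # list of (start, end_exclusive, consumer_name)
    for name in consumers:
        if not ranges:
            ranges.append((0, RANGE_SIZE, name))
            continue
        # Find the largest range (first one wins on ties) and halve it.
        i = max(range(len(ranges)), key=lambda j: ranges[j][1] - ranges[j][0])
        start, end, owner = ranges[i]
        mid = (start + end) // 2
        ranges[i] = (start, mid, owner)
        ranges.insert(i + 1, (mid, end, name))
    return ranges


def owner_of(key, ranges):
    """Map a key to the consumer whose range contains its hash slot."""
    slot = zlib.crc32(key.encode()) % RANGE_SIZE  # stand-in hash, not Pulsar's
    for start, end, owner in ranges:
        if start <= slot < end:
            return owner
    raise AssertionError("ranges must cover the whole hash space")


ranges = split_ranges(["c1", "c2"])
# Under this model, part 2 of the expectation is that backlog messages whose
# keys now map to "c2" would be routed to the new consumer.
owners = {f"key{i}": owner_of(f"key{i}", ranges) for i in range(1, 101)}
```

In this model, 100 distinct keys are very likely to land on both consumers, which is the intuition behind scenario 2.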
   
   **Environment:**
   
   Same environment as https://github.com/apache/pulsar/issues/15701


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] merlimat commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
merlimat commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1136738845

   > @codelipenghui thank you for the explanation; that makes sense.
   > 
   > Two clarifications:
   > 
   > 1. How does this apply to nacked messages? If a new KeyShared consumer `c2` is blocked because markDeletePosition has not caught up to the point where `c2` joined, and an existing consumer `c1` negatively acknowledges a message that hashes to `c2`, will the nacked message go to `c2` or `c1`?
   
   c2 will join and be marked so that it can only receive messages dispatched from the moment it joins.
   
   In this example, a message nacked by c1, will still get redelivered to c1 (unless c1 disconnects), because the keys are not switched until everything that c1 has already received is acked.
   
   Otherwise, we could get a nack on one message and then on another and they could end up being out of order, eg: if c2 also goes away.
   
   > 2. To address this limitation, do you really need to track state for every key in the topic? I may be naïve here, but it seems to me that you would only need to track state for **each key which has messages that have been dispatched to a consumer**. That's still an O(N) state where there's currently not one, but it's a much smaller N. This might be getting into feature request territory now though.
   
   "**each key which has messages that have been dispatched to a consumer**." ... for which the worst case scenario is to track every key :) 
   
   
   
   




[GitHub] [pulsar] zbentley commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1134700810

   @codelipenghui thank you for the explanation; that makes sense.
   
   Two clarifications:
   1. How does this apply to nacked messages? If a new KeyShared consumer `c2` is blocked because markDeletePosition has not caught up to the point where `c2` joined, and an existing consumer `c1` negatively acknowledges a message that hashes to `c2`, will the nacked message go to `c2` or `c1`?
   2. To address this limitation, do you really need to track state for every key in the topic? I may be naïve here, but it seems to me that you would only need to track state for **each key which has messages that have been dispatched to a consumer**. That's still an O(N) state where there's currently not one, but it's a much smaller N. This might be getting into feature request territory now though.




[GitHub] [pulsar] xuesongxs commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
xuesongxs commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1136995653

   > Otherwise, we could get a nack on one message and then on another and they could end up being out of order, eg: if c2 also goes away.
   
   Change your broker.conf file to:
   subscriptionKeySharedUseConsistentHashing=true
   




[GitHub] [pulsar] codelipenghui closed issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
codelipenghui closed issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages
URL: https://github.com/apache/pulsar/issues/15705




[GitHub] [pulsar] codelipenghui commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1133801755

   @zbentley 
   
   > If a KeyShared consumer has any unacknowledged messages, new KeyShared consumers on the same subscription after that point will not get any new messages (even messages with brand new keys) until the original consumer either disconnects or acks/nacks some indeterminate number of messages.
   
   This is expected behavior: because the old consumer has unacked messages, dispatching messages that come after those unacked messages to a new consumer might break the per-key message dispatch order.
   
   Here is more context about the key-shared subscription ordering guarantee
   
   https://github.com/apache/pulsar/issues/6554
   https://github.com/apache/pulsar/pull/7106




[GitHub] [pulsar] kuskmen commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by "kuskmen (via GitHub)" <gi...@apache.org>.
kuskmen commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1676867328

   Can outOfOrderDelivery mitigate issues with consumers getting stuck, given that we don't care about ordering in this case?




[GitHub] [pulsar] codelipenghui commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1134068161

   > If a topic has no consumers, and a backlog of message index:key pairs 0:a, 1:a, 2:a, 3:b, 4:b, 5:b, and a KeyShared consumer c1 joins with a receiver queue size of 1 and gets message 0, why would we prevent a new consumer c2 from joining and getting messages 3-5? That doesn't compromise key ordering in any way.
   
   Yes, it will not break the key shared semantics, but it's an implementation tradeoff: the current implementation doesn't need to maintain state for each key, which matters because a topic might have a huge number of keys.
   
   The behavior you described is expected for the current implementation (maybe not the best solution for now).
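
To illustrate the tradeoff, here is a sketch of what per-key state could look like. It is a hypothetical alternative, not how the broker works today: it tracks only keys that currently have unacked messages, along the lines of the suggestion elsewhere in this thread. The class name `InFlightKeys` and its methods are made up for this example.

```python
from collections import Counter


class InFlightKeys:
    """Hypothetical tracker: remembers which consumer holds unacked
    messages for each key, and how many."""

    def __init__(self):
        self._owner = {}         # key -> consumer holding unacked messages
        self._count = Counter()  # key -> number of unacked messages

    def can_dispatch(self, key, consumer):
        # A key is free if nobody holds unacked messages for it,
        # or if the same consumer already holds them.
        owner = self._owner.get(key)
        return owner is None or owner == consumer

    def dispatched(self, key, consumer):
        if not self.can_dispatch(key, consumer):
            raise ValueError(f"key {key!r} is held by another consumer")
        self._owner[key] = consumer
        self._count[key] += 1

    def acked(self, key):
        self._count[key] -= 1
        if self._count[key] == 0:
            # No unacked messages left for this key: drop its state.
            del self._count[key]
            del self._owner[key]

    def tracked_keys(self):
        return len(self._owner)


tracker = InFlightKeys()
tracker.dispatched("a", "c1")  # message 0:a goes to c1
b_is_free_for_c2 = tracker.can_dispatch("b", "c2")
a_is_free_for_c2 = tracker.can_dispatch("a", "c2")
```

The memory cost grows with the number of keys that have unacked messages, which in the worst case is every key in the topic.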




[GitHub] [pulsar] zbentley commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1137465277

   @merlimat that's not accurate; if I start a KeyShared consumer `c1` per the test setup here, wait until it receives its first message, then start another consumer `c2`, then trigger `c1` to nack all of its messages, then `c2` starts getting (some, not all, hash-based) messages.
   
   Is that not supposed to happen? The docs seem to indicate that this is expected:
   
   > Be aware that negative acknowledgments on ordered subscription types, such as Exclusive, Failover and Key_Shared, might cause failed messages being sent to consumers out of the original order.
   
   Additionally, how does [allowOutOfOrderDelivery](https://pulsar.apache.org/api/client/2.6.0-SNAPSHOT/org/apache/pulsar/client/api/KeySharedPolicy.html#allowOutOfOrderDelivery) work with the Python/C++ client? Is that on by default? Off by default?
   
   Lastly, how does the setting that @xuesongxs mentioned (`subscriptionKeySharedUseConsistentHashing`) affect this behavior? I thought that setting only affected which keys new consumers take ownership of when they arrive; does it also affect how those consumers get messages (nacked or backlogged) that were already in the topic when they joined?




[GitHub] [pulsar] Anonymitaet commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1134129609

   Hi @momo-jun can you help add that note? Thanks




[GitHub] [pulsar] zbentley commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
zbentley commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1133918270

   @codelipenghui if I'm reading that correctly, that's really concerning behavior.
   
   If a topic has no consumers, and a backlog of message index:key pairs `0:a, 1:a, 2:a, 3:b, 4:b, 5:b`, and a KeyShared consumer `c1` joins with a receiver queue size of 1 and gets message 0, why would we prevent a new consumer `c2` from joining and getting messages 3-5? That doesn't compromise key ordering in any way.
   
   Am I interpreting it correctly that: a new key shared consumer that connects to the topic when the newest message has position `X` will not receive **any** messages until the oldest unacked message in the subscription is newer than or equal to `X`?
   
   If that's the current behavior, it should be really prominently documented (potentially in a warning/highlighted way). 
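
The interpretation above can be written down as a tiny predicate. This is a toy model of the rule exactly as phrased in this comment, not the broker's dispatcher logic; the function names `oldest_unacked` and `is_blocked` are hypothetical, and positions are plain integers rather than real ledger positions.

```python
def oldest_unacked(positions, acked):
    """Return the smallest position that has not been acked, or None."""
    remaining = [p for p in positions if p not in acked]
    return min(remaining) if remaining else None


def is_blocked(join_position, positions, acked):
    """A consumer that joined when the newest message had position
    `join_position` stays blocked while any older message is unacked."""
    oldest = oldest_unacked(positions, acked)
    return oldest is not None and oldest < join_position


backlog = range(6)  # positions 0..5, as in the 0:a ... 5:b example
x = 5               # newest position when c2 joins

blocked_at_join = is_blocked(x, backlog, acked=set())
blocked_after_partial_ack = is_blocked(x, backlog, acked={0, 1, 2})
blocked_after_catch_up = is_blocked(x, backlog, acked={0, 1, 2, 3, 4})
```

Under this model, `c2` only becomes eligible once everything older than position 5 has been acked, even though keys `a` and `b` never overlap.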




[GitHub] [pulsar] github-actions[bot] commented on issue #15705: New KeyShared consumers will not get any messages until a consumer that did get messages disconnects or acks/nacks some messages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15705:
URL: https://github.com/apache/pulsar/issues/15705#issuecomment-1166171755

   The issue had no activity for 30 days, mark with Stale label.

