Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/01/28 05:07:00 UTC

[GitHub] [pulsar] oznitecki opened a new issue #9352: New Subscription mode: key_block

oznitecki opened a new issue #9352:
URL: https://github.com/apache/pulsar/issues/9352


   I would like to suggest a new subscription mode that blocks a message key from being delivered to consumers while a message with that key has not yet been acknowledged or timed out.
   
   The use case is:
   You have many messages with different keys on the same topic. Keys repeat (for example, successive updates to the same issue), and when they do, the order of messages within a key must be guaranteed. Example topic:
   Key:message
   1:hello
   2:good
   1:world
   2:bye
   We have many consumers, but we still want 'hello' to be processed successfully before 'world', and 'good' to be processed successfully before 'bye'. We do not care about the order between 'hello' and 'good', as they have different keys.
   
   Of the currently available subscription modes:
   Key_shared would pin all messages with key 1 to one consumer, which can cause a backlog of unprocessed key 1 messages to build up on the topic.
   Shared would not guarantee message order at all (not even within the same key).
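
   A minimal sketch of the behaviour requested above, as a toy in-memory model rather than anything in Pulsar. The class `KeyBlockDispatcher` and its methods `next_for` and `ack` are hypothetical names introduced for illustration. The sketch ignores ack timeouts and redelivery.

```python
from collections import deque


class KeyBlockDispatcher:
    """Toy model of the proposed key_block mode (hypothetical, not a Pulsar API)."""

    def __init__(self, messages):
        # Messages are (key, payload) pairs in topic order.
        self.backlog = deque(messages)
        # Maps a blocked key to the consumer currently processing it.
        self.in_flight = {}

    def next_for(self, consumer):
        """Hand the consumer the oldest message whose key is not blocked, or None."""
        for i, (key, payload) in enumerate(self.backlog):
            if key not in self.in_flight:
                del self.backlog[i]
                self.in_flight[key] = consumer
                return key, payload
        return None

    def ack(self, key):
        """Unblock a key once its in-flight message has been acknowledged."""
        self.in_flight.pop(key, None)


dispatcher = KeyBlockDispatcher([(1, "hello"), (2, "good"), (1, "world"), (2, "bye")])
print(dispatcher.next_for("A"))  # key 1 is free, so "hello" is dispatched
print(dispatcher.next_for("B"))  # key 2 is free, so "good" is dispatched
print(dispatcher.next_for("C"))  # both keys are blocked, nothing to dispatch
dispatcher.ack(1)
print(dispatcher.next_for("C"))  # key 1 is free again, so "world" goes to any consumer
```

   In this model any consumer may receive any key, but a key is never in flight on two consumers at once, so per-key order is kept without pinning a key to a consumer.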
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] oznitecki commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
oznitecki commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770918112


   @codelipenghui Releasing the consumer key assignment (hash) takes us back to the Shared case, where every message can be consumed by any consumer. Adding a key block makes sure that the same key is not consumed by two consumers at the same time.





[GitHub] [pulsar] codelipenghui commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-1058893871


   The issue has had no activity for 30 days; marking it with the Stale label.





[GitHub] [pulsar] codelipenghui commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770246238


   Sorry @oznitecki, could you please give an example where key_shared does not work but key_block does, to make sure we are on the same page? Thanks.







[GitHub] [pulsar] addisonj commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
addisonj commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770091367


   @oznitecki Thanks for the feedback and suggestion.
   
   A few questions to make sure I understand your use case better:
   1. Just to make sure I understand: would you want key 1 messages to be deliverable to many consumers, but block sending any more key 1 messages until the latest key 1 message is processed? If key 1 has many messages, this would likely be somewhat slow, as we can't pipeline messages and the broker would need to do a lot of back and forth, but it would let all other consumers continue with other keys. Is that what you are trying to solve?
   2. For your use case, how many keys do you expect to be dealing with across how many consumers? 
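
   The pipelining concern in point 1 can be put in rough numbers. If only one message per key may be in flight, each message costs its processing time plus a broker round trip before the next one for that key can be sent. The function name and the timing values below are hypothetical, chosen only to illustrate the bound.

```python
def max_rate_per_key(processing_s: float, round_trip_s: float) -> float:
    """Upper bound on messages/second for one key when the next message for
    that key is dispatched only after the previous one is acknowledged."""
    return 1.0 / (processing_s + round_trip_s)


# Hypothetical timings: 2 ms to process a message, 1 ms broker round trip.
rate = max_rate_per_key(processing_s=0.002, round_trip_s=0.001)
print(f"at most {rate:.0f} messages/second for a single key")
```

   Other keys are unaffected by this bound, since they can be processed on other consumers in parallel.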
   





[GitHub] [pulsar] codelipenghui commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770223151


   @oznitecki I think you can try `keyHashRange(Range... ranges)` on the Reader API. This one is more like a key hash filter that will not the queue as a whole.





[GitHub] [pulsar] oznitecki commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
oznitecki commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770224986


   @codelipenghui No, it will only let you control which keys are stuck with which consumer. I do not want keys to be stuck with a specific consumer, just blocked from being sent while one of the consumers is processing them. It is a looser definition that can prevent the broker from being stuck waiting on a specific consumer for many different messages that all fall in the same hash range.





[GitHub] [pulsar] oznitecki commented on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
oznitecki commented on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770327620


   @codelipenghui example:
   Say you have 4 keys: 2 of them get 1000 messages per second and the other 2 get 1 message per second, but you do not know in advance which is which (and, to make it more complicated, the load pattern may change every couple of days). No matter how you split the key range between 2 consumers, there could be a case where both overloaded keys are attached to one consumer. In that case, one consumer would be idle while many messages with the 2 overloaded keys are waiting. If the block was by key and not specific consumer, the idle consumer could take 1 of the 2 overloaded keys.
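
   A small sketch of this example, assuming one fixed two-keys-per-consumer split. The key names `k0` to `k3`, the consumer names `A` and `B`, and the helper `loads_for` are hypothetical; the rates are the ones given above. It enumerates every choice of which two keys are the busy ones and prints the resulting per-consumer load when keys are pinned to consumers.

```python
from itertools import combinations

KEYS = ["k0", "k1", "k2", "k3"]  # hypothetical key names
# One fixed split of the keys between two consumers, as a pinned assignment would give.
ASSIGNMENT = {"k0": "A", "k1": "A", "k2": "B", "k3": "B"}
HOT_RATE, COLD_RATE = 1000, 1  # messages/second, taken from the example


def loads_for(hot_keys):
    """Per-consumer message rate when each key is pinned to its assigned consumer."""
    load = {"A": 0, "B": 0}
    for key in KEYS:
        load[ASSIGNMENT[key]] += HOT_RATE if key in hot_keys else COLD_RATE
    return load


# Try every possible pair of busy keys against the fixed split.
for hot in combinations(KEYS, 2):
    print(hot, loads_for(hot))
```

   For this split, two of the six possible placements put both busy keys on the same consumer, which then receives 2000 messages per second while the other receives 2.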





[GitHub] [pulsar] oznitecki edited a comment on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
oznitecki edited a comment on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770161903


   @addisonj 
   1. Key 1 messages can be processed by many consumers, but only by one at a time (as in key_shared). As you said, while key 1 is being processed by one consumer, the others are free to take any message with a different key. I think messages with the same key should perhaps be clumped together in some way to make it faster to skip all of them, as the order only matters within a key and not across the queue as a whole.
   2. I was thinking of about 100,000 keys (1% of them updated all the time and the rest sporadic) and about 10 consumers, but I would certainly want to be able to scale (the number of consumers depends on performance).





[GitHub] [pulsar] codelipenghui edited a comment on issue #9352: New Subscription mode: key_block

Posted by GitBox <gi...@apache.org>.
codelipenghui edited a comment on issue #9352:
URL: https://github.com/apache/pulsar/issues/9352#issuecomment-770900513


   @oznitecki From your last comment, I think the problem you have described is the key range distribution problem: the current implementation does not take consumer overload into account and has no mechanism for re-assigning key ranges.
   
   >  If the block was by key and not specific consumer, the idle consumer could take 1 of the 2 overloaded keys.
   
   I'm not getting the key point here: why would `block` by key affect whether `the idle consumer could take 1 of the 2 overloaded keys`?
   




