Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/06/03 15:12:38 UTC

[GitHub] [pulsar] baynes opened a new issue #7157: Message lost with new topic and regex subscription

baynes opened a new issue #7157:
URL: https://github.com/apache/pulsar/issues/7157


   **Describe the bug**
   When a regex subscription detects a new topic, it takes time before the subscription's cursor is set up for that topic. Because the cursor is placed at the end of the topic, at least one message is lost, and since the setup can take 40 seconds, one could lose 40 seconds of data.
   
   
   **To Reproduce**
   
   If I set up a consumer with a regex subscription, for example:
   
   `/opt/pulsar/bin/pulsar-client consume --regex '.*' -s all -n 0`
   
   I then send a message on a **NEW** topic that matches the regex.
   
   `/opt/pulsar/bin/pulsar-client produce addtopic -m 'm1'`
   
   The consumer detects the new topic and sets up a subscription to it. This can take 30-40 seconds. However, it does not see the message (or any other messages sent before the subscription is set up).
   
   Once it is set up, sending more data to the topic will be picked up by the consumer.
   
   `/opt/pulsar/bin/pulsar-client produce addtopic -m 'm2'`
   
   The consumer will display the message 'm2'.
   
   So although it works from then on, potentially the first 40 seconds of data have been lost.
   
   **Expected behavior**
   All messages sent to the new topic should be seen by the consumer.
   
   **Screenshots**
   N/A
   
   **Desktop (please complete the following information):**
   CentOS 7
   Pulsar 2.5.0, 2.5.1
   
   **Additional context**
   
   The initial message(s) are on the topic; one can see them with a reader. So a solution would be for the cursor of the new topic subscription to be created pointing to the start of the topic rather than the usual end in this case.
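
To make the effect of the cursor position concrete, here is a minimal sketch in plain Java. It does not use the Pulsar client at all: `Topic`, `InitialPosition` and `replay` are hypothetical names invented for this illustration, and the topic is modelled as a simple in-memory log. It assumes only what the report describes, namely that messages published before the cursor exists are skipped when the cursor starts at the end of the topic.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorPositionDemo {
    // Hypothetical stand-in for the subscription's initial position setting.
    enum InitialPosition { LATEST, EARLIEST }

    /** A toy topic: an append-only, in-memory log of messages. */
    static final class Topic {
        private final List<String> log = new ArrayList<>();

        void publish(String message) { log.add(message); }

        int size() { return log.size(); }

        /** Returns a copy of every message at or after the given offset. */
        List<String> readFrom(int offset) {
            return new ArrayList<>(log.subList(offset, log.size()));
        }
    }

    /**
     * Publishes `before`, then creates the subscription cursor, then publishes
     * `after`, and returns what a consumer reading from that cursor would see.
     */
    static List<String> replay(InitialPosition position, List<String> before, List<String> after) {
        Topic topic = new Topic();
        before.forEach(topic::publish);
        // The cursor is created late: LATEST points past everything already published.
        int cursor = (position == InitialPosition.LATEST) ? topic.size() : 0;
        after.forEach(topic::publish);
        return topic.readFrom(cursor);
    }

    public static void main(String[] args) {
        List<String> before = List.of("m1"); // sent while the subscription is still being set up
        List<String> after = List.of("m2");  // sent once the subscription exists
        System.out.println("LATEST   -> " + replay(InitialPosition.LATEST, before, after));
        System.out.println("EARLIEST -> " + replay(InitialPosition.EARLIEST, before, after));
    }
}
```

In this model a cursor created at the end of the log sees only `m2`, matching the behaviour reported above, while a cursor created at the start sees both `m1` and `m2`.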
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] baynes commented on issue #7157: Message lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
baynes commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-638262643


   Might be some overlap with #6531





[GitHub] [pulsar] sijie commented on issue #7157: Messages lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-641730247


   @baynes noted.





[GitHub] [pulsar] BewareMyPower commented on issue #7157: Messages lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-642393034


   The same applies to a partitioned consumer. IMO, when a consumer finds new topics/partitions, the subscription initial position **should be changed to earliest** no matter what the original initial position is.
   
   Usually consumers use the latest initial position to discard outdated messages. However, assume that the partitions were increased dynamically, i.e. some producers and consumers are currently serving this partitioned topic. If producers find the new partitions before consumers do, then from the consumer's point of view the messages published before it starts consuming **shouldn't be considered outdated**.
   
   What do you think of this change? @sijie 
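
The rule proposed in this comment can be sketched as a small decision function. This is only an illustration under stated assumptions, not Pulsar's implementation: `choose`, `InitialPosition` and the `topicsAtSubscribeTime` set are hypothetical names, and the sketch assumes the client can tell topics it knew about when it subscribed apart from topics or partitions it discovered afterwards.

```java
import java.util.Set;

public class NewTopicPositionPolicy {
    // Hypothetical stand-in for the subscription's initial position setting.
    enum InitialPosition { LATEST, EARLIEST }

    /**
     * Picks the initial cursor position for one topic of a multi-topic subscription.
     * Topics already known at subscribe time keep the configured position;
     * topics (or partitions) discovered later always start at EARLIEST,
     * so messages published during the discovery delay are not skipped.
     */
    static InitialPosition choose(InitialPosition configured,
                                  Set<String> topicsAtSubscribeTime,
                                  String topic) {
        return topicsAtSubscribeTime.contains(topic) ? configured : InitialPosition.EARLIEST;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("old-topic"); // topics that existed when the consumer subscribed
        System.out.println("old-topic -> " + choose(InitialPosition.LATEST, known, "old-topic"));
        System.out.println("addtopic  -> " + choose(InitialPosition.LATEST, known, "addtopic"));
    }
}
```

Under this sketch a consumer configured with the latest position still skips old data on topics that existed when it subscribed, and reads a newly discovered topic such as `addtopic` from the beginning.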





[GitHub] [pulsar] baynes commented on issue #7157: Messages lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
baynes commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-638728414


   I guess that would do it. It changes the behavior for topics created while the client is down or before it is started for the first time. It is probably useful to read all the messages on those when the client comes up - but there could be cases where it is undesirable when the client starts for the first time.
   
   If we do go that way, then #6531 must be fixed, as we are actually using functions.





[GitHub] [pulsar] baynes commented on issue #7157: Messages lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
baynes commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-638283780


   > Might be some overlap with #6531
   
   Further thoughts.
   I think that is more to do with topics created before function creation/start, and the initial_position option of the consumer is enough to handle it so long as you can get it through. This issue is to do with topics created after function/consumer creation/start, where one wants it to behave as if the cursor setup on the new topic were instantaneous on topic creation.
   








[GitHub] [pulsar] sijie commented on issue #7157: Messages lost with new topic and regex subscription

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7157:
URL: https://github.com/apache/pulsar/issues/7157#issuecomment-638550641


   @baynes You can specify `SubscriptionInitialPosition.Earliest` when you create the regex subscription.

