You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/19 20:48:22 UTC

[GitHub] [pulsar] zwalsh-toast opened a new issue, #15679: Topic rebalances cause consumers with receiverQueueSize=0 to get stuck

zwalsh-toast opened a new issue, #15679:
URL: https://github.com/apache/pulsar/issues/15679

   **Describe the bug**
   Topic rebalances cause consumers with receiverQueueSize=0 to get stuck.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Create a consumer with receiverQueueSize=0
   2. Restart the broker that is serving the topic so that the topic is unloaded and rebalanced
   
   **Expected behavior**
   The consumer should continue to process new messages on the topic.
   
   **Actual behavior**
   The consumer gets stuck and does not process additional messages until it is restarted.
   
   **Environment:**
   - Java client 2.7.2
   - Pulsar cluster version 2.7.2
   
   **Additional context**
   
   Consumer settings:
   
   ```json
   {
     "topicNames" : [ "persistent://engineering/restaurant/config-command" ],
     "topicsPattern" : null,
     "subscriptionName" : "fig-mgmt-worker-config-command",
     "subscriptionType" : "Shared",
     "subscriptionMode" : "Durable",
     "receiverQueueSize" : 0,
     "acknowledgementsGroupTimeMicros" : 100000,
     "negativeAckRedeliveryDelayMicros" : 60000000,
     "maxTotalReceiverQueueSizeAcrossPartitions" : 50000,
     "consumerName" : "fig-mgmt-worker-prod-r96-d0-rand-mY-task-8d4116bbb7d5490cb9c51f0746b42e8e",
     "ackTimeoutMillis" : 0,
     "tickDurationMillis" : 1000,
     "priorityLevel" : 0,
     "maxPendingChuckedMessage" : 10,
     "autoAckOldestChunkedMessageOnQueueFull" : false,
     "expireTimeOfIncompleteChunkedMessageMillis" : 60000,
     "cryptoFailureAction" : "FAIL",
     "properties" : { },
     "readCompacted" : false,
     "subscriptionInitialPosition" : "Latest",
     "patternAutoDiscoveryPeriod" : 60,
     "regexSubscriptionMode" : "PersistentOnly",
     "deadLetterPolicy" : null,
     "retryEnable" : false,
     "autoUpdatePartitions" : true,
     "autoUpdatePartitionsIntervalSeconds" : 60,
     "replicateSubscriptionState" : false,
     "resetIncludeHead" : false,
     "keySharedPolicy" : null,
     "batchIndexAckEnabled" : false
   }
   ```
   
   Client logs during the broker restart:
   
   ```
   INFO 2022-05-01T11:18:46.205Z figmgmtworker<96> "pulsar-client-io-1-3" [] o.a.p.c.i.ClientCnx - [prod-pulsar-c1-use1-broker-7-r12.pulsar.prod.toastops.net/10.0.29.30:6651] Broker notification of Closed consumer: 0
   INFO 2022-05-01T11:18:46.205Z figmgmtworker<96> "pulsar-client-io-1-3" [] o.a.p.c.i.ConnectionHandler - [persistent://engineering/restaurant/config-command] [fig-mgmt-worker-config-command] Closed connection [id: 0x6f45bc89, L:/10.0.32.46:55240 - R:prod-pulsar-c1-use1-broker-7-r12.pulsar.prod.toastops.net/10.0.29.30:6651] -- Will try again in 0.1 s
   INFO 2022-05-01T11:18:46.306Z figmgmtworker<96> "pulsar-timer-5-1" [] o.a.p.c.i.ConnectionHandler - [persistent://engineering/restaurant/config-command] [fig-mgmt-worker-config-command] Reconnecting after timeout
   INFO 2022-05-01T11:18:46.317Z figmgmtworker<96> "pulsar-client-io-1-4" [] o.a.p.c.i.ConsumerImpl - [persistent://engineering/restaurant/config-command][fig-mgmt-worker-config-command] Subscribing to topic on cnx [id: 0xa83ab82d, L:/10.0.32.46:38634 - R:prod-pulsar-c1-use1-broker-0-r12.pulsar.prod.toastops.net/10.0.25.232:6651], consumerId 0
   INFO 2022-05-01T11:18:46.330Z figmgmtworker<96> "pulsar-client-io-1-4" [] o.a.p.c.i.ConsumerImpl - [persistent://engineering/restaurant/config-command][fig-mgmt-worker-config-command] Subscribed to topic on prod-pulsar-c1-use1-broker-0-r12.pulsar.prod.toastops.net/10.0.25.232:6651 -- consumer: 0
    ```
   
   
   Notice that the host serving the topic has changed from `prod-pulsar-c1-use1-broker-7-r12.pulsar.prod.toastops.net` to prod-`pulsar-c1-use1-broker-0-r12.pulsar.prod.toastops.net`
   
   Metrics showing a backlog build up until we manually intervened:
   
   ![image](https://user-images.githubusercontent.com/35272561/169401961-94f89a50-dda3-4a84-a8bd-175c871fad4f.png)
   
   
   Output of topic stats during the time when the consumer was connected but not consuming:
   
   ```json
   {
     "msgRateIn": 0.0,
     "msgThroughputIn": 0.0,
     "msgRateOut": 0.0,
     "msgThroughputOut": 0.0,
     "bytesInCounter": 1452130,
     "msgInCounter": 5939,
     "bytesOutCounter": 0,
     "msgOutCounter": 0,
     "averageMsgSize": 0.0,
     "msgChunkPublished": false,
     "storageSize": 2803581,
     "backlogSize": 1452130,
     "offloadedStorageSize": 0,
     "publishers": [
      // ... many publishers ...
     ],
     "subscriptions": {
       "fig-mgmt-worker-config-command": {
         "msgRateOut": 0.0,
         "msgThroughputOut": 0.0,
         "bytesOutCounter": 0,
         "msgOutCounter": 0,
         "msgRateRedeliver": 0.0,
         "chuckedMessageRate": 0,
         "msgBacklog": 5939,
         "backlogSize": 0,
         "msgBacklogNoDelayed": 5939,
         "blockedSubscriptionOnUnackedMsgs": false,
         "msgDelayed": 0,
         "unackedMessages": 0,
         "type": "Shared",
         "msgRateExpired": 0.0,
         "totalMsgExpired": 0,
         "lastExpireTimestamp": 1651507172654,
         "lastConsumedFlowTimestamp": 0,
         "lastConsumedTimestamp": 0,
         "lastAckedTimestamp": 0,
         "lastMarkDeleteAdvancedTimestamp": 0,
         "consumers": [
           {
             "msgRateOut": 0.0,
             "msgThroughputOut": 0.0,
             "bytesOutCounter": 0,
             "msgOutCounter": 0,
             "msgRateRedeliver": 0.0,
             "chuckedMessageRate": 0.0,
             "consumerName": "fig-mgmt-worker-prod-r96-d0-rand-Tm-task-ec6aab2ab576490f9cd7c44db9154bfc",
             "availablePermits": 0,
             "unackedMessages": 0,
             "avgMessagesPerEntry": 1000,
             "blockedConsumerOnUnackedMsgs": false,
             "lastAckedTimestamp": 0,
             "lastConsumedTimestamp": 0,
             "metadata": {
               
             },
             "connectedSince": "2022-05-01T11:18:46.329Z",
             "clientVersion": "2.7.2",
             "address": "/10.0.28.85:58540"
           },
           {
             "msgRateOut": 0.0,
             "msgThroughputOut": 0.0,
             "bytesOutCounter": 0,
             "msgOutCounter": 0,
             "msgRateRedeliver": 0.0,
             "chuckedMessageRate": 0.0,
             "consumerName": "fig-mgmt-worker-prod-r96-d0-rand-xE-task-f5255f72ef1d4caebbb7729ff3848046",
             "availablePermits": 0,
             "unackedMessages": 0,
             "avgMessagesPerEntry": 1000,
             "blockedConsumerOnUnackedMsgs": false,
             "lastAckedTimestamp": 0,
             "lastConsumedTimestamp": 0,
             "metadata": {
               
             },
             "connectedSince": "2022-05-01T11:18:46.329Z",
             "clientVersion": "2.7.2",
             "address": "/10.0.32.46:38634"
           }
         ],
         "isDurable": true,
         "isReplicated": false,
         "consumersAfterMarkDeletePosition": {
           
         },
         "nonContiguousDeletedMessagesRanges": 0,
         "nonContiguousDeletedMessagesRangesSerializedSize": 0
       }
     },
     "replication": {
       
     },
     "deduplicationStatus": "Disabled",
     "nonContiguousDeletedMessagesRanges": 0,
     "nonContiguousDeletedMessagesRangesSerializedSize": 0
   }
   ```
   
   Note that both consumers show zero unacked messages and zero available permits:
   ```
             "availablePermits": 0,
             "unackedMessages": 0,
   ```
   
   cc: @klevy-toast 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15679: Topic rebalances cause consumers with receiverQueueSize=0 to get stuck when calling receiveAsync()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15679:
URL: https://github.com/apache/pulsar/issues/15679#issuecomment-1163846753

   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] klevy-toast commented on issue #15679: Topic rebalances cause consumers with receiverQueueSize=0 to get stuck

Posted by GitBox <gi...@apache.org>.
klevy-toast commented on issue #15679:
URL: https://github.com/apache/pulsar/issues/15679#issuecomment-1135086108

   I have made more accurate local reproduction steps with docker / kotlin which should be straightforward to run with this test file: https://gist.github.com/klevy-toast/38d737c8abdbeb8d352c7617e4a96cfa
   
   It appears to be a bug specifically related to the `ZeroQueueConsumer`'s `receiveAsync`, after a topic has been unloaded while the consumer is connected. Using `receive` instead does not get stuck. I verified that the bug occurs with both 2.7.2 and 2.9.2
   
   @mattisonchao please let me know if you have any trouble reproducing this. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org