You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/08/14 15:46:45 UTC

[GitHub] [pulsar] jiazhai opened a new pull request #7819: cpp: fix race condition caused by consumer seek and close

jiazhai opened a new pull request #7819:
URL: https://github.com/apache/pulsar/pull/7819


   
   ## Motivation
   User try a loop of create-exclusive-consumer, seek, consume and close of a consumer with cpp client, and some times will meet “consumer busy” errors, which means the broker side consumer still alive while creating new a consumer. 
   
   Here are suspicion logs.  
   ```
   INFO:Client(88)Subscribing on Topic :public/default/jeff-test-21-partition-0Tue Aug 11 11:38:38 2020
   INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:38 2020
   INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44648 -> 10.88.109.71:31051] Tue Aug 11 11:38:38 2020
   INFO:HandlerBase(130)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Schedule reconnection in 0.1 sTue Aug 11 11:38:38 2020
   INFO:ConsumerImpl(1047)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Seek successfullyTue Aug 11 11:38:38 2020
   INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:38 2020
   INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44648 -> 10.88.109.71:31051] Tue Aug 11 11:38:40 2020
   INFO:HandlerBase(130)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Schedule reconnection in 0.1 sTue Aug 11 11:38:41 2020
   INFO:HandlerBase(53)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Getting connection from poolTue Aug 11 11:38:41 2020
   INFO:ConsumerImpl(848)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Closing consumer for topic persistent://public/default/jeff-test-21-partition-0Tue Aug 11 11:38:42 2020
   INFO:ConsumerImpl(175)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Created consumer on broker [10.88.109.77:44646 -> 10.88.109.71:31051] Tue Aug 11 11:38:42 2020
   INFO:ConsumerImpl(104)[persistent://public/default/jeff-test-21-partition-0, 0_itom_di_scheduler_bat_prod_jeff-test-21, 1612] Destroyed consumer which was not properly closedTue Aug 11 11:38:42 2020
   ```
   
   -  there is reconnection caused by seek command;
   -  close operation happens at the same time of seek.
   -  consumer destroyed with log `Destroyed consumer which was not properly closed`.
   
   The race condition happens like this:
   1. seek command triggered consumer disconnect; 
   ```
   subscription.resetCursor
   \
   disconnectFuture = dispatcher.disconnectActiveConsumers(true);
   \
   disconnectAllConsumers(boolean isResetCursor)
   consumerList.forEach(consumer -> consumer.disconnect(isResetCursor)); 
   ```
   
   2. client trigger disconnectConsumer, and triggered `connection_.reset();`
   
   ```
   case BaseCommand::CLOSE_CONSUMER: {
       		consumer->disconnectConsumer();  LOG_INFO("Broker notification of Closed consumer: " << consumerId_);	   
   
   \
   void ConsumerImpl::disconnectConsumer() {
       LOG_INFO("Broker notification of Closed consumer: " << consumerId_);
       Lock lock(mutex_);
       connection_.reset();   < === 
       lock.unlock();
   }
   ```
   
   3. connection not ready, and close consumer happened, then it leaked send CloseConsumer command to broker.
   
   ```
   void ConsumerImpl::closeAsync(ResultCallback callback) {
   ...
       LOG_INFO(getName() << "Closing consumer for topic " << topic_);
       state_ = Closing;
       ClientConnectionPtr cnx = getCnx().lock();    < === the seek operation caused cnx reset
       if (!cnx) {     < === goes into this if, and set to Closed and returned directly without closeConsumer sent to broker
           state_ = Closed;  
           lock.unlock();
           // If connection is gone, also the consumer is closed on the broker side
           if (callback) {
               callback(ResultOk);
           }
           return;
       }
       ... 
       Future<Result, ResponseData> future =
           cnx->sendRequestWithId(Commands::newCloseConsumer(consumerId_, requestId), requested);  < ====
      .... 
   }
   ```
   
   
   ### Modifications
   
   when consumer destroy, try to send another closeConsumer command if suitable.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] jiazhai merged pull request #7819: cpp: fix race condition caused by consumer seek and close

Posted by GitBox <gi...@apache.org>.
jiazhai merged pull request #7819:
URL: https://github.com/apache/pulsar/pull/7819


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org