You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/04/26 01:14:15 UTC

[GitHub] [pulsar] rdhabalia opened a new pull request #4138: [pulsar-broker]Fix: client-producer can't reconnect due to failed producer-future on cnx cache

rdhabalia opened a new pull request #4138: [pulsar-broker]Fix: client-producer can't reconnect due to failed producer-future on cnx cache
URL: https://github.com/apache/pulsar/pull/4138
 
 
   ### Motivation
   
   Frequently having issue where broker doesn't allow client to create producer and client keep receiving error: `Producer is already present on the connection`
   
   - It happens when same client-host is connected to one broker for two different topics which are part of different bundles. So, client shares same cnx-C1 with broker-B1 for the two different topics which are part of different bundles. 
   - Now, one of the bundle and topic-T1 gets unloaded and goes to different broker-B2. while unloading, client try to reconnect to broker B1 and producer-creation fails due to unloading. However, in the unloading process, broker sometimes couldn't clear this failed-producer-future from the cache.
   - After sometime, the same bundle and topic T1 again gets unloaded from broker-B2 and comes back to Broker-B1.
   - Now, client already has cnx-C1 with broker-B1 (because bundle-2 topics are still connected) and try to create producer (with same producer-id which was created initially at client) for topic-T1 on the same cnx-C1.
   - Some how, broker couldn't clear old producer-result from the cnx-C1 cache and it thinks that producer is already connected so, client keeps failing and couldn't create the producer.
   
   **Broker-log**
   ```
   1#(1) Bundle started unloading
   10:11:18.980 [pulsar-web-29-18] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-prop/my-cluster/my-ns/0x00000000_0x80000000
   :
   10:11:19.026 [pulsar-web-29-18] INFO  org.apache.pulsar.broker.service.BrokerService - [persistent://my-prop/my-cluster/my-ns/topic-partition-0] Unloading topic
   :
   10:11:19.029 [pulsar-web-29-18] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://my-prop/my-cluster/my-ns/topic-partition-0}, client=/121.121.121.121:58042, producerName=my-cluster-2897-13224, producerId=1069}
   :
   #(2) While unloading topic: producer disconnected and client tried again to create producer on same cnx: 121.121.121.121:58042 .. but I think producerFuture is not removed
   10:11:19.133 [pulsar-io-21-48] INFO  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Creating producer. producerId=1069
   10:11:19.134 [pulsar-io-21-42] INFO  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup yahoo.dht.set.prod-tenzing.su for topic persistent://my-prop/my-cluster/my-ns/topic-partition-0 with error Namespace bundle my-prop/my-cluster/my-ns/0x00000000_0x80000000 is being unloaded
   :
   10:11:19.134 [pulsar-io-21-42] INFO  org.apache.pulsar.broker.lookup.TopicLookupBase - Failed to lookup yahoo.dht.set.prod-tenzing.su for topic persistent://my-prop/my-cluster/my-ns/topic-partition-0 with error Namespace bundle my-prop/my-cluster/my-ns/0x00000000_0x80000000 is being unloaded
   10:11:19.204 [BookKeeperClientWorker-OrderedExecutor-5-0] INFO  org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper - [my-prop/my-cluster/my-ns/persistent/topic-partition-0] [ContactsBackend.BlendedContacts_Index_subscription] Updating cursor info ledgerId=-1 mark-delete=10548168989:116241
   :
   10:11:19.208 [bookkeeper-ml-workers-OrderedExecutor-19-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://my-prop/my-cluster/my-ns/topic-partition-0] Topic closed
   :
   10:11:19.211 [pulsar-io-21-27] WARN  org.apache.pulsar.broker.service.BrokerService - [persistent://my-prop/my-cluster/my-ns/topic-partition-0] Service unit is not ready when loading the topic
   #(3) Topic again assigned to same broker and stale and failed producer-future already exists which doesn't allow producer to reconnect and keep giving error: Producer already connected
   :
   16:46:19.116 [bookkeeper-ml-workers-OrderedExecutor-19-0] INFO  org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - [persistent://my-prop/my-cluster/my-ns/topic-partition-0] [null] setting message-dispatch-rate DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
   16:46:19.155 [pulsar-io-21-48] WARN  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Producer is already present on the connection
   :
   16:46:19.155 [pulsar-io-21-48] WARN  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Producer is already present on the connection
   16:46:19.350 [pulsar-io-21-48] WARN  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Producer is already present on the connection
   16:46:19.751 [pulsar-io-21-48] WARN  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Producer is already present on the connection
   16:46:20.528 [pulsar-io-21-48] WARN  org.apache.pulsar.broker.service.ServerCnx - [/121.121.121.121:58042][persistent://my-prop/my-cluster/my-ns/topic-partition-0] Producer is already present on the connection
   ```
   
   **client-log**
   CPP-Client log shows unknown error
   ```
   20190425 16:46:19.155 : WARN  cms./var/builds/workspace/260010-v3-component/BUILD_CONTAINER/rhel6/label/DOCKER-LOW/app_root/pulsar/pulsar-client-cpp/lib/ProducerImpl.cc - [persistent://sherpa-prodhd/prod2-sg3/prodhd-NOTIFICATION-ContactsBackend.BlendedContacts_Index/DIFF-partition-0, prod2-sg3-2897-13224] Failed to reconnect producer: UnknownError
   ```
   
   I couldn't reproduce it in my local env but it happens very frequently to my prod env.
   
   ### Modification
   
   - Clean up producer-future from the cache if future is already completed with failure. 
   - added log once schema-future completes for future troubleshooting to make sure schema-future doesn't get stuck.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services