Posted to dev@pulsar.apache.org by gi...@git.apache.org on 2017/07/06 22:27:02 UTC

[GitHub] rdhabalia opened a new pull request #550: ZookeeperCache children-cache invalidation on watch-event and LoadManager handling if availableBrokerCache is not updated

rdhabalia opened a new pull request #550: ZookeeperCache children-cache invalidation on watch-event and LoadManager handling if availableBrokerCache is not updated
URL: https://github.com/apache/incubator-pulsar/pull/550

   
   ### Motivation
   
   When a broker shuts down, it deletes its own znode from `/loadbalance/brokers`. The leader of [ModularLoadManager](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java) should then receive a watch event that updates the [available-broker-list-cache](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/ModularLoadManagerImpl.java#L183), so that the load manager has an up-to-date list of available brokers.
   
   LoadManager also has a ZK data watch (`ZooKeeperDataCache`) on each broker's node. We have sometimes seen ZK trigger only one watch event per zkSession, which notifies `ZooKeeperDataCache` but not `ZooKeeperChildrenCache`. As a result, availableBrokerList is not updated and the load manager fails to update bundle-ownership data, which causes bundle downtime.
   
   ```
   ### Only received ZooKeeperDataCache event which doesn't update available Broker list
   22:14:03.646 [main-EventThread] INFO  c.y.p.zookeeper.ZooKeeperDataCache   - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013064804 xid:600512 sent:600512 recv:751510 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDeleted path:/loadbalance/brokers/broker2:4080
   :
   22:14:08.535 [main-EventThread] INFO  c.y.p.zookeeper.ZooKeeperDataCache   - [State:CONNECTED Timeout:30000 sessionid:0x459d943ea7cef26 local:/ remoteserver:zk4/ lastZxid:391013066537 xid:600708 sent:600708 recv:751737 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch event: WatchedEvent state:SyncConnected type:NodeDataChanged path:/loadbalance/brokers/broker15:4080
   22:14:08.538 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   :
   #### Because of stale availableBrokerList : Load-manager failed to update bundle ownership here
   22:14:08.538 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   22:14:21.006 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   22:14:30.097 [pool-21-thread-1] WARN  c.y.p.b.l.i.ModularLoadManagerImpl   - Error reading broker data from cache for broker - [broker2:4080], [KeeperErrorCode = NoNode]
   :
   ##### All lookup fails until broker comes back again
   22:14:31.127 [zk-cache-callback-2-2] WARN  c.y.p.b.lookup.DestinationLookup     - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   22:14:31.367 [zk-cache-callback-2-4] WARN  c.y.p.b.lookup.DestinationLookup     - Failed to lookup broker for topic persistent://sla-monitor/myCluster/broker2:4080/persistent-c2023ca5-e8f4-46fe-bb9f-3bf28b050faa: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   java.util.concurrent.CompletionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /loadbalance/brokers/broker2:4080
   ```
   
   ### Modifications
   
   - ZKCache: invalidate the parent zkCache if a node is deleted/created
   - LoadManager: Handle the case where availableBrokersCache is not updated while updating load-report
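
The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: `ZkCacheSketch`, its `EventType` enum, `parentOf`, `onWatchEvent`, and `dropUnreadableBrokers` are hypothetical names, and plain in-memory maps stand in for `ZooKeeperDataCache` and `ZooKeeperChildrenCache`.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a ZooKeeper-backed cache; not the Pulsar classes. */
class ZkCacheSketch {
    /** Simplified event types, assumed for this sketch only. */
    enum EventType { NODE_CREATED, NODE_DELETED, NODE_DATA_CHANGED }

    /** path -> cached node data */
    final Map<String, String> dataCache = new ConcurrentHashMap<>();
    /** path -> cached list of child names */
    final Map<String, Set<String>> childrenCache = new ConcurrentHashMap<>();

    /** Returns the parent path, or null for the root or a path without '/'. */
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        if (idx < 0 || path.equals("/")) {
            return null;
        }
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    /**
     * Invalidates the cached data for the node. If the node was created or
     * deleted, the parent's cached children list is stale as well, so it is
     * dropped even when no separate children-watch event arrives.
     */
    void onWatchEvent(EventType type, String path) {
        dataCache.remove(path);
        if (type == EventType.NODE_CREATED || type == EventType.NODE_DELETED) {
            String parent = parentOf(path);
            if (parent != null) {
                childrenCache.remove(parent);
            }
        }
    }

    /**
     * Load-manager side: tolerate a stale available-broker list by keeping
     * only the brokers whose data can still be read. An empty Optional
     * models a NoNode result for that broker.
     */
    static Set<String> dropUnreadableBrokers(
            Set<String> availableBrokers,
            Function<String, Optional<String>> readBrokerData) {
        Set<String> live = new LinkedHashSet<>();
        for (String broker : availableBrokers) {
            if (readBrokerData.apply(broker).isPresent()) {
                live.add(broker);
            }
        }
        return live;
    }
}
```

With this shape, a `NODE_DELETED` event for `/loadbalance/brokers/broker2:4080` also drops the cached children of `/loadbalance/brokers`, so the next read of the broker list goes back to ZooKeeper. A `NODE_DATA_CHANGED` event leaves the children list alone.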
   
   ### Result
   
   This helps the LoadManager leader keep the latest bundle-ownership data, so that a broker restart does not cause downtime for bundle assignment.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services