Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/01/26 04:45:05 UTC

[GitHub] [pulsar] wuYin edited a comment on issue #9297: proxy lookup result still contains unreachable owner broker serviceURL after broker restart and 30s zk session expired

wuYin edited a comment on issue #9297:
URL: https://github.com/apache/pulsar/issues/9297#issuecomment-767280833


   @congbobo184 Thanks for the review.
   I'm using pulsar-helm-chart to deploy the cluster. In proxy.conf, the broker connection addresses look like:
   ```
   brokerServiceURL=pulsar://handshake-pulsar-broker:6650
   brokerWebServiceURL=http://handshake-pulsar-broker:8080
   ```
   These are generated by [proxy-configmap.yaml#L37](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/templates/proxy-configmap.yaml#L37). In the proxy pod:
   ```
   > cat /etc/resolv.conf 
   search psr.svc.cluster.local svc.cluster.local cluster.local
   
   > host handshake-pulsar-broker
   handshake-pulsar-broker.psr.svc.cluster.local has address 10.113.42.32
   handshake-pulsar-broker.psr.svc.cluster.local has address 10.113.43.53 # will be removed
   handshake-pulsar-broker.psr.svc.cluster.local has address 10.113.46.57
   
   > host handshake-pulsar-broker-1.handshake-pulsar-broker.psr.svc.cluster.local  # bundle owner host
   handshake-pulsar-broker-1.handshake-pulsar-broker.psr.svc.cluster.local has address 10.113.43.53
   ```
   For this issue: while broker1 is restarting or terminating, its service DNS record is removed quickly (within 1s).
   The proxy sends the Lookup to the other brokers. Because broker1's zNode has not expired yet, they still return `broker1.xxx.cluster.local`, whose DNS record has already been removed, so the client ends up backing off and retrying the same Lookup.
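
The stale-lookup condition described above can be checked from inside the proxy pod with a small script. This is only a minimal diagnostic sketch, not part of Pulsar: the helper names `resolve` and `stale_lookup` are hypothetical, and the check assumes that "unreachable" here means the owner broker's hostname no longer has a DNS record.

```python
import socket
from typing import Callable, List
from urllib.parse import urlparse


def resolve(host: str) -> List[str]:
    """Return the addresses a host resolves to, or [] if DNS has no record."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the IP address.
    return sorted({info[4][0] for info in infos})


def stale_lookup(service_url: str,
                 resolver: Callable[[str], List[str]] = resolve) -> bool:
    """True when the broker URL returned by a Lookup no longer resolves."""
    host = urlparse(service_url).hostname
    return host is None or not resolver(host)
```

Calling `stale_lookup("pulsar://handshake-pulsar-broker-1.handshake-pulsar-broker.psr.svc.cluster.local:6650")` while broker1 is terminating would be expected to return `True` for as long as the other brokers keep answering with the removed hostname. The `resolver` parameter exists only so the check can be exercised without real DNS.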
   
   I think this behavior is reasonable, but there is still a small chance of hitting a flaky case.
   In my production environment, draining a k8s node caused a broker to be rescheduled onto another node, and the client's Lookup still failed even after 16 minutes of retries.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org