Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/07/12 11:13:21 UTC

[GitHub] [pulsar] kezhuw opened a new issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

kezhuw opened a new issue #7517:
URL: https://github.com/apache/pulsar/issues/7517


   **Describe the bug**
   `OwnedBundle.handleUnloadRequest` re-activates a namespace bundle that is being unloaded when it fails to remove the bundle's ownership from ZooKeeper. However, the ownership may in fact have been removed successfully if the connected ZooKeeper server crashed before sending its response, e.g. `KeeperException.Code.CONNECTIONLOSS`.
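
   A minimal sketch of the race described above, using an in-memory stand-in rather than the real ZooKeeper client or Pulsar classes. `FakeZooKeeper`, `Broker`, `ConnectionLossException`, and the path are all hypothetical names introduced for illustration; the only behavior taken from the report is that the unload path re-activates the bundle when the delete appears to fail.

```java
import java.util.HashMap;
import java.util.Map;

class AmbiguousDeleteSketch {
    /** Hypothetical stand-in for a KeeperException carrying Code.CONNECTIONLOSS. */
    static class ConnectionLossException extends Exception {
        ConnectionLossException() {
            super("connection lost before the response arrived");
        }
    }

    /** Hypothetical in-memory stand-in for a ZooKeeper server. */
    static class FakeZooKeeper {
        final Map<String, String> nodes = new HashMap<>();
        boolean crashBeforeReply = false;

        void delete(String path) throws ConnectionLossException {
            nodes.remove(path);          // the server applies the delete...
            if (crashBeforeReply) {      // ...but the reply never reaches the client
                crashBeforeReply = false;
                throw new ConnectionLossException();
            }
        }
    }

    /** Hypothetical broker-side view of a single bundle. */
    static class Broker {
        final String id;
        boolean bundleActive;

        Broker(String id, boolean bundleActive) {
            this.id = id;
            this.bundleActive = bundleActive;
        }

        /** Mirrors the reported behavior: re-activate when the delete seems to fail. */
        void unload(FakeZooKeeper zk, String path) {
            bundleActive = false;
            try {
                zk.delete(path);
            } catch (ConnectionLossException e) {
                bundleActive = true;     // assumes the node still exists, which may be wrong
            }
        }

        /** Acquires the bundle only if no ownership node exists. */
        void tryAcquire(FakeZooKeeper zk, String path) {
            if (zk.nodes.putIfAbsent(path, id) == null) {
                bundleActive = true;
            }
        }
    }

    /** Returns true when both brokers end up serving the same bundle. */
    static boolean reproduceConflict() {
        FakeZooKeeper zk = new FakeZooKeeper();
        String path = "/hypothetical/ownership/bundle-0";
        Broker a = new Broker("broker-a", true);
        Broker b = new Broker("broker-b", false);
        zk.nodes.put(path, a.id);

        zk.crashBeforeReply = true;
        a.unload(zk, path);      // delete applied, response lost, bundle re-activated
        b.tryAcquire(zk, path);  // node is gone, so broker-b acquires it
        return a.bundleActive && b.bundleActive;
    }

    public static void main(String[] args) {
        System.out.println("both brokers active: " + reproduceConflict());
    }
}
```

   In this model the conflict arises because a connection loss says nothing about whether the delete was applied, so treating it as a definite failure is unsafe.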
   
   **To Reproduce**
   I added a test case that reproduces this in https://github.com/kezhuw/pulsar/commit/066d16468fc5c7b496aca766efb159f177f10d1b#diff-29ccb5c3ba685ffcabe9df5e9fd7e841L50.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie closed issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

Posted by GitBox <gi...@apache.org>.
sijie closed issue #7517:
URL: https://github.com/apache/pulsar/issues/7517









[GitHub] [pulsar] sijie commented on issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7517:
URL: https://github.com/apache/pulsar/issues/7517#issuecomment-664085345


   @kezhuw The second approach looks good to me.





[GitHub] [pulsar] kezhuw commented on issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

Posted by GitBox <gi...@apache.org>.
kezhuw commented on issue #7517:
URL: https://github.com/apache/pulsar/issues/7517#issuecomment-664712886


   @sijie OK. I will open a pull request in the next few days.







[GitHub] [pulsar] sijie commented on issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7517:
URL: https://github.com/apache/pulsar/issues/7517#issuecomment-657913463


   @kezhuw Are you interested in sending a pull request to fix it?






[GitHub] [pulsar] kezhuw commented on issue #7517: Conflict namespace bundle ownership after unloading and connected zookeeper server disconnected

Posted by GitBox <gi...@apache.org>.
kezhuw commented on issue #7517:
URL: https://github.com/apache/pulsar/issues/7517#issuecomment-663863578


   @sijie Sorry for the delay. I am willing to take over this issue. I have added a new test case [`testAcquireOwnershipWithZookeeperDisconnectedAfterOwnershipNodeCreated`](https://github.com/kezhuw/pulsar/commit/bbe9bdd2244ca051c6fe4efb90aad66b5d079375#diff-29ccb5c3ba685ffcabe9df5e9fd7e841R193), which also fails because of a ZooKeeper disconnection.
   
   Before opening a formal pull request, I think we should converge on how to fix this issue, to avoid substantial divergence.
   
   In my opinion, there are two possible approaches to fix this issue or reduce its likelihood:
   1. Retry on certain errors until success or session expiry.
   2. Re-establish existing ownership when querying and acquiring ownership.
   
   I think the first approach cannot, or can hardly, guarantee correctness, for these reasons:
   * It is hard to take appropriate action for every error condition.
   * It can't handle a disconnected-connected-disconnected-... dance.
   
   But I think the retry approach does provide API usability and caller friendliness.
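
   To make the first approach concrete, here is a rough sketch of a retry loop that stops only on an unambiguous outcome. It is an illustration under stated assumptions, not Pulsar or ZooKeeper code: `Outcome`, `Status`, `ScriptedStore`, and `releaseWithRetry` are hypothetical, the attempt budget is added only so the example terminates, and treating "no node" on a retry as success is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class RetryUntilSettledSketch {
    /** Hypothetical outcomes of a single delete attempt. */
    enum Outcome { OK, NO_NODE, CONNECTION_LOSS, SESSION_EXPIRED }

    /** Hypothetical final status reported to the caller. */
    enum Status { RELEASED, SESSION_EXPIRED, UNKNOWN }

    /** Hypothetical store that replays a scripted sequence of outcomes. */
    static class ScriptedStore {
        private final Deque<Outcome> script;
        int attempts = 0;

        ScriptedStore(Outcome... outcomes) {
            this.script = new ArrayDeque<>(Arrays.asList(outcomes));
        }

        Outcome delete(String path) {
            attempts++;
            // Once the script runs out, keep reporting a lost connection.
            Outcome next = script.poll();
            return next != null ? next : Outcome.CONNECTION_LOSS;
        }
    }

    /** Retries the delete until the result is unambiguous or the budget is spent. */
    static Status releaseWithRetry(ScriptedStore store, String path, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            switch (store.delete(path)) {
                case OK:
                case NO_NODE:            // an earlier attempt may already have removed the node
                    return Status.RELEASED;
                case SESSION_EXPIRED:
                    return Status.SESSION_EXPIRED;
                case CONNECTION_LOSS:
                    break;               // ambiguous, so try again
            }
        }
        return Status.UNKNOWN;           // the connection kept flapping
    }

    public static void main(String[] args) {
        ScriptedStore flapping = new ScriptedStore();
        System.out.println(releaseWithRetry(flapping, "/hypothetical/bundle-0", 3));
    }
}
```

   A connection that keeps dropping leaves the loop in the `UNKNOWN` state, which corresponds to the disconnected-connected-disconnected case listed above.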
   
   In contrast, the second approach admits that we cannot provide a correct result under certain conditions, but we can provide a correct result on a manual retry once that possibly temporary condition has been resolved.
   
   So I tend to fix this issue by re-establishing existing ownership in later ownership querying and acquiring. In the future, we can retry on failure automatically to improve caller friendliness without sacrificing correctness.
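
   One possible reading of the second approach, sketched with in-memory stand-ins: every later query or acquire consults the ownership record and reconciles the broker's local state against it. `OwnershipStore`, `Broker`, and `acquireOrQuery` are hypothetical names, and the eventual pull request may well be structured differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ReestablishOwnershipSketch {
    /** Hypothetical in-memory stand-in for the ownership nodes kept in ZooKeeper. */
    static final class OwnershipStore {
        final Map<String, String> ownerByBundle = new HashMap<>();
    }

    /** Hypothetical broker with a local cache of the bundles it believes it owns. */
    static final class Broker {
        final String id;
        final Set<String> ownedLocally = new HashSet<>();

        Broker(String id) {
            this.id = id;
        }

        /** Looks up or acquires the owner, trusting the store over local state. */
        String acquireOrQuery(OwnershipStore store, String bundle) {
            String owner = store.ownerByBundle.get(bundle);
            if (owner == null) {
                store.ownerByBundle.put(bundle, id);   // nobody owns it, so acquire
                owner = id;
            }
            if (owner.equals(id)) {
                ownedLocally.add(bundle);     // re-establish ownership the store already records
            } else {
                ownedLocally.remove(bundle);  // drop a stale local claim
            }
            return owner;
        }
    }

    public static void main(String[] args) {
        OwnershipStore store = new OwnershipStore();
        Broker a = new Broker("broker-a");
        // The ownership node was created, but broker-a never saw the response.
        store.ownerByBundle.put("bundle-0", "broker-a");
        System.out.println(a.acquireOrQuery(store, "bundle-0"));
    }
}
```

   The earlier failed call may still have returned a wrong answer to its caller; the point of the sketch is only that a later call converges on what the store actually records.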

