Posted to dev@bookkeeper.apache.org by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org> on 2011/11/09 16:41:51 UTC

[jira] [Commented] (BOOKKEEPER-69) ServerRedirectLoopException when a machine (hosts bookie server & hub server) reboot, which is caused by race condition of topic manager

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147110#comment-13147110 ] 

Ivan Kelly commented on BOOKKEEPER-69:
--------------------------------------

I'm not sure the analysis of the issue here is correct, or at least, the attached test doesn't exercise this scenario. One problem I did spot was that the topic manager sometimes completed the operation without error even when it couldn't acquire the topic. I've attached a patch to fix this.
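Below is a minimal sketch of the invariant the patch is described as restoring: an acquire operation should report success only if the hub actually took ownership of the topic. The comment does not show the topic manager's code, so `AcquireCallback`, `SketchTopicManager`, `RecordingCallback` and the `tryAcquire` predicate are hypothetical names, not Hedwig's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical callback shape: exactly one method is invoked per operation.
interface AcquireCallback {
    void operationFinished(String topic);
    void operationFailed(String topic, Exception cause);
}

// Hypothetical topic manager. tryAcquire stands in for whatever ownership
// claim the real hub performs; it returns true only if this hub now owns the topic.
class SketchTopicManager {
    private final Predicate<String> tryAcquire;

    SketchTopicManager(Predicate<String> tryAcquire) {
        this.tryAcquire = tryAcquire;
    }

    void acquire(String topic, AcquireCallback cb) {
        final boolean acquired;
        try {
            acquired = tryAcquire.test(topic);
        } catch (RuntimeException e) {
            // An error while acquiring must surface as a failure.
            cb.operationFailed(topic, e);
            return;
        }
        if (acquired) {
            cb.operationFinished(topic);
        } else {
            // Reporting success here (the bug described above) would make the
            // caller believe this hub owns a topic that it does not own.
            cb.operationFailed(topic,
                    new IllegalStateException("could not acquire topic " + topic));
        }
    }
}

// Records which callback method fired, for illustration.
class RecordingCallback implements AcquireCallback {
    final List<String> events = new ArrayList<>();

    public void operationFinished(String topic) {
        events.add("finished:" + topic);
    }

    public void operationFailed(String topic, Exception cause) {
        events.add("failed:" + topic);
    }
}
```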

There seem to be other issues with how a hub handles a bookie failure. For example, if a hub owns a topic and a bookie then dies and comes back up, the hub can no longer publish to the topic, because it doesn't clean up the ledger after the failed write.
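Below is a minimal sketch of the cleanup the comment says is missing. It assumes the hub caches one open ledger per topic. After a failed write, the cached ledger is dropped, so the next publish opens a fresh ledger once enough bookies are back instead of reusing a handle that will keep failing. `SketchLedger`, `SketchTopicPersistence` and the ledger factory are hypothetical stand-ins, not BookKeeper's `LedgerHandle` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a ledger handle whose writes can fail.
interface SketchLedger {
    long id();
    void append(byte[] entry) throws Exception;
}

// Hypothetical per-topic persistence state: one cached ledger per topic.
class SketchTopicPersistence {
    private final Map<String, SketchLedger> currentLedger = new HashMap<>();
    private final Function<String, SketchLedger> openLedger; // hypothetical factory

    SketchTopicPersistence(Function<String, SketchLedger> openLedger) {
        this.openLedger = openLedger;
    }

    // Returns the id of the ledger that accepted the entry, or rethrows the write error.
    long publish(String topic, byte[] entry) throws Exception {
        SketchLedger ledger = currentLedger.computeIfAbsent(topic, openLedger);
        try {
            ledger.append(entry);
            return ledger.id();
        } catch (Exception e) {
            // Forget the broken ledger so the next publish opens a new one.
            // Without this, every later publish would hit the same failed handle.
            currentLedger.remove(topic, ledger);
            throw e;
        }
    }
}
```

In this sketch, removing the `remove` call makes the second publish reuse the failed ledger and fail again, which mirrors the symptom in the comment.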
                
> ServerRedirectLoopException when a machine (hosts bookie server & hub server) reboot, which is caused by race condition of topic manager
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-69
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-69
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-client, hedwig-server
>    Affects Versions: 3.4.0
>         Environment: 3 machines (perf8, perf9, perf10), each machine hosts a bookie server & a hub server.
> perf8 is used as the default server for client 1, and perf9 as the default server for client 2.
> BookKeeper is configured as follows:
> ensemble size is 3, quorum size is 2.
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>            Priority: Critical
>             Fix For: 4.0.0
>
>         Attachments: BOOKKEEPER-69.possiblefix.diff, bookkeeper-69-testcase.patch, bookkeeper-69.patch, bookkeeper-69.patch
>
>
> 1) Machine perf10 is rebooted. The bookie server and hub server are not restarted automatically after the reboot.
> 2) Client 1 and client 2 keep running. The topics owned by perf10 are reassigned to perf8/perf9, but the reassignment fails because not enough bookie servers are available (only 2 bookies remain, while the ensemble size is 3).
> 3) After 2 hours, we noticed that perf10 had been rebooted, so we restarted the bookie server and hub server on perf10.
> 4) Then we got a ServerRedirectLoopException in the client.
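The report does not explain why the hubs end up redirecting to each other, and the comment above questions the original analysis. The sketch below only illustrates how a client can raise a redirect-loop error when hubs point at each other in a cycle. It assumes the client remembers every hub it has tried for one request. `SketchRedirectingClient`, `SketchRedirectLoopException` and the "who does this hub redirect to" map are hypothetical and are not Hedwig's client code.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical exception standing in for the one named in the report.
class SketchRedirectLoopException extends Exception {
    SketchRedirectLoopException(String msg) {
        super(msg);
    }
}

class SketchRedirectingClient {
    // Hypothetical view of the hubs: hub -> hub it redirects the client to.
    // A hub that is absent from the map, or mapped to itself, serves the request.
    private final Map<String, String> redirectsTo;

    SketchRedirectingClient(Map<String, String> redirectsTo) {
        this.redirectsTo = redirectsTo;
    }

    String resolveOwner(String defaultHub) throws SketchRedirectLoopException {
        Set<String> tried = new LinkedHashSet<>(); // keeps visit order for the message
        String hub = defaultHub;
        while (true) {
            if (!tried.add(hub)) {
                // Being sent back to a hub already tried means the hubs disagree
                // about ownership in a cycle; give up instead of looping forever.
                throw new SketchRedirectLoopException(
                        "redirect loop: tried " + tried + ", sent back to " + hub);
            }
            String next = redirectsTo.get(hub);
            if (next == null || next.equals(hub)) {
                return hub; // this hub claims the topic
            }
            hub = next;
        }
    }
}
```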
