Posted to dev@bookkeeper.apache.org by "Sijie Guo (Created) (JIRA)" <ji...@apache.org> on 2011/12/06 12:31:40 UTC

[jira] [Created] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Hub server doesn't subscribe remote region correctly when a region is down.
---------------------------------------------------------------------------

                 Key: BOOKKEEPER-140
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-140
             Project: Bookkeeper
          Issue Type: Bug
          Components: hedwig-server
    Affects Versions: 4.0.0
            Reporter: Sijie Guo
            Assignee: Sijie Guo
             Fix For: 4.1.0


The hub server doesn't subscribe to remote regions correctly in the following cases (assume there are 3 regions: A, B, and C):

1. A region shuts down before the first subscribe.

1) Region C is down.
2) Subscriber a subscribes to a topic in region A. A subscription state is created in region A's ZooKeeper, but the remote subscribe to region C fails since region C is down. The hub server responds to the client that the subscribe failed, without deleting the subscription state. Subsequent subscribe requests using the same subscriber id and the same topic then fail with NodeExists.
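
The failure mode in case 1 can be sketched as follows. This is only an illustration, not the attached patch: the class `SubscribeRollbackSketch`, the `RemoteSubscriber` interface, and the in-memory set standing in for ZooKeeper are all hypothetical names. It shows one possible remedy, assuming the hub removes the local subscription state again when a remote subscribe fails.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names are Hedwig classes.
class SubscribeRollbackSketch {
    /** Stands in for the remote-region subscribe call; assumed to throw when a region is down. */
    interface RemoteSubscriber {
        void subscribe(String region, String topic) throws Exception;
    }

    // In-memory stand-in for the subscription state kept in the local region's ZooKeeper.
    private final Set<String> subscriptionState = new HashSet<>();
    private final RemoteSubscriber remote;
    private final String[] remoteRegions;

    SubscribeRollbackSketch(RemoteSubscriber remote, String... remoteRegions) {
        this.remote = remote;
        this.remoteRegions = remoteRegions;
    }

    /** Returns true on success; on a remote failure the local state is removed again. */
    boolean subscribe(String topic, String subscriberId) {
        String key = topic + "/" + subscriberId;
        if (!subscriptionState.add(key)) {
            // Mirrors the NodeExists failure described in the report.
            throw new IllegalStateException("NodeExists: " + key);
        }
        try {
            for (String region : remoteRegions) {
                remote.subscribe(region, topic);
            }
            return true;
        } catch (Exception e) {
            // Roll back so that a later attempt does not hit NodeExists.
            subscriptionState.remove(key);
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> down = new HashSet<>();
        down.add("C");
        SubscribeRollbackSketch hub = new SubscribeRollbackSketch(
            (region, topic) -> {
                if (down.contains(region)) throw new Exception(region + " is down");
            },
            "B", "C");
        System.out.println("while C is down: " + hub.subscribe("T", "a"));
        down.clear();
        System.out.println("after C is back: " + hub.subscribe("T", "a"));
    }
}
```

Without the `remove` call in the `catch` branch, the second `subscribe` would throw, which corresponds to the NodeExists behaviour reported above.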

2. A region shuts down when attaching existing subscriptions.

1) In region A, there is a local subscriber a for topic T; in region B, subscriber b for topic T; in region B, subscriber c for topic T.
2) Servers are all restarted in all three regions, but region C is network-partitioned (or shut down) from region A and region B.
3) Subscriber b and subscriber c try to subscribe to T again. Hub servers in regions B and C will try to remote subscribe to region A, but this fails. There is no mechanism to retry the remote subscribe, so if messages are published to topic T in region A, subscriber b and subscriber c would not receive any messages.
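
The missing retry mechanism in case 2 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the attached patch: `RemoteSubscribeRetrySketch`, `RemoteSubscriber`, and `retryPending` are hypothetical names. The idea shown is that the hub remembers which regions it has not yet subscribed to for a topic and tries them again later.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; none of these names are Hedwig classes.
class RemoteSubscribeRetrySketch {
    /** Stands in for the remote-region subscribe call; assumed to throw when a region is unreachable. */
    interface RemoteSubscriber {
        void subscribe(String region, String topic) throws Exception;
    }

    private final RemoteSubscriber remote;
    private final String topic;
    // Regions whose remote subscribe for the topic has not succeeded yet.
    private final Set<String> pending = new LinkedHashSet<>();

    RemoteSubscribeRetrySketch(RemoteSubscriber remote, String topic, String... regions) {
        this.remote = remote;
        this.topic = topic;
        for (String region : regions) {
            pending.add(region);
        }
    }

    /** Tries every pending region once and returns the regions that still failed. */
    Set<String> retryPending() {
        pending.removeIf(region -> {
            try {
                remote.subscribe(region, topic);
                return true;   // subscribed, no longer pending
            } catch (Exception e) {
                return false;  // keep for the next attempt
            }
        });
        return new LinkedHashSet<>(pending);
    }

    public static void main(String[] args) {
        Set<String> unreachable = new HashSet<>();
        unreachable.add("A");
        RemoteSubscribeRetrySketch hubInB = new RemoteSubscribeRetrySketch(
            (region, topic) -> {
                if (unreachable.contains(region)) throw new Exception(region + " unreachable");
            },
            "T", "A", "C");
        System.out.println("still pending: " + hubInB.retryPending());
        unreachable.clear(); // region A becomes reachable again
        System.out.println("still pending: " + hubInB.retryPending());
    }
}
```

In a real server, `retryPending` would presumably be driven by a periodic task (for example via `ScheduledExecutorService.scheduleWithFixedDelay`) rather than called by hand; the retry interval is left out here because the report does not specify one.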

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174213#comment-13174213 ] 

Hudson commented on BOOKKEEPER-140:
-----------------------------------

Integrated in bookkeeper-trunk #289 (See [https://builds.apache.org/job/bookkeeper-trunk/289/])
    BOOKKEEPER-140: Hub server doesn't subscribe remote region correctly when a region is down. (Sijie Guo via ivank)

ivank : 
Files : 
* /zookeeper/bookkeeper/trunk/CHANGES.txt
* /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/common/ServerConfiguration.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/regions/RegionManager.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/main/java/org/apache/hedwig/server/subscriptions/AbstractSubscriptionManager.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/HedwigRegionTestBase.java
* /zookeeper/bookkeeper/trunk/hedwig-server/src/test/java/org/apache/hedwig/server/integration/TestHedwigRegion.java

                

        

[jira] [Commented] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170899#comment-13170899 ] 

Ivan Kelly commented on BOOKKEEPER-140:
---------------------------------------

I don't understand the second scenario in the description. Could you double-check that you've labelled the regions and subscribers correctly?

Otherwise, I need to clarify some things about how cross-region should work before properly reviewing the patch, as I don't quite understand how message ordering should work between regions. I've sent a mail to bookkeeper-dev about this.
                

        

[jira] [Updated] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-140:
---------------------------------

    Attachment: BOOKKEEPER-140.patch

New patch generated with --no-prefix.
                

        

[jira] [Commented] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Ivan Kelly (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174198#comment-13174198 ] 

Ivan Kelly commented on BOOKKEEPER-140:
---------------------------------------

+1 

Looks good. Committed as r1221798.
                

        

[jira] [Updated] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Sijie Guo (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sijie Guo updated BOOKKEEPER-140:
---------------------------------

    Attachment: BOOKKEEPER-140.patch

Attached a patch to fix this issue.
                

        

[jira] [Commented] (BOOKKEEPER-140) Hub server doesn't subscribe remote region correctly when a region is down.

Posted by "Sijie Guo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172875#comment-13172875 ] 

Sijie Guo commented on BOOKKEEPER-140:
--------------------------------------

> the second scenario

As Utkarsh commented in the email,
{quote}
If it's not the first local subscribe, then region A won't be contacted, and
the subscription request will succeed even if region A is down.
{quote}

So if messages are published in region A, the subscriber in region B would not receive them.

I have refined the second scenario as below:
{code}
2. A region shuts down when attaching existing subscriptions.

1) In region A, there is a local subscriber a for topic T; in region B, a local subscriber b for topic T; in region C, a local subscriber c for topic T.
2) Servers are all restarted in all three regions, but region A is network-partitioned (or shut down) from region B and region C.
3) Subscriber b attaches to topic T in region B. Subscriber c attaches to topic T in region C.
4) The hub server that owns topic T in region B does remote subscription to regions A and C. The remote subscribe to region C succeeds, while the remote subscribe to region A fails. The hub server still responds SUCCESS to subscriber b. The same happens for subscriber c in region C.
5) Region A (restarted, or its network connected again) becomes connected with region B and region C. Messages are published in region A, but subscribers b and c never receive any of them.
{code}
                