Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/03/03 22:16:45 UTC

[jira] Created: (HBASE-3596) [replication] Wait a few seconds before transferring queues

[replication] Wait a few seconds before transferring queues 
------------------------------------------------------------

                 Key: HBASE-3596
                 URL: https://issues.apache.org/jira/browse/HBASE-3596
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.1
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
             Fix For: 0.90.2


ReplicationSourceManager.transferQueues currently runs a little too fast, which has the bad side effect of making us hit HBASE-2611 at almost every cluster restart. Some servers shut down faster than others, so the last region servers to be notified see their peers dying at the same moment and try to pick up their queues. Those servers are then told to shut down as well, and they may close their ZK session before the queue transfer completes, which is what HBASE-2611 is about.

Currently the only way to fix that is to delete the lock znode by hand and bounce a region server so that it picks up the queue on startup.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HBASE-3596) [replication] Wait a few seconds before transferring queues

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007210#comment-13007210 ] 

Jean-Daniel Cryans commented on HBASE-3596:
-------------------------------------------

I agree, I will add a comment about that.


[jira] Commented: (HBASE-3596) [replication] Wait a few seconds before transferring queues

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007878#comment-13007878 ] 

Hudson commented on HBASE-3596:
-------------------------------

Integrated in HBase-TRUNK #1792 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1792/])
    


[jira] Updated: (HBASE-3596) [replication] Wait a few seconds before transferring queues

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3596:
--------------------------------------

    Attachment: HBASE-3596.patch

Simple patch that adds a configurable time to sleep before trying to lock a region server.
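
For readers without the attachment, here is a rough sketch of the idea, not the patch itself. It assumes the delay is read once from configuration and that the server re-checks whether it is itself stopping before it tries the lock. The class name, the configuration key, the 2000 ms default, and the tryLock callback are all hypothetical; java.util.Properties stands in for the HBase configuration object.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

/** Hypothetical stand-in for the failover path; this is not HBase's actual code. */
public class DelayedQueueTransfer {
    // Hypothetical configuration key and default, chosen for this sketch only.
    static final String SLEEP_KEY = "example.replication.sleep.before.failover.ms";
    static final long DEFAULT_SLEEP_MS = 2000L;

    private final long sleepBeforeFailoverMs;
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    public DelayedQueueTransfer(Properties conf) {
        // Read the delay once; fall back to the default when the key is absent.
        this.sleepBeforeFailoverMs = Long.parseLong(
            conf.getProperty(SLEEP_KEY, Long.toString(DEFAULT_SLEEP_MS)));
    }

    /** Called when this server is itself told to shut down. */
    public void stop() {
        stopping.set(true);
    }

    /**
     * Waits for the configured delay, then tries to lock the dead server's queues.
     *
     * @param deadServer name of the server whose queues would be claimed
     * @param tryLock hypothetical callback that attempts the lock
     * @return true only if the lock was attempted and acquired
     */
    public boolean transferQueues(String deadServer, Predicate<String> tryLock) {
        try {
            // Give a cluster-wide shutdown time to reach this server as well.
            Thread.sleep(sleepBeforeFailoverMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (stopping.get()) {
            // We are shutting down too, so leave the queues untouched.
            return false;
        }
        return tryLock.test(deadServer);
    }
}
```

The delay only narrows the window: a server that is told to stop after the check but before the transfer finishes can still leave a lock behind.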


[jira] Resolved: (HBASE-3596) [replication] Wait a few seconds before transferring queues

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-3596.
---------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk, thanks for taking a look Stack.


[jira] Commented: (HBASE-3596) [replication] Wait a few seconds before transferring queues

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007179#comment-13007179 ] 

stack commented on HBASE-3596:
------------------------------

+1 Seems fine though waiting is probably not always going to work.
