Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2009/01/26 22:07:59 UTC

[jira] Created: (HBASE-1156) Improve lease handling

Improve lease handling
----------------------

                 Key: HBASE-1156
                 URL: https://issues.apache.org/jira/browse/HBASE-1156
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: master, regionserver
    Affects Versions: 0.19.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.20.0


Currently, if a region server crashes and then restarts, it cannot be given work until its lease times out. This is because a lease is identified only by ipaddress:portnumber. If leases were also identified by the start code, the server could be given work immediately, because its log file name includes the start code and will not interfere with the recovery of the log from its previous incarnation.
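
As a rough illustration of the idea above (not the actual HBase code), a lease key that includes the start code might look like the sketch below. The class name ServerIdentity, the method canAcceptWorkImmediately, and the sample addresses and start codes are all hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class LeaseKeySketch {

    /** Hypothetical lease key: address plus the start code of one server incarnation. */
    static final class ServerIdentity {
        final String host;
        final int port;
        final long startCode;

        ServerIdentity(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ServerIdentity)) return false;
            ServerIdentity other = (ServerIdentity) o;
            return port == other.port
                && startCode == other.startCode
                && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startCode);
        }
    }

    /** True if no pending lease exists for exactly this incarnation. */
    static boolean canAcceptWorkImmediately(Map<ServerIdentity, Long> leases,
                                            ServerIdentity incoming) {
        return !leases.containsKey(incoming);
    }

    public static void main(String[] args) {
        // Lease of the crashed incarnation is still pending (value: made-up remaining millis).
        Map<ServerIdentity, Long> leases = new HashMap<>();
        ServerIdentity crashed = new ServerIdentity("10.0.0.5", 60020, 1232999000000L);
        leases.put(crashed, 120_000L);

        // Same ip:port, different start code, so the keys do not collide.
        ServerIdentity restarted = new ServerIdentity("10.0.0.5", 60020, 1233000000000L);
        System.out.println("restarted server can take work now: "
            + canAcceptWorkImmediately(leases, restarted));
    }
}
```

With an ip:port-only key, the restarted server would collide with the pending lease of its previous incarnation, which is the situation described above.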

Additionally, we wait in a master server thread for the server to leave the dead servers list, because dead servers are not identified by their start code either. Waiting in a master server thread ties up that thread (possibly for quite some time). Rather than waiting, we should throw an exception, since the region server already knows how to deal with an exception thrown from a regionServerStartup call.
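
The fail-fast approach suggested above could be sketched roughly as follows. This is only a simplified model under stated assumptions: the exception type ServerRecoveringException, the Master class, and the retry helper are hypothetical names, and a real region server would sleep between attempts rather than run a callback.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StartupRejectSketch {

    /** Hypothetical exception type; the real code may use a different one. */
    static class ServerRecoveringException extends IOException {
        ServerRecoveringException(String message) {
            super(message);
        }
    }

    /** Master side: reject immediately instead of blocking a handler thread. */
    static class Master {
        private final Set<String> deadServers = ConcurrentHashMap.newKeySet();

        void markDead(String address) {
            deadServers.add(address);
        }

        void recoveryComplete(String address) {
            deadServers.remove(address);
        }

        void regionServerStartup(String address) throws ServerRecoveringException {
            if (deadServers.contains(address)) {
                throw new ServerRecoveringException(address + " is still being recovered");
            }
            // Otherwise the server would be registered here.
        }
    }

    /** Region server side: retry on rejection. Returns attempts used, or -1 if it gave up. */
    static int reportForDuty(Master master, String address, int maxAttempts,
                             Runnable betweenAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                master.regionServerStartup(address);
                return attempt;
            } catch (ServerRecoveringException e) {
                betweenAttempts.run(); // stand-in for sleeping before the next attempt
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Master master = new Master();
        String address = "10.0.0.5:60020";
        master.markDead(address);
        // Simulate log recovery finishing while the region server backs off.
        int attempts = reportForDuty(master, address, 3,
            () -> master.recoveryComplete(address));
        System.out.println("accepted after attempts: " + attempts);
    }
}
```

The point of the design is that the waiting moves to the caller, so no master thread is held while the dead server entry is cleared.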

Finally, there is a bit of code cleanup that needs to be done in the region server when it receives a MSG_CALL_SERVER_STARTUP response from the master. It should not set up the HLog until reportForDuty completes successfully (which is what it does on the initial reportForDuty call).
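
A minimal sketch of the ordering described above, assuming both the initial startup and the MSG_CALL_SERVER_STARTUP handler go through one shared method. The RegionServer class here is a toy stand-in that only records events; its method bodies and the masterAccepts flag are assumptions for illustration, not the real implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupOrderSketch {

    /** Toy region server that records side effects so the ordering can be inspected. */
    static class RegionServer {
        final List<String> events = new ArrayList<>();
        private final boolean masterAccepts;

        RegionServer(boolean masterAccepts) {
            this.masterAccepts = masterAccepts;
        }

        private boolean reportForDuty() {
            events.add("reportForDuty");
            return masterAccepts;
        }

        private void setupHLog() {
            events.add("setupHLog");
        }

        /** Shared by the initial startup and the MSG_CALL_SERVER_STARTUP handler. */
        boolean startup() {
            if (!reportForDuty()) {
                return false; // no log is created for a rejected startup
            }
            setupHLog();
            return true;
        }
    }

    public static void main(String[] args) {
        RegionServer accepted = new RegionServer(true);
        accepted.startup();
        System.out.println("accepted: " + accepted.events);

        RegionServer rejected = new RegionServer(false);
        rejected.startup();
        System.out.println("rejected: " + rejected.events);
    }
}
```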

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Work started: (HBASE-1156) Improve lease handling

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-1156 started by Jim Kellerman.


[jira] Resolved: (HBASE-1156) Improve lease handling

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-1156.
----------------------------------

    Resolution: Fixed

Fixed as a part of HBASE-1157


[jira] Updated: (HBASE-1156) Improve lease handling

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1156:
---------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: HBASE-1157


[jira] Commented: (HBASE-1156) Improve lease handling

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667567#action_12667567 ] 

Jim Kellerman commented on HBASE-1156:
--------------------------------------

It turns out that there is currently a good reason for not allowing a restarted server to rejoin the cluster until ProcessServerShutdown is complete: we don't check the start code, so any server instance with the same ip:port pair will match. Ugh! Because we don't check the start code, we cannot allow a server to start serving regions until ProcessServerShutdown is complete. If we checked the start code, we would know whether a region was on the dead server, and we could reassign it. Otherwise, we might end up reassigning regions being served by the new instance, resulting in multiple servers serving the same region, possibly before the log has been recovered.
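
To make the hazard above concrete, here is a small sketch comparing the two matching rules. It is an illustration only: the Incarnation class, the method names, and the region and address values are hypothetical, and the real shutdown processing works against catalog data rather than an in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReassignSketch {

    /** One run of a region server process: its address plus its start code. */
    static final class Incarnation {
        final String address;
        final long startCode;

        Incarnation(String address, long startCode) {
            this.address = address;
            this.startCode = startCode;
        }
    }

    /** Address-only match: cannot tell the dead instance from a restarted one. */
    static List<String> matchByAddress(Map<String, Incarnation> assignments,
                                       String deadAddress) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Incarnation> e : assignments.entrySet()) {
            if (e.getValue().address.equals(deadAddress)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Address plus start code: selects only regions held by the dead instance. */
    static List<String> matchByStartCode(Map<String, Incarnation> assignments,
                                         Incarnation dead) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Incarnation> e : assignments.entrySet()) {
            Incarnation server = e.getValue();
            if (server.address.equals(dead.address) && server.startCode == dead.startCode) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Incarnation dead = new Incarnation("10.0.0.5:60020", 1L);
        Incarnation restarted = new Incarnation("10.0.0.5:60020", 2L);

        Map<String, Incarnation> assignments = new LinkedHashMap<>();
        assignments.put("regionA", dead);       // still recorded against the dead instance
        assignments.put("regionB", restarted);  // already served by the new instance
        assignments.put("regionC", new Incarnation("10.0.0.6:60020", 7L));

        System.out.println("by address:    " + matchByAddress(assignments, dead.address));
        System.out.println("by start code: " + matchByStartCode(assignments, dead));
    }
}
```

The address-only rule also selects the region already served by the restarted instance, which is how a region could end up assigned to more than one server.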


[jira] Commented: (HBASE-1156) Improve lease handling

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667568#action_12667568 ] 

Jim Kellerman commented on HBASE-1156:
--------------------------------------

If we included the start code in the check, then the restarted server could start serving regions immediately without its regions being detected as having been hosted by the dead server.
