Posted to issues@hbase.apache.org by "Xiaolin Ha (JIRA)" <ji...@apache.org> on 2019/06/28 09:09:00 UTC

[jira] [Comment Edited] (HBASE-20368) Fix RIT stuck when a rsgroup has no online servers but AM's pendingAssginQueue is cleared

    [ https://issues.apache.org/jira/browse/HBASE-20368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874786#comment-16874786 ] 

Xiaolin Ha edited comment on HBASE-20368 at 6/28/19 9:08 AM:
-------------------------------------------------------------

[~zghaobac], thanks for your question.
{quote}Why does this patch work, and where did the old implementation get stuck?
{quote}
The balancer skipped the region assignments when the group had no online regionservers.

The AM did not process these regions again either, so the RITs were left in place and got stuck.

With this patch, regions are assigned to the BOGUS server when the group has no online regionservers, and the AM checks the assignment plans: if a plan targets BOGUS, it adds the region back to the pending assign queue.

This is equivalent to retrying the assignment until the group has online regionservers.
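
The retry idea described above can be illustrated with a minimal, self-contained sketch. It is not the code in the patch: the class, the method names, the BOGUS_SERVER string and the use of plain strings for regions and servers are all hypothetical stand-ins chosen for illustration, and the round-robin placement is only a placeholder for whatever the real balancer does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "assign to BOGUS, then re-queue"; not HBase's real API.
final class BogusRetrySketch {
    // Assumed placeholder standing in for the BOGUS server mentioned in the comment.
    static final String BOGUS_SERVER = "bogus-server";

    // Balancer side: map every region to an online server of its group,
    // or to BOGUS_SERVER when the group has no online servers.
    static Map<String, String> planAssignments(List<String> regions,
                                               List<String> onlineGroupServers) {
        Map<String, String> plan = new HashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            String target = onlineGroupServers.isEmpty()
                ? BOGUS_SERVER
                : onlineGroupServers.get(i % onlineGroupServers.size()); // simple round-robin
            plan.put(regions.get(i), target);
        }
        return plan;
    }

    // AM side: one pass over the pending queue. Regions planned onto a real
    // server are returned as assigned; regions planned onto BOGUS_SERVER are
    // put back into the queue so that a later pass retries them.
    static Map<String, String> processPending(Deque<String> pendingQueue,
                                              List<String> onlineGroupServers) {
        List<String> batch = new ArrayList<>(pendingQueue);
        pendingQueue.clear();
        Map<String, String> plan = planAssignments(batch, onlineGroupServers);
        Map<String, String> assigned = new LinkedHashMap<>();
        for (String region : batch) {
            String target = plan.get(region);
            if (BOGUS_SERVER.equals(target)) {
                pendingQueue.add(region); // no online server yet: keep for retry
            } else {
                assigned.put(region, target);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>(Arrays.asList("region-a", "region-b"));
        // First pass: the group has no online servers, so nothing is assigned.
        System.out.println(processPending(pending, Collections.<String>emptyList()));
        System.out.println(pending);
        // Second pass: one server of the group is back online.
        System.out.println(processPending(pending, Arrays.asList("rs1")));
        System.out.println(pending);
    }
}
```

In this sketch, a pass that finds no online servers leaves the queue contents intact instead of dropping them, which is the property the old behaviour lacked.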

 

 



> Fix RIT stuck when a rsgroup has no online servers but AM's pendingAssginQueue is cleared
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-20368
>                 URL: https://issues.apache.org/jira/browse/HBASE-20368
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: HBASE-20368.branch-2.001.patch, HBASE-20368.branch-2.002.patch, HBASE-20368.branch-2.003.patch, HBASE-20368.branch-2.1.001.patch
>
>
> This error can be reproduced by shutting down all servers in an rsgroup and starting them again soon afterwards. 
> The regions in this rsgroup will be reassigned, but there are no available servers in this rsgroup.
> They will be added to AM's pendingAssginQueue, which AM clears regardless of whether the assignment succeeded in this case.


