Posted to dev@accumulo.apache.org by "Keith Turner (Created) (JIRA)" <ji...@apache.org> on 2012/02/13 17:52:59 UTC

[jira] [Created] (ACCUMULO-393) Master not balancing after agitation

Master not balancing after agitation
------------------------------------

                 Key: ACCUMULO-393
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-393
             Project: Accumulo
          Issue Type: Bug
            Reporter: Keith Turner
            Assignee: Eric Newton
             Fix For: 1.4.0


Ran continuous ingest with agitation for 14 hours.  After this the tablets were left in an unbalanced state.  Saw the following in the master logs.

The master sees a new tablet server, xxx.xxx.xxx.12:9997[235396fb181e0c6]:

{noformat}
12 07:47:19,370 [master.Master] INFO : New servers: [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 07:50:27,199 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[135396fb18ee67f], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.13:9997[135396fb18ee715], xxx.xxx.xxx.9:9997[3353986642be24e], xxx.xxx.xxx.6:9997[135396fb18ee6ca], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee6cb], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
{noformat}

The tablet server dies:

{noformat}
12 07:57:44,868 [master.Master] DEBUG: Normal Tablets assigning tablet 6;06e04e;06c056=xxx.xxx.xxx.12:9997[235396fb181e0c6]
12 08:05:30,984 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[235396fb181e109], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.9:9997[3353986642be300], xxx.xxx.xxx.6:9997[235396fb181e107], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6], xxx.xxx.xxx.13:9997[235396fb181e108]]
12 08:05:56,044 [master.Master] WARN : Lost servers [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:58,718 [master.Master] ERROR: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6] null
12 08:05:58,718 [master.Master] DEBUG: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6]
12 08:05:58,721 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:58,728 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:59,065 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:59,641 [master.Master] DEBUG: 1 assigned to dead servers: [6;3d40b2;3d20b1@(null,xxx.xxx.xxx.12:9997[235396fb181e0c6],null)]...
12 08:05:59,715 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
{noformat}

Another tablet server instance starts on xxx.xxx.xxx.12:

{noformat}
12 08:07:35,245 [master.Master] INFO : New servers: [xxx.xxx.xxx.7:9997[3353986642be345], xxx.xxx.xxx.12:9997[235396fb181e15c]]
{noformat}


Much later, it's still not balancing for some reason.

{noformat}
13 16:31:24,131 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (ACCUMULO-393) Master not balancing after agitation

Posted by "Eric Newton (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-393.
----------------------------------

    Resolution: Fixed

Needed to shrink the list of badServers to the list of current servers.
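
A minimal sketch of the idea, not the actual Accumulo change: only the name badServers comes from this issue. The class, the method names, the map type, and the rule that balancing waits until badServers is empty are assumptions made for illustration. Server instances are modeled as plain strings in the same host:port[session] form the master logs print.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BadServerPruning {

    // Assumption: the master skips balancing while any server is still
    // flagged as bad (its status could not be fetched).
    static boolean canBalance(Map<String, Integer> badServers) {
        return badServers.isEmpty();
    }

    // Drop every bad-server entry whose instance is no longer a current
    // server. Without this step, an entry for a dead instance is never
    // cleared, because the dead instance never reports a good status again.
    static void pruneToCurrent(Map<String, Integer> badServers, Set<String> currentServers) {
        // keySet() is a live view, so retainAll removes entries from the map.
        badServers.keySet().retainAll(currentServers);
    }

    public static void main(String[] args) {
        // Hypothetical bookkeeping: server instance -> failed status checks.
        Map<String, Integer> badServers = new HashMap<>();
        badServers.put("xxx.xxx.xxx.12:9997[235396fb181e0c6]", 1);

        // The restarted tablet server on the same host has a new session id.
        Set<String> currentServers = new HashSet<>();
        currentServers.add("xxx.xxx.xxx.7:9997[3353986642be345]");
        currentServers.add("xxx.xxx.xxx.12:9997[235396fb181e15c]");

        System.out.println("can balance before pruning: " + canBalance(badServers));
        pruneToCurrent(badServers, currentServers);
        System.out.println("can balance after pruning: " + canBalance(badServers));
    }
}
```

In this model the restarted server on xxx.xxx.xxx.12 does not clear the stale entry by itself, since its session id differs from that of the dead instance. That would match the "not balancing" message still naming [235396fb181e0c6] more than a day later.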
                
> Master not balancing after agitation
> ------------------------------------
>
>                 Key: ACCUMULO-393
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-393
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
