Posted to dev@accumulo.apache.org by "Keith Turner (Created) (JIRA)" <ji...@apache.org> on 2012/02/03 20:43:54 UTC

[jira] [Created] (ACCUMULO-368) tablet had location but was not loaded

tablet had location but was not loaded
--------------------------------------

                 Key: ACCUMULO-368
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-368
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
         Environment: Running random walk test against 1.4.0-SNAP on 10 node cluster
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.4.0


While running the random walk test, a delete range operation hung because it could not split a tablet.  The tablet in question failed to load because the tablet server thought it was already serving it.

{noformat}
03 11:19:18,249 [tabletserver.Tablet] TABLET_HIST: 3nq;77cd1e415c4547a4< split 3nq;133660072804a502< 3nq;77cd1e415c4547a4;133660072804a502
03 11:19:18,249 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< opened 
03 11:19:26,236 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< import /b-0005t8f/I0005t8g.rf 388308 0
03 11:19:45,672 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< MinC [memory] -> /t-0005typ/F0005tz4.rf
03 11:19:45,686 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< closed
03 11:19:45,840 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< opened
03 11:19:45,987 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< closed
03 11:19:46,142 [tabletserver.TabletServer] INFO : Loading tablet 3nq;133660072804a502<
03 11:19:46,144 [tabletserver.TabletServer] ERROR: Tablet seems to be already assigned to xxx.xxx.xxx.9:9997[135396fb18d3fb0]
03 11:19:46,144 [tabletserver.TabletServer] INFO : Reporting tablet 3nq;133660072804a502< assignment failure: unable to verify Tablet Information
{noformat}

Looking at the walogs below, it seems that the data mutations for the last successful open and close were written in reverse order: the loc delete (timestamp 960326) comes before the loc write (timestamp 960332).

{noformat}
1 mutations:
  3nq;133660072804a502
      ~tab:~pr [system]:959756 [] ^@
      srv:dir [system]:959756 [] /t-0005typ
      srv:time [system]:959756 [] M1328267935757
      loc:135396fb18d3fb0 [system]:959756 [] xxx.xxx.xxx.9:9997
      future:135396fb18d3fb0 [system]:959756 [] <deleted>
      srv:lock [system]:959756 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0

MUTATION 6462 5
1 mutations:
  3nq;133660072804a502
      file:/b-0005t8f/I0005t8g.rf [system]:959986 [] 388308,0
      loaded:/b-0005t8f/I0005t8g.rf [system]:959986 [] 1681970597222144296
      srv:time [system]:959986 [] M1328267935757
      srv:lock [system]:959986 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0

MUTATION 6462 5
1 mutations:
  3nq;133660072804a502
      file:/t-0005typ/F0005tz4.rf [system]:960298 [] 185156,44330
      srv:time [system]:960298 [] M1328267963158
      last:135396fb18d3fb0 [system]:960298 [] xxx.xxx.xxx.9:9997
      log:xxx.xxx.xxx.12:11224/cad1617c-5fb2-4057-abec-8edd46d0cf7a [system]:960298 [] <deleted>
      log:xxx.xxx.xxx.5:11224/50611604-8e6c-48a8-8e16-eb739a991721 [system]:960298 [] <deleted>
      srv:flush [system]:960298 [] 0
      srv:lock [system]:960298 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0

MANY_MUTATIONS 6462 5
1 mutations:
  3nq;133660072804a502
      loc:135396fb18d3fb0 [system]:960302 [] <deleted>

MANY_MUTATIONS 6462 5
1 mutations:
  3nq;133660072804a502
      future:135396fb18d3fb0 [system]:960321 [] xxx.xxx.xxx.9:9997

MANY_MUTATIONS 6462 5
1 mutations:
  3nq;133660072804a502
      loc:135396fb18d3fb0 [system]:960326 [] <deleted>

MANY_MUTATIONS 6462 5
1 mutations:
  3nq;133660072804a502
      loc:135396fb18d3fb0 [system]:960332 [] xxx.xxx.xxx.9:9997
      future:135396fb18d3fb0 [system]:960332 [] <deleted>
{noformat}
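
To make the ordering problem concrete, here is a minimal Java sketch that replays the loc and future updates from the last four mutations above in timestamp order. It is only an illustration, not Accumulo code: the class and method names (LocationReplay, Update, replay, hypotheticalOrder) are made up for this example, and the "hypothetical" sequence simply swaps the last two timestamps to show what the open-then-close order would have produced.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocationReplay {
    /** One column update from the dump; a null value stands for <deleted>. */
    static final class Update {
        final long timestamp;
        final String column;
        final String value;

        Update(long timestamp, String column, String value) {
            this.timestamp = timestamp;
            this.column = column;
            this.value = value;
        }
    }

    static final String SESSION = "135396fb18d3fb0";
    static final String SERVER = "xxx.xxx.xxx.9:9997";

    /** Applies the updates in timestamp order and returns the columns that survive. */
    static Map<String, String> replay(List<Update> updates) {
        List<Update> ordered = new ArrayList<>(updates);
        // List.sort is stable, so updates sharing a timestamp keep their listed order.
        ordered.sort(Comparator.comparingLong((Update u) -> u.timestamp));
        Map<String, String> state = new TreeMap<>();
        for (Update u : ordered) {
            if (u.value == null) {
                state.remove(u.column);
            } else {
                state.put(u.column, u.value);
            }
        }
        return state;
    }

    /** The loc/future updates as they appear in the last four mutations of the dump. */
    static List<Update> observed() {
        List<Update> updates = new ArrayList<>();
        updates.add(new Update(960302L, "loc:" + SESSION, null));
        updates.add(new Update(960321L, "future:" + SESSION, SERVER));
        updates.add(new Update(960326L, "loc:" + SESSION, null));
        updates.add(new Update(960332L, "loc:" + SESSION, SERVER));
        updates.add(new Update(960332L, "future:" + SESSION, null));
        return updates;
    }

    /** Hypothetical: the same updates with the open written before the close. */
    static List<Update> hypotheticalOrder() {
        List<Update> updates = new ArrayList<>();
        updates.add(new Update(960302L, "loc:" + SESSION, null));
        updates.add(new Update(960321L, "future:" + SESSION, SERVER));
        updates.add(new Update(960326L, "loc:" + SESSION, SERVER));
        updates.add(new Update(960326L, "future:" + SESSION, null));
        updates.add(new Update(960332L, "loc:" + SESSION, null));
        return updates;
    }

    public static void main(String[] args) {
        System.out.println("observed:     " + replay(observed()));
        System.out.println("hypothetical: " + replay(hypotheticalOrder()));
    }
}
```

Replaying the observed order leaves the loc entry in place even though the tablet was closed, which matches the "already assigned" error in the tablet server log. The hypothetical order leaves neither loc nor future behind.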

Looking at the tablet server code, a tablet is put in online tablets and then its location is written to the metadata table.  Because the tablet is already in online tablets at that point, it can be unloaded.  I think that is what happened here: in the short period of time between putting the tablet in online tablets and writing the location to the metadata table, the tablet was unloaded, so the unload's location delete was written before the load's location write.
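
The suspected interleaving can be reproduced deterministically with a small Java sketch. This is a simplified model under stated assumptions, not the tablet server code: LoadUnloadRace, loadRacy, loadOrdered, and the betweenSteps hook are all hypothetical names, the hook stands in for a concurrent unload request, and the reordered variant is just one possible way to close the window, not necessarily how the issue was fixed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LoadUnloadRace {
    private final Set<String> onlineTablets = new HashSet<>();
    private final Map<String, String> metadataLocation = new HashMap<>();
    private final String server;

    LoadUnloadRace(String server) {
        this.server = server;
    }

    /** Order described in the report: publish the tablet first, write its location second. */
    void loadRacy(String tablet, Runnable betweenSteps) {
        onlineTablets.add(tablet);
        betweenSteps.run(); // stand-in for whatever runs concurrently in the window
        metadataLocation.put(tablet, server);
    }

    /** Alternative order (an assumption): write the location first, publish second. */
    void loadOrdered(String tablet, Runnable betweenSteps) {
        metadataLocation.put(tablet, server);
        betweenSteps.run();
        onlineTablets.add(tablet);
    }

    /** Unloads only tablets that are currently online, deleting their location. */
    void unload(String tablet) {
        if (onlineTablets.remove(tablet)) {
            metadataLocation.remove(tablet);
        }
    }

    boolean isOnline(String tablet) {
        return onlineTablets.contains(tablet);
    }

    boolean hasLocation(String tablet) {
        return metadataLocation.containsKey(tablet);
    }

    public static void main(String[] args) {
        String tablet = "3nq;133660072804a502<";

        LoadUnloadRace racy = new LoadUnloadRace("xxx.xxx.xxx.9:9997");
        racy.loadRacy(tablet, () -> racy.unload(tablet));
        System.out.println("racy:    online=" + racy.isOnline(tablet)
                + " hasLocation=" + racy.hasLocation(tablet));

        LoadUnloadRace ordered = new LoadUnloadRace("xxx.xxx.xxx.9:9997");
        ordered.loadOrdered(tablet, () -> ordered.unload(tablet));
        System.out.println("ordered: online=" + ordered.isOnline(tablet)
                + " hasLocation=" + ordered.hasLocation(tablet));
    }
}
```

In the racy variant the unload runs while the tablet is online but has no location yet, so the tablet ends up with a location and not loaded. In the reordered variant the unload finds nothing online and does nothing, so the online set and the metadata stay consistent. Real code would also need locking or a tablet state check, which this model leaves out.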

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ACCUMULO-368) tablet had location but was not loaded

Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-368:
----------------------------------

    Affects Version/s: 1.3.5
    


[jira] [Resolved] (ACCUMULO-368) tablet had location but was not loaded

Posted by "Keith Turner (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner resolved ACCUMULO-368.
-----------------------------------

    Resolution: Fixed
    
