Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2013/08/10 00:26:48 UTC

[jira] [Resolved] (ACCUMULO-1651) GC removed WAL that master wasn't done with

     [ https://issues.apache.org/jira/browse/ACCUMULO-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-1651.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.0
         Assignee: Eric Newton  (was: Michael Berman)
    
> GC removed WAL that master wasn't done with
> -------------------------------------------
>
>                 Key: ACCUMULO-1651
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1651
>             Project: Accumulo
>          Issue Type: Bug
>          Components: gc, master
>            Reporter: Michael Berman
>            Assignee: Eric Newton
>             Fix For: 1.6.0
>
>
> I have a master that's spinning trying to recover a walog that doesn't exist in HDFS.  It looks like the GC cleaned it up.  I was stopping and starting my cluster throughout this period, and for at least a few minutes every service except the GC was talking SSL, so the GC couldn't receive Thrift messages from other services.  However, [~vines] says this shouldn't affect the GC's deletion behavior.
> Here are some relevant logs.  Note that the master thinks its logSet includes that file straight through the time the GC removed it.
> GC:
> {code}
> 2013-08-09 11:58:14,835 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
> 2013-08-09 11:58:14,852 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
> 2013-08-09 12:03:15,467 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
> {code}
> Master:
> {code}
> 2013-08-09 11:57:45,235 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,238 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,286 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,324 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,939 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,942 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:45,975 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:55,612 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:55,679 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:55,739 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:55,764 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:55,784 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:56,031 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:57:56,046 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:58:56,051 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 11:59:56,057 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:00:56,062 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:01:56,066 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:02:56,071 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:08:56,103 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:09:56,108 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:10:56,113 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:11:56,118 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:13:19,883 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:14:19,887 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> <master was restarted here>
> 2013-08-09 12:15:44,459 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:15:44,467 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
> 2013-08-09 12:15:44,472 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 10s) created for localhost+9997, tablet !!R<< holds a reference
> 2013-08-09 12:15:54,479 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
> 2013-08-09 12:16:44,487 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
> 2013-08-09 12:16:44,488 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
> 2013-08-09 12:16:44,490 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 20s) created for localhost+9997, tablet !!R<< holds a reference
> 2013-08-09 12:17:04,494 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
> <repeating ad infinitum>
> {code}
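
The timeline in the quoted logs can be cross-checked mechanically. What follows is a minimal sketch, not part of Accumulo and not the fix that resolved this issue: it assumes the GC and master log lines have the exact shape shown above (with the leading "> " quote marker stripped), and every name in it (still_referenced_after_removal, GC_LINES, MASTER_LINES, the two regular expressions) is hypothetical. It reports each WAL that the GC logged as removed while a later master "root tablet logSet" line still listed it.

```python
import re
from datetime import datetime

# Timestamp layout used by the log lines quoted in this issue.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Patterns are assumptions derived only from the log lines quoted above.
REMOVED = re.compile(
    r"^(\S+ \S+) \[gc\.GarbageCollectWriteAheadLogs\] DEBUG: "
    r"Removing WAL for offline server (\S+)"
)
LOGSET = re.compile(
    r"^(\S+ \S+) \[state\.ZooTabletStateStore\] DEBUG: "
    r"root tablet logSet \[([^\]]*)\]"
)


def parse_ts(text):
    """Parse a timestamp such as '2013-08-09 11:58:14,852'."""
    return datetime.strptime(text, TS_FORMAT)


def still_referenced_after_removal(gc_lines, master_lines):
    """Return (wal, removed_at, seen_at) for every master logSet line that
    still lists a WAL after the GC logged its removal."""
    removed = {}
    for line in gc_lines:
        m = REMOVED.match(line)
        if m:
            # Keep the first removal time seen for each WAL path.
            removed.setdefault(m.group(2), parse_ts(m.group(1)))

    hits = []
    for line in master_lines:
        m = LOGSET.match(line)
        if not m:
            continue
        seen_at = parse_ts(m.group(1))
        for wal, removed_at in removed.items():
            if seen_at > removed_at and wal in m.group(2):
                hits.append((wal, removed_at, seen_at))
    return hits


# Sample input copied from the logs quoted in this issue.
WAL = ("hdfs://localhost:54310/otherAccumuloInstance/wal/"
       "localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7")
GC_LINES = [
    f"2013-08-09 11:58:14,835 [util.MetadataTableUtil] INFO : "
    f"Returning logs [!!R<< {WAL} (1)] for extent !!R<<",
    f"2013-08-09 11:58:14,852 [gc.GarbageCollectWriteAheadLogs] DEBUG: "
    f"Removing WAL for offline server {WAL}",
]
MASTER_LINES = [
    f"2013-08-09 11:57:56,046 [state.ZooTabletStateStore] DEBUG: "
    f"root tablet logSet [localhost+9997/{WAL}]",
    f"2013-08-09 11:58:56,051 [state.ZooTabletStateStore] DEBUG: "
    f"root tablet logSet [localhost+9997/{WAL}]",
]

if __name__ == "__main__":
    for wal, removed_at, seen_at in still_referenced_after_removal(
            GC_LINES, MASTER_LINES):
        print(f"{wal}\n  removed by GC at {removed_at}\n"
              f"  still in master logSet at {seen_at}")
```

With the sample lines above, only the master entry logged after the GC's removal is reported; the earlier entry is ignored because it predates the deletion. A substring match is used on the logSet contents so the sketch makes no assumption about how multiple entries would be separated.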

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira