Posted to dev@accumulo.apache.org by "Keith Turner (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 19:54:59 UTC

[jira] [Commented] (ACCUMULO-366) master killed a tablet server

    [ https://issues.apache.org/jira/browse/ACCUMULO-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202625#comment-13202625 ] 

Keith Turner commented on ACCUMULO-366:
---------------------------------------

Saw this bug again. A minor compaction was attempted after the tablet was closed. Looking at the code, the initiateMinorCompaction function tries to get the flush id from zookeeper even if the tablet is closed, which fails with the NoNodeException shown below. Calling initiateMinorCompaction on a closed tablet should do nothing.

{noformat}
07 08:10:38,419 [tabletserver.Tablet] TABLET_HIST: f5i;10a579089cf842a0< closed

07 08:15:38,426 [tabletserver.LargestFirstMemoryManager] DEBUG: IDLE minor compaction chosen
07 08:15:38,427 [tabletserver.LargestFirstMemoryManager] DEBUG: COMPACTING f5i;10a579089cf842a0<  total = 32,091,937 ingestMemory = 32,091,937
07 08:15:38,427 [tabletserver.LargestFirstMemoryManager] DEBUG: chosenMem = 99,252 chosenIT = 300.01 load 125,050
07 08:15:38,427 [tabletserver.TabletServerResourceManager] ERROR: Minor compactions for memory managment failed
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/fbebb086-960c-4a97-b502-154fc333d766/tables/f5i/flush-id
        at org.apache.accumulo.server.tabletserver.Tablet.getFlushID(Tablet.java:2349)
        at org.apache.accumulo.server.tabletserver.Tablet.initiateMinorCompaction(Tablet.java:2287)
        at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework.manageMemory(TabletServerResourceManager.java:328)
        at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework.access$1(TabletServerResourceManager.java:303)
        at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework$2.run(TabletServerResourceManager.java:252)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/fbebb086-960c-4a97-b502-154fc333d766/tables/f5i/flush-id
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
        at org.apache.accumulo.core.zookeeper.ZooReader.getData(ZooReader.java:42)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.server.zookeeper.ZooReaderWriter$1.invoke(ZooReaderWriter.java:169)
        at $Proxy4.getData(Unknown Source)
        at org.apache.accumulo.server.tabletserver.Tablet.getFlushID(Tablet.java:2347)
        ... 6 more

{noformat}
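
A rough sketch of the guard described above, not the actual Tablet code: TabletSketch, flushIdSource, and the boolean return value are all hypothetical names made up for illustration, and the real class certainly handles locking and state differently. The only point is that the closed check comes before the flush id lookup, so a closed tablet never touches zookeeper.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a tablet; NOT org.apache.accumulo.server.tabletserver.Tablet. */
class TabletSketch {
    private boolean closed = false;
    private long lastFlushId = -1;
    private int compactionsStarted = 0;

    // Stands in for the zookeeper read of the table's flush-id node (assumption).
    private final LongSupplier flushIdSource;

    TabletSketch(LongSupplier flushIdSource) {
        this.flushIdSource = flushIdSource;
    }

    synchronized void close() {
        closed = true;
    }

    /** Returns true if a minor compaction was started, false if the call was ignored. */
    synchronized boolean initiateMinorCompaction() {
        if (closed) {
            // Closed tablet: do nothing and skip the flush id lookup entirely.
            return false;
        }
        // Only an open tablet reaches the lookup that can fail when the node is missing.
        lastFlushId = flushIdSource.getAsLong();
        compactionsStarted++;
        return true;
    }

    synchronized int getCompactionsStarted() {
        return compactionsStarted;
    }

    synchronized long getLastFlushId() {
        return lastFlushId;
    }
}
```

With a guard like this, the memory manager could still pick a closed tablet for an IDLE compaction, but the request would return without raising the RuntimeException seen in the log.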
                
> master killed a tablet server
> -----------------------------
>
>                 Key: ACCUMULO-366
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-366
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.4.0
>         Environment: randomwalk test on a 10 node test cluster
>            Reporter: Eric Newton
>            Assignee: Keith Turner
>
> Master killed a tablet server for having long hold times.
> The tablet server had this error during minor compaction:
> {noformat}
> 01 23:57:20,073 [security.ZKAuthenticator] ERROR: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/88cd0f63-a36a-4218-86b1-9ba1d2cccf08/users/user004
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/88cd0f63-a36a-4218-86b1-9ba1d2cccf08/users/user004
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
>         at org.apache.accumulo.core.zookeeper.ZooUtil.recursiveDelete(ZooUtil.java:103)
>         at org.apache.accumulo.core.zookeeper.ZooUtil.recursiveDelete(ZooUtil.java:117)
>         at org.apache.accumulo.server.zookeeper.ZooReaderWriter.recursiveDelete(ZooReaderWriter.java:67)
>         at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.accumulo.server.zookeeper.ZooReaderWriter$1.invoke(ZooReaderWriter.java:169)
>         at $Proxy4.recursiveDelete(Unknown Source)
>         at org.apache.accumulo.server.security.ZKAuthenticator.dropUser(ZKAuthenticator.java:252)
>         at org.apache.accumulo.server.security.Auditor.dropUser(Auditor.java:104)
>         at org.apache.accumulo.server.client.ClientServiceHandler.dropUser(ClientServiceHandler.java:136)
>         at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:58)
>         at $Proxy2.dropUser(Unknown Source)
>         at org.apache.accumulo.core.client.impl.thrift.ClientService$Processor$dropUser.process(ClientService.java:2257)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2037)
>         at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:151)
>         at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
>         at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:199)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> This tablet was the result of a split that occurred during a delete.  The master missed this tablet when taking tablets offline.
> We need to do a consistency check on the offline tablets before deleting the table information in zookeeper.
