Posted to dev@accumulo.apache.org by "Eric Newton (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 20:16:55 UTC
[jira] [Created] (ACCUMULO-366) master killed a tablet server
master killed a tablet server
-----------------------------
Key: ACCUMULO-366
URL: https://issues.apache.org/jira/browse/ACCUMULO-366
Project: Accumulo
Issue Type: Bug
Components: master
Affects Versions: 1.4.0
Environment: randomwalk test on a 10 node test cluster
Reporter: Eric Newton
Assignee: Keith Turner
Master killed a tablet server for having long hold times.
The tablet server had this error during minor compaction:
{noformat}
01 23:57:20,073 [security.ZKAuthenticator] ERROR: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/88cd0f63-a36a-4218-86b1-9ba1d2cccf08/users/user004
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/88cd0f63-a36a-4218-86b1-9ba1d2cccf08/users/user004
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
at org.apache.accumulo.core.zookeeper.ZooUtil.recursiveDelete(ZooUtil.java:103)
at org.apache.accumulo.core.zookeeper.ZooUtil.recursiveDelete(ZooUtil.java:117)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter.recursiveDelete(ZooReaderWriter.java:67)
at sun.reflect.GeneratedMethodAccessor53.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter$1.invoke(ZooReaderWriter.java:169)
at $Proxy4.recursiveDelete(Unknown Source)
at org.apache.accumulo.server.security.ZKAuthenticator.dropUser(ZKAuthenticator.java:252)
at org.apache.accumulo.server.security.Auditor.dropUser(Auditor.java:104)
at org.apache.accumulo.server.client.ClientServiceHandler.dropUser(ClientServiceHandler.java:136)
at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:58)
at $Proxy2.dropUser(Unknown Source)
at org.apache.accumulo.core.client.impl.thrift.ClientService$Processor$dropUser.process(ClientService.java:2257)
at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2037)
at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:151)
at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:199)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:662)
{noformat}
This tablet was the result of a split that occurred during a delete. The master missed this tablet when taking tablets offline.
We need to do a consistency check on the offline tablets before deleting the table information in ZooKeeper.
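Below is a minimal sketch of the proposed consistency check. It assumes the master re-reads every tablet recorded for the table after the table is marked for deletion, so a tablet created by a concurrent split is included, and that it can see each tablet's current location. The `TabletInfo` type, its fields, and `safeToDeleteTableInfo` are hypothetical illustrations, not Accumulo APIs. The idea is only this: refuse to remove the table's ZooKeeper entries while any tablet is still hosted.

```java
import java.util.ArrayList;
import java.util.List;

public class OfflineConsistencyCheck {

    // Hypothetical view of one tablet's metadata entry: an identifier and the
    // tablet server currently hosting it (null when the tablet is offline).
    static final class TabletInfo {
        final String extent;
        final String currentLocation;

        TabletInfo(String extent, String currentLocation) {
            this.extent = extent;
            this.currentLocation = currentLocation;
        }
    }

    // Returns the tablets that are still hosted somewhere. An empty result
    // means every tablet of the table is offline.
    static List<TabletInfo> stillOnline(List<TabletInfo> tablets) {
        List<TabletInfo> online = new ArrayList<>();
        for (TabletInfo t : tablets) {
            if (t.currentLocation != null) {
                online.add(t);
            }
        }
        return online;
    }

    // The table's ZooKeeper entries may be deleted only once no tablet is hosted.
    static boolean safeToDeleteTableInfo(List<TabletInfo> tablets) {
        return stillOnline(tablets).isEmpty();
    }

    public static void main(String[] args) {
        List<TabletInfo> tablets = new ArrayList<>();
        tablets.add(new TabletInfo("tablet-a", null));
        // Illustrative: a tablet produced by a split during the delete, still hosted.
        tablets.add(new TabletInfo("tablet-b", "tserver-1"));

        System.out.println(safeToDeleteTableInfo(tablets)); // false
    }
}
```

If the check fails, the master would keep waiting for the remaining tablets to unload, as opposed to deleting the table information out from under a tablet server that still hosts one of them.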
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (ACCUMULO-366) master killed a tablet server
Posted by "Keith Turner (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner updated ACCUMULO-366:
----------------------------------
Affects Version/s: (was: 1.4.0)
Fix Version/s: 1.4.0
[jira] [Commented] (ACCUMULO-366) master killed a tablet server
Posted by "Keith Turner (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202625#comment-13202625 ]
Keith Turner commented on ACCUMULO-366:
---------------------------------------
Saw this bug again. A minor compaction was attempted after the tablet was closed. Looking at the code, the initiateMinorCompaction function tries to get the flush ID from ZooKeeper even if the tablet is closed. Calling initiateMinorCompaction on a closed tablet should do nothing.
{noformat}
07 08:10:38,419 [tabletserver.Tablet] TABLET_HIST: f5i;10a579089cf842a0< closed
07 08:15:38,426 [tabletserver.LargestFirstMemoryManager] DEBUG: IDLE minor compaction chosen
07 08:15:38,427 [tabletserver.LargestFirstMemoryManager] DEBUG: COMPACTING f5i;10a579089cf842a0< total = 32,091,937 ingestMemory = 32,091,937
07 08:15:38,427 [tabletserver.LargestFirstMemoryManager] DEBUG: chosenMem = 99,252 chosenIT = 300.01 load 125,050
07 08:15:38,427 [tabletserver.TabletServerResourceManager] ERROR: Minor compactions for memory managment failed
java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/fbebb086-960c-4a97-b502-154fc333d766/tables/f5i/flush-id
at org.apache.accumulo.server.tabletserver.Tablet.getFlushID(Tablet.java:2349)
at org.apache.accumulo.server.tabletserver.Tablet.initiateMinorCompaction(Tablet.java:2287)
at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework.manageMemory(TabletServerResourceManager.java:328)
at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework.access$1(TabletServerResourceManager.java:303)
at org.apache.accumulo.server.tabletserver.TabletServerResourceManager$MemoryManagementFramework$2.run(TabletServerResourceManager.java:252)
at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /accumulo/fbebb086-960c-4a97-b502-154fc333d766/tables/f5i/flush-id
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:950)
at org.apache.accumulo.core.zookeeper.ZooReader.getData(ZooReader.java:42)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter$1.invoke(ZooReaderWriter.java:169)
at $Proxy4.getData(Unknown Source)
at org.apache.accumulo.server.tabletserver.Tablet.getFlushID(Tablet.java:2347)
... 6 more
{noformat}
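The behavior described in the comment, with initiateMinorCompaction doing nothing on a closed tablet, can be sketched as follows. This is a hypothetical model, not Accumulo's `Tablet` class. The `closed` flag, the `FlushIdSource` interface standing in for the ZooKeeper read of `.../tables/<id>/flush-id`, and the boolean return value are all assumptions. What matters is the ordering: the closed check runs before the flush ID lookup, so a deleted table's missing `flush-id` node is never read.

```java
public class TabletSketch {

    // Hypothetical stand-in for reading the table's flush-id node from ZooKeeper.
    interface FlushIdSource {
        long getFlushId(String tableId) throws Exception;
    }

    private final String tableId;
    private final FlushIdSource flushIds;
    private boolean closed = false;
    private long lastFlushIdSeen = -1;

    TabletSketch(String tableId, FlushIdSource flushIds) {
        this.tableId = tableId;
        this.flushIds = flushIds;
    }

    // Both methods lock the same object, so close() cannot interleave with the check below.
    synchronized void close() {
        closed = true;
    }

    // Returns true if a minor compaction was started, false if nothing was done.
    synchronized boolean initiateMinorCompaction() {
        if (closed) {
            // A closed tablet has nothing to flush, so do not touch ZooKeeper at all.
            return false;
        }
        try {
            lastFlushIdSeen = flushIds.getFlushId(tableId);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        // ... start the minor compaction using lastFlushIdSeen (omitted) ...
        return true;
    }

    public static void main(String[] args) {
        // Simulate a deleted table whose flush-id node no longer exists.
        FlushIdSource deleted = id -> { throw new IllegalStateException("NoNode for flush-id of " + id); };
        TabletSketch tablet = new TabletSketch("f5i", deleted);
        tablet.close();
        System.out.println(tablet.initiateMinorCompaction()); // false, and no exception
    }
}
```

With the guard in place, the memory manager's IDLE minor compaction in the log above would have been skipped instead of raising the RuntimeException.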
[jira] [Resolved] (ACCUMULO-366) master killed a tablet server
Posted by "Keith Turner (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ACCUMULO-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith Turner resolved ACCUMULO-366.
-----------------------------------
Resolution: Fixed