You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by "Keith Turner (Commented) (JIRA)" <ji...@apache.org> on 2012/02/27 23:17:48 UTC

[jira] [Commented] (ACCUMULO-331) Scan of metadata table failed because tablet closed

    [ https://issues.apache.org/jira/browse/ACCUMULO-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217628#comment-13217628 ] 

Keith Turner commented on ACCUMULO-331:
---------------------------------------

Saw this again

{noformat}
24 22:05:22,648 [tabletserver.Tablet] TABLET_HIST: n4;a200;a100 closed

24 22:05:23,672 [tabletserver.Tablet] TABLET_HIST: n4;a200;a100 opened

24 22:05:23,800 [tabletserver.TabletServer] WARN : exception while scanning tablet n4;a200;a100
java.io.IOException: Stream closed
        at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.FilterInputStream.read(FilterInputStream.java:66)
        at java.io.DataInputStream.readByte(DataInputStream.java:248)
        at org.apache.accumulo.core.file.rfile.RelativeKey.readFields(RelativeKey.java:116)
        at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._next(RFile.java:584)
        at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.next(RFile.java:556)
        at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.consume(ColumnFamilySkippingIterator.java:64)
        at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:38)
        at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125)
        at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020)
        at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94)
        at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105)
        at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67)
        at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
        at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
        at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:78)
        at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116)
        at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168)
        at org.apache.accumulo.server.tabletserver.Tablet.nextBatch(Tablet.java:1743)
        at org.apache.accumulo.server.tabletserver.Tablet.access$3200(Tablet.java:143)
        at org.apache.accumulo.server.tabletserver.Tablet$Scanner.read(Tablet.java:1883)
        at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$NextBatchTask.run(TabletServer.java:905)
        at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

24 22:05:23,948 [tabletserver.Tablet] TABLET_HIST: n4;a200;a100 closed
{noformat}
                
> Scan of metadata table failed because tablet closed
> ---------------------------------------------------
>
>                 Key: ACCUMULO-331
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-331
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: Running random walk test on 10 node cluster
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
>
>
> The security random walk test failed to drop a table.
> {noformat}
> 19 05:56:04,122 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Security.xml
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253)
>         at org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61)
>         at org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.accumulo.start.Main$1.run(Main.java:89)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.Exception: Error running node security.TableOp
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249)
>         ... 8 more
> Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.delete(TableOperationsImpl.java:508)
>         at org.apache.accumulo.server.test.randomwalk.security.DropTable.dropTable(DropTable.java:66)
>         at org.apache.accumulo.server.test.randomwalk.security.TableOp.visit(TableOp.java:204)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249)
>         ... 9 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation
>         at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
>         at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
>         at $Proxy1.waitForTableOperation(Unknown Source)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233)
>         at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275)
>         ... 14 more
> {noformat}
> Looking in the master logs see the following indicating a scan failed while trying to delete the table.
> {noformat}
> 19 05:56:04,098 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server xxx.xxx.xxx.xxx:9997
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:188)
>         at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:97)
>         at org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:94)
>         at org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:55)
>         at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
>         at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server xxx.xxx.xxx.xxx:9997
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:579)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:335)
>         at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         ... 1 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing startMultiScan
>         at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:298)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:274)
>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
>         at $Proxy4.startMultiScan(Unknown Source)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:539)
>         ... 5 more
> {noformat}
> Looking on the tablet server where the scan failed, see the following
> {noformat}
> 19 05:56:04,062 [tabletserver.Tablet] DEBUG: initiateClose(saveState=true queueMinC=false disableWrites=false) !0;~;!0<
> 19 05:56:04,062 [tabletserver.Tablet] DEBUG: completeClose(saveState=true completeClose=true) !0;~;!0<
> 19 05:56:04,063 [problems.ProblemReports] DEBUG: Filing problem report !0 FILE_READ /accumulo/tables/!0/table_info/F0002xng.rf
> 19 05:56:04,072 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< closed
> 19 05:56:04,072 [tabletserver.TabletServer] DEBUG: Unassigning !0;~;!0<@(null,192.168.117.13:9997[134d7425fc5940d],null)
> 19 05:56:04,073 [tabletserver.TabletServer] WARN : lookup failed for tablet !0;~;!0<
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:249)
>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:677)
>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:614)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
>         at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125)
>         at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020)
>         at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94)
>         at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:89)
>         at org.apache.accumulo.core.iterators.user.WholeRowIterator.seek(WholeRowIterator.java:240)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
>         at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116)
>         at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168)
>         at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1607)
>         at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1710)
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:988)
>         at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 19 05:56:04,076 [tabletserver.TabletServer] INFO : unloaded !0;~;!0<
> 19 05:56:04,076 [tabletserver.TabletServer] INFO : told to unload tablet that was not being served !0;~;!0<
> 19 05:56:04,084 [thrift.TabletClientService$Processor] ERROR: Internal error processing startMultiScan
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.continueMultiScan(TabletServer.java:1321)
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:1272)
>         at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:58)
>         at $Proxy2.startMultiScan(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.process(TabletClientService.java:2164)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor.process(TabletClientService.java:2037)
>         at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:151)
>         at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
>         at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:199)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.accumulo.server.tabletserver.TabletServer$ScanTask.get(TabletServer.java:661)
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler.continueMultiScan(TabletServer.java:1308)
>         ... 15 more
> Caused by: java.lang.RuntimeException: java.io.EOFException
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:999)
>         at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         ... 1 more
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:249)
>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:677)
>         at org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:614)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
>         at org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125)
>         at org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020)
>         at org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94)
>         at org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:89)
>         at org.apache.accumulo.core.iterators.user.WholeRowIterator.seek(WholeRowIterator.java:240)
>         at org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
>         at org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
>         at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116)
>         at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168)
>         at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1607)
>         at org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1710)
>         at org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:988)
>         ... 4 more
> 19 05:56:04,120 [tabletserver.TabletServer] INFO : Loading tablet !0;~;!0<
> 19 05:56:04,128 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< opened
> {noformat}
> The metadata tablet were being frequently loaded and unloaded as descried in ACCUMULO-329 when this happened.  
> The file in question was major compacted away on another tablet server. Shortly after this.
> {noformat}
> 19 05:57:54,187 [tabletserver.Tablet] DEBUG: Starting MajC !0;~;!0< [/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, /table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> /table_info/A0002y0e.rf_tmp
> 19 05:57:54,309 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< MajC [/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, /table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> /table_info/A0002y0e.rf
> {noformat}
> {noformat}
> 19 06:10:29,546 [gc.SimpleGarbageCollector] DEBUG: Deleting /accumulo/tables/!0/table_info/F0002xng.rf
> {noformat}
> The file was garbage collected later.  The GC and Majc show that the file existed while the EOF happened.  So this is probably either a bug in RFile fast skip or a bug related to the tablet closing. Probably related to the close w/ the close temporal relationship.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira