Posted to notifications@accumulo.apache.org by "John Vines (JIRA)" <ji...@apache.org> on 2014/01/16 01:02:34 UTC

[jira] [Reopened] (ACCUMULO-1998) Encrypted WALogs seem to be excessively buffering

     [ https://issues.apache.org/jira/browse/ACCUMULO-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines reopened ACCUMULO-1998:
----------------------------------


Seems there are some corner cases involving failed flushes that are causing buffer overflow exceptions in the blockwriter. Working on fixing these.

> Encrypted WALogs seem to be excessively buffering
> -------------------------------------------------
>
>                 Key: ACCUMULO-1998
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1998
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Michael Allen
>            Assignee: John Vines
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: 0001-ACCUMULO-1998-Working-around-the-cipher-s-buffer-by-.patch, 0001-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch, 0001-ACCUMULO-1998.patch, 0002-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch, 0002-ACCUMULO-1998.patch, 0003-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch, 0004-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch
>
>
> The reproduction steps are a little fuzzy, but basically we ran a moderate workload against a 1.6.0 server.  Encryption happened to be turned on, but that doesn't seem to be germane to the problem.  After a moderate amount of work, Accumulo refuses to start up, spewing this error over and over to the log:
> {noformat}
> 2013-12-10 10:23:02,529 [tserver.TabletServer] WARN : exception while doing multi-scan 
> java.lang.RuntimeException: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
> 	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1125)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
> 	at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:333)
> 	at org.apache.accumulo.tserver.FileManager.access$500(FileManager.java:58)
> 	at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:478)
> 	at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:466)
> 	at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:486)
> 	at org.apache.accumulo.tserver.Tablet$ScanDataSource.createIterator(Tablet.java:2027)
> 	at org.apache.accumulo.tserver.Tablet$ScanDataSource.iterator(Tablet.java:1989)
> 	at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
> 	at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1565)
> 	at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1672)
> 	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1114)
> 	... 6 more
> Caused by: java.io.FileNotFoundException: File does not exist: /accumulo/tables/!0/table_info/A000042x.rf
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:256)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:143)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:212)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:367)
> 	at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:143)
> 	at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:825)
> 	at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
> 	at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(FileOperations.java:119)
> 	at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:314)
> 	... 16 more
> {noformat}
> Here are some other pieces of context:
> HDFS contents:
> {noformat}
> ubuntu@ip-10-10-1-115:/data0/logs/accumulo$ hadoop fs -lsr /accumulo/tables/
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:32 /accumulo/tables/!0
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 01:06 /accumulo/tables/!0/default_tablet
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 10:49 /accumulo/tables/!0/table_info
> -rw-r--r--   5 accumulo hadoop       1698 2013-12-10 00:34 /accumulo/tables/!0/table_info/F0000000.rf
> -rw-r--r--   5 accumulo hadoop      43524 2013-12-10 01:53 /accumulo/tables/!0/table_info/F000062q.rf
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:32 /accumulo/tables/+r
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 10:45 /accumulo/tables/+r/root_tablet
> -rw-r--r--   5 accumulo hadoop       2070 2013-12-10 10:45 /accumulo/tables/+r/root_tablet/A0000738.rf
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:33 /accumulo/tables/1
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:33 /accumulo/tables/1/default_tablet
> {noformat}
> ZooKeeper entries:
> {noformat}
> [zk: localhost:2181(CONNECTED) 6] get /accumulo/371cfa3e-fe96-4a50-92e9-da7572589ffa/root_tablet/dir 
> hdfs://10.10.1.115:9000/accumulo/tables/+r/root_tablet
> cZxid = 0x1b
> ctime = Tue Dec 10 00:32:56 EST 2013
> mZxid = 0x1b
> mtime = Tue Dec 10 00:32:56 EST 2013
> pZxid = 0x1b
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 54
> numChildren = 0
> {noformat}
> I'm going to preserve the state of this machine in HDFS for a while but not forever, so if there are other pieces of context people need, let me know.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)