You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/01/08 21:58:00 UTC

[jira] [Comment Edited] (HBASE-21601) corrupted WAL is not handled in all places (NegativeArraySizeException)

    [ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737564#comment-16737564 ] 

Sergey Shelukhin edited comment on HBASE-21601 at 1/8/19 9:57 PM:
------------------------------------------------------------------

Looks like we might need to look closer at a problematic file... l cannot tell from KeyValueUtil/CellUtil/KeyValue/etc code where exactly the cell is created, but it seems like the requisite number of bytes should always be read for the record, assuming we don't get an IOException or EOF of some sort... or the lower level, byte-reading logic would throw an error.
So, we may be reading the record fully from the file, but getting some garbage bytes; or there's a bug somewhere that allows a partial read to happen, so the offset calculations in KeyValue/CellUtil return bogus offsets.


was (Author: sershe):

Looks like we might need to look closer at the file... l cannot tell from KeyValueUtil/CellUtil/KeyValue/etc code where exactly the cell is created, but it seems like the requisite number of bytes should always be read for the record, assuming we don't get an IOException or EOF of some sort... or the lower level, byte-reading logic would throw an error.
So, we may be reading the record fully from the file, but getting some garbage bytes; or there's a bug somewhere that allows a partial read to happen, so the offset calculations in KeyValue/CellUtil return bogus offsets.

> corrupted WAL is not handled in all places (NegativeArraySizeException)
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21601
>                 URL: https://issues.apache.org/jira/browse/HBASE-21601
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> {noformat}
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
> 	at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
> {noformat}
> Unfortunately I cannot share the file.
> The issue appears to be straightforward - for whatever reason the family length is negative. Not sure how such a cell got created, I suspect the file was corrupted.
> {code}
> byte[] output = new byte[cell.getFamilyLength()];
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)