Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2019/02/01 02:09:00 UTC

[jira] [Commented] (HBASE-21601) HBase (asyncwal?) can write a corrupted WAL record - step 1, skip such records

    [ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757881#comment-16757881 ] 

Duo Zhang commented on HBASE-21601:
-----------------------------------

Is it possible to reproduce the problem in production? If so, you could try FSHLog: if everything is fine then the problem is in AsyncFSWAL; if the problem is still there then something is wrong in the upper layer.
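
For reference, a minimal sketch of that experiment, assuming the hbase.wal.provider property and its "filesystem" / "asyncfs" values (my reading of the 2.x WALFactory defaults, not something stated in this ticket):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: force the classic FSHLog writer instead of AsyncFSWAL. The same property
// can be set in hbase-site.xml; the name and values are assumptions to verify
// against the HBase version in use.
public class WalProviderSwitch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.wal.provider", "filesystem");  // "filesystem" -> FSHLog
    // conf.set("hbase.wal.provider", "asyncfs");  // "asyncfs" -> AsyncFSWAL (2.x default)
    System.out.println("WAL provider in effect: " + conf.get("hbase.wal.provider"));
  }
}
{code}

On a live cluster the property would presumably go into hbase-site.xml followed by a region server restart rather than being set in code.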

Anyway, let me check the code for where we reuse buffers.

> HBase (asyncwal?) can write a corrupted WAL record - step 1, skip such records
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-21601
>                 URL: https://issues.apache.org/jira/browse/HBASE-21601
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>         Attachments: wal2.txt
>
>
> {noformat}
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
> 	at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
> {noformat}
> Unfortunately I cannot share the file.
> The issue appears to be straightforward: for whatever reason, the family length is negative. Not sure how such a cell got created; I suspect the file was corrupted.
> {code}
> byte[] output = new byte[cell.getFamilyLength()];
> {code}
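
For what it's worth, a minimal sketch of where the negative length bites and what a "skip such records" guard might look like; the class and method names below are hypothetical (mine, not HBase's) and only illustrate the shape of the check:

{code}
import org.apache.hadoop.hbase.Cell;

// Illustration only, not the actual HBase change: CellUtil.cloneFamily sizes its copy
// with the cell's family length, so a record whose family length deserialized as a
// negative byte fails with NegativeArraySizeException, as in the stack trace above.
final class CorruptCellGuard {
  /** Returns false for cells whose length fields cannot have come from a sane record. */
  static boolean looksSane(Cell cell) {
    return cell.getFamilyLength() >= 0      // byte: negative when the record is mangled
        && cell.getQualifierLength() >= 0
        && cell.getValueLength() >= 0;
  }
}
{code}

Where exactly such a check would live (the WAL reader, or WALSplitter before it clones the family) is a separate question; the sketch only shows the length validation itself.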



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)