Posted to issues@hbase.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/02/01 01:50:00 UTC

[jira] [Comment Edited] (HBASE-21601) HBase (asyncwal?) can write a corrupted record - step 1, skip such records

    [ https://issues.apache.org/jira/browse/HBASE-21601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757873#comment-16757873 ] 

Sergey Shelukhin edited comment on HBASE-21601 at 2/1/19 1:49 AM:
------------------------------------------------------------------

I looked at the code a bit, but it will take a while to understand, so for now I'm going to mitigate this by skipping such records. Given that the record is bogus and its lengths are intact, it's safe to do so in this scenario. However, I suppose it's possible to have a bug where skipping such a record would lead to data loss. Regardless, failure to split the WAL would lead to even more data loss in this case, so it should be OK to handle errors where the structure is correct but the cells are corrupted.
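
For reference, a minimal sketch (not the actual patch) of what "skip such records" could look like: treat an entry whose cells report impossible lengths as corrupt and skip it, instead of failing the whole split. The class and helper names below are illustrative assumptions, not existing WALSplitter code:
{code}
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.wal.WAL;

public final class SkipCorruptCellsSketch {

  /** Heuristic: the lengths a cell reports about itself must make sense. */
  static boolean looksCorrupted(Cell cell) {
    return cell.getFamilyLength() < 0      // this is what blew up cloneFamily
        || cell.getQualifierLength() < 0
        || cell.getValueLength() < 0
        || cell.getRowLength() <= 0;       // a real cell always has a row
  }

  /**
   * Decide whether an entry should be skipped: the WAL record itself parsed
   * (framing and record lengths are intact), but a cell inside it is bogus.
   */
  static boolean shouldSkip(WAL.Entry entry) {
    List<Cell> cells = entry.getEdit().getCells();
    for (Cell cell : cells) {
      if (looksCorrupted(cell)) {
        return true;
      }
    }
    return false;
  }
}
{code}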


I don't see any retries, or any way the record could be reused at a high level...
I think there might be a race condition in asyncwal. It's possible that the record is written correctly but not removed correctly from the ring buffer, or something along those lines. The buffers of the WALEdit (which is itself somewhat difficult to track) then get reused, and AsyncWAL writes the record again with some random data.
[~Apache9] do you have any insight into possible race conditions that could result in this? I can file a separate JIRA to fix it.
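
To make the reuse hypothesis concrete, here's a toy, self-contained illustration - not AsyncWAL code - of the hazard: a writer that holds on to a pooled buffer past its release will emit whatever the next request has since put into that memory:
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

/**
 * Toy illustration of the hypothesized hazard: an edit handed to a writer by
 * reference, whose backing buffer is recycled (and reused by another request)
 * while the writer still holds it.
 */
public final class BufferReuseHazard {

  private static final ArrayDeque<ByteBuffer> POOL = new ArrayDeque<>();

  static ByteBuffer borrow() {
    ByteBuffer b = POOL.poll();
    return b != null ? b : ByteBuffer.allocate(64);
  }

  static void release(ByteBuffer b) {
    b.clear();
    POOL.push(b);  // the buffer is now free for anyone to reuse
  }

  public static void main(String[] args) {
    ByteBuffer edit = borrow();
    edit.put("row-1/cf:q=value-1".getBytes(StandardCharsets.UTF_8)).flip();

    // First write: the writer consumes the buffer while it is still owned.
    System.out.println("write #1: " + StandardCharsets.UTF_8.decode(edit.duplicate()));

    // Bug analogue: the buffer is released (e.g. the ring buffer slot is
    // retired incorrectly) while the writer still holds a reference to it...
    release(edit);

    // ...and another request immediately reuses the same backing memory.
    ByteBuffer other = borrow();
    other.put("row-2/cf:q=value-2".getBytes(StandardCharsets.UTF_8)).flip();

    // Writing the stale reference again now emits the other request's bytes:
    // structurally still a record, but the payload came from somewhere else.
    System.out.println("write #2: " + StandardCharsets.UTF_8.decode(edit.duplicate()));
  }
}
{code}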

I cannot share the WAL data, but I'm attaching the debug logging I added, which shows the WALKey and cell lengths of two records. I think that should be sufficient - there's nothing more to see in the file itself; it's just a valid WAL file with records as follows:
The data in the first record in the logs is valid in every cell, and it's followed by more valid WALKeys with various numbers of valid cells. I included one cell prefix as an example.
The data in the second record is invalid in every cell and looks like fragments of our real data (values we insert), randomly cut and pasted into the cell buffers. I searched for substrings of these invalid values in the rest of the file but could not find them, so they come from some different requests.
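
The ad-hoc debug logging itself isn't reproduced here, but the sketch below shows one way to get equivalent output (the WALKey plus per-cell lengths and a short value prefix) from the public reader API, assuming an HBase 2.x classpath; it is a stand-in for the attached logging, not part of any patch:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.wal.WAL;
import org.apache.hadoop.hbase.wal.WALFactory;

/** Dump the WALKey plus per-cell lengths and a value prefix for every record. */
public final class WalRecordDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path walPath = new Path(args[0]);
    try (WAL.Reader reader = WALFactory.createReader(FileSystem.get(conf), walPath, conf)) {
      WAL.Entry entry;
      while ((entry = reader.next()) != null) {
        System.out.println("WALKey: " + entry.getKey());
        for (Cell cell : entry.getEdit().getCells()) {
          try {
            byte[] value = CellUtil.cloneValue(cell);
            System.out.println("  row=" + Bytes.toStringBinary(CellUtil.cloneRow(cell))
                + " famLen=" + cell.getFamilyLength()
                + " qualLen=" + cell.getQualifierLength()
                + " valLen=" + cell.getValueLength()
                + " valuePrefix=" + Bytes.toStringBinary(value, 0, Math.min(16, value.length)));
          } catch (RuntimeException e) {
            // e.g. NegativeArraySizeException from a cell with bogus lengths
            System.out.println("  corrupt cell: famLen=" + cell.getFamilyLength() + " (" + e + ")");
          }
        }
      }
    }
  }
}
{code}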





was (Author: sershe):
I don't see any retries, or any way the record could be reused at a high level...
I think there might be a race condition in asyncwal. It's possible that the record is written correctly but not removed correctly from the ring buffer, or something along those lines. The buffers of the WALEdit (which is itself somewhat difficult to track) then get reused, and AsyncWAL writes the record again with some random data from other requests.

I looked at the code a bit, but it will take a while to understand, so for now I'm going to mitigate this by skipping such records. Given that the record is bogus and its lengths are intact, it's safe to do so in this scenario. However, I suppose it's possible to have a bug where skipping such a record would lead to data loss. Regardless, failure to split the WAL would lead to even more data loss in this case, so it should be OK to handle errors where the structure is correct but the cells are corrupted.

I cannot share the WAL data, but I'm attaching the debug logging I added, which shows the WALKey and cell lengths of two records. I think that should be sufficient - there's nothing more to see in the file itself; it's just a valid WAL file with records as follows:
The data in the first record in the logs is valid in every cell, and it's followed by more valid WALKeys with various numbers of valid cells. I included one cell prefix as an example.
The data in the second record is invalid in every cell and looks like fragments of our real data (values we insert), randomly cut and pasted into the buffers. I searched for substrings of these invalid values in the rest of the file but could not find them, so they come from some different requests.




> HBase (asyncwal?) can write a corrupted record - step 1, skip such records
> --------------------------------------------------------------------------
>
>                 Key: HBASE-21601
>                 URL: https://issues.apache.org/jira/browse/HBASE-21601
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>
> {noformat}
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.RuntimeException: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349)
> 	at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
> 	at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
> 	at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)
> {noformat}
> Unfortunately I cannot share the file.
> The issue appears to be straightforward - for whatever reason, the family length is negative. I'm not sure how such a cell got created; I suspect the file was corrupted.
> {code}
> byte[] output = new byte[cell.getFamilyLength()];
> {code}
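
A minimal sketch, not from the ticket, of the defensive check this implies: validate the self-reported family length before allocating. The helper name is hypothetical:
{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

final class SafeFamilyClone {
  /**
   * A corrupted cell can report a negative family length, which makes the
   * new byte[...] allocation inside CellUtil.cloneFamily throw
   * NegativeArraySizeException, as in the stack trace above.
   */
  static byte[] cloneFamilyOrNull(Cell cell) {
    if (cell.getFamilyLength() < 0) {
      return null;  // caller treats the cell (and its record) as corrupt
    }
    return CellUtil.cloneFamily(cell);
  }
}
{code}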



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)