You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Wellington Chevreuil (JIRA)" <ji...@apache.org> on 2019/07/29 15:57:00 UTC
[jira] [Commented] (HBASE-22539) Potential WAL corruption due to early DBBs re-use.

    [ https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895390#comment-16895390 ] 

Wellington Chevreuil commented on HBASE-22539:
----------------------------------------------

{quote}I guess the problem is that we release the ByteBuf too earlier...{quote}
Yep, just changed the title to reflect that, since we discarded the original suspicion around unsafe copy.

{quote}But seems the only way to release the ByteBuf is to finish the rpc call...{quote}
Hum, the stack trace suggests we are probably on a separate thread from *ringbuffer*. Maybe the rpc thread has reached its endpoint where the DBB is then released? 

{noformat}
        at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:555)
        at org.apache.hadoop.hbase.KeyValueUtil.isBufferValid(KeyValueUtil.java:532)
        at org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:99)
        at org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(ByteBufferUtils.java:451)
        at org.apache.hadoop.hbase.ByteBufferKeyValue.write(ByteBufferKeyValue.java:277)
        at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:794)
        at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:382)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:54)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:302)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:67)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.append(AbstractFSWAL.java:918)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1082)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:973)
        at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:881)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:129)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

> Potential WAL corruption due to early DBBs re-use. 
> ---------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.1.1
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers deployments running release 2.1.1 (CDH 6.1.0). After providing a custom modified jar with the extra sanity checks implemented by HBASE-21401 applied on some code points, plus additional debugging messages, we believe it is related to DirectByteBuffer usage, and Unsafe copy from offheap memory to on-heap array triggered [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157], such as when writing into a non ByteBufferWriter type, as done [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details on the following comment.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)