Posted to issues@hbase.apache.org by "Wellington Chevreuil (JIRA)" <ji...@apache.org> on 2019/07/29 15:57:00 UTC
[jira] [Commented] (HBASE-22539) Potential WAL corruption due to early DBBs re-use.
[ https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895390#comment-16895390 ]
Wellington Chevreuil commented on HBASE-22539:
----------------------------------------------
{quote}I guess the problem is that we release the ByteBuf too earlier...{quote}
Yep, just changed the title to reflect that, since we discarded the original suspicion around unsafe copy.
{quote}But seems the only way to release the ByteBuf is to finish the rpc call...{quote}
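Hmm, the stack trace suggests the append is running on a separate thread, the *ringbuffer* one (see the com.lmax.disruptor.BatchEventProcessor frame), not on the rpc thread. Maybe the rpc thread has already reached the end of the call, where the DBB is then released?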
{noformat}
at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:555)
at org.apache.hadoop.hbase.KeyValueUtil.isBufferValid(KeyValueUtil.java:532)
at org.apache.hadoop.hbase.io.ByteBufferWriterOutputStream.write(ByteBufferWriterOutputStream.java:99)
at org.apache.hadoop.hbase.util.ByteBufferUtils.copyBufferToStream(ByteBufferUtils.java:451)
at org.apache.hadoop.hbase.ByteBufferKeyValue.write(ByteBufferKeyValue.java:277)
at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:794)
at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:382)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:54)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:302)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.doAppend(FSHLog.java:67)
at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.append(AbstractFSWAL.java:918)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1082)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:973)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:881)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:129)
at java.lang.Thread.run(Thread.java:748)
{noformat}
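To illustrate the suspicion above, here is a minimal sketch of how releasing a buffer at the end of the call can corrupt an entry that is written out later. This is not HBase code: the class, the one-slot "pool", the queue standing in for the ringbuffer, and the payloads are all hypothetical, and the hand-off is simulated on a single thread so the outcome is deterministic.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; none of these names exist in HBase.
class EarlyReuseSketch {
    // Assumed one-slot "pool": every rpc call gets the same direct buffer.
    static final ByteBuffer POOLED = ByteBuffer.allocateDirect(16);

    // Stand-in for the ringbuffer: queued entries are written out later.
    static final Queue<ByteBuffer> PENDING = new ArrayDeque<>();

    // Simulates an rpc handler: fill the pooled buffer, queue the entry, return.
    // Returning models "the rpc call finished", after which the buffer is reused.
    static void handleRpc(String payload, boolean copyBeforeQueueing) {
        byte[] bytes = payload.getBytes(StandardCharsets.US_ASCII);
        POOLED.clear();
        POOLED.put(bytes);
        POOLED.flip();
        if (copyBeforeQueueing) {
            // Private on-heap copy: later reuse of POOLED cannot affect it.
            ByteBuffer copy = ByteBuffer.allocate(bytes.length);
            copy.put(POOLED.duplicate());
            copy.flip();
            PENDING.add(copy);
        } else {
            // duplicate() shares the underlying memory with POOLED.
            PENDING.add(POOLED.duplicate());
        }
    }

    // Simulates the deferred append: read whatever the queued buffers hold now.
    static List<String> drain() {
        List<String> out = new ArrayList<>();
        ByteBuffer b;
        while ((b = PENDING.poll()) != null) {
            byte[] dst = new byte[b.remaining()];
            b.get(dst);
            out.add(new String(dst, StandardCharsets.US_ASCII));
        }
        return out;
    }

    // Two calls complete before anything is appended.
    static List<String> run(boolean copyBeforeQueueing) {
        handleRpc("row-A", copyBeforeQueueing);
        handleRpc("row-B", copyBeforeQueueing);
        return drain();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [row-B, row-B]: first entry was overwritten
        System.out.println(run(true));  // prints [row-A, row-B]
    }
}
```

In the sketch, copying before queueing avoids the problem at the cost of an extra copy; the alternative would be to hold the release until the deferred append has completed. Which of the two (if either) applies here is still an open question.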
> Potential WAL corruption due to early DBBs re-use.
> ---------------------------------------------------
>
> Key: HBASE-22539
> URL: https://issues.apache.org/jira/browse/HBASE-22539
> Project: HBase
> Issue Type: Bug
> Components: rpc, wal
> Affects Versions: 2.1.1
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Blocker
>
> Summary
> We had been chasing a WAL corruption issue reported in one of our customers' deployments running release 2.1.1 (CDH 6.1.0). After providing a custom jar with the extra sanity checks implemented by HBASE-21401 applied at some code points, plus additional debugging messages, we believe it is related to DirectByteBuffer usage and to the Unsafe copy from off-heap memory to an on-heap array triggered [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157], such as when writing into a non-ByteBufferWriter type, as done [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)