Posted to issues@hbase.apache.org by "Wellington Chevreuil (JIRA)" <ji...@apache.org> on 2019/07/29 13:54:00 UTC

[jira] [Comment Edited] (HBASE-22539) Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place

    [ https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895272#comment-16895272 ] 

Wellington Chevreuil edited comment on HBASE-22539 at 7/29/19 1:53 PM:
-----------------------------------------------------------------------

Thanks for jumping in, [~Apache9]!

{quote}Did the crashed region server timeout on some write requests?{quote}
The RS is not crashing at all when we see these corruptions (and the message mentioned above is never seen in the RS logs either). It may eventually crash later due to other problems, such as long GC pauses, in which case the corrupt WAL would cause any RS that then tries to split it to crash.


was (Author: wchevreuil):
{quote}Did the crashed region server timeout on some write requests?{quote}
The RS is not crashing at all when we see these corruptions (and the message mentioned above is never seen in the RS logs either). It may eventually crash later due to other problems, such as long GC pauses, in which case the corrupt WAL would cause any RS that then tries to split it to crash.

> Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-22539
>                 URL: https://issues.apache.org/jira/browse/HBASE-22539
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc, wal
>    Affects Versions: 2.1.1
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Blocker
>
> Summary
> We have been chasing a WAL corruption issue reported on one of our customers' deployments running release 2.1.1 (CDH 6.1.0). After providing a custom jar with the extra sanity checks implemented by HBASE-21401 applied at some code points, plus additional debugging messages, we believe it is related to DirectByteBuffer usage and the Unsafe copy from off-heap memory to an on-heap array triggered [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157], such as when writing into a non-ByteBufferWriter type, as done [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details in the following comment.
>  
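For readers unfamiliar with the copy path described in the summary, here is a minimal sketch of moving bytes from a direct (off-heap) ByteBuffer into an on-heap array before writing them to a plain OutputStream. It is not the HBase implementation: it uses only the public java.nio API rather than Unsafe, and the class name SafeBufferCopy and the method copyToStream are hypothetical names chosen for illustration. The bounds check stands in for the kind of sanity check that a raw memory copy does not perform on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only; not part of HBase.
public class SafeBufferCopy {

    // Copies `length` bytes starting at absolute `offset` in `src`
    // (which may be a direct buffer) into an on-heap array, then
    // writes that array to `out`. The source buffer's position and
    // limit are left untouched.
    static void copyToStream(OutputStream out, ByteBuffer src, int offset, int length)
            throws IOException {
        // Validate the requested range up front; a raw memory copy
        // would read past the buffer silently instead of failing.
        if (offset < 0 || length < 0 || offset > src.limit() - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", limit=" + src.limit());
        }
        byte[] onHeap = new byte[length];
        // Work on a duplicate so the caller's position is not disturbed.
        ByteBuffer view = src.duplicate();
        view.position(offset);
        view.get(onHeap, 0, length);
        out.write(onHeap, 0, length);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            direct.put((byte) i);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyToStream(out, direct, 2, 3);
        System.out.println(java.util.Arrays.toString(out.toByteArray()));
    }
}
```

The copy goes through a duplicate of the buffer, so a concurrent reader of the original buffer's position is not affected; whether that matters in the code paths linked above is an assumption, not something this sketch establishes.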



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)