Posted to issues@hbase.apache.org by "Esteban Gutierrez (JIRA)" <ji...@apache.org> on 2018/11/02 19:51:00 UTC

[jira] [Commented] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled

    [ https://issues.apache.org/jira/browse/HBASE-20604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673603#comment-16673603 ] 

Esteban Gutierrez commented on HBASE-20604:
-------------------------------------------

[~mdrob] I looked into that, and even though it seems related, we are doing positional reads and there is no pre-fetching involved.

[~apurtell] we have been running this in a production environment for months and we haven't run into an issue. Also, {{entry.getEdit().readFromCells}} needs to trigger a mismatch between the consumed entries and the expected entries, or we need to see an {{InvalidProtocolBufferException}}, while consuming the WAL and seeking to {{originalPosition}}. So I think it is safe to commit at this point if you are OK with the change. Thanks!

> ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20604
>                 URL: https://issues.apache.org/jira/browse/HBASE-20604
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 3.0.0
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>            Priority: Critical
>         Attachments: HBASE-20604.002.patch, HBASE-20604.patch
>
>
> Every time we call {{ProtobufLogReader#readNext}} we consume the input stream associated with the {{FSDataInputStream}} from the WAL that we are reading. Under certain conditions, e.g. when using encryption at rest ({{CryptoInputStream}}), the stream can return partial data, which can cause a premature EOF. That in turn causes {{inputStream.getPos()}} to return to the same original position, so {{ProtobufLogReader#readNext}} keeps retrying the same reads until the WAL is rolled.
> The side effect of this issue is that {{ReplicationSource}} can get stuck until the WAL is rolled, causing replication delays of up to an hour in some cases.
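
To illustrate the partial-data condition described above, here is a minimal, self-contained sketch. It is not the HBASE-20604 patch and does not use any HBase or Hadoop classes. The names PartialReadSketch, ChunkedStream, naiveRead and readFully are hypothetical and exist only for this example. It assumes a stream that may legally return fewer bytes than requested, and shows how a reader that treats a short read as end of data differs from one that keeps reading until the buffer is full or a real EOF (-1) is seen.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; none of these names come from HBase. */
class PartialReadSketch {

  /** Returns at most maxChunk bytes per call, mimicking a stream that hands back partial data. */
  static final class ChunkedStream extends InputStream {
    private final InputStream in;
    private final int maxChunk;

    ChunkedStream(InputStream in, int maxChunk) {
      this.in = in;
      this.maxChunk = maxChunk;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // A short read is allowed by the InputStream contract.
      return in.read(b, off, Math.min(len, maxChunk));
    }
  }

  /** Naive reader: issues one read and reports failure on a short read. */
  static boolean naiveRead(InputStream in, byte[] buf) throws IOException {
    int n = in.read(buf, 0, buf.length);
    return n == buf.length;
  }

  /** Keeps reading until buf is full; only a return value of -1 counts as EOF. */
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("stream ended after " + off + " bytes");
      }
      off += n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] entry = new byte[10];
    for (int i = 0; i < entry.length; i++) {
      entry[i] = (byte) i;
    }

    InputStream partial = new ChunkedStream(new ByteArrayInputStream(entry), 4);
    System.out.println("naive read got whole entry: " + naiveRead(partial, new byte[10]));

    byte[] out = new byte[10];
    readFully(new ChunkedStream(new ByteArrayInputStream(entry), 4), out);
    System.out.println("readFully last byte: " + out[9]);
  }
}
```

A reader that resets to its starting position whenever the naive check fails would see the same short read on every retry, which is the kind of loop the description refers to.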



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)