Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/07/24 17:33:00 UTC
[jira] [Comment Edited] (HBASE-18152) [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
[ https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554540#comment-16554540 ]
stack edited comment on HBASE-18152 at 7/24/18 5:32 PM:
--------------------------------------------------------
[~elserj] thank you for jumping in. I appreciate being able to chat on this. I'm trying to make a test, but the corruption is a bit hard to manufacture. Looking at this WAL procedure store, I'm currently trying to figure out why we have a third way of writing WALs, and was thinking of putting FSHLog or asyncfs in place instead.
I appreciate the sort suggestion. I am not sure the edits were part of the same batch, so I am unsure it would help. Might be worth trying though?
Any other ideas welcome.
Thanks.
was (Author: stack):
[~elserj] thank you for jumping in. I appreciate being able to chat on this. I'm trying to make a test but it's a bit hard manufacturing. Looking at this was procedure store I'm currently trying to figure why we have a third way of writing wals and was thinking of putting in place fshlog or asyncfs.
I appreciate the sort suggestion. I am not sure the edits we're part of the same batch so am unsure it would help. Might be worth trying though?
Any other ideas welcome.
Thanks.
> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> ---------------------------------------------------------------------
>
> Key: HBASE-18152
> URL: https://issues.apache.org/jira/browse/HBASE-18152
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-18152.master.001.patch, hbase-hbase-master-ctr-e138-1518143905142-221855-01-000002.hwx.site.log.gz, pv2-00000000000000000036.log, pv2-00000000000000000047.log, reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often we can get over it but sometimes we can't. It took me a while to capture an instance of corruption. Turns out we are writing to the WAL out of order, which undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write-side, there is a lot going on. I'm not clear on how we could write out of order. Will try and get more insight. Meantime parking this issue here to fill data into.