Posted to issues@hbase.apache.org by "Gary Helmling (Updated) (JIRA)" <ji...@apache.org> on 2011/10/05 08:10:35 UTC

[jira] [Updated] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222

     [ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling updated HBASE-4282:
---------------------------------

    Attachment: HBASE-4282_trunk_3.patch

Here's an updated patch for trunk incorporating the HBASE-4487 changes.  We made use of the same sync tracking to check for unflushed entries.

I also added an explicit check in TestLogRolling that no aborts occur when exercising the HBASE-4222 behavior.

TestLogRolling and TestLogRollAbort both pass in batches of 10 runs.
                
> Potential data loss in retries of WAL close introduced in HBASE-4222
> --------------------------------------------------------------------
>
>                 Key: HBASE-4282
>                 URL: https://issues.apache.org/jira/browse/HBASE-4282
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Blocker
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch
>
>
> The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if:
> * A table has DEFERRED_LOG_FLUSH=true
> * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer
> Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here.
> We could easily flag this state and only ride over the close error if there aren't unflushed entries.  This would bring the above condition back to the previous behavior of aborting the region server.  However, aborting the region server in this state still guarantees data loss.  Is there anything we can do better in this case?
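
For illustration only, the "flag this state" idea in the description can be sketched as below. This is not the code in the attached patches: the class WalCloseGuard and its methods (append, sync, hasUnflushedEntries, closeForRoll) are hypothetical names, and the sketch assumes a single thread and a simple pair of sequence counters standing in for the sync tracking.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; these names are not part of HBase's HLog API. */
class WalCloseGuard {
    /** Sequence number of the last edit appended to the writer buffer. */
    private final AtomicLong lastAppended = new AtomicLong(0);
    /** Sequence number of the last edit known to have been synced. */
    private final AtomicLong lastSynced = new AtomicLong(0);

    /** Records that one more edit was buffered (e.g. a deferred-flush write). */
    long append() {
        return lastAppended.incrementAndGet();
    }

    /** Records that everything appended so far has been synced. */
    void sync() {
        long target = lastAppended.get();
        // A real writer would flush its buffer to the filesystem here.
        lastSynced.set(target);
    }

    /** True if acknowledged edits may still sit only in the writer buffer. */
    boolean hasUnflushedEntries() {
        return lastSynced.get() < lastAppended.get();
    }

    /**
     * Closes the old writer during a log roll.
     * Returns true on a clean close, false if a close error was ridden over.
     * Rethrows the error when unflushed entries exist, so the caller can
     * fall back to the previous behavior (aborting the region server).
     */
    boolean closeForRoll(Closeable writer) throws IOException {
        try {
            writer.close();
            return true;
        } catch (IOException e) {
            if (hasUnflushedEntries()) {
                throw e; // ignoring this could silently drop acknowledged edits
            }
            return false; // nothing unflushed: safe to continue with the roll
        }
    }
}
```

With this shape, a failing close is tolerated only when lastSynced has caught up with lastAppended. It does not address the open question above about doing better than an abort when entries are unflushed.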
