Posted to issues@hbase.apache.org by "Ted Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/11/27 06:50:40 UTC

[jira] [Issue Comment Edited] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

    [ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157652#comment-13157652 ] 

Ted Yu edited comment on HBASE-4862 at 11/27/11 5:50 AM:
---------------------------------------------------------

@Ted
I added tests to this patch in patchV5.

OS: Red Hat Enterprise Linux Server release 5.4 (Tikanga)
The test results are as follows:

For trunk with patchV5:
_
Results :

Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)

Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO] ------------------------------------------------------------------------
_



For 0.90 with patchV5:

_
Results :

Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO] ------------------------------------------------------------------------
_

The two failed tests in trunk are the same as in the last run: one of them (TestReplicationPeer#testResetZooKeeperSession) passes when run separately, and the other is related to HBASE-4874.
                
      was (Author: zjushch):
    @Ted
I added tests to this patch in patchV5.

OS: Red Hat Enterprise Linux Server release 5.4 (Tikanga)
The test results are as follows:

For trunk with patchV5:
_
Results :

Failed tests:   testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)

Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO] ------------------------------------------------------------------------
_



For 0.90 with patchV5:

_
Results :

Tests run: 702, Failures: 0, Errors: 0, Skipped: 9

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO] ------------------------------------------------------------------------
_

The two failed tests in trunk are the same as in the last run: one of them (TestReplicationPeer#testResetZooKeeperSession) passes when run separately, and the other is related to HBASE-4874.
                  
> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
>                 Key: HBASE-4862
>                 URL: https://issues.apache.org/jira/browse/HBASE-4862
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff, hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff, hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff, hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is now opening region A, and during replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log file. If skipError = true, the log is added to the corrupt logs; however, the data for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the filesystem is OK, so it only prints an error log and continues assigning regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file
> that the split hlog thread is still appending to.
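
One way to keep the region-open path from deleting a file that the split thread is still appending to is sketched below. This is only an illustration under stated assumptions, not necessarily what the attached patches do: it uses the local filesystem through java.nio rather than HBase's own filesystem classes, and the class name, the method names, and the ".temp" suffix are all hypothetical. The writer appends to a temporary name and renames the file only when it is complete; the reader ignores anything that still carries the temporary suffix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class RecoveredEditsSketch {
    // Hypothetical suffix marking a file the split thread is still writing.
    static final String TMP_SUFFIX = ".temp";

    /** Split side: write edits to "<name>.temp", then rename to "<name>" when done. */
    static Path writeEdits(Path editsDir, String name, List<String> edits) throws IOException {
        Files.createDirectories(editsDir);
        Path tmp = editsDir.resolve(name + TMP_SUFFIX);
        Files.write(tmp, edits, StandardCharsets.UTF_8);
        // The rename publishes the file; before it, readers do not see the file at all.
        return Files.move(tmp, editsDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Region-open side: list only finished files, so in-progress ones are never replayed or deleted. */
    static List<Path> finishedEditFiles(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(editsDir)) {
            return result;
        }
        try (Stream<Path> files = Files.list(editsDir)) {
            files.filter(p -> !p.getFileName().toString().endsWith(TMP_SUFFIX))
                 .forEach(result::add);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("regionA").resolve("recovered.edits");
        Files.createDirectories(dir);

        // Simulate a split thread that is still appending to 123456.
        Files.write(dir.resolve("123456" + TMP_SUFFIX), Arrays.asList("edit-1"), StandardCharsets.UTF_8);
        System.out.println(finishedEditFiles(dir).size()); // 0: the in-progress file is invisible

        // A completed file becomes visible only after the rename.
        writeEdits(dir, "123457", Arrays.asList("edit-2"));
        System.out.println(finishedEditFiles(dir).size()); // 1
    }
}
```

With this arrangement the opener never touches a half-written file, so the split thread does not hit the IO exception described in step 3. Whether the rename is atomic depends on the underlying filesystem, which is an assumption here.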
