Posted to issues@hbase.apache.org by "Zhihong Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/01/20 20:08:39 UTC

[jira] [Issue Comment Edited] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

    [ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190002#comment-13190002 ] 

Zhihong Yu edited comment on HBASE-5179 at 1/20/12 7:08 PM:
------------------------------------------------------------

Patch v17 for 0.90 passed unit tests.
Got a strange complaint about TestLruBlockCache; in org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.txt:
{code}
testBackgroundEvictionThread(org.apache.hadoop.hbase.io.hfile.TestLruBlockCache)  Time elapsed: 3.157 sec  <<< FAILURE!
junit.framework.AssertionFailedError: null
  at junit.framework.Assert.fail(Assert.java:47)
  at junit.framework.Assert.assertTrue(Assert.java:20)
  at junit.framework.Assert.assertTrue(Assert.java:27)
  at org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.testBackgroundEvictionThread(TestLruBlockCache.java:58)
{code}
Looks like the following assertion didn't give the background eviction thread enough time to run:
{code}
      Thread.sleep(1000);
      assertTrue(n++ < 2);
{code}
Running TestLruBlockCache alone passed.
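If the fixed two-second budget is indeed the problem, a more patient wait would make the check robust. The sketch below is only an illustration, not the actual TestLruBlockCache code; waitFor and the condition it polls are hypothetical helpers.
{code}
// Illustrative only: poll for a condition with a generous deadline instead of
// failing after two fixed one-second sleeps.
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public final class WaitUtil {
  private WaitUtil() {}

  /** Returns true once the condition holds, or false after the timeout expires. */
  public static boolean waitFor(BooleanSupplier condition,
                                long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (System.nanoTime() < deadline) {
      if (condition.getAsBoolean()) {
        return true;
      }
      Thread.sleep(200);  // short poll interval between checks
    }
    return condition.getAsBoolean();
  }
}
{code}
The test could then replace the fixed n++ < 2 bound with something like assertTrue(waitFor(() -> evictionObserved(), 30, TimeUnit.SECONDS)), where evictionObserved() stands for whatever condition the test already checks between sleeps.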
                
      was (Author: zhihyu@ebaysf.com):
    Patch v17 for 0.90 passed unit tests.
Got a strange complaint about TestLruBlockCache. But in org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.txt:
{code}
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hbase.io.hfile.TestLruBlockCache
-------------------------------------------------------------------------------
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.152 sec
{code}
                  
> Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5179
>                 URL: https://issues.apache.org/jira/browse/HBASE-5179
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.92.0, 0.94.0, 0.90.6
>
>         Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch
>
>
> If the master's failover processing and a ServerShutdownHandler run concurrently, the following case can occur:
> 1. The master completes splitLogAfterStartup().
> 2. RegionserverA restarts, and its ServerShutdownHandler starts processing.
> 3. The master starts rebuildUserRegions, and RegionserverA is considered a dead server.
> 4. The master starts assigning the regions of RegionserverA because step 3 marked it as dead.
> However, while step 4 (region assignment) runs, the ServerShutdownHandler may still be splitting RegionserverA's logs, so regions can be opened before their WAL edits have been replayed, which may cause data loss (a minimal sketch of this ordering follows below).
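The attached patches contain the real fix; purely to illustrate the kind of guard the description calls for, here is a minimal sketch in which every name (DeadServerLogTracker, safeToAssign, and so on) is hypothetical rather than an HBase API:
{code}
// Hypothetical sketch: a dead server's regions must not be assigned while a
// ServerShutdownHandler is still splitting that server's logs.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DeadServerLogTracker {
  // Dead servers whose log splitting has not finished yet.
  private final Set<String> splittingInProgress = ConcurrentHashMap.newKeySet();

  public void logSplitStarted(String serverName) {
    splittingInProgress.add(serverName);
  }

  public void logSplitFinished(String serverName) {
    splittingInProgress.remove(serverName);
  }

  /** Assigning the server's regions is only safe once this returns true. */
  public boolean safeToAssign(String serverName) {
    return !splittingInProgress.contains(serverName);
  }
}
{code}
In the scenario above, step 4 would consult safeToAssign("RegionserverA") and defer assignment until logSplitFinished has been reported, so no region is opened before its edits have been recovered from the split logs.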

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira