Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2013/09/03 19:19:52 UTC

[jira] [Updated] (HBASE-9387) Region could get lost during assignment

     [ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-9387:
--------------------------

    Attachment: 9387-v8.txt

Patch v8 moves the znode existence check, and the abort that follows it, into transitionToOpened().

This avoids aborting the region server unnecessarily.
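
For readers without the attachment open, here is a minimal sketch of the idea, not the patch itself. It assumes the abort is requested when the region's znode turns out to be missing at the point of the OPENED transition. FakeZk, FakeServer and this standalone transitionToOpened() are hypothetical stand-ins and do not reflect the real HBase or ZooKeeper classes and signatures.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OpenRegionSketch {

    /** Hypothetical in-memory stand-in for ZooKeeper: tracks which region znodes exist. */
    static class FakeZk {
        private final Set<String> znodes = ConcurrentHashMap.newKeySet();
        void create(String region) { znodes.add(region); }
        void delete(String region) { znodes.remove(region); }
        boolean exists(String region) { return znodes.contains(region); }
    }

    /** Hypothetical stand-in for a region server; it only records an abort request. */
    static class FakeServer {
        volatile String abortReason; // null means no abort was requested
        void abort(String why) { abortReason = why; }
    }

    /**
     * Assumed behaviour: the existence check runs only here, at the OPENED
     * transition, so earlier steps of the open never trigger an abort.
     * Returns true if the transition may proceed.
     */
    static boolean transitionToOpened(FakeZk zk, FakeServer server, String region) {
        if (!zk.exists(region)) {
            server.abort("znode for region " + region + " is missing at OPENED transition");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        FakeServer server = new FakeServer();
        String region = "1588230740";

        zk.create(region);
        System.out.println("znode present, proceed: " + transitionToOpened(zk, server, region));

        zk.delete(region);
        System.out.println("znode missing, proceed: " + transitionToOpened(zk, server, region));
        System.out.println("abort requested: " + (server.abortReason != null));
    }
}
```

Keeping the check in a single place at the last step means a region server whose znode is still present is never asked to abort earlier in the open.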
                
> Region could get lost during assignment
> ---------------------------------------
>
>                 Key: HBASE-9387
>                 URL: https://issues.apache.org/jira/browse/HBASE-9387
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.95.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 9387-v1.txt, 9387-v3.txt, 9387-v4.2.txt, 9387-v4.3.txt, 9387-v4.4.txt, 9387-v4.txt, 9387-v5.txt, 9387-v6.txt, 9387-v7.txt, 9387-v8.txt, hbase-9387.patch, org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed a test timeout when running against Hadoop 2.1.0 with distributed log replay turned on.
> It looks like the region state for 1588230740 became inconsistent between the master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>         at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}
