You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2018/06/01 00:14:00 UTC

[jira] [Commented] (HBASE-20634) Reopen region while server crash can cause the procedure to be stuck

    [ https://issues.apache.org/jira/browse/HBASE-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497361#comment-16497361 ] 

Hadoop QA commented on HBASE-20634:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 23s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  3s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 23s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 35s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  7s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 49s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 12s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  1m 12s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 12s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 58s{color} | {color:red} hbase-server: The patch generated 6 new + 108 unchanged - 2 fixed = 114 total (was 110) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 34s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 56s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 26s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 38s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}107m 33s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 57s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}172m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:369877d |
| JIRA Issue | HBASE-20634 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12925999/HBASE-20634.branch-2.0.005.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux 3f413faee2e0 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / f43676a38b |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| compile | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/artifact/patchprocess/patch-compile-hbase-server.txt |
| cc | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/artifact/patchprocess/patch-compile-hbase-server.txt |
| javac | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/artifact/patchprocess/patch-compile-hbase-server.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/testReport/ |
| Max. process+thread count | 4679 (vs. ulimit of 10000) |
| modules | C: hbase-protocol-shaded hbase-procedure hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/13033/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Reopen region while server crash can cause the procedure to be stuck
> --------------------------------------------------------------------
>
>                 Key: HBASE-20634
>                 URL: https://issues.apache.org/jira/browse/HBASE-20634
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0, 2.0.1
>
>         Attachments: HBASE-20634-UT.patch, HBASE-20634.branch-2.0.001.patch, HBASE-20634.branch-2.0.002.patch, HBASE-20634.branch-2.0.003.patch, HBASE-20634.branch-2.0.004.patch, HBASE-20634.branch-2.0.005.patch
>
>
> Found this when implementing HBASE-20424, where we will transit the peer sync replication state while there is server crash.
> The problem is that, in ServerCrashAssign, we do not have the region lock, so it is possible that after we call handleRIT to clear the existing assign/unassign procedures related to this rs, and before we schedule the assign procedures, it is possible that that we schedule a unassign procedure for a region on the crashed rs. This procedure will not receive the ServerCrashException, instead, in addToRemoteDispatcher, it will find that it can not dispatch the remote call and then a  FailedRemoteDispatchException will be raised. But we do not treat this exception the same with ServerCrashException, instead, we will try to expire the rs. Obviously the rs has already been marked as expired, so this is almost a no-op. Then the procedure will be stuck there for ever.
> A possible way to fix it is to treat FailedRemoteDispatchException the same with ServerCrashException, as it will be created in addToRemoteDispatcher only, and the only reason we can not dispatch a remote call is that the rs has already been dead. The nodeMap is a ConcurrentMap so I think we could use it as a guard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)