You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2018/12/20 02:53:00 UTC

[jira] [Commented] (HBASE-21623) ServerCrashProcedure can stomp on a RIT for the wrong server

    [ https://issues.apache.org/jira/browse/HBASE-21623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725534#comment-16725534 ] 

Hadoop QA commented on HBASE-21623:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m  5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 48s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  8m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}115m 18s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21623 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12952432/HBASE-21623.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c60be26fb3b1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8991877bb2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/15334/testReport/ |
| Max. process+thread count | 5075 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/15334/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> ServerCrashProcedure can stomp on a RIT for the wrong server
> ------------------------------------------------------------
>
>                 Key: HBASE-21623
>                 URL: https://issues.apache.org/jira/browse/HBASE-21623
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>         Attachments: HBASE-21623.patch
>
>
> A server died while some region was being opened on it; eventually the open failed, and the RIT procedure started retrying on a different server.
> However, by then SCP for the dying server had already obtained the region from the list of regions on the old server, and proceeded to overwrite whatever the RIT was doing with a new server.
> {noformat}
> 2018-12-18 23:06:03,160 INFO  [PEWorker-14] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=151404, ppid=151104, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> ...
> 2018-12-18 23:06:38,208 INFO  [PEWorker-10] procedure.ServerCrashProcedure: Start pid=151632, state=RUNNABLE:SERVER_CRASH_START, hasLock=true; ServerCrashProcedure server=oldServer,17020,1545202098577, splitWal=true, meta=false
> ...
> 2018-12-18 23:06:41,953 WARN  [RSProcedureDispatcher-pool4-t115] assignment.RegionRemoteProcedureBase: The remote operation pid=151404, ppid=151104, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region {ENCODED => region1, ... } to server oldServer,17020,1545202098577 failed
> org.apache.hadoop.hbase.regionserver.RegionServerAbortedException: org.apache.hadoop.hbase.regionserver.RegionServerAbortedException: Server oldServer,17020,1545202098577 aborting
> 2018-12-18 23:06:42,485 INFO  [PEWorker-5] procedure2.ProcedureExecutor: Finished subprocedure(s) of pid=151104, ppid=150875, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; TransitRegionStateProcedure table=t1, region=region1, ASSIGN; resume parent processing.
> 2018-12-18 23:06:42,485 INFO  [PEWorker-13] assignment.TransitRegionStateProcedure: Retry=1 of max=2147483647; pid=151104, ppid=150875, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING, location=oldServer,17020,1545202098577
> 2018-12-18 23:06:42,500 INFO  [PEWorker-13] assignment.TransitRegionStateProcedure: Starting pid=151104, ppid=150875, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING, location=null; forceNewPlan=true, retain=false
> 2018-12-18 23:06:42,657 INFO  [PEWorker-2] assignment.RegionStateStore: pid=151104 updating hbase:meta row=region1, regionState=OPENING, regionLocation=newServer,17020,1545202111238
> ...
> 2018-12-18 23:06:43,094 INFO  [PEWorker-4] procedure.ServerCrashProcedure: pid=151632, state=RUNNABLE:SERVER_CRASH_ASSIGN, hasLock=true; ServerCrashProcedure server=oldServer,17020,1545202098577, splitWal=true, meta=false found RIT  pid=151104, ppid=150875, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; TransitRegionStateProcedure table=t1, region=region1, ASSIGN; rit=OPENING, location=newServer,17020,1545202111238, table=t1, region=region1
> 2018-12-18 23:06:43,094 INFO  [PEWorker-4] assignment.RegionStateStore: pid=151104 updating hbase:meta row=region1, regionState=ABNORMALLY_CLOSED
> {noformat}
> Later, the RIT overwrote the state again, it seems, and then the region got stuck in OPENING state forever, but I'm not sure yet if that's just due to this bug or if there was another bug after that. For now this can be addressed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)