Posted to dev@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/12/15 17:55:00 UTC

[jira] [Resolved] (HBASE-25389) [Flakey Tests] branch-2 TestMetaShutdownHandler

     [ https://issues.apache.org/jira/browse/HBASE-25389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-25389.
-----------------------------------
    Fix Version/s: 2.4.1
                   2.5.0
                   3.0.0-alpha-1
     Hadoop Flags: Reviewed
         Assignee: Michael Stack
       Resolution: Fixed

Merged to branch-2.4+. Thanks for the review [~bharathv]

> [Flakey Tests] branch-2 TestMetaShutdownHandler
> -----------------------------------------------
>
>                 Key: HBASE-25389
>                 URL: https://issues.apache.org/jira/browse/HBASE-25389
>             Project: HBase
>          Issue Type: Task
>          Components: flakies
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.1
>
>
> I see this test fail regularly in local runs. We kill the server hosting meta and then, after waiting on recovery, check that meta came up in a new location. When the test fails, the assert on the new location fails because we have not waited for the CRASH to happen. Here is an excerpt from the log:
> {code}
>  2020-12-11 13:20:27,298 INFO  [Listener at localhost/62149] master.TestMetaShutdownHandler(111): Deleted the znode for the RegionServer hosting hbase:meta; waiting on SSH
> ...
>  2020-12-11 13:20:27,310 INFO  [Listener at localhost/62149] master.TestMetaShutdownHandler(122): Past wait on RIT
> ...
>  2020-12-11 13:20:27,351 DEBUG [RegionServerTracker-0] procedure2.ProcedureExecutor(1048): Stored pid=9, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure stack.XXX.example.com,62201,1607721618377, splitWal=true, meta=true
> {code}
> The first line is where we remove the ephemeral node for the regionserver carrying hbase:meta. The second line is supposed to log AFTER the SCP (ServerCrashProcedure) is done (this old test calls it SSH, as in the first log line above). Notice how the third line, which comes after the second, is the first mention of the SCP being queued.
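
The race described above can be illustrated with a small, generic sketch. This is not the committed fix and does not use HBase test utilities: the class name WaitForCrashSketch, the waitFor helper, and the crashQueued flag are all hypothetical stand-ins. The idea is only that the test should poll until the crash has actually been observed before it moves on to the "wait on recovery" step and the location assert.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not HBase code: poll for an asynchronous event
// (standing in for "the SCP has been queued") before asserting on its result.
public class WaitForCrashSketch {

  /** Polls the condition until it is true or the timeout elapses. */
  public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the event
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the master noticing the deleted znode and queuing the crash procedure.
    AtomicBoolean crashQueued = new AtomicBoolean(false);
    Thread tracker = new Thread(() -> {
      try {
        Thread.sleep(50); // the event arrives some time after the znode delete
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      crashQueued.set(true);
    });
    tracker.start();

    // Without this wait, a check made right after the delete can run before
    // the crash is even registered, which is the failure seen in the log.
    boolean seen = waitFor(5_000, 10, crashQueued::get);
    tracker.join();
    System.out.println("crash observed before asserting: " + seen);
  }
}
```

In a real test the polled condition would be whatever signal shows the crash has been registered; which signal the actual patch waits on is not stated in this message.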
