Posted to issues@hbase.apache.org by "Nick Dimiduk (Jira)" <ji...@apache.org> on 2020/06/02 16:35:00 UTC

[jira] [Commented] (HBASE-24131) [Flakey Tests] TestExportSnapshot takes too long; up against 13min max

    [ https://issues.apache.org/jira/browse/HBASE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124022#comment-17124022 ] 

Nick Dimiduk commented on HBASE-24131:
--------------------------------------

I've observed another occurrence of this test timing out, over on https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1786/3/

The failure message is cryptic, but I think I found the issue. There appears to be a timeout set in some kind of secret manager, and it is too aggressive for this mini-cluster test. The last component of the mini-cluster, MapReduce, only becomes available at T+273501ms, about 4.5 minutes after process launch. This is how I interpret the log line

{noformat}
2020-06-02 03:20:49,252 INFO  [Thread-223] server.Server(419): Started @273501ms
{noformat}

and barely a second later (about 1,022 ms, going by the timestamps) we get

{noformat}
2020-06-02 03:20:50,274 ERROR [Thread[Thread-224,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover(700): ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2020-06-02 03:20:50,351 INFO  [Time-limited test] hbase.HBaseTestingUtility(1272): Shutting down minicluster
{noformat}
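The timing arithmetic can be checked by parsing the log timestamps directly. This is a minimal sketch, assuming both lines use the `yyyy-MM-dd HH:mm:ss,SSS` wall-clock layout shown above and that `@273501ms` means milliseconds since process launch (as read above); `LogGap` and `gapMillis` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout of the quoted log lines, e.g. "2020-06-02 03:20:49,252".
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two timestamps in the layout above. */
    static long gapMillis(String earlier, String later) {
        LocalDateTime a = LocalDateTime.parse(earlier, LOG_TS);
        LocalDateTime b = LocalDateTime.parse(later, LOG_TS);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // "Started @273501ms", read as milliseconds since process launch.
        long uptimeMs = 273_501;
        System.out.printf("MapReduce up after %.2f minutes%n", uptimeMs / 60_000.0);

        String started = "2020-06-02 03:20:49,252";
        // Gap to the ExpiredTokenRemover ERROR line.
        System.out.println(gapMillis(started, "2020-06-02 03:20:50,274") + " ms to ERROR");
        // Gap to the "Shutting down minicluster" line.
        System.out.println(gapMillis(started, "2020-06-02 03:20:50,351") + " ms to shutdown");
    }
}
```

Going by the timestamps, both follow-up lines land a little over one second after the "Started" line.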

These thread group names have no meaning to me.
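One possible explanation for those names, hedged because it rests on JUnit 4 internals rather than anything in this log: JUnit 4 enforces `@Test(timeout=...)` and the `Timeout` rule through an internal `FailOnTimeout` statement that runs the test body in a thread named "Time-limited test" inside a `ThreadGroup` named "FailOnTimeoutGroup". `Thread[Thread-224,5,FailOnTimeoutGroup]` is `Thread.toString()` output as printed on JDK 8 (name, priority, group), and a new thread inherits the group of the thread that creates it, so background threads started from the test body would carry that group name. The sketch below is modeled on that behavior, not copied from JUnit; `TimeLimitedSketch`, `runTimeLimited` and `groupOfThreadCreatedInBody` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TimeLimitedSketch {
    /**
     * Hypothetical stand-in for a JUnit-4-style time limit: run the body in a
     * dedicated thread group, wait up to timeoutMs, interrupt it if still running.
     * Returns true if the body finished in time.
     */
    static boolean runTimeLimited(Runnable body, long timeoutMs) {
        ThreadGroup group = new ThreadGroup("FailOnTimeoutGroup");
        Thread testThread = new Thread(group, body, "Time-limited test");
        testThread.setDaemon(true); // don't keep the JVM alive if the body hangs
        testThread.start();
        try {
            testThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        if (testThread.isAlive()) {
            testThread.interrupt(); // timed out: ask the body to stop
            return false;
        }
        return true;
    }

    /** Group name seen by a thread constructed inside the time-limited body. */
    static String groupOfThreadCreatedInBody() {
        AtomicReference<String> name = new AtomicReference<>();
        // A thread created without an explicit group joins its creator's group.
        runTimeLimited(() -> name.set(new Thread(() -> { }).getThreadGroup().getName()), 1_000);
        return name.get();
    }

    public static void main(String[] args) {
        System.out.println("helper thread group: " + groupOfThreadCreatedInBody());
        boolean slowFinished = runTimeLimited(() -> {
            try {
                Thread.sleep(10_000); // stands in for a test that runs too long
            } catch (InterruptedException e) {
                // interrupted by the time limit
            }
        }, 100);
        System.out.println("slow body finished in time: " + slowFinished);
    }
}
```

If this reading is right, the group name only says the thread descends from the JUnit test thread; it does not by itself point at a timeout inside the secret manager.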

> [Flakey Tests] TestExportSnapshot takes too long; up against 13min max
> ----------------------------------------------------------------------
>
>                 Key: HBASE-24131
>                 URL: https://issues.apache.org/jira/browse/HBASE-24131
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> TestExportSnapshot fails fairly regularly locally. Looking into it, it's a test timeout. Looking at how long it ran, it's 13 minutes plus. Looking at a recent successful branch-2 run, it passed but took about 7-8 minutes. Let me break up the test into two pieces.
>  org.junit.runners.model.TestTimedOutException: test timed out after 780 seconds
>    at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:227)
>    at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportRetry(TestExportSnapshot.java:267)
> ... I see this in the log:
>  ====> TEST TIMED OUT. PRINTING THREAD DUMP. <====
> Test started at:
>  2020-04-06 17:19:21,739 INFO
> ... and the timestamp just above the TIMED OUT was
>  2020-04-06 17:31:01,758
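The timings in the description can be checked with a small sketch, assuming both timestamps use the `yyyy-MM-dd HH:mm:ss,SSS` layout shown and that the 780-second limit is measured from the logged start time; `TimeoutBudget` and `elapsed` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimeoutBudget {
    // Timestamp layout of the quoted log lines, e.g. "2020-04-06 17:19:21,739".
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Wall-clock duration between two timestamps in the layout above. */
    static Duration elapsed(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS), LocalDateTime.parse(end, LOG_TS));
    }

    public static void main(String[] args) {
        Duration logged = elapsed("2020-04-06 17:19:21,739", "2020-04-06 17:31:01,758");
        Duration limit = Duration.ofSeconds(780); // "test timed out after 780 seconds"
        System.out.println("start -> last log line: " + logged); // ISO-8601 duration
        System.out.println("limit: " + limit.toMinutes() + " min");
        // Unlogged stretch before the timeout, assuming the limit counts from the logged start.
        System.out.println("unlogged remainder: " + limit.minus(logged).toMillis() + " ms");
    }
}
```

On these inputs the last logged line comes about 11 minutes 40 seconds after the start, which would leave roughly 80 seconds of silence before the 13-minute limit fired, if the limit is indeed counted from the logged start.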


