Posted to common-issues@hadoop.apache.org by "Yongjun Zhang (JIRA)" <ji...@apache.org> on 2014/09/02 04:28:21 UTC

[jira] [Commented] (HADOOP-11045) Introducing a tool to detect flaky tests of hadoop jenkins test job

    [ https://issues.apache.org/jira/browse/HADOOP-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117835#comment-14117835 ] 

Yongjun Zhang commented on HADOOP-11045:
----------------------------------------

Example output for job Hadoop-Common-0.23-Build:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build -n 8
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Common-0.23-Build
    THERE ARE 5 builds (out of 5) that have failed tests in the past 8 days, as listed below:

===>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1059/testReport (2014-09-01 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
===>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1058/testReport (2014-08-31 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
===>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
===>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
===>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec

All failed tests <#occurrences: testName>:
    5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
{code}

Example output for Hadoop-Hdfs-trunk:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
    THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:

===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1858/testReport (2014-09-01 04:31:30)
    Failed test: org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsDeleteSnapshot
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1857/testReport (2014-08-31 04:31:30)
    Failed test: org.apache.hadoop.hdfs.web.TestWebHDFSForHA.testFailoverAfterOpen
    Failed test: org.apache.hadoop.hdfs.web.TestWebHDFSForHA.testSecureHAToken
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
    Could not open testReport
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
    Could not open testReport
===>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
    Could not open testReport

All failed tests <#occurrences: testName>:
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    1: org.apache.hadoop.hdfs.web.TestWebHDFSForHA.testFailoverAfterOpen
    1: org.apache.hadoop.hdfs.web.TestWebHDFS.testWebHdfsDeleteSnapshot
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    1: org.apache.hadoop.hdfs.web.TestWebHDFSForHA.testSecureHAToken
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
{code}
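
The summary section in the output above uses one `<count>: <testName>` line per test, so it can be read back into a lookup table when checking a new failure against recent history. The following is only a rough sketch, assuming the output has been captured as a string; `parse_summary` and `seen_before` are hypothetical helper names and are not part of the posted script.

```python
import re

# Matches summary lines such as "    5: org.apache.hadoop...testSnappyCodec".
SUMMARY_LINE = re.compile(r"^\s*(\d+):\s+(\S+)\s*$")


def parse_summary(output):
    """Return {test_name: failure_count} from the tool's summary section.

    Hypothetical helper: assumes the summary starts at the line beginning
    with "All failed tests" and uses "<count>: <testName>" lines after it.
    """
    counts = {}
    in_summary = False
    for line in output.splitlines():
        if line.startswith("All failed tests"):
            in_summary = True
            continue
        if in_summary:
            match = SUMMARY_LINE.match(line)
            if match:
                counts[match.group(2)] = int(match.group(1))
    return counts


def seen_before(test_name, counts):
    """True if the test failed in at least one of the examined runs."""
    return counts.get(test_name, 0) > 0
```

A test that shows up here has failed before and may be flaky; a test that does not show up is more likely a new failure, though a closer look at the specific run is still needed.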

> Introducing a tool to detect flaky tests of hadoop jenkins test job
> -------------------------------------------------------------------
>
>                 Key: HADOOP-11045
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11045
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, tools
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> Filing this jira to introduce a tool that detects flaky tests in hadoop jenkins test jobs. 
> I developed the tool on top of some initial work [~tlipcon] did, and we have found it quite useful. With Todd's agreement, I'd like to push it upstream so all of us can share it (thanks Todd for the initial work and support). I hope you find the tool useful.
> This is a tool for hadoop contributors rather than hadoop users. Thanks [~tedyu] for the advice to put it in the dev-support dir.
> Description of the tool:
> {code}
> #
> # Given a jenkins test job, this script examines all runs of the job done
> # within a specified period of time (number of days prior to the execution
> # time of this script), and reports all failed tests.
> #
> # The output of this script includes a section for each run that has failed
> # tests, with each failed test name listed.
> #
> # More importantly, at the end, it outputs a summary section that lists all
> # failed tests within all examined runs, indicates how many runs each test
> # failed in, and sorts the failed tests by that count.
> #
> # This way, when we see failed tests in a PreCommit build, we can quickly tell
> # whether a failed test is a new failure or has failed before, in which case
> # it may just be a flaky test.
> #
> # Of course, to be 100% sure about the reason for a failed test, a closer look
> # at the failed test for the specific run is necessary.
> #
> {code}
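
The summary step described above (count how many runs each test failed in, then sort by that count) can be sketched roughly as follows. This is an illustration and not the script's actual code: `summarize_failures` and `format_summary` are hypothetical names, the input is assumed to be a mapping from run URL to failed test names that has already been collected, and breaking ties by test name is an assumption of this sketch.

```python
from collections import Counter


def summarize_failures(failed_by_run):
    """Return (test_name, run_count) pairs, most frequent failures first.

    failed_by_run: dict mapping a run URL to the list of test names that
    failed in that run (hypothetical input shape for this sketch).
    """
    counts = Counter()
    for tests in failed_by_run.values():
        # Count each test at most once per run, even if listed twice.
        counts.update(set(tests))
    # Sort by descending run count; ties are ordered by name (assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_summary(summary):
    """Render the pairs in the '<#occurrences: testName>' layout."""
    lines = ["All failed tests <#occurrences: testName>:"]
    lines += ["    %d: %s" % (count, name) for name, count in summary]
    return "\n".join(lines)
```

Counting per run rather than per failure line keeps the number comparable to the "N builds" figure reported for the job.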



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)