Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/07/03 04:44:02 UTC

[jira] [Commented] (DRILL-5155) TestDrillbitResilience unit test is not resilient

    [ https://issues.apache.org/jira/browse/DRILL-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071915#comment-16071915 ] 

Paul Rogers commented on DRILL-5155:
------------------------------------

Additional issues: after enabling the managed version of the external sort, two tests in the suite behave nondeterministically.

When run in the debugger (directly in Eclipse, or via remote debugging when launched from Maven), the tests pass. When run as part of the Drill test suite, or as a standalone test in Maven, the tests fail.

{code}
  TestDrillbitResilience.interruptingBlockedMergingRecordBatch:784->interruptingBlockedFragmentsWaitingForData:814->assertCancelledWithoutException:545->assertStateCompleted:531 Query state is incorrect (expected: CANCELED, actual: FAILED) AND/OR 
Exception thrown: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError
{code}

And

{code}
memoryLeaksWhenCancelled(org.apache.drill.exec.server.TestDrillbitResilience)  Time elapsed: 50.019 sec  <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
{code}

The following test sometimes fails, though it usually passes:

{code}
Running org.apache.drill.exec.server.TestDrillbitResilience#failsAfterMSorterSorting
org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection /172.30.1.212:58698 <--> /172.30.1.212:31013 (user client) closed unexpectedly. Drillbit down?
{code}

In another instance, a test failed because of a *negative* memory leak: the test "leaked" -500 bytes because the starting value was greater than the ending value.
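The arithmetic behind a negative leak is simple. If a leak check reports leaked bytes as allocated-at-end minus allocated-at-start, then any memory that was already allocated when the test began and is freed during the test pushes the result below zero. The sketch below is hypothetical: {{LeakCheckSketch}}, its methods, and the 1,500/1,000-byte figures are illustrative and are not Drill's allocator accounting. The figures are chosen only to reproduce the -500 reported above. The cause it models (memory left over from an earlier test and released during this one) is an assumption, not a confirmed diagnosis.

{code:java}
// Hypothetical leak check; NOT Drill's actual allocator accounting.
class LeakCheckSketch {
    // Leaked bytes = bytes allocated when the test ends minus bytes allocated when it starts.
    static long leakedBytes(long startBytes, long endBytes) {
        return endBytes - startBytes;
    }

    // A negative result is not a leak: it means the starting baseline was wrong,
    // e.g. (assumed) memory held by an earlier test was released during this one.
    static String classify(long leaked) {
        if (leaked > 0) return "leak";
        if (leaked < 0) return "inconsistent baseline";
        return "clean";
    }

    public static void main(String[] args) {
        long start = 1_500; // hypothetical: includes 500 bytes left over from a prior test
        long end = 1_000;   // hypothetical: the leftover 500 bytes were freed mid-test
        long leaked = leakedBytes(start, end);
        System.out.println(leaked + " bytes: " + classify(leaked)); // prints "-500 bytes: inconsistent baseline"
    }
}
{code}

Treating a negative result as a broken baseline, rather than as a pass, makes this kind of cross-test interference visible instead of silently masking a real leak elsewhere.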

The conclusion is that the Drillbit is very fragile: the tests pass, but likely only by luck. Change anything and the tests fail.

> TestDrillbitResilience unit test is not resilient
> -------------------------------------------------
>
>                 Key: DRILL-5155
>                 URL: https://issues.apache.org/jira/browse/DRILL-5155
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> The unit test {{TestDrillbitResilience}} plays rough with a set of Drillbits, forcing a number of error conditions to see whether the Drillbits can recover. The test cases are good, but they interact with each other, making the test as a whole quite fragile: the failure of any one test tends to cause others to fail. When run individually, the tests may pass; when run as a suite, they fail due to cross-interactions.
> Restructure the test to make the tests more independent so that one test does not change the state of the cluster expected by a different test.
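The restructuring proposed in the issue description can be sketched in plain Java. Everything here is hypothetical: {{SimulatedCluster}} stands in for a set of Drillbits and is not Drill's test API, and each test is reduced to a lambda. The sketch only contrasts a shared fixture, where a test that kills a Drillbit changes the cluster the next test sees, with a per-test fixture built fresh from a factory.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for a cluster of Drillbits; NOT Drill's test API.
class SimulatedCluster {
    private final boolean[] up;

    SimulatedCluster(int drillbits) {
        up = new boolean[drillbits];
        Arrays.fill(up, true);
    }

    // Simulates a test that deliberately takes a Drillbit down.
    void killBit(int i) { up[i] = false; }

    int liveBits() {
        int n = 0;
        for (boolean b : up) if (b) n++;
        return n;
    }
}

class IsolationSketch {
    // Shared fixture: each test sees whatever state the previous test left behind.
    static List<Integer> runShared(SimulatedCluster cluster,
                                   List<Consumer<SimulatedCluster>> tests) {
        List<Integer> liveAtStart = new ArrayList<>();
        for (Consumer<SimulatedCluster> test : tests) {
            liveAtStart.add(cluster.liveBits());
            test.accept(cluster);
        }
        return liveAtStart;
    }

    // Isolated fixture: each test gets a freshly built cluster.
    static List<Integer> runIsolated(Supplier<SimulatedCluster> factory,
                                     List<Consumer<SimulatedCluster>> tests) {
        List<Integer> liveAtStart = new ArrayList<>();
        for (Consumer<SimulatedCluster> test : tests) {
            SimulatedCluster cluster = factory.get();
            liveAtStart.add(cluster.liveBits());
            test.accept(cluster);
        }
        return liveAtStart;
    }

    // Two hypothetical tests: the first kills a Drillbit, the second only observes.
    static List<Consumer<SimulatedCluster>> sampleTests() {
        return Arrays.<Consumer<SimulatedCluster>>asList(c -> c.killBit(0), c -> { });
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + runShared(new SimulatedCluster(3), sampleTests()));
        System.out.println("isolated: " + runIsolated(() -> new SimulatedCluster(3), sampleTests()));
    }
}
{code}

Running {{main}} prints {{shared: [3, 2]}} and {{isolated: [3, 3]}}: with a shared fixture, the second test starts with one Drillbit already down. In JUnit 4 terms this roughly corresponds to building the cluster in an {{@Before}} method rather than a static {{@BeforeClass}} method. The trade-off is slower tests in exchange for independence.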


