Posted to issues@drill.apache.org by "Vitalii Diravka (Jira)" <ji...@apache.org> on 2021/11/03 11:32:00 UTC

[jira] [Commented] (DRILL-8030) Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and foreman_runTryEnd failures

    [ https://issues.apache.org/jira/browse/DRILL-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437961#comment-17437961 ] 

Vitalii Diravka commented on DRILL-8030:
----------------------------------------

It also resolves DRILL-3052, DRILL-3167, DRILL-3193, DRILL-3194, DRILL-3967 and DRILL-6228, so they can be closed as well.

> Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and foreman_runTryEnd failures
> --------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8030
>                 URL: https://issues.apache.org/jira/browse/DRILL-8030
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.19.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Minor
>             Fix For: Future
>
>
> DRILL-7908 fixed distributed deadlocks in _TestDrillbitResilience_ and added better timing for simulating the different Drill states. However, several tests still fail intermittently.
>  1. Sometimes the tests report a memory leak:
> {code:java}
> Error:  Failures: 
> Error:  org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults
> Error:    Run 1: TestDrillbitResilience.cancelInMiddleOfFetchingResults:375 We are leaking 3000000 bytes ==> expected: <0> but was: <3000000>
> {code}
> But there is actually no memory leak. It looks like Drill just checks the allocated memory too early, when not all fragments are closed yet, so adding a timeout before the final _countAllocatedMemory_ call fixes the issue.
>  The other reason for the test failures: the queries were not in the expected state before cancellation (for instance, in the STARTING state instead of RUNNING). Adding a timeout before starting the cancellation thread gives the query time to reach the state that the test case expects before cancellation.
>  I no longer see any test failures with NUM_RUNS = 1000 (@RepeatedTest) for the problematic test cases.
>  2. The other test case that failed is:
> {code:java}
> Error:  Failures: 
> Error:    TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
> {code}
> It relates to DRILL-3167. The root cause is the following: in some cases the query completes before the run-try-end exception is injected and thrown in Foreman. The COMPLETED state is acceptable in such cases.
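
The waits described in item 1 (before the final _countAllocatedMemory_ check and before starting the cancellation thread) could also be written as a bounded poll instead of a fixed delay. The following is only a minimal sketch under that assumption, not the change made for DRILL-8030: the class AwaitSketch, the method awaitCondition and the simulated allocatedBytes counter are hypothetical names that do not exist in Drill.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Drill: polls a condition until it holds
// or a deadline passes, instead of sleeping for a fixed amount of time.
final class AwaitSketch {

  /**
   * Returns true if the condition became true within timeoutMillis,
   * false if the deadline passed first.
   */
  public static boolean awaitCondition(BooleanSupplier condition,
                                       long timeoutMillis,
                                       long pollMillis) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides how to report it
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and report the current state.
        Thread.currentThread().interrupt();
        return condition.getAsBoolean();
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Stand-in for the allocator: memory is "released" a little later,
    // the way fragments may still be closing when the test finishes.
    AtomicLong allocatedBytes = new AtomicLong(3_000_000L);
    Thread closer = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      allocatedBytes.set(0);
    });
    closer.start();

    // Wait up to 2 seconds for the simulated memory to drop to zero
    // before performing the final leak check.
    boolean released = awaitCondition(() -> allocatedBytes.get() == 0, 2_000, 10);
    System.out.println("released=" + released + ", allocated=" + allocatedBytes.get());
  }
}
```

The same helper could be given a condition such as "query state is RUNNING" before the cancellation thread is started. A poll returns as soon as the condition holds, so it tends to add less delay than a fixed sleep when the test is repeated many times.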


