Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/02/21 05:01:00 UTC

[jira] [Resolved] (IMPALA-6662) Make stress test resilient to hangs due to client crashes

     [ https://issues.apache.org/jira/browse/IMPALA-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6662.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.2.0

Thanks [~sailesh] for doing most of the work here.

> Make stress test resilient to hangs due to client crashes
> ---------------------------------------------------------
>
>                 Key: IMPALA-6662
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6662
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Sailesh Mukil
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: Impala 3.2.0
>
>
> The concurrent_select.py process starts multiple subprocesses (called query runners) to run the queries. It also starts two threads: the query producer thread and the query consumer thread. The query producer thread adds queries to a query queue, and the query consumer thread pulls queries off the queue and feeds them to the query runners.
> Once a query runner receives a query, it does the following:
> {code:python}
> # Pseudocode. Real code: https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L583-L595
> with _submit_query_lock:
>     increment(num_queries_started)
> run_query()    # One runner crashes here.
> increment(num_queries_finished)
> {code}
> Suppose one of the runners crashes inside run_query(). It then never increments num_queries_finished.
> Another thread, which is supposed to check for memory leaks (but actually doesn't), periodically acquires '_submit_query_lock' and waits for the number of running queries (effectively num_queries_started minus num_queries_finished) to reach 0 before releasing the lock:
> https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L449-L511
> However, in this case the number of running queries will never reach 0, because the crashed query runner exited without incrementing 'num_queries_finished'. As a result, poll_mem_usage() holds the lock indefinitely: no new queries can be submitted, and the stress test never finishes.
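
The hang described in the issue can be reproduced in a few lines. The following is a minimal, self-contained Python sketch, not the actual concurrent_select.py code: StressState, run_query(crash), query_runner() and wait_until_idle() are hypothetical stand-ins, and a single process using threading locks replaces the real multi-process setup. It shows how a runner that dies between the two increments leaves one query permanently counted as running, and how a timeout on the wait (one possible mitigation, assumed here for illustration and not necessarily what the committed fix does) lets the poller release '_submit_query_lock' instead of holding it forever.

```python
import threading
import time


class StressState:
    """Hypothetical stand-in for the counters and lock in concurrent_select.py."""

    def __init__(self):
        self.submit_query_lock = threading.Lock()
        self._counter_lock = threading.Lock()  # guards the two counters
        self.num_queries_started = 0
        self.num_queries_finished = 0

    def increment_started(self):
        with self._counter_lock:
            self.num_queries_started += 1

    def increment_finished(self):
        with self._counter_lock:
            self.num_queries_finished += 1

    def num_running(self):
        with self._counter_lock:
            return self.num_queries_started - self.num_queries_finished


def run_query(crash):
    """Hypothetical query execution; crash=True simulates the runner dying."""
    if crash:
        raise RuntimeError("simulated query runner crash")


def query_runner(state, crash=False):
    # Same shape as the pseudocode in the issue: "started" is bumped under the
    # submit lock, but a crash in run_query() skips the "finished" increment.
    with state.submit_query_lock:
        state.increment_started()
    run_query(crash)
    state.increment_finished()


def wait_until_idle(state, timeout_s, poll_s=0.01):
    """Hold the submit lock until no queries are running (like poll_mem_usage()),
    but give up after timeout_s instead of waiting forever.
    Returns True if the running count drained to zero, False on timeout."""
    deadline = time.monotonic() + timeout_s
    with state.submit_query_lock:
        while state.num_running() > 0:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
    return True


if __name__ == "__main__":
    state = StressState()
    query_runner(state)                       # healthy runner
    print(wait_until_idle(state, 0.1))        # True
    try:
        query_runner(state, crash=True)       # runner "crashes" mid-query
    except RuntimeError:
        pass
    print(wait_until_idle(state, 0.1))        # False: one query never finishes
    print(state.submit_query_lock.locked())   # False: the lock was released
```

A timeout only stops the poller from blocking submissions forever; the leaked count remains, so a complete fix would also need to reconcile the counters, for example by checking whether each runner process is still alive (multiprocessing.Process.is_alive()). Wrapping the final increment in try/finally would not be enough on its own either: after a hard crash such as SIGKILL or a segfault, the Python finally block never runs.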



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)