Posted to issues@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2017/05/24 15:46:04 UTC

[jira] [Resolved] (IMPALA-1972) Queries that take a long time to plan can cause webserver to block other queries

     [ https://issues.apache.org/jira/browse/IMPALA-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bharath v resolved IMPALA-1972.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.8.0

  Author: Bharath Vissapragada <bh...@cloudera.com>
  Date:   2017-05-23 (Tue, 23 May 2017)

  Changed paths:
    M be/src/service/impala-beeswax-server.cc
    M be/src/service/impala-hs2-server.cc
    M be/src/service/impala-http-handler.cc
    M be/src/service/impala-server.cc
    M be/src/service/impala-server.h
    A tests/custom_cluster/test_query_concurrency.py
    M www/query_plan.tmpl

  Log Message:
  -----------
  IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention

Holding client_request_state_map_lock_ and CRS::lock_ together in certain
paths could potentially block the impalad from registering new queries.
The most common occurrence of this is while loading the webpage of a
query while the query planning is still in progress. Since we hold the
CRS::lock_ during planning, it blocks the web page from loading which
in turn blocks incoming queries by holding client_request_state_map_lock_.

This patch makes client_request_state_map_lock_ a terminal lock so that
we don't have interleaving locking with CRS::lock_.
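
To illustrate what "terminal lock" means here, the following is a minimal,
self-contained sketch and not the Impala code. RequestState, RequestRegistry
and RegistrationSurvivesSlowPlanning() are hypothetical names invented for the
example. The map lock is only ever held for the lookup or the insert, so a
thread waiting on a per-request lock can no longer stall registration.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for ClientRequestState (CRS); not Impala's class.
struct RequestState {
  std::mutex lock;  // plays the role of CRS::lock_
};

// Hypothetical registry; map_lock_ plays the role of
// client_request_state_map_lock_.
class RequestRegistry {
 public:
  // Registration needs only the map lock.
  void Register(const std::string& id) {
    std::lock_guard<std::mutex> l(map_lock_);
    map_[id] = std::make_shared<RequestState>();
  }

  // Terminal-lock lookup: map_lock_ is released before returning, so no
  // caller ever waits on RequestState::lock while holding it.
  std::shared_ptr<RequestState> Get(const std::string& id) {
    std::lock_guard<std::mutex> l(map_lock_);
    auto it = map_.find(id);
    if (it == map_.end()) return nullptr;
    return it->second;
  }

 private:
  std::mutex map_lock_;
  std::map<std::string, std::shared_ptr<RequestState>> map_;
};

// Returns true if a new query can be registered while another query is
// "planning" (its state lock is held) and a web handler waits on that lock.
bool RegistrationSurvivesSlowPlanning() {
  RequestRegistry registry;
  registry.Register("slow");
  std::shared_ptr<RequestState> slow = registry.Get("slow");

  // Simulated planning: hold the per-request lock.
  std::unique_lock<std::mutex> planning(slow->lock);

  // Web handler: looks up the entry, then blocks on the per-request lock.
  std::thread web([&registry] {
    std::shared_ptr<RequestState> state = registry.Get("slow");
    std::lock_guard<std::mutex> l(state->lock);
  });

  // Give the web handler time to start waiting, then register a new query.
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  std::future<void> reg = std::async(
      std::launch::async, [&registry] { registry.Register("new"); });
  bool registered =
      reg.wait_for(std::chrono::seconds(2)) == std::future_status::ready;

  planning.unlock();  // planning finishes; the web handler can proceed
  web.join();
  reg.get();
  return registered && registry.Get("new") != nullptr;
}
```

If Get() instead waited on RequestState::lock while still holding map_lock_,
the Register("new") call in this sketch would stall until planning finished.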

Testing: Tested locally by adding a long sleep in
JniFrontend.createExecRequest(); the web UI could still be refreshed and
parallel queries still ran. Also added a custom cluster test that performs
the same sequence of actions by injecting a metadata loading pause.

Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Reviewed-on: http://gerrit.cloudera.org:8080/6707
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins

> Queries that take a long time to plan can cause webserver to block other queries
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-1972
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1972
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.2, Impala 2.3.0
>            Reporter: Henry Robinson
>            Assignee: bharath v
>              Labels: hang, performance
>             Fix For: Impala 2.8.0
>
>
> h3. Summary 
> Trying to get the details of a query through the debug web page while the query is planning will block other queries (and the UI itself), because {{query_exec_state_map_lock_}} will be held for the duration of planning.
> h3. Details
> While a query is planning, it holds onto its query exec state's lock: 
> {code}
>     lock_guard<mutex> l(*(*exec_state)->lock());
>     // register exec state as early as possible so that queries that
>     // take a long time to plan show up, and to handle incoming status
>     // reports before execution starts.
>     RETURN_IF_ERROR(RegisterQuery(session_state, *exec_state));
>     *registered_exec_state = true;
>     // GetExecRequest() does planning
>     RETURN_IF_ERROR((*exec_state)->UpdateQueryStatus(
>         exec_env_->frontend()->GetExecRequest(query_ctx, &result)));
> {code}
> *Query details callback*
> {{ImpalaServer::QuerySummaryCallback}}, which handles {{/query_plan}}, tries to get the same exec state's lock (see the {{true}} argument to {{GetQueryExecState()}}).
> {code}
> shared_ptr<QueryExecState> exec_state = GetQueryExecState(query_id, true);
> {code}
> {{GetQueryExecState()}} holds {{query_exec_state_map_lock_}} while it waits to get the {{QueryExecState}}'s lock:
> {code}
> shared_ptr<ImpalaServer::QueryExecState> ImpalaServer::GetQueryExecState(
>     const TUniqueId& query_id, bool lock) {
>   lock_guard<mutex> l(query_exec_state_map_lock_);
>   QueryExecStateMap::iterator i = query_exec_state_map_.find(query_id);
>   if (i == query_exec_state_map_.end()) {
>     return shared_ptr<QueryExecState>();
>   } else {
>     if (lock) i->second->lock()->lock();
>     return i->second;
>   }
> }
> {code}
> So until planning is finished, no query can get {{query_exec_state_map_lock_}}, which it needs to execute.
> h3. What can we do?
> In the short term, maybe we can add {{TryGetQueryExecState()}} which will indicate if the query exists but its lock can't be taken. 
> Or we might be able to let go of {{query_exec_state_map_lock_}} as soon as we find the entry, and before taking its lock:
> {code}
> shared_ptr<ImpalaServer::QueryExecState> ImpalaServer::GetQueryExecState(
>     const TUniqueId& query_id, bool lock) {
>   shared_ptr<QueryExecState> ret;
>   {
>     lock_guard<mutex> l(query_exec_state_map_lock_);
>     QueryExecStateMap::iterator i = query_exec_state_map_.find(query_id);
>     if (i == query_exec_state_map_.end()) {
>       return shared_ptr<QueryExecState>();
>     } else {
>       ret = i->second;
>     }
>   } // give up query_exec_state_map_lock_
>   if (lock) ret->lock()->lock();
>   return ret;
> }
> {code}
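
The short-term TryGetQueryExecState() idea from the description could look
roughly like the sketch below. It is self-contained and not taken from Impala:
ExecState, ExecStateMap, LookupResult and ProbeFromOtherThread() are
hypothetical names, and the real signature may well differ. The point is that
the per-query lock is only attempted with try_lock, so the map lock is never
held while waiting.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical result type: distinguishes "no such query" from
// "query exists but its lock is currently held" (e.g. during planning).
enum class LookupResult { kNotFound, kBusy, kLocked };

// Hypothetical stand-in for QueryExecState.
struct ExecState {
  std::mutex lock;
};

class ExecStateMap {
 public:
  void Register(const std::string& id) {
    std::lock_guard<std::mutex> l(map_lock_);
    map_[id] = std::make_shared<ExecState>();
  }

  // Never blocks on the per-query lock. On kLocked, *out refers to the state
  // and *guard owns its lock; otherwise both are left untouched.
  // Note: std::mutex::try_lock is allowed to fail spuriously, so kBusy means
  // "could not take the lock right now", not proof that planning is running.
  LookupResult TryGet(const std::string& id, std::shared_ptr<ExecState>* out,
                      std::unique_lock<std::mutex>* guard) {
    std::lock_guard<std::mutex> l(map_lock_);
    auto it = map_.find(id);
    if (it == map_.end()) return LookupResult::kNotFound;
    std::unique_lock<std::mutex> attempt(it->second->lock, std::try_to_lock);
    if (!attempt.owns_lock()) return LookupResult::kBusy;
    *out = it->second;
    *guard = std::move(attempt);
    return LookupResult::kLocked;
  }

 private:
  std::mutex map_lock_;  // plays the role of query_exec_state_map_lock_
  std::map<std::string, std::shared_ptr<ExecState>> map_;
};

// Usage sketch: probe an entry from a second thread, as a web handler would
// while another thread may be holding the entry's lock.
LookupResult ProbeFromOtherThread(ExecStateMap& m, const std::string& id) {
  return std::async(std::launch::async, [&m, &id] {
           std::shared_ptr<ExecState> state;
           std::unique_lock<std::mutex> guard;
           return m.TryGet(id, &state, &guard);
         })
      .get();
}
```

With this shape, a web handler that gets kBusy can render a "query is still
planning" message instead of waiting. The trade-off is that callers must
handle a third outcome.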
