Posted to issues-all@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2018/10/24 18:52:00 UTC

[jira] [Commented] (IMPALA-7727) failed compute stats child query status no longer propagates to parent query

    [ https://issues.apache.org/jira/browse/IMPALA-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662674#comment-16662674 ] 

bharath v commented on IMPALA-7727:
-----------------------------------

TL;DR It's funny how this surfaced now. It was broken for a while and the commit in [IMPALA-7420|https://github.com/apache/impala/commit/4845f98beecc90775f58e8e3eb72721e02252f18] exposed it. cc: [~tarmstrong]

Long story:
========

So, child queries are executed through the HiveServer2 interface. The problematic calls are highlighted below.
{noformat}
Status ChildQuery::ExecAndFetch() {
  ...
  parent_server_->ExecuteStatement(exec_stmt_resp, exec_stmt_req);
  ...
  do {
    RETURN_IF_ERROR(IsCancelled());
    // The fetch response embeds an HS2 TStatus, not an impala::Status.
    parent_server_->FetchResults(fetch_resp_, fetch_req);  <======
    // Implicit TStatus -> impala::Status conversion (operator= shown below).
    status = fetch_resp_.status;  <======
  } while (status.ok() && fetch_resp_.hasMoreRows);
  RETURN_IF_ERROR(IsCancelled());
{noformat}
The issue here is that {{FetchResults()}} returns a {{TFetchResultsResp}}, which contains an embedded {{apache::hive::service::cli::thrift::TStatus}}, and in case of errors {{HS2_RETURN_ERROR}} swallows the original status code and just converts it to the generic {{ERROR_STATUS}}.
{noformat}
// HiveServer2 error returning macro
#define HS2_RETURN_ERROR(return_val, error_msg, error_state) \
  do { \
    return_val.status.__set_statusCode(thrift::TStatusCode::ERROR_STATUS); \
    return_val.status.__set_errorMessage((error_msg)); \
    return_val.status.__set_sqlState((error_state)); \
    return; \
  } while (false)

void ImpalaServer::FetchResults(TFetchResultsResp& return_val,
    const TFetchResultsReq& request) {
  .............
  // FetchInternal takes care of extending the session
  Status status = FetchInternal(query_id, request.maxRows, fetch_first, &return_val);
  if (!status.ok()) {
    if (status.IsRecoverableError()) {
      DCHECK(fetch_first);
    } else {
      discard_result(UnregisterQuery(query_id, false, &status));
    }
    HS2_RETURN_ERROR(return_val, status.GetDetail(), SQLSTATE_GENERAL_ERROR);  <======
  }
  return_val.status.__set_statusCode(thrift::TStatusCode::SUCCESS_STATUS);
}
{noformat}
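To make the loss concrete, here's a minimal standalone sketch of what the macro does to an arbitrary failed status. The {{TStatus}} struct and enum below are simplified stand-ins for the Thrift-generated HS2 types, not the real headers:
{noformat}
#include <iostream>
#include <string>

// Simplified stand-ins for the Thrift-generated HS2 types (illustration only).
enum TStatusCode {
  SUCCESS_STATUS = 0,
  SUCCESS_WITH_INFO_STATUS = 1,
  STILL_EXECUTING_STATUS = 2,
  ERROR_STATUS = 3,
  INVALID_HANDLE_STATUS = 4
};

struct TStatus {
  TStatusCode statusCode;
  std::string errorMessage;
  std::string sqlState;
};

int main() {
  // Suppose FetchInternal() failed with some specific Impala error code
  // (e.g. a mem-limit rejection) plus a detailed message.
  std::string detail = "Rejected query from pool default-pool: ...";

  // HS2_RETURN_ERROR collapses *every* failure to the same generic code:
  TStatus wire_status;
  wire_status.statusCode = ERROR_STATUS;  // the original error code is gone
  wire_status.errorMessage = detail;      // only the message text survives
  wire_status.sqlState = "HY000";         // SQLSTATE_GENERAL_ERROR

  // The response can no longer distinguish a mem-limit rejection from any
  // other failure; every caller sees ERROR_STATUS (= 3).
  std::cout << wire_status.statusCode << ": "
            << wire_status.errorMessage << "\n";
  return 0;
}
{noformat}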
This is a problem for child-query execution, however, since we then try to convert the {{apache::hive::service::cli::thrift::TStatus}} back to an {{impala::Status}}. The problematic code is below:
{noformat}
Status& Status::operator=(
    const apache::hive::service::cli::thrift::TStatus& hs2_status) {
  delete msg_;
  if (hs2_status.statusCode
        == apache::hive::service::cli::thrift::TStatusCode::SUCCESS_STATUS) {
    msg_ = NULL;
  } else {
    msg_ = new ErrorMsg(
        static_cast<TErrorCode::type>(hs2_status.statusCode), hs2_status.errorMessage);  <========
  }
  ...
{noformat}
The {{hs2_status.statusCode}} is reinterpreted as an Impala error code, and the {{ErrorMsg}} constructor then tries to substitute {{hs2_status.errorMessage}} into the format string registered for that code.

{{ERROR_STATUS}} (enum value = 3) [maps|https://github.com/apache/impala/blob/15e8ce4f273945ce548fe677ee0140dea8068e6d/common/thrift/generate_error_codes.py#L35] to {{CANCELLED}}, whose error string IMPALA-7420 changed to the hard-coded "Cancelled". With no placeholder left in that string, the substitution of the actual error message into the {{impala::Status}} fails and only "Cancelled" survives.
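Putting the two halves together, here's a standalone sketch of the round trip. The enums are again simplified stand-ins, and {{FormatError()}} is a hypothetical substitute for the generated error-string lookup:
{noformat}
#include <iostream>
#include <string>

// Simplified stand-ins; the real enums are generated from Thrift and from
// generate_error_codes.py respectively.
enum TStatusCode { SUCCESS_STATUS = 0, /* ... */ ERROR_STATUS = 3 };
enum TErrorCode { OK = 0, UNUSED = 1, GENERAL = 2, CANCELLED = 3 /* ... */ };

// Hypothetical stand-in for the generated error-string lookup. After
// IMPALA-7420, CANCELLED's string is hard-coded rather than a placeholder.
std::string FormatError(TErrorCode code, const std::string& arg) {
  switch (code) {
    case GENERAL:   return arg;          // "$0": the argument is substituted
    case CANCELLED: return "Cancelled";  // post-IMPALA-7420: argument dropped
    default:        return arg;
  }
}

int main() {
  // What the child query receives after HS2_RETURN_ERROR ran on the server:
  TStatusCode wire_code = ERROR_STATUS;  // == 3
  std::string wire_msg = "Rejected query from pool default-pool: ...";

  // Status::operator= reinterprets the HS2 status code as an Impala error
  // code; 3 happens to be CANCELLED on the Impala side.
  TErrorCode impala_code = static_cast<TErrorCode>(wire_code);

  // The detailed message is passed as the substitution argument, but
  // CANCELLED's string no longer has a placeholder to receive it:
  std::cout << FormatError(impala_code, wire_msg) << "\n";  // prints "Cancelled"
  return 0;
}
{noformat}
That is exactly the "WARNINGS: Cancelled" in the repro below, even though the real failure was an admission-control rejection.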

> failed compute stats child query status no longer propagates to parent query
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-7727
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7727
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Michael Brown
>            Assignee: bharath v
>            Priority: Blocker
>              Labels: regression, stress
>         Attachments: 2.12-child-profile.txt, 2.12-compute-stats-profile.txt, 3.1-child-profile.txt, 3.1-compute-stats-profile.txt
>
>
> [~bharathv] since you have been dealing with stats, please take a look. Otherwise feel free to reassign. This bug prevents the stress test from running with compute stats statements. It triggers in non-stressful conditions, too.
> {noformat}
> $ impala-shell.sh -d tpch_parquet
> [localhost:21000] tpch_parquet> set mem_limit=24m;
> MEM_LIMIT set to 24m
> [localhost:21000] tpch_parquet> compute stats customer;
> Query: compute stats customer
> WARNINGS: Cancelled
> [localhost:21000] tpch_parquet>
> {noformat}
> The problem is that the child query didn't have enough memory to run, but this error didn't propagate up.
> {noformat}
> Query (id=384d37fb2826a962:f4b1035700000000):
>   DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance.
>   Summary:
>     Session ID: d343e1026d497bb0:7e87b342c73c108d
>     Session Type: BEESWAX
>     Start Time: 2018-10-18 15:16:34.036363000
>     End Time: 2018-10-18 15:16:34.177711000
>     Query Type: QUERY
>     Query State: EXCEPTION
>     Query Status: Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query for buffer reservations. Memory reservation needed given the current plan: 128.00 KB. Adjust either the mem_limit or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be at least 32.12 MB. Note that changing the mem_limit may also change the plan. See the query profile for more information about the per-node memory requirements.
>     Impala Version: impalad version 3.1.0-SNAPSHOT DEBUG (build 9f5c5e6df03824cba292fe5a619153462c11669c)
>     User: mikeb
>     Connected User: mikeb
>     Delegated User: 
>     Network Address: ::ffff:127.0.0.1:46458
>     Default Db: tpch_parquet
>     Sql Statement: SELECT COUNT(*) FROM customer
>     Coordinator: mikeb-ub162:22000
>     Query Options (set by configuration): MEM_LIMIT=25165824,MT_DOP=4
>     Query Options (set by configuration and planner): MEM_LIMIT=25165824,NUM_SCANNER_THREADS=1,MT_DOP=4
>     Plan: 
> ----------------
> Max Per-Host Resource Reservation: Memory=512.00KB Threads=5
> Per-Host Resource Estimates: Memory=146MB
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 03(GETNEXT), 01(OPEN)
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=136.00MB mem-reservation=512.00KB thread-reservation=4
> 01:AGGREGATE
> |  output: sum_init_zero(tpch_parquet.customer.parquet-stats: num_rows)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT), 00(OPEN)
> |
> 00:SCAN HDFS [tpch_parquet.customer, RANDOM]
>    partitions=1/1 files=1 size=12.34MB
>    stored statistics:
>      table: rows=150000 size=12.34MB
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=150000
>    mem-estimate=24.00MB mem-reservation=128.00KB thread-reservation=0
>    tuple-ids=0 row-size=8B cardinality=150000
>    in pipelines: 00(GETNEXT)
> ----------------
>     Estimated Per-Host Mem: 153092096
>     Per Host Min Memory Reservation: mikeb-ub162:22000(0) mikeb-ub162:22001(128.00 KB)
>     Request Pool: default-pool
>     Admission result: Rejected
>     Query Compilation: 126.903ms
>        - Metadata of all 1 tables cached: 5.484ms (5.484ms)
>        - Analysis finished: 16.104ms (10.619ms)
>        - Value transfer graph computed: 32.646ms (16.542ms)
>        - Single node plan created: 61.289ms (28.642ms)
>        - Runtime filters computed: 66.148ms (4.859ms)
>        - Distributed plan created: 66.428ms (280.057us)
>        - Parallel plans created: 67.866ms (1.437ms)
>        - Planning finished: 126.903ms (59.037ms)
>     Query Timeline: 140.000ms
>        - Query submitted: 0.000ns (0.000ns)
>        - Planning finished: 140.000ms (140.000ms)
>        - Submit for admission: 140.000ms (0.000ns)
>        - Completed admission: 140.000ms (0.000ns)
>        - Rows available: 140.000ms (0.000ns)
>        - Unregister query: 140.000ms (0.000ns)
>      - ComputeScanRangeAssignmentTimer: 0.000ns
>     Frontend:
>   ImpalaServer:
>      - ClientFetchWaitTimer: 0.000ns
>      - RowMaterializationTimer: 0.000ns
> {noformat}


