Posted to issues-all@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2018/10/24 18:52:00 UTC
[jira] [Commented] (IMPALA-7727) failed compute stats child query
status no longer propagates to parent query
[ https://issues.apache.org/jira/browse/IMPALA-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662674#comment-16662674 ]
bharath v commented on IMPALA-7727:
-----------------------------------
TL;DR It's funny how this surfaced now. It was broken for a while and the commit in [IMPALA-7420|https://github.com/apache/impala/commit/4845f98beecc90775f58e8e3eb72721e02252f18] exposed it. cc: [~tarmstrong]
Long story:
========
So, child query execution uses the HiveServer2 interface to execute the queries. The problematic calls are highlighted below.
{noformat}
Status ChildQuery::ExecAndFetch() {
  ...........
  parent_server_->ExecuteStatement(exec_stmt_resp, exec_stmt_req);
  ............
  do {
    RETURN_IF_ERROR(IsCancelled());
    parent_server_->FetchResults(fetch_resp_, fetch_req);  <======
    status = fetch_resp_.status;  <=====
  } while (status.ok() && fetch_resp_.hasMoreRows);
  RETURN_IF_ERROR(IsCancelled());
{noformat}
The issue here is that {{FetchResults()}} returns a {{TFetchResultsResp}} which contains an embedded {{apache::hive::service::cli::thrift::TStatus}}, and in case of errors, {{HS2_RETURN_ERROR}} swallows the original status code and converts it to a generic {{ERROR_STATUS}}.
{noformat}
// HiveServer2 error returning macro
#define HS2_RETURN_ERROR(return_val, error_msg, error_state) \
  do { \
    return_val.status.__set_statusCode(thrift::TStatusCode::ERROR_STATUS); \
    return_val.status.__set_errorMessage((error_msg)); \
    return_val.status.__set_sqlState((error_state)); \
    return; \
  } while (false)

void ImpalaServer::FetchResults(TFetchResultsResp& return_val,
    const TFetchResultsReq& request) {
  .............
  // FetchInternal takes care of extending the session
  Status status = FetchInternal(query_id, request.maxRows, fetch_first, &return_val);
  if (!status.ok()) {
    if (status.IsRecoverableError()) {
      DCHECK(fetch_first);
    } else {
      discard_result(UnregisterQuery(query_id, false, &status));
    }
    HS2_RETURN_ERROR(return_val, status.GetDetail(), SQLSTATE_GENERAL_ERROR);  <======
  }
  return_val.status.__set_statusCode(thrift::TStatusCode::SUCCESS_STATUS);
}
{noformat}
This is a problem for child-query execution, however, since we then try to convert the {{apache::hive::service::cli::thrift::TStatus}} back to an {{impala::Status}}. The problematic code is below:
{noformat}
Status& Status::operator=(
    const apache::hive::service::cli::thrift::TStatus& hs2_status) {
  delete msg_;
  if (hs2_status.statusCode
      == apache::hive::service::cli::thrift::TStatusCode::SUCCESS_STATUS) {
    msg_ = NULL;
  } else {
    msg_ = new ErrorMsg(
        static_cast<TErrorCode::type>(hs2_status.statusCode), hs2_status.errorMessage);  <========
  }
{noformat}
The {{hs2_status.statusCode}} is cast directly to an Impala error code, and the {{ErrorMsg}} constructor substitutes the format string mapped to that code.
{{ERROR_STATUS}} (enum value = 3) [maps|https://github.com/apache/impala/blob/15e8ce4f273945ce548fe677ee0140dea8068e6d/common/thrift/generate_error_codes.py#L35] to {{CANCELLED}}, whose message was changed by IMPALA-7420 to the hard-coded string "Cancelled", so the {{impala::Status}} substitution drops the actual error message.
> failed compute stats child query status no longer propagates to parent query
> ----------------------------------------------------------------------------
>
> Key: IMPALA-7727
> URL: https://issues.apache.org/jira/browse/IMPALA-7727
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Michael Brown
> Assignee: bharath v
> Priority: Blocker
> Labels: regression, stress
> Attachments: 2.12-child-profile.txt, 2.12-compute-stats-profile.txt, 3.1-child-profile.txt, 3.1-compute-stats-profile.txt
>
>
> [~bharathv] since you have been dealing with stats, please take a look. Otherwise feel free to reassign. This bug prevents the stress test from running with compute stats statements. It triggers in non-stressful conditions, too.
> {noformat}
> $ impala-shell.sh -d tpch_parquet
> [localhost:21000] tpch_parquet> set mem_limit=24m;
> MEM_LIMIT set to 24m
> [localhost:21000] tpch_parquet> compute stats customer;
> Query: compute stats customer
> WARNINGS: Cancelled
> [localhost:21000] tpch_parquet>
> {noformat}
> The problem is that the child query didn't have enough memory to run, but this error didn't propagate up.
> {noformat}
> Query (id=384d37fb2826a962:f4b1035700000000):
> DEBUG MODE WARNING: Query profile created while running a DEBUG build of Impala. Use RELEASE builds to measure query performance.
> Summary:
> Session ID: d343e1026d497bb0:7e87b342c73c108d
> Session Type: BEESWAX
> Start Time: 2018-10-18 15:16:34.036363000
> End Time: 2018-10-18 15:16:34.177711000
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: Rejected query from pool default-pool: minimum memory reservation is greater than memory available to the query for buffer reservations. Memory reservation needed given the current plan: 128.00 KB. Adjust either the mem_limit or the pool config (max-query-mem-limit, min-query-mem-limit) for the query to allow the query memory limit to be at least 32.12 MB. Note that changing the mem_limit may also change the plan. See the query profile for more information about the per-node memory requirements.
> Impala Version: impalad version 3.1.0-SNAPSHOT DEBUG (build 9f5c5e6df03824cba292fe5a619153462c11669c)
> User: mikeb
> Connected User: mikeb
> Delegated User:
> Network Address: ::ffff:127.0.0.1:46458
> Default Db: tpch_parquet
> Sql Statement: SELECT COUNT(*) FROM customer
> Coordinator: mikeb-ub162:22000
> Query Options (set by configuration): MEM_LIMIT=25165824,MT_DOP=4
> Query Options (set by configuration and planner): MEM_LIMIT=25165824,NUM_SCANNER_THREADS=1,MT_DOP=4
> Plan:
> ----------------
> Max Per-Host Resource Reservation: Memory=512.00KB Threads=5
> Per-Host Resource Estimates: Memory=146MB
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> | Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> | mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 03:AGGREGATE [FINALIZE]
> | output: count:merge(*)
> | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
> | tuple-ids=1 row-size=8B cardinality=1
> | in pipelines: 03(GETNEXT), 01(OPEN)
> |
> 02:EXCHANGE [UNPARTITIONED]
> | mem-estimate=0B mem-reservation=0B thread-reservation=0
> | tuple-ids=1 row-size=8B cardinality=1
> | in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=136.00MB mem-reservation=512.00KB thread-reservation=4
> 01:AGGREGATE
> | output: sum_init_zero(tpch_parquet.customer.parquet-stats: num_rows)
> | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
> | tuple-ids=1 row-size=8B cardinality=1
> | in pipelines: 01(GETNEXT), 00(OPEN)
> |
> 00:SCAN HDFS [tpch_parquet.customer, RANDOM]
> partitions=1/1 files=1 size=12.34MB
> stored statistics:
> table: rows=150000 size=12.34MB
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=150000
> mem-estimate=24.00MB mem-reservation=128.00KB thread-reservation=0
> tuple-ids=0 row-size=8B cardinality=150000
> in pipelines: 00(GETNEXT)
> ----------------
> Estimated Per-Host Mem: 153092096
> Per Host Min Memory Reservation: mikeb-ub162:22000(0) mikeb-ub162:22001(128.00 KB)
> Request Pool: default-pool
> Admission result: Rejected
> Query Compilation: 126.903ms
> - Metadata of all 1 tables cached: 5.484ms (5.484ms)
> - Analysis finished: 16.104ms (10.619ms)
> - Value transfer graph computed: 32.646ms (16.542ms)
> - Single node plan created: 61.289ms (28.642ms)
> - Runtime filters computed: 66.148ms (4.859ms)
> - Distributed plan created: 66.428ms (280.057us)
> - Parallel plans created: 67.866ms (1.437ms)
> - Planning finished: 126.903ms (59.037ms)
> Query Timeline: 140.000ms
> - Query submitted: 0.000ns (0.000ns)
> - Planning finished: 140.000ms (140.000ms)
> - Submit for admission: 140.000ms (0.000ns)
> - Completed admission: 140.000ms (0.000ns)
> - Rows available: 140.000ms (0.000ns)
> - Unregister query: 140.000ms (0.000ns)
> - ComputeScanRangeAssignmentTimer: 0.000ns
> Frontend:
> ImpalaServer:
> - ClientFetchWaitTimer: 0.000ns
> - RowMaterializationTimer: 0.000ns
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)