Posted to issues@impala.apache.org by "Jr (JIRA)" <ji...@apache.org> on 2017/04/24 13:11:04 UTC
[jira] [Created] (IMPALA-5249) JDBC for Impala cannot cancel queries successfully - shown as "session closed"
Jr created IMPALA-5249:
--------------------------
Summary: JDBC for Impala cannot cancel queries successfully - shown as "session closed"
Key: IMPALA-5249
URL: https://issues.apache.org/jira/browse/IMPALA-5249
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 2.2
Environment: Cloudera JDBC 2.5.34, Impala 2.2
Reporter: Jr
We are using Cloudera JDBC 2.5.34 to connect to Impala 2.2.
We have had reports that if a user cancels a long-running query, it continues to execute and is not cancelled properly. This means resources are tied up for a query that is no longer required.
Although queries appear to be cancelled, they often run for as long as queries that complete normally.
Why might this be happening? How can we use the JDBC driver to cancel the query immediately and free its resources?
The query status for cancelled queries is "Session closed". I noticed that if I cancel a query in Hue, the query status is "Cancelled".
I have investigated and our code is calling Statement.cancel() and close() as expected.
Is this related to row batches? Could it be related to IMPALA-1869? Or is this just typical of cancellation through JDBC (https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxtblsh.htm#JJDBC28983 )? Is the Query Status "Session closed" shown simply because no status is specified via JDBC?
How is Statement.cancel() implemented in the JDBC driver, and how does Impala handle it? Is there a specification for the expected behaviour?
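For reference, the call pattern we rely on looks roughly like the sketch below. This is not our production code: the class and method names (CancelSketch, cancelAndClose) are made up for illustration, and it uses only the standard java.sql.Statement API, so it says nothing about what the Cloudera driver does internally. Note that cancel() is normally called from a different thread than the one blocked in executeQuery().

```java
import java.sql.SQLException;
import java.sql.Statement;

final class CancelSketch {
    /**
     * Asks the driver to cancel whatever the statement is running, then
     * closes the statement. Returns true if cancel() returned normally.
     * Hypothetical helper, shown only to illustrate the call order.
     */
    static boolean cancelAndClose(Statement stmt) {
        boolean cancelRequested = false;
        try {
            stmt.cancel(); // request cancellation of the running operation
            cancelRequested = true;
        } catch (SQLException e) {
            // The driver may not support cancel(), or the statement may
            // already be closed; either way we still try to close below.
        } finally {
            try {
                stmt.close(); // release client-side resources
            } catch (SQLException ignored) {
                // Nothing more can be done on the client at this point.
            }
        }
        return cancelRequested;
    }
}
```

A true return value only means that cancel() did not throw on the client. It does not show that Impala actually stopped executing the query, which is the behaviour in question here.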
Below are some details of a query that does not appear to have been cancelled properly:
{code}
Query (id=f9412bbfb5615592:91bde18d434babb5)
Summary
Session ID: f2444cec28c22d64:5a891aadd50a30aa
Session Type: HIVESERVER2
HiveServer2 Protocol Version: V6
Start Time: 2017-04-20 19:33:38.586908000
End Time: 2017-04-20 19:34:40.673970000
Query Type: QUERY
Query State: EXCEPTION
Query Status: Session closed
Impala Version: impalad version 2.2.0-cdh5.4.5 RELEASE (build 4a81c1d04c39961ef14ff6121d543dd96ef60e6e)
User: test-user
Connected User: impala@CLOUDERA
Delegated User: test-user
Network Address: 10.25.21.26:47390
Default Db: test_db
Sql Statement: Select <LONG LIST OF FIELDS> from test_db.test_table ORDER BY ts DESC LIMIT 50
Plan:
----------------
Estimated Per-Host Requirements: Memory=5.30GB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
test_db.test_table
F01:PLAN FRAGMENT [UNPARTITIONED]
02:MERGING-EXCHANGE [UNPARTITIONED]
order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
limit: 50
hosts=4 per-host-mem=unavailable
tuple-ids=1 row-size=1.46KB cardinality=0
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
01:TOP-N [LIMIT=50]
| order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
| hosts=4 per-host-mem=0B
| tuple-ids=1 row-size=1.46KB cardinality=0
|
00:SCAN HDFS [test_db.test_table, RANDOM]
partitions=6/45 files=50 size=1.92GB
table stats: 0 rows total (6 partition(s) missing stats)
column stats: all
hosts=4 per-host-mem=5.30GB
tuple-ids=0 row-size=1.46KB cardinality=0
----------------
Estimated Per-Host Mem: 5687476224
Estimated Per-Host VCores: 1
Tables Missing Stats: test_db.test_table
Request Pool: default-pool
ExecSummary:
Operator             #Hosts  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------------
02:MERGING-EXCHANGE  1       226.179us  226.179us  0       0           8.00 KB    -1.00 B        UNPARTITIONED
01:TOP-N             4       58s272ms   1m1s       200     0           1.07 GB    0
00:SCAN HDFS         4       95.567ms   107.109ms  24.38M  0           805.44 MB  5.30 GB        test_db.test_table
Planner Timeline
Analysis finished: 9543737
Equivalence classes computed: 15108936
Single node plan created: 18757579
Distributed plan created: 22242733
Lineage info computed: 31840781
Planning finished: 35370463
Query Timeline
Start execution: 60574
Planning finished: 39487764
Ready to start remote fragments: 43883334
Remote fragments started: 83548681
Rows available: 62070099362
Cancelled: 62085704664
Unregister query: 62087032514
{code}
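To make the Query Timeline easier to read, the gaps between its last events can be worked out as in the sketch below. It assumes the timeline values are nanoseconds since the query started. That is an assumption on my part, though it agrees with the roughly 62.09 s between Start Time and End Time in the profile. The class and method names are illustrative only.

```java
final class TimelineGaps {
    // Values copied from the Query Timeline in the profile above.
    // Assumed unit: nanoseconds since the query started.
    static final long START_EXECUTION_NS = 60_574L;
    static final long ROWS_AVAILABLE_NS = 62_070_099_362L;
    static final long CANCELLED_NS = 62_085_704_664L;
    static final long UNREGISTER_NS = 62_087_032_514L;

    /** Elapsed time between two timeline events, in milliseconds. */
    static double millisBetween(long fromNs, long toNs) {
        return (toNs - fromNs) / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("Start execution -> Cancelled: %.3f ms%n",
                millisBetween(START_EXECUTION_NS, CANCELLED_NS));
        System.out.printf("Rows available  -> Cancelled: %.3f ms%n",
                millisBetween(ROWS_AVAILABLE_NS, CANCELLED_NS));
        System.out.printf("Cancelled       -> Unregister: %.3f ms%n",
                millisBetween(CANCELLED_NS, UNREGISTER_NS));
    }
}
```

Under that assumption, the Cancelled event is recorded about 62.1 s after execution started and only about 15.6 ms after Rows available. That seems consistent with the query running to the point where it had results ready before the cancellation took effect.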
There are a few fields that look like this:
{code}
concat_ws('.', from_unixtime(unix_timestamp(start_ts), 'dd MMM yyyy HH:mm:ss'), cast(extract(start_ts, 'millisecond') as string)) AS "start_ts"
{code}
This is what we expect to see (taken from a query cancelled in Hue):
{code}
Query Type: QUERY
Query State: EXCEPTION
Query Status: Cancelled
{code}