Posted to issues@impala.apache.org by "Jr (JIRA)" <ji...@apache.org> on 2017/04/24 13:11:04 UTC

[jira] [Created] (IMPALA-5249) JDBC for Impala cannot cancel queries successfully - shown as "session closed"

Jr created IMPALA-5249:
--------------------------

             Summary: JDBC for Impala cannot cancel queries successfully - shown as "session closed"
                 Key: IMPALA-5249
                 URL: https://issues.apache.org/jira/browse/IMPALA-5249
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 2.2
         Environment: Cloudera JDBC 2.5.34, Impala 2.2
            Reporter: Jr


We are using Cloudera JDBC 2.5.34 to connect to Impala 2.2.

We have had reports that when a user cancels a long-running query, it continues to execute rather than stopping promptly, so resources stay tied up for a query that is no longer required.

Although these queries appear to be cancelled, they often run for the same length of time as queries that complete.

Why might this be happening? How can we use the JDBC driver to cancel a query immediately and free its resources?

The Query Status for cancelled queries is "Session closed". I noticed that if I cancel the same query in Hue, the Query Status is "Cancelled".

I have investigated, and our code calls Statement.cancel() followed by close(), as expected.
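For reference, a minimal sketch of the pattern we follow is below. This is not our production code: the watchdog/latch wiring and method names are illustrative, and there is no Impala connection here, only the cancel-then-close ordering. Per the java.sql.Statement javadoc, cancel() may be invoked by one thread against a Statement executing on another thread.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CancelSketch {

    // Run the query on the calling thread; cancel it from a watchdog thread
    // when the user asks. The latch stands in for whatever signals a user
    // cancellation in the real application (hypothetical wiring).
    static void runWithCancel(Connection conn, String sql,
                              CountDownLatch userCancelled)
            throws SQLException {
        ExecutorService watchdog = Executors.newSingleThreadExecutor();
        try (Statement stmt = conn.createStatement()) {
            watchdog.submit(() -> {
                try {
                    userCancelled.await();  // block until the user cancels
                    stmt.cancel();          // should surface as "Cancelled"
                } catch (InterruptedException | SQLException ignored) {
                }
            });
            stmt.execute(sql);              // may throw if cancel() lands first
        } finally {
            watchdog.shutdownNow();
            // stmt.close() runs via try-with-resources. If close()/session
            // teardown races ahead of the cancel reaching the server, the
            // profile may show "Session closed" rather than "Cancelled".
        }
    }

    public static void main(String[] args) {
        // No Impala instance is available here, so just confirm the
        // class compiles and loads.
        System.out.println("cancel-sketch-defined");
    }
}
```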

Is this related to row batches? Could it be related to IMPALA-1869? Or is this simply typical of cancellation through JDBC (https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxtblsh.htm#JJDBC28983)? Is the Query Status "Session closed" simply because no cancellation status is reported via JDBC?

How is Statement.cancel() implemented in the JDBC driver, and how does Impala handle it? Is there a specification for the expected behaviour?

Below are some details of a query that does not appear to have been cancelled properly:

{code}
Query (id=f9412bbfb5615592:91bde18d434babb5)
  Summary
    Session ID: f2444cec28c22d64:5a891aadd50a30aa
    Session Type: HIVESERVER2
    HiveServer2 Protocol Version: V6
    Start Time: 2017-04-20 19:33:38.586908000
    End Time: 2017-04-20 19:34:40.673970000
    Query Type: QUERY
    Query State: EXCEPTION
    Query Status: Session closed
	
	
   Impala Version: impalad version 2.2.0-cdh5.4.5 RELEASE (build 4a81c1d04c39961ef14ff6121d543dd96ef60e6e)
    User: test-user
    Connected User: impala@CLOUDERA
    Delegated User: test-user
    Network Address: 10.25.21.26:47390
    Default Db: test_db
    Sql Statement: Select <LONG LIST OF FIELDS> from test_db.test_table ORDER BY ts  DESC  LIMIT 50	
	
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=5.30GB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
test_db.test_table

F01:PLAN FRAGMENT [UNPARTITIONED]
  02:MERGING-EXCHANGE [UNPARTITIONED]
     order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
     limit: 50
     hosts=4 per-host-mem=unavailable
     tuple-ids=1 row-size=1.46KB cardinality=0

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
  01:TOP-N [LIMIT=50]
  |  order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
  |  hosts=4 per-host-mem=0B
  |  tuple-ids=1 row-size=1.46KB cardinality=0
  |
  00:SCAN HDFS [test_db.test_table, RANDOM]
     partitions=6/45 files=50 size=1.92GB
     table stats: 0 rows total (6 partition(s) missing stats)
     column stats: all
     hosts=4 per-host-mem=5.30GB
     tuple-ids=0 row-size=1.46KB cardinality=0
----------------
    Estimated Per-Host Mem: 5687476224
    Estimated Per-Host VCores: 1
    Tables Missing Stats: test_db.test_table
    Request Pool: default-pool
    ExecSummary: 
Operator              #Hosts   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                       
-------------------------------------------------------------------------------------------------------------------------------
02:MERGING-EXCHANGE        1  226.179us  226.179us       0           0    8.00 KB        -1.00 B  UNPARTITIONED                
01:TOP-N                   4   58s272ms       1m1s     200           0    1.07 GB              0                               
00:SCAN HDFS               4   95.567ms  107.109ms  24.38M           0  805.44 MB        5.30 GB  test_db.test_table 
    Planner Timeline
      Analysis finished: 9543737
      Equivalence classes computed: 15108936
      Single node plan created: 18757579
      Distributed plan created: 22242733
      Lineage info computed: 31840781
      Planning finished: 35370463
    Query Timeline
      Start execution: 60574
      Planning finished: 39487764
      Ready to start remote fragments: 43883334
      Remote fragments started: 83548681
      Rows available: 62070099362
      Cancelled: 62085704664
      Unregister query: 62087032514
{code}

There are a few fields in the select list that look like this:

{code}
concat_ws('.', from_unixtime(unix_timestamp(start_ts), 'dd MMM yyyy HH:mm:ss'), cast(extract(start_ts, 'millisecond') as string)) AS "start_ts"
{code}

This is what we expect to see (from a query cancelled in Hue):

{code}
    Query Type: QUERY
    Query State: EXCEPTION
    Query Status: Cancelled
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)