You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Bikramjeet Vig (Jira)" <ji...@apache.org> on 2021/03/31 17:57:00 UTC

[jira] [Comment Edited] (IMPALA-10596) TestAdmissionControllerStressWithACService.test_mem_limit fails with "Invalid or unknown query handle" when canceling a query

    [ https://issues.apache.org/jira/browse/IMPALA-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311963#comment-17311963 ] 

Bikramjeet Vig edited comment on IMPALA-10596 at 3/31/21, 5:56 PM:
-------------------------------------------------------------------

From the logs it seems like the client running the query was closed earlier than the teardown() method. If the client is closed an Unregistered is triggered which is why when the teardown issues a cancel on that query_id it hits the aforementioned error.
The last part of the log is always "MainThread: Found all 8 admitted threads after 0.0 seconds".
I think what might be happening here is that it hits an exception which causes the SubmitQueryThread to exit and close the client in its _finally_ block before teardown gets a chance to cancel the query. We also don't get to the see log lines printed before the client is closed, not sure if that is because the logger does not get a chance to flush before the process dies.

Filtered logs  from on the failed builds:
{noformat}
19643:I0317 04:04:55.293568 21378 impala-server.cc:1388] UnregisterQuery(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19644:I0317 04:04:55.293584 21378 coordinator.cc:706] ExecState: query id=6a4a4ca00aeb3326:0eb710ec00000000 execution cancelled
19645:I0317 04:04:55.293594 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
19668:I0317 04:04:55.300276 20592 control-service.cc:219] CancelQueryFInstances(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19669:I0317 04:04:55.300292 20592 query-exec-mgr.cc:126] QueryState: query_id=6a4a4ca00aeb3326:0eb710ec00000000 refcnt=5
19670:I0317 04:04:55.300297 20592 query-state.cc:974] Cancel: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19671:I0317 04:04:55.300305 20592 krpc-data-stream-mgr.cc:339] cancelling active streams for query_id=6a4a4ca00aeb3326:0eb710ec00000000
19672:I0317 04:04:55.300380 21466 krpc-data-stream-mgr.cc:308] 6a4a4ca00aeb3326:0eb710ec00000000] DeregisterRecvr(): fragment_instance_id=6a4a4ca00aeb3326:0eb710ec00000000, node=31
19673:I0317 04:04:55.300405 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27002: Not cancelling because the backend is already done: 
19674:I0317 04:04:55.300420 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done: 
19675:I0317 04:04:55.300426 21378 coordinator.cc:994] CancelBackends() query_id=6a4a4ca00aeb3326:0eb710ec00000000, tried to cancel 1 backends
19680:I0317 04:04:55.300539 21378 coordinator.cc:1370] Release admission control resources for query_id=6a4a4ca00aeb3326:0eb710ec00000000
19704:I0317 04:04:55.309382 22599 impala-server.cc:1577] Cancel(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19705:I0317 04:04:55.309398 22599 impala-server.cc:1538] Invalid or unknown query handle: 6a4a4ca00aeb3326:0eb710ec00000000.
19751:I0317 04:04:55.317633 21466 query-state.cc:957] 6a4a4ca00aeb3326:0eb710ec00000000] Instance completed. instance_id=6a4a4ca00aeb3326:0eb710ec00000000 #in-flight=14 status=CANCELLED: Cancelled
19752:I0317 04:04:55.317644 21448 query-state.cc:468] 6a4a4ca00aeb3326:0eb710ec00000000] UpdateBackendExecState(): last report for 6a4a4ca00aeb3326:0eb710ec00000000
19754:I0317 04:04:55.320560 21464 query-state.cc:957] 6a4a4ca00aeb3326:0eb710ec00000002] Instance completed. instance_id=6a4a4ca00aeb3326:0eb710ec00000002 #in-flight=12 status=CANCELLED: Cancelled
19765:I0317 04:04:55.334306 20592 impala-server.cc:1551] Invalid or unknown query handle: 6a4a4ca00aeb3326:0eb710ec00000000.
19766:I0317 04:04:55.334322 20592 control-service.cc:179] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 6a4a4ca00aeb3326:0eb710ec00000000 remote host=127.0.0.1:59410
19767:I0317 04:04:55.334400 21448 query-state.cc:738] 6a4a4ca00aeb3326:0eb710ec00000000] Cancelling fragment instances as directed by the coordinator. Returned status: ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 6a4a4ca00aeb3326:0eb710ec00000000 remote host=127.0.0.1:59410
19768:I0317 04:04:55.334416 21448 query-state.cc:974] 6a4a4ca00aeb3326:0eb710ec00000000] Cancel: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19773:I0317 04:04:55.348507 21253 impala-server.cc:1420] Query successfully unregistered: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19775:I0317 04:04:55.376655 21253 query-exec-mgr.cc:213] ReleaseQueryState(): deleted query_id=6a4a4ca00aeb3326:0eb710ec00000000
{noformat}

I'll try to simulate this on my local and see if i hit the same behavior


was (Author: bikramjeet.vig):
From the logs it seems like the client running the query was closed earlier than the teardown() method. If the client is closed an Unregistered is triggered which is why when the teardown issues a cancel on that query_id it hits the aforementioned error.
The last part of the log is always "MainThread: Found all 8 admitted threads after 0.0 seconds".
I think what might be happening here is that it hits an exception which causes the SubmitQueryThread to exit and close the client in its _finally_ block before teardown gets a change to cancel the query. We also don't get to the see log lines printed before the client is closed, not sure if that is because the logger does not get a chance to flush before the process dies.

Filtered logs  from on the failed builds:
{noformat}
19643:I0317 04:04:55.293568 21378 impala-server.cc:1388] UnregisterQuery(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19644:I0317 04:04:55.293584 21378 coordinator.cc:706] ExecState: query id=6a4a4ca00aeb3326:0eb710ec00000000 execution cancelled
19645:I0317 04:04:55.293594 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27000: Sending CancelQueryFInstances rpc
19668:I0317 04:04:55.300276 20592 control-service.cc:219] CancelQueryFInstances(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19669:I0317 04:04:55.300292 20592 query-exec-mgr.cc:126] QueryState: query_id=6a4a4ca00aeb3326:0eb710ec00000000 refcnt=5
19670:I0317 04:04:55.300297 20592 query-state.cc:974] Cancel: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19671:I0317 04:04:55.300305 20592 krpc-data-stream-mgr.cc:339] cancelling active streams for query_id=6a4a4ca00aeb3326:0eb710ec00000000
19672:I0317 04:04:55.300380 21466 krpc-data-stream-mgr.cc:308] 6a4a4ca00aeb3326:0eb710ec00000000] DeregisterRecvr(): fragment_instance_id=6a4a4ca00aeb3326:0eb710ec00000000, node=31
19673:I0317 04:04:55.300405 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27002: Not cancelling because the backend is already done: 
19674:I0317 04:04:55.300420 21378 coordinator-backend-state.cc:974] query_id=6a4a4ca00aeb3326:0eb710ec00000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done: 
19675:I0317 04:04:55.300426 21378 coordinator.cc:994] CancelBackends() query_id=6a4a4ca00aeb3326:0eb710ec00000000, tried to cancel 1 backends
19680:I0317 04:04:55.300539 21378 coordinator.cc:1370] Release admission control resources for query_id=6a4a4ca00aeb3326:0eb710ec00000000
19704:I0317 04:04:55.309382 22599 impala-server.cc:1577] Cancel(): query_id=6a4a4ca00aeb3326:0eb710ec00000000
19705:I0317 04:04:55.309398 22599 impala-server.cc:1538] Invalid or unknown query handle: 6a4a4ca00aeb3326:0eb710ec00000000.
19751:I0317 04:04:55.317633 21466 query-state.cc:957] 6a4a4ca00aeb3326:0eb710ec00000000] Instance completed. instance_id=6a4a4ca00aeb3326:0eb710ec00000000 #in-flight=14 status=CANCELLED: Cancelled
19752:I0317 04:04:55.317644 21448 query-state.cc:468] 6a4a4ca00aeb3326:0eb710ec00000000] UpdateBackendExecState(): last report for 6a4a4ca00aeb3326:0eb710ec00000000
19754:I0317 04:04:55.320560 21464 query-state.cc:957] 6a4a4ca00aeb3326:0eb710ec00000002] Instance completed. instance_id=6a4a4ca00aeb3326:0eb710ec00000002 #in-flight=12 status=CANCELLED: Cancelled
19765:I0317 04:04:55.334306 20592 impala-server.cc:1551] Invalid or unknown query handle: 6a4a4ca00aeb3326:0eb710ec00000000.
19766:I0317 04:04:55.334322 20592 control-service.cc:179] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 6a4a4ca00aeb3326:0eb710ec00000000 remote host=127.0.0.1:59410
19767:I0317 04:04:55.334400 21448 query-state.cc:738] 6a4a4ca00aeb3326:0eb710ec00000000] Cancelling fragment instances as directed by the coordinator. Returned status: ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 6a4a4ca00aeb3326:0eb710ec00000000 remote host=127.0.0.1:59410
19768:I0317 04:04:55.334416 21448 query-state.cc:974] 6a4a4ca00aeb3326:0eb710ec00000000] Cancel: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19773:I0317 04:04:55.348507 21253 impala-server.cc:1420] Query successfully unregistered: query_id=6a4a4ca00aeb3326:0eb710ec00000000
19775:I0317 04:04:55.376655 21253 query-exec-mgr.cc:213] ReleaseQueryState(): deleted query_id=6a4a4ca00aeb3326:0eb710ec00000000
{noformat}

I'll try to simulate this on my local and see if i hit the same behavior

> TestAdmissionControllerStressWithACService.test_mem_limit fails with "Invalid or unknown query handle" when canceling a query
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10596
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10596
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Andrew Sherman
>            Assignee: Bikramjeet Vig
>            Priority: Critical
>
> TestAdmissionControllerStress.test_mem_limit fails similarly
> {code}
> custom_cluster/test_admission_controller.py:1437: in teardown
>     client.cancel(thread.query_handle)
> common/impala_connection.py:215: in cancel
>     return self.__beeswax_client.cancel_query(operation_handle.get_handle())
> beeswax/impala_beeswax.py:369: in cancel_query
>     return self.__do_rpc(lambda: self.imp_service.Cancel(query_id))
> beeswax/impala_beeswax.py:520: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: Invalid or unknown query handle: 174962332188aac2:1713d0fe00000000.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org