Posted to issues-all@impala.apache.org by "Abhishek Rawat (Jira)" <ji...@apache.org> on 2023/04/07 00:56:00 UTC

[jira] [Commented] (IMPALA-12039) Potential Race condition between executor group deletion and admission controller

    [ https://issues.apache.org/jira/browse/IMPALA-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709534#comment-17709534 ] 

Abhishek Rawat commented on IMPALA-12039:
-----------------------------------------

The race condition itself may be unavoidable, since the admission and scheduling logic relies on snapshots that do not reflect the live state of the cluster. However, the query retry path could be improved so that it does not rely on stale snapshots.
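
To illustrate the retry idea, here is a minimal sketch in Python. It is not Impala code: Cluster, MembershipSnapshot, execute and run_with_retry are hypothetical names invented for this example, and the group and backend addresses are made up. It only shows the pattern of admitting against a snapshot and, when a backend turns out to be gone, taking a fresh snapshot before retrying instead of reusing the old one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MembershipSnapshot:
    """Point-in-time view of executor groups (hypothetical, for illustration)."""
    version: int
    groups: dict  # group name -> tuple of backend addresses


class Cluster:
    """Stand-in for the live cluster state; not an Impala class."""

    def __init__(self, groups):
        self._groups = dict(groups)
        self._version = 0

    def snapshot(self):
        # Copy the mapping so later deletions do not change this snapshot.
        return MembershipSnapshot(self._version, dict(self._groups))

    def delete_group(self, name):
        self._groups.pop(name, None)
        self._version += 1

    def is_reachable(self, backend):
        return any(backend in backends for backends in self._groups.values())


class BackendUnreachable(Exception):
    pass


def execute(cluster, snapshot, group):
    """Schedule on `group` as seen in `snapshot`, then contact the live backends."""
    for backend in snapshot.groups[group]:
        if not cluster.is_reachable(backend):
            raise BackendUnreachable(backend)
    return group


def run_with_retry(cluster, snapshot, max_retries=1):
    """Admit against `snapshot`; on failure, retry against a fresh snapshot."""
    for attempt in range(max_retries + 1):
        if not snapshot.groups:
            raise RuntimeError("no executor groups available")
        group = sorted(snapshot.groups)[0]
        try:
            return execute(cluster, snapshot, group), attempt
        except BackendUnreachable:
            # Key point: do not reuse the snapshot that led to the failure.
            snapshot = cluster.snapshot()
    raise RuntimeError("query failed after retries")


# The snapshot is taken before the group is deleted, as in the reported race.
cluster = Cluster({
    "group-000": ("10.0.0.1:27010",),
    "group-001": ("10.0.0.2:27010",),
})
stale = cluster.snapshot()
cluster.delete_group("group-000")
result = run_with_retry(cluster, stale)  # first attempt fails, retry succeeds
```

In this sketch the first attempt still fails, because the race window between snapshot and execution remains. The retry succeeds only because it re-reads the membership first; a retry that reused the stale snapshot would pick the deleted group again.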

> Potential Race condition between executor group deletion and admission controller
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12039
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Abhishek Rawat
>            Priority: Critical
>
> There is a race condition between the admission controller and the deletion of executors or executor groups. If a query arrives at the wrong moment, it can be admitted to an executor group that has just been deleted, and the query then fails.
> {code:java}
> I0330 06:05:25.600728  9398 admission-controller.cc:1941] 3c4f9069df52951e:0b97d92800000000] Trying to admit id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 max_mem=48828.12 GB is_trivial_query=false
> I0330 06:05:25.600769  9398 admission-controller.cc:1950] 3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0,  local_host(local_mem_admitted=0, local_trivial_running=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, average_per_query=0)
> I0330 06:05:25.600816  9398 admission-controller.cc:1300] 3c4f9069df52951e:0b97d92800000000] Admitting query id=3c4f9069df52951e:0b97d92800000000
> I0330 06:05:25.600883  9398 impala-server.cc:2231] 3c4f9069df52951e:0b97d92800000000] Registering query locations
> I0330 06:05:25.600898  9398 coordinator.cc:151] 3c4f9069df52951e:0b97d92800000000] Exec() query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from test_a9a41a5.t where id + random() < sleep(10000)
> I0330 06:05:25.601054  9398 coordinator.cc:476] 3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for query_id=3c4f9069df52951e:0b97d92800000000
> I0330 06:05:25.601359   124 control-service.cc:148] 3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): query_id=3c4f9069df52951e:0b97d92800000000 coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000 #instances=1
> I0330 06:05:25.601604   117 kudu-status-util.h:55] Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111)
> E0330 06:05:25.601706   117 coordinator-backend-state.cc:190] ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111) {code}


