Posted to issues-all@impala.apache.org by "Abhishek Rawat (Jira)" <ji...@apache.org> on 2023/04/05 20:04:00 UTC
[jira] [Updated] (IMPALA-12039) Potential Race condition between executor group deletion and admission controller
[ https://issues.apache.org/jira/browse/IMPALA-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Rawat updated IMPALA-12039:
------------------------------------
Description:
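There is a race condition between the admission controller and executor/executor-group deletion: a query that arrives at the wrong moment can be admitted to an executor group that has just been deleted, and the query then fails. In the log below, the query is admitted to executor group root.default-group-000, but the ExecQueryFInstances RPC to the executor at 192.168.112.16:27010 fails with "Connection refused".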
{code:java}
I0330 06:05:25.600728 9398 admission-controller.cc:1941] 3c4f9069df52951e:0b97d92800000000] Trying to admit id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 max_mem=48828.12 GB is_trivial_query=false
I0330 06:05:25.600769 9398 admission-controller.cc:1950] 3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, local_trivial_running=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0330 06:05:25.600816 9398 admission-controller.cc:1300] 3c4f9069df52951e:0b97d92800000000] Admitting query id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.600883 9398 impala-server.cc:2231] 3c4f9069df52951e:0b97d92800000000] Registering query locations
I0330 06:05:25.600898 9398 coordinator.cc:151] 3c4f9069df52951e:0b97d92800000000] Exec() query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from test_a9a41a5.t where id + random() < sleep(10000)
I0330 06:05:25.601054 9398 coordinator.cc:476] 3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for query_id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.601359 124 control-service.cc:148] 3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): query_id=3c4f9069df52951e:0b97d92800000000 coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000 #instances=1
I0330 06:05:25.601604 117 kudu-status-util.h:55] Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111)
E0330 06:05:25.601706 117 coordinator-backend-state.cc:190] ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111) {code}
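The failing sequence in the log is a check-then-act race. Admission builds a schedule from its view of the executor group, the group's executor goes away and the group is deleted, and the coordinator then dispatches the now-stale schedule. The sketch below models this under simplifying assumptions. Cluster membership is a plain dict from group name to executor addresses, and only the executor side is modelled. `admit`, `dispatch`, and `race_interleaving` are hypothetical names, not Impala functions. The interleaving is fixed rather than driven by threads, so the failure reproduces on every run.

```python
# Hypothetical, simplified model of the race; none of these names are Impala APIs.

GROUP = "root.default-group-000"
BACKEND = "192.168.112.16:27010"


def admit(membership, group):
    """Admission step: build a schedule from the current view of the group."""
    return list(membership.get(group, []))


def dispatch(schedule, live):
    """Dispatch step: one RPC per scheduled backend. A backend that has gone
    away refuses the connection, so it is reported as a failure."""
    return [b for b in schedule if b not in live]


def race_interleaving():
    membership = {GROUP: [BACKEND]}  # group name -> executor addresses
    live = {BACKEND}                 # executors that still accept connections

    schedule = admit(membership, GROUP)  # 1. admitted while the group has an executor
    live.discard(BACKEND)                # 2. the executor shuts down...
    membership.pop(GROUP, None)          #    ...and the now-empty group is deleted
    failed = dispatch(schedule, live)    # 3. the coordinator dispatches the stale schedule
    return schedule, failed


schedule, failed = race_interleaving()
print(f"admitted to {len(schedule)} executor(s); connection refused by {failed}")
```

Because steps 1 and 3 are not atomic with respect to step 2, any window between them lets the admission decision go stale.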
was:
IMPALA-11891 added support for deleting an executor group once it is empty. However, this introduces a race condition: a query that arrives at the wrong moment can be admitted to an executor group that has just been deleted, and the query then fails.
{code:java}
I0330 06:05:25.600728 9398 admission-controller.cc:1941] 3c4f9069df52951e:0b97d92800000000] Trying to admit id=3c4f9069df52951e:0b97d92800000000 in pool_name=root.default executor_group_name=root.default-group-000 per_host_mem_estimate=192.22 MB dedicated_coord_mem_estimate=100.03 MB max_requests=-1 max_queued=200 max_mem=48828.12 GB is_trivial_query=false
I0330 06:05:25.600769 9398 admission-controller.cc:1950] 3c4f9069df52951e:0b97d92800000000] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, local_trivial_running=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queries=[7345a69a7cf74870:36a8543f00000000], total_mem_consumed=0; pool_level_stats: num_running=1, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0330 06:05:25.600816 9398 admission-controller.cc:1300] 3c4f9069df52951e:0b97d92800000000] Admitting query id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.600883 9398 impala-server.cc:2231] 3c4f9069df52951e:0b97d92800000000] Registering query locations
I0330 06:05:25.600898 9398 coordinator.cc:151] 3c4f9069df52951e:0b97d92800000000] Exec() query_id=3c4f9069df52951e:0b97d92800000000 stmt=select count(*) from test_a9a41a5.t where id + random() < sleep(10000)
I0330 06:05:25.601054 9398 coordinator.cc:476] 3c4f9069df52951e:0b97d92800000000] starting execution on 2 backends for query_id=3c4f9069df52951e:0b97d92800000000
I0330 06:05:25.601359 124 control-service.cc:148] 3c4f9069df52951e:0b97d92800000000] ExecQueryFInstances(): query_id=3c4f9069df52951e:0b97d92800000000 coord=coordinator-0.coordinator-int.impala-1680155570-trh7.svc.cluster.local:27000 #instances=1
I0330 06:05:25.601604 117 kudu-status-util.h:55] Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111)
E0330 06:05:25.601706 117 coordinator-backend-state.cc:190] ExecQueryFInstances rpc query_id=3c4f9069df52951e:0b97d92800000000 failed: Exec() rpc failed: Network error: Client connection negotiation failed: client connection to 192.168.112.16:27010: connect: Connection refused (error 111) {code}
Previously, an empty executor group would have been considered unhealthy, and the admission controller would have queued the incoming query instead of admitting it.
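The older queue-instead-of-admit behaviour can be sketched as a small decision rule. `GroupView`, `Decision`, `admission_decision`, and the `min_size` health threshold are all hypothetical and are not Impala's actual types or parameters. The rule is simple: if the target group is missing or has too few executors, the query is queued rather than admitted.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    QUEUE = "queue"


@dataclass
class GroupView:
    """Hypothetical snapshot of one executor group (not an Impala type)."""
    name: str
    num_executors: int
    min_size: int = 1  # assumption: executors required for the group to count as healthy

    def is_healthy(self) -> bool:
        return self.num_executors >= self.min_size


def admission_decision(groups: dict, group_name: str) -> Decision:
    """Queue when the target group is missing or unhealthy; admit otherwise."""
    group = groups.get(group_name)
    if group is None or not group.is_healthy():
        return Decision.QUEUE
    return Decision.ADMIT


# An empty group that is kept around is unhealthy, so the query waits in the queue.
kept_empty = {"root.default-group-000": GroupView("root.default-group-000", 0)}
print(admission_decision(kept_empty, "root.default-group-000"))
```

This rule only helps if the view it consults is still current when the query is dispatched. In the reported failure, the view used for admission still contained the executor that had already gone away.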
> Potential Race condition between executor group deletion and admission controller
> ---------------------------------------------------------------------------------
>
> Key: IMPALA-12039
> URL: https://issues.apache.org/jira/browse/IMPALA-12039
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Abhishek Rawat
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)