You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2022/12/27 10:28:00 UTC

[jira] [Created] (IMPALA-11814) TransmitData RPC become slow when multiple large queries are executed simultaneously

YifanZhang created IMPALA-11814:
-----------------------------------

             Summary: TransmitData RPC become slow when multiple large queries are executed simultaneously
                 Key: IMPALA-11814
                 URL: https://issues.apache.org/jira/browse/IMPALA-11814
             Project: IMPALA
          Issue Type: Bug
            Reporter: YifanZhang


We have seen a lot of slow TransmitData RPC in our clusters when multiple large queries are running. Logs in executors:
{code:java}
1227 16:44:24.859488  1134 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 11.174.152.179:53824 (request call id 10458) took 368799ms. Request Metrics: {}
I1227 16:44:24.859828  1134 rpcz_store.cc:273] Trace:
1227 16:38:16.060450 (+     0us) impala-service-pool.cc:179] Inserting onto call queue
1227 16:38:16.060480 (+    30us) impala-service-pool.cc:278] Handling call
1227 16:38:16.060498 (+    18us) krpc-data-stream-recvr.cc:325] Enqueuing deferred RPC
1227 16:44:24.859372 (+368798874us) krpc-data-stream-recvr.cc:504] Processing deferred RPC
1227 16:44:24.859382 (+    10us) krpc-data-stream-recvr.cc:397] Deserializing batch
1227 16:44:24.859442 (+    60us) krpc-data-stream-recvr.cc:424] Enqueuing deserialized batch
1227 16:44:24.859451 (+     9us) inbound_call.cc:162] Queueing success response
Metrics: {} {code}
The RPC spends a lot of time waiting in the queue. Some other executors also take a long time to transmit data:
{code:java}
I1227 16:44:23.379583   528 krpc-data-stream-sender.cc:396] Slow TransmitData RPC to 11.136.184.52:50257 (fragment_instance_id=f9477dcd54c9d817:1bc346de000001e7): took 5m37s. Receiver time: 5m37s Network time: 1.343ms
I1227 16:44:23.396365   631 krpc-data-stream-sender.cc:396] Slow TransmitData RPC to 11.174.158.152:58535 (fragment_instance_id=f9477dcd54c9d817:1bc346de00000231): took 5m57s. Receiver time: 5m57s Network time: 2.970ms
I1227 16:44:23.396482  4654 krpc-data-stream-sender.cc:430] f9477dcd54c9d817:1bc346de000001b6] Long delay waiting for RPC to 11.174.158.152:58535 (fragment_instance_id=f9477dcd54c9d817:1bc346de00000231): took 5m57s
I1227 16:44:23.444633   572 krpc-data-stream-sender.cc:396] Slow TransmitData RPC to 11.174.153.240:60303 (fragment_instance_id=f9477dcd54c9d817:1bc346de000002c3): took 5m53s. Receiver time: 5m53s Network time: 17.604ms{code}
The max handler latency time is also very long:
|*Method: _TransmitData_*|
|Handler Latency|Count: 2321637, min / max: 3.000us / 5m10s, 25th %-ile: 22.000us, 50th %-ile: 38.000us, 75th %-ile: 55.000us, 90th %-ile: 72.000us, 95th %-ile: 86.000us, 99.9th %-ile: 4.512ms|

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)