You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2021/09/28 21:52:00 UTC

[jira] [Comment Edited] (IMPALA-10049) Include RPC call_id in slow RPC logs

    [ https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419888#comment-17419888 ] 

Riza Suminto edited comment on IMPALA-10049 at 9/28/21, 9:51 PM:
-----------------------------------------------------------------

call_id is recently exposed through RpcController class in this commit:
 [https://github.com/apache/kudu/commit/4438a15f9238f818e9af1230521e8d7c4924f92b]

After we rebase this KRPC change to Impala code, we should able to log the call_id in the sender side.

I see that slow RPC's call_id is logged already in the receiver side like this:
{code:java}
I0923 16:12:43.990563 8667 krpc-data-stream-mgr.cc:372] Sender 127.0.0.1 timed out waiting for receiver fragment instance: 854be678cb62de18:80c91e9600000005, dest node: 4
I0923 16:12:43.990624 8667 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:38918 (request call id 1) took 6030ms. Request Metrics: {}
I0923 16:12:43.990684 8667 rpcz_store.cc:273] Trace:
0923 16:12:37.960211 (+ 0us) impala-service-pool.cc:179] Inserting onto call queue
0923 16:12:37.960262 (+ 51us) impala-service-pool.cc:278] Handling call
0923 16:12:37.960308 (+ 46us) krpc-data-stream-mgr.cc:228] Added early sender
0923 16:12:43.990514 (+6030206us) krpc-data-stream-mgr.cc:364] Timed out sender
0923 16:12:43.990617 (+ 103us) inbound_call.cc:162] Queueing success response
Metrics: {}{code}
Adding call_id in the sender log for slow RPC will help us make association and analyze wether the slowdown happens in network or RPC handler. In the meantime, I'm assigning this Jira to myself.


was (Author: rizaon):
call_id is recently exposed through RpcController class in this commit:
[https://github.com/apache/kudu/commit/4438a15f9238f818e9af1230521e8d7c4924f92b]

After we rebase this KRPC change to Impala code, we should able to log the client_id in the sender side.

I see that slow RPC's call_id is logged already in the receiver side like this:
{code:java}
I0923 16:12:43.990563 8667 krpc-data-stream-mgr.cc:372] Sender 127.0.0.1 timed out waiting for receiver fragment instance: 854be678cb62de18:80c91e9600000005, dest node: 4
I0923 16:12:43.990624 8667 rpcz_store.cc:269] Call impala.DataStreamService.TransmitData from 127.0.0.1:38918 (request call id 1) took 6030ms. Request Metrics: {}
I0923 16:12:43.990684 8667 rpcz_store.cc:273] Trace:
0923 16:12:37.960211 (+ 0us) impala-service-pool.cc:179] Inserting onto call queue
0923 16:12:37.960262 (+ 51us) impala-service-pool.cc:278] Handling call
0923 16:12:37.960308 (+ 46us) krpc-data-stream-mgr.cc:228] Added early sender
0923 16:12:43.990514 (+6030206us) krpc-data-stream-mgr.cc:364] Timed out sender
0923 16:12:43.990617 (+ 103us) inbound_call.cc:162] Queueing success response
Metrics: {}{code}
Adding call_id in the sender log for slow RPC will help us make association and analyze wether the slowdown happens in network or RPC handler. In the meantime, I'm assigning this Jira to myself.

> Include RPC call_id in slow RPC logs
> ------------------------------------
>
>                 Key: IMPALA-10049
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10049
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Sahil Takiar
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: observability
>
> The current code for logging slow RPCs on the sender side looks something like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
>   int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Receiver time: "
>   ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(), TUnit::TIME_NS)
>   ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns, TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
>   ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
>   LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
>   ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
>   ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS) << ". "
>   ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can more easily be traced. The RPC call_id is dumped in RPC traces on the receiver side, as well as in the /rpcz output on the debug ui.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org