You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2019/11/05 20:10:00 UTC
[jira] [Created] (IMPALA-9128) Improve debugging for slow sends in
KrpcDataStreamSender
Tim Armstrong created IMPALA-9128:
-------------------------------------
Summary: Improve debugging for slow sends in KrpcDataStreamSender
Key: IMPALA-9128
URL: https://issues.apache.org/jira/browse/IMPALA-9128
Project: IMPALA
Issue Type: Bug
Components: Distributed Exec
Reporter: Tim Armstrong
Assignee: Tim Armstrong
I'm trying to debug a problem that appears to be caused by a slow RPC:
{noformat}
Fragment F00
Instance 754fc21ba4744310:d58fd04200000020 (host=xxxxxxxxx)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/120.48 MB
- AverageThreadTokens: 1.00 (1.0)
- BloomFilterBytes: 0 B (0)
- InactiveTotalTime: 0ns (0)
- PeakMemoryUsage: 3.2 MiB (3337546)
- PeakReservation: 2.0 MiB (2097152)
- PeakUsedReservation: 0 B (0)
- PerHostPeakMemUsage: 6.7 MiB (6987376)
- RowsProduced: 7 (7)
- TotalNetworkReceiveTime: 0ns (0)
- TotalNetworkSendTime: 3.6m (215354065071)
- TotalStorageWaitTime: 4ms (4552708)
- TotalThreadsInvoluntaryContextSwitches: 2 (2)
- TotalThreadsTotalWallClockTime: 3.6m (215924079474)
- TotalThreadsSysTime: 24ms (24386000)
- TotalThreadsUserTime: 505ms (505714000)
- TotalThreadsVoluntaryContextSwitches: 3,623 (3623)
- TotalTime: 3.6m (215801961705)
Fragment Instance Lifecycle Event Timeline
Prepare Finished: 1ms (1812344)
Open Finished: 322ms (322905753)
First Batch Produced: 447ms (447050377)
First Batch Sent: 447ms (447054546)
ExecInternal Finished: 3.6m (215802284852)
Buffer pool
- AllocTime: 0ns (0)
- CumulativeAllocationBytes: 0 B (0)
- CumulativeAllocations: 0 (0)
- InactiveTotalTime: 0ns (0)
- PeakReservation: 0 B (0)
- PeakUnpinnedBytes: 0 B (0)
- PeakUsedReservation: 0 B (0)
- ReadIoBytes: 0 B (0)
- ReadIoOps: 0 (0)
- ReadIoWaitTime: 0ns (0)
- ReservationLimit: 0 B (0)
- TotalTime: 0ns (0)
- WriteIoBytes: 0 B (0)
- WriteIoOps: 0 (0)
- WriteIoWaitTime: 0ns (0)
Fragment Instance Lifecycle Timings
- ExecTime: 3.6m (215479380267)
- ExecTreeExecTime: 124ms (124299400)
- InactiveTotalTime: 0ns (0)
- OpenTime: 321ms (321088906)
- ExecTreeOpenTime: 572.04us (572045)
- PrepareTime: 1ms (1426412)
- ExecTreePrepareTime: 233.32us (233318)
- TotalTime: 0ns (0)
KrpcDataStreamSender (dst_id=3)
- EosSent: 58 (58)
- InactiveTotalTime: 3.6m (215354085858)
- PeakMemoryUsage: 464.4 KiB (475504)
- RowsSent: 7 (7)
- RpcFailure: 0 (0)
- RpcRetry: 0 (0)
- SerializeBatchTime: 99.87us (99867)
- TotalBytesSent: 207 B (207)
- TotalTime: 3.6m (215355336381)
- UncompressedRowBatchSize: 267 B (267)
{noformat}
We should add some diagnostics that will allow us to figure out which RPCs are slow and whether there's a pattern about which host is the problem. E.g. maybe we should log if the RPC time exceeds a configured threshold.
It may also be useful to include some stats about the wait time, e.g. a histogram of the wait times, so that we can see if it's an outlier or general slowness.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)