Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2018/03/16 06:46:00 UTC

[jira] [Created] (IMPALA-6685) Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender

Michael Ho created IMPALA-6685:
----------------------------------

             Summary: Improve profile in KrpcDataStreamRecvr and KrpcDataStreamSender
                 Key: IMPALA-6685
                 URL: https://issues.apache.org/jira/browse/IMPALA-6685
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
    Affects Versions: Impala 3.0, Impala 2.12.0
            Reporter: Michael Ho
            Assignee: Michael Ho


The existing profiles in KrpcDataStreamRecvr and KrpcDataStreamSender make it hard to diagnose slow queries such as those shown in IMPALA-6657. In particular, there are times when the receiver's profile shows a lot of time spent waiting for row batches to arrive while the sender's profile also shows a lot of time spent waiting for responses to the TransmitData() RPC.

A few improvements can be made to make the problem slightly easier to diagnose:
- track the number of deferred row batches over time in KrpcDataStreamRecvr
- track the number of bytes dequeued over time in KrpcDataStreamRecvr
- track the amount of time row batches spend in the deferred queue in KrpcDataStreamRecvr
- track the number of bytes sent from KrpcDataStreamSender over time
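
To make the receiver-side items above concrete, here is a minimal standalone sketch of the bookkeeping they imply. It is not Impala's RuntimeProfile API: TimeSeries, ReceiverStats and all of their methods are hypothetical names invented for illustration, and timestamps are passed in as plain nanosecond integers so the example stays deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a profile time-series counter: it keeps one
// value per call to Sample(). A real profile would sample on a timer.
class TimeSeries {
 public:
  void Sample(int64_t value) { samples_.push_back(value); }
  const std::vector<int64_t>& samples() const { return samples_; }

 private:
  std::vector<int64_t> samples_;
};

// Hypothetical receiver-side bookkeeping for the three receiver items.
class ReceiverStats {
 public:
  // A row batch could not be enqueued and was put on the deferred queue.
  void OnDeferred(int64_t now_ns) { deferred_since_ns_.push_back(now_ns); }

  // The oldest deferred row batch was taken off the deferred queue.
  void OnDeferredProcessed(int64_t now_ns) {
    assert(!deferred_since_ns_.empty());
    total_deferred_time_ns_ += now_ns - deferred_since_ns_.front();
    deferred_since_ns_.pop_front();
  }

  // The consumer dequeued a row batch of the given size.
  void OnDequeued(int64_t batch_bytes) { bytes_dequeued_ += batch_bytes; }

  // Record the current values; intended to be called periodically.
  void SampleInto(TimeSeries* deferred_batches, TimeSeries* bytes_dequeued) const {
    deferred_batches->Sample(static_cast<int64_t>(deferred_since_ns_.size()));
    bytes_dequeued->Sample(bytes_dequeued_);
  }

  int64_t total_deferred_time_ns() const { return total_deferred_time_ns_; }

 private:
  std::deque<int64_t> deferred_since_ns_;  // arrival time of each deferred batch
  int64_t total_deferred_time_ns_ = 0;     // summed time spent in the deferred queue
  int64_t bytes_dequeued_ = 0;             // cumulative bytes dequeued
};
```

The sender-side item (bytes sent over time) would follow the same pattern, with a cumulative byte count sampled into its own series.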

The above items help identify cases in which one fragment instance containing an exchange node is slow for a period of time (e.g. the parent of the exchange node spills heavily), causing all senders to that fragment instance to block waiting for responses. As all senders are blocked waiting for their previous RPCs to complete, they will not produce more rows, and all other fragment instances will be starved, leading to the high wait time shown in their receivers' profiles. The time series counter for the number of deferred row batches in a receiver helps identify the cases described above.
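
As a rough illustration of how such a time series could be read, the sketch below flags the receiver whose deferred-row-batch count stays above a threshold for the longest consecutive stretch. This is only an assumed post-hoc analysis over exported samples, not something the issue proposes to build into Impala; the function names and the threshold are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of the longest run of consecutive samples strictly above `threshold`.
std::size_t LongestBacklogRun(const std::vector<int64_t>& samples,
                              int64_t threshold) {
  std::size_t best = 0;
  std::size_t current = 0;
  for (int64_t s : samples) {
    current = (s > threshold) ? current + 1 : 0;
    best = std::max(best, current);
  }
  return best;
}

// Index of the receiver with the longest backlog run, or -1 if no receiver
// ever exceeds the threshold. Each inner vector is one receiver's series of
// deferred-row-batch counts.
int FindLikelySlowReceiver(const std::vector<std::vector<int64_t>>& per_receiver,
                           int64_t threshold) {
  int slow = -1;
  std::size_t best = 0;
  for (std::size_t i = 0; i < per_receiver.size(); ++i) {
    const std::size_t run = LongestBacklogRun(per_receiver[i], threshold);
    if (run > best) {
      best = run;
      slow = static_cast<int>(i);
    }
  }
  return slow;
}
```

A receiver that shows a long run of deferred batches while the others sit near zero is a candidate for the slow fragment instance; the starved receivers would instead show high wait time with little or no deferral.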



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)