Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/04/24 21:54:00 UTC

[jira] [Updated] (IMPALA-6783) Rethink the end-to-end queuing at KrpcDataStreamReceiver

     [ https://issues.apache.org/jira/browse/IMPALA-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell updated IMPALA-6783:
----------------------------------
    Target Version: Impala 4.0  (was: Impala 3.4.0)

> Rethink the end-to-end queuing at KrpcDataStreamReceiver
> --------------------------------------------------------
>
>                 Key: IMPALA-6783
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6783
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Michael Ho
>            Priority: Major
>
> Follow up from IMPALA-6116. We currently bound the memory usage of the service queue and force an RPC to retry if the memory usage exceeds the configured limit. Deserialization of row batches happens in the context of the service threads. The deserialized row batches are stored in a queue in the receiver, whose memory consumption is bounded by FLAGS_exchg_node_buffer_size_bytes. Once that limit is exceeded, incoming row batches are placed in a deferred RPC queue, which is drained by the deserialization threads. This makes it hard to size the service queues, because their capacity may need to grow as the number of nodes in the cluster grows.
> We may need to reconsider the role of the service queue: it could be just a transition queue from which KrpcDataStreamMgr routes incoming row batches to the appropriate receivers. The actual queuing would happen in the receiver. Deserialization should always happen in the context of the deserialization threads, so the service threads are responsible only for routing the RPC requests. This allows us to keep a rather small service queue. Incoming serialized row batches would always sit in a per-receiver queue to be drained by the deserialization threads. We may still need to keep a certain number of deserialized row batches around, ready to be consumed. In this way, we can account for the memory consumption and size the queue based on the number of senders and the memory budget of a query.
> One hurdle is that we need to overcome the undesirable cross-thread allocation pattern in which rpc_context is allocated by the service threads but freed by the deserialization threads.
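The proposed split of responsibilities (service threads only route serialized batches into a per-receiver queue; deserialization threads drain that queue and produce row batches) can be sketched as below. This is a minimal illustration under stated assumptions, not Impala's actual code: `ReceiverQueue`, `SerializedBatch`, and `RowBatch` are hypothetical stand-ins for the real KRPC and row-batch classes, and "deserialization" is faked.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Stand-in for a serialized row batch arriving over KRPC.
struct SerializedBatch { std::string payload; };

// Stand-in for a deserialized row batch ready for the exchange node.
struct RowBatch { std::size_t num_rows = 0; };

// Per-receiver queue. Service threads only enqueue serialized batches and
// return immediately, so they never block on deserialization work; the
// deserialization thread pool drains the queue. Memory accounting (charging
// each enqueued payload against the query's memory budget) would hook into
// EnqueueSerialized/DequeueAndDeserialize in a real implementation.
class ReceiverQueue {
 public:
  // Called by service threads: route the serialized batch and return.
  void EnqueueSerialized(SerializedBatch b) {
    std::lock_guard<std::mutex> l(mu_);
    serialized_.push_back(std::move(b));
    cv_.notify_one();
  }

  // Called by deserialization threads. Blocks until a batch is available
  // or the queue is closed; returns false on close with nothing left.
  bool DequeueAndDeserialize(RowBatch* out) {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [&] { return !serialized_.empty() || closed_; });
    if (serialized_.empty()) return false;
    SerializedBatch b = std::move(serialized_.front());
    serialized_.pop_front();
    l.unlock();
    // Fake "deserialization": derive a row count from the payload size.
    out->num_rows = b.payload.size();
    return true;
  }

  void Close() {
    std::lock_guard<std::mutex> l(mu_);
    closed_ = true;
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<SerializedBatch> serialized_;
  bool closed_ = false;
};
```

Because the serialized batch (and, in the real system, its rpc_context) is moved into the queue by the service thread and consumed entirely on the deserialization thread, ownership transfer through the queue is also one plausible way to contain the cross-thread allocate/free pattern the description mentions.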



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
