
Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/06/04 16:51:00 UTC

[jira] [Resolved] (IMPALA-2797) scanner threads can act like a thundering herd

     [ https://issues.apache.org/jira/browse/IMPALA-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-2797.
-----------------------------------
    Resolution: Won't Fix

IMPALA-3902 moves away from this threading model.

> scanner threads can act like a thundering herd
> ----------------------------------------------
>
>                 Key: IMPALA-2797
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2797
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: trace_Tue_Dec_22_2015_2.27.03_PM.json.gz
>
>
> I noticed this issue with the Kudu scan node implementation, but I imagine it could happen with HDFS as well:
> - on a big box, we started up 48 scanner threads for a 'SELECT COUNT(*)' query
> - the underlying batches that are read from Kudu return a few million rows each (because it's trying to do large IOs to amortize round-trips, and the projection is empty for COUNT(*))
> -- the scanner thread chops these Kudu batches into RowBatches of 1000 rows each and pushes those onto the RowBatchQueue
> Because each backend IO (scan RPC to Kudu) results in thousands of Impala RowBatches, we end up with the main thread pulling "round robin" from all of the scanner threads, rather than exhausting one Kudu batch before moving to the next. As a result, we see the following:
> - when the query starts, 48 threads hammer the Kudu server with Scan RPCs
> - the Kudu server is then completely quiet for ~30 seconds while they drain their buffers
> - all of the buffers "empty" at basically the same time, and we get another herd of IO on the Kudu side.
> It would be preferable to make the RowBatchQueue "unfair" in some way, such that the main thread exhausts entire IO buffers at a time, rather than pulling little bits from each of the threads in a round-robin fashion.
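To make the difference between round-robin and "unfair" draining concrete, here is a toy Python simulation. It is not Impala code: the functions simulate() and max_rpcs_in_window() and the policy names "round_robin" and "unfair" are hypothetical. It assumes each scanner holds one IO buffer already chopped into RowBatches, that the consumer takes exactly one RowBatch per tick, and that a scanner whose buffer is drained issues its next scan RPC immediately (modeled as an instant refill). The startup herd, where every scanner issues its first RPC at once, is the same under both policies and is left out.

```python
def simulate(policy, num_scanners, batches_per_io, ticks):
    """Return the ticks at which scanners issue a refill scan RPC.

    Toy model (an assumption, not Impala's RowBatchQueue): each scanner
    holds one IO buffer of `batches_per_io` RowBatches; the initial RPCs
    have already completed. The consumer takes one RowBatch per tick.
    """
    remaining = [batches_per_io] * num_scanners
    rpc_ticks = []
    current = 0  # scanner the consumer pulls from on this tick
    for tick in range(ticks):
        remaining[current] -= 1
        drained = remaining[current] == 0
        if drained:
            # Buffer exhausted: the scanner issues a new RPC, modeled
            # here as an instant refill of its buffer.
            rpc_ticks.append(tick)
            remaining[current] = batches_per_io
        # "round_robin" advances to the next scanner after every RowBatch;
        # "unfair" stays on one scanner until its buffer is drained.
        if policy == "round_robin" or drained:
            current = (current + 1) % num_scanners
    return rpc_ticks


def max_rpcs_in_window(rpc_ticks, window):
    """Largest number of RPCs issued within any `window` consecutive ticks."""
    best = 0
    start = 0
    for end in range(len(rpc_ticks)):
        # Shrink the window from the left until it spans < `window` ticks.
        while rpc_ticks[end] - rpc_ticks[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    for policy in ("round_robin", "unfair"):
        rpcs = simulate(policy, num_scanners=4, batches_per_io=10, ticks=80)
        print(policy, rpcs, max_rpcs_in_window(rpcs, window=4))
    # round_robin [36, 37, 38, 39, 76, 77, 78, 79] 4
    # unfair [9, 19, 29, 39, 49, 59, 69, 79] 1
```

Both policies issue the same number of RPCs; only their timing changes. Under round robin every buffer empties within a few ticks of the others, so the refill RPCs arrive as a burst of num_scanners, while draining one buffer at a time spaces them batches_per_io ticks apart. The sketch only shows smoother RPC timing; it makes no claim about query throughput.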



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
