Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/09/15 06:40:00 UTC

[jira] [Resolved] (IMPALA-5417) Consider limiting I/O buffer queue size to 2 buffers

     [ https://issues.apache.org/jira/browse/IMPALA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5417.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0



IMPALA-5417: make I/O buffer queue fixed-size

This removes the dynamically varying queue size behaviour in the I/O
manager. The motivation is to bound resource consumption of scans
and make it possible to reserve memory for I/O buffers upfront.
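
A minimal sketch of the fixed-size queue idea, assuming a simple
blocking producer/consumer design; the class and member names below
are hypothetical and far simpler than the real DiskIoMgr:

    // Hypothetical sketch, not Impala's actual I/O manager code: the
    // producer blocks once 'capacity_' buffers are queued, so per-queue
    // memory is bounded upfront instead of varying dynamically.
    #include <condition_variable>
    #include <deque>
    #include <mutex>

    template <typename Buffer>
    class FixedSizeBufferQueue {
     public:
      explicit FixedSizeBufferQueue(int capacity) : capacity_(capacity) {}

      // Called by the I/O thread. Blocks while the queue is full, which
      // throttles reads rather than growing the queue.
      void Enqueue(Buffer buf) {
        std::unique_lock<std::mutex> l(lock_);
        not_full_.wait(l, [this] {
          return static_cast<int>(buffers_.size()) < capacity_;
        });
        buffers_.push_back(std::move(buf));
        not_empty_.notify_one();
      }

      // Called by the scanner thread. Blocks while the queue is empty.
      Buffer Dequeue() {
        std::unique_lock<std::mutex> l(lock_);
        not_empty_.wait(l, [this] { return !buffers_.empty(); });
        Buffer buf = std::move(buffers_.front());
        buffers_.pop_front();
        not_full_.notify_one();
        return buf;
      }

     private:
      const int capacity_;  // Fixed at construction, e.g. 2 buffers.
      std::mutex lock_;
      std::condition_variable not_full_, not_empty_;
      std::deque<Buffer> buffers_;
    };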

Also does some cleanup and documentation of the locking policy, and
fixes some cases in ScanRange::GetNext() where members documented as
protected by ScanRange::lock_ were accessed without holding it. I
think those races were either benign or prevented in practice by
holding DiskIoRequestContext::lock_.
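
For illustration, a minimal sketch of the bug class being fixed, with
hypothetical member names rather than the actual ScanRange code: a
member documented as protected by a lock must only be accessed while
that lock is held.

    // Hypothetical sketch of the race-fix pattern; 'done_' stands in
    // for any member documented as protected by lock_.
    #include <mutex>

    class ScanRangeSketch {
     public:
      // Before the fix: 'done_' was read without holding lock_, a race
      // that was likely benign but undefined behaviour nonetheless.
      // After the fix: lock_ is acquired around every access.
      bool done() {
        std::lock_guard<std::mutex> l(lock_);
        return done_;
      }

      void set_done() {
        std::lock_guard<std::mutex> l(lock_);
        done_ = true;
      }

     private:
      std::mutex lock_;
      bool done_ = false;  // Protected by lock_.
    };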

Testing:
Ran exhaustive build.

Perf:
Ran the full set of workloads (TPC-H, TPC-DS, targeted) on a 16 node
cluster. Everything was within normal variance.

Change-Id: If7cc3f7199f5320db00b7face97a96cdadb6f83f
Reviewed-on: http://gerrit.cloudera.org:8080/7408
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Consider limiting I/O buffer queue size to 2 buffers
> ----------------------------------------------------
>
>                 Key: IMPALA-5417
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5417
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>              Labels: resource-management
>             Fix For: Impala 2.11.0
>
>         Attachments: benchmark_report_full.txt
>
>
> Currently the I/O buffer queue size can dynamically vary from 2 buffers up to 128 buffers. Retaining this behaviour while adding a memory constraint to the HDFS scans presents some challenges. The memory constraint is much simpler to implement if the scan node simply hands the I/O mgr a fixed amount of reservation to work with.
> Having 2 buffers is clearly beneficial because it allows overlapping I/O and compute. 
> Having > 2 buffers is not clearly beneficial. There are three cases:
> 1. I/O is faster than compute and the queue fills up. In that case there is no benefit to having > 2 buffers because the consumer never sees an empty queue.
> 2. Compute is faster than I/O and the queue is empty. In that case additional buffering does not help.
> 3. Compute and I/O are approximately matched, but there is some variability - I/O may get a bit ahead of compute, but compute could speed up or I/O slow down temporarily. With up to 2 buffers in the I/O manager and 1 buffer being processed by the reader, I/O can be 8MB-16MB ahead of compute and therefore absorb bursts of up to 8MB. That seems like it should be enough in most cases. In this case it's also not clear to me that the dynamic sizing algorithm is particularly beneficial - it shrinks the queue aggressively when the scan gets ahead of the consumer.
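
A back-of-the-envelope check of the numbers above, assuming the 8MB
maximum I/O buffer size implied by the 8MB-16MB figure (the constant
names are mine, not Impala's). It contrasts the worst-case memory
bound of the old dynamic queue with the new fixed 2-buffer queue:

    #include <cstdio>

    int main() {
      const long long kMaxBufferBytes = 8LL << 20;  // 8MB per buffer.

      // Old behaviour: the queue could grow to 128 buffers.
      long long old_bound = 128 * kMaxBufferBytes;
      // New behaviour: a fixed queue of 2, plus 1 buffer held by the
      // reader while it is processed.
      long long new_bound = (2 + 1) * kMaxBufferBytes;

      printf("old worst case: %lld MB queued\n", old_bound >> 20);    // 1024
      printf("new worst case: %lld MB in flight\n", new_bound >> 20); // 24
      // With at most 2 queued buffers, I/O runs 8MB-16MB ahead of
      // compute, enough to absorb short bursts as argued in case 3.
      return 0;
    }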


