You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/04/28 23:43:00 UTC

[jira] [Commented] (IMPALA-6679) Don't claim ideal reservation in scanner until actually processing scan ranges

    [ https://issues.apache.org/jira/browse/IMPALA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457853#comment-16457853 ] 

ASF subversion and git services commented on IMPALA-6679:
---------------------------------------------------------

Commit 418c705787f060afb3e9e5fbeadb891b2484b6c5 in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=418c705 ]

IMPALA-6679,IMPALA-6678: reduce scan reservation

This has two related changes.

IMPALA-6679: defer scanner reservation increases
------------------------------------------------
When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoid the need to
estimate ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  (probably 4MB I/Os). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below when some heuristic
tell us with high confidence that the column data for most or all files
is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Loops test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Don't claim ideal reservation in scanner until actually processing scan ranges
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-6679
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6679
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>
> One cause of the memory regression in IMPALA-4835 was that scans, particularly Parquet, claimed memory very aggressively in HdfsScanNode::Open() based on the planner's estimated ideal memory. There were two problems here:
> # In many cases the ideal memory increase was not at all needed, because the total amount of data scanned in the Parquet file was actually less than the minimum reservation. We don't know this for sure until the read the Parquet footer and check the column sizes.
> # HdfsScanNode::Open() may happen a long time before the first HdfsScanNode::GetNext(), particularly if the scan node is on the left side of a broadcast join. The scan node may grab a lot of reservation before other plan nodes have started running, eventually resulting in starving a lot of nodes.
> It would make sense to wait until we are actually processing a scan range to increase a scanner thread's reservation from the minimum. There are various more complex things we could do here, but I suspect the simplest possible thing would help a lot. E.g. we could extend this by  releasing reservation after processing each scan range.
> I believe the second effect may be fairly significant when running many queries with high concurrency. In that case the async join build will tend to be disabled, which means that HdfsScanNode::GetNext() on the left child of a broadcast join will only be called after the join build has completed, so overlap between that scans and the nodes on the right side of the join would be minimal (there's not enough synchronisation to guarantee no overlap).
> See https://docs.google.com/document/d/1kR0zfevNNUJom3sH1XmposacVZ-QALan7NSwnR5CkSA/edit# for some more analysis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org