You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/06/12 07:17:00 UTC

[jira] [Commented] (IMPALA-5706) Parallelise read I/O in sorter

    [ https://issues.apache.org/jira/browse/IMPALA-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509278#comment-16509278 ] 

ASF subversion and git services commented on IMPALA-5706:
---------------------------------------------------------

Commit 88366bab07db3defa14ce00ac4f02ad97ec54176 in impala's branch refs/heads/master from [~gaborkaszab]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=88366ba ]

IMPALA-5706: Spilling sort optimisations

This patch covers multiple changes with the purpose of optimizing
spilling sort mechanism:
  - Remove the hard-coded maximum limit of buffers that can be used
    for merging the sorted runs. Instead this number is calculated
    based on the available memory through buffer pool.
  - The already sorted runs are distributed more optimally between
    the last intermediate merge and the final merge to avoid that a
    heavy intermediate merge is followed by a light final merge.
  - Right before starting the merging phase Sorter tries to allocate
    additional memory through the buffer pool.
  - An output run is not allocated anymore for the final merge.

Note, double-buffering the runs during a merge was also planned with
this patch. However, performance testing showed that except some
exotic queries with unreasonably small amount of buffer pool memory
available double-buffering doesn't add to the overall performance.
It's basically because the half of the available buffers have to be
sacrificed to do double-buffering and as a result the merge tree can
get deeper. In addition the amount of I/O wait time is not reaching
the level where double-buffering could countervail the reduced number
of runs during a particular merge.

Performance measurements were made during manual testing to verify
that this is in fact an optimization:
  - In case doing a sort on top of a join when working with a
    restricted amount of memory then the Sort node successfully
    allocates additional memory right before the merging phase. This
    is feasible because once Join finishes sending new input data and
    calls InputDone() then it releases memory that can be picked up
    by the Sorter. This results in shallower merging trees (more runs
    grabbed for a merge).
  - On a multi-node cluster I verified that in cases when at least one
    merging step is done then this change reduces the execution time
    for sorts.
  - The more merging steps are done the bigger the performance gain is
    compared to the baseline.

Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9
Reviewed-on: http://gerrit.cloudera.org:8080/9943
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Parallelise read I/O in sorter
> ------------------------------
>
>                 Key: IMPALA-5706
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5706
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Gabor Kaszab
>            Priority: Major
>              Labels: resource-management, spill
>             Fix For: Impala 3.1.0
>
>
> IMPALA-3200 offers an opportunity to improve the spilling sort algorithm:
> * Use the reliability of reservations to select the most efficient order to conduct merges in (rather than greedily trying to maximise the fan-in of the current merge). We want to minimise the depth of the merge tree, then structure the tree based on the preferred fan-in.
> * Do multiple-buffering of the stream being written (this happens automatically if there are free buffers in the BufferPool client).
> * Do multiple-buffering of the streams being read, instead of blocking on read I/O frequently.
> More concretely, the idea is to implement double-buffering of spilled input runs by calling BufferPool::Pin() early to prefetch the second page in each input Run. Currently only one page per input run is pinned, which means that the sorter frequently blocks on I/O.
> I'd suggest doing this in two steps.
> The first step is to change how the fan-in of each merge run is selected. We know the number of runs to be merged and the buffer reservation that is available, so we can compute the maximum possible fan-in of each merge step (assuming 1 buffer for the output run and 1 buffer for each input run to the merge). We can then calculate the minimum number of rounds of merging required and, based on that, decide how the runs should be merged (you could think about it as a tree of merge operations). I think we want to reduce the number of bytes written to disk. E.g. if we have 5 buffers and 8 input runs, we should merge input runs (1,2,3,4) then merge that intermediate runs with runs (5,6,7). It's reasonable to assume that the input runs are all approximate the same size.
> ee53ddb389549247f5bfe760d446dc7b3b963a29 actually removed some logic along those lines because it didn't work with the old buffer management scheme. The logic before that commit might provide some ideas. There are also some related TODOs in Sorter::MergeIntermediateRuns() and Sorter::CreateMerger() to simplify how the number of input runs is decided and how the merger is set up:
> {code}
>     // TODO: once we have reliable reservations (IMPALA-3200), we should calculate this
>     // based on the available reservations.
> ....
>     // TODO: this isn't optimal: we could defer creating the merged run if we have
>     // reliable reservations (IMPALA-3200).
> ...
>       // TODO: IMPALA-3200: we should not need this logic once we have reliable
>       // reservations (IMPALA-3200).
> {code}
> The second step would be to adjust the logic from the first step to reserve 2 buffers per input and output run and then implement the logic to call Pin() earlier to prefetch the page after the current page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org