Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/10/26 23:21:27 UTC

[jira] [Updated] (TEZ-2244) PipelinedSorter: Progressive allocation for sort-buffers

     [ https://issues.apache.org/jira/browse/TEZ-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-2244:
----------------------------------
    Attachment: TEZ-2244.5.patch

- TEZ_RUNTIME_PIPELINED_SORTER_PRE_ALLOCATE_MEMORY (true by default)
-- When enabled, the sorter pre-allocates all of its sort buffers upfront, in chunks. Chunk sizes are determined by TEZ_RUNTIME_PIPELINED_SORTER_PRE_ALLOCATE_MIN_BLOCK_SIZE_IN_MB (2000 by default). If the last chunk is smaller than one full chunk, it is merged with the previous chunk.
-- When disabled, the sorter allocates one chunk at a time: the first chunk is 32 MB and each subsequent chunk is 256 MB. If the last chunk is smaller than the min block size, it is merged with the previous chunk. This is useful in scenarios where one does not want to allocate all memory upfront.
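
To make the two modes concrete, here is a minimal sketch of how the chunk sizes described above could be planned. It is not the code from the patch: the class SortBufferPlan and its methods are hypothetical names, and it assumes that a trailing remainder is merged whenever it is smaller than one full chunk of the current size (for the disabled mode, the comment above states the threshold as "the min block size", so the exact cutoff there is an assumption). It only computes sizes in MB and does not allocate anything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tez: plans sort-buffer chunk sizes in MB.
final class SortBufferPlan {

    // Splits totalMb into a first chunk of firstMb followed by chunks of chunkMb.
    // A trailing remainder smaller than the next full chunk is merged into the
    // previous chunk (assumed merge rule), or kept alone if no chunk exists yet.
    static List<Long> plan(long totalMb, long firstMb, long chunkMb) {
        if (totalMb <= 0 || firstMb <= 0 || chunkMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        List<Long> chunks = new ArrayList<>();
        long remaining = totalMb;
        long next = firstMb;
        while (remaining > 0) {
            if (remaining < next) {
                if (chunks.isEmpty()) {
                    chunks.add(remaining);
                } else {
                    int last = chunks.size() - 1;
                    chunks.set(last, chunks.get(last) + remaining);
                }
                remaining = 0;
            } else {
                chunks.add(next);
                remaining -= next;
                next = chunkMb;
            }
        }
        return chunks;
    }

    // Pre-allocate mode: every chunk uses the configured min block size.
    static List<Long> preAllocated(long totalMb, long minBlockMb) {
        return plan(totalMb, minBlockMb, minBlockMb);
    }

    // Progressive mode: 32 MB first chunk, 256 MB afterwards (sizes from the comment).
    static List<Long> progressive(long totalMb) {
        return plan(totalMb, 32, 256);
    }

    public static void main(String[] args) {
        System.out.println(preAllocated(5000, 2000));
        System.out.println(progressive(1000));
    }
}
```

Under these assumptions, a 5000 MB sort buffer with the default 2000 MB block size is planned as two chunks (2000 MB and 3000 MB), while the same planner in progressive mode starts from a single 32 MB chunk and grows only as needed.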


> PipelinedSorter: Progressive allocation for sort-buffers
> --------------------------------------------------------
>
>                 Key: TEZ-2244
>                 URL: https://issues.apache.org/jira/browse/TEZ-2244
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2244.1.patch, TEZ-2244.2.patch, TEZ-2244.3.patch, TEZ-2244.4.patch, TEZ-2244.5.patch, TEZ-2244.WIP.patch
>
>
> Currently, the sort buffers are allocated pessimistically for all tasks so that the largest task's spill stays within memory.
> The chained buffer implementation inside PipelinedSorter brings up the possibility of allocating only the first chunk of the sort buffer when the sorter starts up.
> This allows tasks which do not heavily use the sort buffer (like a grouping aggregation) to use the sort-space only when the map-aggregation turns itself off.
> Not reserving memory on startup hurts the worst-case scenario for the pipelined sorter, but improves the average case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)