Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/08/05 03:22:00 UTC

[jira] [Resolved] (IMPALA-1382) Wasted space in buffered-tuple-stream in presence of many NULL tuples

     [ https://issues.apache.org/jira/browse/IMPALA-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-1382.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0



IMPALA-4674: Part 2: port backend exec to BufferPool

Always create global BufferPool at startup using 80% of memory and
limit reservations to 80% of query memory (same as BufferedBlockMgr).
The query's initial reservation is computed in the planner, claimed
centrally (managed by the InitialReservations class) and distributed
to query operators from there.
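The sizing rules above reduce to two 80% limits plus a central pool that operators draw from. The sketch below models that flow. Only the two 80% figures come from the commit message. The class name `InitialReservationsSketch`, its methods and the byte values are hypothetical, and this is not Impala's actual InitialReservations class.

```python
GIB = 1024 ** 3


class InitialReservationsSketch:
    """Hypothetical model of central reservation handling (not Impala's code)."""

    def __init__(self, process_mem_bytes, query_mem_limit_bytes, initial_reservation_bytes):
        # Global buffer pool: 80% of process memory (integer math avoids float rounding).
        self.buffer_pool_limit = process_mem_bytes * 8 // 10
        # Per-query cap on reservations: 80% of the query memory limit.
        self.max_query_reservation = query_mem_limit_bytes * 8 // 10
        if initial_reservation_bytes > self.max_query_reservation:
            raise MemoryError("initial reservation exceeds 80% of query memory")
        # The planner-computed initial reservation is claimed once, centrally.
        self.unclaimed = initial_reservation_bytes

    def claim(self, operator_bytes):
        """Hand part of the central reservation to one operator (e.g. during Open())."""
        if operator_bytes > self.unclaimed:
            raise MemoryError("initial reservation exhausted")
        self.unclaimed -= operator_bytes
        return operator_bytes
```

In this model, a query whose planner estimate exceeds its cap fails at construction time, before any operator runs.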

min_spillable_buffer_size and default_spillable_buffer_size query
options control the buffer size that the planner selects for
spilling operators.

Port ExecNodes to use BufferPool:
  * Each ExecNode has to claim its reservation during Open()
  * Port Sorter to use BufferPool.
  * Switch from BufferedTupleStream to BufferedTupleStreamV2
  * Port HashTable to use BufferPool via a Suballocator.
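A suballocator carves the small, variably sized allocations a hash table needs out of large fixed-size buffer-pool buffers. One common technique for this is a buddy allocator, sketched below over a single buffer of `2**max_order` bytes, with offsets standing in for pointers. This is an illustration of the general technique under those assumptions. The class name and parameters are hypothetical, and it is not presented as Impala's Suballocator implementation.

```python
class BuddySuballocator:
    """Hypothetical buddy-style suballocator over one buffer of 2**max_order bytes."""

    def __init__(self, min_order, max_order):
        self.min_order = min_order
        self.max_order = max_order
        # free_lists[k] holds offsets of free blocks of size 2**k.
        self.free_lists = {k: set() for k in range(min_order, max_order + 1)}
        self.free_lists[max_order].add(0)  # the whole buffer starts free
        self.allocated = {}  # offset -> order of the block handed out

    def _order_for(self, size):
        # Round the request up to a power of two, but no smaller than 2**min_order.
        order = max(self.min_order, (size - 1).bit_length())
        if order > self.max_order:
            raise MemoryError("request larger than the underlying buffer")
        return order

    def allocate(self, size):
        """Return an offset for `size` bytes, or None if no block is free."""
        order = self._order_for(size)
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None  # caller would need another buffer (or must spill)
        offset = min(self.free_lists[k])  # deterministic choice
        self.free_lists[k].remove(offset)
        # Split down to the requested size, putting each upper half on a free list.
        while k > order:
            k -= 1
            self.free_lists[k].add(offset + (1 << k))
        self.allocated[offset] = order
        return offset

    def free(self, offset):
        order = self.allocated.pop(offset)
        # Merge with the buddy block while the buddy is also free.
        while order < self.max_order:
            buddy = offset ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].add(offset)
```

Rounding requests to powers of two wastes some space per allocation. In exchange, freeing and coalescing stay cheap, because a block's buddy is found with a single XOR.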

This also makes PAGG memory consumption more efficient (avoiding wasted buffers)
and improves the spilling algorithm:
* Allow preaggs to execute with 0 reservation: if streams and hash tables
  cannot be allocated, they pass rows through unaggregated.
* Halve the buffer requirement for spilling aggs by not allocating
  buffers for aggregated and unaggregated streams simultaneously.
* Rebuild spilled partitions instead of repartitioning (IMPALA-2708)
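The pass-through behaviour of preaggregations can be illustrated with a small model. It sums values per key while the hash table has room. Once the table is full, which stands in here for a failed allocation, remaining new keys are emitted unaggregated. The result stays correct because a downstream merge aggregation combines the partial results. The function name, the `(key, value)` row shape and the `max_groups` limit are hypothetical; this is not Impala's PartitionedAggregationNode logic.

```python
def preaggregate(rows, max_groups):
    """Hypothetical streaming pre-aggregation over (key, value) rows.

    Sums values per key while fewer than max_groups keys are held; rows for new
    keys that do not fit are passed through unaggregated. With max_groups == 0
    (no reservation), every row is passed through unchanged.
    """
    groups = {}
    passed_through = []
    for key, value in rows:
        if key in groups:
            groups[key] += value
        elif len(groups) < max_groups:
            groups[key] = value
        else:
            passed_through.append((key, value))
    return list(groups.items()) + passed_through
```

For example, feeding the output of `preaggregate(rows, 1)` into a second, unconstrained call still yields the full per-key sums. The preagg only reduced the data where it had memory to do so.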

TODO in follow-up patches:
* Rename BufferedTupleStreamV2 to BufferedTupleStream
* Implement max_row_size query option.

Testing:
* Updated tests to reflect new memory requirements

Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Reviewed-on: http://gerrit.cloudera.org:8080/5801
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins


> Wasted space in buffered-tuple-stream in presence of many NULL tuples
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-1382
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1382
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.0
>            Reporter: Ippokratis Pandis
>            Assignee: Tim Armstrong
>            Priority: Minor
>             Fix For: Impala 2.10.0
>
>
> We currently preallocate the null tuple indicator bitstring in each tuple stream assuming that very few tuples will be NULL. That can lead to lots of wasted space in the buffer if there are many NULL tuples in the stream.
> A possible solution is to use a slotted page (buffer) with NULL indicators growing from the end of the page (buffer).
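The slotted-page idea from the issue can be sketched as follows. Tuple data grows from the front of the page and one NULL-indicator bitmap per row grows from the back. The page is full only when the two regions meet, so no indicator space is preallocated on the assumption that few tuples are NULL. The class name, the fixed per-tuple widths and the one-bit-per-tuple indicator format are hypothetical choices for illustration, not the layout BufferedTupleStreamV2 actually uses.

```python
class SlottedPage:
    """Hypothetical page: tuple data grows up from offset 0, NULL bitmaps grow down from the end."""

    def __init__(self, page_size, tuple_sizes):
        self.buf = bytearray(page_size)
        self.tuple_sizes = tuple_sizes  # fixed byte width of each tuple in a row
        self.ind_bytes = (len(tuple_sizes) + 7) // 8  # one bit per tuple, rounded up
        self.data_end = 0  # next free byte for tuple data
        self.ind_start = page_size  # lowest byte used by the indicator region
        self.num_rows = 0

    def append_row(self, tuples):
        """tuples: bytes of the declared width, or None for a NULL tuple.
        Returns False (writing nothing) if the row does not fit."""
        assert len(tuples) == len(self.tuple_sizes)
        for t, size in zip(tuples, self.tuple_sizes):
            assert t is None or len(t) == size
        # NULL tuples contribute no data bytes, only a set bit.
        data = b"".join(t for t in tuples if t is not None)
        if self.data_end + len(data) > self.ind_start - self.ind_bytes:
            return False
        bits = 0
        for i, t in enumerate(tuples):
            if t is None:
                bits |= 1 << i
        self.ind_start -= self.ind_bytes
        self.buf[self.ind_start:self.ind_start + self.ind_bytes] = bits.to_bytes(self.ind_bytes, "little")
        self.buf[self.data_end:self.data_end + len(data)] = data
        self.data_end += len(data)
        self.num_rows += 1
        return True

    def read_rows(self):
        """Decode all rows in insertion order."""
        rows = []
        offset = 0
        for r in range(self.num_rows):
            pos = len(self.buf) - (r + 1) * self.ind_bytes
            bits = int.from_bytes(self.buf[pos:pos + self.ind_bytes], "little")
            row = []
            for i, size in enumerate(self.tuple_sizes):
                if (bits >> i) & 1:
                    row.append(None)
                else:
                    row.append(bytes(self.buf[offset:offset + size]))
                    offset += size
            rows.append(row)
        return rows
```

With this layout, a page of mostly NULL rows simply holds more rows. Each such row costs only its indicator bits, so space is not stranded in an indicator region sized for a fixed row count.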


