Posted to reviews@impala.apache.org by "Noemi Pap-Takacs (Code Review)" <ge...@cloudera.org> on 2023/05/05 13:58:36 UTC

[Impala-ASF-CR] IMPALA-4530: Implement in-memory merge of quicksorted runs

Noemi Pap-Takacs has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/18393 )

Change subject: IMPALA-4530: Implement in-memory merge of quicksorted runs
......................................................................

IMPALA-4530: Implement in-memory merge of quicksorted runs

This change aims to decrease back-pressure in the sorter. It offers an
alternative for the in-memory run formation strategy and sorting
algorithm by introducing a new in-memory merge level between the
in-memory quicksort and the external merge phase.
Instead of forming one big run, it produces many smaller in-memory runs
(called miniruns), sorts those with quicksort, then merges them
in memory, before spilling or serving GetNext().
The external merge phase remains the same.
The new behavior is controlled by the MAX_SORT_RUN_SIZE development
query option, which sets the maximum number of pages in a minirun. The
default value of MAX_SORT_RUN_SIZE is 0, which keeps the original
implementation of a single large initial in-memory run. Other valid
values are integers of 2 and above. A value of 10 or more is
recommended to avoid high fragmentation with variable-length data.
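
For illustration only, the run formation described above can be sketched
in a few lines of Python. This is not the patch's C++ code: the function
names, the rows_per_page parameter, and the choice to raise an error for
unsupported values are assumptions made for the sketch, and Python's
built-in sorted() stands in for the quicksort step.

```python
import heapq
from typing import List, Sequence

# Hypothetical page capacity; the real sorter works on fixed-size memory
# pages, not on a fixed row count per page.
ROWS_PER_PAGE = 4


def validate_max_sort_run_size(value: int) -> int:
    """Accept 0 (single big run) or any integer >= 2, as the commit
    message describes. Raising on other values is an assumption."""
    if value == 0 or value >= 2:
        return value
    raise ValueError("MAX_SORT_RUN_SIZE must be 0 or an integer >= 2")


def sort_in_memory(rows: Sequence[int], max_sort_run_size: int,
                   rows_per_page: int = ROWS_PER_PAGE) -> List[int]:
    """Sort rows either as one big run or as merged miniruns."""
    validate_max_sort_run_size(max_sort_run_size)
    rows = list(rows)
    if max_sort_run_size == 0:
        # Original strategy: one big in-memory run, sorted in one go.
        return sorted(rows)
    # New strategy: cut the input into miniruns of at most
    # max_sort_run_size pages and sort each one independently
    # (sorted() is a stand-in for quicksort here).
    run_len = max_sort_run_size * rows_per_page
    miniruns = [sorted(rows[i:i + run_len])
                for i in range(0, len(rows), run_len)]
    # In-memory k-way merge of the sorted miniruns.
    return list(heapq.merge(*miniruns))
```

Both strategies return the same sorted output in this sketch; the
difference lies only in how much data has to be sorted at once before
rows can be merged and handed on.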

Testing:
- added MAX_SORT_RUN_SIZE as an additional test dimension to
  test_sort.py with values [0, 2, 20]
- manual E2E testing

Change-Id: I58c0ae112e279b93426752895ded7b1a3791865c
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/runtime/sorter-internal.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/tuple-row-compare.h
M bin/perf_tools/perf-query.sh
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_kudu.py
M tests/query_test/test_sort.py
13 files changed, 472 insertions(+), 87 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/18393/20
-- 
To view, visit http://gerrit.cloudera.org:8080/18393
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I58c0ae112e279b93426752895ded7b1a3791865c
Gerrit-Change-Number: 18393
Gerrit-PatchSet: 20
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>