You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/09/25 23:50:18 UTC
[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations
Hello Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8025
to look at the new patch set (#8).
Change subject: IMPALA-5844: use a MemPool for expr result allocations
......................................................................
IMPALA-5844: use a MemPool for expr result allocations
"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.
A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.
The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
MemPool::Clear()
More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
StringFunctions::Replace() do the doubling itself to avoid
the need for a special interface. Worst-case this doubles
the memory requirements for Replace() since n / 2 + n / 4
+ n / 8 + .... bytes of memory could be wasted instead of recycled
for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
to come directly from the output RowBatch's MemPool. This is
also potentially applicable to other places where we currently
copy out strings from local allocations, e.g.
AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
FunctionContext layer so that it retains the old behaviour even
though allocations do not all come from FreePools.
The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.
Testing:
* ran exhaustive and ASAN
Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
M tests/custom_cluster/test_alloc_fail.py
75 files changed, 707 insertions(+), 812 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/8025/8
--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>