Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2020/04/07 16:12:00 UTC

[Impala-ASF-CR] IMPALA-9176: shared null-aware anti-join build

Tim Armstrong has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/15612 )

Change subject: IMPALA-9176: shared null-aware anti-join build
......................................................................

IMPALA-9176: shared null-aware anti-join build

This switches null-aware anti-join (NAAJ) to use shared
join builds with mt_dop > 0. To support this, all
probe-side access to the join build data structures is
made read-only. NAAJ requires iterating over rows from
build partitions at various steps in the algorithm,
which was not thread-safe before this patch. Previously
we avoided the problem by giving each join node its own
builder and duplicating the build data.

The main challenge was iterating over
null_aware_partition()->build_rows() from the probe
side: the stream uses an embedded read iterator, so it
was not thread-safe (each thread would be trying to use
the same iterator).

The solution is to extend BufferedTupleStream to
allow multiple read iterators into a pinned, read-only
stream. Each probe thread can then iterate over the
stream independently with no thread-safety issues.
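
Below is a minimal, standalone C++ sketch of that pattern: a
pinned, immutable set of build rows shared by several reader
threads, each holding its own cursor. The PinnedStream and
ReadIterator names here are hypothetical stand-ins, not the
actual BufferedTupleStream API.

  #include <cassert>
  #include <cstdint>
  #include <iostream>
  #include <thread>
  #include <vector>

  // Stand-in for a pinned, read-only tuple stream: after Freeze() the
  // rows are immutable, so concurrent readers need no synchronization.
  class PinnedStream {
   public:
    void AddRow(int64_t row) {
      assert(!frozen_);
      rows_.push_back(row);
    }
    void Freeze() { frozen_ = true; }

    // Each caller gets an independent cursor; no state is shared
    // between readers.
    class ReadIterator {
     public:
      explicit ReadIterator(const std::vector<int64_t>* rows) : rows_(rows) {}
      bool Next(int64_t* out) {
        if (pos_ >= rows_->size()) return false;
        *out = (*rows_)[pos_++];
        return true;
      }
     private:
      const std::vector<int64_t>* rows_;
      size_t pos_ = 0;
    };

    ReadIterator NewReadIterator() const { return ReadIterator(&rows_); }

   private:
    std::vector<int64_t> rows_;
    bool frozen_ = false;
  };

  int main() {
    PinnedStream build_rows;
    for (int64_t i = 0; i < 1000; ++i) build_rows.AddRow(i);
    build_rows.Freeze();

    // Each "probe thread" scans the shared build rows with its own
    // iterator instead of contending on a single read position
    // embedded in the stream.
    auto probe = [&build_rows](int id) {
      PinnedStream::ReadIterator it = build_rows.NewReadIterator();
      int64_t row = 0, sum = 0;
      while (it.Next(&row)) sum += row;
      std::cout << "reader " << id << " saw sum=" << sum << "\n";
    };
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(probe, i);
    for (auto& t : threads) t.join();
    return 0;
  }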

As part of the BufferedTupleStream changes, I partially
abstracted ReadIterator from the rest of
BufferedTupleStream, but decided against a complete
refactor so that this patchset didn't cause excessive
churn. I.e. much BufferedTupleStream code still accesses
internal fields of ReadIterator.
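
As a rough illustration of that partial abstraction (the
SketchStream/SketchReadIterator names are made up, not the real
BufferedTupleStream members): read position state lives in its own
iterator class with a small public API, while the stream remains a
friend and still manipulates the iterator's fields directly.

  #include <cstddef>
  #include <vector>

  class SketchStream;  // Hypothetical stand-in for BufferedTupleStream.

  // Read position state is grouped into its own class...
  class SketchReadIterator {
   public:
    size_t rows_returned() const { return rows_returned_; }

   private:
    // ...but the owning stream is a friend and still reaches into these
    // fields directly instead of going through accessors.
    friend class SketchStream;
    size_t next_row_idx_ = 0;
    size_t rows_returned_ = 0;
  };

  class SketchStream {
   public:
    void AddRow(int value) { rows_.push_back(value); }

    // Each reader owns independent position state.
    SketchReadIterator NewReadIterator() const { return SketchReadIterator(); }

    // Advances the caller's iterator; returns false at end of stream.
    bool GetNext(SketchReadIterator* it, int* out) const {
      if (it->next_row_idx_ >= rows_.size()) return false;
      *out = rows_[it->next_row_idx_++];  // direct access to iterator internals
      ++it->rows_returned_;
      return true;
    }

   private:
    std::vector<int> rows_;
  };

  int main() {
    SketchStream s;
    for (int i = 0; i < 3; ++i) s.AddRow(i);
    SketchReadIterator it = s.NewReadIterator();
    int v;
    while (s.GetNext(&it, &v)) { /* consume v */ }
    return it.rows_returned() == 3 ? 0 : 1;
  }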

Fix a pre-existing bug in the grouping aggregator where
Spill() hit a DCHECK because the hash table was
unnecessarily destroyed when the aggregator hit an
out-of-memory condition. This was flushed out by the
parameter change in test_spilling.

Testing:
Added a test to buffered-tuple-stream-test for multiple
readers of a BufferedTupleStream.

Tweaked test_spilling_naaj_no_deny_reservation to have
a smaller minimum reservation, required to keep the
test passing with the new, lower memory requirement.

Updated a TPC-H planner test where resource requirements
slightly decreased for the NAAJ.

Ran the naaj tests in test_spilling.py with TSAN enabled,
confirmed no data races.

Ran exhaustive tests, which passed after fixing IMPALA-9611.

Ran core tests with ASAN.

Ran backend tests with TSAN.

Perf:
I ran this query that exercises EvaluateNullProbe() heavily.

  select l_orderkey, l_partkey, l_suppkey, l_linenumber
  from tpch30_parquet.lineitem
  where l_suppkey = 4162 and l_shipmode = 'AIR'
        and l_returnflag = 'A' and l_shipdate > '1993-01-01'
        and if(l_orderkey > 5500000, NULL, l_orderkey) not in (
          select if(o_orderkey % 2 = 0, NULL, o_orderkey + 1)
          from orders
          where l_orderkey = o_orderkey)
  order by 1,2,3,4;

It went from ~13s to ~11s running on a single impalad with
this change, because of the inlining of CreateOutputRow() and
EvalConjuncts().

I also ran TPC-H SF 30 on Parquet with mt_dop=4, and there was
no change in performance.

Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
---
M be/src/exec/blocking-join-node.cc
M be/src/exec/blocking-join-node.h
A be/src/exec/blocking-join-node.inline.h
M be/src/exec/data-source-scan-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
A be/src/exec/exec-node.inline.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/non-grouping-aggregator.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/select-node-ir.cc
M be/src/exec/unnest-node.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/buffered-tuple-stream.inline.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-no-debug-action.test
M tests/query_test/test_spilling.py
45 files changed, 786 insertions(+), 397 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/15612/10
-- 
To view, visit http://gerrit.cloudera.org:8080/15612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95ead761430b0aa59a4fb2e7848e47d1bf73c1c9
Gerrit-Change-Number: 15612
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>