You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2017/09/11 16:57:26 UTC

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/8029

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), including skipping consideration of
FilterContext's that don't have a local_bloom_filter, and eliminates
the branch on type that happens in RawValue::GetHashValue().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 232 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@216
PS2, Line 216: 
> Forgot in the previous pass: can you include an example of the IR it genera
Done


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc@217
PS2, Line 217: Status FilterContext::CodegenInsert(
> So I've been working on this, but it turns out to be tricky. I uploaded a n
After much discussion with Tim, we've decided not to bother trying to do the cross-compilation and to just stick with the ir builder version.

The reason basically is that the Function returned by ScalarExpr::GetCodegendComputeFn() can't quite be used as a drop-in replacement for ScalarExprEvaluator::GetValue() (the first returns the value directly while the second returns a pointer) and the work necessary to fix this without losing efficiency is beyond the scope of this patch.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc@935
PS2, Line 935: Status PhjBuilder::CodegenInsertRuntimeFilters(
> It would be good to have an example of the IR here too.
Done


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc@24
PS2, Line 24: IR_ALWAYS_INLINE
> Is the always_inline needed here?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Sep 2017 21:35:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1264/


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:38:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#3).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
15 files changed, 244 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

Line 40:   filter-context-ir.cc
Missing this file?


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
This does look like it should work to me. I think the types should line up - i.e. ColumnType* and ColumnType*.


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 368:   // Set the pointer to NULL in case it evaluates to NULL.
It seems like the main way this differs from other places is returning a null pointer if IsNull() is true. Could we just wrap that up in something like a CodegenAnyVal.ToNullableNativePtr() method and avoid the boilerplate of creating a whole new function?


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Sep 2017 22:36:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#5).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/filter-context-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
16 files changed, 278 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 124:   static Status CodegenInsert(LlvmCodeGen* codegen, ScalarExpr* filter_expr,
> This is a bit tricky because it doesn't actually do exactly the same thing 
Done


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 953:   for (int i = 0; i < num_filters; ++i) {
> Nit: it looks like the two branches are identical except for adding the NoI
Done


Line 958:     Value* filter_context_ptr =
> Going forward, we want to avoid have this dependence between the codegen co
Done


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

Line 32: }
> It would be nice to keep this in header file to avoid regressing the non-co
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1262/


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 23 Sep 2017 02:55:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 1:

(4 comments)

Nice to see this getting fixed.

http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

Line 124:   static Status CodegenInsert(LlvmCodeGen* codegen, ScalarExpr* filter_expr,
This is a bit tricky because it doesn't actually do exactly the same thing as Insert() - it doesn't check if the filter is NULL. That would be worth documenting, but I think the null check is actually necessary so the resolution is probably just to make it do the same thing as Insert().


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 953:   if (num_filters == 0) {
Nit: it looks like the two branches are identical except for adding the NoInline. Might be clearer just to write:

  if (num_filters > 0) {
   insert_runtime_filters_fn->addFnAttr(llvm::Attribute::NoInline);
  }


Line 958:       if (filter_ctxs_[i].local_bloom_filter != NULL) {
Going forward, we want to avoid have this dependence between the codegen code and fragment instance-local state: see IMPALA-4080 - so that the same codegen'd code can be shared between instances of the same plan node. I.e. this should be a runtime branch.


http://gerrit.cloudera.org:8080/#/c/8029/1/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

Line 32:   if (CpuInfo::IsSupported(CpuInfo::AVX2)) {
It would be nice to keep this in header file to avoid regressing the non-codegen'd path (although that likely won't be a big deal).

It would also be nice to avoid this AVX2 branch. Maybe we could just have 3 versions - the original inlined one in the header and then two IR ones here for the AVX and non-AVX2 cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2017 19:37:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1262/


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Sep 2017 23:50:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 135:   void InsertNoAvx(const uint32_t hash) noexcept;
naming nit: We actually need AVX2 -- AVX won't cut it


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 217: 
> You're right, we should be able to cross-compile Insert() and substitute in
So I've been working on this, but it turns out to be tricky. I uploaded a new patch, but it doesn't fully work.

One issue is that the function returned by ScalarExpr::GetCodegendComputeFn() can't be used as a drop-in replacement for ScalarExprEvaluator::GetValue() as they have different return types (AnyVal and void* respectively). I solved this with ScalarExpr::GetCodegendComputePtrFn(), but it means keeping a lot of the IRBuilder code that's here.

The other issue is how to deal with making the type argument to GetHashValue() a constant. As far as I can tell, we don't already have any tools for directly replacing arguments to functions.

The latest version of the patch defines a function GetType(), called in FilterContext::Insert(), that I replace with ReplaceCallSitesWithValue(), but this doesn't work (in particular, the values to be hashed that are passed to GetHashValue() appear to be corrupted, and from dumping the IR the branching isn't eliminated anyways), and I'm not sure if its even reasonable to expect it to work since we don't use ReplaceCallSitesWithValue() in that way anywhere else.

I also considered making something like RawValue::CodegenGetHashValue() which would take the type and return a codegen'd function that can be used to replace GetHashValue() with ReplaceCallSites() (ie. the codegen'd function would still also take a type argument but would ignore it), but this would be a bunch more IRBuilder (more even than the original version of the patch, I think), so I wanted to see if that approach makes sense before doing it.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

Line 135:   // Same as Insert(), but skips the CPU check and assumes that AVX is not available.
> naming nit: We actually need AVX2 -- AVX won't cut it
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Reviewed-on: http://gerrit.cloudera.org:8080/8029
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

> (4 comments)
 > 
 > Looks good to me minus a couple of things. Not sure if Dan wants to
 > take a look

I looked at the header changes and they look fine. Tim, feel free to do the +2 once all comments are addressed.

-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#4).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/filter-context-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
16 files changed, 279 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 217: Status FilterContext::CodegenInsert(
> I'm still learning here. I don't see a loop here. What makes this ineligibl
You're right, we should be able to cross-compile Insert() and substitute in GetValue(), GetHashValue() and BloomFilter::Insert().


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 7: Code-Review+2

GVO failed because of WARN_UNUSED_RESULT not being followed. Added a RETURN_IF_ERROR


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:26:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/CMakeLists.txt
File be/src/exec/CMakeLists.txt:

Line 40:   filter-context-ir.cc
> Missing this file?
Done


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
> This does look like it should work to me. I think the types should line up 
Yeah, I have no idea why it doesn't work. The types definitely line up or it wouldn't pass verification, and as far as I can tell the type is being passed correctly, its just somehow corrupting/overwriting the actual value that's getting hashed.

I've been playing around with it, and the only thing I can figure out that makes it work is to change the cross-complied function to assign the ColumnType to a local variable rather than just handling a ref to it, but of course that somewhat defeats the point here.

Is there someone who knows more about llvm that can help me with this? I can post the generated IR somewhere for someone to look at. I've already spent quite a lot of time trying to get this to work, and I'm not sure what else to do since I'm finding llvm to be very poorly documented.


http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exprs/scalar-expr.cc
File be/src/exprs/scalar-expr.cc:

Line 368:   // Set the pointer to NULL in case it evaluates to NULL.
> It seems like the main way this differs from other places is returning a nu
I don't know what you mean by this. If we're going to replace ScalarExprEval::GetValue() with ReplaceCallSites, then we need another function to replace it with.


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
> Yeah, I have no idea why it doesn't work. The types definitely line up or i
So I've been playing around with this more, and I have a theory. I noticed that it works if I only replace 'GetType' and not 'GetValue', so the problem may actually be with 'GetValue'.

I think that the value that's being returned by the function generated by GetCodegendComputePtrFn() is stack allocated (because ToNativePtr() is called on 'result' which then calls CreateEntryBlockAlloca, which allocates stack space).

Basically, GetCodegendComputePtrFn() is trying to create a codegend copy of ScalarExprEvaluator::GetValue() but its not currently doing the part where the value is stored in 'result_' so that a pointer to it can be passed back.

One solution would be to figure out how to write IRBuilder for storing the value in 'result_' (or else cross compile ScalarExprEvaluator::GetValue() and sub things in), but that's potentially complicated. It may also be worth just going back to the original version of this patch to eliminate the perf cost from having to store the value in 'result_'.

Or maybe you had something in mind with your other comment about not needing to create a whole new function that would solve this problem as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, Dan Hecht, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#7).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/8029/3/be/src/exec/filter-context.cc@225
PS3, Line 225:   Constant* expr_type_arg = codegen->ConstantToGVPtr(
> So I've been playing around with this more, and I have a theory. I noticed 
That sounds plausible. ToNativePtr() makes a stack allocation so it makes sense that it would cause problems if you return that pointer from the function. Just generating the call and calling ToNativePtr() inline in this function would work (I think) since the stack allocation would have the correct lifetime.

Storing the intermediate values to the heap (i.e. result_) I think is the wrong path to go down since it will inhibit optimisations (the compiler knows that random pointers won't alias the stack allocation and that the write to the stack allocation won't be visible outside of the functoin).



-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Sep 2017 17:50:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 246 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 217: Status FilterContext::CodegenInsert(
> It would be nice not to add the IRBuilder code but we don't have a good pat
I'm still learning here. I don't see a loop here. What makes this ineligible for cross-compilation?


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................


Patch Set 2: Code-Review+1

(4 comments)

Looks good to me minus a couple of things. Not sure if Dan wants to take a look

http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

Line 216: 
Forgot in the previous pass: can you include an example of the IR it generates?


Line 217: Status FilterContext::CodegenInsert(
It would be nice not to add the IRBuilder code but we don't have a good path yet to automatically unroll loops in cross-compiled code. This seems pretty reasonable as far as IRBuilder code - I didn't see any obvious opportunities to simplify things.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 935: Status PhjBuilder::CodegenInsertRuntimeFilters(
It would be good to have an example of the IR here too.


http://gerrit.cloudera.org:8080/#/c/8029/2/be/src/util/bloom-filter-ir.cc
File be/src/util/bloom-filter-ir.cc:

PS2, Line 24: IR_ALWAYS_INLINE
Is the always_inline needed here?


-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#6).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
......................................................................

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
    for this as there's several large hash join build sides that will
    benefit from the codegen. Total query running time improved ~7%
    (from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
    filtered tpch_parquet.lineitem. Should be a bad case for this
    patch, as the build side of the join is very small. Total query
    running time regressed by about ~2% (from 0.73s to 0.75s) due to
    an increase in codegen time (from 295ms to 309ms for the fragment
    containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>