You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2017/09/29 01:24:21 UTC

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8170


Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file, partition and rowbatch
granularities.

Testing: A test case is added checking that an always-false runtime
filter can filter out files.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
18 files changed, 158 insertions(+), 195 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#7).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. Two test cases are added checking
that an always-false runtime filter can filter out files and splits.
In single node perf tests, time spent on primitive_empty_build_join_1
is reduced by 75%.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
M tests/query_test/test_runtime_filters.py
22 files changed, 200 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8170/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8170/4//COMMIT_MSG@15
PS4, Line 15: 
Did you do any perf runs? It would be good to verify that the extra flag checking doesn't affect perf (I suspect it doesn't).

It would also be good to confirm that testdata/workloads/targeted-perf/queries/primitive_empty_build_join_1.test gets faster (I think it should!).


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc@403
PS4, Line 403: std::
Shouldn't need std:: - it's imported in common/names.h


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-scanner.cc@107
PS4, Line 107:     if (!BaseSequenceScanner::FileFormatIsSequenceBased(
Might be more readable to factor subexpression into variable e.g. is_sequence_based.


http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py@53
PS4, Line 53:   seq_table_suffixes = ['_avro', '_rc', '_seq']
We'd normally create a test matrix based on these. Is the idea here to avoid restarting the cluster for each file format? If so it would be good to leave a comment so that readers understand why.


http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py@82
PS4, Line 82:   def test_skip_file(self, cursor):
Does this need to be a custom cluster test? I.e. does it need a special minicluster to execute. It's best to make things query tests if possible since starting a cluster is slow and the tests aren't parallelisable.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Oct 2017 20:34:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has abandoned this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Abandoned

More tests are needed
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8170/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8170/4//COMMIT_MSG@15
PS4, Line 15: In single node perf tests, time spent on primitive_empty_build_join_1
> Did you do any perf runs? It would be good to verify that the extra flag ch
Commit message is updated with perf run result and the complete report is posted on JIRA. primitive_empty_build_join_1 does get faster.


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc@403
PS4, Line 403: std::
> Shouldn't need std:: - it's imported in common/names.h
It turns out that there is an llvm::make_unique and this file uses llvm namespace so it cannot be removed.


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-scanner.cc@107
PS4, Line 107:     bool is_sequence_based = BaseSequenceScanner::FileFormatIsSequenceBased(
> Might be more readable to factor subexpression into variable e.g. is_sequen
Done


http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py@53
PS4, Line 53:       assert re.search("Splits rejected: 8 \(8\)", profile) is not None
> We'd normally create a test matrix based on these. Is the idea here to avoi
Done


http://gerrit.cloudera.org:8080/#/c/8170/4/tests/custom_cluster/test_always_false_filter.py@82
PS4, Line 82: 
> Does this need to be a custom cluster test? I.e. does it need a special min
It is moved into tests/query_test/test_runtime_filters.py



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Oct 2017 18:20:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 00:03:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8170/7/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/7/tests/custom_cluster/test_always_false_filter.py@37
PS7, Line 37:     query = """select STRAIGHT_JOIN * from alltypes inner join
> Would be better to use the """ multi-line literals
Done


http://gerrit.cloudera.org:8080/#/c/8170/7/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/8170/7/tests/query_test/test_runtime_filters.py@55
PS7, Line 55: "
> nit: extra space. Maybe also use the """ multi-line literals
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Oct 2017 23:30:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 3:

(6 comments)

This is definitely going to conflict with my patch, but you should be able to get this in first so I'll probably have to do the painful rebasing.

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG@13
PS3, Line 13: Testing
I assume you ran the existing runtime filter tests?


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc@129
PS3, Line 129: order to provide a regression test for IMPALA-3798 and a test for IMPALA-5789.
This is a little cluttered. I think its okay to just say "for testing purposes" and anyone who wants to know the exact JIRAs can easily grep for uses of it.


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h@129
PS3, Line 129: HasAlwaysFalseInList
I think this needs to be renamed for two reasons:
- We're eventually going to be adding 'in-list' filters (in addition to the existing "bloom" and soon "min-max" filters), so the "InList" here is potentially confusing.
- Its unusual for a function starting with "Has" to have side effects (incrementing the stats).

Maybe "CheckForAlwaysFalse"? Or something better you come up with.


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h
File be/src/runtime/runtime-filter.inline.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h@53
PS3, Line 53: return bloom_filter_ != BloomFilter::ALWAYS_TRUE_FILTER &&
            :       bloom_filter_->AlwaysFalse();
single line?


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h@111
PS3, Line 111: hasn't been inserted any elements
hasn't had any elements inserted


http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py@34
PS3, Line 34: impalad = self.cluster.get_any_impalad()
            :     client = impalad.service.create_beeswax_client()
I think you can just add 'cursor' as a parameter to the 'test_' functions.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Oct 2017 18:11:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8170/3//COMMIT_MSG@13
PS3, Line 13: Testing
> I assume you ran the existing runtime filter tests?
Yes. Commit message is updated with the clarification.


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/common/global-flags.cc@129
PS3, Line 129: testing purposes. Effective in debug builds only.");
> This is a little cluttered. I think its okay to just say "for testing purpo
Done


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h
File be/src/exec/filter-context.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/exec/filter-context.h@129
PS3, Line 129: CheckForAlwaysFalse(
> I think this needs to be renamed for two reasons:
Done


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h
File be/src/runtime/runtime-filter.inline.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/runtime/runtime-filter.inline.h@53
PS3, Line 53: return bloom_filter_ != BloomFilter::ALWAYS_TRUE_FILTER && bloom_filter_->AlwaysFalse();
            : }
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/3/be/src/util/bloom-filter.h@111
PS3, Line 111: hasn't had any elements inserted.
> hasn't had any elements inserted
Done


http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/3/tests/custom_cluster/test_always_false_filter.py@34
PS3, Line 34: f assert_file_rejected(result):
            :     assert re.search("Files rejected: 8 \(8\)", resu
> I think you can just add 'cursor' as a parameter to the 'test_' functions.
Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Oct 2017 22:47:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1354/


-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 00:04:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 7: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/exec/hdfs-parquet-scanner.cc@403
PS4, Line 403: std::
> It turns out that there is an llvm::make_unique and this file uses llvm nam
Got it, thanks. This seems like a reason to not use "using namespace llvm;" but we don't need to fix that now. Filed IMPALA-6084


http://gerrit.cloudera.org:8080/#/c/8170/7/tests/custom_cluster/test_always_false_filter.py
File tests/custom_cluster/test_always_false_filter.py:

http://gerrit.cloudera.org:8080/#/c/8170/7/tests/custom_cluster/test_always_false_filter.py@37
PS7, Line 37:     query = "select STRAIGHT_JOIN * from alltypes inner join " \
Would be better to use the """ multi-line literals


http://gerrit.cloudera.org:8080/#/c/8170/7/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/8170/7/tests/query_test/test_runtime_filters.py@55
PS7, Line 55:  
nit: extra space. Maybe also use the """ multi-line literals



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Oct 2017 22:47:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator-filter-state.h@57
PS2, Line 57: class Coordinator::FilterState {
> Could you add a comment up here explaining the different meanings of 'alway
These implications does not apply to other parts of the code. We are essentially doing an OR(a list of bloom_filters) here so OR(empty list) should be false. And always_true short-circuits the evaluation since future filters can no longer make any difference.  In comparison, in scanner we need to apply multiple bloom_filters to a row, which is an AND, so any non-yet-received filter is treated as true. And if one filter returns false we terminate the evaluation.

If we use disabled_ and is_inited_ we need additional synchronization between these two set of flags, and the two sets of flags have no difference. And I think using always_false and always_true here is algebraically more natural than is_inited_ and disabled_.


http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc@1121
PS2, Line 1121:     swap(rpc_params.bloom_filter, aggregated_filter);
> What are the possible states after this swap() ? Since we're not explicitly
A check is added. (It could be always_false as well)


http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc@1123
PS2, Line 1123: size()
> Since you changed the mem tracking from size() to capacity() in L1120 and L
You are right. Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2017 21:07:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#5).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. A test case is added checking that an
always-false runtime filter can filter out files and splits.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
21 files changed, 228 insertions(+), 198 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 5
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator-filter-state.h@57
PS2, Line 57: class Coordinator::FilterState {
Could you add a comment up here explaining the different meanings of 'always_true' and 'always_false' ?

To me it seems like 'always_false' means either that the FilterState does not hold a bloom filter yet, or that the bloom filter is always false.

And 'always_true' could mean that the filter is disabled or that the filter it holds is always true.

Also, do these meanings of 'always_true' and 'always_false' hold true in other parts of the code as well? i.e. in parts that are not the coordinator?

If it were up to me, I would hold different states for 'disabled_' and 'is_inited_' instead of trying to derive those meanings implicitly, to make the code easier to read. But that's up for debate, and we can reach a consensus based on what others say.


http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc@1121
PS2, Line 1121:     swap(rpc_params.bloom_filter, aggregated_filter);
What are the possible states after this swap() ? Since we're not explicitly setting the disabled case anymore.

Eg: DCHECK(rpc_params.bloom_filter.directory.size() > 0 || rpc_params.bloom_filter.always_true == true) ?


http://gerrit.cloudera.org:8080/#/c/8170/2/be/src/runtime/coordinator.cc@1123
PS2, Line 1123: size()
Since you changed the mem tracking from size() to capacity() in L1120 and L1181, shouldn't this be capacity() too now?

Else, we risk releasing the capacity() twice, if the swap() doesn't assign a 0 capacity string to the aggregated filter.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2017 19:01:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 4:

(3 comments)

I need to do a proper pass over this. Did a quick pass now and had some questions.

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/coordinator.cc@a20
PS4, Line 20: 
Thanks for doing cleanup here. How did you determine which includes were needed? Is there a good tool to use?


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/runtime-filter-ir.cc
File be/src/runtime/runtime-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/runtime-filter-ir.cc@27
PS4, Line 27:   if (UNLIKELY(bloom_filter_->AlwaysFalse())) return false;
Won't these branches be likely in some cases? IMO best to only use these annotations when it's very strongly unlikely, e.g. error paths.


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/util/bloom-filter.h@200
PS4, Line 200:   if (UNLIKELY(always_false_)) return false;
Doesn't this mean that RuntimeFilter::Eval() checks always_false_ twice? Maybe best to only check it in here.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Oct 2017 01:07:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#8).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. Two test cases are added checking
that an always-false runtime filter can filter out files and splits.
In single node perf tests, time spent on primitive_empty_build_join_1
is reduced by 75%.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
M tests/query_test/test_runtime_filters.py
22 files changed, 200 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/coordinator.cc@a20
PS4, Line 20: 
> Thanks for doing cleanup here. How did you determine which includes were ne
I used the Clion ide.


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/runtime-filter-ir.cc
File be/src/runtime/runtime-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/runtime/runtime-filter-ir.cc@27
PS4, Line 27:   uint32_t h = RawValue::GetHashValue(val, col_type,
> Won't these branches be likely in some cases? IMO best to only use these an
It's removed.


http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/4/be/src/util/bloom-filter.h@200
PS4, Line 200:   if (always_false_) return false;
> Doesn't this mean that RuntimeFilter::Eval() checks always_false_ twice? Ma
Removed the other one.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Oct 2017 19:12:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 1:

(3 comments)

> Patch Set 1:
> 
> (2 comments)
> 
> Will wait for the new PS but had a couple of high-level bits of feedback first. It may be worth looking at Thomas' patches if you haven't already to make sure your changes are compatible.

Thanks for the tips. I'm sure rabasing on Thomas's patches is not trivial.

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc@104
PS1, Line 104:     if (FilterContext::HasAlwaysFalseInList(FilterStats::SPLITS_KEY,
> Not sure if this is relevant but there is a known problem with filtering ou
Yeah just found it hanged on Jenkins for 4 days though the tests passed locally. I plan to disable this part of code with sequence scanner.


http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/runtime/runtime-filter-bank.cc@203
PS1, Line 203:   memory_allocated_->Add(required_space);
The accounting is changed to required_space (not space used) here.


http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h@106
PS1, Line 106:     if (always_false_) return 0;
> I think this change will mess up some accounting in RuntimeFilterBank.
OK. But see comment in runtime-filter-bank.cc.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 00:05:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 1:

(2 comments)

Will wait for the new PS but had a couple of high-level bits of feedback first. It may be worth looking at Thomas' patches if you haven't already to make sure your changes are compatible.

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/exec/hdfs-scanner.cc@104
PS1, Line 104:     if (FilterContext::HasAlwaysFalseInList(FilterStats::SPLITS_KEY,
Not sure if this is relevant but there is a known problem with filtering out splits with some scanners: IMPALA-3804


http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/8170/1/be/src/util/bloom-filter.h@106
PS1, Line 106:     if (always_false_) return 0;
I think this change will mess up some accounting in RuntimeFilterBank.

My preference is to continue to allocate the memory when the BloomFilter is constructed instead of allocating it lazily. It would waste memory in the always_false case but I think we can live with that because it doesn't increase worst-case memory consumption.



-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 23:32:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#6).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. A test case is added checking that an
always-false runtime filter can filter out files and splits.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
21 files changed, 227 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Oct 2017 19:22:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#3).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: A test case is added checking that an always-false runtime
filter can filter out files and splits.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
21 files changed, 233 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 8
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 03:59:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has restored this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Restored
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................


Patch Set 2:

> Patch Set 1:
> 
> (2 comments)
> 
> Will wait for the new PS but had a couple of high-level bits of feedback first. It may be worth looking at Thomas' patches if you haven't already to make sure your changes are compatible.

This is ready for review. Thomas's min-max filter hasn't been merged and https://gerrit.cloudera.org/c/8148/ does not conflict with this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Oct 2017 23:33:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8170 )

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. Two test cases are added checking
that an always-false runtime filter can filter out files and splits.
In single node perf tests, time spent on primitive_empty_build_join_1
is reduced by 75%.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Reviewed-on: http://gerrit.cloudera.org:8080/8170
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
M tests/query_test/test_runtime_filters.py
22 files changed, 200 insertions(+), 197 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 9
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Sailesh Mukil, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#4).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: It passes existing tests. A test case is added checking that an
always-false runtime filter can filter out files and splits.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
21 files changed, 229 insertions(+), 198 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5789: Add always false flag in bloom filter

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8170

to look at the new patch set (#2).

Change subject: IMPALA-5789: Add always_false flag in bloom filter
......................................................................

IMPALA-5789: Add always_false flag in bloom filter

This patch adds an always_false flag in bloom filters. The flag is set
if nothing has been inserted into the bloom filter. HdfsScanner uses
this flag to early terminate the scan at file and split granularities.

Testing: A test case is added checking that an always-false runtime
filter can filter out files and splits.

Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
---
M be/src/common/global-flags.cc
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/runtime/runtime-filter.inline.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M common/thrift/ImpalaInternalService.thrift
A tests/custom_cluster/test_always_false_filter.py
21 files changed, 231 insertions(+), 196 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8170/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If680240a3cd4583fc97c3192177d86d9567c4f8d
Gerrit-Change-Number: 8170
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>