Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2022/07/25 14:52:44 UTC

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18779 )


Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

IMPALA-11345: Parquet Bloom filtering failure if column is added to the
schema

If a new column was added to an existing table with existing data and
Parquet Bloom filtering was turned ON, queries having an equality
conjunct on the new column failed.

This was because the old Parquet data files did not have the new column
in their schema, so the scanner could not find a column for the
conjunct. This was treated as an error and the query failed.

After this patch, this situation is no longer treated as an error and
the conjunct is simply disregarded for Bloom filtering in the files
that lack the new column.
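
The behaviour described above can be illustrated with a minimal Python
sketch. This is not the C++ code in hdfs-parquet-scanner.cc; the names
EqConjunct and conjuncts_usable_for_bloom_filtering, and the column
names, are hypothetical and chosen only for illustration. The idea is
that an equality conjunct on a column absent from a file's schema is
set aside for that file instead of raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqConjunct:
    """A hypothetical equality predicate of the form: column == value."""
    column: str
    value: object


def conjuncts_usable_for_bloom_filtering(file_columns, conjuncts):
    """Split conjuncts into those a file can evaluate and those it cannot.

    A conjunct on a column the file lacks is skipped rather than treated
    as an error, so the file is still scanned without Bloom filtering on
    that column.
    """
    usable, skipped = [], []
    for conjunct in conjuncts:
        if conjunct.column in file_columns:
            usable.append(conjunct)
        else:
            skipped.append(conjunct)
    return usable, skipped


# Schema of a file written before the column "city" was added.
old_file_columns = {"id", "name"}
# Schema of a file written after the column "city" was added.
new_file_columns = {"id", "name", "city"}

conjuncts = [EqConjunct("id", 7), EqConjunct("city", "Budapest")]

usable_old, skipped_old = conjuncts_usable_for_bloom_filtering(
    old_file_columns, conjuncts)
usable_new, skipped_new = conjuncts_usable_for_bloom_filtering(
    new_file_columns, conjuncts)
```

For the old file only the conjunct on "id" remains usable and the one
on "city" is skipped; for the new file both conjuncts are usable.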

Testing:
 - added the test
   TestParquetBloomFilter::test_parquet_bloom_filtering_schema_change in
   tests/query_test/test_parquet_bloom_filter.py that checks that a
   query as described above does not fail.

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 45 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/18779/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11034/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 25 Jul 2022 15:12:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Reviewed-on: http://gerrit.cloudera.org:8080/18779
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 5
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3:

Build Successful 

https://ec2-35-162-169-52.us-west-2.compute.amazonaws.com/job/gerrit-code-review-checks/11035/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 14:12:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11059/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 14:12:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8380/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 30 Jul 2022 08:09:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18779/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18779/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1992
PS1, Line 1992: are 
> On the other hand, do we have a way to update those old parquet files too?

I think we can only regenerate the files.


http://gerrit.cloudera.org:8080/#/c/18779/2/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18779/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@2074
PS2, Line 2074: for the for the
nit: duplicated "for the"


http://gerrit.cloudera.org:8080/#/c/18779/2/be/src/exec/parquet/hdfs-parquet-scanner.cc@2090
PS2, Line 2090: unexpected_missing_fields
I think this should be 'expected_missing_fields'.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 01:24:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 4: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 30 Jul 2022 13:05:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18779/3/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/18779/3/tests/query_test/test_parquet_bloom_filter.py@117
PS3, Line 117: v
flake8: E126 continuation line over-indented for hanging indent




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 13:53:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18779/1/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/18779/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@1992
PS1, Line 1992: INFO
I think WARNING is better since this might impact performance.


http://gerrit.cloudera.org:8080/#/c/18779/1/be/src/exec/parquet/hdfs-parquet-scanner.cc@2002
PS1, Line 2002:           VLOG(google::INFO) << Substitute(
Logging this (and the above one) for each stat conjunct is too verbose, especially for large queries that have many predicates. Can we try to reduce the logs? E.g. collect the slot descriptors and log them once at the end of this method.
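
A minimal Python sketch of the suggestion above, assuming hypothetical
names (find_missing_columns, the logger name, and the message text are
not taken from the patch): the missing columns are collected while
iterating over the conjuncts, and a single WARNING is emitted at the
end instead of one log line per conjunct.

```python
import logging

logger = logging.getLogger("bloom_filter_sketch")


def find_missing_columns(file_columns, conjunct_columns):
    """Return conjunct columns absent from the file, logging them once."""
    missing = []
    for column in conjunct_columns:
        if column not in file_columns:
            # Only collect here; do not log inside the loop.
            missing.append(column)
    if missing:
        # One aggregated message, regardless of how many conjuncts exist.
        logger.warning(
            "Skipping Parquet Bloom filtering for columns missing from "
            "the file schema: %s", ", ".join(missing))
    return missing
```

With many predicates on missing columns this keeps the log volume at
one line per call rather than growing with the number of conjuncts.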




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 28 Jul 2022 03:58:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 4: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 30 Jul 2022 08:09:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18779/1/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/18779/1/tests/query_test/test_parquet_bloom_filter.py@117
PS1, Line 117: v
flake8: E126 continuation line over-indented for hanging indent


http://gerrit.cloudera.org:8080/#/c/18779/1/tests/query_test/test_parquet_bloom_filter.py@117
PS1, Line 117: ;
flake8: E703 statement ends with a semicolon
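
For reference, a small sketch of code that avoids both warnings. The
flagged line of the test file is not shown in this thread, so the
function name, table name, and query text below are hypothetical.
Under pycodestyle's default settings, E126 is avoided by indenting a
hanging continuation line by four spaces relative to the line that
opens the bracket, and E703 is avoided by not ending a Python
statement with a semicolon.

```python
def build_query(table_name):
    # Hanging indent: the opening line ends with "(" and the continuation
    # line is indented four spaces further than that line.
    query = "select count(*) from {0} where new_col = 1".format(
        table_name)
    # No trailing semicolon on the Python statement.
    return query
```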




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jul 2022 14:53:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/18779/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18779/3/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/18779/3/tests/query_test/test_parquet_bloom_filter.py@117
PS3, Line 117: v
flake8: E126 continuation line over-indented for hanging indent




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 22:39:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3: Code-Review+2

LGTM. Thanks for fixing this quickly!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 30 Jul 2022 08:08:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/18779/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18779/2/tests/query_test/test_parquet_bloom_filter.py
File tests/query_test/test_parquet_bloom_filter.py:

http://gerrit.cloudera.org:8080/#/c/18779/2/tests/query_test/test_parquet_bloom_filter.py@117
PS2, Line 117: v
flake8: E126 continuation line over-indented for hanging indent




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 28 Jul 2022 13:18:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18779 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11051/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18779
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 28 Jul 2022 13:38:26 +0000
Gerrit-HasComments: No