Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2022/08/23 01:12:01 UTC

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/18888

to review the following change.


Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

IMPALA-11345: Parquet Bloom filtering failure if column is added to the
schema

If a new column was added to an existing table with existing data and
Parquet Bloom filtering was turned ON, queries having an equality
conjunct on the new column failed.

This was because the old Parquet data files did not have the new column
in their schema, so the scanner could not find a column for the
conjunct. This was treated as an error and the query failed.

After this patch this situation is no longer treated as an error and the
conjunct is simply disregarded for Bloom filtering in the files that
lack the new column.
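
The behaviour described above can be illustrated with a small model. This
is only a sketch, not the code from be/src/exec/parquet/hdfs-parquet-scanner.cc
(which is C++): the names EqConjunct and row_group_may_match are hypothetical,
and a plain Python set stands in for a real Bloom filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EqConjunct:
    """An equality predicate such as `column = value` (hypothetical type)."""
    column: str
    value: object


def row_group_may_match(
    file_columns: Set[str],
    bloom_filters: Dict[str, Set[object]],
    conjuncts: List[EqConjunct],
) -> bool:
    """Return False only when a filter proves that no row can match.

    A Python set stands in for a real Bloom filter, so this toy version
    has no false positives.
    """
    for conjunct in conjuncts:
        if conjunct.column not in file_columns:
            # The column was added after this file was written. Per the
            # commit message this used to be an error; now the conjunct
            # is simply disregarded for this file.
            continue
        bloom = bloom_filters.get(conjunct.column)
        if bloom is None:
            # No filter stored for this column: nothing to prune with.
            continue
        if conjunct.value not in bloom:
            # Value is definitely absent, so the row group can be skipped.
            return False
    return True
```

With this model, a file that only contains `id` is still scanned for a
predicate on `new_col`, and the remaining conjuncts can still prune it.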

Testing:
 - added the test
   TestParquetBloomFilter::test_parquet_bloom_filtering_schema_change in
   tests/query_test/test_parquet_bloom_filter.py that checks that a
   query as described above does not fail.
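
For orientation, the scenario the test covers could be reproduced with a
statement sequence like the one built below. This is not the body of the
actual test: the table and column names are made up, and the use of the
PARQUET_BLOOM_FILTERING query option to turn the feature on is an
assumption about how "turned ON" is achieved.

```python
from typing import List


def schema_change_scenario(table: str = "bloom_schema_change_demo") -> List[str]:
    """Build the SQL statements for the scenario, in execution order.

    All identifiers are hypothetical and chosen for illustration only.
    """
    return [
        # 1. Existing table with existing Parquet data.
        f"CREATE TABLE {table} (id INT) STORED AS PARQUET",
        f"INSERT INTO {table} VALUES (1), (2), (3)",
        # 2. Schema change: old data files do not contain the new column.
        f"ALTER TABLE {table} ADD COLUMNS (new_col INT)",
        # 3. Enable Parquet Bloom filtering (assumed query option).
        "SET PARQUET_BLOOM_FILTERING=true",
        # 4. Equality conjunct on the new column; should no longer fail.
        f"SELECT id FROM {table} WHERE new_col = 1",
    ]
```

Running the statements in this order against a build with the fix is
expected to succeed; without the fix, the last statement is the one that
would hit the error.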

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Reviewed-on: http://gerrit.cloudera.org:8080/18779
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/18888/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18888 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 12:18:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18888

to look at the new patch set (#2).

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

Merge conflicts:
 - hdfs-parquet-scanner.cc removes usage of NeedDataInFile().

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Reviewed-on: http://gerrit.cloudera.org:8080/18779
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/18888/2
Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18888 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 3: Verified+1

Verified by https://jenkins.impala.io/job/gerrit-verify-dryrun/8494/


Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 12:16:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18888 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................

Merge conflicts:
 - hdfs-parquet-scanner.cc removes usage of NeedDataInFile().

Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Reviewed-on: http://gerrit.cloudera.org:8080/18779
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/18888
Tested-by: Quanlong Huang <hu...@gmail.com>
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M tests/query_test/test_parquet_bloom_filter.py
2 files changed, 87 insertions(+), 6 deletions(-)

Approvals:
  Quanlong Huang: Verified
  Csaba Ringhofer: Looks good to me, approved

Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: merged
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18888 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8484/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Aug 2022 03:10:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](branch-4.1.1) IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18888 )

Change subject: IMPALA-11345: Parquet Bloom filtering failure if column is added to the schema
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8484/


Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: Ief3e6b6358d3dff3abe5beeda752033a7e8e16a6
Gerrit-Change-Number: 18888
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Aug 2022 07:32:58 +0000
Gerrit-HasComments: No