Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/06/09 16:00:10 UTC

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18605 )


Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................

IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Identity-partitioned columns are not necessarily stored in the data
files. E.g. when we migrate a legacy partitioned table to Iceberg
without rewriting the data files, the partition columns won't be
present in the files.

The Parquet scanner does a few optimizations to eliminate row groups,
such as filtering based on stats and bloom filters. When a column that
has a predicate on it is not present in the data file, it is assumed
that the whole row group doesn't pass the filtering criteria.

But for Iceberg some files might contain the partition columns while
other files don't, so we need to prepare the scanners to handle
such cases.

The ORC scanner doesn't have that many optimizations, so it didn't
run into this issue.
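
The row-group elimination problem described above can be illustrated
with a small model. This is only a sketch in Python, not the actual
C++ change in hdfs-parquet-scanner.cc: the names RowGroupStats and
can_skip_row_group are hypothetical, only an equality predicate on
integer min/max stats is modelled, and keeping the row group when a
partition column is absent from the file is an assumption about how
such a case could be handled safely.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max stats, keyed by column name.

    Only columns physically stored in the data file have an entry.
    """
    min_max: Dict[str, Tuple[int, int]]


def can_skip_row_group(stats: RowGroupStats, column: str, value: int,
                       partition_columns: Set[str]) -> bool:
    """Return True if the predicate `column = value` rules out the row group."""
    if column in stats.min_max:
        # Column is stored in the file: ordinary min/max elimination.
        lo, hi = stats.min_max[column]
        return not (lo <= value <= hi)
    if column in partition_columns:
        # Assumption: the column is an identity-partition column whose
        # value lives in table metadata rather than in the file (e.g. a
        # migrated legacy table). Its absence says nothing about the
        # rows, so the row group must be kept.
        return False
    # Column is missing and is not a partition column: the scanner
    # assumes the whole row group fails the filtering criteria.
    return True
```

With this model, a predicate on a partition column that is missing
from a migrated file no longer eliminates the row group, while a
predicate on any other missing column still does.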

Testing:
 * e2e tests

Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-tables.test
3 files changed, 142 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/18605/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 10 Jun 2022 13:01:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8211/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 10 Jun 2022 07:16:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8210/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 09 Jun 2022 22:06:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 09 Jun 2022 16:21:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1: Code-Review+2

Thank you for the quick fix!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 09 Jun 2022 17:40:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
lipenglin@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1: Code-Review+1

LGTM
Thanks for your fix, that is great!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 10 Jun 2022 03:35:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8213/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 10 Jun 2022 08:31:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10746/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 09 Jun 2022 16:19:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8210/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 09 Jun 2022 17:55:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8211/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 10 Jun 2022 02:43:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18605 )

Change subject: IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column
......................................................................

IMPALA-11346: Migrated partitioned Iceberg tables might return ERROR when WHERE condition is used on partition column

Identity-partitioned columns are not necessarily stored in the data
files. E.g. when we migrate a legacy partitioned table to Iceberg
without rewriting the data files, the partition columns won't be
present in the files.

The Parquet scanner does a few optimizations to eliminate row groups,
such as filtering based on stats and bloom filters. When a column that
has a predicate on it is not present in the data file, it is assumed
that the whole row group doesn't pass the filtering criteria.

But for Iceberg some files might contain the partition columns while
other files don't, so we need to prepare the scanners to handle
such cases.

The ORC scanner doesn't have that many optimizations, so it didn't
run into this issue.

Testing:
 * e2e tests

Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Reviewed-on: http://gerrit.cloudera.org:8080/18605
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Reviewed-by: Tamas Mate <tm...@apache.org>
Reviewed-by: <li...@sensorsdata.cn>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/file-metadata-utils.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-tables.test
3 files changed, 142 insertions(+), 1 deletion(-)

Approvals:
  Csaba Ringhofer: Looks good to me, but someone else must approve
  Tamas Mate: Looks good to me, approved
  lipenglin@sensorsdata.cn: Looks good to me, but someone else must approve
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie706317888981f634d792fb570f3eab1ec11a4f4
Gerrit-Change-Number: 18605
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>