Posted to reviews@impala.apache.org by "Noemi Pap-Takacs (Code Review)" <ge...@cloudera.org> on 2022/12/13 17:04:14 UTC

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Noemi Pap-Takacs has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19353 )


Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................

IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

This patch extends the support of Iceberg tables containing multiple
file formats. Now AVRO data files can also be read in a mixed table
besides Parquet and ORC.

Impala uses its Avro scanner to read AVRO files, so all the
Avro-related limitations apply here as well: writes and metadata
changes are not supported.

testing:
- E2E testing: extending 'iceberg-mixed-file-format.test' to include
  AVRO files as well, in order to test reading all three currently
  supported file formats: avro+orc+parquet

Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
---
M be/src/exec/hdfs-scan-node-base.cc
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221026130844_b228ff88-5625-494b-b27a-7819aad52ced-job_16629766502890_0016-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028111610_c7e89043-49e0-40fe-95a5-bf24d958ebc7-job_16629766502890_0017-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028113321_fbfa5f31-421d-406a-9d46-6bec36d7a93c-job_16629766502890_0018-1-00001.orc
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028114730_e2f7d99d-7ad8-478c-a814-19e2d7912ad1-job_16629766502890_0019-1-00001.parquet
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/13c55017-b018-4ccb-a407-08e37e28eec8-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/7b422180-e3f8-4500-b240-1424ef012246-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/80a79f8a-5a47-44c9-b16d-4bef4a5ecec3-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/8e66c338-5cd3-4b85-b986-18ec29b67d94-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1131576191504541058-1-8e66c338-5cd3-4b85-b986-18ec29b67d94.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1744181916149214787-1-13c55017-b018-4ccb-a407-08e37e28eec8.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-3243718219085059034-1-7b422180-e3f8-4500-b240-1424ef012246.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-5089000375160183133-1-80a79f8a-5a47-44c9-b16d-4bef4a5ecec3.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v3.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v4.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v5.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v6.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v7.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v8.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v9.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/data/00000-0-data-noemi_20221021195331_77fbb37f-2393-4a66-9656-61cd56b94b46-job_16629766502890_0015-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/a9f8d35c-a852-49fe-996a-d94ae1896c32-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/snap-725782911885631732-1-a9f8d35c-a852-49fe-996a-d94ae1896c32.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-avro.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-file-format.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/query_test/test_iceberg.py
37 files changed, 60 insertions(+), 1,283 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/19353/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12023/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Tue, 13 Dec 2022 17:25:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8904/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Tue, 13 Dec 2022 22:33:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

(1 comment)

Clean follow-up to the previous commit. I added one nitpick, otherwise LGTM.

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc@306
PS1, Line 306:         if (file_metadata) {
We could add a DCHECK here too, like the one at L894, in case we change or extend file_metadata in the future and "file_metadata->iceberg_metadata()" can be nullptr.
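
A minimal sketch of the guard being suggested, under stated assumptions: DCHECK_SKETCH stands in for glog's DCHECK (a debug-only check that is compiled out of release builds), and the two structs are simplified, hypothetical stand-ins for the FlatBuffers-generated metadata types, not Impala's real ones.

```cpp
#include <cassert>

// Stand-in for glog's DCHECK: active in debug builds, a no-op under NDEBUG.
#define DCHECK_SKETCH(cond) assert(cond)

// Hypothetical, simplified stand-ins for the generated metadata accessors.
struct IcebergMetadataSketch {
  int file_format_value;
  int file_format() const { return file_format_value; }
};

struct FileMetadataSketch {
  const IcebergMetadataSketch* iceberg;
  const IcebergMetadataSketch* iceberg_metadata() const { return iceberg; }
};

// Returns the per-file format when file metadata is present, otherwise the
// caller-supplied fallback. The DCHECK documents the invariant that file
// metadata, when present, always carries Iceberg metadata, so a future change
// that breaks it fails loudly in debug builds instead of dereferencing nullptr.
int FileFormatOrDefault(const FileMetadataSketch* file_metadata, int fallback) {
  if (file_metadata == nullptr) return fallback;
  DCHECK_SKETCH(file_metadata->iceberg_metadata() != nullptr);
  return file_metadata->iceberg_metadata()->file_format();
}
```

The check costs nothing in release builds, which is why it suits an invariant that holds today but could be invalidated by later extensions.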



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 09:49:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8922/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 15 Dec 2022 12:39:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 4: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 4
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Dec 2022 17:37:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8914/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 19:47:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8914/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 14:40:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

(1 comment)

Nice improvement! I only had a code organisation question.

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc@306
PS1, Line 306:         if (file_metadata) {
             :           switch (file_metadata->iceberg_metadata()->file_format()) {
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_PARQUET:
             :               file_desc->file_format = THdfsFileFormat::PARQUET;
             :               break;
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_ORC:
             :               file_desc->file_format = THdfsFileFormat::ORC;
             :               break;
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_AVRO:
             :               file_desc->file_format = THdfsFileFormat::AVRO;
             :               break;
             :             default:
             :               return Status(Substitute(
             :                   "Unknown Iceberg file format type: $0",
             :                   file_metadata->iceberg_metadata()->file_format()));
             :           }
             :         } else {
             :           file_desc->file_format = partition_desc->file_format();
             :         }
Do you think we could move this logic into FileMetadataUtils?
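
One possible shape for such a helper, as a rough sketch only: the enums below are simplified stand-ins for FbIcebergDataFileFormat and THdfsFileFormat, and ResolveFileFormat is a hypothetical name, not an existing FileMetadataUtils method. It mirrors the if/else in the quoted hunk: use the per-file Iceberg format when present, otherwise fall back to the partition-level format.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for Impala's generated enums; the real types come from
// FlatBuffers (FbIcebergDataFileFormat) and Thrift (THdfsFileFormat).
enum class IcebergFormatSketch { PARQUET, ORC, AVRO, UNKNOWN };
enum class HdfsFormatSketch { TEXT, PARQUET, ORC, AVRO };

// Hypothetical helper. `per_file_format` is null when the file carries no
// Iceberg metadata; in that case the partition-level format is used.
// Returns false and fills `error` for an unrecognized Iceberg format.
bool ResolveFileFormat(const IcebergFormatSketch* per_file_format,
                       HdfsFormatSketch partition_format,
                       HdfsFormatSketch* out, std::string* error) {
  assert(out != nullptr && error != nullptr);
  if (per_file_format == nullptr) {
    *out = partition_format;
    return true;
  }
  switch (*per_file_format) {
    case IcebergFormatSketch::PARQUET:
      *out = HdfsFormatSketch::PARQUET;
      return true;
    case IcebergFormatSketch::ORC:
      *out = HdfsFormatSketch::ORC;
      return true;
    case IcebergFormatSketch::AVRO:
      *out = HdfsFormatSketch::AVRO;
      return true;
    default:
      *error = "Unknown Iceberg file format type: " +
               std::to_string(static_cast<int>(*per_file_format));
      return false;
  }
}
```

With the mapping in one function, the scan node would only need to convert a false return into a Status, and a new format would be added in a single place.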



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 13:42:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Noemi Pap-Takacs (Code Review)" <ge...@cloudera.org>.
Noemi Pap-Takacs has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................

IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

This patch extends the support of Iceberg tables containing multiple
file formats. Now AVRO data files can also be read in a mixed table
besides Parquet and ORC.

Impala uses its Avro scanner to read AVRO files, so all the
Avro-related limitations apply here as well: writes and metadata
changes are not supported.

testing:
- E2E testing: extending 'iceberg-mixed-file-format.test' to include
  AVRO files as well, in order to test reading all three currently
  supported file formats: avro+orc+parquet

Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
---
M be/src/exec/hdfs-scan-node-base.cc
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221026130844_b228ff88-5625-494b-b27a-7819aad52ced-job_16629766502890_0016-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028111610_c7e89043-49e0-40fe-95a5-bf24d958ebc7-job_16629766502890_0017-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028113321_fbfa5f31-421d-406a-9d46-6bec36d7a93c-job_16629766502890_0018-1-00001.orc
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028114730_e2f7d99d-7ad8-478c-a814-19e2d7912ad1-job_16629766502890_0019-1-00001.parquet
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/13c55017-b018-4ccb-a407-08e37e28eec8-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/7b422180-e3f8-4500-b240-1424ef012246-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/80a79f8a-5a47-44c9-b16d-4bef4a5ecec3-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/8e66c338-5cd3-4b85-b986-18ec29b67d94-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1131576191504541058-1-8e66c338-5cd3-4b85-b986-18ec29b67d94.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1744181916149214787-1-13c55017-b018-4ccb-a407-08e37e28eec8.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-3243718219085059034-1-7b422180-e3f8-4500-b240-1424ef012246.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-5089000375160183133-1-80a79f8a-5a47-44c9-b16d-4bef4a5ecec3.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v3.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v4.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v5.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v6.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v7.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v8.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v9.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/data/00000-0-data-noemi_20221021195331_77fbb37f-2393-4a66-9656-61cd56b94b46-job_16629766502890_0015-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/a9f8d35c-a852-49fe-996a-d94ae1896c32-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/snap-725782911885631732-1-a9f8d35c-a852-49fe-996a-d94ae1896c32.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-avro.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-file-format.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/query_test/test_iceberg.py
37 files changed, 61 insertions(+), 1,283 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/19353/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12032/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 14:59:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 3: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Dec 2022 12:24:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8904/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Tue, 13 Dec 2022 17:18:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 3: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 15 Dec 2022 17:44:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 4: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 4
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Dec 2022 12:25:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8925/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 4
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Fri, 16 Dec 2022 12:25:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Noemi Pap-Takacs (Code Review)" <ge...@cloudera.org>.
Noemi Pap-Takacs has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc@306
PS1, Line 306:         if (file_metadata) {
> We could add a DCHECK here too like L894, in case if we change/extend the f
Done


http://gerrit.cloudera.org:8080/#/c/19353/1/be/src/exec/hdfs-scan-node-base.cc@306
PS1, Line 306:         if (file_metadata) {
             :           DCHECK(file_metadata->iceberg_metadata() != nullptr);
             :           switch (file_metadata->iceberg_metadata()->file_format()) {
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_PARQUET:
             :               file_desc->file_format = THdfsFileFormat::PARQUET;
             :               break;
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_ORC:
             :               file_desc->file_format = THdfsFileFormat::ORC;
             :               break;
             :             case FbIcebergDataFileFormat::FbIcebergDataFileFormat_AVRO:
             :               file_desc->file_format = THdfsFileFormat::AVRO;
             :               break;
             :             default:
             :               return Status(Substitute(
             :                   "Unknown Iceberg file format type: $0",
             :                   file_metadata->iceberg_metadata()->file_format()));
             :           }
             :         } else {
             :          
> Do you think we could move this logic into FileMetadataUtils?
Makes sense, but requires a bit more refactoring. I would address this later, maybe together with some other scan node changes, if that is OK.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 2
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 15 Dec 2022 10:58:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................

IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

This patch extends the support of Iceberg tables containing multiple
file formats. Now AVRO data files can also be read in a mixed table
besides Parquet and ORC.

Impala uses its Avro scanner to read AVRO files, therefore all the
Avro-related limitations apply here as well: writes and metadata
changes are not supported.

Testing:
- E2E testing: extending 'iceberg-mixed-file-format.test' to include
  AVRO files as well, in order to test reading all three currently
  supported file formats: avro+orc+parquet

Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Reviewed-on: http://gerrit.cloudera.org:8080/19353
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-scan-node-base.cc
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221026130844_b228ff88-5625-494b-b27a-7819aad52ced-job_16629766502890_0016-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028111610_c7e89043-49e0-40fe-95a5-bf24d958ebc7-job_16629766502890_0017-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028113321_fbfa5f31-421d-406a-9d46-6bec36d7a93c-job_16629766502890_0018-1-00001.orc
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/data/00000-0-data-noemi_20221028114730_e2f7d99d-7ad8-478c-a814-19e2d7912ad1-job_16629766502890_0019-1-00001.parquet
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/13c55017-b018-4ccb-a407-08e37e28eec8-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/7b422180-e3f8-4500-b240-1424ef012246-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/80a79f8a-5a47-44c9-b16d-4bef4a5ecec3-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/8e66c338-5cd3-4b85-b986-18ec29b67d94-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1131576191504541058-1-8e66c338-5cd3-4b85-b986-18ec29b67d94.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-1744181916149214787-1-13c55017-b018-4ccb-a407-08e37e28eec8.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-3243718219085059034-1-7b422180-e3f8-4500-b240-1424ef012246.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/snap-5089000375160183133-1-80a79f8a-5a47-44c9-b16d-4bef4a5ecec3.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v3.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v4.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v5.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v6.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v7.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v8.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/v9.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_mixed/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/data/00000-0-data-noemi_20221021195331_77fbb37f-2393-4a66-9656-61cd56b94b46-job_16629766502890_0015-1-00001.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/a9f8d35c-a852-49fe-996a-d94ae1896c32-m0.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/snap-725782911885631732-1-a9f8d35c-a852-49fe-996a-d94ae1896c32.avro
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v1.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/v2.metadata.json
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/metadata/version-hint.txt
D testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_avro_only/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/iceberg-avro.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-mixed-file-format.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/query_test/test_iceberg.py
37 files changed, 61 insertions(+), 1,283 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 5
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8910/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/19353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 08:59:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19353 )

Change subject: IMPALA-11708: Add support for mixed Iceberg tables with AVRO file format
......................................................................


Patch Set 1:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8910/


-- 
To view, visit http://gerrit.cloudera.org:8080/19353
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I941adfb659218283eb5fec1b394bb3003f8072a6
Gerrit-Change-Number: 19353
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <np...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 14 Dec 2022 14:11:46 +0000
Gerrit-HasComments: No