You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/09/20 13:10:38 UTC

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17857


Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications. Then Catalogd processes these modifications again
when they were arrive via the event notification mechanism. I fixed
this by creating Iceberg transactions in which I set the catalog
service ID and new catalog version for the Iceberg table.

Testing:
 * added e2e test for the scan range assignment

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/query_test/test_iceberg.py
16 files changed, 290 insertions(+), 104 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 2:

(19 comments)

Thanks everyone for the comments!

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
> +1. 
When we retrieve the file status from HDFS we get block location information as well. I.e. for each block we get a list about where can we find the replicas. The order of the elements in this list is random for each file status call.

The order of the block locations matter during scheduling. In other words, even if the state of HDFS blocks remain the same, Impala might schedule scan ranges differently because it will see the block locations in different order.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25
PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't
            : handle self-events of Iceberg tables.
> Can you add more information about this? For example what events are genera
I extended the following sentence.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: via the event notification mechanism. I fixed this by cr
> +1 if we have native version number support in Iceberg it is better to use 
Iceberg tables can be stored in different Iceberg Catalogs. One of it is HiveCatalog which uses HMS to store information about Iceberg tables. Most importantly it stores snapshot information in table property 'metadata_location'. We could check if it has changed.

But if the Iceberg table is stored in HadoopTables/HadoopCatalog, then we would need to do expensive IO operations to determine the snapshot ID.

I think we could use the snapshot ID in IcebergTable.load() to avoid reloading a lot of information. But we still need to detect self-events because the catalogVersion of the table gets updated and it can cause exceptions in local catalog mode (InconsistentMetadataFetchException with message "Catalog object %s changed version between accesses.").

I'm looking forward to the overhaul of self-events logic and how it will work with IcebergTables in HiveCatalog (where we don't have control of HMS RPCs).


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34
PS1, Line 34: Iceberg has to embed all table modifications in a single ALTER TABLE
> It is a bit tricky, but self-event handling can be also tested, see https:/
Added e2e tests.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@348
PS1, Line 348: hdfsTable_
             :             .load(false, msClient, msTable_, true, true, false, null, null,null, reason);
             :         pathHashToFileDescMap_ = Utils.loadAllPartition(this);
> This is probably unrelated to this patch I was curious to understand what i
Yeah, it's on the radar: IMPALA-10254

Currently metadata handling of Iceberg tables is quite ugly and redundant, especially file metadata.

We load all the file descriptors recursively here via file listings. Then we also load the file descriptors one-by-one...

I think as a first step we should at least reuse the file descriptors of the underlying hdfsTable_ because the recursive listing is probably more efficient than the file status RPCs one-by-one...

But the ideal would be to refactor Iceberg tables, hopefully get rid of the underlying HdfsTable. And most importantly create file descriptors from the information in Iceberg metadata files. Though the latter might not achievable when HDFS is used to store data files because we need block location information. But on S3 we should only use Iceberg metadata. The only challenge there is to come up with a "modification time" of the files which is needed for the remote data cache (hopefully we can use the snapshot time if noone verifies it).


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@481
PS1, Line 481:     TGetPartialCatalogObjectResponse resp =
             :         getHdfsTable().getPartialInfo(req, missingPartialInfos);
             :     if (req.table_info_selector.want_iceberg_snapshot) {
             :       resp.table_info.setIceberg_snapshot(
             :           FeIcebergTable.Utils.createTIcebergSnapshot(this));
             :     }
> are we sending some file descriptors twice here? Some from the hdfsTable an
Yes, we send all data file descriptors twice. Let me deal with it in a separate patch.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104
PS1, Line 104:     properties.setProperty("external.table.purge", "TRUE");
> Is this change related to the ones mentioned in the commit message?
Yes, because before this change we issued an ALTER TABLE statement to set this property. Which we processed later during event processing.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151
PS1, Line 151:     properties.setProperty(IcebergTable.ICEBERG_CATALOG,
             :                            tableProps.get(IcebergTable.ICEBERG_CATALOG));
> Is this change related to the ones mentioned in the commit message?
Yes, because in the past we set it in a separate ALTER TABLE.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@110
PS1, Line 110:           "Failed to load table: %s.%s", msTable.getDbName(), msTable.getTableName()), e);
> nit probably should include the snapshot id in the message, if it is not in
We would also retrieve the snapshot id in loadIcebergSnapshot().


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@116
PS1, Line 116: pathHashToFileDescMap_
> Am I correct in understanding that pathHashToFileDescMap_ is always loaded 
Yes, we shouldn't load file descriptors twice, but currently things break if we don't do that. Let me deal with it in a separate patch, probably in the context of IMPALA-10254.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1227
PS1, Line 1227: a
> nit. an
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268
PS1, Line 1268:           addSummary(response, "Updated table.");
> Updated table after set table properties
I used the same message that we use at L1079.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272
PS1, Line 1272: Updated table.
> Updated table after unset table properties
I used the same message that we use at L1090.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295:     if (!needsToUpdateHms) {
              :       loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " +
              :           params.getAlter_type().name());
              :       catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion);
              :       addTableToCatalogUpdate(tbl, wantMinimalResult, response.result);
              :       return false;
              :     }
> Should this block of code be executed within 1283-1286 too? That is, if txn
SET PARTITION SPEC doesn't use an Iceberg transaction and we don't need to update HMS either.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2915
PS1, Line 2915: IcebergCatalogOpExecutor.addCatalogVersionToTxn(iceTxn,
> It is not clear to me why do we need to increment the catalog version here 
dropTableStats() issues an ALTER TABLE statement, then the iceTxn.commitTransaction() will issue another ALTER TABLE statement, therefore we need to invoke catalog_.addVersionsForInflightEvents() twice (once at L2914 and once at L2773).

It looked reasonable to me to use two different catalogVersions, but seems like using the same catalogVersion twice also works.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2918
PS1, Line 2918:     iceTxn.commitTransaction();
              :     return newCatalogVersion;
> Can you move these statements into a finally block and avoid code duplicati
Actually we can remove these two lines, since after the if stmt we have the same two lines.

I'm not sure about putting commitTransaction() to a finally block. I'd think we don't want to commit in case of exceptions.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6070
PS1, Line 6070: 
> Can we add test coverage for this in the test_self_events?
Added a test to test_events_custom_configs.py


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@113
PS1, Line 113: addColumn
> nit. addColumns()?
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@339
PS1, Line 339: table properties usi
> nit. update property in 'txn'
Updated the comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 13:34:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Oct 2021 16:35:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7516/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 10:02:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Qifan Chen, Vihang Karajgaonkar, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17857

to look at the new patch set (#6).

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
M tests/stress/test_insert_stress.py
21 files changed, 432 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Qifan Chen, Vihang Karajgaonkar, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17857

to look at the new patch set (#3).

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/query_test/test_iceberg.py
17 files changed, 342 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 1:

(8 comments)

Looks very reasonable. 

It is not clear to me how a user select a particular iceberg table snapshot to use in a query.

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
> This is not clear to me: if the block locations have changed between querie
+1. 

I think the fix is applicable when an iceberg table of a particular snapshot is to be scanned. In doing so for the same table instance, the partition info about the snapshot can be reused.

When only the last version of the table is to be scanned, my impression is that the patch should not be applied.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@110
PS1, Line 110:           "Failed to load table: %s.%s", msTable.getDbName(), msTable.getTableName()), e);
nit probably should include the snapshot id in the message, if it is not in e.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1227
PS1, Line 1227: a
nit. an


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268
PS1, Line 1268:           addSummary(response, "Updated table.");
Updated table after set table properties


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272
PS1, Line 1272: Updated table.
Updated table after unset table properties


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295:     if (!needsToUpdateHms) {
              :       loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " +
              :           params.getAlter_type().name());
              :       catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion);
              :       addTableToCatalogUpdate(tbl, wantMinimalResult, response.result);
              :       return false;
              :     }
Should this block of code be executed within 1283-1286 too? That is, if txn is needed, then this block of code is protected by the txn.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@113
PS1, Line 113: addColumn
nit. addColumns()?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@339
PS1, Line 339: the table parameters
nit. update property in 'txn'



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Sep 2021 17:23:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7499/


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 14:43:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17857/2/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/17857/2/tests/custom_cluster/test_events_custom_configs.py@270
PS2, Line 270: =
flake8: E225 missing whitespace around operator



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 13:33:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7507/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 16:48:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Qifan Chen, Vihang Karajgaonkar, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17857

to look at the new patch set (#4).

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
M tests/stress/test_insert_stress.py
20 files changed, 408 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7510/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 10:31:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Reviewed-on: http://gerrit.cloudera.org:8080/17857
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Qifan Chen <qc...@cloudera.com>
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
M tests/stress/test_insert_stress.py
21 files changed, 432 insertions(+), 140 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Qifan Chen: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 17:34:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

Great work!

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
> Sorting is one possibility, but I think we shouldn't reload file descriptor
Okay on sorting. Yes I agree if the set of file descriptors are fixed, there is no point to reload them for every query.

I like the idea of utilizing CatalogD as the key to get the right version of meta-data including file descriptors.



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 13:08:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 2:

(4 comments)

Thanks a lot for addressing my comments! Looks good!

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
> When we retrieve the file status from HDFS we get block location informatio
In this case, if we sort the block locations, would it make things better? 

With this patch, it is true that we can only access iceberg tables associated with a snapshot? That worries me a bit.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1268
PS1, Line 1268:           addSummary(response, "Updated table.");
> I used the same message that we use at L1079.
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1272
PS1, Line 1272: Updated table.
> I used the same message that we use at L1090.
Done


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295:     if (!needsToUpdateHms) {
              :       loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " +
              :           params.getAlter_type().name());
              :       catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion);
              :       addTableToCatalogUpdate(tbl, wantMinimalResult, response.result);
              :       return false;
              :     }
> SET PARTITION SPEC doesn't use an Iceberg transaction and we don't need to 
I see. Maybe add a comment to indicate it is not transactional and the work is to set partition specs.



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 14:38:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7499/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 08:21:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9515/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 08:02:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 6:

PS6 is only a rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 10:01:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7510/


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 16:46:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 7:

Thanks for the review!


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Oct 2021 09:40:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 16:21:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9552/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 10:23:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: service ID and new catalog version for the Iceberg table
> Doesn't Iceberg have something similar to catalog version, e.g. something l
+1 if we have native version number support in Iceberg it is better to use that detect self-events. I am okay doing it in a separate patch if needed. The self-events logic for ALTER events is up for a overhaul in https://gerrit.cloudera.org/#/c/17756/ which would use the eventId based detection instead of catalog version number.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@348
PS1, Line 348: hdfsTable_
             :             .load(false, msClient, msTable_, true, true, false, null, null,null, reason);
             :         pathHashToFileDescMap_ = Utils.loadAllPartition(this);
This is probably unrelated to this patch I was curious to understand what is the difference between the file descriptors which are loaded by hdfsTable_.load() and the loadAllPartition(this) method here. Is there a overlap in these 2 collections of file descriptors?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@481
PS1, Line 481:     TGetPartialCatalogObjectResponse resp =
             :         getHdfsTable().getPartialInfo(req, missingPartialInfos);
             :     if (req.table_info_selector.want_iceberg_snapshot) {
             :       resp.table_info.setIceberg_snapshot(
             :           FeIcebergTable.Utils.createTIcebergSnapshot(this));
             :     }
are we sending some file descriptors twice here? Some from the hdfsTable and some from the pathHashToFileDescMap?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@116
PS1, Line 116: pathHashToFileDescMap_
Am I correct in understanding that pathHashToFileDescMap_ is always loaded from the snapshot? If yes, does catalogd need to load the filedescriptors in the hdfsTable_ backing this iceberg table?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2915
PS1, Line 2915: newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
It is not clear to me why do we need to increment the catalog version here again?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2918
PS1, Line 2918:       iceTxn.commitTransaction();
              :       return newCatalogVersion;
Can you move these statements into a finally block and avoid code duplication in the if block and later down below?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6070
PS1, Line 6070: addVersionsForInflightEvents
Can we add test coverage for this in the test_self_events?



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 20:48:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9535/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 17:07:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7514/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 02 Oct 2021 10:24:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
> In this case, if we sort the block locations, would it make things better? 
Sorting is one possibility, but I think we shouldn't reload file descriptors if we don't have to. And since we should always reuse file descriptors, sorting becomes unnecessary and could potentially cause perf regression.

With this patch coordinators access the Iceberg snapshot that is loaded by CatalogD. When CatalogD refreshes the Iceberg table, the coordinators will use the new version. This behavior is analog to what we do for other tables.


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1295
PS1, Line 1295:     if (!needsToUpdateHms) {
              :       loadTableMetadata(tbl, newCatalogVersion, true, true, null, "ALTER Iceberg TABLE " +
              :           params.getAlter_type().name());
              :       catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion);
              :       addTableToCatalogUpdate(tbl, wantMinimalResult, response.result);
              :       return false;
              :     }
> I see. Maybe add a comment to indicate it is not transactional and the work
Added a comment with some explanation.


http://gerrit.cloudera.org:8080/#/c/17857/2/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/17857/2/tests/custom_cluster/test_events_custom_configs.py@270
PS2, Line 270: n
> flake8: E225 missing whitespace around operator
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 07:46:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9472/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 13:33:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 1:

(6 comments)

Some high level comments.
The code looks good to me so far, but I will go through it once more.

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@11
PS1, Line 11:  and the HDFS
            : block locations were not consistent across the reload
This is not clear to me: if the block locations have changed between queries, then it makes sense to schedule them differently, but if they didn't change (which seems the more common case to me), then how are they different?


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@25
PS1, Line 25: Fixing the above revealed another bug. Before this patch we didn't
            : handle self-events of Iceberg tables.
Can you add more information about this? For example what events are generated in HMS for different Iceberg operations? I guess that schema changes will be treated similarly to the case when Impala calls HMS directly, but I can't imagine what happens in case of INSERTs at the moment.


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@31
PS1, Line 31: service ID and new catalog version for the Iceberg table
Doesn't Iceberg have something similar to catalog version, e.g. something like "snapshot id", that could be used to check whether we need to reload data?

The current way we handle self-events is a horrible hack IMO which is needed because HMS doesn't provide a usable version number for table/partition metadata. Just hoping that Iceberg does have something like that :)


http://gerrit.cloudera.org:8080/#/c/17857/1//COMMIT_MSG@34
PS1, Line 34:  * added e2e test for the scan range assignment
It is a bit tricky, but self-event handling can be also tested, see https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java#L508


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java:

http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@104
PS1, Line 104:     properties.setProperty("external.table.purge", "TRUE");
Is this change related to the ones mentioned in the commit message?


http://gerrit.cloudera.org:8080/#/c/17857/1/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java@151
PS1, Line 151:     properties.setProperty(IcebergTable.ICEBERG_CATALOG,
             :                            tableProps.get(IcebergTable.ICEBERG_CATALOG));
Is this change related to the ones mentioned in the commit message?



-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 15:40:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Qifan Chen, Vihang Karajgaonkar, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17857

to look at the new patch set (#5).

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
M tests/stress/test_insert_stress.py
21 files changed, 432 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7507/


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 23:05:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9539/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 10:53:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17857 )

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9493/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 13:55:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Qifan Chen, Vihang Karajgaonkar, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17857

to look at the new patch set (#2).

Change subject: IMPALA-10914: Consistently schedule scan ranges for Iceberg tables
......................................................................

IMPALA-10914: Consistently schedule scan ranges for Iceberg tables

Before this patch Impala inconsistently scheduled scan ranges for
Iceberg tables on HDFS, in local catalog mode. It did so because
LocalIcebergTable reloaded all the files descriptors, and the HDFS
block locations were not consistent across the reloads. Impala's
scheduler uses the block location list for scan range assignment,
hence the assignments were inconsistent between queries. This has
a negative effect on caching and hence hit performance quite badly.

It is redundant and expensive to reload file descriptors for each
query in local catalog mode. This patch extends the GetPartialInfo()
RPC with Iceberg-specific snapshot information. It means that the
coordinator is now able to fetch Iceberg data file descriptors from
the CatalogD. This way scan range assignment becomes consistent
because we reuse the same file descriptors with the same block
location information.

Fixing the above revealed another bug. Before this patch we didn't
handle self-events of Iceberg tables. When an Iceberg table is stored
in the HiveCatalog it means that Iceberg will update the HMS table
on modifications because it needs to update table property
'metadata_location' (this points to the new snapshot file).
Then Catalogd processes these modifications again when they arrive
via the event notification mechanism. I fixed this by creating Iceberg
transactions in which I set the catalog service ID and new catalog
version for the Iceberg table. Since we are using transactions now
Iceberg has to embed all table modifications in a single ALTER TABLE
request to HMS, and detect the corresponding alter event later via the
aforementioned catalog service ID and version.

Testing:
 * added e2e test for the scan range assignment
 * added e2e test for detecting self-events

Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCatalogs.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-catalogs.test
M tests/custom_cluster/test_events_custom_configs.py
M tests/query_test/test_iceberg.py
17 files changed, 340 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/17857/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17857
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb8216b37d350469b573dad7fcefdd0ee0599ed5
Gerrit-Change-Number: 17857
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>