You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org> on 2021/07/19 18:06:39 UTC

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Yu-Wen Lai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17697


Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
8 files changed, 362 insertions(+), 43 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 10:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS10, Line 2127: Get non-ACID table with writeIdList:
This text here comes as a message for the IllegalStateException thrown and is user-visible. The message should be easy to understand for the user. I would recommend writing something like "Compaction id check cannot be done for non-transaction table + tbl.getFullName()"


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2131
PS10, Line 2131: getPartitionsForRefreshingFileMetadata
It looks like we are fetching the same information twice from the metastore. Once when we want to detect if there is a compaction on any of the partitions of the table, second when we are actually updating the compactionId of the partition in the refreshFileMetadata method below. Can we avoid the repeated work? May be you can reuse the compactionIds which HMS returned in by setting them in partsToBeRefreshed so that refreshFileMetadata method below reuses them. That way you can get rid of getAndSetLastCompactionId()


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3479
PS10, Line 3479: If there is an ongoing loading task, we don't reload file metadata but wait for the
               :    * loading task completed and return the table just loaded.
this comment is stale now that removed the loadReq logic.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@41
PS10, Line 41: Log
is this import needed?


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@345
PS10, Line 345: hdfsTable_.writeLock().lock();
why is this needed? The table locks are generally taken at CatalogOpExecutor level. e.g see callers of loadTableMetadata() in CatalogOpExecutor


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@115
PS10, Line 115: able.writeLock().lock();
Why is this needed?


http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json
File testdata/cluster/ranger/setup/policy_5_revised.json:

http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json@8
PS10, Line 8: 5
how is this change related? If it is not can we remove it from this patch and create a separate commit for it?



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jul 2021 19:39:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 367 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 370 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

(2 comments)

Thanks for the clarification. I think we should have a more detailed comment. Otherwise the patch looks great to me. I will +2 the patch once the comments are updated as suggested below.

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: AcidUtils.compare((HdfsTable) existingTbl,
               :                                updatedTbl.getValidWriteIds(), tableId)
               :                             >= 0)
> No, it is not guaranteed. In this case, the table is coming from a full tab
I synced up with Yu-Wen offline. Based on the discussion, I understand the reasoning of this part of the change better.

With this change now it is possible that when a ACID table is refreshed the version number is bumped up but it still keeps the ValidWriteIdList to the previous ValidWriteIdList because refresh was due to compaction. Hence the assumption which replaceTableIfUnchanged makes that any time a table is modified it moves to a later validWriteIdList than previous does not hold anymore.

Since this is not very easy to understand for someone who doesn't know how ValidWriteIdLists are used here, I think it would be good to add a more comprehensive comment. Something similar to below (feel free to modify as appropriate):

For ACID tables, we also compare the writeIdList with existing table's and return existing table
when its writeIdList is more recent. This check is needed in addition to the catalogVersion check because it is possible that when table's files are refreshed to account for compaction, the table's version number is higher but ValidWriteIdList is still the same as before refresh. This could cause this method to not update the table when the updatedTbl has a higher ValidWriteIdList if we just rely on catalog version comparison which would break the reload on stale ValidWriteIdList logic.


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@598
PS15, Line 598:         .get(0).file_descriptors;
nit, this comment can be removed now.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 19:12:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811
PS2, Line 811:     Map<String, Long> partNameToCompactionId = new HashMap<>();
> Thanks for the suggestion. A batch size of 1K makes sense to me. I will tes
In my local, the execution time of this api are ~1 ms for 1K partitions, ~10 ms for 10K paritions and ~30 ms for 50K partitions. Although it might takes a bit longer in a production env, we can expect it still falls in the range of tens of ms and I suppose it is a tolerable latency.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 22:18:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9175/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Jul 2021 18:05:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2131
PS10, Line 2131:           partsToBeRefreshed =
> @Yu-Wen: If there are multiple getOrLoad requests that end up at line 2131 
@Sourabh Thanks for the suggestion. I will try if I can do something like loadAsync.

For refreshFileMetadata(), Vihang pointed out a potential race condition that we cannot make sure the whole table reloading was happened after a compaction or not. It is possible we end up still serve stale file metadata. To avoid more issues around race conditions, we'd need to refresh file metadata even there is a concurrent full table reloading.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jul 2021 16:58:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9138/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 22:34:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

(5 comments)

Thanks for making the suggested changes. The patch is looking pretty good now. I left some questions/suggestions below based on the the new changes since last reviewed patchset.

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: AcidUtils.compare((HdfsTable) existingTbl,
               :                                updatedTbl.getValidWriteIds(), tableId)
               :                             >= 0)
Is it guaranteed that the existingTbl will have compacted files if it has a more recent validWriteIdList than updatedTbl?

It is not clear to me on what is the really the intention of this part of the change here. Can you please help me understand?


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@37
PS14, Line 37: getLatestCompactionInfo
Is this always a valid optimization? I thought the request object takes in the partition names. What if the new request has a different set of partition names than the one which is queued?


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@60
PS14, Line 60:     + request.getDbname() + "." + request.getTablename(),
             :           e
nit, can go into the same line.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1522
PS14, Line 1522:  }
nit, please add a single line space here.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@564
PS14, Line 564: Assert.assertEquals
A comment above this line saying that compacted tables/partitions should only return one file would be helpful.

Also, do we need to handle the case for ORC (full acid) tables as well? Do you know if the delete and insert files in ORC tables get merged  into 1 after minor compaction? Okay to do this as a followup if it is a significant change.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 21:19:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 370 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9117/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:35:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7315/


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 19:38:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 11:08:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 324 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/14
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@533
PS15, Line 533:         " (c1 int) partitioned by (part int) stored as orc" +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@547
PS15, Line 547:     executeHiveSql("create table " + getTestFullAcidTblName() +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@598
PS15, Line 598:     TPartialPartitionInfo afterPartitionInfo =
> nit, this comment can be removed now.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 20:32:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@533
PS15, Line 533:         "partitioned by (part int) stored as orc tblproperties ('transactional' = 'true')");
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/15/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@547
PS15, Line 547:         "partitioned by (part int) stored as orc tblproperties ('transactional' = 'true')");
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Aug 2021 17:59:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7315/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:08:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jul 2021 17:56:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9260/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Aug 2021 18:24:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9179/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 7
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 06:36:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7338/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 05:01:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@788
PS2, Line 788:       CatalogServiceCatalog catalog, HdfsTable hdfsTable) throws CatalogException {
nit: If we choose to use a single client to fetch compaction info for all partitions, then we can pass MetastoreClient instead of catalog Object.


http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811
PS2, Line 811:           metaStoreClient.getHiveClient().getLatestCommittedCompactionInfo(request);
Should we parallelize this HMS api if table has large number of partitions say 10k? We can use multiple Metastore clients from the pool and request compaction info for partitions in a batch size of say 1k.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@837
PS2, Line 837:       LOG.debug("Cached compaction id for {}: {} but the latest compaction id: {}",
Log partition name as well?



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 14:11:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
8 files changed, 368 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@37
PS14, Line 37: getLatestCompactionInfo
> @Vihang: GetLatestCommittedCompactionInfoRequest's equals() method checks p
Oh, yes. That is correct. Thanks for pointing that out. Looks good then. This may be worth a comment then since it may be confusing who is not aware.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@37
PS14, Line 37: getLatestCompactionInfo
> Is this always a valid optimization? I thought the request object takes in 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 21:16:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2135
PS1, Line 2135:         CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics()
> line too long (129 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2140
PS1, Line 2140:           // Update the cache miss metric, as the valid write id list did not match and we
> line too long (158 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@797
PS1, Line 797:   private void getAndSetLastCompactionId(IMetaStoreClient client,
> line too long (113 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@828
PS1, Line 828:       String partName = lci.getPartitionname() == null ? DEFAULT_PARTITION_NAME :
> line too long (105 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2193
PS1, Line 2193:       return client.getHiveClient()
> line too long (100 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@561
PS1, Line 561:         testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@563
PS1, Line 563:         testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@578
PS1, Line 578:         testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@580
PS1, Line 580:         testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
> line too long (96 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:15:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 322 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/13
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 13
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 00:59:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2131
PS10, Line 2131:           partsToBeRefreshed =
@Yu-Wen: If there are multiple getOrLoad requests that end up at line 2131 for the same table, each one of them would make HMS api call to detect the partitions for refreshing metadata. It is not efficient because we are making repeated HMS api calls specially if the partitioned table is large. If feel the later requests for the same table could piggyback on the first request (just like loadSync does). 

The other potential issue I see is with refreshFileMetada method. By the time this method is called, version read lock is released. What if there is another getOrLoad request that replaces the whole table around the same time when we are inside refreshFileMetadata()? Though it should not cause any issue with the cache consistency but we are unnecessarily updating the table that has already been replaced.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jul 2021 17:33:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7366/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 00:39:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 369 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/16
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
12 files changed, 326 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@811
PS2, Line 811:           metaStoreClient.getHiveClient().getLatestCommittedCompactionInfo(request);
> Should we parallelize this HMS api if table has large number of partitions 
Thanks for the suggestion. A batch size of 1K makes sense to me. I will test it out.


http://gerrit.cloudera.org:8080/#/c/17697/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@837
PS2, Line 837:       LOG.debug("Cached compaction id for {}: {} but the latest compaction id: {}",
> Log partition name as well?
Will add.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 16:55:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9189/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 9
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 19:57:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................

IMPALA-10801: Check the latest compaction Id before serving request

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 378 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9276/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 20:53:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7366/


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 07:00:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2135
PS1, Line 2135:         CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(CatalogMetastoreServer.CATALOGD_CACHE_MISS_METRIC).inc();
line too long (129 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2140
PS1, Line 2140:           CatalogMonitor.INSTANCE.getCatalogdHmsCacheMetrics().getCounter(String.format(CatalogMetastoreServer.CATALOGD_CACHE_API_MISS_METRIC, reason)).inc();
line too long (158 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@797
PS1, Line 797:   private void getAndSetLastCompactionId(IMetaStoreClient client, Collection<HdfsPartition.Builder> partBuilders)
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@828
PS1, Line 828:       String partName = lci.getPartitionname() == null ? DEFAULT_PARTITION_NAME : lci.getPartitionname();
line too long (105 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2193
PS1, Line 2193:       return client.getHiveClient().getThriftClient().get_latest_committed_compaction_info(request);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@561
PS1, Line 561:         getMetricCount(testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@563
PS1, Line 563:         getMetricCount(testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@578
PS1, Line 578:         getMetricCount(testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_MISS_METRIC);
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/1/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@580
PS1, Line 580:         getMetricCount(testDbName, testPartitionedTbl, HdfsTable.FILEMETADATA_CACHE_HIT_METRIC);
line too long (96 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 1
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:07:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 16: Verified+1 Code-Review+2

Looks good to me. There were only minor comment changes since the last verified +1 so I am going to override that and give a +1.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 21:05:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@37
PS14, Line 37: getLatestCompactionInfo
> Is this always a valid optimization? I thought the request object takes in 
@Vihang: GetLatestCommittedCompactionInfoRequest's equals() method checks partition names along with db and table name. So I think we are good.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@38
PS14, Line 38:       CatalogServiceCatalog catalog, GetLatestCommittedCompactionInfoRequest request)
nit: Only pass MetastoreClient instead of whole catalog Object ?



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 18:03:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: te the table when the updatedTbl has a higher ValidWriteIdList
               :       // if we just rely on catalog version comparison which would break the logic to
               :       // reload on stale ValidWri
> I synced up with Yu-Wen offline. Based on the discussion, I understand the 
Thanks Vihang for putting this together. I've added the comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 16
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 20:34:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

(3 comments)

> Patch Set 14:
> 
> (2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2194
PS14, Line 2194: AcidUtils.compare((HdfsTable) existingTbl,
               :                                updatedTbl.getValidWriteIds(), tableId)
               :                             >= 0)
> Is it guaranteed that the existingTbl will have compacted files if it has a
No, it is not guaranteed. In this case, the table is coming from a full table loading and I supposed at this time the client who sent the query already acquired read/write lock on the table. Therefore, the file metadata loaded won't be cleaned for the lifetime of the query. We don't need to worry other queries because the file metadata will be updated next time when there is any compaction.

If I understand the original design correctly, the conditions here make sure we only update once for many requests to a single table. After my change, refreshing file metadata also changes catalogVersion. Let's say there are one table loading and one file metadata refreshing happening together. If file metadata refreshing is finished earlier and catalogVersion is updated, we should not discard the result of full table loading here since we need to return the table with most recent validWriteIdList.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@38
PS14, Line 38:       CatalogServiceCatalog catalog, GetLatestCommittedCompactionInfoRequest request)
> nit: Only pass MetastoreClient instead of whole catalog Object ?
Passing catalog here so we can get a client from pool only when needed.


http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@564
PS14, Line 564: Assert.assertEquals
> A comment above this line saying that compacted tables/partitions should on
Not sure about ORC tables. Will check.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 21:39:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9232/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 13
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Aug 2021 23:51:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 4:

(9 comments)

Thanks a lot for working on this. This is definitely very useful. I took a first pass at this. I will take another look at this while you address the comments above.

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7
PS4, Line 7: request
nit, May be change this to say "ACID table" to be more specific.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125
PS4, Line 2125: tbl.readLock().lock();
Can you add a Preconditions check before this line to make sure that the table is a ACID table? This makes sure that we don't regress unintentionally by taking read lock on non-transactional tables here.

Preconditions.checkState(AcidUtils.isTransactionalTable(tbl.getMetaStoreTable().getParameters())));


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS4, Line 2127: refreshNeededPartitions
nit, can we rename this variable to something like "partsToBeRefreshed" to make it more readable?


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187
PS4, Line 2187: HdfsTable
change to "ACID tables" since external tables are also HdfsTables


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486
PS4, Line 3486:     TableLoadingMgr.LoadRequest loadReq = tableLoadingMgr_.getLoadRequest(tableName);
              :     if (loadReq != null) {
              :       try {
              :         return loadReq.get();
              :       } finally {
              :         loadReq.close();
              :       }
              :     }
This logic seems to have a race condition. How do we know that the loadReq got completed before or after compaction? If it completed just before compaction, we are still going to return uncompacted files. Is this just for optimization or needed for certain edge case?


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@727
PS4, Line 727: public
since this method is now made public, I think we should add a Preconditions check here to make sure that the table write lock was table before calling this method.

Preconditions.checkState(this.isWriteLockedByCurrentThread());


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@760
PS4, Line 760: there is a compaction
nit, another compaction


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824
PS4, Line 824:     Map<String, HdfsPartition.Builder> nameToPartition =
             :         partBuilders.stream().collect(Collectors.toMap(
             :             HdfsPartition.Builder::getPartitionName, builder -> builder));
If you move this to line 805 you can avoid iterating the partBuilders twice by passing new ArrayList<>(nameToPartition.keySet()) on line 812.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827
PS4, Line 827: for (CompactionInfoStruct lci : resp.getCompactions()) {
             :       String partName = lci.getPartitionname() == null ? DEFAULT_PARTITION_NAME :
             :                                                          lci.getPartitionname();
             :       HdfsPartition.Builder partition = nameToPartition.get(partName);
             :       partition.setLastCompactionId(lci.getId());
             :     }
I think the code readability can be improved if you handle the non-partitioned and partitioned case separately. e.g

if (isPartitioned()) {
     for (CompactionInfoStruct lci : resp.getCompactions()) {
                                                 
      HdfsPartition.Builder partition = nameToPartition.get(lci.getPartitionname());
      Preconditions.checkNotNull(partition);
      partition.setLastCompactionId(lci.getId());
    }
} else {
   CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions());
   HdfsPartition.Builder partBuilder = Iterables.getOnlyElement(partBuilders);
   partBuilder.setLastCompactionId(ci.getId());
}



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 22:46:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9150/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 05:21:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 6:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17697/4//COMMIT_MSG@7
PS4, Line 7: ACID ta
> nit, May be change this to say "ACID table" to be more specific.
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2125
PS4, Line 2125: Preconditions.checkSta
> Can you add a Preconditions check before this line to make sure that the ta
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS4, Line 2127: l.readLock().lock();
> nit, can we rename this variable to something like "partsToBeRefreshed" to 
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2187
PS4, Line 2187: 
> change to "ACID tables" since external tables are also HdfsTables
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3486
PS4, Line 3486: 
              :     if (!tryWriteLock(hdfsTable)) {
              :       throw new CatalogException(String.format(
              :           "Error during refreshing file metadata for table %s due to lock contention",
              :           hdfsTable.getFullName()));
              :     }
              :     long newVersion = incrementAndGetCatalogVersion();
              :     v
> This logic seems to have a race condition. How do we know that the loadReq 
Thanks for pointing out this. It is for optimization so I've removed it.


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@824
PS4, Line 824:     if (isPartitioned()) {
             :       for (CompactionInfoStruct ci : resp.getCompactions()) {
             :         HdfsPartition.Builder partBuilder = nameToPartBuilder.get(ci.getPa
> If you move this to line 805 you can avoid iterating the partBuilders twice
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@827
PS4, Line 827:     Preconditions.checkNotNull(partBuilder);
             :         partBuilder.setLastCompactionId(ci.getId());
             :       }
             :     } else {
             :       CompactionInfoStruct ci = Iterables.getOnlyElement(resp.getCompactions());
             :      
> I think the code readability can be improved if you handle the non-partitio
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@520
PS4, Line 520:     TGetPartialCatalogObjectResponse response =
> line too long (107 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@539
PS4, Line 539:     response = sendRequest(request);
> line too long (114 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@569
PS4, Line 569:     Assert.assertTrue(prePartitionInfo.getFile_descriptors().size() > 1);
> line too long (110 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@583
PS4, Line 583:                   .wantFiles()
> line too long (92 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Jul 2021 17:50:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9118/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 2
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Jul 2021 18:44:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9186/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 16:48:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7332/


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Jul 2021 04:24:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7370/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 18:41:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
11 files changed, 383 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 9
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Reviewed-on: http://gerrit.cloudera.org:8080/17697
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
Tested-by: Vihang Karajgaonkar <vi...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 369 insertions(+), 29 deletions(-)

Approvals:
  Vihang Karajgaonkar: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 17
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
12 files changed, 387 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2127
PS10, Line 2127: Get non-ACID table with writeIdList:
> This text here comes as a message for the IllegalStateException thrown and 
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3479
PS10, Line 3479: If there is an ongoing loading task, we don't reload file metadata but wait for the
               :    * loading task completed and return the table just loaded.
> this comment is stale now that removed the loadReq logic.
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@41
PS10, Line 41: Log
> is this import needed?
Ack


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@345
PS10, Line 345: hdfsTable_.writeLock().lock();
> why is this needed? The table locks are generally taken at CatalogOpExecuto
This load fails after adding a precondition check for locking in HdfsTable.loadFileMetadataForPartitions. I suppose the lock is not taken at CatalogOpExecutor because hdfsTable_ is the internal object of icebergTable.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@115
PS10, Line 115: able.writeLock().lock();
> Why is this needed?
Same as icebergTable. After adding a precondition check for locking in HdfsTable.loadFileMetadataForPartitions, this function would fail without taking lock.


http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json
File testdata/cluster/ranger/setup/policy_5_revised.json:

http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json@8
PS10, Line 8: 5
> how is this change related? If it is not can we remove it from this patch a
Since this patch bumps up cdp version and the new version of ranger would cause failure of create-load-data.sh. If I don't put this here, I cannot get a green testing. Or should I create a seperate commit to bump up cdp version?



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jul 2021 21:40:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
9 files changed, 367 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 6
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7332/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 3
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 22:08:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9209/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 11
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jul 2021 17:47:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17697/13/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
File fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/13/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@38
PS13, Line 38:       CatalogServiceCatalog catalog, GetLatestCommittedCompactionInfoRequest request) throws CatalogException {
line too long (111 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/13/fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java@41
PS13, Line 41:           try (MetaStoreClientPool.MetaStoreClient client = catalog.getMetaStoreClient()) {
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 13
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Aug 2021 23:29:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9174/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 5
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Jul 2021 18:03:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9233/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 14
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 00:06:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@345
PS10, Line 345: hdfsTable_.writeLock().lock();
> This load fails after adding a precondition check for locking in HdfsTable.
Ah, I see. The only code path which i see could hit this condition is when TableLoader is loading the table. TableLoader which loads the table in background does not need to take a table lock because in that case it loads the table in background and then atomically replaces it in the Db. The case where we need to take a lock on the table before reloading the table is when you are executing a ddl like alter.

I don't think taking a lock here is the solution for this. I think in that case we should remove the Preconditions check since it is not applicable for all the code paths.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/Table.java@592
PS10, Line 592: protected
this probably is not needed if you remove the preconditions check.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@115
PS10, Line 115: able.writeLock().lock();
> Same as icebergTable. After adding a precondition check for locking in Hdfs
Please remove the added Preconditions check as per my comment earlier.


http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/10/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@526
PS10, Line 526: executeHiveSql
can you add test coverage for major compaction too and partition level compactions too?


http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json
File testdata/cluster/ranger/setup/policy_5_revised.json:

http://gerrit.cloudera.org:8080/#/c/17697/10/testdata/cluster/ranger/setup/policy_5_revised.json@8
PS10, Line 8: 5
> Since this patch bumps up cdp version and the new version of ranger would c
Yes, I think that would be better. Generally if there are associated changes with the BUILD number bump, we should commit them as a separate patch just for GBN bump so that we test them separately.



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 10
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jul 2021 18:46:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we cache compaction id while loading partitions and compare
the cached id with the latest compaction id before serving. If there
is a newer compaction happened, it would refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For Non-transactional tables, we still keep original behavior.

Testing:
- Add a test in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M testdata/bin/create-load-data.sh
R testdata/cluster/ranger/setup/policy_5_revised.json
10 files changed, 376 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 8
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving ACID table

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving ACID table
......................................................................

IMPALA-10801: Check the latest compaction Id before serving ACID table

Since compactions don't advance write id, we don't know if a
table/partition is compacted by comparing writeIdList. A possible
issue is that CatalogD provides obsolete file metadata and causes a
runtime error.

In order to fix this issue, we introduced a HMS API that can get the
latest compaction record for a table/partition (HIVE-24828). In
CatalogD, we compare the cached id with the latest compaction id before
serving. If there is a newer compaction happened, we will cache the
latest compaction id and refresh the file metadata.

Besides, this patch also change how to replace the existing table
after a table full reloading. The current way is to replace the table
if the catalog version is not changed. For transactional tables,
things get additional complexity given that file metadata refreshing
and full table reloading can happen together. We can actually use
writeIdList to determine whether we should replace the table for
transactional tables. As long as the updated table has more recent
writeIdList than the existing one, we are safe to replace the table.
For non-transactional tables, we still keep the original behavior.

Testing:
- Add several tests in PartialCatalogInfoWriteIdTest

Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CompactionInfoLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
6 files changed, 362 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/17697/15
-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 15
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>

[Impala-ASF-CR] IMPALA-10801: Check the latest compaction Id before serving request

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17697 )

Change subject: IMPALA-10801: Check the latest compaction Id before serving request
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@520
PS4, Line 520:     TGetPartialCatalogObjectResponse response = ((HdfsTable) tbl).getPartialInfo(request, new HashMap<>());
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@539
PS4, Line 539:     TPartialPartitionInfo afterPartitionInfo = Iterables.getOnlyElement(response.getTable_info().getPartitions());
line too long (114 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@569
PS4, Line 569:     executeHiveSql("alter table " + getPartitionedTblName() + " partition (part=1) compact 'minor' and wait");
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17697/4/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@583
PS4, Line 583:     afterPartitionInfo = Iterables.getOnlyElement(response.getTable_info().getPartitions());
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17697
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I86a112a77980fef7f6238978bc9668a65262101e
Gerrit-Change-Number: 17697
Gerrit-PatchSet: 4
Gerrit-Owner: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 05:01:05 +0000
Gerrit-HasComments: Yes