You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org> on 2020/05/29 23:55:02 UTC

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16008


Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
14 files changed, 676 insertions(+), 77 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6171/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 May 2020 00:42:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5966/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 22:25:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@20
PS14, Line 20: import java.io.FileNotFoundException
> nit: move this to line 102
Done


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1552
PS14, Line 1552: "Could not use
> We should not use String.format() here since we are using "{}" instead of "
yes, makes sense. Thanks!


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@21
PS14, Line 21: import java.sql.SQLException;
> nit: can remove them since the following codes are not using these static i
Done


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@42
PS14, Line 42: import org.junit.After;
> nit: unused import
Done


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@223
PS14, Line 223:     ValidWriteIdList currentWriteIdList = getValidWriteIdList(testDbName,
              :       testPartitionedTbl);
> Can we simply get this by tbl.getValidWriteIds() instead of sending an HMS 
I think its safer to get the validwriteIdList directly from the source of truth for the test.



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 15
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jun 2020 05:43:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
22 files changed, 1,169 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/18
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 18
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1571
PS11, Line 1571:       LOG.debug("{} files filtered out of table {} for {}. Hit rate : {}", numFilesFiltered,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 06:39:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 15: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 15
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jun 2020 07:01:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6275/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 19:29:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
19 files changed, 1,108 insertions(+), 162 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5985/


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 19
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 06:30:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,126 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/14
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6246/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 23:10:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26
PS8, Line 26: import java.text.DecimalFormat;
> nit: unsed import
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642
PS8, Line 642:     ValidTxnList validTxnList = validWriteIds_ != null ? loadValidTxns(client) : null;
> This is no longer used.
the latest patch refactors the code and uses this.


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997
PS8, Line 997:         loadValidWriteIdList(client);
> We always update the validWriteIdList here no matter fully or incrementally
Yes, I think you are right. The ValidWriteIdList should only be reloaded when file-metadata is being reloaded in my opinion. But this is tricky to do since that would mean that in case of incremental refresh we might end up doing a full file-metadata refresh and may be seen as a perf regression. I think it would require some more thought and testing.

For now, I would like to keep that out of scope of this patch since this focuses only on the read-path. The patch is not operational currently since the coordinators never send any ValidWriteIdList. I created IMPALA-9841 to track those changes.


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS5, Line 102:           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(),
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79
PS8, Line 79:     // Group the partitions by their path (multiple partitions may point to the same
            :     // path).
> nit: could you move this comment to line 91?
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@83
PS8, Line 83:       try (MetaStoreClient client = MetaStoreClientPool.get().getClient()) {
            :         try {
> nit: can we have a single try-catch?
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(),
             :           validTxnList, writeIdList, e.getValue().get(0).getFileFormat());
> IIUC, the 'writeIdList' sent from the client(coordinator) is only used to c
I would like to defer opening a transaction during table load to a separate change since it is needed for a separate issue.

I refactored the code so that ParallelFileMetadataLoader doesn't fetch the transaction id list here since its seems cleaner (FileMetadataLoaders deal with FileSystem not HMS).


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(),
             :           validTxnList, writeIdList, e.getValue().get(0).getFileFormat());
> If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by
we are actually doing a strict check in the regular path as well now. But instead of throwing an exception we just ignore the files which have open writeIds in compacted files. See the changes in AcidUtils.java for WriteListBasedPredicate class.


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@189
PS5, Line 189:       List<PartitionRef> partitionRefs) throws CatalogException, TException {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372
PS8, Line 372: try
> nit: trying
Done


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@145
PS8, Line 145: MetaE
> nit: CatalogException
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 21:43:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6267/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 07:24:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,127 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/11
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5967/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 22:26:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5967/


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Jun 2020 03:35:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 17: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@119
PS17, Line 119: import com.google.common.base.Stopwatch;
nit: please sort this to line 116


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@79
PS17, Line 79:     impalaJdbcClient_ = ImpalaJdbcClient.createClientUsingHiveJdbcDriver();
nit: unused var


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@111
PS17, Line 111: if (client != null) {
nit: this is always true otherwise line 97 will fail first. Actually, ImpalaJdbcClient.createClientUsingHiveJdbcDriver() always returns nonnull values


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@131
PS17, Line 131: if (client != null) {
nit: this is always true otherwise line 127 will fail first



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 17
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 00:35:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 13:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6273/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 13
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 19:12:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@26
PS8, Line 26: import java.text.DecimalFormat;
nit: unsed import


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@642
PS8, Line 642:     ValidTxnList validTxnList = validWriteIds_ != null ? loadValidTxns(client) : null;
This is no longer used.


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@997
PS8, Line 997:         loadValidWriteIdList(client);
We always update the validWriteIdList here no matter fully or incrementally reload the table. Most of the DDL/DML code paths invoke incremental reload. E.g. INSERT statement will only reload affected partitions or load new partitions. For other partitions, they are previously loaded with an old validWriteIdList. Should we trigger a full reload when we detect they are stale (e.g. validWriteIdList changes more than expected)? For instance, I'm not sure if we handle this scenario correctly:

impala> refresh tableA;   // tableA has two partitions p=1,2
hive> insert into tableA partition (p=1) values ...;
impala> insert into table tableA partition (p=2) values ...;

In the INSERT statement, we just reload the metadata of partition(p=2). However, the validWriteIdList is also updated. This patch use this new validWriteIdList to compare with those sent from clients(coordinators), if they match, we are returning the stale metadata for partition(p=1). Is my understanding correct?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@79
PS8, Line 79:     // Group the partitions by their path (multiple partitions may point to the same
            :     // path).
nit: could you move this comment to line 91?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(),
             :           validTxnList, writeIdList, e.getValue().get(0).getFileFormat());
> If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by
IIUC, the 'writeIdList' sent from the client(coordinator) is only used to compare with the table's 'writeIdList'. I think 'writeIdList' here comes from HdfsTable.load() instead of the client(coordinator):
https://github.com/apache/impala/blob/d45e3a50b003259e4ef1023333b47781a028eb19/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1017
and it's only updated in HdfsTable.load() -> HdfsTable.loadValidWriteIdList(). Should we cache the 'validTxnList' there instead of getting it here? and also open a read transaction in HdfsTable.load()?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@372
PS8, Line 372: try
nit: trying



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 13:29:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5913/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 May 2020 23:55:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6279/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 15
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jun 2020 06:30:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Anurag Mantripragada (Code Review)" <ge...@cloudera.org>.
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(8 comments)

I did the first round, I will still have to go through the tests.

http://gerrit.cloudera.org:8080/#/c/16008/2/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/16008/2/common/thrift/CatalogService.thrift@497
PS2, Line 497: valid_write_id_list
Nit: Could we use the same variable name for this field to avoid confusion in code?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1851
PS2, Line 1851:     
Nit: Indent 4 spaces.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1868
PS2, Line 1868:   
Nit: Indent 4 spaces.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3027
PS2, Line 3027: String dbName = objectDesc.getTable().db_name == null ? "default"
              :             : objectDesc.getTable().db_name;
I could not find the code for it but does a similar conversion to "default" db happen in the later getOrLoadTable() call? I just want to make sure this conversion is consistent.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@196
PS2, Line 196:   
Nit: indent with 4 spaces.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1536
PS2, Line 1536: LoadStats stats = new LoadStats(getHdfsBaseDirPath());
If LoadStats is only needed if `req.table_info_selector.want_partition_files` is true, could we move this initilization inside the if-statement on L1561 below?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324
PS2, Line 324: public static List
Could you add a method description?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@138
PS2, Line 138:   
Nit: 4 space indentation needed here and other places in this file.



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jun 2020 22:33:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6300/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 18
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 01:50:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
16 files changed, 1,074 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1850
PS2, Line 1850: String reason
nit: could you keep "reason" as the last parameter to be consistent with other method definitions?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1850
PS2, Line 1850:   public Table getOrLoadTable(String dbName, String tblName, String reason,
              :     ValidWriteIdList validWriteIdList
I see many lines of the change are adding a default null value to the new parameter. What about adding this parameter in an overload? So we don't need to touch existing callers of this. I.e.

public Table getOrLoadTable(String dbName, String tblName, String reason)
    throws CatalogException {
  return getOrLoadTable(dbName, tblName, null, reason);
}
public Table getOrLoadTable(String dbName, String tblName,
    ValidWriteIdList validWriteIdList, String reason) throws CatalogException {
  ...
}



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:13:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,122 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 20: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 20
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 12:36:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1571
PS9, Line 1571:       LOG.debug("{} files filtered out of table {} for {}. Hit rate : {}", numFilesFiltered,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 22:25:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6211/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 08:54:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,126 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/13
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 13
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,121 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/15
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 15
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,128 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/8
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(5 comments)

I did a first pass of the code changes. Haven't looked at the tests in detail.

http://gerrit.cloudera.org:8080/#/c/16008/2/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/16008/2/common/thrift/CatalogService.thrift@497
PS2, Line 497:   3: optional CatalogObjects.TValidWriteIdList valid_write_id_list
May be better to keep the naming consistent i.e valid_write_ids (as defined in CatalogService.thrift) or make the other one same as this.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3026
PS2, Line 3026:         if (req.table_info_selector.valid_write_ids != null) {
This code block is for both TABLE and VIEW catalog objects. I would expect the ValidWriteIdList to be null for views  .. or does it reflect a concatenation of the valid writeIds of all the tables contained in that view ?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324
PS2, Line 324:   public static List<FileDescriptor> filterFDsforAcidState(List<FileDescriptor> fds,
Would be good to add a brief comment for the motivation for this method compared to the pre-existing  filterFilesForAcidState. At first glance, it is not obvious why a second method was needed.  Since you already give the motivation in WriteListBasedPredicate maybe you can point to those comments.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@329
PS2, Line 329:     for (FileDescriptor fd : fds) {
These file descriptors are the total across all (pruned) partitions, right?  I think you  have mentioned offline that partition level ValidWriteIdList could potentially be used to skip although currently they are not reliable.  The performance would be an issue for very large lists where partition pruning was not too selective..but I suppose it is no worse than the existing filterFilesForAcidState() method.  Maybe a future enhancement.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@579
PS2, Line 579:     // if tbl is not a transactional, there is not to compare against and we return 0
nit: change 'not  to compare ..'  to 'nothing to compare ..'



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jun 2020 21:52:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6219/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:33:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@83
PS8, Line 83:       try (MetaStoreClient client = MetaStoreClientPool.get().getClient()) {
            :         try {
nit: can we have a single try-catch?


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@101
PS8, Line 101:       FileMetadataLoader loader = new FileMetadataLoader(e.getKey(),
             :           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(),
             :           validTxnList, writeIdList, e.getValue().get(0).getFileFormat());
If 'validTxnList' is loaded after a while since 'writeIdList' is fetched by the client, then it might load a compacted directory that has contents we shouldn't see based on the write id list.

So we should either do the strict check, or make the client provide a 'validTxnList' alongside with the write id list. You already mentioned the need for the strict check in the fallback path in a previous comment, but I think the current code doesn't have it. Or maybe I'm missing something.

To detect removed directories, we should check whether we have directories/files for all the committed write ids. If the files gets deleted shortly after, the backend would throw an error. However, as we already talked about it out of band, the ideal solution would be to open a transaction and put a SHARED_READ lock on the table.


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/main/java/org/apache/impala/util/AcidUtils.java@432
PS8, Line 432: Catalog
Thanks for migrating this to CatalogException!


http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/8/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@145
PS8, Line 145: MetaE
nit: CatalogException



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jun 2020 13:32:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 14: Code-Review+2

(5 comments)

Carry on Zoltan's +1. Please carry on my +2 after resolving the String.format() comment. Also left some minor comments.

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@20
PS14, Line 20: import com.codahale.metrics.Counter;
nit: move this to line 102


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1552
PS14, Line 1552: String.format(
We should not use String.format() here since we are using "{}" instead of "%s". The last argument of LOG.debug() can be a Throwable: https://github.com/qos-ch/slf4j/blob/v_1.7.25/slf4j-api/src/main/java/org/slf4j/helpers/MessageFormatter.java#L160-L163


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@21
PS14, Line 21: import static org.junit.Assert.assertTrue;
nit: can remove them since the following codes are not using these static import.


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@42
PS14, Line 42: import org.apache.impala.thrift.TValidWriteIdList;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/16008/14/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@223
PS14, Line 223:     ValidWriteIdList olderWriteIdList = getValidWriteIdList(testDbName,
              :       testPartitionedTbl);
Can we simply get this by tbl.getValidWriteIds() instead of sending an HMS RPC?



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jun 2020 02:50:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,127 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
22 files changed, 1,175 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/17
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 17
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5947/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jun 2020 01:14:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS5, Line 102:           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), validTxnList,
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16008/5/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@189
PS5, Line 189:       List<PartitionRef> partitionRefs) throws CatalogException, MetaException, TException {
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 19:31:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 20: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 20
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 07:32:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5947/


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jun 2020 05:16:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5987/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 20
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 07:32:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/4/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/16008/4/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@102
PS4, Line 102:           Utils.shouldRecursivelyListPartitions(table), oldFds, table.getHostIndex(), validTxnList,
line too long (99 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 08:09:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@106
PS17, Line 106:       client.execStatement("insert into " + getPartitionedTblName() + " partition (part=1) "
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 17
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Jun 2020 23:55:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M shaded-deps/pom.xml
14 files changed, 683 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6272/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 18:47:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 11: Code-Review+1

(4 comments)

LGTM, only spotted a few nits.

http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1619
PS11, Line 1619:             "Unable to fetch valid transaction ids while loading file metadata for table "
               :                 + getFullName());
You could also add the error message in 'ex' to get the root cause.


http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@86
PS11, Line 86:   @Before
             :   public void createTestTbls() throws Exception {
             :     try (HiveJdbcClient client = hiveClientPool_.getClient()) {
             :       client.executeSql("create database if not exists " + testDbName);
             :       client.executeSql("create table " + getTestTblName() + " like "
             :         + "functional.insert_only_transactional_table stored as parquet");
             :       client
             :         .executeSql("insert into " + getTestTblName() + " values (1)");
             :       client.executeSql("create table " + getPartitionedTblName() + " (c1 int) "
             :         + "partitioned by (part int) stored as parquet " + getTblProperties());
             :       client.executeSql("insert into " + getPartitionedTblName() + " partition (part=1) "
             :         + " values (1)");
             :     }
             :     catalog_.reset();
             :   }
             : 
             :   private static String getTblProperties() {
             :     return "tblproperties ('transactional'='true', 'transactional_properties' = "
             :       + "'insert_only')";
             :   }
             : 
             :   @After
             :   public void dropTestTbls() throws Exception {
             :     try (HiveJdbcClient client = hiveClientPool_.getClient()) {
             :       client.executeSql("drop database " + testDbName + " cascade");
             :     }
             :   }
Would it make sense to execute these via Impala? Probably it'd make the test a bit faster.


http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@348
PS11, Line 348: 2
nit: missing comma


http://gerrit.cloudera.org:8080/#/c/16008/11/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@411
PS11, Line 411:         "alter table " + getTestTblName()
              :             + " compact 'minor' and wait");
nit: fits single line



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 14:04:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3026
PS2, Line 3026:         if (req.table_info_selector.valid_write_ids != null) {
> hmm. My understanding is that in Impala Materialized views are treated as v
Yeah, sorry, you are right.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324
PS2, Line 324: filterFDsforAcidState
> Thanks Zoltan for the feedback. Is it fair to say that in case of base file
Yeah, I think what you wrote is correct, and yes, we can fetch the latest validTxnIdList during the fallback code.

However, delta directories created by Hive Streaming should be checked in a non-strict way. Such delta directories also contain files corresponding to a write id range, but they don't have visibilityTxnId, so e.g. delta_0001_0020.

Now my only concern is the Hive cleaner thread which deletes old transactional directories. Given that maybe it would be better to force the client to refresh its table metadata when we cannot serve the file descriptors from cache? Or we don't expect that the clients' lag to be too big, so the necessary files should still exist?



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jun 2020 14:09:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jun 2020 00:04:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16008/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/12/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1571
PS12, Line 1571:       LOG.debug("{} files filtered out of table {} for {}. Hit rate : {}", numFilesFiltered,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Jun 2020 18:01:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6297/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 17
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 00:41:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Reviewed-on: http://gerrit.cloudera.org:8080/16008
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
22 files changed, 1,169 insertions(+), 183 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 21
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6216/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 20:12:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5985/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 19
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 01:05:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 19: Code-Review+2

(4 comments)

Thanks for the review. Carrying over the +2

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@79
PS17, Line 79: 
> nit: unused var
Done


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@106
PS17, Line 106:     } finally {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@111
PS17, Line 111: talog_.reset();
> nit: this is always true otherwise line 97 will fail first. Actually, Impal
Done


http://gerrit.cloudera.org:8080/#/c/16008/17/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java@131
PS17, Line 131: 
> nit: this is always true otherwise line 127 will fail first
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 19
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Jun 2020 01:05:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................

IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

This change enhances the Catalog-v2 API getPartialCatalogObject to
support ValidWriteIdList as an optional field in the TableInfoSelector.
When such a field is provided by the clients, catalog compares the
provided ValidWriteIdList with the cached ValidWriteIdList of the
table. The catalog reloads the table if it determines that the cached
table is stale with respect to the ValidWriteIdList provided.
In case the table is already at or above the requested ValidWriteIdList
catalog uses the cached table metadata information to filter out
filedescriptors pertaining to the provided ValidWriteIdList.
Note that in case compactions it is possible that the requested
ValidWriteIdList cannot be satisfied using the cached file-metadata
for some partitions. For such partitions, catalog re-fetches the
file-metadata from the FileSystem.

In order to implement the fall-back to getting the file-metadata from
filesystem, the patch refactor some of file-metadata loading logic into
ParallelFileMetadataLoader which also helps simplify some methods
in HdfsTable.java. Additionally, it modifies the WriteIdBasedPredicate
to optionally do a strict check which throws an exception on some
scenarios.

This is helpful to provide a snapshot view of the table metadata during
query compilation with respect to other changes happening to the table
concurrently. Note that this change does not implement the coordinator
side changes needed for catalog clients to use such a field. That would
be taken up in a separate change to keep this patch smaller.

Testing:
1. Ran existing filemetadata loader tests.
2. Added a new test which exercises the various cases for
ValidWriteIdList comparison.
3. Ran core tests along with the dependent MetastoreClientPool
patch (IMPALA-9824).

Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
---
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogdTableInvalidatorTest.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoWriteIdTest.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M shaded-deps/pom.xml
21 files changed, 1,126 insertions(+), 176 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16008/12
-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6217/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 21:53:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(9 comments)

Compactions can cause some troubles I think, see my comment for details.

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3026
PS2, Line 3026:         if (req.table_info_selector.valid_write_ids != null) {
> This code block is for both TABLE and VIEW catalog objects. I would expect 
Maybe it can be a materialized view?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3027
PS2, Line 3027: "default"
nit: Catalog.DEFAULT_DB?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3031
PS2, Line 3031:   ,
nit: indentation should be +4 spaces instead of 2, also this standalone comma looks a bit weird


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1538
PS2, Line 1538:   
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1582
PS2, Line 1582:   
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@227
PS2, Line 227: delta_0001_0002
maybe "delta_0001_0002_v01234", and mention that "1234" is the visibilityTxnId.

Otherwise the sentence "TransactionID derived from the directory name" could be confusing.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324
PS2, Line 324: filterFDsforAcidState
This might not work well with compactions:

 CREATE TABLE test (i int) .....;

 INSERT INTO TABLE test values (1), (2), (3);

Now delta_001_001 contains this:
 1
 2
 3

Then run

 DELETE FROM test WHERE i = 2;

Now we have delete_delta_002_002 with the delete event of element 2.

 ALTER TABLE test COMPACT 'major';

Compaction creates base_002_v1234 with the following content:
 1
 3

(Value 2 is not stored anymore).

Now if we load the table with both 1 and 2 write ids being valid, also transaction 1234 is valid, then we'll only have base_002_v1234/0000_0 in file descriptors.

And if we try to filter the file descriptors with a writeIdList where only 1 is valid, we won't see any file descriptors. The workaround could be to fallback to load fds from the file system.

Minor compactions are also problematic because the backend assumes that it should only validate rows in files written by Hive Streaming (when it is enough to validate per ORC stripe, or per ORC row batch).

E.g. we might have the following file descriptor:
 delta_001_002_v01234/0000_0

But if based on the validWriteIdList we can only see rows coming from write id 1 then we'd need to add some (quite complex) logic to the backend to filter out rows based on write ids.

Probably the workaround for that could be to add another stricter test to WriteListBasedPredicate. E.g. if 'rr == ValidWriteIdList.RangeResponse.ALL' is not satisfied then we should fall back to loading from the file system.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@325
PS2, Line 325:     
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@587
PS2, Line 587: same
nit: the same



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Jun 2020 15:59:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@159
PS2, Line 159:     assertNotNull(catalog_.getOrLoadTable("functional", "StringPartitionKey", "test", null));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@177
PS2, Line 177:     assertNotNull(catalog_.getOrLoadTable(hbaseDb.getName(), "alltypessmall", "test", null));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@182
PS2, Line 182:     assertNotNull(catalog_.getOrLoadTable(hbaseDb.getName(), "alltypesagg", "test", null));
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 May 2020 23:56:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 30 May 2020 05:12:50 +0000
Gerrit-HasComments: No