You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/07/23 23:57:54 UTC

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/11027

to review the following change.


Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 200 insertions(+), 54 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 3:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/69/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Jul 2018 01:40:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11027

to look at the new patch set (#3).

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 220 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 1:

(8 comments)

Patch lgtm overall. Bunch of nits and one clarification.

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@857
PS1, Line 857:         HdfsPartition partition = createPartition(msPartition.getSd(), msPartition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1005
PS1, Line 1005: );
Can you add the table name too.


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1015
PS1, Line 1015:    * Throws CatalogException if one of the supplied storage descriptors contains metadata that
nit: longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1026
PS1, Line 1026:       HdfsPartition hdfsPartition = createPartition(partition.getSd(), partition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1625
PS1, Line 1625:       HdfsPartition partition = createPartition(msPartition.getSd(), msPartition, permCache);
longline


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1675
PS1, Line 1675: );
, ioe);


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
qq. What is the use of having this as a part of map's key? Isn't it already a part of Path's URI?


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS1, Line 45: if (perms != null) {
            :       return perms;
            :     }
single line.



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 06:54:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2957/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Aug 2018 21:39:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@897
PS2, Line 897: this
update?


http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@957
PS2, Line 957:     preloadPermissionsCache(msPartitions, permCache);
Looking at the consumers of this, alterTableAddPartitions() and alterTableRecoverPartitions(), I have a feeling that this optimization in this codepath is an overkill (for large tables, say 100k partitions and every partition under tableDir). 

It is probably useful in alterTableRecoverPartitions() where we could add large number of partitions in a single batch, but probably not in alterTableAddPartitions().

Also, both these methods only add partitions in batches of MAX_PARTITION_UPDATES_PER_RPC (500). So we could ideally we repeating RPCs in preloadPermissionsCache for ceil(num_parts/ MAX_PARTITION_UPDATES_PER_RPC) times.

Thoughts?


http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557:     preloadPermissionsCache(msPartitions, permCache);
Same question as above. Could this be an overkill if size(msPartitions) is small? Say we are refreshing 10 partitions in a 100k partition table?

Looking at the listStatusIterator() Impl, it seems to be using some RemoteIterator impl that does batching of RPCs. I couldn't figure out what is the basis for batching, but I get a feeling that the number of RPCs could be > 1.  

Additionally, we create a bunch of temporary objects for each partition and that could affect GC

Just curious if I'm getting it right and what your thoughts are.



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 06:34:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 3: Code-Review+2

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557:   }
> I added a simple heuristic here that compares size(msPartitions) vs the num
That looks reasonable to me.


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1566
PS3, Line 1566: void
nit: Java-style comment: Returning FsPermissionCache  (vs) passing the output param by reference. 

I don't see this discussed in the Google Java-style guide, but I have a feeling that the former is preferred. The latter appears more C++ style to me.


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS3, Line 45: FsPermissionChecker checker = FsPermissionChecker.getInstance();
Make it a static class member may be?



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 27 Jul 2018 06:41:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/143/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:11:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Aug 2018 00:52:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/37/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 19:40:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Reviewed-on: http://gerrit.cloudera.org:8080/11027
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 218 insertions(+), 59 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 9
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/25/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 00:53:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
> qq. What is the use of having this as a part of map's key? Isn't it already
Technically not all paths need to be fully-qualified. I'll see if we can restrict this function to only take fully-qualified paths, or ensure canonicalization, etc.



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 16:08:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2943/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Aug 2018 17:10:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@957
PS2, Line 957:     preloadPermissionsCache(msPartitions, permCache);
> Looking at the consumers of this, alterTableAddPartitions() and alterTableR
Good point about the incremental updates. Let me take this back to the drawing board and think about that.



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 16:32:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11027

to look at the new patch set (#2).

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 205 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Tianyi Wang, Impala Public Jenkins, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11027

to look at the new patch set (#4).

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................

IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Prior to this patch, when a table is first loaded, the catalog iterated
over each of the partition directories and called getFileStatus() on
each, serially, to determine the overall access level of the table.

In some testing, each such call took 1-2ms, so this could add many
seconds to the overall table load time for a table with thousands of
partitions and also add to the NN load.

This patch adds some batch pre-fetching of file status information: for
any parent directory which contains more than one partition, we use the
listStatus() API to fetch the FileStatus objects in bulk.

A new unit test verifies the number of API calls made to the NameNode
during a table load.

Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
M fe/src/main/java/org/apache/impala/util/FsPermissionChecker.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
4 files changed, 218 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/11027/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@857
PS1, Line 857:         HdfsPartition partition = createPartition(msPartition.getSd(), msPartition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1005
PS1, Line 1005: );
> Can you add the table name too.
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1015
PS1, Line 1015:    * Throws CatalogException if one of the supplied storage descriptors contains metadata that
> nit: longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1026
PS1, Line 1026:       HdfsPartition hdfsPartition = createPartition(partition.getSd(), partition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1625
PS1, Line 1625:       HdfsPartition partition = createPartition(msPartition.getSd(), msPartition, permCache);
> longline
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1675
PS1, Line 1675: );
> , ioe);
Done


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@40
PS1, Line 40: FileSystem
> Technically not all paths need to be fully-qualified. I'll see if we can re
I looked through the source for FileStatus.getPath() and it seems to "do the right thing" here -- if the path that we listed is qualified, it returns a similarly-qualified path on the resulting children, so things should match up appropriately without having to include the FileSystem here.


http://gerrit.cloudera.org:8080/#/c/11027/1/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS1, Line 45: if (perms != null) {
            :       return perms;
            :     }
> single line.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:38:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 8: Code-Review+2

Forwarding +2


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Aug 2018 21:39:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 1:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/25/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Mon, 23 Jul 2018 23:58:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1566
PS3, Line 1566: void
> nit: Java-style comment: Returning FsPermissionCache  (vs) passing the outp
Done


http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java
File fe/src/main/java/org/apache/impala/util/FsPermissionCache.java:

http://gerrit.cloudera.org:8080/#/c/11027/3/fe/src/main/java/org/apache/impala/util/FsPermissionCache.java@45
PS3, Line 45: FsPermissionChecker checker = FsPermissionChecker.getInstance();
> Make it a static class member may be?
this is already just an accessor for a static instance in FsPermissionChecker, so there isn't really any benefit to "caching" the result



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Aug 2018 22:20:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2943/


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Aug 2018 20:26:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 2:

Build Started https://jenkins.impala.io/job/gerrit-code-review-checks/37/ 

Running initial code review checks. This is experimental - please report any issues to tarmstrong@cloudera.com or on this JIRA: IMPALA-7317


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 18:46:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/69/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Jul 2018 02:22:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11027 )

Change subject: IMPALA-7320. Avoid calling getFileStatus() for each partition when table is loaded
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/11027/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1557
PS2, Line 1557:     preloadPermissionsCache(msPartitions, permCache);
> Same question as above. Could this be an overkill if size(msPartitions) is 
I added a simple heuristic here that compares size(msPartitions) vs the number of existing partitions, and only uses this trick if the number of new partitions is 3x the number of existing partitions. That should catch RECOVER PARTITIONS use case when basically constructing the entire hierarchy from an existing table path, but not affect "alter table add" or RECOVER when it just adds a few. Let me know what you think



-- 
To view, visit http://gerrit.cloudera.org:8080/11027
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e5ebc214d6620d165e13f8cc80f8fdda100734
Gerrit-Change-Number: 11027
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Jul 2018 01:42:03 +0000
Gerrit-HasComments: Yes