You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Pranay Singh (Code Review)" <ge...@cloudera.org> on 2018/06/02 06:15:05 UTC

[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

Pranay Singh has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10587


Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
......................................................................

IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

  The problem is that while inserting a  new row to an unpartitioned HDFS table,
  or to an existing partition, the catalogd makes unecessary call to Hive Meta
  Store to do a getTable() and list the partitions of the table, when in fact
  only the file metadata for HDFS tables needs to be updated.

  This extra call to HMS introduces a point of failure, we need to handle
  for error scenario when Hive MetaStore crashes. This change removes the
  extra call to Hive Meta Store for the case when a row is inserted to an
  existing partition in HDFS table or when a row is added to unpartitioned
  table. Thus, an optimization as it reduces the call to Hive Meta Store
  during Update of Catalog.

  Testing: Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 34 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh

[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

Posted by "Pranay Singh (Code Review)" <ge...@cloudera.org>.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10
PS2, Line 10: unecessary
> nit: check spelling and remove extra spaces.
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9: unpartitioned HDFS table,
           :   or to an existing partition
> simplify to just HDFS table, or is there some case that's being excluded?
Done


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21
PS2, Line 21:   Testing: Ran core test without failure.
> Is there any new scenario that you want to test for this change? If existin
Yes I want to test the scenario where concurrent removal
of partition and insertion of row to the same partition happens.

a) Impala is inserting the row to an existing partition
b) Hive is removing the partition at the same time.


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
> needs a comment.
Done


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
> needs a comment.
Added the comment


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365
PS2, Line 1365:       loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
> any way to refactor this function so you don't need to duplicate this code 
Refactored the code and added a new function to reload the partition.


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640
PS2, Line 640:   private void loadTableMetadata(Table tbl, long newCatalogVersion,
> the arguments here are getting quite unwieldly, and it seems that the two e
Made changes to the code to use enum MetadataLoadFlag



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Jul 2018 18:27:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
nit: add a space


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9:  
remove the description block's indent


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@10
PS2, Line 10: unecessary
nit: check spelling and remove extra spaces.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@9
PS2, Line 9: unpartitioned HDFS table,
           :   or to an existing partition
simplify to just HDFS table, or is there some case that's being excluded?


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@14
PS2, Line 14: This extra call to HMS introduces a point of failure, we need to handle
            :   for error scenario when Hive MetaStore crashes
So is this change an optimization, a correctness/robustness fix, or both? Also, I don't understand the point about a failure-- pls add an example with a sequence of steps that currently leads to an error. Can add such an example to the jira.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@15
PS2, Line 15: This change removes the
            :   extra call to Hive Meta Store for the case when a row is inserted to an
            :   existing partition in HDFS table or when a row is added to unpartitioned
            :   table. Thus, an optimization as it reduces the call to Hive Meta Store
            :   during Update of Catalog.
lots of redundancy here with the prev. paragraph-- pls condense or remove.


http://gerrit.cloudera.org:8080/#/c/10587/2//COMMIT_MSG@21
PS2, Line 21:   Testing: Ran core test without failure.
Is there any new scenario that you want to test for this change? If existing tests cover it, is there any way to assert that this change is operating as expected?


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1252
PS2, Line 1252: nonewPartitionHint
needs a comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 16:19:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6994:Avoid reloading a table's HMS data for file-only operations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994:Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1365
PS2, Line 1365:       loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
any way to refactor this function so you don't need to duplicate this code from line 1429-1435 up to here? perhaps extracting another function for all of the partition-reloading code and only call it when you dont have the 'no new partitions' hint?


http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@640
PS2, Line 640:   private void loadTableMetadata(Table tbl, long newCatalogVersion,
the arguments here are getting quite unwieldly, and it seems that the two existing booleans have a positive sense ("DO reload filemetadata" and "DO reload table schema") whereas this new boolean has a negative sense "DONT reload data from HMS".

How about refactoring this slightly to use an enum and an EnumSet, something like:

static enum MetadataLoadFlag {
  RELOAD_TABLE_METADATA,
  RELOAD_PARTITION_METADATA,
  RELOAD_FILE_METADATA
}

void loadTableMetadata(Table tbl, long newCatalogVersion,
  EnumSet<MetadataLoadFlag> flags,
  Set<String> partitionsToUpdate,
  Table msTable);

.. and perhaps add Preconditions that validate that the optional msTable, partitionsToUpdate, etc params are only set when it makes sense according to the flags?


I think this would make the call sites and code a lot more understandable.



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Jul 2018 17:54:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Pranay Singh (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Balazs Jeszenszky, Todd Lipcon, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10587

to look at the new patch set (#4).

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................

IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

The problem is that while inserting a new row to an existing partition
of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary
call to Hive MetaStore to do a getTable() and list the partitions of the
table. In such a case where a row is being inserted to an existing
partition or to an unpartitioned table only the file metadata for
HDFS tables needs to be updated.

This change avoids calling the Hive Meta Store in updatePartitionsFromHms()
by providing a hint from the caller, to skip loading the partitions.

Testing:
  Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 126 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Pranay Singh (Code Review)" <ge...@cloudera.org>.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251
PS3, Line 1251:  RELOAD_PARTITION_METADATA if not set in flags results in not getting the
              :    * partition details from HiveMetaStore in updatePartitionsFromHms(). 
> nit: this double negative is a bit confusing. Perhaps it's clearer to rephr
Changed the comment, thanks for rewording it.


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254
PS3, Line 1254: or if there are no partitions in the table
> I'm not sure I follow this part. How does the caller know if there are no p
That's right the unpartitioned case doesn't have to consider this, I have removed the sentence.



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Jul 2018 18:57:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Pranay Singh (Code Review)" <ge...@cloudera.org>.
Pranay Singh has abandoned this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh (320)
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh (320)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Vuk Ercegovac (Code Review)" <ge...@cloudera.org>.
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@9
PS4, Line 9: new row
this should apply more generally to new files, not just rows? either way, the wording here is inconsistent with the title, which refers to "file-only operations".


http://gerrit.cloudera.org:8080/#/c/10587/4//COMMIT_MSG@17
PS4, Line 17: the
all? which partitions?


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1417
PS4, Line 1417: a
nit: an


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1420
PS4, Line 1420: by calling
nit: from


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@254
PS4, Line 254: reloads the table schema from Hive Meta Store
can in-line this comment directly on L259. same with the other descriptions of what each flag does when set. 

doing so will remove some redundancy in the comment.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@585
PS4, Line 585: noneOf(MetaDataLoadFlag.class);
             :         flags.add
simplify to .of(...) since we're always adding RELOAD_PARTITION_METADATA.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@663
PS4, Line 663: MetaDataLoadFlag of type 'RELOAD_TABLE_METADATA', 'RELOAD_PARTITION_METADATA',
             :    * 'RELOAD_FILE_METADATA'
replace with just flags.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@670
PS4, Line 670: msTbl
why is this pulled out now as a parameter. I see one place where its used-- what requires this change?
if its really needed, please add a comment (e.g., what does a null msTbl do?)


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3484
PS4, Line 3484: noneOf(MetaDataLoadFlag.class);
              :       flags.add(MetaDataLoadFlag.RELOAD_FILE_METADATA)
simplify to of(...) since the flag on L3485 is always included.


http://gerrit.cloudera.org:8080/#/c/10587/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3486
PS4, Line 3486:  == true
remove



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 01 Aug 2018 23:26:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Pranay Singh (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Todd Lipcon, Vuk Ercegovac, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10587

to look at the new patch set (#3).

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................

IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

The problem is that while inserting a new row to an existing partition
of a HDFS table or to an unpartitioned table, catalogd makes an unnecessary
call to Hive MetaStore to do a getTable() and list the partitions of the
table. In such a case where a row is being inserted to an existing
partition or to an unpartitioned table only the file metadata for
HDFS tables needs to be updated.

This change avoids calling the Hive Meta Store in updatePartitionsFromHms()
by providing a hint from the caller, to skip loading the partitions.

Testing:
  Ran core test without failure.

Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
2 files changed, 126 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/10587/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10587 )

Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations
......................................................................


Patch Set 3:

(3 comments)

Just left some notes on the HdfsTable side. I dont know the CatalogOpExecutor code well so probably better for someone else to take a look at that part.

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1251
PS3, Line 1251:  RELOAD_PARTITION_METADATA if not set in flags results in not getting the
              :    * partition details from HiveMetaStore in updatePartitionsFromHms(). 
nit: this double negative is a bit confusing. Perhaps it's clearer to rephrase like:

If RELOAD_PARTITION_METADATA is set in flags, the partition details will be reloaded from the HMS. In some cases, this may not be necessary, and when unset, an expensive call to the HMS can be avoided.


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1254
PS3, Line 1254: or if there are no partitions in the table
I'm not sure I follow this part. How does the caller know if there are no partitions in the table without fetching the list of partitions? In the case of an unpartitioned table (no partition keys) the code already takes care of this internally without the caller having to pass it, right?


http://gerrit.cloudera.org:8080/#/c/10587/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1418
PS3, Line 1418:  Updates the partitions of an HdfsTable so that they are in sync with the Hive
              :    * Metastore. 
this doesn't seem to be an accurate description of the function, since the function also reloads file metadata, right? Maybe the function should just be called updatePartitionMetadata? Then it's clearer that the flags specify whether we want to update the file metadata, the list of partitions, or both, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/10587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I331bb0371fde287f43a85b025b4f98cb45f3eb3c
Gerrit-Change-Number: 10587
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Balazs Jeszenszky <je...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Jul 2018 16:57:43 +0000
Gerrit-HasComments: Yes