Posted to issues@impala.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2019/04/10 15:43:00 UTC

[jira] [Resolved] (IMPALA-7047) REFRESH on unpartitioned tables calls getBlockLocations on every file

     [ https://issues.apache.org/jira/browse/IMPALA-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved IMPALA-7047.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.2.0

Yep, thanks for catching this.

> REFRESH on unpartitioned tables calls getBlockLocations on every file
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-7047
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7047
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.13.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>              Labels: metadata
>             Fix For: Impala 3.2.0
>
>
> In HdfsTable.updateUnpartitionedTableFileMd(), the existing default Partition object is reset and a new, empty one is created. The method then calls refreshPartitionFileMetadata with this new partition, whose list of file descriptors is empty. The refresh lists the directory and, because no file is found in the empty descriptor list, makes a separate RPC to HDFS for each file to get its block locations.
> This is quite wasteful compared with using the API that returns the located statuses for the whole directory.
> Alternatively, the new Partition object should probably keep the old file descriptor list so that the incremental refresh path can work.
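
The cost difference described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Impala's code: FakeNameNode, its methods, and the three refresh functions are hypothetical stand-ins that count round trips. They are loosely modeled on a plain directory listing, a per-file block-location lookup, and a located listing. The model ignores that real HDFS returns large listings in batches, and the incremental path reuses a descriptor purely by file name, whereas a real implementation would presumably also compare modification time and length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RefreshRpcSketch {
    /** Hypothetical stand-in for an HDFS client; counts simulated RPCs. */
    static class FakeNameNode {
        private final Map<String, String> hostsByFile = new HashMap<>();
        int rpcCount = 0;

        FakeNameNode(int numFiles) {
            for (int i = 0; i < numFiles; i++) {
                hostsByFile.put("part-" + i, "host-" + (i % 3));
            }
        }

        /** One RPC: file names only, no block locations. */
        List<String> listStatus() {
            rpcCount++;
            return new ArrayList<>(hostsByFile.keySet());
        }

        /** One RPC per call: block locations for a single file. */
        String getBlockLocations(String file) {
            rpcCount++;
            return hostsByFile.get(file);
        }

        /** One RPC (batching ignored): names together with locations. */
        Map<String, String> listLocatedStatus() {
            rpcCount++;
            return new HashMap<>(hostsByFile);
        }
    }

    /** The reported behavior: refresh against an empty descriptor list. */
    static Map<String, String> refreshFromEmptyDescriptors(FakeNameNode nn) {
        return refreshIncremental(nn, new HashMap<>());
    }

    /** First suggestion: fetch located statuses for the whole directory. */
    static Map<String, String> refreshWithLocatedListing(FakeNameNode nn) {
        return nn.listLocatedStatus();
    }

    /** Second suggestion: reuse old descriptors, look up only unknown files. */
    static Map<String, String> refreshIncremental(FakeNameNode nn,
                                                  Map<String, String> oldDescriptors) {
        Map<String, String> fresh = new HashMap<>();
        for (String file : nn.listStatus()) {
            String known = oldDescriptors.get(file);
            // Simplification: a name match alone counts as "unchanged".
            fresh.put(file, known != null ? known : nn.getBlockLocations(file));
        }
        return fresh;
    }

    public static void main(String[] args) {
        FakeNameNode a = new FakeNameNode(100);
        Map<String, String> descriptors = refreshFromEmptyDescriptors(a);
        System.out.println("empty descriptor list: " + a.rpcCount + " RPCs");

        FakeNameNode b = new FakeNameNode(100);
        refreshWithLocatedListing(b);
        System.out.println("located listing:       " + b.rpcCount + " RPCs");

        FakeNameNode c = new FakeNameNode(100);
        refreshIncremental(c, descriptors);
        System.out.println("old descriptors kept:  " + c.rpcCount + " RPCs");
    }
}
```

In this model a directory of 100 files costs 101 simulated calls when the descriptor list starts out empty (one listing plus one lookup per file), and a single call with either suggested alternative.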



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)