Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/06/03 07:18:00 UTC

[jira] [Created] (IMPALA-10727) Share identical CachedHmsPartitionDescriptor across HdfsPartitions

Quanlong Huang created IMPALA-10727:
---------------------------------------

             Summary: Share identical CachedHmsPartitionDescriptor across HdfsPartitions
                 Key: IMPALA-10727
                 URL: https://issues.apache.org/jira/browse/IMPALA-10727
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Quanlong Huang


In catalogd, we keep one CachedHmsPartitionDescriptor for each HdfsPartition. Many of its fields could be identical across partitions, e.g. sdBucketCols, sdSortCols. We can instead keep the distinct {{CachedHmsPartitionDescriptor}} instances in HdfsTable and share them among its HdfsPartitions. Fields that differ across partitions, e.g. msCreateTime, msLastAccessTime, can be moved to HdfsPartition.

https://github.com/apache/impala/blob/1a84a1420c5d517f43e4c7e90ee204db30f27d57/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L543
{code:java}
  // TODO: Cache this descriptor in HdfsTable so that identical descriptors are shared
  // between HdfsPartition instances.
  // TODO: sdInputFormat and sdOutputFormat can be mutated by Impala when the file format
  // of a partition changes; move these fields to HdfsPartition.
  private static class CachedHmsPartitionDescriptor {
{code}
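
A rough sketch of the sharing idea, not the actual Impala change. All names below (SharedDescriptorSketch, Descriptor, Table, Partition, intern) are hypothetical stand-ins, and sdSortCols is simplified to a list of strings. It assumes the shared descriptor is immutable and compared by value, so the table can keep one instance per distinct value and hand it to every partition that needs it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these names come from the Impala code base.
class SharedDescriptorSketch {

  // Immutable stand-in for the shareable part of CachedHmsPartitionDescriptor.
  static final class Descriptor {
    final List<String> sdBucketCols;
    final List<String> sdSortCols;  // simplified: plain strings for illustration

    Descriptor(List<String> sdBucketCols, List<String> sdSortCols) {
      // Defensive copies keep the descriptor safe to share.
      this.sdBucketCols = Collections.unmodifiableList(new ArrayList<>(sdBucketCols));
      this.sdSortCols = Collections.unmodifiableList(new ArrayList<>(sdSortCols));
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof Descriptor)) return false;
      Descriptor other = (Descriptor) o;
      return sdBucketCols.equals(other.sdBucketCols)
          && sdSortCols.equals(other.sdSortCols);
    }

    @Override
    public int hashCode() {
      return Objects.hash(sdBucketCols, sdSortCols);
    }
  }

  // Stand-in for HdfsTable: owns one instance per distinct descriptor value.
  static final class Table {
    private final Map<Descriptor, Descriptor> pool = new HashMap<>();

    // Returns the pooled instance equal to 'd', adding 'd' if none exists yet.
    Descriptor intern(Descriptor d) {
      Descriptor existing = pool.putIfAbsent(d, d);
      return existing != null ? existing : d;
    }

    int distinctDescriptors() {
      return pool.size();
    }
  }

  // Stand-in for HdfsPartition: per-partition fields live here, the rest is shared.
  static final class Partition {
    final Descriptor descriptor;
    final int msCreateTime;
    final int msLastAccessTime;

    Partition(Table table, Descriptor d, int msCreateTime, int msLastAccessTime) {
      this.descriptor = table.intern(d);
      this.msCreateTime = msCreateTime;
      this.msLastAccessTime = msLastAccessTime;
    }
  }

  public static void main(String[] args) {
    Table table = new Table();
    Partition p1 = new Partition(table,
        new Descriptor(Arrays.asList("id"), Arrays.asList("ts")), 100, 200);
    Partition p2 = new Partition(table,
        new Descriptor(Arrays.asList("id"), Arrays.asList("ts")), 300, 400);
    System.out.println("same instance: " + (p1.descriptor == p2.descriptor));
    System.out.println("distinct descriptors: " + table.distinctDescriptors());
  }
}
```

This sketch never evicts pooled descriptors and is not thread-safe. A real change would also have to decide how to handle sdInputFormat and sdOutputFormat, which the second TODO above notes can be mutated by Impala.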


