Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/06/05 00:23:00 UTC

[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

    [ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357730#comment-17357730 ] 

ASF subversion and git services commented on IMPALA-7501:
---------------------------------------------------------

Commit bb3062197b134f33e2796fac603e3367ab8bef1a in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bb30621 ]

IMPALA-7501: Slim down partition metadata in LocalCatalog mode

In LocalCatalog mode, the coordinator caches HMS partition objects in
its local cache. An HMS partition contains many fields that Impala never
uses and some whose representation is suboptimal. Most of them are in the
StorageDescriptor:
 - the partition-level schema (list<FieldSchema>), which is never used.
 - the location string, which can be prefix-compressed since the prefix is
   usually the table location.
 - the input/outputFormat strings, which are always duplicated and can be
   represented by an integer (enum).
An experiment on a large table with 478 columns and 87320 partitions
(one non-empty file per partition) shows that more than 90% of the
memory (1.1GB) is occupied by these fields. The dominant part is
the partition-level schema, which consumed 76% of the cache.
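
To illustrate the location point, here is a minimal Java sketch (not
Impala's actual code; names and the scheme-based convention are
illustrative) of prefix-compressing a partition location against the
table location:

  final class LocationCodec {
    // Store only the suffix when the partition lives under the table dir.
    static String compress(String tableLocation, String partitionLocation) {
      return partitionLocation.startsWith(tableLocation)
          ? partitionLocation.substring(tableLocation.length())  // e.g. "/p=1"
          : partitionLocation;  // unrelated location, keep as-is
    }

    // Reconstruct the full location. Assumes uncompressed locations keep
    // their scheme (e.g. "hdfs://..."), so a leading '/' marks a suffix.
    static String expand(String tableLocation, String stored) {
      return stored.startsWith("/") ? tableLocation + stored : stored;
    }

    public static void main(String[] args) {
      String tbl = "hdfs://nn:8020/warehouse/sales";
      String part = tbl + "/year=2021/month=6";
      String slim = compress(tbl, part);  // "/year=2021/month=6"
      System.out.println(expand(tbl, slim).equals(part));  // true
    }
  }

Since the suffix is typically a few dozen bytes while the table-location
prefix repeats across all 87320 partitions, the saving scales with the
partition count.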

On the other hand, these unused or suboptimal fields are all returned in
one response from catalogd, wrapped in TPartialPartitionInfo objects that
ultimately belong to a TGetPartialCatalogObjectResponse. They dramatically
increase the serialized thrift size of the response, which is subject to
the JVM's 2GB array size limit. Fetching metadata of many partitions from
catalogd can therefore drive it into an OOM error when it hits the 2GB
limit (e.g. IMPALA-9896).
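
For context, a hedged sketch of why the 2GB limit bites (the wrapper
class is illustrative only; TSerializer and TBinaryProtocol are standard
Thrift APIs):

  import org.apache.thrift.TBase;
  import org.apache.thrift.TException;
  import org.apache.thrift.TSerializer;
  import org.apache.thrift.protocol.TBinaryProtocol;

  final class ResponseSerializer {
    // Thrift serializes a message into one contiguous byte[], and a JVM
    // array cannot exceed Integer.MAX_VALUE elements (~2GB). A response
    // carrying full HMS partition objects for tens of thousands of
    // partitions can approach that bound.
    static byte[] serialize(TBase<?, ?> response) throws TException {
      TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
      return serializer.serialize(response);
    }
  }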

This patch strips the cached HMS partition object down to the
fields that Impala actually uses. In the LocalCatalog cache, the HMS
partition object is replaced with
 - hms parameters
 - write id
 - HdfsStorageDescriptor which represents the input/output format and
   some delimiters.
 - prefix-compressed location
The hms_partition field of TPartialPartitionInfo is likewise replaced
with the corresponding extracted fields. However, CatalogHmsAPIHelper
still requires the whole HMS partition object, so the hms_partition
field is kept for that use case. To distinguish the two requirements, we
add a new field, want_hms_partition, in TTableInfoSelector. The existing
'want_partition_metadata' field now means returning the extracted
fields, while the new 'want_hms_partition' field means returning the
whole HMS partition object.
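
A self-contained, hypothetical sketch of this selection logic (the two
flag names come from this patch; all types below are stand-ins, not
Impala's generated Thrift classes):

  import java.util.Map;

  final class PartitionInfoBuilder {
    static final class Selector {
      boolean wantPartitionMetadata;  // return only the extracted fields
      boolean wantHmsPartition;       // return the whole HMS partition
    }

    static final class PartialPartitionInfo {
      Object hmsPartition;                // full HMS object (heavyweight)
      Map<String, String> hmsParameters;  // extracted fields (slim)
      long writeId;
      String locationSuffix;              // prefix-compressed location
    }

    static PartialPartitionInfo build(Selector sel, Object hmsPartition,
        Map<String, String> params, long writeId, String locationSuffix) {
      PartialPartitionInfo info = new PartialPartitionInfo();
      if (sel.wantHmsPartition) {
        // Callers like CatalogHmsAPIHelper still need the whole object.
        info.hmsPartition = hmsPartition;
      }
      if (sel.wantPartitionMetadata) {
        // The common path ships only what Impala actually uses.
        info.hmsParameters = params;
        info.writeId = writeId;
        info.locationSuffix = locationSuffix;
      }
      return info;
    }
  }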

Improvement results in the above case:
 - reduce heap usage from 1.1GB to 113.2MB and object count from 41M to 2.3M
 - reduce the response size from 1.7GB to 28.41MB.

Tests:
 - Run local-catalog related tests locally
 - Run CORE tests

Change-Id: I307e7a8193b54a7b3ab93d9ebd194766bbdbd977
Reviewed-on: http://gerrit.cloudera.org:8080/17505
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>


> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-v2
>         Attachments: impalad_dominator_tree.txt, impalad_histogram.txt
>
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit after running a production workload simulation for a couple of hours. It had 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M objects are retained by FieldSchema, which, as far as I remember, are ignored at the partition level by the Impala planner. So, with a bit of slimming down of these objects, we could make a huge dent in effective cache capacity given a fixed budget. Reducing object count should also improve GC performance (old gen GC is more closely tied to object count than size).


