Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/05/21 02:33:00 UTC

[jira] [Commented] (IMPALA-7501) Slim down metastore Partition objects in LocalCatalog cache

    [ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348942#comment-17348942 ] 

Quanlong Huang commented on IMPALA-7501:
----------------------------------------

To measure the benefits of this work on memory usage, I did a heap analysis of the case I mentioned above, i.e. a table with 478 columns and 87320 partitions (1 non-empty file per partition). The total heap usage is 1.1 GB, spread over 41M objects. The dominant consumer is the column list (i.e. list<FieldSchema>) included in the StorageDescriptor of each partition:
{code:java}
Class Name                                            |    Objects |  Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------
org.apache.hadoop.hive.metastore.api.FieldSchema      | 38,130,538 |   915,132,912 |              
java.lang.Object[]                                    |    164,431 |   157,234,704 |              
char[]                                                |    436,556 |    41,490,952 |              
byte[]                                                |     81,461 |    13,026,696 |              
java.util.HashMap                                     |    240,230 |    11,531,040 |              
java.lang.String                                      |    436,420 |    10,474,080 |              
java.util.ArrayList                                   |    319,912 |     7,677,888 |              
org.apache.hadoop.hive.metastore.api.Partition        |     79,771 |     5,743,512 |              
org.apache.hadoop.hive.metastore.api.StorageDescriptor|     79,771 |     4,467,176 |              
com.google.common.cache.LocalCache$StrongAccessEntry  |     79,822 |     3,831,456 |              
java.nio.HeapByteBuffer                               |     79,777 |     3,829,296 |              
Total: 11 of 6,920 entries; 6,909 more                | 41,029,952 | 1,200,677,088 |              
---------------------------------------------------------------------------------------------------
{code}
The dominator_tree analysis in MAT confirms that these FieldSchema objects come from the partition metadata:
{code:java}
Class Name                                                                                                       | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------------------------------------------
class org.apache.impala.service.FeCatalogManager$LocalImpl @ 0x5cce8e308                                         |            8 | 1,187,171,224 |     98.88%
'- org.apache.impala.catalog.local.CatalogdMetaProvider @ 0x5cd26a700                                            |           72 | 1,187,171,216 |     98.88%
   |- com.google.common.cache.LocalCache$LocalManualCache @ 0x5cd26a800                                          |           16 | 1,187,169,424 |     98.87%
   |  '- com.google.common.cache.LocalCache @ 0x5cd26a810                                                        |          128 | 1,187,169,408 |     98.87%
   |     |- com.google.common.cache.LocalCache$Segment[4] @ 0x5cd26a890                                          |           32 | 1,187,169,032 |     98.87%
   |     |  |- com.google.common.cache.LocalCache$Segment @ 0x5cd26aa58                                          |           80 |   296,800,352 |     24.72%
   |     |  |  |- java.util.concurrent.atomic.AtomicReferenceArray @ 0x60f98aad0                                 |           16 |       131,104 |      0.01%
   |     |  |  |- com.google.common.cache.LocalCache$StrongAccessEntry @ 0x610d3b750                             |           48 |        14,968 |      0.00%
   |     |  |  |  |- com.google.common.cache.LocalCache$WeightedStrongValueReference @ 0x610d3b798               |           24 |        14,896 |      0.00%
   |     |  |  |  |  '- org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl @ 0x610d3b7b0 |           40 |        14,872 |      0.00%
   |     |  |  |  |     |- org.apache.hadoop.hive.metastore.api.Partition @ 0x5fea12cf0                          |           72 |        14,576 |      0.00%
   |     |  |  |  |     |  |- org.apache.hadoop.hive.metastore.api.StorageDescriptor @ 0x5fea12e80               |           56 |        14,072 |      0.00%
   |     |  |  |  |     |  |  |- java.util.ArrayList @ 0x5fea12eb8                                               |           24 |        13,424 |      0.00%
   |     |  |  |  |     |  |  |  '- java.lang.Object[478] @ 0x5fea220f0                                          |        1,928 |        13,400 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea22878            |           24 |            24 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea22890            |           24 |            24 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea228a8            |           24 |            24 |      0.00%
{code}
Each FieldSchema object takes 24 bytes of shallow heap, and there are 38,130,538 of them. Together they consume 915,132,912 bytes, about 76% of the 1.1 GB heap.
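The arithmetic behind these figures can be rechecked with a small sketch. It uses only numbers from the histogram above; the class and method names are made up for illustration. It also shows that the FieldSchema count is exactly the number of cached Partition objects (79,771) times the 478 columns, i.e. every cached partition carries its own full copy of the column list.

```java
import java.util.Locale;

// Illustrative only: recomputes the FieldSchema footprint from the histogram figures.
final class FieldSchemaFootprint {
  static final long CACHED_PARTITIONS = 79_771L;         // Partition objects in the histogram
  static final long COLUMNS = 478L;                      // columns in the table
  static final long FIELD_SCHEMA_SHALLOW_BYTES = 24L;    // shallow size reported by MAT
  static final long TOTAL_HEAP_BYTES = 1_200_677_088L;   // "Total" row of the histogram

  // One FieldSchema per column per cached partition.
  static long fieldSchemaCount() {
    return CACHED_PARTITIONS * COLUMNS;
  }

  // Shallow heap taken by all FieldSchema objects.
  static long fieldSchemaBytes() {
    return fieldSchemaCount() * FIELD_SCHEMA_SHALLOW_BYTES;
  }

  // Fraction of the whole heap taken by FieldSchema objects.
  static double heapShare() {
    return (double) fieldSchemaBytes() / TOTAL_HEAP_BYTES;
  }

  public static void main(String[] args) {
    // Prints: count=38130538 bytes=915132912 share=76.2%
    System.out.printf(Locale.ROOT, "count=%d bytes=%d share=%.1f%%%n",
        fieldSchemaCount(), fieldSchemaBytes(), heapShare() * 100.0);
  }
}
```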

Note that the uncompressed location strings and input/outputFormat strings of each partition also take some space.

I plan to proceed with the following optimizations:
 * Don't cache msPartition object in CatalogdMetaProvider. Replace it with the actual fields we need, including
 ** hms parameters
 ** write id 
 ** HdfsStorageDescriptor which replaces the Input/OutputFormat strings with enums and contains sufficient info like lineDelimiter, fieldDelimiter, blockSize etc.
 ** HdfsPartitionLocationCompressor$Location which prefix compresses the partition location strings.
 * Don't transmit msPartition object in TPartialPartitionInfo. Replace it with the fields mentioned above.
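
To make the idea concrete, here is a minimal sketch of what a slimmed-down cache entry could look like. Everything in it is hypothetical: SlimPartition, FileFormat, LocationCompressor and Location are illustration names, not Impala's actual HdfsStorageDescriptor or HdfsPartitionLocationCompressor classes, and the real prefix compression may work differently. The point is only that the entry holds no per-partition column list, uses an enum instead of format strings, and stores each shared location prefix once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Impala's real classes.
final class SlimPartitionSketch {
  /** Stand-in for replacing the input/outputFormat strings with an enum. */
  enum FileFormat { TEXT, PARQUET, ORC, OTHER }

  /** A location stored as an index into the shared prefix table plus its own suffix. */
  static final class Location {
    final int prefixIndex;
    final String suffix;

    Location(int prefixIndex, String suffix) {
      this.prefixIndex = prefixIndex;
      this.suffix = suffix;
    }
  }

  /** Stores each distinct parent directory once and hands out compact Location handles. */
  static final class LocationCompressor {
    private final List<String> prefixes = new ArrayList<>();
    private final Map<String, Integer> prefixToIndex = new HashMap<>();

    Location compress(String location) {
      // Split at the last '/'; without a '/', the prefix is empty and the suffix is the whole string.
      int slash = location.lastIndexOf('/');
      String prefix = location.substring(0, slash + 1);
      String suffix = location.substring(slash + 1);
      Integer index = prefixToIndex.get(prefix);
      if (index == null) {
        index = prefixes.size();
        prefixes.add(prefix);
        prefixToIndex.put(prefix, index);
      }
      return new Location(index, suffix);
    }

    String decompress(Location location) {
      return prefixes.get(location.prefixIndex) + location.suffix;
    }

    int distinctPrefixes() {
      return prefixes.size();
    }
  }

  /** Only the fields listed above; no FieldSchema list per partition. */
  static final class SlimPartition {
    final Map<String, String> hmsParameters;
    final long writeId;
    final FileFormat fileFormat;
    final Location location;

    SlimPartition(Map<String, String> hmsParameters, long writeId,
        FileFormat fileFormat, Location location) {
      this.hmsParameters = hmsParameters;
      this.writeId = writeId;
      this.fileFormat = fileFormat;
      this.location = location;
    }
  }

  public static void main(String[] args) {
    LocationCompressor compressor = new LocationCompressor();
    List<SlimPartition> partitions = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      // Made-up paths; all partitions share the same table directory.
      Location loc = compressor.compress("hdfs://nn/warehouse/t/p=" + i);
      partitions.add(new SlimPartition(
          Collections.<String, String>emptyMap(), -1L, FileFormat.PARQUET, loc));
    }
    System.out.println("distinct prefixes: " + compressor.distinctPrefixes());
    System.out.println(compressor.decompress(partitions.get(2).location));
  }
}
```

In this sketch, three partitions under the same table directory share a single prefix string, and each entry keeps only its short suffix.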

These can be done together. The first reduces memory usage on the coordinator. The second reduces the size of the thrift object transferred from catalogd to the coordinator, which currently can easily cause an OOM error by exceeding the Java array size limit (2GB).

CC [~vihangk1], [~amansinha]

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-v2
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit after running a production workload simulation for a couple hours. It had 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M objects are retained by FieldSchema, which, as far as I remember, are ignored on the partition level by the Impala planner. So, with a bit of slimming down of these objects, we could make a huge dent in effective cache capacity given a fixed budget. Reducing object count should also have the effect of improved GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
