Posted to issues@hive.apache.org by "Misha Dmitriev (JIRA)" <ji...@apache.org> on 2017/08/03 01:06:00 UTC

[jira] [Updated] (HIVE-17237) HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters

     [ https://issues.apache.org/jira/browse/HIVE-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated HIVE-17237:
----------------------------------
    Attachment: HIVE-17237.01.patch

> HMS wastes 26.4% of memory due to dup strings in metastore.api.Partition.parameters
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-17237
>                 URL: https://issues.apache.org/jira/browse/HIVE-17237
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>         Attachments: HIVE-17237.01.patch
>
>
> I've analyzed a heap dump from a production Hive installation using jxray (www.jxray.com). It turns out that there are a lot of duplicate strings in memory, which waste 26.4% of the heap. Most of them come from HashMaps referenced by org.apache.hadoop.hive.metastore.api.Partition.parameters. The relevant section of the jxray report is included below.
> Looking at Partition.java, I see that somebody has already added code to intern the keys and values in the parameters table when it is first set up. However, key-value pairs added later are not interned, which probably explains all these duplicate strings. Also, when a Partition instance is deserialized, its parameters are currently not interned.
> {code}
> 6. DUPLICATE STRINGS
> Total strings: 3,273,557  Unique strings: 460,390  Duplicate values: 110,232  Overhead: 3,220,458K (26.4%)
> ....
> ===================================================
> 7. REFERENCE CHAINS FOR DUPLICATE STRINGS
>   2,326,150K (19.1%), 597058 dup strings (36386 unique), 597058 dup backing arrays:
> 39949 of "-1", 39088 of "true", 28959 of "8", 20987 of "1", 18437 of "10", 9583 of "9", 5908 of "269664", 5691 of "174528", 4598 of "133980", 4598 of "BgUGBQgFCAYFCgYIBgUEBgQHBgUGCwYGBwYHBgkKBwYGBggIBwUHBgYGCgUJCQUG ...[length 3560]"
> ... and 419200 more strings, of which 36376 are unique
> Also contains one-char strings: 217 of "6", 147 of "7", 91 of "4", 28 of "5", 28 of "2", 21 of "0"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
>   463,060K (3.8%), 119644 dup strings (34075 unique), 119644 dup backing arrays:
> 7914 of "true", 7912 of "-1", 6578 of "8", 5606 of "1", 2302 of "10", 1626 of "174528", 1223 of "9", 970 of "171680", 837 of "269664", 657 of "133980"
> ... and 84009 more strings, of which 34065 are unique
> Also contains one-char strings: 42 of "7", 31 of "6", 20 of "4", 8 of "5", 5 of "2", 3 of "0"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.TreeMap}.values <-- Java Local (j.u.TreeMap) [@6f084afa0,@73aac9e68]
>   233,384K (1.9%), 64601 dup strings (27295 unique), 64601 dup backing arrays:
> 4472 of "true", 4173 of "-1", 3798 of "1", 3591 of "8", 813 of "174528", 684 of "10" ... and 44568 more strings, of which 27285 are unique
> Also contains one-char strings: 305 of "7", 301 of "0", 277 of "4", 146 of "6", 29 of "2", 23 of "5", 19 of "9", 2 of "3"
>      <--  {j.u.HashMap}.values <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- Java Local (j.u.ArrayList) [@4f4cfbd10,@536122408,@726616778]
> ...
>   52,916K (0.4%), 597058 dup strings (16 unique), 597058 dup backing arrays:
>      <--  {j.u.HashMap}.keys <-- org.apache.hadoop.hive.metastore.api.Partition.parameters <--  {j.u.ArrayList} <-- org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.success <-- Java Local (org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result) [@6e33618d8,@6eedb9a80,@6eedbad68,@6eedbc788] ... and 3 more GC roots
> {code}
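
For illustration only, here is a minimal sketch of the kind of interning the description refers to: interning keys and values both when a whole parameters map is copied (for example after deserialization) and when a single pair is added later. This is not the content of HIVE-17237.01.patch. The class name ParameterInterning and its methods are hypothetical, and the sketch assumes plain String.intern() is an acceptable interning mechanism.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not the code in HIVE-17237.01.patch. */
final class ParameterInterning {

    private ParameterInterning() {}

    /** Returns the canonical (interned) instance of s, or null if s is null. */
    static String intern(String s) {
        return s == null ? null : s.intern();
    }

    /** Copies a map, interning every key and value (e.g. after deserialization). */
    static Map<String, String> internAll(Map<String, String> source) {
        if (source == null) {
            return null;
        }
        Map<String, String> result = new HashMap<>(source.size());
        for (Map.Entry<String, String> e : source.entrySet()) {
            result.put(intern(e.getKey()), intern(e.getValue()));
        }
        return result;
    }

    /** Adds one pair, interning both key and value (the later-added-pair case). */
    static void putInterned(Map<String, String> params, String key, String value) {
        params.put(intern(key), intern(value));
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        // new String(...) forces distinct objects, as deserialization would.
        putInterned(a, new String("numFiles"), new String("8"));
        putInterned(b, new String("numFiles"), new String("8"));
        // Both maps now reference the same String instance for the value.
        System.out.println(a.get("numFiles") == b.get("numFiles")); // prints true
    }
}
```

With values such as "-1", "true" and "8" repeated tens of thousands of times across Partition instances, sharing one instance per distinct value is what would remove the duplicate backing arrays listed in the report.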



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)