Posted to issues@sentry.apache.org by "Misha Dmitriev (JIRA)" <ji...@apache.org> on 2017/06/21 18:09:00 UTC

[jira] [Created] (SENTRY-1811) Optimize data structures used in HDFS sync

Misha Dmitriev created SENTRY-1811:
--------------------------------------

             Summary: Optimize data structures used in HDFS sync
                 Key: SENTRY-1811
                 URL: https://issues.apache.org/jira/browse/SENTRY-1811
             Project: Sentry
          Issue Type: Improvement
    Affects Versions: 1.8.0, sentry-ha-redesign
            Reporter: Misha Dmitriev


We obtained a heap dump from the JVM running Hive Metastore, taken while a Sentry HDFS sync operation was in progress. I analyzed this dump with jxray (www.jxray.com) and found that more than 19% of memory is wasted on empty or suboptimally sized Java collections:

{code}
9. BAD COLLECTIONS

Total collections: 54,057,249  Bad collections: 31,569,606  Overhead: 5,292,821K (19.3%)
{code}

Most of these collections come from the Thrift classes used by the Sentry plugin, see below. The associated memory waste can be significantly reduced or eliminated if these collections are allocated lazily, and with an initial capacity smaller than the default of 16 elements for HashMap/HashSet.

{code}
  1,869,023K (6.8%): j.u.HashSet: 3388670 of 1-elem 979,537K (3.6%), 5897806 of empty 552,919K (2.0%), 1010321 of small 336,566K (1.2%)
     <-- org.apache.sentry.hdfs.service.thrift.TPathEntry.children <--  {j.u.HashMap}.values <-- org.apache.sentry.hdfs.service.thrift.TPathsDump.nodeMap <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathsDump <-- Java Local@7fea0851c360 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  1,190,050K (4.3%): j.u.HashMap: 3382765 of 1-elem 898,546K (3.3%), 1005341 of small 291,503K (1.1%)
     <-- org.apache.sentry.hdfs.HMSPaths$Entry.children <-- org.apache.sentry.hdfs.HMSPaths$Entry.{parent} <--  {j.u.HashSet} <--  {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath <-- org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <-- org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030 (org.apache.sentry.hdfs.MetastorePlugin)
  969,442K (3.5%): j.u.TreeSet: 5907188 of 1-elem 969,148K (3.5%)
     <-- org.apache.sentry.hdfs.service.thrift.TPathEntry.authzObjs <--  {j.u.HashMap}.values <-- org.apache.sentry.hdfs.service.thrift.TPathsDump.nodeMap <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathsDump <-- Java Local@7fea0851c360 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  487,690K (1.8%): j.u.TreeSet: 4801877 of empty 487,690K (1.8%)
     <-- org.apache.sentry.hdfs.HMSPaths$Entry.authzObjs <-- org.apache.sentry.hdfs.HMSPaths$Entry.{parent} <--  {j.u.HashSet} <--  {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath <-- org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <-- org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030 (org.apache.sentry.hdfs.MetastorePlugin)
  415,064K (1.5%): j.u.HashMap: 5897806 of empty 414,689K (1.5%)
     <-- org.apache.sentry.hdfs.HMSPaths$Entry.children <--  {j.u.HashSet} <--  {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath <-- org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <-- org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030 (org.apache.sentry.hdfs.MetastorePlugin)
{code}
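
The lazy, small-capacity allocation suggested above could look roughly like the sketch below. It is only an illustration: LazyNode and its methods are hypothetical names, not the actual TPathEntry or HMSPaths$Entry code, and the real change would have to go through the Thrift-generated classes. The field stays null until the first child is added, so an empty node allocates no collection at all, and the set is then created with a small initial capacity instead of the default.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a path-tree node; not the actual Sentry class. */
final class LazyNode {
    // null means "no children yet", so an empty node allocates no collection.
    private Set<String> children;

    void addChild(String name) {
        if (children == null) {
            // Small initial capacity: the backing table starts far smaller
            // than it would with the default capacity of 16.
            children = new HashSet<>(1);
        }
        children.add(name);
    }

    /** Read-only view; returns a shared immutable empty set when nothing was added. */
    Set<String> getChildren() {
        return children == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(children);
    }

    /** Exposed only so the lazy behavior can be observed in this sketch. */
    boolean hasAllocatedChildren() {
        return children != null;
    }
}
```

Readers of getChildren() never see null, so callers that only iterate would not need to change. Whether a capacity of 1 is the right choice depends on the actual size distribution. The dump above suggests most of these collections hold zero or one element.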



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)