Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/20 20:37:00 UTC

[jira] [Commented] (IMPALA-7453) Intern HdfsStorageDescriptors

    [ https://issues.apache.org/jira/browse/IMPALA-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586481#comment-16586481 ] 

ASF subversion and git services commented on IMPALA-7453:
---------------------------------------------------------

Commit d29300281b5d07c1ed98032536c008b076a7baa5 in impala's branch refs/heads/master from [~tlipcon]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=d293002 ]

IMPALA-7453. Intern HdfsStorageDescriptors

The number of unique HdfsStorageDescriptors in a warehouse is typically
much smaller than the number of partitions. Each object takes 32/40 bytes
(with/without compressed OOPs respectively). So, by interning these
objects, we can save that amount of memory as well as one object per
partition.
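
As a rough illustration of the interning idea described above, here is a
minimal sketch using only the JDK. The class name DescriptorSketch, its
fields, and the strong-reference pool are assumptions made for this example;
they are not taken from the patch, which may use a different mechanism (for
instance a weak interner) and a different set of fields.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical immutable value type standing in for a storage descriptor. */
final class DescriptorSketch {
  // Pool of canonical instances, keyed by themselves. This sketch holds strong
  // references, so pooled instances are never garbage collected.
  private static final ConcurrentMap<DescriptorSketch, DescriptorSketch> POOL =
      new ConcurrentHashMap<>();

  // Illustrative fields only; the real descriptor's fields are not shown here.
  private final String fileFormat;
  private final char fieldDelim;
  private final char lineDelim;
  private final int blockSize;

  private DescriptorSketch(String fileFormat, char fieldDelim, char lineDelim,
      int blockSize) {
    this.fileFormat = Objects.requireNonNull(fileFormat);
    this.fieldDelim = fieldDelim;
    this.lineDelim = lineDelim;
    this.blockSize = blockSize;
  }

  /** Returns the canonical instance for the given values, creating it if needed. */
  static DescriptorSketch of(String fileFormat, char fieldDelim, char lineDelim,
      int blockSize) {
    DescriptorSketch candidate =
        new DescriptorSketch(fileFormat, fieldDelim, lineDelim, blockSize);
    // putIfAbsent returns the previously pooled equal instance, or null if the
    // candidate was just added and is therefore the canonical one.
    DescriptorSketch existing = POOL.putIfAbsent(candidate, candidate);
    return existing != null ? existing : candidate;
  }

  /** Number of distinct descriptors currently pooled. */
  static int poolSize() {
    return POOL.size();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DescriptorSketch)) return false;
    DescriptorSketch other = (DescriptorSketch) o;
    return fileFormat.equals(other.fileFormat)
        && fieldDelim == other.fieldDelim
        && lineDelim == other.lineDelim
        && blockSize == other.blockSize;
  }

  @Override
  public int hashCode() {
    return Objects.hash(fileFormat, fieldDelim, lineDelim, blockSize);
  }
}
```

Interning is only safe when the pooled objects are immutable, since every
partition with equal settings ends up sharing the same instance.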

The overall savings aren't huge (on the order of tens of MBs) but the
change is pretty simple, so it seems worthwhile.

This patch also adds the errorprone annotations to the pom so that the
class can be annotated as Immutable and checked by errorprone, which
verifies that classes annotated as Immutable contain only immutable
fields.

I tested this change by comparing 'jmap -histo:live' on a catalogd
before/after. For my local dev environment test warehouse, I had 12055
instances (385 KB) before the change and 24 instances (768 bytes) after.
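
The histogram figures above are consistent with the 32-byte per-object size
quoted earlier (the compressed-OOPs case). A small sketch of that arithmetic
follows; the class and method names are hypothetical and exist only for this
check.

```java
/** Hypothetical helper that reproduces the byte counts from the jmap histogram. */
final class InternSavingsSketch {
  // Shallow size per instance with compressed OOPs, as stated in the commit message.
  static final long BYTES_PER_INSTANCE = 32;

  /** Total shallow bytes occupied by the given number of instances. */
  static long shallowBytes(long instances) {
    return instances * BYTES_PER_INSTANCE;
  }

  public static void main(String[] args) {
    long before = shallowBytes(12055); // instance count before the change
    long after = shallowBytes(24);     // instance count after the change
    System.out.println("before: " + before + " bytes");          // 385760
    System.out.println("after:  " + after + " bytes");           // 768
    System.out.println("saved:  " + (before - after) + " bytes"); // 384992
  }
}
```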

Change-Id: I9ef93148d629b060fa9f67c631e9c3d904a0ccf9
Reviewed-on: http://gerrit.cloudera.org:8080/11236
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Intern HdfsStorageDescriptors
> -----------------------------
>
>                 Key: IMPALA-7453
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7453
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Every partition currently has an HdfsStorageDescriptor attached. In most cases, the number of unique storage descriptors in a warehouse is pretty low (most partitions use the same escaping, file formats, etc). For example, in the functional test data load, we only have 24 unique SDs across ~10k partitions. Each object takes 32 bytes (with compressed oops) or 40 (without). So, we can get some small memory/object-count savings by interning these.


