Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/31 22:30:00 UTC

[jira] [Commented] (IMPALA-7424) Improve in-memory representation of incremental stats

    [ https://issues.apache.org/jira/browse/IMPALA-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599332#comment-16599332 ] 

ASF subversion and git services commented on IMPALA-7424:
---------------------------------------------------------

Commit d4e281b734befdd4b4d289526887d1828581925d in impala's branch refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=d4e281b ]

IMPALA-7424: Reduce in-memory footprint of incremental stats

Currently, incremental stats are stored as chunked Base64 strings in the
HMS parameters map of partition objects. When stored in the catalogd,
each of these strings is a Java 'String' object that uses UTF-16 encoding
and takes up to 2 bytes per character.
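
As a rough illustration of the per-character cost described above, the
following sketch compares the size of the same Base64 text encoded as
UTF-16 code units and as single-byte ASCII. The class name, method names
and sample chunk are hypothetical and not taken from the patch. Object
headers are not counted, and JVMs with compact strings (JDK 9 and later)
may already store such text at one byte per character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Impala: compares character-data sizes only.
final class StringFootprintSketch {
    // Bytes needed when each character is held as a UTF-16 code unit (no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed when the same text is held one byte per character.
    // This is lossless for Base64 text because its alphabet is pure ASCII.
    static int asciiBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String chunk = "SGVsbG8gaW5jcmVtZW50YWwgc3RhdHM="; // made-up sample chunk
        System.out.println("UTF-16 bytes: " + utf16Bytes(chunk));
        System.out.println("ASCII bytes:  " + asciiBytes(chunk));
    }
}
```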

This patch converts the string representation into a deflate-compressed
byte array when the partition is loaded in the catalogd, and keeps that
form when transmitting the stats to the coordinators. To maintain
backward compatibility, the persistent HMS representation of the stats
has not been modified: the incremental stats are still written back as
chunked Base64 strings when the partition state is serialized to HMS.
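
The round trip described above might look roughly like the sketch below,
which uses only java.util.zip and java.util.Base64. It is not the code
from the patch: the class name, method names and chunk size are
hypothetical, and decoding the Base64 text before compressing it is an
assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch, not Impala code.
final class CompressedStatsSketch {
    // Load path: join the Base64 chunks, decode them, deflate the raw bytes.
    static byte[] fromChunks(List<String> chunks) {
        byte[] raw = Base64.getDecoder().decode(String.join("", chunks));
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Write-back path: inflate, re-encode as Base64, split into chunks.
    static List<String> toChunks(byte[] compressed, int chunkSize) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress, not finished, and no input left: stream is cut short.
                if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated deflate stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        String encoded = Base64.getEncoder().encodeToString(out.toByteArray());
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i += chunkSize) {
            chunks.add(encoded.substring(i, Math.min(encoded.length(), i + chunkSize)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString(new byte[5000]);
        List<String> chunks = new ArrayList<>();
        chunks.add(encoded);
        byte[] compressed = fromChunks(chunks);
        System.out.println("Base64 chars: " + encoded.length()
            + ", compressed bytes: " + compressed.length);
    }
}
```

Keeping the write-back path symmetric with the load path is what lets
the on-disk format stay unchanged while only the in-memory form shrinks.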

On a real-world catalog server whose memory footprint was dominated by
incremental stats, this patch showed a ~54% end-to-end heap size
reduction and a ~79% reduction in the memory footprint of the
incremental stats data structures.

This patch also improves the way callers check whether a partition has
incremental stats: the answer is computed once and reused later. Without
the patch, we deserialize the entire incremental stats structure every
time this information is needed, which triggers a spike in working
memory usage on catalogds/Impalads.
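
The compute-once idea can be sketched as follows. This is only an
illustration under assumptions: the class, its fields and the
decodeStats() placeholder are hypothetical, and how the patch actually
derives the flag is not stated in this message.

```java
// Hypothetical sketch, not Impala code.
final class PartitionSketch {
    private final byte[] compressedStats;       // may be null when no stats exist
    private final boolean hasIncrementalStats;  // computed once at load time
    private int decodeCount = 0;                // counts expensive decodes, for illustration

    PartitionSketch(byte[] compressedStats) {
        this.compressedStats = compressedStats;
        // Assumption: a non-empty payload means the partition has incremental stats.
        this.hasIncrementalStats = compressedStats != null && compressedStats.length > 0;
    }

    // Cheap check: returns the cached flag without touching the payload.
    boolean hasIncrementalStats() {
        return hasIncrementalStats;
    }

    // Placeholder for the expensive path (decompress and deserialize).
    byte[] decodeStats() {
        decodeCount++;
        return compressedStats == null ? new byte[0] : compressedStats.clone();
    }

    int decodeCount() {
        return decodeCount;
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch(new byte[] {1, 2, 3});
        for (int i = 0; i < 1000; i++) {
            p.hasIncrementalStats();
        }
        System.out.println("decodes triggered by checks: " + p.decodeCount());
    }
}
```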

Testing: Ran core tests on the Catalog V1 implementation. Ran some manual
queries on the Catalog V2 implementation.

Change-Id: I39f02ebfa0c6e9b0baedd0d76058a1b34efb5a02
Reviewed-on: http://gerrit.cloudera.org:8080/11341
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Improve in-memory representation of incremental stats
> -----------------------------------------------------
>
>                 Key: IMPALA-7424
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7424
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>    Affects Versions: Impala 2.13.0, Impala 3.1.0
>            Reporter: bharath v
>            Assignee: bharath v
>            Priority: Major
>
> Incremental stats are stored in the HMS parameters map as plain Java Strings. This is suboptimal since the Java String class internally uses UTF-16 encoding for the underlying bytes. The idea here is to switch to a byte array representation so that we can reduce the memory usage by half (8 bits per character instead of 16). We can also compress the byte array using gzip compression and lazily decompress it when needed (typically during the incremental stats computation's finalization phase).
> A prototype of this patch on a real-world Catalog dump showed ~54% JVM heap usage reduction (end-to-end) and ~79% reduction in the heap footprint for the incremental stats.


