Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/10/18 21:33:00 UTC

[jira] [Commented] (IMPALA-7424) Improve in-memory representation of incremental stats

    [ https://issues.apache.org/jira/browse/IMPALA-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655904#comment-16655904 ] 

ASF subversion and git services commented on IMPALA-7424:
---------------------------------------------------------

Commit 5af5456a2d95a43ce63f4e364ff0b9631729bb1a in impala's branch refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5af5456 ]

IMPALA-7689: Reduce per column per partition stats estimate size

With the improvements in the incremental stats memory representation
(IMPALA-7424), the per column per partition stats estimate should be
reduced to account for the compressed memory footprint. In experiments
on various test tables, the size is down by 50-70%.

This patch reduces the size estimate by 50% (conservative). Ideally, the
Catalog server would not need to estimate during serialization, since it
can compute the byte sizes by looping through all the partitions.
However, this patch retains the current logic to keep it consistent with
"compute incremental stats" analysis.

Change-Id: I347b41d9b298d7cd73ec812692172e0511415eee
Reviewed-on: http://gerrit.cloudera.org:8080/11706
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
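The commit above contrasts two ways of sizing incremental stats: a fixed per column per partition estimate (now halved) and an exact count taken by looping over every partition. A minimal sketch of that arithmetic follows. It is not Impala's code: the class name, method names, and the 400-byte starting constant are hypothetical, chosen only to show the shape of the calculation.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Impala's code): compares a fixed
 * per-column-per-partition size estimate with an exact byte count
 * obtained by looping over all partitions.
 */
public class IncrementalStatsSizeSketch {
  // Hypothetical per-column-per-partition estimate before the change;
  // NOT Impala's actual constant.
  static final long OLD_BYTES_PER_COL_PER_PARTITION = 400;
  // Halved, mirroring the conservative 50% reduction the commit describes.
  static final long NEW_BYTES_PER_COL_PER_PARTITION = OLD_BYTES_PER_COL_PER_PARTITION / 2;

  /** Estimated incremental stats size for a table: partitions x columns x bytes. */
  static long estimate(long numPartitions, long numColumns, long bytesPerColPerPartition) {
    return numPartitions * numColumns * bytesPerColPerPartition;
  }

  /** Exact size: sums the (compressed) stats bytes actually held by each partition. */
  static long exactSize(List<byte[]> perPartitionStats) {
    long total = 0;
    for (byte[] stats : perPartitionStats) {
      total += stats.length;
    }
    return total;
  }

  public static void main(String[] args) {
    long partitions = 10_000;
    long columns = 50;
    System.out.println("old estimate: " + estimate(partitions, columns, OLD_BYTES_PER_COL_PER_PARTITION));
    System.out.println("new estimate: " + estimate(partitions, columns, NEW_BYTES_PER_COL_PER_PARTITION));
    // Two toy partitions whose stats blobs are 120 and 80 bytes long.
    System.out.println("exact size:   " + exactSize(Arrays.asList(new byte[120], new byte[80])));
  }
}
```

With these made-up numbers the program prints 200000000 for the old estimate, 100000000 for the halved one, and 200 for the exact toy count. The estimate costs O(1) but can drift from reality; the exact loop is O(partitions) but reflects the real compressed footprint, which is the trade-off the commit message mentions.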


> Improve in-memory representation of incremental stats
> -----------------------------------------------------
>
>                 Key: IMPALA-7424
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7424
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>    Affects Versions: Impala 2.13.0, Impala 3.1.0
>            Reporter: bharath v
>            Assignee: bharath v
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> Incremental stats are stored in the HMS parameters map as plain Java Strings. This is suboptimal because the Java String class internally uses UTF-16 encoding, i.e. two bytes per character. The idea here is to switch to a byte array representation, which halves the memory usage (one byte per character instead of two). We can also compress the byte array using gzip and lazily decompress it only when needed (typically during the finalization phase of the incremental stats computation).
> A prototype of this patch on a real-world Catalog dump showed ~54% JVM heap usage reduction (end-to-end) and ~79% reduction in the heap footprint for the incremental stats.
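The representation described in the issue (a byte array instead of a String, gzip-compressed, decompressed only when needed) can be sketched with the JDK's java.util.zip classes. This is an illustration under stated assumptions, not the actual patch: CompressedStatsValue and its methods are hypothetical names, and the UTF-16 size printed below counts only the char payload of a Java 8 String (Java 9+ compact strings store Latin-1 text at one byte per character).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: keeps a stats string only as gzip-compressed UTF-8
 * bytes and decompresses it on demand, so the heap holds the compressed form.
 */
public final class CompressedStatsValue {
  private final byte[] compressed;

  public CompressedStatsValue(String stats) {
    this.compressed = gzip(stats.getBytes(StandardCharsets.UTF_8));
  }

  public int compressedSize() {
    return compressed.length;
  }

  /** Decompresses on every call; nothing is cached, so memory stays compressed. */
  public String get() {
    return new String(gunzip(compressed), StandardCharsets.UTF_8);
  }

  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray(); // read after close() so the gzip trailer is flushed
  }

  static byte[] gunzip(byte[] input) {
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(input))) {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gz.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Toy, highly repetitive "stats" text; real stats payloads will differ.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("ndv=").append(i % 7).append(";nulls=0;");
    }
    String stats = sb.toString();
    CompressedStatsValue value = new CompressedStatsValue(stats);
    // Java 8 char[] payload only (2 bytes per char), ignoring object headers.
    System.out.println("UTF-16 bytes: " + 2L * stats.length());
    System.out.println("UTF-8 bytes:  " + stats.getBytes(StandardCharsets.UTF_8).length);
    System.out.println("gzip bytes:   " + value.compressedSize());
    System.out.println("round trip:   " + stats.equals(value.get()));
  }
}
```

Not caching the decoded string is a deliberate choice in this sketch: a cached copy would bring back the uncompressed footprint the change is meant to remove. The trade-off is paying the decompression cost each time the stats are read, which fits the issue's note that they are typically needed only during finalization.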



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)