Posted to issues@hive.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2015/06/03 22:03:38 UTC

[jira] [Commented] (HIVE-10068) LLAP: adjust allocation after decompression

    [ https://issues.apache.org/jira/browse/HIVE-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571562#comment-14571562 ] 

Sergey Shelukhin commented on HIVE-10068:
-----------------------------------------

Update from some test runs on TPCDS and TPCH queries. We waste around 15% of allocated memory due to buddy allocator granularity:
{noformat}
$ sed -E "s/.*ALLOCATED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
278162046976
$ sed -E "s/.*ALLOCATED_USED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
238565954908
{noformat}
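
As a quick sanity check of the figure above, the two totals can be turned into a waste fraction. This is only arithmetic on the numbers printed by the commands; the variable names are illustrative and not taken from the LLAP code.

```python
# Totals copied from the sed/awk output above.
allocated = 278162046976  # sum of ALLOCATED_BYTES
used = 238565954908       # sum of ALLOCATED_USED_BYTES

wasted = allocated - used
waste_fraction = wasted / allocated

print(f"wasted bytes: {wasted}")
print(f"waste: {waste_fraction:.1%} of allocated memory")
```

This works out to roughly 14% of allocated memory, consistent with the "around 15%" estimate.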

Some of that is obviously unavoidable, but some could be avoided by implementing this. However, it's not as bad as I expected (bad results can be seen on very small datasets where stripes/RGs are routinely smaller than the compression block size).

> LLAP: adjust allocation after decompression
> -------------------------------------------
>
>                 Key: HIVE-10068
>                 URL: https://issues.apache.org/jira/browse/HIVE-10068
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>
> We don't know the decompressed size of a compression buffer in ORC; all we know is the file-level compression buffer size. For many files, compression buffers can be smaller than that because of compact encoding, or because a compression block ends for other reasons (different streams, etc.; "present" streams, for example, are very small).
> BuddyAllocator should be able to accept back parts of the allocated memory (e.g. allocate 256KB with a minimum allocation of 32KB, decompress 45KB, return the last 192KB as 64+128KB). For generality (this depends on implementation), we can make an API like "offer", and the allocator can decide to take back however much it can.
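
The 256/45/64+128 example in the description can be sketched as follows. This is a hypothetical illustration, not the Hive BuddyAllocator API: the function name `offer_tail` is made up, sizes are assumed to be powers of two, and the kept region is assumed to be rounded up to the next power of two (at least the minimum allocation) so that it remains a valid buddy block.

```python
def offer_tail(allocated, min_alloc, used):
    """Hypothetical helper: split an allocation into a kept prefix and
    buddy-aligned tail blocks that could be offered back to the allocator.

    Assumes allocated and min_alloc are powers of two.
    Returns (kept_size, [(offset, size), ...]).
    """
    if not 0 < used <= allocated:
        raise ValueError("used must be in (0, allocated]")

    # Assumption: keep the smallest power-of-two block that fits the data.
    kept = min_alloc
    while kept < used:
        kept *= 2

    # Walk the tail; at each offset the largest aligned power-of-two block
    # has the size of the lowest set bit of the offset.
    blocks = []
    offset = kept
    while offset < allocated:
        size = offset & -offset
        blocks.append((offset, size))
        offset += size
    return kept, blocks


KB = 1024
kept, blocks = offer_tail(256 * KB, 32 * KB, 45 * KB)
print(kept // KB, [(o // KB, s // KB) for o, s in blocks])
```

For the numbers in the description, this keeps 64KB and offers back a 64KB block at offset 64KB and a 128KB block at offset 128KB, i.e. the last 192KB as 64+128KB. A real allocator could choose to accept only some of these blocks, which is the point of an "offer"-style API.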



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)