Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/03/23 11:14:42 UTC

[jira] [Assigned] (TEZ-972) Shuffle Phase - optimize memory usage of empty partition data in DataMovementEvent

     [ https://issues.apache.org/jira/browse/TEZ-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan reassigned TEZ-972:
------------------------------------

    Assignee: Rajesh Balamohan

> Shuffle Phase - optimize memory usage of empty partition data in DataMovementEvent
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-972
>                 URL: https://issues.apache.org/jira/browse/TEZ-972
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>
> Empty partition details are stored in a byte[] in compressed format and sent via DataMovementEvent in the shuffle phase.  Quick standalone tests reveal that a BitSet would be more efficient than compressing the byte[].
> PartitionSize=1 , BitSetSize=1 , CompressedBitSetSize=9 , NormalByteArrayCompressed=9
> PartitionSize=101 , BitSetSize=13 , CompressedBitSetSize=22 , NormalByteArrayCompressed=42
> PartitionSize=201 , BitSetSize=26 , CompressedBitSetSize=37 , NormalByteArrayCompressed=62
> PartitionSize=301 , BitSetSize=38 , CompressedBitSetSize=49 , NormalByteArrayCompressed=76
> ..
> PartitionSize=1001 , BitSetSize=126 , CompressedBitSetSize=137 , NormalByteArrayCompressed=197
> ..
> PartitionSize=2001 , BitSetSize=251 , CompressedBitSetSize=262 , NormalByteArrayCompressed=374
> PartitionSize=4001 , BitSetSize=501 , CompressedBitSetSize=512 , NormalByteArrayCompressed=686
> PartitionSize=8001 , BitSetSize=1001 , CompressedBitSetSize=1012 , NormalByteArrayCompressed=1330
> PartitionSize=16001 , BitSetSize=2001 , CompressedBitSetSize=1979 , NormalByteArrayCompressed=2569
> PartitionSize=32001 , BitSetSize=4001 , CompressedBitSetSize=3885 , NormalByteArrayCompressed=5000
> This is based on treating random bit positions as empty partitions.
> It is not possible to use JDK 1.6's BitSet directly, as it does not support the valueOf() and toByteArray() functions.  The suggestion is to have a Tez-specific BitSet (until Tez moves to JDK 1.7) and to make the compression a job configuration option.
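
For illustration only, here is a minimal sketch of how such a comparison could be run on JDK 1.6, where BitSet has no valueOf()/toByteArray(). It is not the code from this issue or from Tez. The class name EmptyPartitionBitSetSketch and the helpers toBytes, fromBytes and deflate are hypothetical, and the "normal" layout is assumed to be one byte per partition, which the description does not spell out. The packing uses ceil(n / 8) bytes, which is consistent with the BitSetSize values quoted above (for example 13 bytes for 101 partitions).

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical sketch: not the Tez implementation.
public class EmptyPartitionBitSetSketch {

    // Pack the first numPartitions bits into ceil(numPartitions / 8) bytes,
    // lowest bit first, using only BitSet methods available on JDK 1.6.
    public static byte[] toBytes(BitSet bits, int numPartitions) {
        byte[] out = new byte[(numPartitions + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0 && i < numPartitions;
                i = bits.nextSetBit(i + 1)) {
            out[i / 8] |= (byte) (1 << (i % 8));
        }
        return out;
    }

    // Inverse of toBytes: rebuild a BitSet from the packed bytes.
    public static BitSet fromBytes(byte[] data) {
        BitSet bits = new BitSet(data.length * 8);
        for (int i = 0; i < data.length * 8; i++) {
            if ((data[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Compress with java.util.zip.Deflater at its default settings.
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed, arbitrary choice
        int[] sizes = {1, 101, 1001, 8001, 32001};
        for (int numPartitions : sizes) {
            BitSet empty = new BitSet(numPartitions);
            // Assumption: one byte per partition, 1 meaning "empty".
            byte[] flags = new byte[numPartitions];
            for (int i = 0; i < numPartitions; i++) {
                if (random.nextBoolean()) {
                    empty.set(i);
                    flags[i] = 1;
                }
            }
            byte[] packed = toBytes(empty, numPartitions);
            System.out.println("PartitionSize=" + numPartitions
                    + " , BitSetSize=" + packed.length
                    + " , CompressedBitSetSize=" + deflate(packed).length
                    + " , NormalByteArrayCompressed=" + deflate(flags).length);
        }
    }
}
```

The exact byte counts depend on the random bit pattern and the Deflater settings, so they will not necessarily reproduce the figures quoted above.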



--
This message was sent by Atlassian JIRA
(v6.2#6252)