Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/03/23 10:50:43 UTC
[jira] [Created] (TEZ-972) Shuffle Phase - optimize memory usage of
empty partition data in DataMovementEvent
Rajesh Balamohan created TEZ-972:
------------------------------------
Summary: Shuffle Phase - optimize memory usage of empty partition data in DataMovementEvent
Key: TEZ-972
URL: https://issues.apache.org/jira/browse/TEZ-972
Project: Apache Tez
Issue Type: Improvement
Reporter: Rajesh Balamohan
Empty partition details are stored in a byte[] in compressed format and sent via DataMovementEvent in the shuffle phase. Quick standalone tests reveal that a BitSet would be more efficient than compressing the byte[].
PartitionSize=1 , BitSetSize=1 , CompressedBitSetSize=9 , NormalByteArrayCompressed=9
PartitionSize=101 , BitSetSize=13 , CompressedBitSetSize=22 , NormalByteArrayCompressed=42
PartitionSize=201 , BitSetSize=26 , CompressedBitSetSize=37 , NormalByteArrayCompressed=62
PartitionSize=301 , BitSetSize=38 , CompressedBitSetSize=49 , NormalByteArrayCompressed=76
..
PartitionSize=1001 , BitSetSize=126 , CompressedBitSetSize=137 , NormalByteArrayCompressed=197
..
PartitionSize=2001 , BitSetSize=251 , CompressedBitSetSize=262 , NormalByteArrayCompressed=374
PartitionSize=4001 , BitSetSize=501 , CompressedBitSetSize=512 , NormalByteArrayCompressed=686
PartitionSize=8001 , BitSetSize=1001 , CompressedBitSetSize=1012 , NormalByteArrayCompressed=1330
PartitionSize=16001 , BitSetSize=2001 , CompressedBitSetSize=1979 , NormalByteArrayCompressed=2569
PartitionSize=32001 , BitSetSize=4001 , CompressedBitSetSize=3885 , NormalByteArrayCompressed=5000
These numbers assume that randomly chosen bit positions are marked as empty partitions.
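
A comparison along these lines can be sketched in Java as follows. The original test code is not part of this report, so the class name, the use of java.util.zip.Deflater at its default level, the fixed seed, and the roughly 50% fill ratio are all assumptions, and the printed sizes will not match the table exactly. The sketch calls BitSet.toByteArray(), so it needs JDK 1.7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical harness; not the reporter's actual test code.
class EmptyPartitionSizes {

    // Compress the input with Deflater (default level) and return the result.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Returns {bitSetSize, compressedBitSetSize, compressedByteArraySize} in bytes.
    static int[] measure(int partitions, long seed) {
        Random random = new Random(seed);
        BitSet bits = new BitSet(partitions);
        byte[] flags = new byte[partitions]; // one byte per partition
        for (int i = 0; i < partitions; i++) {
            if (random.nextBoolean()) {      // assumption: about half the partitions are empty
                bits.set(i);
                flags[i] = 1;
            }
        }
        byte[] packed = bits.toByteArray();  // one bit per partition (JDK 1.7+)
        return new int[] {packed.length, deflate(packed).length, deflate(flags).length};
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 101, 1001, 8001}) {
            int[] s = measure(n, 42L);
            System.out.println("PartitionSize=" + n
                + " , BitSetSize=" + s[0]
                + " , CompressedBitSetSize=" + s[1]
                + " , NormalByteArrayCompressed=" + s[2]);
        }
    }
}
```

Note that BitSet.toByteArray() only covers bytes up to the highest set bit, so the reported BitSet size can be slightly smaller than ceil(PartitionSize / 8).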
JDK 1.6's BitSet cannot be used directly, as it does not support the valueOf() and toByteArray() functions. The suggestion is to have a Tez-specific BitSet (until Tez moves to JDK 1.7) and to make the compression a job configuration option.
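
A JDK 1.6-compatible helper for such a Tez-specific BitSet could look roughly like the sketch below. BitSetBytes and its two methods are hypothetical names, not existing Tez classes. The byte layout (bit i stored in byte i/8 at bit position i%8) is chosen to match what JDK 1.7 documents for BitSet.toByteArray() and BitSet.valueOf(byte[]), so a later switch to the JDK methods should not change the wire format.

```java
import java.util.BitSet;

// Hypothetical helper; uses only BitSet methods that exist in JDK 1.6.
final class BitSetBytes {
    private BitSetBytes() {}

    // Pack the set bits into bytes: bit i goes to byte i/8, bit position i%8.
    static byte[] toByteArray(BitSet bits) {
        byte[] bytes = new byte[(bits.length() + 7) / 8];
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            bytes[i / 8] |= (byte) (1 << (i % 8));
        }
        return bytes;
    }

    // Inverse of toByteArray: rebuild the BitSet from the packed bytes.
    static BitSet fromByteArray(byte[] bytes) {
        BitSet bits = new BitSet(bytes.length * 8);
        for (int i = 0; i < bytes.length * 8; i++) {
            if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }
}
```

For example, a BitSet with bits 0, 9 and 100 set packs into 13 bytes and round-trips through fromByteArray unchanged.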