Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2014/07/28 07:21:39 UTC

[jira] [Comment Edited] (TEZ-1288) Create FastTezSerialization as an optional feature

    [ https://issues.apache.org/jira/browse/TEZ-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075874#comment-14075874 ] 

Rajesh Balamohan edited comment on TEZ-1288 at 7/28/14 5:20 AM:
----------------------------------------------------------------

Changes:
- ValueIterator had a pre-existing bug in readNextKey/readNextValue (already present in the codebase), and the earlier patch had to fix it to unblock this work. Added a test case for ValuesIterator with and without a custom comparator (the test case exercises the Merger with multiple streams for in-memory and disk-based files).
- Separate methods for key/value serialization


was (Author: rajesh.balamohan):
Changes:
- ValueIterator had a bug in readNextKey/readNextValue and earlier patch tried to address that as well.  Created a testcase for ValuesIterator with/without custom comparator (testcase exercises the Merger with multiple streams for in-memory and disk-based files) 
- Separate methods for key/value serialization

> Create FastTezSerialization as an optional feature
> --------------------------------------------------
>
>                 Key: TEZ-1288
>                 URL: https://issues.apache.org/jira/browse/TEZ-1288
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.0
>            Reporter: Gopal V
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-1288.1.patch, TEZ-1288.2.patch, TEZ-1288.3.patch, TEZ-1288.4.patch
>
>
> Tez inherits the writable framework from map-reduce. 
> This is very flexible, but not particularly memory efficient for small data types.
> When deserializing, each value and key has to be allocated afresh for each small chunk of data (new IntWritable instead of .set()).
> The bytes writable serialization operation always has to write a 4-byte prefix for all values and keys, because of requirements around streamed .readFields() instead of a custom setter/getter impl.
> Implement a faster serialization mechanism for the inner loop of sort, spill, and merge that doesn't trigger the GC and avoids adding unnecessary overhead to the IFile format.
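
The two costs named in the description can be illustrated with a minimal sketch that uses only the JDK. It is not the TEZ-1288 patch and does not use the Hadoop or Tez APIs: the class SerializationOverheadSketch, the IntHolder type (a stand-in for a small Writable such as IntWritable), and every method name below are hypothetical. It compares a 4-byte length prefix with a fixed-width layout, and a reused holder with a fresh object per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SerializationOverheadSketch {

    /** Hypothetical stand-in for a small Writable such as IntWritable. */
    static final class IntHolder {
        private int value;

        void set(int v) { value = v; }

        int get() { return value; }

        // Streamed read, in the spirit of Writable.readFields().
        void readFields(DataInputStream in) throws IOException {
            value = in.readInt();
        }
    }

    /** Length-prefixed layout: a 4-byte size, then the payload bytes. */
    public static byte[] lengthPrefixed(byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(payload.length);
        out.write(payload, 0, payload.length);
        out.flush();
        return buf.toByteArray();
    }

    /** Fixed-width layout: the reader already knows each int is 4 bytes. */
    public static byte[] fixedWidth(int[] values) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Sums all records while reusing one holder (no per-record allocation). */
    public static long sumWithReuse(byte[] data, int count) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        IntHolder holder = new IntHolder();
        long sum = 0;
        for (int i = 0; i < count; i++) {
            holder.readFields(in);
            sum += holder.get();
        }
        return sum;
    }

    /** Same result, but allocates a fresh holder for every record. */
    public static long sumWithFreshObjects(byte[] data, int count) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long sum = 0;
        for (int i = 0; i < count; i++) {
            IntHolder holder = new IntHolder(); // garbage after each iteration
            holder.readFields(in);
            sum += holder.get();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = fixedWidth(new int[] {1, 2, 3, 4});
        System.out.println("fixed-width bytes for 4 ints: " + packed.length);
        System.out.println("length-prefixed bytes for one 4-byte payload: "
                + lengthPrefixed(new byte[4]).length);
        System.out.println("sum (reused holder): " + sumWithReuse(packed, 4));
        System.out.println("sum (fresh objects): " + sumWithFreshObjects(packed, 4));
    }
}
```

In this sketch a 4-byte payload takes 8 bytes once the length prefix is added, which doubles the size of small records, while both read loops return the same sum and differ only in how many objects they allocate.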



--
This message was sent by Atlassian JIRA
(v6.2#6252)