Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2010/07/16 00:20:50 UTC

[jira] Commented: (PIG-1473) Avoid serialization/deserialization costs for PigStorage data - Use custom Map and Bag implementation

    [ https://issues.apache.org/jira/browse/PIG-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888942#action_12888942 ] 

Dmitriy V. Ryaboy commented on PIG-1473:
----------------------------------------

Thejas, do you think there could be any performance gains if we could delay deserialization of the top-level fields in the tuple, but deserialize whole maps or databags if they are touched?
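
To make the question concrete, here is a rough sketch of what lazily decoded top-level fields might look like. Everything in it is hypothetical: LazyTuple, its decoders list, and decodeCount() are names invented for illustration, and it does not implement Pig's Tuple interface. Each field stays as raw bytes until get(i) is called, at which point that whole field (a complete map or bag included) is decoded in one step and cached.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only; not a class from the Pig codebase.
final class LazyTuple {
    private final byte[][] rawFields;                      // undecoded bytes, one slice per field
    private final List<Function<String, Object>> decoders; // one decoder per field (assumed)
    private final Object[] decoded;                        // cache of decoded values
    private final boolean[] done;                          // which fields are already decoded
    private int decodeCount;                               // how many fields were decoded so far

    LazyTuple(byte[][] rawFields, List<Function<String, Object>> decoders) {
        if (rawFields.length != decoders.size()) {
            throw new IllegalArgumentException("one decoder per field is required");
        }
        this.rawFields = rawFields;
        this.decoders = decoders;
        this.decoded = new Object[rawFields.length];
        this.done = new boolean[rawFields.length];
    }

    // Decodes field i on first access; later calls return the cached value.
    Object get(int i) {
        if (!done[i]) {
            String text = new String(rawFields[i], StandardCharsets.UTF_8);
            decoded[i] = decoders.get(i).apply(text);
            done[i] = true;
            decodeCount++;
        }
        return decoded[i];
    }

    int decodeCount() {
        return decodeCount;
    }
}
```

In this sketch a query that only reads the first field never pays for decoding the map in the second one, which is the saving I am asking about.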

> Avoid serialization/deserialization costs for PigStorage data - Use custom Map and Bag implementation
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1473
>                 URL: https://issues.apache.org/jira/browse/PIG-1473
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>
> Cost of serialization/deserialization (sedes) can be very high and avoiding it will improve performance.
> Avoid sedes when possible by implementing approach #3 proposed in http://wiki.apache.org/pig/AvoidingSedes .
> The load function uses subclasses of Map and DataBag that hold the serialized copy. The load function delays deserialization of map and bag types until a member function of java.util.Map or DataBag is called.
> Example of query where this will help -
> {CODE}
> l = LOAD 'file1' AS (a : int, b : map [ ]);
> f = FOREACH l GENERATE udf1(a), b;      
> fil = FILTER f BY $0 > 5;
> dump fil; -- Deserialization of column b can be delayed until here using this approach.
> {CODE}
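
The Map subclass described in the issue could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patch itself: LazyMap, isDeserialized(), and rawBytes() are hypothetical names, and the parser handles only a simplified "[key#value,key#value]" text form with string values, without the nesting or escaping that real PigStorage data may need.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; not a class from the Pig codebase.
final class LazyMap extends AbstractMap<String, Object> {
    private byte[] serialized;           // raw bytes as read by the loader
    private Map<String, Object> parsed;  // filled in on first java.util.Map call

    LazyMap(byte[] serialized) {
        this.serialized = serialized;
    }

    boolean isDeserialized() {
        return parsed != null;
    }

    // Raw bytes while untouched (null once parsed), so an untouched map
    // could be written back out without a decode/encode round trip.
    byte[] rawBytes() {
        return serialized;
    }

    // AbstractMap routes size(), containsKey(), iteration, etc. through here.
    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        ensureParsed();
        return parsed.entrySet();
    }

    @Override
    public Object get(Object key) {
        ensureParsed();
        return parsed.get(key);
    }

    @Override
    public Object put(String key, Object value) {
        ensureParsed();
        return parsed.put(key, value);
    }

    // Simplified parser (assumption): flat "[k#v,k#v]" text, string values.
    private void ensureParsed() {
        if (parsed != null) {
            return;
        }
        Map<String, Object> result = new LinkedHashMap<>();
        String text = new String(serialized, StandardCharsets.UTF_8).trim();
        if (text.startsWith("[") && text.endsWith("]")) {
            text = text.substring(1, text.length() - 1);
        }
        if (!text.isEmpty()) {
            for (String pair : text.split(",")) {
                int sep = pair.indexOf('#');
                if (sep < 0) {
                    throw new IllegalArgumentException("Malformed map entry: " + pair);
                }
                result.put(pair.substring(0, sep), pair.substring(sep + 1));
            }
        }
        parsed = result;
        serialized = null;  // drop the serialized copy once it may be stale
    }
}
```

In the example query, column b would travel through the FOREACH and FILTER as an untouched LazyMap, and the parse would only run if something called a Map method on it.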

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.