Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/07/03 23:00:44 UTC

[jira] [Commented] (PIG-2632) Create a SchemaTuple which generates efficient Tuples via code gen

    [ https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406044#comment-13406044 ] 

Jonathan Coveney commented on PIG-2632:
---------------------------------------

Julien +1'd on reviewboard (he didn't +1 here because JIRA has been down for some people). The revision is r1356921. I will add more documentation in a separate patch. This is TURNED OFF by default, so it should be invisible to existing jobs.
                
> Create a SchemaTuple which generates efficient Tuples via code gen
> ------------------------------------------------------------------
>
>                 Key: PIG-2632
>                 URL: https://issues.apache.org/jira/browse/PIG-2632
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-10.patch, PIG-2632-10.patch, PIG-2632-3.patch, PIG-2632-4.patch, PIG-2632-5.patch, PIG-2632-6.patch, PIG-2632-7.patch, PIG-2632-8.patch, PIG-2632-9.patch, PIG-2632-9.patch, schematuple benchmarking.pdf, schematuple benchmarking.pptx
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing the Schema on the frontend, we can code-generate Tuples which can be used for fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, and it's ~15% smaller serialized (this depends heavily on the data, though). I still need to do get/set tests, but assuming get/set performance is on par with (or even faster than) Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema given to UDFs. The next step is to make a SchemaBag, where I think the serialization savings will be really huge.
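
To make the idea in the description concrete, here is a minimal sketch of why a tuple generated from a known schema can be more compact than a generic one. It is not the code from the patch: the class names (SchemaTupleSketch, GenericTuple, GeneratedTuple_int_double_chararray), the example schema (id:int, score:double, name:chararray) and the toy generateSource helper are all hypothetical, and null handling is ignored for brevity. The generic tuple boxes every field into an Object, while the schema-specific class stores primitives directly in fields.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaTupleSketch {

    // Generic tuple: every field is a boxed Object held in a list,
    // so an int costs an Integer object plus a reference to it.
    static final class GenericTuple {
        private final List<Object> fields = new ArrayList<>();

        void append(Object value) { fields.add(value); }
        Object get(int i) { return fields.get(i); }
        int size() { return fields.size(); }
    }

    // Hypothetical output of code generation for the assumed schema
    // (id:int, score:double, name:chararray): primitives live in fields.
    // Simplification: primitive fields cannot represent null here.
    static final class GeneratedTuple_int_double_chararray {
        private int id;
        private double score;
        private String name;

        Object get(int i) {
            switch (i) {
                case 0: return id;     // boxed only when read generically
                case 1: return score;
                case 2: return name;
                default: throw new IndexOutOfBoundsException("No field " + i);
            }
        }

        void set(int i, Object value) {
            switch (i) {
                case 0: id = (Integer) value; break;
                case 1: score = (Double) value; break;
                case 2: name = (String) value; break;
                default: throw new IndexOutOfBoundsException("No field " + i);
            }
        }

        int size() { return 3; }
    }

    // Toy generator: emits the field declarations of a schema-specific class
    // from parallel arrays of field names and Java type names.
    static String generateSource(String className, String[] names, String[] javaTypes) {
        if (names.length != javaTypes.length) {
            throw new IllegalArgumentException("names and javaTypes must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("final class ").append(className).append(" {\n");
        for (int i = 0; i < names.length; i++) {
            sb.append("    private ").append(javaTypes[i]).append(' ')
              .append(names[i]).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        GeneratedTuple_int_double_chararray t = new GeneratedTuple_int_double_chararray();
        t.set(0, 7);
        t.set(1, 0.5);
        t.set(2, "pig");
        System.out.println(t.get(0) + ", " + t.get(1) + ", " + t.get(2));
        System.out.print(generateSource("ExampleTuple",
                new String[] {"id", "score", "name"},
                new String[] {"int", "double", "String"}));
    }
}
```

In a real setting the generated source would also have to be compiled and loaded, and the class would need serialization and null tracking; the sketch only shows the storage layout difference that the memory numbers above are about.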
