Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/08/08 20:22:20 UTC
[jira] [Created] (PIG-2862) Hardcore certain tuple lengths into the
TUPLE BinInterSedes byte identifier
Jonathan Coveney created PIG-2862:
-------------------------------------
Summary: Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier
Key: PIG-2862
URL: https://issues.apache.org/jira/browse/PIG-2862
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Attachments: PIG-2862-0.patch
Right now, there are TUPLE, SMALLTUPLE, and TINYTUPLE identifiers to try to save space when writing out Tuples. There is no reason, however, that the length can't be hardcoded into the identifier for common tuple sizes (<10) to save further space. A quick fix that has positive benefits.
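As a rough illustration of the idea (a sketch under stated assumptions, not the contents of PIG-2862-0.patch): the class name, the type-byte values, and the 256/65536 thresholds below are all hypothetical and are not taken from BinInterSedes. The point is only that a tuple with fewer than 10 fields can carry its length in the type byte itself, so no separate size field is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TupleHeaderSketch {
    // Hypothetical type-byte values, chosen only for this illustration.
    static final byte TUPLE = 1;      // followed by a 4-byte size
    static final byte SMALLTUPLE = 2; // followed by a 2-byte size
    static final byte TINYTUPLE = 3;  // followed by a 1-byte size
    static final byte TUPLE_0 = 10;   // TUPLE_0 + n means "tuple of n fields", n in 0..9

    // Writes only the tuple header (type byte plus optional size field).
    static void writeHeader(DataOutputStream out, int size) throws IOException {
        if (size < 10) {
            out.writeByte(TUPLE_0 + size); // size is implied by the type byte
        } else if (size < 256) {           // assumed threshold: fits in one byte
            out.writeByte(TINYTUPLE);
            out.writeByte(size);
        } else if (size < 65536) {         // assumed threshold: fits in two bytes
            out.writeByte(SMALLTUPLE);
            out.writeShort(size);
        } else {
            out.writeByte(TUPLE);
            out.writeInt(size);
        }
    }

    // Returns how many bytes the header occupies for a tuple of the given size.
    static int headerBytes(int size) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeHeader(out, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 9, 10, 300, 70000}) {
            System.out.println("size " + size + " -> " + headerBytes(size) + " header byte(s)");
        }
    }
}
```

Under these assumptions, tuples of size 3 and 9 take 1 header byte, size 10 takes 2, size 300 takes 3, and size 70000 takes 5. A reader would need a matching branch that maps each TUPLE_0 + n byte back to its size.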
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into
the TUPLE BinInterSedes byte identifier
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431362#comment-13431362 ]
Jonathan Coveney commented on PIG-2862:
---------------------------------------
Tuple serialization/deserialization is key to any and all Pig jobs (esp. the intermediate data between the map and reduce phases), so the mere fact that the test suite runs means it is OK.
The savings come from one fewer byte per Tuple being serialized (both a size and a CPU saving). The magnitude of that saving isn't gigantic, but it's essentially ~8MB per 1,000,000 records.
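A back-of-the-envelope check of the size figure, as a sketch only: the class and method names are hypothetical, and the number of small tuples per record is an assumption, since the comment does not state the record shape. One byte saved per tuple over 1,000,000 tuples is 1,000,000 bytes (about 1 MB); a figure near 8 MB would correspond to roughly eight small tuples (top-level or nested) per record.

```java
public class TupleSavingsSketch {
    // One byte is saved for every tuple whose size is hardcoded in its type byte.
    static long bytesSaved(long records, long smallTuplesPerRecord) {
        return records * smallTuplesPerRecord;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        // Assumed record shapes, for illustration only.
        System.out.println("1 tuple per record:  " + bytesSaved(records, 1) + " bytes");
        System.out.println("8 tuples per record: " + bytesSaved(records, 8) + " bytes");
    }
}
```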
[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into
the TUPLE BinInterSedes byte identifier
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431512#comment-13431512 ]
Dmitriy V. Ryaboy commented on PIG-2862:
----------------------------------------
+1, why not.
[jira] [Updated] (PIG-2862) Hardcore certain tuple lengths into the
TUPLE BinInterSedes byte identifier
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-2862:
----------------------------------
Attachment: PIG-2862-0.patch
[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into
the TUPLE BinInterSedes byte identifier
Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431355#comment-13431355 ]
Dmitriy V. Ryaboy commented on PIG-2862:
----------------------------------------
That looks OK. How did you test, where do the savings come from, and how large are they?
[jira] [Updated] (PIG-2862) Hardcode certain tuple lengths into the
TUPLE BinInterSedes byte identifier
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-2862:
----------------------------------
Summary: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier (was: Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier)
[jira] [Updated] (PIG-2862) Hardcore certain tuple lengths into the
TUPLE BinInterSedes byte identifier
Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-2862:
----------------------------------
Status: Patch Available (was: Open)