Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/08/08 20:22:20 UTC

[jira] [Created] (PIG-2862) Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier

Jonathan Coveney created PIG-2862:
-------------------------------------

             Summary: Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier
                 Key: PIG-2862
                 URL: https://issues.apache.org/jira/browse/PIG-2862
             Project: Pig
          Issue Type: Improvement
            Reporter: Jonathan Coveney
            Assignee: Jonathan Coveney
         Attachments: PIG-2862-0.patch

Right now, there are TUPLE, SMALLTUPLE, and TINYTUPLE identifiers to try to save space when writing out Tuples. There is no reason, however, that the length can't be hardcoded into the identifier for common tuple sizes (<10) to save further space. This is a quick fix with a clear benefit.
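
To illustrate the idea, here is a minimal sketch in Java, not the attached patch. The type-byte values, the TUPLE_0 constant, and the class and method names are hypothetical, and the size thresholds (1-byte, 2-byte, and 4-byte lengths) are an assumption about how the three existing identifiers divide the range. It writes only the tuple header and shows that sizes below 10 need no separate length byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TupleHeaderSketch {
    // Hypothetical type-byte values, chosen only for this sketch.
    static final byte TUPLE = 50;       // size follows as a 4-byte int
    static final byte SMALLTUPLE = 51;  // size follows as a 2-byte unsigned short
    static final byte TINYTUPLE = 52;   // size follows as a 1-byte unsigned value
    static final byte TUPLE_0 = 60;     // TUPLE_0 + n means "tuple with n fields", n in 0..9
    static final int MAX_HARDCODED = 9;

    // Encode only the header (type byte plus optional length) for a tuple of `size` fields.
    static byte[] encodeHeader(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative tuple size: " + size);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            if (size <= MAX_HARDCODED) {
                out.writeByte(TUPLE_0 + size);   // length lives in the type byte itself
            } else if (size < 256) {
                out.writeByte(TINYTUPLE);
                out.writeByte(size);
            } else if (size < 65536) {
                out.writeByte(SMALLTUPLE);
                out.writeShort(size);
            } else {
                out.writeByte(TUPLE);
                out.writeInt(size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode the tuple size back out of a header produced by encodeHeader.
    static int decodeSize(byte[] header) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(header))) {
            byte type = in.readByte();
            if (type >= TUPLE_0 && type <= TUPLE_0 + MAX_HARDCODED) {
                return type - TUPLE_0;
            }
            switch (type) {
                case TINYTUPLE:
                    return in.readUnsignedByte();
                case SMALLTUPLE:
                    return in.readUnsignedShort();
                case TUPLE:
                    return in.readInt();
                default:
                    throw new IllegalArgumentException("not a tuple type byte: " + type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 9, 10, 300, 70000}) {
            byte[] header = encodeHeader(size);
            System.out.println(size + " fields: " + header.length
                    + " header byte(s), decoded size " + decodeSize(header));
        }
    }
}
```

In this sketch a tuple with fewer than 10 fields costs one header byte instead of two, at the price of reserving ten type-byte values.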

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431362#comment-13431362 ] 

Jonathan Coveney commented on PIG-2862:
---------------------------------------

Tuple serialization/deserialization is key to any and all Pig jobs (esp. the intermediate data between the map and reduce phases), so the mere fact that the test suite passes shows the change is OK.

The savings come from writing 1 fewer byte per Tuple serialized (both a size and a CPU saving). The magnitude of that saving isn't gigantic: at 1 byte per tuple it is about 1 MB per 1,000,000 tuples written, and more for records that contain nested tuples.
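
A back-of-the-envelope sketch of this estimate, assuming the only change is one fewer header byte for every serialized tuple with fewer than 10 fields. The class name and the tuplesPerRecord parameter are hypothetical: use 1 for flat records and a larger value when each record also carries nested tuples.

```java
public class TupleSavingsSketch {
    // Bytes saved when every serialized tuple header shrinks by bytesPerTuple.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
    static long bytesSaved(long records, long tuplesPerRecord, long bytesPerTuple) {
        return Math.multiplyExact(Math.multiplyExact(records, tuplesPerRecord), bytesPerTuple);
    }

    public static void main(String[] args) {
        // Flat records: one small tuple per record.
        System.out.println("flat:   " + bytesSaved(1_000_000L, 1, 1) + " bytes");
        // Records that each serialize 8 small tuples (the outer tuple plus nested ones).
        System.out.println("nested: " + bytesSaved(1_000_000L, 8, 1) + " bytes");
    }
}
```

The saving scales linearly with the number of small tuples written, so it matters most for jobs that shuffle many short or deeply nested records.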
                

        

[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431512#comment-13431512 ] 

Dmitriy V. Ryaboy commented on PIG-2862:
----------------------------------------

+1, why not.
                

        

[jira] [Updated] (PIG-2862) Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2862:
----------------------------------

    Attachment: PIG-2862-0.patch
    

        

[jira] [Commented] (PIG-2862) Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431355#comment-13431355 ] 

Dmitriy V. Ryaboy commented on PIG-2862:
----------------------------------------

That looks OK. How did you test it, where do the savings come from, and how large are they?
                

        

[jira] [Updated] (PIG-2862) Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2862:
----------------------------------

    Summary: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier  (was: Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier)
    

        

[jira] [Updated] (PIG-2862) Hardcore certain tuple lengths into the TUPLE BinInterSedes byte identifier

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2862:
----------------------------------

    Status: Patch Available  (was: Open)
    