Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/10/30 19:04:12 UTC

[jira] [Created] (PIG-3017) Pig's object serialization should use compression

Jonathan Coveney created PIG-3017:
-------------------------------------

             Summary: Pig's object serialization should use compression
                 Key: PIG-3017
                 URL: https://issues.apache.org/jira/browse/PIG-3017
             Project: Pig
          Issue Type: Bug
            Reporter: Jonathan Coveney
            Assignee: Jonathan Coveney
             Fix For: 0.12


We have run into cases of very large JobConf objects, partly because Pig's serialized objects are quite large. There is no reason not to use compression here, and compression ratios should be quite high.
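The idea can be sketched with only the JDK. The object is serialized, the bytes go through a DEFLATE compressor, and the result is Base64-encoded so it can be stored as a string configuration value. This is a minimal sketch that assumes standard Java serialization and `java.util.Base64` (Java 8+). The class name `CompressedSerializer` and the payload are hypothetical, and this is not the code from the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper for illustration; not Pig's ObjectSerializer.
public class CompressedSerializer {

    // Java-serialize obj, DEFLATE-compress the bytes, then Base64-encode them
    // so the result is a plain string suitable for a configuration value.
    public static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (ObjectOutputStream out =
                new ObjectOutputStream(new DeflaterOutputStream(bytes, deflater))) {
            out.writeObject(obj);
        } finally {
            // close() does not release a caller-supplied Deflater, so free it here.
            deflater.end();
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverse the steps: Base64-decode, inflate, then Java-deserialize.
    public static Object deserialize(String encoded)
            throws IOException, ClassNotFoundException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Repetitive synthetic payload standing in for a large serialized object.
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            payload.add("operator-" + (i % 10));
        }
        String encoded = serialize(payload);
        List<?> restored = (List<?>) deserialize(encoded);
        System.out.println("round trip ok: " + restored.equals(payload));
        System.out.println("encoded length: " + encoded.length());
    }
}
```

Running `main` prints `round trip ok: true`. The encoded length depends entirely on how repetitive the payload is.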

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-3017) Pig's object serialization should use compression

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3017:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Added to 0.11 and trunk. Thanks, Julien.
                
> Pig's object serialization should use compression
> -------------------------------------------------
>
>                 Key: PIG-3017
>                 URL: https://issues.apache.org/jira/browse/PIG-3017
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3017-0.patch
>
>
> We have run into cases of very large JobConf objects, and part of this is the fact that serialized objects are quite large. There is no reason not to use compression here, and ratios should be quite high.

[jira] [Commented] (PIG-3017) Pig's object serialization should use compression

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487192#comment-13487192 ] 

Prashant Kommireddi commented on PIG-3017:
------------------------------------------

Sounds reasonable, thanks.
                
[jira] [Updated] (PIG-3017) Pig's object serialization should use compression

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3017:
----------------------------------

    Status: Patch Available  (was: Open)
    
[jira] [Commented] (PIG-3017) Pig's object serialization should use compression

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487186#comment-13487186 ] 

Jonathan Coveney commented on PIG-3017:
---------------------------------------

Well, I don't know the absolute size, because I had a script where the JobConf was failing at about 6.5MB. I'm not sure whether it fails as soon as it crosses the threshold or only after everything has been serialized. That said, after this patch the same JobConf was 600KB, so roughly a 10x reduction (note that I also switched it to Base64 encoding). As for serialization time, we're still only dealing with data on the order of ~5MB, so compression time is negligible. I did not do extensive testing around the specifics, though.
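Part of a before/after size difference like this can come from the encoding change rather than from compression alone. The sketch below reports three figures for a synthetic payload: the Java-serialized size, the length of a two-characters-per-byte hex encoding of those raw bytes, and the Base64 length of the deflated bytes. The hex baseline is an assumption chosen for comparison, not a statement about Pig's previous encoding. `EncodedSizeReport`, `hexEncode`, and the payload are hypothetical, and the printed numbers depend entirely on the input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Base64;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class EncodedSizeReport {

    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Plain Java serialization, no compression.
    static byte[] javaSerialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // DEFLATE-compress a byte array at the given Deflater level.
    static byte[] deflate(byte[] input, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level);
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input);
        } finally {
            deflater.end();
        }
        return bytes.toByteArray();
    }

    // Assumed comparison baseline: two hex characters per byte.
    static String hexEncode(byte[] input) {
        StringBuilder sb = new StringBuilder(input.length * 2);
        for (byte b : input) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, repetitive payload; substitute real serialized data to measure.
        ArrayList<String> payload = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            payload.add("operator-" + (i % 10));
        }
        byte[] raw = javaSerialize(payload);
        String hexRaw = hexEncode(raw);
        String base64Deflated = Base64.getEncoder()
                .encodeToString(deflate(raw, Deflater.BEST_COMPRESSION));
        System.out.printf("serialized bytes:         %d%n", raw.length);
        System.out.printf("hex chars, uncompressed:  %d%n", hexRaw.length());
        System.out.printf("Base64 chars, deflated:   %d%n", base64Deflated.length());
        System.out.printf("ratio: %.1fx%n", (double) hexRaw.length() / base64Deflated.length());
    }
}
```

Base64 spends 4 characters per 3 bytes, compared with 2 characters per byte for hex. The encoding switch alone would therefore shrink the string by about a third, and DEFLATE accounts for the rest of any reduction.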
                
[jira] [Commented] (PIG-3017) Pig's object serialization should use compression

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487127#comment-13487127 ] 

Prashant Kommireddi commented on PIG-3017:
------------------------------------------

Hey Jon, out of curiosity: have you compared object sizes before and after this patch, and also compared timings? Just trying to understand whether Deflater.BEST_COMPRESSION is the ideal choice.
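One way to answer the question about `Deflater.BEST_COMPRESSION` is to compress the same bytes at several levels and record the output size and elapsed time for each. The sketch below does this with `java.util.zip.Deflater` and `Inflater`, and also checks that each level round-trips. The payload and the class name `DeflateLevelComparison` are hypothetical. A single timing run without JIT warm-up is noisy, so the numbers are only indicative and should be reproduced on real serialized JobConf data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLevelComparison {

    // Compress input at the given level using the Deflater API directly.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Decompress, used to confirm that every level round-trips correctly.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Synthetic, repetitive input; substitute real serialized bytes to measure.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append("field_").append(i % 100).append(';');
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        String[] names = {"BEST_SPEED", "DEFAULT_COMPRESSION", "BEST_COMPRESSION"};
        for (int i = 0; i < levels.length; i++) {
            long start = System.nanoTime();
            byte[] compressed = compress(input, levels[i]);
            long micros = (System.nanoTime() - start) / 1_000;
            boolean ok = Arrays.equals(input, decompress(compressed));
            System.out.printf("%-20s %8d bytes %8d us round-trip=%b%n",
                    names[i], compressed.length, micros, ok);
        }
    }
}
```

For payloads on the order of a few megabytes, the time differences between levels are typically small compared with job setup, which matches the reasoning in the thread. A measurement on real data is still the way to settle it.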
                
[jira] [Commented] (PIG-3017) Pig's object serialization should use compression

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487279#comment-13487279 ] 

Julien Le Dem commented on PIG-3017:
------------------------------------

+1, looks good to me.
                
[jira] [Updated] (PIG-3017) Pig's object serialization should use compression

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3017:
----------------------------------

    Attachment: PIG-3017-0.patch

This passes test-commit, though I had to change one golden file because the new serialization affects the results (golden files *shakes fist*).
                