Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2013/03/04 07:15:12 UTC

[jira] [Updated] (CRUNCH-173) Make WritableTypeFamily more compact for composite types

     [ https://issues.apache.org/jira/browse/CRUNCH-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-173:
------------------------------

    Attachment: CRUNCH-173.patch

Here's what it looks like. It's not the prettiest thing ever, but it's a good deal faster (30% or so) on some of my test sets on the cluster, where I'm essentially spending more CPU to do less IO. The difference on the unit/integration tests is pretty marginal, since we don't write that much data out there.
                
> Make WritableTypeFamily more compact for composite types
> --------------------------------------------------------
>
>                 Key: CRUNCH-173
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-173
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-173.patch
>
>
> I'm throwing this out as something of a strawman JIRA: it's always bugged me how verbose the serialization of TupleWritable et al. is compared to the Avro formats, so I took a crack at making the underlying serialization more compact by doing more things in terms of BytesWritable and using the wrapping MapFns to do more of the de-serialization work. A patch is attached. If anyone is interested in this or has an opinion on whether or not this is a good idea, I'd love to hear it. The big pro is that Crunch jobs that have to use writables will run faster as a result. The downside is that it's not backwards compatible and it makes the code more complex.
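
For readers without the patch handy, the size trade-off described above can be illustrated with a small standalone sketch. It is not the attached patch and does not use Crunch or Hadoop classes. The class name CompactTupleSketch, its methods, and the "verbose" layout (a type name repeated for every field) are assumptions made for illustration, not a description of TupleWritable's actual wire format. The compact variant packs the fields into one byte array with variable-length prefixes, and relies on the reader already knowing the field types, which is roughly the role the description gives to the wrapping MapFns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not Crunch's API and not the attached patch.
final class CompactTupleSketch {

    // Assumed "verbose" layout: every field repeats a type name and a 4-byte length.
    static byte[] encodeVerbose(String[] fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(fields.length);
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                out.writeUTF("org.apache.hadoop.io.Text"); // per-field type tag
                out.writeInt(bytes.length);
                out.write(bytes);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compact layout: one byte array, varint count and lengths, no type tags.
    static byte[] encodeCompact(String[] fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeVarInt(out, fields.length);
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                writeVarInt(out, bytes.length);
                out.write(bytes);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader supplies the type knowledge (all strings here), so none is stored.
    static String[] decodeCompact(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String[] fields = new String[readVarInt(in)];
            for (int i = 0; i < fields.length; i++) {
                byte[] bytes = new byte[readVarInt(in)];
                in.readFully(bytes);
                fields[i] = new String(bytes, StandardCharsets.UTF_8);
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Base-128 varint for non-negative ints: 7 payload bits per byte, low bits first.
    private static void writeVarInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    private static int readVarInt(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    public static void main(String[] args) {
        String[] tuple = {"user-42", "click", "2013-03-04"};
        System.out.println("verbose bytes: " + encodeVerbose(tuple).length);
        System.out.println("compact bytes: " + encodeCompact(tuple).length);
        System.out.println("round trip: " + String.join(",", decodeCompact(encodeCompact(tuple))));
    }
}
```

The sketch also shows both downsides named in the description: bytes written in the compact layout cannot be read by code that expects the tagged layout, and the decoder has to carry the schema knowledge that the tags used to provide.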

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira