Posted to dev@crunch.apache.org by "Gabriel Reid (JIRA)" <ji...@apache.org> on 2014/01/23 22:25:39 UTC

[jira] [Comment Edited] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly

    [ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880377#comment-13880377 ] 

Gabriel Reid edited comment on CRUNCH-329 at 1/23/14 9:24 PM:
--------------------------------------------------------------

{quote}
What if we made the writable codes explicit, i.e., you had to call something like Writables.setCode(Class<? extends Writable> clazz, int code), and redefining an existing code was a runtime error?
{quote}

Yeah, that would work too. The thing that still worries me about it is that the user needs to remember which code each class is registered under, and also needs to know that classes have to be registered explicitly, otherwise things will break. Probably not too much of a real issue, but a bit of a drag nonetheless.
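
To make the explicit-registration idea concrete, here is a minimal sketch of what such a registry could look like. Everything in it is hypothetical: WritableCodeRegistry, codeFor and classFor are names invented for illustration, and Class<?> stands in for Class<? extends Writable> so that the snippet compiles without Hadoop on the classpath. It is not the API from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, invented for this sketch; not part of Crunch.
// Class<?> stands in for Class<? extends Writable>.
final class WritableCodeRegistry {
  private final Map<Integer, Class<?>> classByCode = new HashMap<>();
  private final Map<Class<?>, Integer> codeByClass = new HashMap<>();

  // Binds clazz to code. Rebinding a code or a class to a different
  // partner fails fast; repeating an identical registration is a no-op.
  synchronized void setCode(Class<?> clazz, int code) {
    Class<?> existingClass = classByCode.get(code);
    if (existingClass != null && existingClass != clazz) {
      throw new IllegalStateException(
          "Code " + code + " is already bound to " + existingClass.getName());
    }
    Integer existingCode = codeByClass.get(clazz);
    if (existingCode != null && existingCode != code) {
      throw new IllegalStateException(
          clazz.getName() + " is already registered as code " + existingCode);
    }
    classByCode.put(code, clazz);
    codeByClass.put(clazz, code);
  }

  // Looks up the code to write alongside a field of the given class.
  synchronized int codeFor(Class<?> clazz) {
    Integer code = codeByClass.get(clazz);
    if (code == null) {
      throw new IllegalStateException(clazz.getName() + " was never registered");
    }
    return code;
  }

  // Looks up the class to instantiate when a code is read back.
  synchronized Class<?> classFor(int code) {
    Class<?> clazz = classByCode.get(code);
    if (clazz == null) {
      throw new IllegalStateException("No class registered for code " + code);
    }
    return clazz;
  }
}
```

The sketch also shows the drawback mentioned above: a class that was never registered only surfaces as an error when codeFor is called, so the user still has to know up front that registration is required.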

I'm almost starting to wonder if it's an option to skip the serialization codes altogether and just save the schema of the tuple in the Configuration. On the one hand that sounds like an even better way to go, and on the other hand it kind of sounds like re-inventing Avro.
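
As a rough illustration of the schema-in-Configuration alternative, the sketch below stores the field classes of a tuple, in order, as a comma-separated list of class names and reads them back. It makes several assumptions: a plain Map<String, String> stands in for the Hadoop Configuration, and the key name and the TupleSchemaConf class are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, invented for this sketch; not part of Crunch.
// A Map<String, String> stands in for the job Configuration.
final class TupleSchemaConf {
  // Hypothetical key name, chosen only for illustration.
  static final String SCHEMA_KEY = "example.tuple.schema";

  // Stores the field classes, in field order, as comma-separated class names.
  static void writeSchema(Map<String, String> conf, List<Class<?>> fieldTypes) {
    StringBuilder sb = new StringBuilder();
    for (Class<?> c : fieldTypes) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(c.getName());
    }
    conf.put(SCHEMA_KEY, sb.toString());
  }

  // Reads the schema back, resolving each class name to a Class object.
  static List<Class<?>> readSchema(Map<String, String> conf) {
    String raw = conf.get(SCHEMA_KEY);
    if (raw == null) {
      throw new IllegalStateException("No tuple schema stored under " + SCHEMA_KEY);
    }
    List<Class<?>> fieldTypes = new ArrayList<>();
    if (raw.isEmpty()) {
      return fieldTypes;
    }
    for (String name : raw.split(",")) {
      try {
        fieldTypes.add(Class.forName(name));
      } catch (ClassNotFoundException e) {
        throw new IllegalStateException("Unknown field class " + name, e);
      }
    }
    return fieldTypes;
  }
}
```

With the schema stored once per job, nothing type-related would need to travel with each serialized tuple, at the cost of needing a separate key for every distinct tuple shape in the pipeline.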


> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-329
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-329
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after we hacked the TupleWritable impl to make all of the fields BytesWritables (e.g., secondary IntWritable values are no longer sorted correctly, even though everything is still grouped correctly).
> The least-bad way that I came up with to fix this is to use integer codes for each possible WritableComparable type in a pipeline that we can use to decode what Writable type each tuple field corresponds to. This allows us to keep the various fields sortable while still doing a reasonable job of minimizing the serialization required to pass the type information along.
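
One way the sorting mismatch described above can arise is sketched below, assuming each field is stored as the 4-byte big-endian encoding that DataOutput.writeInt produces and that raw byte arrays are ordered lexicographically with bytes treated as unsigned. RawByteSortDemo and its methods are invented for the illustration; the exact failure in TupleWritable may differ.

```java
import java.nio.ByteBuffer;

// Illustrative demo, invented for this sketch; not Crunch code.
final class RawByteSortDemo {
  // 4-byte big-endian encoding (ByteBuffer's default byte order).
  static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  // Lexicographic comparison that treats each byte as unsigned.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    int x = -1;
    int y = 1;
    // Numeric order puts -1 before 1.
    System.out.println("numeric: x < y is " + (Integer.compare(x, y) < 0));
    // Byte order puts ff ff ff ff after 00 00 00 01, so -1 sorts after 1.
    System.out.println("bytes:   x < y is " + (compareBytes(encode(x), encode(y)) < 0));
  }
}
```

Equal values still produce equal byte arrays, which fits the observation that grouping keeps working while ordering does not. Knowing the original type of each field lets the comparison use numeric order instead.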