Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2009/08/28 18:38:32 UTC

[jira] Created: (AVRO-108) add binary comparator

add binary comparator
---------------------

                 Key: AVRO-108
                 URL: https://issues.apache.org/jira/browse/AVRO-108
             Project: Avro
          Issue Type: New Feature
          Components: java
            Reporter: Doug Cutting


Hadoop MapReduce performance benefits greatly if data can be compared by examining its serialized bytes directly, rather than by first deserializing it into objects.  Such "raw" comparators are typically written by hand in Hadoop. They are very fragile, because each one must be kept in sync with its type's serialization format by hand.

With Avro it is possible to generically compare two serialized byte sequences if their schema is known.  This should work for any Avro data, regardless of how it was serialized or how it will be deserialized.
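Why the schema matters can be illustrated with Avro's binary encoding of int and long. These are written as zigzag-encoded, base-128 varints with the low-order bits first. As a result, a plain unsigned byte comparison of two serialized values does not follow numeric order. It gets -1 vs 0 wrong, and it also gets non-negative pairs like 255 vs 256 wrong.

The self-contained sketch below encodes ints that way and contrasts a naive byte comparison with one that decodes according to the (known) schema. The class and method names are hypothetical and not part of Avro. It assumes Java 9+ for `Arrays.compareUnsigned`.

```java
import java.util.Arrays;

public class ZigZagOrder {
  // Encode an int as Avro's binary format does: zigzag, then base-128 varint (low bits first).
  static byte[] encodeInt(int n) {
    int z = (n << 1) ^ (n >> 31);          // zigzag: small magnitudes -> small unsigned values
    byte[] buf = new byte[5];
    int len = 0;
    while ((z & ~0x7F) != 0) {
      buf[len++] = (byte) ((z & 0x7F) | 0x80); // more bytes follow
      z >>>= 7;
    }
    buf[len++] = (byte) z;
    return Arrays.copyOf(buf, len);
  }

  // Decode a zigzag varint that starts at offset 0.
  static int decodeInt(byte[] b) {
    int z = 0, shift = 0, i = 0, cur;
    do {
      cur = b[i++] & 0xFF;
      z |= (cur & 0x7F) << shift;
      shift += 7;
    } while ((cur & 0x80) != 0);
    return (z >>> 1) ^ -(z & 1);           // undo zigzag
  }

  // Naive: treat the serialized bytes as unsigned byte strings.
  static int compareRawBytes(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b);
  }

  // Schema-aware: we know these bytes hold an Avro int, so decode before comparing.
  static int compareAsInt(byte[] a, byte[] b) {
    return Integer.compare(decodeInt(a), decodeInt(b));
  }

  public static void main(String[] args) {
    byte[] minusOne = encodeInt(-1);       // single byte 0x01
    byte[] zero = encodeInt(0);            // single byte 0x00
    System.out.println(compareRawBytes(minusOne, zero)); // positive: wrong order
    System.out.println(compareAsInt(minusOne, zero));    // negative: correct order
  }
}
```

For example, 255 encodes as 0xFE 0x03 and 256 as 0x80 0x04, so the raw comparison also ranks 255 above 256. A generic comparator therefore has to interpret the bytes according to the schema.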

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (AVRO-108) add binary comparator

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-108:
------------------------------

    Fix Version/s: 1.0.1
         Assignee: Doug Cutting
           Status: Patch Available  (was: Open)

This is all-new code and includes tests.  I'll commit it soon unless someone objects.


[jira] Updated: (AVRO-108) add binary comparator

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-108:
------------------------------

    Attachment: AVRO-108.java

Here's a patch that implements this.


[jira] Commented: (AVRO-108) add binary comparator

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748893#action_12748893 ] 

Doug Cutting commented on AVRO-108:
-----------------------------------

An API for this might be something like:

  BinaryComparator.compare(byte[] bytes1, int start1, byte[] bytes2, int start2, Schema schema);

The schema provided must be the schema used to write the data.

Records would be ordered by comparing their fields in order, arrays and maps by their entries, unions by their branches, and so on.
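A rough idea of how such a comparator could walk two buffers in lockstep is sketched below. This is a hypothetical illustration, not the code in the attached AVRO-108.java. It makes several assumptions:

* It uses its own tiny `Schema` class rather than Avro's.
* It handles only int, long, string, record, and union types.
* It assumes, as stated above, that both buffers were written with the given schema.
* It requires Java 9+ for `List.of` and `Arrays.compareUnsigned`.

```java
import java.util.Arrays;
import java.util.List;

public class BinaryComparatorSketch {
  // Hypothetical, minimal schema model for this sketch only (not org.apache.avro.Schema).
  enum Type { INT, LONG, STRING, RECORD, UNION }

  static final class Schema {
    final Type type;
    final List<Schema> children; // record fields in declaration order, or union branches

    private Schema(Type type, List<Schema> children) { this.type = type; this.children = children; }

    static Schema of(Type type) { return new Schema(type, List.of()); }
    static Schema recordOf(Schema... fields) { return new Schema(Type.RECORD, List.of(fields)); }
    static Schema unionOf(Schema... branches) { return new Schema(Type.UNION, List.of(branches)); }
  }

  // Read position within one serialized buffer.
  private static final class Cursor {
    final byte[] buf;
    int pos;

    Cursor(byte[] buf, int pos) { this.buf = buf; this.pos = pos; }

    // Avro writes int and long as zigzag-encoded base-128 varints.
    long readVarLong() {
      long z = 0;
      int shift = 0, b;
      do {
        b = buf[pos++] & 0xFF;
        z |= (long) (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      return (z >>> 1) ^ -(z & 1); // undo zigzag
    }
  }

  /** Compares two values serialized with {@code schema}, starting at start1 and start2. */
  public static int compare(byte[] bytes1, int start1, byte[] bytes2, int start2, Schema schema) {
    return compare(new Cursor(bytes1, start1), new Cursor(bytes2, start2), schema);
  }

  private static int compare(Cursor c1, Cursor c2, Schema s) {
    switch (s.type) {
      case INT:
      case LONG:
        return Long.compare(c1.readVarLong(), c2.readVarLong());
      case STRING: {
        // Length-prefixed UTF-8; unsigned byte order matches code point order.
        int len1 = (int) c1.readVarLong();
        int len2 = (int) c2.readVarLong();
        int cmp = Arrays.compareUnsigned(c1.buf, c1.pos, c1.pos + len1,
                                         c2.buf, c2.pos, c2.pos + len2);
        c1.pos += len1;
        c2.pos += len2;
        return cmp;
      }
      case RECORD:
        // Compare field by field; the first difference decides.
        for (Schema field : s.children) {
          int cmp = compare(c1, c2, field);
          if (cmp != 0) return cmp;
        }
        return 0;
      case UNION: {
        // Branch index first, then the value within the shared branch.
        int branch1 = (int) c1.readVarLong();
        int branch2 = (int) c2.readVarLong();
        if (branch1 != branch2) return Integer.compare(branch1, branch2);
        return compare(c1, c2, s.children.get(branch1));
      }
      default:
        throw new IllegalArgumentException("type not handled in this sketch: " + s.type);
    }
  }
}
```

When two values compare equal, each case consumes exactly their bytes, so a record's next field is read from the right position. As soon as a difference is found the walk stops, leaving the rest of both buffers unread. A complete version would also need arrays, maps, enums, fixed, bytes, boolean, float, and double.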


[jira] Updated: (AVRO-108) add binary comparator

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-108:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.
