Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2006/09/13 00:49:22 UTC

[jira] Created: (HADOOP-525) Need raw comparators for hadoop record types

Need raw comparators for hadoop record types
--------------------------------------------

                 Key: HADOOP-525
                 URL: http://issues.apache.org/jira/browse/HADOOP-525
             Project: Hadoop
          Issue Type: Improvement
          Components: record
    Affects Versions: 0.6.0
            Reporter: Sameer Paranjpye
         Assigned To: Milind Bhandarkar
            Priority: Minor
             Fix For: 0.7.0


The Hadoop record framework does not generate raw comparators for the types it produces. This could have a substantial performance impact when record-generated types are used in Map/Reduce. The record I/O framework should auto-generate raw comparators for these types.

Comparison for Hadoop record I/O types is defined as member-wise comparison of objects. A possible implementation could deserialize only one member from each object at a time, compare the two values, and either return the result or, if the values are equal, move on to the next member.
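
A minimal Java sketch of the member-at-a-time idea above. The record layout (a 4-byte int id followed by a length-prefixed UTF-8 name) and all class and method names are assumptions made for illustration; they are not the record I/O wire format or the code the framework generates.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical two-member record (id, name); not a type from the Hadoop code base. */
final class LazyRecordCompare {

    /** Assumed layout: 4-byte id, 4-byte name length, then the UTF-8 name bytes. */
    static byte[] serialize(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + utf8.length);
        buf.putInt(id).putInt(utf8.length).put(utf8);
        return buf.array();
    }

    /** Deserializes one length-prefixed string from the current buffer position. */
    static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[in.getInt()];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Compares two serialized records member by member, stopping at the first difference. */
    static int compare(byte[] a, byte[] b) {
        ByteBuffer inA = ByteBuffer.wrap(a);
        ByteBuffer inB = ByteBuffer.wrap(b);

        // Member 1: deserialize only the ids and compare them.
        int idCmp = Integer.compare(inA.getInt(), inB.getInt());
        if (idCmp != 0) {
            return idCmp; // the names are never deserialized
        }

        // Member 2: reached only when the ids are equal.
        return readString(inA).compareTo(readString(inB));
    }
}
```

With this layout, compare(serialize(1, "zebra"), serialize(2, "apple")) is negative: the ids decide the order, so neither name is deserialized.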

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-525?page=comments#action_12460622 ] 
            
Hadoop QA commented on HADOOP-525:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12347782/raw-comparators.patch applied and successfully tested against trunk revision r489707.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

    Status: Patch Available  (was: Open)

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

    Status: Open  (was: Patch Available)

The patch produces javadoc warnings. Cancelling the patch.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Trevor Strohman (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Trevor Strohman updated HADOOP-525:
-----------------------------------

    Attachment: TypeBuilder.java

TypeBuilder class used in another MapReduce implementation.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder.java


[jira] Work started: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Work on HADOOP-525 started by Milind Bhandarkar.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

    Attachment: raw-comparators.patch

Patch attached. Added raw comparators to Hadoop record I/O types. (Also fixes compareTo bugs, so that raw comparator semantics match compareTo semantics for generated types.)


> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Doug Cutting updated HADOOP-525:
--------------------------------

    Fix Version/s: 0.8.0
                       (was: 0.7.0)

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.8.0


[jira] Commented: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461824 ] 

Owen O'Malley commented on HADOOP-525:
--------------------------------------

Why are the generated files being checked into svn? It would be a much better test if they were generated, wouldn't it?


> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

           Status: Patch Available  (was: In Progress)
    Fix Version/s: 0.10.0

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Commented: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461831 ] 

Milind Bhandarkar commented on HADOOP-525:
------------------------------------------

Yes, that would be a better test. I will file a separate bug for that, though.


> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Trevor Strohman (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Trevor Strohman updated HADOOP-525:
-----------------------------------

    Attachment: TypeBuilder-support.tar

Support code for TypeBuilder.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Commented: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-525?page=comments#action_12460605 ] 
            
Hadoop QA commented on HADOOP-525:
----------------------------------

-1, because the javadoc command appears to have generated warning messages when testing the latest attachment (http://issues.apache.org/jira/secure/attachment/12347780/raw-comparators.patch) against trunk revision r489707. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

    Attachment: raw-comparators.patch

Attached patch that does not emit javadoc warnings.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Owen O'Malley updated HADOOP-525:
---------------------------------

    Fix Version/s: 0.9.0
                       (was: 0.8.0)

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.9.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Doug Cutting updated HADOOP-525:
--------------------------------

    Priority: Major  (was: Minor)

A raw comparator shouldn't have to deserialize fields, but should operate directly on the field data.  For primitive fields we'd generate calls to methods like WritableComparator.{readInt,readLong,...}.  For Text, we'd generate calls to WritableComparator.compareBytes().  For complex objects we'd generate calls to their raw comparator.

Besides having a huge performance benefit, adding raw comparators to records would solve other problems with Hadoop's io framework: currently it is possible for raw and cooked comparators to differ.  But if both are auto-generated from the same source they'll be guaranteed compatible.  Also, raw comparators are fragile and difficult to develop, since they bypass all type mechanisms.  Generated code would ensure correctness.

I've increased the priority of this issue.  We should implement this and start using records more extensively.  Previously we've mostly thought of records as an aid for interoperability with other programming languages, but I think they'll also be valuable for performance and correctness.
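
The byte-level approach described in this comment can be sketched roughly as follows. This is illustrative Java, not the generated code: readInt and compareBytes are local stand-ins named after the WritableComparator methods mentioned above (their signatures here are assumptions), and the fixed-width layout (4-byte id, 4-byte length, UTF-8 bytes) is assumed for the example rather than taken from the record I/O encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical raw comparator for an (int id, String name) record. */
final class RawRecordComparator {

    /** Reads a big-endian 4-byte int straight out of the buffer. */
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             |  (b[off + 3] & 0xff);
    }

    /** Lexicographic comparison of two byte ranges, treating bytes as unsigned. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2; // the shorter range sorts first when it is a prefix
    }

    /** Compares two serialized records without creating any record or String objects. */
    static int compare(byte[] a, byte[] b) {
        // Primitive member: read the ints in place.
        int idCmp = Integer.compare(readInt(a, 0), readInt(b, 0));
        if (idCmp != 0) {
            return idCmp;
        }
        // Text member: compare the UTF-8 bytes directly.
        return compareBytes(a, 8, readInt(a, 4), b, 8, readInt(b, 4));
    }

    /** Test helper that writes the assumed layout. */
    static byte[] serialize(int id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + utf8.length);
        buf.putInt(id).putInt(utf8.length).put(utf8);
        return buf.array();
    }
}
```

One caveat on the compatibility point: unsigned byte order over UTF-8 follows code point order, which can differ from String.compareTo for characters outside the Basic Multilingual Plane. That is one way a hand-written raw comparator and a cooked comparator can drift apart, and one reason to generate both from the same definition.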

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Trevor Strohman (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Trevor Strohman updated HADOOP-525:
-----------------------------------

    Attachment: WordCountType.java

Example Type class autogenerated by TypeBuilder.

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder.java, WordCountType.java


[jira] Commented: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Trevor Strohman (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-525?page=comments#action_12441544 ] 
            
Trevor Strohman commented on HADOOP-525:
----------------------------------------

I just added three files (TypeBuilder.java, WordCountType.java, and TypeBuilder-support.tar) to this issue based on the thread referenced below.  This code comes from my own MapReduce implementation and is not directly Hadoop-compatible.  It handles automatic generation of record comparators, hash functions, and serialization code.  The serialization code in particular uses knowledge of object order to compress the output.

Reference: http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200610.mbox/%3c452D1CA7.40000@apache.org%3e

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java


        

[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Milind Bhandarkar updated HADOOP-525:
-------------------------------------

    Attachment:     (was: raw-comparators.patch)

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java
>


        

[jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-525:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Milind!

> Need raw comparators for hadoop record types
> --------------------------------------------
>
>                 Key: HADOOP-525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.10.0
>
>         Attachments: raw-comparators.patch, TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java
>
