Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2008/05/13 18:31:55 UTC

[jira] Created: (HADOOP-3380) need comparators in serializer framework

need comparators in serializer framework
----------------------------------------

                 Key: HADOOP-3380
                 URL: https://issues.apache.org/jira/browse/HADOOP-3380
             Project: Hadoop Core
          Issue Type: New Feature
          Components: io
            Reporter: Doug Cutting


The new serialization framework permits Hadoop to incorporate different serialization systems, including Hadoop's Writable, Thrift, Java Serialization, etc.  It provides a generic, extensible means (SerializationFactory) to create serializers and deserializers for arbitrary Java classes.  However, it does not include a generic means to create comparators for these classes.  Comparators are required for MapReduce keys and many other computations.  Thus we should enhance the serialization framework to provide comparators too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599451#action_12599451 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

> serializationFactory.getSerialization(IntWritable) equals serializationFactory.getSerialization(DoubleWritable)

No, that's not required and not the case with the current implementation.

> Anyway, aside from this, what is the benefit of tying getComparator() to serialization [...]

A RawComparator compares serialized data, not objects.
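
To illustrate the distinction, here is a minimal sketch, assuming keys are serialized as 4-byte big-endian ints (the layout DataOutput#writeInt produces). The class and its helper methods are hypothetical and not part of Hadoop; only the shape of compare() mirrors RawComparator. It orders two values straight from their byte ranges, without creating any key objects.

```java
// Hypothetical stand-in for a raw comparator over 4-byte big-endian ints.
// Not Hadoop's API; it only illustrates comparing serialized bytes directly.
final class RawIntComparatorSketch {

  // Decode a big-endian int from the four bytes starting at off.
  static int readInt(byte[] b, int off) {
    return ((b[off] & 0xff) << 24)
         | ((b[off + 1] & 0xff) << 16)
         | ((b[off + 2] & 0xff) << 8)
         |  (b[off + 3] & 0xff);
  }

  // Same shape as RawComparator#compare: two byte ranges in, ordering out.
  static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return Integer.compare(readInt(b1, s1), readInt(b2, s2));
  }

  // Serialize an int the way DataOutput#writeInt does (big-endian).
  static byte[] toBytes(int v) {
    return new byte[] {
      (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v
    };
  }
}
```

Because the byte layout is fixed by the serialization, a comparator like this is only valid for data written by that serialization, which is the coupling being argued for here.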




[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598348#action_12598348 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

> Note that getComparator() takes the class as an argument [ ... ]

That makes sense, but I had assumed that the Serialization could keep a pointer to the class and then use that in its implementation of getComparator() to look up a registered comparator.  Applications would not need to pass a class to Serialization#getComparator(), since Serialization is already parameterized by class.
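
A rough sketch of that idea, under stated assumptions: the class below is hypothetical (it is not Hadoop's Serialization), and java.util.Comparator over byte[] stands in for RawComparator. The instance remembers the class it was created for, so getComparator() needs no argument.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a serialization object bound to one key class.
final class BoundSerializationSketch<T> {
  // Stand-in registry of comparators keyed by class.
  private static final Map<Class<?>, Comparator<byte[]>> REGISTRY = new HashMap<>();

  private final Class<T> keyClass;

  BoundSerializationSketch(Class<T> keyClass) {
    this.keyClass = keyClass;
  }

  static void register(Class<?> c, Comparator<byte[]> cmp) {
    REGISTRY.put(c, cmp);
  }

  // No class argument: the instance already knows which class it serves.
  // Returns null when nothing has been registered for that class.
  Comparator<byte[]> getComparator() {
    return REGISTRY.get(keyClass);
  }
}
```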




[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598204#action_12598204 ] 

Owen O'Malley commented on HADOOP-3380:
---------------------------------------

Enis,
   I think the fact that the raw comparators depend on the serialization used is precisely why Doug wants to put it there. It seemed wrong to me at first, but it is growing on me. One very unfortunate part of using raw comparators is that Hadoop doesn't have a reasonable story if the user wants to do object-based compares. (Ignoring the utterly non-performant raw comparator that deserializes the two keys and calls compare on them.)
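
For reference, the slow fallback mentioned in the parenthesis looks roughly like the sketch below. The class name is hypothetical and a java.util.function.Function stands in for a Deserializer; this is not Hadoop's DeserializerComparator, only an illustration of why it is costly: every comparison decodes both keys.

```java
import java.util.function.Function;

// Hypothetical sketch of the slow fallback: turn both byte arrays back into
// objects, then compare the objects. The Function stands in for a Deserializer.
final class DeserializingComparatorSketch<T extends Comparable<T>> {
  private final Function<byte[], T> deserializer;

  DeserializingComparatorSketch(Function<byte[], T> deserializer) {
    this.deserializer = deserializer;
  }

  int compare(byte[] b1, byte[] b2) {
    T o1 = deserializer.apply(b1);  // decoding and allocation on every call
    T o2 = deserializer.apply(b2);
    return o1.compareTo(o2);
  }
}
```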



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596505#action_12596505 ] 

Chris K Wensel commented on HADOOP-3380:
----------------------------------------

Ok, I think I get the gist of the new Serialization stuff. 

If I'm correct, I only would need RawComparator to be Configurable. And I can continue to override the comparator via JobConf. I don't see that I would need to implement Serialization, since Writable is suitable...



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596429#action_12596429 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

A simple way to add comparators would be to add the method:

RawComparator Serialization#getComparator();

Serialization is an interface, so this would be an incompatible change.  We should make Serialization an abstract class at the same time, so that we can modify it further in the future without breaking implementations.

We should then implement this method in WritableSerialization and JavaSerialization, the existing Serialization implementations.
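
A minimal sketch of that shape, with loud caveats: all three types below are hypothetical stand-ins (the real Serialization also has serializer and deserializer methods, omitted here), and the bytewise subclass is only an example implementation, not WritableSerialization or JavaSerialization.

```java
// Hypothetical stand-in for org.apache.hadoop.io.RawComparator.
interface RawComparatorSketch {
  int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Sketch of Serialization as an abstract class rather than an interface.
// Methods added later can ship with a default body, so existing subclasses
// keep compiling. Serializer and deserializer accessors are omitted.
abstract class SerializationSketch<T> {
  abstract boolean accept(Class<?> c);

  // The proposed addition.
  abstract RawComparatorSketch getComparator();
}

// Example subclass: orders serialized values lexicographically as unsigned bytes.
final class BytewiseSerializationSketch extends SerializationSketch<byte[]> {
  @Override
  boolean accept(Class<?> c) {
    return c == byte[].class;
  }

  @Override
  RawComparatorSketch getComparator() {
    return (b1, s1, l1, b2, s2, l2) -> {
      int n = Math.min(l1, l2);
      for (int i = 0; i < n; i++) {
        int x = b1[s1 + i] & 0xff;
        int y = b2[s2 + i] & 0xff;
        if (x != y) {
          return x - y;
        }
      }
      return l1 - l2;  // on a shared prefix, the shorter range sorts first
    };
  }
}
```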



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596730#action_12596730 ] 

Enis Soztutar commented on HADOOP-3380:
---------------------------------------

With the introduction of the serialization framework, the premise behind RawComparator is somewhat broken. 
In theory, an object of some type (for example Double) can be serialized to its byte[] form in arbitrary ways by different serializers, so it is not possible to efficiently compare two byte arrays without actually deserializing the objects. Some objects, especially Writables, do know precisely how they are serialized and can thus benefit from raw byte comparison (in short, we should keep RawComparator). 
Similarly, the RawComparators returned by Serialization#getComparator() cannot do much except deserialize the objects and call {{o1.compareTo(o2)}} (see {{DeserializerComparator}} and {{JavaSerializationComparator}}). 

I think we should 
# not change the Serialization interface 
# introduce DefaultComparator extending DeserializerComparator, implementing Configurable, and with static {{register(Class, RawComparator)}} and {{get(Class)}} methods. 
DefaultComparator.get(Class keyClass) should check for a Comparator instance registered for the given class; if there is none, it should return itself, obtaining a Deserializer by calling serializationFactory.getDeserializer(c); 
# replace usages of WritableComparator#define() with DefaultComparator#register() 
# make WritableComparator extend DefaultComparator 
# fix JobConf#getOutputValueGroupingComparator(), so that it uses DefaultComparator 
# deprecate JavaSerializationComparator (since it is not needed once we have DefaultComparator extending DeserializerComparator)

Thoughts?
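
The register/get lookup with a fallback can be sketched as follows. This is a simplified illustration, not the attached patch: the names are hypothetical, the comparator interface is reduced to whole byte arrays, and the fallback is passed in explicitly instead of being built from a SerializationFactory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for RawComparator, reduced to whole byte arrays.
interface ByteComparatorSketch {
  int compare(byte[] a, byte[] b);
}

// Sketch of the proposed DefaultComparator: a static registry with a fallback.
final class DefaultComparatorSketch {
  private static final Map<Class<?>, ByteComparatorSketch> REGISTERED =
      new ConcurrentHashMap<>();

  private DefaultComparatorSketch() {}

  static void register(Class<?> c, ByteComparatorSketch cmp) {
    REGISTERED.put(c, cmp);
  }

  // Return the comparator registered for c if there is one; otherwise return
  // the fallback (in the proposal, a deserializing comparator).
  static ByteComparatorSketch get(Class<?> c, ByteComparatorSketch fallback) {
    ByteComparatorSketch cmp = REGISTERED.get(c);
    return cmp != null ? cmp : fallback;
  }
}
```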



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596490#action_12596490 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

> those methods should both be altered to return RawComparator, not a WritableComparator, no?

Oops.  They already have been.  Nevermind!



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596474#action_12596474 ] 

Chris K Wensel commented on HADOOP-3380:
----------------------------------------

Ok, last comment makes good sense.

What's the relationship between this proposal and JobConf#getOutputValueGroupingComparator() and JobConf#getOutputKeyComparator()?  Is it just a replacement of WritableComparator.get(getMapOutputKeyClass())?



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596469#action_12596469 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

Under my proposal above, one would create a comparator with:

RawComparator c = new SerializationFactory(conf).getSerialization(MyKey.class).getComparator();

So a configuration would be involved, and a serialization framework could in theory support configurable comparators.  On the other hand, doing so efficiently might be hard.  One could, e.g., implement JavaSerialization#getComparator() to read a configuration parameter that names a list of fields and use introspection to order things by those fields.  Ideally it would generate comparator code and compile it on the fly, but that's a lot of work.  Record IO provides a single generated comparator that's efficient but not parameterized.  Thrift doesn't (yet) even generate comparators!  Ideally IDL-generated serializers might generate a general-purpose parameterized comparator, e.g., compare(int[] fieldIds), where {1,-3} might mean to order by increasing values of the first field and decreasing values of the third.
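
As a rough illustration of the compare(int[] fieldIds) idea, the sketch below orders records by a list of signed, 1-based field ids, so that {1,-3} means ascending on the first field, then descending on the third. Everything here is hypothetical: records are modeled as plain int arrays, whereas a generated comparator would read fields from serialized bytes.

```java
// Hypothetical sketch of a parameterized comparator driven by signed field ids.
// Records are modeled as int arrays; a real version would read serialized fields.
final class FieldOrderComparatorSketch {
  private final int[] fieldIds;  // 1-based; a negative id means descending

  FieldOrderComparatorSketch(int... fieldIds) {
    this.fieldIds = fieldIds.clone();
  }

  int compare(int[] r1, int[] r2) {
    for (int id : fieldIds) {
      int idx = Math.abs(id) - 1;
      int c = Integer.compare(r1[idx], r2[idx]);
      if (c != 0) {
        return id > 0 ? c : -c;  // flip the result for descending fields
      }
    }
    return 0;  // equal on every listed field
  }
}
```

Fields that are not listed, such as the second field under {1,-3}, do not influence the ordering at all.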

For text input (e.g., tab-separated), one could easily write a configurable comparator.  We could use the serialization framework to associate a Serialization for String that does that.  Would that suffice for now?
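
A configurable comparator for tab-separated text might look like the sketch below. It is an assumption-laden illustration: java.util.Properties stands in for Hadoop's Configuration, the property key is invented, and it compares Strings rather than raw bytes to keep the example short.

```java
import java.util.Comparator;
import java.util.Properties;

// Hypothetical sketch: order tab-separated lines by one configurable column.
// Properties stands in for Hadoop's Configuration; the key name is invented.
final class TabFieldComparatorSketch implements Comparator<String> {
  static final String COLUMN_KEY = "sketch.compare.column";  // hypothetical key

  private final int column;  // 0-based column index, defaults to 0

  TabFieldComparatorSketch(Properties conf) {
    this.column = Integer.parseInt(conf.getProperty(COLUMN_KEY, "0"));
  }

  @Override
  public int compare(String line1, String line2) {
    return field(line1).compareTo(field(line2));
  }

  // A line with too few columns compares as the empty string.
  private String field(String line) {
    String[] parts = line.split("\t", -1);
    return column < parts.length ? parts[column] : "";
  }
}
```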



[jira] Updated: (HADOOP-3380) need comparators in serializer framework

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HADOOP-3380:
----------------------------------

    Attachment: comparator_wip1.patch



[jira] Updated: (HADOOP-3380) need comparators in serializer framework

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar updated HADOOP-3380:
----------------------------------

    Attachment: comparator_wip1.patch

bq. I think the fact that the raw comparators depend on the serialization used is precisely why Doug wants to put it there. 
Yes, but my point is that the *default* comparator returned by some serialization cannot do much except deserialize the objects and call compareTo on them. Is this assumption not correct? In either case, the developer has to write their own comparator for a specific class, under a known serialization.  

If we want to allow different raw comparators for different serializations (of the same class), then we may define the API like:
{code}
RawComparator c = new SerializationFactory(conf).getSerialization(MyKey.class).getComparator(MyKey.class);
{code}
Note that getComparator() takes the class as an argument so that it can return a registered comparator for that class, if any. If there is none, it can return the default (deserializing) comparator. 

If we do not want to allow different raw comparators, then wouldn't the attached (half-baked) patch be enough? 



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596453#action_12596453 ] 

Chris K Wensel commented on HADOOP-3380:
----------------------------------------

It would be useful for the RawComparator to be Configurable (or something similar) so that it can be configured at runtime.

This would be very useful for multi-part keys that need to arbitrarily sort on different positions.



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Chris K Wensel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596492#action_12596492 ] 

Chris K Wensel commented on HADOOP-3380:
----------------------------------------

> BTW, those methods should both be altered to return RawComparator, not a WritableComparator, no?

I expect so.

Consider a key of type Tuple (a ComparableWritable type) that holds an arbitrary list of ComparableWritable instances. 

If I want fine grained ability to compare/sort these keys based on a runtime configuration, I think I would be happy with providing a Configurable RawComparator class to the JobConf during job setup.

Or are you suggesting that best practice is to register a new TupleSerialization (that could subclass WritableSerialization and return my fancy TupleComparator)? 

Or should I have a TupleSerialization decorator that delegates to a configurable 'base' Serialization (Text, Thrift, Writable, JSON, etc) but overrides Serialization#getComparator()?

Sorry, just trying to wrap my head around the proposed changes and their implications... I still need to poke around and see the relationship with FileInput/OutputFormat classes...





[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596484#action_12596484 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

> What's the relationship between this proposal and JobConf#getOutputValueGroupingComparator() and JobConf#getOutputKeyComparator()?

Those are ways to override the "natural" (or default) comparator under MapReduce.  This proposal is about defining the natural comparator.  If we had a good configurable comparator, then we perhaps wouldn't need those methods, but I'm not sure...  The framework might set io.comparator.context=grouping, and then the configurable comparator implementation could use this to decide to use the user-specified value of io.record.compare.grouping or somesuch.  Yuck!

BTW, those methods should both be altered to return RawComparator, not a WritableComparator, no?




[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596470#action_12596470 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

> So a configuration would be involved [...]

That was a little glib.  We need to make Serialization a Configurable, so that SerializationFactory can pass the configuration down to the Serialization implementation, which can then use it as it pleases.  I think we ought to do this too, as a part of this issue, in order to support configurable comparators, etc.
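
A minimal sketch of that wiring, assuming nothing about the eventual patch: all names below are hypothetical, java.util.Properties stands in for Configuration, and the property key is invented. The point is only that the factory hands its configuration to each serialization it returns.

```java
import java.util.Properties;

// Hypothetical stand-ins: Properties plays the role of Configuration, and
// ConfigurableSketch the role of Hadoop's Configurable interface.
interface ConfigurableSketch {
  void setConf(Properties conf);

  Properties getConf();
}

// A serialization that remembers its configuration for later use,
// for example when building a comparator.
final class ConfiguredSerializationSketch implements ConfigurableSketch {
  private Properties conf;

  @Override
  public void setConf(Properties conf) {
    this.conf = conf;
  }

  @Override
  public Properties getConf() {
    return conf;
  }

  // Reads an invented setting that a comparator could use for sort direction.
  boolean descending() {
    return Boolean.parseBoolean(conf.getProperty("sketch.compare.descending", "false"));
  }
}

// The factory passes its configuration down to every serialization it creates.
final class SerializationFactorySketch {
  private final Properties conf;

  SerializationFactorySketch(Properties conf) {
    this.conf = conf;
  }

  ConfiguredSerializationSketch getSerialization() {
    ConfiguredSerializationSketch s = new ConfiguredSerializationSketch();
    s.setConf(conf);
    return s;
  }
}
```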



[jira] Commented: (HADOOP-3380) need comparators in serializer framework

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599450#action_12599450 ] 

Enis Soztutar commented on HADOOP-3380:
---------------------------------------

I thought that an object of the Serialization class captures the semantics of a serialization abstraction (such as Writable). What I mean is that:

serializationFactory.getSerialization(IntWritable) equals serializationFactory.getSerialization(DoubleWritable)

In that case, we need to pass the class object to getComparator(). 

Anyway, aside from this, what is the benefit of tying getComparator() to the serialization instead of a standalone DefaultComparator#get(ClassName) method (as in the attached patch)?  
