Posted to dev@hbase.apache.org by Matt Corgan <mc...@hotpads.com> on 2014/03/02 00:30:04 UTC

Re: Why doesn't KeyValue.equals/CellComparator compare the values?

Hmm, I don't think KeyValue.hashCode should be including the value.  I'm
surprised it hasn't turned up a bug, but maybe that's because there's
barely any code relying on it.  Looks like KeyValue.equals now farms out
the work to CellComparator, and maybe KeyValue.hashCode should do the same.
 Note that CellComparator.hashCode does not include the value.
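
For illustration, here is a minimal, self-contained sketch of the contract
problem and of the fix suggested above. It does not use the HBase classes:
CoordCell, its fields, and the hashIncludesValue switch are hypothetical
stand-ins, and the hash functions are assumptions that only mimic "value
included" versus "coordinates only".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class CoordinateEqualityDemo {

    /** Hypothetical stand-in for a cell: coordinates plus a value. */
    static class CoordCell {
        final String row, family, qualifier;
        final long timestamp;
        final byte type;
        final byte[] value;
        final boolean hashIncludesValue; // true mimics the inconsistent hashCode

        CoordCell(String row, String family, String qualifier, long timestamp,
                  byte type, byte[] value, boolean hashIncludesValue) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
            this.hashIncludesValue = hashIncludesValue;
        }

        /** Equality on coordinates only; the value is deliberately ignored. */
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CoordCell)) return false;
            CoordCell c = (CoordCell) o;
            return row.equals(c.row) && family.equals(c.family)
                    && qualifier.equals(c.qualifier)
                    && timestamp == c.timestamp && type == c.type;
        }

        /** Consistent with equals only when the value is left out. */
        @Override
        public int hashCode() {
            int h = Objects.hash(row, family, qualifier, timestamp, type);
            return hashIncludesValue ? 31 * h + Arrays.hashCode(value) : h;
        }
    }

    /** Adds two cells that differ only in value and returns the set size. */
    static int distinctCount(boolean hashIncludesValue) {
        Set<CoordCell> set = new HashSet<>();
        set.add(new CoordCell("r1", "f", "q", 1L, (byte) 4,
                new byte[] {1}, hashIncludesValue));
        set.add(new CoordCell("r1", "f", "q", 1L, (byte) 4,
                new byte[] {2}, hashIncludesValue));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("hash includes value: " + distinctCount(true));
        System.out.println("hash on coordinates: " + distinctCount(false));
    }
}
```

With the value in the hash, a HashSet can end up holding two cells that
equals() reports as equal. With the hash computed from the coordinates only,
the second add is treated as a duplicate.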


On Fri, Feb 28, 2014 at 10:20 AM, Cosmin Lehene <cl...@adobe.com> wrote:

> Thanks Matt, Stack,
>
> My question/comment was biased by the perspective of a co-processor
> implementation, but I guess it may well apply for HBase development.
> From that perspective you're both in HBase-land and Java-land.
>
> A collection of cells needs to be compared to another collection of cells
> (I'm doing a diff).
> Java collections will end up comparing individual objects for equality, so
> it boils down to a Cell object being equal to another Cell object. So from
> a Java/OO perspective the question is: are two cells with different values
> equal (i.e., can I swap them)?
>
> The HBase answer is indeed yes they are equal as long as row, family,
> qualifier, timestamp and type are the same.
>
> The Java answer, however, may be different (and hence the expectations of a
> developer), as in general it will be based on the known contract.
>
> And the general hashCode contract is:
>
> * If two objects are equal according to the equals(Object) method, then
> calling the hashCode method on each of the two objects must produce the
> same integer result.
>
>
>
> And the equals javadoc
>
> * Note that it is generally necessary to override the {@code hashCode}
>      * method whenever this method is overridden, so as to maintain the
>      * general contract for the {@code hashCode} method, which states
>      * that equal objects must have equal hash codes.
>
>
> But in our case, the object equality will pass but hash codes will be
> different (https://gist.github.com/clehene/9276434)
>
> It's obvious why the behavior is as it is in HBase, so rather than
> nitpicking, I wonder whether this could be made obvious as it may help
> avoid some unexpected behaviors :)
>
> Thanks,
> Cosmin
>
> On 2/27/14, 10:22 AM, "Stack" <st...@duboce.net> wrote:
>
> >On Wed, Feb 26, 2014 at 8:31 PM, Matt Corgan <mc...@hotpads.com> wrote:
> >....
> >
> >> But maybe one of the committers could add a sentence to emphasize that
> >> value is excluded.
> >>
> >>
> >We should underline that data is not considered when comparing Cells
> >(KeyValues).  Apart from the fact that it could make for some interesting
> >performance issues, the system isn't plumbed for dealing with coordinates
> >that differ in their value only.  (Rather, the mvcc/sequenceid is used
> >for splitting Cells whose coordinates are otherwise the same.)
> >
> >What was your expectation, mighty Cosmin?  What do you think HBase should
> >do with Cells that differ in value only?
> >
> >Thanks,
> >St.Ack
>
>

Re: Why doesn't KeyValue.equals/CellComparator compare the values?

Posted by Cosmin Lehene <cl...@adobe.com>.
So should there be a Jira for this?

This wouldn’t fully fix my concern though.
I wonder whether the “language” should make it more obvious when dealing
with coordinates (row, family, qualifier, ts) rather than values.

Cosmin
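
One possible reading of the "make it obvious in the language" idea, sketched
for the diff use case: wrap each cell in a key type whose name says what
equality covers. Everything here is hypothetical (SimpleCell,
CoordinatesAndValueKey, countMissing); none of it is an existing HBase API,
and the cell type field is left out for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class ExplicitCellKeysDemo {

    /** Hypothetical minimal cell; not the HBase Cell interface. */
    static final class SimpleCell {
        final String row, family, qualifier;
        final long timestamp;
        final byte[] value;

        SimpleCell(String row, String family, String qualifier,
                   long timestamp, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Key whose equality covers the coordinates AND the value. */
    static final class CoordinatesAndValueKey {
        final SimpleCell cell;

        CoordinatesAndValueKey(SimpleCell cell) {
            this.cell = cell;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CoordinatesAndValueKey)) return false;
            SimpleCell c = ((CoordinatesAndValueKey) o).cell;
            return cell.row.equals(c.row) && cell.family.equals(c.family)
                    && cell.qualifier.equals(c.qualifier)
                    && cell.timestamp == c.timestamp
                    && Arrays.equals(cell.value, c.value);
        }

        @Override
        public int hashCode() {
            // Built from the same fields that equals() compares.
            return 31 * Objects.hash(cell.row, cell.family, cell.qualifier,
                    cell.timestamp) + Arrays.hashCode(cell.value);
        }
    }

    /** Counts cells in left with no exact (coordinates and value) match in right. */
    static int countMissing(List<SimpleCell> left, List<SimpleCell> right) {
        Set<CoordinatesAndValueKey> seen = new HashSet<>();
        for (SimpleCell c : right) {
            seen.add(new CoordinatesAndValueKey(c));
        }
        int missing = 0;
        for (SimpleCell c : left) {
            if (!seen.contains(new CoordinatesAndValueKey(c))) missing++;
        }
        return missing;
    }

    public static void main(String[] args) {
        SimpleCell a = new SimpleCell("r1", "f", "q", 1L, new byte[] {1});
        SimpleCell b = new SimpleCell("r1", "f", "q", 1L, new byte[] {2});
        System.out.println(countMissing(Arrays.asList(a), Arrays.asList(b)));
    }
}
```

The diff code then never relies on the cell's own equals/hashCode, so the
choice between "coordinates" and "coordinates plus value" is visible at the
call site.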

On 3/1/14, 3:30 PM, "Matt Corgan" <mc...@hotpads.com> wrote:

>Hmm, I don't think KeyValue.hashCode should be including the value.  I'm
>surprised it hasn't turned up a bug, but maybe that's because there's
>barely any code relying on it.  Looks like KeyValue.equals now farms out
>the work to CellComparator, and maybe KeyValue.hashCode should do the
>same.
> Note that CellComparator.hashCode does not include the value.
>
>
>On Fri, Feb 28, 2014 at 10:20 AM, Cosmin Lehene <cl...@adobe.com> wrote:
>
>> Thanks Matt, Stack,
>>
>> My question/comment was biased by the perspective of a co-processor
>> implementation, but I guess it may well apply for HBase development.
>> From that perspective you're both in HBase-land and Java-land.
>>
>> A collection of cells needs to be compared to another collection of
>> cells (I'm doing a diff).
>> Java collections will end up comparing individual objects for equality,
>> so it boils down to a Cell object being equal to another Cell object.
>> So from a Java/OO perspective the question is: are two cells with
>> different values equal (i.e., can I swap them)?
>>
>> The HBase answer is indeed yes they are equal as long as row, family,
>> qualifier, timestamp and type are the same.
>>
>> The Java answer, however, may be different (and hence the expectations
>> of a developer), as in general it will be based on the known contract.
>>
>> And the general hashCode contract is:
>>
>> * If two objects are equal according to the equals(Object) method, then
>> calling the hashCode method on each of the two objects must produce the
>> same integer result.
>>
>>
>>
>> And the equals javadoc
>>
>> * Note that it is generally necessary to override the {@code hashCode}
>>      * method whenever this method is overridden, so as to maintain the
>>      * general contract for the {@code hashCode} method, which states
>>      * that equal objects must have equal hash codes.
>>
>>
>> But in our case, the object equality will pass but hash codes will be
>> different (https://gist.github.com/clehene/9276434)
>>
>> It's obvious why the behavior is as it is in HBase, so rather than
>> nitpicking, I wonder whether this could be made obvious as it may help
>> avoid some unexpected behaviors :)
>>
>> Thanks,
>> Cosmin
>>
>> On 2/27/14, 10:22 AM, "Stack" <st...@duboce.net> wrote:
>>
>> >On Wed, Feb 26, 2014 at 8:31 PM, Matt Corgan <mc...@hotpads.com> wrote:
>> >....
>> >
>> >> But maybe one of the committers could add a sentence to emphasize
>> >> that value is excluded.
>> >>
>> >>
>> >We should underline that data is not considered when comparing Cells
>> >(KeyValues).  Apart from the fact that it could make for some
>> >interesting performance issues, the system isn't plumbed for dealing
>> >with coordinates that differ in their value only.  (Rather, the
>> >mvcc/sequenceid is used for splitting Cells whose coordinates are
>> >otherwise the same.)
>> >
>> >What was your expectation, mighty Cosmin?  What do you think HBase
>> >should do with Cells that differ in value only?
>> >
>> >Thanks,
>> >St.Ack
>>
>>