You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2010/01/08 10:28:12 UTC

compareTo() issue

I see some compareTo() methods with logic like this --

int a = object1.foo();
int b = object2.foo();
if (a == b) {
  return 1; // order randomly
} else {
  return a - b;
}

Three problems here:
- This does not produce a random ordering when used with a sort; it's
quite deterministic
- This violates the contract of compareTo() -- true that
a.compareTo(b) = -b.compareTo(a)
- "a-b" can overflow and give the wrong sign

Mind if I fix?

There's still code going in with lots of variances from what I think
are our agreed standards too.

Sean

Re: compareTo() issue

Posted by Sean Owen <sr...@gmail.com>.
If the placement doesn't matter, why is returning 0 a problem? I'm
just wondering if this doesn't introduce some subtle bugs in the way
that not implementing hashCode/equals does. it may happen to work here
but later...

The overflow problem is remote, but not trivial... can support be
large? like anywhere near a billion? then it's a possible bug.

On Fri, Jan 8, 2010 at 6:26 PM, Robin Anil <ro...@gmail.com> wrote:
> That one was specifc to ordering of sub patterns in Fpgrowth stage. I did
> that as an optimisation where in the object needs to be at a random place in
> the heap if they are of equal length and support. Since it is the most
> called function in the entire algorithm, I got some performance benefit from
> it, because there were many patterns of the same length and support by
> differnet pattern underneath. Plus there is no need of comparing the two
> arrays which would be expensive.  If the compareTo contract has to be
> maintained, I will move it out to another class and use it as the comparator
> during heap initialization.
>
>
> public int compareTo(Pattern cr2) {
>    long support2 = cr2.support();
>    int length2 = cr2.length();
>    if (support == support2) {
>      if (length == length2) {
>        // if they are of same length and support order randomly
>        return 1;
>      } else {
>        return length - length2;
>      }
>    } else {
>      if (support > support2) {
>        return 1;
>      } else {
>        return -1;
>      }
>    }
>  }
>
> On Fri, Jan 8, 2010 at 2:58 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> I see some compareTo() methods with logic like this --
>>
>> int a = object1.foo();
>> int b = object2.foo();
>> if (a == b) {
>>  return 1; // order randomly
>> } else {
>>  return a - b;
>> }
>>
>> Three problems here:
>> - This does not produce a random ordering when used with a sort; it's
>> quite deterministic
>> - This violates the contract of compareTo() -- true that
>> a.compareTo(b) = -b.compareTo(a)
>> - "a-b" can overflow and give the wrong sign
>>
>> Mind if I fix?
>>
>> There's still code going in with lots of variances from what I think
>> are our agreed standards too.
>>
>> Sean
>>
>

Re: compareTo() issue

Posted by Robin Anil <ro...@gmail.com>.
That one was specifc to ordering of sub patterns in Fpgrowth stage. I did
that as an optimisation where in the object needs to be at a random place in
the heap if they are of equal length and support. Since it is the most
called function in the entire algorithm, I got some performance benefit from
it, because there were many patterns of the same length and support by
differnet pattern underneath. Plus there is no need of comparing the two
arrays which would be expensive.  If the compareTo contract has to be
maintained, I will move it out to another class and use it as the comparator
during heap initialization.


public int compareTo(Pattern cr2) {
    long support2 = cr2.support();
    int length2 = cr2.length();
    if (support == support2) {
      if (length == length2) {
        // if they are of same length and support order randomly
        return 1;
      } else {
        return length - length2;
      }
    } else {
      if (support > support2) {
        return 1;
      } else {
        return -1;
      }
    }
  }

On Fri, Jan 8, 2010 at 2:58 PM, Sean Owen <sr...@gmail.com> wrote:

> I see some compareTo() methods with logic like this --
>
> int a = object1.foo();
> int b = object2.foo();
> if (a == b) {
>  return 1; // order randomly
> } else {
>  return a - b;
> }
>
> Three problems here:
> - This does not produce a random ordering when used with a sort; it's
> quite deterministic
> - This violates the contract of compareTo() -- true that
> a.compareTo(b) = -b.compareTo(a)
> - "a-b" can overflow and give the wrong sign
>
> Mind if I fix?
>
> There's still code going in with lots of variances from what I think
> are our agreed standards too.
>
> Sean
>