You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Shai Erera <se...@gmail.com> on 2007/12/06 10:52:39 UTC

Comparable ScoreDoc

Hi

Today ScoreDoc is not Comparable. That prevents applications that would like
to use it in Comparable data structures (such as priority queues), but still
use other Lucene's objects, like TopDocs, unless they create a
ComparableScoreDoc which extends ScoreDoc and implements Comparable. To make
ScoreDoc Comparable requires very minor changes:
- Add implements Comparable
- Add:
    public int compareTo(Object o) {
        ScoreDoc oScoreDoc = (ScoreDoc) o;
        float oScore = oScoreDoc.score;
        if (score == oScore) {
            return doc - oScoreDoc.doc;
        }
        return score - oScore < 0 ? -1 : 1;
    }

If you agree to do it, I'm willing to open an issue and provide a patch. It
shouldn't affect any current Lucene implementations.

Thanks,

Shai Erera

Re: Comparable ScoreDoc

Posted by Shai Erera <se...@gmail.com>.
BTW, HitQueue performs this comparison:
  protected final boolean lessThan(Object a, Object b) {
    ScoreDoc hitA = (ScoreDoc)a;
    ScoreDoc hitB = (ScoreDoc)b;
    if (hitA.score == hitB.score)
      return hitA.doc > hitB.doc;
    else
      return hitA.score < hitB.score;
  }
As you can see, the default sorting when using TopDocCollector is by score,
then by doc. So we kind of already made the choice for applications :-)

On Dec 8, 2007 8:55 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : In general I would agree that people may want different implementations
> for
> : compare(), but I hardly see that's the case for ScoreDoc. After all, you
> can
> : either compare it by score or by doc (at least now). I believe that
> since
> : most people use the TopDocsHitCollector, they prefer the
> compare-by-score
> : approach ...
>
> sure, but that's not all your suggested compareTo does ... it first
> compares by score and then does a secondary comparison by docId.
> some people might want docs added "more recently" to sort first instead,
> some might want docid left out of hte comparison all together.
>
> This is where Comparators are more useful then compareTo methods.  We can
> add *lots* of different static inner Comparator classes to ScoreDoc, but
> if we add any compareTo method it could wind up burning someone down the
> road.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
Regards,

Shai Erera

Re: Comparable ScoreDoc

Posted by Shai Erera <se...@gmail.com>.
I looked up the difference between Comparator and Comparable. I found this
post http://forum.java.sun.com/thread.jspa?threadID=522388&messageID=2500053from
Java's forum.
In short, Comparable is used for a natural ordering of the objects.
Comparator allows for custom comparisons of the same objects. In our
example, ScoreDoc can implement Comparable and compare using the score only.
That's a true ordering of ScoreDoc. If you don't like the second ordering
(in case of tie in scores), we can offer that as static Comparators inside
ScoreDoc.
For example (from the forum), String implements Comparable and also includes
several Comparator implementations. The Comparable implementation compares
Strings by their ASCII, while it offers static
String.CASE_INSENSITIVE_ORDERto compare the string in an insensitive
order.

To conclude, by implementing Comparable on ScoreDoc, applications can use it
as-is for natural ordering. I agree that the second ordering (by doc Id) may
not fit all applications, so that can be omitted from compareTo. In
addition, we can offer another Comparator: ScoreDoc.SCORE_THEN_DOC_ORDER (or
something similar), or leave it out altogether.
But having ScoreDoc implements Comparable can be a good service to many
applications. And ... it doesn't break anything for existing applications or
changes API.

On Dec 8, 2007 8:55 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : In general I would agree that people may want different implementations
> for
> : compare(), but I hardly see that's the case for ScoreDoc. After all, you
> can
> : either compare it by score or by doc (at least now). I believe that
> since
> : most people use the TopDocsHitCollector, they prefer the
> compare-by-score
> : approach ...
>
> sure, but that's not all your suggested compareTo does ... it first
> compares by score and then does a secondary comparison by docId.
> some people might want docs added "more recently" to sort first instead,
> some might want docid left out of hte comparison all together.
>
> This is where Comparators are more useful then compareTo methods.  We can
> add *lots* of different static inner Comparator classes to ScoreDoc, but
> if we add any compareTo method it could wind up burning someone down the
> road.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
Regards,

Shai Erera

Re: Comparable ScoreDoc

Posted by Chris Hostetter <ho...@fucit.org>.
: In general I would agree that people may want different implementations for
: compare(), but I hardly see that's the case for ScoreDoc. After all, you can
: either compare it by score or by doc (at least now). I believe that since
: most people use the TopDocsHitCollector, they prefer the compare-by-score
: approach ...

sure, but that's not all your suggested compareTo does ... it first 
compares by score and then does a secondary comparison by docId.  
some people might want docs added "more recently" to sort first instead, 
some might want docid left out of hte comparison all together.

This is where Comparators are more useful then compareTo methods.  We can 
add *lots* of different static inner Comparator classes to ScoreDoc, but 
if we add any compareTo method it could wind up burning someone down the 
road.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Comparable ScoreDoc

Posted by Shai Erera <se...@gmail.com>.
In general I would agree that people may want different implementations for
compare(), but I hardly see that's the case for ScoreDoc. After all, you can
either compare it by score or by doc (at least now). I believe that since
most people use the TopDocsHitCollector, they prefer the compare-by-score
approach ...

What about access to inner fields? Like I wrote, Comparable are
self-contained in the sense that they know how to compare themselves to the
same instances. However Comparators can only compare public variables.

On Dec 6, 2007 7:20 PM, Michael Busch <bu...@gmail.com> wrote:

> Shai Erera wrote:
>
> >
> > Comparators however have an advantage - in that specific case I could
> create
> > two Comparators: (1) compares by the score and then by doc (2) compares
> by
>
> That's why I hesitate to add the Comparable interface to ScoreDoc:
> Different people might want different implementations of compare(), and
> that makes it questionable which default compare() implementation we
> should commit. I think Comparator solves this problem quite nicely.
>
> -Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
Regards,

Shai Erera

Re: Comparable ScoreDoc

Posted by Michael Busch <bu...@gmail.com>.
Shai Erera wrote:

> 
> Comparators however have an advantage - in that specific case I could create
> two Comparators: (1) compares by the score and then by doc (2) compares by

That's why I hesitate to add the Comparable interface to ScoreDoc:
Different people might want different implementations of compare(), and
that makes it questionable which default compare() implementation we
should commit. I think Comparator solves this problem quite nicely.

-Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Comparable ScoreDoc

Posted by Shai Erera <se...@gmail.com>.
That is indeed one alternative. I think however that Comparable objects are
cleaner. It is evident by just looking at the code how they are compared.
Otherwise (with the Comparator approach), you need to pair objects with
comparators, and it's not always clear which comparator to use with each
object.

Also Comparators need access to fields of the object that you don't
necessarily want to expose. If the object implements its own compareTo, it
has access to all the internal fields that may affect the comparison (for
example, hash values).

Comparators however have an advantage - in that specific case I could create
two Comparators: (1) compares by the score and then by doc (2) compares by
the score only. That gives me some flexibility. But code-wise, I would still
need to create two additional classes, while in the Comparable approach I
could just extend ScoreDoc and override compareTo.

Shai

On Dec 6, 2007 12:01 PM, Michael Busch <bu...@gmail.com> wrote:

> Hi Shai,
>
> I think you don't have to subclass ScoreDoc. Can't you simply implement
> a Comparator and pass it in the data structure you need?
>
> E. g.:
> Arrays.sort(scoreDocs, new Comparator() {
>
> public int compare(Object o1, Object o2) {
>  ScoreDoc d1 = (ScoreDoc) o1;
>  ScoreDoc d2 = (ScoreDoc) o2;
>
>    if (d1.score == d2.score) {
>      return d1.doc - d2.doc;
>    }
>    return d1.score - d2.score < 0 ? -1 : 1;
>  }
> });
>
> -Michael
>
> Shai Erera wrote:
> > Hi
> >
> > Today ScoreDoc is not Comparable. That prevents applications that would
> like
> > to use it in Comparable data structures (such as priority queues), but
> still
> > use other Lucene's objects, like TopDocs, unless they create a
> > ComparableScoreDoc which extends ScoreDoc and implements Comparable. To
> make
> > ScoreDoc Comparable requires very minor changes:
> > - Add implements Comparable
> > - Add:
> >     public int compareTo(Object o) {
> >         ScoreDoc oScoreDoc = (ScoreDoc) o;
> >         float oScore = oScoreDoc.score;
> >         if (score == oScore) {
> >             return doc - oScoreDoc.doc;
> >         }
> >         return score - oScore < 0 ? -1 : 1;
> >     }
> >
> > If you agree to do it, I'm willing to open an issue and provide a patch.
> It
> > shouldn't affect any current Lucene implementations.
> >
> > Thanks,
> >
> > Shai Erera
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
Regards,

Shai Erera

Re: Comparable ScoreDoc

Posted by Michael Busch <bu...@gmail.com>.
Hi Shai,

I think you don't have to subclass ScoreDoc. Can't you simply implement
a Comparator and pass it in the data structure you need?

E. g.:
Arrays.sort(scoreDocs, new Comparator() {

public int compare(Object o1, Object o2) {
  ScoreDoc d1 = (ScoreDoc) o1;
  ScoreDoc d2 = (ScoreDoc) o2;

    if (d1.score == d2.score) {
      return d1.doc - d2.doc;
    }
    return d1.score - d2.score < 0 ? -1 : 1;		
  }
});

-Michael

Shai Erera wrote:
> Hi
> 
> Today ScoreDoc is not Comparable. That prevents applications that would like
> to use it in Comparable data structures (such as priority queues), but still
> use other Lucene's objects, like TopDocs, unless they create a
> ComparableScoreDoc which extends ScoreDoc and implements Comparable. To make
> ScoreDoc Comparable requires very minor changes:
> - Add implements Comparable
> - Add:
>     public int compareTo(Object o) {
>         ScoreDoc oScoreDoc = (ScoreDoc) o;
>         float oScore = oScoreDoc.score;
>         if (score == oScore) {
>             return doc - oScoreDoc.doc;
>         }
>         return score - oScore < 0 ? -1 : 1;
>     }
> 
> If you agree to do it, I'm willing to open an issue and provide a patch. It
> shouldn't affect any current Lucene implementations.
> 
> Thanks,
> 
> Shai Erera
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org