Posted to java-user@lucene.apache.org by James Huang <me...@yahoo.com> on 2005/09/17 22:10:35 UTC

Sort by relevance+distance

Hi,

I can sort the search results by distance now, but
the relevance is lost.

I'd like to have the results sorted by relevance plus
distance, i.e., relevance first and, for results of
similar relevance, by distance. How can I do that?

Thanks a lot in advance!
-James
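One way to read "relevance first; for results of similar relevance, order by distance" is to round each score down into a coarse bucket, sort by bucket (highest first), and then by distance (nearest first). The sketch below is plain Java and does not use Lucene. The `Hit` class and the bucket width of 0.1 are assumptions made for illustration, not something from this thread:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BucketedSort {
    // Hypothetical hit: document id, relevance score, distance from the user.
    static final class Hit {
        final int id;
        final float score;
        final double distance;

        Hit(int id, float score, double distance) {
            this.id = id;
            this.score = score;
            this.distance = distance;
        }
    }

    // Scores falling into the same bucket are treated as "similar relevance".
    static long bucket(float score, double width) {
        return (long) Math.floor(score / width);
    }

    // Higher bucket first; within a bucket, nearest first.
    static List<Hit> sort(List<Hit> hits, double width) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingLong((Hit h) -> bucket(h.score, width)).reversed()
                .thenComparingDouble(h -> h.distance));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.95f, 50.0));
        hits.add(new Hit(2, 0.93f, 5.0));
        hits.add(new Hit(3, 0.50f, 1.0));
        for (Hit h : sort(hits, 0.1)) {
            System.out.println(h.id + " score=" + h.score + " distance=" + h.distance);
        }
    }
}
```

The bucket width is a tuning assumption: two scores just either side of a bucket boundary are treated as different even when they are close, which is the usual weakness of this kind of quantization.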


--- James Huang <me...@yahoo.com> wrote:

> Hi Otis,
>
> Thanks for your answer. I do have LIA (though not
> physically with me now), and I have the impression that
> the search ordering is predetermined (at index time);
> what I want is search-time ordering, e.g.,
>
> "I'm at (x,y) now and low on gas; find me the closest
> airports that can land a 747, closest first, please."
>
> I'll re-read the book/chapter tonight, but I look
> forward to any expert advice.
>
> Thanks,
> -James
>
> --- Otis Gospodnetic <ot...@yahoo.com> wrote:
>
> > Hi James,
> >
> > Check out the org.apache.lucene.search package; there
> > are several sort classes that will let you write a
> > custom sorter. If you have a copy of LIA, look at
> > chapter 6 for an example
> > (http://www.lucenebook.com/search?query=custom+sort+section%3A6*).
> >
> > Otis
> >
> > --- James Huang <me...@yahoo.com> wrote:
> >
> > > Suppose I have a book index with field="publisher",
> > > field="title", etc. If a user has bought Manning
> > > books, then I'd like to sort the results with Manning
> > > books listed first.
> > >
> > > In essence, I'm asking for parameterized custom
> > > sorting. Is there a way to do this?
> > >
> > > Thanks,
> > > -James
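The "parameterized custom sorting" asked about in the quoted question can be pictured as a comparator built per request from the user's preferences. This plain-Java sketch is an illustration only, not the Lucene sorting API Otis points to; the `Book` class, its fields and the preferred-publisher set are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreferredPublisherSort {
    // Hypothetical search hit from a book index.
    static final class Book {
        final String title;
        final String publisher;
        final float score;

        Book(String title, String publisher, float score) {
            this.title = title;
            this.publisher = publisher;
            this.score = score;
        }
    }

    // Built per request from the user's purchase history:
    // preferred publishers first, then highest score first.
    static Comparator<Book> forUser(Set<String> preferred) {
        Comparator<Book> preferredFirst =
                Comparator.comparing((Book b) -> !preferred.contains(b.publisher)); // false sorts before true
        Comparator<Book> byScoreDesc =
                Comparator.comparingDouble((Book b) -> b.score).reversed();
        return preferredFirst.thenComparing(byScoreDesc);
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>(Arrays.asList(
                new Book("A", "Manning", 0.4f),
                new Book("B", "O'Reilly", 0.9f),
                new Book("C", "Manning", 0.7f)));
        books.sort(forUser(new HashSet<>(Arrays.asList("Manning"))));
        for (Book b : books) {
            System.out.println(b.title + " (" + b.publisher + ", " + b.score + ")");
        }
    }
}
```

Because the comparator is built from a per-user set at search time, the ordering is not fixed at index time, which is the distinction drawn in the "Hi Otis" message.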


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sort by relevance+distance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 18, 2005, at 11:10 AM, James Huang wrote:

> Then this is not what I want. If I put FIELD_SCORE
> first, the distance sort will rarely kick in, because
> two scores are seldom exactly equal; that practically
> leaves distance sorting out of the picture.
>
> What I want is a "compound" score, i.e., to adjust the
> score based on the distance, like this:
>
>   score *= 1.0 - distance/200.0;
>
> This formula seems to work well for my situation. Is
> there a way to modify the score during search?

Sounds like you want a new type of Query subclass that weights each
document by its distance. Though I'm curious why just sorting by
distance isn't sufficient for your situation. Could you describe a
bit more about what you're doing?

     Erik




Re: Sort by relevance+distance

Posted by James Huang <me...@yahoo.com>.

--- Erik Hatcher <er...@ehatchersolutions.com> wrote:

> 
> On Sep 18, 2005, at 10:24 AM, James Huang wrote:
> 
> > --- Erik Hatcher <er...@ehatchersolutions.com> wrote:
> >
> >> Get back to using your DistanceComparatorSource, and
> >> couple that with a SortField.FIELD_SCORE, like this:
> >>
> >> Sort sort = new Sort(new SortField[] {
> >>     new SortField("location",
> >>         new DistanceComparatorSource(<whatever args you need>)),
> >>     SortField.FIELD_SCORE});
> >>
> >
> > Thanks!
> >
> > Does the order of these two fields matter? I mean,
> > with your code, would distance take precedence over
> > relevance? Anyway, I'll try it out and play with
> > ordering and such.
> 
> Yes, order matters - they sort in the order specified.
> Subsequent SortFields in the list are only used when
> prior ones are equivalent. In other words, when the
> distance is equal between two documents, they are then
> sorted by score.
> 
>      Erik
> 

Then this is not what I want. If I put FIELD_SCORE
first, the distance sort will rarely kick in, because
two scores are seldom exactly equal; that practically
leaves distance sorting out of the picture.

What I want is a "compound" score, i.e., to adjust the
score based on the distance, like this:

  score *= 1.0 - distance/200.0;

This formula seems to work well for my situation. Is
there a way to modify the score during search?

Thanks,

-James
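The compound score James describes can be sketched as a re-ranking step over hits that have already been collected, rather than as a change to Lucene's own scoring (which is what Erik's suggested Query subclass would do). Note that `1.0 - distance/200.0` reaches 0 at a distance of 200 and turns negative beyond it; the clamp at 0 below is an added assumption, as are the `Hit` class and the constant name `MAX_DISTANCE`:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DistanceWeightedScore {
    // Distance at which the weight reaches zero (the 200.0 in the formula).
    static final double MAX_DISTANCE = 200.0;

    // Hypothetical hit: document id, raw relevance score, distance from the user.
    static final class Hit {
        final int id;
        final float score;
        final double distance;

        Hit(int id, float score, double distance) {
            this.id = id;
            this.score = score;
            this.distance = distance;
        }
    }

    // score * (1 - distance/200), with the weight clamped at 0 so hits
    // beyond MAX_DISTANCE get a score of 0 instead of a negative one.
    static double adjusted(float score, double distance) {
        double weight = Math.max(0.0, 1.0 - distance / MAX_DISTANCE);
        return score * weight;
    }

    // Highest adjusted score first.
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> adjusted(h.score, h.distance))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.9f, 150.0)); // weight 0.25
        hits.add(new Hit(2, 0.6f, 20.0));  // weight 0.9
        hits.add(new Hit(3, 1.0f, 250.0)); // beyond 200: weight 0
        for (Hit h : rerank(hits)) {
            System.out.println(h.id + " -> " + adjusted(h.score, h.distance));
        }
    }
}
```

Because the weight is multiplicative, a hit with a high raw score can still fall below a weaker but much closer hit; how strongly that happens depends entirely on the 200.0 cutoff.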



Re: Sort by relevance+distance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 18, 2005, at 10:24 AM, James Huang wrote:

> --- Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
>
>> Get back to using your DistanceComparatorSource, and
>> couple that with a SortField.FIELD_SCORE, like this:
>>
>> Sort sort = new Sort(new SortField[] {
>>     new SortField("location",
>>         new DistanceComparatorSource(<whatever args you need>)),
>>     SortField.FIELD_SCORE});
>>
>
> Thanks!
>
> Does the order of these two fields matter? I mean,
> with your code, would distance take precedence over
> relevance? Anyway, I'll try it out and play with
> ordering and such.

Yes, order matters - they sort in the order specified. Subsequent
SortFields in the list are only used when prior ones are
equivalent. In other words, when the distance is equal between two
documents, they are then sorted by score.

     Erik
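The rule Erik describes, where a later SortField is consulted only when all earlier ones compare as equal, works like a chained comparator. This is a plain-Java sketch of the distance-then-score order, not Lucene's implementation; the `Hit` class is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DistanceThenScore {
    // Hypothetical hit: document id, relevance score, distance from the user.
    static final class Hit {
        final int id;
        final float score;
        final double distance;

        Hit(int id, float score, double distance) {
            this.id = id;
            this.score = score;
            this.distance = distance;
        }
    }

    // Primary key: distance, nearest first.
    // Secondary key: score, highest first; consulted only when distances are equal.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.distance)
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    static List<Hit> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 0.2f, 10.0));
        hits.add(new Hit(2, 0.9f, 10.0));
        hits.add(new Hit(3, 0.1f, 5.0)); // nearest, so first despite the lowest score
        for (Hit h : sort(hits)) {
            System.out.println(h.id + " distance=" + h.distance + " score=" + h.score);
        }
    }
}
```

Swapping the two keys gives the score-first order; since floating-point scores are rarely exactly equal, the distance key would then almost never be reached, which is the objection James raises in his reply.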




Re: Sort by relevance+distance

Posted by James Huang <me...@yahoo.com>.
--- Erik Hatcher <er...@ehatchersolutions.com> wrote:

> Get back to using your DistanceComparatorSource, and
> couple that with a SortField.FIELD_SCORE, like this:
> 
> Sort sort = new Sort(new SortField[] {
>     new SortField("location",
>         new DistanceComparatorSource(<whatever args you need>)),
>     SortField.FIELD_SCORE});

Thanks!

Does the order of these two fields matter? I mean,
with your code, would distance take precedence over
relevance? Anyway, I'll try it out and play with
ordering and such.

-James

> 
>      Erik
> 




Re: Sort by relevance+distance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 17, 2005, at 7:00 PM, James Huang wrote:

> I use a custom collector:
>
[...]
>
> Then, use IndexSearcher.search(qry, collector);

So what happens if you get 10M results from a search?

> This seems to work. What I wish for is that sorting is
> done by the search engine itself, hoping for a better
> performance (and cleaner code).

And it can be done by Lucene itself...

> Previously, I have created a DistanceComparatorSource
> (similar to that in LIA-ch6); sorting by distance
> works but relevance is lost.

Get back to using your DistanceComparatorSource, and couple that with
a SortField.FIELD_SCORE, like this:

Sort sort = new Sort(new SortField[] {
    new SortField("location",
        new DistanceComparatorSource(<whatever args you need>)),
    SortField.FIELD_SCORE});

     Erik
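Erik's question about 10M results points at the collector James posted, which adds every hit to a TreeSet, so memory grows with the number of matches. A common alternative is to keep only the best n hits in a bounded min-heap. This is a plain-Java sketch using `java.util.PriorityQueue`; the `TopN` and `Hit` classes are hypothetical, and in a Lucene HitCollector the body of `collect` below would go into the collector's own `collect(int, float)` method:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    // Hypothetical collected hit: document id and (possibly distance-adjusted) score.
    static final class Hit {
        final int id;
        final double score;

        Hit(int id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap ordered by score: the weakest of the current top n sits at the head.
    private final PriorityQueue<Hit> heap;

    TopN(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
        this.heap = new PriorityQueue<>(n, Comparator.comparingDouble((Hit h) -> h.score));
    }

    // Called once per matching document; memory stays O(n) however many match.
    void collect(int id, double score) {
        if (heap.size() < n) {
            heap.add(new Hit(id, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest
            heap.add(new Hit(id, score));
        }
    }

    // The kept hits, best first.
    List<Hit> results() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        TopN top = new TopN(2);
        top.collect(1, 0.5);
        top.collect(2, 0.9);
        top.collect(3, 0.1);
        top.collect(4, 0.7);
        for (Hit h : top.results()) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Each new hit is compared only with the current weakest kept hit, so the heap never holds more than n entries regardless of the total hit count.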



Re: Sort by relevance+distance

Posted by James Huang <me...@yahoo.com>.
I use a custom collector:

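import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;

// Collects every hit, ordered by SearchResult's natural ordering
// (the distance-adjusted score). Location, SearchResult and
// getDistance() are defined elsewhere (not shown).
class ResultCollector extends HitCollector
{
  SortedSet set = new TreeSet();
  IndexSearcher searcher;
  Location me;

  ResultCollector(IndexSearcher searcher, Location me)
  {
    this.me = me;
    this.searcher = searcher;
  }

  public void collect(int id, float score) {
    try {
      // Load the stored zip code and compute the distance from "me".
      Document doc = searcher.doc(id);
      String zc = doc.get("zipcode");
      SearchResult sr = new SearchResult(
         score, zc, getDistance(me, zc));
      // The score in SearchResult is adjusted:
      // score *= 1.0 - distance/200.0;
      set.add(sr);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }

  // Copies up to result.length hits, starting at position
  // startindex, into result; returns the total hit count.
  int getResult(int startindex, SearchResult[] result)
  {
    Iterator iter = set.iterator();
    int idx = 0;
    for (int i = 0; iter.hasNext() && idx < result.length; ++i) {
      Object o = iter.next();
      if (i >= startindex)
        result[idx++] = (SearchResult) o;
    }
    return set.size();
  }
}

SearchResult implements Comparable.
Then, use IndexSearcher.search(qry, collector);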

This seems to work. What I wish for is that the sorting is
done by the search engine itself, hoping for better
performance (and cleaner code).

Previously, I created a DistanceComparatorSource
(similar to the one in LIA chapter 6); sorting by distance
works, but relevance is lost.

-James
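A pitfall with collecting into a `TreeSet`: it decides whether two elements are duplicates from `compareTo` (or its comparator), not from `equals`, so two documents whose adjusted scores compare as equal would collapse into one entry. Assuming `SearchResult.compareTo` looks only at the adjusted score (its code is not shown), this plain-Java sketch demonstrates the effect and one fix, a document-id tie-breaker; the `Result` class is a hypothetical stand-in for `SearchResult`:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TieSafeTreeSet {
    // Hypothetical stand-in for SearchResult: document id and adjusted score.
    static final class Result {
        final int docId;
        final double score;

        Result(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Highest score first; equal scores compare as 0, so TreeSet keeps only one of them.
    static final Comparator<Result> BY_SCORE_ONLY =
            Comparator.comparingDouble((Result r) -> r.score).reversed();

    // Same order, with the document id as a tie-breaker, so distinct documents never compare as 0.
    static final Comparator<Result> BY_SCORE_THEN_ID =
            BY_SCORE_ONLY.thenComparingInt(r -> r.docId);

    static int collectedCount(Comparator<Result> order, List<Result> results) {
        TreeSet<Result> set = new TreeSet<>(order);
        set.addAll(results);
        return set.size();
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result(1, 0.5), new Result(2, 0.5), new Result(3, 0.7));
        System.out.println(collectedCount(BY_SCORE_ONLY, results));    // prints 2
        System.out.println(collectedCount(BY_SCORE_THEN_ID, results)); // prints 3
    }
}
```

The same tie-breaker can go at the end of a `compareTo` method if `SearchResult` keeps using its natural ordering instead of a separate comparator.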



Re: Sort by relevance+distance

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 17, 2005, at 4:10 PM, James Huang wrote:

> Hi,
>
> I can sort the search results by distance now, but
> the relevance is lost.
>
> I'd like to have the results sorted by relevance plus
> distance, i.e., relevance first and, for results of
> similar relevance, by distance. How can I do that?

How are you currently sorting? You can use multiple sort fields
within a Sort.

     Erik






Re: Sort by relevance+distance

Posted by James Huang <me...@yahoo.com>.
I guess I can use HitCollector and implement my own
sorting, right?

Is there a better approach?

--- James Huang <me...@yahoo.com> wrote:

> Hi,
> 
> I can sort the search results by distance now, but
> the relevance is lost.
> 
> I'd like to have the results sorted by relevance plus
> distance, i.e., relevance first and, for results of
> similar relevance, by distance. How can I do that?
> 
> Thanks a lot in advance!
> -James