Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2003/01/26 06:30:17 UTC
Re: Question: using boost for sorting
I think I'll try to find a place for your lucene_ext code somewhere in
Lucene Sandbox, what do you think?
Otis
--- Che Dong <ch...@hotmail.com> wrote:
> How about adding sortType to IndexSearcher first?
> Users can specify IndexSearcher.sortType (by score: default, by docID,
> by docID desc) before indexing.
>
> Che, Dong
>
> diff IndexSearcher.java ~/lucene-1.2-src/src/java/org/apache/lucene/search/IndexSearcher.java
>
> 66,81c66
> < /**
> <  * Implements search over a single IndexReader.
> <  *
> <  * user can customize search result sort behavior via <code>sortType</code>:
> <  * if data source sorted by some field before indexing docID can be take
> <  * as the alias to the sort field, so
> <  * search result sort by docID(or desc) equals to sort by field
> <  *
> <  * search results sort method:
> <  * 0: sort by score (default)
> <  * 1: sort by docID
> <  * -1: sort by docID desc
> <  *
> <  * @author Che, Dong <ch...@bigfoot.com>
> <  * $Header: /home/cvsroot/lucene_ext/src/org/apache/lucene/search/IndexSearcher.java,v 1.1.1.1 2002/09/22 19:36:08 chedong Exp $
> <  */
> ---
> > /** Implements search over a single IndexReader. */
> 83,89d67
> < /**
> <
> < */
> < public static final int ORDER_BY_SCORE = 0;
> < public static final int ORDER_BY_DOCID = 1;
> < public static final int ORDER_BY_DOCID_DESC = -1;
> < public int sortType = ORDER_BY_SCORE;
> 96c74
> <
> ---
> >
> 101c79
> <
> ---
> >
> 106c84
> <
> ---
> >
> 134,162c112,127
> <     final int md = reader.maxDoc();
> <
> <     scorer.score(new HitCollector()
> <       {
> <         private float minScore = 0.0f;
> <         public final void collect(int doc, float score) {
> <           if (score > 0.0f &&                     // ignore zeroed buckets
> <               (bits==null || bits.get(doc))) {    // skip docs not in bits
> <             totalHits[0]++;
> <             if (score >= minScore) {
> <               // update hit queue
> <               switch (sortType) {
> <                 case ORDER_BY_SCORE:      //sort results by score
> <                   hq.put(new ScoreDoc(doc, score));
> <                 case ORDER_BY_DOCID:      //sort results by docID
> <                   hq.put(new ScoreDoc(doc, doc));
> <                 case ORDER_BY_DOCID_DESC: //sort results by docID desc
> <                   hq.put(new ScoreDoc(doc, (md - doc) ));
> <                 default:                  //sort results by score(default)
> <                   hq.put(new ScoreDoc(doc, score));
> <               }
> <               if (hq.size() > nDocs) {            // if hit queue overfull
> <                 hq.pop();                         // remove lowest in hit queue
> <                 minScore = ((ScoreDoc)hq.top()).score; // reset minScore
> <               }
> <             }
> <           }
> <         }
> <       }, md);
> ---
> >     scorer.score(new HitCollector() {
> >         private float minScore = 0.0f;
> >         public final void collect(int doc, float score) {
> >           if (score > 0.0f &&                     // ignore zeroed buckets
> >               (bits==null || bits.get(doc))) {    // skip docs not in bits
> >             totalHits[0]++;
> >             if (score >= minScore) {
> >               hq.put(new ScoreDoc(doc, score));   // update hit queue
> >               if (hq.size() > nDocs) {            // if hit queue overfull
> >                 hq.pop();                         // remove lowest in hit queue
> >                 minScore = ((ScoreDoc)hq.top()).score; // reset minScore
> >               }
> >             }
> >           }
> >         }
> >       }, reader.maxDoc());
> 167c132
> <
> ---
> >
>
>
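A note on the patch quoted above: Java `switch` cases fall through unless each one ends with `break` (or `return`), so as posted a matching case would also run every case after it and put the same document into the hit queue more than once. The patch also tests the raw `score` against `minScore`, even though `minScore` is reset from a queue entry whose score may be a docID-based key. The sketch below shows one way the intended behaviour could look. It is not Che Dong's code: `SortTypeSketch`, `Hit`, `sortKey` and `topN` are hypothetical names, and a `java.util.PriorityQueue` stands in for Lucene's hit queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SortTypeSketch {
    public static final int ORDER_BY_SCORE = 0;
    public static final int ORDER_BY_DOCID = 1;
    public static final int ORDER_BY_DOCID_DESC = -1;

    /** Hypothetical stand-in for ScoreDoc: a doc id plus the key used for ranking. */
    public static final class Hit {
        public final int doc;
        public final float key;
        Hit(int doc, float key) { this.doc = doc; this.key = key; }
    }

    /** Picks exactly one ranking key per hit; each case returns, so nothing falls through. */
    public static float sortKey(int sortType, int doc, float score, int maxDoc) {
        switch (sortType) {
            case ORDER_BY_DOCID:
                return doc;              // same key as the patch: the docID itself
            case ORDER_BY_DOCID_DESC:
                return maxDoc - doc;     // same key as the patch: maxDoc - docID
            case ORDER_BY_SCORE:
            default:
                return score;
        }
    }

    /** Keeps the n hits with the largest keys and returns them, largest key first. */
    public static List<Hit> topN(int sortType, int[] docs, float[] scores, int maxDoc, int n) {
        // Min-heap on key, so the weakest hit kept so far sits at the head.
        PriorityQueue<Hit> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.key));
        for (int i = 0; i < docs.length; i++) {
            if (scores[i] <= 0.0f) continue;      // ignore zero-score docs, as the patch does
            queue.add(new Hit(docs[i], sortKey(sortType, docs[i], scores[i], maxDoc)));
            if (queue.size() > n) queue.poll();   // drop the hit with the lowest key
        }
        List<Hit> result = new ArrayList<>(queue);
        result.sort(Comparator.comparingDouble((Hit h) -> h.key).reversed());
        return result;
    }
}
```

As in the patch, a larger key ranks first, so with docs {3, 7, 5} and n = 2, `ORDER_BY_DOCID` returns doc 7 and then doc 5. Because the key is a `float`, docIDs above 2^24 are no longer represented exactly, which could matter on very large indexes.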
> ----- Original Message -----
> From: "Doug Cutting" <cu...@lucene.com>
> To: "Lucene Developers List" <lu...@jakarta.apache.org>
> Sent: Thursday, October 17, 2002 5:21 AM
> Subject: Re: Question: using boost for sorting
>
>
> > Please submit diffs before committing anything, as this is delicate
> > code. Small changes here can affect performance in a big way.
> >
> > Also, we must be extra-careful when making a new public API: once a
> > method is public it's very hard to remove it. The Similarity methods
> > also need to be well documented.
> >
> > Doug
> >
> > Otis Gospodnetic wrote:
> > > This sounds good to me, as it would lead us to pluggable similarity
> > > computation...mmmm.
> > > I can refactor some of this tonight.
> > >
> > > Otis
> > >
> > >
> > > --- Doug Cutting <cu...@lucene.com> wrote:
> > >
> > >>This looks like a good approach. When I get a chance, I'd like to make
> > >>Similarity an interface or an abstract class, whose default
> > >>implementation would do what the current class does, but whose methods
> > >>can be overridden. Then I'd add methods like:
> > >>
> > >> public static void Similarity.setDefaultSimilarity(Similarity sim);
> > >> public void IndexWriter.setSimilarity(Similarity sim);
> > >> public void Searcher.setSimilarity(Similarity sim);
> > >>
> > >>So to override Similarity methods you'd define a subclass of the
> > >>standard implementation, then either install yours globally via
> > >>setDefaultSimilarity, or set it in your IndexWriter before adding
> > >>documents and in your Searcher before searching. Does that sound
> > >>reasonable?
> > >>
> > >>This would let you do what you describe below without changing Lucene's
> > >>sources. However I'm very short on time right now and don't know how
> > >>soon I'll get to this.
> > >>
> > >>Doug
> > >>
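A minimal sketch of the pluggable design Doug describes above, assuming nothing about how Lucene eventually implemented it. The classes are self-contained stand-ins, not Lucene classes: `Writer` is a hypothetical substitute for IndexWriter, `getDefaultSimilarity` is a helper added for the example, and the 1/sqrt(numTerms) default is an assumption chosen only to make the numbers concrete.

```java
public class PluggableSimilaritySketch {

    /** Hypothetical overridable Similarity with a globally installable default. */
    public static class Similarity {
        private static Similarity defaultImpl = new Similarity();

        public static void setDefaultSimilarity(Similarity sim) { defaultImpl = sim; }

        // Not named in the thread; added so the writer can pick up the default.
        public static Similarity getDefaultSimilarity() { return defaultImpl; }

        /** Assumed default behaviour: shorter fields get a larger norm. */
        public float normalizeLength(int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    /** A subclass that switches off field-length normalization. */
    public static class NoLengthNormSimilarity extends Similarity {
        @Override
        public float normalizeLength(int numTerms) { return 1.0f; }
    }

    /** Hypothetical stand-in for IndexWriter: uses the default unless one is set. */
    public static class Writer {
        private Similarity similarity = Similarity.getDefaultSimilarity();

        public void setSimilarity(Similarity sim) { this.similarity = sim; }

        /** Combines the field boost with whatever length normalization is plugged in. */
        public float norm(float fieldBoost, int fieldLength) {
            return fieldBoost * similarity.normalizeLength(fieldLength);
        }
    }
}
```

With this shape, a change like removing length normalization becomes a subclass installed through `setSimilarity` or `setDefaultSimilarity` rather than an edit to the library source.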
> > >>David Birtwell wrote:
> > >>
> > >>>Hi Dmitry,
> > >>>
> > >>>I was faced with a similar problem. We wanted to have a numeric rank
> > >>>field in each document influence the order in which the documents were
> > >>>returned by lucene. While investigating a solution for this, I wanted
> > >>>to see if I could implement strict sorting based on this numeric value.
> > >>>I was able to accomplish this using document boosting, but not without
> > >>>modifying the lucene source. Our "ranking" field is an integer value
> > >>>from one to one hundred. I'm not sure if this will help you, but I'll
> > >>>include a summary of what I did.
> > >>>
> > >>>In DocumentWriter remove the normalization by field length:
> > >>> float norm = fieldBoosts[n] * Similarity.normalizeLength(fieldLengths[n]);
> > >>>to
> > >>> float norm = fieldBoosts[n];
> > >>>
> > >>>In TermScorer and PhraseScorer, modify the score() method to ignore the
> > >>>lucene base score:
> > >>> score *= Similarity.decodeNorm(norms[d]);
> > >>>to
> > >>> score = Similarity.decodeNorm(norms[d]);
> > >>>
> > >>>In Similarity.java, make byteToFloat() public.
> > >>>
> > >>>At index time, use Similarity.byteToFloat() to determine your boost
> > >>>value as in the following pseudocode:
> > >>> Document d = new Document();
> > >>> ... add your fields ...
> > >>> int rank = d.getField("RANK"); (range of rank can be 0 to 255)
> > >>> float sortVal = Similarity.byteToFloat(rank)
> > >>> d.setBoost(sortVal)
> > >>>
> > >>>If you'd like the reasoning behind any or all of these items, let me know.
> > >>>
> > >>>DaveB
> > >>>
> > >>>
> > >>>
> > >>>Dmitry Serebrennikov wrote:
> > >>>
> > >>>
> > >>>>Greetings Everyone,
> > >>>>
> > >>>>I'm thinking of trying to build something that manipulates a query
> > >>>>score in order to achieve a sort order other than the default
> > >>>>relevance sort. The idea is to create a new type of query:
> > >>>>SortingQuery( Query query, String sortByField )
> > >>>>
> > >>>>It would run the sub-query and return results in an order of the
> > >>>>values found in the "sortByField" for those documents. Now, I've
> > >>>>looked at all of the sorting discussion prior to this, and the best
> > >>>>approach (recommended by Doug among others) is to provide some sort of
> > >>>>a fast access to the field values inside the HitCollector. Reading
> > >>>>documents at search time is too slow, so people access the data
> > >>>>elsewhere or build an in-memory index of that data (such as is done in
> > >>>>the SearchBean's SortField).
> > >>>>
> > >>>>My idea is different. I want to try to do the following:
> > >>>>- compose a query that consists of the original sub-query followed by
> > >>>>a special "sorting query"
> > >>>>- "boost" the score of the original sub-query to 0
> > >>>>- compute the score of the sorting query such that it would reflect
> > >>>>the desired sort order
> > >>>>
> > >>>>Has anyone tried to do something like this?
> > >>>>Would this work?
> > >>>>Is this worth doing?
> > >>>>If it would, would then I have to do something during the indexing
> > >>>>time to set normalization / scoring factors for that field to
> > >>>>something or other?
> > >>>>
> > >>>>Thanks.
> > >>>>Dmitry.
> > >>>>
> > >>>>
> > >>>>
> > >>>>--
> > >>>>To unsubscribe, e-mail:
> > >>>><ma...@jakarta.apache.org>
> > >>>>For additional commands, e-mail:
> > >>>><ma...@jakarta.apache.org>
> ATTACHMENT part 2 application/octet-stream name=IndexSearcher.java
Re: Question: using boost for sorting
Posted by Che Dong <ch...@hotmail.com>.
Thank you. Is it possible to create a subproject to hold users' implementations of the basic Lucene interfaces (Tokenizer, Filter) and some other indexing approaches?
Regards
Che, Dong
----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Cc: "Che Dong" <ch...@hotmail.com>
Sent: Sunday, January 26, 2003 1:30 PM
Subject: Re: Question: using boost for sorting
> I think I'll try to find a place for your lucene_ext code somewhere in
> Lucene Sandbox, what do you think?
>
> Otis