Posted to java-user@lucene.apache.org by Karthik N S <ka...@controlnet.co.in> on 2004/12/10 13:39:41 UTC

HITCOLLECTOR+SCORE+DELIMA

Hi guys

Apologies.............



I am still in a delima about how to use the HitCollector to return only the
Hits with scores between 0.2f and 1.0f.

There is no simple example of this, yet there is lots of talk about its usage
on the forum.

Could somebody please spare a bit of code (your intelligence) on this forum?





Thx in advance
Karthik

      WITH WARM REGARDS
      HAVE A NICE DAY
      [ N.S.KARTHIK]




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: HITCOLLECTOR+SCORE+DELIMMA

Posted by Nader Henein <ns...@bayt.net>.

Dude, and I say this with love, it's open source, you've got the code, 
take the initiative, DIY, be creative and share your findings with the 
rest of us.

Personally I would be interested to see how you do this, keep your 
changes documented and share.

Nader Henein


Re: HITCOLLECTOR+SCORE+DELIMMA

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 14, 2004, at 5:42 AM, Karthik N S wrote:
> What exactly do u mean by this
>
>
>> We've emphasized numerous times that calling hits.doc(i) is a resource
>> hit.  Don't do it for documents you aren't going to show.  To filter
>> by score, use hits.score(i) first.
>
>  I am bit Confused u mean to say Replace
>
>    hits.doc(i)
>
>     by
>
>   hits.score(i)

Here is some pseudo-code:

	start = 0 or the starting index for the page you want to display
	finish = last hits index you want to display
	for i = start; i < finish ; i++
		if hits.score(i) within tolerance
			grab hits.doc(i)
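
As a rough Java rendering of the pseudo-code above, here is a minimal sketch. It uses a plain float array in place of Hits and a hypothetical DocLoader callback in place of hits.doc(i), so that it runs without Lucene. The names ScorePager, DocLoader and collectPage are illustrative assumptions, not part of the Lucene API. The only point is that the cheap score check happens before the expensive document load.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch: scores[] plays the role of hits.score(i), and
// DocLoader.load(i) plays the role of the expensive hits.doc(i) call.
final class ScorePager {

    /** Hypothetical callback standing in for hits.doc(i). */
    interface DocLoader {
        String load(int index);
    }

    /**
     * Loads documents for indexes [start, finish) whose score lies in
     * [minScore, maxScore]. Documents outside the range are never loaded.
     */
    static List<String> collectPage(float[] scores, int start, int finish,
                                    float minScore, float maxScore,
                                    DocLoader loader) {
        List<String> page = new ArrayList<String>();
        int end = Math.min(finish, scores.length);
        for (int i = Math.max(start, 0); i < end; i++) {
            float score = scores[i];                 // cheap lookup
            if (score >= minScore && score <= maxScore) {
                page.add(loader.load(i));            // expensive, only when needed
            }
        }
        return page;
    }
}
```

With real Hits, the loader would simply wrap hits.doc(i), and start/finish would be the bounds of the page being displayed.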

I'm working hard to be helpful here.  I'm running out of answers for 
you though.  You are ignoring my requests to actually post code.  If 
you want further assistance, show us *exactly* what you're doing.

>   ( >7 ) x 1000 x  15000  documents , Get most of the Relevant Hits
> (Wherever Score is between 0.5 to 1.0 )
>
>   and then Sort the adjacent Fields 'Vendors' and 'Price' in ASC Order
>
>  In such a case We cannot use RangeQuery.... without knowing beforehand
> what exactly the Consumer wants

See above.  I cannot help with this without actual code (succinct clear 
code!).  Lucene can sort and filter if you leverage it appropriately.  
Please grab a copy of Lucene in Action for lots of details on sorting 
and filtering.

>  Is it not possible to have a Generalized Filter in further versions 
> of API
> , to Inject some minor factors prior to
>  getting the Hits returned.

This already exists.  Please try it out.  There have been numerous 
posts about this topic.  Lucene in Action covers it.  Our source code 
download has examples.

	Erik


RE: HITCOLLECTOR+SCORE+DELIMMA

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi Erik

What exactly do you mean by this?

>We've emphasized numerous times that calling hits.doc(i) is a resource
>hit.  Don't do it for documents you aren't going to show.  To filter by
>score, use hits.score(i) first.

I am a bit confused. Do you mean to say I should replace

   hits.doc(i)

    by

  hits.score(i)?

Also

> Ah, so you are accessing every document to get this field information.
> It is incorrect that you cannot filter prior to getting hits.  You have
> a couple of options in filtering by a field value - use a QueryFilter
> or simply AND a RangeQuery to the original query.

Since the portal we are building is an eCommerce one, we have to search for
the search word across

  ( >7 ) x 1000 x 15000 documents, get the most relevant Hits (wherever
the score is between 0.5 and 1.0),

  and then sort on the adjacent fields 'Vendors' and 'Price' in ascending order.

In such a case we cannot use RangeQuery without knowing beforehand what
exactly the consumer wants.

Would it be possible to have a generalized Filter in future versions of the
API that lets us inject some minor factors before the Hits are returned?

Thx in advance
Karthik


Re: HITCOLLECTOR+SCORE+DELIMMA

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 13, 2004, at 11:16 PM, Karthik N S wrote:
>  time [ A simple search of 'handbags' returned 1,60,000 hits and time 
> taken
> was 440 secs ,in production Env  / May be our
>  Coding is poor,But we are constantly improving the process ].

If your searches are taking 440 seconds, you have something more 
fundamentally wrong.  You are either doing some large 
wildcard/range/fuzzy expansions or you're accessing every document from 
all your hits.  Is the searcher.search() method taking that long?  I 
bet not.  Or rather is it the iteration over the Hits that is killing 
the "search" time, which is what I suspect?

We've emphasized numerous times that calling hits.doc(i) is a resource 
hit.  Don't do it for documents you aren't going to show.  To filter by 
score, use hits.score(i) first.

>  { O/s Linux Gentoo , RAM 1GB, Lucene1.4.1,Appserver = Tomcat5, and
> BlackDawn Java 1.4.2 with Args  -XX:+UseParallelGC for
>
>  Garbage Collection  }

Please narrow your code down to a clean, succinct example that you can 
post.  It is difficult to help you without details of your code (but 
let me emphasize again - it needs to be clean and succinct so it is 
quick for us to get a handle on).

>  To be One step in advance ,We also have adjacent Fields 'Vendor',
> 'Price' which we have to accordingly Compare
>  Best/Poor/Least results . So We have to limit the hits
> accordingly,since Lucene API does not provide any way to
>  inject this limiting facility *prior* to getting the hits .

Ah, so you are accessing every document to get this field information.  
It is incorrect that you cannot filter prior to getting hits.  You have 
a couple of options in filtering by a field value - use a QueryFilter 
or simply AND a RangeQuery to the original query.

	Erik


RE: HITCOLLECTOR+SCORE+DELIMMA

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi Erik

Apologies...........




In this mail,
"http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-user@jakarta.apache.org&msgNo=11254",

I already told you that doc.get(" ") was coming back in batches for a mere
'>4000' hits, and this is happening in real

time [ a simple search for 'handbags' returned 1,60,000 hits and took
440 secs in the production environment. Maybe our

coding is poor, but we are constantly improving the process ].


{ OS: Gentoo Linux, RAM: 1GB, Lucene 1.4.1, app server: Tomcat 5, and
Blackdown Java 1.4.2 with the argument -XX:+UseParallelGC for

garbage collection }


To be one step ahead: we also have adjacent fields 'Vendor' and
'Price', by which we have to compare

Best/Poor/Least results. So we have to limit the hits
accordingly, since the Lucene API does not provide any way to

inject this limiting facility *prior* to getting the hits.


[ Excuse me, Nader Henein, I am on the Lucene-Users forum, NOT the
Lucene-Developers forum,

so we expect at least a little help ]


With Warm Regards
Karthik





Re: HITCOLLECTOR+SCORE+DELIMMA

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 13, 2004, at 6:58 AM, Karthik N S wrote:
>>> Iterate over Hits.  returns large hit values and Iteration on Hits
>>> for scores consumes time ,
>
> so How Do I Limit my Search Between [ X.xf to Y.yf ] prior getting the
> Hits.

Why do you need to do this *prior* to getting Hits?

You have yet to justify what you're asking.  I almost guarantee you 
that navigating Hits in the way I said will be as fast as you need it 
to be.

	Erik


RE: HITCOLLECTOR+SCORE+DELIMMA

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi Erik


Apologies..............

I got confused by the last mail.

>> Iterate over Hits.

That returns a large number of hits, and iterating over Hits for
scores consumes time,

so how do I limit my search to scores between [ X.xf and Y.yf ] prior to
getting the Hits?

Note: the search is being done on a field of type 'Text', which consists of
the 'Contents' of various HTML documents.


Please Advise me
Karthik





Re: HITCOLLECTOR+SCORE+DELIMA

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 13, 2004, at 1:16 AM, Karthik N S wrote:
> So u say I have to Build a Filter to Collect all the Scores between 
> the 2
> Ranges [ 0.2f to 1.0f]

My message is being misinterpreted.  I said "filter" as a verb, not a 
noun.  :)  In other words, I was not intending to mean write a Filter - 
a Filter would not be able to filter on score.

> so the API for the same would be
>
>  Hits hit = search(Query query, Filter filtertoGetScore)
>
>
>  But while writing the Filter  Score again depends on Hits  ====> 
> Score =
> hits.score(x);

Again, you cannot write a Filter (capital 'F') to deal with score.

Please re-read what I said below...

> Hits are in descending score
> order, so you may just want to use Hits and filter based on the score
> provided by hits.score(i).

Iterate over Hits... when you encounter scores below your desired 
range, stop iterating.  Why is this simple procedure not good enough 
for what you are trying to achieve?
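
The procedure described here can be sketched in Java as follows. This is a minimal stand-in, assuming the scores are already sorted in descending order as Hits provides them. A float array replaces Hits so that the snippet runs without Lucene, and the names ScoreRange and indexesInRange are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ScoreRange {

    /**
     * Returns the indexes whose score lies in [min, max], assuming
     * descendingScores is sorted from highest to lowest.
     */
    static List<Integer> indexesInRange(float[] descendingScores, float min, float max) {
        List<Integer> selected = new ArrayList<Integer>();
        for (int i = 0; i < descendingScores.length; i++) {
            float score = descendingScores[i];
            if (score < min) {
                break;                 // every later score is lower still
            }
            if (score <= max) {
                selected.add(Integer.valueOf(i));
            }
        }
        return selected;
    }
}
```

Because of the early break, the loop touches only the hits at or above the lower bound plus one more, however many hits the search returned in total.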

	Erik


RE: HITCOLLECTOR+SCORE+DELIMMA

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi Vikas Gupta


Going by Erik's reply to my last mail, a FILTER can be built to
fetch scores between 0.2f and 1.0f.

Can you please spare me some code for this?

[ Sorry for the spelling mistake; my mail IDE does not have a spell checker ]
With regards
Karthik





RE: HITCOLLECTOR+SCORE+DELIMA

Posted by Vikas Gupta <vg...@cs.utexas.edu>.

> On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
> > I am still in delima on How to use the HitCollector for returning
> > Hits hits
> > between scores  0.2f to 1.0f ,
> >
> > There is not a simple example for the same, yet lot's of talk on usage
> > for
> > the same on the form.

1) I am not 100% sure about this but it might work.

Add the code starting with >>>>> in IndexSearcher.java::search()

 // inherit javadoc
  public TopDocs search(Query query, Filter filter, final int nDocs)
       throws IOException {
    Scorer scorer = query.weight(this).scorer(reader);
    if (scorer == null)
      return new TopDocs(0, new ScoreDoc[0]);

    final BitSet bits = filter != null ? filter.bits(reader) : null;
    final HitQueue hq = new HitQueue(nDocs);
    final int[] totalHits = new int[1];
    scorer.score(new HitCollector() {
	public final void collect(int doc, float score) {
	  if (score > 0.0f &&			  // ignore zeroed buckets
>>>>>         score > 0.2f && score < 1.0f &&  // keep only scores in (0.2, 1.0)
	      (bits==null || bits.get(doc))) {	  // skip docs not in bits
	    totalHits[0]++;
            hq.insert(new ScoreDoc(doc, score));
	  }
	}
      });
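
Independently of patching IndexSearcher, the same idea can be sketched as a stand-alone collector. The snippet below is only an illustration: ScoreBandCollector is a hypothetical class that mirrors the collect(int doc, float score) callback shape shown above but does not extend Lucene's HitCollector, so that it runs without Lucene on the classpath. Unlike the patch, it treats both bounds as inclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a HitCollector subclass; not a Lucene class.
final class ScoreBandCollector {
    private final float min;
    private final float max;
    private final List<Integer> docs = new ArrayList<Integer>();

    ScoreBandCollector(float min, float max) {
        this.min = min;
        this.max = max;
    }

    /** Same shape as the collect(doc, score) callback above. */
    void collect(int doc, float score) {
        if (score >= min && score <= max) {   // keep only scores inside the band
            docs.add(Integer.valueOf(doc));
        }
    }

    /** Document ids collected so far, in the order they were seen. */
    List<Integer> docs() {
        return docs;
    }
}
```

Note that scores reaching a collector are raw rather than normalized, so the band is applied to values that may exceed 1.0.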



2) Filter examples are in Lucene in Action book, Chapter 5. I wrote an
example as well:



        // Fragment: assumes imports of java.io.File, org.apache.lucene.index.*
        // and org.apache.lucene.search.*, and that dir is defined elsewhere.
        String query = "odyssey";

        // Main query: the term must appear in the "content" field.
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("content", query)), true, false);

        // Filter query: the term must also appear in the "H2" field.
        BooleanQuery bqf = new BooleanQuery();
        bqf.add(new TermQuery(new Term("H2", query)), true, false);

        Filter f = new QueryFilter(bqf);

        IndexReader reader = IndexReader.open(new File(dir, "index").getCanonicalPath());
        Searcher luceneSearcher = new org.apache.lucene.search.IndexSearcher(reader);
        luceneSearcher.setSimilarity(new NutchSimilarity());

        // Logically this is executed as follows: find all the docs
        // matching bq, then select the ones which also match bqf.
        Hits hits = luceneSearcher.search(bq, f);

        System.out.println("query: " + query);
        System.out.println("Total hits: " + hits.length());

3) delima is spelled as dilemma


-Vikas Gupta


RE: HITCOLLECTOR+SCORE+DELIMA

Posted by Karthik N S <ka...@controlnet.co.in>.

Hi Guys

Apologies..........


So you are saying I have to build a Filter to collect all the scores within
the range [ 0.2f to 1.0f ].


So the API call for this would be

 Hits hit = search(Query query, Filter filtertoGetScore)


But when writing the Filter, the score again depends on Hits  ====> Score =
hits.score(x);



How do I solve this, or am I going about it the wrong way?


Any simple source for this will be greatly appreciated.  :)

Thx in advance




Re: HITCOLLECTOR+SCORE+DELIMA

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
> I am still in delima on How to use the HitCollector for returning  
> Hits hits
> between scores  0.2f to 1.0f ,
>
> There is not a simple example for the same, yet lot's of talk on usage 
> for
> the same on the form.

Unfortunately there isn't a clean way to stop a HitCollector - it will 
simply collect all hits.

Also, scores are _not_ normalized when passed to a HitCollector, so you 
may get scores > 1.0.  Hits, however, does normalize and you're 
guaranteed that scores will be <= 1.0.  Hits are in descending score 
order, so you may just want to use Hits and filter based on the score 
provided by hits.score(i).
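
To illustrate the difference, here is a minimal sketch of that kind of normalization. It assumes, as an illustration rather than a quote of the Lucene source, that when the top raw score exceeds 1.0 every score is scaled by 1/topScore, and that scores are left alone otherwise. ScoreNormalizer and normalize are hypothetical names, and a float array stands in for the raw scores a HitCollector would see.

```java
// Hypothetical helper; a sketch of score normalization, not Lucene code.
final class ScoreNormalizer {

    /**
     * Scales raw scores (sorted highest first) so that none exceeds 1.0.
     * Assumption: scaling is applied only when the top score is above 1.0.
     */
    static float[] normalize(float[] rawDescending) {
        float[] out = new float[rawDescending.length];
        float norm = 1.0f;
        if (rawDescending.length > 0 && rawDescending[0] > 1.0f) {
            norm = 1.0f / rawDescending[0];   // top score maps to 1.0
        }
        for (int i = 0; i < rawDescending.length; i++) {
            out[i] = rawDescending[i] * norm;
        }
        return out;
    }
}
```

One consequence is that a range such as 0.2f to 1.0f selects different documents when applied to raw collector scores than when applied to normalized Hits scores.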

	Erik
