Posted to java-user@lucene.apache.org by Michael McDonald <ke...@kelek.com> on 2002/10/30 20:58:46 UTC

Making capitalization significant

Is there a way to arrange indexing and searching so that when searching 
for "Lucene", the term "Lucene" would be given more boost than the term 
"lucene", and ideally "lucene" would have more boost than "LUCENE"?

-- Mike McDonald <ke...@kelek.com>
    http://www.kelek.com
      Web Samurai
      Freelance Programmer/Consultant
      Developer, http://www.Worldisround.com



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

ignoring fields in score calculation

Posted by Harpreet S Walia <ha...@sansuisoftware.com>.

Hi,

I have the following queries:

1.  Is there a way to ignore certain fields during score calculation?
2.  Does putting extra parentheses in the query affect the search results or
score, i.e. is the query "india" any different from "( (india) )"?

TIA

Regards,
Harpreet



Re: Making capitalization significant

Posted by Ype Kingma <yk...@xs4all.nl>.

On Wednesday 30 October 2002 20:58, Michael McDonald wrote:
> Is there a way to arrange indexing and searching so that when searching
> for "Lucene", the term "Lucene" would be given more boost than the term
> "lucene", and ideally "lucene" would have more boost than "LUCENE"?

Use an analyzer that preserves the original case for both indexing and
querying, and give each casing its own boost in the query, e.g. like this:

Lucene^10 lucene^8 LUCENE^6

You want different weights per term, and you can't influence these directly
in the index. Therefore you'll have to query with different term weights.

A problem arises when there are 100 documents mentioning Lucene, and 
one document mentioning LUCENE. Rarer terms are weighted more heavily in
scoring, so with the above query the LUCENE document will likely get the
highest score, despite its lower boost.

So you'll have to adapt the weights in the query by using the scoring formula
and correcting for the number of documents containing each of the terms.
You can get these counts from IndexReader.docFreq().
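
A rough sketch of that correction, in plain Java without any Lucene
dependency. It assumes the classic idf formula 1 + ln(numDocs / (docFreq + 1))
and that idf enters a term's score squared; both are assumptions to check
against the Similarity of the Lucene version in use. The class and method
names are made up for illustration, and the document frequencies are passed
in as a map standing in for what IndexReader.docFreq() would report.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class BoostCorrection {

    // Assumed classic idf: rarer terms (small docFreq) get a larger value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Divide the wanted boost by idf squared, so that after scoring
    // multiplies by idf squared the effective weight is the wanted boost.
    static double correctedBoost(double wantedBoost, int docFreq, int numDocs) {
        double w = idf(docFreq, numDocs);
        return wantedBoost / (w * w);
    }

    // Build a query string such as "Lucene^x lucene^y", skipping terms
    // that do not occur in the index (document frequency zero).
    static String buildQuery(Map<String, Double> wantedBoosts,
                             Map<String, Integer> docFreqs, int numDocs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> e : wantedBoosts.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            double boost = correctedBoost(e.getValue(), df, numDocs);
            sb.append(e.getKey()).append('^')
              .append(String.format(Locale.ROOT, "%.3f", boost));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size

        Map<String, Double> wanted = new LinkedHashMap<>();
        wanted.put("Lucene", 10.0);
        wanted.put("lucene", 8.0);
        wanted.put("LUCENE", 6.0);

        // Hypothetical counts; in practice taken from IndexReader.docFreq().
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("Lucene", 100);
        docFreqs.put("lucene", 40);
        docFreqs.put("LUCENE", 1);

        System.out.println(buildQuery(wanted, docFreqs, numDocs));
    }
}
```

With these numbers the rare LUCENE term ends up with a much smaller query
boost than Lucene, which is what cancels its rarity advantage.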

And you'll have to do that for each casing of the queried term, i.e. for up
to 2 ** (length of term) variants, skipping the ones having zero frequency.
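
Enumerating the casings could look roughly like this. It is a minimal
sketch in plain Java: the names CaseVariants, caseVariants and
withNonZeroFrequency are hypothetical, and the docFreqs map again stands in
for lookups via IndexReader.docFreq(). Characters without a case
distinction (digits, for instance) produce duplicates, which the set drops.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class CaseVariants {

    // Generate every upper/lower casing of the term: bit i of the mask
    // decides whether character i is upper-cased.
    static Set<String> caseVariants(String term) {
        int n = term.length();
        if (n > 20) {
            throw new IllegalArgumentException("term too long to enumerate: " + n);
        }
        Set<String> variants = new LinkedHashSet<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                char c = term.charAt(i);
                boolean upper = (mask & (1 << i)) != 0;
                sb.append(upper ? Character.toUpperCase(c) : Character.toLowerCase(c));
            }
            variants.add(sb.toString());
        }
        return variants;
    }

    // Keep only the variants that actually occur in the index.
    static Set<String> withNonZeroFrequency(Set<String> variants,
                                            Map<String, Integer> docFreqs) {
        Set<String> kept = new LinkedHashSet<>();
        for (String v : variants) {
            if (docFreqs.getOrDefault(v, 0) > 0) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> all = caseVariants("lucene");
        System.out.println(all.size() + " casings generated");
    }
}
```

Only the variants that survive the frequency filter need a boosted clause in
the query, so the query usually stays far smaller than the 2 ** n upper bound.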

Kind regards,
Ype
