You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Clemens Marschner <cm...@lanlab.de> on 2002/08/12 18:01:02 UTC

term Boosting + AND semantics

I want to boost terms found in header lines of HTML pages more than those found in other parts of the text. (just an example which can be applied to different use cases).
Therefore I put text enclosed with <h1> tags in a special field (call it "hd") and the other text in another field.

How can I rewrite the query so that I can reach AND semantics? That is,

- if terms of the query are found in the headers, the result docs are ranked higher
- if some terms of the query are found in the headers and some in the rest, it is ok. if they are found in both, even better.
- if only one term of the whole query is not found in either of these fields, no results are returned

think of the query
  my little query
AND semantics means something like
  +my +little +query
splitting this into two fields would be something like
  +hd:(+my +little +query)^2 +(+my +little +query)
but apparently, this will need the terms to occur in both fields.

The only idea I can think of is to provide every possible combination like
  (+hd:(my little query)^2 +(+my +little +query)) 
  (+hd:(my little +query)^2 +(+my +little query)) 
  (+hd:(my +little query)^2 +(+my little +query)) 
  ...
which is awkward. Any ideas? 

Clemens