Posted to solr-user@lucene.apache.org by Arcadius Ahouansou <ar...@menelic.com> on 2012/03/16 03:21:04 UTC

Index-time field boost with DIH

Hello.

I have an SQL database with documents having an ID, TITLE and SUMMARY.
I am using the DIH to index the data.

In the DIH dataConfig, for every document, I would like to do something
like:

<field column="TITLE" name="title" boost="2.0" />

In other words, "a match on any document's title is worth twice as much as
a match on other fields".

In my schema, I have omitNorms set to false.

1) How can I do this in the DIH?

2) Apart from norms (omitNorms="false") making the index bigger, I thought
that an index-time boost would perform better than applying the very same
boost at query time over and over again.
Is that correct?

3) I also came across the Lucene FAQ at
http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F

where the following interesting statement seems to contradict what I'm
trying to achieve:

*Index time field boosts are worthless if you set them on every document.*

Any hint would be much appreciated.


Thanks.

Arcadius.

Re: Index-time field boost with DIH

Posted by Erick Erickson <er...@gmail.com>.
I'd go ahead and do the query time boosts. The "penalty" will
be a single multiplication per doc (I think), and probably not
noticeable. And it's much more flexible/easier...

Best
Erick
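
For illustration, a query-time boost of this kind could be expressed with the
dismax/edismax "qf" parameter, which accepts per-field weights such as
title^2. The sketch below only builds the request parameters and does not
contact a server. The field name "summary" is an assumption based on the
SUMMARY column mentioned in the question, and the helper name
build_boosted_query is hypothetical.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query, title_boost=2.0):
    """Build Solr query-string parameters that weight title matches higher.

    Assumes the schema has fields named "title" and "summary";
    "summary" is a guess derived from the SUMMARY column in the question.
    """
    params = {
        "q": user_query,
        # edismax lets the boost live in the request instead of the index
        "defType": "edismax",
        # title matches count title_boost times as much as summary matches
        "qf": f"title^{title_boost} summary",
    }
    return urlencode(params)


query_string = build_boosted_query("solr boost")
print(query_string)
```

Because the weight is just a request parameter, it can be tuned without
reindexing, which is the flexibility referred to above.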
