Posted to solr-user@lucene.apache.org by s d <s....@gmail.com> on 2008/01/08 06:02:29 UTC

How do i normalize diff information (different type of documents) in the index ?

E.g. suppose the index has field1 and field2. Documents of type (A) always
have information for field1 AND field2, while documents of type (B) always
have information for field1 but NEVER for field2.
The problem is that the scoring formula will sum field1 and field2, hence
skewing in favour of documents of type (A).
If I combine the 2 fields into 1 field (in an attempt to normalize), I will
obviously skew the statistics.
Please advise,
Thanks,

Re: How do i normalize diff information (different type of documents) in the index ?

Posted by s d <s....@gmail.com>.

Got it
(http://wiki.apache.org/solr/DisMaxRequestHandler#head-cfa8058622bce1baaf98607b197dc906a7f09590).
Thanks!

On Jan 8, 2008 12:11 AM, Chris Hostetter < hossman_lucene@fucit.org> wrote:

>
> : Isn't there a better way to take the information into account but still
> : normalize? Taking the score of only one of the fields doesn't sound like
> : the best thing to do (it's basically ignoring part of the information).
>
> Note the word "mostly" in Mike's response about dismax ... the "tie" param
> lets you decide how much the other fields influence the score.  Try it,
> it works really well ... trust me/us.
>
> For the record: I'm really not sure what your question is ... you say you
> want to normalize for the fact that some docs don't have a value in some
> fields, but you don't want to combine the fields because it will skew the
> statistics ... isn't that "skewing" exactly what you are trying to
> achieve?
>
> Don't you need to introduce some "skew" in favor of the docs that don't
> have a value for field2 to compensate for the existing "counter skew"
> they already have?
>
>
>
> -Hoss
>
>

Re: How do i normalize diff information (different type of documents) in the index ?

Posted by Chris Hostetter <ho...@fucit.org>.

: Isn't there a better way to take the information into account but still
: normalize? Taking the score of only one of the fields doesn't sound like the
: best thing to do (it's basically ignoring part of the information).

Note the word "mostly" in Mike's response about dismax ... the "tie" param
lets you decide how much the other fields influence the score.  Try it,
it works really well ... trust me/us.
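
To make the effect of "tie" concrete, here is a minimal sketch of how a
disjunction-max style score is combined: the best field's score, plus tie
times the sum of the other fields' scores. The function name and the
per-field scores are made up for illustration; real scores come from
Lucene's per-field scoring, which this does not attempt to reproduce.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores in the disjunction-max style:
    best field score + tie * (sum of the remaining field scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Hypothetical per-field scores, for illustration only.
doc_a = [0.8, 0.6]  # type (A): matches in field1 and field2
doc_b = [0.8]       # type (B): match in field1 only

for tie in (0.0, 0.1, 1.0):
    # tie=0.0 counts only the best field; tie=1.0 is a plain sum.
    print(tie, dismax_score(doc_a, tie), dismax_score(doc_b, tie))
```

With tie=0.0 the two documents tie, because only the best field counts.
With tie=1.0 the score is the plain sum that favours type (A). Values in
between let field2 contribute a little without dominating.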

For the record: I'm really not sure what your question is ... you say you
want to normalize for the fact that some docs don't have a value in some
fields, but you don't want to combine the fields because it will skew the
statistics ... isn't that "skewing" exactly what you are trying to
achieve?

Don't you need to introduce some "skew" in favor of the docs that don't
have a value for field2 to compensate for the existing "counter skew"
they already have?



-Hoss

Re: How do i normalize diff information (different type of documents) in the index ?

Posted by s d <s....@gmail.com>.

Isn't there a better way to take the information into account but still
normalize? Taking the score of only one of the fields doesn't sound like the
best thing to do (it's basically ignoring part of the information).

On Jan 7, 2008 9:20 PM, Mike Klaas <mi...@gmail.com> wrote:

>
> On 7-Jan-08, at 9:02 PM, s d wrote:
>
> > E.g. suppose the index has field1 and field2. Documents of type (A) always
> > have information for field1 AND field2, while documents of type (B) always
> > have information for field1 but NEVER for field2.
> > The problem is that the scoring formula will sum field1 and field2, hence
> > skewing in favour of documents of type (A).
> > If I combine the 2 fields into 1 field (in an attempt to normalize), I will
> > obviously skew the statistics.
>
> Try the dismax handler.  Its main goal is to query multiple fields
> while only counting the score of the highest-scoring one (mostly).
>
> -Mike
>

Re: How do i normalize diff information (different type of documents) in the index ?

Posted by Mike Klaas <mi...@gmail.com>.

On 7-Jan-08, at 9:02 PM, s d wrote:

> E.g. suppose the index has field1 and field2. Documents of type (A) always
> have information for field1 AND field2, while documents of type (B) always
> have information for field1 but NEVER for field2.
> The problem is that the scoring formula will sum field1 and field2, hence
> skewing in favour of documents of type (A).
> If I combine the 2 fields into 1 field (in an attempt to normalize), I will
> obviously skew the statistics.

Try the dismax handler.  Its main goal is to query multiple fields
while only counting the score of the highest-scoring one (mostly).
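
As a rough sketch of what such a request could look like, the snippet below
builds a dismax query URL over field1 and field2. The host, port, path and
the helper name are assumptions for illustration. It also assumes a request
handler registered under the name "dismax" and selected with qt=dismax;
newer Solr releases select the parser with defType=dismax instead. The qf
(query fields) and tie parameters are the dismax ones discussed in this
thread.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, user_query, fields, tie):
    """Build a Solr select URL that searches `fields` with dismax scoring."""
    params = {
        "qt": "dismax",           # assumed handler name; see note above
        "q": user_query,          # the raw user query
        "qf": " ".join(fields),   # fields to search, space separated
        "tie": str(tie),          # weight of the non-best fields
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance and query text.
url = build_dismax_url(
    "http://localhost:8983/solr/select",
    "some words",
    ["field1", "field2"],
    0.1,
)
print(url)
```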

-Mike