Posted to user@nutch.apache.org by "Dmitriy V. Kazimirov" <dm...@viorsan.com> on 2010/06/26 13:53:16 UTC

How to make nutch take distance between terms in document in account?

Hi,

Is it possible to make Nutch scoring take the distance between query terms
into account?

For example, given the query "president bush medvedev", a document where all
three terms are near each other (how to define 'near' here is also an
interesting question) should score higher than one where they are far apart.

If that's not possible right now, am I correct that a new QueryFilter should
be implemented? How should this be done?

With regards, Dmitriy


RE: How to make nutch take distance between terms in document in account?

Posted by Ar...@csiro.au.
Hi Dmitriy,


>-----Original Message-----
>From: Dmitriy V. Kazimirov [mailto:dmitriy.kazimirov@viorsan.com]
>Sent: Saturday, June 26, 2010 9:53 PM
>To: user@nutch.apache.org
>Subject: How to make nutch take distance between terms in document in
>account?
>
>Hi,
>
>Is it possible to make Nutch scoring take the distance between query terms
>into account?
>
>For example, given the query "president bush medvedev", a document where all
>three terms are near each other (how to define 'near' here is also an
>interesting question) should score higher than one where they are far apart.
>
>If that's not possible right now, am I correct that a new QueryFilter should
>be implemented? How should this be done?

This is implemented but, unless I am mistaken, it is not really being used.
Please see addSloppyPhrases in BasicQueryFilter.java. Note that SLOP (the
proximity parameter) is set to Integer.MAX_VALUE, which defines 'near' as
'very far'. I did not find any code in Nutch that would change it.
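
To illustrate what a finite slop would change, here is a minimal,
self-contained sketch. It is not Nutch or Lucene code: the class
ProximitySketch, its methods, and the simplified notion of slop used here
(the number of extra tokens inside the smallest window that covers all query
terms) are assumptions made for illustration only, and Lucene's own slop
computation differs in detail.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Nutch or Lucene.
class ProximitySketch {

    /**
     * Width (last position minus first position) of the smallest token window
     * containing every query term at least once, or -1 if a term is missing.
     */
    static int minWindow(List<String> docTokens, List<String> queryTerms) {
        Set<String> wanted = new HashSet<>(queryTerms);
        Map<String, Integer> lastSeen = new HashMap<>();
        int best = -1;
        for (int pos = 0; pos < docTokens.size(); pos++) {
            String token = docTokens.get(pos);
            if (!wanted.contains(token)) {
                continue;
            }
            lastSeen.put(token, pos);
            // Once every term has been seen, the tightest window ending at
            // 'pos' starts at the oldest of the most recent occurrences.
            if (lastSeen.size() == wanted.size()) {
                int width = pos - Collections.min(lastSeen.values());
                if (best < 0 || width < best) {
                    best = width;
                }
            }
        }
        return best;
    }

    /**
     * Simplified slop test (an assumption, not Lucene's exact definition):
     * the best window may contain at most 'slop' tokens beyond the query terms.
     */
    static boolean withinSlop(List<String> docTokens, List<String> queryTerms, int slop) {
        int width = minWindow(docTokens, queryTerms);
        if (width < 0) {
            return false; // at least one query term is absent
        }
        int extra = width - (new HashSet<>(queryTerms).size() - 1);
        return extra <= slop;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("president", "bush", "medvedev");
        List<String> close = Arrays.asList(
            "president bush met president medvedev in moscow".split(" "));
        List<String> far = Arrays.asList(
            "president bush spoke today while elsewhere in moscow officials said medvedev agreed"
                .split(" "));

        System.out.println(withinSlop(close, query, 2));               // true
        System.out.println(withinSlop(far, query, 2));                 // false
        System.out.println(withinSlop(far, query, Integer.MAX_VALUE)); // true
    }
}
```

With a small slop only the document whose terms sit close together passes the
test. With Integer.MAX_VALUE every document that contains all the terms
passes, however far apart they are, which is the 'very far' behaviour
described above.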

Regards,

Arkadi
