Posted to dev@nutch.apache.org by YourSoft <yo...@freemail.hu> on 2006/05/10 19:47:47 UTC

distance between words

Hi,

I have a suggestion for improving Nutch search results.
The "big" search engines (like Google) measure the distance between the 
query words.
E.g.:
query string: lucene in action
When you search for it in Google, Google boosts those documents 
in which the words "lucene in action" appear in the same sequence.
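
To make the idea concrete, here is a minimal sketch in plain Python. It is not Nutch or Lucene code: the names tokenize, contains_phrase and score, the naive whitespace tokenizer and the boost value 2.0 are all assumptions chosen only for illustration. The base score counts how often each query word occurs, and the score is multiplied by a boost when the query words appear in the document in the same sequence.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens contiguously and in order."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def score(query, doc, phrase_boost=2.0):
    """Toy score: term-occurrence count, boosted when the exact phrase is present.

    phrase_boost is a hypothetical tuning value, not something Nutch defines.
    """
    q = tokenize(query)
    d = tokenize(doc)
    base = float(sum(d.count(term) for term in set(q)))
    if contains_phrase(d, q):
        return base * phrase_boost
    return base


if __name__ == "__main__":
    query = "lucene in action"
    print(score(query, "Lucene in Action is a book"))
    print(score(query, "action taken in the lucene project"))
```

Both example documents contain all three words once, so only the one with the words in sequence gets the higher score.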

I think this is possible in Nutch/Lucene (e.g. if your search string is: 
"lucene in action"), but Nutch doesn't do it.

Any ideas how to do it?


Regards,
       Ferenc

mozdex

Posted by YourSoft <yo...@freemail.hu>.
Dear List!

I don't know who supports mozdex.com, but its search has not been 
working since Saturday.

Regards,
    Ferenc

Re: distance between words

Posted by YourSoft <yo...@freemail.hu>.
Sorry for my bad English.
OK, I see that I explained my suggestion very badly.

Please try the following:
search MSN and Google for this:
Freddie i want to ride my bicycle

I think it is unambiguous what I would like to see in the results.
MSN returns 21,958 hits, and the good result is in 4th position 
(4th out of 21,958).
Google returns 308,000 hits, and the first hit is the full 
text of the song (1st out of 308,000).

I think that in this case the Google results are better than MSN's: 
Google has the larger dataset and still gives the better result.
I think the Nutch results are bad in most cases.

I found in 'explain.jsp' that the result is also scored by the full phrase 
("Freddie i want to ride my bicycle").
But in this situation that is bad, because "Freddie" is not near "i 
want...".
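
One way to express "near, but not necessarily adjacent" is to measure the shortest span of the document that contains every query word. The sketch below is plain Python, not Nutch or Lucene code; min_window and proximity_factor are hypothetical names, and dividing the number of distinct query words by the span width is just one assumed way to turn the distance into a factor between 0 and 1.

```python
from collections import Counter


def min_window(doc_tokens, query_tokens):
    """Width of the shortest token span containing every query term, or None."""
    needed = set(query_tokens)
    if not needed:
        return None
    counts = Counter()   # occurrences of each needed term inside the window
    covered = 0          # how many distinct needed terms the window holds
    best = None
    left = 0
    for right, tok in enumerate(doc_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers all terms.
        while covered == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_tok = doc_tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def proximity_factor(query, doc):
    """1.0 when the query words are packed as tightly as possible, lower when
    they are spread out, 0.0 when some query word is missing (assumed scale)."""
    q = query.lower().split()
    d = doc.lower().split()
    width = min_window(d, q)
    if width is None:
        return 0.0
    return len(set(q)) / width


if __name__ == "__main__":
    query = "Freddie i want to ride my bicycle"
    print(proximity_factor(query, "freddie i want to ride my bicycle"))
    print(proximity_factor(query, "freddie mercury sang i want to ride my bicycle"))
```

With a factor like this, a document where "Freddie" is a few words away from "i want to ride my bicycle" still scores well, only somewhat lower than an exact match.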

Best Regards,
    Ferenc


Re: distance between words

Posted by YourSoft <yo...@freemail.hu>.
YourSoft wrote:
> Hi,
>
> I have a suggestion to improve nutch search results.
> The "big" search engines (like google) measure the distance between 
> the query words.
> E.g.:
> query string: lucene in action
> When you search for it in google, google will boost up that documents 
> where the "lucene in action" is in the same sequence.
>
> I think it is possible in nutch/lucene (e.g. if your search string is: 
> "lucene in action"), but nutch don't make it.
>
> Any ideas how to make it?
>
>
> Regards,
>       Ferenc
>
>
I'm sorry, something was missing from my previous mail:
When searching for the keywords, these things should also affect the boost:
- How many times the full query ("Lucene in action") is found in the document. 
(Total document length / full query count: if the share is bigger than 
10-20%, it is BAD.)
- How many times the individual query words ("lucene", "in", "action") are 
found in the document.
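
The two counts above can be sketched in plain Python as follows. This is not Nutch or Lucene code, and the names count_phrase and query_stats are made up for illustration. The "density" here is one possible reading of the 10-20% rule: the share of the document's words taken up by occurrences of the full query. The default threshold of 0.15 is an arbitrary value picked from that range.

```python
def count_phrase(doc_tokens, query_tokens):
    """Number of positions where query_tokens occur contiguously in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == query_tokens
    )


def query_stats(query, doc, max_density=0.15):
    """Collect the two counts, plus an assumed 'too dense' flag.

    max_density is a hypothetical threshold, not a Nutch setting.
    """
    q = query.lower().split()
    d = doc.lower().split()
    phrase_hits = count_phrase(d, q)
    word_hits = {term: d.count(term) for term in q}
    # Assumed reading of the rule: fraction of the document covered by the phrase.
    density = (phrase_hits * len(q)) / len(d) if d else 0.0
    return {
        "phrase_hits": phrase_hits,
        "word_hits": word_hits,
        "density": density,
        "suspicious": density > max_density,
    }


if __name__ == "__main__":
    long_doc = "lucene in action " + "filler " * 27
    print(query_stats("Lucene in action", long_doc))
    print(query_stats("Lucene in action", "lucene in action is a book about lucene"))
```

A scorer could raise the boost with phrase_hits and word_hits, and lower it again when the suspicious flag is set.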