Posted to dev@nutch.apache.org by Massimo Miccoli <mm...@iltrovatore.it> on 2005/06/28 17:20:51 UTC
Re: [Nutch-dev] Copy DB by the piece
Dear Nutch dev,
I want to know whether the boost calculated for pages from the inlink count at
indexing and fetching time is used at search time.
Using DistributedSearch, it seems that the page boost is not used to calculate
the ranks for pages. What I see in my result pages
is that most pages with a low page boost are on top and some with a high boost are below.
For example, from explain.jsp:
1) boost = 5.3968873 score for query= 50.692223
2) boost = 5.586193 score for query= 46.90389
3) boost = 6.0371985 score for query= 43.306103
4) boost = 7.388178 score for query= 37.984783
....
So is only the score for query considered when sorting (ranking) the hits?
I think the rank for a hit should be boost * score for query, or
am I wrong?
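To make the observation concrete, here is a small sketch in Python (not Nutch code) that sorts the four hits listed above in two ways. The tuple layout and variable names are only for illustration; the numbers are the ones reported by explain.jsp.

```python
# (position in results, boost, score for query) as listed from explain.jsp above
hits = [
    (1, 5.3968873, 50.692223),
    (2, 5.586193, 46.90389),
    (3, 6.0371985, 43.306103),
    (4, 7.388178, 37.984783),
]

# Order the hits by "score for query", highest first.
by_score = [pos for pos, _, _ in sorted(hits, key=lambda h: h[2], reverse=True)]

# Order the hits by boost alone, highest first.
by_boost = [pos for pos, _, _ in sorted(hits, key=lambda h: h[1], reverse=True)]

print("by score for query:", by_score)
print("by boost alone:    ", by_boost)
```

For these four hits, the displayed order matches the order by score for query, and it is the reverse of the order by boost alone.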
Thanks,
Massimo
RE: [Nutch-dev] Copy DB by the piece
Posted by Chirag Chaman <de...@filangy.com>.
Boosts are multiplied into the "match score" (i.e. the tf-idf score).
Thus, pages are not sorted by boost, but by the final score.
Here's an example:
You have 3 pages:
- news.google.com
- www.blogspot.com/googguy (blog talking about google)
- www.yahoo.com/google-launches-ship-into-space.html
Let's say the boost factors are 1, 2 and 3 respectively.
Now, you do a search for "google".
Let's take the raw scores to be 50, 20 and 15 for the 3 URLs.
After the boosts are applied:
news.google.com - 50 * 1 = 50
www.blogspot.com - 20 * 2 = 40
www.yahoo.com - 15 * 3 = 45
Thus, you'll get the ranking:
news.google.com
www.yahoo.com...
www.blogspot.com...
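The same arithmetic as a minimal Python sketch. This is only an illustration of "raw score times boost, then sort"; the function and variable names are made up here and are not Nutch or Lucene APIs.

```python
# (url, raw match score, boost) for the three example pages above
pages = [
    ("news.google.com", 50.0, 1.0),
    ("www.blogspot.com/googguy", 20.0, 2.0),
    ("www.yahoo.com/google-launches-ship-into-space.html", 15.0, 3.0),
]


def final_score(raw_score, boost):
    """Final score used for ranking in this example: raw score times boost."""
    return raw_score * boost


# Sort by final score, highest first; the boost alone does not decide the order.
ranked = sorted(pages, key=lambda p: final_score(p[1], p[2]), reverse=True)

for url, raw, boost in ranked:
    print(f"{url}: {raw} * {boost} = {final_score(raw, boost)}")
```

The page with the lowest boost still ends up first, because its raw score is large enough to outweigh the higher boosts of the other two.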
HTH!
Re: [Nutch-dev] Copy DB by the piece
Posted by Massimo Miccoli <mm...@iltrovatore.it>.
Chirag,
So the boost at the top of explain.jsp is used for sorting the results, i.e. it is
the final value for the rank? If so, the hits on the results pages are not ordered by boost,
because I have hits with a low boost in the first positions.
Thanks
RE: [Nutch-dev] Copy DB by the piece
Posted by Chirag Chaman <de...@filangy.com>.
Massimo,
The boost gets multiplied in at search time.
This boost has already been applied to the "field norms". A good way to
confirm this is to look at a field norm that was originally one (URL or anchor is a good
one); it should now be higher. For many of the other fields, like "content",
the norm is way too small to begin with to show any difference.
In short, if you see the boost at the top of the explain page, it's
definitely there in the field norms, and thus being applied.
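A rough Python sketch of why the boost is easy to spot in a short field and hard to spot in "content". It assumes the classic Lucene default, where a field norm is document boost * field boost * lengthNorm and lengthNorm is 1/sqrt(number of terms). Nutch's own Similarity may use different length norms per field, and stored norms are rounded when encoded, so treat the numbers as illustrative only. The term count of 2500 for a content field is a made-up value.

```python
import math


def length_norm(num_terms):
    """Assumed length normalization: Lucene's classic default, 1/sqrt(num_terms)."""
    return 1.0 / math.sqrt(num_terms)


def field_norm(doc_boost, num_terms, field_boost=1.0):
    """Field norm with the document boost folded in (assumed classic Lucene formula)."""
    return doc_boost * field_boost * length_norm(num_terms)


boost = 5.3968873  # boost of the first hit reported from explain.jsp in this thread

# A one-term field (e.g. a very short anchor): the norm starts at 1.0,
# so the boost shows up directly.
print("short field, no boost:  ", field_norm(1.0, 1))
print("short field, with boost:", field_norm(boost, 1))

# A long field (hypothetical 2500 terms): the norm is small with or without the boost.
print("long field, no boost:   ", field_norm(1.0, 2500))
print("long field, with boost: ", field_norm(boost, 2500))
```

Under these assumptions the boosted norm of the long field is still well below 1, which is why a short field such as URL or anchor is the easier place to check.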
CC-
--------------------------------------------
Filangy, Inc.
We're Improving Search!
www.filangy.com