Posted to user@nutch.apache.org by "Naess, Ronny" <Ro...@avinor.no> on 2007/06/26 15:36:58 UTC

The ranking is wrong

I have indexed our intranet with Nutch-0.9.

I run the query 'parking location:stavanger language:no' and I receive
some hits. (location and language are two extra fields I added.)

The Nutch client does not rank the hits quite as expected:
1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola

How it should have been:
1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (should not have been there at
all if possible, but I reckon it is not easy to avoid indexing
navigation menus, since they are part of the page)

The page "Parking - Stavanger Airport, Sola" has 'parking' in the title,
'parking' in the content (20+ times in some form, mostly in compound
words like xxxparking or parkingxxx, but also about 5 times as the
standalone word 'parking') and even 'parking' in the URL.

I guess I have to alter the boosting for some fields. I tried raising
the boost in the index-basic plugin (hardcoding it), but I can't see any
change in the index. Luke tells me that the field boost is still 1.0
even after I changed it. Am I doing it wrong?

Even if I search only for 'parking', without filtering on location, I
receive a lot of hits, but they are all the different front pages.
All of these pages seem to have a high boost, outranking the real
parking page(s) by a wide margin.
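
To illustrate the effect described above, here is a minimal sketch of how
a per-document boost can outweigh term frequency. It assumes Lucene's
classic scoring factors (tf = sqrt(freq), lengthNorm = 1/sqrt(number of
terms in the field)) with the boost multiplied in; idf and query
normalisation are left out because they are the same for both pages. The
function name and all numbers are hypothetical, not taken from the actual
index.

```python
import math


def simplified_score(term_freq, field_length, doc_boost=1.0):
    """Rough per-term score: tf * lengthNorm * boost.

    Simplification of Lucene's classic similarity; idf and queryNorm
    are omitted since they do not change the relative order of two
    documents for the same single-term query.
    """
    tf = math.sqrt(term_freq)
    length_norm = 1.0 / math.sqrt(field_length)
    return tf * length_norm * doc_boost


# Hypothetical numbers: the parking page mentions the term 5 times,
# the front page only once (in its menu) but carries a higher boost.
parking_page = simplified_score(term_freq=5, field_length=400, doc_boost=1.0)
front_page = simplified_score(term_freq=1, field_length=400, doc_boost=3.0)

print("parking page:", parking_page)
print("front page:  ", front_page)
print("front page ranks first:", front_page > parking_page)
```

With these made-up values the front page wins even though the term
occurs five times as often on the parking page, because tf only grows
with the square root of the frequency while the boost is applied
linearly.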

Any help is appreciated.


Best regards, 

Ronny N.