Posted to user@nutch.apache.org by "Insurance Squared Inc." <gc...@insurancesquared.com> on 2006/05/15 19:50:06 UTC

Boost for inbound links

Is there any good way to boost for the number of inbound links to a 
page?  I guess we can't use PR as that's patented, but I thought that we 
could somehow boost based on the number of inbound links.  Upon looking 
at the conf though (and my developer's reply) it doesn't seem like we 
can do this.  Am I missing something obvious?  Or should I be looking at 
building this in?

thanks,
g.


Re: Boost for inbound links

Posted by "Insurance Squared Inc." <gc...@insurancesquared.com>.
Just some further notes on this.  We are using a whitelist-only 
approach, where only specific files are injected, so we have crawling 
of external links turned off.  Despite this, we still get some spammy 
sites into the index when we inject lists of 50,000 or so URLs.

Because we have external links turned off for crawling, external links 
are not taken into account (AFAIK) in the ranking, only internal links.  
So the spammy sites do well.

Is there a way to have links from other sites in the database count 
toward the ranking?  It seems odd that, because we turn off external 
links in the crawl, we can't use 'external links', i.e. links from 
other sites within the database, as part of the ranking.
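
As a rough illustration of the idea (a sketch only, not Nutch code or 
configuration; the function names, the log-based boost formula and the 
example URLs are all made up for illustration), links between hosts in 
the injected set could be counted and turned into a boost like this:

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased host part of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def inbound_link_counts(links, whitelist):
    """Count, per target URL, the distinct *other* whitelisted hosts linking to it.

    links     -- iterable of (from_url, to_url) pairs
    whitelist -- set of injected URLs (hypothetical stand-in for the inject list)
    """
    linking_hosts = defaultdict(set)
    for src, dst in links:
        if src not in whitelist or dst not in whitelist:
            continue  # skip anything outside the injected set
        if host_of(src) == host_of(dst):
            continue  # internal link (same host): does not count
        linking_hosts[dst].add(host_of(src))
    return {url: len(hosts) for url, hosts in linking_hosts.items()}


def link_boost(count):
    """Assumed boost formula: 1.0 with no inbound links, growing with log(1 + count)."""
    return 1.0 + math.log1p(count)


# Usage with made-up data.
whitelist = {
    "http://a.example/",
    "http://a.example/about",
    "http://b.example/",
    "http://c.example/",
}
links = [
    ("http://a.example/about", "http://a.example/"),  # internal, ignored
    ("http://b.example/", "http://a.example/"),       # counted
    ("http://c.example/", "http://a.example/"),       # counted
    ("http://spam.example/", "http://b.example/"),    # source not whitelisted, ignored
]
counts = inbound_link_counts(links, whitelist)
boosts = {url: link_boost(counts.get(url, 0)) for url in whitelist}
```

Counting distinct linking hosts rather than raw links is a deliberate 
choice in this sketch, so that one site linking to a page many times 
does not inflate its boost.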


Thanks,
g.

Insurance Squared Inc. wrote:

> Is there any good way to boost for the number of inbound links to a 
> page?  I guess we can't use PR as that's patented, but I thought that 
> we could somehow boost based on the number of inbound links.  Upon 
> looking at the conf though (and my developer's reply) it doesn't seem 
> like we can do this.  Am I missing something obvious?  Or should I be 
> looking at building this in?
>
> thanks,
> g.