Posted to user@nutch.apache.org by grif <tp...@gmail.com> on 2007/10/22 05:50:54 UTC

Mimicking Anchor Text Relevance & Authority On a Focused Crawl

I am only interested in searching across a corpus of injected domains. The
problem is that a crawl restricted to those domains will be missing two of
the most valuable ranking signals: incoming anchor text from external sites
and the authority each page inherits from the sites linking to it.

I can get backlink information for each URL I'm interested in from Yahoo
Site Explorer or Alexa's set of web search tools. If I started the crawl at
the backlinking URLs, I would capture the anchor text and authority levels
of the pages I'm really interested in - but I would then have to remove the
backlinking pages themselves, since I don't want them in the search results.
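
To make the idea concrete, here is a minimal sketch of the "capture the anchors,
then drop the pages I don't want" step. It is plain Python and does not use any
Nutch API. The input format (a list of source URL, target URL, anchor text
tuples) and the function names host_of, in_target_domains,
collect_inbound_anchors and pages_to_keep are all hypothetical, chosen only for
illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def in_target_domains(url, target_domains):
    """True if the URL's host is a target domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in target_domains)


def collect_inbound_anchors(links, target_domains):
    """Group anchor text by target page, keeping only links into target domains.

    `links` is an iterable of (source_url, target_url, anchor_text) tuples
    (an assumed format, not something Nutch emits in this shape).
    """
    anchors = defaultdict(list)
    for source_url, target_url, anchor_text in links:
        if not in_target_domains(target_url, target_domains):
            continue  # link points at a page outside the corpus of interest
        text = anchor_text.strip()
        if text:
            # Remember which host the anchor came from, for later weighting.
            anchors[target_url].append((host_of(source_url), text))
    return dict(anchors)


def pages_to_keep(crawled_urls, target_domains):
    """Drop the backlinking pages; keep only pages on the target domains."""
    return [u for u in crawled_urls if in_target_domains(u, target_domains)]
```

The anchors would be collected before the pruning step, so that the text from
the backlinking pages survives even though those pages are removed from the
searchable set.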

Has anyone tried to do something like this? If so, please share any
tips or ideas that might make the process a little less painful.

Thanks! :)
-- 
View this message in context: http://www.nabble.com/Mimicking-Anchor-Text-Relevance---Authority-On-a-Focused-Crawl-tf4668564.html#a13336338
Sent from the Nutch - User mailing list archive at Nabble.com.