Posted to user@nutch.apache.org by Kevin MacDonald <ke...@hautesecure.com> on 2008/09/06 01:00:35 UTC

Looking to count links with Nutch

Hello,

I have what I hope is a very simple task that I would like to accomplish
using Nutch. Given a URL, I need to produce a list of the outgoing links on
that page. Once I have that list, I will count the number of links that stay
on the same domain and the number that lead off the domain. At the moment I
don't need to crawl those links, just enumerate them.
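
To make the counting step concrete, here is a rough sketch of what I have in
mind once the list of links is available. It uses only java.net.URI and no
Nutch classes; the class name LinkCounter and the method count are made up
for illustration. It assumes "same domain" means an exact, case-insensitive
host match (so www.example.com and example.com would count as different),
and it skips links that are malformed or have no host, such as mailto: links.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: classifies links relative to a page URL.
final class LinkCounter {

    /**
     * Returns a two-element array {internal, external}.
     * Assumes pageUrl is an absolute URL with a host (e.g. http://example.com/).
     * Links that cannot be parsed or that have no host are skipped.
     */
    static int[] count(String pageUrl, List<String> links) {
        URI base = URI.create(pageUrl);
        String baseHost = base.getHost().toLowerCase(Locale.ROOT);
        int internal = 0;
        int external = 0;
        for (String link : links) {
            URI resolved;
            try {
                // Resolves relative links such as "/about" against the page URL.
                resolved = base.resolve(link.trim());
            } catch (IllegalArgumentException e) {
                continue; // malformed link, ignore it
            }
            String host = resolved.getHost();
            if (host == null) {
                continue; // e.g. mailto: or javascript: links
            }
            if (host.toLowerCase(Locale.ROOT).equals(baseHost)) {
                internal++;
            } else {
                external++;
            }
        }
        return new int[] {internal, external};
    }
}
```

Calling LinkCounter.count("http://example.com/index.html", links) would give
the two counts for one page; the part I am missing is how to get the links
list out of Nutch.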

I'm assuming I can do this by writing a simple plugin. Could anyone give me
some hints as to which type of plugin I need to write? I would also like to
configure Nutch so that only this one operation is performed, as quickly as
possible, i.e. I would like to switch off as much other processing and
parsing as I can.

Any suggestions are much appreciated.

Thanks

Kevin