Posted to dev@nutch.apache.org by Robert Sanford <rs...@trefs.com> on 2006/07/25 17:11:12 UTC

Indexing href attribute in links.

I am currently running Nutch 0.7.2 under JBoss 4.0.1 using Java 1.5.0_01
for Win32. The index runs were created under Cygwin.

So far I have found that Nutch does not index keywords within the href
attribute of an anchor tag, and I would like it to do so.

I provide a co-branding service for customer sites. That service
requires the customer site to format its links to my site in a specific
way in order to receive its share of the revenue from the service my
site provides. To help my clients, I want to index their sites, find the
links that point into my site, and validate that those links are
formatted correctly.
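
To make the kind of check I have in mind concrete, here is a rough
sketch in plain Java. It is not a Nutch plugin and uses no Nutch API; it
only pulls href values out of anchor tags with a regular expression and
tests them against a link format. The host www.example.com, the
"partner" query parameter, and the class and method names are all
hypothetical placeholders, not the real format my service uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefCheck {
    // Captures the quoted href value of an anchor tag (" or ' quotes).
    // A regex is only an approximation of HTML parsing; unquoted
    // attribute values are not handled in this sketch.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // HYPOTHETICAL link format: the host and the "partner" parameter
    // are placeholders chosen for illustration only.
    private static final Pattern EXPECTED = Pattern.compile(
        "https?://www\\.example\\.com/\\S*[?&]partner=[A-Za-z0-9]+(&\\S*)?");

    // Returns every href value found in anchor tags, in document order.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2));
        }
        return hrefs;
    }

    // True if the whole href matches the assumed co-branding format.
    static boolean isValidPartnerLink(String href) {
        return EXPECTED.matcher(href).matches();
    }

    public static void main(String[] args) {
        String html =
            "<a href=\"http://www.example.com/search?partner=abc123\">ok</a>"
          + "<a href='http://www.example.com/search'>missing id</a>";
        for (String href : extractHrefs(html)) {
            System.out.println(href + " -> " + isValidPartnerLink(href));
        }
    }
}
```

What I am asking is whether something along these lines can run inside
Nutch so that the href values end up searchable in the index.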

Is there an existing plugin that will do what I'm asking for? Is that
the purpose of the "index-basic" plugin? If not, I'm willing to look at
coding a plugin, but I can't guarantee that my boss will give me the time...

rjsjr