Posted to dev@nutch.apache.org by Jeremy Calvert <Je...@vulcan.com> on 2005/08/19 00:54:24 UTC

RE: [Nutch-dev] Outlink metadata?

To this end, would it suffice to abstract the Page and Link classes and
make expanded implementations of these?

Jeremy

-----Original Message-----
From: nutch-developers-admin@lists.sourceforge.net
[mailto:nutch-developers-admin@lists.sourceforge.net] On Behalf Of Erik
Hatcher
Sent: Thursday, August 18, 2005 11:45 AM
To: nutch-dev@lucene.apache.org
Subject: [Nutch-dev] Outlink metadata?

First a question about the current behavior... does Nutch adhere to  
the <a rel="nofollow"...> conventions?  If so, where is that coded?

On a related note, it seems that carrying metadata around on Outlink would
be beneficial, not just anchor text and URL. For example, my
application will crawl HTML sites with a HEAD <link> to RDF data.
I'd like to, in an HtmlParseFilter, add ParseData metadata so that an
indexer (a custom one currently, not the Nutch one) can get at the
RDF data that has been fetched from the URL stored in the metadata.
Does that make sense?

Would my use indicate that Outlink should carry along metadata or is  
there another way to achieve this (besides writing a custom HTML  
parser)?
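
To make the idea concrete, here is a minimal, self-contained sketch of an
outlink that carries metadata alongside its URL and anchor text. It does not
use the Nutch API: OutlinkMetadataSketch, MetaOutlink and extractHeadLinks are
hypothetical names invented for illustration, and the regex-based scan of
<link> tags is a simplification that assumes double-quoted attributes. A real
filter would more likely work from the parsed DOM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutlinkMetadataSketch {

    /** Hypothetical outlink type: URL and anchor text plus free-form metadata. */
    public static final class MetaOutlink {
        public final String url;
        public final String anchor;
        public final Map<String, String> metadata = new LinkedHashMap<>();

        public MetaOutlink(String url, String anchor) {
            this.url = url;
            this.anchor = anchor;
        }
    }

    // Matches a whole <link ...> tag (assumption: no '>' inside attribute values).
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches name="value" pairs (assumption: double-quoted values only).
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    /** Collects <link> tags that have an href; other attributes become metadata. */
    public static List<MetaOutlink> extractHeadLinks(String html) {
        List<MetaOutlink> links = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), attr.group(2));
            }
            String href = attrs.remove("href");
            if (href == null) {
                continue; // nothing to fetch, so no outlink
            }
            MetaOutlink link = new MetaOutlink(href, "");
            link.metadata.putAll(attrs); // e.g. rel and type travel with the link
            links.add(link);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<link rel=\"meta\" type=\"application/rdf+xml\""
                + " href=\"http://example.org/data.rdf\">"
                + "</head><body><a href=\"/x\" rel=\"nofollow\">x</a></body></html>";
        for (MetaOutlink link : extractHeadLinks(html)) {
            System.out.println(link.url + " " + link.metadata);
        }
    }
}
```

With something along these lines, an indexer could look at each outlink's
metadata (for example, a type of application/rdf+xml) to decide which fetched
URLs hold the RDF data for a page.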

Thanks,
     Erik


