Posted to user@nutch.apache.org by Andy Cranfill <An...@careerbuilder.com> on 2010/09/16 19:09:42 UTC

Nutch crawling page question

Hi All,

I am using Nutch for a new crawling project and have run into a quandary (for me). When I HTML-parse a page, I need a datum from the referring page, that is, the page that contained the link to the one I am parsing now. The referring page has a list of links, and I need to capture some data alongside each link so that it is available when I parse the page that link points to.

Any ideas on how to pass the data from the preceding page to the linked-to page?

Thanks!
Andy Cranfill