Posted to user@nutch.apache.org by Andy Cranfill <An...@careerbuilder.com> on 2010/09/16 19:09:42 UTC
nutch crawling page question
Hi All,
I am using Nutch for a new crawling project and have run into a quandary. When I parse a page's HTML, I need a datum from the page that linked to it. That referring page contains a list of links, and each link has some data alongside it. When I parse the target of one of those links, I need access to the data that accompanied the link.
Any ideas on how to pass data from the referring page to the linked-to page?
Thanks!
Andy Cranfill