Posted to dev@nutch.apache.org by Ali Nazemian <al...@gmail.com> on 2014/06/10 12:55:42 UTC

Sending parse data from one generate-fetch-update cycle to another one

Hi everybody,
I am going to crawl and parse some news websites as follows:
Each website has some important locations that hold news of higher
importance, so I am going to parse each page with XPath to find these
news items. Then I am going to assign a specific score to each item
based on its XPath. This is the step where I ran into a problem: the
score can only be determined when a page is parsed with XPath, but it
should be sent to Solr as the score of a document that will only be
fetched in the next generate-fetch-update cycle. In other words, I need
to pass this score to a document that has not been fetched yet! How can
I do this using Nutch? Is there any built-in class or process for
this purpose?
Best regards.

-- 
A.Nazemian