Posted to dev@nutch.apache.org by Kelvin Tan <ke...@relevanz.com> on 2005/10/04 17:50:51 UTC

No more FetchListEntry in MapReduce branch

There were some previous discussions on implementing If-Modified-Since during the fetching phase by modifying FetchListEntry. Since FetchListEntry is no longer used in the MapReduce branch and only the URL string is passed to the protocol handlers, I'm wondering if anyone has thoughts on how If-Modified-Since could be supported there.

Along the same lines, it appears that implementing a richer system of crawling scopes and filters (a la OC/Nutch-84) requires some form of abstraction to be carried around in the map/reduce phases rather than just the URL string. For example, for each URL, its seed URL, its depth from the seed and its parent URL need to be known.
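
To make the idea concrete, here is a rough sketch of the kind of per-URL record I have in mind. Everything in it is hypothetical: CrawlItem, its fields and its methods are illustrative names, not existing Nutch classes, and a real version would also have to be serializable in whatever form the map/reduce phases require. The lastFetchTime field is only there to show where an If-Modified-Since value could come from.

```java
/** Hypothetical per-URL record; not an existing Nutch class. */
final class CrawlItem {
    private final String url;
    private final String seedUrl;      // the seed this URL was discovered from
    private final String parentUrl;    // page that linked here; null for a seed
    private final int depth;           // number of hops from the seed
    private final long lastFetchTime;  // millis since epoch; 0 means never fetched

    private CrawlItem(String url, String seedUrl, String parentUrl,
                      int depth, long lastFetchTime) {
        this.url = url;
        this.seedUrl = seedUrl;
        this.parentUrl = parentUrl;
        this.depth = depth;
        this.lastFetchTime = lastFetchTime;
    }

    /** A seed is its own seed, has no parent and sits at depth 0. */
    static CrawlItem seed(String url) {
        return new CrawlItem(url, url, null, 0, 0L);
    }

    /** An outlink inherits the seed, records this URL as parent, and is one hop deeper. */
    CrawlItem outlink(String childUrl) {
        return new CrawlItem(childUrl, seedUrl, url, depth + 1, 0L);
    }

    /** Returns a copy that remembers when the page was last fetched. */
    CrawlItem fetchedAt(long timeMillis) {
        return new CrawlItem(url, seedUrl, parentUrl, depth, timeMillis);
    }

    /** Example scope check: is this URL within maxDepth hops of its seed? */
    boolean withinDepth(int maxDepth) {
        return depth <= maxDepth;
    }

    String getUrl() { return url; }
    String getSeedUrl() { return seedUrl; }
    String getParentUrl() { return parentUrl; }
    int getDepth() { return depth; }
    long getLastFetchTime() { return lastFetchTime; }
}
```

With something like this passed around instead of the bare URL string, a depth-limited scope filter becomes a call such as item.withinDepth(3), and a protocol handler could send If-Modified-Since whenever getLastFetchTime() is non-zero.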

kelvin