Posted to dev@nutch.apache.org by Doug Cutting <cu...@nutch.org> on 2005/11/04 22:47:29 UTC

Re: mapred questions

Ken van Mulder wrote:
> First is that the fetcher slows down over time and continues to use more 
> and more memory as it goes (which I think is eventually hanging the 
> process).

What parser plugins do you have enabled?  These are usually the culprit.
Try using 'kill -QUIT' on the fetcher's Java process to get a stack dump
of every thread, both at the start and later, when it slows and grows,
and compare what the threads are doing.
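
A minimal sketch of the 'kill -QUIT' step, assuming a Unix shell and that
you already know the process id of the fetcher's JVM (for example from
'ps'). The function name dump_threads is made up for illustration. A JVM
that receives SIGQUIT prints a stack trace for each of its threads to its
own standard output and keeps running, so look for the dump in the
fetcher's console or log, not in the shell where you run this.

```shell
#!/bin/sh
# dump_threads is a hypothetical helper name, not part of Nutch.
# It sends SIGQUIT to the given process id. A JVM reacts by printing
# a stack trace for every thread to its own stdout and then continues.
dump_threads() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: dump_threads <pid>" >&2
        return 2
    fi
    kill -QUIT "$pid"
}
```

Run 'dump_threads <pid>' once shortly after the fetcher starts and again
once it has slowed down. Threads that sit in the same parser frames in
both dumps are the first place to look.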

> Second problem is trying to use the crawl. I've tried with a seeds/url
> file containing 4, 2000 and then 100k urls in it. Using:
> 
> $ bin/nutch crawl seeds
> 
> Which goes through its processing and completes, but doesn't visit any 
> of the urls in the seeds file. What am I missing to get it to actually 
> do the crawl?

Are you using NDFS?  If so, the seeds directory needs to be stored in
NDFS, not just on the local disk.  Use 'bin/nutch ndfs -put seeds seeds'
to copy the local seeds directory into NDFS under the same name.

Doug