Posted to dev@nutch.apache.org by Doug Cutting <cu...@nutch.org> on 2005/11/04 22:47:29 UTC
Re: mapred questions
Ken van Mulder wrote:
> First is that the fetcher slows down over time and continues to use more
> and more memory as it goes (which I think is eventually hanging the
> process).
What parser plugins do you have enabled? These are usually the culprit.
Try sending 'kill -QUIT' to the fetcher's JVM to get a thread dump showing
what the various threads are doing, both at the start and later, when it
slows and grows.
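If sending a signal is awkward (for example, when the JVM's stdout is not
captured anywhere), a similar snapshot can be taken from inside the process.
The following is only a minimal sketch, not part of Nutch: the class name
ThreadDumpSketch and the method dumpAllThreads are hypothetical, and it relies
on the standard Thread.getAllStackTraces() call available since Java 5. It
shows thread names, states and stack frames, which is roughly what a
'kill -QUIT' dump reports, though without the lock details.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch: an in-process stand-in for
// the thread dump that 'kill -QUIT' makes the JVM print.
public class ThreadDumpSketch {

    /** Returns a text snapshot of every live thread's name, state and stack. */
    public static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print one snapshot; in practice, take one early in the fetch and
        // another after it has slowed down, then compare the two.
        System.out.print(dumpAllThreads());
    }
}
```

Comparing an early snapshot with a later one should show whether the fetcher
threads are all stuck in the same place, such as inside a particular parser
plugin.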
> Second problem is trying to use the crawl. I've tried with a seeds/url
> file containing 4, 2000 and then 100k urls in it. Using:
>
> $ bin/nutch crawl seeds
>
> Which goes through its processing and completes, but doesn't visit any
> of the urls in the seeds file. What am I missing to get it to actually
> do the crawl?
Are you using NDFS? If so, the seeds directory needs to be stored in
NDFS. Use 'bin/nutch ndfs -put seeds seeds'.
Doug