Posted to user@nutch.apache.org by alessio crisantemi <al...@gmail.com> on 2012/03/19 20:41:29 UTC
crawling file system
Dear All,
How can I crawl ONLY one directory? I listed only my directory in the
txt file, but Nutch also crawls all of its parent directories.
Any suggestions?
Thank you,
alessio
Re: crawling file system
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Alessio,
You should set the property file.crawl.parent to false in your nutch-site.xml. Its default definition (value true) is shown below.
Sebastian
<property>
<name>file.crawl.parent</name>
<value>true</value>
<description>
The crawler is not restricted to the directories that
you specified in the Urls file but it is jumping into the parent
directories as well. For your own crawlings you can change this
behavior (set to false) the way that only directories beneath the
directories that you specify get crawled.
</description>
</property>
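The definition above carries the default value, true. As a minimal sketch of the override, assuming nutch-site.xml uses the usual <configuration> root element and sits in your Nutch conf directory, the entry could look like this (the description wording here is illustrative, not taken from the Nutch sources):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Override of the default shown above: do not follow links
       up into parent directories when crawling file: URLs. -->
  <property>
    <name>file.crawl.parent</name>
    <value>false</value>
    <description>
      Restrict the crawl to the listed directories and the
      directories beneath them.
    </description>
  </property>
</configuration>
```

If your nutch-site.xml already contains other properties, only the <property> element needs to be added inside the existing <configuration> element.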