Posted to user@nutch.apache.org by alessio crisantemi <al...@gmail.com> on 2012/03/19 20:41:29 UTC

crawling file system

Dear All,
How can I crawl ONLY one directory? I listed only my directory in the
URLs text file, but Nutch crawls all of its parent directories as well.
Any suggestions?
Thank you,
alessio

Re: crawling file system

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Alessio,

You should set the property file.crawl.parent to false in your nutch-site.xml. The default definition of the property, with the value true, is shown below.

Sebastian

<property>
   <name>file.crawl.parent</name>
   <value>true</value>
   <description>
     The crawler is not restricted to the directories that
     you specified in the Urls file but it is jumping into the parent
     directories as well. For your own crawlings you can change this
     behavior (set to false) the way that only directories beneath the
     directories that you specify get crawled.
   </description>
</property>
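
As a minimal sketch, the override in nutch-site.xml could look like the following. It assumes the usual <configuration> root element of Nutch configuration files and that nothing else in the file sets this property; only the value differs from the default shown above.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of the defaults.
     Sketch only; the <configuration> wrapper is assumed here and
     any other properties in your file are left out. -->
<configuration>
  <property>
    <name>file.crawl.parent</name>
    <!-- false: do not follow links up into parent directories -->
    <value>false</value>
  </property>
</configuration>
```

With this in place, only the directories listed in the URLs file and the directories beneath them should get crawled.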

