Posted to user@nutch.apache.org by Paul M Lieberman <pa...@alum.mit.edu> on 2006/08/12 21:38:02 UTC

crawl w/o store

Y'all -

I need to do an intranet crawl in order to get a list of all URLs 
fetched. I do NOT want to store the data for this crawl. I understand 
there is a configuration option to do just this. Which file do I change 
(conf/nutch-site.xml?), and what do I need to add to it?

I'm running Nutch 0.7.2.

- Paul M Lieberman

Re: crawl w/o store

Posted by Dennis Kubes <nu...@dragonflymc.com>.

You can add the property below to your nutch-site.xml file. Settings 
there take precedence over the defaults in nutch-default.xml. This is 
for Nutch 0.8; I am not sure whether the same property exists in 0.7.2.

<property>
  <name>fetcher.store.content</name>
  <value>false</value>
  <description>If true, fetcher will store content.</description>
</property>
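For context, a complete conf/nutch-site.xml carrying this override might look like the sketch below. It assumes the Nutch 0.8 layout, where site-specific overrides sit inside a top-level <configuration> element. The root element and available properties in 0.7.2 may differ, so check them against the nutch-default.xml shipped with your release.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<!-- Assumes the Nutch 0.8 layout with a <configuration> root element -->
<configuration>
  <!-- Tell the fetcher not to store the raw fetched page content -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
    <description>If true, fetcher will store content.</description>
  </property>
</configuration>
```

Any property not listed here keeps its default value from nutch-default.xml.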

Dennis