Posted to user@nutch.apache.org by alexis artes <al...@yahoo.com> on 2006/04/24 08:04:53 UTC

deletable files

Hi,

I am using Nutchwax, which is built on Nutch v0.7,
together with Heritrix and Wera for a web archive
system.

Since we are archiving the websites that we crawl,
storage is a concern. I would like to ask which files
inside the Index folder can be deleted. Using trial
and error, I found that I was still able to run search
and retrieval in Wera without the following folders:
webdb, segment-*-indexs, segment-*-parse_data, and
segment-*-fetcher.

I hope someone can advise me whether what I am doing
is correct.

Best Regards,
Alexis Artes
