Posted to user@nutch.apache.org by Paul Harrison <pr...@swbell.net> on 2005/10/14 14:02:17 UTC

Can any directories in segments be deleted?

In each segment I have the following directories:

Content
Parse_data
Parse_text
Fetcher
Fetchlist

I have no plans to update the index, as it is used for a static
demonstration.

Are all of these directories necessary, or can I delete some?  I need to
free up space because I don't have enough disk space to finish indexing
these segments.


Re: Can any directories in segments be deleted?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Paul Harrison wrote:
> In each segment I have the following directories:
>
> Content
> Parse_data
> Parse_text
> Fetcher
> Fetchlist

Bad news: if you haven't finished fetching yet, then not only do you
need all the dirs, but you will also need to create the index proper on
top of that. I suggest creating a smaller segment and refetching, or
just stopping where you are and using the segslice command to cut a
piece that you can handle.

And also I suggest getting a bigger disk... ;-)

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


RE: Can any directories in segments be deleted?

Posted by Paul Harrison <pa...@personifi.com>.
I appreciate the feedback.

We have 5 machines, each with one 80 GB drive (OS) and two 250 GB drives
(data1, data2).  We have a little over 100 million pages.

It appears there is a leftover webdb from when the pages were pulled on the
fourth machine.  We are currently running without this machine in the lineup,
as it is the only one that has not been indexed.  There is a webdb on the
first machine.  I am guessing that this webdb, which is taking up 72 GB of
space, is not needed.  We don't intend to pull any more pages or update
the data, but we may run linkanalysis.  Any comments?

Thanks,

Paul
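
As a rough sanity check on the numbers above, here is a back-of-the-envelope
sketch in Python. It assumes decimal units (1 GB = 10**9 bytes), exactly
100 million pages, and that every data drive is fully usable; none of that
is stated in the message, and the function name is made up for illustration.

```python
# Figures taken from the message above. The page count is rounded down to
# 100 million, and decimal gigabytes are an assumption.
DATA_DRIVES_PER_MACHINE = 2
DRIVE_GB = 250
PAGES = 100_000_000


def per_page_budget_kb(machines, reserved_gb=0):
    """Return the disk budget per page in KB across the given machines.

    reserved_gb is space already taken by something else, such as the
    72 GB webdb mentioned above.
    """
    total_gb = machines * DATA_DRIVES_PER_MACHINE * DRIVE_GB - reserved_gb
    return total_gb * 1e9 / PAGES / 1e3


if __name__ == "__main__":
    print("all 5 machines:", per_page_budget_kb(5), "KB per page")
    print("4 machines:", per_page_budget_kb(4), "KB per page")
    print("5 machines, webdb kept:", per_page_budget_kb(5, 72), "KB per page")
```

With all five machines this works out to about 25 KB per page for raw
content, parsed data and the index combined, and about 20 KB with one
machine out of the lineup.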

-----Original Message-----
From: EM [mailto:emilijan@cpuedge.com] 
Sent: Friday, October 14, 2005 10:23 AM
To: nutch-user@lucene.apache.org
Subject: Re: Can any directories in segments be deleted?

If you can live without 'view cached version' then you can delete 
'content' and one of the 'parse_' directories (I don't remember which one),
but not before the indexing finishes.

250GB drives are the best you can get for your money lately (I did a bit 
of research before my shopping)




Re: Can any directories in segments be deleted?

Posted by EM <em...@cpuedge.com>.
If you can live without 'view cached version' then you can delete 
'content' and one of the 'parse_' directories (I don't remember which one),
but not before the indexing finishes.

250GB drives are the best you can get for your money lately (I did a bit 
of research before my shopping)
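
Before deleting anything, it may help to see how much space each
subdirectory actually takes. The following is a minimal Python sketch, not
part of Nutch: the function names are hypothetical, and it assumes the
segments live as plain directories on a local filesystem, one subdirectory
per part (content, parse_data, and so on).

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked data is not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def segment_usage(segments_dir):
    """Map each segment name to {subdirectory name: size in bytes}."""
    usage = {}
    for seg in sorted(os.listdir(segments_dir)):
        seg_path = os.path.join(segments_dir, seg)
        if not os.path.isdir(seg_path):
            continue
        usage[seg] = {
            part: dir_size(os.path.join(seg_path, part))
            for part in sorted(os.listdir(seg_path))
            if os.path.isdir(os.path.join(seg_path, part))
        }
    return usage


def reclaimable(usage, parts):
    """Sum the bytes used by the named subdirectories over all segments.

    Names are compared case-insensitively, since the thread spells them
    both 'Content' and 'content'.
    """
    wanted = {p.lower() for p in parts}
    return sum(
        size
        for seg_parts in usage.values()
        for part, size in seg_parts.items()
        if part.lower() in wanted
    )
```

For example, reclaimable(segment_usage("segments"), ["content"]) would give
the number of bytes freed by dropping 'content' from every segment under a
directory named segments (the path is a placeholder). The script only
reports sizes and deletes nothing; per the advice above, deletion should
wait until indexing has finished.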

