Posted to user@nutch.apache.org by Alaak <al...@gmx.de> on 2012/09/08 10:43:49 UTC

Problem with corrupted index "Input path does not exist:"

Hi,

I needed to abort a crawl this morning, and it seems my crawl directory 
is somehow corrupted now. I get the error message: "Input path does not 
exist: file:/home/user/Apache 
Nutch/crawl/segments/20120908095131/parse_data". Is there any way to 
delete the data already created by the unfinished crawl to clean up 
the crawl directory?

The solution I found on stackoverflow was to delete the whole crawl db, 
which I would like to avoid since it already contains one week of data.

Thanks.

Re: Problem with corrupted index "Input path does not exist:"

Posted by Lewis John Mcgibbney <le...@gmail.com>.
http://wiki.apache.org/nutch/FAQ#How_can_I_recover_an_aborted_fetch_process.3F

hth

Lewis

On Sat, Sep 8, 2012 at 9:43 AM, Alaak <al...@gmx.de> wrote:
> Hi,
>
> I needed to abort a crawl this morning, and it seems my crawl directory is
> somehow corrupted now. I get the error message: "Input path does not exist:
> file:/home/user/Apache Nutch/crawl/segments/20120908095131/parse_data". Is
> there any way to delete the data already created by the unfinished crawl
> to clean up the crawl directory?
>
> The solution I found on stackoverflow was to delete the whole crawl db,
> which I would like to avoid since it already contains one week of data.
>
> Thanks.



-- 
Lewis

Re: Problem with corrupted index "Input path does not exist:"

Posted by Alaak <al...@gmx.de>.
Hi,

Yeah. That helped. Thank you.

On 08.09.2012 11:03, remi tassing wrote:
> deleting that specific segment directory [0] should fix the problem 
> but it depends on what you're attempting to do.
>
> Remi
>
> [0]: /home/user/Apache Nutch/crawl/segments/20120908095131/
>
> On Saturday, September 8, 2012, Alaak wrote:
>
>     Hi,
>
>     I needed to abort a crawl this morning, and it seems my crawl
>     directory is somehow corrupted now. I get the error message:
>     "Input path does not exist: file:/home/user/Apache
>     Nutch/crawl/segments/20120908095131/parse_data". Is there any way
>     to delete the data already created by the unfinished crawl to
>     clean up the crawl directory?
>
>     The solution I found on stackoverflow was to delete the whole
>     crawl db, which I would like to avoid since it already contains
>     one week of data.
>
>     Thanks.
>


Re: Problem with corrupted index "Input path does not exist:"

Posted by remi tassing <ta...@gmail.com>.
deleting that specific segment directory [0] should fix the problem but it
depends on what you're attempting to do.

Remi

[0]: /home/user/Apache Nutch/crawl/segments/20120908095131/
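
To find out which segments an aborted crawl left unfinished before deleting anything, a small script along these lines may help. It is only a sketch: the list of expected subdirectories assumes a Nutch 1.x segment that was fetched and parsed with content storage enabled, and the helper names (find_incomplete_segments, remove_segments) are made up for this example and are not part of Nutch. Deletion stays a dry run unless explicitly turned on.

```python
from pathlib import Path
import shutil

# Subdirectories a fetched and parsed Nutch 1.x segment normally contains.
# Assumption: adjust this tuple if your setup skips parsing or does not
# store fetched content.
EXPECTED_PARTS = ("crawl_generate", "crawl_fetch", "content",
                  "crawl_parse", "parse_data", "parse_text")


def find_incomplete_segments(segments_dir):
    """Return {segment_path: [missing subdirectory names]} for incomplete segments."""
    incomplete = {}
    for segment in sorted(Path(segments_dir).iterdir()):
        if not segment.is_dir():
            continue
        missing = [p for p in EXPECTED_PARTS if not (segment / p).is_dir()]
        if missing:
            incomplete[segment] = missing
    return incomplete


def remove_segments(segments, dry_run=True):
    """Delete the given segment directories; with dry_run=True only report them."""
    for segment in segments:
        if dry_run:
            print(f"would remove {segment}")
        else:
            shutil.rmtree(segment)
            print(f"removed {segment}")


if __name__ == "__main__":
    # Segments path taken from the error message in this thread.
    segments_dir = Path("/home/user/Apache Nutch/crawl/segments")
    if segments_dir.is_dir():
        broken = find_incomplete_segments(segments_dir)
        for segment, missing in broken.items():
            print(f"{segment.name}: missing {', '.join(missing)}")
        # Switch dry_run to False only after reviewing the list above.
        remove_segments(broken, dry_run=True)
```

Only the listed segment directories are touched; the crawl db is left alone, which matches the goal of keeping the week of data already collected.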

On Saturday, September 8, 2012, Alaak wrote:

> Hi,
>
> I needed to abort a crawl this morning, and it seems my crawl directory is
> somehow corrupted now. I get the error message: "Input path does not exist:
> file:/home/user/Apache Nutch/crawl/segments/20120908095131/parse_data".
> Is there any way to delete the data already created by the unfinished
> crawl to clean up the crawl directory?
>
> The solution I found on stackoverflow was to delete the whole crawl db,
> which I would like to avoid since it already contains one week of data.
>
> Thanks.
>