
Posted to user@nutch.apache.org by EM <em...@cpuedge.com> on 2005/09/23 07:44:47 UTC

Re: How can I recover an aborted fetch process

You cannot resume a failed fetch.
You can either (1) restart it, or (2) use whatever has been fetched so far.

To do (2), you'll need to create a file named 'fetcher.done' in the segment 
directory. To do this, simply run:
# cd <your segment directory>
# touch fetcher.done
The 'touch' command creates the file as an empty (0-byte) file.

Once that's done, run updatedb.
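The two commands above can be wrapped in a small helper so they are not typed by hand each time. This is only a sketch: the function name mark_fetch_done is hypothetical, and it assumes the segment path is passed as its only argument. It does nothing beyond the cd/touch steps described above, plus a check that the directory exists.

```shell
#!/bin/sh
# Hypothetical helper: mark a partially fetched segment as finished by
# creating an empty fetcher.done file in it (same effect as cd + touch).
mark_fetch_done() {
  segment="$1"
  # Refuse to continue if the segment directory does not exist.
  if [ ! -d "$segment" ]; then
    echo "no such segment directory: $segment" >&2
    return 1
  fi
  # touch creates fetcher.done as a 0-byte file if it is missing;
  # its exit status becomes the function's exit status.
  touch "$segment/fetcher.done"
}
```

After calling it on the aborted segment (for example, mark_fetch_done <your segment directory>), run updatedb as usual.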

Gal Nitzan wrote:

> Hi,
>
> In the FAQ there is the following answer, and I really do not 
> understand it, so I'm sure it is a good candidate for revision :-)
>
> The answer is as follows:
>
> >>>>You'll need to touch the file fetcher.done in the segment 
> directory.<<<<
>
> When a fetch is aborted there is no such file as fetcher.done, at least 
> not on my system.
>
> >>>> All the pages that were not crawled will be re-generated for 
> fetch pretty soon. <<<<
>
> How? What will re-generate them? (Probably calling generate?)
>
> >>>> If you fetched lots of pages, and don't want to have to re-fetch 
> them again, this is the best way.<<<<
>
> Please feel free to elaborate....
>
> Regards,
>
> Gal

Re: How can I recover an aborted fetch process

Posted by Gal Nitzan <gn...@usa.net>.

Thanks EM...