Posted to user@nutch.apache.org by EM <em...@cpuedge.com> on 2005/09/23 07:44:47 UTC
Re: How can I recover an aborted fetch process
You cannot resume a failed fetch.
You can either (1) restart it, or (2) use whatever has been fetched so far.
To do (2), you'll need to create the file 'fetcher.done' in the segment
directory. To do this, simply run:
# cd <your segment directory>
# touch fetcher.done
The 'touch' command creates the file (size 0 bytes) if it does not already exist.
Once that's done, run updatedb.
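
For illustration only, the steps above could be wrapped in a small shell
helper. This is a minimal sketch, not part of Nutch: the function name
mark_fetch_done is hypothetical, the segment path is a placeholder, and the
updatedb argument order shown in the trailing comment is an assumption based
on Nutch 0.7-era usage, so check it against your own installation.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nutch): mark an aborted segment as
# finished so that whatever was fetched so far can be used.
mark_fetch_done() {
    segment_dir="$1"
    # Refuse to continue if the argument is not an existing directory.
    if [ ! -d "$segment_dir" ]; then
        echo "not a directory: $segment_dir" >&2
        return 1
    fi
    # touch creates an empty fetcher.done if it does not exist yet.
    touch "$segment_dir/fetcher.done"
}

# Example usage (paths are placeholders; updatedb arguments are an assumption):
#   mark_fetch_done segments/<your segment directory>
#   bin/nutch updatedb db segments/<your segment directory>
```

The directory check is there only so that a mistyped path fails loudly
instead of creating a stray file somewhere else.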
Gal Nitzan wrote:
> Hi,
>
> In the FAQ there is the following answer and I really do not
> understand it so I'm sure it is a good candidate for revision :-) .
>
> The answer is as follows:
>
> >>>>You'll need to touch the file fetcher.done in the segment
> directory.<<<<
>
> When a fetch is aborted, there is no such file as fetcher.done, at least
> not on my system.
>
> >>>> All the pages that were not crawled will be re-generated for
> fetch pretty soon. <<<<
>
> How? (Probably by calling generate?) What will re-generate them?
>
> >>>> If you fetched lots of pages, and don't want to have to re-fetch
> them again, this is the best way.<<<<
>
> Please feel free to elaborate....
>
> Regards,
>
> Gal
Re: How can I recover an aborted fetch process
Posted by Gal Nitzan <gn...@usa.net>.
Thanks EM...