Posted to user@nutch.apache.org by Shailendra Mudgal <mu...@gmail.com> on 2008/10/15 12:43:54 UTC

Re: Recovering aborted fetch

Hi Andrzej,

Thanks for the tool. It allowed me to recover a fetch that failed for lack of
disk space.

Regards,
Vipin

On Mon, Mar 12, 2007 at 7:35 PM, Mathijs Homminga <
mathijs.homminga@knowlogy.nl> wrote:

> Hi all,
>
> We managed to recover most of our data using Andrzej's LocalFetchRecover
> tool.
>
> For more info see:
> http://issues.apache.org/jira/browse/NUTCH-451
>
> Mathijs
>
>
>
> Andrzej Bialecki wrote:
>
>> Mathijs Homminga wrote:
>>
>>> Hi Andrzej,
>>>
>>> Thanks for the tool!
>>>
>>> I found one 'map_xxxxxx' directory which matches the date my segment was
>>> created.
>>> It contains a 'part-0.out' file with a timestamp that matches the time of
>>> the last entries in my log file (just before the process stopped).
>>>
>>> I followed the preparation steps and ran the tool. However, I got the
>>> following error:
>>>
>>
>> The SequenceFile has the following structure (approximately): first, the
>> three-letter magic 'SEQ', and then either the fully qualified class names for
>> the key and value, or abbreviated class names obtained from the mapping in
>> WritableName. Please check what this class name is in the part-0.out file;
>> apparently Hadoop can't find the right mapping. (You can send me the first
>> bytes of this file off the list: use 'dd if=part-0.out of=data.out bs=512
>> count=1'.)
>>
>>> By looking at the Hadoop sources I noticed that the FetcherOutput class
>>> mentioned in this error message is determined by the SequenceFile class and
>>> obtained from the sequence file itself.
>>>
>>
>> The class itself is not obtained from the file - it's loaded from the
>> classpath. The thing that is missing here is the right name of the class as
>> determined by the symbolic name inside the SequenceFile.
>>
>>
>
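For anyone who wants to check the class names in the header themselves, as
Andrzej suggests in the quoted text, here is a minimal Python sketch. It is
not part of Nutch or Hadoop. It assumes the header is the 'SEQ' magic, one
version byte, and then the key and value class names, and it assumes (from my
reading of the Hadoop sources, so please verify against your Hadoop version)
that files with version 4 or later store each name with a variable-length
integer length prefix, while older files use a 2-byte length prefix. The
function names read_vint and read_header are made up for this example.

```python
import struct


def read_vint(stream):
    """Decode one Hadoop-style variable-length integer from a binary stream."""
    first = struct.unpack("b", stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored directly in a single byte
    negative = first < -120
    # Number of payload bytes that follow the marker byte.
    length = -(first + 120) if negative else -(first + 112)
    value = 0
    for _ in range(length):
        value = (value << 8) | stream.read(1)[0]
    return ~value if negative else value


def read_header(stream):
    """Return (version, key_class_name, value_class_name) from a SequenceFile.

    ASSUMPTION: the layout is 'SEQ', a version byte, then the two class
    names. The version threshold of 4 is taken from memory of the Hadoop
    sources and should be checked for your release.
    """
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile, magic is %r" % magic)
    version = stream.read(1)[0]
    names = []
    for _ in range(2):
        if version < 4:
            # Assumed older encoding: 2-byte big-endian length prefix.
            (size,) = struct.unpack(">H", stream.read(2))
        else:
            # Assumed newer encoding: variable-length integer prefix.
            size = read_vint(stream)
        names.append(stream.read(size).decode("utf-8"))
    return version, names[0], names[1]
```

Opening part-0.out (or the 512-byte data.out produced by the dd command) in
binary mode and passing the file object to read_header should print the names
Hadoop will try to resolve. If a name is not a class on your classpath and has
no mapping in WritableName, that would be consistent with the error discussed
in the quoted text.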