Posted to common-user@hadoop.apache.org by Cam Macdonell <ca...@cs.ualberta.ca> on 2009/04/16 01:02:25 UTC

Error reading task output

Hi,

I'm getting the following warning when running the simple wordcount and 
grep examples.

09/04/15 16:54:16 INFO mapred.JobClient: Task Id : 
attempt_200904151649_0001_m_000019_0, Status : FAILED
Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stderr

The only advice I could find from other posts with similar errors is to 
set up /etc/hosts with all slaves and the host IPs.  I did this, but I 
still get the warning above.  The output seems to come out alright, 
however (I guess that's why it is a warning).

I tried running a wget on the http:// address in the warning message and 
I get the following back

2009-04-15 16:53:46 ERROR 400: Argument taskid is required.

So perhaps the wrong task ID is being passed to the HTTP request.  Any 
ideas on how to get rid of these warnings?
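
(One thing worth noting about the wget test: if the URL isn't quoted, 
the shell treats each "&" as a command separator, so everything after 
"plaintext=true" is dropped and the server never sees a taskid argument, 
which by itself would produce that 400.  Quoting the URL passes the full 
query string, e.g.:

  wget 'http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout'

so the 400 does not necessarily mean the task ID in the warning is wrong.)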

Thanks,
Cam

Re: Error reading task output

Posted by Steve Loughran <st...@apache.org>.
Cam Macdonell wrote:

> 
> Well, for future googlers, I'll answer my own post.  Watch out for the 
> hostname at the end of "localhost" lines on slaves.  One of my slaves 
> was registering itself as "localhost.localdomain" with the jobtracker.
> 
> Is there a way that Hadoop could be made to not be so dependent on 
> /etc/hosts, but on more dynamic hostname resolution?
> 

DNS is trouble in Java; there are some (outstanding) bug reports and 
Hadoop patches on the topic, mostly showing up on a machine of mine with 
a bad hosts entry. I also encountered some fun last month with Ubuntu 
Linux adding the local hostname to /etc/hosts alongside the 127.0.0.1 
entry, which is precisely what you don't want for a cluster of VMs with 
no DNS at all. This sounds like your problem too, in which case I have 
shared your pain:

http://www.1060.org/blogxter/entry?publicid=121ED68BB21DB8C060FE88607222EB52
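
For reference, the installer-written entry looks roughly like this (the 
hostnames and addresses below are made-up examples; on some releases the 
hostname lands on a 127.0.1.1 line instead of 127.0.0.1):

  # what the installer writes - the node's own name resolves to loopback,
  # so other machines cannot reach it by that name
  127.0.0.1   localhost localhost.localdomain node01

  # what a DNS-less cluster of VMs needs instead - loopback stays
  # loopback, and every node maps every hostname to a real address
  127.0.0.1      localhost localhost.localdomain
  192.168.56.11  node01
  192.168.56.12  node02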


Re: Error reading task output

Posted by Steve Loughran <st...@apache.org>.
Aaron Kimball wrote:
> Cam,
> 
> This isn't Hadoop-specific, it's how Linux treats its network configuration.
> If you look at /etc/host.conf, you'll probably see a line that says "order
> hosts, bind" -- this is telling Linux's DNS resolution library to first read
> your /etc/hosts file, then check an external DNS server.
> 
> You could probably disable local hostfile checking, but that means that
> every time a program on your system queries the authoritative hostname for
> "localhost", it'll go out to the network. You'll probably see a big
> performance hit. The better solution, I think, is to get your nodes'
> /etc/hosts files squared away.

I agree

> You only need to do so once :)

No, you need to detect whenever the Linux networking stack has decided 
to add new entries to resolv.conf or /etc/hosts, and work out when they 
are inappropriate. That is tricky to do, as there are cases where you 
may actually be grateful that someone in the Debian codebase decided 
that mapping the local hostname to 127.0.0.1 is a feature. I ended up 
writing a new SmartFrog component that can be configured to fail to 
start if the network is a mess, which is something worth pushing out.

As part of Hadoop diagnostics, this test would be one of the things to 
run and at least warn on: "your hostname is local, you will not be 
visible over the network".
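
A rough sketch of such a check in plain Java (this is only an 
illustration, not actual Hadoop or SmartFrog code):

  import java.net.InetAddress;

  public class HostnameSanityCheck {
      public static void main(String[] args) throws Exception {
          // Resolve this machine's own hostname, the same way a daemon
          // picking its advertised address would.
          InetAddress self = InetAddress.getLocalHost();
          if (self.isLoopbackAddress()) {
              // The hostname maps to 127.0.0.1/::1, e.g. via a bad
              // /etc/hosts line, so remote nodes cannot reach it by name.
              System.err.println("WARNING: hostname " + self.getHostName()
                      + " resolves to " + self.getHostAddress()
                      + " - this node will not be visible over the network");
          } else {
              System.out.println(self.getHostName() + " resolves to "
                      + self.getHostAddress());
          }
      }
  }

A stricter variant could refuse to start instead of just warning, which 
is roughly what the SmartFrog component mentioned above does.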

-steve


Re: Error reading task output

Posted by Aaron Kimball <aa...@cloudera.com>.
Cam,

This isn't Hadoop-specific, it's how Linux treats its network configuration.
If you look at /etc/host.conf, you'll probably see a line that says "order
hosts, bind" -- this is telling Linux's DNS resolution library to first read
your /etc/hosts file, then check an external DNS server.

You could probably disable local hostfile checking, but that means that
every time a program on your system queries the authoritative hostname for
"localhost", it'll go out to the network. You'll probably see a big
performance hit. The better solution, I think, is to get your nodes'
/etc/hosts files squared away. You only need to do so once :)
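
For reference, the files involved typically look something like this 
(exact contents vary by distribution, and on most modern glibc systems 
/etc/nsswitch.conf plays the same role):

  # /etc/host.conf - consult /etc/hosts ("hosts") before DNS ("bind")
  order hosts, bind
  multi on

  # /etc/nsswitch.conf equivalent
  hosts: files dns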


-- Aaron


On Thu, Apr 16, 2009 at 11:31 AM, Cam Macdonell <ca...@cs.ualberta.ca> wrote:

> Cam Macdonell wrote:
>
>>
>> Hi,
>>
>> I'm getting the following warning when running the simple wordcount and
>> grep examples.
>>
>> 09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
>> attempt_200904151649_0001_m_000019_0, Status : FAILED
>> Too many fetch-failures
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
>> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout
>>
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
>> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stderr
>>
>>
>> The only advice I could find from other posts with similar errors is to
>> setup /etc/hosts with all slaves and the host IPs.  I did this, but I still
>> get the warning above.  The output seems to come out alright however (I
>> guess that's why it is a warning).
>>
>> I tried running a wget on the http:// address in the warning message and
>> I get the following back
>>
>> 2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
>>
>> So perhaps the wrong task ID is being passed to the http request.  Any
>> ideas on what can get rid of these warnings?
>>
>> Thanks,
>> Cam
>>
>
> Well, for future googlers, I'll answer my own post.  Watch out for the
> hostname at the end of "localhost" lines on slaves.  One of my slaves was
> registering itself as "localhost.localdomain" with the jobtracker.
>
> Is there a way that Hadoop could be made to not be so dependent on
> /etc/hosts, but on more dynamic hostname resolution?
>
> Cam
>

Re: Error reading task output

Posted by Cam Macdonell <ca...@cs.ualberta.ca>.
Cam Macdonell wrote:
> 
> Hi,
> 
> I'm getting the following warning when running the simple wordcount and 
> grep examples.
> 
> 09/04/15 16:54:16 INFO mapred.JobClient: Task Id : 
> attempt_200904151649_0001_m_000019_0, Status : FAILED
> Too many fetch-failures
> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout 
> 
> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stderr 
> 
> 
> The only advice I could find from other posts with similar errors is to 
> setup /etc/hosts with all slaves and the host IPs.  I did this, but I 
> still get the warning above.  The output seems to come out alright 
> however (I guess that's why it is a warning).
> 
> I tried running a wget on the http:// address in the warning message and 
> I get the following back
> 
> 2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
> 
> So perhaps the wrong task ID is being passed to the http request.  Any 
> ideas on what can get rid of these warnings?
> 
> Thanks,
> Cam

Well, for future googlers, I'll answer my own post.  Watch out for the 
hostname at the end of "localhost" lines on slaves.  One of my slaves 
was registering itself as "localhost.localdomain" with the jobtracker.
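
For anyone checking their own slaves, something along these lines shows 
the problem (the hostname "node03" and the addresses are just examples):

  $ hostname -f
  localhost.localdomain        <-- the name the slave registers with

  $ grep "$(hostname)" /etc/hosts
  127.0.0.1   localhost.localdomain localhost node03

  # fix: take the real hostname off the loopback line and give it its
  # own entry, e.g.
  127.0.0.1      localhost.localdomain localhost
  192.168.56.13  node03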

Is there a way that Hadoop could be made to not be so dependent on 
/etc/hosts, but on more dynamic hostname resolution?

Cam