Posted to common-user@hadoop.apache.org by Cam Macdonell <ca...@cs.ualberta.ca> on 2009/04/16 01:02:25 UTC
Error reading task output
Hi,
I'm getting the following warning when running the simple wordcount and
grep examples.
09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
attempt_200904151649_0001_m_000019_0, Status : FAILED
Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stderr
The only advice I could find from other posts with similar errors is to
set up /etc/hosts with all the slaves' hostnames and IPs. I did this, but I
still get the warning above. The output seems to come out all right,
however (I guess that's why it is a warning).
I tried running a wget on the http:// address in the warning message and
I get the following back
2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
So perhaps the wrong task ID is being passed in the HTTP request. Any
ideas on how to get rid of these warnings?
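One thing worth double-checking (a guess on my part, since the exact command isn't shown): if the URL wasn't quoted, the shell treats each '&' as a background operator, so everything from taskid= onward never reaches wget at all, which would produce exactly that "Argument taskid is required" response. A quick sketch:

```shell
# Hypothetical reconstruction: unquoted, the '&' characters in the URL
# background the command and drop the taskid parameter, giving
# "ERROR 400: Argument taskid is required". Single quotes keep the full
# query string together.
url='http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout'
# wget -O - "$url"   # needs a live tasktracker, so commented out here
echo "$url" | grep -c 'taskid='   # prints 1: the parameter survives quoting
```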
Thanks,
Cam
Re: Error reading task output
Posted by Steve Loughran <st...@apache.org>.
Cam Macdonell wrote:
>
> Well, for future googlers, I'll answer my own post. Watch out for the
> hostname at the end of "localhost" lines on slaves. One of my slaves
> was registering itself as "localhost.localdomain" with the jobtracker.
>
> Is there a way that Hadoop could be made less dependent on /etc/hosts
> and able to use more dynamic hostname resolution?
>
DNS is trouble in Java; there are some (outstanding) bug reports/Hadoop
patches on the topic, mostly showing up on a machine of mine with a bad
hosts entry. I also encountered some fun last month with Ubuntu Linux
adding the local hostname to /etc/hosts alongside the 127.0.0.1 entry,
which is precisely what you don't want for a cluster of VMs with no DNS
at all. This sounds like your problem too, in which case I have shared your pain:
http://www.1060.org/blogxter/entry?publicid=121ED68BB21DB8C060FE88607222EB52
Re: Error reading task output
Posted by Steve Loughran <st...@apache.org>.
Aaron Kimball wrote:
> Cam,
>
> This isn't Hadoop-specific, it's how Linux treats its network configuration.
> If you look at /etc/host.conf, you'll probably see a line that says "order
> hosts, bind" -- this is telling Linux's DNS resolution library to first read
> your /etc/hosts file, then check an external DNS server.
>
> You could probably disable local hostfile checking, but that means that
> every time a program on your system queries the authoritative hostname for
> "localhost", it'll go out to the network. You'll probably see a big
> performance hit. The better solution, I think, is to get your nodes'
> /etc/hosts files squared away.
I agree
> You only need to do so once :)
No, you need to detect whenever the Linux networking stack has decided
to add new entries to resolv.conf or /etc/hosts and work out when they
are inappropriate, which is tricky, as there are some cases where you
may actually be grateful that someone in the Debian codebase decided
that adding the local hostname as 127.0.0.1 is a feature. I ended up
writing a new SmartFrog component that can be configured to fail to
start if the network is a mess, which is something worth pushing out.
As part of Hadoop diagnostics, this test would be one of the things to
deal with and at least warn on: "your hostname is local, you will not be
visible over the network".
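A rough sketch of that check (the SmartFrog component itself is not shown; hostnames and addresses below are made up). It warns when the local hostname sits on a loopback line of the hosts file:

```shell
# Sample hosts file so the check is self-contained; for real use, point
# the function at /etc/hosts and pass "$(hostname)".
cat > sample_hosts <<'EOF'
127.0.0.1 localhost.localdomain localhost slave1
10.0.0.5  master
EOF

check_host() {
  # $1 = hosts file, $2 = hostname to check
  # Look only at loopback (127.x) lines; -w matches the whole hostname.
  if grep -E '^127\.' "$1" | grep -qw "$2"; then
    echo "WARNING: $2 is mapped to loopback; not visible over the network"
  else
    echo "OK: $2 not on a loopback line"
  fi
}

check_host sample_hosts slave1   # prints the warning
check_host sample_hosts master   # prints OK
```

In the "fail to start" variant Steve describes, the warning branch would exit non-zero instead of just printing.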
-steve
Re: Error reading task output
Posted by Aaron Kimball <aa...@cloudera.com>.
Cam,
This isn't Hadoop-specific, it's how Linux treats its network configuration.
If you look at /etc/host.conf, you'll probably see a line that says "order
hosts, bind" -- this is telling Linux's DNS resolution library to first read
your /etc/hosts file, then check an external DNS server.
You could probably disable local hostfile checking, but that means that
every time a program on your system queries the authoritative hostname for
"localhost", it'll go out to the network. You'll probably see a big
performance hit. The better solution, I think, is to get your nodes'
/etc/hosts files squared away. You only need to do so once :)
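For reference, the resolver-order line lives in /etc/host.conf on a typical Linux node (illustrative; exact contents vary by distribution):

```
# /etc/host.conf -- glibc resolver order: consult /etc/hosts before DNS,
# which is why a stale hosts entry wins over a correct DNS record
order hosts, bind
```

With that order in effect, fixing the nodes' hosts files really is the right lever.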
-- Aaron
On Thu, Apr 16, 2009 at 11:31 AM, Cam Macdonell <ca...@cs.ualberta.ca> wrote:
> [earlier messages quoted in full; snipped]
Re: Error reading task output
Posted by Cam Macdonell <ca...@cs.ualberta.ca>.
Cam Macdonell wrote:
> [original message quoted in full; snipped]
Well, for future googlers, I'll answer my own post. Watch out for the
hostname at the end of "localhost" lines on slaves. One of my slaves
was registering itself as "localhost.localdomain" with the jobtracker.
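To make that concrete (hostnames and addresses here are invented for illustration), the offending slave entry and a fixed version look something like:

```
# Problem: the slave's own name tacked onto the loopback line, so it
# registers with the jobtracker as localhost.localdomain
127.0.0.1    localhost.localdomain localhost slave1

# Fix: keep loopback local-only and map the real name to a routable address
127.0.0.1    localhost.localdomain localhost
192.168.1.11 slave1
```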
Is there a way that Hadoop could be made less dependent on /etc/hosts
and able to use more dynamic hostname resolution?
Cam