Posted to common-user@hadoop.apache.org by S D <sd...@gmail.com> on 2009/02/17 23:14:20 UTC

How do you remove a machine from the cluster? Slaves file not working...

I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique, batman).
It seemed as if problems were occurring on mystique (I was noticing
errors with tasks that executed on mystique), so I decided to remove it.
I did so by calling stop-mapred.sh (I'm using S3 Native, not HDFS) and
removing mystique from the $HADOOP_HOME/conf/slaves file on storm and
batman. I then called start-mapred.sh and verified (via the output of
start-mapred.sh) that tasktrackers were started only on batman and storm.
But when I started my MapReduce program and viewed the task tracker
machine list web interface, not only was mystique listed as one of the
task trackers but a task had been assigned to it. How can I keep a
machine from being included in a cluster?

Any help is appreciated.

Thanks,
John
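
For reference, the removal procedure described above, as shell on the
master (host names are the ones from this thread):

    # stop Map/Reduce daemons on every host listed in conf/slaves
    bin/stop-mapred.sh

    # edit $HADOOP_HOME/conf/slaves on storm and batman, removing "mystique"

    # start Map/Reduce daemons; note the slaves file is only consulted by
    # these start/stop scripts, not by the JobTracker itself
    bin/start-mapred.sh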

Could not obtain block blk ...

Posted by Arv Mistry <ar...@kindsight.net>.
I am using Hadoop 0.18.3 with a single datanode, and it appears to be up
and running fine. I'm able to read/write data to it.

However, when I try to spawn a map/reduce job it fails with "Could not
obtain block: blk_3263745172951227264_1155
file=/opt/kindsight/hadoop/data/mapred/system/job_200902171547_0001/job.xml"

I noticed the following exception in the logs. I have not specified a
mapred.fairscheduler.allocation.file, but I have other machines running
the same configuration and they seem to work. If I do have to specify
that file, how do I do it, and what's the format of the file?

Any help would be appreciated.
 
2009-02-17 15:47:03,304 WARN org.apache.hadoop.mapred.PoolManager: No mapred.fairscheduler.allocation.file given in jobconf - the fair scheduler will not use any queues.
2009-02-17 15:47:03,319 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler
2009-02-17 15:47:17,338 ERROR org.apache.hadoop.mapred.PoolManager: Failed to reload allocations file - will use existing allocations.
java.lang.NullPointerException
        at java.io.File.<init>(File.java:222)
        at org.apache.hadoop.mapred.PoolManager.reloadAllocsIfNecessary(PoolManager.java:116)
        at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:226)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1288)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2009-02-17 15:52:29,611 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_3263745172951227264_1155 from any node: java.io.IOException: No live nodes contain current block
2009-02-17 15:52:32,617 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_3263745172951227264_1155 from any node: java.io.IOException: No live nodes contain current block
2009-02-17 15:52:35,626 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_3263745172951227264_1155 from any node: java.io.IOException: No live nodes contain current block
2009-02-17 15:52:38,631 WARN org.apache.hadoop.dfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_3263745172951227264_1155 file=/opt/kindsight/hadoop/data/mapred/system/job_200902171547_0001/job.xml
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1470)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1320)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1425)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)

Cheers Arv
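
For reference, the allocation file the fair scheduler is looking for is
a small XML document, and the property naming it goes in hadoop-site.xml.
A minimal sketch based on the fair scheduler's documented format (the
path, pool name, and limits below are made-up example values, not
defaults):

    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/opt/hadoop/conf/allocations.xml</value>
    </property>

and the allocations.xml it points at:

    <?xml version="1.0"?>
    <allocations>
      <!-- guaranteed minimum share for one pool -->
      <pool name="production">
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
      </pool>
      <!-- default cap on concurrent jobs for users without their own entry -->
      <userMaxJobsDefault>3</userMaxJobsDefault>
    </allocations>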

Re: How do you remove a machine from the cluster? Slaves file not working...

Posted by S D <sd...@gmail.com>.
Thanks for this. I've set that property in my hadoop-site.xml file.
Personally, that property seems a bit redundant given the slaves file; I
think a better design would use either the value found in the file
specified by mapred.hosts or the contents of the slaves file, but not
both. In my case I set the value of mapred.hosts to point to the slaves
file.
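
A minimal sketch of that setting in hadoop-site.xml (the path is just an
example; point it at wherever your slaves file actually lives):

    <property>
      <name>mapred.hosts</name>
      <value>/usr/local/hadoop/conf/slaves</value>
    </property>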

But back to the first solution given in this thread: it seems that
running 'hadoop dfsadmin -refreshNodes' did the trick. After running that
command and then restarting the Map/Reduce framework (stop-mapred.sh
followed by start-mapred.sh), the nodes were updated successfully. Prior
attempts to stop and start the Map/Reduce framework without it didn't
seem to work.
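
In shell terms, the sequence that worked (run from $HADOOP_HOME on the
master):

    # re-read the include/exclude host files
    bin/hadoop dfsadmin -refreshNodes

    # then bounce the Map/Reduce daemons so the tasktracker list is rebuilt
    bin/stop-mapred.sh
    bin/start-mapred.sh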

Thanks,
John

On Tue, Feb 17, 2009 at 7:41 PM, Tom White <to...@cloudera.com> wrote:

> The decommission process is for data nodes - which you are not
> running. Have a look at the mapred.hosts.exclude property for how to
> exclude tasktrackers.
>
> Tom
>
> On Tue, Feb 17, 2009 at 5:31 PM, S D <sd...@gmail.com> wrote:
> > Thanks for your response. For clarification, I'm using S3 Native
> > instead of HDFS. Hence, I'm not even calling start-dfs.sh since I'm
> > not using a distributed filesystem. Given such a situation, is
> > decommissioning nodes applicable? When I ran 'hadoop dfsadmin
> > -refreshNodes' I received the following response:
> >
> > FileSystem is s3n://<bucketname>
> >
> > Thanks,
> > John
> >
> > On Tue, Feb 17, 2009 at 4:20 PM, Amandeep Khurana <am...@gmail.com> wrote:
> >
> >> You have to decommission the node. Look at
> >> http://wiki.apache.org/hadoop/FAQ#17
> >>
> >> Amandeep
> >>
> >>
> >> Amandeep Khurana
> >> Computer Science Graduate Student
> >> University of California, Santa Cruz
> >>
> >>
> >> On Tue, Feb 17, 2009 at 2:14 PM, S D <sd...@gmail.com> wrote:
> >>
> >> > I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique,
> >> > batman). It seemed as if problems were occurring on mystique (I was
> >> > noticing errors with tasks that executed on mystique). So I decided
> >> > to remove mystique. I did so by calling stop-mapred.sh (I'm using
> >> > S3 Native, not HDFS), removing mystique from the
> >> > $HADOOP_HOME/conf/slaves file on storm and batman. I then called
> >> > start-mapred.sh and verified (via the output of start-mapred.sh)
> >> > that tasktrackers were started only on batman and storm. When I
> >> > started my MapReduce program I viewed the task tracker machine list
> >> > web interface and saw that not only was mystique listed as one of
> >> > the task trackers but that a task had been assigned to it. How can
> >> > I keep a machine from being included in a cluster?
> >> >
> >> > Any help is appreciated.
> >> >
> >> > Thanks,
> >> > John
> >> >
> >>
> >
>

Re: How do you remove a machine from the cluster? Slaves file not working...

Posted by Tom White <to...@cloudera.com>.
The decommission process is for data nodes - which you are not
running. Have a look at the mapred.hosts.exclude property for how to
exclude tasktrackers.

Tom
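
A minimal sketch of what that looks like in hadoop-site.xml (the exclude
file path is just an example):

    <property>
      <name>mapred.hosts.exclude</name>
      <value>/usr/local/hadoop/conf/mapred.exclude</value>
    </property>

with the hosts to exclude listed in that file, one per line:

    mystique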

On Tue, Feb 17, 2009 at 5:31 PM, S D <sd...@gmail.com> wrote:
> Thanks for your response. For clarification, I'm using S3 Native instead of
> HDFS. Hence, I'm not even calling start-dfs.sh since I'm not using a
> distributed filesystem. Given such a situation, is decommissioning nodes
> applicable? When I ran 'hadoop dfsadmin -refreshNodes' I received the
> following response:
>
> FileSystem is s3n://<bucketname>
>
> Thanks,
> John
>
> On Tue, Feb 17, 2009 at 4:20 PM, Amandeep Khurana <am...@gmail.com> wrote:
>
>> You have to decommission the node. Look at
>> http://wiki.apache.org/hadoop/FAQ#17
>>
>> Amandeep
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>>
>> On Tue, Feb 17, 2009 at 2:14 PM, S D <sd...@gmail.com> wrote:
>>
>> > I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique,
>> > batman). It seemed as if problems were occurring on mystique (I was
>> > noticing errors with tasks that executed on mystique). So I decided
>> > to remove mystique. I did so by calling stop-mapred.sh (I'm using S3
>> > Native, not HDFS), removing mystique from the $HADOOP_HOME/conf/slaves
>> > file on storm and batman. I then called start-mapred.sh and verified
>> > (via the output of start-mapred.sh) that tasktrackers were started
>> > only on batman and storm. When I started my MapReduce program I
>> > viewed the task tracker machine list web interface and saw that not
>> > only was mystique listed as one of the task trackers but that a task
>> > had been assigned to it. How can I keep a machine from being included
>> > in a cluster?
>> >
>> > Any help is appreciated.
>> >
>> > Thanks,
>> > John
>> >
>>
>

Re: How do you remove a machine from the cluster? Slaves file not working...

Posted by S D <sd...@gmail.com>.
Thanks for your response. For clarification, I'm using S3 Native instead of
HDFS. Hence, I'm not even calling start-dfs.sh since I'm not using a
distributed filesystem. Given such a situation, is decommissioning nodes
applicable? When I ran 'hadoop dfsadmin -refreshNodes' I received the
following response:

FileSystem is s3n://<bucketname>

Thanks,
John

On Tue, Feb 17, 2009 at 4:20 PM, Amandeep Khurana <am...@gmail.com> wrote:

> You have to decommission the node. Look at
> http://wiki.apache.org/hadoop/FAQ#17
>
> Amandeep
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Tue, Feb 17, 2009 at 2:14 PM, S D <sd...@gmail.com> wrote:
>
> > I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique,
> > batman). It seemed as if problems were occurring on mystique (I was
> > noticing errors with tasks that executed on mystique). So I decided to
> > remove mystique. I did so by calling stop-mapred.sh (I'm using S3
> > Native, not HDFS), removing mystique from the $HADOOP_HOME/conf/slaves
> > file on storm and batman. I then called start-mapred.sh and verified
> > (via the output of start-mapred.sh) that tasktrackers were started
> > only on batman and storm. When I started my MapReduce program I viewed
> > the task tracker machine list web interface and saw that not only was
> > mystique listed as one of the task trackers but that a task had been
> > assigned to it. How can I keep a machine from being included in a
> > cluster?
> >
> > Any help is appreciated.
> >
> > Thanks,
> > John
> >
>

Re: How do you remove a machine from the cluster? Slaves file not working...

Posted by Amandeep Khurana <am...@gmail.com>.
You have to decommission the node. Look at
http://wiki.apache.org/hadoop/FAQ#17

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
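
For reference, the decommission recipe that FAQ entry describes is aimed
at HDFS datanodes: name an exclude file in hadoop-site.xml, list the
host to retire in it, then refresh. A sketch (the exclude file path is
just an example):

    <property>
      <name>dfs.hosts.exclude</name>
      <value>/usr/local/hadoop/conf/dfs.exclude</value>
    </property>

    # after adding the host name to dfs.exclude:
    bin/hadoop dfsadmin -refreshNodes

As Tom White notes elsewhere in this thread, this applies to datanodes
rather than tasktrackers, so it won't help on a cluster running
Map/Reduce over S3 only.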


On Tue, Feb 17, 2009 at 2:14 PM, S D <sd...@gmail.com> wrote:

> I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique, batman).
> It seemed as if problems were occurring on mystique (I was noticing
> errors with tasks that executed on mystique). So I decided to remove
> mystique. I did so by calling stop-mapred.sh (I'm using S3 Native, not
> HDFS), removing mystique from the $HADOOP_HOME/conf/slaves file on
> storm and batman. I then called start-mapred.sh and verified (via the
> output of start-mapred.sh) that tasktrackers were started only on
> batman and storm. When I started my MapReduce program I viewed the task
> tracker machine list web interface and saw that not only was mystique
> listed as one of the task trackers but that a task had been assigned to
> it. How can I keep a machine from being included in a cluster?
>
> Any help is appreciated.
>
> Thanks,
> John
>