Posted to common-user@hadoop.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2012/06/04 23:06:08 UTC

mini node in a cluster

I have a machine that is part of the cluster, but I'd like to dedicate it to 
running the web server and the db while still having access to starting jobs 
and getting data out of HDFS. In other words, I'd like the node's cores, 
memory, and disk to be only minimally affected by jobs running on the 
cluster, yet still have easy access when I need to get data out.

I assume I can do something like setting the max number of tasks for the 
node to 0, and something similar for HDFS? Is there a recommended way to go 
about this?
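
Something like the following is what I had in mind (an untested sketch; the 
property names are from the Hadoop 1.x docs):

    # mapred-site.xml on the node, zeroing out its task slots (untested sketch):
    #   mapred.tasktracker.map.tasks.maximum    = 0
    #   mapred.tasktracker.reduce.tasks.maximum = 0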

Re: mini node in a cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
OK, so remove the mini-node (client) from the master's slaves file since it's 
no longer a node. That keeps its daemons from being started when the master 
starts the cluster. There is no init.d script on the client, only on the 
master, since the slaves were always started by the master through ssh and 
start-all.sh. The config on the client still points to the master, so hadoop 
fs commands will still get data from the cluster. I suppose having init.d 
scripts on all the slaves (not the clients) as well as the master is a better 
way to handle power outages, since the machines will come up at slightly 
different times.
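
Concretely, something like this on the master (a sketch; the hostname and 
path are placeholders):

    # drop the mini-node from the slaves file so start-all.sh skips it
    sed -i '/^mini-node.example.com$/d' $HADOOP_HOME/conf/slaves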

I think I get it now. (correct me if I'm wrong)

Thanks,
Pat

On 6/4/12 4:06 PM, Tom Melendez wrote:
> Hi Pat,
>
>> Sounds like the trick. This node is a slave, so its datanode and tasktracker
>> are started from the master.
>>    - how do I start the cluster without starting the datanode and the
>> tasktracker on the mini-node slave? Remove it from slaves?
> There's no "main" cluster software; just don't start those services.
> If you're on Linux and have init.d scripts, look for the ones whose names
> end in datanode and tasktracker.
>
>>    - what do I minimally need to start on the mini-node?
>>
> Nothing except the hadoop jars.  The presence of the config files in
> your CLASSPATH is all you need to talk to your cluster.  So, if you
> can run hadoop dfs -ls /some/path/in/hdfs and it succeeds, you're
> probably OK.
>
>> Also I have replication set to 2 so the data will just get re-replicated
>> once the mini-node is reconfigured, right? There should be another copy
>> somewhere on the cluster.
>>
> Probably.
>
> It's not really a "mini-node"; it's just a client at this point, not known
> by your cluster.  You could configure your laptop or any other machine to
> do the same thing, for example.
>
> Thanks,
>
> Tom
>
>

Re: mini node in a cluster

Posted by Tom Melendez <to...@supertom.com>.
Hi Pat,

> Sounds like the trick. This node is a slave, so its datanode and tasktracker
> are started from the master.
>   - how do I start the cluster without starting the datanode and the
> tasktracker on the mini-node slave? Remove it from slaves?

There's no "main" cluster software; just don't start those services.
If you're on Linux and have init.d scripts, look for the ones whose names
end in datanode and tasktracker.
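
For example (a sketch; the script names vary with how Hadoop was installed, 
so these are placeholders):

    # stop the slave daemons on the mini-node (script names are placeholders)
    sudo /etc/init.d/hadoop-datanode stop
    sudo /etc/init.d/hadoop-tasktracker stop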

>   - what do I minimally need to start on the mini-node?
>

Nothing except the hadoop jars.  The presence of the config files in
your CLASSPATH is all you need to talk to your cluster.  So, if you
can run hadoop dfs -ls /some/path/in/hdfs and it succeeds, you're
probably OK.
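
A quick way to set that up and check it (a sketch; the conf path is a 
placeholder):

    # point the client at the cluster's config and test HDFS access
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # wherever core-site.xml etc. live
    hadoop dfs -ls /                          # succeeds if the namenode is reachable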

> Also I have replication set to 2 so the data will just get re-replicated
> once the mini-node is reconfigured, right? There should be another copy
> somewhere on the cluster.
>

Probably.
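
You can watch it happen from the namenode side (fsck reports the count of 
under-replicated blocks):

    # the under-replicated count should fall back to zero once the
    # cluster has re-created the missing copies
    hadoop fsck / | grep -i 'under-replicated'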

It's not really a "mini-node"; it's just a client at this point, not known
by your cluster.  You could configure your laptop or any other machine to
do the same thing, for example.

Thanks,

Tom

Re: mini node in a cluster

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Hi Tom,

Sounds like the trick. This node is a slave, so its datanode and 
tasktracker are started from the master.
    - how do I start the cluster without starting the datanode and the 
tasktracker on the mini-node slave? Remove it from slaves?
    - what do I minimally need to start on the mini-node?

Also I have replication set to 2 so the data will just get re-replicated 
once the mini-node is reconfigured, right? There should be another copy 
somewhere on the cluster.

Thanks
Pat

On 6/4/12 2:38 PM, Tom Melendez wrote:
> Hi Pat,
>
> Sounds like you would just turn off the datanode and the tasktracker.
> Your config will still point to the Namenode and JT, so you can still
> launch jobs and read/write from HDFS.
>
> You'll probably want to replicate the data off first, of course.
>
> Thanks,
>
> Tom
>
> On Mon, Jun 4, 2012 at 2:06 PM, Pat Ferrel<pa...@occamsmachete.com>  wrote:
>> I have a machine that is part of the cluster, but I'd like to dedicate it to
>> running the web server and the db while still having access to starting jobs
>> and getting data out of HDFS. In other words, I'd like the node's cores,
>> memory, and disk to be only minimally affected by jobs running on the
>> cluster, yet still have easy access when I need to get data out.
>>
>> I assume I can do something like setting the max number of tasks for the
>> node to 0, and something similar for HDFS? Is there a recommended way to go
>> about this?
>

Re: mini node in a cluster

Posted by Tom Melendez <to...@supertom.com>.
Hi Pat,

Sounds like you would just turn off the datanode and the tasktracker.
Your config will still point to the Namenode and JT, so you can still
launch jobs and read/write from HDFS.

You'll probably want to replicate the data off first, of course.
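
If you want the namenode to move the blocks off gracefully, the usual route 
is decommissioning (a sketch; it assumes dfs.hosts.exclude is configured in 
the namenode's hdfs-site.xml, and the hostname/path are placeholders):

    # add the node to the excludes file and tell the namenode to re-read it
    echo 'mini-node.example.com' >> /etc/hadoop/conf/excludes
    hadoop dfsadmin -refreshNodes
    hadoop dfsadmin -report    # wait for the node to show "Decommissioned"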

Thanks,

Tom

On Mon, Jun 4, 2012 at 2:06 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> I have a machine that is part of the cluster, but I'd like to dedicate it to
> running the web server and the db while still having access to starting jobs
> and getting data out of HDFS. In other words, I'd like the node's cores,
> memory, and disk to be only minimally affected by jobs running on the
> cluster, yet still have easy access when I need to get data out.
>
> I assume I can do something like setting the max number of tasks for the
> node to 0, and something similar for HDFS? Is there a recommended way to go
> about this?