Posted to common-user@hadoop.apache.org by Chris Nauroth <cn...@hortonworks.com> on 2013/04/15 19:32:37 UTC

Re: How to run hadoop jar command in a clustered environment

Hello Thoihen,

I'm moving this discussion from common-dev (questions about developing
Hadoop) to user (questions about using Hadoop).

If you haven't already seen it, then I recommend reading the cluster setup
documentation.  It's a bit different depending on the version of the Hadoop
code that you're deploying and running.  You mentioned JobTracker, so I
expect that you're using something from the 1.x line, but here are links to
both 1.x and 2.x docs just in case:

1.x: http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
2.x/trunk:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

To address your specific questions:

1. You can run the hadoop jar command and submit MapReduce jobs from any
machine that has the Hadoop software and configuration deployed and has
network connectivity to the machines that make up the Hadoop cluster.
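
For example, here is a minimal sketch of submitting a job from such a
machine (the jar name, main class, and HDFS paths below are only
placeholders, not anything specific to your setup):

    # Stage the input in HDFS, submit the job, then read the output.
    hadoop fs -put /local/path/input.txt /user/thoihen/input
    hadoop jar wordcount.jar WordCount /user/thoihen/input /user/thoihen/output
    hadoop fs -cat /user/thoihen/output/part-*

The command behaves the same whether you run it on the JobTracker machine
or on any other properly configured machine; the client reads the cluster
addresses from the deployed configuration files and submits the job over
the network.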

2. Yes, you can use a separate machine that is not a member of the cluster
(meaning it does not run Hadoop daemons like DataNode, TaskTracker, or
NodeManager).  This is your choice.  I've found it valuable to isolate
nodes like this to prevent MR job tasks from taking processing resources
away from interactive user commands, but this does mean that the resources
on that node can't be utilized by MR jobs during user idle times, so it
causes a small hit to overall utilization.
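
As a rough sketch (the install path, release, and host names below are just
placeholders for whatever your cluster actually uses), setting up such a
client-only node might look like this:

    # Unpack the same Hadoop release the cluster runs, then copy the
    # cluster's configuration so the client knows where the NameNode and
    # JobTracker live (on 1.x the relevant properties are fs.default.name
    # in core-site.xml and mapred.job.tracker in mapred-site.xml).
    tar xzf hadoop-1.1.2.tar.gz -C /opt
    scp namenode-host:/opt/hadoop-1.1.2/conf/core-site.xml  /opt/hadoop-1.1.2/conf/
    scp namenode-host:/opt/hadoop-1.1.2/conf/mapred-site.xml /opt/hadoop-1.1.2/conf/
    # Do not start any daemons on this node; it only submits jobs:
    /opt/hadoop-1.1.2/bin/hadoop jar wordcount.jar WordCount /user/thoihen/input /user/thoihen/output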

Hope this helps,
--Chris


On Mon, Apr 15, 2013 at 9:36 AM, Thoihen Maibam <th...@gmail.com> wrote:

> Hi All,
>
> I am really new to Hadoop and have installed Hadoop on my local Ubuntu
> machine. I also created a wordcount.jar and started Hadoop with
> start-all.sh, which started all the Hadoop daemons, and used jps to
> confirm it. I then cd'd to hadoop/bin, ran hadoop jar x.jar, and
> successfully ran the MapReduce program.
>
> Now, can someone please explain how I should run the hadoop jar command in
> a clustered environment, say a cluster with 50 nodes? I know one dedicated
> machine would be the namenode, another the jobtracker, and the rest
> datanodes and tasktrackers.
>
> 1. From which machine should I run the hadoop jar command, given that I
> have a MapReduce jar in hand? Should I run it from the jobtracker machine,
> or can I run it from any machine in the cluster?
>
> 2. Can I run the MapReduce job from another machine which is not part of
> the cluster? If yes, how should I do it?
>
> Please help me.
>
> Regards
> thoihen
>

Re: How to run hadoop jar command in a clustered environment

Posted by maisnam ns <ma...@gmail.com>.
@Chris thanks a lot that helped a lot.

