Posted to mapreduce-user@hadoop.apache.org by "ados1984@gmail.com" <ad...@gmail.com> on 2014/03/13 22:00:22 UTC

Reg: Setting up Hadoop Cluster

Hello Team,

I have a couple of questions about putting data into HDFS and running
MapReduce on data that is already in HDFS.

   1. HDFS is a file system, so what kinds of clients are available for
   interacting with it? Also, where do those clients need to be installed?
   2. Regarding Pig, Hive, and MapReduce: where do we install them on the
   Hadoop cluster, where do we run the scripts from, and how does the
   framework internally know whether work should run on node 1, node 2, or
   node 3?

Any inputs here would be really helpful.

Thanks, Andy.

Re: Reg: Setting up Hadoop Cluster

Posted by Geoffry Roberts <th...@gmail.com>.
Did you not populate the "slaves" file when you did your installation? In
older versions of Hadoop (< 2.0) there was also a "masters" file where you
entered your name node. Nowadays there can be multiple name nodes; I haven't
worked with that setup yet.
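
For reference (a generic sketch, not taken from any particular cluster), on
a Hadoop 1.x-style install conf/slaves is just a plain list of worker
hostnames, and the namenode address that every client uses comes from
core-site.xml; the hostnames and port below are made up:

    # conf/slaves -- one worker host per line; the start-*.sh scripts
    # ssh to each of these and start the DataNode / TaskTracker daemons
    datanode01
    datanode02
    datanode03

    <!-- conf/core-site.xml: tells every client where the namenode is -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode01:9000</value>
    </property>

A quick way to see what a given box is actually running is the jps command,
which lists the Hadoop daemons (NameNode, DataNode, JobTracker, TaskTracker,
and so on) as Java processes.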

I installed Pig, for example, on my name node and ran it from there.
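
For example (the script name is made up), from whichever node Pig is
installed on, you can run a script against the cluster with something like:

    pig -x mapreduce wordcount.pig

Pig picks up the cluster location from the Hadoop configuration on that
machine, so the script never names node 1, node 2, or node 3; the generated
MapReduce jobs are scheduled across whatever workers the cluster knows
about.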


On Thu, Mar 13, 2014 at 5:22 PM, ados1984@gmail.com <ad...@gmail.com> wrote:

> Thank you Geoffry,
>
> I have some fundamental questions here.
>
>    1. Once I have installed Hadoop, how can I identify which nodes are
>    masters and which are slaves?
>    2. My understanding is that the master node is by default the namenode
>    and the slave nodes are datanodes, correct?
>    3. If I have installed Hadoop but do not know which node is the
>    namenode and which are the datanodes, how can I go in and run my jar
>    from the namenode?
>    4. Also, when we do MapReduce programming, where do we write the
>    program: on the Hadoop servers (where the master/namenode and
>    slave/datanode daemons run), or on our local system in any standard
>    IDE, packaging everything as a jar and deploying it to the namenode?
>    But here again, how can I identify which is the namenode and which is
>    a datanode?
>    5. OK, assuming I have figured out which node is a datanode and which
>    is the namenode, how will my MapReduce program or my Pig or Hive
>    scripts know that they need to run on node 1, node 2, or node 3?
>    6. Also, where do we install Pig, Hive, and Flume: on the Hadoop
>    master/slave nodes or somewhere else? And how do we let Pig/Hive know
>    that node 1 is the master/namenode and the other nodes are the
>    slaves/datanodes?
>
> I would really appreciate inputs on these questions, as setting up Hadoop
> is turning out to be quite a complex task from where I currently stand.
>
> Regards, Andy.
>
>
> On Thu, Mar 13, 2014 at 5:14 PM, Geoffry Roberts <th...@gmail.com> wrote:
>
>> Andy,
>>
>> Once you have Hadoop running, you can run your jobs from the CLI of the
>> name node. When I write a MapReduce job, I jar it up, place it in, say,
>> my home directory, and run it from there. I do the same with Pig
>> scripts. I've used neither Hive nor Cascading, but I imagine they would
>> work the same way.
>>
>> Another approach I've tried is WebHDFS. It is a RESTful interface for
>> manipulating HDFS. It worked well enough for me. I stopped using it when
>> I discovered it didn't support MapFiles, but that's another story.
>>
>>
>> On Thu, Mar 13, 2014 at 5:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>>
>>> Hello Team,
>>>
>>> I have a couple of questions about putting data into HDFS and running
>>> MapReduce on data that is already in HDFS.
>>>
>>>    1. HDFS is a file system, so what kinds of clients are available for
>>>    interacting with it? Also, where do those clients need to be
>>>    installed?
>>>    2. Regarding Pig, Hive, and MapReduce: where do we install them on
>>>    the Hadoop cluster, where do we run the scripts from, and how does
>>>    the framework internally know whether work should run on node 1,
>>>    node 2, or node 3?
>>>
>>> Any inputs here would be really helpful.
>>>
>>> Thanks, Andy.
>>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Reg: Setting up Hadoop Cluster

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Thank you Geoffry,

I have some fundamental questions here.

   1. Once I have installed Hadoop, how can I identify which nodes are
   masters and which are slaves?
   2. My understanding is that the master node is by default the namenode
   and the slave nodes are datanodes, correct?
   3. If I have installed Hadoop but do not know which node is the namenode
   and which are the datanodes, how can I go in and run my jar from the
   namenode?
   4. Also, when we do MapReduce programming, where do we write the
   program: on the Hadoop servers (where the master/namenode and
   slave/datanode daemons run), or on our local system in any standard IDE,
   packaging everything as a jar and deploying it to the namenode? But here
   again, how can I identify which is the namenode and which is a datanode?
   5. OK, assuming I have figured out which node is a datanode and which is
   the namenode, how will my MapReduce program or my Pig or Hive scripts
   know that they need to run on node 1, node 2, or node 3?
   6. Also, where do we install Pig, Hive, and Flume: on the Hadoop
   master/slave nodes or somewhere else? And how do we let Pig/Hive know
   that node 1 is the master/namenode and the other nodes are the
   slaves/datanodes?

I would really appreciate inputs on these questions, as setting up Hadoop is
turning out to be quite a complex task from where I currently stand.

Regards, Andy.


On Thu, Mar 13, 2014 at 5:14 PM, Geoffry Roberts <th...@gmail.com> wrote:

> Andy,
>
> Once you have Hadoop running, you can run your jobs from the CLI of the
> name node. When I write a MapReduce job, I jar it up, place it in, say, my
> home directory, and run it from there. I do the same with Pig scripts.
> I've used neither Hive nor Cascading, but I imagine they would work the
> same way.
>
> Another approach I've tried is WebHDFS. It is a RESTful interface for
> manipulating HDFS. It worked well enough for me. I stopped using it when I
> discovered it didn't support MapFiles, but that's another story.
>
>
> On Thu, Mar 13, 2014 at 5:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>
>> Hello Team,
>>
>> I have a couple of questions about putting data into HDFS and running
>> MapReduce on data that is already in HDFS.
>>
>>    1. HDFS is a file system, so what kinds of clients are available for
>>    interacting with it? Also, where do those clients need to be
>>    installed?
>>    2. Regarding Pig, Hive, and MapReduce: where do we install them on
>>    the Hadoop cluster, where do we run the scripts from, and how does
>>    the framework internally know whether work should run on node 1,
>>    node 2, or node 3?
>>
>> Any inputs here would be really helpful.
>>
>> Thanks, Andy.
>>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Reg: Setting up Hadoop Cluster

Posted by Geoffry Roberts <th...@gmail.com>.
Andy,

Once you have Hadoop running, you can run your jobs from the CLI of the
name node. When I write a MapReduce job, I jar it up, place it in, say, my
home directory, and run it from there. I do the same with Pig scripts. I've
used neither Hive nor Cascading, but I imagine they would work the same
way.
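
Concretely (the jar name, main class, and paths here are just placeholders),
a typical session from the name node's shell looks something like:

    # copy local input into HDFS
    hadoop fs -put ./input.txt /user/andy/input/

    # run the MapReduce job from a jar sitting in my home directory
    hadoop jar ~/myjob.jar com.example.MyJob /user/andy/input /user/andy/output

    # run a Pig script the same way
    pig myscript.pig

The commands read the cluster configuration (core-site.xml, mapred-site.xml)
on that machine, so you never have to tell them which physical node will run
which task.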

Another approach I've tried is WebHDFS. It is a RESTful interface for
manipulating HDFS. It worked well enough for me. I stopped using it when I
discovered it didn't support MapFiles, but that's another story.
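
As a rough sketch of the WebHDFS style (hostnames, user, and paths below
are placeholders; 50070 was the usual namenode HTTP port at the time, and
dfs.webhdfs.enabled has to be set to true):

    # list a directory
    curl -i "http://namenode01:50070/webhdfs/v1/user/andy?op=LISTSTATUS"

    # create a file, step 1: the namenode replies with a 307 redirect
    curl -i -X PUT \
      "http://namenode01:50070/webhdfs/v1/user/andy/input.txt?op=CREATE&user.name=andy"

    # create a file, step 2: upload the data to the datanode URL returned
    # in the Location header of step 1
    curl -i -X PUT -T input.txt "<Location header URL from step 1>"

Every call starts at the namenode's HTTP address, so, as with the CLI, the
only node a client has to know about up front is the namenode.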


On Thu, Mar 13, 2014 at 5:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:

> Hello Team,
>
> I have a couple of questions about putting data into HDFS and running
> MapReduce on data that is already in HDFS.
>
>    1. HDFS is a file system, so what kinds of clients are available for
>    interacting with it? Also, where do those clients need to be installed?
>    2. Regarding Pig, Hive, and MapReduce: where do we install them on the
>    Hadoop cluster, where do we run the scripts from, and how does the
>    framework internally know whether work should run on node 1, node 2, or
>    node 3?
>
> Any inputs here would be really helpful.
>
> Thanks, Andy.
>



-- 
There are ways and there are ways,

Geoffry Roberts
