Posted to user@hadoop.apache.org by Mayur Patil <ra...@gmail.com> on 2013/03/04 08:20:32 UTC

Re: [Hadoop-Help]About Map-Reduce implementation

Hello,

     Now I am slowly understanding how Hadoop works.

     I want to collect the logs from three machines,

     including the Master itself. My small query is:

     which mode should I implement for this??

   -      Standalone Operation
   -      Pseudo-Distributed Operation
   -      Fully-Distributed Operation

     Seeking guidance,

     Thank you !!
*--
Cheers,
Mayur*




Hi Mayur,
>
> Flume is used for data collection. Pig is used for data processing.
> For example, if you have a bunch of servers that you want to collect
> the logs from and push to HDFS, you would use Flume. Then if you need
> to run some analysis on that data, you could use Pig to do that.
>
> Sent from my iPhone
>
> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com> wrote:
>
> > Hello,
> >
> >   I just read about Pig
> >
> >> Pig
> >> A data flow language and execution environment for exploring very
> > large datasets.
> >> Pig runs on HDFS and MapReduce clusters.
> >
> >   What is the actual difference between Pig and Flume when
> > working with logs??
> >
> >   Thank you !!
> > --
> > Cheers,
> > Mayur.
> >
> >
> >
> >> Hey Mayur,
> >>>
> >>> If you are collecting logs from multiple servers, then you can use
> >>> Flume for that.
> >>>
> >>> If the contents of the logs are in different formats, then you can
> >>> just use TextInputFormat to read them and write them into any other
> >>> format you want for your processing later in the project.
> >>>
> >>> The first thing you need to learn is how to set up Hadoop. Then you
> >>> can try writing sample Hadoop MapReduce jobs that read from a text
> >>> file, process it, and write the results into another file. Then you
> >>> can integrate Flume as your log-collection mechanism. Once you have
> >>> a hold on the system, you can decide which paths to follow based on
> >>> your requirements for storage, compute time, compute capacity,
> >>> compression, etc.
> >>>
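The middle steps in the advice above — read text logs, process them, write one common format — can be sketched outside Hadoop first. The two input formats, regexes, and sample lines below are assumptions for illustration, not the actual Snort/Eucalyptus/OS log layouts:

```python
import re

# Toy normalizer for the "convert different log formats to one format" step.
# syslog-style line:  "Mar  9 21:29:01 host1 snort: alert on eth0"
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
# dated .log line:    "2013-03-09 21:29:01 INFO eucalyptus started"
SIMPLE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")

def normalize_line(line):
    """Return the timestamp, next token, and the rest, tab-separated;
    None if the line matches no known format."""
    for pattern in (SYSLOG_RE, SIMPLE_RE):
        match = pattern.match(line.strip())
        if match:
            return "\t".join(match.groups())
    return None  # unknown format: skip it, or route it to an error file

if __name__ == "__main__":
    for raw in (
        "Mar  9 21:29:01 host1 snort: alert on eth0",
        "2013-03-09 21:29:01 INFO eucalyptus started",
    ):
        print(normalize_line(raw))
```

The same per-line logic can later become the map step of a MapReduce job once the cluster is set up.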
> >> --------------
> >> --------------
> >>
> >>> Hi,
> >>>
> >>> Please read the basics of how Hadoop works.
> >>>
> >>> Then start hands-on with MapReduce coding.
> >>>
> >>> The tool that has been made for you is Flume, but don't look at the
> >>> tool till you complete the above two steps.
> >>>
> >>> Good luck , keep us posted.
> >>>
> >>> Regards,
> >>>
> >>> Jagat Singh
> >>>
> >>> -----------
> >>> Sent from mobile; short and crisp.
> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>>    I am new to Hadoop. I am doing a project in the cloud in which I
> >>>>
> >>>>    have to use Hadoop for MapReduce. It is such that I am going
> >>>>
> >>>>    to collect logs from 2-3 machines in different locations.
> >>>>
> >>>>    The logs are also in different formats, such as .rtf, .log, and .txt.
> >>>>
> >>>>    Later, I have to convert them to one format and
> >>>>
> >>>>    collect them in one location.
> >>>>
> >>>>    So I am asking which module of Hadoop I need to study
> >>>>
> >>>>    for this implementation?? Or do I need to study
> >>>>
> >>>>    the whole framework??
> >>>>
> >>>>    Seeking guidance,
> >>>>
> >>>>    Thank you !!
>
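For the Flume-based collection recommended in the quoted advice, a minimal single-agent configuration might look like the sketch below. The agent name `a1`, hostnames, and file paths are hypothetical placeholders:

```properties
# Sketch of a Flume agent: tail one local log file, push events into HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow a log file as it grows (path is a placeholder)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/snort/alert
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into date-bucketed HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Such a config is typically started with something like `flume-ng agent --conf conf --conf-file agent.conf --name a1`; the exact invocation depends on the Flume version installed.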

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

   Thanks, sir, for your reply !!

   Have you taken a look at the attachments,

   which show my actual scenario ??

   Seeking guidance,

   Thank you!!
--
*Cheers,
Mayur.*


Hi Mayur,
>
> You might want to take a look at http://flume.apache.org/ .
>
> JM
>



On Sat, Mar 9, 2013 at 10:59 PM, Mayur Patil <ra...@gmail.com> wrote:

> Hello,
>
>  Thinking about the problem, I was suddenly struck by a point:
>
>  is it possible for the Master Node of Hadoop to collect
>
>  data from a location on which Hadoop is not installed??
>
>  Thanks !!
>
>
> On Sat, Mar 9, 2013 at 9:29 PM, Mayur Patil <ra...@gmail.com> wrote:
>
>> Hello,
>>
>> Thanks, sir, for your favourable reply.
>>
>> I studied my needs further and got more insight, as follows:
>>
>> I have to export logs related to the Snort and Eucalyptus components
>> from two machines to an rSyslog server.
>>
>> There are also logs generated by the OS. So my observations are as
>> follows:
>>
>> 1. As I see it, I just have to reduce the data (because Hadoop, as I
>> understand it, solves a problem by assigning
>>
>> jobs to worker nodes. In my case, the problem data is itself on the
>> worker nodes, so I think I have to process it on those
>>
>> nodes themselves).
>>
>> 2. What I realise now is that I have one Master node and two worker
>> nodes; one is a web server and the other is an operating system.
>>
>>
>> Seeking guidance,
>>
>> Thank you !!
>>
>> == I have attached files to explain the scenario. Please download
>> whichever you find convenient.
>>
>

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

 Thinking about the problem, I was suddenly struck by a point:

 is it possible for the Master Node of Hadoop to collect

 data from a location on which Hadoop is not installed??

 Thanks !!
*
--
Cheers,
Mayur.*

On Sat, Mar 9, 2013 at 9:29 PM, Mayur Patil <ra...@gmail.com> wrote:

> Hello,
>
> Thanks, sir, for your favourable reply.
>
> I studied my needs further and got more insight, as follows:
>
> I have to export logs related to the Snort and Eucalyptus components
> from two machines to an rSyslog server.
>
> There are also logs generated by the OS. So my observations are as
> follows:
>
> 1. As I see it, I just have to reduce the data (because Hadoop, as I
> understand it, solves a problem by assigning
>
> jobs to worker nodes. In my case, the problem data is itself on the
> worker nodes, so I think I have to process it on those
>
> nodes themselves).
>
> 2. What I realise now is that I have one Master node and two worker
> nodes; one is a web server and the other is an operating system.
>
>
> Seeking guidance,
>
> Thank you !!
>
> == I have attached files to explain the scenario. Please download
> whichever you find convenient.
> *
> --
> Cheers,
> Mayur.*
>



-- 
*Cheers,
Mayur*.
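On the question above: the machines being collected from do not need Hadoop installed; they only need something that ships their logs to a collection point, such as the rSyslog server already mentioned in this thread (or a Flume agent). A sketch of an rsyslog forwarding rule, where the hostname and port are placeholders:

```
# /etc/rsyslog.conf fragment on a source machine (no Hadoop required there).
# "@@" forwards over TCP; a single "@" would use UDP.
*.* @@master.example.com:514
```

Only the central host that receives these logs then needs to push them into HDFS.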

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

Thanks, sir, for your favourable reply.

I studied my needs further and got more insight, as follows:

I have to export logs related to the Snort and Eucalyptus components
from two machines to an rSyslog server.

There are also logs generated by the OS. So my observations are as
follows:

1. As I see it, I just have to reduce the data (because Hadoop, as I
understand it, solves a problem by assigning jobs to worker nodes. In my
case, the problem data is itself on the worker nodes, so I think I have
to process it on those nodes themselves).

2. What I realise now is that I have one Master node and two worker
nodes; one is a web server and the other is an operating system.

Seeking guidance,

Thank you !!

== I have attached files to explain the scenario. Please download
whichever you find convenient.
*
--
Cheers,
Mayur.*
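Once the logs are on the cluster as text, the per-node processing described in point 1 can be expressed as an ordinary MapReduce job; with Hadoop Streaming, the mapper and reducer can even be small Python scripts. A sketch, assuming (hypothetically) normalized tab-separated lines of timestamp, source, message, where the message begins with a severity word:

```python
from itertools import groupby

def map_line(line):
    """Mapper step: emit 'severity\t1' for one normalized log line, or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 3:
        severity = fields[2].split(" ", 1)[0]  # first word of the message
        return "%s\t1" % severity
    return None

def reduce_pairs(pairs):
    """Reducer step: sum counts per key; Hadoop delivers pairs sorted by key."""
    return [
        (key, sum(int(count) for _, count in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ]

if __name__ == "__main__":
    lines = [
        "Mar  9 21:29:01\thost1\tERROR disk full",
        "Mar  9 21:29:05\thost2\tINFO snort started",
        "Mar  9 21:29:09\thost1\tERROR disk full",
    ]
    # Simulate Hadoop's shuffle: map, drop Nones, sort by key, then reduce.
    pairs = sorted(p.split("\t") for p in filter(None, map(map_line, lines)))
    print(reduce_pairs([(k, v) for k, v in pairs]))  # → [('ERROR', 2), ('INFO', 1)]
```

In a real Streaming run the two functions would live in `mapper.py` / `reducer.py`, reading stdin and writing stdout, launched via the hadoop-streaming jar; the `__main__` block here only simulates the sort-and-reduce flow locally.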

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process." = still a
single node.

So you can only use the Fully-Distributed mode.

JM
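For the fully distributed mode, the key step on a Hadoop 1.x cluster is pointing every node at the same NameNode and JobTracker. A sketch with hypothetical hostnames (`master`, `worker1`, `worker2`):

```xml
<!-- conf/core-site.xml on all three nodes ("master" is a placeholder hostname) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml on all three nodes -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```

The `conf/slaves` file on the master would then list `worker1` and `worker2`, one hostname per line, so the start scripts launch DataNode/TaskTracker daemons there.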

2013/3/8 Mayur Patil <ra...@gmail.com>:
> Hello,
>
>   Thank you, sir, for your favorable reply.
>
>   I am going to use 1 master and 2 worker
>
>   nodes; 3 nodes in total.
>
>
>   Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 different ways to use Hadoop; however, the only
>> production mode here is the fully distributed one. The other 2 are
>> more for local testing. How many nodes are you expecting to run Hadoop
>> on?
>>
>> JM
>>
>
>
>
>
> --
> Cheers,
> Mayur.

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process." =
SingleNode.

So you can only use the Fully-Distributed mode.

JM

2013/3/8 Mayur Patil <ra...@gmail.com>:
> Hello,
>
>   Thank you sir for your favorable reply.
>
>   I am going to use 1master and 2 worker
>
>   nodes ; totally 3 nodes.
>
>
>   Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 differents ways to use Hadoop, however, the only
>> production mode here is the fully distributed one. The 2 others are
>> more for local testing. How many nodes are you expecting to use hadoop
>> on?
>>
>> JM
>>
>>
>> 2013/3/7 Mayur Patil <ra...@gmail.com>:
>> > Hello,
>> >
>> >    Now I am slowly understanding Hadoop working.
>> >
>> >   As I want to collect the logs from three machines
>> >
>> >   including Master itself . My small query is
>> >
>> >   which mode should I implement for this??
>> >
>> >                   Standalone Operation
>> >                   Pseudo-Distributed Operation
>> >                   Fully-Distributed Operation
>> >
>> >      Seeking for guidance,
>> >
>> >      Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >
>> >
>> >
>> >>> Hi mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For eg, if you have a bunch of servers that you want to collect the
>> >>> logs from and push to HDFS - you would use flume. Now if you need to
>> >>> run some analysis on that data, you could use pig to do that.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> >   I just read about Pig
>> >>> >
>> >>> >> Pig
>> >>> >> A data flow language and execution environment for exploring very
>> >>> > large datasets.
>> >>> >> Pig runs on HDFS and MapReduce clusters.
>> >>> >
>> >>> >   What the actual difference between Pig and Flume makes in logs
>> >>> > clustering??
>> >>> >
>> >>> >   Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur.
>> >>> >
>> >>> >
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers then you can use
>> >>> >>> flume
>> >>> >>> for the same.
>> >>> >>>
>> >>> >>> if the contents of the logs are different in format  then you can
>> >>> >>> just
>> >>> >>> use
>> >>> >>> textfileinput format to read and write into any other format you
>> >>> >>> want
>> >>> >>> for
>> >>> >>> your processing in later part of your projects
>> >>> >>>
>> >>> >>> first thing you need to learn is how to setup hadoop
>> >>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> >>> >>> text
>> >>> >>> file and then process them and write the results into another file
>> >>> >>> then you can integrate flume as your log collection mechanism
>> >>> >>> once you get hold on the system then you can decide more on which
>> >>> >>> paths
>> >>> >>> you want to follow based on your requirements for storage, compute
>> >>> >>> time,
>> >>> >>> compute capacity, compression etc
>> >>> >>>
>> >>> >> --------------
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read basics on how hadoop works.
>> >>> >>>
>> >>> >>> Then start your hands on with map reduce coding.
>> >>> >>>
>> >>> >>> The tool which has been made for you is flume , but don't see tool
>> >>> >>> till
>> >>> >>> you complete above two steps.
>> >>> >>>
>> >>> >>> Good luck , keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile , short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>> >>> >>>>
>> >>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>> >>> >>>>
>> >>> >>>>    to collect logs from 2-3 machines having different locations.
>> >>> >>>>
>> >>> >>>>    The logs are also in different formats such as .rtf .log .txt
>> >>> >>>>
>> >>> >>>>    Later, I have to collect and convert them to one format and
>> >>> >>>>
>> >>> >>>>    collect to one location.
>> >>> >>>>
>> >>> >>>>    So I am asking which module of Hadoop that I need to study
>> >>> >>>>
>> >>> >>>>    for this implementation?? Or whole framework should I need
>> >>> >>>>
>> >>> >>>>    to study ??
>> >>> >>>>
>> >>> >>>>    Seeking for guidance,
>> >>> >>>>
>> >>> >>>>    Thank you !!
>> >
>> >
>> >
>> >
>> > --
>> > Cheers,
>> > Mayur.
>
>
>
>
> --
> Cheers,
> Mayur.

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process." =
SingleNode.

So you can only use the Fully-Distributed mode.

JM

2013/3/8 Mayur Patil <ra...@gmail.com>:
> Hello,
>
>   Thank you sir for your favorable reply.
>
>   I am going to use 1master and 2 worker
>
>   nodes ; totally 3 nodes.
>
>
>   Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 differents ways to use Hadoop, however, the only
>> production mode here is the fully distributed one. The 2 others are
>> more for local testing. How many nodes are you expecting to use hadoop
>> on?
>>
>> JM
>>
>>
>> 2013/3/7 Mayur Patil <ra...@gmail.com>:
>> > Hello,
>> >
>> >    Now I am slowly understanding Hadoop working.
>> >
>> >   As I want to collect the logs from three machines
>> >
>> >   including Master itself . My small query is
>> >
>> >   which mode should I implement for this??
>> >
>> >                   Standalone Operation
>> >                   Pseudo-Distributed Operation
>> >                   Fully-Distributed Operation
>> >
>> >      Seeking for guidance,
>> >
>> >      Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >
>> >
>> >
>> >>> Hi mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For eg, if you have a bunch of servers that you want to collect the
>> >>> logs from and push to HDFS - you would use flume. Now if you need to
>> >>> run some analysis on that data, you could use pig to do that.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> >   I just read about Pig
>> >>> >
>> >>> >> Pig
>> >>> >> A data flow language and execution environment for exploring very
>> >>> > large datasets.
>> >>> >> Pig runs on HDFS and MapReduce clusters.
>> >>> >
>> >>> >   What the actual difference between Pig and Flume makes in logs
>> >>> > clustering??
>> >>> >
>> >>> >   Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur.
>> >>> >
>> >>> >
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers then you can use
>> >>> >>> flume
>> >>> >>> for the same.
>> >>> >>>
>> >>> >>> if the contents of the logs are different in format  then you can
>> >>> >>> just
>> >>> >>> use
>> >>> >>> textfileinput format to read and write into any other format you
>> >>> >>> want
>> >>> >>> for
>> >>> >>> your processing in later part of your projects
>> >>> >>>
>> >>> >>> first thing you need to learn is how to setup hadoop
>> >>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> >>> >>> text
>> >>> >>> file and then process them and write the results into another file
>> >>> >>> then you can integrate flume as your log collection mechanism
>> >>> >>> once you get hold on the system then you can decide more on which
>> >>> >>> paths
>> >>> >>> you want to follow based on your requirements for storage, compute
>> >>> >>> time,
>> >>> >>> compute capacity, compression etc
>> >>> >>>
>> >>> >> --------------
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read basics on how hadoop works.
>> >>> >>>
>> >>> >>> Then start your hands on with map reduce coding.
>> >>> >>>
>> >>> >>> The tool which has been made for you is flume , but don't see tool
>> >>> >>> till
>> >>> >>> you complete above two steps.
>> >>> >>>
>> >>> >>> Good luck , keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile , short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>> >>> >>>>
>> >>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>> >>> >>>>
>> >>> >>>>    to collect logs from 2-3 machines having different locations.
>> >>> >>>>
>> >>> >>>>    The logs are also in different formats such as .rtf .log .txt
>> >>> >>>>
>> >>> >>>>    Later, I have to collect and convert them to one format and
>> >>> >>>>
>> >>> >>>>    collect to one location.
>> >>> >>>>
>> >>> >>>>    So I am asking which module of Hadoop that I need to study
>> >>> >>>>
>> >>> >>>>    for this implementation?? Or whole framework should I need
>> >>> >>>>
>> >>> >>>>    to study ??
>> >>> >>>>
>> >>> >>>>    Seeking for guidance,
>> >>> >>>>
>> >>> >>>>    Thank you !!
>> >
>> >
>> >
>> >
>> > --
>> > Cheers,
>> > Mayur.
>
>
>
>
> --
> Cheers,
> Mayur.

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process." In other
words, pseudo-distributed still means a single node.

Since you want to collect logs from three machines, the only mode that
fits is the fully-distributed one.

JM
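
For a small fully-distributed cluster of that era (Hadoop 1.x, per the
docs linked above), the minimal wiring is roughly the sketch below; the
hostname "master" and port 9000 are placeholder assumptions, not values
from this thread:

```xml
<!-- conf/core-site.xml, identical on every node; "master" and
     port 9000 are placeholders for the NameNode host and port -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```

conf/mapred-site.xml similarly points mapred.job.tracker at the master,
and conf/slaves on the master lists the worker hostnames one per line.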

2013/3/8 Mayur Patil <ra...@gmail.com>:
> Hello,
>
>   Thank you sir for your favorable reply.
>
>   I am going to use 1master and 2 worker
>
>   nodes ; totally 3 nodes.
>
>
>   Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 differents ways to use Hadoop, however, the only
>> production mode here is the fully distributed one. The 2 others are
>> more for local testing. How many nodes are you expecting to use hadoop
>> on?
>>
>> JM
>>
>>
>> 2013/3/7 Mayur Patil <ra...@gmail.com>:
>> > Hello,
>> >
>> >    Now I am slowly understanding Hadoop working.
>> >
>> >   As I want to collect the logs from three machines
>> >
>> >   including Master itself . My small query is
>> >
>> >   which mode should I implement for this??
>> >
>> >                   Standalone Operation
>> >                   Pseudo-Distributed Operation
>> >                   Fully-Distributed Operation
>> >
>> >      Seeking for guidance,
>> >
>> >      Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >
>> >
>> >
>> >>> Hi mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For eg, if you have a bunch of servers that you want to collect the
>> >>> logs from and push to HDFS - you would use flume. Now if you need to
>> >>> run some analysis on that data, you could use pig to do that.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> >   I just read about Pig
>> >>> >
>> >>> >> Pig
>> >>> >> A data flow language and execution environment for exploring very
>> >>> > large datasets.
>> >>> >> Pig runs on HDFS and MapReduce clusters.
>> >>> >
>> >>> >   What the actual difference between Pig and Flume makes in logs
>> >>> > clustering??
>> >>> >
>> >>> >   Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur.
>> >>> >
>> >>> >
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers then you can use
>> >>> >>> flume
>> >>> >>> for the same.
>> >>> >>>
>> >>> >>> if the contents of the logs are different in format  then you can
>> >>> >>> just
>> >>> >>> use
>> >>> >>> textfileinput format to read and write into any other format you
>> >>> >>> want
>> >>> >>> for
>> >>> >>> your processing in later part of your projects
>> >>> >>>
>> >>> >>> first thing you need to learn is how to setup hadoop
>> >>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> >>> >>> text
>> >>> >>> file and then process them and write the results into another file
>> >>> >>> then you can integrate flume as your log collection mechanism
>> >>> >>> once you get hold on the system then you can decide more on which
>> >>> >>> paths
>> >>> >>> you want to follow based on your requirements for storage, compute
>> >>> >>> time,
>> >>> >>> compute capacity, compression etc
>> >>> >>>
>> >>> >> --------------
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read basics on how hadoop works.
>> >>> >>>
>> >>> >>> Then start your hands on with map reduce coding.
>> >>> >>>
>> >>> >>> The tool which has been made for you is flume , but don't see tool
>> >>> >>> till
>> >>> >>> you complete above two steps.
>> >>> >>>
>> >>> >>> Good luck , keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>>
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile , short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>> >>> >>>>
>> >>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>> >>> >>>>
>> >>> >>>>    to collect logs from 2-3 machines having different locations.
>> >>> >>>>
>> >>> >>>>    The logs are also in different formats such as .rtf .log .txt
>> >>> >>>>
>> >>> >>>>    Later, I have to collect and convert them to one format and
>> >>> >>>>
>> >>> >>>>    collect to one location.
>> >>> >>>>
>> >>> >>>>    So I am asking which module of Hadoop that I need to study
>> >>> >>>>
>> >>> >>>>    for this implementation?? Or whole framework should I need
>> >>> >>>>
>> >>> >>>>    to study ??
>> >>> >>>>
>> >>> >>>>    Seeking for guidance,
>> >>> >>>>
>> >>> >>>>    Thank you !!
>> >
>> >
>> >
>> >
>> > --
>> > Cheers,
>> > Mayur.
>
>
>
>
> --
> Cheers,
> Mayur.

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

  Thank you sir for your favorable reply.

  I am going to use 1 master and 2 worker

  nodes; 3 nodes in total.

  Thank you !!

*--
Cheers,
Mayur
*
On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> wrote:

> Hi Mayur,
>
> Those 3 modes are 3 differents ways to use Hadoop, however, the only
> production mode here is the fully distributed one. The 2 others are
> more for local testing. How many nodes are you expecting to use hadoop
> on?
>
> JM
>
>
> 2013/3/7 Mayur Patil <ra...@gmail.com>:
> > Hello,
> >
> >    Now I am slowly understanding Hadoop working.
> >
> >   As I want to collect the logs from three machines
> >
> >   including Master itself . My small query is
> >
> >   which mode should I implement for this??
> >
> >                   Standalone Operation
> >                   Pseudo-Distributed Operation
> >                   Fully-Distributed Operation
> >
> >      Seeking for guidance,
> >
> >      Thank you !!
> > --
> > Cheers,
> > Mayur
> >
> >
> >
> >
> >>> Hi mayur,
> >>>
> >>> Flume is used for data collection. Pig is used for data processing.
> >>> For eg, if you have a bunch of servers that you want to collect the
> >>> logs from and push to HDFS - you would use flume. Now if you need to
> >>> run some analysis on that data, you could use pig to do that.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> >   I just read about Pig
> >>> >
> >>> >> Pig
> >>> >> A data flow language and execution environment for exploring very
> >>> > large datasets.
> >>> >> Pig runs on HDFS and MapReduce clusters.
> >>> >
> >>> >   What the actual difference between Pig and Flume makes in logs
> >>> > clustering??
> >>> >
> >>> >   Thank you !!
> >>> > --
> >>> > Cheers,
> >>> > Mayur.
> >>> >
> >>> >
> >>> >
> >>> >> Hey Mayur,
> >>> >>>
> >>> >>> If you are collecting logs from multiple servers then you can use
> >>> >>> flume
> >>> >>> for the same.
> >>> >>>
> >>> >>> if the contents of the logs are different in format  then you can
> >>> >>> just
> >>> >>> use
> >>> >>> textfileinput format to read and write into any other format you
> want
> >>> >>> for
> >>> >>> your processing in later part of your projects
> >>> >>>
> >>> >>> first thing you need to learn is how to setup hadoop
> >>> >>> then you can try writing sample hadoop mapreduce jobs to read from
> >>> >>> text
> >>> >>> file and then process them and write the results into another file
> >>> >>> then you can integrate flume as your log collection mechanism
> >>> >>> once you get hold on the system then you can decide more on which
> >>> >>> paths
> >>> >>> you want to follow based on your requirements for storage, compute
> >>> >>> time,
> >>> >>> compute capacity, compression etc
> >>> >>>
> >>> >> --------------
> >>> >> --------------
> >>> >>
> >>> >>> Hi,
> >>> >>>
> >>> >>> Please read basics on how hadoop works.
> >>> >>>
> >>> >>> Then start your hands on with map reduce coding.
> >>> >>>
> >>> >>> The tool which has been made for you is flume , but don't see tool
> >>> >>> till
> >>> >>> you complete above two steps.
> >>> >>>
> >>> >>> Good luck , keep us posted.
> >>> >>>
> >>> >>> Regards,
> >>> >>>
> >>> >>> Jagat Singh
> >>> >>>
> >>> >>> -----------
> >>> >>> Sent from Mobile , short and crisp.
> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>>> Hello,
> >>> >>>>
> >>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
> >>> >>>>
> >>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
> >>> >>>>
> >>> >>>>    to collect logs from 2-3 machines having different locations.
> >>> >>>>
> >>> >>>>    The logs are also in different formats such as .rtf .log .txt
> >>> >>>>
> >>> >>>>    Later, I have to collect and convert them to one format and
> >>> >>>>
> >>> >>>>    collect to one location.
> >>> >>>>
> >>> >>>>    So I am asking which module of Hadoop that I need to study
> >>> >>>>
> >>> >>>>    for this implementation?? Or whole framework should I need
> >>> >>>>
> >>> >>>>    to study ??
> >>> >>>>
> >>> >>>>    Seeking for guidance,
> >>> >>>>
> >>> >>>>    Thank you !!
> >
> >
> >
> >
> > --
> > Cheers,
> > Mayur.
>



-- 
*Cheers,
Mayur*.

Re: Find current version & cluster info of hadoop

Posted by Harsh J <ha...@cloudera.com>.
Something like "hadoop version" and "hdfs dfsadmin -report" is what
you're looking for?

On Fri, Mar 8, 2013 at 11:11 AM, Sai Sai <sa...@yahoo.in> wrote:
> Just wondering if there r any commands in Hadoop which would give us the
> current version that we
> r using and any command which will give us the info of cluster setup of H we
> r working on.
> Thanks
> Sai



--
Harsh J

Re: Dissecting MR output article

Posted by Harsh J <ha...@cloudera.com>.
+1 for Hadoop: The Definitive Guide and other books.

Sidenote: The 3rd Edition of Tom White's Hadoop: The
Definitive Guide does have good details on MRv2 and YARN.

On Sat, Mar 23, 2013 at 11:22 AM, Azuryy Yu <az...@gmail.com> wrote:
> hadoop definition guide.pdf should be helpful. there is a chapter for this.
> but only for MRv1.
>
> On Mar 23, 2013 1:50 PM, "Sai Sai" <sa...@yahoo.in> wrote:
>>
>>
>> Just wondering if there is any step by step explaination/article of MR
>> output we get when we run a job either in eclipse or ubuntu.
>> Any help is appreciated.
>> Thanks
>> Sai



--
Harsh J

Re: Dissecting MR output article

Posted by Azuryy Yu <az...@gmail.com>.
Hadoop: The Definitive Guide should be helpful; there is a chapter on
this, but only for MRv1.
 On Mar 23, 2013 1:50 PM, "Sai Sai" <sa...@yahoo.in> wrote:

>
> Just wondering if there is any step by step explaination/article of MR
> output we get when we run a job either in eclipse or ubuntu.
> Any help is appreciated.
> Thanks
> Sai
>

Re: Dissecting MR output article

Posted by Sai Sai <sa...@yahoo.in>.

Just wondering if there is any step-by-step explanation/article of the MR output we get when we run a job, either in Eclipse or Ubuntu. Any help is appreciated.
Thanks
Sai

Re: Setup/Cleanup question

Posted by Sai Sai <sa...@yahoo.in>.
Thanks Harsh.
So the setup/cleanup are for the Job and not the Mappers, I take it.
Thanks.





________________________________
 From: Harsh J <ha...@cloudera.com>
To: "<us...@hadoop.apache.org>" <us...@hadoop.apache.org>; Sai Sai <sa...@yahoo.in> 
Sent: Friday, 22 March 2013 10:05 PM
Subject: Re: Setup/Cleanup question
 
Assuming you speak of MRv1 (1.x/0.20.x versions), just one Job Setup
and one Job Cleanup task are additionally run for each Job.

On Sat, Mar 23, 2013 at 9:10 AM, Sai Sai <sa...@yahoo.in> wrote:
> When running an MR job/program assuming there r 'n' (=100) Mappers triggered
> then my question is will the setup & cleanup run n number of times which
> means once for each mapper or for all the mappers they will run only once.
> Any help is appreciated.
> Thanks
> Sai



-- 
Harsh J

Re: Setup/Cleanup question

Posted by Harsh J <ha...@cloudera.com>.
Assuming you speak of MRv1 (1.x/0.20.x versions), just one Job Setup
and one Job Cleanup task are additionally run for each Job.

On Sat, Mar 23, 2013 at 9:10 AM, Sai Sai <sa...@yahoo.in> wrote:
> When running an MR job/program assuming there r 'n' (=100) Mappers triggered
> then my question is will the setup & cleanup run n number of times which
> means once for each mapper or for all the mappers they will run only once.
> Any help is appreciated.
> Thanks
> Sai



-- 
Harsh J
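
Note the distinction: besides the single job-level Setup/Cleanup tasks
described above, the new-API Mapper has its own setup()/cleanup() hooks
that run once per map task, i.e. n times for n mappers. A toy sketch of
the counting (plain Python with made-up names, not Hadoop's API):

```python
def run_job(input_splits, map_fn):
    """Toy model of MRv1 task counts; illustrative only, not Hadoop."""
    counters = {"job_setup": 0, "task_setup": 0, "job_cleanup": 0}
    counters["job_setup"] += 1           # one Job Setup task per job
    results = []
    for split in input_splits:           # one map task per input split
        counters["task_setup"] += 1      # Mapper.setup() runs once per task
        results.extend(map_fn(rec) for rec in split)
    counters["job_cleanup"] += 1         # one Job Cleanup task per job
    return results, counters

results, counters = run_job([[1, 2], [3], [4, 5]], lambda x: x * 10)
# with 3 splits: job_setup == 1 while task_setup == 3
```

So with 100 map tasks the per-task hooks fire 100 times, but the
job-level pair still runs only once.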

Re: Setup/Cleanup question

Posted by Sai Sai <sa...@yahoo.in>.
When running an MR job/program, assuming there are 'n' (=100) Mappers triggered, my question is: will the setup & cleanup run n times, i.e. once for each mapper, or will they run only once for all the mappers?

Any help is appreciated.
Thanks
Sai

Re: Find current version & cluster info of hadoop

Posted by Sai Sai <sa...@yahoo.in>.
Just wondering if there are any commands in Hadoop which would give us the current version we
are using, and any command which will give us the info of the cluster setup we are working on.
Thanks
Sai

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

  Thank you sir for your favorable reply.

  I am going to use 1master and 2 worker

  nodes ; totally 3 nodes.

  Thank you !!

*--
Cheers,
Mayur
*
On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org
> wrote:

> Hi Mayur,
>
> Those 3 modes are 3 differents ways to use Hadoop, however, the only
> production mode here is the fully distributed one. The 2 others are
> more for local testing. How many nodes are you expecting to use hadoop
> on?
>
> JM
>
>
> 2013/3/7 Mayur Patil <ra...@gmail.com>:
> > Hello,
> >
> >    Now I am slowly understanding Hadoop working.
> >
> >   As I want to collect the logs from three machines
> >
> >   including Master itself . My small query is
> >
> >   which mode should I implement for this??
> >
> >                   Standalone Operation
> >                   Pseudo-Distributed Operation
> >                   Fully-Distributed Operation
> >
> >      Seeking for guidance,
> >
> >      Thank you !!
> > --
> > Cheers,
> > Mayur
> >
> >
> >
> >
> >>> Hi mayur,
> >>>
> >>> Flume is used for data collection. Pig is used for data processing.
> >>> For eg, if you have a bunch of servers that you want to collect the
> >>> logs from and push to HDFS - you would use flume. Now if you need to
> >>> run some analysis on that data, you could use pig to do that.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> >   I just read about Pig
> >>> >
> >>> >> Pig
> >>> >> A data flow language and execution environment for exploring very
> >>> > large datasets.
> >>> >> Pig runs on HDFS and MapReduce clusters.
> >>> >
> >>> >   What the actual difference between Pig and Flume makes in logs
> >>> > clustering??
> >>> >
> >>> >   Thank you !!
> >>> > --
> >>> > Cheers,
> >>> > Mayur.
> >>> >
> >>> >
> >>> >
> >>> >> Hey Mayur,
> >>> >>>
> >>> >>> If you are collecting logs from multiple servers then you can use
> >>> >>> flume
> >>> >>> for the same.
> >>> >>>
> >>> >>> if the contents of the logs are different in format  then you can
> >>> >>> just
> >>> >>> use
> >>> >>> textfileinput format to read and write into any other format you
> want
> >>> >>> for
> >>> >>> your processing in later part of your projects
> >>> >>>
> >>> >>> first thing you need to learn is how to setup hadoop
> >>> >>> then you can try writing sample hadoop mapreduce jobs to read from
> >>> >>> text
> >>> >>> file and then process them and write the results into another file
> >>> >>> then you can integrate flume as your log collection mechanism
> >>> >>> once you get hold on the system then you can decide more on which
> >>> >>> paths
> >>> >>> you want to follow based on your requirements for storage, compute
> >>> >>> time,
> >>> >>> compute capacity, compression etc
> >>> >>>
> >>> >> --------------
> >>> >> --------------
> >>> >>
> >>> >>> Hi,
> >>> >>>
> >>> >>> Please read basics on how hadoop works.
> >>> >>>
> >>> >>> Then start your hands on with map reduce coding.
> >>> >>>
> >>> >>> The tool which has been made for you is flume , but don't see tool
> >>> >>> till
> >>> >>> you complete above two steps.
> >>> >>>
> >>> >>> Good luck , keep us posted.
> >>> >>>
> >>> >>> Regards,
> >>> >>>
> >>> >>> Jagat Singh
> >>> >>>
> >>> >>> -----------
> >>> >>> Sent from Mobile , short and crisp.
> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
> >>> >>> wrote:
> >>> >>>
> >>> >>>> Hello,
> >>> >>>>
> >>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
> >>> >>>>
> >>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
> >>> >>>>
> >>> >>>>    to collect logs from 2-3 machines having different locations.
> >>> >>>>
> >>> >>>>    The logs are also in different formats such as .rtf .log .txt
> >>> >>>>
> >>> >>>>    Later, I have to collect and convert them to one format and
> >>> >>>>
> >>> >>>>    collect to one location.
> >>> >>>>
> >>> >>>>    So I am asking which module of Hadoop that I need to study
> >>> >>>>
> >>> >>>>    for this implementation?? Or whole framework should I need
> >>> >>>>
> >>> >>>>    to study ??
> >>> >>>>
> >>> >>>>    Seeking for guidance,
> >>> >>>>
> >>> >>>>    Thank you !!
> >
> >
> >
> >
> > --
> > Cheers,
> > Mayur.
>



-- 
*Cheers,
Mayur*.
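
The advice quoted above — write a sample MapReduce job that reads a text file before reaching for the full stack — can be prototyped entirely in the shell, since Hadoop Streaming is just `map | sort | reduce` over lines of text. A sketch (the word-count pipeline is illustrative, not Mayur's actual log format):

```shell
# Local dry run of the streaming model: the "map" step splits lines into
# words (one key per line), the sort groups identical keys together, and
# the "reduce" step counts each group.
printf 'error disk full\nerror net down\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn
```

The same mapper/reducer pair, once debugged locally like this, can be handed to the hadoop-streaming jar with its `-mapper`, `-reducer`, `-input`, and `-output` options.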

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Mayur,

Those 3 modes are 3 different ways to use Hadoop; however, the only
production mode here is the fully distributed one. The other 2 are
more for local testing. How many nodes are you expecting to run
Hadoop on?

JM


2013/3/7 Mayur Patil <ra...@gmail.com>:
> Hello,
>
>    Now I am slowly understanding Hadoop working.
>
>   As I want to collect the logs from three machines
>
>   including Master itself . My small query is
>
>   which mode should I implement for this??
>
>                   Standalone Operation
>                   Pseudo-Distributed Operation
>                   Fully-Distributed Operation
>
>      Seeking for guidance,
>
>      Thank you !!
> --
> Cheers,
> Mayur
>
>
>
>
>>> Hi mayur,
>>>
>>> Flume is used for data collection. Pig is used for data processing.
>>> For eg, if you have a bunch of servers that you want to collect the
>>> logs from and push to HDFS - you would use flume. Now if you need to
>>> run some analysis on that data, you could use pig to do that.
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> >   I just read about Pig
>>> >
>>> >> Pig
>>> >> A data flow language and execution environment for exploring very
>>> > large datasets.
>>> >> Pig runs on HDFS and MapReduce clusters.
>>> >
>>> >   What the actual difference between Pig and Flume makes in logs
>>> > clustering??
>>> >
>>> >   Thank you !!
>>> > --
>>> > Cheers,
>>> > Mayur.
>>> >
>>> >
>>> >
>>> >> Hey Mayur,
>>> >>>
>>> >>> If you are collecting logs from multiple servers then you can use
>>> >>> flume
>>> >>> for the same.
>>> >>>
>>> >>> if the contents of the logs are different in format  then you can
>>> >>> just
>>> >>> use
>>> >>> textfileinput format to read and write into any other format you want
>>> >>> for
>>> >>> your processing in later part of your projects
>>> >>>
>>> >>> first thing you need to learn is how to setup hadoop
>>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>>> >>> text
>>> >>> file and then process them and write the results into another file
>>> >>> then you can integrate flume as your log collection mechanism
>>> >>> once you get hold on the system then you can decide more on which
>>> >>> paths
>>> >>> you want to follow based on your requirements for storage, compute
>>> >>> time,
>>> >>> compute capacity, compression etc
>>> >>>
>>> >> --------------
>>> >> --------------
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> Please read basics on how hadoop works.
>>> >>>
>>> >>> Then start your hands on with map reduce coding.
>>> >>>
>>> >>> The tool which has been made for you is flume , but don't see tool
>>> >>> till
>>> >>> you complete above two steps.
>>> >>>
>>> >>> Good luck , keep us posted.
>>> >>>
>>> >>> Regards,
>>> >>>
>>> >>> Jagat Singh
>>> >>>
>>> >>> -----------
>>> >>> Sent from Mobile , short and crisp.
>>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>>
>>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>>> >>>>
>>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>>> >>>>
>>> >>>>    to collect logs from 2-3 machines having different locations.
>>> >>>>
>>> >>>>    The logs are also in different formats such as .rtf .log .txt
>>> >>>>
>>> >>>>    Later, I have to collect and convert them to one format and
>>> >>>>
>>> >>>>    collect to one location.
>>> >>>>
>>> >>>>    So I am asking which module of Hadoop that I need to study
>>> >>>>
>>> >>>>    for this implementation?? Or whole framework should I need
>>> >>>>
>>> >>>>    to study ??
>>> >>>>
>>> >>>>    Seeking for guidance,
>>> >>>>
>>> >>>>    Thank you !!
>
>
>
>
> --
> Cheers,
> Mayur.
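
The practical difference between the three modes JM describes shows up in the configuration: standalone mode leaves the filesystem at the local default, while pseudo- and fully-distributed modes point it at an HDFS URI. A minimal sketch for a 3-node fully distributed setup (the hostname `master` and port 9000 are placeholders; the property name shown is the 1.x form, renamed `fs.defaultFS` in 2.x):

```xml
<!-- conf/core-site.xml : all three nodes point at the master's namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```

In pseudo-distributed mode the value would be `hdfs://localhost:9000` and every daemon runs on the one machine; in standalone mode the property is not set at all and MapReduce runs in a single local JVM.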

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

   Now I am slowly understanding how Hadoop works.

  As I want to collect the logs from three machines,

  including the Master itself, my small query is:

  which mode should I implement for this?

                  Standalone Operation
                  Pseudo-Distributed Operation
                  Fully-Distributed Operation

     Seeking guidance,

     Thank you !!
*--
Cheers,
Mayur*




Hi mayur,
>>
>> Flume is used for data collection. Pig is used for data processing.
>> For eg, if you have a bunch of servers that you want to collect the
>> logs from and push to HDFS - you would use flume. Now if you need to
>> run some analysis on that data, you could use pig to do that.
>>
>> Sent from my iPhone
>>
>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> >   I just read about Pig
>> >
>> >> Pig
>> >> A data flow language and execution environment for exploring very
>> > large datasets.
>> >> Pig runs on HDFS and MapReduce clusters.
>> >
>> >   What the actual difference between Pig and Flume makes in logs
>> clustering??
>> >
>> >   Thank you !!
>> > --
>> > Cheers,
>> > Mayur.
>> >
>> >
>> >
>> >> Hey Mayur,
>> >>>
>> >>> If you are collecting logs from multiple servers then you can use
>> flume
>> >>> for the same.
>> >>>
>> >>> if the contents of the logs are different in format  then you can just
>> >>> use
>> >>> textfileinput format to read and write into any other format you want
>> for
>> >>> your processing in later part of your projects
>> >>>
>> >>> first thing you need to learn is how to setup hadoop
>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> text
>> >>> file and then process them and write the results into another file
>> >>> then you can integrate flume as your log collection mechanism
>> >>> once you get hold on the system then you can decide more on which
>> paths
>> >>> you want to follow based on your requirements for storage, compute
>> time,
>> >>> compute capacity, compression etc
>> >>>
>> >> --------------
>> >> --------------
>> >>
>> >>> Hi,
>> >>>
>> >>> Please read basics on how hadoop works.
>> >>>
>> >>> Then start your hands on with map reduce coding.
>> >>>
>> >>> The tool which has been made for you is flume , but don't see tool
>> till
>> >>> you complete above two steps.
>> >>>
>> >>> Good luck , keep us posted.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Jagat Singh
>> >>>
>> >>> -----------
>> >>> Sent from Mobile , short and crisp.
>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>> >>>>
>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>> >>>>
>> >>>>    to collect logs from 2-3 machines having different locations.
>> >>>>
>> >>>>    The logs are also in different formats such as .rtf .log .txt
>> >>>>
>> >>>>    Later, I have to collect and convert them to one format and
>> >>>>
>> >>>>    collect to one location.
>> >>>>
>> >>>>    So I am asking which module of Hadoop that I need to study
>> >>>>
>> >>>>    for this implementation?? Or whole framework should I need
>> >>>>
>> >>>>    to study ??
>> >>>>
>> >>>>    Seeking for guidance,
>> >>>>
>> >>>>    Thank you !!
>>
>


-- 
*Cheers,
Mayur*.
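
For the collection side discussed throughout this thread, the usual shape is one Flume agent per log-producing machine, each shipping events into HDFS. A minimal sketch of a Flume NG properties file (the agent name `a1`, the tailed path, and the HDFS URI are placeholders for the actual setup):

```
# One source tails the local log, one in-memory channel buffers events,
# one sink writes them into HDFS on the master node.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type      = hdfs
a1.sinks.k1.channel   = c1
a1.sinks.k1.hdfs.path = hdfs://master:9000/logs
```

With an agent like this on each of the three machines, the logs land in one HDFS location, ready for the format-normalization MapReduce (or Pig) step described earlier in the thread.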

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

   Now I am slowly understanding Hadoop working.

  As I want to collect the logs from three machines

  including Master itself . My small query is

  which mode should I implement for this??

                  Standalone Operation
                  Pseudo-Distributed Operation
                  Fully-Distributed Operation

     Seeking for guidance,

     Thank you !!
*--
Cheers,
Mayur*




Hi mayur,
>>
>> Flume is used for data collection. Pig is used for data processing.
>> For eg, if you have a bunch of servers that you want to collect the
>> logs from and push to HDFS - you would use flume. Now if you need to
>> run some analysis on that data, you could use pig to do that.
>>
>> Sent from my iPhone
>>
>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> >   I just read about Pig
>> >
>> >> Pig
>> >> A data flow language and execution environment for exploring very
>> > large datasets.
>> >> Pig runs on HDFS and MapReduce clusters.
>> >
>> >   What the actual difference between Pig and Flume makes in logs
>> clustering??
>> >
>> >   Thank you !!
>> > --
>> > Cheers,
>> > Mayur.
>> >
>> >
>> >
>> >> Hey Mayur,
>> >>>
>> >>> If you are collecting logs from multiple servers then you can use
>> flume
>> >>> for the same.
>> >>>
>> >>> if the contents of the logs are different in format  then you can just
>> >>> use
>> >>> textfileinput format to read and write into any other format you want
>> for
>> >>> your processing in later part of your projects
>> >>>
>> >>> first thing you need to learn is how to setup hadoop
>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> text
>> >>> file and then process them and write the results into another file
>> >>> then you can integrate flume as your log collection mechanism
>> >>> once you get hold on the system then you can decide more on which
>> paths
>> >>> you want to follow based on your requirements for storage, compute
>> time,
>> >>> compute capacity, compression etc
>> >>>
>> >> --------------
>> >> --------------
>> >>
>> >>> Hi,
>> >>>
>> >>> Please read basics on how hadoop works.
>> >>>
>> >>> Then start your hands on with map reduce coding.
>> >>>
>> >>> The tool which has been made for you is flume , but don't see tool
>> till
>> >>> you complete above two steps.
>> >>>
>> >>> Good luck , keep us posted.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Jagat Singh
>> >>>
>> >>> -----------
>> >>> Sent from Mobile , short and crisp.
>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>    I am new to Hadoop. I am doing a project in cloud in which I
>> >>>>
>> >>>>    have to use hadoop for Map-reduce. It is such that I am going
>> >>>>
>> >>>>    to collect logs from 2-3 machines having different locations.
>> >>>>
>> >>>>    The logs are also in different formats such as .rtf .log .txt
>> >>>>
>> >>>>    Later, I have to collect and convert them to one format and
>> >>>>
>> >>>>    collect to one location.
>> >>>>
>> >>>>    So I am asking which module of Hadoop that I need to study
>> >>>>
>> >>>>    for this implementation?? Or whole framework should I need
>> >>>>
>> >>>>    to study ??
>> >>>>
>> >>>>    Seeking for guidance,
>> >>>>
>> >>>>    Thank you !!
>>
>


-- 
*Cheers,
Mayur*.

Re: [Hadoop-Help]About Map-Reduce implementation

Posted by Mayur Patil <ra...@gmail.com>.
Hello,

   Now I am slowly understanding Hadoop working.

  As I want to collect the logs from three machines

  including Master itself . My small query is

  which mode should I implement for this??

                  Standalone Operation
                  Pseudo-Distributed Operation
                  Fully-Distributed Operation

     Seeking for guidance,

     Thank you !!
*--
Cheers,
Mayur*




Hi mayur,
>>
>> Flume is used for data collection. Pig is used for data processing.
>> For eg, if you have a bunch of servers that you want to collect the
>> logs from and push to HDFS - you would use flume. Now if you need to
>> run some analysis on that data, you could use pig to do that.
>>
>> Sent from my iPhone
>>
>> On Feb 14, 2013, at 1:39 AM, Mayur Patil <ra...@gmail.com>
>> wrote:
>>
>> > Hello,
>> >
>> >   I just read about Pig
>> >
>> >> Pig
>> >> A data flow language and execution environment for exploring very
>> > large datasets.
>> >> Pig runs on HDFS and MapReduce clusters.
>> >
>> >   What the actual difference between Pig and Flume makes in logs
>> clustering??
>> >
>> >   Thank you !!
>> > --
>> > Cheers,
>> > Mayur.
>> >
>> >
>> >
>> >> Hey Mayur,
>> >>>
>> >>> If you are collecting logs from multiple servers then you can use
>> flume
>> >>> for the same.
>> >>>
>> >>> if the contents of the logs are different in format  then you can just
>> >>> use
>> >>> textfileinput format to read and write into any other format you want
>> for
>> >>> your processing in later part of your projects
>> >>>
>> >>> first thing you need to learn is how to setup hadoop
>> >>> then you can try writing sample hadoop mapreduce jobs to read from
>> text
>> >>> file and then process them and write the results into another file
>> >>> then you can integrate flume as your log collection mechanism
>> >>> once you get hold on the system then you can decide more on which
>> paths
>> >>> you want to follow based on your requirements for storage, compute
>> time,
>> >>> compute capacity, compression etc
>> >>>
>> >> --------------
>> >> --------------
>> >>
>> >>> Hi,
>> >>>
>> >>> Please read the basics of how Hadoop works.
>> >>>
>> >>> Then get hands-on with MapReduce coding.
>> >>>
>> >>> The tool that has been made for you is Flume, but don't look at the
>> >>> tool till you complete the above two steps.
>> >>>
>> >>> Good luck, keep us posted.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Jagat Singh
>> >>>
>> >>> -----------
>> >>> Sent from Mobile , short and crisp.
>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" <ra...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>>    I am new to Hadoop. I am doing a project in the cloud in which
>> >>>>
>> >>>>    I have to use Hadoop for MapReduce. It is such that I am going
>> >>>>
>> >>>>    to collect logs from 2-3 machines at different locations.
>> >>>>
>> >>>>    The logs are also in different formats, such as .rtf, .log, .txt.
>> >>>>
>> >>>>    Later, I have to convert them to one format and
>> >>>>
>> >>>>    collect them in one location.
>> >>>>
>> >>>>    So I am asking which module of Hadoop I need to study
>> >>>>
>> >>>>    for this implementation, or whether I need to study
>> >>>>
>> >>>>    the whole framework?
>> >>>>
>> >>>>    Seeking for guidance,
>> >>>>
>> >>>>    Thank you !!
>>
>


-- 
*Cheers,
Mayur*.
