Posted to user@hadoop.apache.org by yo...@wipro.com on 2012/11/07 16:33:07 UTC

Map-Reduce V/S Hadoop Ecosystem

Hello Hadoop Champs,

Please give me some suggestions.

The Hadoop ecosystem tools (Hive, Pig, ...) internally run MapReduce to do their processing.

My questions are:

1) In which situations is a hand-written MapReduce program (in Java, Python, etc.) preferable to the ecosystem tools?

2) What are the limitations of the ecosystem tools compared with writing a MapReduce program directly?

3) How much Java skill, out of 10, is needed to write MapReduce jobs in Java?


Please shed some light on this.


Thanks & Regards
Yogesh Kumar


The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. 

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com

RE: Map-Reduce V/S Hadoop Ecosystem

Posted by "Kartashov, Andy" <An...@mpac.ca>.
The way I understand it...

Hadoop gives you a distributed file system (HDFS) that lets you create folders in its own namespace and copy files to and from your local Linux file system, and you set up the Hadoop configuration for a local, pseudo-distributed or fully-distributed cluster.

You write your jobs against the MapReduce API and execute them on the Hadoop cluster. I write mine in Java and would rate my own Java experience at, say, 6-7 out of 10. I use Sqoop to import data into HDFS from the RDBMS MySQL, use the Sqoop-generated classes to read the records in the mapper, and design my own classes for the reducer that produce the required output.
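
As a very rough illustration only (this is not my actual code; the input format, field layout and class names below are all made up), a Java MapReduce job of that shape looks something like this:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch only: assumes each input line in HDFS looks like "customerId,amount".
public class TotalPerCustomer {

    // Mapper: parse a line and emit (customerId, amount).
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length == 2) {
                context.write(new Text(fields[0]),
                              new DoubleWritable(Double.parseDouble(fields[1])));
            }
        }
    }

    // Reducer: sum all amounts seen for one customer.
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text customer, Iterable<DoubleWritable> amounts, Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable amount : amounts) {
                total += amount.get();
            }
            context.write(customer, new DoubleWritable(total));
        }
    }

    // Driver: wire the mapper, reducer and HDFS input/output paths together.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "total-per-customer");
        job.setJarByClass(TotalPerCustomer.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would package that into a jar and submit it with something like "hadoop jar myjob.jar TotalPerCustomer <input dir> <output dir>".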

Rgds,

-----Original Message-----
From: yogesh.kumar13@wipro.com [mailto:yogesh.kumar13@wipro.com]
Sent: Wednesday, November 07, 2012 10:33 AM
To: user@hadoop.apache.org
Subject: Map-Reduce V/S Hadoop Ecosystem

Hello Hadoop Champs,

Please give some suggestion..

As Hadoop Ecosystem(Hive, Pig...) internally do Map-Reduce to process.

My Question is

1). where Map-Reduce program(written in Java, python etc) are overtaking Hadoop Ecosystem.

2). Limitations of Hadoop Ecosystem comparing with Writing Map-Reduce program.

3) for writing Map-Reduce jobs in java how much we need to have skills in java out of 10 (?/10)


Please put some light over it.


Thanks & Regards
Yogesh Kumar


NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail. AVIS : le présent courriel et toute pièce jointe qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite. Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel

Re: Map-Reduce V/S Hadoop Ecosystem

Posted by Russell Jurney <ru...@gmail.com>.
Hourly consultants may prefer MapReduce. Everyone else should be using Pig,
Hive, Cascading, etc.

Russell Jurney twitter.com/rjurney


On Nov 7, 2012, at 8:08 PM, yogesh dhari <yo...@live.com> wrote:

 Thanks Bejoy Sir,

I am always grateful to u for your help.
Please explain these word into simple language with some case (if possible)

" If your requirement is that complex and you need very low level control
of your code mapreduce is better. If you are an expert in mapreduce your
code can be efficient as yours would very specific to your app but the MR
in hive and pig may be more generic.
"
If my requirement is complex. I will prefer Ecosystem(bcoz it provide
simple interface to run Map-Reduce).
What is low level control of code. belongs here?? plz provide an example..

please do explain it into simple way so that I can understand your point of
view.

Regards
Yogesh Kumar



> Subject: Re: Map-Reduce V/S Hadoop Ecosystem
> To: user@hadoop.apache.org
> From: bejoy.hadoop@gmail.com
> Date: Wed, 7 Nov 2012 18:24:52 +0000
>
> Hi Yogesh,
>
> The development time in Pig and hive are pretty less compared to its
equivalent mapreduce code and for generic cases it is very efficient.
> If your requirement is that complex and you need very low level control
of your code mapreduce is better. If you are an expert in mapreduce your
code can be efficient as yours would very specific to your app but the MR
in hive and pig may be more generic.
>
> To just write your custom mapreduce functions, just basic knowledge on
java is good. As you are better with java you can understand the internals
better.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: <yo...@wipro.com>
> Date: Wed, 7 Nov 2012 15:33:07
> To: <us...@hadoop.apache.org>
> Reply-To: user@hadoop.apache.org
> Subject: Map-Reduce V/S Hadoop Ecosystem
>
> Hello Hadoop Champs,
>
> Please give some suggestion..
>
> As Hadoop Ecosystem(Hive, Pig...) internally do Map-Reduce to process.
>
> My Question is
>
> 1). where Map-Reduce program(written in Java, python etc) are overtaking
Hadoop Ecosystem.
>
> 2). Limitations of Hadoop Ecosystem comparing with Writing Map-Reduce
program.
>
> 3) for writing Map-Reduce jobs in java how much we need to have skills in
java out of 10 (?/10)
>
>
> Please put some light over it.
>
>
> Thanks & Regards
> Yogesh Kumar
>
>

Re: Map-Reduce V/S Hadoop Ecosystem

Posted by Bejoy KS <be...@gmail.com>.
Hi Yogesh

Pretty much all common requirements fit well into Hive, Pig, etc. HiveQL and Pig Latin are compiled by their respective parsers into MapReduce jobs. The MR code thus generated is generic and is based entirely on rules defined in the parser.

But say your job needs to do something more, such as updating some statistics in HBase in the middle of your HDFS data processing. When you combine HDFS processing with HBase inserts/updates, Hive or Pig may do it as two separate sets of MR jobs, whereas with custom code you may be able to fold the HBase updates into the same MapReduce job that processes the HDFS data. Summarizing my thought, custom MR code can limit the number of MR jobs in this case.
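
To make that concrete, here is a rough sketch only (the table name "stats", the column family and the per-key count are all made up, and it is just one way of using the HBase client API, not code from any real app) of a reducer that writes its normal HDFS output and pushes a stats update to HBase within the same job:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: counts records per key, writes the count to HDFS as usual,
// and also updates a row in an HBase "stats" table from the same reduce call.
public class CountAndUpdateHBaseReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private Connection connection;
    private Table statsTable;

    @Override
    protected void setup(Context context) throws IOException {
        // Open the HBase connection once per reducer task.
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        connection = ConnectionFactory.createConnection(conf);
        statsTable = connection.getTable(TableName.valueOf("stats"));
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable v : values) {
            count += v.get();
        }

        // Normal HDFS output of the job.
        context.write(key, new LongWritable(count));

        // Side update to HBase inside the same job, so no second MR job is needed.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(Bytes.toBytes("s"), Bytes.toBytes("count"), Bytes.toBytes(count));
        statsTable.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        statsTable.close();
        connection.close();
    }
}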

There can be any number of complex scenarios like this where your custom code turns out to be more efficient and performant.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: yogesh dhari <yo...@live.com>
Date: Thu, 8 Nov 2012 00:37:44 
To: hadoop helpforoum<us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: Map-Reduce V/S Hadoop Ecosystem


Thanks Bejoy Sir,

I am always grateful to u for your help.
Please explain these word into simple language with some case (if possible)

" If your requirement is that complex and you need very low level control 
of your code mapreduce is better. If you are an expert in mapreduce your
 code can be efficient as yours would very specific to your app but the 
MR in hive and pig may be more generic.
"
If my requirement is complex. I will prefer Ecosystem(bcoz it provide simple interface to run Map-Reduce).  
What is low level control of code. belongs here?? plz provide an example..

please do explain it into simple way so that I can understand your point of view. 

Regards 
Yogesh Kumar



> Subject: Re: Map-Reduce V/S Hadoop Ecosystem
> To: user@hadoop.apache.org
> From: bejoy.hadoop@gmail.com
> Date: Wed, 7 Nov 2012 18:24:52 +0000
> 
> Hi Yogesh,
> 
> The development time in Pig and hive are pretty less compared to its equivalent mapreduce code and for generic cases it is very efficient. 
> If your requirement is that complex and you need very low level control of your code mapreduce is better. If you are an expert in mapreduce your code can be efficient as yours would very specific to your app but the MR in hive and pig may be more generic.
> 
> To just write your custom mapreduce functions, just basic knowledge on java is good. As you are better with java you can understand the internals better.
> 
> 
> Regards
> Bejoy KS
> 
> Sent from handheld, please excuse typos.
> 
> -----Original Message-----
> From: <yo...@wipro.com>
> Date: Wed, 7 Nov 2012 15:33:07 
> To: <us...@hadoop.apache.org>
> Reply-To: user@hadoop.apache.org
> Subject: Map-Reduce V/S Hadoop Ecosystem
> 
> Hello Hadoop Champs,
> 
> Please give some suggestion..
> 
> As Hadoop Ecosystem(Hive, Pig...) internally do Map-Reduce to process.
> 
> My Question is
> 
> 1). where Map-Reduce program(written in Java, python etc) are overtaking Hadoop Ecosystem.
> 
> 2). Limitations of Hadoop Ecosystem comparing with Writing Map-Reduce program.
> 
> 3) for writing Map-Reduce jobs in java how much we need to have skills in java out of 10 (?/10)
> 
> 
> Please put some light over it.
> 
> 
> Thanks & Regards
> Yogesh Kumar
> 
> 

RE: Map-Reduce V/S Hadoop Ecosystem

Posted by yogesh dhari <yo...@live.com>.
Thanks Bejoy Sir,

I am always grateful to you for your help.
Please explain these words in simple language, with an example case if possible:

" If your requirement is that complex and you need very low level control 
of your code mapreduce is better. If you are an expert in mapreduce your
 code can be efficient as yours would very specific to your app but the 
MR in hive and pig may be more generic.
"
If my requirement is complex, I would prefer the ecosystem tools (because they provide a simple interface to run MapReduce).
What does "low level control" of the code mean here? Please provide an example.

Please explain it in a simple way so that I can understand your point of view.

Regards 
Yogesh Kumar



> Subject: Re: Map-Reduce V/S Hadoop Ecosystem
> To: user@hadoop.apache.org
> From: bejoy.hadoop@gmail.com
> Date: Wed, 7 Nov 2012 18:24:52 +0000
> 
> Hi Yogesh,
> 
> The development time in Pig and hive are pretty less compared to its equivalent mapreduce code and for generic cases it is very efficient. 
> If your requirement is that complex and you need very low level control of your code mapreduce is better. If you are an expert in mapreduce your code can be efficient as yours would very specific to your app but the MR in hive and pig may be more generic.
> 
> To just write your custom mapreduce functions, just basic knowledge on java is good. As you are better with java you can understand the internals better.
> 
> 
> Regards
> Bejoy KS
> 
> Sent from handheld, please excuse typos.
> 
> -----Original Message-----
> From: <yo...@wipro.com>
> Date: Wed, 7 Nov 2012 15:33:07 
> To: <us...@hadoop.apache.org>
> Reply-To: user@hadoop.apache.org
> Subject: Map-Reduce V/S Hadoop Ecosystem
> 
> Hello Hadoop Champs,
> 
> Please give some suggestion..
> 
> As Hadoop Ecosystem(Hive, Pig...) internally do Map-Reduce to process.
> 
> My Question is
> 
> 1). where Map-Reduce program(written in Java, python etc) are overtaking Hadoop Ecosystem.
> 
> 2). Limitations of Hadoop Ecosystem comparing with Writing Map-Reduce program.
> 
> 3) for writing Map-Reduce jobs in java how much we need to have skills in java out of 10 (?/10)
> 
> 
> Please put some light over it.
> 
> 
> Thanks & Regards
> Yogesh Kumar
> 
> 

Re: Map-Reduce V/S Hadoop Ecosystem

Posted by Bejoy KS <be...@gmail.com>.
Hi Yogesh,

Development time in Pig and Hive is much less than for the equivalent MapReduce code, and for generic cases they are very efficient.
If your requirement is complex and you need very low-level control over your code, MapReduce is better. If you are an expert in MapReduce, your code can be more efficient because it is specific to your application, whereas the MR generated by Hive and Pig is more generic.
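
For example (an illustrative sketch only; the class name and key types are made up), "low-level control" means things like choosing your own partitioner and the number of reducers on the Job, decisions that Hive and Pig make for you:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative only: a custom partitioner controlling which reducer
// receives which key, a decision Hive/Pig normally make for you.
public class CustomerPartitioner extends Partitioner<Text, LongWritable> {

    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        // Route all records for one customer id to the same reducer.
        return (key.toString().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Low-level knobs you set yourself when writing raw MapReduce.
    public static void applyTo(Job job) {
        job.setPartitionerClass(CustomerPartitioner.class);
        job.setNumReduceTasks(8);  // pick reducer parallelism explicitly
    }
}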

To write your own custom MapReduce functions, basic Java knowledge is enough; the better you are with Java, the better you will understand the internals.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: <yo...@wipro.com>
Date: Wed, 7 Nov 2012 15:33:07 
To: <us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Map-Reduce V/S Hadoop Ecosystem

Hello Hadoop Champs,

Please give some suggestion..

As Hadoop Ecosystem(Hive, Pig...) internally do Map-Reduce to process.

My Question is

1). where Map-Reduce program(written in Java, python etc) are overtaking Hadoop Ecosystem.

2). Limitations of Hadoop Ecosystem comparing with Writing Map-Reduce program.

3) for writing Map-Reduce jobs in java how much we need to have skills in java out of 10 (?/10)


Please put some light over it.


Thanks & Regards
Yogesh Kumar


