Posted to common-user@hadoop.apache.org by Sebastiano Di Paola <se...@gmail.com> on 2014/08/10 19:23:41 UTC

Yarn, MRv1, MRv2 lots of newbie doubts and questions

Hi all,
I'm a newbie Hadoop user, and Hadoop 2.4.1 is my first installation.
Now I'm struggling with mapred, mapreduce, YARN... MRv1, MRv2, YARN.
I tried to read the documentation, but I couldn't find a clear
answer... sometimes it seems the documentation assumes you already know
the whole history of the Hadoop framework... :(

I started with a standalone node of course, but I have also deployed a
cluster of 10 machines.

Let's start with the example from the documentation.

Cluster installed... HDFS running, started with
start-dfs.sh

When I run

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

what am I using? MRv1 or MRv2?
The job executes successfully and I can read the output in the HDFS output
directory.


Then, on the same installation, I start YARN with start-yarn.sh
and run the same command again:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

So what am I using in this case?

I'm not sure what the difference between MapReduce and YARN is...
presumably MapReduce runs on top of YARN? How does MapReduce interact
with YARN? Is it completely transparent?

What's the difference between a MapReduce application and a YARN
application? (Forgive me if it's not correct to talk about a "MapReduce
application".)

Besides that... when writing a completely new MapReduce application, which
API should be used so as not to write deprecated/old-style Hadoop code:
mapred or mapreduce?
Thanks a lot.
Kind regards.
Seba

Re: Yarn, MRv1, MRv2 lots of newbie doubts and questions

Posted by Nicolas Maillard <nm...@hortonworks.com>.
Hello

The Hadoop ecosystem moves fast, and the YARN part was a mini revolution,
so I understand your confusion.
To keep it simple: in Hadoop 1 there were two main pieces, Hadoop MapReduce
and Hadoop HDFS.
Hadoop MR was actually two things: a compute paradigm (map-reduce) and a
process for distributing that paradigm across the cluster. So MR ran the
map and reduce phases, but it also had to talk to all the machines to get
compute slots in the right places. This meant that to use that distribution
process you had to go through the MapReduce paradigm, since the two were
bundled together.
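
Concretely, the cluster-side half of that bundle in Hadoop 1 was the
JobTracker, which job clients located via mapred.job.tracker in
mapred-site.xml. A minimal MRv1-style sketch; the hostname and port here
are placeholders, not defaults:

<property>
  <name>mapred.job.tracker</name>
  <!-- placeholder address: point this at your JobTracker -->
  <value>jobtracker-host:8021</value>
</property>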

In Hadoop 2 you have MapReduce 2, which is just the paradigm, and YARN,
which does the distribution. The added bonus is that you can now use
whatever paradigm you want and talk to YARN to get the distribution. So you
can still write MapReduce code if you want, but you can now also run other
things like Tez, Spark, Giraph etc., and they all use YARN as the way to
get distributed cleanly on the cluster.
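
To tie this back to your example: which framework the hadoop jar command
submits to is controlled by mapreduce.framework.name in mapred-site.xml.
A minimal sketch of just that property (the rest of the configuration is
omitted):

<property>
  <name>mapreduce.framework.name</name>
  <!-- "local" (the default): single-JVM LocalJobRunner, no daemons needed
       "classic": the old MRv1 JobTracker/TaskTracker runtime
       "yarn": MRv2, i.e. MapReduce submitted to the YARN ResourceManager -->
  <value>yarn</value>
</property>

So with an out-of-the-box install and only start-dfs.sh running, the
examples jar runs locally; once this is set to yarn and start-yarn.sh is
up, the same command goes through YARN.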

On the API question, YARN has also changed the game: you now want to use
the paradigm or engine of your choice according to what best fits your
computation; DAG or not, in-memory or not, graph or not, etc.
I would advise going through higher-level APIs that let you write your
logic and then choose the engine you need; Cascading, for example, is nice
for that. Hive as well lets you write SQL and then decide later what you
need underneath: MapReduce, Tez, and in the near future Spark, etc.
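
If you do end up writing plain MapReduce directly, prefer the newer
org.apache.hadoop.mapreduce package (Job, Mapper, Reducer) over the older
org.apache.hadoop.mapred one (JobConf etc.), which is kept mainly for
backward compatibility. A rough sketch of a driver against the new API;
the class name and argument handling are just illustrative:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count, written against the new (mapreduce) API only.
public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (token, 1) for every whitespace-separated token in the line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      // Sum the counts for each token.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count"); // new-API entry point
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Whether that job then runs locally, on MRv1 or on YARN is decided by the
cluster configuration, not by the code, which is exactly the transparency
you were asking about.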

I hope this helps



