Posted to mapreduce-user@hadoop.apache.org by Anthony Mattas <an...@mattas.net> on 2014/03/05 20:06:30 UTC

Impact of Tez/Spark to MapReduce

With Tez and Spark becoming mainstream, what does MapReduce look like
longer term? Will it become a component that sits on top of Tez, or will
they continue to live side by side utilizing YARN?

I'm struggling a little bit to understand what the roadmap looks like for
the technologies that sit on top of YARN.


Anthony Mattas
anthony@mattas.net

Re: Impact of Tez/Spark to MapReduce

Posted by "Emil A. Siemes" <es...@hortonworks.com>.

I think it is necessary to look at the question from multiple angles:

First there is MapReduce as a computing paradigm.
Second there is the MapReduce API.
And third you have an implementation.

My belief is that the computing paradigm is not going away anytime soon. It's a fundamental approach to distributed computing, though not the only one.
The API should also remain quite stable, so our applications will continue to work. I think it's also a safe bet that there will be more high-level APIs that make developers more productive but call MapReduce internally.
And then there is the implementation, which will indeed use Tez to execute the map and reduce tasks sooner rather than later. This is transparent to the developer: their app simply executes faster.
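
To make the first of these angles concrete, here is a minimal sketch of MapReduce as a paradigm, in plain Python with no Hadoop involved. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration and are not part of the Hadoop API. The point is only that the map, group-by-key and reduce steps can be described independently of whichever engine runs them.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count(lines):
    """Hypothetical example job: count word occurrences across lines."""
    def mapper(line):
        return ((word, 1) for word in line.split())

    def reducer(_key, counts):
        return sum(counts)

    return reduce_phase(shuffle(map_phase(lines, mapper)), reducer)


if __name__ == "__main__":
    print(word_count(["tez spark yarn", "spark yarn", "yarn"]))
```

Everything here runs in one process. A real implementation distributes the same three steps across a cluster, which is the part that can change underneath a stable API.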

Functional programming will certainly play an important role, but I doubt it will be the only style. For example, Scala is big, but it has not eliminated Java, Java EE or Spring over the last 10 years.

And that's great, isn't it? Java/JVM has always been about developer freedom: platform, language, APIs, frameworks, implementations and so on. You pick what makes you most productive.

Just my few cents...
Emil



On Mar 6, 2014, at 2:57 AM, Anthony Mattas <an...@mattas.net> wrote:

> Unfortunately I’m not very familiar with Spark. I guess my curiosity stems from a deep-seated belief that big-iron EDW-type appliances are slowly going to fade out, so I’m trying to get my head around what that’s going to look like in the next few years.
> 
> Hive (Stinger) + Tez + YARN seems very promising. Impala does as well, but I’m not sure whether the more open Hive solution will be preferred longer term. Does MapReduce still exist at that point, or does it slowly fade away? (I would assume it’s still around, because there are a lot of unique things you can do with MR today that aren’t easily accomplished in other frameworks.)
> 
> On Mar 5, 2014, at 8:48 PM, Jeff Zhang <je...@gopivotal.com> wrote:
> 
>> I believe that in the future the Spark functional-style API will dominate the big data world. Very few people will use the native MapReduce API. Even now, users usually use a third-party MapReduce library such as Cascading, Scalding or Scoobi, or a scripting language such as Hive or Pig, rather than the native MapReduce API.
>> And this functional style of API is compatible with both Hadoop's MapReduce and Spark's RDD. The underlying execution engine will be transparent to users. So I guess, or I hope, that in the future the API will be unified while the underlying execution engine will be chosen intelligently according to the resources you have and the metadata of the data you operate on.
>> 
>> 
>> On Thu, Mar 6, 2014 at 9:02 AM, Edward Capriolo <ed...@gmail.com> wrote:
>> The thing about YARN is that you choose what is right for the workload.
>> 
>> For example, Spark may not be the right choice if join tables do not fit in memory.
>> 
>> 
>> On Wednesday, March 5, 2014, Anthony Mattas <an...@mattas.net> wrote:
>> > With Tez and Spark becoming mainstream what does Map Reduce look like longer term? Will it become a component that sits on top of Tez, or will they continue to live side by side utilizing YARN?
>> > I'm struggling a little bit to understand what the roadmap looks like for the technologies that sit on top of YARN.
>> >
>> > Anthony Mattas
>> > anthony@mattas.net
>> 
>> -- 
>> Sorry this was sent from mobile. Will do less grammar and spell check than usual.
>> 
> 

Emil Andreas Siemes
Sr. Solution Engineer
Hortonworks Inc.
esiemes@hortonworks.com
+49 176 72590764



Re: Impact of Tez/Spark to MapReduce

Posted by Anthony Mattas <an...@mattas.net>.

Unfortunately I’m not very familiar with Spark. I guess my curiosity stems from a deep-seated belief that big-iron EDW-type appliances are slowly going to fade out, so I’m trying to get my head around what that’s going to look like in the next few years.

Hive (Stinger) + Tez + YARN seems very promising. Impala does as well, but I’m not sure whether the more open Hive solution will be preferred longer term. Does MapReduce still exist at that point, or does it slowly fade away? (I would assume it’s still around, because there are a lot of unique things you can do with MR today that aren’t easily accomplished in other frameworks.)

On Mar 5, 2014, at 8:48 PM, Jeff Zhang <je...@gopivotal.com> wrote:

> I believe that in the future the Spark functional-style API will dominate the big data world. Very few people will use the native MapReduce API. Even now, users usually use a third-party MapReduce library such as Cascading, Scalding or Scoobi, or a scripting language such as Hive or Pig, rather than the native MapReduce API.
> And this functional style of API is compatible with both Hadoop's MapReduce and Spark's RDD. The underlying execution engine will be transparent to users. So I guess, or I hope, that in the future the API will be unified while the underlying execution engine will be chosen intelligently according to the resources you have and the metadata of the data you operate on.
> 
> 
> On Thu, Mar 6, 2014 at 9:02 AM, Edward Capriolo <ed...@gmail.com> wrote:
> The thing about YARN is that you choose what is right for the workload.
> 
> For example, Spark may not be the right choice if join tables do not fit in memory.
> 
> 
> On Wednesday, March 5, 2014, Anthony Mattas <an...@mattas.net> wrote:
> > With Tez and Spark becoming mainstream what does Map Reduce look like longer term? Will it become a component that sits on top of Tez, or will they continue to live side by side utilizing YARN?
> > I'm struggling a little bit to understand what the roadmap looks like for the technologies that sit on top of YARN.
> >
> > Anthony Mattas
> > anthony@mattas.net
> 
> -- 
> Sorry this was sent from mobile. Will do less grammar and spell check than usual.
>

Re: Impact of Tez/Spark to MapReduce

Posted by Jeff Zhang <je...@gopivotal.com>.

I believe that in the future the Spark functional-style API will dominate
the big data world. Very few people will use the native MapReduce API. Even
now, users usually use a third-party MapReduce library such as Cascading,
Scalding or Scoobi, or a scripting language such as Hive or Pig, rather
than the native MapReduce API.
And this functional style of API is compatible with both Hadoop's MapReduce
and Spark's RDD. The underlying execution engine will be transparent to
users. So I guess, or I hope, that in the future the API will be unified
while the underlying execution engine will be chosen intelligently according
to the resources you have and the metadata of the data you operate on.
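
As a rough illustration of that idea, here is a toy sketch in Python of one functional-style pipeline that can be handed to different execution engines. Every name in it (Pipeline, eager_engine, lazy_engine) is hypothetical. It mirrors neither the Spark nor the Hadoop API and only shows how the user-facing API can stay the same while the engine underneath is swapped.

```python
class Pipeline:
    """Records a chain of map/filter steps without executing them."""

    def __init__(self, data, steps=()):
        self.data = data
        self.steps = tuple(steps)

    def map(self, fn):
        return Pipeline(self.data, self.steps + (("map", fn),))

    def filter(self, fn):
        return Pipeline(self.data, self.steps + (("filter", fn),))

    def run(self, engine):
        # The caller picks the engine; the pipeline definition is unchanged.
        return engine(self.data, self.steps)


def eager_engine(data, steps):
    """Materialise a full list after every step (stage by stage)."""
    items = list(data)
    for kind, fn in steps:
        if kind == "map":
            items = [fn(x) for x in items]
        else:
            items = [x for x in items if fn(x)]
    return items


def lazy_engine(data, steps):
    """Chain iterators so each element streams through all steps."""
    stream = iter(data)
    for kind, fn in steps:
        stream = map(fn, stream) if kind == "map" else filter(fn, stream)
    return list(stream)


if __name__ == "__main__":
    squares_of_evens = (
        Pipeline(range(6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    )
    print(squares_of_evens.run(eager_engine))
    print(squares_of_evens.run(lazy_engine))
```

Both engines return the same result for the same pipeline. Choosing between them automatically, based on resources and metadata, is the part this sketch leaves out.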

On Thu, Mar 6, 2014 at 9:02 AM, Edward Capriolo <ed...@gmail.com> wrote:

> The thing about YARN is that you choose what is right for the workload.
>
> For example, Spark may not be the right choice if join tables do not fit
> in memory.
>
>
> On Wednesday, March 5, 2014, Anthony Mattas <an...@mattas.net> wrote:
> > With Tez and Spark becoming mainstream what does Map Reduce look like
> > longer term? Will it become a component that sits on top of Tez, or will
> > they continue to live side by side utilizing YARN?
> > I'm struggling a little bit to understand what the roadmap looks like
> > for the technologies that sit on top of YARN.
> >
> > Anthony Mattas
> > anthony@mattas.net
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>

Re: Impact of Tez/Spark to MapReduce

Posted by Edward Capriolo <ed...@gmail.com>.

The thing about YARN is that you choose what is right for the workload.

For example, Spark may not be the right choice if join tables do not fit
in memory.

On Wednesday, March 5, 2014, Anthony Mattas <an...@mattas.net> wrote:
> With Tez and Spark becoming mainstream what does Map Reduce look like
> longer term? Will it become a component that sits on top of Tez, or will
> they continue to live side by side utilizing YARN?
> I'm struggling a little bit to understand what the roadmap looks like for
> the technologies that sit on top of YARN.
>
> Anthony Mattas
> anthony@mattas.net

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
