Posted to common-user@hadoop.apache.org by Ashutosh Kumar <as...@gmail.com> on 2015/04/10 07:08:58 UTC

Hadoop or spark

How do I decide whether to go with Hadoop or Spark for a greenfield
project? I tried to find out, and it looks like Spark can do everything that
Hadoop can do. I would appreciate your thoughts on it.

Thanks

Re: Hadoop or spark

Posted by Ashutosh Kumar <as...@gmail.com>.
Thanks. I read this article, and it seems that for all practical purposes Spark
is preferred over Hadoop MapReduce. Only when processing very large files does
Hadoop MapReduce score over Spark. But what counts as a large file here? Is it
TBs or PBs, or does it vary with cluster size? Please share your views.

Thanks
Ashutosh


On Fri, Apr 10, 2015 at 8:23 PM, Moty Michaely <mo...@xplenty.com> wrote:

> Hey,
>
> Xplenty's CTO wrote a good piece of comparison between the two:
>
> https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social
>
> Hope this helps with deciding.
>
> Good luck!
>
> On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> Thanks for this. Slide# 77 and 87 are pretty good. Quite a few of it,  is
>> new stuff and still emerging.
>>
>> Regards,
>> Shahab
>>
>> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
>> wrote:
>>
>>> There actually is such a discussion, e.g.:
>>>
>>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>>
>>> you can have a standalone Spark cluster with no dependency on Hadoop.
>>>
>>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
>>> wrote:
>>>
>>>> I hope I am not misunderstanding your question but I don't think there
>>>> is a comparison between Spark and Hadoop. They are different things.
>>>>
>>>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>>>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>>>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>>>
>>>> I think a more apt comparison would be something like whether you
>>>> should use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>>>
>>>> Or even more direct would be Spark vs. Storm, which has been discussed
>>>> here.
>>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <ashutosh.k78@gmail.com
>>>> > wrote:
>>>>
>>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>>> greenfield project . I tried to find out and looks like Spark can do
>>>>> everything that hadoop can do. Appreciate your thoughts on it.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
>
> Moty Michaely
>
> VP R&D, Xplenty
>
>
>

Re: Hadoop or spark

Posted by Moty Michaely <mo...@xplenty.com>.
Hey,

Xplenty's CTO wrote a good comparison of the two:
https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social

Hope this helps with deciding.

Good luck!

On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Thanks for this. Slide# 77 and 87 are pretty good. Quite a few of it,  is
> new stuff and still emerging.
>
> Regards,
> Shahab
>
> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
> wrote:
>
>> There actually is such a discussion, e.g.:
>>
>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>
>> you can have a standalone Spark cluster with no dependency on Hadoop.
>>
>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>>
>>> I hope I am not misunderstanding your question but I don't think there
>>> is a comparison between Spark and Hadoop. They are different things.
>>>
>>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>>
>>> I think a more apt comparison would be something like whether you should
>>> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>>
>>> Or even more direct would be Spark vs. Storm, which has been discussed
>>> here.
>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>
>>> Regards,
>>> Shahab
>>>
>>>
>>>
>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
>>> wrote:
>>>
>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>> greenfield project . I tried to find out and looks like Spark can do
>>>> everything that hadoop can do. Appreciate your thoughts on it.
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>
>


-- 

Moty Michaely

VP R&D, Xplenty

Re: Hadoop or spark

Posted by Shahab Yunus <sh...@gmail.com>.
Thanks for this. Slides 77 and 87 are pretty good. Quite a lot of it is
new stuff and still emerging.

Regards,
Shahab

On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
wrote:

> There actually is such a discussion, e.g.:
>
> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>
> you can have a standalone Spark cluster with no dependency on Hadoop.
>
> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> I hope I am not misunderstanding your question but I don't think there is
>> a comparison between Spark and Hadoop. They are different things.
>>
>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>
>> I think a more apt comparison would be something like whether you should
>> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>
>> Or even more direct would be Spark vs. Storm, which has been discussed
>> here.
>> http://marc.info/?l=hadoop-user&m=140434265901449
>>
>> Regards,
>> Shahab
>>
>>
>>
>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
>> wrote:
>>
>>> How do I decide whether I should go for Hadoop or Spark for a greenfield
>>> project . I tried to find out and looks like Spark can do everything that
>>> hadoop can do. Appreciate your thoughts on it.
>>>
>>> Thanks
>>>
>>>
>>
>

Re: Hadoop or spark

Posted by Peyman Mohajerian <mo...@gmail.com>.
There actually is such a discussion, e.g.:
http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi

You can have a standalone Spark cluster with no dependency on Hadoop.

On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> I hope I am not misunderstanding your question but I don't think there is
> a comparison between Spark and Hadoop. They are different things.
>
> Hadoop is a platform on which you can run Yarn, HBase and even Spark. E.g.
> Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as part
> of its installation. Spark can run within a Hadoop cluster deployment.
>
> I think a more apt comparison would be something like whether you should
> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>
> Or even more direct would be Spark vs. Storm, which has been discussed
> here.
> http://marc.info/?l=hadoop-user&m=140434265901449
>
> Regards,
> Shahab
>
>
>
> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
> wrote:
>
>> How do I decide whether I should go for Hadoop or Spark for a greenfield
>> project . I tried to find out and looks like Spark can do everything that
>> hadoop can do. Appreciate your thoughts on it.
>>
>> Thanks
>>
>>
>

Re: Hadoop or spark

Posted by Shahab Yunus <sh...@gmail.com>.
I hope I am not misunderstanding your question, but I don't think Spark and
Hadoop are directly comparable. They are different things.

Hadoop is a platform on which you can run YARN, HBase and even Spark. E.g.
Cloudera's Hadoop distribution has Spark, HBase, Impala, Pig etc. as part
of its installation. Spark can run within a Hadoop cluster deployment.

I think a more apt comparison would be something like whether you should
use regular MapReduce on YARN on Hadoop or Spark on Hadoop.

An even more direct comparison would be Spark vs. Storm, which has been
discussed here:
http://marc.info/?l=hadoop-user&m=140434265901449

Regards,
Shahab



On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
wrote:

> How do I decide whether I should go for Hadoop or Spark for a greenfield
> project . I tried to find out and looks like Spark can do everything that
> hadoop can do. Appreciate your thoughts on it.
>
> Thanks
>
>
