Posted to hdfs-user@hadoop.apache.org by Ashutosh Kumar <as...@gmail.com> on 2015/04/10 07:08:58 UTC
Hadoop or spark
How do I decide whether I should go for Hadoop or Spark for a greenfield
project? I tried to find out, and it looks like Spark can do everything that
Hadoop can do. I would appreciate your thoughts on it.
Thanks
Re: Hadoop or spark
Posted by Ashutosh Kumar <as...@gmail.com>.
Thanks. I read this article, and it seems that for all practical purposes
Spark is preferred over Hadoop MapReduce. Only when processing very large
files does Hadoop MapReduce score over Spark. But what counts as a large
file size? Is it TBs or PBs, or does it vary with cluster size? Please
share your views.
Thanks
Ashutosh
On Fri, Apr 10, 2015 at 8:23 PM, Moty Michaely <mo...@xplenty.com> wrote:
> Hey,
>
> Xplenty's CTO wrote a good piece of comparison between the two:
>
> https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social
>
> Hope this helps with deciding.
>
> Good luck!
>
> On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> Thanks for this. Slide# 77 and 87 are pretty good. Quite a few of it, is
>> new stuff and still emerging.
>>
>> Regards,
>> Shahab
>>
>> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
>> wrote:
>>
>>> There actually is such a discussion, e.g.:
>>>
>>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>>
>>> you can have a standalone Spark cluster with no dependency on Hadoop.
>>>
>>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
>>> wrote:
>>>
>>>> I hope I am not misunderstanding your question but I don't think there
>>>> is a comparison between Spark and Hadoop. They are different things.
>>>>
>>>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>>>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>>>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>>>
>>>> I think a more apt comparison would be something like whether you
>>>> should use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>>>
>>>> Or even more direct would be Spark vs. Storm, which has been discussed
>>>> here.
>>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>>
>>>>
>>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <ashutosh.k78@gmail.com
>>>> > wrote:
>>>>
>>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>>> greenfield project . I tried to find out and looks like Spark can do
>>>>> everything that hadoop can do. Appreciate your thoughts on it.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
>
> Moty Michaely
>
> VP R&D, Xplenty
>
>
>
Re: Hadoop or spark
Posted by Moty Michaely <mo...@xplenty.com>.
Hey,
Xplenty's CTO wrote a good comparison of the two:
https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/?utm_source=hadoop-mailing-group&utm_medium=email&utm_campaign=social
Hope this helps with deciding.
Good luck!
On Fri, Apr 10, 2015 at 4:28 PM, Shahab Yunus <sh...@gmail.com>
wrote:
> Thanks for this. Slide# 77 and 87 are pretty good. Quite a few of it, is
> new stuff and still emerging.
>
> Regards,
> Shahab
>
> On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
> wrote:
>
>> There actually is such a discussion, e.g.:
>>
>> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>>
>> you can have a standalone Spark cluster with no dependency on Hadoop.
>>
>> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>>
>>> I hope I am not misunderstanding your question but I don't think there
>>> is a comparison between Spark and Hadoop. They are different things.
>>>
>>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>>
>>> I think a more apt comparison would be something like whether you should
>>> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>>
>>> Or even more direct would be Spark vs. Storm, which has been discussed
>>> here.
>>> http://marc.info/?l=hadoop-user&m=140434265901449
>>>
>>> Regards,
>>> Shahab
>>>
>>>
>>>
>>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
>>> wrote:
>>>
>>>> How do I decide whether I should go for Hadoop or Spark for a
>>>> greenfield project . I tried to find out and looks like Spark can do
>>>> everything that hadoop can do. Appreciate your thoughts on it.
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>
>
--
Moty Michaely
VP R&D, Xplenty
Re: Hadoop or spark
Posted by Shahab Yunus <sh...@gmail.com>.
Thanks for this. Slides 77 and 87 are pretty good. Quite a lot of it is
new stuff and still emerging.
Regards,
Shahab
On Fri, Apr 10, 2015 at 9:10 AM, Peyman Mohajerian <mo...@gmail.com>
wrote:
> There actually is such a discussion, e.g.:
>
> http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
>
> you can have a standalone Spark cluster with no dependency on Hadoop.
>
> On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> I hope I am not misunderstanding your question but I don't think there is
>> a comparison between Spark and Hadoop. They are different things.
>>
>> Hadoop is a platform on which you can run Yarn, HBase and even Spark.
>> E.g. Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as
>> part of its installation. Spark can run within a Hadoop cluster deployment.
>>
>> I think a more apt comparison would be something like whether you should
>> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>>
>> Or even more direct would be Spark vs. Storm, which has been discussed
>> here.
>> http://marc.info/?l=hadoop-user&m=140434265901449
>>
>> Regards,
>> Shahab
>>
>>
>>
>> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
>> wrote:
>>
>>> How do I decide whether I should go for Hadoop or Spark for a greenfield
>>> project . I tried to find out and looks like Spark can do everything that
>>> hadoop can do. Appreciate your thoughts on it.
>>>
>>> Thanks
>>>
>>>
>>
>
Re: Hadoop or spark
Posted by Peyman Mohajerian <mo...@gmail.com>.
There actually is such a discussion, e.g.:
http://www.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi
You can have a standalone Spark cluster with no dependency on Hadoop.
On Fri, Apr 10, 2015 at 5:47 AM, Shahab Yunus <sh...@gmail.com>
wrote:
> I hope I am not misunderstanding your question but I don't think there is
> a comparison between Spark and Hadoop. They are different things.
>
> Hadoop is a platform on which you can run Yarn, HBase and even Spark. E.g.
> Cloudera's Hadoop distribution has Spark, Hbase, Impala, Pig etc. as part
> of its installation. Spark can run within a Hadoop cluster deployment.
>
> I think a more apt comparison would be something like whether you should
> use regular MapReduce on Yarn on Hadoop OR Spark on Hadoop.
>
> Or even more direct would be Spark vs. Storm, which has been discussed
> here.
> http://marc.info/?l=hadoop-user&m=140434265901449
>
> Regards,
> Shahab
>
>
>
> On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
> wrote:
>
>> How do I decide whether I should go for Hadoop or Spark for a greenfield
>> project . I tried to find out and looks like Spark can do everything that
>> hadoop can do. Appreciate your thoughts on it.
>>
>> Thanks
>>
>>
>
Re: Hadoop or spark
Posted by Shahab Yunus <sh...@gmail.com>.
I hope I am not misunderstanding your question, but I don't think Spark and
Hadoop are directly comparable. They are different things.
Hadoop is a platform on which you can run YARN, HBase and even Spark. E.g.
Cloudera's Hadoop distribution has Spark, HBase, Impala, Pig etc. as part
of its installation. Spark can run within a Hadoop cluster deployment.
I think a more apt comparison would be whether you should use regular
MapReduce on YARN on Hadoop or Spark on Hadoop.
An even more direct comparison would be Spark vs. Storm, which has been
discussed here:
http://marc.info/?l=hadoop-user&m=140434265901449
Regards,
Shahab
On Fri, Apr 10, 2015 at 1:08 AM, Ashutosh Kumar <as...@gmail.com>
wrote:
> How do I decide whether I should go for Hadoop or Spark for a greenfield
> project . I tried to find out and looks like Spark can do everything that
> hadoop can do. Appreciate your thoughts on it.
>
> Thanks
>
>
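For readers new to the MapReduce model that this thread compares with Spark, here is a minimal pure-Python sketch of its three phases (map, shuffle, reduce), using word count as the example. It calls neither the Hadoop nor the Spark API. The function names and the sample input are made up for illustration, and everything runs in a single process.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, only for illustration.
sample = ["Spark can run on Hadoop", "Hadoop can run MapReduce"]
counts = word_count(sample)
print(counts)
```

Broadly speaking, on a real cluster Hadoop MapReduce writes the output of each phase to disk before the next one starts, while Spark can keep intermediate data in memory across chained steps. That difference is usually what "MapReduce vs. Spark" comparisons are about.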