
Posted to user@spark.apache.org by Dean Wampler <de...@gmail.com> on 2014/05/01 00:25:19 UTC

My talk on "Spark: The Next Top (Compute) Model"

I meant to post this sooner: this is a talk I gave at the Philly ETE conf.
last week:

http://www.slideshare.net/deanwampler/spark-the-next-top-compute-model

Also here:

http://polyglotprogramming.com/papers/Spark-TheNextTopComputeModel.pdf

dean

-- 
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com

Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by Dean Wampler <de...@gmail.com>.

I updated the uploads at both locations to fix slide 23. Thanks for the
feedback.

dean


On Thu, May 1, 2014 at 9:25 AM, diplomatic Guru <di...@gmail.com> wrote:

> Thanks Dean, very useful indeed!
>
> Best regards,
>
> Raj



Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by diplomatic Guru <di...@gmail.com>.

Thanks Dean, very useful indeed!

Best regards,

Raj


On 1 May 2014 14:46, Dean Wampler <de...@gmail.com> wrote:

> That's great! Thanks. Let me know if it works ;) or what I could improve
> to make it work.
>
> dean

Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by Dean Wampler <de...@gmail.com>.

That's great! Thanks. Let me know if it works ;) or what I could improve to
make it work.

dean


On Thu, May 1, 2014 at 8:45 AM, ZhangYi <yi...@thoughtworks.com> wrote:

> Very useful material. I am currently trying to persuade my client to choose
> Spark instead of Hadoop MapReduce. Your slides give me more evidence to
> support my opinion.



Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by ZhangYi <yi...@thoughtworks.com>.

Very useful material. I am currently trying to persuade my client to choose Spark instead of Hadoop MapReduce. Your slides give me more evidence to support my opinion.

--  
ZhangYi (张逸)
Developer
tel: 15023157626
blog: agiledon.github.com
weibo: tw张逸
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, May 1, 2014 at 9:18 PM, Daniel Darabos wrote:

> Cool intro, thanks! One question. On slide 23 it says "Standalone ("local" mode)". That sounds a bit confusing without hearing the talk.
>  
> Standalone mode is not local. It just does not depend on separate cluster management software. I think it's the best mode for EC2/GCE, because they provide a distributed filesystem anyway (S3/GCS). Why configure Hadoop if you don't have to?

Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by Dean Wampler <de...@gmail.com>.

Thanks for the clarification. I'll fix the slide. I've done a lot of
Scalding/Cascading programming where the two concepts are synonymous, but
clearly I was imposing my prejudices here ;)

dean


On Thu, May 1, 2014 at 8:18 AM, Daniel Darabos <daniel.darabos@lynxanalytics.com> wrote:

> Cool intro, thanks! One question. On slide 23 it says "Standalone ("local"
> mode)". That sounds a bit confusing without hearing the talk.
>
> Standalone mode is not local. It just does not depend on separate cluster
> management software. I think it's the best mode for EC2/GCE, because they
> provide a distributed filesystem anyway (S3/GCS). Why configure Hadoop if
> you don't have to?



Re: My talk on "Spark: The Next Top (Compute) Model"

Posted by Daniel Darabos <da...@lynxanalytics.com>.

Cool intro, thanks! One question. On slide 23 it says "Standalone ("local"
mode)". That sounds a bit confusing without hearing the talk.

Standalone mode is not local. It just does not depend on separate cluster
management software. I think it's the best mode for EC2/GCE, because they
provide a distributed filesystem anyway (S3/GCS). Why configure Hadoop if
you don't have to?
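
To make the distinction concrete, here is a minimal sketch in plain Python
that does not need a Spark installation. Spark picks its deployment mode from
the master URL the application is given: `local`, `local[N]` and `local[*]`
run everything in a single JVM on one machine, while `spark://HOST:PORT`
connects to a standalone master, which is Spark's own cluster manager. The
helper name `deploy_mode` and the host name `master-host` are hypothetical,
chosen only for illustration, and the sketch deliberately ignores every other
master URL form.

```python
import re

# Master URL forms as described in the Spark documentation:
#   local, local[N], local[*]  -> local mode (single JVM, no cluster)
#   spark://HOST:PORT          -> standalone mode (Spark's own cluster manager)
# deploy_mode is a hypothetical helper for illustration, not a Spark API.
_LOCAL = re.compile(r"^local(\[(\d+|\*)\])?$")
_STANDALONE = re.compile(r"^spark://[^:/\s]+:\d+$")


def deploy_mode(master: str) -> str:
    """Classify a Spark master URL as 'local', 'standalone', or 'other'."""
    if _LOCAL.match(master):
        return "local"
    if _STANDALONE.match(master):
        return "standalone"
    # Anything this sketch does not recognise (for example Mesos or YARN).
    return "other"


if __name__ == "__main__":
    # master-host is a made-up host name; 7077 is the default standalone port.
    for url in ["local", "local[4]", "local[*]", "spark://master-host:7077"]:
        print(url, "->", deploy_mode(url))
```

So a slide that labels standalone as "local" mixes up the first group of URLs
with the second.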

On Thu, May 1, 2014 at 12:25 AM, Dean Wampler <de...@gmail.com> wrote:

> I meant to post this sooner: this is a talk I gave at the Philly ETE conf.
> last week:
>
> http://www.slideshare.net/deanwampler/spark-the-next-top-compute-model
>
> Also here:
>
> http://polyglotprogramming.com/papers/Spark-TheNextTopComputeModel.pdf
>
> dean