Posted to dev@spark.apache.org by "Evan R. Sparks" <ev...@gmail.com> on 2014/11/23 17:55:45 UTC

Notes on writing complex spark applications

Hi all,

Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
working on a short document about writing high performance Spark
applications based on our experience developing MLlib, GraphX, ml-matrix,
pipelines, etc. It may be a useful document both for users and new Spark
developers - perhaps it should go on the wiki?

The document itself is here:
https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
and I've created SPARK-4565
<https://issues.apache.org/jira/browse/SPARK-4565> to track this.

- Evan

Re: Notes on writing complex spark applications

Posted by andy petrella <an...@gmail.com>.
Cool!

On Sun Nov 23 2014 at 5:58:03 PM Evan R. Sparks <ev...@gmail.com>
wrote:

> Hi all,
>
> Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> working on a short document about writing high performance Spark
> applications based on our experience developing MLlib, GraphX, ml-matrix,
> pipelines, etc. It may be a useful document both for users and new Spark
> developers - perhaps it should go on the wiki?
>
> The document itself is here:
> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> and I've created SPARK-4565
> <https://issues.apache.org/jira/browse/SPARK-4565> to track this.
>
> - Evan
>

Re: Notes on writing complex spark applications

Posted by "Evan R. Sparks" <ev...@gmail.com>.
Thanks Patrick,

You raise a good point - for this to be useful it's imperative that it be
updated as new versions of Spark come out.

My thought with putting it on the wiki was that it's lower friction for
community members to edit, but it likely won't have the same level of
quality control as the existing documentation.

At a higher level - some of these tips are best practices for writing
applications that depend on Spark. I'm wondering if a new document is in
order for things like "this is how you set up a project skeleton to link
against Spark" and "this is how you handle external libraries"? In the
past I've run into stumbling blocks such as getting classpaths correct
and trying to link against a different version of Akka. Those would be
useful to cover in such a document, in addition to some of the application
architecture suggestions we propose in *this* document.

- Evan

On Sun, Nov 23, 2014 at 9:02 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey Evan,
>
> It might be nice to merge this into existing documentation. In
> particular, a lot of this could serve to update the current tuning
> section and programming guides.
>
> It could also work to paste this wholesale as a reference for Spark
> users, but in that case it's less likely to get updated when other
> things change, or be found by users reading through the spark docs.
>
> - Patrick
>

Re: Notes on writing complex spark applications

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Evan,

It might be nice to merge this into existing documentation. In
particular, a lot of this could serve to update the current tuning
section and programming guides.

It could also work to paste this wholesale as a reference for Spark
users, but in that case it's less likely to get updated when other
things change, or be found by users reading through the spark docs.

- Patrick

On Sun, Nov 23, 2014 at 8:27 PM, Inkyu Lee <go...@gmail.com> wrote:
> Very helpful!!
>
> thank you very much!
>
> 2014-11-24 2:17 GMT+09:00 Sam Bessalah <sa...@gmail.com>:
>
>> Thanks Evan, this is great.
>> On Nov 23, 2014 5:58 PM, "Evan R. Sparks" <ev...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
>> > working on a short document about writing high performance Spark
>> > applications based on our experience developing MLlib, GraphX, ml-matrix,
>> > pipelines, etc. It may be a useful document both for users and new Spark
>> > developers - perhaps it should go on the wiki?
>> >
>> > The document itself is here:
>> >
>> >
>> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
>> > and I've created SPARK-4565
>> > <https://issues.apache.org/jira/browse/SPARK-4565> to track this.
>> >
>> > - Evan
>> >
>>


Re: Notes on writing complex spark applications

Posted by Inkyu Lee <go...@gmail.com>.
Very helpful!!

thank you very much!

2014-11-24 2:17 GMT+09:00 Sam Bessalah <sa...@gmail.com>:

> Thanks Evan, this is great.
> On Nov 23, 2014 5:58 PM, "Evan R. Sparks" <ev...@gmail.com> wrote:
>
> > Hi all,
> >
> > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> > working on a short document about writing high performance Spark
> > applications based on our experience developing MLlib, GraphX, ml-matrix,
> > pipelines, etc. It may be a useful document both for users and new Spark
> > developers - perhaps it should go on the wiki?
> >
> > The document itself is here:
> >
> >
> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> > and I've created SPARK-4565
> > <https://issues.apache.org/jira/browse/SPARK-4565> to track this.
> >
> > - Evan
> >
>

Re: Notes on writing complex spark applications

Posted by Sam Bessalah <sa...@gmail.com>.
Thanks Evan, this is great.
On Nov 23, 2014 5:58 PM, "Evan R. Sparks" <ev...@gmail.com> wrote:

> Hi all,
>
> Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been
> working on a short document about writing high performance Spark
> applications based on our experience developing MLlib, GraphX, ml-matrix,
> pipelines, etc. It may be a useful document both for users and new Spark
> developers - perhaps it should go on the wiki?
>
> The document itself is here:
>
> https://docs.google.com/document/d/1gEIawzRsOwksV_bq4je3ofnd-7Xu-u409mdW-RXTDnQ/edit?usp=sharing
> and I've created SPARK-4565
> <https://issues.apache.org/jira/browse/SPARK-4565> to track this.
>
> - Evan
>