Posted to dev@flink.apache.org by Stephan Ewen <se...@apache.org> on 2015/03/16 19:51:37 UTC

Improve the documentation of the Flink Architecture and internals

Hi all!

I would like to kick off an effort to improve the documentation of the Flink
Architecture and internals. This also means making the streaming
architecture more prominent in the docs.

Flink is quite a sophisticated stack, so we need to improve the presentation
of how it works - to the extent necessary to use Flink (and to appreciate
all the cool stuff that is happening). This should also come in handy for
new contributors.

As a general umbrella, we need to first decide where and how to organize
the documentation.

I propose putting the bulk of the documentation into the Wiki: we create
a dedicated section on Flink Internals with sub-pages for each component /
topic. To the docs, we add a general overview from which we link into the
Wiki.


 == These sections would go into the DOCS in the git repository ==

  - Overview of Program, pre-flight phase (type extraction, optimizer),
JobManager, TaskManager. Differences between streaming and batch. We can
realize this through one very nice picture with a few lines of text.

  - High-level architecture stack, different program representations (API
operators, common API DAG, optimizer DAG, parallel data flow (JobGraph /
Execution Graph))

  - (maybe) Parallelism and scheduling. This seems to be paramount for
users to understand.

  - Processes (JobManager, TaskManager, Webserver, WebClient, CLI client)



 == These sections would go into the WIKI ==

  - Project structure (maven projects, what is where, dependencies between
projects)

  - Component overview

    -> JobManager (InstanceManager, Scheduler, BLOB server, Library Cache,
Archiving)

    -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library Cache)

    -> Involved Actor Systems / Actors / Messages

  - Details about submitting a job (library upload, job graph submission,
execution graph setup, scheduling trigger)

  - Memory Management

  - Optimizer internals

  - Akka Setup specifics

  - Netty and pluggable data exchange strategies

  - Testing: Flink test clusters and unit test utilities

  - Developer How-To: Setting up Eclipse, IntelliJ, Travis

  - Step-by-step guide to add a new operator


I will go ahead and stub some sections in the Wiki.

As we discuss and agree/disagree with the outline, we can evolve the Wiki.

Greetings,
Stephan

Re: Improve the documentation of the Flink Architecture and internals

Posted by Henry Saputra <he...@gmail.com>.
I don't think so. But adding a new contributor's wiki username for edit
access is much easier than merging changes to the website, I hope.

- Henry

On Fri, Mar 20, 2015 at 8:53 AM, Maximilian Michels <mx...@apache.org> wrote:
> +1 for the initiative and for the wiki.
>
> At the moment, the wiki's barrier to entry is like the git repository's.
> Contributors need to explicitly ask committers for access to the wiki. Is
> there a way we could open up the wiki for contributors without having to
> face too much spam? (e.g. have changes approved before showing them in the
> wiki)
>

Re: Improve the documentation of the Flink Architecture and internals

Posted by Maximilian Michels <mx...@apache.org>.
+1 for the initiative and for the wiki.

At the moment, the wiki's barrier to entry is like the git repository's.
Contributors need to explicitly ask committers for access to the wiki. Is
there a way we could open up the wiki for contributors without having to
face too much spam? (e.g. have changes approved before showing them in the
wiki)


Re: Improve the documentation of the Flink Architecture and internals

Posted by Ufuk Celebi <uc...@apache.org>.
Thanks. I will have a look later :-)

+1 for the Wiki. I think the low overhead makes it easier to contribute not only for newcomers, but for committers as well. :-)

On 20 Mar 2015, at 12:46, Kostas Tzoumas <kt...@apache.org> wrote:

> I added a document for data exchange between tasks:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
> 
> Feel free to edit. I plan to link the class names to the class files in
> github.
> 
> On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <kt...@apache.org>
> wrote:
> 
>> +1 for the Wiki.
>> 
>> When these have been stabilized we can move them to the docs if we decide
>> to do so.
>> 
>> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>> 
>>> I have put my suggested version of an outline for the docs into the wiki.
>>> Regardless of where the docs end up (wiki or repository), we can use the
>>> wiki to outline the docs.
>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>> 
>>> Some pages contain a stub or outline, others are completely blank.
>>> 
>>> Not a complete list. Additions are welcome.
>>> 
>>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>> 
>>>> I think the Wiki has a much lower barrier to entry for fixing docs,
>>>> especially for external people. The docs, with the Jekyll setup, are
>>>> rather tricky. I would very much like all kinds of people to contribute
>>>> to the docs about the internals, not just the usual three suspects that
>>>> have done this so far.
>>>> 
>>>> Having a good landing page in the regular docs is exactly so that we do
>>>> not lose all the people that do not look into a wiki. The overview pages
>>>> for the internals need to be good and accessible and nicely link to the
>>>> wiki to "forward" people there.
>>>> 
>>>> The overhead of deciding what goes where should not be terribly large,
>>>> in my opinion, since there is no really "wrong" place to put it.
>>>> 
>>>> 
>>>> 
>>>> On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <al...@apache.org>
>>>> wrote:
>>>> 
>>>>> Why do you want to split stuff between the docs in the repository and
>>>>> the wiki? I for one would always be too lazy to check stuff in a wiki
>>>>> when there is also documentation. Plus, this would lead to
>>>>> additional overhead in deciding what goes where and syncing between
>>>>> the two places for documentation.
>>>>> 
>>>>> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>> Ah, I totally forgot to add to the internals:
>>>>>> 
>>>>>>  - Fault tolerance in Batch mode
>>>>>> 
>>>>>>  - Fault Tolerance in Streaming Mode, with state handling
>>>>>> 


Re: Improve the documentation of the Flink Architecture and internals

Posted by Maximilian Michels <mx...@apache.org>.
Very insightful post. Thank you!

On Sat, Mar 21, 2015 at 5:30 PM, Till Rohrmann <tr...@apache.org> wrote:

> I wrote some internal documentation for Akka and the distributed
> communication [1].
>
> Cheers,
>
> Till
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
>
> On Fri, Mar 20, 2015 at 7:31 PM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > Ah, the infra bot just tweeted about extended downtime for Confluence [1]
> >
> > - Henry
> >
> > [1] https://twitter.com/infrabot/status/578983473970475008
> >
> > On Fri, Mar 20, 2015 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
> > > For me as well. Earlier today it said "down for maintenance"
> > >
> > > On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <kt...@apache.org> wrote:
> > >
> > >> it's down for me as well
> > >>
> > >> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <henry.saputra@gmail.com> wrote:
> > >>
> > >> > Is the wiki down for any of you?
> > >> >
> > >> > I can't access
> > >> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > >> >
> > >> > 404
> > >> >
> > >> > - Henry
> > >> >

Re: Improve the documentation of the Flink Architecture and internals

Posted by Stephan Ewen <se...@apache.org>.
Very nice post, Till!

We are starting to get much better at this...

On Sat, Mar 21, 2015 at 6:45 PM, Henry Saputra <he...@gmail.com>
wrote:

> Awesome, thanks Till
>
> On Saturday, March 21, 2015, Till Rohrmann <tr...@apache.org> wrote:
>
> > I wrote some internal documentation for Akka and the distributed
> > communication [1].
> >
> > Cheers,
> >
> > Till
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors

Re: Improve the documentation of the Flink Architecture and internals

Posted by Henry Saputra <he...@gmail.com>.
Awesome, thanks Till

On Saturday, March 21, 2015, Till Rohrmann <tr...@apache.org> wrote:

> I wrote some internal documentation for Akka and the distributed
> communication [1].
>
> Cheers,
>
> Till
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors

Re: Improve the documentation of the Flink Architecture and internals

Posted by Till Rohrmann <tr...@apache.org>.
I wrote some internal documentation for Akka and the distributed
communication [1].

Cheers,

Till

[1] https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors

On Fri, Mar 20, 2015 at 7:31 PM, Henry Saputra <he...@gmail.com>
wrote:

> Ah, the Tweet infra bot just announced extended downtime for Confluence [1]
>
> - Henry
>
> [1] https://twitter.com/infrabot/status/578983473970475008

Re: Improve the documentation of the Flink Architecture and internals

Posted by Henry Saputra <he...@gmail.com>.
Ah, the Tweet infra bot just announced extended downtime for Confluence [1]

- Henry

[1] https://twitter.com/infrabot/status/578983473970475008

On Fri, Mar 20, 2015 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote:
> For me as well. Earlier today it said "down for maintenance"
>
> On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <kt...@apache.org> wrote:
>
>> it's down for me as well
>>
>> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <he...@gmail.com>
>> wrote:
>>
>> > Is the wiki down for any of you?
>> >
>> > I can't access
>> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
>> >
>> > 404
>> >
>> > - Henry
>> >
>> > On Fri, Mar 20, 2015 at 4:46 AM, Kostas Tzoumas <kt...@apache.org>
>> > wrote:
>> > > I added a document for data exchange between tasks:
>> > >
>> > > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>> > >
>> > > Feel free to edit. I plan to link the class names to the class files in
>> > > github.
>> > >
>> > > On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <kt...@apache.org>
>> > > wrote:
>> > >
>> > >> +1 for the Wiki.
>> > >>
>> > >> When these have been stabilized we can move them to the docs if we
>> > decide
>> > >> to do so.
>> > >>
>> > >> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org>
>> > wrote:
>> > >>
>> > >>> I have put my suggested version of an outline for the docs into the
>> > wiki.
>> > >>> Regardless where the docs end up (wiki or repository), we can use the
>> > wiki
>> > >>> to outline the docs.
>> > >>>
>> > >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>> > >>>
>> > >>> Some pages contain some stub or outline, others are completely blank.
>> > >>>
>> > >>> Not a complete list. Additions are welcome.
>> > >>>
>> > >>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org>
>> > wrote:
>> > >>>
>> > >>> > I think the Wiki has a much lower barrier of entry to fix docs,
>> > >>> > especially for external people. The docs, with the Jekyll setup,
>> > >>> > are rather tricky. I would very much like all kinds of people to
>> > >>> > contribute to the docs about the internals, not just the usual
>> > >>> > three suspects that have done this so far.
>> > >>> >
>> > >>> > Having a good landing page in the regular docs is exactly so that
>> > >>> > we do not lose all the people that do not look into a wiki. The
>> > >>> > overview pages for the internals need to be good and accessible
>> > >>> > and nicely link to the wiki to "forward" people there.
>> > >>> >
>> > >>> > The overhead of deciding what goes where should not be terribly
>> > >>> > large, in my opinion, since there is no really "wrong" place to
>> > >>> > put it.
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <aljoscha@apache.org>
>> > >>> > wrote:
>> > >>> >
>> > >>> >> Why do you want to split stuff between the docs in the repository
>> > >>> >> and the wiki? I for one would always be too lazy to check stuff in
>> > >>> >> a wiki when there is also documentation. Plus, this would lead to
>> > >>> >> additional overhead in deciding what goes where and syncing
>> > >>> >> between the two places for documentation.
>> > >>> >>
>> > >>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org>
>> > >>> wrote:
>> > >>> >> > Ah, I totally forgot to add to the internals:
>> > >>> >> >
>> > >>> >> >   - Fault tolerance in Batch mode
>> > >>> >> >
>> > >>> >> >   - Fault Tolerance in Streaming Mode, with state handling
>> > >>> >> >
>> > >>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <sewen@apache.org>
>> > >>> >> > wrote:
>> > >>> >> >
>> > >>> >> >> Hi all!
>> > >>> >> >>
>> > >>> >> >> I would like to kick off an effort to improve the documentation
>> > >>> >> >> of the Flink Architecture and internals. This also means making
>> > >>> >> >> the streaming architecture more prominent in the docs.
>> > >>> >> >>
>> > >>> >> >> Being quite a sophisticated stack, we need to improve the
>> > >>> >> >> presentation of how Flink works - to an extent necessary to use
>> > >>> >> >> Flink (and to appreciate all the cool stuff that is happening).
>> > >>> >> >> This should also come in handy with new contributors.
>> > >>> >> >>
>> > >>> >> >> As a general umbrella, we need to first decide where and how to
>> > >>> >> organize
>> > >>> >> >> the documentation.
>> > >>> >> >>
>> > >>> >> >> I would propose to put the bulk of the documentation into the
>> > Wiki.
>> > >>> >> Create
>> > >>> >> >> a dedicated section on Flink Internals and sub-pages for each
>> > >>> >> component /
>> > >>> >> >> topic. To the docs, we add a general overview from which we
>> link
>> > >>> into
>> > >>> >> the
>> > >>> >> >> Wiki.
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>  == These sections would go into the DOCS in the git repository
>> > ==
>> > >>> >> >>
>> > >>> >> >>   - Overview of Program, pre-flight phase (type extraction,
>> > >>> optimizer),
>> > >>> >> >> JobManager, TaskManager. Differences between streaming and
>> > batch. We
>> > >>> >> can
>> > >>> >> >> realize this through one very nice picture with few lines of
>> > text.
>> > >>> >> >>
>> > >>> >> >>   - High level architecture stack, different program
>> > representations
>> > >>> >> (API
>> > >>> >> >> operators, common API DAG, optimizer DAG, parallel data flow
>> > >>> (JobGraph
>> > >>> >> /
>> > >>> >> >> Execution Graph)
>> > >>> >> >>
>> > >>> >> >>   - (maybe) Parallelism and scheduling. This seems to be
>> > paramount
>> > >>> to
>> > >>> >> >> understand for users.
>> > >>> >> >>
>> > >>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient,
>> CLI
>> > >>> >> client)
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >>  == These sections would go into the WIKI ==
>> > >>> >> >>
>> > >>> >> >>   - Project structure (maven projects, what is where,
>> > dependencies
>> > >>> >> between
>> > >>> >> >> projects)
>> > >>> >> >>
>> > >>> >> >>   - Component overview
>> > >>> >> >>
>> > >>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server,
>> > Library
>> > >>> >> Cache,
>> > >>> >> >> Archiving)
>> > >>> >> >>
>> > >>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache,
>> Library
>> > >>> >> Cache)
>> > >>> >> >>
>> > >>> >> >>     -> Involved Actor Systems / Actors / Messages
>> > >>> >> >>
>> > >>> >> >>   - Details about submitting a job (library upload, job graph
>> > >>> >> submission,
>> > >>> >> >> execution graph setup, scheduling trigger)
>> > >>> >> >>
>> > >>> >> >>   - Memory Management
>> > >>> >> >>
>> > >>> >> >>   - Optimizer internals
>> > >>> >> >>
>> > >>> >> >>   - Akka Setup specifics
>> > >>> >> >>
>> > >>> >> >>   - Netty and pluggable data exchange strategies
>> > >>> >> >>
>> > >>> >> >>   - Testing: Flink test clusters and unit test utilities
>> > >>> >> >>
>> > >>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> > >>> >> >>
>> > >>> >> >>   - Step-by-step guide to add a new operator
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >> >> I will go ahead and stub some sections in the Wiki.
>> > >>> >> >>
>> > >>> >> >> As we discuss and agree/disagree with the outline, we can
>> evolve
>> > the
>> > >>> >> Wiki.
>> > >>> >> >>
>> > >>> >> >> Greetings,
>> > >>> >> >> Stephan
>> > >>> >> >>
>> > >>> >> >>
>> > >>> >>
>> > >>> >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> >
>>

Re: Improve the documentation of the Flink Architecture and internals

Posted by Stephan Ewen <se...@apache.org>.
For me as well. Earlier today it said "down for maintenance".

On Fri, Mar 20, 2015 at 7:14 PM, Kostas Tzoumas <kt...@apache.org> wrote:

> it's down for me as well
>
> On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > Is the wiki down for any of you?
> >
> > I can't access
> > https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> >
> > 404
> >
> > - Henry

Re: Improve the documentation of the Flink Architecture and internals

Posted by Kostas Tzoumas <kt...@apache.org>.
It's down for me as well.

On Fri, Mar 20, 2015 at 7:12 PM, Henry Saputra <he...@gmail.com>
wrote:

> Is the wiki down for any of you?
>
> I can't access
> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
>
> 404
>
> - Henry

Re: Improve the documentation of the Flink Architecture and internals

Posted by Henry Saputra <he...@gmail.com>.
Is the wiki down for any of you?

I can't access https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home

404

- Henry

On Fri, Mar 20, 2015 at 4:46 AM, Kostas Tzoumas <kt...@apache.org> wrote:
> I added a document for data exchange between tasks:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>
> Feel free to edit. I plan to link the class names to the class files in
> github.

Re: Improve the documentation of the Flink Architecture and internals

Posted by Ufuk Celebi <uc...@apache.org>.
I couldn't have a look at it earlier, because the Wiki was down. Very nice overview of the flow of things. I like the text and pictures a lot.

I will add content about:

1) The way that we do the network transfers with Netty

2) A more detailed message flow for pipelined vs. blocking results.


I am actually very happy that we moved this to the Wiki... it is so much easier to fix minor things now. :-)

On 20 Mar 2015, at 12:48, Ufuk Celebi <uc...@apache.org> wrote:

> Thanks. I will have a look later :-)
> 
> +1 for the Wiki. I think the low overhead makle
> 
> On 20 Mar 2015, at 12:46, Kostas Tzoumas <kt...@apache.org> wrote:
> 
>> I added a document for data exchange between tasks:
>> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>> 
>> Feel free to edit. I plan to link the class names to the class files in
>> github.


Re: Improve the documentation of the Flink Architecture and internals

Posted by Kostas Tzoumas <kt...@apache.org>.
I added a document for data exchange between tasks:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Feel free to edit. I plan to link the class names to the class files on
GitHub.

On Tue, Mar 17, 2015 at 11:17 AM, Kostas Tzoumas <kt...@apache.org>
wrote:

> +1 for the Wiki.
>
> When these have been stabilized we can move them to the docs if we decide
> to do so.
>
> On Mon, Mar 16, 2015 at 10:07 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> I have put my suggested version of an outline for the docs into the wiki.
>> Regardless where the docs end up (wiki or repository), we can use the wiki
>> to outline the docs.
>>
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>
>> Some pages contain some stub or outline, others are completely blank.
>>
>> Not a complete list. Additions are welcome.
>>
>> On Mon, Mar 16, 2015 at 10:04 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>> > I think the Wiki has a much lower barrier of entry to fix docs,
>> especially
>> > for external people. The docs, with the Jekyll setup, are rather tricky.
>> > I would very much like that all kinds of people contribute to the docs
>> > about the internals, not just the usual three suspects that have done
>> this
>> > so far.
>> >
>> > Having a good landing page in the regular docs is exactly to not lose
>> all
>> > the people that do not look into a wiki. The overview pages for the
>> > internals need to be good and accessible and nicely link to the wiki to
>> > "forward" people there.
>> >
>> > The overhead of deciding what goes where should not be terribly large,
>> in
>> > my opinion, since there is no really "wrong" place to put it.
>> >
>> >
>> >
>> > On Mon, Mar 16, 2015 at 9:58 PM, Aljoscha Krettek <al...@apache.org>
>> > wrote:
>> >
>> >> Why do you want to split stuff between the doc in the repository and
>> >> the wiki? I for one would always be too lazy to check stuff in a wiki
>> >> when there is also a documentation. Plus, this would lead to
>> >> additional overhead in deciding what goes where and syncing between
>> >> the two places for documentation.
>> >>
>> >> On Mon, Mar 16, 2015 at 7:59 PM, Stephan Ewen <se...@apache.org>
>> wrote:
>> >> > Ah, I totally forgot to add to the internals:
>> >> >
>> >> >   - Fault tolerance in Batch mode
>> >> >
>> >> >   - Fault Tolerance in Streaming Mode, with state handling
>> >> >
>> >> > On Mon, Mar 16, 2015 at 7:51 PM, Stephan Ewen <se...@apache.org>
>> wrote:
>> >> >
>> >> >> Hi all!
>> >> >>
>> >> >> I would like to kick off an effort to improve the documentation of
>> the
>> >> >> Flink Architecture and internals. This also means making the
>> streaming
>> >> >> architecture more prominent in the docs.
>> >> >>
>> >> >> Being quite a sophisticated stack, we need to improve the
>> presentation
>> >> of
>> >> >> how Flink works - to an extend necessary to use Flink (and to
>> >> appreciate
>> >> >> all the cool stuff that is happening). This should also come in
>> handy
>> >> with
>> >> >> new contributors.
>> >> >>
>> >> >> As a general umbrella, we need to first decide where and how to
>> >> organize
>> >> >> the documentation.
>> >> >>
>> >> >> I would propose to put the bulk of the documentation into the Wiki.
>> >> Create
>> >> >> a dedicated section on Flink Internals and sub-pages for each
>> >> component /
>> >> >> topic. To the docs, we add a general overview from which we link
>> into
>> >> the
>> >> >> Wiki.
>> >> >>
>> >> >>
>> >> >>  == These sections would go into the DOCS in the git repository ==
>> >> >>
>> >> >>   - Overview of Program, pre-flight phase (type extraction,
>> optimizer),
>> >> >> JobManager, TaskManager. Differences between streaming and batch. We
>> >> can
>> >> >> realize this through one very nice picture with few lines of text.
>> >> >>
>> >> >>   - High level architecture stack, different program representations
>> >> (API
>> >> >> operators, common API DAG, optimizer DAG, parallel data flow
>> (JobGraph
>> >> /
>> >> >> Execution Graph)
>> >> >>
>> >> >>   - (maybe) Parallelism and scheduling. This seems to be paramount
>> to
>> >> >> understand for users.
>> >> >>
>> >> >>   - Processes (JobManager, TaskManager, Webserver, WebClient, CLI
>> >> client)
>> >> >>
>> >> >>
>> >> >>
>> >> >>  == These sections would go into the WIKI ==
>> >> >>
>> >> >>   - Project structure (maven projects, what is where, dependencies
>> >> between
>> >> >> projects)
>> >> >>
>> >> >>   - Component overview
>> >> >>
>> >> >>     -> JobManager (InstanceManager, Scheduler, BLOB server, Library
>> >> Cache,
>> >> >> Archiving)
>> >> >>
>> >> >>     -> TaskManager (MemoryManager, IOManager, BLOB Cache, Library
>> >> Cache)
>> >> >>
>> >> >>     -> Involved Actor Systems / Actors / Messages
>> >> >>
>> >> >>   - Details about submitting a job (library upload, job graph
>> >> submission,
>> >> >> execution graph setup, scheduling trigger)
>> >> >>
>> >> >>   - Memory Management
>> >> >>
>> >> >>   - Optimizer internals
>> >> >>
>> >> >>   - Akka Setup specifics
>> >> >>
>> >> >>   - Netty and pluggable data exchange strategies
>> >> >>
>> >> >>   - Testing: Flink test clusters and unit test utilities
>> >> >>
>> >> >>   - Developer How-To: Setting up Eclipse, IntelliJ, Travis
>> >> >>
>> >> >>   - Step-by-step guide to add a new operator
>> >> >>
>> >> >>
>> >> >> I will go ahead and stub some sections in the Wiki.
>> >> >>
>> >> >> As we discuss and agree/disagree with the outline, we can evolve the
>> >> Wiki.
>> >> >>
>> >> >> Greetings,
>> >> >> Stephan
>> >> >>
>> >> >>
>> >>
>> >
>> >
>>
>
>

Re: Improve the documentation of the Flink Architecture and internals

Posted by Kostas Tzoumas <kt...@apache.org>.
+1 for the Wiki.

When these have been stabilized we can move them to the docs if we decide
to do so.


Re: Improve the documentation of the Flink Architecture and internals

Posted by Stephan Ewen <se...@apache.org>.
I have put my suggested version of an outline for the docs into the wiki.
Regardless of where the docs end up (wiki or repository), we can use the wiki
to outline the docs.

https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals

Some pages contain some stub or outline, others are completely blank.

Not a complete list. Additions are welcome.


Re: Improve the documentation of the Flink Architecture and internals

Posted by Stephan Ewen <se...@apache.org>.
I think the Wiki has a much lower barrier to entry for fixing docs, especially
for external people. The docs, with the Jekyll setup, are rather tricky.
I would very much like all kinds of people to contribute to the docs
about the internals, not just the usual three suspects who have done this
so far.

A good landing page in the regular docs is there exactly so that we do not lose
the people who do not look into a wiki. The overview pages for the
internals need to be good and accessible and nicely link to the wiki to
"forward" people there.

The overhead of deciding what goes where should not be terribly large, in
my opinion, since there is really no "wrong" place to put things.




Re: Improve the documentation of the Flink Architecture and internals

Posted by Aljoscha Krettek <al...@apache.org>.
Why do you want to split stuff between the docs in the repository and
the wiki? I, for one, would always be too lazy to check stuff in a wiki
when there is also regular documentation. Plus, this would lead to
additional overhead in deciding what goes where and in syncing between
the two places for documentation.


Re: Improve the documentation of the Flink Architecture and internals

Posted by Stephan Ewen <se...@apache.org>.
Ah, I totally forgot to add to the internals:

  - Fault tolerance in batch mode

  - Fault tolerance in streaming mode, with state handling
