Posted to user@flink.apache.org by Vishwas Siravara <vs...@gmail.com> on 2019/07/17 05:40:11 UTC

Questions about user doc.

Hey guys,
In this document:
https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html
there is a line at the beginning of the scheduling section which says:
"A pipeline consists of multiple successive tasks, such as the *n-th* parallel
instance of a MapFunction together with the *n-th* parallel instance of a
ReduceFunction. Note that Flink often executes successive tasks
concurrently:"

I am guessing this means that Flink executes successive tasks from
different pipelines successively, right?

I also don't fully understand "Intermediate result partition" and
"Intermediate dataset". Why are there two boxes in the diagram for the
intermediate result after the first execution job vertex? Are there any
more docs I can read to clearly understand these diagrams? Thanks for your
help.

Thanks,
Vishwas

Re: Questions about user doc.

Posted by Biao Liu <mm...@gmail.com>.

Hi Vishwas,

> I am guessing this means that Flink executes successive tasks from
> different pipelines successively right ?

As the document says, "Note that Flink often executes successive tasks
concurrently: For Streaming programs, that happens in any case, but also
for batch programs, it happens frequently." So I think "successively" is
not accurate, at least for streaming jobs.
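
To make the difference concrete, here is a minimal sketch in plain Python. It
is not Flink code: map_task, reduce_task, run_pipelined and run_staged are
hypothetical names, and single-threaded generators stand in for tasks that
would really run concurrently. The pipelined variant shows the downstream task
consuming records while the upstream task is still producing them; the staged
variant shows the alternative, where the downstream task starts only after the
upstream result is complete.

```python
def map_task(records, log):
    """Hypothetical upstream task: emits each record as soon as it is produced."""
    for record in records:
        value = record * 2
        log.append(f"map emits {value}")
        yield value


def reduce_task(stream, log):
    """Hypothetical downstream task: consumes records as they arrive."""
    total = 0
    for value in stream:
        log.append(f"reduce consumes {value}")
        total += value
    return total


def run_pipelined(records):
    """Both tasks are active at once: reduce pulls from map record by record."""
    log = []
    total = reduce_task(map_task(records, log), log)
    return total, log


def run_staged(records):
    """Map runs to completion first; only then does reduce start."""
    log = []
    materialized = list(map_task(records, log))  # full intermediate result
    total = reduce_task(iter(materialized), log)
    return total, log


pipelined_total, pipelined_log = run_pipelined([1, 2, 3])
staged_total, staged_log = run_staged([1, 2, 3])

print("\n".join(pipelined_log))
print("---")
print("\n".join(staged_log))
```

Both variants compute the same total; only the order of events differs. In the
pipelined log the "map emits" and "reduce consumes" entries alternate, while in
the staged log all "map emits" entries come first.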

> I also don't fully understand Intermediate result partition and
> Intermediate dataset , why are there two boxes in the diagram for
> intermediate result after the first execution job vertex ? Is there any
> more docs I can read to clearly understand these diagrams, thanks for your
> help.

1. The "Intermediate dataset" is a logical concept described in the
JobGraph, while the "Intermediate result partition" is more of a physical
concept described in the ExecutionGraph. The "Intermediate result partition"
is the parallel version of the "Intermediate dataset".
2. This document is under the "Internals" part, so it refers to some internal
implementation details. There might not be as much documentation as you would
like. The document contains links for its critical concepts, which point to
the Flink GitHub repository. Sometimes the code is the best documentation :)
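
The relationship in point 1 can be sketched as a small data model. This is an
illustration only, written in plain Python rather than against Flink's classes:
the two dataclasses borrow the concept names from the document, but their
fields and the expand helper are assumptions made for this example. It also
assumes the simplest case, in which each parallel subtask of the producing
vertex writes exactly one partition of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntermediateDataSet:
    """Logical output of one job vertex (JobGraph level). Illustrative model."""
    producer: str


@dataclass(frozen=True)
class IntermediateResultPartition:
    """Output of one parallel subtask (ExecutionGraph level). Illustrative model."""
    dataset: IntermediateDataSet
    producer_subtask: int


def expand(dataset, parallelism):
    """Hypothetical helper: one partition per parallel producer subtask."""
    return [
        IntermediateResultPartition(dataset=dataset, producer_subtask=i)
        for i in range(parallelism)
    ]


# One logical dataset produced by a map vertex running with parallelism 4.
dataset = IntermediateDataSet(producer="Map")
partitions = expand(dataset, parallelism=4)

for partition in partitions:
    print(partition.dataset.producer, "subtask", partition.producer_subtask)
```

Under this reading, there is one logical dataset per producing job vertex, and
the number of partitions follows from the parallelism of that vertex.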

