
Posted to user@flink.apache.org by 윤형덕 <yn...@naver.com> on 2017/03/13 04:42:45 UTC

multiple consumer of intermediate data set

Hi All,
 
figure1
https://ci.apache.org/projects/flink/flink-docs-release-1.2/fig/job_and_execution_graph.svg
 
As figure1 shows, JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)),
and accordingly the Intermediate Data Set of JobVertex(B) has two consumers (JobVertex(C) and JobVertex(D)).
JobVertex(A) also has two consumers (JobVertex(B) and JobVertex(D)), just like JobVertex(B),
but it has two separate Intermediate Data Sets, each with one consumer.
I can't understand why. The two cases look the same to me, so why does one vertex have one Intermediate Data Set and the other have two?
Could anyone explain the difference between JobVertex(A) and JobVertex(B)?

Re: multiple consumer of intermediate data set

Posted by "Zhijiang(wangzhijiang999)" <wa...@aliyun.com>.

Hi lining,
At the JobGraph level, the topology is logical. There is one IntermediateDataSet between each producer and consumer, as in A-IntermediateDataSet-B and A-IntermediateDataSet-D in the left graph. The same holds for B-IntermediateDataSet-C and B-IntermediateDataSet-D, but the IntermediateDataSet between B and D is not shown separately in the left graph.
At the ExecutionGraph level, the topology reflects the physical runtime. There is one IntermediateResultPartition between each pair of connected parallel ExecutionVertex instances, as in A1-IntermediateResultPartition-B1, A1-IntermediateResultPartition-B2, A2-IntermediateResultPartition-B1, and A2-IntermediateResultPartition-B2 in the right graph.
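
The ExecutionGraph-level naming above can be reproduced with a small toy sketch. It is not Flink code: the function names `subtasks` and `connections` are made up for illustration, and it assumes a parallelism of 2 and all-to-all connectivity between producer and consumer subtasks, which is what the four A/B examples suggest. It only generates the labels used above and says nothing about how Flink's runtime actually stores partitions.

```python
from itertools import product


def subtasks(vertex, parallelism):
    """Name the parallel subtasks of a vertex, e.g. A1, A2 (hypothetical naming)."""
    return [f"{vertex}{i}" for i in range(1, parallelism + 1)]


def connections(producer, consumer, parallelism):
    """List one label per (producer subtask, consumer subtask) pair.

    Assumes all-to-all connectivity; this is an illustration only.
    """
    pairs = product(subtasks(producer, parallelism), subtasks(consumer, parallelism))
    return [f"{p}-IntermediateResultPartition-{c}" for p, c in pairs]


# Assumed parallelism of 2, matching the A1/A2 and B1/B2 names in the thread.
for label in connections("A", "B", 2):
    print(label)
```

With a parallelism of 2 this prints the four labels listed for the right graph, one per connected pair of subtasks.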
Cheers,
Zhijiang

------------------------------------------------------------------
From: lining jing <ji...@gmail.com>
Sent: 2017-03-15 (Wednesday) 10:54
To: user <us...@flink.apache.org>; Zhijiang(wangzhijiang999) <wa...@aliyun.com>
Subject: Re: multiple consumer of intermediate data set

> Hi, if the output is the same, why isn't a single intermediate data set enough?

Re: multiple consumer of intermediate data set

Posted by lining jing <ji...@gmail.com>.

Hi,
if the output is the same, why isn't a single intermediate data set enough?


Re: multiple consumer of intermediate data set

Posted by "Zhijiang(wangzhijiang999)" <wa...@aliyun.com>.

Hi,
I think there is no difference between JobVertex(A) and JobVertex(B). JobVertex(C) is not shown in the right graph, which may have misled you. There should be another intermediate result partition between JobVertex(B) and JobVertex(C) for each parallel subtask, just as there is for JobVertex(A).
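
A toy sketch of the logical (JobGraph-level) view may make the symmetry easier to see. This is not Flink's API: the edge list is taken from the consumers named in the original question, and `data_sets_per_producer` is a hypothetical helper that assumes one data set per producer-consumer edge.

```python
from collections import defaultdict

# Logical edges as named in the original question (not read from Flink):
# A feeds B and D, B feeds C and D.
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D")]


def data_sets_per_producer(edges):
    """Group one hypothetical data set label per (producer, consumer) edge."""
    result = defaultdict(list)
    for producer, consumer in edges:
        result[producer].append(f"{producer}->{consumer}")
    return dict(result)


data_sets = data_sets_per_producer(EDGES)
for producer in sorted(data_sets):
    print(producer, data_sets[producer])
```

Under that assumption A and B each end up with two data sets, one per consumer, so neither vertex is a special case.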

Cheers,
Zhijiang

Re: multiple consumer of intermediate data set

Posted by Aljoscha Krettek <al...@apache.org>.

For context, this is from the Flink doc:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/job_scheduling.html


I think Ufuk (cc-ed) might know more about this.




