Posted to user@beam.apache.org by thinkdoom <th...@qq.com> on 2019/12/13 03:20:31 UTC

Re: What's every part's responsibility for python sdk with flink?

Thanks.


So the SDK harness is just like PySpark's "PYSPARK_PYTHON for executor"?
It sounds like we need to deploy it on every Flink cluster node?





------------------ Original Message ------------------
From: "Kyle Weaver" <kcweaver@google.com>
Sent: Friday, December 13, 2019, 00:26
To: "user" <user@beam.apache.org>
Cc: "Maximilian Michels" <mxm@apache.org>
Subject: Re: What's every part's responsibility for python sdk with flink?



The order is: user python code -> job server -> flink cluster -> SDK harness


1. User python code defines the Beam pipeline.
2. The job server executes the Beam pipeline on the Flink cluster. To do so, it must translate Beam operations into Flink native operations.
3. The Flink cluster executes the transforms specified by Beam, the same as it would execute any "normal" Flink pipeline.
4. When needed, a Beam-defined Flink transform (running on a Flink task manager) will invoke the SDK harness to run the python user code.


Might be a little overly simplistic, but that's the gist. For more info, there are several public talks on this available online. I think this one is the latest: https://youtu.be/hxHGLrshnCY?t=1769
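
To make that flow concrete, here is a minimal sketch of what step 1 (the "user python code") might look like when submitting to a Flink job server. The job server endpoint, environment type, and pipeline contents are assumptions for illustration, not anything specific to this thread:

# Minimal sketch: define a Beam pipeline and hand it to a portable job server.
# Assumes a beam-runners-flink job server is already listening on localhost:8099.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",  # step 2: the job server translates Beam -> Flink
    "--environment_type=LOOPBACK",    # where the SDK harness runs (local testing here)
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["a", "b", "c"])
     | beam.Map(lambda x: x.upper())  # step 4: this user code executes inside the SDK harness
     | beam.Map(print))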


On Thu, Dec 12, 2019 at 1:32 AM thinkdoom <thinkdoom@qq.com> wrote:

1. From what I grasp there are 4 parts, and the data flow and call steps are as described below; is that right?

user python code -> beam-runners-flink-1.8-job-server -> SDK harness -> flink cluster.
2. What's every part's responsibility among the 4 parts?

Re: What's every part's responsibility for python sdk with flink?

Posted by Kyle Weaver <kc...@google.com>.
> So the SDK harness is just like PySpark's "PYSPARK_PYTHON for executor"?

They're conceptually similar, though the implementation is very different.

> It sounds like we need to deploy it on every Flink cluster node?

There are a few different configuration options. The preferred way is to use containers, which requires no special deployment except an installation of Docker on the Flink task managers.
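
To illustrate those options, here is a sketch of how the SDK harness environment can be chosen through pipeline options. The flag values are standard Beam portability settings from around this release; the boot command path is a placeholder, not a path from this thread:

# Sketch of the harness environment choices (same PipelineOptions style as above).
from apache_beam.options.pipeline_options import PipelineOptions

# Docker (preferred): task managers only need Docker installed; the Beam Python
# SDK container is pulled and the harness runs inside it.
docker_opts = PipelineOptions(["--environment_type=DOCKER"])

# Process: start a harness process directly on each task manager host
# (requires Python and the Beam SDK installed there). The command path is a placeholder.
process_opts = PipelineOptions([
    "--environment_type=PROCESS",
    '--environment_config={"command": "/opt/apache/beam/boot"}',
])

# Loopback: the harness runs inside the submitting Python process; handy for local tests.
loopback_opts = PipelineOptions(["--environment_type=LOOPBACK"])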
