You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Vincenzo Gulisano <vi...@gmail.com> on 2015/08/17 15:38:21 UTC

Pointers about internal threads and communication in Flink (streaming)

Hi, is there any document describing how streaming operators are run by the
TaskManagers and how communication (intra-node and inter-node) is managed.
The closest documention I found is
https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html
but it is still pretty high-level.

Thank you for your help

Re: Pointers about internal threads and communication in Flink (streaming)

Posted by Vincenzo Gulisano <vi...@gmail.com>.
Thank you very much!
I will have a look at the docs

Vincenzo

On 17 August 2015 at 16:26, Aljoscha Krettek <al...@apache.org> wrote:

> Hi Vincenzo,
> regarding TaskManagers and how they execute the operations:
>
> The TaskManager gets a class that is derived from AbstractInvokable. The
> TaskManager will create an object from that class and then call methods to
> facilitate execution. The two main methods are registerInputOutput() and
> invoke(). The first allows the invokable to setup the input/output channels
> and do initialization work. Then, invoke is called which would contain the
> actual loop that keeps reading from inputs and forwards data to the
> operator implementation.
>
> The base invokable for streaming is StreamTask. Then there are concrete
> subclasses OneInputStreamTask and TwoInputStreamTask for these two basic
> types of operator. The actual logic for an operator such as Map or Reduce
> is implemented in a subclass of StreamOperator (with concrete
> OneInputStreamOperator and TwoInputStreamOperator). OneInputStreamOperator,
> for example, has a method processElement(StreamRecord) that must be called
> for each element that is received.
>
> The StreamOperator, in turn, would hold the user code function object and
> forward received elements to it.
>
> To conclude, the StreamTask does the raw reading from network inputs. The
> StreamOperator receives elements and forwards them to user functions based
> on the semantics of the operator.
>
> I hope this helps, let us know if you have any more questions about this.
> :D
>
> Aljoscha
>
> On Mon, 17 Aug 2015 at 16:08 Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> We are working on more docs for that. Here is a start that has a section
>> about the TaskManager task execution.
>>
>> Until then, here is a bit from our wiki:
>>
>> Data Exchange:
>> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>>
>> Serialization for Data Exchange:
>> https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization
>>
>> Coordiation with Actors:
>> https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
>>
>>
>>
>> Some WIP documentation on the Task execution:
>>   -
>> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/fig/taskmanager_task.svg
>>   -
>> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/through_stack.md
>>
>>
>> Greetings,
>> Stephan
>>
>>
>> On Mon, Aug 17, 2015 at 3:38 PM, Vincenzo Gulisano <
>> vincenzo.gulisano@gmail.com> wrote:
>>
>>> Hi, is there any document describing how streaming operators are run by
>>> the TaskManagers and how communication (intra-node and inter-node) is
>>> managed. The closest documention I found is
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html
>>> but it is still pretty high-level.
>>>
>>> Thank you for your help
>>>
>>>
>>>
>>

Re: Pointers about internal threads and communication in Flink (streaming)

Posted by Aljoscha Krettek <al...@apache.org>.
Hi Vincenzo,
regarding TaskManagers and how they execute the operations:

The TaskManager gets a class that is derived from AbstractInvokable. The
TaskManager will create an object from that class and then call methods to
facilitate execution. The two main methods are registerInputOutput() and
invoke(). The first allows the invokable to setup the input/output channels
and do initialization work. Then, invoke is called which would contain the
actual loop that keeps reading from inputs and forwards data to the
operator implementation.

The base invokable for streaming is StreamTask. Then there are concrete
subclasses OneInputStreamTask and TwoInputStreamTask for these two basic
types of operator. The actual logic for an operator such as Map or Reduce
is implemented in a subclass of StreamOperator (with concrete
OneInputStreamOperator and TwoInputStreamOperator). OneInputStreamOperator,
for example, has a method processElement(StreamRecord) that must be called
for each element that is received.

The StreamOperator, in turn, would hold the user code function object and
forward received elements to it.

To conclude, the StreamTask does the raw reading from network inputs. The
StreamOperator receives elements and forwards them to user functions based
on the semantics of the operator.

I hope this helps, let us know if you have any more questions about this. :D

Aljoscha

On Mon, 17 Aug 2015 at 16:08 Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> We are working on more docs for that. Here is a start that has a section
> about the TaskManager task execution.
>
> Until then, here is a bit from our wiki:
>
> Data Exchange:
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
>
> Serialization for Data Exchange:
> https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization
>
> Coordiation with Actors:
> https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors
>
>
>
> Some WIP documentation on the Task execution:
>   -
> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/fig/taskmanager_task.svg
>   -
> https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/through_stack.md
>
>
> Greetings,
> Stephan
>
>
> On Mon, Aug 17, 2015 at 3:38 PM, Vincenzo Gulisano <
> vincenzo.gulisano@gmail.com> wrote:
>
>> Hi, is there any document describing how streaming operators are run by
>> the TaskManagers and how communication (intra-node and inter-node) is
>> managed. The closest documention I found is
>> https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html
>> but it is still pretty high-level.
>>
>> Thank you for your help
>>
>>
>>
>

Re: Pointers about internal threads and communication in Flink (streaming)

Posted by Stephan Ewen <se...@apache.org>.
Hi!

We are working on more docs for that. Here is a start that has a section
about the TaskManager task execution.

Until then, here is a bit from our wiki:

Data Exchange:
https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks

Serialization for Data Exchange:
https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization

Coordiation with Actors:
https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors



Some WIP documentation on the Task execution:
  -
https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/fig/taskmanager_task.svg
  -
https://github.com/StephanEwen/incubator-flink/blob/docs/docs/internals/through_stack.md


Greetings,
Stephan


On Mon, Aug 17, 2015 at 3:38 PM, Vincenzo Gulisano <
vincenzo.gulisano@gmail.com> wrote:

> Hi, is there any document describing how streaming operators are run by
> the TaskManagers and how communication (intra-node and inter-node) is
> managed. The closest documention I found is
> https://ci.apache.org/projects/flink/flink-docs-release-0.9/internals/general_arch.html
> but it is still pretty high-level.
>
> Thank you for your help
>
>
>