Posted to user@flink.apache.org by Niclas Hedhman <ni...@apache.org> on 2018/03/28 10:14:54 UTC

Flink TaskManager and JobManager internals

Hi,

is there some document (or presentation) that explains the internals of how
a Job gets deployed on to the cluster? Communications, Classloading and
Serialization (if any) are the key points here I think.

I suspect that my application modeling framework is incompatible with the
standard Flink mechanism, and I would like to learn how much effort it
would take to build my own mechanism (assuming that is possible, since
YARN and Mesos are in a similar situation).


Thanks in Advance
-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Flink TaskManager and JobManager internals

Posted by Niclas Hedhman <ni...@apache.org>.
Thanks for trying to help, really appreciate it, and I am sorry that I was
not clear enough.

I am using Apache Polygene, and its application modeling is very nice for
what we do. What Polygene is exactly is not really important, other than
that a lot of app code exists at my end and that Polygene generates classes
on the fly, using custom classloaders. Think AspectJ and similar, with
runtime weaving.

These last few weeks with Flink have been a bit scary, since I think it is
the first time in my 35-year career where I don't understand, can't figure
out and can't find answers to what is actually going on under the hood,
even though I am able to work as a plain user, as prescribed, just fine. I
can guess, but that is going to take longer to work out than getting
pointers to those answers from the horse's mouth.

What I don't fully understand in Flink (Streaming) is:

1. I define a main() method, put everything into a JAR file, and it
"somehow" gets deployed onto the nodes in my cluster. Will each node
receive the JAR file and have a JVM spun up for that main(), or does Flink
keep it in-JVM with some classloader isolation to protect jobs from each
other? The data Artisans presentation mentioned below shows, on slide 17,
a layout that is ambiguous to me and could be interpreted as my Flink app
(I prefer to call it a topology) being executed in a separate JVM...
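To make point 1 concrete, here is roughly the shape of what I submit. This
is a minimal sketch with made-up names (WordLengthJob is not my actual
Polygene-based code), and it reflects my current understanding: the main()
builds a dataflow graph on the client, and only the functions inside it
are shipped to the cluster.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Minimal sketch of the kind of job I submit. My understanding so far:
// this main() runs on the *client*, building a job graph; only the
// serialized operators (the map function below) are shipped to the
// TaskManager JVMs, rather than a fresh JVM being spun up per main().
public class WordLengthJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink", "polygene")
           .map(String::length)        // runs on the TaskManagers
           .print();

        env.execute("word-length");    // submits the graph to the JobManager
    }
}
```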

2. But I have also seen that it is possible to "scale out" the processing
within a topology, which would suggest that additional hosts are used. If
so, how does that relate to the above deployment on, say, 3 hosts? Is that
scale-out only within that JVM (in which case I am good and don't need to
worry), or is it somehow offloaded to other servers in the cluster, and if
so, how is that deployed?

3. "Debugging Classloading" is IMVHO a little bit "short" on the details;
a complete overview of what classloaders Flink sets up (if any) and
when/how it does so is basically what I need to make sure I set all of
that up correctly in my own case.
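For what it's worth, the diagnostic I keep falling back on for point 3 is
just walking the classloader parent chain (plain JDK, nothing
Flink-specific, and the class name is my own invention); run inside a
TaskManager, this should reveal whatever user-code classloader Flink
installs above the application loader:

```java
// Walks the classloader parent chain for a given class and prints it.
// Inside a Flink TaskManager this would show the per-job user-code
// classloader described in the "Debugging Classloading" docs; run
// standalone it just shows the JDK's own application/platform loaders.
public class LoaderChain {
    static String chain(Class<?> cls) {
        StringBuilder sb = new StringBuilder();
        ClassLoader cl = cls.getClassLoader();
        while (cl != null) {
            sb.append(cl.getClass().getName()).append(" -> ");
            cl = cl.getParent();
        }
        // A null loader means the bootstrap classloader (e.g. for String).
        sb.append("bootstrap");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chain(LoaderChain.class));
        System.out.println(chain(String.class)); // JDK class: bootstrap only
    }
}
```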

4. All Functions (et al.) in Flink seem to require java.io.Serializable,
which to me is a big flag waved screaming "problem for me". Polygene has a
state model that is not compatible with java.io.Serializable, and I have
been looking for explanations of why the Functions are serializable, but
since data flow dominates Flink Streaming, there are LOTS of links talking
about data serialization, which is not a problem on my end.
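To illustrate point 4 with plain JDK code (RuntimeOnlyService below is a
made-up stand-in for a Polygene-generated object, not a real class): a
function object holding a non-serializable field fails the round-trip
that, as I understand it, Flink performs when shipping function instances
to the workers, while a transient field that is re-created on the worker
side survives it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch of why the Serializable requirement bites: function instances
// are shipped to workers via Java serialization, so any field that is
// not serializable breaks the job at submission time. Marking the field
// transient and rebuilding it on the worker is the usual workaround.
public class SerializationDemo {
    // Stand-in for a framework object that cannot be serialized.
    static class RuntimeOnlyService { }

    static class BadFunction implements Serializable {
        RuntimeOnlyService service = new RuntimeOnlyService(); // not Serializable
    }

    static class GoodFunction implements Serializable {
        transient RuntimeOnlyService service;                  // skipped by serialization
        void open() { service = new RuntimeOnlyService(); }    // rebuilt per worker
    }

    static boolean roundTrips(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(new BadFunction()));  // false
        System.out.println(roundTrips(new GoodFunction())); // true
    }
}
```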

5. YARN/Mesos were only mentioned to point out that complex deployments
are possible, with hooks, so from my PoV the worst-case scenario is to do
my own deployment system that doesn't rely on some of the fundamentals in
Flink. I am not going to deploy on Mesos or YARN.


Once again, thanks a lot for any pointers or info that can be given.

Cheers
Niclas


On Wed, Mar 28, 2018 at 8:17 PM, kedar mhaswade <ke...@gmail.com>
wrote:

>
>
> On Wed, Mar 28, 2018 at 3:14 AM, Niclas Hedhman <ni...@apache.org> wrote:
>
>> Hi,
>>
>> is there some document (or presentation) that explains the internals of
>> how a Job gets deployed on to the cluster? Communications, Classloading and
>> Serialization (if any) are the key points here I think.
>>
>
> I don't know of any specific presentations, but data artisans provide
> http://training.data-artisans.com/system-overview.html which is pretty
> good.
> The Flink documentation is comprehensive.
> Class-loading: https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_classloading.html
> State serialization: https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/custom_serialization.html
>
>>
>> I suspect that my application modeling framework is incompatible with the
>> standard Flink mechanism, and I would like to learn how much effort there
>> is to make my own mechanism (assuming it is possible, since Yarn and Mesos
>> are in similar situation)
>>
>
> Don't know what you mean by application "modeling" framework, but if you
> mean that you have a Flink app (batch or streaming) that you'd want to
> deploy to YARN (or Mesos, which is similar), then the flow appears to be
> 1- Create a "Flink Cluster" (also called a YARN session) when a user does
> "bin/yarn-session.sh <params>" and then
> 2- Run the app when the user does "bin/flink run <app-class> <app-jar>".
>
> It's the user's responsibility to shut down the cluster (YARN session) by
> sending a "stop" command to the YARN session created in step 1. The code
> appears to be in classes like org.apache.flink.yarn.cli.FlinkYarnSessionCli
> (manages the YARN session) and org.apache.flink.client.CliFrontend
> (submits a Flink app to the YARN session).
>
> Regards,
> Kedar
>
>
>>
>> Thanks in Advance
>> --
>> Niclas Hedhman, Software Developer
>> http://zest.apache.org - New Energy for Java
>>
>
>


-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Flink TaskManager and JobManager internals

Posted by kedar mhaswade <ke...@gmail.com>.
On Wed, Mar 28, 2018 at 3:14 AM, Niclas Hedhman <ni...@apache.org> wrote:

> Hi,
>
> is there some document (or presentation) that explains the internals of
> how a Job gets deployed on to the cluster? Communications, Classloading and
> Serialization (if any) are the key points here I think.
>

I don't know of any specific presentations, but data artisans provide
http://training.data-artisans.com/system-overview.html which is pretty
good.
The Flink documentation is comprehensive.
Class-loading:
https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_classloading.html
State serialization:
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/custom_serialization.html


>
> I suspect that my application modeling framework is incompatible with the
> standard Flink mechanism, and I would like to learn how much effort there
> is to make my own mechanism (assuming it is possible, since Yarn and Mesos
> are in similar situation)
>

Don't know what you mean by application "modeling" framework, but if you
mean that you have a Flink app (batch or streaming) that you'd want to
deploy to YARN (or Mesos, which is similar), then the flow appears to be
1- Create a "Flink Cluster" (also called a YARN session) when a user does
"bin/yarn-session.sh <params>" and then
2- Run the app when the user does "bin/flink run <app-class> <app-jar>".

It's the user's responsibility to shut down the cluster (YARN session) by
sending a "stop" command to the YARN session created in step 1. The code
appears to be in classes like
org.apache.flink.yarn.cli.FlinkYarnSessionCli (manages the YARN session)
and org.apache.flink.client.CliFrontend (submits a Flink app to the YARN
session).

Regards,
Kedar


>
> Thanks in Advance
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java
>