Posted to user@flink.apache.org by Elias Levy <fe...@gmail.com> on 2017/12/22 19:36:53 UTC

Flink network access control documentation

There is a need for better documentation on what connects to what over
which ports in a Flink cluster to allow users to configure network access
control rules.

I was under the impression that in a ZK HA configuration the Job Managers
were essentially independent and only coordinated via ZK.  But starting
multiple JMs in HA with the JM RPC port blocked between JMs shows that the
second JM's Akka subsystem is trying to connect to the leading JM:

INFO  akka.remote.transport.ProtocolStateActor                      - No
response from remote for outbound association. Associate timed out after
[20000 ms].
WARN  akka.remote.ReliableDeliverySupervisor                        -
Association with remote system [akka.tcp://flink@10.210.210.127:6123] has
failed, address is now gated for [5000] ms. Reason: [Association failed
with [akka.tcp://flink@10.210.210.127:6123]] Caused by: [No response from
remote for outbound association. Associate timed out after [20000 ms].]
WARN  akka.remote.transport.netty.NettyTransport                    -
Remote connection to [null] failed with
org.apache.flink.shaded.akka.org.jboss.netty.channel.ConnectTimeoutException:
connection timed out: /10.210.210.127:6123

Re: Flink network access control documentation

Posted by Ufuk Celebi <uc...@apache.org>.
Hey Elias,

thanks for opening a ticket (for reference:
https://issues.apache.org/jira/browse/FLINK-8311). I fully agree with
adding docs for this. I will try to write something down this week.

Your point about JobManagers only coordinating via ZK is correct,
though. I had a look at the JobManager code (as of 1.4), and the
leader election service only reads and writes leader information in
ZK, which is then picked up by the TaskManagers.

What you are seeing here is related to the web UI, which is attached to
every JM. The UI tries to connect to the leading JM in order to access
that JM's runtime information. This is not documented anywhere
as far as I can tell and might have changed between 1.3 and 1.4. Access
to this port between JMs should not be critical to the functioning of
your Flink cluster; it only matters for accessing the web UI on a
non-leading JM.

– Ufuk
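
When working out which access control rules are needed, it can help to check from each host whether a given port is actually reachable. The following is a minimal sketch, not part of Flink: the function name `can_connect` and the default timeout are assumptions made for illustration. It only tests whether a TCP connection can be opened, which is the failure mode shown in the log above (connection timed out on port 6123).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it checks plain TCP reachability
    only and does not speak the Akka protocol used by the JobManager.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # refused connections) if the socket cannot be opened.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use, run from the standby JM host (address and port taken
# from the log in the original message):
#   can_connect("10.210.210.127", 6123)
```

A `False` result from the standby JM host, combined with a `True` result from a TaskManager host, would be consistent with the rule set described in the original message, where only JM-to-JM traffic on the RPC port is blocked.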

