Posted to dev@flink.apache.org by Stephan Ewen <se...@apache.org> on 2018/08/13 06:54:36 UTC

Re: [Discuss] FLIP-26 - SSL Mutual Authentication

Sounds good, Eron!

Please go ahead...

On Sat, Jul 28, 2018 at 1:33 AM, Eron Wright <er...@gmail.com> wrote:

>  As an update to this thread, Stephan opted to split the internal/external
> configuration (by providing overrides for a common SSL configuration):
> https://github.com/apache/flink/pull/6326
>
> Note that Akka doesn't support hostname verification in its 'classic'
> remoting implementation (though the new Artery implementation apparently
> does), and such verification wouldn't apply to the client certificate
> anyway.   So the reality is that one should use a limited truststore (never
> the system truststore) for Akka communication.
>
> On the question of routing external communication thru the YARN resource
> proxy or Mesos/DCOS admin router, the value proposition is:
> a) simplifies service discovery on the part of external clients,
> b) permits single sign-on (SSO) by delegating authentication to a central
> authority,
> c) facilitates access from outside the cluster, via a public address.
> The main challenge is that the Flink client code must support a more
> diverse array of authentication methods, e.g. Kerberos when communicating
> with the YARN proxy.
>
> Given #6326, the next steps would be (unordered):
> a) create an umbrella issue for the overall effort
> b) dive into the authorization work for external communication
> c) implement auto-generation of a certificate for internal communication
> d) implement TLS on queryable state interface (FLINK-5029)
>
> I'll take care of (a) unless there is any objection.
> -Eron
>
>
> On Sun, May 13, 2018 at 5:45 AM Stephan Ewen <ew...@gmail.com>
> wrote:
>
> > Throwing in some more food for thought:
> >
> > An alternative to the above proposed separation of internal and external
> > SSL would be the following:
> >
> >   - We separate channel encryption and authentication
> >   - We use one common SSL layer (internal and external) that is in both
> > cases only responsible for establishing an encrypted connection
> >   - Authentication / authorization internally is done by SASL with
> > username/password or shared secret.
> >   - Authentication externally must go through a proxy, with authorization
> > based on validating HTTP headers set by the proxy, as discussed above.
> >
> > Advantages:
> >   - There is only one certificate needed, which could also be shared
> > across applications
> >   - One or two lines in the config authenticate and authorize internal
> > communication
> >   - One could possibly still fall back to the other mode by skipping
> >
> > Open Questions / Disadvantages
> >   - Given that hostname verification during SSL handshake is not possible
> > in many setups, the encrypted channel is vulnerable to man-in-the-middle
> > attacks without mutual authentication. Not sure how serious that is,
> > because it would need an attacker to have compromised network nodes of the
> > cluster already. Is that not a universal issue in the K8s world?
> >
> > This is anyways a bit hypothetical, because as long as we have Akka
> > beneath the RPC layer, we cannot go with that approach.
> >
> > However, if we want to at least keep the door open towards something like
> > that in the future, we would need to set up configuration in such a way
> > that we have a "common SSL" configuration (keystore, truststore, etc.)
> > and internal/external options that override those. That would anyways be
> > helpful for backwards compatibility.
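
A minimal sketch of the override idea described above: a scoped option is looked up first and falls back to the common SSL option when it is absent. The class name, the `resolve` method, and the key layout `security.ssl.<scope>.<option>` are assumptions made for illustration and are not taken from Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolves scoped SSL options with a fallback to common ones. */
public class SslOptionResolver {
    private final Map<String, String> conf;

    public SslOptionResolver(Map<String, String> conf) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.conf = new HashMap<>(conf);
    }

    /**
     * Looks up "security.ssl.<scope>.<option>" first and falls back to
     * "security.ssl.<option>" (the common configuration) if the scoped key is unset.
     */
    public Optional<String> resolve(String scope, String option) {
        String scoped = conf.get("security.ssl." + scope + "." + option);
        if (scoped != null) {
            return Optional.of(scoped);
        }
        return Optional.ofNullable(conf.get("security.ssl." + option));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("security.ssl.keystore", "common.keystore");
        conf.put("security.ssl.internal.keystore", "internal.keystore");

        SslOptionResolver resolver = new SslOptionResolver(conf);
        // The internal scope has its own override; the rest scope uses the common value.
        System.out.println(resolver.resolve("internal", "keystore").orElse("<unset>"));
        System.out.println(resolver.resolve("rest", "keystore").orElse("<unset>"));
    }
}
```

With this layout, an existing configuration that only sets the common keys keeps working, which is the backwards-compatibility point made above.
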
> >
> > @Eron - what are your thoughts on that?
> >
> > On Sun, May 13, 2018 at 1:40 AM, Stephan Ewen <ew...@gmail.com>
> > wrote:
> >
> > > Thank you for bringing this proposal up. It looks very good and we seem
> > > to be thinking along very similar lines.
> > >
> > > Below are some comments and thoughts on the FLIP.
> > >
> > > *Internal vs. External Connectivity*
> > >
> > > That is a very helpful distinction, let's build on that.
> > >
> > >   - I would suggest eventually treating all communication that potentially
> > > comes from users as external, meaning Client-to-Dispatcher,
> > > Client-to-JobManager (trigger savepoint, change parallelism, ...), Web UI,
> > > Queryable State.
> > >
> > >   - That leaves communication that is only between
> > > JobManager/TaskManager/ResourceManager/Dispatcher/HistoryServer as internal.
> > >
> > >   - I am somewhat operating under the assumption that all external
> > > communication will eventually be HTTP/REST. That works best with many
> > > setups and is the basis for using service proxies that
> > > handle authentication/authorization.
> > >
> > >
> > > In Flink 1.5 and future versions, we have the following update there:
> > >
> > >   - Akka is now strictly internal connectivity; clients (except the
> > > legacy client) do not use it any more.
> > >
> > >   - The Blob Server will move to purely internal connectivity in Flink
> > > 1.6, where a POST of a job to the Dispatcher carries the jars and the
> > > JobGraph. That is important for Kubernetes setups, where exposing the
> > > BlobServer and querying the blob port causes quite some friction.
> > >
> > >   - Treating queryable state as "internal connectivity" is fine for now.
> > > We should treat it as "external" connectivity in the future if we move it
> > > to HTTP/REST.
> > >
> > >
> > > *Internal Connectivity and SSL Mutual Authentication*
> > >
> > > Simply activating SSL mutual authentication for the internal
> > > communication is really low-hanging fruit.
> > >
> > > Activating client authentication for Akka, the Netty network stack (and
> > > Blob Server/Client in Flink 1.6) should require no change in the
> > > configurations with respect to Flink 1.4. All processes are, with respect
> > > to internal communication, simultaneously server and client endpoints.
> > > Because of that, they already need KeyStore and TrustStore files for SSL
> > > handshakes, where the TrustStore needs to trust the KeyStore Certificate.
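
A rough sketch of what this looks like with the plain JSSE API: every process loads the same KeyStore and TrustStore, and the server side of each connection requires a client certificate. The class and method names are hypothetical, and the sketch assumes the private key password equals the keystore password; it is not how Flink wires this up internally.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper for mutually authenticated internal connections. */
public final class InternalSslSketch {

    private InternalSslSketch() {}

    /** Builds an SSLContext from a keystore and a (deliberately small) truststore. */
    public static SSLContext contextFrom(
            String keystorePath, char[] keystorePassword,
            String truststorePath, char[] truststorePassword) throws Exception {
        KeyStore keyStore = load(keystorePath, keystorePassword);
        KeyStore trustStore = load(truststorePath, truststorePassword);

        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        // Assumption: the key entry uses the same password as the keystore.
        kmf.init(keyStore, keystorePassword);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // Only certificates in this truststore are accepted, never the system truststore.
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context;
    }

    /** Creates a server-side engine that rejects peers without a trusted client certificate. */
    public static SSLEngine serverEngine(SSLContext context) {
        SSLEngine engine = context.createSSLEngine();
        engine.setUseClientMode(false);
        engine.setNeedClientAuth(true);
        return engine;
    }

    private static KeyStore load(String path, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            store.load(in, password);
        }
        return store;
    }
}
```

Because the same context serves both roles, a process can use it for outgoing connections as well by creating an engine in client mode.
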
> > >
> > > I personally favor the suggestion made to have a script that generates a
> > > self-signed certificate and adds it to "conf" and updates the
> > > configuration. That should be picked up by the Yarn and Mesos clients
> > > anyways.
> > >
> > >
> > > *External Connectivity*
> > >
> > > There is a huge surface area and I think we need to give users a way to
> > > plug in their own tools.
> > > From what I see (and after some discussions with Patrick and Gary) I think
> > > it makes sense to look at proxies in a broad way, similar to the approach
> > > Eron outlined.
> > >
> > > The basic approach could be like this:
> > >
> > >   - Everything goes through HTTPS, so the proxy can work with HTTP headers.
> > >   - The proxy handles authentication and possibly authorization. The proxy
> > > adds some header, for example a user name, a group id, an authorization
> > > token.
> > >   - Flink can configure an implementation of an 'authorizer' or validator
> > > on the headers to decide whether the request is valid.
> > >
> > >   - Example 1: The proxy does authentication and adds the user name /
> > > group as a header. The Flink-side authorizer simply checks whether the
> > > name is in the config (a simple ACL-style scheme).
> > >   - Example 2: The proxy adds a JSON Web Token and the authorizer
> > > validates that token.
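
Example 1 could be sketched roughly as follows. No such 'authorizer' interface exists in the thread, so the class name, the `isAuthorized` method, and the header name used in the test (`X-Forwarded-User`) are all hypothetical; the sketch only shows the ACL check on a header that a trusted proxy is assumed to have set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical ACL-style authorizer that trusts a user header added by the proxy. */
public class HeaderAclAuthorizer {
    private final String userHeader;
    private final Set<String> allowedUsers;

    public HeaderAclAuthorizer(String userHeader, Set<String> allowedUsers) {
        // HTTP header names are case-insensitive, so compare in lower case.
        this.userHeader = userHeader.toLowerCase(Locale.ROOT);
        this.allowedUsers = Collections.unmodifiableSet(new HashSet<>(allowedUsers));
    }

    /** Returns true only if the proxy-set user header names a user on the allow list. */
    public boolean isAuthorized(Map<String, String> requestHeaders) {
        for (Map.Entry<String, String> header : requestHeaders.entrySet()) {
            String name = header.getKey();
            if (name != null && name.toLowerCase(Locale.ROOT).equals(userHeader)) {
                String user = header.getValue();
                return user != null && allowedUsers.contains(user.trim());
            }
        }
        // A missing header means the request did not pass through the proxy: reject it.
        return false;
    }
}
```

This check is only meaningful if the Flink endpoint accepts connections solely from the proxy, since otherwise a client could set the header itself.
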
> > >
> > > For secure connections between the Proxy and the Flink Endpoint I would
> > > follow Eron's suggestion to use KeyStores and TrustStores separate from
> > > those for internal communication.
> > >
> > > For Yarn and Mesos, I would like to see if we could handle those again as
> > > a special case of the proxies above:
> > >   - DCOS Admin Router forwards the user authentication token, so that
> > > could be another authorizer implementation.
> > >   - In YARN we could see if we can implement the IP filter via such an
> > > authorizer.
> > >
> > >
> > > *Hostname Verification*
> > >
> > > For internal communication, and especially in dynamic environments like
> > > Kubernetes, it is very hard to work with certificates and have hostname
> > > verification on.
> > >
> > > If we assume internal communication works strictly with a shared secret
> > > certificate and with client authentication, does hostname verification
> > > actually still add security in that particular setup? My understanding was
> > > that hostname verification is important so that not just any valid
> > > certificate is presented, but the one bound to the server you want to talk
> > > to. If we anyways have only one trusted certificate, isn't that already
> > > implied?
> > >
> > > On the other hand, it is still possible (and potentially valuable) for
> > > users in standalone mode to use keystores and truststores from a PKI, in
> > > which case there may still be an argument in favor of hostname
> > > verification.
> > >
> > > On Thu, May 10, 2018, 02:30 Eron Wright <er...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> Given that some SSL enhancement bugs have been posted lately, I took some
> > >> time to revise FLIP-26 which explores how to harden both external and
> > >> internal communication.
> > >>
> > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80453255
> > >>
> > >> Some recent related issues:
> > >> - FLINK-9312 - mutual auth for intra-cluster communication
> > >> - FLINK-5030 - original SSL feature work
> > >>
> > >> There's also some recent discussion of how to use Flink SSL effectively in
> > >> a Kubernetes environment. The issue is about hostname verification. The
> > >> proposal that I've put forward in FLIP-26 is to not use hostname
> > >> verification for intra-cluster communication, but rather to rely on a
> > >> cluster-internal certificate and a truststore consisting only of that
> > >> certificate. Meanwhile, a new "external" certificate would be
> > >> configurable for the web/api endpoint and associated with a well-known DNS
> > >> name as provided by a K8s Service resource.
> > >>
> > >> Stephan, is this in line with your thinking re FLINK-9312?
> > >>
> > >> Thanks
> > >> Eron
> > >>
> > >
> >
>

Re: [Discuss] FLIP-26 - SSL Mutual Authentication

Posted by Stephan Ewen <se...@apache.org>.
FYI: The 1.6 docs reflect the setup where internal and external SSL are
separately configured, and where internal SSL uses client authentication.

https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/security-ssl.html
