Posted to issues@flink.apache.org by "Sebastian Struß (Jira)" <ji...@apache.org> on 2022/10/07 13:32:00 UTC

[jira] [Created] (FLINK-29535) Flink Operator Certificate renew issue

Sebastian Struß created FLINK-29535:
---------------------------------------

             Summary: Flink Operator Certificate renew issue
                 Key: FLINK-29535
                 URL: https://issues.apache.org/jira/browse/FLINK-29535
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Sebastian Struß


It seems that there is an issue with the Kubernetes Operator (at least in version 1.1.0) when it comes to certificates for the webhook.

We've seen this error message pop up in the logs:
An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
and

javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
    at sun.security.ssl.Alert.createSSLException(Unknown Source) ~[?:?]
    at sun.security.ssl.Alert.createSSLException(Unknown Source) ~[?:?]
    at sun.security.ssl.TransportContext.fatal(Unknown Source) ~[?:?]
    at sun.security.ssl.Alert$AlertConsumer.consume(Unknown Source) ~[?:?]
    at sun.security.ssl.TransportContext.dispatch(Unknown Source) ~[?:?]
    at sun.security.ssl.SSLTransport.decode(Unknown Source) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.decode(Unknown Source) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.readRecord(Unknown Source) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.unwrap(Unknown Source) ~[?:?]
    at sun.security.ssl.SSLEngineImpl.unwrap(Unknown Source) ~[?:?]
    at javax.net.ssl.SSLEngine.unwrap(Unknown Source) ~[?:?]
    at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:296) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]
    at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1342) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]
    at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1235) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]
    at org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1284) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]
    at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[flink-kubernetes-operator-1.1.0-shaded.jar:1.1.0]

It happens when our Flux CD tries to update the FlinkDeployment resource.

The update seems to trigger a webhook call to an endpoint in the operator, which at that point is serving an invalid certificate.

We noticed this after the operator had been running for 18 days, so maybe something short-lived was not renewed correctly?
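
A minimal sketch of how the renewal hypothesis could be checked, assuming the webhook certificate's notAfter value has already been read out (for example with `openssl x509 -noout -enddate`, which prints the date in the format used below). The function name `days_until_expiry` and the dates are made up for illustration and are not taken from the operator or from our logs; `ssl.cert_time_to_seconds` is a standard-library call.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter.

    `not_after` uses the textual form printed by OpenSSL and returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jan 19 00:00:00 2030 GMT".
    `now` must be a timezone-aware datetime.
    A result of zero or less means the certificate has expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400.0


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the calculation.
    not_after = "Jan 19 00:00:00 2030 GMT"
    checked_at = datetime(2030, 1, 1, tzinfo=timezone.utc)
    remaining = days_until_expiry(not_after, checked_at)
    print(f"certificate expires in {remaining:.1f} days")
    if remaining <= 0:
        print("certificate has expired; renewal apparently did not happen")
```

If the remaining time comes out zero or negative while the operator is still serving that certificate, that would support the idea that a short-lived certificate was not renewed.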



--
This message was sent by Atlassian Jira
(v8.20.10#820010)