Posted to issues@streampark.apache.org by "J-dfy (via GitHub)" <gi...@apache.org> on 2023/06/20 08:20:44 UTC
[GitHub] [incubator-streampark] J-dfy opened a new issue, #2807: [Bug] streampark make k8s zk High availability not work
J-dfy opened a new issue, #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### Java Version
1.8.0
### Scala Version
2.12.x
### StreamPark Version
2.0.0
### Flink Version
1.16.1
### deploy mode
kubernetes-application
### What happened
1. Start a job with StreamPark, then kill the JobManager. HA (ZooKeeper) brings up a new JobManager, but a few seconds later the new JobManager is killed as well, so HA does not work.
2. Start a job with StreamPark, shut down StreamPark, then kill the JobManager. HA (ZooKeeper) brings up a new JobManager, and no further problems occur.
### Error Exception
```log
2023-06-20 14:12:11,261 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2023-06-20 14:12:11,267 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Preconfiguration:
2023-06-20 14:12:11,268 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] -
RESOURCE_PARAMS extraction logs:
jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
dynamic_configs: -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
logs: INFO [] - Loading configuration property: blob.server.port, 6124
INFO [] - Loading configuration property: state.checkpoints.num-retained, 1
INFO [] - Loading configuration property: kubernetes.hostnetwork.enabled, true
INFO [] - Loading configuration property: jobmanager.execution.failover-strategy, region
INFO [] - Loading configuration property: high-availability.cluster-id, opswaf
INFO [] - Loading configuration property: jobmanager.rpc.address, localhost
INFO [] - Loading configuration property: kubernetes.service-account, flink-service-account
INFO [] - Loading configuration property: kubernetes.cluster-id, opswaf
INFO [] - Loading configuration property: high-availability.storageDir, hdfs:///user/flink/ha
INFO [] - Loading configuration property: $internal.application.program-args, --servers;axcloud
INFO [] - Loading configuration property: kubernetes.container.image, harbor-pre.jijiaban.net/flink/flink/streamparkflinkjob-flink-opswaf
INFO [] - Loading configuration property: parallelism.default, 1
INFO [] - Loading configuration property: kubernetes.namespace, flink
INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
INFO [] - Loading configuration property: kubernetes.rest-service.exposed.type, NodePort
INFO [] - Loading configuration property: high-availability.jobmanager.port, 6123
INFO [] - Loading configuration property: kubernetes.jobmanager.node-selector, bu:flink
INFO [] - Loading configuration property: $internal.application.main, com.huixian.flinkops.stream.application.OpsWafApp
INFO [] - Loading configuration property: taskmanager.memory.process.size, 1728m
INFO [] - Loading configuration property: jobmanager.archive.fs.dir, hdfs://nameservice1/user/streampark/historyserver/archive
INFO [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
INFO [] - Loading configuration property: pipeline.name, OpsWafApp
INFO [] - Loading configuration property: classloader.resolve-order, child-first
INFO [] - Loading configuration property: kubernetes.pod-template-file, /data/flink/flink/conf/flink-pod-template.yaml
INFO [] - Loading configuration property: execution.target, kubernetes-application
INFO [] - Loading configuration property: jobmanager.memory.process.size, 1600m
INFO [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO [] - Loading configuration property: taskmanager.rpc.port, 6122
INFO [] - Loading configuration property: kubernetes.container.image.pull-policy, Always
INFO [] - Loading configuration property: high-availability.zookeeper.quorum, 172.16.122.91:2181,172.16.122.92:2181,172.16.122.93:2181
INFO [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
INFO [] - Loading configuration property: $internal.pipeline.job-id, 8eea7f717c7de1b06b9902b159a28b9e
INFO [] - Loading configuration property: high-availability, ZOOKEEPER
INFO [] - Loading configuration property: pipeline.jars, local:///opt/flink/usrlib/streampark-flinkjob_OpsWafApp.jar
INFO [] - Loading configuration property: rest.address, localhost
INFO [] - Loading configuration property: kubernetes.taskmanager.node-selector, bu:flink
INFO [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
INFO [] - Final Master Memory configuration:
INFO [] - Total Process Memory: 1.563gb (1677721600 bytes)
INFO [] - Total Flink Memory: 1.125gb (1207959552 bytes)
INFO [] - JVM Heap: 1024.000mb (1073741824 bytes)
INFO [] - Off-heap: 128.000mb (134217728 bytes)
INFO [] - JVM Metaspace: 256.000mb (268435456 bytes)
INFO [] - JVM Overhead: 192.000mb (201326592 bytes)
2023-06-20 14:12:11,269 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2023-06-20 14:12:11,269 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Starting KubernetesApplicationClusterEntrypoint (Version: 1.16.1, Scala: 2.12, Rev:DeadD0d0, Date:1970-01-01T01:00:00+01:00)
2023-06-20 14:12:11,269 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - OS current user: flink
2023-06-20 14:12:11,881 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Current Hadoop/Kerberos user: flink
2023-06-20 14:12:11,881 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM: OpenJDK 64-Bit Server VM - Temurin - 1.8/25.362-b09
2023-06-20 14:12:11,882 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Arch: amd64
2023-06-20 14:12:11,882 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Maximum heap size: 989 MiBytes
2023-06-20 14:12:11,882 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JAVA_HOME: /opt/java/openjdk
2023-06-20 14:12:11,885 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Hadoop version: 3.0.0-cdh6.3.2
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - JVM Options:
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xmx1073741824
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Xms1073741824
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -XX:MaxMetaspaceSize=268435456
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog.file=/opt/flink/log/flink--kubernetes-application-0-k8s-node-147-26.log
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2023-06-20 14:12:11,886 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
2023-06-20 14:12:11,887 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2023-06-20 14:12:11,887 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Program Arguments:
2023-06-20 14:12:11,888 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -D
2023-06-20 14:12:11,888 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - jobmanager.memory.off-heap.size=134217728b
2023-06-20 14:12:11,888 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -D
2023-06-20 14:12:11,888 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - jobmanager.memory.jvm-overhead.min=201326592b
2023-06-20 14:12:11,889 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -D
2023-06-20 14:12:11,889 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - jobmanager.memory.jvm-metaspace.size=268435456b
2023-06-20 14:12:11,889 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -D
2023-06-20 14:12:11,889 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - jobmanager.memory.heap.size=1073741824b
2023-06-20 14:12:11,889 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - -D
2023-06-20 14:12:11,890 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - jobmanager.memory.jvm-overhead.max=201326592b
2023-06-20 14:12:11,890 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Classpath: /opt/flink/lib/commons-pool2-2.6.2.jar:/opt/flink/lib/connect-api-2.7.1.jar:/opt/flink/lib/druid-1.1.10.jar:/opt/flink/lib/flink-cep-1.16.1.jar:/opt/flink/lib/flink-connector-files-1.16.1.jar:/opt/flink/lib/flink-connector-hbase-2.2-1.16.1.jar:/opt/flink/lib/flink-connector-jdbc-1.16.1.jar:/opt/flink/lib/flink-connector-kafka-1.16.1.jar:/opt/flink/lib/flink-csv-1.16.1.jar:/opt/flink/lib/flink-json-1.16.1.jar:/opt/flink/lib/flink-queryable-state-runtime-1.16.1.jar:/opt/flink/lib/flink-scala_2.12-1.16.1.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-3.0.0-cdh6.3.2-10.0.jar:/opt/flink/lib/flink-shaded-zookeeper-3.5.9.jar:/opt/flink/lib/flink-sql-connector-mysql-cdc-2.3.0.jar:/opt/flink/lib/flink-table-api-java-bridge-1.16.1.jar:/opt/flink/lib/flink-table-api-java-uber-1.16.1.jar:/opt/flink/lib/flink-table-planner-loader-1.16.1.jar:/opt/flink/lib/flink-table-runtime-1.16.1.jar:/opt/flink/lib/hadoop-client-3.0.0-cdh6.3.2.jar:/opt/flink/lib/HikariCP-4.0.3.jar:/opt/flink/lib/jedis-3.4.1.jar:/opt/flink/lib/kafka-clients-3.2.3.jar:/opt/flink/lib/log4j-1.2-api-2.17.1.jar:/opt/flink/lib/log4j-api-2.17.1.jar:/opt/flink/lib/log4j-core-2.17.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.17.1.jar:/opt/flink/lib/mysql-connector-java-8.0.27.jar:/opt/flink/lib/taos-jdbcdriver-2.0.38-dist.jar:/opt/flink/lib/flink-dist-1.16.1.jar:::/opt/hadoop/conf:
2023-06-20 14:12:11,890 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - --------------------------------------------------------------------------------
2023-06-20 14:12:11,892 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Registered UNIX signal handlers for [TERM, HUP, INT]
2023-06-20 14:12:11,968 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: blob.server.port, 6124
2023-06-20 14:12:11,968 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: state.checkpoints.num-retained, 1
2023-06-20 14:12:11,968 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.hostnetwork.enabled, true
2023-06-20 14:12:11,969 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2023-06-20 14:12:11,969 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.cluster-id, opswaf
2023-06-20 14:12:11,969 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, localhost
2023-06-20 14:12:11,969 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.service-account, flink-service-account
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.cluster-id, opswaf
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.storageDir, hdfs:///user/flink/ha
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: $internal.application.program-args, --servers;axcloud
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.container.image, harbor-pre.jijiaban.net/flink/flink/streamparkflinkjob-flink-opswaf
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
2023-06-20 14:12:11,970 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.namespace, flink
2023-06-20 14:12:11,971 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2023-06-20 14:12:11,971 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.rest-service.exposed.type, NodePort
2023-06-20 14:12:11,971 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.jobmanager.port, 6123
2023-06-20 14:12:11,971 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.jobmanager.node-selector, bu:flink
2023-06-20 14:12:11,971 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: $internal.application.main, com.huixian.flinkops.stream.application.OpsWafApp
2023-06-20 14:12:11,972 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2023-06-20 14:12:11,972 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.archive.fs.dir, hdfs://nameservice1/user/streampark/historyserver/archive
2023-06-20 14:12:11,972 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
2023-06-20 14:12:11,972 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: pipeline.name, OpsWafApp
2023-06-20 14:12:11,973 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: classloader.resolve-order, child-first
2023-06-20 14:12:11,973 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.pod-template-file, /data/flink/flink/conf/flink-pod-template.yaml
2023-06-20 14:12:11,973 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: execution.target, kubernetes-application
2023-06-20 14:12:11,973 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2023-06-20 14:12:11,974 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
2023-06-20 14:12:11,974 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.rpc.port, 6122
2023-06-20 14:12:11,974 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.container.image.pull-policy, Always
2023-06-20 14:12:11,974 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.zookeeper.quorum, 172.16.122.91:2181,172.16.122.92:2181,172.16.122.93:2181
2023-06-20 14:12:11,975 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
2023-06-20 14:12:11,975 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: $internal.pipeline.job-id, 8eea7f717c7de1b06b9902b159a28b9e
2023-06-20 14:12:11,975 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability, ZOOKEEPER
2023-06-20 14:12:11,975 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: pipeline.jars, local:///opt/flink/usrlib/streampark-flinkjob_OpsWafApp.jar
2023-06-20 14:12:11,975 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: rest.address, localhost
2023-06-20 14:12:11,976 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: kubernetes.taskmanager.node-selector, bu:flink
2023-06-20 14:12:11,976 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading dynamic configuration property: jobmanager.memory.off-heap.size, 134217728b
2023-06-20 14:12:11,976 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading dynamic configuration property: jobmanager.memory.jvm-overhead.min, 201326592b
2023-06-20 14:12:11,976 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading dynamic configuration property: jobmanager.memory.jvm-metaspace.size, 268435456b
2023-06-20 14:12:11,977 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading dynamic configuration property: jobmanager.memory.heap.size, 1073741824b
2023-06-20 14:12:11,977 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading dynamic configuration property: jobmanager.memory.jvm-overhead.max, 201326592b
2023-06-20 14:12:12,608 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
```
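When reading a startup log like the one above, it can help to pull out only the high-availability settings. The following is a minimal sketch and not part of StreamPark or Flink: the class `HaConfigFromLog` and its `extract` method are hypothetical names, and the code assumes every property is logged as `Loading configuration property: <key>, <value>`, as it is in this log.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Flink or StreamPark class): filters the
// high-availability.* settings out of a JobManager startup log.
public class HaConfigFromLog {
    // Assumed log format, taken from the log in this issue.
    private static final String MARKER = "Loading configuration property: ";

    /** Returns every logged property whose key starts with "high-availability". */
    public static Map<String, String> extract(Iterable<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            int start = line.indexOf(MARKER);
            if (start < 0) {
                continue; // not a configuration line
            }
            String rest = line.substring(start + MARKER.length());
            int sep = rest.indexOf(", "); // key and value are separated by ", "
            if (sep < 0) {
                continue;
            }
            String key = rest.substring(0, sep);
            String value = rest.substring(sep + 2);
            if (key.startsWith("high-availability")) {
                result.put(key, value); // later duplicates overwrite earlier ones
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "INFO [] - Loading configuration property: high-availability, ZOOKEEPER",
            "INFO [] - Loading configuration property: high-availability.storageDir, hdfs:///user/flink/ha",
            "INFO [] - Loading configuration property: parallelism.default, 1");
        // Prints one "key = value" line per HA-related property.
        extract(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Applied to this log, such a filter would keep `high-availability`, `high-availability.cluster-id`, `high-availability.storageDir`, `high-availability.jobmanager.port`, and `high-availability.zookeeper.quorum`, which are the settings relevant to the reported behavior.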
### Screenshots
_No response_
### Are you willing to submit PR?
- [ ] Yes, I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@streampark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-streampark] J-dfy commented on issue #2807: [Bug] streampark make k8s zk High availability not work
Posted by "J-dfy (via GitHub)" <gi...@apache.org>.
J-dfy commented on issue #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807#issuecomment-1611009831
![image](https://github.com/apache/incubator-streampark/assets/91323259/4f8b9b12-bfc1-42a6-a0ea-6b92db641f04)
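I don't understand why the deployment has to be deleted when the pod terminates.
For now I have changed it as shown above, and high availability works.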
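The screenshot is not reproduced in this plain-text archive, so the exact change is not known. Purely as an illustration of the kind of guard the comment suggests, the sketch below deletes the deployment on JobManager pod termination only when no HA backend is configured. The class `DeploymentCleanupGuard` and both method names are hypothetical and are not StreamPark code; the only things taken from the issue are the `high-availability` key and its `ZOOKEEPER` value from the log.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the StreamPark implementation
// and NOT necessarily the change shown in the screenshot.
public class DeploymentCleanupGuard {

    /** True when the Flink config names an HA backend other than NONE. */
    static boolean haEnabled(Map<String, String> flinkConf) {
        String mode = flinkConf.getOrDefault("high-availability", "NONE").trim();
        return !mode.isEmpty() && !mode.equalsIgnoreCase("NONE");
    }

    /**
     * Assumed policy: with HA enabled, leave the deployment in place so that
     * Kubernetes can restart the JobManager and recover from ZooKeeper.
     */
    static boolean shouldDeleteDeployment(boolean jobManagerPodTerminated,
                                          Map<String, String> flinkConf) {
        return jobManagerPodTerminated && !haEnabled(flinkConf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("high-availability", "ZOOKEEPER"); // value from the log above
        System.out.println(shouldDeleteDeployment(true, conf));
    }
}
```

Whether skipping the cleanup is safe in every case (for example, when a job fails permanently) is a separate question that this sketch does not address.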
[GitHub] [incubator-streampark] wolfboys commented on issue #2807: [Bug] streampark make k8s zk High availability not work
Posted by "wolfboys (via GitHub)" <gi...@apache.org>.
wolfboys commented on issue #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807#issuecomment-1614155975
cc @Al-assad @MonsterChenzhuo