Posted to issues@streampark.apache.org by "J-dfy (via GitHub)" <gi...@apache.org> on 2023/06/20 08:20:44 UTC

[GitHub] [incubator-streampark] J-dfy opened a new issue, #2807: [Bug] streampark make k8s zk High availability not work

J-dfy opened a new issue, #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-streampark/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### Java Version
   
   1.8.0
   
   ### Scala Version
   
   2.12.x
   
   ### StreamPark Version
   
   2.0.0
   
   ### Flink Version
   
   1.16.1
   
   ### deploy mode
   
   kubernetes-application
   
   ### What happened
   
   1. Start a job with StreamPark, then kill the JobManager (with the `kill` command). ZooKeeper HA brings up a new JobManager, but within seconds the new JobManager is killed as well, so HA does not work.
   2. Start a job with StreamPark, shut down StreamPark, then kill the JobManager (with the `kill` command). ZooKeeper HA brings up a new JobManager, and no further problems occur.
   
   ### Error Exception
   
   ```log
   2023-06-20 14:12:11,261 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
   2023-06-20 14:12:11,267 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Preconfiguration: 
   2023-06-20 14:12:11,268 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - 
   
   
   RESOURCE_PARAMS extraction logs:
   jvm_params: -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456
   dynamic_configs: -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
   logs: INFO  [] - Loading configuration property: blob.server.port, 6124
   INFO  [] - Loading configuration property: state.checkpoints.num-retained, 1
   INFO  [] - Loading configuration property: kubernetes.hostnetwork.enabled, true
   INFO  [] - Loading configuration property: jobmanager.execution.failover-strategy, region
   INFO  [] - Loading configuration property: high-availability.cluster-id, opswaf
   INFO  [] - Loading configuration property: jobmanager.rpc.address, localhost
   INFO  [] - Loading configuration property: kubernetes.service-account, flink-service-account
   INFO  [] - Loading configuration property: kubernetes.cluster-id, opswaf
   INFO  [] - Loading configuration property: high-availability.storageDir, hdfs:///user/flink/ha
   INFO  [] - Loading configuration property: $internal.application.program-args, --servers;axcloud
   INFO  [] - Loading configuration property: kubernetes.container.image, harbor-pre.jijiaban.net/flink/flink/streamparkflinkjob-flink-opswaf
   INFO  [] - Loading configuration property: parallelism.default, 1
   INFO  [] - Loading configuration property: kubernetes.namespace, flink
   INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
   INFO  [] - Loading configuration property: kubernetes.rest-service.exposed.type, NodePort
   INFO  [] - Loading configuration property: high-availability.jobmanager.port, 6123
   INFO  [] - Loading configuration property: kubernetes.jobmanager.node-selector, bu:flink
   INFO  [] - Loading configuration property: $internal.application.main, com.huixian.flinkops.stream.application.OpsWafApp
   INFO  [] - Loading configuration property: taskmanager.memory.process.size, 1728m
   INFO  [] - Loading configuration property: jobmanager.archive.fs.dir, hdfs://nameservice1/user/streampark/historyserver/archive
   INFO  [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
   INFO  [] - Loading configuration property: pipeline.name, OpsWafApp
   INFO  [] - Loading configuration property: classloader.resolve-order, child-first
   INFO  [] - Loading configuration property: kubernetes.pod-template-file, /data/flink/flink/conf/flink-pod-template.yaml
   INFO  [] - Loading configuration property: execution.target, kubernetes-application
   INFO  [] - Loading configuration property: jobmanager.memory.process.size, 1600m
   INFO  [] - Loading configuration property: jobmanager.rpc.port, 6123
   INFO  [] - Loading configuration property: taskmanager.rpc.port, 6122
   INFO  [] - Loading configuration property: kubernetes.container.image.pull-policy, Always
   INFO  [] - Loading configuration property: high-availability.zookeeper.quorum, 172.16.122.91:2181,172.16.122.92:2181,172.16.122.93:2181
   INFO  [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
   INFO  [] - Loading configuration property: $internal.pipeline.job-id, 8eea7f717c7de1b06b9902b159a28b9e
   INFO  [] - Loading configuration property: high-availability, ZOOKEEPER
   INFO  [] - Loading configuration property: pipeline.jars, local:///opt/flink/usrlib/streampark-flinkjob_OpsWafApp.jar
   INFO  [] - Loading configuration property: rest.address, localhost
   INFO  [] - Loading configuration property: kubernetes.taskmanager.node-selector, bu:flink
   INFO  [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
   INFO  [] - Final Master Memory configuration:
   INFO  [] -   Total Process Memory: 1.563gb (1677721600 bytes)
   INFO  [] -     Total Flink Memory: 1.125gb (1207959552 bytes)
   INFO  [] -       JVM Heap:         1024.000mb (1073741824 bytes)
   INFO  [] -       Off-heap:         128.000mb (134217728 bytes)
   INFO  [] -     JVM Metaspace:      256.000mb (268435456 bytes)
   INFO  [] -     JVM Overhead:       192.000mb (201326592 bytes)
   
   2023-06-20 14:12:11,269 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
   2023-06-20 14:12:11,269 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Starting KubernetesApplicationClusterEntrypoint (Version: 1.16.1, Scala: 2.12, Rev:DeadD0d0, Date:1970-01-01T01:00:00+01:00)
   2023-06-20 14:12:11,269 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  OS current user: flink
   2023-06-20 14:12:11,881 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Current Hadoop/Kerberos user: flink
   2023-06-20 14:12:11,881 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JVM: OpenJDK 64-Bit Server VM - Temurin - 1.8/25.362-b09
   2023-06-20 14:12:11,882 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Arch: amd64
   2023-06-20 14:12:11,882 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Maximum heap size: 989 MiBytes
   2023-06-20 14:12:11,882 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JAVA_HOME: /opt/java/openjdk
   2023-06-20 14:12:11,885 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Hadoop version: 3.0.0-cdh6.3.2
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  JVM Options:
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Xmx1073741824
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Xms1073741824
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -XX:MaxMetaspaceSize=268435456
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog.file=/opt/flink/log/flink--kubernetes-application-0-k8s-node-147-26.log
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
   2023-06-20 14:12:11,886 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties
   2023-06-20 14:12:11,887 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
   2023-06-20 14:12:11,887 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Program Arguments:
   2023-06-20 14:12:11,888 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -D
   2023-06-20 14:12:11,888 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     jobmanager.memory.off-heap.size=134217728b
   2023-06-20 14:12:11,888 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -D
   2023-06-20 14:12:11,888 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     jobmanager.memory.jvm-overhead.min=201326592b
   2023-06-20 14:12:11,889 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -D
   2023-06-20 14:12:11,889 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     jobmanager.memory.jvm-metaspace.size=268435456b
   2023-06-20 14:12:11,889 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -D
   2023-06-20 14:12:11,889 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     jobmanager.memory.heap.size=1073741824b
   2023-06-20 14:12:11,889 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     -D
   2023-06-20 14:12:11,890 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -     jobmanager.memory.jvm-overhead.max=201326592b
   2023-06-20 14:12:11,890 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] -  Classpath: /opt/flink/lib/commons-pool2-2.6.2.jar:/opt/flink/lib/connect-api-2.7.1.jar:/opt/flink/lib/druid-1.1.10.jar:/opt/flink/lib/flink-cep-1.16.1.jar:/opt/flink/lib/flink-connector-files-1.16.1.jar:/opt/flink/lib/flink-connector-hbase-2.2-1.16.1.jar:/opt/flink/lib/flink-connector-jdbc-1.16.1.jar:/opt/flink/lib/flink-connector-kafka-1.16.1.jar:/opt/flink/lib/flink-csv-1.16.1.jar:/opt/flink/lib/flink-json-1.16.1.jar:/opt/flink/lib/flink-queryable-state-runtime-1.16.1.jar:/opt/flink/lib/flink-scala_2.12-1.16.1.jar:/opt/flink/lib/flink-shaded-hadoop-2-uber-3.0.0-cdh6.3.2-10.0.jar:/opt/flink/lib/flink-shaded-zookeeper-3.5.9.jar:/opt/flink/lib/flink-sql-connector-mysql-cdc-2.3.0.jar:/opt/flink/lib/flink-table-api-java-bridge-1.16.1.jar:/opt/flink/lib/flink-table-api-java-uber-1.16.1.jar:/opt/flink/lib/flink-table-planner-loader-1.16.1.jar:/opt/flink/lib/flink-table-runtime-1.16.1.jar:/opt/flink/lib/hadoop-client-3.0.0-cdh6.3.2.jar:/opt/flink/lib/HikariCP-4.0.3.jar:/opt/flink/lib/jedis-3.4.1.jar:/opt/flink/lib/kafka-clients-3.2.3.jar:/opt/flink/lib/log4j-1.2-api-2.17.1.jar:/opt/flink/lib/log4j-api-2.17.1.jar:/opt/flink/lib/log4j-core-2.17.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.17.1.jar:/opt/flink/lib/mysql-connector-java-8.0.27.jar:/opt/flink/lib/taos-jdbcdriver-2.0.38-dist.jar:/opt/flink/lib/flink-dist-1.16.1.jar:::/opt/hadoop/conf:
   2023-06-20 14:12:11,890 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - --------------------------------------------------------------------------------
   2023-06-20 14:12:11,892 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Registered UNIX signal handlers for [TERM, HUP, INT]
   2023-06-20 14:12:11,968 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: blob.server.port, 6124
   2023-06-20 14:12:11,968 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.checkpoints.num-retained, 1
   2023-06-20 14:12:11,968 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.hostnetwork.enabled, true
   2023-06-20 14:12:11,969 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
   2023-06-20 14:12:11,969 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.cluster-id, opswaf
   2023-06-20 14:12:11,969 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.address, localhost
   2023-06-20 14:12:11,969 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.service-account, flink-service-account
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.cluster-id, opswaf
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.storageDir, hdfs:///user/flink/ha
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: $internal.application.program-args, --servers;axcloud
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.container.image, harbor-pre.jijiaban.net/flink/flink/streamparkflinkjob-flink-opswaf
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
   2023-06-20 14:12:11,970 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.namespace, flink
   2023-06-20 14:12:11,971 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
   2023-06-20 14:12:11,971 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.rest-service.exposed.type, NodePort
   2023-06-20 14:12:11,971 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.jobmanager.port, 6123
   2023-06-20 14:12:11,971 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.jobmanager.node-selector, bu:flink
   2023-06-20 14:12:11,971 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: $internal.application.main, com.huixian.flinkops.stream.application.OpsWafApp
   2023-06-20 14:12:11,972 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
   2023-06-20 14:12:11,972 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.archive.fs.dir, hdfs://nameservice1/user/streampark/historyserver/archive
   2023-06-20 14:12:11,972 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.internal.jobmanager.entrypoint.class, org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint
   2023-06-20 14:12:11,972 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: pipeline.name, OpsWafApp
   2023-06-20 14:12:11,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: classloader.resolve-order, child-first
   2023-06-20 14:12:11,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.pod-template-file, /data/flink/flink/conf/flink-pod-template.yaml
   2023-06-20 14:12:11,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: execution.target, kubernetes-application
   2023-06-20 14:12:11,973 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
   2023-06-20 14:12:11,974 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
   2023-06-20 14:12:11,974 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.rpc.port, 6122
   2023-06-20 14:12:11,974 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.container.image.pull-policy, Always
   2023-06-20 14:12:11,974 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.zookeeper.quorum, 172.16.122.91:2181,172.16.122.92:2181,172.16.122.93:2181
   2023-06-20 14:12:11,975 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: internal.cluster.execution-mode, NORMAL
   2023-06-20 14:12:11,975 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: $internal.pipeline.job-id, 8eea7f717c7de1b06b9902b159a28b9e
   2023-06-20 14:12:11,975 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability, ZOOKEEPER
   2023-06-20 14:12:11,975 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: pipeline.jars, local:///opt/flink/usrlib/streampark-flinkjob_OpsWafApp.jar
   2023-06-20 14:12:11,975 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.address, localhost
   2023-06-20 14:12:11,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: kubernetes.taskmanager.node-selector, bu:flink
   2023-06-20 14:12:11,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading dynamic configuration property: jobmanager.memory.off-heap.size, 134217728b
   2023-06-20 14:12:11,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading dynamic configuration property: jobmanager.memory.jvm-overhead.min, 201326592b
   2023-06-20 14:12:11,976 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading dynamic configuration property: jobmanager.memory.jvm-metaspace.size, 268435456b
   2023-06-20 14:12:11,977 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading dynamic configuration property: jobmanager.memory.heap.size, 1073741824b
   2023-06-20 14:12:11,977 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading dynamic configuration property: jobmanager.memory.jvm-overhead.max, 201326592b
   2023-06-20 14:12:12,608 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
   ```
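
   For quick reference, these are the HA-related settings that the log above reports, collected in `flink-conf.yaml` form. Every value is copied from the "Loading configuration property" lines; nothing here is new. Note that the log ends with `RECEIVED SIGNAL 15: SIGTERM` roughly 1.3 seconds after the first entrypoint line (14:12:11,261 to 14:12:12,608), so the restarted JobManager is terminated from outside before it finishes starting.

   ```yaml
   # HA settings as loaded by the restarted JobManager (values taken from the log)
   high-availability: ZOOKEEPER
   high-availability.cluster-id: opswaf
   high-availability.storageDir: hdfs:///user/flink/ha
   high-availability.zookeeper.quorum: 172.16.122.91:2181,172.16.122.92:2181,172.16.122.93:2181
   high-availability.jobmanager.port: 6123
   # Deployment target
   execution.target: kubernetes-application
   kubernetes.cluster-id: opswaf
   kubernetes.namespace: flink
   ```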
   
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@streampark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-streampark] J-dfy commented on issue #2807: [Bug] streampark make k8s zk High availability not work

Posted by "J-dfy (via GitHub)" <gi...@apache.org>.
J-dfy commented on issue #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807#issuecomment-1611009831

   ![image](https://github.com/apache/incubator-streampark/assets/91323259/4f8b9b12-bfc1-42a6-a0ea-6b92db641f04)
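   I don't understand why the deployment has to be deleted when the pod terminates.
   For now I have changed it as shown in the screenshot, and with that change high availability works.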
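
The screenshot's contents are not reproduced in this archive, so the following is only a rough sketch of the idea described in the comment, not StreamPark's actual code. It assumes a hypothetical helper, `shouldDeleteDeployment`, that a pod-termination handler could consult: when HA is enabled, the Deployment is kept unless the job itself has reached a terminal state, so the JobManager brought up by ZooKeeper HA is not killed again. The class name, method name, and parameters are all illustrative assumptions.

```java
public class DeploymentCleanupPolicy {

    /**
     * Hypothetical decision helper (not part of StreamPark or Flink).
     *
     * @param highAvailabilityMode value of Flink's "high-availability" option, e.g. "ZOOKEEPER" or "NONE"; may be null
     * @param jobReachedTerminalState true if the job itself finished, failed for good, or was cancelled
     * @return true if the Deployment should be deleted after the JobManager pod terminated
     */
    public static boolean shouldDeleteDeployment(String highAvailabilityMode,
                                                 boolean jobReachedTerminalState) {
        boolean haEnabled = highAvailabilityMode != null
                && !highAvailabilityMode.trim().isEmpty()
                && !"NONE".equalsIgnoreCase(highAvailabilityMode.trim());

        if (haEnabled) {
            // With HA, a terminated pod alone is not a reason to clean up:
            // the replacement JobManager needs the Deployment to survive.
            return jobReachedTerminalState;
        }
        // Without HA, keep the behaviour described in this issue (assumption):
        // clean up as soon as the pod is gone.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeleteDeployment("ZOOKEEPER", false));
        System.out.println(shouldDeleteDeployment("ZOOKEEPER", true));
        System.out.println(shouldDeleteDeployment(null, false));
    }
}
```

Whether the real fix should key off the HA setting, the job state, or both is a design question for the maintainers; the sketch only shows that the two cases in the report (StreamPark running vs. shut down) differ in who deletes the Deployment.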




[GitHub] [incubator-streampark] wolfboys commented on issue #2807: [Bug] streampark make k8s zk High availability not work

Posted by "wolfboys (via GitHub)" <gi...@apache.org>.
wolfboys commented on issue #2807:
URL: https://github.com/apache/incubator-streampark/issues/2807#issuecomment-1614155975

   cc @Al-assad @MonsterChenzhuo 

