Posted to notifications@kyuubi.apache.org by "khwj (via GitHub)" <gi...@apache.org> on 2023/06/08 17:01:46 UTC
[GitHub] [kyuubi] khwj opened a new issue, #4942: [Bug] Unable to create Spark executors, k8s label is too long, exceeding 63 characters
khwj opened a new issue, #4942:
URL: https://github.com/apache/kyuubi/issues/4942
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
### Search before asking
- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
### Describe the bug
The default Kubernetes driver pod name generated by [EngineRef.scala](https://github.com/apache/kyuubi/blob/9ff46a3c633534c2266ad8e6316b9fddaa024a6c/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala#LL129C32-L129C32) is longer than 63 characters. The driver pod itself can still be created, but its name is then applied as the `spark-driver-pod-name` label on the Spark executor pods, and Kubernetes label values must be no more than 63 characters, so executor creation fails with an invalid-label error.
To mitigate this, I have resorted to configuring `spark.app.name` to a shorter value. However, this workaround hampers our ability to identify specific Spark apps by session, user, or group (Kyuubi currently does not set the Spark user or group as Kubernetes labels).
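For reference, the workaround above is just a shorter app name passed through to the engine. A minimal sketch, assuming the Spark properties are forwarded via `kyuubi-defaults.conf` (the exact file and placement may differ per deployment):
```properties
# Hypothetical snippet: a short app name keeps the derived driver pod name, and
# therefore the spark-driver-pod-name label on executor pods, within 63 characters,
# at the cost of losing the user/session information encoded in the default name.
spark.app.name=kyuubi-sql
```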
### Affects Version(s)
1.7.1
### Kyuubi Server Log Output
_No response_
### Kyuubi Engine Log Output
```logtalk
++ id -u
+ myuid=999
++ id -g
+ mygid=1000
+ set +e
++ getent passwd 999
+ uidentry=hadoop:x:999:1000::/home/hadoop:/bin/bash
+ set -e
+ '[' -z hadoop:x:999:1000::/home/hadoop:/bin/bash ']'
+ '[' -n '' ']'
+ SPARK_K8S_CMD=driver
+ [[ driver == executor ]]
+ SPARK_CLASSPATH=':/usr/lib/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/etc/hadoop/conf::/usr/lib/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/usr/lib/spark/conf:/etc/hadoop/conf::/usr/lib/spark/jars/*'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ DISABLE_STDOUT_STDERR=0
+ '[' -z '' ']'
+ DISABLE_STDOUT_STDERR=1
+ DISABLE_PULLING_CONTAINER_FAILURE=0
+ '[' -z '' ']'
+ DISABLE_PULLING_CONTAINER_FAILURE=1
+ '[' -n '' ']'
+ '[' -n '' ']'
+ '[' -n '' ']'
++ dirname ''
++ dirname ''
+ mkdir -p . .
+ '[' -n '' ']'
+ (( 1 ))
+ (( DISABLE_PULLING_CONTAINER_FAILURE ))
+ exec /usr/bin/tini -s -- /usr/lib/spark/bin/spark-submit --conf spark.driver.bindAddress=10.177.40.182 --deploy-mode client --proxy-user khwunchai --properties-file /usr/lib/spark/conf/spark.properties --class org.apache.kyuubi.engine.spark.SparkSQLEngine spark-internal
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group
23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group
23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group
23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for TERM
23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for HUP
23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for INT
23/06/08 16:25:13 INFO HiveConf: Found configuration file file:/etc/spark/conf/hive-site.xml
23/06/08 16:25:13 INFO SparkContext: Running Spark version 3.3.1-amzn-0
23/06/08 16:25:13 INFO ResourceUtils: ==============================================================
23/06/08 16:25:13 INFO ResourceUtils: No custom resources configured for spark.driver.
23/06/08 16:25:13 INFO ResourceUtils: ==============================================================
23/06/08 16:25:13 INFO SparkContext: Submitted application: kyuubi_USER_SPARK_SQL_khwunchai_default_73bce6a4-df00-403e-bc5d-d1721e515f9d
23/06/08 16:25:13 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 7200, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/06/08 16:25:13 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
23/06/08 16:25:13 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/06/08 16:25:13 INFO SecurityManager: Changing view acls to: hadoop,khwunchai
23/06/08 16:25:13 INFO SecurityManager: Changing modify acls to: hadoop,khwunchai
23/06/08 16:25:13 INFO SecurityManager: Changing view acls groups to:
23/06/08 16:25:13 INFO SecurityManager: Changing modify acls groups to:
23/06/08 16:25:13 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop, khwunchai); groups with view permissions: Set(); users with modify permissions: Set(hadoop, khwunchai); groups with modify permissions: Set()
23/06/08 16:25:14 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/06/08 16:25:14 INFO SparkEnv: Registering MapOutputTracker
23/06/08 16:25:14 INFO SparkEnv: Registering BlockManagerMaster
23/06/08 16:25:14 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/06/08 16:25:14 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/06/08 16:25:14 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/06/08 16:25:14 INFO DiskBlockManager: Created local directory at /var/data/spark-fce5fc27-0a38-451f-b83f-e3712babead1/blockmgr-30c92157-0852-4216-baa8-2b7964aed441
23/06/08 16:25:14 INFO MemoryStore: MemoryStore started with capacity 1740.0 MiB
23/06/08 16:25:14 INFO SparkEnv: Registering OutputCommitCoordinator
23/06/08 16:25:14 INFO SubResultCacheManager: Sub-result caches are disabled.
23/06/08 16:25:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/06/08 16:25:15 INFO SparkContext: Added JAR file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar at spark://spark-187df6889bd35db6-driver-svc.spark-apps.svc:7078/jars/kyuubi-spark-sql-engine_2.12-1.7.1.jar with timestamp 1686241513713
23/06/08 16:25:15 INFO SparkContext: Added JAR local:///usr/share/aws/delta/lib/delta-core.jar at file:/usr/share/aws/delta/lib/delta-core.jar with timestamp 1686241513713
23/06/08 16:25:15 INFO SparkContext: Added JAR local:///usr/share/aws/delta/lib/delta-storage.jar at file:/usr/share/aws/delta/lib/delta-storage.jar with timestamp 1686241513713
23/06/08 16:25:15 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/06/08 16:25:16 INFO KubernetesClientUtils: Skip updating the Pod Labels, as the Label eks-subscription.amazonaws.com/emr.internal.id is already present.
23/06/08 16:25:16 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
23/06/08 16:25:16 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.
23/06/08 16:25:16 INFO FairSchedulableBuilder: Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1
23/06/08 16:25:16 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/06/08 16:25:16 WARN WatchConnectionManager: Exec Failure: HTTP 400, Status: 400 - Bad Request
23/06/08 16:25:16 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed.
23/06/08 16:25:16 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/spark-apps/pods?labelSelector=spark-app-selector%3Dspark-0968f860f58f469cba38861033b463bf%2Cspark-role%3Dexecutor%2Cspark-driver-pod-name%3Dkyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver&allowWatchBookmarks=true&watch=true. Message: Bad Request.
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.lambda$run$2(WatchConnectionManager.java:126) ~[kubernetes-client-5.12.2.jar:?]
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_362]
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_362]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_362]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_362]
at io.fabric8.kubernetes.client.okhttp.OkHttpWebSocketImpl$BuilderImpl$1.onFailure(OkHttpWebSocketImpl.java:66) ~[kubernetes-client-5.12.2.jar:?]
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571) ~[okhttp-3.12.12.jar:?]
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198) ~[okhttp-3.12.12.jar:?]
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) ~[okhttp-3.12.12.jar:?]
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[okhttp-3.12.12.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Suppressed: java.lang.Throwable: waiting here
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:169) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:180) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:96) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:572) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:547) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource.start(ExecutorPodsWatchSnapshotSource.scala:64) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.start(KubernetesClusterSchedulerBackend.scala:154) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.SparkContext.<init>(SparkContext.scala:586) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2708) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) ~[spark-sql_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) ~[spark-sql_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:253) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?]
at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:326) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?]
at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_362]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_362]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-client-api-3.3.3-amzn-2.jar:?]
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
23/06/08 16:25:16 INFO SparkUI: Stopped Spark web UI at http://spark-187df6889bd35db6-driver-svc.spark-apps.svc:4040
23/06/08 16:25:16 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
23/06/08 16:25:16 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
23/06/08 16:25:16 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/usr/lib/spark/conf) : spark-env.sh,hive-site.xml,log4j2.properties,metrics.properties
23/06/08 16:25:16 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/06/08 16:25:17 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/spark-apps/pods. Message: Pod "kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1" is invalid: metadata.labels: Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=metadata.labels, message=Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1" is invalid: metadata.labels: Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:305) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
at io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61) ~[kubernetes-client-5.12.2.jar:?]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:412) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$37(ExecutorPodsAllocator.scala:376) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$37$adapted(ExecutorPodsAllocator.scala:369) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:369) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:143) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:143) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_362]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_362]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_362]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_362]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
```
### Kyuubi Server Configurations
```yaml
23/06/08 16:25:13 INFO SparkContext: Spark configuration:
spark.app.id=spark-0968f860f58f469cba38861033b463bf
spark.app.name=kyuubi_USER_SPARK_SQL_khwunchai_default_73bce6a4-df00-403e-bc5d-d1721e515f9d
spark.app.startTime=1686241513713
spark.app.submitTime=1686241513129
spark.authenticate=true
spark.blacklist.decommissioning.enabled=true
spark.blacklist.decommissioning.timeout=1h
spark.databricks.delta.schema.autoMerge.enabled=true
spark.decommissioning.timeout.threshold=20
spark.default.parallelism=8
spark.driver.bindAddress=10.177.40.182
spark.driver.blockManager.port=7079
spark.driver.cores=1
spark.driver.defaultJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70
spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar:/usr/share/aws/redshift/spark-redshift/lib/*
spark.driver.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.driver.host=spark-187df6889bd35db6-driver-svc.spark-apps.svc
spark.driver.memory=3600M
spark.driver.port=7078
spark.dynamicAllocation.cachedExecutorIdleTimeout=300s
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorAllocationRatio=0.33
spark.dynamicAllocation.initialExecutors=1
spark.dynamicAllocation.maxExecutors=2
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.eventLog.dir=s3://omise-data-platform-apps-staging/spark/logs
spark.eventLog.enabled=true
spark.executor.cores=1
spark.executor.defaultJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70 -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar:/usr/share/aws/redshift/spark-redshift/lib/*
spark.executor.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70 -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
spark.executor.memory=7200M
spark.executorEnv.SPARK_USER_NAME=khwunchai
spark.files.fetchFailure.unRegisterOutputOnHost=true
spark.hadoop.dynamodb.customAWSCredentialsProvider=*********(redacted)
spark.hadoop.fs.defaultFS=file:///
spark.hadoop.fs.s3.customAWSCredentialsProvider=*********(redacted)
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=2000
spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true
spark.hadoop.mapreduce.input.fileinputformat.list-status.num-threads=20
spark.history.fs.logDirectory=file:///var/log/spark/apps
spark.history.ui.port=18080
spark.hive.server2.thrift.resultset.default.fetch.size=1000
spark.jars=file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar,local:///usr/share/aws/delta/lib/delta-core.jar,local:///usr/share/aws/delta/lib/delta-storage.jar
spark.kryoserializer.buffer.max=256
spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-sparksql-engine
spark.kubernetes.authenticate.executor.serviceAccountName=kyuubi-sparksql-engine
spark.kubernetes.container.image=671219180197.dkr.ecr.ap-southeast-1.amazonaws.com/spark/emr-6.10.0:20230421
spark.kubernetes.container.image.pullPolicy=Always
spark.kubernetes.driver.label.kyuubi-unique-tag=73bce6a4-df00-403e-bc5d-d1721e515f9d
spark.kubernetes.driver.pod.name=kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver
spark.kubernetes.driver.podTemplateContainerName=spark-kubernetes-driver
spark.kubernetes.driver.podTemplateFile=/opt/kyuubi/conf/driver-template.yaml
spark.kubernetes.driver.request.cores=250m
spark.kubernetes.driverEnv.SPARK_USER_NAME=khwunchai
spark.kubernetes.executor.podNamePrefix=kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3
spark.kubernetes.executor.podTemplateContainerName=spark-kubernetes-executor
spark.kubernetes.executor.podTemplateFile=/opt/spark/pod-template/pod-spec-template.yml
spark.kubernetes.executor.request.cores=500m
spark.kubernetes.file.upload.path=s3://omise-data-platform-apps-staging/spark/uploads/
spark.kubernetes.memoryOverheadFactor=0.1
spark.kubernetes.namespace=spark-apps
spark.kubernetes.pyspark.pythonVersion=3
spark.kubernetes.resource.type=java
spark.kubernetes.submitInDriver=true
spark.kyuubi.client.ipAddress=192.168.1.101
spark.kyuubi.client.version=1.7.0
spark.kyuubi.credentials.hadoopfs.enabled=false
spark.kyuubi.credentials.hive.enabled=false
spark.kyuubi.engine.credentials=
spark.kyuubi.engine.share.level=USER
spark.kyuubi.engine.submit.time=1686241495689
spark.kyuubi.engine.type=SPARK_SQL
spark.kyuubi.frontend.connection.url.use.hostname=false
spark.kyuubi.frontend.protocols=THRIFT_BINARY,REST
spark.kyuubi.ha.addresses=zookeeper-headless.spark.svc.cluster.local
spark.kyuubi.ha.client.class=org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient
spark.kyuubi.ha.enabled=true
spark.kyuubi.ha.engine.ref.id=73bce6a4-df00-403e-bc5d-d1721e515f9d
spark.kyuubi.ha.namespace=/kyuubi_1.7.1_USER_SPARK_SQL/khwunchai/default
spark.kyuubi.ha.zookeeper.auth.type=NONE
spark.kyuubi.ha.zookeeper.client.port=2181
spark.kyuubi.ha.zookeeper.engine.auth.type=NONE
spark.kyuubi.ha.zookeeper.session.timeout=600000
spark.kyuubi.server.ipAddress=0.0.0.0
spark.kyuubi.session.connection.url=0.0.0.0:10009
spark.kyuubi.session.engine.idle.timeout=PT20M
spark.kyuubi.session.engine.initialize.timeout=120000
spark.kyuubi.session.real.user=khwunchai
spark.logConf=true
spark.master=k8s://https://kubernetes.default.svc:443
spark.redaction.regex=*********(redacted)
spark.repl.class.outputDir=/var/data/spark-fce5fc27-0a38-451f-b83f-e3712babead1/spark-163261a5-b242-4996-aa67-65212e84d128/repl-85bea369-fd1b-414f-93d2-357c947d6e52
spark.repl.local.jars=file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar,local:///usr/share/aws/delta/lib/delta-core.jar,local:///usr/share/aws/delta/lib/delta-storage.jar
spark.resourceManager.cleanupExpiredHost=true
spark.scheduler.mode=FAIR
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=false
spark.sql.adaptive.enabled=true
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
spark.sql.catalogImplementation=hive
spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions
spark.sql.execution.topKSortFallbackThreshold=10000
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.legacy.castComplexTypesToString.enabled=true
spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED
spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED
spark.sql.parquet.fs.optimized.committer.optimization-enabled=true
spark.sql.parquet.int96RebaseModeInRead=CORRECTED
spark.sql.parquet.int96RebaseModeInWrite=CORRECTED
spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter
spark.sql.sources.partitionColumnTypeInference.enabled=false
spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
spark.submit.deployMode=client
spark.submit.pyFiles=
spark.ui.enabled=true
spark.ui.port=4040
spark.yarn.heterogeneousExecutors.enabled=false
```
### Kyuubi Engine Configurations
_No response_
### Additional context
Spark version 3.3.1-amzn-0 (EMR Containers)
### Are you willing to submit PR?
- [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
[GitHub] [kyuubi] MrZsy commented on issue #4942: [Bug] Unable to create Spark executors, k8s label is too long, exceeding 63 characters
Posted by "MrZsy (via GitHub)" <gi...@apache.org>.
MrZsy commented on issue #4942:
URL: https://github.com/apache/kyuubi/issues/4942#issuecomment-1697207397
You can try adding a shorter `spark.app.name=` setting to the configuration file.
[GitHub] [kyuubi] pan3793 commented on issue #4942: [Bug] Unable to create Spark executors, k8s label is too long, exceeding 63 characters
Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on issue #4942:
URL: https://github.com/apache/kyuubi/issues/4942#issuecomment-1584249805
@khwj Spark is responsible for generating a valid name in such cases. If you search the Spark JIRA/PRs, you'll find several similar issues that have been fixed over time; 3.3.1 may not contain all of those fixes.
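For context, those fixes generally shorten or sanitize the generated names before they end up in resource names and label values. The sketch below only illustrates that idea and is not Spark's actual implementation; the object and method names are made up, and the 63-character limit comes from the Kubernetes label rule quoted in the error above.
```scala
// Illustration only (hypothetical helper, NOT Spark source code): trim a generated
// identifier so it is usable as a Kubernetes label value, i.e. at most 63 characters
// and starting/ending with an alphanumeric character.
object LabelValueSketch {
  private val MaxLabelLength = 63

  def toValidLabelValue(raw: String): String = {
    // Replace characters that are not allowed in label values.
    val cleaned = raw.map(c => if (c.isLetterOrDigit || c == '-' || c == '_' || c == '.') c else '-')
    // Enforce the length limit, then strip non-alphanumeric characters from both ends.
    cleaned
      .take(MaxLabelLength)
      .dropWhile(!_.isLetterOrDigit)
      .reverse.dropWhile(!_.isLetterOrDigit).reverse
  }

  def main(args: Array[String]): Unit = {
    // The failing label value from the log above would be cut down to 63 characters.
    println(toValidLabelValue(
      "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver"))
  }
}
```
Per the comment above, newer Spark releases already ship fixes along these lines, so upgrading Spark may resolve this without changes on the Kyuubi side.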
[GitHub] [kyuubi] github-actions[bot] commented on issue #4942: [Bug] Unable to create Spark executors, k8s label is too long, exceeding 63 characters
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4942:
URL: https://github.com/apache/kyuubi/issues/4942#issuecomment-1583036237
Hello @khwj,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.