Posted to dev@flink.apache.org by "Robert Metzger (Jira)" <ji...@apache.org> on 2020/11/05 15:35:00 UTC

[jira] [Created] (FLINK-20005) "Kerberized YARN application" test unstable

Robert Metzger created FLINK-20005:
--------------------------------------

             Summary: "Kerberized YARN application" test unstable
                 Key: FLINK-20005
                 URL: https://issues.apache.org/jira/browse/FLINK-20005
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN, Runtime / Coordination
    Affects Versions: 1.12.0
            Reporter: Robert Metzger


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9066&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

The {{Running Kerberized YARN application on Docker test (default input)}} is failing in the build linked above.

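These are some of the exceptions spotted in the logs, first from the Flink JobManager (slot request timed out after 120000 ms), then from the YARN ResourceManager (AM container exited with exit code 2):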
{code}
2020-11-05T14:22:29.3315695Z Nov 05 14:22:29 2020-11-05 14:21:52,696 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map (2/3) (7806b7a7074425c5ff0906befd94e122) switched from SCHEDULED to FAILED on not deployed.
2020-11-05T14:22:29.3318307Z Nov 05 14:22:29 java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
2020-11-05T14:22:29.3320512Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_272]
2020-11-05T14:22:29.3322173Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_272]
2020-11-05T14:22:29.3323809Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_272]
2020-11-05T14:22:29.3325448Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_272]
2020-11-05T14:22:29.3331094Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_272]
2020-11-05T14:22:29.3332769Z Nov 05 14:22:29 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_272]
2020-11-05T14:22:29.3335736Z Nov 05 14:22:29 	at org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:195) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3342621Z Nov 05 14:22:29 	at org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:147) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3348463Z Nov 05 14:22:29 	at org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3353749Z Nov 05 14:22:29 	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3362495Z Nov 05 14:22:29 	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:87) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3366937Z Nov 05 14:22:29 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_272]
2020-11-05T14:22:29.3370686Z Nov 05 14:22:29 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_272]
2020-11-05T14:22:29.3380715Z Nov 05 14:22:29 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3384436Z Nov 05 14:22:29 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3387431Z Nov 05 14:22:29 	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3390333Z Nov 05 14:22:29 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3392937Z Nov 05 14:22:29 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3395430Z Nov 05 14:22:29 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3397949Z Nov 05 14:22:29 	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3401799Z Nov 05 14:22:29 	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3449637Z Nov 05 14:22:29 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3452289Z Nov 05 14:22:29 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3454833Z Nov 05 14:22:29 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3458801Z Nov 05 14:22:29 	at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3469564Z Nov 05 14:22:29 	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3472736Z Nov 05 14:22:29 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3475094Z Nov 05 14:22:29 	at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3478753Z Nov 05 14:22:29 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3497848Z Nov 05 14:22:29 	at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3516200Z Nov 05 14:22:29 	at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3519594Z Nov 05 14:22:29 	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3522331Z Nov 05 14:22:29 	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3524990Z Nov 05 14:22:29 	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3528102Z Nov 05 14:22:29 	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3530334Z Nov 05 14:22:29 Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
2020-11-05T14:22:29.3534080Z Nov 05 14:22:29 	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3536451Z Nov 05 14:22:29 	... 24 more
2020-11-05T14:22:29.3537535Z Nov 05 14:22:29 Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 120000 ms
2020-11-05T14:22:29.3540969Z Nov 05 14:22:29 	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3542868Z Nov 05 14:22:29 	... 24 more
{code}

{code}
2020-11-05T14:22:14.3964651Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: application_1604585664395_0001 State change from RUNNING to FINAL_SAVING on event=ATTEMPT_FAILED
2020-11-05T14:22:14.3966539Z Nov 05 14:22:13 20/11/05 14:21:55 INFO recovery.RMStateStore: Updating info for app: application_1604585664395_0001
2020-11-05T14:22:14.3968255Z Nov 05 14:22:13 20/11/05 14:21:55 INFO capacity.CapacityScheduler: Application Attempt appattempt_1604585664395_0001_000001 is done. finalState=FAILED
2020-11-05T14:22:14.3970618Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: Application application_1604585664395_0001 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1604585664395_0001_000001 exited with  exitCode: 2
2020-11-05T14:22:14.3973331Z Nov 05 14:22:13 Failing this attempt.Diagnostics: Exception from container-launch.
2020-11-05T14:22:14.3974475Z Nov 05 14:22:13 Container id: container_1604585664395_0001_01_000001
2020-11-05T14:22:14.3975384Z Nov 05 14:22:13 Exit code: 2
2020-11-05T14:22:14.3976946Z Nov 05 14:22:13 Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
2020-11-05T14:22:14.3979115Z Nov 05 14:22:13 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:112)
2020-11-05T14:22:14.3981642Z Nov 05 14:22:13 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:130)
2020-11-05T14:22:14.3983756Z Nov 05 14:22:13 	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:395)
2020-11-05T14:22:14.3985627Z Nov 05 14:22:13 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
2020-11-05T14:22:14.3987444Z Nov 05 14:22:13 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
2020-11-05T14:22:14.3989017Z Nov 05 14:22:13 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-11-05T14:22:14.3990393Z Nov 05 14:22:13 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2020-11-05T14:22:14.3991866Z Nov 05 14:22:13 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2020-11-05T14:22:14.3993133Z Nov 05 14:22:13 	at java.lang.Thread.run(Thread.java:748)
2020-11-05T14:22:14.3993947Z Nov 05 14:22:13 
2020-11-05T14:22:14.3994706Z Nov 05 14:22:13 Shell output: main : command provided 1
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)