Posted to issues@flink.apache.org by "Yuan Mei (Jira)" <ji...@apache.org> on 2020/07/02 11:23:00 UTC

[jira] [Commented] (FLINK-17912) KafkaShuffleITCase.testAssignedToPartitionEventTime: "Watermark should always increase"

    [ https://issues.apache.org/jira/browse/FLINK-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150196#comment-17150196 ] 

Yuan Mei commented on FLINK-17912:
----------------------------------

This issue is different from FLINK-17949, so I did another set of replays of the most recently failed test, "KafkaShuffleITCase.testSimpleEventTime", on my own Azure setup. Here are the results:

*I ran 2490 tests over more than 14 hours on Azure without a single test failure.*

*Together with the replays I did in FLINK-17949, I have run about 5500 tests over about 25 hours in total on Azure without a single test failure.*

Here are some more details:

The tests were replayed on a branch rebased on master 621240108f6146c1a85376484954dbb9daaa25f3.
 * 200 runs of KafkaShuffleITCase.testSimpleEventTime (with other tests commented out); succeeded in 3 hours; 200 tests in total

[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=45&view=results]
 * 343 runs of KafkaShuffleITCase.testSimpleEventTime (with other tests commented out); canceled after 4 hours by Azure; 343 tests in total

[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=46&view=results]
 * 341 runs of KafkaShuffleITCase.testSimpleEventTime (with other tests commented out); canceled after 4 hours by Azure; 341 tests in total

[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=47&view=results]
 * 146 runs with all the original tests in KafkaShuffleITCase; canceled after 4 hours by Azure; 1606 tests in total

[https://dev.azure.com/mymeiyuan/Flink/_build/results?buildId=48&view=results]

So, I will leave this ticket as it is for now and will close it if no more cases are reported.

> KafkaShuffleITCase.testAssignedToPartitionEventTime: "Watermark should always increase"
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-17912
>                 URL: https://issues.apache.org/jira/browse/FLINK-17912
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Kafka, Tests
>    Affects Versions: 1.11.0, 1.12.0
>            Reporter: Robert Metzger
>            Priority: Critical
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2062&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=0d9ad4c1-5629-5ffc-10dc-113ca91e23c5
> {code}
> 2020-05-22T21:16:24.7188044Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 2020-05-22T21:16:24.7188796Z 	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
> 2020-05-22T21:16:24.7189596Z 	at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:677)
> 2020-05-22T21:16:24.7190352Z 	at org.apache.flink.streaming.util.TestStreamEnvironment.execute(TestStreamEnvironment.java:81)
> 2020-05-22T21:16:24.7191261Z 	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1673)
> 2020-05-22T21:16:24.7191824Z 	at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:35)
> 2020-05-22T21:16:24.7192325Z 	at org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testAssignedToPartition(KafkaShuffleITCase.java:296)
> 2020-05-22T21:16:24.7192962Z 	at org.apache.flink.streaming.connectors.kafka.shuffle.KafkaShuffleITCase.testAssignedToPartitionEventTime(KafkaShuffleITCase.java:126)
> 2020-05-22T21:16:24.7193436Z 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-05-22T21:16:24.7193999Z 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-05-22T21:16:24.7194720Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-22T21:16:24.7195226Z 	at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-22T21:16:24.7195864Z 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 2020-05-22T21:16:24.7196574Z 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-05-22T21:16:24.7197511Z 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 2020-05-22T21:16:24.7198020Z 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-05-22T21:16:24.7198494Z 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 2020-05-22T21:16:24.7199128Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> 2020-05-22T21:16:24.7199689Z 	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> 2020-05-22T21:16:24.7200308Z 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2020-05-22T21:16:24.7200645Z 	at java.lang.Thread.run(Thread.java:748)
> 2020-05-22T21:16:24.7201029Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
> 2020-05-22T21:16:24.7201643Z 	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
> 2020-05-22T21:16:24.7202275Z 	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
> 2020-05-22T21:16:24.7202863Z 	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
> 2020-05-22T21:16:24.7203525Z 	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185)
> 2020-05-22T21:16:24.7204072Z 	at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179)
> 2020-05-22T21:16:24.7204618Z 	at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503)
> 2020-05-22T21:16:24.7205255Z 	at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:386)
> 2020-05-22T21:16:24.7205716Z 	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
> 2020-05-22T21:16:24.7206191Z 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-22T21:16:24.7206585Z 	at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-22T21:16:24.7207261Z 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
> 2020-05-22T21:16:24.7207736Z 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-05-22T21:16:24.7208234Z 	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
> 2020-05-22T21:16:24.7208728Z 	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-05-22T21:16:24.7209145Z 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-05-22T21:16:24.7209536Z 	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-05-22T21:16:24.7210039Z 	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-05-22T21:16:24.7210447Z 	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-05-22T21:16:24.7210839Z 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-05-22T21:16:24.7211241Z 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-22T21:16:24.7211701Z 	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-22T21:16:24.7212065Z 	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-05-22T21:16:24.7212442Z 	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-05-22T21:16:24.7212806Z 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-05-22T21:16:24.7213158Z 	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-05-22T21:16:24.7213494Z 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-05-22T21:16:24.7213841Z 	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-05-22T21:16:24.7214352Z 	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-05-22T21:16:24.7214790Z 	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-05-22T21:16:24.7215213Z 	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-05-22T21:16:24.7215629Z 	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-05-22T21:16:24.7216063Z 	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 2020-05-22T21:16:24.7216568Z Caused by: java.lang.IllegalStateException: Watermark should always increase: current : new 1590182136538:1590182136260
> 2020-05-22T21:16:24.7217362Z 	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> 2020-05-22T21:16:24.7218100Z 	at org.apache.flink.streaming.connectors.kafka.internal.KafkaShuffleFetcher$WatermarkHandler.checkAndGetNewWatermark(KafkaShuffleFetcher.java:278)
> 2020-05-22T21:16:24.7218884Z 	at org.apache.flink.streaming.connectors.kafka.internal.KafkaShuffleFetcher$WatermarkHandler.access$000(KafkaShuffleFetcher.java:262)
> 2020-05-22T21:16:24.7219535Z 	at org.apache.flink.streaming.connectors.kafka.internal.KafkaShuffleFetcher.partitionConsumerRecordsHandler(KafkaShuffleFetcher.java:133)
> 2020-05-22T21:16:24.7220126Z 	at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:141)
> 2020-05-22T21:16:24.7220643Z 	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:768)
> 2020-05-22T21:16:24.7221142Z 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> 2020-05-22T21:16:24.7221636Z 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> 2020-05-22T21:16:24.7222157Z 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)
> {code}


