You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2018/12/10 10:10:00 UTC

[jira] [Commented] (FLINK-10844) ArrayIndexOutOfBoundsException during checkpoint (Flink 1.5.2)

    [ https://issues.apache.org/jira/browse/FLINK-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714523#comment-16714523 ] 

Stefan Richter commented on FLINK-10844:
----------------------------------------

Hi, sorry for the late reply. I cannot answer that question with the given information. I would assume the problem is in the code that defines or modifies the key objects for your keyed state. Did you have any new insights on this issue in the meantime?

> ArrayIndexOutOfBoundsException during checkpoint (Flink 1.5.2)
> --------------------------------------------------------------
>
>                 Key: FLINK-10844
>                 URL: https://issues.apache.org/jira/browse/FLINK-10844
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>         Environment: Flink 1.5.2
> AWS EMR 5.17.0
>            Reporter: Daniel Harper
>            Priority: Minor
>
> We have seen this exception a few times in our production environment during  checkpoints, which causes the job to restart
> {code}
> AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2426 for operator x.x.x.x.streaming.pipeline.transformations.concurrentstreams.ConcurrentStreamsAggregator PERFORM COUNT DISTINCT OVER UUIDS FOR KEY -> ParDo(ToConcurrentStreamsResult)/ParMultiDo(ToConcurrentStreamsResult) -> JdbcIO.Write/ParDo(Write)/ParMultiDo(Write) (28/32).}
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1154)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:948)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:885)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not materialize checkpoint 2426 for operator x.x.x.x.streaming.pipeline.transformations.concurrentstreams.ConcurrentStreamsAggregator PERFORM COUNT DISTINCT OVER UUIDS FOR KEY -> ParDo(ToConcurrentStreamsResult)/ParMultiDo(ToConcurrentStreamsResult) -> JdbcIO.Write/ParDo(Write)/ParMultiDo(Write) (28/32).
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:943)
> 	... 6 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -22
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
> 	at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:47)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:854)
> 	... 5 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -22
> 	at org.apache.flink.runtime.state.heap.CopyOnWriteStateTableSnapshot.partitionEntriesByKeyGroup(CopyOnWriteStateTableSnapshot.java:162)
> 	at org.apache.flink.runtime.state.heap.CopyOnWriteStateTableSnapshot.writeMappingsInKeyGroup(CopyOnWriteStateTableSnapshot.java:178)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:697)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.performOperation(HeapKeyedStateBackend.java:641)
> 	at org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
> 	... 7 more
> {code}
> Looks like the index being computed here: 
> https://github.com/apache/flink/blob/release-1.5.2/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/CopyOnWriteStateTableSnapshot.java#L161
> Is returning -22, causing the AOOB exception to fire
> https://github.com/apache/flink/blob/release-1.5.2/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/CopyOnWriteStateTableSnapshot.java#L162



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)