Posted to issues@flink.apache.org by "Stefan Richter (JIRA)" <ji...@apache.org> on 2018/11/23 16:41:01 UTC

[jira] [Resolved] (FLINK-10946) Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis

     [ https://issues.apache.org/jira/browse/FLINK-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Richter resolved FLINK-10946.
------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.7.1)
                       (was: 1.8.0)
                   1.7.0

Merged in:
master: 6e62ca2793
release-1.7: 8cf611e525

> Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10946
>                 URL: https://issues.apache.org/jira/browse/FLINK-10946
>             Project: Flink
>          Issue Type: Bug
>          Components: E2E Tests
>    Affects Versions: 1.7.0
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> The test failed 3 times in total across the nightly build, but succeeded 2 times after a restart. It did not fail locally for me.
> Here is a Travis build that runs it 500 times (reproducible):
> [https://travis-ci.org/azagrebin/flink/builds/457375100]
> {code:java}
> 2018-11-20 11:59:54,673 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (2/2) (e06b7022f2f2154f2a84206f068ff1fd).
> 2018-11-20 11:59:54,701 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (1/2).
> java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
> 	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
> 	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
> 	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-11-20 11:59:54,702 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: Custom Source -> Timestamps/Watermarks (2/2) (09528d6ab0e1ee87ed21e78139682b18) [CANCELED]
> 2018-11-20 11:59:54,703 INFO  org.apache.flink.runtime.taskmanager.Task                     - Ensuring all FileSystem streams are closed for task Source: Custom Source -> Timestamps/Watermarks (1/2) (c98146380e7f559ca18a183f4c0ef12d) [CANCELED]
> 2018-11-20 11:59:54,721 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_7d23c6ceabda05a587f0217e44f21301__2_2__uuid_59f43f20-768f-4117-9a3f-4a101a32b1d2.
> 2018-11-20 11:59:54,724 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_7d23c6ceabda05a587f0217e44f21301__1_2__uuid_f3ce6abc-52dc-4fa1-820e-e015daca418c.
> 2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8).
> 2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,732 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code TumblingWindowOperator (1/2) (3ea121867a218e10137c9bfe9ef991b8).
> 2018-11-20 11:59:54,769 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9).
> 2018-11-20 11:59:54,770 INFO  org.apache.flink.runtime.taskmanager.Task                     - TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,770 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code TumblingWindowOperator (2/2) (7a117e9804f71c6a995e31616ee05ec9).
> 2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307).
> 2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,789 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SemanticsCheckMapper -> Sink: Unnamed (1/2) (0a53c5fdd36dffde33ef032b6ffb5307).
> 2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a).
> 2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,813 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SemanticsCheckMapper -> Sink: Unnamed (2/2) (a9b08ced9d0d0ed0fb130336340b1a7a).
> 2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939).
> 2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,824 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowOperator (1/2) (0f6b13316bcfe13e46972d4dd0bc5939).
> 2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319).
> 2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,831 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowOperator (2/2) (4b0e279fcf1ce9afb5833731a3844319).
> 2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - Attempting to cancel task SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75).
> 2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75) switched from RUNNING to CANCELING.
> 2018-11-20 11:59:54,844 INFO  org.apache.flink.runtime.taskmanager.Task                     - Triggering cancellation of task code SlidingWindowCheckMapper -> Sink: Unnamed (1/2) (f269b77a22ea33b41d58a276154fff75).
> 2018-11-20 11:59:54,857 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_WindowOperator_0b63e7dd9fb1948bf052174673e64274__1_2__uuid_f01e3e36-565e-4680-b7d8-1f30e6a64474.
> 2018-11-20 11:59:54,742 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (1/2).
> java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
> 	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
> 	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
> 	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-11-20 11:59:54,871 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_5271c210329e73bd743f3227edfb3b71__1_2__uuid_0f79c71c-ffe0-4fb6-a16e-a7a7f1c8490d.
> 2018-11-20 11:59:54,747 INFO  org.apache.flink.streaming.api.operators.AbstractStreamOperator  - Could not complete snapshot 12 for operator ArtificalKeyedStateMapper_Kryo_and_Custom_Stateful (2/2).
> java.io.IOException: Cannot register Closeable, registry is already closed. Closing argument.
> 	at org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:123)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:111)
> 	at org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:105)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:164)
> 	at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
> 	at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
> 	at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
> 	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> 	at java.lang.Thread.run(Thread.java:748)
> 2018-11-20 11:59:54,875 INFO  org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend  - Deleting existing instance base directory /tmp/flink-io-71851605-f8f2-4d4e-83e7-9b69e0a879ef/job_41e7002f55a128f646117fc14cf858a1_op_StreamMap_5271c210329e73bd743f3227edfb3b71__2_2__uuid_f09e971f-8542-4f0b-9abe-644d702f235f.
> 2018-11-20 11:59:54,873 INFO  org.apache.flink.runtime.taskmanager.Task                     - ArtificalKeyedStateMapper_Avro -> ArtificalOperatorStateMapper (1/2) (3e5842c071706a82194c67143f6f4a60) switched from CANCELING to CANCELED.{code}
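
For readers unfamiliar with the message in the trace above, here is a minimal sketch of how a "registry is already closed" IOException can arise when a registry is closed (for example during task cancellation) before a snapshot tries to register a resource with it. SimpleCloseableRegistry and RegistryRaceDemo are hypothetical names made up for illustration; this is not Flink's AbstractCloseableRegistry, and it does not claim to be the root cause of this issue.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a closeable registry (not Flink's class).
class SimpleCloseableRegistry implements Closeable {
    private final Set<Closeable> registered = new HashSet<>();
    private boolean closed = false;

    // Registers a resource; if the registry is already closed,
    // closes the argument and throws instead.
    public synchronized void registerCloseable(Closeable c) throws IOException {
        if (closed) {
            c.close();
            throw new IOException(
                "Cannot register Closeable, registry is already closed. Closing argument.");
        }
        registered.add(c);
    }

    // Closes the registry and every resource registered so far.
    @Override
    public void close() throws IOException {
        Set<Closeable> toClose;
        synchronized (this) {
            if (closed) {
                return;
            }
            closed = true;
            toClose = new HashSet<>(registered);
            registered.clear();
        }
        for (Closeable c : toClose) {
            c.close();
        }
    }
}

class RegistryRaceDemo {
    // Simulates the ordering "cancel first, then snapshot":
    // returns the exception message, or null if registration succeeded.
    static String registerAfterClose() throws IOException {
        SimpleCloseableRegistry registry = new SimpleCloseableRegistry();
        registry.close(); // assumed cancellation path closes the registry
        try {
            registry.registerCloseable(() -> { }); // assumed snapshot path registers late
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(registerAfterClose());
    }
}
```

In this sketch the late registration fails deterministically. In a real job the two paths run on different threads, so whether the snapshot loses the race would depend on timing, which would be consistent with a test that only fails intermittently.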



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)