You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by "James (Jian Wu) [FDS Data Platform]" <ja...@coupang.com> on 2018/06/05 10:30:00 UTC

Link checkpoint failure issue

Hi:

  I am using Flink streaming continuous query.
  Scenario:
  Kafka-connector to consume a topic, and streaming incremental calculate 24 hours window data. And use processingTime as TimeCharacteristic. I am using RocksDB as StateBackend, file system is HDFS, and checkpoint interval is 5 minutes.

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);


RocksDBStateBackend rocksdb = new RocksDBStateBackend(checkPointPath, true);
rocksdb.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

env.setStateBackend(rocksdb);


env.enableCheckpointing(checkPointInterval);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(checkPointInterval);

  After I run the application for serval hours, the info log shows
2018-06-04 19:29:08,048 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 9 @ 1528108147985 for job 33313f186b439312bd09e5672e8af661.

 but not completed log, and the checkpoint failed
 Via web UI metrics, kafka commit offset stop increase, and kafka current offset still go ahead, wait for 2 hours, kafka stop consume message.

Then I enable debug log, and try to reproduce the issue,

During normal stage, the log shows there are DFSClient send data package


2018-06-04 19:23:58,933 DEBUG org.apache.hadoop.hdfs.DFSClient                              - /flink/cps/33313f186b439312bd09e5672e8af661/chk-8: masked=rwxr-xr-x
2018-06-04 19:23:58,934 DEBUG org.apache.hadoop.ipc.Client                                  - The ping interval is 60000 ms.
2018-06-04 19:23:58,934 DEBUG org.apache.hadoop.ipc.Client                                  - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
2018-06-04 19:23:58,935 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having connections 1
2018-06-04 19:23:58,936 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1709
2018-06-04 19:23:58,967 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1709
2018-06-04 19:23:58,967 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: mkdirs took 33ms
2018-06-04 19:23:58,967 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 8 @ 1528107838933 for job 33313f186b439312bd09e5672e8af661.
2018-06-04 19:24:00,054 DEBUG org.apache.hadoop.hdfs.DFSClient                              - /flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata: masked=rw-r--r--
2018-06-04 19:24:00,055 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1710
2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1710
2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: create took 6ms
2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.hdfs.DFSClient                              - computePacketChunkSize: src=/flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata, chunkSize=516, chunksPerPacket=126, packetSize=65016
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.LeaseRenewer                           - Lease renewer daemon for [DFSClient_NONMAPREDUCE_-866487647_111] with renew id 1 started
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient                              - DFSClient writeChunk allocating new packet seqno=0, src=/flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata, packetSize=65016, chunksPerPacket=126, bytesCurBlock=0
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient                              - DFSClient flush(): bytesCurBlock=6567 lastFlushOffset=0 createNewBlock=false
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Queued packet 0
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Waiting for ack for: 0
2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Allocating new block
2018-06-04 19:24:00,062 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1711
2018-06-04 19:24:00,068 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1711
2018-06-04 19:24:00,068 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: addBlock took 6ms
2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient                              - pipeline = DatanodeInfoWithStorage[10.10.22.51:50010,DS-a67bc965-fb34-4d18-bfb6-4771a485ae84,DISK]
2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient                              - pipeline = DatanodeInfoWithStorage[10.10.22.52:50010,DS-37e4ccb5-9ea1-4e11-90c4-7052aec50282,DISK]
2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient                              - pipeline = DatanodeInfoWithStorage[10.10.9.21:50010,DS-129977b5-a5c7-4e2f-827a-e4ef9fa22647,DISK]
2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Connecting to datanode 10.10.22.51:50010
2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Send buf size 124928








And after send data package, there are some IPC operations:




2018-06-04 19:24:00,507 DEBUG org.apache.hadoop.hdfs.DFSClient                              - DFSClient seqno: 866 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 223700
2018-06-04 19:24:00,507 DEBUG org.apache.hadoop.hdfs.DFSClient                              - DataStreamer block BP-740685118-10.10.22.50-1493792592687:blk_1160534861_86796613 sending packet packet seqno: 867 offsetInBlock: 119274 lastPacketInBlock: true lastByteOffsetInBlock: 119274
2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.hdfs.DFSClient                              - DFSClient seqno: 867 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 2367379
2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.hdfs.DFSClient                              - Closing old block BP-740685118-10.10.22.50-1493792592687:blk_1160534861_86796613
2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1713
2018-06-04 19:24:00,547 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1713
2018-06-04 19:24:00,547 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: complete took 36ms
2018-06-04 19:24:00,548 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1714
2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1714
2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: delete took 23ms
2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1715
2018-06-04 19:24:00,611 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1715
2018-06-04 19:24:00,612 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: delete took 41ms
2018-06-04 19:24:00,612 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1716
2018-06-04 19:24:00,635 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1716
2018-06-04 19:24:00,635 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: delete took 23ms








The abnormal stage shows there are not any data to send, and following checkpoint interval became to 10 minutes, and not data to send to HDFS




2018-06-04 19:24:07,985 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 8 for job 33313f186b439312bd09e5672e8af661 (7080351 bytes in 1615 ms).
2018-06-04 19:24:17,919 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: closed
2018-06-04 19:24:17,920 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: stopped, remaining connections 0
2018-06-04 19:24:30,064 DEBUG org.apache.hadoop.hdfs.LeaseRenewer                           - Lease renewer daemon for [] with renew id 1 executed
2018-06-04 19:25:00,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer                           - Lease renewer daemon for [] with renew id 1 executed
2018-06-04 19:25:01,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer                           - Lease renewer daemon for [] with renew id 1 expired
2018-06-04 19:25:01,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer                           - Lease renewer daemon for [] with renew id 1 exited
2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.hdfs.DFSClient                              - /flink/cps/33313f186b439312bd09e5672e8af661/chk-9: masked=rwxr-xr-x
2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.ipc.Client                                  - The ping interval is 60000 ms.
2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.ipc.Client                                  - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
2018-06-04 19:29:07,987 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having connections 1
2018-06-04 19:29:07,988 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1990
2018-06-04 19:29:08,048 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1990
2018-06-04 19:29:08,048 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine                       - Call: mkdirs took 62ms
2018-06-04 19:29:08,048 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 9 @ 1528108147985 for job 33313f186b439312bd09e5672e8af661.
2018-06-04 19:29:17,988 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: closed
2018-06-04 19:29:17,988 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: stopped, remaining connections 0
2018-06-04 19:39:08,049 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 9 of job 33313f186b439312bd09e5672e8af661 expired before completing.
2018-06-04 19:39:08,050 DEBUG org.apache.hadoop.hdfs.DFSClient                              - /flink/cps/33313f186b439312bd09e5672e8af661/chk-10: masked=rwxr-xr-x
2018-06-04 19:39:08,050 DEBUG org.apache.hadoop.ipc.Client                                  - The ping interval is 60000 ms.
2018-06-04 19:39:08,051 DEBUG org.apache.hadoop.ipc.Client                                  - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
2018-06-04 19:39:08,053 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having connections 1
2018-06-04 19:39:08,054 DEBUG org.apache.hadoop.ipc.Client                                  - IPC Client (2045458324) connection to fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1991




Then I disable checkpoint, currently the application run for more than 10 hours, kafka consume message as normal as expected.




My link version is 1.5.0.




Because of there are not any exception/error information in log, could anyone give me some suggestion.




Thanks




James




Re: Link checkpoint failure issue

Posted by Chesnay Schepler <ch...@apache.org>.
Can you provide us with the TaskManager logs?

On 05.06.2018 12:30, James (Jian Wu) [FDS Data Platform] wrote:
>
> Hi:
>
>   I am using Flink streaming continuous query.
>
>   Scenario:
>
> Kafka-connector to consume a topic, and streaming incremental 
> calculate 24 hours window data. And use processingTime as 
> TimeCharacteristic. I am using RocksDB as StateBackend, file system is 
> HDFS, and checkpoint interval is 5 minutes.
>
> env.setStreamTimeCharacteristic(TimeCharacteristic./ProcessingTime/);
>
>
> RocksDBStateBackend rocksdb = new RocksDBStateBackend(checkPointPath, 
> true);
> rocksdb.setPredefinedOptions(PredefinedOptions./FLASH_SSD_OPTIMIZED/);
>
> env.setStateBackend(rocksdb);
>
>
> env.enableCheckpointing(checkPointInterval);
> env.getCheckpointConfig().setMinPauseBetweenCheckpoints(checkPointInterval);
>
>   After I run the application for serval hours, the info log shows
>
> 2018-06-04 19:29:08,048 INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator   - 
> Triggering checkpoint 9 @ 1528108147985 for job 
> 33313f186b439312bd09e5672e8af661.
>
>  but not completed log, and the checkpoint failed
>
>  Via web UI metrics, kafka commit offset stop increase, and kafka 
> current offset still go ahead, wait for 2 hours, kafka stop consume 
> message.
>
> Then I enable debug log, and try to reproduce the issue,
>
> *During normal stage, the log shows there are DFSClient send data package*
>
> 2018-06-04 19:23:58,933 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - /flink/cps/33313f186b439312bd09e5672e8af661/chk-8: 
> masked=rwxr-xr-x
>
> 2018-06-04 19:23:58,934 DEBUG org.apache.hadoop.ipc.Client             
>         - The ping interval is 60000 ms.
>
> 2018-06-04 19:23:58,934 DEBUG org.apache.hadoop.ipc.Client             
>         - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
>
> 2018-06-04 19:23:58,935 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having 
> connections 1
>
> 2018-06-04 19:23:58,936 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1709
>
> 2018-06-04 19:23:58,967 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1709
>
> 2018-06-04 19:23:58,967 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: mkdirs took 33ms
>
> 2018-06-04 19:23:58,967 INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator   - 
> Triggering checkpoint 8 @ 1528107838933 for job 
> 33313f186b439312bd09e5672e8af661.
>
> 2018-06-04 19:24:00,054 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - 
> /flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata: 
> masked=rw-r--r--
>
> 2018-06-04 19:24:00,055 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1710
>
> 2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1710
>
> 2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: create took 6ms
>
> 2018-06-04 19:24:00,060 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - computePacketChunkSize: 
> src=/flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata, 
> chunkSize=516, chunksPerPacket=126, packetSize=65016
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.LeaseRenewer     
>                 - Lease renewer daemon for 
> [DFSClient_NONMAPREDUCE_-866487647_111] with renew id 1 started
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - DFSClient writeChunk allocating new packet seqno=0, 
> src=/flink/cps/33313f186b439312bd09e5672e8af661/chk-8/_metadata, 
> packetSize=65016, chunksPerPacket=126, bytesCurBlock=0
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - DFSClient flush(): bytesCurBlock=6567 lastFlushOffset=0 
> createNewBlock=false
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Queued packet 0
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Waiting for ack for: 0
>
> 2018-06-04 19:24:00,061 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Allocating new block
>
> 2018-06-04 19:24:00,062 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1711
>
> 2018-06-04 19:24:00,068 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1711
>
> 2018-06-04 19:24:00,068 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: addBlock took 6ms
>
> 2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - pipeline = 
> DatanodeInfoWithStorage[10.10.22.51:50010,DS-a67bc965-fb34-4d18-bfb6-4771a485ae84,DISK]
>
> 2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - pipeline = 
> DatanodeInfoWithStorage[10.10.22.52:50010,DS-37e4ccb5-9ea1-4e11-90c4-7052aec50282,DISK]
>
> 2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - pipeline = 
> DatanodeInfoWithStorage[10.10.9.21:50010,DS-129977b5-a5c7-4e2f-827a-e4ef9fa22647,DISK]
>
> 2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Connecting to datanode 10.10.22.51:50010
>
> 2018-06-04 19:24:00,069 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Send buf size 124928
>
>
>
>
>
>
>
>
>
> *And after send data package, there are some IPC operations:*
>
>
>
>
>
> 2018-06-04 19:24:00,507 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - DFSClient seqno: 866 reply: SUCCESS reply: SUCCESS 
> reply: SUCCESS downstreamAckTimeNanos: 223700
>
> 2018-06-04 19:24:00,507 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - DataStreamer block 
> BP-740685118-10.10.22.50-1493792592687:blk_1160534861_86796613 sending 
> packet packet seqno: 867 offsetInBlock: 119274 lastPacketInBlock: true 
> lastByteOffsetInBlock: 119274
>
> 2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - DFSClient seqno: 867 reply: SUCCESS reply: SUCCESS 
> reply: SUCCESS downstreamAckTimeNanos: 2367379
>
> 2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - Closing old block 
> BP-740685118-10.10.22.50-1493792592687:blk_1160534861_86796613
>
> 2018-06-04 19:24:00,511 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1713
>
> 2018-06-04 19:24:00,547 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1713
>
> 2018-06-04 19:24:00,547 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: complete took 36ms
>
> 2018-06-04 19:24:00,548 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1714
>
> 2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1714
>
> 2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: delete took 23ms
>
> 2018-06-04 19:24:00,571 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1715
>
> 2018-06-04 19:24:00,611 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1715
>
> 2018-06-04 19:24:00,612 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: delete took 41ms
>
> 2018-06-04 19:24:00,612 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1716
>
> 2018-06-04 19:24:00,635 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1716
>
> 2018-06-04 19:24:00,635 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: delete took 23ms
>
>
>
>
>
>
>
>
>
> *The abnormal stage shows there are not any data to send, and 
> following checkpoint interval became to 10 minutes, and not data to 
> send to HDFS*
>
>
>
>
>
> 2018-06-04 19:24:07,985 INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator   - 
> Completed checkpoint 8 for job 33313f186b439312bd09e5672e8af661 
> (7080351 bytes in 1615 ms).
>
> 2018-06-04 19:24:17,919 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: closed
>
> 2018-06-04 19:24:17,920 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: stopped, 
> remaining connections 0
>
> 2018-06-04 19:24:30,064 DEBUG org.apache.hadoop.hdfs.LeaseRenewer     
>                 - Lease renewer daemon for [] with renew id 1 executed
>
> 2018-06-04 19:25:00,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer     
>                 - Lease renewer daemon for [] with renew id 1 executed
>
> 2018-06-04 19:25:01,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer     
>                 - Lease renewer daemon for [] with renew id 1 expired
>
> 2018-06-04 19:25:01,068 DEBUG org.apache.hadoop.hdfs.LeaseRenewer     
>                 - Lease renewer daemon for [] with renew id 1 exited
>
> 2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - /flink/cps/33313f186b439312bd09e5672e8af661/chk-9: 
> masked=rwxr-xr-x
>
> 2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.ipc.Client             
>         - The ping interval is 60000 ms.
>
> 2018-06-04 19:29:07,986 DEBUG org.apache.hadoop.ipc.Client             
>         - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
>
> 2018-06-04 19:29:07,987 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having 
> connections 1
>
> 2018-06-04 19:29:07,988 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1990
>
> 2018-06-04 19:29:08,048 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin got value #1990
>
> 2018-06-04 19:29:08,048 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine 
>                     - Call: mkdirs took 62ms
>
> 2018-06-04 19:29:08,048 INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator   - 
> Triggering checkpoint 9 @ 1528108147985 for job 
> 33313f186b439312bd09e5672e8af661.
>
> 2018-06-04 19:29:17,988 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: closed
>
> 2018-06-04 19:29:17,988 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: stopped, 
> remaining connections 0
>
> 2018-06-04 19:39:08,049 INFO 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator   - 
> Checkpoint 9 of job 33313f186b439312bd09e5672e8af661 expired before 
> completing.
>
> 2018-06-04 19:39:08,050 DEBUG org.apache.hadoop.hdfs.DFSClient         
>             - /flink/cps/33313f186b439312bd09e5672e8af661/chk-10: 
> masked=rwxr-xr-x
>
> 2018-06-04 19:39:08,050 DEBUG org.apache.hadoop.ipc.Client             
>         - The ping interval is 60000 ms.
>
> 2018-06-04 19:39:08,051 DEBUG org.apache.hadoop.ipc.Client             
>         - Connecting to fds-hadoop-prod30-mp/10.10.22.50:8020
>
> 2018-06-04 19:39:08,053 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin: starting, having 
> connections 1
>
> 2018-06-04 19:39:08,054 DEBUG org.apache.hadoop.ipc.Client             
>         - IPC Client (2045458324) connection to 
> fds-hadoop-prod30-mp/10.10.22.50:8020 from fdsadmin sending #1991
>
>
>
>
>
> *Then I disable checkpoint, currently the application run for more 
> than 10 hours, kafka consume message as normal as expected.*
>
>
>
>
>
> My link version is 1.5.0.
>
>
>
>
>
> Because of there are not any exception/error information in log, could 
> anyone give me some suggestion.
>
>
>
>
>
> Thanks
>
>
>
>
>
> James
>