Posted to jira@kafka.apache.org by "Tim Van Laer (JIRA)" <ji...@apache.org> on 2018/01/30 09:20:00 UTC

[jira] [Commented] (KAFKA-5998) /.checkpoint.tmp Not found exception

    [ https://issues.apache.org/jira/browse/KAFKA-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344725#comment-16344725 ] 

Tim Van Laer commented on KAFKA-5998:
-------------------------------------

It looks like I ran into the same issue. It doesn't seem to affect processing, however.

I'm running a Kafka Streams application with multiple state stores. Three Docker containers run the same application on the same machine, and they share the same disk on the host.

The application uses version 1.0.0 of the Kafka Streams library. 

All three instances first emitted the error at 4:26 and have kept producing it every 30 seconds since. With commit.interval.ms=30000, that error interval makes sense.
{code:java}
[WARN] 2018-01-30 08:47:23,724 LogContext$KafkaLogger - task [2_19] Failed to write checkpoint file to /var/kafka-streams-state/microservice-primaryproduction/2_19/.checkpoint:
java.io.FileNotFoundException: /var/kafka-streams-state/microservice-primaryproduction/2_19/.checkpoint.tmp (No such file or directory)
at java.io.FileOutputStream.open0(Native Method) ~[?:1.8.0_151]
at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[?:1.8.0_151]
at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[?:1.8.0_151]
at java.io.FileOutputStream.<init>(FileOutputStream.java:162) ~[?:1.8.0_151]
at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) ~[ms.jar:?]
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:320) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:306) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:208) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:299) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:289) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.AssignedTasks$2.apply(AssignedTasks.java:87) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:451) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:380) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:309) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1018) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:835) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774) [ms.jar:?]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744) [ms.jar:?]{code}

Failing tasks:
 * instance 1: 2_13, 2_14, 2_15, 2_16, 2_17, 2_18, 2_19, 2_20, 2_21, 2_22, 2_23
 * instance 2: 2_6, 2_7, 2_8, 2_9, 2_10, 2_11, 2_12
 * instance 3: 1_0, 2_0, 2_1, 2_2, 2_3, 2_4, 2_5

The interesting thing is that these directories do indeed *NOT* exist... The application runs as the root user in the container, and changing permissions didn't change the behaviour.
{code:java}
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_0
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_1
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_10
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_11
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_12
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_13
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_14
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_15
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_16
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_17
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_18
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_19
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_2
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_20
drwxr-xr-x 3 root root 53 Jan 30 09:06 /var/kafka_streams_state_disk/microservice-primaryproduction/0_21
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_22
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_3
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_4
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_5
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_6
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_7
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_8
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/0_9
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_0
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_1
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_2
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_3
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_4
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_5
drwxr-xr-x 3 root root 53 Jan 30 09:07 /var/kafka_streams_state_disk/microservice-primaryproduction/3_6
{code}
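
For what it's worth, the exception type is consistent with a missing task directory. Below is a minimal sketch, not the actual Kafka Streams code: the class name CheckpointTmpSketch, the helper writeTmpCheckpoint and the placeholder file content are all made up for illustration. It only shows that opening a FileOutputStream for a .checkpoint.tmp file fails with FileNotFoundException when the parent directory is absent, and succeeds once the directory exists.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CheckpointTmpSketch {

    // Hypothetical helper (not a Kafka Streams API): tries to create
    // <taskDir>/.checkpoint.tmp with a plain FileOutputStream.
    // Returns true if the file was written, false if opening it failed
    // with FileNotFoundException.
    static boolean writeTmpCheckpoint(File taskDir) throws IOException {
        File tmp = new File(taskDir, ".checkpoint.tmp");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            // Placeholder content; the real checkpoint format is not modelled here.
            out.write("placeholder\n".getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (FileNotFoundException e) {
            // FileOutputStream does not create missing parent directories.
            System.out.println("Failed to write " + tmp + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the state directory.
        Path stateDir = Files.createTempDirectory("kafka-streams-state");
        File taskDir = new File(stateDir.toFile(), "2_19");

        // The task directory does not exist yet, so the write fails.
        System.out.println("written: " + writeTmpCheckpoint(taskDir));

        // Once the task directory exists, the same write succeeds.
        Files.createDirectories(taskDir.toPath());
        System.out.println("written: " + writeTmpCheckpoint(taskDir));
    }
}
```

This says nothing about why the task directories disappeared in the first place; it only reproduces the symptom.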

I restarted all three instances. The problem disappeared, with no side effects (no lag, no state store restoration...).

> /.checkpoint.tmp Not found exception
> ------------------------------------
>
>                 Key: KAFKA-5998
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5998
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.0, 0.11.0.1
>            Reporter: Yogesh BG
>            Priority: Major
>         Attachments: 5998.v1.txt, 5998.v2.txt
>
>
> I have one Kafka broker and one Kafka Streams application running. It has been running for two days under a load of around 2500 msgs per second. On the third day I am getting the exception below for some of the partitions. I have 16 partitions; only 0_0 and 0_1 give this error.
> {{09:43:25.955 [ks_0_inst-StreamThread-6] WARN  o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to /data/kstreams/rtp-kafkastreams/0_1/.checkpoint:
> java.io.FileNotFoundException: /data/kstreams/rtp-kafkastreams/0_1/.checkpoint.tmp (No such file or directory)
>         at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171) ~[na:1.7.0_111]
>         at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> 09:43:25.974 [ks_0_inst-StreamThread-15] WARN  o.a.k.s.p.i.ProcessorStateManager - Failed to write checkpoint file to /data/kstreams/rtp-kafkastreams/0_0/.checkpoint:
> java.io.FileNotFoundException: /data/kstreams/rtp-kafkastreams/0_0/.checkpoint.tmp (No such file or directory)
>         at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221) ~[na:1.7.0_111]
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171) ~[na:1.7.0_111]
>         at org.apache.kafka.streams.state.internals.OffsetCheckpoint.write(OffsetCheckpoint.java:73) ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.ProcessorStateManager.checkpoint(ProcessorStateManager.java:324) ~[rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:267) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:201) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:260) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:254) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:322) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:415) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:314) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:700) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:683) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:523) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457) [rtp-kafkastreams-1.0-SNAPSHOT-jar-with-dependencies.jar:na]
> }}


