Posted to users@kafka.apache.org by Tianning Zhang <Ti...@zanox.com> on 2015/07/22 12:44:13 UTC
FATAL [Replica Manager on Broker XXX]: Error writing to highwatermark file:
Hi,
After upgrading from Kafka 2.10-0.8.1.1 to 2.10-0.8.2.1, I frequently encounter the exception below, which results in re-election of the partition leaders. The exception occurs every one to several hours.
We have a 3-node cluster. We did not see this problem while we were still using version 0.8.1.1. It also does not seem to be a problem with the limits on open sockets/files, as increasing the limits did not alleviate it.
Does anyone have a similar problem? What are the possible fixes?
I noticed that a similar issue was reported in (http://mail-archives.apache.org/mod_mbox/kafka-users/201501.mbox/%3CCAHwHRrW7qMLByv85pe7Vg17ksFrMSWtwHxGSjuHJawDFeWBuag@mail.gmail.com%3E) for version 0.8.1. Did you find a solution in the end?
Thanks!
Tianning
__________________________
FATAL [Replica Manager on Broker 3]: Error writing to highwatermark file: (kafka.server.ReplicaManager)
java.io.FileNotFoundException: /data/replication-offset-checkpoint.tmp (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
at java.io.FileOutputStream.<init>(FileOutputStream.java:165)
at kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:37)
at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Re
at kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(Re
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Tra
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scal
at kafka.server.ReplicaManager.checkpointHighWatermarks(ReplicaManager.scal
at kafka.server.ReplicaManager$$anonfun$1.apply$mcV$sp(ReplicaManager.scala
at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:
at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:35
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.acc
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
at java.lang.Thread.run(Thread.java:722)
--------------------------------------------------------------------------------
ZANOX AG | Headquarters: Berlin AG Charlottenburg | HRB 75459 | VAT identification number: DE 209981705
Executive Board: Mark Walters (CEO) | Adam Ross (COO) | Peter Loveday (CTO)
Chairman of the Supervisory Board: Dr. Andreas Wiele
This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail from your system. Any other use, copying, disclosure or distribution is strictly forbidden.
Re: FATAL [Replica Manager on Broker XXX]: Error writing to highwatermark file:
Posted by "tianning.zhang" <ti...@zanox.com>.
The problem described above has been solved in our data center. It appears
to have been a reliability problem with the storage system: after we moved
to a new storage provider, the exception disappeared.
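
For anyone who wants to check whether their storage shows the same intermittent behaviour, a small probe along the following lines may help. It is only a sketch and not Kafka code: it imitates the write-to-.tmp-then-rename pattern that the stack trace suggests, and the names probe_once, probe_loop and the file name "storage-probe" are made up for illustration. Run it as the same OS user as the broker, against the directory that holds replication-offset-checkpoint (/data in our case).

```python
import os
import time
from datetime import datetime, timezone


def probe_once(directory, name="storage-probe"):
    """Write a temp file, fsync it, then rename it over the final name.

    Returns None on success, or the OSError that was raised.
    The file name is hypothetical and chosen not to clash with Kafka's files.
    """
    tmp_path = os.path.join(directory, name + ".tmp")
    final_path = os.path.join(directory, name)
    try:
        with open(tmp_path, "w") as f:
            f.write("0\n")
            f.flush()
            os.fsync(f.fileno())  # force the data to the storage device
        os.replace(tmp_path, final_path)  # swap the temp file into place
        return None
    except OSError as exc:
        return exc


def probe_loop(directory, attempts, interval_seconds):
    """Run probe_once repeatedly; return a list of (UTC timestamp, error)."""
    failures = []
    for i in range(attempts):
        err = probe_once(directory)
        if err is not None:
            failures.append((datetime.now(timezone.utc).isoformat(), err))
        if i < attempts - 1:
            time.sleep(interval_seconds)
    return failures


if __name__ == "__main__":
    # /data is the directory from the stack trace; adjust to your log.dirs.
    for stamp, err in probe_loop("/data", attempts=60, interval_seconds=60):
        print(stamp, repr(err))
```

If the probe reports sporadic "Permission denied" or I/O errors while the directory permissions are unchanged, that points at the storage layer rather than at Kafka itself.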