Posted to users@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2014/04/09 01:58:08 UTC

Re: Exception on Startup. Is it bad or benign.

Hi Alex,

1. There is no "cool-off" time needed, since the rebalance should
complete before the server finishes shutting down.

2. The logs indicate possible data loss, which is "expected" if your
producer's request.required.acks config is 0 or 1 rather than -1. If you
do not want data loss, you can change that config in your producer
clients to -1 (or a value greater than 1), which effectively trades some
latency and availability for consistency.

Guozhang


On Tue, Apr 8, 2014 at 9:51 AM, Alex Gray <Al...@inin.com> wrote:

> We have 3 ZooKeeper nodes and 3 Kafka brokers, version 0.8.0.
>
> I gracefully shut down one of the Kafka brokers.
>
> Question 1:  Should I wait some time before starting the broker back up,
> or can I restart it as soon as possible?  In other words, do I have to wait
> for the other brokers to "re-balance (or whatever they do)" before starting
> it back up?
>
> Question 2: Every once in a while, I get the following exception when the
> Kafka broker is starting up.  Is this bad?  Searching around the
> newsgroups, I could not get a definitive answer. Examples:
> http://grokbase.com/t/kafka/users/13cq54bx5q/understanding-offsetoutofrangeexceptions
> http://grokbase.com/t/kafka/users/1413hp296y/trouble-recovering-after-a-crashed-broker
>
> Here is the exception:
> [2014-04-08 00:02:40,555] ERROR [KafkaApi-3] Error when processing fetch request for partition [KeyPairGenerated,0] offset 514 from consumer with correlation id 85 (kafka.server.KafkaApis)
> kafka.common.OffsetOutOfRangeException: Request for offset 514 but we only have log segments in the range 0 to 0.
>     at kafka.log.Log.read(Log.scala:429)
>     at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:388)
>     at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:334)
>     at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:330)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>     at scala.collection.immutable.Map$Map1.foreach(Map.scala:105)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
>     at scala.collection.immutable.Map$Map1.map(Map.scala:93)
>     at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:330)
>     at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:296)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:66)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:42)
>     at java.lang.Thread.run(Thread.java:722)
>
> And in the controller.log, I see every once in a while something like:
>
> controller.log.2014-04-01-04:[2014-04-01 04:42:41,713] WARN [OfflinePartitionLeaderSelector]: No broker in ISR is alive for [KeyPairGenerated,0]. Elect leader 3 from live brokers 3. There's potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
>
> (Which I found via: grep "data loss" *)
>
> I'm not a programmer: I am the admin for these machines, and I just want
> to make sure everything is cool.
> Oh, the server.properties has:
> default.replication.factor=3
>
> Thanks,
>
> Alex
>
>


-- 
-- Guozhang

Re: Exception on Startup. Is it bad or benign.

Posted by Joel Koshy <jj...@gmail.com>.
Do you see the data loss warning after a controlled shutdown? It isn't
very clear from your original message whether that is associated with
a shutdown operation.

We have a test setup similar to what you are describing - i.e.,
continuous rolling bounces of a test cluster (while there is traffic
flowing into it through mirror makers). For each broker, we wait until
the under-replicated-partition count on every broker is zero, then
proceed with a controlled shutdown of that broker.
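
For what it's worth, the per-broker gate is roughly the sketch below
(assuming a tooling version whose kafka-topics.sh supports --describe
--under-replicated-partitions; on 0.8.0 you may have to watch the
UnderReplicatedPartitions JMX metric instead):

    # sketch: block until no partition in the cluster is under-replicated
    ZK=zk1:2181,zk2:2181,zk3:2181
    while [ -n "$(bin/kafka-topics.sh --describe --under-replicated-partitions --zookeeper "$ZK")" ]; do
      echo "under-replicated partitions remain; waiting..."
      sleep 10
    done
    # now it is safe to issue a controlled shutdown of the next broker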

Thanks,

Joel



Re: Exception on Startup. Is it bad or benign.

Posted by Alex Gray <Al...@inin.com>.
Thanks Joel and Guozhang!
The data retention is 72 hours.
Graceful shutdown is done via SIGTERM, and controlled.shutdown.enable=true
is in the config.
I do see 'Controlled shutdown succeeded' in the broker log when I shut
it down.

With both your responses, I feel the brokers are indeed set up and
functioning correctly.

I want to ask the developers if I can write a script that gracefully
restarts each broker at random throughout the entire day, 24/7 :)

That should weed out any issues.
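
Roughly what I have in mind is the sketch below; the host names, the
"service kafka" unit, and the under-replication check are assumptions
about our environment (and kafka-topics.sh may not have that option on
0.8.0), not anything official:

    # random-rolling-restart.sh (sketch)
    BROKERS="broker1 broker2 broker3"      # assumed broker hostnames
    ZK="zk1:2181,zk2:2181,zk3:2181"        # assumed ZooKeeper connect string
    while true; do
      target=$(echo "$BROKERS" | tr ' ' '\n' | shuf -n 1)
      echo "restarting $target"
      ssh "$target" "sudo service kafka stop"    # SIGTERM -> controlled shutdown
      ssh "$target" "sudo service kafka start"
      # don't touch the next broker until all partitions are fully replicated again
      while [ -n "$(bin/kafka-topics.sh --describe --under-replicated-partitions --zookeeper "$ZK")" ]; do
        sleep 10
      done
      sleep $((RANDOM % 3600))                   # random pause, up to an hour
    done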

Thanks guys,

Alex



--
*Alex Gray* | DevOps Engineer, PureCloud
Phone +1.317.493.4291 | mobile +1.857.636.2810
*Interactive Intelligence*
Deliberately Innovative
www.inin.com <http://www.inin.com/>


Re: Exception on Startup. Is it bad or benign.

Posted by Joel Koshy <jj...@gmail.com>.
Also, when you say "graceful shutdown", do you mean that you issue a
SIGTERM? And do you have controlled.shutdown.enable=true in the broker
config? If that is set and the controlled shutdown succeeds (i.e., if
you see 'Controlled shutdown succeeded' in the broker log) then you
shouldn't be seeing the data loss warning in your controller log during
the shutdowns and restarts. Or are you seeing it at other times as well?
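
If it isn't set, the flag goes in server.properties; a sketch only
(whether the broker honors it depends on the exact version you're on):

    # server.properties (excerpt, sketch)
    controlled.shutdown.enable=true   # broker moves partition leadership away before exiting on SIGTERM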

WRT the OffsetOutOfRangeException: is your broker down for a long
period? Do you have a very low retention setting for your topics? Or
are you bringing up a consumer that has been down for a long period?
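
For reference, the two knobs involved look roughly like this (a sketch
with illustrative values, not anything pulled from your cluster):

    # server.properties (broker side, excerpt)
    log.retention.hours=168        # default; a very low value lets old offsets age out quickly

    # consumer config (0.8 high-level consumer, excerpt)
    auto.offset.reset=smallest     # on OffsetOutOfRangeException, restart from the earliest available offset
                                   # (default is "largest", which jumps to the newest offset instead)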

Thanks,

Joel
