Posted to users@kafka.apache.org by Mackey star <Ma...@hotmail.com> on 2017/07/18 07:18:05 UTC

kafka cluster crashes periodically

 [2017-07-15 08:45:19,071] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@60192273 (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 3 was disconnected before the response was read
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
at scala.Option.foreach(Option.scala:236)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)


Re: kafka cluster crashes periodically

Posted by "mostolog@gmail.com" <mo...@gmail.com>.
IIUC, we are having similar issues with 0.10.2.1.

Already asked on another thread.


On 18/07/17 16:49, M. Manna wrote:
> Is this from 0.10.2.1? I have been running on both Windows and Linux but
> cannot see any issues.
>
> Anyone else?
>
> On Tue, 18 Jul 2017 at 3:31 pm, John Yost <ho...@gmail.com> wrote:
>
>> I saw this recently as well. This could result from either really long GC
>> pauses or slow ZooKeeper responses. The former can result from an oversized
>> heap or a sub-optimal GC algorithm/configuration.
>>
>> --John
>>
>> On Tue, Jul 18, 2017 at 3:18 AM, Mackey star <Ma...@hotmail.com>
>> wrote:
>>
>>> [2017-07-15 08:45:19,071] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@60192273 (kafka.server.ReplicaFetcherThread)
>>> java.io.IOException: Connection to 3 was disconnected before the response was read
>>> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
>>> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
>>> at scala.Option.foreach(Option.scala:236)
>>> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
>>> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
>>> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
>>> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
>>> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>>> at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
>>> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
>>> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>>> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>>> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>>> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>>>
>>>


Re: kafka cluster crashes periodically

Posted by "M. Manna" <ma...@gmail.com>.
Is this from 0.10.2.1? I have been running on both Windows and Linux but
cannot see any issues.

Anyone else?

On Tue, 18 Jul 2017 at 3:31 pm, John Yost <ho...@gmail.com> wrote:

> I saw this recently as well. This could result from either really long GC
> pauses or slow ZooKeeper responses. The former can result from an oversized
> heap or a sub-optimal GC algorithm/configuration.
>
> --John
>
> On Tue, Jul 18, 2017 at 3:18 AM, Mackey star <Ma...@hotmail.com>
> wrote:
>
> > [2017-07-15 08:45:19,071] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@60192273 (kafka.server.ReplicaFetcherThread)
> > java.io.IOException: Connection to 3 was disconnected before the response was read
> > at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> > at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> > at scala.Option.foreach(Option.scala:236)
> > at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> > at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> > at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> > at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> > at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> > at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> > at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> > at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> > at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> > at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> >
> >
>

Re: kafka cluster crashes periodically

Posted by John Yost <ho...@gmail.com>.
I saw this recently as well. This could result from either really long GC
pauses or slow ZooKeeper responses. The former can result from an oversized
heap or a sub-optimal GC algorithm/configuration.
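
A rough sketch of the broker-side settings worth reviewing in that case follows; the heap size, GC flags and timeout below are illustrative examples only, not recommendations for any particular cluster:

# JVM settings picked up by kafka-server-start.sh via kafka-run-class.sh.
# A fixed, moderate heap with G1 keeps pauses short; an oversized heap
# tends to make GC pauses longer.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
export KAFKA_JVM_PERFORMANCE_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"

# server.properties: if GC pauses approach the ZooKeeper session timeout,
# the broker's session expires and replica fetchers see disconnects like
# the one in the original post.
# zookeeper.session.timeout.ms=6000   (the default at the time; raise only after measuring pauses)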

--John

On Tue, Jul 18, 2017 at 3:18 AM, Mackey star <Ma...@hotmail.com> wrote:

> [2017-07-15 08:45:19,071] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@60192273 (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> at scala.Option.foreach(Option.scala:236)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
>

Re: On rebalance consumer group start from old offset

Posted by Justin Maltat <ju...@gmail.com>.
Solved by KAFKA-5600.

On Tue, Jul 18, 2017 at 18:51, Sabarish Sasidharan <sa...@gmail.com>
wrote:

> This is similar to a problem I am also grappling with. We store the
> processed offset for each partition in a state store, and after restarts we
> see that sometimes the start offset that Kafka Streams uses is a few
> thousand to a couple of million messages behind per partition. To compound
> it, this is not repeatable.
>
> Regards
> Sab
>
> On Tue, Jul 18, 2017 at 9:00 PM, Justin Maltat <ju...@gmail.com>
> wrote:
>
> > Hi
> >
> > On a 3-broker cluster, when one of the brokers comes back after a restart,
> > a group rebalance happens on our 2 consumers, which makes them restart
> > consuming from an old offset which is not the earliest. Looking at the
> > consumer offsets through the Kafka tools while everything is running, the
> > commits look good, but on rebalance the offsets change.
> >
> > If we stop and restart our consumers ourselves, they resume from the right
> > offset.
> >
> > Do you have any idea of what could be happening?
> >
> > Regards
> >
> > Justin Maltat
> >
> > >
> > >
> >
>

Re: On rebalance consumer group start from old offset

Posted by Sabarish Sasidharan <sa...@gmail.com>.
This is similar to a problem I am also grappling with. We store the
processed offset for each partition in a state store, and after restarts we
see that sometimes the start offset that Kafka Streams uses is a few
thousand to a couple of million messages behind per partition. To compound
it, this is not repeatable.

Regards
Sab

On Tue, Jul 18, 2017 at 9:00 PM, Justin Maltat <ju...@gmail.com>
wrote:

> Hi
>
> On a 3-broker cluster, when one of the brokers comes back after a restart,
> a group rebalance happens on our 2 consumers, which makes them restart
> consuming from an old offset which is not the earliest. Looking at the
> consumer offsets through the Kafka tools while everything is running, the
> commits look good, but on rebalance the offsets change.
>
> If we stop and restart our consumers ourselves, they resume from the right
> offset.
>
> Do you have any idea of what could be happening?
>
> Regards
>
> Justin Maltat
>
> >
> >
>

On rebalance consumer group start from old offset

Posted by Justin Maltat <ju...@gmail.com>.
Hi

On a 3-broker cluster, when one of the brokers comes back after a restart,
a group rebalance happens on our 2 consumers, which makes them restart
consuming from an old offset which is not the earliest. Looking at the
consumer offsets through the Kafka tools while everything is running, the
commits look good, but on rebalance the offsets change.

If we stop and restart our consumers ourselves, they resume from the right
offset.
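
For reference, a sketch of how those committed offsets can be checked from the command line; the group name and broker address below are placeholders, and exact flags vary a little between Kafka versions (on 0.10.x the tool may also need --new-consumer):

# Shows the committed offset, log-end offset and lag per partition for the group.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group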

Do you have any idea of what could be happening?

Regards

Justin Maltat

>
>

Re: kafka cluster crashes periodically

Posted by Ismael Juma <is...@juma.me.uk>.
Hi,

This is not really a crash; it just means that the connection to the leader
was disconnected. The follower will try to reconnect periodically. If the
leader is really down, the Controller will elect a new leader and the
follower will stop trying to reconnect to the old one.
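
As a quick sanity check (the topic name and ZooKeeper address below are placeholders), the topics tool that ships with Kafka shows which broker currently leads each partition and whether all replicas are in the ISR:

# Prints Leader, Replicas and Isr for each partition; if a broker keeps
# dropping out of Isr, check that broker's logs and GC behaviour.
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic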

Hope this helps.

Ismael

On Tue, Jul 18, 2017 at 12:18 AM, Mackey star <Ma...@hotmail.com>
wrote:

> [2017-07-15 08:45:19,071] WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@60192273 (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was read
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> at scala.Option.foreach(Option.scala:236)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
>