Posted to users@kafka.apache.org by allen chan <al...@gmail.com> on 2016/07/01 00:06:55 UTC

Re: broker randomly shuts down

Anyone else have ideas?

This is still happening. I moved ZooKeeper off the broker server onto its own
dedicated VMs.
Kafka starts with 4G of heap and was nowhere near consuming that much when
it crashed.
I bumped up the ZooKeeper timeout settings, but that has not solved it.

I also disconnected all the producers and consumers. This points to something
between Kafka and ZooKeeper, right?

Again, the logs are no help as to why Kafka decided to shut itself down:
https://gist.github.com/allenmchan/f9331e54bb4fd77cc5bc0b031a7a6206




On Thu, Jun 2, 2016 at 4:22 PM, Russ Lavoie <ru...@gmail.com> wrote:

> What about in dmesg?  I have run into this issue and it was the OOM
> killer.  I also ran into a heap issue using too much of the direct memory
> (JVM).  Reducing the fetcher threads helped with that problem.
> On Jun 2, 2016 12:19 PM, "allen chan" <al...@gmail.com>
> wrote:
>
> > Hi Tom,
> >
> > That is one of the first things that I checked. Active memory never goes
> > above 50% of overall available. File cache uses the rest of the memory, but
> > I do not think that triggers the OOM killer.
> > Either way, there are no entries in /var/log/messages (CentOS) to show that
> > an OOM is happening.
> >
> > Thanks
> >
> > On Thu, Jun 2, 2016 at 5:36 AM, Tom Crayford <tc...@heroku.com>
> > wrote:
> >
> > > > That looks like somebody is killing the process. I'd suspect either the
> > > > Linux OOM killer or something else automatically killing the JVM for some
> > > > reason.
> > > >
> > > > For the OOM killer, assuming you're on Ubuntu, it's pretty easy to find in
> > > > /var/log/syslog (depending on your setup). I don't know about other
> > > > operating systems.
> > >
> > > > On Thu, Jun 2, 2016 at 5:54 AM, allen chan <allen.michael.chan@gmail.com>
> > > > wrote:
> > >
> > > > > I have an issue where my brokers randomly shut themselves down.
> > > > > I turned on debug in log4j.properties but still do not see a reason why
> > > > > the shutdown is happening.
> > > >
> > > > Anyone seen this behavior before?
> > > >
> > > > version 0.10.0
> > > > log4j.properties
> > > >     log4j.rootLogger=DEBUG, kafkaAppender
> > > > > * I tried TRACE level but I do not see any additional log messages
> > > >
> > > > snippet of log around shutdown
> > > > [2016-06-01 15:11:51,374] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:11:53,376] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:11:55,377] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:11:57,380] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:11:59,383] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:01,386] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:03,389] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]:
> > > > Removed 0 expired offsets in 0 milliseconds.
> > > > (kafka.coordinator.GroupMetadataManager)
> > > > [2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]:
> > > > Removed 0 expired offsets in 0 milliseconds.
> > > > (kafka.coordinator.GroupMetadataManager)
> > > > [2016-06-01 15:12:05,390] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:07,393] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:09,396] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:11,399] DEBUG Got ping response for sessionid:
> > > > 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> > > > [2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down
> > > > (kafka.server.KafkaServer)
> > > > [2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down
> > > > (kafka.server.KafkaServer)
> > > > [2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled
> > > > shutdown (kafka.server.KafkaServer)
> > > > [2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled
> > > > shutdown (kafka.server.KafkaServer)
> > > > [2016-06-01 15:12:13,338] DEBUG Added sensor with name
> > > connections-closed:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > > [2016-06-01 15:12:13,338] DEBUG Added sensor with name
> > > connections-created:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > > [2016-06-01 15:12:13,338] DEBUG Added sensor with name
> > > bytes-sent-received:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > > [2016-06-01 15:12:13,338] DEBUG Added sensor with name bytes-sent:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > > [2016-06-01 15:12:13,339] DEBUG Added sensor with name
> bytes-received:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > > [2016-06-01 15:12:13,339] DEBUG Added sensor with name select-time:
> > > > (org.apache.kafka.common.metrics.Metrics)
> > > >
> > > > --
> > > > Allen Michael Chan
> > > >
> > >
> >
> >
> >
> > --
> > Allen Michael Chan
> >
>
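
For the dmesg and /var/log/messages checks suggested in the quoted replies above, here is a minimal Python sketch of what to look for. It assumes the kernel logs the OOM killer with its usual phrases ("invoked oom-killer", "Out of memory: Kill process" or "Killed process"); the function name find_oom_lines is made up for illustration, and the exact wording can vary between kernel versions.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer fires.
# Assumption: wording differs slightly across kernel versions, so this
# pattern is a starting point rather than an exhaustive match.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill(?:ed)? process")


def find_oom_lines(lines):
    """Return the lines (newline stripped) that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

It could be fed the output of dmesg or an open /var/log/messages (or /var/log/syslog) file, one line at a time. An empty result only means none of these phrases were found in that input, for example because the log has since rotated.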



-- 
Allen Michael Chan

Re: broker randomly shuts down

Posted by Shikhar Bhushan <sh...@confluent.io>.
This is somewhat specific to your runtime environment. Check whatever script
is being used to bring up Kafka, and where the stderr of the java command is
being redirected (hopefully not /dev/null!).
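
If the startup script is hard to trace, one way to find out where stderr is going is to ask the running process. This is a minimal sketch, assuming a Linux host with /proc and permission to inspect the broker process (same user or root). The name stderr_target and the proc_root parameter are made up for illustration.

```python
import os


def stderr_target(pid, proc_root="/proc"):
    """Return what a process's stderr (file descriptor 2) points to.

    Linux-only: resolves the /proc/<pid>/fd/2 symlink. proc_root is a
    hypothetical parameter, present only so the function can be tried
    against a fake directory tree.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "fd", "2"))
```

Calling stderr_target with the broker's pid returns a file path, /dev/null, a pipe, or a terminal device, depending on how Kafka was launched.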

On Thu, Jun 30, 2016 at 5:24 PM allen chan <al...@gmail.com>
wrote:

> Hi Shikhar,
> I do not see stderr log file anywhere. Can you point me to where kafka
> would write such a file?

Re: broker randomly shuts down

Posted by allen chan <al...@gmail.com>.
Hi Shikhar,
I do not see a stderr log file anywhere. Can you point me to where Kafka
would write such a file?

On Thu, Jun 30, 2016 at 5:10 PM, Shikhar Bhushan <sh...@confluent.io>
wrote:

> Perhaps it's a JVM crash? You might not see anything in the standard
> application-level logs, you'd need to look for the stderr.



-- 
Allen Michael Chan

Re: broker randomly shuts down

Posted by Shikhar Bhushan <sh...@confluent.io>.
Perhaps it's a JVM crash? You might not see anything in the standard
application-level logs; you'd need to look at stderr.
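
Besides stderr, a HotSpot JVM that crashes normally writes a fatal error log named hs_err_pid<pid>.log, by default in the process's working directory (the -XX:ErrorFile option can move it). Here is a minimal sketch for looking for those files. The function name find_jvm_crash_logs is made up for illustration, and which directories to search depends on how the broker is started.

```python
import glob
import os


def find_jvm_crash_logs(directories):
    """Return hs_err_pid*.log files found in the given directories, newest first."""
    hits = []
    for directory in directories:
        hits.extend(glob.glob(os.path.join(directory, "hs_err_pid*.log")))
    return sorted(hits, key=os.path.getmtime, reverse=True)
```

Good candidates are the directory the broker was started from and the system temp directory. An empty result does not rule out a crash, but finding one of these files would confirm it.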
