Posted to dev@zookeeper.apache.org by Patrick Hunt <ph...@apache.org> on 2015/05/08 18:46:43 UTC

Great post on PagerDuty today.

There's a great post on the PagerDuty blog today:
http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
There are some good comments on Hacker News too:
https://news.ycombinator.com/item?id=9509698

If I understand correctly, bug 1 is already fixed:
https://issues.apache.org/jira/browse/ZOOKEEPER-2146
The fix should be released in 3.4.7+.

However, the fix for bug 2
https://issues.apache.org/jira/browse/ZOOKEEPER-602
is only in 3.5 and not in 3.4.x.  Note my pushback in the comments on 602
regarding risk vs. reward. Evan makes a good case for including it. :-)

We should also recommend that folks run with
-XX:-HeapDumpOnOutOfMemoryError
I would think. That should have caused the JVM to restart when bug 1 was hit.

Thoughts? Hongchao, can you confirm that 2146 fixes bug 1?

Patrick

Re: Great post on PagerDuty today.

Posted by Rakesh Radhakrishnan <ra...@gmail.com>.
Nice article! Thanks for sharing.

Regarding ZK-602, ZK-1907, and ZK-2029:
I think we should be able to backport these changes. Since I
prepared and contributed these patches, let me try to recall the changes and
come up with a 3.4.7 patch next week. I will ping you all if I run into any
difficulties while doing this.

Best Regards,
Rakesh

On Fri, May 8, 2015 at 11:59 PM, Patrick Hunt <ph...@apache.org> wrote:

> Thanks Hongchao, I linked it to 602.
>
> Patrick

Re: Great post on PagerDuty today.

Posted by Patrick Hunt <ph...@apache.org>.
Thanks Hongchao, I linked it to 602.

Patrick

On Fri, May 8, 2015 at 11:17 AM, Hongchao Deng <hd...@cloudera.com> wrote:

> I believe bug #2 is fixed by ZOOKEEPER-1907

Re: Great post on PagerDuty today.

Posted by Hongchao Deng <hd...@cloudera.com>.
I believe bug #2 is fixed by ZOOKEEPER-1907

On Fri, May 8, 2015 at 10:46 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> No worries!  I got the gist of it.  :-)  It all makes sense, and like you
> said, I often use the 2 options at the same time.
>
> I filed ZOOKEEPER-2185 to track a documentation enhancement for this.
>
> --Chris Nauroth


-- 
*- Hongchao Deng*
*Software Engineer*

Re: Great post on PagerDuty today.

Posted by Chris Nauroth <cn...@hortonworks.com>.
No worries!  I got the gist of it.  :-)  It all makes sense, and like you
said, I often use the 2 options at the same time.

I filed ZOOKEEPER-2185 to track a documentation enhancement for this.

--Chris Nauroth




On 5/8/15, 10:30 AM, "Patrick Hunt" <ph...@apache.org> wrote:

>Chris, you're right, my bad. I often run with both. :-) You do need a
>supervisor regardless in case the JVM exits it should be restarted.
>http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_supervision
>
>Patrick


Re: Great post on PagerDuty today.

Posted by Patrick Hunt <ph...@apache.org>.
Chris, you're right, my bad. I often run with both. :-) You do need a
supervisor regardless: if the JVM exits, it should be restarted.
http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_supervision
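
The supervision idea can be sketched as a small restart loop. This is only an
illustration of the concept, not how upstart or monit work internally; the
names supervise, max_restarts, and delay are hypothetical, and the failing
child below merely stands in for a server process.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)   # blocks until the child exits
        exit_codes.append(code)
        if code == 0:
            break                     # clean shutdown: do not restart
        if attempt < max_restarts:
            time.sleep(delay)         # brief pause before restarting
    return exit_codes


if __name__ == "__main__":
    # Stand-in child that always fails; a real setup would launch the server.
    failing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing_child, max_restarts=2, delay=0.1))
```

A real process monitor would typically also log each exit and back off between
restarts; the restart cap here just keeps the sketch from looping forever.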

Patrick

On Fri, May 8, 2015 at 10:23 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Thanks for sharing!  I love reading articles like this that cover multiple
> layers of a system as part of an investigation.
>
> Can you clarify the comment about -XX:-HeapDumpOnOutOfMemoryError?  I
> believe this would not restart the JVM and instead would log the contents
> of the heap (which is still very valuable for post-mortem analysis).
> Would you also recommend something like -XX:OnOutOfMemoryError="kill %p",
> under the assumption that a process monitor like upstart or monit will
> bring it back up?
>
> --Chris Nauroth

Re: Great post on PagerDuty today.

Posted by Chris Nauroth <cn...@hortonworks.com>.
Thanks for sharing!  I love reading articles like this that cover multiple
layers of a system as part of an investigation.

Can you clarify the comment about -XX:-HeapDumpOnOutOfMemoryError?  I
believe this would not restart the JVM and instead would log the contents
of the heap (which is still very valuable for post-mortem analysis).
Would you also recommend something like -XX:OnOutOfMemoryError="kill %p",
under the assumption that a process monitor like upstart or monit will
bring it back up?
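
To make the difference between the two options concrete: HotSpot boolean
options are written -XX:+Name to switch them on and -XX:-Name to switch them
off, while options such as OnOutOfMemoryError take a value after "=". The
sketch below only classifies option strings and does not talk to a JVM; the
helper name describe_xx_flag is hypothetical.

```python
def describe_xx_flag(flag):
    """Classify a HotSpot -XX option string.

    Returns (name, value): value is True/False for boolean options
    (-XX:+Name / -XX:-Name), a string for -XX:Name=value options,
    and None when no value is given.
    """
    prefix = "-XX:"
    if not flag.startswith(prefix):
        raise ValueError("not an -XX option: %r" % flag)
    body = flag[len(prefix):]
    if body.startswith("+"):
        return body[1:], True      # '+' switches the option on
    if body.startswith("-"):
        return body[1:], False     # '-' switches the option off
    name, sep, value = body.partition("=")
    return name, (value if sep else None)


if __name__ == "__main__":
    for flag in (
        "-XX:-HeapDumpOnOutOfMemoryError",
        "-XX:+HeapDumpOnOutOfMemoryError",
        "-XX:OnOutOfMemoryError=kill %p",
    ):
        print(flag, "->", describe_xx_flag(flag))
```

Read this way, the spelling with a minus sign turns heap dumps off and the
spelling with a plus sign turns them on. Neither restarts the JVM by itself,
which is why an OnOutOfMemoryError command plus a process monitor is the
combination being asked about.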

--Chris Nauroth




On 5/8/15, 9:46 AM, "Patrick Hunt" <ph...@apache.org> wrote:

>There's a great post on Pager Duty today,
>http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
>some good comments on hackernews too
>https://news.ycombinator.com/item?id=9509698
>
>If I understand correctly bug1 is already fixed:
>https://issues.apache.org/jira/browse/ZOOKEEPER-2146
>should be released in 3.4.7+
>
>However bug2
>https://issues.apache.org/jira/browse/ZOOKEEPER-602
>is just in 3.5 and not 3.4.x.  Note my push back in the comments on 602 re
>risk vs reward. Evan makes a good case for including it. :-)
>
>We should also recommend that folks run with
>-XX:-HeapDumpOnOutOfMemoryError
>I would think. That should have caused the jvm to restart when bug1 was
>hit.
>
>Thoughts? Hongchao can you confirm that 2146 fixes bug 1?
>
>Patrick


Re: Great post on PagerDuty today.

Posted by Raúl Gutiérrez Segalés <rg...@itevenworks.net>.
On 8 May 2015 at 09:46, Patrick Hunt <ph...@apache.org> wrote:

> There's a great post on Pager Duty today,
>
> http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
> some good comments on hackernews too
> https://news.ycombinator.com/item?id=9509698
>
> If I understand correctly bug1 is already fixed:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2146
> should be released in 3.4.7+
>
> However bug2
> https://issues.apache.org/jira/browse/ZOOKEEPER-602
> is just in 3.5 and not 3.4.x.  Note my push back in the comments on 602 re
> risk vs reward. Evan makes a good case for including it. :-)
>
> We should also recommend that folks run with
> -XX:-HeapDumpOnOutOfMemoryError
> I would think. That should have caused the jvm to restart when bug1 was
> hit.
>
> Thoughts? Hongchao can you confirm that 2146 fixes bug 1?
>

While we are on the topic of bad input:
https://issues.apache.org/jira/browse/ZOOKEEPER-2186.

I have an internal patch for trunk and will backport it to 3.4 as well.


-rgs

Re: Great post on PagerDuty today.

Posted by Raúl Gutiérrez Segalés <rg...@itevenworks.net>.
Hi,

On 8 May 2015 at 09:46, Patrick Hunt <ph...@apache.org> wrote:

> There's a great post on Pager Duty today,
>
> http://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
> some good comments on hackernews too
> https://news.ycombinator.com/item?id=9509698
>
> If I understand correctly bug1 is already fixed:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2146
> should be released in 3.4.7+
>

Yup. All the more reason to get 3.4.7 out soon :-) We only need a few more
reviews to get there.


>
> However bug2
> https://issues.apache.org/jira/browse/ZOOKEEPER-602
> is just in 3.5 and not 3.4.x.  Note my push back in the comments on 602 re
> risk vs reward. Evan makes a good case for including it. :-)
>

Initially, I am -1 on the backport. It's too big, and it's a bit of a
distraction from getting 3.5.0 to stable status.


-rgs