Posted to dev@zookeeper.apache.org by ibrahim El-sanosi <ib...@gmail.com> on 2017/01/26 12:44:49 UTC

Paper

Hi folk,

There is a paper published recently "PaceMaker: When ZooKeeper Arteries Get
Clogged in Storm Clusters" [1]. It may be worth reading.

[1]
http://ieeexplore.ieee.org/document/7820303/?tp=&arnumber=7820303&contentType=Conference%20Publications&dld=eWFob28uY29t&source=SEARCHALERT

Ibrahim

Re: Paper

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.

Yes we probably should have called it out.


- Bobby


Re: Paper

Posted by Patrick Hunt <ph...@apache.org>.

On Thu, Jan 26, 2017 at 1:14 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:

> We did think about ram disks a little, but the plan still is to have the
> source code, in one form or another, morph into a small distributed time
> series database for the metrics.  With that in mind we thought it would be
> better to take a step in that direction.  Yes a ram disk would likely have
> provided similar performance.  Although we would have still wanted to
> separate out just the metrics to the ram disk backed ZK, because we store
> other more critical data in ZK too, that need more durability guarantees.
>
>
Indeed, that was my thinking as well: two ZKs, one with "ephemeral" data
and one with precious data. Your response makes sense and is what I
expected; I only ask because I didn't see it mentioned in the document, and
for many folks it's a good, if perhaps short-term, solution.

Regards,

Patrick
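
The "two ZKs" idea above can be sketched roughly as follows. This is only an illustration, not how Storm or ZooKeeper actually do it: the class name EnsembleRouter, the "/metrics" prefix, and the host names are all hypothetical. It simply picks which ensemble's connect string to use for a given znode path, so that throwaway metrics go to a ram-disk-backed ensemble and everything else goes to the durable one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnsembleRouter {
    private final String durableConnect;   // ensemble holding data that must survive restarts
    private final String volatileConnect;  // ensemble assumed to run on a ram disk
    private final List<String> volatilePrefixes;

    public EnsembleRouter(String durableConnect, String volatileConnect,
                          List<String> volatilePrefixes) {
        this.durableConnect = durableConnect;
        this.volatileConnect = volatileConnect;
        this.volatilePrefixes = new ArrayList<>(volatilePrefixes);
    }

    /** Returns the connect string of the ensemble that should hold this znode path. */
    public String connectStringFor(String znodePath) {
        for (String prefix : volatilePrefixes) {
            // Match the prefix itself or anything below it, but not "/metricsX".
            if (znodePath.equals(prefix) || znodePath.startsWith(prefix + "/")) {
                return volatileConnect;
            }
        }
        return durableConnect;
    }

    public static void main(String[] args) {
        // Host names and the "/metrics" prefix are made up for this example.
        EnsembleRouter router = new EnsembleRouter(
                "zk-durable-1:2181,zk-durable-2:2181,zk-durable-3:2181",
                "zk-ram-1:2181,zk-ram-2:2181,zk-ram-3:2181",
                Arrays.asList("/metrics"));
        System.out.println(router.connectStringFor("/metrics/topo-1/worker-1"));
        System.out.println(router.connectStringFor("/assignments/topo-1"));
    }
}
```

The returned string would then be handed to whatever ZooKeeper client is in use; keeping the routing in one place makes it easy to move a path from one ensemble to the other later.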



Re: Paper

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.

We did think about ram disks a little, but the plan is still to have the source code, in one form or another, morph into a small distributed time series database for the metrics. With that in mind, we thought it would be better to take a step in that direction. Yes, a ram disk would likely have provided similar performance, although we would still have wanted to separate out just the metrics to the ram-disk-backed ZK, because we store other, more critical data in ZK too, which needs stronger durability guarantees.

- Bobby


Re: Paper

Posted by Patrick Hunt <ph...@apache.org>.

Very interesting results and real world insights. Thanks for
creating/sharing.

One thing I noticed is that you mentioned considering SSDs. Had you also
considered using ram disks? I've seen some scenarios where that has been
very successful.

Patrick


Re: Paper

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.

As one of the authors of Pacemaker in Apache Storm (and of the paper), I am happy to answer any questions about why we did it or how it works. The reality is that Storm was, and still is by default, abusing ZooKeeper by trying to store a massive amount of metrics in it, instead of the configuration/coordination data it was designed for. And since Storm metrics don't really need strong consistency, or even much in the way of reliability guarantees, we stood up a Netty server in front of a ConcurrentHashMap (quite literally) and then wrote a client that could handle fail-over.
It really is meant as a scalability stepping stone until we can get to the point where all the metrics go to a TSDB that is actually designed for metrics. But like I said, if you have any questions I am happy to answer them.
Sadly, because of the way IEEE works, neither I nor my employer owns the copyright to that paper any more, so I can't even put a copy of it up for you to read.


- Bobby
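
For readers who want a concrete picture of "a server in front of a ConcurrentHashMap plus a client that handles fail-over", here is a minimal sketch. It is not the Pacemaker code: the Netty transport is left out entirely, and the names HeartbeatStore, FailoverClient, and setAvailable, as well as the "/workerbeats/..." path, are assumptions made up for the example. Each "server" is just an in-memory map with no persistence and no replication, and the client walks its server list in order until one answers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatSketch {

    /** In-memory store: no persistence, no replication, last write wins. */
    public static final class HeartbeatStore {
        private final ConcurrentHashMap<String, byte[]> data = new ConcurrentHashMap<>();
        // Hypothetical switch used only to simulate a crashed server.
        private volatile boolean available = true;

        public void setAvailable(boolean available) { this.available = available; }

        public void put(String path, byte[] payload) {
            requireAvailable();
            data.put(path, payload.clone());
        }

        public Optional<byte[]> get(String path) {
            requireAvailable();
            byte[] stored = data.get(path);
            return stored == null ? Optional.empty() : Optional.of(stored.clone());
        }

        private void requireAvailable() {
            if (!available) throw new IllegalStateException("server unavailable");
        }
    }

    /** Client that tries each server in order and uses the first one that answers. */
    public static final class FailoverClient {
        private final List<HeartbeatStore> servers;

        public FailoverClient(List<HeartbeatStore> servers) {
            this.servers = new ArrayList<>(servers);
        }

        public void put(String path, byte[] payload) {
            IllegalStateException last = null;
            for (HeartbeatStore server : servers) {
                try {
                    server.put(path, payload);
                    return;
                } catch (IllegalStateException e) {
                    last = e; // this server is down, try the next one
                }
            }
            throw new IllegalStateException("no server available", last);
        }

        public Optional<byte[]> get(String path) {
            IllegalStateException last = null;
            for (HeartbeatStore server : servers) {
                try {
                    return server.get(path);
                } catch (IllegalStateException e) {
                    last = e; // this server is down, try the next one
                }
            }
            throw new IllegalStateException("no server available", last);
        }
    }

    public static void main(String[] args) {
        HeartbeatStore primary = new HeartbeatStore();
        HeartbeatStore backup = new HeartbeatStore();
        FailoverClient client = new FailoverClient(Arrays.asList(primary, backup));

        String path = "/workerbeats/topo-1/worker-1"; // made-up path
        client.put(path, "beat-1".getBytes(StandardCharsets.UTF_8));

        primary.setAvailable(false); // simulate the primary going away
        client.put(path, "beat-2".getBytes(StandardCharsets.UTF_8));

        String latest = client.get(path)
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .orElse("<none>");
        System.out.println(latest);
    }
}
```

Note that after a fail-over the backup starts out empty, so anything written only to the primary is gone until it is written again. That trade-off is only acceptable because, as the message above says, the metrics do not need strong consistency or reliability guarantees.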


Re: Paper

Posted by Paul Asmuth <pa...@googlemail.com>.

You can already find the paper on sci-hub.io (search for the DOI).




Re: Paper

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

Sad that such an important paper requires a fee. Is there a free version anywhere?

-Jordan
