Posted to dev@flume.apache.org by Eric Sammer <es...@cloudera.com> on 2011/07/08 23:08:27 UTC

Flume master NG

Flumers:

Jon and I had some preliminary discussions (prior to Flume entering
the incubator) about a redesign of flume's master. The short version
is that:

1. The current multimaster doesn't work correctly in all cases (e.g.
autochains).
2. State exists in the master's memory that isn't in ZK (so failover
isn't simple)
3. All the heartbeat / RPC code is complicated.

The short version of master NG would be:

1. All master state gets pushed into ZK
2. Nodes no longer heartbeat to a specific master. Instead, they
connect to ZK and use ephemeral nodes to indicate health.
3. Masters judge health and push configuration by poking state into
each node's respective ZK namespace.
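
A minimal sketch of what (2) and (3) might look like with the ZooKeeper Java
API - the znode paths and payloads here are invented for illustration, not
part of any proposal:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class NodeHeartbeatSketch {
  public static void main(String[] args) throws Exception {
    // One ZK session per node; liveness is tied to the session itself.
    ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> { });

    // (2) Announce presence with an ephemeral znode (parent paths assumed to
    // already exist). If the node dies or its session expires, ZK removes the
    // znode automatically - that removal is the failure signal, so no RPC
    // heartbeat protocol is needed.
    String host = java.net.InetAddress.getLocalHost().getHostName();
    zk.create("/flume/nodes/" + host, "ALIVE".getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // (3) The master pokes desired configuration into this node's namespace;
    // the node watches that znode (assumed already written by the master) and
    // reconfigures itself when it changes.
    byte[] config = zk.getData("/flume/config/" + host,
        event -> { /* re-read config here */ }, null);
  }
}

The point being that node health falls out of ZK session semantics, which is
what lets most of the heartbeat/RPC code from problem (3) above go away.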

There's a separate discussion around changing how ACKs work (so they
don't pass through the master) but we're going to separate that for
now. This is just a heads up that I'm going to start poking bits into
the new Apache Flume wiki. The discussion is open to all and both Jon
and I would love to hear from users, contrib'ers, and dev'ers alike.
I'm also going to try and rope in a ZK ninja to talk about potential
issues they may cause / things to be aware of once we have something
in the wiki to point at.

-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: Flume master NG

Posted by Patrick Hunt <ph...@apache.org>.
On Mon, Jul 18, 2011 at 12:08 AM, Eric Sammer <es...@cloudera.com> wrote:
> * If we use ZK sessions and ephemeral nodes for heartbeats and status,
> are we killing ourselves long term? I know ZK can span wide areas
> (i.e. suffer high latency) but it will certainly impact the time it
> takes to detect and recover from errors. I'd like to get input from
> phunt or Henry on this. Another option is to have a secondary layer
> for controlling inter-DC communication and have discrete ZK and
> control planes.

What you say here is correct. ZK can span DCs; however, the increased
latency and higher levels of connection "flakiness" generally force you
to tune ZK (quorum and clients) to tolerate them - which results in less
sensitivity to failure. If you are talking about a use case that
actually spans DCs then this is fine - say global resource allocation:
"I have 2 Hadoop clusters, one in each of 2 DCs, which should I use?"
However, if you are talking about more fine-grained access, typically
"within a DC", then you pay the price of this multi-DC distribution.

> * The new role of the master. Assuming we can gut the ACK path from the
> master and remove it from the critical path, does it matter where it
> lives? Based on the discussions we've had internally and what is
> codified in the doc thus far, the master would only be used to push
> config changes and display status. Really, it just reads from / writes
> to ZK. ZK is the true master in the proposal.
> * If we're primarily interested in an inter-DC data plane, is this
> best done by having discrete Flume systems and simply configuring them
> to hand off data to one another? Note this is how most systems work:
> SMTP, modular switches with "virtual fabric" type functionality, JMS
> broker topologies. Most of the time, there's a discrete system in each
> location and an "uber tool" pushes configurations down to each system
> from a centralized location. Nothing we would do here would preclude
> this from working. In other words, maybe the best way to deal with
> inter-DC data is to have separate Flume installs and have a tool or UI
> that just communicates to N masters and pushes the right config to
> each.

My experience building large multi-DC systems is consistent with this.

You could use ZK to hit both of these use cases: 1) have local
ensembles to handle the low-latency, finer-grained requests, and 2) a
separate overarching ZK ensemble to handle the cross-DC issues. You
could also handle the case where one DC gets partitioned from the rest
- it would lose access to 2), but it could still function until it
reconnects and gets updated instructions. Or perhaps there is some
fallback in this case. Granted, having your DC lose connectivity to
your other DCs for a longish period of time is probably bad generally
though.
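
A rough sketch of that two-ensemble shape; the ensemble addresses are made up
and the partition-fallback policy is only hinted at in comments:

import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class TwoEnsembleSketch {
  // Local ensemble: low-latency, fine-grained coordination within this DC.
  private final ZooKeeper localZk;
  // Overarching ensemble: cross-DC decisions; may be slow or unreachable.
  private final ZooKeeper globalZk;
  // Last instructions seen from the global plane; used while partitioned.
  private volatile byte[] lastGlobalInstructions = new byte[0];

  public TwoEnsembleSketch() throws Exception {
    localZk = new ZooKeeper("zk-local:2181", 5000, event -> { });
    globalZk = new ZooKeeper("zk-global-a:2181,zk-global-b:2181", 60000, event -> {
      if (event.getState() == KeeperState.Disconnected) {
        // Partitioned from the global plane: keep operating on
        // lastGlobalInstructions until the session reconnects.
      }
    });
  }
}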

Take a look at how Akamai solved this issue. Replace "quorum" with
ZooKeeper. :-)
http://www.usenix.org/event/nsdi05/tech/full_papers/sherman/sherman_html/index.html

Patrick

Re: Flume master NG

Posted by Patrick Hunt <ph...@apache.org>.
On Fri, Jul 22, 2011 at 7:13 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:
> On Mon, Jul 18, 2011 at 12:08 AM, Eric Sammer <es...@cloudera.com> wrote:
> I'd prefer limiting the scope of implementation currently, but having a
> design that will allow for greater scope in the future.  More specifically,
> I've been thinking that the current Flume, with its collectors and agents,
> would be focused on the single-DC case.
>

Even users with multiple DCs should consider just running in this way,
to keep things simple and increase reliability.

> Let's consider the "limits" with ZK and ephemeral nodes today.  Heartbeating
> via ephemeral nodes will crap out when ZK cannot handle the # of connections
> or the amount of writes.  Last I've heard this is on the order of 1k
> writes/s on a laptop [1] and several 10k's from the ZK paper [2].  With 1
> second ephemeral node refreshes I think this means we could theoretically
> have 10k's of nodes.  More practically, this means it should be able to
> handle 1k's of nodes with failures.  Right now we use 5s, so we could get a
> 5x bump or more if we make the heartbeats longer.  My understanding is that
> there are single-DC clusters today that are already at these scales.
>

The current ZK impl on 3-year-old hardware can do ~20k writes/sec. We
are working to increase this - FB is currently testing at 50k+.

The largest production cluster I've heard of is 10k clients on a
5-server ensemble.

Patrick

Re: Flume master NG

Posted by NerdyNick <ne...@gmail.com>.
Just to clarify, the use case I have for multi-datacenter isn't really
too complicated. It's basically: I have 3 DCs. DC1 contains the Hadoop
cluster, and DC2/3 just contain nodes where data is produced. DC1 also
has nodes producing data. I need a way of getting that data to DC1 and
into Hadoop reliably.

On Fri, Jul 22, 2011 at 8:13 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:
> I'm assuming the goal of inter-DC comms is for replication and disaster
> recovery.  Are there other goals for this?
>
> On Mon, Jul 18, 2011 at 12:08 AM, Eric Sammer <es...@cloudera.com> wrote:
>
>> A good (and tricky) point. In keeping with the way we've talked about
>> Flume, there are two systems that are important. One system is the
>> control plane responsible for node health and status, configuration,
>> and reporting. The other major system is the data plane, obviously
>> responsible for data transfer. I see the utility of both being able to
>> span a wide area but I think it warrants a discussion as to how it
>> should work and what guarantees we're interested in making here.
>>
>> Major points to cover:
>>
>> * If we use ZK sessions and ephemeral nodes for heartbeats and status,
>> are we killing ourselves long term? I know ZK can span wide areas
>> (i.e. suffer high latency) but it will certainly impact the time it
>> takes to detect and recover from errors. I'd like to get input from
>> phunt or Henry on this. Another option is to have a secondary layer
>> for controlling inter-DC communication and have discrete ZK and
>> control planes.
>>
>
>
> I'd prefer limiting the scope of implementation currently, but having a
> design that will allow for greater scope in the future.  More specifically,
> I've been thinking that the current Flume, with its collectors and agents,
> would be focused on the single-DC case.
>
> Let's consider the "limits" with ZK and ephemeral nodes today.  Heartbeating
> via ephemeral nodes will crap out when ZK cannot handle the # of connections
> or the amount of writes.  Last I've heard this is on the order of 1k
> writes/s on a laptop [1] and several 10k's from the ZK paper [2].  With 1
> second ephemeral node refreshes I think this means we could theoretically
> have 10k's of nodes.  More practically, this means it should be able to
> handle 1k's of nodes with failures.  Right now we use 5s, so we could get a
> 5x bump or more if we make the heartbeats longer.  My understanding is that
> there are single-DC clusters today that are already at these scales.
>
> It follows that if we ever get into clusters with 10k's or 100k's of
> machines we'll probably want to have multiple instances.  If they are in the
> same DC they could potentially write to the same HDFS.  If they are in
> different DCs we have a different problem -- single source and many
> destinations.  For this I'm leaning towards a secondary layer of Flume, where
> inter-DC agents and inter-DC collectors would likely have more of a pub-sub
> model a la Facebook's Calligraphus (uses HDFS for durability) or Yahoo!'s
> Hedwig (BookKeeper/ZK for durability).  This lets the Flume 1.0 goal be solid
> and scalable many-to-few aggregation within a single DC, with a long-term
> Flume 2.0 goal of few-to-many inter-DC replication.
>
> [1]
> http://www.cloudera.com/blog/2009/12/observers-making-zookeeper-scale-even-further/
>  [2] http://www.usenix.org/event/atc10/tech/full_papers/Hunt.pdf
>
>
>> * The new role of the master. Assuming we can gut the ACK path from the
>> master and remove it from the critical path, does it matter where it
>> lives? Based on the discussions we've had internally and what is
>> codified in the doc thus far, the master would only be used to push
>> config changes and display status. Really, it just reads from / writes
>> to ZK. ZK is the true master in the proposal.
>>
>
> The master would also still be the server for the shell, along with the
> web interfaces.  A shell could potentially just go to zk directly, but my
> first intuition is that there should be one server to make concurrent shell
> access easier to understand and manage.
>
> By decoupling the state to ZK, this functionality could be separated.  For
> now, the master would remain the "ack node", though that could also eventually
> be put into a separate process.  Or, collector nodes could actually publish
> acks into ZK.  At the end of the day, if all the state lives in ZK, the
> master doesn't even need to talk to nodes -- it could just talk to ZK.  The
> master could just watch the status and manage control-plane things like
> flow transitions and adapting as nodes are added or removed.  If
> masters only interact with ZK, I don't think we would even need master
> leader election.
>
> Another ramification of completely decoupling for an inter-DC case is that
> one master could consult two or more different ZKs.
>
> * If we're primarily interested in an inter-DC data plane, is this
>> best done by having discrete Flume systems and simply configuring them
>> to hand off data to one another? Note this is how most systems work:
>> SMTP, modular switches with "virtual fabric" type functionality, JMS
>> broker topologies. Most of the time, there's a discrete system in each
>> location and an "uber tool" pushes configurations down to each system
>> from a centralized location. Nothing we would do here would preclude
>> this from working. In other words, maybe the best way to deal with
>> inter-DC data is to have separate Flume installs and have a tool or UI
>> that just communicates to N masters and pushes the right config to
>> each.
>>
>>
> I like the idea of a messaging/pub-sub style system for inter-DC
> communications.  I like the design of Facebook's Calligraphus system (treating
> HDFS as a pub-sub queue using append and multiple concurrent readers for 3-5
> second data latency).
>
> We don't necessarily have to solve all of these problems / answer all
>> of these questions. I'm interested in deciding:
>> * What an initial re-implementation or re-factoring of the master looks
>> like.
>>
>
> I think the discrete steps may be:
>
> 1) nodes use zk ephemeral nodes to heartbeat
> 2) translation/auto chaining stuff gets extracted out of the master (just a
> client to the zk "schema")
>
>
> * What features we'd like to support long term.
>>
>
> Ideally, the zk "schema" for flume data would be extensible so that extra
> features (like chokes, the addition of flows, the addition of IP/port info,
> etc.) can be added on without causing pain.
>
> * What features we want to punt on for the foreseeable future (e.g. > 24
>> months)
>>
>> On Sat, Jul 16, 2011 at 8:10 PM, NerdyNick <ne...@gmail.com> wrote:
>> > The only thing I would like to make sure of is that the design allows for
>> > future work to be done on multi-datacenter support.
>> >
>> > On Fri, Jul 8, 2011 at 3:08 PM, Eric Sammer <es...@cloudera.com>
>> wrote:
>> >> Flumers:
>> >>
>> >> Jon and I had some preliminary discussions (prior to Flume entering
>> >> the incubator) about a redesign of flume's master. The short version
>> >> is that:
>> >>
>> >> 1. The current multimaster doesn't work correctly in all cases (e.g.
>> >> autochains).
>> >> 2. State exists in the master's memory that isn't in ZK (so failover
>> >> isn't simple)
>> >> 3. All the heartbeat / RPC code is complicated.
>> >>
>> >> The short version of master NG would be:
>> >>
>> >> 1. All master state gets pushed into ZK
>> >> 2. Nodes no longer heartbeat to a specific master. Instead, they
>> >> connect to ZK and use ephemeral nodes to indicate health.
>> >> 3. Masters judge health and push configuration by poking state into
>> >> each node's respective ZK namespace.
>> >>
>> >> There's a separate discussion around changing how ACKs work (so they
>> >> don't pass through the master) but we're going to separate that for
>> >> now. This is just a heads up that I'm going to start poking bits into
>> >> the new Apache Flume wiki. The discussion is open to all and both Jon
>> >> and I would love to hear from users, contrib'ers, and dev'ers alike.
>> >> I'm also going to try and rope in a ZK ninja to talk about potential
>> >> issues they may cause / things to be aware of once we have something
>> >> in the wiki to point at.
>> >>
>> >> --
>> >> Eric Sammer
>> >> twitter: esammer
>> >> data: www.cloudera.com
>> >>
>> >
>> >
>> >
>> > --
>> > Nick Verbeck - NerdyNick
>> > ----------------------------------------------------
>> > NerdyNick.com
>> > Coloco.ubuntu-rocks.org
>> >
>>
>>
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>>
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>



-- 
Nick Verbeck - NerdyNick
----------------------------------------------------
NerdyNick.com
Coloco.ubuntu-rocks.org

Re: Flume master NG

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I'm assuming the goal of inter-DC comms is for replication and disaster
recovery.  Are there other goals for this?

On Mon, Jul 18, 2011 at 12:08 AM, Eric Sammer <es...@cloudera.com> wrote:

> A good (and tricky) point. In keeping with the way we've talked about
> Flume, there are two systems that are important. One system is the
> control plane responsible for node health and status, configuration,
> and reporting. The other major system is the data plane, obviously
> responsible for data transfer. I see the utility of both being able to
> span a wide area but I think it warrants a discussion as to how it
> should work and what guarantees we're interested in making here.
>
> Major points to cover:
>
> * If we use ZK sessions and ephemeral nodes for heartbeats and status,
> are we killing ourselves long term? I know ZK can span wide areas
> (i.e. suffer high latency) but it will certainly impact the time it
> takes to detect and recover from errors. I'd like to get input from
> phunt or Henry on this. Another option is to have a secondary layer
> for controlling inter-DC communication and have discrete ZK and
> control planes.
>


I'd prefer limiting the scope of implementation currently, but having a
design that will allow for greater scope in the future.  More specifically,
I've been thinking that the current Flume, with its collectors and agents,
would be focused on the single-DC case.

Let's consider the "limits" with ZK and ephemeral nodes today.  Heartbeating
via ephemeral nodes will crap out when ZK cannot handle the # of connections
or the amount of writes.  Last I've heard this is on the order of 1k
writes/s on a laptop [1] and several 10k's from the ZK paper [2].  With 1
second ephemeral node refreshes I think this means we could theoretically
have 10k's of nodes.  More practically, this means it should be able to
handle 1k's of nodes with failures.  Right now we use 5s, so we could get a
5x bump or more if we make the heartbeats longer.  My understanding is that
there are single-DC clusters today that are already at these scales.
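
A back-of-the-envelope restatement of that arithmetic, conservatively treating
every heartbeat refresh as one ZK write and plugging in the laptop figure from
[1]:

public class HeartbeatCapacitySketch {
  public static void main(String[] args) {
    int writesPerSec = 1000;        // the ~1k writes/s laptop figure from [1]
    int heartbeatIntervalSec = 5;   // current Flume heartbeat period
    // One write per node per interval:
    int supportableNodes = writesPerSec * heartbeatIntervalSec;
    System.out.println("supportable nodes ~= " + supportableNodes);  // ~5,000
    // Plugging in the ~10k+ writes/s from [2] instead lands in the 10k's.
  }
}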

It follows that if we ever get into clusters with 10k's or 100k's of
machines we'll probably want to have multiple instances.  If they are in the
same DC they could potentially write to the same HDFS.  If they are in
different DCs we have a different problem -- single source and many
destinations.  For this I'm leaning towards a secondary layer of Flume, where
inter-DC agents and inter-DC collectors would likely have more of a pub-sub
model a la Facebook's Calligraphus (uses HDFS for durability) or Yahoo!'s
Hedwig (BookKeeper/ZK for durability).  This lets the Flume 1.0 goal be solid
and scalable many-to-few aggregation within a single DC, with a long-term
Flume 2.0 goal of few-to-many inter-DC replication.

[1]
http://www.cloudera.com/blog/2009/12/observers-making-zookeeper-scale-even-further/
 [2] http://www.usenix.org/event/atc10/tech/full_papers/Hunt.pdf


> * The new role of the master. Assuming we can gut the ACK path from the
> master and remove it from the critical path, does it matter where it
> lives? Based on the discussions we've had internally and what is
> codified in the doc thus far, the master would only be used to push
> config changes and display status. Really, it just reads from / writes
> to ZK. ZK is the true master in the proposal.
>

The master would also still be the server for the shell, along with the
web interfaces.  A shell could potentially just go to zk directly, but my
first intuition is that there should be one server to make concurrent shell
access easier to understand and manage.

By decoupling the state to ZK, this functionality could be separated.  For
now, the master would remain the "ack node", though that could also eventually
be put into a separate process.  Or, collector nodes could actually publish
acks into ZK.  At the end of the day, if all the state lives in ZK, the
master doesn't even need to talk to nodes -- it could just talk to ZK.  The
master could just watch the status and manage control-plane things like
flow transitions and adapting as nodes are added or removed.  If
masters only interact with ZK, I don't think we would even need master
leader election.

Another ramification of completely decoupling for an inter-DC case is that
one master could consult two or more different ZKs.
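
A minimal sketch of a master that only talks to ZK, along the lines of the
paragraph above; the znode paths and method names are hypothetical:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class MasterSketch {
  private final ZooKeeper zk;

  public MasterSketch(ZooKeeper zk) { this.zk = zk; }

  // Watch the ephemeral node registry; the watch is re-armed on every change,
  // so the master notices joins/leaves without talking to the nodes at all.
  public void watchNodes() throws Exception {
    List<String> liveNodes = zk.getChildren("/flume/nodes", event -> {
      try { watchNodes(); } catch (Exception ignored) { }
    });
    // ...recompute flows / autochains for the current membership here...
  }

  // "Push" a config by writing into the node's namespace; the node has its
  // own watch on this znode and picks the change up itself.
  public void pushConfig(String node, byte[] config) throws Exception {
    zk.setData("/flume/config/" + node, config, -1);  // -1 = any version
  }
}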

* If we're primarily interested in an inter-DC data plane, is this
> best done by having discrete Flume systems and simply configuring them
> to hand off data to one another? Note this is how most systems work:
> SMTP, modular switches with "virtual fabric" type functionality, JMS
> broker topologies. Most of the time, there's a discrete system in each
> location and an "uber tool" pushes configurations down to each system
> from a centralized location. Nothing we would do here would preclude
> this from working. In other words, maybe the best way to deal with
> inter-DC data is to have separate Flume installs and have a tool or UI
> that just communicates to N masters and pushes the right config to
> each.
>
>
I like the idea of a messaging/pub-sub style system for inter-DC
communications.  I like the design of Facebook's Calligraphus system (treating
HDFS as a pub-sub queue using append and multiple concurrent readers for 3-5
second data latency).

We don't necessarily have to solve all of these problems / answer all
> of these questions. I'm interested in deciding:
> * What an initial re-implementation or re-factoring of the master looks
> like.
>

I think the discrete steps may be:

1) nodes use zk ephemeral nodes to heartbeat
2) translation/auto chaining stuff gets extracted out of the master (just a
client to the zk "schema")


* What features we'd like to support long term.
>

Ideally, the zk "schema" for flume data would be extensible so that extra
features (like chokes, the addition of flows, the addition of IP/port info,
etc.) can be added on without causing pain.
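
One purely hypothetical layout of such a schema, just to make the
extensibility point concrete (none of these paths exist yet; each feature
hangs off its own subtree so later additions don't disturb existing readers):

    /flume
      /nodes/<host>          ephemeral; presence = liveness, data = status
      /config/<host>         desired source/sink config the master pushes
      /flows/<flow-id>       logical flow membership
      /chokes/<choke-id>     throttling settings (an example of a later add-on)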

* What features we want to punt on for the foreseeable future (e.g. > 24
> months)
>
> On Sat, Jul 16, 2011 at 8:10 PM, NerdyNick <ne...@gmail.com> wrote:
> > The only thing I would like to make sure of is that the design allows for
> > future work to be done on multi-datacenter support.
> >
> > On Fri, Jul 8, 2011 at 3:08 PM, Eric Sammer <es...@cloudera.com>
> wrote:
> >> Flumers:
> >>
> >> Jon and I had some preliminary discussions (prior to Flume entering
> >> the incubator) about a redesign of flume's master. The short version
> >> is that:
> >>
> >> 1. The current multimaster doesn't work correctly in all cases (e.g.
> >> autochains).
> >> 2. State exists in the master's memory that isn't in ZK (so failover
> >> isn't simple)
> >> 3. All the heartbeat / RPC code is complicated.
> >>
> >> The short version of master NG would be:
> >>
> >> 1. All master state gets pushed into ZK
> >> 2. Nodes no longer heartbeat to a specific master. Instead, they
> >> connect to ZK and use ephemeral nodes to indicate health.
> >> 3. Masters judge health and push configuration by poking state into
> >> each node's respective ZK namespace.
> >>
> >> There's a separate discussion around changing how ACKs work (so they
> >> don't pass through the master) but we're going to separate that for
> >> now. This is just a heads up that I'm going to start poking bits into
> >> the new Apache Flume wiki. The discussion is open to all and both Jon
> >> and I would love to hear from users, contrib'ers, and dev'ers alike.
> >> I'm also going to try and rope in a ZK ninja to talk about potential
> >> issues they may cause / things to be aware of once we have something
> >> in the wiki to point at.
> >>
> >> --
> >> Eric Sammer
> >> twitter: esammer
> >> data: www.cloudera.com
> >>
> >
> >
> >
> > --
> > Nick Verbeck - NerdyNick
> > ----------------------------------------------------
> > NerdyNick.com
> > Coloco.ubuntu-rocks.org
> >
>
>
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: Flume master NG

Posted by Eric Sammer <es...@cloudera.com>.
A good (and tricky) point. In keeping with the way we've talked about
Flume, there are two systems that are important. One system is the
control plane responsible for node health and status, configuration,
and reporting. The other major system is the data plane, obviously
responsible for data transfer. I see the utility of both being able to
span a wide area but I think it warrants a discussion as to how it
should work and what guarantees we're interested in making here.

Major points to cover:

* If we use ZK sessions and ephemeral nodes for heartbeats and status,
are we killing ourselves long term? I know ZK can span wide areas
(i.e. suffer high latency) but it will certainly impact the time it
takes to detect and recover from errors. I'd like to get input from
phunt or Henry on this. Another option is to have a secondary layer
for controlling inter-DC communication and have discrete ZK and
control planes.
* The new role of the master. Assuming we can gut the ACK path from the
master and remove it from the critical path, does it matter where it
lives? Based on the discussions we've had internally and what is
codified in the doc thus far, the master would only be used to push
config changes and display status. Really, it just reads from / writes
to ZK. ZK is the true master in the proposal.
* If we're primarily interested in an inter-DC data plane, is this
best done by having discrete Flume systems and simply configuring them
to hand off data to one another? Note this is how most systems work:
SMTP, modular switches with "virtual fabric" type functionality, JMS
broker topologies. Most of the time, there's a discrete system in each
location and an "uber tool" pushes configurations down to each system
from a centralized location. Nothing we would do here would preclude
this from working. In other words, maybe the best way to deal with
inter-DC data is to have separate Flume installs and have a tool or UI
that just communicates to N masters and pushes the right config to
each.

We don't necessarily have to solve all of these problems / answer all
of these questions. I'm interested in deciding:
* What an initial re-implementation or re-factoring of the master looks like.
* What features we'd like to support long term.
* What features we want to punt on for the foreseeable future (e.g. > 24 months)

On Sat, Jul 16, 2011 at 8:10 PM, NerdyNick <ne...@gmail.com> wrote:
> The only thing I would like to make sure of is that the design allows for
> future work to be done on multi-datacenter support.
>
> On Fri, Jul 8, 2011 at 3:08 PM, Eric Sammer <es...@cloudera.com> wrote:
>> Flumers:
>>
>> Jon and I had some preliminary discussions (prior to Flume entering
>> the incubator) about a redesign of flume's master. The short version
>> is that:
>>
>> 1. The current multimaster doesn't work correctly in all cases (e.g.
>> autochains).
>> 2. State exists in the master's memory that isn't in ZK (so failover
>> isn't simple)
>> 3. All the heartbeat / RPC code is complicated.
>>
>> The short version of master NG would be:
>>
>> 1. All master state gets pushed into ZK
>> 2. Nodes no longer heartbeat to a specific master. Instead, they
>> connect to ZK and use ephemeral nodes to indicate health.
>> 3. Masters judge health and push configuration by poking state into
>> each node's respective ZK namespace.
>>
>> There's a separate discussion around changing how ACKs work (so they
>> don't pass through the master) but we're going to separate that for
>> now. This is just a heads up that I'm going to start poking bits into
>> the new Apache Flume wiki. The discussion is open to all and both Jon
>> and I would love to hear from users, contrib'ers, and dev'ers alike.
>> I'm also going to try and rope in a ZK ninja to talk about potential
>> issues they may cause / things to be aware of once we have something
>> in the wiki to point at.
>>
>> --
>> Eric Sammer
>> twitter: esammer
>> data: www.cloudera.com
>>
>
>
>
> --
> Nick Verbeck - NerdyNick
> ----------------------------------------------------
> NerdyNick.com
> Coloco.ubuntu-rocks.org
>



-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: Flume master NG

Posted by NerdyNick <ne...@gmail.com>.
The only thing I would like to make sure of is that the design allows for
future work to be done on multi-datacenter support.

On Fri, Jul 8, 2011 at 3:08 PM, Eric Sammer <es...@cloudera.com> wrote:
> Flumers:
>
> Jon and I had some preliminary discussions (prior to Flume entering
> the incubator) about a redesign of flume's master. The short version
> is that:
>
> 1. The current multimaster doesn't work correctly in all cases (e.g.
> autochains).
> 2. State exists in the master's memory that isn't in ZK (so failover
> isn't simple)
> 3. All the heartbeat / RPC code is complicated.
>
> The short version of master NG would be:
>
> 1. All master state gets pushed into ZK
> 2. Nodes no longer heartbeat to a specific master. Instead, they
> connect to ZK and use ephemeral nodes to indicate health.
> 3. Masters judge health and push configuration by poking state into
> each node's respective ZK namespace.
>
> There's a separate discussion around changing how ACKs work (so they
> don't pass through the master) but we're going to separate that for
> now. This is just a heads up that I'm going to start poking bits into
> the new Apache Flume wiki. The discussion is open to all and both Jon
> and I would love to hear from users, contrib'ers, and dev'ers alike.
> I'm also going to try and rope in a ZK ninja to talk about potential
> issues they may cause / things to be aware of once we have something
> in the wiki to point at.
>
> --
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com
>



-- 
Nick Verbeck - NerdyNick
----------------------------------------------------
NerdyNick.com
Coloco.ubuntu-rocks.org