Posted to dev@nifi.apache.org by Mark Payne <ma...@hotmail.com> on 2016/04/01 15:31:17 UTC

Re: Zookeeper issues mentioned in a talk about storm / heron

Guys,

Certainly some great points here and important concepts to keep in mind!

One thing to remember, though, is something that very much differentiates NiFi from Storm
or HBase or Accumulo: those systems are typically expected to scale to hundreds
or thousands of nodes (you mention a couple-hundred-node HBase cluster being
"modest"). With NiFi, we typically operate clusters on the order of several
nodes to dozens of nodes. A couple hundred nodes would be a pretty massive NiFi
cluster.

In terms of storing state, we could potentially get into more of a sticky situation
if not done carefully. However, we generally expect the "State Management" feature to be used
for occasionally storing small amounts of state, such as ListHDFS storing 2 timestamps,
and we don't expect ListHDFS to be continually hammering HDFS asking for
a new listing. It may be scheduled to run once every few minutes, for example.
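
Just to make that concrete, here is a minimal sketch, with illustrative names only (this is not
the actual ListHDFS code), of the kind of write we have in mind through the State Management API:
a couple of small string values, written once per scheduled run.

    // A minimal sketch (illustrative, not the actual ListHDFS implementation) of a
    // processor storing two timestamps through the State Management API once per run.
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.nifi.components.state.Scope;
    import org.apache.nifi.components.state.StateManager;
    import org.apache.nifi.components.state.StateMap;

    public class ListingStateSketch {

        // Called from onTrigger(); runs only as often as the processor is scheduled,
        // e.g. once every few minutes, so the backing store sees a tiny, infrequent write.
        void updateListingState(final StateManager stateManager, final long latestListedTimestamp,
                                final long lastRunTimestamp) throws IOException {
            final StateMap previous = stateManager.getState(Scope.CLUSTER);

            final Map<String, String> newState = new HashMap<>(previous.toMap());
            newState.put("latest.listed.timestamp", String.valueOf(latestListedTimestamp));
            newState.put("last.run.timestamp", String.valueOf(lastRunTimestamp));

            // Two small string values, well under a kilobyte, replace the prior state.
            stateManager.setState(newState, Scope.CLUSTER);
        }
    }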

That said, we certainly have been designing everything using appropriate interfaces
in such a way that if we do later decide that we want to use some other mechanism
for storing state and/or heartbeats, it should be a very reasonable path to
take advantage of some other service for one or both of these tasks.
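
As a purely illustrative example of the kind of seam we mean (this is not NiFi's actual
interface), processors would code against an abstraction along these lines, with the
ZooKeeper-backed provider as just one implementation behind it:

    import java.io.IOException;
    import java.util.Map;

    // Hypothetical abstraction, for illustration only: any store that can load and save a
    // small map of key/value state per component could be plugged in behind it, whether
    // that is ZooKeeper, a local write-ahead log, or some other service.
    public interface ComponentStateStore {

        // Returns the most recently stored state for the given component (empty if none).
        Map<String, String> load(String componentId) throws IOException;

        // Replaces the stored state for the given component.
        void store(String componentId, Map<String, String> state) throws IOException;
    }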

Thanks
-Mark



> On Mar 31, 2016, at 5:35 PM, Sean Busbey <bu...@apache.org> wrote:
> 
> HBase has also had issues with ZK at modest (~couple hundred node)
> scale when using it to act as an intermediary for heartbeats and to do
> work assignment.
> 
> On Thu, Mar 31, 2016 at 4:33 PM, Tony Kurc <tr...@gmail.com> wrote:
>> My commentary that didn't accompany this - it appears Storm was using
>> zookeeper in a similar way to the road we're heading down, and it was such
>> a major bottleneck that they moved key-value storage and heartbeating out
>> into separate services, and then re-engineered (i.e. built Heron). Before
>> we get too dependent on zookeeper, it may be worth learning some lessons from
>> the crew that built Heron or from a team that learned zookeeper lessons at
>> scale, like Accumulo.
>> 
>> On Thu, Mar 24, 2016 at 6:22 PM, Tony Kurc <tr...@gmail.com> wrote:
>> 
>>> I mentioned the slides I saw at the meetup about zookeeper perils at scale in
>>> storm; here are the slides. I couldn't find a video after some limited
>>> searching.
>>> https://qconsf.com/system/files/presentation-slides/heron-qcon-2015.pdf
>>> 


Re: Zookeeper issues mentioned in a talk about storm / heron

Posted by Mark Payne <ma...@hotmail.com>.
Tony,

Certainly some good points here. I can definitely agree that there is some concern
about resource contention between heart-beating and state management.

In testing PR 323 for heart-beating to ZooKeeper, we did notice that if we lose
the quorum, the nodes are no longer able to tell which nodes are sending heartbeats,
which results in the state of the cluster really being unknown. For example, in a 3-node
NiFi cluster, all running an embedded ZooKeeper, if we lose 2 nodes we keep reporting
that we have 2/3 nodes instead of 1/3 nodes: with only 1 of the 3 ZooKeeper servers left,
there is no quorum, so we can't read from ZooKeeper to determine that the second node is missing.

While I guess I did realize that this was the way it would work, it wasn't clear how poor the
usability would be here, when losing a quorum means that the state of the cluster really is
unknown.

Also, as the heart-beating was refactored, we dramatically trimmed Heartbeat messages down
to a very small size (probably around 1-2 KB). And with no NCM running the show anymore, all nodes
within a cluster will be required to be able to communicate with one another to send Cluster Protocol
messages.

To this end, I think the better solution regarding heart-beating is to simply have nodes send their heartbeat
to all nodes in the cluster. This allows all nodes to know the current state. Since the heartbeats are now
very small, the network chatter will be pretty limited.
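
To put rough numbers on "pretty limited" (these are back-of-envelope assumptions, a ~2 KB
heartbeat every 5 seconds, not measurements):

    // Back-of-envelope sketch of all-to-all heartbeat traffic. The 2 KB payload and
    // 5-second interval are assumptions for illustration, not measured values.
    public class HeartbeatChatterEstimate {
        public static void main(String[] args) {
            final int heartbeatBytes = 2 * 1024;   // assumed size of one heartbeat message
            final double intervalSeconds = 5.0;    // assumed heartbeat interval

            for (final int nodes : new int[] {5, 10, 20, 100}) {
                // Each node sends its heartbeat to every other node in the cluster.
                final long bytesPerInterval = (long) nodes * (nodes - 1) * heartbeatBytes;
                final double clusterKBps = bytesPerInterval / intervalSeconds / 1024.0;
                System.out.printf("%3d nodes: ~%.0f KB/s of heartbeat traffic cluster-wide%n",
                        nodes, clusterKBps);
            }
        }
    }

Even at 20 nodes that works out to roughly 150 KB/s spread across the whole cluster.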

This means that ZooKeeper will be required only for two things: Leader Election, and State Management
(with the ability to provide a different storage mechanism for State Management later, if we see the need).
Leader Election still would be used to elect a 'Cluster Coordinator' who is responsible for (among other things)
monitoring heartbeats. If that node does not receive a heartbeat from Node X in some amount of time, it would
notify all nodes in the cluster that Node X is now disconnected. This way, even if the Leader is unable to
communicate with Node X, all other nodes in the cluster know it is disconnected and will issue Node X
a Disconnection Request the next time that Node X heartbeats.
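
For anyone who wants to picture that arrangement, here is a minimal sketch assuming the Curator
LeaderLatch recipe; the znode path, timeout handling, and node-tracking map are illustrative
assumptions, not the actual implementation in the PR:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    // Illustrative sketch only: contend for the Cluster Coordinator role via ZooKeeper,
    // then track heartbeats (received over the Cluster Protocol, not via ZooKeeper) and
    // flag nodes whose heartbeats have lapsed.
    public class CoordinatorSketch {

        private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

        public void contendForCoordinator() throws Exception {
            final CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            // Every node registers a latch on the same path; ZooKeeper elects exactly one leader.
            final LeaderLatch latch = new LeaderLatch(client, "/nifi/leaders/cluster-coordinator");
            latch.start();
            latch.await(); // blocks until this node becomes the Cluster Coordinator
        }

        // Called whenever a heartbeat arrives from another node over the Cluster Protocol.
        public void onHeartbeat(final String nodeId) {
            lastHeartbeat.put(nodeId, System.currentTimeMillis());
        }

        // Run periodically by the coordinator; any node whose heartbeat has lapsed is reported
        // to the rest of the cluster as disconnected and will receive a Disconnection Request
        // the next time it heartbeats.
        public void sweep(final long timeoutMillis) {
            final long now = System.currentTimeMillis();
            lastHeartbeat.forEach((nodeId, lastSeen) -> {
                if (now - lastSeen > timeoutMillis) {
                    System.out.println("Node " + nodeId + " considered disconnected");
                }
            });
        }
    }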

I have created a JIRA [1] where we can track any further concerns that may develop in the community.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1743





> On Apr 4, 2016, at 12:40 PM, Tony Kurc <tr...@gmail.com> wrote:
> 
> Mark,
> Fair points!
> 
> Something an Apache Accumulo committer pointed out at the meetup is that
> the scale issues may come sooner than a couple hundred nodes due to the
> size of writes and potential frequency of writes (Joe Percivall's
> demonstration on windowing seemed like it could write much more
> frequently than a couple of times a minute).
> 
> Another point someone brought up is that if heartbeating and state
> management are competing for resources, bad things can happen.
> 
> Tony
> 
> On Fri, Apr 1, 2016 at 9:31 AM, Mark Payne <ma...@hotmail.com> wrote:
> 
>> Guys,
>> 
>> Certainly some great points here and important concepts to keep in mind!
>> 
>> One thing to remember, though, is something that very much differentiates
>> NiFi from Storm or HBase or Accumulo: those systems are typically expected
>> to scale to hundreds or thousands of nodes (you mention a couple-hundred-node
>> HBase cluster being "modest"). With NiFi, we typically operate clusters on
>> the order of several nodes to dozens of nodes. A couple hundred nodes would
>> be a pretty massive NiFi cluster.
>> 
>> In terms of storing state, we could potentially get into more of a sticky
>> situation if not done carefully. However, we generally expect the "State
>> Management" feature to be used for occasionally storing small amounts of
>> state, such as ListHDFS storing 2 timestamps, and we don't expect ListHDFS
>> to be continually hammering HDFS asking for a new listing. It may be
>> scheduled to run once every few minutes, for example.
>> 
>> That said, we certainly have been designing everything using appropriate
>> interfaces
>> in such a way that if we do later decide that we want to use some other
>> mechanism
>> for storing state and/or heartbeats, it should be a very reasonable path to
>> take advantage of some other service for one or both of these tasks.
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>>> On Mar 31, 2016, at 5:35 PM, Sean Busbey <bu...@apache.org> wrote:
>>> 
>>> HBase has also had issues with ZK at modest (~couple hundred node)
>>> scale when using it to act as an intermediary for heartbeats and to do
>>> work assignment.
>>> 
>>> On Thu, Mar 31, 2016 at 4:33 PM, Tony Kurc <tr...@gmail.com> wrote:
>>>> My commentary that didn't accompany this - it appears Storm was using
>>>> zookeeper in a similar way to the road we're heading down, and it was such
>>>> a major bottleneck that they moved key-value storage and heartbeating out
>>>> into separate services, and then re-engineered (i.e. built Heron). Before
>>>> we get too dependent on zookeeper, it may be worth learning some lessons
>>>> from the crew that built Heron or from a team that learned zookeeper
>>>> lessons at scale, like Accumulo.
>>>> 
>>>> On Thu, Mar 24, 2016 at 6:22 PM, Tony Kurc <tr...@gmail.com> wrote:
>>>> 
>>>>> I mentioned the slides I saw at the meetup about zookeeper perils at scale
>>>>> in storm; here are the slides. I couldn't find a video after some limited
>>>>> searching.
>>>>>
>>>>> https://qconsf.com/system/files/presentation-slides/heron-qcon-2015.pdf
>>>>> 
>> 
>> 


Re: Zookeeper issues mentioned in a talk about storm / heron

Posted by Tony Kurc <tr...@gmail.com>.
Mark,
Fair points!

Something an Apache Accumulo committer pointed out at the meetup is that
the scale issues may come sooner than a couple hundred nodes due to the
size of writes and potential frequency of writes (Joe Percivall's
demonstration on windowing seemed like it could write much more
frequently than a couple of times a minute).
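
Rough illustration of why the frequency matters (all numbers are assumptions, not measurements):
compare a processor that writes state once per 5-minute run against one updating state once per
second on every node of a 10-node cluster.

    // Assumption-laden sketch comparing ZooKeeper write rates for an occasional-state
    // processor versus a frequently-writing one. All figures are illustrative.
    public class StateWriteRateEstimate {
        public static void main(String[] args) {
            final int nodes = 10;                               // assumed cluster size
            final double listStyleWritesPerSec = 1.0 / 300.0;   // one write per 5-minute run
            final double windowStyleWritesPerSec = nodes * 1.0; // assumed 1 update/sec per node

            System.out.printf("Occasional state (ListHDFS-style): ~%.3f writes/sec%n",
                    listStyleWritesPerSec);
            System.out.printf("Frequent state (windowing-style):  ~%.1f writes/sec (~%.0fx more)%n",
                    windowStyleWritesPerSec, windowStyleWritesPerSec / listStyleWritesPerSec);
        }
    }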

Another point someone brought up is that if heartbeating and state
management are competing for resources, bad things can happen.

Tony

On Fri, Apr 1, 2016 at 9:31 AM, Mark Payne <ma...@hotmail.com> wrote:

> Guys,
>
> Certainly some great points here and important concepts to keep in mind!
>
> One thing to remember, though, is something that very much differentiates
> NiFi from Storm or HBase or Accumulo: those systems are typically expected
> to scale to hundreds or thousands of nodes (you mention a couple-hundred-node
> HBase cluster being "modest"). With NiFi, we typically operate clusters on
> the order of several nodes to dozens of nodes. A couple hundred nodes would
> be a pretty massive NiFi cluster.
>
> In terms of storing state, we could potentially get into more of a sticky
> situation if not done carefully. However, we generally expect the "State
> Management" feature to be used for occasionally storing small amounts of
> state, such as ListHDFS storing 2 timestamps, and we don't expect ListHDFS
> to be continually hammering HDFS asking for a new listing. It may be
> scheduled to run once every few minutes, for example.
>
> That said, we certainly have been designing everything using appropriate
> interfaces
> in such a way that if we do later decide that we want to use some other
> mechanism
> for storing state and/or heartbeats, it should be a very reasonable path to
> take advantage of some other service for one or both of these tasks.
>
> Thanks
> -Mark
>
>
>
> > On Mar 31, 2016, at 5:35 PM, Sean Busbey <bu...@apache.org> wrote:
> >
> > HBase has also had issues with ZK at modest (~couple hundred node)
> > scale when using it to act as an intermediary for heartbeats and to do
> > work assignment.
> >
> > On Thu, Mar 31, 2016 at 4:33 PM, Tony Kurc <tr...@gmail.com> wrote:
> >> My commentary that didn't accompany this - it appears Storm was using
> >> zookeeper in a similar way to the road we're heading down, and it was such
> >> a major bottleneck that they moved key-value storage and heartbeating out
> >> into separate services, and then re-engineered (i.e. built Heron). Before
> >> we get too dependent on zookeeper, it may be worth learning some lessons
> >> from the crew that built Heron or from a team that learned zookeeper
> >> lessons at scale, like Accumulo.
> >>
> >> On Thu, Mar 24, 2016 at 6:22 PM, Tony Kurc <tr...@gmail.com> wrote:
> >>
> >>> I mentioned the slides I saw at the meetup about zookeeper perils at scale
> >>> in storm; here are the slides. I couldn't find a video after some limited
> >>> searching.
> >>>
> >>> https://qconsf.com/system/files/presentation-slides/heron-qcon-2015.pdf
> >>>
>
>