Posted to commits@nifi.apache.org by "Joseph Witt (JIRA)" <ji...@apache.org> on 2015/02/10 02:08:34 UTC

[jira] [Created] (NIFI-338) The NCM should offer high availability

Joseph Witt created NIFI-338:
--------------------------------

             Summary: The NCM should offer high availability
                 Key: NIFI-338
                 URL: https://issues.apache.org/jira/browse/NIFI-338
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Andrew Purtell


Purtell:
I'm separately curious about the level of effort it would require to
introduce multiple masters. I'm thinking of how HBase does it: there is
only ever one active master, but a user can deploy multiple standbys to
take over service should the active master fail.

Witt:
So it is true that we have a single-master model today. But that
single master exists solely for command and control of changes to the
dataflow configuration, and it is a very lightweight process that does
nothing more than that. If the master dies, all nodes continue to do
what they were doing, and even site-to-site continues to distribute
data; it just does so without updated information about the current
load across the cluster. Once the master is brought back online, the
real-time command and control functions return. Building support for a
backup master to offer HA for even the command/control side would
probably also be a considerable effort. For this one, I'd be curious to
hear of cases where it was critical to make this part HA.

Purtell:
Yes, but imagine a NiFi installation, perhaps a hosted service built on
top of it, where DataFlow Managers expect the command and control aspect
of the system to be as robust and available as flow processing itself.
If one or more standby masters are waiting in the wings to take over from
a failed active master, then automated, unattended failover would be
possible and would likely narrow the window during which administrative
changes may fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)