Posted to users@nifi.apache.org by Juan Sequeiros <he...@gmail.com> on 2017/04/21 14:57:45 UTC

List processors

Hello all,

My preliminary testing shows that if I run a ListS3 processor (maybe all
List processors?) on a cluster that is not running or configured to talk to
ZooKeeper, it does not maintain state at all, even though I would expect it
to maintain state locally.

EX: ListProcessor (run on primary node) > distribute to cluster and use
ConsumeProcessor.

* We accept the fact that if the primary node changes it would lose state,
but I want to maintain local state on the cluster.

I have tried setting
nifi.state.management.provider.cluster=local-provider, and it fails since I
am clustered.

I am certainly going to do more testing, unless it's definitely true that
when clustered it only maintains ZooKeeper state.

Or might it be processor dependent? My tests have been with ListS3, and I am
on Apache NiFi 0.7.

Re: List processors

Posted by Bryan Bende <bb...@gmail.com>.
Ah I see, the providers say which scopes they support, and then the
framework only lets them be used for the appropriate scope, which
makes sense.
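
The scope check described above can be sketched roughly as follows. This is
only an illustration of the idea, not NiFi's actual code: the class
ProviderScopeCheck, its nested Scope enum, and the validate method are
hypothetical names, and the message text is merely modeled on the startup
error quoted in this thread.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is not a NiFi class.
class ProviderScopeCheck {

    // Assumed model of the two scopes discussed in the thread.
    enum Scope { LOCAL, CLUSTER }

    // Returns null when the provider may serve the configured scope,
    // otherwise a message explaining the mismatch.
    static String validate(String providerName, Set<Scope> supported, Scope configured) {
        if (supported.contains(configured)) {
            return null;
        }
        return "Cannot use " + providerName + " as it only supports scope(s) "
                + supported + " but instance is configured to use scope " + configured;
    }

    public static void main(String[] args) {
        // A provider that declares LOCAL as its only supported scope.
        Set<Scope> localOnly = EnumSet.of(Scope.LOCAL);

        // Configured as the cluster provider: rejected with a message.
        System.out.println(validate("WriteAheadLocalStateProvider", localOnly, Scope.CLUSTER));

        // Configured as the local provider: accepted, so null is printed.
        System.out.println(validate("WriteAheadLocalStateProvider", localOnly, Scope.LOCAL));
    }
}
```

Under this model, a provider that supports only LOCAL can never be accepted
as the cluster provider, whatever its configuration looks like.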

I suppose the WriteAheadLocalStateProvider could be modified to
support CLUSTER scope, although it seems like a bad idea to give
people an option that we know causes data loss. You could make your
own version of the WriteAheadLocalStateProvider with this change
though.

If you are running a cluster on 0.7, don't you already have ZooKeeper
in order to run the cluster? Just trying to see what the issue with
using ZooKeeper is.

On Fri, Apr 21, 2017 at 2:32 PM, Juan Sequeiros <he...@gmail.com> wrote:
> Thanks Bryan, I just tried it and NiFi does not start because of this:
>
> " Cannot use Cluster State Provider ( WriteAheadLocalStateProvider ) as it
> only supports scope(s) {LOCAL} but instance is configured to use scope
> CLUSTER"

Re: List processors

Posted by Juan Sequeiros <he...@gmail.com>.
Thanks Bryan, I just tried it and NiFi does not start because of this:

" Cannot use Cluster State Provider ( WriteAheadLocalStateProvider ) as it
only supports scope(s) {LOCAL} but instance is configured to use scope
CLUSTER"


On Fri, Apr 21, 2017 at 2:12 PM Bryan Bende <bb...@gmail.com> wrote:

> Juan,
>
> I believe that, from the processor side of things, when a processor
> calls save/retrieve on the state manager, it has to specify
> a scope such as CLUSTER or LOCAL. If it specifies CLUSTER and no
> clustered state provider exists, the state is saved to the local
> provider. This allows a processor to work seamlessly across a
> standalone NiFi and a clustered NiFi.
>
> The issue is that NiFi is not allowed to start up in clustered mode
> without a clustered state provider. Generally that is the ZooKeeper
> provider, although it is an extension point and someone can implement
> their own.
>
> I would think you could do the following...
>
> The normal clustered provider looks like this in state-management.xml....
>
> <cluster-provider>
>     <id>zk-provider</id>
>     <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
>     <property name="Connect String"></property>
>     <property name="Root Node">/nifi</property>
>     <property name="Session Timeout">10 seconds</property>
>     <property name="Access Control">Open</property>
> </cluster-provider>
>
> If you take the config from the local provider and drop it in the
> cluster provider...
>
> <cluster-provider>
>     <id>local-cluster-provider</id>
>     <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
>     <property name="Directory">./state/local</property>
>     <property name="Always Sync">false</property>
>     <property name="Partitions">16</property>
>     <property name="Checkpoint Interval">2 mins</property>
> </cluster-provider>
>
> Basically defining another instance of the local state provider as the
> cluster provider.
>
> Not totally sure if this works, but theoretically it should.
>
> -Bryan

Re: List processors

Posted by Bryan Bende <bb...@gmail.com>.
Juan,

I believe that, from the processor side of things, when a processor
calls save/retrieve on the state manager, it has to specify
a scope such as CLUSTER or LOCAL. If it specifies CLUSTER and no
clustered state provider exists, the state is saved to the local
provider. This allows a processor to work seamlessly across a
standalone NiFi and a clustered NiFi.
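
A minimal sketch of the fallback described above, assuming the behavior is as
stated (CLUSTER-scoped state lands in the local store when no cluster
provider exists). Everything here is a hypothetical model: ScopedStateSketch,
its methods, and the example key and value are made up for illustration and
are not NiFi's StateManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration only; this is not NiFi's StateManager.
class ScopedStateSketch {

    // Assumed model of the two scopes a processor can ask for.
    enum Scope { LOCAL, CLUSTER }

    private final Map<String, String> localStore = new HashMap<>();
    // null models "no cluster state provider is configured".
    private final Map<String, String> clusterStore;

    ScopedStateSketch(Map<String, String> clusterStore) {
        this.clusterStore = clusterStore;
    }

    // Pick the backing store; CLUSTER falls back to local when no cluster store exists.
    private Map<String, String> storeFor(Scope scope) {
        if (scope == Scope.CLUSTER && clusterStore != null) {
            return clusterStore;
        }
        return localStore;
    }

    void setState(String key, String value, Scope scope) {
        storeFor(scope).put(key, value);
    }

    String getState(String key, Scope scope) {
        return storeFor(scope).get(key);
    }

    boolean storedLocally(String key) {
        return localStore.containsKey(key);
    }

    public static void main(String[] args) {
        // Standalone instance: no cluster store available.
        ScopedStateSketch standalone = new ScopedStateSketch(null);
        standalone.setState("example.key", "12345", Scope.CLUSTER);

        // The CLUSTER-scoped write ended up in the local store.
        System.out.println(standalone.storedLocally("example.key"));
    }
}
```

The processor code is the same in both cases; only the store that backs the
CLUSTER scope differs between the standalone and clustered setups.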

The issue is that NiFi is not allowed to start up in clustered mode
without a clustered state provider. Generally that is the ZooKeeper
provider, although it is an extension point and someone can implement
their own.

I would think you could do the following...

The normal clustered provider looks like this in state-management.xml....

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String"></property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>

If you take the config from the local provider and drop it in the
cluster provider...

<cluster-provider>
    <id>local-cluster-provider</id>
    <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
    <property name="Directory">./state/local</property>
    <property name="Always Sync">false</property>
    <property name="Partitions">16</property>
    <property name="Checkpoint Interval">2 mins</property>
</cluster-provider>

Basically defining another instance of the local state provider as the
cluster provider.

Not totally sure if this works, but theoretically it should.

-Bryan


On Fri, Apr 21, 2017 at 1:54 PM, Juan Sequeiros <he...@gmail.com> wrote:
> To add more to the issue, I see this in my log:
>
> "Failed to restore processor state; yielding java.io.IOException; Failed to
> obtain value from Zookeeper for component ...." with exception code
> CONNECTIONLOSS
>
> So this confirms what I expected, and at this point I am not sure whether
> this is a bug or working as expected. It feels like the dataflow manager
> should be able to configure how state is handled, and to use ListS3 in a
> cluster even though I do not have ZooKeeper?

Re: List processors

Posted by Juan Sequeiros <he...@gmail.com>.
To add more to the issue, I see this in my log:

"Failed to restore processor state; yielding java.io.IOException; Failed to
obtain value from Zookeeper for component ...." with exception code
CONNECTIONLOSS

So this confirms what I expected, and at this point I am not sure whether
this is a bug or working as expected. It feels like the dataflow manager
should be able to configure how state is handled, and to use ListS3 in a
cluster even though I do not have ZooKeeper?


