Posted to users@nifi.apache.org by Jeff <jt...@gmail.com> on 2016/09/19 17:46:39 UTC

Re: enforce run only in promary node $ multiple primary node

Tijo,

To give you some information on your second question, you can design your
flow to redistribute the flowfiles coming out of your processors to other
nodes in the cluster for processing.  There are several examples of how to
do this on various blogs, email lists, etc., and I just grabbed one for
reference, written by Apache NiFi's own Bryan Bende:
http://apache-nifi.1125220.n5.nabble.com/How-to-configure-site-to-site-communication-between-nodes-in-one-cluster-td8528.html

Please review that thread and let us know if you have further questions!
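
To illustrate the redistribution idea in the abstract, here is a minimal
Python sketch. It is a toy model, not NiFi code: the function name
`redistribute`, the node names, and the round-robin policy are all
assumptions made for illustration. In NiFi itself the distribution is
configured in the flow (for example via site-to-site, as in the linked
thread), and the actual balancing policy may differ.

```python
from collections import defaultdict
from itertools import cycle


def redistribute(flowfiles, nodes):
    """Spread flowfiles across nodes in round-robin order.

    Returns a dict mapping each node that received work to its flowfiles.
    Round-robin is an assumed policy chosen only to keep the sketch simple.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = defaultdict(list)
    # cycle() repeats the node list; zip() stops when flowfiles run out.
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return dict(assignment)


# Flowfiles produced by a source processor running on the primary node only.
produced = [f"file-{i}" for i in range(5)]
spread = redistribute(produced, ["node-1", "node-2"])
```

The point of the pattern is that only the source processor is pinned to the
primary node; everything downstream of the redistribution step can run on
all nodes.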

On Mon, Sep 19, 2016 at 1:19 PM Tijo Thomas <ti...@yahoo.in> wrote:

>
> Hi,
>
> 1. While writing a processor, is it possible to enforce that it runs only on
> the primary node? I saw a Jira for this, but it appears to be unresolved.
>
> [NIFI-543] Provide extensions a way to indicate that they can run only on
> primary node, if clustered - ASF JIRA
> <https://issues.apache.org/jira/browse/NIFI-543>
>
> 2. Currently my primary node is heavily loaded, as I have many processors
> that run only on the primary node. Is it possible to define multiple
> primary nodes, or to configure processors not to run on the
> primary node?
>
> Tijo
>

Re: enforce run only in promary node $ multiple primary node

Posted by Mark Payne <ma...@hotmail.com>.

Tijo,

Sounds great. I'm very happy to see you digging in here! Would be happy to give it a review once posted.

Thanks!
-Mark


Re: enforce run only in promary node $ multiple primary node

Posted by Tijo Thomas <ti...@gmail.com>.

Hi Mark,

Somehow I missed this mail. We have implemented it along similar lines,
based on the 1.0 code base.
But there was some contention on the ZooKeeper side. We will
get back to the community once it is stabilised, as we have release pressure
now.

I will keep you posted on this by the end of this week.

Thanks & Regards
Tijo Thomas

Re: enforce run only in promary node $ multiple primary node

Posted by Mark Payne <ma...@hotmail.com>.

Tijo,

Sure, I would be happy to elaborate some. Sorry it's taken me a while to get back to you.

The idea would be to create some "named thing." Let's call it the Processing Locality.
Perhaps a better name can be used, but I'll use this term for this email.

The idea is that through the UI, a user with appropriate permissions is able to create a new
Processing Locality. Once created, a user can go to a Processor's configuration and go to
the Scheduling Tab. Currently, there are 3 options available for the Scheduling Strategy:
Timer-Driven (always available), Event-Driven (available for some processors), and
Primary Node (available when running in clustered mode).

My proposal is that we first remove the Primary Node scheduling strategy, so that we have
only two scheduling strategies: Timer-Driven and Event-Driven. We then add a Processing
Locality field to the Scheduling tab. The available options would be "All Nodes" (which would
be the default) or any of the named Processing Localities that users have added. For backward
compatibility purposes, we would always have a "Primary Node" Processing Locality.

If a Processing Locality other than "All Nodes" is selected, then the processor would run only on
a single node, just as Primary Node works today. The difference, though, is that all processors that have
the same Processing Locality would run on the same node but processors with a different
Processing Locality would potentially run on a different node. Which node a given Processing Locality
is run on would be determined via ZooKeeper, just as Primary Node is. This gives us automatic
failover if the node running a specific Processing Locality fails.

For example, say we have 5 Processors: A, B, C, D, and E. And we have 3 Processing Localities:
Locality 1, Locality 2, Locality 3.
We configure B and E to run at Locality 1, A and C to run at Locality 2, and D to run at Locality 3.

Now we know that Processors B and E will run on the same node. Processors A and C will run on
the same node. It's possible that B, E, A, and C will all run on the same node (if one node is elected
to run both Locality 1 and Locality 2). Or they may be different nodes. But we know that B & E will
run on the same node and A & C will run on the same node. Processor D is again in its own Processing
Locality, so it may run on any given node. But if another Processor is added and configured to run on
Processing Locality 3, it will definitely be co-located with Processor D.
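
To make the example concrete, here is a small Python sketch of the idea. It
is only a toy model under stated assumptions: the `Cluster` class, the node
names, and the "least-loaded node wins" rule are hypothetical stand-ins for
the ZooKeeper election, which is not specified in this email. Nothing here
is NiFi code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: each Processing Locality is elected to exactly one node."""

    nodes: list[str]
    # Locality name -> node currently elected to run it.
    elected: dict[str, str] = field(default_factory=dict)

    def elect(self, locality: str) -> str:
        """Assumed stand-in for a ZooKeeper election: choose the live node
        hosting the fewest localities, breaking ties by node name."""
        if not self.nodes:
            raise RuntimeError("no live nodes")
        load = {node: 0 for node in self.nodes}
        for node in self.elected.values():
            if node in load:  # ignore localities stuck on failed nodes
                load[node] += 1
        winner = min(self.nodes, key=lambda node: (load[node], node))
        self.elected[locality] = winner
        return winner

    def node_for(self, locality: str) -> str:
        """Return the node running a locality, re-electing after a failure."""
        node = self.elected.get(locality)
        if node is None or node not in self.nodes:
            node = self.elect(locality)
        return node

    def fail(self, node: str) -> None:
        """Simulate a node failure; its localities move on the next lookup."""
        self.nodes.remove(node)


# The configuration from the example above.
processor_locality = {
    "A": "Locality 2",
    "B": "Locality 1",
    "C": "Locality 2",
    "D": "Locality 3",
    "E": "Locality 1",
}

cluster = Cluster(nodes=["node-1", "node-2"])
placement = {p: cluster.node_for(loc) for p, loc in processor_locality.items()}

# Failover: the node running Locality 1 goes away, and B and E move together.
cluster.fail(placement["B"])
new_node = cluster.node_for("Locality 1")
```

Whatever election rule is used, the invariant is the same: processors that
share a Processing Locality always resolve to the same node, while different
localities may or may not share one.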

Does all of this sound reasonable to you and make sense? Would love to hear any ideas that you or
the others on your team have!

Thanks
-Mark


Re: enforce run only in promary node $ multiple primary node

Posted by Tijo Thomas <ti...@yahoo.in.INVALID>.

Hi Mark, in your earlier mail you mentioned an approach based on a named grouping construct. Is it possible to discuss this further?
I am thinking of something like the node labeling concept in YARN. It would be good if NiFi could support this feature. My team and I are willing to contribute if we can implement this feature.
Please let me know your opinion.
Thanks & Regards
Tijo Thomas


RE: enforce run only in promary node $ multiple primary node

Posted by Nijel s f <ni...@huawei.com>.

Thanks Mark and Tijo for your opinions.

Yes, I am looking into some way of grouping the nodes. But even in this case, each processor should run on only one node.
A few examples of this type of processor that we have are:
1. Execute a Spark job -> invoke the Spark job and track it
2. Invoke DistCp in a Hadoop cluster -> invoke the job and track it
3. Invoke any other YARN/MR job -> invoke the job and track it

In summary, I want to partition the cluster into 2 types of nodes:
1. Nodes handling the single-execution processors (multiple nodes, but each processor runs on only one node)
2. Normal data processing nodes (like current NiFi nodes).

So grouping the nodes is one idea. But even then, processor execution needs to be constrained to a single node.
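
As a rough sketch of this partitioning, the Python below assigns each
single-execution processor to exactly one node of a dedicated pool and lets
normal processors run on every data node. Everything in it is an assumption
for illustration: the function names `pick_node` and `plan`, the processor
and node names, and the use of a hash as the distribution algorithm. It does
not handle failover, which would still need coordination such as ZooKeeper.

```python
import hashlib


def pick_node(processor, pool):
    """Deterministically map a processor name to exactly one node in the pool.

    A SHA-256 digest is used instead of the built-in hash() so the result
    stays stable across interpreter runs.
    """
    if not pool:
        raise ValueError("pool must not be empty")
    digest = hashlib.sha256(processor.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return sorted(pool)[index]


def plan(single_execution, data_processors, single_pool, data_pool):
    """Return processor -> list of nodes it runs on."""
    # Single-execution processors: one node each, taken from the dedicated pool.
    placement = {p: [pick_node(p, single_pool)] for p in single_execution}
    # Normal processors: every data node, as in a regular cluster today.
    placement.update({p: sorted(data_pool) for p in data_processors})
    return placement


# Hypothetical processor and node names.
single_pool = ["exec-node-1", "exec-node-2"]
data_pool = ["data-node-1", "data-node-2", "data-node-3"]
placement = plan(
    ["RunSparkJob", "RunDistCp", "RunYarnJob"],
    ["TransformRecords"],
    single_pool,
    data_pool,
)
```

A hash spreads the single-execution processors across the pool without any
shared state, but it reshuffles assignments whenever the pool changes, so a
real implementation would likely need something more stable.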

As background, our plan is to deploy some sort of pipeline service where these two types of nodes need to be deployed and managed separately.

Regards
-nijel


> On Sep 21, 2016, at 5:07 AM, Nijel s f <ni...@huawei.com> wrote:
> 
> Hi all
> 
> In support of Tijo's thought, we have one scenario.
> 
> We are trying to use NiFi for a data pipeline solution. The scenario is to coordinate between various services and provide a solution for big data analysis.
> In our scenario, many of the activities are "run on primary" mode processors. These are being implemented on top of various components like YARN, HBase, Spark, DB, etc.
> 
> One issue we are seeing is that all these processors have to run on the primary node [like Spark execution, YARN/MR job execution, etc.], and there is only one.
> We are thinking of having multiple primary nodes and assigning the activities using some distribution algorithm.
> The idea is to handle the coordination and failover mechanism using ZooKeeper.
> 
> Any thoughts on this?
> 
> Regards
> Nijel
> 
> From: Jeff [mailto:jtswork@gmail.com]
> Sent: Monday, September 19, 2016 11:17 PM
> To: Tijo Thomas; users@nifi.apache.org
> Subject: Re: enforce run only in promary node $ multiple primary node
> 
> Tijo,
> 
> To give you some information on your second question, you can design your flow to redistribute the flowfiles coming out of your processors to other nodes in the cluster for processing.  There are several examples on how this on various blogs/email lists/etc, and I just grabbed one for reference, written by Apache NiFi's own Bryan Bende: http://apache-nifi.1125220.n5.nabble.com/How-to-configure-site-to-site-communication-between-nodes-in-one-cluster-td8528.html
> 
> Please review that thread and let us know if you have further questions!
> 
> On Mon, Sep 19, 2016 at 1:19 PM Tijo Thomas <ti...@yahoo.in> wrote:
> 
> Hi,
> 
> 1. While writing a processor, is it possible to enforce that it runs only on the primary node? I saw a Jira for this, but it appears to be unresolved.
> 
> [NIFI-543] Provide extensions a way to indicate that they can run only on primary node, if clustered - ASF JIRA <https://issues.apache.org/jira/browse/NIFI-543>
> 
> 2. Currently my primary node is heavily loaded, as I have many processors that run only on the primary node. Is it possible to define multiple primary nodes, or to configure processors not to run on the primary node?
> 
> Tijo

Re: enforce run only in promary node $ multiple primary node

Posted by Tijo Thomas <ti...@yahoo.in.INVALID>.

Mark,
Changing the concept of "Run on Primary Node" to "Run on Only One Node" will not solve the problem. Named grouping constructs would be a better option.

Nijel,
Our use case is similar. We have many tasks that must run on only one node, and we want to distribute that load. If we could have a list of primary nodes to distribute the load across, it would solve our problem.

Tijo


Re: enforce run only in promary node $ multiple primary node

Posted by ma...@hotmail.com.

Nijel,

I'd like to hear more about your use case. From the description given, I'm not sure that all of this needs to run on the primary node. Generally, you want only "source processors" to run on the primary node.

One thing that I've been thinking about, though, is changing the concept of "Run on Primary Node" to "Run on Only One Node." The concern there is that we will have cases where a few processors have to run on the same node, so we would need a mechanism for supporting that, perhaps some sort of named grouping construct.

Thoughts?

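
To make the "named grouping construct" idea a bit more concrete, here is a rough sketch in plain Java. It is not NiFi API and not a proposal for how NiFi would implement it: the class `NamedGroupPlacement`, the group names, and the processor names are all hypothetical. The only point it illustrates is that processors sharing a group name always land on the same node, while different groups may land on different nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not NiFi API: processors that share a group name
// are always placed on the same node.
final class NamedGroupPlacement {

    // Picks one node for a group. Sorting a copy of the node list first makes
    // the choice independent of the order in which nodes are reported.
    static String nodeForGroup(String groupName, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        List<String> sorted = new ArrayList<>(nodes);
        Collections.sort(sorted);
        int index = Math.floorMod(groupName.hashCode(), sorted.size());
        return sorted.get(index);
    }

    // Maps each processor name to the node chosen for its group.
    static Map<String, String> place(Map<String, String> processorToGroup,
                                     List<String> nodes) {
        Map<String, String> placement = new TreeMap<>();
        for (Map.Entry<String, String> entry : processorToGroup.entrySet()) {
            placement.put(entry.getKey(), nodeForGroup(entry.getValue(), nodes));
        }
        return placement;
    }

    public static void main(String[] args) {
        // Processor and group names below are made up for illustration.
        Map<String, String> processorToGroup = new LinkedHashMap<>();
        processorToGroup.put("ListFiles", "ingest");
        processorToGroup.put("TrackListing", "ingest");
        processorToGroup.put("SubmitSparkJob", "spark");

        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3");
        place(processorToGroup, nodes)
                .forEach((processor, node) -> System.out.println(processor + " -> " + node));
    }
}
```

Note that a simple modulo like this moves many groups when the node count changes, so a real mechanism would likely need something more stable, plus a way to react to node failure.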


RE: enforce run only in promary node $ multiple primary node

Posted by Nijel s f <ni...@huawei.com>.

Hi all,

In support of Tijo's point, we have one scenario.

We are trying to use NiFi for a data pipeline solution. The scenario is to coordinate between various services and provide a solution for big data analysis.
In our scenario, many of the activities are "run on primary" mode processors. These are implemented on top of various components such as YARN, HBase, Spark, databases, etc.

One issue we are seeing is that all of these processors (Spark execution, YARN/MR job execution, etc.) have to run on the primary node, and there is only one.
We are thinking of having multiple primary nodes and assigning the activities using some distribution algorithm.
The idea is to handle the coordination and failover mechanism using ZooKeeper.

Any thoughts on this?

Regards
Nijel
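
The message above does not say which distribution algorithm is meant, so the following is only one possible sketch, in plain Java, using rendezvous (highest-random-weight) hashing. It assumes the list of live nodes is obtained elsewhere, for example from ZooKeeper, which is not shown. The class `MultiPrimaryAssigner` and the task and node names are hypothetical. The property of interest for failover is that when a node disappears, only the tasks it owned are reassigned.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: spreads "run on one node" tasks across several nodes
// using rendezvous (highest-random-weight) hashing.
final class MultiPrimaryAssigner {

    // Bit-mixing step (borrowed from the MurmurHash3 64-bit finalizer) so that
    // similar node/task strings still get well-spread scores.
    static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // Returns the live node with the highest score for this task.
    // Ties are broken by node name, so the result never depends on iteration order.
    static String ownerOf(String task, Collection<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live nodes");
        }
        String best = null;
        long bestScore = 0L;
        for (String node : liveNodes) {
            long score = mix((node + "|" + task).hashCode());
            boolean better = best == null
                    || score > bestScore
                    || (score == bestScore && node.compareTo(best) < 0);
            if (better) {
                best = node;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Task and node names are made up for illustration.
        List<String> tasks = Arrays.asList("spark-execution", "yarn-mr-job", "hbase-load");
        List<String> allNodes = Arrays.asList("node-1", "node-2", "node-3");
        List<String> afterFailure = Arrays.asList("node-1", "node-3"); // node-2 lost

        for (String task : tasks) {
            System.out.println(task + ": " + ownerOf(task, allNodes)
                    + " -> " + ownerOf(task, afterFailure));
        }
    }
}
```

Each node can run the same computation against the same live-node list and agree on ownership without further negotiation, though keeping that list consistent across nodes is exactly the coordination problem the message proposes to hand to ZooKeeper.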

From: Jeff [mailto:jtswork@gmail.com]
Sent: Monday, September 19, 2016 11:17 PM
To: Tijo Thomas; users@nifi.apache.org
Subject: Re: enforce run only in promary node $ multiple primary node

Tijo,

To give you some information on your second question, you can design your flow to redistribute the flowfiles coming out of your processors to other nodes in the cluster for processing.  There are several examples of how to do this on various blogs, email lists, etc., and I just grabbed one for reference, written by Apache NiFi's own Bryan Bende: http://apache-nifi.1125220.n5.nabble.com/How-to-configure-site-to-site-communication-between-nodes-in-one-cluster-td8528.html

Please review that thread and let us know if you have further questions!

On Mon, Sep 19, 2016 at 1:19 PM Tijo Thomas <ti...@yahoo.in> wrote:

Hi,

1. While writing a processor, is it possible to enforce that it runs only on the primary node? I saw a Jira for this, but it appears to be unresolved.

[NIFI-543] Provide extensions a way to indicate that they can run only on primary node, if clustered - ASF JIRA <https://issues.apache.org/jira/browse/NIFI-543>

2. Currently my primary node is heavily loaded, as I have many processors that run only on the primary node. Is it possible to define multiple primary nodes, or to configure processors not to run on the primary node?

Tijo