Posted to users@nifi.apache.org by Chakrader Dewaragatla <Ch...@lifelock.com> on 2015/11/10 20:39:10 UTC

Replicate flow files to multiple processors

Hi - Do we have any built-in processor that replicates flow files to multiple processors in parallel (in memory, not staging on disk)?
I was looking at the DistributeLoad processor, which distributes load using a weighted, round-robin technique. I am looking for something that replicates the flow files.

Thanks,
-Chakri
________________________________
The information contained in this transmission may contain privileged and confidential information. It is intended only for the use of the person(s) named above. If you are not the intended recipient, you are hereby notified that any review, dissemination, distribution or duplication of this communication is strictly prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
________________________________

Re: Replicate flow files to multiple processors

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
Multiple processors or multiple instances of the same processor?
Could you also elaborate on your use case a bit more? There may be several ways of accomplishing your goal, and understanding the underlying problem would help pick the best one.

Thanks
Oleg



Re: Replicate flow files to multiple processors

Posted by Mark Petronic <ma...@gmail.com>.
Thanks

On Sat, Nov 14, 2015 at 11:08 AM, Mark Payne <ma...@hotmail.com> wrote:

> Correct, the content/data itself is not copied. A replica of the FlowFile
> object is created, pointing to the same piece of content in the Content
> Repository.

Re: Replicate flow files to multiple processors

Posted by Mark Payne <ma...@hotmail.com>.
Correct, the content/data itself is not copied. A replica of the FlowFile object is created, pointing to the same piece of content in the Content Repository.
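One way to picture why replicas can safely share content is a store that never overwrites what it already holds. The Python sketch below is not NiFi code: `ContentStore`, `clone`, and `modify_content` are hypothetical names, and the append-only behaviour is an assumption of the sketch. It shows a replica sharing a content claim with the original, and a later modification landing under a new claim while the original bytes stay as they were.

```python
class ContentStore:
    """Toy append-only store: existing claims are never overwritten."""

    def __init__(self):
        self._claims = []

    def write(self, payload: bytes) -> int:
        self._claims.append(payload)
        return len(self._claims) - 1  # the claim id is simply the list index

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


def clone(flowfile: dict) -> dict:
    """Replicate a FlowFile record: new attribute map, same content claim."""
    return {"attributes": dict(flowfile["attributes"]), "claim": flowfile["claim"]}


def modify_content(flowfile: dict, store: ContentStore, transform) -> dict:
    """Return a FlowFile pointing at newly written content; the old claim is untouched."""
    new_payload = transform(store.read(flowfile["claim"]))
    return {"attributes": dict(flowfile["attributes"]), "claim": store.write(new_payload)}


store = ContentStore()
a = {"attributes": {"filename": "in.txt"}, "claim": store.write(b"hello")}
b = clone(a)                                # replica shares the claim with `a`
c = modify_content(b, store, bytes.upper)   # a downstream step rewrites the content

print(a["claim"] == b["claim"])   # do the replicas share one claim?
print(store.read(a["claim"]))     # bytes seen through the original
print(store.read(c["claim"]))     # bytes seen through the modified FlowFile
```

In this model the cost of replication is one small record per replica, and new bytes are written only when some step actually changes the content.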

> On Nov 14, 2015, at 11:00 AM, Mark Petronic <ma...@gmail.com> wrote:
> 
> Mark, just wanted to clarify for my understanding. You said...
> 
> "NiFi does this without copying the data or anything, simply by creating a new FlowFile that points to the same content on disk..." 
> 
> -- then you said, 
> 
> "So when you create two connections with the same relationship, you are sending a copy of the FlowFile to both connections (i.e., you are replicating it)."
> 
> I believe it is the former - WITHOUT copying the contents of the file, it just passes a reference to the file unless the content is mutated by another processor, correct?
> 


Re: Replicate flow files to multiple processors

Posted by Mark Petronic <ma...@gmail.com>.
Mark, just wanted to clarify for my understanding. You said...

"NiFi does this without copying the data or anything, simply by creating a
new FlowFile that points to the same content on disk..."

-- then you said,

"So when you create two connections with the same relationship, you are
sending a copy of the FlowFile to both connections (i.e., you are
replicating it)."

I believe it is the former - WITHOUT copying the contents of the file, it just
passes a reference to the file unless the content is mutated by another
processor, correct?

Re: Replicate flow files to multiple processors

Posted by Mark Payne <ma...@hotmail.com>.
Oleg,

Replication simply means to make a copy of something. You're thinking of replication as
distributed data replication in order to provide high availability, I believe. What we are talking
about here is simply sending a FlowFile from Processor A to Processor B and also sending
that same FlowFile (or a copy of it) from Processor A to Processor C.

So when you create two connections with the same relationship, you are sending a copy
of the FlowFile to both connections (i.e., you are replicating it).
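The difference between replicating and distributing (the DistributeLoad behaviour mentioned in the original question) can be sketched in a few lines of Python. The function names are hypothetical and plain strings stand in for FlowFiles; this only illustrates the two delivery patterns and is not how NiFi implements either.

```python
from itertools import cycle


def distribute_round_robin(items, targets):
    """Each item goes to exactly one target, with targets taking turns."""
    queues = {t: [] for t in targets}
    for item, target in zip(items, cycle(targets)):
        queues[target].append(item)
    return queues


def replicate(items, targets):
    """Each item goes to every target."""
    return {t: list(items) for t in targets}


flowfiles = ["ff1", "ff2", "ff3", "ff4"]
targets = ["B", "C"]

print(distribute_round_robin(flowfiles, targets))
print(replicate(flowfiles, targets))
```

With distribution, Processors B and C each see half of the items; with replication, both see all of them.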

Thanks
-Mark

> On Nov 11, 2015, at 7:42 AM, Oleg Zhurakousky <oz...@hortonworks.com> wrote:
> 
> I am still a bit confused about the problem that is being solved here.
> "Replication" implies some type of redundancy, allowing processing that failed "here" to be resumed "there".
> 
> What I am reading here is more about "content-based routing" (routing to their respective workflows based on their attribute).
> 
> Am I missing something?
> 
> Cheers
> Oleg


Re: Replicate flow files to multiple processors

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
I am still a bit confused about the problem that is being solved here.
"Replication" implies some type of redundancy, allowing processing that failed "here" to be resumed "there".

What I am reading here is more about "content-based routing" (routing to their respective workflows based on their attribute).

Am I missing something?

Cheers
Oleg
On Nov 10, 2015, at 5:35 PM, Andrew Grande <ag...@hortonworks.com> wrote:

As mentioned, simply keep connecting things together (e.g. multiple 'success' relationship links). For better organization, consider putting a Funnel in the flow and connecting to it instead of a processor.

Andrew



Re: Replicate flow files to multiple processors

Posted by Andrew Grande <ag...@hortonworks.com>.
As mentioned, simply keep connecting things together (e.g. multiple 'success' relationship links). For better organization, consider putting a Funnel in the flow and connecting to it instead of a processor.

Andrew

From: Chakrader Dewaragatla <Ch...@lifelock.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Tuesday, November 10, 2015 at 3:01 PM
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: RE: Replicate flow files to multiple processors

Thanks Mark. This should help.

Our use case is to route traffic (FlowFiles) to multiple independent processors that in turn route them to their respective workflows based on their attributes.


RE: Replicate flow files to multiple processors

Posted by Chakrader Dewaragatla <Ch...@lifelock.com>.
Thanks Mark. This should help.

Our use case is to route traffic (FlowFiles) to multiple independent processors that in turn route them to their respective workflows based on their attributes.
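A rough Python sketch of that use case, under stated assumptions: every FlowFile is offered to every branch, and each branch applies its own attribute rule. The `source` attribute, the branch names, and `replicate_then_route` are all hypothetical, chosen only for illustration; dictionaries stand in for FlowFile attributes.

```python
def replicate_then_route(flowfiles, branches):
    """Offer every FlowFile to every branch; each branch keeps what its own rule matches."""
    routed = {name: [] for name in branches}
    for ff in flowfiles:
        for name, matches in branches.items():
            if matches(ff):
                routed[name].append(ff)
    return routed


# Hypothetical attribute name and values, used only for this example.
flowfiles = [
    {"source": "web", "id": 1},
    {"source": "batch", "id": 2},
    {"source": "web", "id": 3},
]
branches = {
    "web_workflow": lambda ff: ff.get("source") == "web",
    "batch_workflow": lambda ff: ff.get("source") == "batch",
    "audit_workflow": lambda ff: True,  # this branch accepts everything
}

routed = replicate_then_route(flowfiles, branches)
for name, items in routed.items():
    print(name, [ff["id"] for ff in items])
```

Because the branches are independent, adding a workflow here means adding one more rule; the existing branches are not affected.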

________________________________
From: Mark Payne [markap14@hotmail.com]
Sent: Tuesday, November 10, 2015 11:45 AM
To: users@nifi.apache.org
Subject: Re: Replicate flow files to multiple processors

Chakri,

This can be done with any Processor. You can simply drag multiple connections that have the same Relationship.

For example, you can create a GetSFTP processor and draw a connection from GetSFTP to UpdateAttribute with the 'success' relationship,
and then also draw a connection from GetSFTP to PutHDFS with the 'success' relationship.

This will result in each FlowFile that is routed to 'success' going to both Processors.

NiFi does this without copying the data or anything, simply by creating a new FlowFile that points to the same content on disk, so
it is able to do this extremely efficiently.

Thanks
-Mark





Re: Replicate flow files to multiple processors

Posted by Mark Payne <ma...@hotmail.com>.
Chakri,

This can be done with any Processor. You can simply drag multiple connections that have the same Relationship.

For example, you can create a GetSFTP processor and draw a connection from GetSFTP to UpdateAttribute with the 'success' relationship,
and then also draw a connection from GetSFTP to PutHDFS with the 'success' relationship.

This will result in each FlowFile that is routed to 'success' going to both Processors.

NiFi does this without copying the data or anything, simply by creating a new FlowFile that points to the same content on disk, so
it is able to do this extremely efficiently.
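The fan-out described above can be sketched as a small Python model. This is not NiFi code: `ContentRepository`, `FlowFile`, `Connection`, and `route` are hypothetical names invented for illustration, and the model assumes only what this message states, namely that each connection receives its own FlowFile while the content is stored once.

```python
from dataclasses import dataclass, field
from itertools import count


class ContentRepository:
    """Toy content store: payloads are written once and looked up by claim id."""

    def __init__(self):
        self._claims = {}
        self._ids = count(1)

    def write(self, payload: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = payload
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]

    def claim_count(self) -> int:
        return len(self._claims)


@dataclass
class FlowFile:
    """Attributes plus a pointer to content; the payload lives in the repository."""
    attributes: dict
    claim_id: int


@dataclass
class Connection:
    """A queue leading to one downstream processor."""
    destination: str
    queue: list = field(default_factory=list)


def route(flowfile: FlowFile, connections: list) -> None:
    """Send one FlowFile to every connection attached to the same relationship."""
    for conn in connections:
        # Each connection gets its own FlowFile record (attributes are copied),
        # but the claim id is shared, so no payload bytes are duplicated.
        conn.queue.append(FlowFile(dict(flowfile.attributes), flowfile.claim_id))


repo = ContentRepository()
original = FlowFile({"filename": "data.csv"}, repo.write(b"a,b,c\n1,2,3\n"))

# Two connections drawn from the same 'success' relationship.
success = [Connection("UpdateAttribute"), Connection("PutHDFS")]
route(original, success)

for conn in success:
    ff = conn.queue[0]
    print(conn.destination, ff.claim_id, repo.read(ff.claim_id))
print("claims stored:", repo.claim_count())
```

Both queues end up holding a FlowFile, yet the repository still holds a single claim, which is the reason the fan-out is cheap in this model.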

Thanks
-Mark



