Posted to users@nifi.apache.org by M Singh <ma...@yahoo.com> on 2016/07/15 13:09:02 UTC

Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster

Hey Folks:
I am looking for information on how to split/partition input in a generic way (say, rows in a relational database or lines in a file) and then process each split on a different node in parallel in a NiFi cluster.  I believe there is a webinar from the NiFi team on this, but I am not able to find it now.
If someone has the documentation on this or a link to the webinar, please let me know.
Thanks
Mans

Re: Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster

Posted by M Singh <ma...@yahoo.com>.
Thanks Bryan.  I will check it. 


Re: Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster

Posted by Bryan Bende <bb...@gmail.com>.
Hi Mans,

Not sure if this is what you are referring to, but there is a diagram in
this article that shows how this would work for fetching from HDFS in
parallel:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

It covers the logical point of view rather than how to actually
configure this step-by-step in NiFi.

-Bryan


Re: Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster

Posted by M Singh <ma...@yahoo.com>.
Hi Joe:
Thanks for the info.  
I believe one of the Nifi team members had a webinar/presentation on it or something very similar.  If you have a reference for that, please let me know.
Thanks again for your help. 


Re: Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster

Posted by Joe Witt <jo...@gmail.com>.
Mans,

The general pattern for something like this that works well is:
 - Capture
 - Split
 - Site-to-Site transfer back to same cluster which distributes the
partitioned/split data to all nodes
 - Do work on smaller chunks
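
As a rough illustration of the split and distribute steps above, here is a plain Python sketch. It is not NiFi configuration and does not use any NiFi API. The function names, node names, and sample batch are all made up for the example, and round-robin is only a simple stand-in for whatever balancing site-to-site actually applies across the cluster.

```python
from itertools import cycle


def split_lines(batch: str) -> list[str]:
    """Split step: one record per non-empty line of the captured batch."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(records: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Stand-in for the site-to-site step: hand records out round-robin.

    Assumption: real cluster balancing is not modeled here.
    """
    assignment: dict[str, list[str]] = {node: [] for node in nodes}
    # zip stops once the records run out, so cycle() never loops forever.
    for node, record in zip(cycle(nodes), records):
        assignment[node].append(record)
    return assignment


# Hypothetical captured batch and node names.
batch = "event-1\nevent-2\nevent-3\nevent-4\nevent-5\n"
nodes = ["node-a", "node-b", "node-c"]

assignment = distribute(split_lines(batch), nodes)
for node, records in assignment.items():
    print(node, records)
```

Each node ends up with a smaller chunk of the original batch, which is the point at which the "do work" step would run on that node.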

We often do exactly this sort of thing for larger-scale geo enrichment,
for example:
- Receive large batch of events on a given system (in a line oriented
event model)
- Run SplitText to break out each event
- Use site-to-site to distribute them to the entire cluster
- On each node receive split events then run geo enrichment
- then send to Kafka as-is or aggregate and send to HDFS
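
To make the last two steps concrete, the sketch below mimics per-node enrichment followed by aggregation in plain Python. Everything in it is an assumption for illustration: the `ip,message` event format, the `GEO_TABLE` lookup, and the node names are hypothetical, and a thread pool merely plays the role of cluster nodes working in parallel. It says nothing about how NiFi, Kafka, or HDFS are actually configured.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup table: IP prefix -> region (not real geo data).
GEO_TABLE = {"10.0.": "us-east", "10.1.": "eu-west"}


def enrich(event: str) -> str:
    """Append a region field to one 'ip,message' event line."""
    ip = event.split(",", 1)[0]
    region = next(
        (name for prefix, name in GEO_TABLE.items() if ip.startswith(prefix)),
        "unknown",
    )
    return f"{event},{region}"


def node_work(events: list[str]) -> list[str]:
    """Work done on a single node: enrich every event it received."""
    return [enrich(event) for event in events]


# Events as they might look after being distributed to each node.
per_node = {
    "node-a": ["10.0.0.5,login", "192.168.1.9,logout"],
    "node-b": ["10.1.2.3,login"],
}

# One worker per node stands in for the nodes running in parallel.
with ThreadPoolExecutor(max_workers=len(per_node)) as pool:
    results = dict(zip(per_node, pool.map(node_work, per_node.values())))

# Aggregate step: merge the enriched events back into one payload.
aggregated = "\n".join(line for lines in results.values() for line in lines)
print(aggregated)
```

Skipping the final join and emitting each enriched line on its own would correspond to the "send as-is" option instead of aggregating.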

Does that make sense/help for your scenario?

Thanks
Joe

