Posted to users@nifi.apache.org by M Singh <ma...@yahoo.com> on 2016/07/15 13:09:02 UTC
Apache Nifi - Splitting input and distributing processing to multiple nodes in a Nifi cluster
Hey Folks:
I am looking for information on how to split/partition input in a generic way (say, rows in a relational database or lines in a file) and then process each split on a different node in parallel in a NiFi cluster. I believe there is a webinar from the NiFi team on this, but I am not able to find it now.
If someone has the documentation on this or a link to the webinar, please let me know.
Thanks
Mans
Re: Apache Nifi - Splitting input and distributing processing to
multiple nodes in a Nifi cluster
Posted by M Singh <ma...@yahoo.com>.
Thanks Bryan. I will check it.
Re: Apache Nifi - Splitting input and distributing processing to
multiple nodes in a Nifi cluster
Posted by Bryan Bende <bb...@gmail.com>.
Hi Mans,
Not sure if this is what you are referring to, but there is a diagram in
this article that shows how this would work for fetching from HDFS in
parallel:
https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
It is more from the logical point of view, rather than how to actually
configure step-by-step in NiFi.
-Bryan
Re: Apache Nifi - Splitting input and distributing processing to
multiple nodes in a Nifi cluster
Posted by M Singh <ma...@yahoo.com>.
Hi Joe:
Thanks for the info.
I believe one of the Nifi team members had a webinar/presentation on it or something very similar. If you have a reference for that, please let me know.
Thanks again for your help.
Re: Apache Nifi - Splitting input and distributing processing to
multiple nodes in a Nifi cluster
Posted by Joe Witt <jo...@gmail.com>.
Mans,
The general pattern for something like this that works well is:
- Capture
- Split
- Site-to-Site transfer back to the same cluster, which distributes the
partitioned/split data to all nodes
- Do work on smaller chunks
We often do exactly this sort of thing for larger scale geo enrichment
for example.
- Receive large batch of events on a given system (in a line oriented
event model)
- Run SplitText to break out each event
- Use site-to-site to distribute them to the entire cluster
- On each node receive split events then run geo enrichment
- then send to Kafka as-is or aggregate and send to HDFS
Does that make sense/help for your scenario?
Thanks
Joe
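The capture / split / distribute / work pattern Joe describes can be sketched outside NiFi as a small simulation. This is plain Python, not NiFi code: a real flow is assembled from processors such as SplitText and a site-to-site connection rather than written by hand. The node names, the GEO_LOOKUP table, the comma-separated event layout, and the round-robin assignment are all assumptions made for illustration; site-to-site does its own balancing across nodes, and round-robin here is only a stand-in for it.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical lookup table standing in for a real geo-enrichment source.
GEO_LOOKUP = {"10.0.0.1": "US", "10.0.0.2": "DE"}


def split_text(batch):
    """Break a line-oriented batch into one event per non-empty line."""
    return [line for line in batch.splitlines() if line.strip()]


def distribute(events, nodes):
    """Assign events to nodes round-robin (a simplification of site-to-site)."""
    assignment = defaultdict(list)
    for node, event in zip(cycle(nodes), events):
        assignment[node].append(event)
    return dict(assignment)


def enrich(event):
    """Append a country code looked up from the event's first field."""
    ip = event.split(",")[0]
    return f"{event},{GEO_LOOKUP.get(ip, 'UNKNOWN')}"


def run(batch, nodes):
    """Split the batch, spread the events over the nodes, enrich per node."""
    per_node = distribute(split_text(batch), nodes)
    return {node: [enrich(e) for e in evts] for node, evts in per_node.items()}


if __name__ == "__main__":
    batch = "10.0.0.1,login\n10.0.0.2,logout\n10.0.0.3,login\n"
    for node, results in run(batch, ["node-1", "node-2"]).items():
        print(node, results)
```

The point of the sketch is only the ordering of the steps: the large batch arrives on one node, is split there, and the small pieces are what get spread across the cluster, so the expensive enrichment runs on every node in parallel.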