Posted to users@nifi.apache.org by "Tarou, Kirk" <Ki...@VerizonWireless.com> on 2016/08/09 21:38:55 UTC

Strategy To collect from hundreds of servers?

I'm trying to transition my data collection from a variety of configuration files and scripts running in cron to NiFi.

Given the hundreds of servers I have to collect from, creating a flow for each server and data type would be very time consuming. Even utilizing templates, creating everything in the GUI would take too long. 

I've considered writing a script that uses the API to generate the flows, but I'm concerned that it will be difficult to make changes without deleting whole sections of my flow and regenerating them. 
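
One way to approach the concern above is to have the script compare a desired set of flow sections against what already exists, and touch only the differences. The following is a minimal sketch of that idea in Python. It makes no NiFi API calls; the section names, the settings, and the `plan_changes` helper are all hypothetical and only illustrate the comparison step.

```python
from typing import Dict, List, Tuple

# Hypothetical: the settings that describe one generated flow section.
Spec = Dict[str, str]


def plan_changes(desired: Dict[str, Spec],
                 existing: Dict[str, Spec]) -> List[Tuple[str, str]]:
    """Return (action, name) pairs needed to move `existing` to `desired`."""
    plan = []
    for name in sorted(desired):
        if name not in existing:
            plan.append(("create", name))
        elif desired[name] != existing[name]:
            plan.append(("update", name))
    for name in sorted(existing):
        if name not in desired:
            plan.append(("delete", name))
    return plan


# Hypothetical example data: names and settings are made up.
existing = {
    "web01-logs": {"path": "/data/out", "schedule": "60 sec"},
    "web02-logs": {"path": "/data/out", "schedule": "60 sec"},
}
desired = {
    "web01-logs": {"path": "/data/out", "schedule": "30 sec"},
    "web02-logs": {"path": "/data/out", "schedule": "60 sec"},
    "db01-metrics": {"path": "/data/out", "schedule": "60 sec"},
}

print(plan_changes(desired, existing))
# [('create', 'db01-metrics'), ('update', 'web01-logs')]
```

Unchanged sections (here `web02-logs`) do not appear in the plan, so they would be left alone rather than deleted and regenerated.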

I've written a few custom processors, so I'm considering building what I need on my own and leveraging the existing processors when I can.

If anyone else has done something similar and would be willing to share their strategy, I'd really appreciate it. 

-Kirk

Re: Strategy To collect from hundreds of servers?

Posted by Aldrin Piri <al...@gmail.com>.
Hi Kirk,

To echo what Andy said, MiNiFi seems like a likely solution and I feel like
a lot of the design and principles the community came up with when the
topic was first discussed get close to the functionality you are looking for.
We are pretty early on with this effort, but the notion of centralized
management is a key target moving forward which would help alleviate some
of these issues.  MiNiFi is about bringing the core principles of NiFi such
as management, provenance, and security closer to where the data
originates.  Management in this case is via declarative configuration.
There are a lot of very cool possibilities ahead.

Say you wanted to give things a go with the current 0.0.1 release: the
declarative configuration is specified through a YAML configuration file
[1].  A common pattern for the existing functionality is to use NiFi to
design a template which can be converted to the MiNiFi YAML format via a
toolkit [2].  From here, it would be possible to make use of popular CM
tools like Puppet, Chef, Ansible, Salt and the like to take this base
template and deploy it to your various classes of machines/servers/devices.
Longer term, with an established centralized management mechanism, the
toolkit and CM tool would become less important, or serve as alternatives
to a more cohesive experience in communicating with MiNiFi instances in
terms of both processing control and data flow.
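
As a rough illustration of the "base template per class" step, here is a minimal Python sketch that fills one base template with per-class values. In practice a CM tool would normally do this templating. The placeholder keys (`input_directory`, `destination_url`), the class names, and the URLs are hypothetical and are not the MiNiFi configuration schema; the real content would come from the converted YAML file.

```python
from string import Template
from typing import Dict

# Hypothetical fragment: NOT the MiNiFi schema. The real keys come from
# the YAML file produced by the conversion step.
BASE_TEMPLATE = Template(
    "input_directory: ${input_dir}\n"
    "destination_url: ${destination_url}\n"
)

# Hypothetical per-class settings, one entry per class of machine.
CLASS_SETTINGS = {
    "web": {"input_dir": "/var/log/web",
            "destination_url": "https://nifi.example.com/web"},
    "db": {"input_dir": "/var/log/db",
           "destination_url": "https://nifi.example.com/db"},
}


def render_configs(template: Template,
                   class_settings: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Render one configuration text per class of machine."""
    # substitute() raises KeyError if a class is missing a placeholder value.
    return {name: template.substitute(values)
            for name, values in class_settings.items()}


configs = render_configs(BASE_TEMPLATE, CLASS_SETTINGS)
print(configs["web"])
```

Using `substitute` rather than `safe_substitute` means an incomplete class definition fails loudly before anything is deployed.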

To help supplement this idea, we would definitely welcome hearing more
about the infrastructure/topology you are trying to cover.  One of the
ideas that seemed to be fairly consistent was the idea of classes of
sources.  For instance, while I may have n-thousand servers, there are only
m types of servers.  Anything you can share to that end would definitely be
helpful for the community in shaping the design.
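
The "n servers, m types" idea can be sketched as a simple grouping of an inventory by class, so that one flow definition is maintained per class rather than per server. This is a minimal Python sketch; the hostnames, class names, and the `group_by_class` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_class(inventory: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (hostname, server_class) pairs into class -> list of hostnames."""
    groups = defaultdict(list)
    for host, server_class in inventory:
        groups[server_class].append(host)
    return dict(groups)


# Hypothetical inventory: hostnames and classes are made up.
inventory = [
    ("web01", "web"),
    ("web02", "web"),
    ("db01", "db"),
    ("web03", "web"),
]

groups = group_by_class(inventory)
print(len(inventory), "servers,", len(groups), "classes")
# 4 servers, 2 classes
```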

Good questions and overview of your problem and pain points.  It is one I
think we definitely want to solve, so any information you can share with us
would definitely help us get closer to that goal.

[1] http://nifi.apache.org/minifi/system-admin-guide.html
[2] http://nifi.apache.org/minifi/minifi-toolkit.html



On Tue, Aug 9, 2016 at 6:33 PM, Andy LoPresto <al...@apache.org> wrote:

> Hi Kirk,
>
> Without knowing more of the details of your situation, my suggestions
> would be as follows:
>
> * Abstract the details to follow specific conventions (e.g., always read
> from a standard directory path when loading data files)
> * If you feel you need an instance of NiFi on every remote system (the
> source servers), investigate MiNiFi [1][2]. You can build a single flow
> in NiFi and export it as a class of flows to MiNiFi. It will run as
> a “well-behaved” guest agent rather than NiFi, which can sometimes be
> resource-heavy and more often is the main tenant on a system.
> * If you don’t think you need an instance of NiFi to do processing and
> transmission on each endpoint system, use standard tools like rsync, FTP,
> UDP, etc. to transmit the data from each of the collection systems to a
> common NiFi endpoint, where you can listen for each protocol/origin and
> aggregate the data into the form you wish to process and operate on in a
> single location.
>
> I know that was vague but hopefully it will help you identify the core
> functionality you need and will lead to steps toward a reusable and
> versatile solution. If you have further specifics, people may be able to
> provide additional advice.
>
> [1] https://nifi.apache.org/minifi/
> [2] https://cwiki.apache.org/confluence/display/MINIFI/Design
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

Re: Strategy To collect from hundreds of servers?

Posted by Andy LoPresto <al...@apache.org>.
Hi Kirk,

Without knowing more of the details of your situation, my suggestions would be as follows:

* Abstract the details to follow specific conventions (e.g., always read from a standard directory path when loading data files)
* If you feel you need an instance of NiFi on every remote system (the source servers), investigate MiNiFi [1][2]. You can build a single flow in NiFi and export it as a class of flows to MiNiFi. It will run as a “well-behaved” guest agent rather than NiFi, which can sometimes be resource-heavy and more often is the main tenant on a system.
* If you don’t think you need an instance of NiFi to do processing and transmission on each endpoint system, use standard tools like rsync, FTP, UDP, etc. to transmit the data from each of the collection systems to a common NiFi endpoint, where you can listen for each protocol/origin and aggregate the data into the form you wish to process and operate on in a single location.
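
To make the first and third suggestions concrete, here is a minimal Python sketch of a path convention that lets a central collection point work out where a file came from. The base directory, the `<class>/<data type>/<host>/<file>` layout, and the `parse_origin` helper are assumptions made for illustration; they are not part of NiFi.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Hypothetical convention:
#   /data/incoming/<server_class>/<data_type>/<host>/<filename>
BASE = PurePosixPath("/data/incoming")


class Origin(NamedTuple):
    server_class: str
    data_type: str
    host: str
    filename: str


def parse_origin(path: str) -> Origin:
    """Split a conventional path into its origin fields."""
    # relative_to() raises ValueError if the path is not under BASE.
    parts = PurePosixPath(path).relative_to(BASE).parts
    if len(parts) != 4:
        raise ValueError(f"path does not follow the convention: {path}")
    return Origin(*parts)


print(parse_origin("/data/incoming/web/access-logs/web01/2016-08-09.log"))
```

With a convention like this, a single flow can route on the parsed fields instead of needing one flow per server and data type.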

I know that was vague but hopefully it will help you identify the core functionality you need and will lead to steps toward a reusable and versatile solution. If you have further specifics, people may be able to provide additional advice.

[1] https://nifi.apache.org/minifi/
[2] https://cwiki.apache.org/confluence/display/MINIFI/Design


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
