Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com> on 2015/10/06 21:58:04 UTC

Bulk data ingest processor

Hi

Has anyone created, or is anyone creating, a bulk data ingest processor? We are looking into the feasibility of using NiFi to do our bulk data pickups from many locations, and possibly from multiple directories at each location.
My initial thoughts were to use a processor that could run multiple SFTP sessions at one time. Can anyone give me any guidance on this, and on what pitfalls I may come up against?

Many thanks
Dave



Re: Bulk data ingest processor

Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
And only to add to what Andrew has already said, the beauty of it is not only the simplicity but also the reliability that comes with the "provenance" feature - https://blogs.apache.org/nifi/entry/basic_dataflow_design, which is especially relevant to bulk data movement. As the blog says: "I sent you a file last week. What did you do with it?". With provenance you'll always be able to dig for an answer rather than just dig.

Oleg

On Oct 6, 2015, at 4:24 PM, Andrew Grande <ag...@hortonworks.com> wrote:

Dave,

The next version of NiFi has a FetchSFTP processor which may simplify your design. E.g. I have it receiving file paths to pull via many incoming channels, which can be different directories on a server (haven't looked, but it might support server as an expression field).

Andrew




On 10/6/15, 4:19 PM, "Joe Witt" <jo...@gmail.com> wrote:

David,

You can already place multiple GetSFTP processors on a single flow to
do what is described here.  Capturing of *many* flows using one or
more protocols at once and routing them is quite common.

Thanks
Joe





Re: Bulk data ingest processor

Posted by Andrew Grande <ag...@hortonworks.com>.
Dave,

The next version of NiFi has a FetchSFTP processor which may simplify your design. E.g. I have it receiving file paths to pull via many incoming channels, which can be different directories on a server (haven't looked, but it might support server as an expression field).

Andrew
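
As a rough illustration of the pattern described above (a stream of file paths feeding a fetch step, with several transfers in flight at once), here is a minimal Python sketch. It is not NiFi code and does not use FetchSFTP or any SFTP library. `RemoteFile`, `fetch_all` and `fake_fetch` are hypothetical names introduced only for this example, and `fake_fetch` is a stand-in for a real download so the sketch runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, NamedTuple, Union


class RemoteFile(NamedTuple):
    """One file to pull. Hypothetical descriptor, not a NiFi type."""
    host: str
    directory: str
    name: str


def fetch_all(
    files: Iterable[RemoteFile],
    fetch: Callable[[RemoteFile], bytes],
    max_sessions: int = 4,
) -> Dict[RemoteFile, Union[bytes, Exception]]:
    """Fetch every listed file with at most max_sessions workers at a time.

    Each file maps to its content on success, or to the exception that
    was raised, so one failed transfer does not abort the others.
    """
    results: Dict[RemoteFile, Union[bytes, Exception]] = {}
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        # Submit every transfer up front; the pool caps the concurrency.
        futures = {pool.submit(fetch, f): f for f in files}
        for future, remote_file in futures.items():
            try:
                results[remote_file] = future.result()
            except Exception as exc:
                results[remote_file] = exc
    return results


def fake_fetch(remote_file: RemoteFile) -> bytes:
    """Stand-in for a real SFTP download (assumption: no network access)."""
    if remote_file.name.endswith(".bad"):
        raise OSError(f"cannot read {remote_file.name}")
    path = f"{remote_file.host}:{remote_file.directory}/{remote_file.name}"
    return path.encode()
```

Calling `fetch_all(files, fake_fetch, max_sessions=2)` with a list of `RemoteFile` entries for different hosts and directories returns one result per entry. Keeping the file list separate from the fetch step means that adding a location or directory only changes the input list, not the worker logic.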





Re: Bulk data ingest processor

Posted by Joe Witt <jo...@gmail.com>.
David,

You can already place multiple GetSFTP processors on a single flow to
do what is described here.  Capturing of *many* flows using one or
more protocols at once and routing them is quite common.

Thanks
Joe
