Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com> on 2015/10/06 21:58:04 UTC
Bulk data ingest processor
Hi
Has anyone created, or is anyone creating, a bulk data ingest processor? We are looking into the feasibility of using NiFi to do our bulk data pickups from many locations, and possibly from multiple directories at each location.
My initial thought was to use a processor that could run multiple SFTP sessions at one time. Can anyone give me any guidance, and tell me what pitfalls I may come up against?
Many thanks
Dave
Re: Bulk data ingest processor
Posted by Oleg Zhurakousky <oz...@hortonworks.com>.
And just to add to what Andrew has already said, the beauty of it is not only the simplicity but also the reliability that comes with the "provenance" feature (https://blogs.apache.org/nifi/entry/basic_dataflow_design), which is especially relevant to bulk data movement. As the blog says: "I sent you a file last week. What did you do with it?" With provenance you'll always be able to dig for an answer rather than just dig.
Oleg
Re: Bulk data ingest processor
Posted by Andrew Grande <ag...@hortonworks.com>.
Dave,
The next version of NiFi has a FetchSFTP processor which may simplify your design. For example, I have it receiving file paths to pull via many incoming channels, which can be different directories on a server (I haven't looked, but it might support the server as an expression field).
Andrew
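
To illustrate the pattern described above (file paths from many directories feeding a single fetch step that runs several sessions at once), here is a minimal Python sketch. It is not NiFi code and does not use any NiFi or SFTP API: the hosts, directories, file names, and the `list_paths`, `fetch`, and `fetch_all` helpers are all hypothetical, and `fetch` is a stand-in that only formats a string where a real implementation would open an SFTP session.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical listing: (host, directory) -> file names found there.
# In a real flow this would come from a listing step against each server.
LISTINGS = {
    ("host-a", "/outbound/daily"): ["a1.csv", "a2.csv"],
    ("host-a", "/outbound/weekly"): ["w1.csv"],
    ("host-b", "/export"): ["b1.csv"],
}


def list_paths(listings):
    """Flatten every (host, directory) listing into individual fetch requests."""
    return [
        (host, f"{directory}/{name}")
        for (host, directory), names in listings.items()
        for name in names
    ]


def fetch(request):
    """Stand-in for one SFTP download; a real version would open a session here."""
    host, path = request
    return f"{host}:{path}"


def fetch_all(requests, max_sessions=3):
    """Run up to max_sessions fetches at once; results keep the request order."""
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        return list(pool.map(fetch, requests))


results = fetch_all(list_paths(LISTINGS))
```

The point of the split is that the fetch step does not need to know which directory or server a path came from, so adding another pickup location only means adding another source of paths. The `max_sessions` cap is an assumed knob for limiting concurrent connections.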
Re: Bulk data ingest processor
Posted by Joe Witt <jo...@gmail.com>.
David,
You can already place multiple GetSFTP processors on a single flow to
do what is described here. Capturing of *many* flows using one or
more protocols at once and routing them is quite common.
Thanks
Joe