Posted to users@nifi.apache.org by Kyle Burke <ky...@ignitionone.com> on 2016/01/30 16:14:42 UTC

ListS3 processor?

All,
  I'm trying to get NiFi set up to move data around S3. My first attempt is to monitor an S3 folder where JSON files are placed, then copy each file, convert it to Avro, and drop it in a different S3 folder. The documentation for working with S3 is pretty slim. I can't seem to get it working and was wondering if anyone had S3 examples for monitoring an S3 folder (i.e., something like a ListS3 processor similar to what is available for a local file system).

Respectfully,

Kyle Burke | Data Science Engineer
IgnitionOne - Marketing Technology. Simplified.
Office: 1545 Peachtree St NE, Suite 500 | Atlanta, GA | 30309
Direct: 404.961.3918


Re: ListS3 processor?

Posted by Adam Lamar <ad...@gmail.com>.
I agree. There are also one-time actions like processing data inside an 
existing bucket for which a ListS3 processor would be well suited.
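
As a rough sense of what such a one-time pass costs: S3's bucket listing returns at most 1,000 keys per response, so enumerating N objects takes at least ceil(N / 1000) list requests, and a processor that polls by listing the whole prefix pays roughly that cost on every run. A minimal sketch of the arithmetic, assuming the 1,000-key page limit:

```python
import math

# Assumption: each list response carries at most 1,000 keys (the S3 per-page maximum).
MAX_KEYS_PER_PAGE = 1000


def list_requests_needed(num_objects: int, page_size: int = MAX_KEYS_PER_PAGE) -> int:
    """Lower bound on paginated list calls needed to enumerate `num_objects` keys."""
    if num_objects < 0 or page_size <= 0:
        raise ValueError("num_objects must be >= 0 and page_size must be > 0")
    # Even an empty prefix costs one request to discover that it is empty.
    return max(1, math.ceil(num_objects / page_size))


for n in (0, 999, 1000, 1001, 2_500_000):
    print(n, list_requests_needed(n))
```

A single backfill over a few million keys is cheap in request count; the concern raised later in the thread is repeating that full listing on a short polling interval.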

Thanks for all the encouraging feedback on the blog post!

Adam

On 2/1/16 9:39 AM, Tony Kurc wrote:
> [...]


Re: ListS3 processor?

Posted by Tony Kurc <tr...@gmail.com>.
Joe,
There is the possibility of people using S3-compatible "block stores" which
don't have the same notification services, so I'd say leave them open.
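
Without notifications, a List-style processor has to poll the bucket and remember what it has already emitted. One common approach (a sketch, not NiFi's implementation) is to keep the newest last-modified timestamp seen plus the keys that share it. The listing input here is a hypothetical list of (key, timestamp) pairs rather than a real S3 client call, and `ListingState`/`new_objects` are illustrative names:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable, List, Set, Tuple


@dataclass
class ListingState:
    """What a polling lister remembers between runs (hypothetical, for illustration)."""
    latest: datetime = datetime.min  # newest last-modified time emitted so far
    keys_at_latest: Set[str] = field(default_factory=set)  # keys already emitted at that time


def new_objects(listing: Iterable[Tuple[str, datetime]], state: ListingState) -> List[str]:
    """Return keys not emitted by earlier runs and advance `state` in place.

    `listing` is a hypothetical (key, last_modified) sequence standing in for a
    paginated bucket listing; no S3 client is involved.
    """
    fresh = [
        (key, ts)
        for key, ts in listing
        if ts > state.latest or (ts == state.latest and key not in state.keys_at_latest)
    ]
    if fresh:
        newest = max(ts for _, ts in fresh)
        if newest > state.latest:
            # A newer timestamp resets the set of keys tracked at the boundary.
            state.latest = newest
            state.keys_at_latest = set()
        state.keys_at_latest.update(key for key, ts in fresh if ts == newest)
    return sorted(key for key, _ in fresh)


t1 = datetime(2016, 1, 30, 10, 0)
t2 = datetime(2016, 1, 30, 10, 5)
state = ListingState()
print(new_objects([("in/a.json", t1)], state))  # ['in/a.json']
print(new_objects([("in/a.json", t1), ("in/b.json", t1), ("in/c.json", t2)], state))  # ['in/b.json', 'in/c.json']
print(new_objects([("in/a.json", t1), ("in/b.json", t1), ("in/c.json", t2)], state))  # []
```

Tracking the keys at the newest timestamp avoids re-emitting or dropping objects that share a timestamp. An object that only becomes visible with a timestamp older than one already recorded would still be skipped, which is a trade-off of any timestamp-based scheme.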

Tony

On Mon, Feb 1, 2016 at 11:30 AM, Joe Skora <js...@gmail.com> wrote:

> [...]

Re: ListS3 processor?

Posted by Joe Skora <js...@gmail.com>.
Agreed, excellent write up.

When this thread started I had forgotten about prior discussions of using
SQS instead of a ListS3 processor.  I am familiar with S3 but not as much SQS,
but Adam's article makes it very accessible.

If SQS is preferred to a ListS3 processor, should the ListS3 related
tickets be closed?

On Mon, Feb 1, 2016 at 9:28 AM, Mark Payne <ma...@hotmail.com> wrote:

> [...]

Re: ListS3 processor?

Posted by Mark Payne <ma...@hotmail.com>.
Adam,

Just read through your post - fantastic write-up! Just wanted to say thanks for sharing. This is
a question we've seen a few times in the last couple of weeks, and this is a great resource to
point people to.

Thanks
-Mark

> On Jan 31, 2016, at 1:57 AM, Adam Lamar <ad...@gmail.com> wrote:
> [...]


Re: ListS3 processor?

Posted by Joe Witt <jo...@gmail.com>.
Ha!  Insert jaw-drop-emoticon-here.

Perfect timing and very well written Adam.

Thanks

On Sun, Jan 31, 2016 at 1:57 AM, Adam Lamar <ad...@gmail.com> wrote:
> [...]

RE: ListS3 processor?

Posted by Kyle Burke <ky...@ignitionone.com>.
Wow. Really nice work. Thank you so much Adam. I can honestly say I couldn't have done it better myself.

Kyle



-------- Original message --------
From: Adam Lamar <ad...@gmail.com>
Date: 01/31/2016 1:57 AM (GMT-05:00)
To: users@nifi.apache.org
Cc:
Subject: Re: ListS3 processor?

[...]


Re: ListS3 processor?

Posted by Adam Lamar <ad...@gmail.com>.
Kyle/Joe,

I've been meaning to document this process myself, and just finished a 
post with some details:
https://adamlamar.github.io/2016-01-30-monitoring-an-s3-bucket-in-apache-nifi/

Hope that helps,
Adam

On 1/30/16 9:29 PM, Joe Witt wrote:
> [...]


Re: ListS3 processor?

Posted by Joe Witt <jo...@gmail.com>.
Kyle,

The ideal case for communicating how to do this would be both a
template and an associated doc.  Great for a blog or wiki page or
something.  We can of course give you perms to write to a wiki page on
the nifi wiki if interested.  The template itself can also be
annotated with comments that show up right in the flow itself.  That
may be a fine option too.

Thanks
Joe

On Sat, Jan 30, 2016 at 2:52 PM, Kyle Burke <ky...@ignitionone.com> wrote:
> [...]

Re: ListS3 processor?

Posted by Kyle Burke <ky...@ignitionone.com>.
Joe/Joe,
  Thanks for the response. It makes sense to use SNS and SQS to respond to S3 file changes. I’m going to see if my company will give me access to those Amazon services. I found an article that explains how to set up this functionality in the Amazon console. Once that’s set up, it seems pretty straightforward to use GetSQS/DeleteSQS. I suspect many will want this functionality, but I’m not sure what the best method is (i.e., a template or a user doc) for explaining how to solve this in NiFi. I’ll be happy to submit something if you let me know the right method.

http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
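
For reference, the bucket-side setup described in that article boils down to a notification configuration that sends object-created events to a queue. A minimal sketch of that configuration as a Python dict, assuming S3 publishes directly to SQS (no SNS hop); the queue ARN, prefix, and suffix are hypothetical placeholders:

```python
import json


def s3_to_sqs_notification(queue_arn: str, prefix: str, suffix: str) -> dict:
    """Build a bucket notification configuration that sends object-created events to SQS."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                # Fire on every kind of object creation (Put, Post, Copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Only keys under `prefix` that end with `suffix` generate events.
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Hypothetical queue ARN and key layout, for illustration only.
config = s3_to_sqs_notification(
    "arn:aws:sqs:us-east-1:123456789012:nifi-s3-events", "incoming/", ".json"
)
print(json.dumps(config, indent=2))
```

With boto3, a dict of this shape is what `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)` takes. The queue's access policy must also allow S3 to send messages to it; that part is not shown.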

Respectfully,

Kyle Burke | Data Science Engineer
IgnitionOne - Marketing Technology. Simplified.
Office: 1545 Peachtree St NE, Suite 500 | Atlanta, GA | 30309


From: Joe Witt
Reply-To: "users@nifi.apache.org"
Date: Saturday, January 30, 2016 at 2:06 PM
To: "users@nifi.apache.org"
Subject: Re: ListS3 processor?


Kyle

Let us know if that doesn't get you what you need.  We have a decent set of templates but I didn't see one that demonstrates interaction with amazon services.

Thanks
Joe

On Jan 30, 2016 12:56 PM, "Joey Frazee" <jo...@icloud.com>> wrote:
Kyle,

I think you can do what you want right now without ListS3 by using S3 event notifications. You can configure an event notification to publish to SQS and then use GetSQS to retrieve the events and FetchS3Object to get the JSON file and the rest of the flow could be written as you have in mind.

Depending on your scale, this might be preferable because it's slow/expensive to do listings on S3 prefixes that have a lot of file matches.


-joey

On Jan 30, 2016, at 11:40 AM, Joe Skora <js...@gmail.com>> wrote:

Kyle,

Processors exist to Put, Fetch, and Delete S3Objects, but ListS3 is in the backlog on ticket NIFI-840<https://issues.apache.org/jira/browse/NIFI-840> at the moment.  It should fit the List/Fetch metaphor like the List/Fetch processors pairs for xFile, xHDFS, xSFTP, etc.

Regards,
Joe Skora

On Sat, Jan 30, 2016 at 10:14 AM, Kyle Burke <ky...@ignitionone.com>> wrote:
All,
  I'm trying to get Nifi set up to a move data around S3. My first attempt is to just monitor a S3 folder where json files are placed and then copy the file, convert it to Avro, and the drop it in a different S3 folder. The documentation is pretty slim for working with S3. I can't seem to get it working and was wondering if anyone had any S3 examples for monitoring an S3 folder (i.e.. something like a ListS3 processer similar to what is available on a local file system?)

Respectfully,

Kyle Burke | Data Science Engineer
IgnitionOne - Marketing Technology. Simplified.
Office: 1545 Peachtree St NE, Suite 500 | Atlanta, GA | 30309
Direct: 404.961.3918<tel:404.961.3918>



Re: ListS3 processor?

Posted by Joe Witt <jo...@gmail.com>.
Kyle

Let us know if that doesn't get you what you need.  We have a decent set of
templates, but I didn't see one that demonstrates interaction with Amazon
services.

Thanks
Joe
On Jan 30, 2016 12:56 PM, "Joey Frazee" <jo...@icloud.com> wrote:

> Kyle,
>
> I think you can do what you want right now without ListS3 by using S3
> event notifications. You can configure an event notification to publish to
> SQS, then use GetSQS to retrieve the events and FetchS3Object to get the
> JSON file; the rest of the flow can be written as you have in mind.
>
> Depending on your scale, this might be preferable, because listing S3
> prefixes that match a lot of files is slow and expensive.
>
>
> -joey
>

Re: ListS3 processor?

Posted by Joey Frazee <jo...@icloud.com>.
Kyle,

I think you can do what you want right now without ListS3 by using S3 event notifications. You can configure an event notification to publish to SQS, then use GetSQS to retrieve the events and FetchS3Object to get the JSON file; the rest of the flow can be written as you have in mind.
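Between GetSQS and FetchS3Object, the flow has to pull the bucket name and object key out of each SQS message body. A minimal sketch of that extraction follows, assuming the standard S3 event notification JSON (`Records[].s3.bucket.name` / `Records[].s3.object.key`). It also handles the case where S3 publishes to SNS and SNS forwards to SQS, which wraps the S3 event as a JSON string in the SNS envelope's `Message` field. The function name is hypothetical; in a NiFi flow this step would more likely be done with a JSON-extraction processor feeding FetchS3Object's bucket and key properties.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for ObjectCreated events in an SQS message body."""
    payload = json.loads(sqs_body)
    # S3 -> SNS -> SQS: the S3 event is a JSON string inside the SNS envelope.
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])
    objects = []
    # The s3:TestEvent message sent when notifications are configured has no "Records".
    for record in payload.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (e.g. a space becomes '+'), so decode before fetching.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


body = json.dumps({"Records": [{"eventName": "ObjectCreated:Put",
                                "s3": {"bucket": {"name": "my-bucket"},
                                       "object": {"key": "incoming/day+1.json"}}}]})
print(extract_s3_objects(body))  # [('my-bucket', 'incoming/day 1.json')]
```

Decoding the key matters: fetching `incoming/day+1.json` literally would fail for an object actually named `incoming/day 1.json`.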

Depending on your scale, this might be preferable, because listing S3 prefixes that match a lot of files is slow and expensive.
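To put the listing cost in rough numbers: an S3 LIST request returns at most 1,000 keys, so re-listing a prefix on every poll costs one request per 1,000 objects, every time. The sketch below only counts requests (no prices), and the object count and poll interval in the example are hypothetical. With event notifications, by contrast, the work scales with the number of new objects rather than the total already in the prefix.

```python
import math

MAX_KEYS_PER_LIST = 1000  # S3 returns at most 1,000 keys per LIST response


def list_requests_per_poll(object_count: int) -> int:
    # One paginated LIST call per 1,000 keys; an empty prefix still costs one call.
    return max(1, math.ceil(object_count / MAX_KEYS_PER_LIST))


def list_requests_per_day(object_count: int, poll_interval_s: int) -> int:
    polls_per_day = 86_400 // poll_interval_s
    return polls_per_day * list_requests_per_poll(object_count)


# Hypothetical: 2.5 million objects under the prefix, polled once a minute.
print(list_requests_per_day(2_500_000, 60))  # 3600000
```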


-joey

> On Jan 30, 2016, at 11:40 AM, Joe Skora <js...@gmail.com> wrote:
> 
> Kyle,
> 
> Processors exist to Put, Fetch, and Delete S3Objects, but ListS3 is in the backlog on ticket NIFI-840 at the moment.  It should fit the List/Fetch metaphor, like the List/Fetch processor pairs for xFile, xHDFS, xSFTP, etc.
> 
> Regards,
> Joe Skora

Re: ListS3 processor?

Posted by Joe Skora <js...@gmail.com>.
Kyle,

Processors exist to Put, Fetch, and Delete S3Objects, but ListS3 is in the
backlog on ticket NIFI-840 <https://issues.apache.org/jira/browse/NIFI-840> at
the moment.  It should fit the List/Fetch metaphor, like the List/Fetch
processor pairs for xFile, xHDFS, xSFTP, etc.
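The List half of a List/Fetch pair has to remember what it already emitted so each object is fetched once. One common way to do that, sketched below under stated assumptions, is to keep the newest last-modified timestamp seen plus the keys seen at exactly that timestamp. This is an illustration of the idea, not NiFi's actual ListS3 or ListHDFS code; `ListingState` and `list_new` are hypothetical names, and the input is assumed to be (key, last_modified) pairs from a full listing.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class ListingState:
    # Newest last-modified timestamp emitted so far, and the keys emitted at it.
    latest_ts: float = -1.0
    keys_at_latest: Set[str] = field(default_factory=set)


def list_new(objects: Iterable[Tuple[str, float]], state: ListingState) -> List[str]:
    """Return keys not emitted before (sorted) and advance the listing state."""
    new = [(k, ts) for k, ts in objects
           if ts > state.latest_ts
           or (ts == state.latest_ts and k not in state.keys_at_latest)]
    if new:
        newest = max(ts for _, ts in new)
        if newest > state.latest_ts:
            # Timestamp moved forward: only keys at the new timestamp need remembering.
            state.latest_ts = newest
            state.keys_at_latest = set()
        state.keys_at_latest.update(k for k, ts in new if ts == newest)
    return sorted(k for k, _ in new)


state = ListingState()
print(list_new([("a.json", 1.0), ("b.json", 2.0)], state))  # ['a.json', 'b.json']
print(list_new([("a.json", 1.0), ("b.json", 2.0)], state))  # []
```

Each returned key would then go to the Fetch side (FetchS3Object in the S3 case). Tracking keys at the latest timestamp, not just the timestamp, keeps two objects written in the same second from being skipped.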

Regards,
Joe Skora

On Sat, Jan 30, 2016 at 10:14 AM, Kyle Burke <ky...@ignitionone.com>
wrote:

> All,
>   I'm trying to get NiFi set up to move data around S3. My first
> attempt is to just monitor an S3 folder where JSON files are placed and then
> copy the file, convert it to Avro, and then drop it in a different S3
> folder. The documentation is pretty slim for working with S3. I can't seem
> to get it working and was wondering if anyone had any S3 examples for
> monitoring an S3 folder (i.e., something like a ListS3 processor similar to
> what is available on a local file system?)
>
> Respectfully,
>
> Kyle Burke | Data Science Engineer
> IgnitionOne - Marketing Technology. Simplified.
> Office: 1545 Peachtree St NE, Suite 500 | Atlanta, GA | 30309
> Direct: 404.961.3918
>
>