Posted to dev@nifi.apache.org by Darren Hitchman Code <co...@hitchman.info> on 2020/07/08 23:33:24 UTC

Questions about Nifi Dev process

Hi All. I’m new to NiFi and have a few questions that could be either dev or general, so I wasn’t sure where to direct them.

1. Currently I have a Java process that moves MQ messages from one queue into multiple queues (i.e. a duplication process), which I would like to replace with NiFi. I have set up a ConsumeJMS and a PublishJMS in NiFi, but I have hundreds of queues per queue manager and 30 queue managers. Ideally I would like to create a generator file with a CSV string containing “sourceQ, destQ1,destQ2,…”, split that, and then pass it into ConsumeJMS. However, ConsumeJMS doesn’t support an incoming flow. I think this is a similar problem to the ListFile -> FetchFile flow, in that something provides a list of sources.

Am I missing a fundamental design practice for NiFi (e.g. should I be creating thousands of processors?), or should ConsumeJMS accept inbound flow file attributes? A similar design will be required when I consume all our Kafka topics.

2. I need to source content from a mainframe that can only send files to me via FTP. NiFi doesn’t have an FTP server processor that I can see, but all I need is a simple listener that supports PUT (and maybe MPUT). My question is in regard to request/response flows like HandleHttpRequest and HandleHttpResponse. Is that the best practice for client/server, or is a simple listener processor sufficient? I’m happy to code this.

3. What’s the best way to discuss an idea for a new processor/component? Raise a JIRA or use this mailing list? And I presume the answer will be the same if I have a question about a design decision that was taken.
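
To make the routing file from point 1 concrete, here is a minimal sketch of how lines such as “sourceQ, destQ1,destQ2,…” might be parsed into a source-to-destinations map. The class name QueueRoutes and the sample queue names are assumptions made for illustration; nothing here is NiFi API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of NiFi): parses "sourceQ, destQ1,destQ2,..." lines.
final class QueueRoutes {

    // Returns source queue -> destination queues, preserving the order of the input lines.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> routes = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            if (fields.length < 2) {
                throw new IllegalArgumentException(
                        "Need a source and at least one destination: " + line);
            }
            List<String> destinations = new ArrayList<>();
            for (int i = 1; i < fields.length; i++) {
                destinations.add(fields[i].trim());
            }
            routes.put(fields[0].trim(), destinations);
        }
        return routes;
    }

    public static void main(String[] args) {
        // Sample queue names are made up for the demo.
        List<String> lines = Arrays.asList(
                "ORDERS.IN, ORDERS.COPY1,ORDERS.COPY2",
                "PAYMENTS.IN, PAYMENTS.AUDIT");
        parse(lines).forEach((source, dests) -> System.out.println(source + " -> " + dests));
    }
}
```

Each map entry would correspond to one consume-and-duplicate route, so the open question is only which component turns those entries into consumers.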

Thanks for the help
Darren Hitchman

Re: Questions about Nifi Dev process

Posted by Darren Hitchman Code <co...@hitchman.info>.

Hi Peter. 
The consumer would most likely have to maintain state for the provided list of queues/topics so that it doesn’t reconnect every poll cycle. A defined thread pool would read off these queues and yield based on either time or messages consumed. The flow files created by the consuming processor would contain details about the queue/QM/topic in their attributes. It’s a common pattern with other tools, so I was looking for a NiFi equivalent so that I don’t have to create and maintain thousands of ConsumeJMS processors and connections. It sounds like there isn’t anything currently available in this space, so I will create a ticket and progress it unless anyone has any feelings otherwise?
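
As a rough illustration of the pattern described above, here is a sketch under heavy assumptions: in-memory BlockingQueues stand in for long-lived JMS consumers, and the names PooledQueuePoller, TaggedMessage and the "source.queue" attribute are all hypothetical. It shows a fixed thread pool visiting a fixed set of queues, yielding after a message limit, and tagging each message with its origin. It does not use the NiFi processor API or a JMS client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one pooled poller serving many queues.
final class PooledQueuePoller implements AutoCloseable {

    // Message body plus attributes describing where it came from.
    static final class TaggedMessage {
        final String body;
        final Map<String, String> attributes;

        TaggedMessage(String body, Map<String, String> attributes) {
            this.body = body;
            this.attributes = attributes;
        }
    }

    // Stand-in for consumers that are opened once and kept open between poll cycles.
    private final Map<String, BlockingQueue<String>> sources;
    private final ExecutorService pool;
    private final int maxMessagesPerTurn;

    PooledQueuePoller(Map<String, BlockingQueue<String>> sources, int threads, int maxMessagesPerTurn) {
        this.sources = sources;
        this.pool = Executors.newFixedThreadPool(threads);
        this.maxMessagesPerTurn = maxMessagesPerTurn;
    }

    // One poll cycle: every queue is visited once; each visit yields after maxMessagesPerTurn messages.
    List<TaggedMessage> pollOnce() {
        List<Future<List<TaggedMessage>>> pending = new ArrayList<>();
        for (Map.Entry<String, BlockingQueue<String>> source : sources.entrySet()) {
            pending.add(pool.submit(() -> drain(source.getKey(), source.getValue())));
        }
        List<TaggedMessage> out = new ArrayList<>();
        try {
            for (Future<List<TaggedMessage>> batch : pending) {
                out.addAll(batch.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Poll cycle failed", e);
        }
        return out;
    }

    private List<TaggedMessage> drain(String queueName, BlockingQueue<String> queue) {
        List<TaggedMessage> batch = new ArrayList<>();
        String body;
        while (batch.size() < maxMessagesPerTurn && (body = queue.poll()) != null) {
            Map<String, String> attributes = new HashMap<>();
            attributes.put("source.queue", queueName); // hypothetical attribute name
            batch.add(new TaggedMessage(body, Collections.unmodifiableMap(attributes)));
        }
        return batch;
    }

    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        Map<String, BlockingQueue<String>> sources = new HashMap<>();
        sources.put("ORDERS.IN", new LinkedBlockingQueue<>(Arrays.asList("o1", "o2", "o3")));
        sources.put("PAYMENTS.IN", new LinkedBlockingQueue<>(Collections.singletonList("p1")));
        try (PooledQueuePoller poller = new PooledQueuePoller(sources, 2, 2)) {
            for (TaggedMessage m : poller.pollOnce()) {
                System.out.println(m.attributes.get("source.queue") + ": " + m.body);
            }
        }
    }
}
```

The set of sources is fixed at construction time here, which keeps the number of open consumers bounded. A time-based yield could replace the message-count limit in drain.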

Thanks
Darren


> On 10 Jul 2020, at 5:27 am, Bryan Bende <bb...@gmail.com> wrote:
> 
> I am not that familiar with the low level JMS mechanics so I can't speak
> specifically to that case, but many consumers are meant to be created once
> for a particular topic/queue and kept open until the processor is stopped
> and consuming is stopped. So allowing an incoming connection to specify
> things to consume from, means there is an unbounded number of possible
> consumers. What happens if every minute a new flow file comes in with a new
> topic, do we keep opening new consumers forever? how do we know when to
> stop consuming from one of the topics that came in from a previous flow
> file?
> 
> It is the same problem with the request to ask for ListFile/ListHDFS to
> accept an incoming connection. Those processors maintain state of
> previously seen files so that they are not listed again, so with an
> incoming connection, the amount of state is unbounded because there can be
> an infinite number of changing things to list. So the suggested was to make
> a new variation of those processors like ScanFile that does not maintain
> state and can then scan anything based on the incoming flow file because
> now each execution of the processor is independent from the last.
> 
> Most of the publishers are more fire-and-forget so you should be able to
> set a dynamic topic name using expression language.
> 
> 
> On Thu, Jul 9, 2020 at 3:10 PM Darren Hitchman Code <co...@hitchman.info>
> wrote:
> 
>> Does anyone have any guidence on point one for me please ?
>> 
>> Thanks
>> Darren
>> 
>> 
>> 
>>> On 9 Jul 2020, at 10:06 pm, Peter Gyori <pe...@gmail.com>
>> wrote:
>>> 
>>> Hi Darren,
>>> 
>>> Regarding point 2: I'm currently working on a ListenFTP processor for
>> NiFi
>>> that does exactly what you mentioned. I just opened an Apache Jira ticket
>>> to indicate that the development is in progress:
>>> https://issues.apache.org/jira/browse/NIFI-7624
>>> I'm using the Apache Mina FtpServer library for this purpose. The entire
>>> filesystem layer - used by the library - needs to be replaced since NiFi
>>> won't actually store the files like a regular ftp server. The basic
>>> functionality is working already, but the development is not complete
>> yet.
>>> I'll invite you to review the code once the pull request is ready.
>>> A simple listener process is sufficient in this case. (Listen-type
>>> processors are not native in NiFi, but not unprecedented: there is
>> already
>>> a ListenHTTP processor, for example.)
>>> 
>>> Best regards,
>>> Peter
>>> 
>>> 
>>> On Thu, Jul 9, 2020 at 1:33 AM Darren Hitchman Code <co...@hitchman.info>
>>> wrote:
>>> 
>>>> Hi All. I’m new to NIFI and I had a few questions which could either be
>>>> dev and general and wasn’t sure where to direct them.
>>>> 
>>>> 
>>>> 1. Currently I have a Java process that moves MQ messages a queue into
>>>> multiple Qs (i.e. a duplicate process) that I would like to replace with
>>>> NIFI.  I have setup a ConsumerJMS and PublishJMS in NIFI however I have
>>>> 100s of queues per QueueManager and 30 queueManagers. Ideally I would
>> like
>>>> to be able to create a generator file with a csv string containing
>>>> “sourceQ, destQ1,destQ2,…” and split that then pass it into the
>>>> ConsumerJMS. However the ConsumerJMS doesn’t support an input flow. I
>> think
>>>> this is.a similar problem to the listFile->fetchFile flow in that
>> something
>>>> provided a list of sources.
>>>> 
>>>> Am I missing a fundamental design practice for NIFI (e.g. should I be
>>>> creating 000’s of processors ?) or should ConsumerJMS accept inbound
>> flow
>>>> attributes. A similar design will be required when I consume all our
>> Kafka
>>>> topics.
>>>> 
>>>> 2. I need to source content from a mainframe that can only send files to
>>>> me via FTP. NIFI doesn’t have an FTP Server process that I can see but
>> all
>>>> I need is a simple listener that supports PUT (and maybe MPUT). My
>> question
>>>> is in regard to request/response flows like the HandleHttpRequest and
>>>> HandleHttpResponse. Is this the best practice for client server or is a
>>>> simple listener Process sufficient ? Im happy to code this.
>>>> 
>>>> 3. Whats the best way to discuss an idea for a new processor/component ?
>>>> Raise a JIRA or use this mail group ? And I presume the answer will be
>> the
>>>> dame if I have a question about a design decision that was taken.
>>>> 
>>>> 
>>>> Thanks for the help
>>>> Darren Hitchman
>> 
>>

Re: Questions about Nifi Dev process

Posted by Bryan Bende <bb...@gmail.com>.

I am not that familiar with the low level JMS mechanics so I can't speak
specifically to that case, but many consumers are meant to be created once
for a particular topic/queue and kept open until the processor is stopped
and consuming is stopped. So allowing an incoming connection to specify
things to consume from means there is an unbounded number of possible
consumers. What happens if every minute a new flow file comes in with a new
topic? Do we keep opening new consumers forever? How do we know when to
stop consuming from one of the topics that came in from a previous flow
file?

It is the same problem with the request to ask for ListFile/ListHDFS to
accept an incoming connection. Those processors maintain state of
previously seen files so that they are not listed again, so with an
incoming connection, the amount of state is unbounded because there can be
an infinite number of changing things to list. So the suggestion was to make
a new variation of those processors like ScanFile that does not maintain
state and can then scan anything based on the incoming flow file because
now each execution of the processor is independent from the last.
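
A small sketch may help show the difference. It is deliberately simplified and is not how the real List processors store their state; the class and method names are made up. The stateful lister remembers every entry it has emitted per target directory, so its memory grows with every new target it is handed, while the stateless scan keeps nothing between calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch contrasting stateful listing with a stateless scan.
final class ListingSketch {

    // Stateful: remembers what it has already emitted, so repeat calls return only new entries.
    static final class StatefulLister {
        private final Set<String> seen = new HashSet<>();

        List<String> listNew(String directory, List<String> names) {
            List<String> fresh = new ArrayList<>();
            for (String name : names) {
                if (seen.add(directory + "/" + name)) {
                    fresh.add(name);
                }
            }
            return fresh;
        }

        // Grows with every distinct directory/name pair ever listed.
        int trackedEntries() {
            return seen.size();
        }
    }

    // Stateless: each call is independent of the last, so any target can be passed in.
    static List<String> scan(List<String> names) {
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        StatefulLister lister = new StatefulLister();
        System.out.println(lister.listNew("/in", Arrays.asList("a.txt", "b.txt")));
        System.out.println(lister.listNew("/in", Arrays.asList("a.txt", "b.txt", "c.txt")));
        System.out.println(lister.listNew("/other", Arrays.asList("a.txt")));
        System.out.println("tracked entries: " + lister.trackedEntries());
        System.out.println(scan(Arrays.asList("a.txt", "b.txt")));
    }
}
```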

Most of the publishers are more fire-and-forget so you should be able to
set a dynamic topic name using expression language.
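
For the publishing side, a hedged sketch of the fan-out: each consumed message becomes one copy per destination, and each copy carries the destination in an attribute. The attribute names "source.queue" and "destination.queue" and the class FanOutSketch are assumptions for illustration. In a NiFi flow, and assuming the publisher's destination property accepts Expression Language as suggested above, such an attribute would be referenced as ${destination.queue}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: build one attribute set per destination for a consumed message.
final class FanOutSketch {

    static List<Map<String, String>> fanOut(String sourceQueue, Map<String, List<String>> routes) {
        List<Map<String, String>> copies = new ArrayList<>();
        List<String> destinations = routes.getOrDefault(sourceQueue, Collections.emptyList());
        for (String destination : destinations) {
            Map<String, String> attributes = new HashMap<>();
            attributes.put("source.queue", sourceQueue);      // hypothetical attribute name
            attributes.put("destination.queue", destination); // hypothetical attribute name
            copies.add(attributes);
        }
        return copies;
    }

    public static void main(String[] args) {
        Map<String, List<String>> routes = new HashMap<>();
        routes.put("ORDERS.IN", Arrays.asList("ORDERS.COPY1", "ORDERS.COPY2"));
        for (Map<String, String> attributes : fanOut("ORDERS.IN", routes)) {
            System.out.println(attributes.get("source.queue") + " -> " + attributes.get("destination.queue"));
        }
    }
}
```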

Re: Questions about Nifi Dev process

Posted by Darren Hitchman Code <co...@hitchman.info>.

Does anyone have any guidance on point one for me, please?

Thanks
Darren




Re: Questions about Nifi Dev process

Posted by Darren Hitchman Code <co...@hitchman.info>.

Thanks Peter. I look forward to seeing it.

Re: Questions about Nifi Dev process

Posted by Peter Gyori <pe...@gmail.com>.

Hi Darren,

Regarding point 2: I'm currently working on a ListenFTP processor for NiFi
that does exactly what you mentioned. I just opened an Apache Jira ticket
to indicate that the development is in progress:
https://issues.apache.org/jira/browse/NIFI-7624
I'm using the Apache Mina FtpServer library for this purpose. The entire
filesystem layer - used by the library - needs to be replaced since NiFi
won't actually store the files like a regular FTP server. The basic
functionality is working already, but the development is not complete yet.
I'll invite you to review the code once the pull request is ready.
A simple listener process is sufficient in this case. (Listen-type
processors are not native in NiFi, but not unprecedented: there is already
a ListenHTTP processor, for example.)

Best regards,
Peter
