Posted to dev@nifi.apache.org by Kumiko Yada <Ku...@ds-iq.com> on 2016/05/27 01:08:36 UTC

Best way to process the processor requests in batch

Hello,

We implemented a custom processor, similar to InvokeHTTP, in which part of the URL can be replaced with values from a Context Data List property; the processor then writes the weather data to the flowfile.  For example, the URL for the weather feed has to include the ZIP code, so the URL contains a {0} placeholder that is replaced with each ZIP code from the Context Data List property.

URL
http://example{0}/weather

Context Data List:
00000
11111
22222

The processor will make the following requests:
http://example{0}/weather

http://example00000/weather
http://example11111/weather
http://example22222/weather
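
The substitution listed above can be sketched in plain Java. This is only an illustration, not the poster's processor: the class name UrlTemplateExpander and the method expand are hypothetical, and a literal String.replace on the {0} placeholder is assumed rather than whatever mechanism the custom processor actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of NiFi): expands a URL template once per context value.
final class UrlTemplateExpander {
    private static final String PLACEHOLDER = "{0}";

    static List<String> expand(String template, List<String> contextValues) {
        List<String> urls = new ArrayList<>();
        for (String value : contextValues) {
            // Literal replacement; no URL encoding is applied in this sketch.
            urls.add(template.replace(PLACEHOLDER, value.trim()));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> zipCodes = List.of("00000", "11111", "22222");
        for (String url : expand("http://example{0}/weather", zipCodes)) {
            System.out.println(url);
        }
    }
}
```

Each entry of the Context Data List yields exactly one request URL, so the list length is also the number of requests that count against any quota.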

This processor handles one request at a time and has a performance issue.  I'd like to modify it to process requests in batches.  What is the best way to process in batches?  Also, does NiFi keep track of how many requests the processor has processed?  If so, how does NiFi track this, and how long does NiFi keep the data?  I'd like to add quota handling to this processor to keep track of the quota.  For example, if the weather feed allows only 100 requests a day, I don't want the processor to execute once the quota is reached.

Thanks
Kumiko

RE: Best way to process the processor requests in batch

Posted by Kumiko Yada <Ku...@ds-iq.com>.
Joe,

Thank you for your input.

I'd like to avoid creating multiple threads.  Would it be possible to loop through a ProcessSession again once it's committed?  For example, with a total of 1000 requests, break them down into batches of 100: create/transfer a flowfile per request, then once 100 requests are processed, commit and loop through again.  Would it be better to transfer one flowfile at a time, or to transfer them in a batch?
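
The loop described here (1000 requests, committed 100 at a time, on a single thread) could look roughly like the following plain-Java sketch. It deliberately uses no NiFi classes: BatchRunner, partition and runInBatches are hypothetical names, the handler stands in for "make the request and create/transfer a flowfile", and the commit callback stands in for committing the ProcessSession.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded batch loop; the callbacks are stand-ins, not NiFi API.
final class BatchRunner {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Handles every item of a batch, then invokes commit once for that batch.
    // Returns the number of commits performed.
    static <T> int runInBatches(List<T> items, int batchSize, Consumer<T> handler, Runnable commit) {
        int commits = 0;
        for (List<T> batch : partition(items, batchSize)) {
            batch.forEach(handler);
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

With 1000 items and a batch size of 100 this performs ten commits; a final short batch is still committed, so no request is left pending.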

Thanks
Kumiko

-----Original Message-----
From: Joe Witt [mailto:joe.witt@gmail.com] 
Sent: Thursday, May 26, 2016 7:17 PM
To: dev@nifi.apache.org
Subject: Re: Best way to process the processor requests in batch

Kumiko

A couple of quick thoughts to share.  You can absolutely code your processor to operate in batches and you can of course multi-thread the processor.  The general unit of work concept Apache NiFi supports is called a ProcessSession and you can operate on as many flow files as you need in that session and then commit it as one batch.  NiFi will automatically track/record a lot of very nice information at the process session level.  In addition NiFi will capture provenance information which itself is useful for understanding specific items that went through that flow and their latencies and such.  Beyond these options there is also a concept of counters which you can use to capture, generally for development purposes, interesting things you'd like to observe over time.  You'll also want to get a good handle on what performance you should expect interacting with the web service independent of NiFi so you can get a good baseline to work from.

The quota question is also one where you have choices and design decisions to make.  You can bake this quota handling logic into your processor itself or you could also possibly wire in an existing or some new processor that specifically handles the quota/grouping logic you need and it would have relationships such as 'within quota' and 'exceeds quota'.

I apologize for not giving a more precise response.  There are many ways to approach this and the best trade-offs will depend on finer details.  As you advance with this please feel free to ask more questions.  If you find things you wish were available and you think should exist in NiFi we'd love to have your contribution in any form (ideas, code, JIRAs, etc.).

Thanks
Joe


Re: Best way to process the processor requests in batch

Posted by Joe Witt <jo...@gmail.com>.
Kumiko

A couple of quick thoughts to share.  You can absolutely code your
processor to operate in batches and you can of course multi-thread the
processor.  The general unit of work concept Apache NiFi supports is
called a ProcessSession and you can operate on as many flow files as
you need in that session and then commit it as one batch.   NiFi will
automatically track/record a lot of very nice information at the
process session level.  In addition NiFi will capture provenance
information which itself is useful for understanding specific items that
went through that flow and their latencies and such.  Beyond these
options there is also a concept of counters which you can use to
capture, generally for development purposes, interesting things you'd
like to observe over time. You'll also want to get a good handle on
what performance you should expect interacting with the web service
independent of NiFi so you can get a good baseline to work from.

The quota question is also one where you have choices and design
decisions to make.  You can bake this quota handling logic into your
processor itself or you could also possibly wire in an existing or some new
processor that specifically handles the quota/grouping logic you
need and it would have relationships such as 'within quota' and
'exceeds quota'.
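
As a rough illustration of the 'within quota' / 'exceeds quota' decision, here is a minimal plain-Java sketch. It is not NiFi code: DailyQuotaGate and route are hypothetical names, the count lives only in memory (so it is lost on restart), and the caller passes in the date so the day rollover is explicit.

```java
import java.time.LocalDate;

// Hypothetical in-memory quota gate; a real processor would need durable state.
final class DailyQuotaGate {
    static final String WITHIN_QUOTA = "within quota";
    static final String EXCEEDS_QUOTA = "exceeds quota";

    private final int maxPerDay;
    private LocalDate currentDay;
    private int usedToday;

    DailyQuotaGate(int maxPerDay) {
        this.maxPerDay = maxPerDay;
    }

    // Returns the name of the route a request made on the given day would take.
    synchronized String route(LocalDate today) {
        if (!today.equals(currentDay)) {
            currentDay = today; // a new day starts a fresh count
            usedToday = 0;
        }
        if (usedToday < maxPerDay) {
            usedToday++;
            return WITHIN_QUOTA;
        }
        return EXCEEDS_QUOTA;
    }
}
```

With a limit of 100, the first 100 calls for a given day return 'within quota' and later calls that day return 'exceeds quota'. Whether this logic belongs inside the custom processor or in a separate one upstream is the design choice described above.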

I apologize for not giving a more precise response.  There are many
ways to approach this and the best trade-offs will depend on finer
details.  As you advance with this please feel free to ask more
questions.  If you find things you wish were available and you think
should exist in NiFi we'd love to have your contribution in any form
(ideas, code, JIRAs, etc.).

Thanks
Joe
