Posted to users@nifi.apache.org by Joe Witt <jo...@gmail.com> on 2016/01/01 16:37:52 UTC

Re: Database Lookup Approach

Indus,

My apologies for the response time.  For step 3 I was suggesting a
processor which takes a given flowfile attribute and looks that value
up against a cache to then populate further attributes or indicate a
miss.  In the event of a hit the values are populated and you're set.
In the case of a miss, route to the ExecuteSQL processor to get the
results.  After ExecuteSQL, the results can then be routed to a
processor that populates the cache.
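
Roughly, the hit/miss logic for that step looks like this (a plain
Python sketch, not NiFi code -- the cache and db objects here are
stand-ins for whatever DistributedMapCache/JDBC setup you actually
use, and the query is illustrative):

```python
# Cache-aside lookup: check the cache first, fall back to the database
# on a miss, then populate the cache so the next lookup is a hit.

def lookup_user(username, cache, db):
    """Return user attributes, consulting the cache before the database."""
    attrs = cache.get(username)          # try the cache first
    if attrs is not None:
        return attrs                     # hit: attributes populated, done
    # miss: fall back to the database (the ExecuteSQL role)
    attrs = db.query("SELECT * FROM users WHERE username = ?", username)
    cache.put(username, attrs)           # populate the cache for next time
    return attrs
```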

Joe Percivall: As part of the work you've been doing with templates
this is a really good use case to document as this is a fairly common
pattern and we leave a lot of the exercise up to the reader.  We
should make a template to do this, identify gaps where we should
provide features, and publish a good guide.

Thanks
Joe
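
P.S. The four-step flow from the quoted message below can be sketched
end to end roughly like this (plain Python, not NiFi processors; the
"username" field name and the lookup function are illustrative
assumptions):

```python
import json

# 1) consume the JSON object, 2) extract the username,
# 3) look it up (cached or via SQL, supplied as a callable here),
# 4) update the JSON content with the results.

def enrich(message, lookup):
    record = json.loads(message)     # 1) consume JSON object
    username = record["username"]    # 2) extract the username
    attrs = lookup(username)         # 3) lookup against cache/database
    record.update(attrs)             # 4) update the JSON object content
    return json.dumps(record)
```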

On Sun, Nov 15, 2015 at 11:40 AM, indus well <in...@gmail.com> wrote:
> Thanks, Joe. Yes, the referenced lookup data does not change frequently so
> your built-in caching strategy sounds great. Would you expand on how to
> implement the built-in caching mechanism, as well as how to refresh the
> cache in NiFi, please?
>
> Thanks,
>
> Indus
>
> On Sat, Nov 14, 2015 at 12:24 PM, Joe Witt <jo...@gmail.com> wrote:
>>
>> Indus,
>>
>> Let's project out one order of magnitude.  That puts you at just a bit
>> more than 500 lookups per second.  You will want to consider the
>> properties of the data itself and the properties of the database.
>> This helps you decide on the appropriate caching and querying
>> strategy.  If the data as-is lends itself to highly effective caching
>> behavior then you're good.  If not you may choose to intentionally
>> merge/batch data together to gain higher cache hits.  With regard to
>> the database itself if it is being updated frequently then you'll need
>> a tighter cache refresh policy and so on.  Anyway, such cases and a
>> fairly wide range of scenarios have been played out with great
>> effect in NiFi.  We have some of the tools you'll need out of the box
>> to do this and some you'll possibly need to build for your database or
>> caching strategy or if you want to edit the JSON data based on the
>> results of the database lookup.
>>
>> So general flow logic would be like this (not specific to any existing
>> processor)
>>
>> 1) Consume JSON object
>> 2) Extract username from JSON object as a flow file attribute
>> 3) Execute a SQL statement to look up results for that username.
>> This could be a single processor with a built-in caching mechanism,
>> or a processor that first checks a cache and, on a miss, queries the
>> actual database and then populates the cache.
>> 4) The attributes of the flow file now contain the results of the
>> query.  Update the JSON object content if necessary using attributes
>> or just make an attribute-based routing decision.
>>
>> The key part of this is really step 3.  Many ways to go after that.
>>
>> Thanks
>> Joe
>>
>> On Sat, Nov 14, 2015 at 12:46 PM, indus well <in...@gmail.com> wrote:
>> > Hi NiFi Experts:
>> >
>> > I have a use case where I consume a JSON message containing a userID
>> > and need to look up the value from a database table. The transaction
>> > volume is averaging around 5 million per day and growing. What would
>> > be the best approach for this scenario?
>> >
>> > Thanks,
>> >
>> > Indus
>> >
>> >
>
>