Posted to users@nifi.apache.org by indus well <in...@gmail.com> on 2015/11/14 18:46:05 UTC

Fwd: Database Lookup Approach

Hi NiFi Experts:

I have a use case where I consume a JSON message containing a userID
and need to look up the corresponding value in a database table.  The
transaction volume averages around 5 million per day and is growing.
What would be the best approach for this scenario?

Thanks,

Indus

Re: Database Lookup Approach

Posted by Joe Witt <jo...@gmail.com>.
Indus,

My apologies for the response time.  For step 3 I was suggesting a
processor which takes a given flow file attribute and looks that value
up in a cache, then either populates further attributes or indicates a
miss.  In the event of a hit the values are populated and you're set.
In the case of a miss, route to the ExecuteSQL processor to get the
results.  After ExecuteSQL, the results could be routed to a processor
that populates the cache.
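As a rough sketch of that hit/miss routing (plain Python rather than
NiFi code; the in-memory dict cache, the stand-in query function, and
the sample data are all hypothetical):

```python
# Sketch of step 3: look a flow file attribute up in a cache;
# on a miss, fall back to the database and populate the cache.
cache = {}  # userID -> row of looked-up values

def query_database(user_id):
    # Stand-in for ExecuteSQL against the real table.
    fake_table = {"u1": {"name": "Alice", "tier": "gold"}}
    return fake_table.get(user_id)

def lookup(user_id):
    if user_id in cache:               # cache hit: attributes populated, done
        return cache[user_id], "hit"
    row = query_database(user_id)      # cache miss: route to the database
    if row is not None:
        cache[user_id] = row           # populate the cache for next time
    return row, "miss"

print(lookup("u1"))   # first call misses and queries the database
print(lookup("u1"))   # second call is served from the cache
```

In a NiFi flow the hit and miss branches would be two relationships of
the lookup processor rather than a return value.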

Joe Percivall: As part of the work you've been doing with templates,
this is a really good use case to document, as it is a fairly common
pattern and we leave a lot of the exercise up to the reader.  We
should make a template to do this, identify gaps where we should
provide features, and publish a good guide.

Thanks
Joe


Re: Database Lookup Approach

Posted by indus well <in...@gmail.com>.
Thanks, Joe. Yes, the referenced lookup data does not change
frequently, so your built-in caching strategy sounds great. Would you
expand on how to implement the built-in caching mechanism, as well as
how to refresh the cache, in NiFi?

Thanks,

Indus


Re: Database Lookup Approach

Posted by Joe Witt <jo...@gmail.com>.
Indus,

Let's project out one order of magnitude.  That puts you at just a
bit more than 500 lookups per second.  You will want to consider the
properties of the data itself and the properties of the database.
This helps you decide on the appropriate caching and querying
strategy.  If the data as-is lends itself to highly effective caching
behavior then you're good.  If not, you may choose to intentionally
merge/batch data together to gain higher cache hit rates.  With regard
to the database itself, if it is being updated frequently then you'll
need a tighter cache refresh policy, and so on.  Anyway, such cases
and a fairly wide combination of scenarios have been played out to
great effect in NiFi.  We have some of the tools you'll need out of
the box to do this, and some you'll possibly need to build for your
database or caching strategy, or if you want to edit the JSON data
based on the results of the database lookup.
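One way to realize the "tighter cache refresh policy" mentioned above
is a time-to-live on each cache entry, so a frequently updated table
gets re-queried once an entry goes stale.  A minimal sketch, assuming
an in-memory cache and an illustrative TTL value (in NiFi this would
likely be a processor or controller-service property instead):

```python
import time

TTL_SECONDS = 60.0  # illustrative: tune to how often the table changes

cache = {}  # userID -> (value, timestamp of when it was cached)

def get(user_id, query_fn, now=None):
    """Return the cached value unless its entry is older than the TTL."""
    now = time.monotonic() if now is None else now
    entry = cache.get(user_id)
    if entry is not None and now - entry[1] < TTL_SECONDS:
        return entry[0]            # entry is fresh enough: serve from cache
    value = query_fn(user_id)      # stale or absent: hit the database again
    cache[user_id] = (value, now)
    return value
```

Passing `now` explicitly keeps the expiry logic easy to test; callers
would normally omit it and let the monotonic clock be used.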

So the general flow logic would be like this (not specific to any
existing processor):

1) Consume the JSON object
2) Extract the userID from the JSON object as a flow file attribute
3) Execute a SQL statement to look up the query results for that
userID.  This could be a single processor which has a built-in caching
mechanism, or a processor that first checks a cache and, on a miss,
queries the actual database and then populates the cache.
4) The attributes of the flow file now contain the results of the
query.  Update the JSON object content if necessary using the
attributes, or just make an attribute-based routing decision.

The key part of this is really step 3.  Many ways to go after that.
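The four steps above can be sketched end-to-end in plain Python (not
NiFi code; the JSON field name, the enrichment keys, and the helper
names are all illustrative assumptions):

```python
import json

def lookup_user(user_id, cache, query_fn):
    """Step 3: check the cache first, query the database on a miss."""
    if user_id not in cache:
        cache[user_id] = query_fn(user_id)
    return cache[user_id]

def process(message, cache, query_fn):
    record = json.loads(message)       # step 1: consume the JSON object
    user_id = record["userID"]         # step 2: extract the ID as an attribute
    attrs = lookup_user(user_id, cache, query_fn)
    if attrs is None:
        return None                    # step 4: route unmatched records aside
    record.update(attrs)               # step 4: enrich the JSON content
    return json.dumps(record)
```

In a real flow the cache would typically be shared across nodes, e.g.
via a distributed map cache service, rather than a per-process dict.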

Thanks
Joe
