Posted to users@nifi.apache.org by Mike Harding <mi...@gmail.com> on 2016/08/25 10:38:45 UTC

NiFi global variables / persisting state outside of a pipeline

Hi All,

I have a mapping table stored in Hive that maps an ID to a readable name
string. When a JSON object enters my NiFi pipeline as a flowfile, I want to
inject the readable name string into the JSON object. The problem is that,
currently, as each flowfile enters the pipeline, I have to make a
SelectHiveQL call to first get the lookup table data and store it as
attributes.

Is there a way I can load the lookup table data once, or on a periodic
basis, into NiFi (as a global variable/attribute) to avoid making the
select call for each flowfile, which translates to thousands of calls a
minute?

Thanks,
Mike

Re: NiFi global variables / persisting state outside of a pipeline

Posted by Andrew Grande <ap...@gmail.com>.
How did you guys solve the problem of warming up the caches? E.g. the flow
should block until it's done reading the reference data and populating the
distributed cache.
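
To make the warm-up requirement concrete, here is a minimal sketch in plain
Python. It is not NiFi code and does not use any NiFi API; the `WarmCache`
class and its method names are hypothetical, chosen only for illustration.
The idea is that every lookup waits on a "ready" flag that is set once the
first load of reference data has finished.

```python
import threading


class WarmCache:
    """Hypothetical cache whose readers wait until the first load is done."""

    def __init__(self):
        self._data = {}
        self._ready = threading.Event()  # set once the first load completes

    def load(self, entries):
        """Populate the cache, then release any waiting readers."""
        self._data = dict(entries)
        self._ready.set()

    def get(self, key, timeout=None):
        """Return the cached value, waiting for the first load if needed."""
        if not self._ready.wait(timeout):
            raise TimeoutError("cache was not populated in time")
        return self._data.get(key)


cache = WarmCache()
results = []


def consumer():
    # Stands in for the part of the flow that needs the reference data.
    results.append(cache.get("42", timeout=5))


worker = threading.Thread(target=consumer)
worker.start()                 # waits inside get() if load() has not run yet
cache.load({"42": "Pump A"})   # sample reference data, made up for the demo
worker.join()
print(results)
```

The timeout is a design choice: without it, a failed load would leave the
consumer waiting forever rather than surfacing an error.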

I'm not sure how this would scale if one has to deploy numerous MiNiFi
agents with a preconfigured flow.

Andrew

On Thu, Aug 25, 2016, 10:23 AM Mike Harding <mi...@gmail.com> wrote:

> Thanks Bryan - I was unaware of the MapCacheServer functionality - I've
> now implemented the suggested approach and it works perfectly.
>
> Mike

Re: NiFi global variables / persisting state outside of a pipeline

Posted by Mike Harding <mi...@gmail.com>.
Thanks Bryan - I was unaware of the MapCacheServer functionality - I've now
implemented the suggested approach and it works perfectly.

Mike

On 25 August 2016 at 15:05, Joe Witt <jo...@gmail.com> wrote:

> Also, this is a great use case that has been done quite a bit in the
> past using exactly the sort of logic Bryan calls out. We've also done
> things like writing custom controller services specific to the type of
> data and data structures needed for the job. But the
> plumbing/infrastructure for it is well supported: it avoids the RPC
> calls you mention, ensures the cache gets frequently updated live, and
> lets the cache be used by numerous components at once.
>
> Thanks
> Joe

Re: NiFi global variables / persisting state outside of a pipeline

Posted by Joe Witt <jo...@gmail.com>.
Also, this is a great use case that has been done quite a bit in the
past using exactly the sort of logic Bryan calls out. We've also done
things like writing custom controller services specific to the type of
data and data structures needed for the job. But the
plumbing/infrastructure for it is well supported: it avoids the RPC
calls you mention, ensures the cache gets frequently updated live, and
lets the cache be used by numerous components at once.

Thanks
Joe

On Thu, Aug 25, 2016 at 9:57 AM, Bryan Bende <bb...@gmail.com> wrote:
> Hi Mike,
>
> I think one approach might be the following...
>
> Set up controller services for DistributedMapCacheServer and
> DistributedMapCacheClientService, then have a part of your flow that is
> triggered periodically and queries your Hive table. You will probably need
> to split/parse the results, and then use the PutDistributedMapCache
> processor to store them in the cache.
>
> In the other part of your flow, use FetchDistributedMapCache to do a lookup
> against the cache.
>
> I haven't worked through all of the exact steps, but I think something like
> that should work.
>
> Thanks,
>
> Bryan

Re: NiFi global variables / persisting state outside of a pipeline

Posted by Bryan Bende <bb...@gmail.com>.
Hi Mike,

I think one approach might be the following...

Set up controller services for DistributedMapCacheServer and
DistributedMapCacheClientService, then have a part of your flow that is
triggered periodically and queries your Hive table. You will probably need
to split/parse the results, and then use the PutDistributedMapCache
processor to store them in the cache.

In the other part of your flow, use FetchDistributedMapCache to do a lookup
against the cache.

I haven't worked through all of the exact steps, but I think something like
that should work.
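
As a rough illustration of the pattern only, here is a sketch in plain
Python. It is not NiFi code: `LookupCache`, `enrich`, `load_mapping`, the
field names `id` and `name`, and the sample data are all hypothetical.
`load_mapping` stands in for the Hive query, `refresh` for the periodic
query-and-put part of the flow, and `get` for the fetch part. The sketch
reloads lazily once the table is older than a TTL, which only approximates
a timer-driven refresh.

```python
import json
import time


class LookupCache:
    """In-memory stand-in for the distributed map cache (hypothetical)."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader      # callable returning {id: name}
        self._ttl = ttl_seconds    # how long a loaded table stays fresh
        self._table = {}
        self._loaded_at = None

    def refresh(self):
        """Reload the whole table in one call to the loader."""
        self._table = dict(self._loader())
        self._loaded_at = time.monotonic()

    def get(self, key):
        """Look up one key, reloading only if the table is missing or stale."""
        stale = (
            self._loaded_at is None
            or time.monotonic() - self._loaded_at > self._ttl
        )
        if stale:
            self.refresh()
        return self._table.get(key)


def enrich(raw_json, cache, id_field="id", name_field="name"):
    """Inject the readable name for the record's ID into the JSON object."""
    record = json.loads(raw_json)
    name = cache.get(record.get(id_field))
    if name is not None:
        record[name_field] = name
    return json.dumps(record, sort_keys=True)


calls = []


def load_mapping():
    """Placeholder for the Hive select; counts calls for the demo."""
    calls.append(1)
    return {"42": "Pump A", "43": "Pump B"}


cache = LookupCache(load_mapping, ttl_seconds=300)
enriched = [enrich(json.dumps({"id": i}), cache) for i in ("42", "43", "99")]
print(enriched)
print("loader calls:", len(calls))
```

The point of the sketch is the call count: three records are enriched, but
the table is loaded once rather than once per record. Records with an
unknown ID pass through unchanged.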

Thanks,

Bryan

On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding <mi...@gmail.com>
wrote:

> Hi All,
>
> I have a mapping table stored in Hive that maps an ID to a readable name
> string. When a JSON object enters my NiFi pipeline as a flowfile, I want to
> inject the readable name string into the JSON object. The problem is that,
> currently, as each flowfile enters the pipeline, I have to make a
> SelectHiveQL call to first get the lookup table data and store it as
> attributes.
>
> Is there a way I can load the lookup table data once, or on a periodic
> basis, into NiFi (as a global variable/attribute) to avoid making the
> select call for each flowfile, which translates to thousands of calls a
> minute?
>
> Thanks,
> Mike
>