Posted to users@nifi.apache.org by Ali Nazemian <al...@gmail.com> on 2019/02/15 00:05:07 UTC

Nifi provenance indexing throughput if it is being used as an event store

Hi All,

I am investigating whether NiFi provenance can be used as an event store
for a long period of time. Our traffic is very bursty: sometimes we receive
no events for a while, and sometimes we get a burst. On average, around
1000 events per second (eps) is the expected throughput at this stage. NiFi
has a powerful provenance feature that also lets you index on selected
attributes. I am investigating how reliable it is to use the NiFi provenance
store for a long period of time with indexing enabled for a few extra
attributes. Has anybody used NiFi provenance at this scale? Since provenance
uses Lucene for indexing, can a large number of Lucene indices cause other
issues within NiFi?

P.S.: Our use case is pretty light for NiFi, as we are not going to do any
ETL; NiFi is being used mostly as an orchestrator of multiple
microservices.

Regards,
Ali

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Ali Nazemian <al...@gmail.com>.
Sure. Thanks, Joe.

On Sun, 17 Feb. 2019, 22:52 Joe Witt <joe.witt@gmail.com wrote:

> ali
>
> there are many variables here that would need to be known before anyone
> could say for sure.
>
> but give it a try, measure, and forecast, and you'll know within a day or
> two.
>
> thanks

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Joe Witt <jo...@gmail.com>.
ali

there are many variables here that would need to be known before anyone
could say for sure.

but give it a try, measure, and forecast, and you'll know within a day or
two.
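
A rough sketch of the forecasting step, assuming the daily growth of the provenance repository has already been measured. The 1000 eps figure comes from this thread; the 20 GiB/day growth and 2 TiB capacity in the usage lines are made-up placeholders, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(avg_events_per_second: float) -> float:
    """Average number of events per day at a steady average rate."""
    return avg_events_per_second * SECONDS_PER_DAY


def retention_days(capacity_bytes: float, measured_bytes_per_day: float) -> float:
    """Days of history that fit in capacity_bytes at the measured fill rate."""
    if measured_bytes_per_day <= 0:
        raise ValueError("measured_bytes_per_day must be positive")
    return capacity_bytes / measured_bytes_per_day


if __name__ == "__main__":
    GIB = 1024 ** 3
    TIB = 1024 ** 4
    # 1000 eps is the average mentioned in the thread.
    print("events per day:", events_per_day(1000))
    # Hypothetical measurement and disk budget; replace with observed values.
    print("retention (days):", retention_days(2 * TIB, 20 * GIB))
```

Because the traffic is bursty, the measurement window should be long enough to cover both quiet periods and bursts; otherwise the forecast will be skewed in one direction.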

thanks


On Sat, Feb 16, 2019, 11:37 PM Ali Nazemian <alinazemian@gmail.com wrote:

> Thanks, Joe. Given that we would like to add a few attributes and have
> them indexed for provenance, should the mentioned rate still be all
> right?
>
> Cheers,
> Ali

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Ali Nazemian <al...@gmail.com>.
Thanks, Joe. Given that we would like to add a few attributes and have
them indexed for provenance, should the mentioned rate still be all
right?

Cheers,
Ali

On Sat, Feb 16, 2019 at 2:56 PM Joe Witt <jo...@gmail.com> wrote:

> Ali
>
> You certainly can and at the rates you mention you should be able to keep
> it for a good while.
>
> Just set the properties you need for your system and measure the rate at
> which prov storage fills.
>
> Thanks

-- 
A.Nazemian

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Joe Witt <jo...@gmail.com>.
Ali

You certainly can and at the rates you mention you should be able to keep
it for a good while.

Just set the properties you need for your system and measure the rate at
which prov storage fills.
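
One way to take that measurement, sketched under assumptions: sample the on-disk size of the provenance repository directory twice and divide the growth by the elapsed time. The directory is whatever `nifi.provenance.repository.directory.default` points to on the system in question (`./provenance_repository` is assumed below), and the helper names are hypothetical.

```python
import os
import time


def directory_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files may be rolled over or aged off while we walk; skip them.
                continue
    return total


def fill_rate_bytes_per_second(
    size_before: int, size_after: int, elapsed_seconds: float
) -> float:
    """Growth rate between two size samples taken elapsed_seconds apart."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (size_after - size_before) / elapsed_seconds


def sample_fill_rate(path: str, interval_seconds: float) -> float:
    """Take two size samples interval_seconds apart and return bytes/second."""
    before = directory_size_bytes(path)
    started = time.monotonic()
    time.sleep(interval_seconds)
    after = directory_size_bytes(path)
    return fill_rate_bytes_per_second(before, after, time.monotonic() - started)
```

For example, `sample_fill_rate("./provenance_repository", 3600)` would give an hourly average. A sample taken while old data is being aged off can understate the rate or even come out negative, so it is safer to measure while the repository is still below its configured limits.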

Thanks

On Fri, Feb 15, 2019 at 10:29 PM Ali Nazemian <al...@gmail.com> wrote:

> I didn't mean using NiFi provenance search from an external system. I
> meant using it for internal provenance search, but keeping the provenance
> data for longer than usual. That is, instead of expecting it to keep
> provenance data for only a few days, we would use it as an event store,
> since it also provides search capability.
>
> Regards,
> Ali

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Ali Nazemian <al...@gmail.com>.
I didn't mean using NiFi provenance search from an external system. I
meant using it for internal provenance search, but keeping the provenance
data for longer than usual. That is, instead of expecting it to keep
provenance data for only a few days, we would use it as an event store,
since it also provides search capability.

Regards,
Ali

On Sat, Feb 16, 2019 at 5:29 AM Andrew Grande <ap...@gmail.com> wrote:

> NiFi provenance searches are not a good integration pattern for external
> systems. That is, using them to periodically fetch history burdens the
> cluster (those searches can be heavy) and disrupts normal processing SLAs.
>
> Pushing provenance events out to an external system (potentially even
> filtered down to components of interest) is a much more predictable pattern
> and provides lots of flexibility in how to interpret the events.
>
> Andrew

-- 
A.Nazemian

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Andrew Grande <ap...@gmail.com>.
NiFi provenance searches are not a good integration pattern for external
systems. That is, using them to periodically fetch history burdens the
cluster (those searches can be heavy) and disrupts normal processing SLAs.

Pushing provenance events out to an external system (potentially even
filtered down to components of interest) is a much more predictable pattern
and provides lots of flexibility in how to interpret the events.

Andrew

On Thu, Feb 14, 2019, 11:26 PM Ali Nazemian <al...@gmail.com> wrote:

> Can I expect NiFi's provenance search to do the job for me?

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Ali Nazemian <al...@gmail.com>.
Can I expect NiFi's provenance search to do the job for me?

On Fri, 15 Feb. 2019, 13:21 Mike Thomsen <mikerthomsen@gmail.com wrote:

> Ali,
>
> There is a site-to-site publishing task for provenance that you can add as
> a root controller service that would be great here. It'll just take all of
> your provenance data periodically and ship it off to another NiFi server or
> cluster that can process all of the provenance data as blocks of JSON data.
> A common pattern there is to filter down to the events you want and publish
> to Elasticsearch.

Re: Nifi provenance indexing throughput if it is being used as an event store

Posted by Mike Thomsen <mi...@gmail.com>.
Ali,

There is a site-to-site publishing task for provenance that you can add as
a root controller service that would be great here. It'll just take all of
your provenance data periodically and ship it off to another NiFi server or
cluster that can process all of the provenance data as blocks of JSON data.
A common pattern there is to filter down to the events you want and publish
to Elasticsearch.
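
The filter-and-publish pattern can be sketched roughly as follows, assuming the provenance events arrive as a JSON array of objects. The field names `eventType` and `componentId` are assumptions to check against the JSON actually produced by the reporting task, and the index name `provenance` is hypothetical. The sketch only builds a newline-delimited body in the shape the Elasticsearch bulk API expects; it does not send anything over the network.

```python
import json
from typing import Iterable, Iterator, Optional, Set


def filter_events(
    events: Iterable[dict],
    event_types: Set[str],
    component_ids: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield only events of the wanted types (and, optionally, components)."""
    for event in events:
        if event.get("eventType") not in event_types:
            continue
        if component_ids is not None and event.get("componentId") not in component_ids:
            continue
        yield event


def to_bulk_ndjson(events: Iterable[dict], index_name: str) -> str:
    """Build a bulk request body: one action line followed by one document line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    # The bulk format requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    # Made-up sample batch; real events carry many more fields.
    batch = json.loads(
        '[{"eventType": "RECEIVE", "componentId": "abc"},'
        ' {"eventType": "DROP", "componentId": "abc"}]'
    )
    print(to_bulk_ndjson(filter_events(batch, {"RECEIVE"}), "provenance"), end="")
```

Filtering before publishing keeps the external index small, which matters when the retained history is meant to span months rather than days.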

On Thu, Feb 14, 2019 at 7:05 PM Ali Nazemian <al...@gmail.com> wrote:

> Hi All,
>
> I am investigating whether NiFi provenance can be used as an event store
> for a long period of time. Our traffic is very bursty: sometimes we
> receive no events for a while, and sometimes we get a burst. On average,
> around 1000 events per second (eps) is the expected throughput at this
> stage. NiFi has a powerful provenance feature that also lets you index on
> selected attributes. I am investigating how reliable it is to use the NiFi
> provenance store for a long period of time with indexing enabled for a few
> extra attributes. Has anybody used NiFi provenance at this scale? Since
> provenance uses Lucene for indexing, can a large number of Lucene indices
> cause other issues within NiFi?
>
> P.S.: Our use case is pretty light for NiFi, as we are not going to do any
> ETL; NiFi is being used mostly as an orchestrator of multiple
> microservices.
>
> Regards,
> Ali
>