Posted to dev@nifi.apache.org by Phil H <gi...@gmail.com> on 2018/05/14 22:41:12 UTC

Data flow rate exceeding provenance recording rate

Hi gang,

I have started receiving this error after perhaps 24 hours of run time. The
first queue in our flow has a very large backlog by the time this error
arrives. What is odd is that the incoming message rate is fairly constant
at all times, and while I am watching NiFi during the day we never have any
backlogs. So this leads me to believe that the backlog is a symptom of NiFi
slowing itself down, not the cause of the problem. I am using 4 threads per
processor for our main flow to handle the throughput.

Any ideas what causes this to happen and how I can fix it?

TIA,
Phil

Re: Data flow rate exceeding provenance recording rate

Posted by Mark Payne <ma...@hotmail.com>.
Phil,

Is there any backpressure being applied to the GetTCP processor?
I.e., is the connection that GetTCP is putting FlowFiles into full?

It sounds like the system starts out fast and then the performance decays
over time. Is that accurate? How long does it stay fast after a restart?

If there is no backpressure being applied, then the first thing that comes to mind
is heap usage. If the system starts out fast and then degrades over time, that is
a telltale sign of running out of heap. As the system runs, it gathers metrics for
processors, etc., and over time that can cause you to run out of memory. So along
those lines: how much heap do you have allocated to the NiFi instance? Do you have
any custom processors/controller services?
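
For reference, in a stock NiFi install the heap is set in conf/bootstrap.conf; the relevant
lines look roughly like the following (the values shown are just the shipped defaults, not a
recommendation for your flow):

# conf/bootstrap.conf - JVM heap settings (shipped defaults shown for illustration)
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

Raising -Xmx (and usually -Xms to match) and restarting NiFi is the typical first step if
heap does turn out to be the bottleneck.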

Thanks
-Mark

> On May 15, 2018, at 12:15 AM, Phil H <gi...@gmail.com> wrote:
> 
> I may have spoken too soon. I was processing well through the seven-figure
> backlog when the system started slowing again. I upped the indexing thread
> count again (it was 2 initially, then 4, then finally 8) and the system
> became unusably slow, so I set it back to 4.
> 
> The system is now operating better (UI is responsive, and no errors or
> warnings) but the main GetTCP only seems to be emitting FlowFiles at about
> a third of the rate it used to (network factors have been eliminated).
> 
> Any more ideas?
> 
> On Tue, 15 May 2018 at 09:36, Phil H <gi...@gmail.com> wrote:
> 
>> Thanks Mark,
>> 
>> That has done the trick. The whole system seems to be performing better
>> than I was used to, even before I started receiving those errors.
>> 
>> Cheers,
>> Phil
>> 
>> On Tue, 15 May 2018 at 08:54, Mark Payne <ma...@hotmail.com> wrote:
>> 
>>> Phil,
>>> 
>>> This is just a side effect of how the old provenance repository was
>>> designed. There is a new
>>> implementation that is far faster and seems to be more stable. However,
>>> in order to use it,
>>> you have to "opt in" simply because we wanted to make sure that it was
>>> stable enough to set
>>> it as the default. At this point, I do believe it is and would like to
>>> set it as the default, perhaps in
>>> the next release.
>>> 
>>> To opt in, you can just update nifi.properties to change the value of the
>>> "nifi.provenance.repository.implementation"
>>> property from "org.apache.nifi.provenance.PersistentProvenanceRepository"
>>> to "org.apache.nifi.provenance.WriteAheadProvenanceRepository".
>>> The new implementation provides better throughput, will avoid the
>>> problematic pauses that you're encountering now,
>>> and also is quite a bit faster to search.
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> 
>>>> On May 14, 2018, at 6:41 PM, Phil H <gi...@gmail.com> wrote:
>>>> 
>>>> Hi gang,
>>>> 
>>>> I have started receiving this error after perhaps 24 hours of run time.
>>> The
>>>> first queue in our flow has a very large backlog by the time this error
>>>> arrives. What is odd is that the incoming message rate is fairly
>>> constant
>>>> at all times and while I am watching NiFi during the day, we never have
>>> any
>>>> backlogs. So this leads me to believe that the backlog is a symptom of
>>> NiFi
>>>> slowing itself down, not the cause of the problem. I am using 4 threads
>>> per
>>>> processor for our main flow to handle the throughput.
>>>> 
>>>> Any ideas what causes this to happen and how I can fix it?
>>>> 
>>>> TIA,
>>>> Phil
>>> 
>>> 


Re: Data flow rate exceeding provenance recording rate

Posted by Phil H <gi...@gmail.com>.
I may have spoken too soon. I was processing well through the seven-figure
backlog when the system started slowing again. I upped the indexing thread
count again (it was 2 initially, then 4, then finally 8) and the system
became unusably slow, so I set it back to 4.
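
(The setting I have been adjusting is, I believe, the provenance indexing thread count in
nifi.properties, i.e. something along the lines of

nifi.provenance.repository.index.threads=4

though I may not have the property name exactly right.)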

The system is now operating better (UI is responsive, and no errors or
warnings) but the main GetTCP only seems to be emitting FlowFiles at about
a third of the rate it used to (network factors have been eliminated).

Any more ideas?

On Tue, 15 May 2018 at 09:36, Phil H <gi...@gmail.com> wrote:

> Thanks Mark,
>
> That has done the trick. The whole system seems to be performing better
> than I was used to, even before I started receiving those errors.
>
> Cheers,
> Phil
>
> On Tue, 15 May 2018 at 08:54, Mark Payne <ma...@hotmail.com> wrote:
>
>> Phil,
>>
>> This is just a side effect of how the old provenance repository was
>> designed. There is a new
>> implementation that is far faster and seems to be more stable. However,
>> in order to use it,
>> you have to "opt in" simply because we wanted to make sure that it was
>> stable enough to set
>> it as the default. At this point, I do believe it is and would like to
>> set it as the default, perhaps in
>> the next release.
>>
>> To opt in, you can just update nifi.properties to change the value of the
>> "nifi.provenance.repository.implementation"
>> property from "org.apache.nifi.provenance.PersistentProvenanceRepository"
>> to "org.apache.nifi.provenance.WriteAheadProvenanceRepository".
>> The new implementation provides better throughput, will avoid the
>> problematic pauses that you're encountering now,
>> and also is quite a bit faster to search.
>>
>> Thanks
>> -Mark
>>
>>
>>
>> > On May 14, 2018, at 6:41 PM, Phil H <gi...@gmail.com> wrote:
>> >
>> > Hi gang,
>> >
>> > I have started receiving this error after perhaps 24 hours of run time.
>> The
>> > first queue in our flow has a very large backlog by the time this error
>> > arrives. What is odd is that the incoming message rate is fairly
>> constant
>> > at all times and while I am watching NiFi during the day, we never have
>> any
>> > backlogs. So this leads me to believe that the backlog is a symptom of
>> NiFi
>> > slowing itself down, not the cause of the problem. I am using 4 threads
>> per
>> > processor for our main flow to handle the throughput.
>> >
>> > Any ideas what causes this to happen and how I can fix it?
>> >
>> > TIA,
>> > Phil
>>
>>

Re: Data flow rate exceeding provenance recording rate

Posted by Phil H <gi...@gmail.com>.
Thanks Mark,

That has done the trick. The whole system seems to be performing better
than I was used to, even before I started receiving those errors.

Cheers,
Phil

On Tue, 15 May 2018 at 08:54, Mark Payne <ma...@hotmail.com> wrote:

> Phil,
>
> This is just a side effect of how the old provenance repository was
> designed. There is a new
> implementation that is far faster and seems to be more stable. However, in
> order to use it,
> you have to "opt in" simply because we wanted to make sure that it was
> stable enough to set
> it as the default. At this point, I do believe it is and would like to set
> it as the default, perhaps in
> the next release.
>
> To opt in, you can just update nifi.properties to change the value of the
> "nifi.provenance.repository.implementation"
> property from "org.apache.nifi.provenance.PersistentProvenanceRepository"
> to "org.apache.nifi.provenance.WriteAheadProvenanceRepository".
> The new implementation provides better throughput, will avoid the
> problematic pauses that you're encountering now,
> and also is quite a bit faster to search.
>
> Thanks
> -Mark
>
>
>
> > On May 14, 2018, at 6:41 PM, Phil H <gi...@gmail.com> wrote:
> >
> > Hi gang,
> >
> > I have started receiving this error after perhaps 24 hours of run time.
> The
> > first queue in our flow has a very large backlog by the time this error
> > arrives. What is odd is that the incoming message rate is fairly constant
> > at all times and while I am watching NiFi during the day, we never have
> any
> > backlogs. So this leads me to believe that the backlog is a symptom of
> NiFi
> > slowing itself down, not the cause of the problem. I am using 4 threads
> per
> > processor for our main flow to handle the throughput.
> >
> > Any ideas what causes this to happen and how I can fix it?
> >
> > TIA,
> > Phil
>
>

Re: Data flow rate exceeding provenance recording rate

Posted by Mark Payne <ma...@hotmail.com>.
Phil,

This is just a side effect of how the old provenance repository was designed. There is a new
implementation that is far faster and seems to be more stable. However, in order to use it,
you have to "opt in" simply because we wanted to make sure that it was stable enough to set
it as the default. At this point, I do believe it is and would like to set it as the default, perhaps in
the next release.

To opt in, you can just update nifi.properties to change the value of the "nifi.provenance.repository.implementation"
property from "org.apache.nifi.provenance.PersistentProvenanceRepository" to "org.apache.nifi.provenance.WriteAheadProvenanceRepository".
The new implementation provides better throughput, will avoid the problematic pauses that you're encountering now,
and also is quite a bit faster to search.
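
Concretely, the change in nifi.properties looks like this:

# old (current default) implementation
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
# new write-ahead implementation
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

A restart of NiFi is required for the change to take effect.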

Thanks
-Mark



> On May 14, 2018, at 6:41 PM, Phil H <gi...@gmail.com> wrote:
> 
> Hi gang,
> 
> I have started receiving this error after perhaps 24 hours of run time. The
> first queue in our flow has a very large backlog by the time this error
> arrives. What is odd is that the incoming message rate is fairly constant
> at all times and while I am watching NiFi during the day, we never have any
> backlogs. So this leads me to believe that the backlog is a symptom of NiFi
> slowing itself down, not the cause of the problem. I am using 4 threads per
> processor for our main flow to handle the throughput.
> 
> Any ideas what causes this to happen and how I can fix it?
> 
> TIA,
> Phil