Posted to dev@nifi.apache.org by we are <ar...@gmail.com> on 2018/01/24 11:48:40 UTC

RedisConnectionPoolService Makes NiFi Unresponsive

Hi,

Recently we moved NiFi from a 24-core server to a 4-core one, and since
then NiFi stops responding approximately 4 times a day until it is
restarted. We then switched to an 8-core server, and now it happens
approximately every 2 days.

When this happens, the UI becomes unresponsive, as does the REST API.
The NiFi active threads metric reports 0 active threads, and the CPU is
100% idle. There is no large spike in flowfiles, memory, or CPU usage
before NiFi stops responding. However, when we checked the provenance
repo we saw that events were still being created. The logs only show that
events are being created; there are no errors or warnings. By looking at
the content of the events we were able to determine that events were
flowing up until a processor that uses the RedisConnectionPoolService.

We tried to attach the debugger to different processors, and all of them
except 4 responded and the debugger connected successfully.
Those 4 all use the RedisConnectionPoolService, and they did not
respond. 2 of these processors are custom ones we wrote; the other 2 are
the built-in Wait/Notify processors. When we tried to attach to the
RedisConnectionPoolService itself, the debugger was not able to connect
to it either. The Redis server that the connection pool points at
responds to us normally.

We tried to look at the active threads using /opt/nifi/bin/nifi.sh dump,
but we did not see anything strange.
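
For anyone scanning such a dump by hand, a small filter can help. The
following is a minimal sketch, not part of NiFi: it lists the threads in
a dump file whose stack contains a given frame. The class name
ThreadDumpScan is made up for illustration. It assumes each thread block
starts with a line beginning with a quoted thread name, and it assumes
that a thread waiting on the Redis pool would show the commons-pool2
frame GenericObjectPool.borrowObject; both should be checked against the
actual dump.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: list threads in a thread dump whose stack includes a given frame. */
class ThreadDumpScan {

    /**
     * Returns the header line of every thread block that contains frameMarker.
     * Assumption: a thread block starts with a line beginning with a quoted
     * thread name, and its stack frames follow on the next lines.
     */
    static List<String> threadsContaining(List<String> dumpLines, String frameMarker) {
        List<String> hits = new ArrayList<>();
        String header = null;
        boolean matched = false;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // A new thread block begins; record the previous one if it matched.
                if (header != null && matched) {
                    hits.add(header);
                }
                header = line;
                matched = false;
            } else if (header != null && line.contains(frameMarker)) {
                matched = true;
            }
        }
        if (header != null && matched) {
            hits.add(header);
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the dump file; args[1]: optional frame to look for.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        // Assumed marker: the commons-pool2 method a pool borrow goes through.
        String marker = args.length > 1 ? args[1] : "GenericObjectPool.borrowObject";
        List<String> hits = threadsContaining(lines, marker);
        System.out.println(hits.size() + " thread(s) with frame " + marker);
        hits.forEach(System.out::println);
    }
}
```

If many Timer-Driven threads show up under that marker across several
dumps taken a few seconds apart, that would be consistent with threads
waiting on the pool; if none do, the hang is probably somewhere else.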

When we dug into the problem we noticed that NiFi uses an old version of
spring-data-redis. We don't know if this is the cause, but we opened an
issue for it: https://issues.apache.org/jira/browse/NIFI-4811

The maximum timer driven thread count is the default (10). Our custom
processors are configured to a maximum of 10 concurrent tasks, and the
wait/notify processors are configured to 5. The RedisConnectionPoolService
is configured with the default values:
Max Total: 20
Max Idle: 8
Min Idle: 0
Block When Exhausted: true
Max Evictable Idle Time: 60 seconds
Time Between Eviction Runs: 30 seconds
Num Tests Per Eviction Run: -1

We made sure to always call connection.close() in our custom processors.
Is it possible that connections are somehow not being released or
evicted, and that this is why NiFi freezes? How can we determine whether
this is the case?
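
To make the scenario in question concrete, here is a minimal sketch that
models a bounded pool as a semaphore. It is not NiFi, Jedis, or
commons-pool2 code; the class PoolExhaustionSketch and the leak rate are
invented for illustration. It only shows that with Max Total = 20, a
borrower that occasionally fails to return a connection will eventually
leave no connections to borrow, at which point a borrow that blocks when
the pool is exhausted would wait.

```java
import java.util.concurrent.Semaphore;

/** Sketch: a bounded pool modelled as a semaphore, to show leak-driven exhaustion. */
class PoolExhaustionSketch {

    private final Semaphore permits;

    PoolExhaustionSketch(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    /** Non-blocking borrow: returns false when nothing is available. */
    boolean borrow() {
        return permits.tryAcquire();
    }

    /** Stand-in for connection.close() handing the connection back to the pool. */
    void release() {
        permits.release();
    }

    /**
     * Borrows up to `attempts` times and skips the release on every
     * `leakEvery`-th borrow. Returns how many borrows succeeded before
     * the pool ran dry (or `attempts` if it never did).
     */
    static int run(int maxTotal, int attempts, int leakEvery) {
        PoolExhaustionSketch pool = new PoolExhaustionSketch(maxTotal);
        int succeeded = 0;
        for (int i = 1; i <= attempts; i++) {
            if (!pool.borrow()) {
                break; // exhausted: a blocking borrow would wait at this point
            }
            succeeded++;
            if (i % leakEvery != 0) {
                pool.release(); // normal path: the connection is returned
            }
            // otherwise the connection is leaked and never returned
        }
        return succeeded;
    }

    public static void main(String[] args) {
        // Max Total = 20 as in the configuration above; one leak per 5 borrows is arbitrary.
        System.out.println("borrows before exhaustion: " + run(20, 1000, 5));
        // prints "borrows before exhaustion: 100"
    }
}
```

A slow leak like this would fit a hang that appears only after hours or
days, but the sketch cannot tell whether that is what happens here.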

Thanks!
Daniel

Re: RedisConnectionPoolService Makes NiFi Unresponsive

Posted by we are <ar...@gmail.com>.
Hi,

We are very sorry, but we were not able to get permission to share our
thread dump.
Is there other information that would be helpful? Could you give us tips
on how to read a thread dump, and what it should contain under normal
circumstances?

Thanks

On Wed, Jan 24, 2018 at 4:19 PM, Bryan Bende <bb...@gmail.com> wrote:

> Hello,
>
> Can you take a couple of thread dumps while this is happening and provide
> them so we can take a look?
>
> You can put a file name as the argument to nifi.sh dump to have it written
> to a file.
>
> Thanks,
>
> Bryan

Re: RedisConnectionPoolService Makes NiFi Unresponsive

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Can you take a couple of thread dumps while this is happening and provide
them so we can take a look?

You can put a file name as the argument to nifi.sh dump to have it written
to a file.

Thanks,

Bryan
