You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@plc4x.apache.org by Adam Rossi <ac...@gmail.com> on 2020/09/02 20:35:55 UTC

Re: Leaking nioEventLoopGroup threads

Hi Julian. I would like to submit a fix for this, but the relevant section
of the code is tricky for me to understand.

Basically, if the PooledPlcDriverManager is used with
DefaultNettyPlcConnections, when code calls the "close" method of the
PlcConnection, the "close" method of the defaultNettyPlcConnection is
intercepted and never called, and the nioEventLoopGroup threads are never
closed as your patch corrected. The pooled PlcConnection can go stale or
become invalidated by the PLC while it is waiting in the pool, so perhaps
the fix is to test if the connection isConnected when it is borrowed from
the pool, and then explicitly call the closed method before returning it if
the connection is not connected?

I would have thought the pooledPlcDriverManager would always test for and
return a connected PlcConnection, but I seem to remember another discussion
on this topic and the preference was expressed to not always call connect()
on the PlcConnection before returning.

From the PooledPlcDriverManager class my potential fix is below (see
comment starting "//Call the close method..."

@Override
public PlcConnection getConnection(String url, PlcAuthentication
authentication) throws PlcConnectionException {
PoolKey poolKey = poolKeyFactory.getPoolKey(url, authentication);
if (LOGGER.isDebugEnabled()) {
if (authentication != noPlcAuthentication) {
LOGGER.debug("Try to borrow an object for url {} and authentication {}",
url, authentication);
} else {
LOGGER.debug("Try to borrow an object for url {}", url);
}
}
PlcConnection plcConnection;
try {
plcConnection = keyedObjectPool.borrowObject(poolKey); //Call the close
method if the connection has lost connection to close threads that might
remain open if (!plcConnection.isConnected()) { plcConnection.close(); }
} catch (Exception e) {
throw new PlcConnectionException(e);
}
// Used to invalidate a proxy
AtomicBoolean proxyInvalidated = new AtomicBoolean(false);
return (PlcConnection) Proxy.newProxyInstance(classLoader, new Class[]{
PlcConnection.class}, (proxy, method, args) -> {
if (proxyInvalidated.get()) {
throw new IllegalStateException("Proxy not valid anymore");
}
if ("close".equals(method.getName())) {
LOGGER.debug("close called on {}", plcConnection);
proxyInvalidated.set(true);
keyedObjectPool.returnObject(poolKey, plcConnection);
return null;
} else {
try {
return method.invoke(plcConnection, args);
} catch (InvocationTargetException e) {
if (e.getCause().getClass() == PlcConnectionException.class) {
keyedObjectPool.invalidateObject(poolKey, plcConnection);
proxyInvalidated.set(true);
}
throw e;
}
}
});
}


On Tue, Aug 25, 2020 at 10:30 AM Julian Feinauer <
j.feinauer@pragmaticminds.de> wrote:

> Hi Adam,
>
>
>
> sorry form y late response and in facgt thanks for your feedback, happy to
> hear!
>
> I will merge my branch and close the issue, no worries.
>
>
>
> And you are exactly right, just log another issue fort he Pool and we hope
> that someone may find the time to have a look there : )
>
>
>
> Julian
>
>
>
> PS.: As you already digged into the code you can try to fix it yourself of
> course, I would be more than happy to get a PR from you : )
>
>
>
> *Von: *Adam Rossi <ac...@gmail.com>
> *Datum: *Dienstag, 25. August 2020 um 15:45
> *An: *Julian Feinauer <j....@pragmaticminds.de>
> *Cc: *"dev@plc4x.apache.org" <de...@plc4x.apache.org>
> *Betreff: *Re: Leaking nioEventLoopGroup threads
>
>
>
> Juilan,
>
>
>
> My apologies - your fix did indeed work. The issue is that
> PooledPlcDriverManager does not seem to be calling the close method on the
> connection. Switching back to PlcDriverManager from PooledPlcDriverManager
> results in your new log comments showing up in the log, and more
> importantly no more leaks of the nioEventLoopGroup threads. I have tested
> the code in a loop for an hour or so and it is working perfectly.
>
>
>
> The PooledPlcDriverManager seems to intercept the close method on the
> plcConnection (lines 125 - 130):
>
>
>
>             if ("close".equals(method.getName())) {
>
>                 LOGGER.debug("close called on {}", plcConnection);
>
>                 proxyInvalidated.set(true);
>
>                 keyedObjectPool.returnObject(poolKey, plcConnection);
>
>                 return null;
>
>             } else {
>
>
>
> Which makes sense as it is trying to keep active connections pooled.
> However, when this connection is again retrieved from the pool it seems the
> plcConnection connects again and creates an additional nioEventLoopGroup
> thread, which is never closed.
>
>
>
> I am new to working with this project and Jira. It seems to me that I
> should close the issue I just created as your fix does indeed correct the
> original issue, and perhaps open another issue on PooledPlcDriverManager
> for this thread leak?
>
>
>
> Regards, Adam
>
>
>
> On Mon, Aug 24, 2020 at 4:39 AM Julian Feinauer <
> j.feinauer@pragmaticminds.de> wrote:
>
> Hi,
>
> short feedback. I looked into the code and indeed it seems that we had a
> bug there which could lead tot he socket leak you described.
> I pushed a fix in the branch:
>
> https://github.com/apache/plc4x/tree/bugfix/close-eventloop-after-channel
>
> Would you mind taking a look and testing this with your code @Adam Rossi?
>
> Thanks!
> Juliasn
>
> Am 24.08.20, 08:26 schrieb "Julian Feinauer" <j.feinauer@pragmaticminds.de
> >:
>
>     Perhaps, some related questions:
>
>     - You are using Linux for your Tests?
>     - Do you close all Connections properly?
>     Normally the `PlcConnection.close()` method should close the EventLoop.
>
>     Julian
>
>     Am 24.08.20, 08:23 schrieb "Julian Feinauer" <
> j.feinauer@pragmaticminds.de>:
>
>         Hi Adam,
>
>         I will have a look today!
>
>         Do we have a Jira Issue for it already?
>
>         Julian
>
>         Am 24.08.20, 07:38 schrieb "Christofer Dutz" <
> christofer.dutz@c-ware.de>:
>
>             Hi Adam,
>
>             of course that's unfortunate ... also I will not be able to
> address this issue soon as I have to work on the tasks of my research
> project.
>             I have one more month to work on this and I'm months behind
> schedule because I have been doing free support way too much lately.
>
>             I really hope Julian will be able to help ... he's way more
> into the details of Netty than I am (cause he's got the book ;-) )
>
>             So Julian? ... it would be super awesome if you could take on
> this issue.
>
>             Chris
>
>
>
>             Am 24.08.20, 00:17 schrieb "Adam Rossi" <ac...@gmail.com>:
>
>                 Thanks, I did test with 0.8.0-SNAPSHOT and see the same
> behavior. In every
>                 plcConnection a nioEventLoopGroup thread is created and
> does not ever seem
>                 to be destroyed.
>
>                 I wish I understood the io.netty.channel.EventLoopGroup
> class better to be
>                 more helpful here. Would an example program that
> reproduces this thread
>                 leak be useful?
>
>                 jconsole shows the thread data as:
>
>                 Name: nioEventLoopGroup-19-1
>                 State: RUNNABLE
>                 Total blocked: 0  Total waited: 0
>
>                 Stack trace:
>                 java.base@13.0.2/sun.nio.ch.EPoll.wait(Native Method)
>                 java.base@13.0.2
>
> /sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
>                 java.base@13.0.2
>
> /sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
>                    - locked
> io.netty.channel.nio.SelectedSelectionKeySet@1838f97
>                    - locked sun.nio.ch.EPollSelectorImpl@1f49287
>                 java.base@13.0.2/sun.nio.ch
> .SelectorImpl.select(SelectorImpl.java:141)
>
> app//io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>
> app//io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>
> app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>
> app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>
> app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>
> app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>                 java.base@13.0.2/java.lang.Thread.run(Thread.java:830)
>
>
>                 Regards, Adam
>
>                 On Sun, Aug 23, 2020 at 1:00 PM Christofer Dutz <
> christofer.dutz@c-ware.de>
>                 wrote:
>
>                 > Hi Adam,
>                 >
>                 > the Apache SNAPSHOT repo is at:
>                 >
> https://repository.apache.org/content/repositories/snapshots
>                 >
>                 > Adding this to your pom should help:
>                 >
>                 >   <!-- Make Snapshots of Apache projects available -->
>                 >   <repositories>
>                 >     <repository>
>                 >       <id>apache-snapshots</id>
>                 >       <url>
> https://repository.apache.org/content/repositories/snapshots
>                 > </url>
>                 >       <releases>
>                 >         <enabled>false</enabled>
>                 >       </releases>
>                 >       <snapshots>
>                 >         <enabled>true</enabled>
>                 >       </snapshots>
>                 >     </repository>
>                 >   </repositories>
>                 >
>                 >   <!-- Make Snapshots of Apache plugins available -->
>                 >   <pluginRepositories>
>                 >     <pluginRepository>
>                 >       <id>apache-snapshots</id>
>                 >       <url>
> https://repository.apache.org/content/repositories/snapshots
>                 > </url>
>                 >       <releases>
>                 >         <enabled>false</enabled>
>                 >       </releases>
>                 >       <snapshots>
>                 >         <enabled>true</enabled>
>                 >       </snapshots>
>                 >     </pluginRepository>
>                 >   </pluginRepositories>
>                 >
>                 > Chris
>                 > Am 23.08.20, 18:56 schrieb "Adam Rossi" <
> ac.rossi@gmail.com>:
>                 >
>                 >     Sure thing. Is 0.8.0-snapshot hosted anywhere or is
> that something
>                 > that needs to be built? Regards Adam
>                 >
>                 >     > On Aug 23, 2020, at 12:31 PM, Christofer Dutz <
>                 > christofer.dutz@c-ware.de> wrote:
>                 >     >
>                 >     > Hmm ...
>                 >     >
>                 >     > Could you possibly give 0.8.0-SNAPSHOT a try? ....
> or 0.6.x? ...
>                 > 0.7.0 was the first of the new generation drivers. We're
> maintaining the
>                 > 0.6 branch and working hard on making the new generation
> drivers 100%
>                 > production ready.
>                 >     >
>                 >     > Chris
>                 >     >
>                 >     >
>                 >     > Am 23.08.20, 18:06 schrieb "Adam Rossi" <
> ac.rossi@gmail.com>:
>                 >     >
>                 >     >    This is the latest 0.7.0 release from Maven.
>                 >     >
>                 >     >
>                 >     >    Regards Adam
>                 >     >
>                 >     >> On Aug 23, 2020, at 11:56 AM, Christofer Dutz <
>                 > christofer.dutz@c-ware.de> wrote:
>                 >     >>
>                 >     >> Hi Adam,
>                 >     >>
>                 >     >> which version of PLC4X are you using? I know we
> had similar reports
>                 > some time ago, but had thought we had fixed them
>                 >     >>
>                 >     >> Chris
>                 >     >>
>                 >     >>
>                 >     >>
>                 >     >> Am 23.08.20, 16:40 schrieb "Adam Rossi" <
> ac.rossi@gmail.com>:
>                 >     >>
>                 >     >>   Howdy. I am seeing a persistent thread being
> created for every
>                 >     >>   plcConnection connect which looks like the
> following:
>                 >     >>
>                 >     >>   Name: nioEventLoopGroup-11-1
>                 >     >>   State: RUNNABLE
>                 >     >>   Total blocked: 0  Total waited: 0
>                 >     >>
>                 >     >>   Stack trace:
>                 >     >>   java.base@13.0.2/sun.nio.ch.EPoll.wait(Native
> Method)
>                 >     >>   java.base@13.0.2
>                 >     >>
>  /sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
>                 >     >>   java.base@13.0.2
>                 >     >>
>  /sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
>                 >     >>      - locked
> io.netty.channel.nio.SelectedSelectionKeySet@f6df6b
>                 >     >>      - locked sun.nio.ch.EPollSelectorImpl@7ffc19
>                 >     >>   java.base@13.0.2/sun.nio.ch
>                 > .SelectorImpl.select(SelectorImpl.java:141)
>                 >     >>
>                 >
> app//io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>                 >     >>
>                 >
> app//io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>                 >     >>
>  app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>                 >     >>
>                 >
> app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>                 >     >>
>                 >
> app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>                 >     >>
>                 >
> app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>                 >     >>   java.base@13.0.2
> /java.lang.Thread.run(Thread.java:830)
>                 >     >>
>                 >     >>
>                 >     >>   This happens both when creating a connection
> directly or when
>                 > using the
>                 >     >>   plc4x connection pool.
>                 >     >>
>                 >     >>   In both cases I am obtaining the connection,
> reading some modbus
>                 > values,
>                 >     >>   and then closing the connection. From the
> plcConnection metadata
>                 > the
>                 >     >>   connection type I am using is:
>                 >     >>
>                 >     >>
>  org.apache.plc4x.java.spi.connection.DefaultNettyPlcConnection
>                 >     >>
>                 >     >>   Over time thousands of these nioEventLoopGroup
> threads are
>                 > created and I
>                 >     >>   experience an OutOfMemory error in my program.
>                 >     >>
>                 >     >>   Has anyone experienced this problem? Is there a
> way to force
>                 > these threads
>                 >     >>   to close, timeout, or to specify the total
> thread limit?
>                 >     >>
>                 >     >>   Thanks and Regards, Adam
>                 >     >>
>                 >     >
>                 >
>                 >
>
>
>
>