Posted to dev@directory.apache.org by "Moyer, Steven William" <sw...@psu.edu> on 2019/07/15 17:08:51 UTC

LdapConnectionPool problems

We've got a few interesting problems occurring at the network level in our systems, and I was hoping for some pointers on how to troubleshoot them. We're connecting to both OpenLDAP and Microsoft AD using the Apache Directory LDAP API client. For testing, I've built a quick fixture using 2.0.0-AM4.

1)  We have a variety of devices (security devices and load-balancers) between our LDAP client and the LDAP servers. Using a connection pool with testWhileIdle = true and timeBetweenEvictionRunsMillis = 3700000 (just over an hour), we're seeing the load-balancer disconnect unused connections (this is the expected behavior at 1 hour). When this happens, we typically see between one and four of the eight connections receive a RST/ACK in Wireshark, which seems to trigger the pool to fill back up. The odd thing is that regardless of how many RST/ACKs we see, exactly three new connections are always established and (presumably) added to the pool. We've also set minIdle = 8, so you would expect the pool to end up larger or smaller depending on how many RST/ACKs we receive, but it always reports 8 idle connections. We're seeing symptoms of broken connections in the pool and, as expected, if we set testOnBorrow = true, this behavior disappears. My concern is that testOnBorrow lets the LDAP operation proceed while our effective number of idle connections may be far lower than we expect. Any idea how we might best troubleshoot the pool's behavior? My guess is that since numTestsPerEvictionRun defaults to 3, the incoming RST/ACKs start one test cycle that examines only three connections, regardless of how many RST/ACKs we receive.
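The effect guessed at above can be illustrated with a deliberately simplified model of an idle-object evictor. It assumes each run validates at most numTestsPerEvictionRun idle connections (commons-pool2's default is 3), destroys the ones that fail, and then refills the pool to minIdle. The class and method names are hypothetical, and the real commons-pool2 evictor also applies an idle-time eviction policy that this model ignores. With eight idle connections, four of them broken, one run replaces three and the pool still reports eight idle connections, one of which is broken.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of an idle-object evictor (not commons-pool2 code):
// each run validates at most numTestsPerEvictionRun idle connections, destroys the
// ones that fail, then refills the pool to minIdle.
final class EvictionSimulation {

    // A pooled connection that is either usable or silently broken.
    static final class PooledConn {
        final int id;
        boolean broken;

        PooledConn(int id) {
            this.id = id;
        }
    }

    final List<PooledConn> idle = new ArrayList<>();
    private final int minIdle;
    private final int numTestsPerEvictionRun;
    private int nextId = 0;
    private int cursor = 0; // where the next run resumes, so successive runs rotate through the list

    EvictionSimulation(int minIdle, int numTestsPerEvictionRun) {
        this.minIdle = minIdle;
        this.numTestsPerEvictionRun = numTestsPerEvictionRun;
        ensureMinIdle();
    }

    // Simulates the load-balancer killing the first `count` idle connections.
    void breakConnections(int count) {
        for (int i = 0; i < count && i < idle.size(); i++) {
            idle.get(i).broken = true;
        }
    }

    // One evictor run with testWhileIdle = true; returns how many connections were destroyed.
    int runEvictor() {
        int destroyed = 0;
        int toExamine = Math.min(numTestsPerEvictionRun, idle.size());
        for (int examined = 0; examined < toExamine && !idle.isEmpty(); examined++) {
            cursor %= idle.size();
            if (idle.get(cursor).broken) {
                idle.remove(cursor); // validation failed: destroy; the next connection shifts into `cursor`
                destroyed++;
            } else {
                cursor++;            // validation passed: leave it idle and move on
            }
        }
        ensureMinIdle();             // replacements are appended until minIdle is reached again
        return destroyed;
    }

    long brokenIdleCount() {
        return idle.stream().filter(c -> c.broken).count();
    }

    private void ensureMinIdle() {
        while (idle.size() < minIdle) {
            idle.add(new PooledConn(nextId++));
        }
    }

    public static void main(String[] args) {
        EvictionSimulation pool = new EvictionSimulation(8, 3);
        pool.breakConnections(4);
        int destroyed = pool.runEvictor();
        // prints: destroyed=3 idle=8 stillBroken=1
        System.out.println("destroyed=" + destroyed + " idle=" + pool.idle.size()
                + " stillBroken=" + pool.brokenIdleCount());
    }
}
```

In this model the idle count alone cannot reveal the leftover broken connection; only a later run (or a borrow-time test) catches it.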

We've got a couple of processes that scale their running threads up and down based on load, and I'm concerned that if the pool thinks it has idle connections, it's going to lend a broken connection to a thread that spins up.
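Why testOnBorrow = true hides the problem can be shown with a similarly simplified model, assuming the validator is just a predicate: on each borrow, idle connections are validated one at a time, failures are destroyed, and a new connection is opened only when no idle one passes. All names are hypothetical, and commons-pool2's real borrow logic has more cases (for example, it throws if a newly created object fails validation).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical, simplified model of testOnBorrow = true (not commons-pool2 code):
// idle connections are validated before being lent; failures are destroyed and the
// next idle connection is tried; a new one is created only if none pass.
final class ValidatingBorrow {

    static final class Conn {
        final int id;
        final boolean alive;

        Conn(int id, boolean alive) {
            this.id = id;
            this.alive = alive;
        }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private final Predicate<Conn> validator;
    private int nextId = 100; // ids handed to freshly created connections
    int destroyedCount = 0;

    ValidatingBorrow(Predicate<Conn> validator) {
        this.validator = validator;
    }

    // Returns a connection to the head of the idle deque (LIFO, the pool default).
    void returnIdle(Conn c) {
        idle.addFirst(c);
    }

    int idleCount() {
        return idle.size();
    }

    Conn borrow() {
        Conn c;
        while ((c = idle.pollFirst()) != null) {
            if (validator.test(c)) {
                return c;        // passed validation: safe to lend
            }
            destroyedCount++;    // failed validation: discard it and try the next idle one
        }
        return new Conn(nextId++, true); // idle set exhausted: open a new connection
    }

    public static void main(String[] args) {
        ValidatingBorrow pool = new ValidatingBorrow(c -> c.alive);
        pool.returnIdle(new Conn(1, true));
        pool.returnIdle(new Conn(2, false)); // silently dropped by the load-balancer
        pool.returnIdle(new Conn(3, false)); // silently dropped by the load-balancer
        Conn lent = pool.borrow();
        // prints: lent=1 destroyed=2 idleLeft=0
        System.out.println("lent=" + lent.id + " destroyed=" + pool.destroyedCount
                + " idleLeft=" + pool.idleCount());
    }
}
```

A thread spinning up never receives a broken connection in this model, but the idle set it drew from was smaller than the pool had reported.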

2)  We have an AD server that periodically locks up. Its connections look fine, but it never answers queries. We're looking into why only one of the six servers has this pathology, but I'm wondering whether using the LookupLdapConnectionValidator might help detect when this problem occurs. I know the validators are meant to detect when a connection's binding has changed, but performing the lookup has the side effect (for us) of showing that the connection isn't actually functional. This problem doesn't occur often enough to really troubleshoot and has been solved by rebooting the server, so I don't think the long-term answer is a change to how we're using the LDAP API.
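A validator that sends a request can only catch a server that accepts connections but never answers if it waits a bounded time for the reply. A minimal stdlib sketch, assuming the probe (for example, a lookup issued over the pooled connection) is supplied as a hypothetical Callable<Boolean>: a true reply within the timeout counts as healthy, while a timeout or an exception counts as broken.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical bounded-time health probe: the connection counts as healthy only if the
// probe (standing in for a lookup sent over that connection) answers true in time.
final class TimedProbe {

    static boolean probeWithin(Callable<Boolean> probe, long timeoutMillis, ExecutorService executor) {
        Future<Boolean> reply = executor.submit(probe);
        try {
            return Boolean.TRUE.equals(reply.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            reply.cancel(true);  // the request was accepted but never answered: give up on it
            return false;
        } catch (ExecutionException e) {
            return false;        // the probe itself failed (e.g. connection reset)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            boolean healthy = probeWithin(() -> true, 1_000, executor);
            boolean hung = probeWithin(() -> {
                Thread.sleep(60_000); // simulates a server that never replies
                return true;
            }, 200, executor);
            // prints: healthy=true hung=false
            System.out.println("healthy=" + healthy + " hung=" + hung);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

If the connection is already configured with its own response timeout, that serves the same purpose without the extra executor; the important part is that the validation cannot block forever.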

3)  Our legacy load-balancer also drops unused connections, but it doesn't send any signal (RST/ACK or FIN/ACK) to the client at all. Are we even going to be able to detect these? It seems like using the LookupLdapConnectionValidator might help with this as well.
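When a device drops connections silently, the client only finds out by sending something and waiting, so one complementary option is to retire connections before the device's idle timeout can reach them, without any I/O. A minimal sketch of that rule, assuming a hypothetical 60-minute device idle timeout (the legacy load-balancer's actual timeout isn't stated) and a 5-minute safety margin:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical "retire it before the device does" rule: a silently dropped connection
// gives no signal, so any connection idle longer than the device's idle timeout minus a
// safety margin is treated as unusable, without doing any network I/O.
final class IdleAgePolicy {

    private final Duration maxIdle;

    IdleAgePolicy(Duration deviceIdleTimeout, Duration safetyMargin) {
        if (safetyMargin.compareTo(deviceIdleTimeout) >= 0) {
            throw new IllegalArgumentException("safety margin must be shorter than the device timeout");
        }
        this.maxIdle = deviceIdleTimeout.minus(safetyMargin);
    }

    boolean shouldRetire(Instant lastUsed, Instant now) {
        return Duration.between(lastUsed, now).compareTo(maxIdle) > 0;
    }

    public static void main(String[] args) {
        // Assumed values: a 60-minute device idle timeout and a 5-minute margin.
        IdleAgePolicy policy = new IdleAgePolicy(Duration.ofMinutes(60), Duration.ofMinutes(5));
        Instant now = Instant.parse("2019-07-15T17:00:00Z");
        // prints: 50 min idle -> retire? false; 56 min idle -> retire? true
        System.out.println("50 min idle -> retire? " + policy.shouldRetire(now.minus(Duration.ofMinutes(50)), now)
                + "; 56 min idle -> retire? " + policy.shouldRetire(now.minus(Duration.ofMinutes(56)), now));
    }
}
```

In commons-pool2 terms, minEvictableIdleTimeMillis is the closest built-in setting, although it is only applied to the idle connections an eviction run actually examines.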

One final observation: I'm trying to understand why the default lifo setting is true. If a stack is effectively being used to manage connections, won't the connections "on the bottom" generally be very stale? If lifo = false results in a FIFO queue, won't that tend to balance the use of the connections in the pool?
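The lifo question can be checked with a small simulation. It assumes the pool keeps idle connections in a deque, always borrows from the head, and on return puts a connection back at the head (lifo = true) or at the tail (lifo = false); this is assumed here to mirror commons-pool2's GenericObjectPool. Under a light, serial load of one borrow-and-return at a time, LIFO keeps reusing a single connection while the rest sit untouched, whereas FIFO rotates through all of them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Assumed model of an idle deque: borrow from the head; on return, put the connection
// back at the head (lifo = true) or at the tail (lifo = false).
final class LifoFifoDemo {

    // Borrows and immediately returns one connection `borrows` times (light, serial load)
    // and reports how many distinct connections were ever used.
    static int distinctUsed(int poolSize, int borrows, boolean lifo) {
        if (poolSize <= 0) {
            return 0;
        }
        Deque<Integer> idle = new ArrayDeque<>();
        for (int i = 0; i < poolSize; i++) {
            idle.addLast(i);
        }
        boolean[] used = new boolean[poolSize];
        for (int n = 0; n < borrows; n++) {
            int conn = idle.pollFirst(); // borrow from the head
            used[conn] = true;
            if (lifo) {
                idle.addFirst(conn);     // LIFO: the just-returned connection is lent next
            } else {
                idle.addLast(conn);      // FIFO: it waits behind every other idle connection
            }
        }
        int count = 0;
        for (boolean u : used) {
            if (u) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // prints: LIFO uses 1, FIFO uses 8
        System.out.println("LIFO uses " + distinctUsed(8, 100, true)
                + ", FIFO uses " + distinctUsed(8, 100, false));
    }
}
```

In this model the seven connections LIFO never touches are exactly the ones a 1-hour load-balancer timeout would kill, while FIFO exercises all of them as long as the whole pool cycles within that timeout.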

Hopefully this all makes sense. I should note that, in general, the LDAP API is working well against a very old version of AD, a new version of AD, and several versions of OpenLDAP. Right now, we're working towards finding a connection and pool configuration that best handles the active network devices used to provide resiliency and security to our organization.

Thanks for any insights you might provide!

Steve