Posted to users@tomcat.apache.org by "TurboChargedDad ." <li...@gmail.com> on 2017/10/04 12:51:17 UTC

AJP connection pool issue bug?

 Hello all.
I am going to do my best to describe my problem.  Hopefully someone will
have some sort of insight.

Tomcat 7.0.41 (working on updating that)
Java 1.6 (Working on getting this updated to the latest minor release)
RHEL Linux

I inherited a multi-tenant setup.  Individual user accounts on the system
each have their own Tomcat instance, each started via sysinit.  This
is done to keep each website in its own permissions sandbox so one website
can't interfere with another's data.

There are two load-balanced Apache proxies at the edge that point to one
Tomcat server (I know, I know, but again I inherited this).

Apache sits in front of Tomcat to terminate SSL and uses AJP to
proxy-pass to each Tomcat instance based on the user's assigned port.
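
For reference, each site's httpd vhost looks roughly like this (an
illustrative sketch only; the hostname, certificate paths and port are
placeholders, not our real config):

    <VirtualHost *:443>
        ServerName user9.example.com
        SSLEngine on
        SSLCertificateFile    /etc/pki/tls/certs/user9.crt
        SSLCertificateKeyFile /etc/pki/tls/private/user9.key

        # SSL terminates here; requests are forwarded to this user's
        # Tomcat instance over AJP on its assigned port.
        ProxyPass        / ajp://localhost:8009/
        ProxyPassReverse / ajp://localhost:8009/
    </VirtualHost>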

Things have run fine for years (so I am told, anyway) until recently.
Let me give an example of an outage.

Users 1, 2 and 3 all use unique databases on a shared database
server, SQL server 10.

User 4 runs on a Windows JBoss server and also has a database on shared
database server 10.

Users 5-50 all run on the aforementioned Linux server using Tomcat and have
databases on *other* shared database servers, but have nothing to
do with database server 10.

User 4 had a stored proc go wild on database server 10, basically knocking
it offline.

  Now one would expect sites 1-4 to experience an interruption of service
because they use a shared DBMS platform.  However:

Every single site goes down.  I monitor the connections for each site with a
custom tool, and when this happens the connections start stacking up across
all the components (proxies all the way through the stack).
Looking at the AJP connection pool threads for user 9 shows that user has
exhausted their AJP connection pool threads.  They are maxed out at 300, yet
that user doesn't have high activity at all.  The CPU load, memory usage and
traffic for everything except SQL server 10 are stable during this outage.
The proxies consume more and more memory the longer the outage
lasts, but that's expected as the connection counts stack up into the
thousands.  After a short time, all the sites' Apache/SSL termination layer
starts throwing AJP timeout errors.  Shortly after that, the edge proxies
naturally also start throwing timeout errors of their own.

I am only watching user 9, using a tool that gives me insight into
what's going on via JMX metrics, but I suspect that once I get all the
others instrumented I will see the same thing: maxed-out AJP
connection pools.

Aren't those supposed to be unique per user/JVM?  Am I missing something in
the docs?

Any assistance from the tomcat gods is much appreciated.


Thanks in advance.
TCD

Re: AJP connection pool issue bug?

Posted by "TurboChargedDad ." <li...@gmail.com>.
I missed some of these messages before.  I apologize.

Can I send these to you privately?

On Wed, Oct 4, 2017 at 4:01 PM, Christopher Schultz <
chris@christopherschultz.net> wrote:


Re: AJP connection pool issue bug?

Posted by Christopher Schultz <ch...@christopherschultz.net>.

TCD,

On 10/4/17 3:45 PM, TurboChargedDad . wrote:
> Perhaps I am not wording my question correctly.

Can you confirm that the connection-pool exhaustion appears to be
happening on the AJP client (httpd/mod_proxy_ajp) and NOT on the
server (Tomcat/AJP)?

If so, the problem will likely not improve by switching-over to an
NIO-based connector on the Tomcat side.

Having said that, the real problem is likely to be simple arithmetic.
Remember this expression:

Ctc = Nhttpd * Cworkers

Ctc = Connections Tomcat should be prepared to accept (e.g. Connector
maxConnections)

Nhttpd = # of httpd servers
Cworkers = total # of connections in httpd connection pool for all
workers(!!)

Imagine the following scenario:

Nhttpd = 2
Cworker = 200
Ntomcat = 2

On httpd server A, we have a connection pool with 200 connections. If
Tomcat A goes down, all 200 connections will go to Tomcat B. If that
happens to both proxies (Tomcat A stops responding), then both proxies
will send all 200 connections to Tomcat B. That means that Tomcat B
needs to be able to support 400 connections, not 200.

Let's say you now have 5 workers (one for each application). Each worker
gets its own connection pool, and each connection pool has 200 connections
in it. Now, we have a situation where each httpd instance actually has
1000 (potential) connections in the connection pool, and if Tomcat A
goes down, Tomcat B must be able to handle 2000 connections (1000 from
httpd A and 1000 from httpd B).

At some point, you can't provision enough threads to handle all of
those connections.

The solution (bringing this back around again) is to use NIO, because
you can handle a LOT more connections with a lower number of threads.
NIO doesn't allow you to handle more *concurrent* traffic (in fact, it
makes performance a tiny bit worse than BIO), but it will allow you to
have TONS of idle connections that aren't "wasting" request-processing
threads that are just waiting for another actual request to come
across the wire.
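
Putting that arithmetic together with NIO, a connector sized for the
2-proxy, 5-worker example above would look something like this (the
numbers are purely illustrative, not a recommendation):

    <!-- 2 httpd front-ends x 5 workers x 200 connections per pool
         = up to 2000 connections this connector may have to accept.
         With NIO, idle AJP connections don't each hold a thread, so
         maxThreads can stay much lower than maxConnections. -->
    <Connector port="9335"
               protocol="org.apache.coyote.ajp.AjpNioProtocol"
               maxConnections="2000"
               maxThreads="300"
               redirectPort="8443" />

The point isn't the exact numbers; it's that maxConnections is sized
from the proxy-side arithmetic, while maxThreads stays at whatever you
can actually service concurrently.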

> As a test I changed the following line in one of the many tomcat
> instances running on the server and bounced it. Old <!--
> <Connector port="9335" protocol="AJP/1.3" redirectPort="8443" 
> maxThreads="300" /> --> New <Connector port="9335"
> protocol="org.apache.coyote.ajp.AjpNioProtocol" redirectPort="8443"
> maxThreads="300" />

Yep, that's how to do it.

> As the docs state I am able to verify that it did in fact switch
> over to NIO.
> 
> INFO: Starting ProtocolHandler ["ajp-nio-9335"]

Good. Now you can handle many idle connections with the same number of
threads.

> Will running NIO and BIO on the same box have a negative impact?

No.

> I am thinking they should all be switched to NIO, this was just a 
> test to see if I was understanding what I was reading.
I would recommend NIO in all cases.

> That being said I suspect there are going to be far more tweaks
> that needs to be applied as there are none to date.

Hopefully not. A recent Tomcat (which you don't actually have) with a
stock configuration should be fairly well-configured to handle a great
deal of traffic without falling-over.

> I also know that the HTTPD server is running in prefork mode.
That will pose some other issues for you, mostly the ability to handle
bursts of high concurrency from your clients. You can consider it
out-of-scope for this discussion, though. What we want to do for you
is stop httpd+Tomcat from freaking out and getting stopped-up with
even a small number of users.

> Which I think leaves me with no control over how many connections
> can be handed back from apache on a site by site basis.

Probably not on a site-by-site basis, but you can adjust the
connection-pool size on a per-worker basis. For prefork it MUST BE
connection_pool_size=1 (the default for prefork httpd) and for
"worker" and similarly-threaded MPMs the default should be fine to use.

> I'm really having a hard time explaining to others how BIO could have
> caused the connection pool for another user to become exhausted.

Well...

If one of your Tomcats locks-up (database is dead; might want to check
to see how the application is accessing that... infinite timeouts can
be a real killer, here), it can tie-up connections from
mod_proxy_ajp's connection pool. But those connections should be
per-worker and shouldn't interfere with each other. Unless you have an
uber-worker that handles everything for all those various Tomcats.
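
If the applications use a container-managed pool, bounding how long a
request thread can block on the database is the first thing to check.
A rough sketch for Tomcat 7's default (DBCP-based) pool, assuming the
Microsoft JDBC driver; the resource name, host and values are made up:

    <!-- maxWait bounds (in ms) how long a request waits for a pooled
         connection; validation and abandoned-connection cleanup help
         keep a sick database from pinning threads indefinitely. -->
    <Resource name="jdbc/app" auth="Container" type="javax.sql.DataSource"
              driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://dbserver10:1433;databaseName=app"
              username="..." password="..."
              maxActive="20" maxWait="10000"
              validationQuery="SELECT 1" testOnBorrow="true"
              removeAbandoned="true" removeAbandonedTimeout="60" />

A driver-level socket timeout (where the driver in use supports one) is
also worth setting, so reads against a hung database server eventually
fail instead of blocking forever.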

Can you give us a peek at your worker configuration? You explained it
a bit in your first post, but it might be time for some more details...

-chris



Re: AJP connection pool issue bug?

Posted by "TurboChargedDad ." <li...@gmail.com>.
  Perhaps I am not wording my question correctly.

Today we have...
[Proxy 1] | [Proxy 2]  --(HTTPS)-->  [Apache httpd]  --(AJP)-->  [tomcat1]

So we send the traffic from the proxies over HTTPS to the instance
running the Tomcat server.

The SSL is terminated by Apache/HTTPD and handed back to Tomcat over AJP.
Tomcat doesn't handle SSL in any way.  It can't; it's not configured to do
so.  Is that how you understand the question I asked?

As a test I changed the following line in one of the many tomcat instances
running on the server and bounced it.
Old:
    <!-- <Connector port="9335" protocol="AJP/1.3"
                    redirectPort="8443" maxThreads="300" /> -->
New:
    <Connector port="9335" protocol="org.apache.coyote.ajp.AjpNioProtocol"
               redirectPort="8443" maxThreads="300" />

As the docs state I am able to verify that it did in fact switch over to
NIO.

INFO: Starting ProtocolHandler ["ajp-nio-9335"]


Will running NIO and BIO on the same box have a negative impact?  I am
thinking they should all be switched to NIO; this was just a test to see if
I was understanding what I was reading.  That being said, I suspect there
are going to be far more tweaks that need to be applied, as there are none
to date.  I also know that the HTTPD server is running in prefork mode,
which I think leaves me with no control over how many connections can be
handed back from Apache on a site-by-site basis.
I'm really having a hard time explaining to others how BIO could have caused
the connection pool for another user to become exhausted.

 Thanks,
 TCD



On Wed, Oct 4, 2017 at 1:31 PM, Mark Thomas <ma...@apache.org> wrote:

> On 04/10/17 19:26, TurboChargedDad . wrote:
> >   My initial reads about BIO vs NIO seems to involve terminating SSL at
> the
> > tomcat instance.  Which we do not do.  Am I running off into the weeds
> with
> > that?
>
> Yes. The NIO AJP connector is a drop in replacement for the BIO AJP
> connector.
>
> https://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html#Standard_Implementations
>
> Look for the protocol attribute.
>
> Mark

Re: AJP connection pool issue bug?

Posted by Mark Thomas <ma...@apache.org>.
On 04/10/17 19:26, TurboChargedDad . wrote:
>   My initial reads about BIO vs NIO seems to involve terminating SSL at the
> tomcat instance.  Which we do not do.  Am I running off into the weeds with
> that?

Yes. The NIO AJP connector is a drop-in replacement for the BIO AJP
connector.

https://tomcat.apache.org/tomcat-7.0-doc/config/ajp.html#Standard_Implementations

Look for the protocol attribute.
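
For example (the port and maxThreads are just illustrative), only the
protocol value changes:

    <!-- BIO AJP connector -->
    <Connector port="9335" protocol="AJP/1.3"
               maxThreads="300" redirectPort="8443" />

    <!-- NIO AJP connector -->
    <Connector port="9335" protocol="org.apache.coyote.ajp.AjpNioProtocol"
               maxThreads="300" redirectPort="8443" />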

Mark






Re: AJP connection pool issue bug?

Posted by "TurboChargedDad ." <li...@gmail.com>.
  My initial reads about BIO vs NIO seem to involve terminating SSL at the
Tomcat instance, which we do not do.  Am I running off into the weeds with
that?

Thanks,
TCD

On Wed, Oct 4, 2017 at 9:17 AM, Mark Thomas <ma...@apache.org> wrote:

> TL;DR - Try switching to the NIO AJP connector on Tomcat.
>
> Take a look at this session I just uploaded from TomcatCon London last
> week. You probably want to start around 35:00 and the topic of thread
> exhaustion.
>
> HTH,
>
> Mark
>
> P.S. The other sessions we have are on the way. I plan to update the
> site and post links once I have them all uploaded.

Re: AJP connection pool issue bug?

Posted by Mark Thomas <ma...@apache.org>.
On 4 October 2017 15:17:25 BST, Mark Thomas <ma...@apache.org> wrote:
>TL;DR - Try switching to the NIO AJP connector on Tomcat.
>
>Take a look at this session I just uploaded from TomcatCon London last
>week. You probably want to start around 35:00 and the topic of thread
>exhaustion.

Whoops. Here is the link.

https://youtu.be/2QYWp1k5QQM

Mark





Re: AJP connection pool issue bug?

Posted by Mark Thomas <ma...@apache.org>.

TL;DR - Try switching to the NIO AJP connector on Tomcat.

Take a look at this session I just uploaded from TomcatCon London last
week. You probably want to start around 35:00 and the topic of thread
exhaustion.

HTH,

Mark

P.S. The other sessions we have are on the way. I plan to update the
site and post links once I have them all uploaded.
