Posted to server-dev@james.apache.org by "Noel J. Bergman" <no...@devtech.com> on 2002/10/30 09:43:44 UTC

[PROPOSAL] James Interrupted (and Exceptions)

During testing of Peter's interrupt() patch, I did some testing by turning
off the database server during operation.  That didn't require his
particular patch, but turning the server back on (and watching James NOT
reconnect) reminds me that we need to work on reconnecting with the database
(not for rev 2.1).  And we really need to take a close look at exception
handling and recovery, post-2.1 release.

The attached is a start towards fixing an outstanding issue with
interrupting the spooler. It is submitted only for comment and review, not
commit, and contains preliminary support for allowing accept() to be
interruptible, which would impact JamesSpoolManager.  This is NOT intended
for release 2.1.  Look for the following comment string: "-- post 2.1 enable
this (NjB)" --- in fact, the changes are all commented out because they are
in my build tree, and I'm not ready to deploy them even here until post-2.1.
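
For readers following along, here is one common way to make a blocking accept() loop respond to interrupt(). This is an illustrative sketch only and is not the code in the attached patch: the class name InterruptibleAcceptor and the 200 ms timeout are assumptions. The idea is to set a socket timeout so that accept() returns periodically and the loop can re-check the thread's interrupt flag.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not the code from the attached patch.
final class InterruptibleAcceptor implements Runnable {
    private final ServerSocket server;

    InterruptibleAcceptor(ServerSocket server) throws IOException {
        this.server = server;
        // With a timeout set, accept() throws SocketTimeoutException
        // after 200 ms instead of blocking indefinitely.
        server.setSoTimeout(200);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try (Socket client = server.accept()) {
                // A real server would hand the socket to a handler here.
            } catch (SocketTimeoutException e) {
                // No client arrived in this window; loop and re-check the flag.
            } catch (IOException e) {
                return; // socket closed or failed; stop accepting
            }
        }
    }
}
```

The trade-off is latency on shutdown: the thread notices the interrupt only at the next timeout, so a shorter timeout stops faster at the cost of more wake-ups.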

Depending upon what direction we take with repositories, it may or may not
be an issue, but the bottom line is that in the future we must clean up and
normalize exception handling.  We have too many places where we toss
RuntimeExceptions rather than clean up the interface to declare what we
need, and too many places where we eat Exceptions that should be handled.  A
review and cleanup of exception handling ought to be on the v3 TODO list.
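
To make the two complaints concrete, here is a minimal sketch of the alternative: the interface declares the failure it can produce, and the implementation translates the low-level SQLException into that declared type instead of swallowing it or rethrowing it as a RuntimeException. All names (StoreException, MailStore, FailingJdbcStore) are hypothetical and are not taken from the James code base.

```java
import java.sql.SQLException;

// All names here are hypothetical; none come from the James sources.
class StoreException extends Exception {
    StoreException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface MailStore {
    // The failure mode is part of the contract, so callers must handle it.
    void store(String key) throws StoreException;
}

class FailingJdbcStore implements MailStore {
    @Override
    public void store(String key) throws StoreException {
        try {
            insertRow(key);
        } catch (SQLException e) {
            // Neither eaten nor wrapped in a RuntimeException:
            // translated into the declared type, with the cause preserved.
            throw new StoreException("Could not store mail " + key, e);
        }
    }

    // Stand-in for real JDBC work; always fails so the error path is visible.
    private void insertRow(String key) throws SQLException {
        throw new SQLException("Giving up... no connections available.");
    }
}
```

A caller such as a spool thread is then forced by the compiler to decide what to do on failure (retry, requeue, or report), rather than discovering the problem from a stack trace in the logs.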

	--- Noel

JDBC Resources/Mordred issues

Posted by "Noel J. Bergman" <no...@devtech.com>.
This morning's testing with IBM JVM 1.3.1_01 demonstrated a problem.  At the
least, a default config issue, but I believe there are also coding issues.
Whether or not they require a fix before release is another matter.

James was running along at a nice healthy clip of roughly 2500 messages
per minute (same box that was running 1600 mpm with Hotspot Server), but
from time to time, would return:

451 Error processing message: Exception spooling message: Exception caught
while storing mail Container: java.sql.SQLException: Giving up... no
connections available.

Checking the logs, I found this to be an equal opportunity exception:

java.sql.SQLException: Giving up... no connections available.
        at JdbcDataSource.getConnection
        at JDBCMailRepository.retrieve
        at JamesSpoolManager.run

java.sql.SQLException: Giving up... no connections available.
        at JdbcDataSource.getConnection
        at JDBCMailRepository.store
        at JDBCSpoolRepository.store
        at James.sendMail

store() was only throwing this a few times an hour initially, but retrieve()
was throwing this about 20 times per hour, or more, even early on when
things were running smartly.

Checking the config, I found one issue: current values in james-config.xml
(I was using a stock config for testing) provide 30 SMTP connections and 10
spool threads, but only 10 database connections.  Apparently, even with a 5
second window within getConnection(), a 1:4 ratio of resources is
insufficient.  Changing that value to 20 helped considerably, and I haven't
seen any more of those exceptions since making the change.

Another problem seems to be an inexorable memory leak.  After only about 5
hours, James precipitously bogged down:

09:11,2420,2115,1,1636,0
09:12,1929,1687,1,1292,0
09:13,1875,1650,0,1252,0
09:14,1370,1213,3,903,0
09:15,1104,970,0,734,0
09:16,1507,1303,0,1010,0
09:17,880,764,0,600,0
09:18,992,878,0,658,0
09:19,918,782,0,605,0
09:20,199,157,6,135,0
09:21,408,369,1,270,0
09:22,417,361,1,288,0
09:23,209,182,4,139,0
09:24,297,258,5,208,0
09:43,135,116,2,90,0
09:44,163,147,4,113,0

When I looked, I found that the performance loss and increase in exception
rate coincide with loss of free memory, eventually causing major swapping.
I don't know if we have any memory leaks related to these exceptions,
although it is certainly possible.  IBM doesn't appear to provide a nice
stock way to look at that, so I'll go back to heap profiling with the Sun
JVM and see if I can spot anything.  I had not in previous attempts.  If
anyone else is using JDBC and wants to help, you can add

  -Xrunhprof:heap=sites,file=/home/james/logs/heap.log

to the RUN_CMD in phoenix.sh.

Another facet of the problem is that even after exiting the JVM, I'm not
recovering as much free memory as I think I should.  Even if I restart
mysql.  I'll do some testing on RH 8.0 with a newer version of mysql (I'm
running the last binary build that was RH 6.2 compatible).

I have no problem with switching to a mature connection pool.  Danny has
one, I've been using mine for a few years, and, of course, there is Commons
DBCP.  Personally, I agree with the philosophy of using Commons whenever
reasonable.  However, I don't believe that we're planning to adopt that
change until v3, right?  A question remains: do we want to make any changes,
e.g., replace the pooling loop in JdbcDataSource.getConnection() with
wait()/notify() logic?  I'm still ambivalent unless I can pin it down as the
absolute cause of a defect.
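
For discussion, the wait()/notify() idea could look roughly like the sketch below. It is not the JdbcDataSource code: BoundedPool, acquire() and release() are hypothetical names, and the pool holds arbitrary objects rather than real JDBC connections. A caller blocks on the monitor until another thread returns a resource or the timeout expires, instead of repeatedly retrying.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the Mordred JdbcDataSource code.
final class BoundedPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();

    BoundedPool(Iterable<T> resources) {
        for (T r : resources) {
            idle.push(r);
        }
    }

    /** Waits up to timeoutMillis for a free resource; returns null on timeout. */
    synchronized T acquire(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (idle.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // caller decides whether to raise an error
            }
            // Releases the monitor while waiting; woken by release() or
            // by the timeout. The loop guards against spurious wakeups.
            wait(remaining);
        }
        return idle.pop();
    }

    /** Returns a resource to the pool and wakes any waiting callers. */
    synchronized void release(T resource) {
        idle.push(resource);
        notifyAll();
    }
}
```

With something like this, a thread calling acquire(5000) wakes as soon as a connection is released rather than at its next retry, and gives up only when the full window has passed without one becoming free.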

	--- Noel

-----Original Message-----
From: Serge Knystautas [mailto:sergek@lokitech.com]
Sent: Wednesday, October 30, 2002 7:13
To: James Developers List
Subject: Re: [PROPOSAL] James Interrupted (and Exceptions)


Danny Angus wrote:
>>During testing of Peter's interrupt() patch, I did some testing by turning
>>off the database server during operation.  That didn't require his
>>particular patch, but turning the server back on (and watching James NOT
>>reconnect) reminds me that we need to work on reconnecting with
>>the database
>>(not for rev 2.1).
>
>
> I have a database connection pool package which does re-connect, if this
isn't a config issue I'll compare mordred's logic with my own and see whats
what.
>
> d.

Mordred's logic is pretty weak.  At this point I would suggest wrapping
an Avalon configuration around DBCP and use that since it can also
expose javax.sql.DataSource via JNDI, which for my money would be the
best way to let mailets get database connections in a standard way.

--
Serge Knystautas
Loki Technologies - Unstoppable Websites
http://www.lokitech.com


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: [PROPOSAL] James Interrupted (and Exceptions)

Posted by Danny Angus <da...@apache.org>.
Always in favour of eating our own dogfood, +1 for Commons-DBCP.
d.

> -----Original Message-----
> From: Serge Knystautas [mailto:sergek@lokitech.com]

... I would suggest ... DBCP 




Re: [PROPOSAL] James Interrupted (and Exceptions)

Posted by Serge Knystautas <se...@lokitech.com>.
Danny Angus wrote:
>>During testing of Peter's interrupt() patch, I did some testing by turning
>>off the database server during operation.  That didn't require his
>>particular patch, but turning the server back on (and watching James NOT
>>reconnect) reminds me that we need to work on reconnecting with 
>>the database
>>(not for rev 2.1).
> 
> 
> I have a database connection pool package which does re-connect, if this isn't a config issue I'll compare mordred's logic with my own and see whats what.
> 
> d.

Mordred's logic is pretty weak.  At this point I would suggest wrapping 
an Avalon configuration around DBCP and use that since it can also 
expose javax.sql.DataSource via JNDI, which for my money would be the 
best way to let mailets get database connections in a standard way.

-- 
Serge Knystautas
Loki Technologies - Unstoppable Websites
http://www.lokitech.com




RE: [PROPOSAL] James Interrupted (and Exceptions)

Posted by Danny Angus <da...@apache.org>.
> During testing of Peter's interrupt() patch, I did some testing by turning
> off the database server during operation.  That didn't require his
> particular patch, but turning the server back on (and watching James NOT
> reconnect) reminds me that we need to work on reconnecting with 
> the database
> (not for rev 2.1).

I have a database connection pool package which does re-connect, if this isn't a config issue I'll compare mordred's logic with my own and see whats what.

d.

