You are viewing a plain text version of this content. The canonical link for it is here.
Posted to server-user@james.apache.org by Michael Nestler <mn...@whoglue.com> on 2004/02/13 15:18:46 UTC
James Running Out Of Connections
Hello,
I am running James 2.1.3 under Suse Linux 9.0 (kernel 2.4.21) and Sun
JDK 1.4.2_03 in a production email forwarding system. From time to time
(twice a day to once every few days) James runs out of connections as
shown in the attached part of the connections log file. Searching the
mailing list archive I found that there was a Linux 2.4.5 bug (?) that
kept Java applications from closing sockets (?).
Do you think this bug is the cause of the problem? Could it be something
else? If not, is there a workaround or should I switch to Windows - or
would it help already to increase the connection pool size and to reduce
the idle timeout for connections?
Thanks in advance,
Michael
--
Michael Nestler
Director of Technology,
WhoGlue, Inc.
Re: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
You should try to increase the number of connections on config.xml, I
have the same setup
LInux + Java and works fine, try to see if you really have all that
connections with
netstat -n and search for connection on port 25
[]s
On Fri, 2004-02-13 at 12:18, Michael Nestler wrote:
> Hello,
>
> I am running James 2.1.3 under Suse Linux 9.0 (kernel 2.4.21) and Sun
> JDK 1.4.2_03 in a production email forwarding system. From time to time
> (twice a day to once every few days) James runs out of connections as
> shown in the attached part of the connections log file. Searching the
> mailing list archive I found that there was a Linux 2.4.5 bug (?) that
> kept Java applications from closing sockets (?).
>
> Do you think this bug is the cause of the problem? Could it be something
> else? If not, is there a workaround or should I switch to Windows - or
> would it help already to increase the connection pool size and to reduce
> the idle timeout for connections?
>
> Thanks in advance,
> Michael
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
Thank you - I will try to find and run it.
Michael
Cesar Bonadio wrote:
>Just another tip
>
>You can use a program called smtp-source that comes with postfix to send
>email to your server
>using several session you just start the program and indicates how many
>sessions you want
>and the program opens as many simultaneous sessions and send a test
>email. This way
>you can make sure that the problem is not related to James and how James
>will work in such load
>
>[]s
>
>
>
>On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
>
>
>
>>I will send the netstat output if it should happen again. Yes, I had a
>>connection limit of 30 (increased it now to 120) - but the connections
>>were not release for 6 hours. James started logging
>>
>>12/02/04 02:29:38 WARN connections: Maximum number of open connections
>>exceeded - refusing connection. Current number of connections is 30
>>
>>and continued doing this once every few seconds until I stopped it when
>>I came to work in the morning:
>>
>>12/02/04 08:11:19 WARN connections: Maximum number of open connections
>>exceeded - refusing connection. Current number of connections is 30
>>12/02/04 08:11:21 DEBUG connections: Disposing server
>>connection...org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
>>up clients - org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Cleaned up clients -
>>org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Disposing server
>>connection...org.apache.james.util.connection.ServerConnection@746ad0
>>12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
>>up clients - org.apache.james.util.connection.ServerConnection@746ad0
>>
>>So, yes it used all connections in the pool - but they were never
>>released. That is my problem.
>>
>>Michael
>>
>>
>>
>>Noel J. Bergman wrote:
>>
>>
>>
>>>Michael,
>>>
>>>You can use netstat | grep smtp to check, but I suspect that James is
>>>working as designed. It appears that you have a connection limit of 30, and
>>>really do have 30 simultaneously active connections.
>>>
>>> --- Noel
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>>For additional commands, e-mail: server-user-help@james.apache.org
>>>
>>>
>>>
>>>
>>>
>>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
>>
>>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
> That sounds like a problem we may have already fixed. I'd like to know what you see with the current test build. If it isn't fixed in the test build, we should be able to modify the instrumentation to help track this problem.
>
> --- Noel
>
Ok
I will try to put the test build to run in one server
[]s
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
On Thu, 2004-04-15 at 13:02, Noel J. Bergman wrote:
> > I am not using mordred, I configured to use the other method
>
> Oh? When did you make that change?
when I installed the build version, the other version 2.1.3 is still
running with modred
> Only just now? Are the other systems that have this problem still using mordred?
I will try to change back to modred and see if the problem occours
again?
>
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am not using mordred, I configured to use the other method
Oh? When did you make that change? Only just now? Are the other systems that have this problem still using mordred?
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
I am not using mordred, I configured to use the other method
On Thu, 2004-04-15 at 09:48, Noel J. Bergman wrote:
> Cesar Bonadio wrote:
> > I put the test build to run in our server and so far the
> > problem did not happen again
>
> > I will keep check until tomorrow, if everything is fine I will
> > install this version in all of our server
>
> Good to hear.
>
> Unfortunately, I am seeing this problem consistently now with the current code in CVS. The only change I can think of that might have effected it is, oddly, a change to mordred. I don't think that any other change has been anywhere near this code path, although I'm looking.
>
> I've made some logging changes to help track this down.
>
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
On Fri, 2004-04-16 at 12:55, Noel J. Bergman wrote:
> > I am running 1.4.2_03 and its working fine too
>
> With which code? 2.2.0a18 or the code currently in CVS? I hope to post a
2.2.0a18
[]s
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am running 1.4.2_03 and its working fine too
With which code? 2.2.0a18 or the code currently in CVS? I hope to post a
new test build late today with everything planned for 2.2.0 except for DSN
format bounces, so hopefully you can verify that it continues to work well
in your environment.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
>
> Seems to be running fine with 1.4.2_04, using the same code that was crashing consistently with 1.4.1_02.
>
I am running 1.4.2_03 and its working fine too
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> > I did a thread dump, and could see the threads waiting on
> > the synchronized pool object. The odd thing is that I
> > could not find anything that owned it.
> Apparently, this is a defect in Sun JVM 1.4.0 and 1.4.1. Testing
> with 1.4.2 now. The problem has been occuring every 6 hours, so
> we'll see in the AM.
Seems to be running fine with 1.4.2_04, using the same code that was crashing consistently with 1.4.1_02.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> I did a thread dump, and could see the threads waiting on
> the synchronized pool object. The odd thing is that I
> could not find anything that owned it.
Apparently, this is a defect in Sun JVM 1.4.0 and 1.4.1. Testing with 1.4.2 now. The problem has been occuring every 6 hours, so we'll see in the AM.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am using org.apache.james.util.mordred.JdbcDataSource
> So the test build with mordred is working just fine for me.
With the changes submitted yesterday to fix a deadlock problem, I was seeing consistent failures within about 6 hours. I did a thread dump, and could see the threads waiting on the synchronized pool object. The odd thing is that I could not find anything that owned it.
I am testing a change that uses an ArrayList instead of a Vector (inherently synchromized), and ensures that all uses of the pool are explicitly synchronized. One conjecture from a friend is that there might be a bug in the JVM I'm using (one of the Sun 1.4.1 vintages), and that simplifying the synchronization could relieve it.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
Sorry my mistake
I am using
org.apache.james.util.mordred.JdbcDataSource
I just checked the config.xml in the Test Build server and the database
is using mordred.
So the test build with mordred is working just fine for me.
[]s
On Thu, 2004-04-15 at 14:13, Noel J. Bergman wrote:
> > The only change I can think of that might have effected it is, oddly,
> > a change to mordred. I don't think that any other change has been
> > anywhere near this code path, although I'm looking.
>
> I'm very much leaning towards that being the cause. When you encounter this problem, look in Phoenix's temp/ directory, and see if there are a bunch of .m64 files that correspond with when the problem starts.
>
> My current hypothesis is that a problem in mordred is causing all of the calling threads to hang when they try to save to the spool. One thread must be stuck, and all of the others are blocked on a synchronize. I had never encounted this before, but have several times since making a change to mordred yesterday.
>
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> The only change I can think of that might have effected it is, oddly,
> a change to mordred. I don't think that any other change has been
> anywhere near this code path, although I'm looking.
I'm very much leaning towards that being the cause. When you encounter this problem, look in Phoenix's temp/ directory, and see if there are a bunch of .m64 files that correspond with when the problem starts.
My current hypothesis is that a problem in mordred is causing all of the calling threads to hang when they try to save to the spool. One thread must be stuck, and all of the others are blocked on a synchronize. I had never encounted this before, but have several times since making a change to mordred yesterday.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
Cesar Bonadio wrote:
> I put the test build to run in our server and so far the
> problem did not happen again
> I will keep check until tomorrow, if everything is fine I will
> install this version in all of our server
Good to hear.
Unfortunately, I am seeing this problem consistently now with the current code in CVS. The only change I can think of that might have effected it is, oddly, a change to mordred. I don't think that any other change has been anywhere near this code path, although I'm looking.
I've made some logging changes to help track this down.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
I put the test build to run in our server and so far the problem did not
happen again
the connection log shows a smaller number of connections than netstat
what is correct
Usually I would aready have more connections in the connection log than
in netstat after
12 hours of server up.
I will keep check until tomorrow, if everything is fine I will install
this version in
all of our server
[]s
On Thu, 2004-04-15 at 00:57, Noel J. Bergman wrote:
> I encountered this problem tonight, and have opened up a JIRA issue to track this problem: http://nagoya.apache.org/jira/browse/JAMES-254
>
> Unfortunately, I reset the server before checking the state. I did review the logs, and I have a few ideas about what might be causing it, but there wasn't sufficient information in the logs to be conclusive. I am making some changes to the logging, so that exceptions that terminate the connection are logged at ERROR instead of DEBUG level.
>
> Some of you have apparently been encountering this problem all-too-frequently, and already have DEBUG enabled for the SMTPHandler. What I'd like for you to do is check the smtpserver log to see if you have a bunch of messages of the form:
>
> $DATE ERROR smtpserver: SMTP Connection has idled out.
>
> but without a subsequent exception, which might be of the form:
>
> $DATE DEBUG smtpserver: Socket to $remoteHost ($remoteIP) timeout.
>
> Also, if the connection log says that you are maxed, and netstat --inet does not show that many connections, do a KILL -3 on the james process so that we can get a thread dump, before you restart James.
>
> Right now I am letting a connection idle out before the HELO to see what it does. If that works properly, I'll let it idle out after the HELO, and will walk my way through until I reproduce the problem.
>
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
I encountered this problem tonight, and have opened up a JIRA issue to track this problem: http://nagoya.apache.org/jira/browse/JAMES-254
Unfortunately, I reset the server before checking the state. I did review the logs, and I have a few ideas about what might be causing it, but there wasn't sufficient information in the logs to be conclusive. I am making some changes to the logging, so that exceptions that terminate the connection are logged at ERROR instead of DEBUG level.
Some of you have apparently been encountering this problem all-too-frequently, and already have DEBUG enabled for the SMTPHandler. What I'd like for you to do is check the smtpserver log to see if you have a bunch of messages of the form:
$DATE ERROR smtpserver: SMTP Connection has idled out.
but without a subsequent exception, which might be of the form:
$DATE DEBUG smtpserver: Socket to $remoteHost ($remoteIP) timeout.
Also, if the connection log says that you are maxed, and netstat --inet does not show that many connections, do a KILL -3 on the james process so that we can get a thread dump, before you restart James.
Right now I am letting a connection idle out before the HELO to see what it does. If that works properly, I'll let it idle out after the HELO, and will walk my way through until I reproduce the problem.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
Please do not use HTML e-mail.
> > Have you run netstat to check the active connections?
> Yes there was 30 ESTABLISHED connections with 26 in the SMTP service
OK, if they are ESTABLISHED, with whom are they ESTABLISHED? That does not sound like you have a lot of "hung" connections.
> smtpserver-2004-04-13-10-53.log:13/04/04 10:53:29 DEBUG smtpserver: Calling reset() default Worker #40
> ......... 34576 times the same message
Hmm ... if you are saying that you had 34576 messages in one second, that should eliminate data arriving that quickly.
> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Calling stop() default Worker #40
> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Calling disposeWatchdog() default Worker #40
> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Watchdog default Worker #40 is exiting run().
OK, that was alive for 30 seconds, and appears to have cleaned up just fine. If you knew to whom that was connected, you could find out what they were doing.
> What I find is that when I start the server the number of
> connection in the connection log is smaller but closer to
> the number os connections with NETSTAT but after 2-3 hours
> the number of connections in the connection log starts to grow
> little by little and passes the number of connections in NETSTAT
That sounds like a problem we may have already fixed. I'd like to know what you see with the current test build. If it isn't fixed in the test build, we should be able to modify the instrumentation to help track this problem.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
On Tue, 2004-04-13 at 13:31, Noel J. Bergman wrote:
> Which version of James?
2.1.3
> Have you run netstat to check the active connections?
Yes there was 30 ESTABLISHED connections with 26 in the SMTP service
> Is it possible that your service is simply being hit by lots of connections today from internet worms?
Not really the system works fine today
>
> > in the smtp log hundreds of messages like the following:
> > 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> > 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> > ...
> > Any idea if this can be the source of the problem
>
> Actually, that could be helpful. That message is emitted by the Watchdog for that connection when it is reset. The reset happens on one of two conditions:
>
> - a new command is entered
> - a certain amount (20K default) of data is received during
> processing of the DATA command
The messages started as:
smtpserver-2004-04-13-10-53.log:13/04/04 10:53:27 DEBUG smtpserver:
Watchdog default Worker #40 has time to sleep 360000
smtpserver-2004-04-13-10-53.log:13/04/04 10:53:28 DEBUG smtpserver:
Calling reset() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:53:29 DEBUG smtpserver:
Calling reset() default Worker #40
......... 34576 times the same message
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Calling stop() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Calling disposeWatchdog() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Watchdog default Worker #40 is exiting run().
then it resume with normal operation
What I find is that when I start the server the number of connection in
the connection log is smaller but closer to the
number os connections with NETSTAT but after 2-3 hours the number of
connections in the connection log starts to grow
little by little and passes the number of connections in NETSTAT
Like now I have in the connection log
13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 62
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool
13/04/04 16:41:39 DEBUG connections: Retrieving a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
from the pool
13/04/04 16:41:39 DEBUG connections: Adding one connection for a total
of 63
13/04/04 16:41:39 DEBUG connections: Starting connection on
200.229.128.62:25 from 200.77.160.40:3805
13/04/04 16:41:39 DEBUG connections: Ending connection on
200.229.128.62:25 from 200.247.6.3:1409
13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 62
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool
13/04/04 16:41:39 DEBUG connections: Ending connection on
200.229.128.62:25 from 67.64.81.156:10326
13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 61
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool
But NETSTAT says:
[root@mx05 logs]# netstat -n | grep ESTA | wc -l
41
[root@mx05 logs]# date
Tue Apr 13 16:41:50 BRT 2004
So after 2 days my server starts to refuse connections because it thinks
that there are more than 200 simultaneous connections what is
not correct.
Any idea on how to debug more information about that?
Thanks
>
> Either someone is entering 100s of lines, they are sending you a huge message in a short period of time, or there is a really weird bug in the code. You showed 7 entries at 10:58:24, so that would imply 140K in one second. Are all of the 100s of entries at the same time? What is the rate at which they are logged?
>
> I'm leaning towards someone sending a blank command, which would not be logged but would result in the watchdog being reset.
>
> --- Noel
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
Which version of James? Have you run netstat to check the active connections? Is it possible that your service is simply being hit by lots of connections today from internet worms?
> in the smtp log hundreds of messages like the following:
> 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> ...
> Any idea if this can be the source of the problem
Actually, that could be helpful. That message is emitted by the Watchdog for that connection when it is reset. The reset happens on one of two conditions:
- a new command is entered
- a certain amount (20K default) of data is received during
processing of the DATA command
Either someone is entering 100s of lines, they are sending you a huge message in a short period of time, or there is a really weird bug in the code. You showed 7 entries at 10:58:24, so that would imply 140K in one second. Are all of the 100s of entries at the same time? What is the rate at which they are logged?
I'm leaning towards someone sending a blank command, which would not be logged but would result in the watchdog being reset.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
Hello
I still have the problema of james running out of connection, today
I start a new debug session and found in the smtp log hundreds of
messages
like the following:
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
the interesting is that some messages only occours a few times but this
work #40 appears several hundred times.
Any idea if this can be the source of the problem
Thanks
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
> You can use a program called smtp-source that comes with
> postfix to send email to your server using several session
I use Russell Coker's "postal" to do the same thing. :-)
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
Re: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
Just another tip
You can use a program called smtp-source that comes with postfix to send
email to your server
using several session you just start the program and indicates how many
sessions you want
and the program opens as many simultaneous sessions and send a test
email. This way
you can make sure that the problem is not related to James and how James
will work in such load
[]s
On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
> I will send the netstat output if it should happen again. Yes, I had a
> connection limit of 30 (increased it now to 120) - but the connections
> were not release for 6 hours. James started logging
>
> 12/02/04 02:29:38 WARN connections: Maximum number of open connections
> exceeded - refusing connection. Current number of connections is 30
>
> and continued doing this once every few seconds until I stopped it when
> I came to work in the morning:
>
> 12/02/04 08:11:19 WARN connections: Maximum number of open connections
> exceeded - refusing connection. Current number of connections is 30
> 12/02/04 08:11:21 DEBUG connections: Disposing server
> connection...org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
> up clients - org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Cleaned up clients -
> org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Disposing server
> connection...org.apache.james.util.connection.ServerConnection@746ad0
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
> up clients - org.apache.james.util.connection.ServerConnection@746ad0
>
> So, yes it used all connections in the pool - but they were never
> released. That is my problem.
>
> Michael
>
>
>
> Noel J. Bergman wrote:
>
> >Michael,
> >
> >You can use netstat | grep smtp to check, but I suspect that James is
> >working as designed. It appears that you have a connection limit of 30, and
> >really do have 30 simultaneously active connections.
> >
> > --- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
Cesar Bonadio wrote:
>On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
>
>
>
>>I will send the netstat output if it should happen again. Yes, I had a
>>connection limit of 30 (increased it now to 120) - but the connections
>>were not release for 6 hours. James started logging
>>
>>
>>
>
>
>Make sure you have anougth threads configured to open 120 connections
>
>
>
I set max-threads to 150. I guess that is sufficient - but please
correct me if I should be wrong.
Michael
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
Re: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
> I will send the netstat output if it should happen again. Yes, I had a
> connection limit of 30 (increased it now to 120) - but the connections
> were not release for 6 hours. James started logging
>
Make sure you have anougth threads configured to open 120 connections
> 12/02/04 02:29:38 WARN connections: Maximum number of open connections
> exceeded - refusing connection. Current number of connections is 30
>
> and continued doing this once every few seconds until I stopped it when
> I came to work in the morning:
>
> 12/02/04 08:11:19 WARN connections: Maximum number of open connections
> exceeded - refusing connection. Current number of connections is 30
> 12/02/04 08:11:21 DEBUG connections: Disposing server
> connection...org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
> up clients - org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Cleaned up clients -
> org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Disposing server
> connection...org.apache.james.util.connection.ServerConnection@746ad0
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
> up clients - org.apache.james.util.connection.ServerConnection@746ad0
>
> So, yes it used all connections in the pool - but they were never
> released. That is my problem.
>
> Michael
>
>
>
> Noel J. Bergman wrote:
>
> >Michael,
> >
> >You can use netstat | grep smtp to check, but I suspect that James is
> >working as designed. It appears that you have a connection limit of 30, and
> >really do have 30 simultaneously active connections.
> >
> > --- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
Re: James Running Out Of Connections
Posted by Robert J Taylor <rj...@cox.net>.
What was the resolution of this problem?
I'm running James 2.1.2 on:
Linux pjemail1.jazel.net 2.4.21-4.0.1.ELsmp #1 SMP Thu Oct 23 01:21:42
EDT 2003 i686 athlon i386 GNU/Linux
Java:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
Thanks,
Robert Taylor
Cesar Bonadio wrote:
>
> java version "1.4.2_03"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
> Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
>
> And Linux 2.4
>
>
> On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:
>
>
>>Phew - at least I am not the only one who has this problem ;-) Are you
>>using Linux 2.4 as well?
>>
>>I also did a load test before deploying James into production and I was
>>able to deliver about 100,000 emails within 8 hours without any issue.
>>The issue seems to occur only sporadically - sometimes it takes only a
>>few hours, sometimes a few days. But once it hangs, it does not come
>>back in my case (at least not within 6 hours).
>>
>>I also had the issue of James suddenly "partially" halting before I
>>upgraded from file to db persistence. I don't know though if this was
>>the same problem.
>>
>>My connections log level was set to DEBUG when the log was made.
>>
>>I did not have any ESTABLISHED smtp connection while James was hanging -
>>almost all had status TIME_WAIT.
>>
>>Michael
>>
>>
>>Cesar Bonadio wrote:
>>
>>
>>>I think you are not alone :)
>>>
>>>In the last five days I have found several messages in the connection
>>>log with the same
>>>problem my server stops responding but after a while it comes back and
>>>if I run a netstat when
>>>I am receiving the error I really dont have more than 20 ESTABLISED
>>>connections.
>>>
>>>This is interesting problem because in our test server running the same
>>>version of James and java
>>>a manage to deliver more than 20.000 messages with smtp-source and the
>>>connections works just fine
>>>if I go over it complains and releases the connections after use.
>>>
>>>I will put my connection log in debug mode to see if I get something
>>>
>>>
>>>[]s
>>>
>>>
>>>
>>>
>>>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>>>
>>>
>>>
>>>
>>>>OK, James stopped again last night at about 1:50 am. I did not have the
>>>>time yet to run postal or smtp-source. But this time I have a log
>>>>showing smtp sockets with a 10 minute resolution. Since the last stop I
>>>>increased the maximum number of connections to 120 (thread number to 150).
>>>>
>>>>At 1:50 James reported in connections*.log:
>>>>
>>>>17/02/04 01:50:00 WARN connections: Maximum number of open connections
>>>>exceeded - refusing connection. Current number of connections is 120
>>>>
>>>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
>>>>job "netstat | grep smtp") shows for the same time:
>>>>
>>>>Tue Feb 17 01:50:00 EST 2004
>>>>tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
>>>>SYN_RECV
>>>>tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
>>>>CLOSE_WAIT
>>>>tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp server611.colonize:3926
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
>>>>ESTABLISHED
>>>>tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
>>>>CLOSE_WAIT
>>>>tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
>>>>CLOSE_WAIT
>>>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
>>>>ESTABLISHED
>>>>tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
>>>>ESTABLISHED
>>>>tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
>>>>CLOSE_WAIT
>>>>
>>>>
>>>>And at the next time the netstat snapshot was made:
>>>>
>>>>Tue Feb 17 02:00:00 EST 2004
>>>>tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
>>>>TIME_WAIT
>>>>tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
>>>>TIME_WAIT
>>>>
>>>>
>>>>>From there on the smtp sockets log file basically shows only TIME_WAIT
>>>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
>>>>Never netstat shows 120 open smtp connections (what James seems to
>>>>believe). As a strange aside the default*.log contains a database
>>>>connection exception (it contained the same exception at the last time
>>>>James stopped 6 days ago). I don't know if this means anything - could
>>>>this exception have caused the problem? spoolmanager*.log and
>>>>mailet*.log stopped recording at 1:32 am.
>>>>
>>>>To summarize the events from my point of view:
>>>>o James was running for about 6 days without any problem, delivering
>>>>about 5,000-10,000 emails a day
>>>>o some event at a few minutes before 1:50 am stopped James from
>>>>returning used smtp connection socket objects to the pool starting at
>>>>that time
>>>>o as a consequence new connections were not accepted and the service
>>>>stopped effectively at 1:32 am
>>>>o it never recovered from that situation until I relieved it by
>>>>restarting phoenix at 8:29 am
>>>>
>>>>>From my point of view it seems to be likely that the mentioned
>>>>exception in default*.log caused the problem, since it happened so close
>>>>to the time James stopped:
>>>>
>>>>17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
>>>>is way too old: 63408 > 60000
>>>>17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
>>>> at
>>>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>>>> at
>>>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
>>>>
>>>> at
>>>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
>>>>
>>>> at
>>>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
>>>>
>>>> at
>>>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
>>>>
>>>> at
>>>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
>>>>
>>>> at
>>>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
>>>>
>>>> at
>>>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
>>>>
>>>> at
>>>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
>>>>
>>>> at
>>>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
>>>>
>>>>
>>>>On the other hand it could be just a side effect. I attached the
>>>>mentioned log files since midnight to this email in zipped format.
>>>>
>>>>I really hope somebody on this list can make any sense out of this and
>>>>knows what's going on here and how I can fix it. Let me know if I should
>>>>run postal or smtp-source first and I will do it.
>>>>
>>>>All help is greatly appreciated.
>>>>
>>>>Best regards,
>>>>Michael
>>>>
>>>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org
>>>>bounced it saying "552 we don't accept email with executable content
>>>>(#5.3.4)" ? I try it again as TGZ.
>>>>
>>>>
>>>>
>>>>Noel J. Bergman wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>Run netstat and check to see if they are real. I have tested the connection
>>>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>>>amongst other versions.
>>>>>
>>>>>What version of James?
>>>>>
>>>>> --- Noel
>>>>>
>>>>>
>>>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
>
>
Re: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
And Linux 2.4
On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:
> Phew - at least I am not the only one who has this problem ;-) Are you
> using Linux 2.4 as well?
>
> I also did a load test before deploying James into production and I was
> able to deliver about 100,000 emails within 8 hours without any issue.
> The issue seems to occur only sporadically - sometimes it takes only a
> few hours, sometimes a few days. But once it hangs, it does not come
> back in my case (at least not within 6 hours).
>
> I also had the issue of James suddenly "partially" halting before I
> upgraded from file to db persistence. I don't know though if this was
> the same problem.
>
> My connections log level was set to DEBUG when the log was made.
>
> I did not have any ESTABLISHED smtp connection while James was hanging -
> almost all had status TIME_WAIT.
>
> Michael
>
>
> Cesar Bonadio wrote:
>
> >I think you are not alone :)
> >
> >In the last five days I have found several messages in the connection
> >log with the same
> >problem my server stops responding but after a while it comes back and
> >if I run a netstat when
> >I am receiving the error I really dont have more than 20 ESTABLISED
> >connections.
> >
> >This is interesting problem because in our test server running the same
> >version of James and java
> >a manage to deliver more than 20.000 messages with smtp-source and the
> >connections works just fine
> >if I go over it complains and releases the connections after use.
> >
> >I will put my connection log in debug mode to see if I get something
> >
> >
> >[]s
> >
> >
> >
> >
> >On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
> >
> >
> >
> >>OK, James stopped again last night at about 1:50 am. I did not have the
> >>time yet to run postal or smtp-source. But this time I have a log
> >>showing smtp sockets with a 10 minute resolution. Since the last stop I
> >>increased the maximum number of connections to 120 (thread number to 150).
> >>
> >>At 1:50 James reported in connections*.log:
> >>
> >>17/02/04 01:50:00 WARN connections: Maximum number of open connections
> >>exceeded - refusing connection. Current number of connections is 120
> >>
> >>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
> >>job "netstat | grep smtp") shows for the same time:
> >>
> >>Tue Feb 17 01:50:00 EST 2004
> >>tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
> >>SYN_RECV
> >>tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
> >>CLOSE_WAIT
> >>tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp server611.colonize:3926
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
> >>ESTABLISHED
> >>tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
> >>CLOSE_WAIT
> >>tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
> >>CLOSE_WAIT
> >>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
> >>ESTABLISHED
> >>tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
> >>ESTABLISHED
> >>tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
> >>CLOSE_WAIT
> >>
> >>
> >>And at the next time the netstat snapshot was made:
> >>
> >>Tue Feb 17 02:00:00 EST 2004
> >>tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
> >>TIME_WAIT
> >>tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
> >>TIME_WAIT
> >>
> >>
> >> From there on the smtp sockets log file basically shows only TIME_WAIT
> >>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
> >>Never netstat shows 120 open smtp connections (what James seems to
> >>believe). As a strange aside the default*.log contains a database
> >>connection exception (it contained the same exception at the last time
> >>James stopped 6 days ago). I don't know if this means anything - could
> >>this exception have caused the problem? spoolmanager*.log and
> >>mailet*.log stopped recording at 1:32 am.
> >>
> >>To summarize the events from my point of view:
> >>o James was running for about 6 days without any problem, delivering
> >>about 5,000-10,000 emails a day
> >>o some event at a few minutes before 1:50 am stopped James from
> >>returning used smtp connection socket objects to the pool starting at
> >>that time
> >>o as a consequence new connections were not accepted and the service
> >>stopped effectively at 1:32 am
> >>o it never recovered from that situation until I relieved it by
> >>restarting phoenix at 8:29 am
> >>
> >> From my point of view it seems to be likely that the mentioned
> >>exception in default*.log caused the problem, since it happened so close
> >>to the time James stopped:
> >>
> >>17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
> >>is way too old: 63408 > 60000
> >>17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
> >> at
> >>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
> >> at
> >>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
> >>
> >> at
> >>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
> >>
> >> at
> >>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
> >>
> >> at
> >>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
> >>
> >> at
> >>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
> >>
> >> at
> >>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
> >>
> >> at
> >>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
> >>
> >> at
> >>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
> >>
> >> at
> >>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
> >>
> >>
> >>On the other hand it could be just a side effect. I attached the
> >>mentioned log files since midnight to this email in zipped format.
> >>
> >>I really hope somebody on this list can make any sense out of this and
> >>knows what's going on here and how I can fix it. Let me know if I should
> >>run postal or smtp-source first and I will do it.
> >>
> >>All help is greatly appreciated.
> >>
> >>Best regards,
> >>Michael
> >>
> >>P.S. I was not able to attach a ZIP file to this email. mail.apache.org
> >>bounced it saying "552 we don't accept email with executable content
> >>(#5.3.4)" ? I try it again as TGZ.
> >>
> >>
> >>
> >>Noel J. Bergman wrote:
> >>
> >>
> >>
> >>>Run netstat and check to see if they are real. I have tested the connection
> >>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
> >>>amongst other versions.
> >>>
> >>>What version of James?
> >>>
> >>> --- Noel
> >>>
> >>>
> >>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
Phew - at least I am not the only one who has this problem ;-) Are you
using Linux 2.4 as well?
I also did a load test before deploying James into production and I was
able to deliver about 100,000 emails within 8 hours without any issue.
The issue seems to occur only sporadically - sometimes it takes only a
few hours, sometimes a few days. But once it hangs, it does not come
back in my case (at least not within 6 hours).
I also had the issue of James suddenly "partially" halting before I
upgraded from file to db persistence. I don't know though if this was
the same problem.
My connections log level was set to DEBUG when the log was made.
I did not have any ESTABLISHED smtp connection while James was hanging -
almost all had status TIME_WAIT.
Michael
Cesar Bonadio wrote:
>I think you are not alone :)
>
>In the last five days I have found several messages in the connection
>log with the same
>problem my server stops responding but after a while it comes back and
>if I run a netstat when
>I am receiving the error I really dont have more than 20 ESTABLISED
>connections.
>
>This is interesting problem because in our test server running the same
>version of James and java
>a manage to deliver more than 20.000 messages with smtp-source and the
>connections works just fine
>if I go over it complains and releases the connections after use.
>
>I will put my connection log in debug mode to see if I get something
>
>
>[]s
>
>
>
>
>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>
>
>
>>OK, James stopped again last night at about 1:50 am. I did not have the
>>time yet to run postal or smtp-source. But this time I have a log
>>showing smtp sockets with a 10 minute resolution. Since the last stop I
>>increased the maximum number of connections to 120 (thread number to 150).
>>
>>At 1:50 James reported in connections*.log:
>>
>>17/02/04 01:50:00 WARN connections: Maximum number of open connections
>>exceeded - refusing connection. Current number of connections is 120
>>
>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
>>job "netstat | grep smtp") shows for the same time:
>>
>>Tue Feb 17 01:50:00 EST 2004
>>tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
>>SYN_RECV
>>tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
>>CLOSE_WAIT
>>tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp server611.colonize:3926
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
>>ESTABLISHED
>>tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
>>CLOSE_WAIT
>>tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
>>CLOSE_WAIT
>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
>>ESTABLISHED
>>tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
>>ESTABLISHED
>>tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
>>CLOSE_WAIT
>>
>>
>>And at the next time the netstat snapshot was made:
>>
>>Tue Feb 17 02:00:00 EST 2004
>>tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
>>TIME_WAIT
>>tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
>>TIME_WAIT
>>
>>
>> From there on the smtp sockets log file basically shows only TIME_WAIT
>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
>>Never netstat shows 120 open smtp connections (what James seems to
>>believe). As a strange aside the default*.log contains a database
>>connection exception (it contained the same exception at the last time
>>James stopped 6 days ago). I don't know if this means anything - could
>>this exception have caused the problem? spoolmanager*.log and
>>mailet*.log stopped recording at 1:32 am.
>>
>>To summarize the events from my point of view:
>>o James was running for about 6 days without any problem, delivering
>>about 5,000-10,000 emails a day
>>o some event at a few minutes before 1:50 am stopped James from
>>returning used smtp connection socket objects to the pool starting at
>>that time
>>o as a consequence new connections were not accepted and the service
>>stopped effectively at 1:32 am
>>o it never recovered from that situation until I relieved it by
>>restarting phoenix at 8:29 am
>>
>> From my point of view it seems to be likely that the mentioned
>>exception in default*.log caused the problem, since it happened so close
>>to the time James stopped:
>>
>>17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
>>is way too old: 63408 > 60000
>>17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
>> at
>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>> at
>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
>>
>> at
>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
>>
>> at
>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
>>
>> at
>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
>>
>> at
>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
>>
>> at
>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
>>
>> at
>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
>>
>> at
>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
>>
>> at
>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
>>
>>
>>On the other hand it could be just a side effect. I attached the
>>mentioned log files since midnight to this email in zipped format.
>>
>>I really hope somebody on this list can make any sense out of this and
>>knows what's going on here and how I can fix it. Let me know if I should
>>run postal or smtp-source first and I will do it.
>>
>>All help is greatly appreciated.
>>
>>Best regards,
>>Michael
>>
>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org
>>bounced it saying "552 we don't accept email with executable content
>>(#5.3.4)" ? I try it again as TGZ.
>>
>>
>>
>>Noel J. Bergman wrote:
>>
>>
>>
>>>Run netstat and check to see if they are real. I have tested the connection
>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>amongst other versions.
>>>
>>>What version of James?
>>>
>>> --- Noel
>>>
>>>
>>>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
Re: James Running Out Of Connections
Posted by Cesar Bonadio <bo...@picture.com.br>.
I think you are not alone :)
In the last five days I have found several messages in the connection
log with the same
problem my server stops responding but after a while it comes back and
if I run a netstat when
I am receiving the error I really dont have more than 20 ESTABLISED
connections.
This is interesting problem because in our test server running the same
version of James and java
a manage to deliver more than 20.000 messages with smtp-source and the
connections works just fine
if I go over it complains and releases the connections after use.
I will put my connection log in debug mode to see if I get something
[]s
On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
> OK, James stopped again last night at about 1:50 am. I did not have the
> time yet to run postal or smtp-source. But this time I have a log
> showing smtp sockets with a 10 minute resolution. Since the last stop I
> increased the maximum number of connections to 120 (thread number to 150).
>
> At 1:50 James reported in connections*.log:
>
> 17/02/04 01:50:00 WARN connections: Maximum number of open connections
> exceeded - refusing connection. Current number of connections is 120
>
> repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
> job "netstat | grep smtp") shows for the same time:
>
> Tue Feb 17 01:50:00 EST 2004
> tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
> SYN_RECV
> tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
> CLOSE_WAIT
> tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp server611.colonize:3926
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
> ESTABLISHED
> tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
> CLOSE_WAIT
> tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
> CLOSE_WAIT
> tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
> ESTABLISHED
> tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
> ESTABLISHED
> tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
> CLOSE_WAIT
>
>
> And at the next time the netstat snapshot was made:
>
> Tue Feb 17 02:00:00 EST 2004
> tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
> TIME_WAIT
> tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
> TIME_WAIT
>
>
> From there on the smtp sockets log file basically shows only TIME_WAIT
> sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
> Never netstat shows 120 open smtp connections (what James seems to
> believe). As a strange aside the default*.log contains a database
> connection exception (it contained the same exception at the last time
> James stopped 6 days ago). I don't know if this means anything - could
> this exception have caused the problem? spoolmanager*.log and
> mailet*.log stopped recording at 1:32 am.
>
> To summarize the events from my point of view:
> o James was running for about 6 days without any problem, delivering
> about 5,000-10,000 emails a day
> o some event at a few minutes before 1:50 am stopped James from
> returning used smtp connection socket objects to the pool starting at
> that time
> o as a consequence new connections were not accepted and the service
> stopped effectively at 1:32 am
> o it never recovered from that situation until I relieved it by
> restarting phoenix at 8:29 am
>
> From my point of view it seems to be likely that the mentioned
> exception in default*.log caused the problem, since it happened so close
> to the time James stopped:
>
> 17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
> is way too old: 63408 > 60000
> 17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
> at
> org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
> at
> org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
>
> at
> org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
>
> at
> org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
>
> at
> org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
>
> at
> org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
>
> at
> org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
>
> at
> org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
>
> at
> org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
>
> at
> org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
>
>
> On the other hand it could be just a side effect. I attached the
> mentioned log files since midnight to this email in zipped format.
>
> I really hope somebody on this list can make any sense out of this and
> knows what's going on here and how I can fix it. Let me know if I should
> run postal or smtp-source first and I will do it.
>
> All help is greatly appreciated.
>
> Best regards,
> Michael
>
> P.S. I was not able to attach a ZIP file to this email. mail.apache.org
> bounced it saying "552 we don't accept email with executable content
> (#5.3.4)" ? I try it again as TGZ.
>
>
>
> Noel J. Bergman wrote:
>
> >Run netstat and check to see if they are real. I have tested the connection
> >handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
> >amongst other versions.
> >
> >What version of James?
> >
> > --- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >
> >
>
>
> ______________________________________________________________________
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
OK, James stopped again last night at about 1:50 am. I did not have the
time yet to run postal or smtp-source. But this time I have a log
showing smtp sockets with a 10 minute resolution. Since the last stop I
increased the maximum number of connections to 120 (thread number to 150).
At 1:50 James reported in connections*.log:
17/02/04 01:50:00 WARN connections: Maximum number of open connections
exceeded - refusing connection. Current number of connections is 120
repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron
job "netstat | grep smtp") shows for the same time:
Tue Feb 17 01:50:00 EST 2004
tcp 0 0 email1.whoglue.net:smtp DSL217-1:fyre-messanger
SYN_RECV
tcp 0 0 192.168.168.5:smtp vm211.vm:cruise-swroute
ESTABLISHED
tcp 1 0 192.168.168.5:smtp 220.127.103.78:idmaps
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp mta.email.contine:44071
TIME_WAIT
tcp 0 0 192.168.168.5:smtp custd530.innerhost:4074
ESTABLISHED
tcp 0 0 192.168.168.5:smtp pro109.profontis.:59902
ESTABLISHED
tcp 0 0 192.168.168.5:smtp smtp01.1800contact:4220
ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:42345
ESTABLISHED
tcp 0 0 192.168.168.5:smtp mta.email.contine:25694
ESTABLISHED
tcp 0 0 192.168.168.5:smtp stdout-mv-01.inbo:44652
TIME_WAIT
tcp 0 0 192.168.168.5:smtp 128.121.26.2:31574
ESTABLISHED
tcp 1 0 192.168.168.5:smtp ool-435798e8.dy:evb-elm
CLOSE_WAIT
tcp 1 0 192.168.168.5:smtp lsanca1-ar5:notify_srvr
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp mta113.cheetahmai:52853
TIME_WAIT
tcp 0 0 192.168.168.5:smtp 66.42.142.98:49221
ESTABLISHED
tcp 0 0 192.168.168.5:smtp server611.colonize:3926
ESTABLISHED
tcp 1 0 192.168.168.5:smtp 211-56-243-3.inter:linx
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp wchicon1.qvc.com:4611
ESTABLISHED
tcp 0 0 192.168.168.5:smtp vm125.vmadmin.com:8381
ESTABLISHED
tcp 7 0 192.168.168.5:smtp 66.28.60.195:58491
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp imo-r10.mx.aol.co:61761
ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52001
ESTABLISHED
tcp 0 0 192.168.168.5:smtp bonaire.server1.o:52272
ESTABLISHED
tcp 1 0 192.168.168.5:smtp 202.Red-213-97-41:pipes
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp n25.grp.scd.yahoo:47649
ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.98.97.183:4612
CLOSE_WAIT
tcp 1 0 192.168.168.5:smtp mail.contactel.cz:43724
CLOSE_WAIT
tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43794
ESTABLISHED
tcp 0 0 192.168.168.5:smtp mail.contactel.cz:43801
ESTABLISHED
tcp 0 0 192.168.168.5:smtp 212.143.236.10:hylafax
ESTABLISHED
tcp 0 0 192.168.168.5:smtp cust54-224.netcabo:4416
ESTABLISHED
tcp 0 0 192.168.168.5:smtp ana216.anotherday:40710
ESTABLISHED
tcp 1 0 192.168.168.5:smtp 218.232.228.130:vsixml
CLOSE_WAIT
And at the next time the netstat snapshot was made:
Tue Feb 17 02:00:00 EST 2004
tcp 0 0 192.168.168.5:smtp mta.email.contine:13622
TIME_WAIT
tcp 0 0 192.168.168.5:smtp imo-m23.mx.aol.co:63699
TIME_WAIT
tcp 0 0 192.168.168.5:smtp modemcable:conclave-cpp
TIME_WAIT
tcp 0 0 192.168.168.5:smtp 219.144.194.18:43349
TIME_WAIT
tcp 0 0 192.168.168.5:smtp 219.144.194.18:43281
TIME_WAIT
tcp 0 0 192.168.168.5:smtp c-24-7-95-19.clien:4669
TIME_WAIT
tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30519
TIME_WAIT
tcp 0 0 192.168.168.5:smtp web60804.mail.yah:30518
TIME_WAIT
tcp 0 0 192.168.168.5:smtp relay.quickhire.c:34089
TIME_WAIT
tcp 0 0 192.168.168.5:smtp ACC1B832.:ns-cfg-server
TIME_WAIT
From there on the smtp sockets log file basically shows only TIME_WAIT
sockets (there are a few SYN_RECV) until I restarted James at 8:29 am.
Never netstat shows 120 open smtp connections (what James seems to
believe). As a strange aside the default*.log contains a database
connection exception (it contained the same exception at the last time
James stopped 6 days ago). I don't know if this means anything - could
this exception have caused the problem? spoolmanager*.log and
mailet*.log stopped recording at 1:32 am.
To summarize the events from my point of view:
o James was running for about 6 days without any problem, delivering
about 5,000-10,000 emails a day
o some event at a few minutes before 1:50 am stopped James from
returning used smtp connection socket objects to the pool starting at
that time
o as a consequence new connections were not accepted and the service
stopped effectively at 1:32 am
o it never recovered from that situation until I relieved it by
restarting phoenix at 8:29 am
From my point of view it seems to be likely that the mentioned
exception in default*.log caused the problem, since it happened so close
to the time James stopped:
17/02/04 01:33:11 INFO database-connections.maildb: ***** connection 1
is way too old: 63408 > 60000
17/02/04 01:33:11 INFO database-connections.maildb: java.lang.Throwable
at
org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
at
org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178)
at
org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484)
at
org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243)
at
org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649)
at
org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413)
at
org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436)
at
org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366)
at
org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47)
at
org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80)
On the other hand it could be just a side effect. I attached the
mentioned log files since midnight to this email in zipped format.
I really hope somebody on this list can make any sense out of this and
knows what's going on here and how I can fix it. Let me know if I should
run postal or smtp-source first and I will do it.
All help is greatly appreciated.
Best regards,
Michael
P.S. I was not able to attach a ZIP file to this email. mail.apache.org
bounced it saying "552 we don't accept email with executable content
(#5.3.4)" ? I try it again as TGZ.
Noel J. Bergman wrote:
>Run netstat and check to see if they are real. I have tested the connection
>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>amongst other versions.
>
>What version of James?
>
> --- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>
>
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
"netstat | grep smtp" reports now:
tcp 0 0 192.168.168.5:smtp 69-84-204-9.shlby:43729
TIME_WAIT
tcp 0 0 192.168.168.5:smtp mta1.primary.ddc.:29389
TIME_WAIT
tcp 0 0 192.168.168.5:smtp bhwash004.us.usai:33680
TIME_WAIT
tcp 0 0 192.168.168.5:39345 nycmx02.mgw.rr.com:smtp
ESTABLISHED
tcp 0 0 192.168.168.5:smtp mail09.rm04.net:47300
TIME_WAIT
I will forward the output again after James stopped for the next time.
It is James 2.1.3 with an additional forwarding mailet, nntp and pop3 off.
Michael
Noel J. Bergman wrote:
>Run netstat and check to see if they are real. I have tested the connection
>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>amongst other versions.
>
>What version of James?
>
> --- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
Run netstat and check to see if they are real. I have tested the connection
handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
amongst other versions.
What version of James?
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
Re: James Running Out Of Connections
Posted by Michael Nestler <mn...@whoglue.com>.
I will send the netstat output if it should happen again. Yes, I had a
connection limit of 30 (increased it now to 120) - but the connections
were not release for 6 hours. James started logging
12/02/04 02:29:38 WARN connections: Maximum number of open connections
exceeded - refusing connection. Current number of connections is 30
and continued doing this once every few seconds until I stopped it when
I came to work in the morning:
12/02/04 08:11:19 WARN connections: Maximum number of open connections
exceeded - refusing connection. Current number of connections is 30
12/02/04 08:11:21 DEBUG connections: Disposing server
connection...org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
up clients - org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Cleaned up clients -
org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Disposing server
connection...org.apache.james.util.connection.ServerConnection@746ad0
12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning
up clients - org.apache.james.util.connection.ServerConnection@746ad0
So, yes it used all connections in the pool - but they were never
released. That is my problem.
Michael
Noel J. Bergman wrote:
>Michael,
>
>You can use netstat | grep smtp to check, but I suspect that James is
>working as designed. It appears that you have a connection limit of 30, and
>really do have 30 simultaneously active connections.
>
> --- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org
RE: James Running Out Of Connections
Posted by "Noel J. Bergman" <no...@devtech.com>.
Michael,
You can use netstat | grep smtp to check, but I suspect that James is
working as designed. It appears that you have a connection limit of 30, and
really do have 30 simultaneously active connections.
--- Noel
---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org