You are viewing a plain text version of this content. The canonical link for it is here.
Posted to server-user@james.apache.org by Michael Nestler <mn...@whoglue.com> on 2004/02/13 15:18:46 UTC

James Running Out Of Connections

Hello,

I am running James 2.1.3 under Suse Linux 9.0 (kernel 2.4.21) and Sun 
JDK 1.4.2_03 in a production email forwarding system. From time to time 
(twice a day to once every few days) James runs out of connections as 
shown in the attached part of the connections log file. Searching the 
mailing list archive I found that there was a Linux 2.4.5 bug (?) that 
kept Java applications from closing sockets (?).

Do you think this bug is the cause of the problem? Could it be something 
else? If not, is there a workaround or should I switch to Windows - or 
would it help already to increase the connection pool size and to reduce 
the idle timeout for connections?

Thanks in advance,
Michael

-- 
Michael Nestler
Director of Technology,
WhoGlue, Inc.


Re: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
You should try to increase the number of connections on config.xml, I
have the same setup 
LInux + Java and works fine, try to see if you really have all that
connections with
netstat -n  and search for connection on port 25 


[]s


On Fri, 2004-02-13 at 12:18, Michael Nestler wrote:

> Hello,
> 
> I am running James 2.1.3 under Suse Linux 9.0 (kernel 2.4.21) and Sun 
> JDK 1.4.2_03 in a production email forwarding system. From time to time 
> (twice a day to once every few days) James runs out of connections as 
> shown in the attached part of the connections log file. Searching the 
> mailing list archive I found that there was a Linux 2.4.5 bug (?) that 
> kept Java applications from closing sockets (?).
> 
> Do you think this bug is the cause of the problem? Could it be something 
> else? If not, is there a workaround or should I switch to Windows - or 
> would it help already to increase the connection pool size and to reduce 
> the idle timeout for connections?
> 
> Thanks in advance,
> Michael

Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
Thank you - I will try to find and run it.

Michael


Cesar Bonadio wrote:

>Just another tip
>
>You can use a program called smtp-source that comes with postfix to send
>email to your server
>using several session you just start the program and indicates how many
>sessions you want 
>and the program opens as many simultaneous sessions and send a test
>email. This way
>you can make sure that the problem is not related to James and how James
>will work in such load
>
>[]s
>
>
>
>On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
>
>  
>
>>I will send the netstat output if it should happen again. Yes, I had a 
>>connection limit of 30 (increased it now to 120) - but the connections 
>>were not release for 6 hours. James started logging
>>
>>12/02/04 02:29:38 WARN  connections: Maximum number of open connections 
>>exceeded - refusing connection.  Current number of connections is 30
>>
>>and continued doing this once every few seconds until I stopped it when 
>>I came to work in the morning:
>>
>>12/02/04 08:11:19 WARN  connections: Maximum number of open connections 
>>exceeded - refusing connection.  Current number of connections is 30
>>12/02/04 08:11:21 DEBUG connections: Disposing server 
>>connection...org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
>>up clients - org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Cleaned up clients - 
>>org.apache.james.util.connection.ServerConnection@178655
>>12/02/04 08:11:21 DEBUG connections: Disposing server 
>>connection...org.apache.james.util.connection.ServerConnection@746ad0
>>12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
>>up clients - org.apache.james.util.connection.ServerConnection@746ad0
>>
>>So, yes it used all connections in the pool - but they were never 
>>released. That is my problem.
>>
>>Michael
>>
>>
>>
>>Noel J. Bergman wrote:
>>
>>    
>>
>>>Michael,
>>>
>>>You can use netstat | grep smtp to check, but I suspect that James is
>>>working as designed.  It appears that you have a connection limit of 30, and
>>>really do have 30 simultaneously active connections.
>>>
>>>	--- Noel
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>>For additional commands, e-mail: server-user-help@james.apache.org
>>>
>>>
>>> 
>>>
>>>      
>>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
>>    
>>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
> That sounds like a problem we may have already fixed.  I'd like to know what you see with the current test build.  If it isn't fixed in the test build, we should be able to modify the instrumentation to help track this problem.
> 
> 	--- Noel
> 

Ok
I will try to put the test build to run in one server

[]s



> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
On Thu, 2004-04-15 at 13:02, Noel J. Bergman wrote:

> > I am not using mordred, I configured to use the other method 
> 
> Oh?  When did you make that change?  


when I installed the build version, the other version 2.1.3 is still
running with modred


> Only just now?  Are the other systems that have this problem still using mordred?


I will try to change back to modred and see if the problem occours
again?


> 
> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am not using mordred, I configured to use the other method 

Oh?  When did you make that change?  Only just now?  Are the other systems that have this problem still using mordred?

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
I am not using mordred, I configured to use the other method


On Thu, 2004-04-15 at 09:48, Noel J. Bergman wrote:

> Cesar Bonadio wrote:
> > I put the test build to run in our server and so far the
> > problem did not happen again
> 
> > I will keep check until tomorrow, if everything is fine I will
> > install this version in all of our server
> 
> Good to hear.
> 
> Unfortunately, I am seeing this problem consistently now with the current code in CVS.  The only change I can think of that might have effected it is, oddly, a change to mordred.  I don't think that any other change has been anywhere near this code path, although I'm looking.
> 
> I've made some logging changes to help track this down.
> 
> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
On Fri, 2004-04-16 at 12:55, Noel J. Bergman wrote:
> > I am running 1.4.2_03 and its working fine too
> 
> With which code?  2.2.0a18 or the code currently in CVS?  I hope to post a

2.2.0a18

[]s



---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am running 1.4.2_03 and its working fine too

With which code?  2.2.0a18 or the code currently in CVS?  I hope to post a
new test build late today with everything planned for 2.2.0 except for DSN
format bounces, so hopefully you can verify that it continues to work well
in your environment.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
> 
> Seems to be running fine with 1.4.2_04, using the same code that was crashing consistently with 1.4.1_02.
> 

I am running 1.4.2_03 and its working fine too


> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> > I did a thread dump, and could see the threads waiting on
> > the synchronized pool object.  The odd thing is that I
> > could not find anything that owned it.

> Apparently, this is a defect in Sun JVM 1.4.0 and 1.4.1.  Testing
> with 1.4.2 now.  The problem has been occuring every 6 hours, so
> we'll see in the AM.

Seems to be running fine with 1.4.2_04, using the same code that was crashing consistently with 1.4.1_02.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> I did a thread dump, and could see the threads waiting on
> the synchronized pool object.  The odd thing is that I
> could not find anything that owned it.

Apparently, this is a defect in Sun JVM 1.4.0 and 1.4.1.  Testing with 1.4.2 now.  The problem has been occuring every 6 hours, so we'll see in the AM.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> I am using org.apache.james.util.mordred.JdbcDataSource

> So the test build with mordred is working just fine for me.

With the changes submitted yesterday to fix a deadlock problem, I was seeing consistent failures within about 6 hours.  I did a thread dump, and could see the threads waiting on the synchronized pool object.  The odd thing is that I could not find anything that owned it.

I am testing a change that uses an ArrayList instead of a Vector (inherently synchromized), and ensures that all uses of the pool are explicitly synchronized.  One conjecture from a friend is that there might be a bug in the JVM I'm using (one of the Sun 1.4.1 vintages), and that simplifying the synchronization could relieve it.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
Sorry my mistake

I am using 

org.apache.james.util.mordred.JdbcDataSource

I just checked the config.xml in the Test Build server and the database
is using mordred.

So the test build with mordred is working just fine for me.


[]s





On Thu, 2004-04-15 at 14:13, Noel J. Bergman wrote:

> > The only change I can think of that might have effected it is, oddly,
> > a change to mordred.  I don't think that any other change has been
> > anywhere near this code path, although I'm looking.
> 
> I'm very much leaning towards that being the cause.  When you encounter this problem, look in Phoenix's temp/ directory, and see if there are a bunch of .m64 files that correspond with when the problem starts.
> 
> My current hypothesis is that a problem in mordred is causing all of the calling threads to hang when they try to save to the spool.  One thread must be stuck, and all of the others are blocked on a synchronize.  I had never encounted this before, but have several times since making a change to mordred yesterday.
> 
> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> The only change I can think of that might have effected it is, oddly,
> a change to mordred.  I don't think that any other change has been
> anywhere near this code path, although I'm looking.

I'm very much leaning towards that being the cause.  When you encounter this problem, look in Phoenix's temp/ directory, and see if there are a bunch of .m64 files that correspond with when the problem starts.

My current hypothesis is that a problem in mordred is causing all of the calling threads to hang when they try to save to the spool.  One thread must be stuck, and all of the others are blocked on a synchronize.  I had never encounted this before, but have several times since making a change to mordred yesterday.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
Cesar Bonadio wrote:
> I put the test build to run in our server and so far the
> problem did not happen again

> I will keep check until tomorrow, if everything is fine I will
> install this version in all of our server

Good to hear.

Unfortunately, I am seeing this problem consistently now with the current code in CVS.  The only change I can think of that might have effected it is, oddly, a change to mordred.  I don't think that any other change has been anywhere near this code path, although I'm looking.

I've made some logging changes to help track this down.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
I put the test build to run in our server and so far the problem did not
happen again
the connection log shows a smaller number of connections than netstat
what is correct

Usually I would aready have more connections in the connection log than
in netstat after
12 hours of server up.

I will keep check until tomorrow, if everything is fine I will install
this version in 
all of our server


[]s




On Thu, 2004-04-15 at 00:57, Noel J. Bergman wrote:

> I encountered this problem tonight, and have opened up a JIRA issue to track this problem: http://nagoya.apache.org/jira/browse/JAMES-254
> 
> Unfortunately, I reset the server before checking the state.  I did review the logs, and I have a few ideas about what might be causing it, but there wasn't sufficient information in the logs to be conclusive.  I am making some changes to the logging, so that exceptions that terminate the connection are logged at ERROR instead of DEBUG level.
> 
> Some of you have apparently been encountering this problem all-too-frequently, and already have DEBUG enabled for the SMTPHandler.  What I'd like for you to do is check the smtpserver log to see if you have a bunch of messages of the form:
> 
>  $DATE ERROR smtpserver: SMTP Connection has idled out.
> 
> but without a subsequent exception, which might be of the form:
> 
>  $DATE DEBUG smtpserver: Socket to $remoteHost ($remoteIP) timeout.
> 
> Also, if the connection log says that you are maxed, and netstat --inet does not show that many connections, do a KILL -3 on the james process so that we can get a thread dump, before you restart James.
> 
> Right now I am letting a connection idle out before the HELO to see what it does.  If that works properly, I'll let it idle out after the HELO, and will walk my way through until I reproduce the problem.
> 
> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
I encountered this problem tonight, and have opened up a JIRA issue to track this problem: http://nagoya.apache.org/jira/browse/JAMES-254

Unfortunately, I reset the server before checking the state.  I did review the logs, and I have a few ideas about what might be causing it, but there wasn't sufficient information in the logs to be conclusive.  I am making some changes to the logging, so that exceptions that terminate the connection are logged at ERROR instead of DEBUG level.

Some of you have apparently been encountering this problem all-too-frequently, and already have DEBUG enabled for the SMTPHandler.  What I'd like for you to do is check the smtpserver log to see if you have a bunch of messages of the form:

 $DATE ERROR smtpserver: SMTP Connection has idled out.

but without a subsequent exception, which might be of the form:

 $DATE DEBUG smtpserver: Socket to $remoteHost ($remoteIP) timeout.

Also, if the connection log says that you are maxed, and netstat --inet does not show that many connections, do a KILL -3 on the james process so that we can get a thread dump, before you restart James.

Right now I am letting a connection idle out before the HELO to see what it does.  If that works properly, I'll let it idle out after the HELO, and will walk my way through until I reproduce the problem.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
Please do not use HTML e-mail.

> > Have you run netstat to check the active connections?
> Yes there was 30 ESTABLISHED connections with 26 in the SMTP service

OK, if they are ESTABLISHED, with whom are they ESTABLISHED?  That does not sound like you have a lot of "hung" connections.

> smtpserver-2004-04-13-10-53.log:13/04/04 10:53:29 DEBUG smtpserver: Calling reset() default Worker #40
> ......... 34576 times the same message

Hmm ... if you are saying that you had 34576 messages in one second, that should eliminate data arriving that quickly.

> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Calling stop() default Worker #40
> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Calling disposeWatchdog() default Worker #40
> smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver: Watchdog default Worker #40 is exiting run().

OK, that was alive for 30 seconds, and appears to have cleaned up just fine.  If you knew to whom that was connected, you could find out what they were doing.

> What I find is that when I start the server the number of
> connection in the connection log is smaller but closer to
> the number os connections with NETSTAT but after 2-3 hours
> the number of connections in the connection log starts to grow
> little by little and passes the number of connections in NETSTAT

That sounds like a problem we may have already fixed.  I'd like to know what you see with the current test build.  If it isn't fixed in the test build, we should be able to modify the instrumentation to help track this problem.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
On Tue, 2004-04-13 at 13:31, Noel J. Bergman wrote:

> Which version of James?  


2.1.3


> Have you run netstat to check the active connections?  

Yes there was 30 ESTABLISHED connections with 26 in the SMTP service


>  Is it possible that your service is simply being hit by lots of connections today from internet worms?


Not really the system works fine today


> 
> > in the smtp log hundreds of messages like the following:
> > 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> > 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> > ...
> > Any idea if this can be the source of the problem
> 
> Actually, that could be helpful.  That message is emitted by the Watchdog for that connection when it is reset.  The reset happens on one of two conditions:
> 
>   - a new command is entered
>   - a certain amount (20K default) of data is received during
>     processing of the DATA command


The messages started as:

smtpserver-2004-04-13-10-53.log:13/04/04 10:53:27 DEBUG smtpserver:
Watchdog default Worker #40 has time to sleep 360000
smtpserver-2004-04-13-10-53.log:13/04/04 10:53:28 DEBUG smtpserver:
Calling reset() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:53:29 DEBUG smtpserver:
Calling reset() default Worker #40
......... 34576 times the same message
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Calling stop() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Calling disposeWatchdog() default Worker #40
smtpserver-2004-04-13-10-53.log:13/04/04 10:58:59 DEBUG smtpserver:
Watchdog default Worker #40 is exiting run().

then it resume with normal operation


What I find is that when I start the server the number of connection in
the connection log is smaller but closer to the
number os connections with NETSTAT but after 2-3 hours the number of
connections in the connection log starts to grow
little by little and passes the number of connections in NETSTAT

Like now I have in the connection log

13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 62
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool
13/04/04 16:41:39 DEBUG connections: Retrieving a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
from the pool
13/04/04 16:41:39 DEBUG connections: Adding one connection for a total
of 63
13/04/04 16:41:39 DEBUG connections: Starting connection on
200.229.128.62:25 from 200.77.160.40:3805
13/04/04 16:41:39 DEBUG connections: Ending connection on
200.229.128.62:25 from 200.247.6.3:1409
13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 62
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool
13/04/04 16:41:39 DEBUG connections: Ending connection on
200.229.128.62:25 from 67.64.81.156:10326
13/04/04 16:41:39 DEBUG connections: Releasing one connection, leaving a
total of 61
13/04/04 16:41:39 DEBUG connections: Returning a
org.apache.james.util.connection.ServerConnection$ClientConnectionRunner
to the pool

But NETSTAT says:

[root@mx05 logs]# netstat -n | grep ESTA | wc -l
     41
[root@mx05 logs]# date
Tue Apr 13 16:41:50 BRT 2004


So after 2 days my server starts to refuse connections because it thinks
that there are more than 200 simultaneous connections what is
not correct.


Any idea on how to debug more information about that?

Thanks







> 
> Either someone is entering 100s of lines, they are sending you a huge message in a short period of time, or there is a really weird bug in the code.  You showed 7 entries at 10:58:24, so that would imply 140K in one second.  Are all of the 100s of entries at the same time?  What is the rate at which they are logged?
> 
> I'm leaning towards someone sending a blank command, which would not be logged but would result in the watchdog being reset.
> 
> 	--- Noel
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
Which version of James?  Have you run netstat to check the active connections?   Is it possible that your service is simply being hit by lots of connections today from internet worms?

> in the smtp log hundreds of messages like the following:
> 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> 13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
> ...
> Any idea if this can be the source of the problem

Actually, that could be helpful.  That message is emitted by the Watchdog for that connection when it is reset.  The reset happens on one of two conditions:

  - a new command is entered
  - a certain amount (20K default) of data is received during
    processing of the DATA command

Either someone is entering 100s of lines, they are sending you a huge message in a short period of time, or there is a really weird bug in the code.  You showed 7 entries at 10:58:24, so that would imply 140K in one second.  Are all of the 100s of entries at the same time?  What is the rate at which they are logged?

I'm leaning towards someone sending a blank command, which would not be logged but would result in the watchdog being reset.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
Hello 

I still have the problema of james running out of connection, today
I start a new debug session and found in the smtp log hundreds of
messages
like the following:

13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40
13/04/04 10:58:24 DEBUG smtpserver: Calling reset() default Worker #40


the interesting is that some messages only occours a few times but this
work #40 appears several hundred times.

Any idea if this can be the source of the problem

Thanks




RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
> You can use a program called smtp-source that comes with
> postfix to send email to your server using several session

I use Russell Coker's "postal" to do the same thing.  :-)

	--- Noel

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
Just another tip

You can use a program called smtp-source that comes with postfix to send
email to your server
using several session you just start the program and indicates how many
sessions you want 
and the program opens as many simultaneous sessions and send a test
email. This way
you can make sure that the problem is not related to James and how James
will work in such load

[]s



On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:

> I will send the netstat output if it should happen again. Yes, I had a 
> connection limit of 30 (increased it now to 120) - but the connections 
> were not release for 6 hours. James started logging
> 
> 12/02/04 02:29:38 WARN  connections: Maximum number of open connections 
> exceeded - refusing connection.  Current number of connections is 30
> 
> and continued doing this once every few seconds until I stopped it when 
> I came to work in the morning:
> 
> 12/02/04 08:11:19 WARN  connections: Maximum number of open connections 
> exceeded - refusing connection.  Current number of connections is 30
> 12/02/04 08:11:21 DEBUG connections: Disposing server 
> connection...org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
> up clients - org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Cleaned up clients - 
> org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Disposing server 
> connection...org.apache.james.util.connection.ServerConnection@746ad0
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
> up clients - org.apache.james.util.connection.ServerConnection@746ad0
> 
> So, yes it used all connections in the pool - but they were never 
> released. That is my problem.
> 
> Michael
> 
> 
> 
> Noel J. Bergman wrote:
> 
> >Michael,
> >
> >You can use netstat | grep smtp to check, but I suspect that James is
> >working as designed.  It appears that you have a connection limit of 30, and
> >really do have 30 simultaneously active connections.
> >
> >	--- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >  
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
Cesar Bonadio wrote:

>On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:
>
>  
>
>>I will send the netstat output if it should happen again. Yes, I had a 
>>connection limit of 30 (increased it now to 120) - but the connections 
>>were not release for 6 hours. James started logging
>>
>>    
>>
>
>
>Make sure you have anougth threads configured to open 120 connections 
>
>  
>

I set max-threads to 150. I guess that is sufficient - but please 
correct me if I should be wrong.

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
On Fri, 2004-02-13 at 15:30, Michael Nestler wrote:

> I will send the netstat output if it should happen again. Yes, I had a 
> connection limit of 30 (increased it now to 120) - but the connections 
> were not release for 6 hours. James started logging
> 


Make sure you have anougth threads configured to open 120 connections 



> 12/02/04 02:29:38 WARN  connections: Maximum number of open connections 
> exceeded - refusing connection.  Current number of connections is 30
> 
> and continued doing this once every few seconds until I stopped it when 
> I came to work in the morning:
> 
> 12/02/04 08:11:19 WARN  connections: Maximum number of open connections 
> exceeded - refusing connection.  Current number of connections is 30
> 12/02/04 08:11:21 DEBUG connections: Disposing server 
> connection...org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
> up clients - org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Cleaned up clients - 
> org.apache.james.util.connection.ServerConnection@178655
> 12/02/04 08:11:21 DEBUG connections: Disposing server 
> connection...org.apache.james.util.connection.ServerConnection@746ad0
> 12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
> up clients - org.apache.james.util.connection.ServerConnection@746ad0
> 
> So, yes it used all connections in the pool - but they were never 
> released. That is my problem.
> 
> Michael
> 
> 
> 
> Noel J. Bergman wrote:
> 
> >Michael,
> >
> >You can use netstat | grep smtp to check, but I suspect that James is
> >working as designed.  It appears that you have a connection limit of 30, and
> >really do have 30 simultaneously active connections.
> >
> >	--- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >  
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

Re: James Running Out Of Connections

Posted by Robert J Taylor <rj...@cox.net>.
What was the resolution of this problem?

I'm running James 2.1.2 on:
Linux pjemail1.jazel.net 2.4.21-4.0.1.ELsmp #1 SMP Thu Oct 23 01:21:42 
EDT 2003 i686 athlon i386 GNU/Linux

Java:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


Thanks,

Robert Taylor

Cesar Bonadio wrote:
> 
> java version "1.4.2_03"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
> Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
> 
> And Linux 2.4
> 
> 
> On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:
> 
> 
>>Phew - at least I am not the only one who has this problem ;-) Are you 
>>using Linux 2.4 as well?
>>
>>I also did a load test before deploying James into production and I was 
>>able to deliver about 100,000 emails within 8 hours without any issue. 
>>The issue seems to occur only sporadically - sometimes it takes only a 
>>few hours, sometimes a few days. But once it hangs, it does not come 
>>back in my case (at least not within 6 hours).
>>
>>I also had the issue of James suddenly "partially" halting before I 
>>upgraded from file to db persistence. I don't know though if this was 
>>the same problem.
>>
>>My connections log level was set to DEBUG when the log was made.
>>
>>I did not have any ESTABLISHED smtp connection while James was hanging - 
>>almost all had status TIME_WAIT.
>>
>>Michael
>>
>>
>>Cesar Bonadio wrote:
>>
>>
>>>I think you are not alone :) 
>>>
>>>In the last five days I have found several messages in the connection
>>>log with the same
>>>problem my server stops responding but after a while it comes back and
>>>if I run a netstat when
>>>I am receiving the error I really dont have more than 20 ESTABLISED
>>>connections.
>>>
>>>This is interesting problem because in our test server running the same
>>>version of James and java
>>>a manage to deliver more than 20.000 messages with smtp-source and the
>>>connections works just fine
>>>if I go over it complains and releases the connections after use.
>>>
>>>I will put my connection log in debug mode to see if I get something
>>>
>>>
>>>[]s
>>>
>>>
>>>
>>>
>>>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>>>
>>> 
>>>
>>>
>>>>OK, James stopped again last night at about 1:50 am. I did not have the 
>>>>time yet to run postal or smtp-source. But this time I have a log 
>>>>showing smtp sockets with a 10 minute resolution. Since the last stop I 
>>>>increased the maximum number of connections to 120 (thread number to 150).
>>>>
>>>>At 1:50 James reported in connections*.log:
>>>>
>>>>17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
>>>>exceeded - refusing connection.  Current number of connections is 120
>>>>
>>>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
>>>>job "netstat | grep smtp") shows for the same time:
>>>>
>>>>Tue Feb 17 01:50:00 EST 2004
>>>>tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
>>>>SYN_RECV   
>>>>tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
>>>>CLOSE_WAIT 
>>>>tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
>>>>ESTABLISHED
>>>>tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
>>>>CLOSE_WAIT 
>>>>tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
>>>>CLOSE_WAIT 
>>>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
>>>>ESTABLISHED
>>>>tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
>>>>ESTABLISHED
>>>>tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
>>>>CLOSE_WAIT 
>>>>
>>>>
>>>>And at the next time the netstat snapshot was made:
>>>>
>>>>Tue Feb 17 02:00:00 EST 2004
>>>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
>>>>TIME_WAIT  
>>>>tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
>>>>TIME_WAIT  
>>>>
>>>>
>>>>>From there on the smtp sockets log file basically shows only TIME_WAIT 
>>>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
>>>>Never netstat shows 120 open smtp connections (what James seems to 
>>>>believe). As a strange aside the default*.log contains a database 
>>>>connection exception (it contained the same exception at the last time 
>>>>James stopped 6 days ago). I don't know if this means anything - could 
>>>>this exception have caused the problem? spoolmanager*.log and 
>>>>mailet*.log stopped recording at 1:32 am.
>>>>
>>>>To summarize the events from my point of view:
>>>>o James was running for about 6 days without any problem, delivering 
>>>>about 5,000-10,000 emails a day
>>>>o some event at a few minutes before 1:50 am stopped James from 
>>>>returning used smtp connection socket objects to the pool starting at 
>>>>that time
>>>>o as a consequence new connections were not accepted and the service 
>>>>stopped effectively at 1:32 am
>>>>o it never recovered from that situation until I relieved it by 
>>>>restarting phoenix at 8:29 am
>>>>
>>>>>From my point of view it seems to be likely that the mentioned 
>>>>exception in default*.log caused the problem, since it happened so close 
>>>>to the time James stopped:
>>>>
>>>>17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
>>>>is way too old: 63408 > 60000
>>>>17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
>>>>  at 
>>>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>>>>  at 
>>>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 
>>>>
>>>>  at 
>>>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 
>>>>
>>>>  at 
>>>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
>>>>
>>>>  at 
>>>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
>>>>
>>>>  at 
>>>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 
>>>>
>>>>  at 
>>>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
>>>>
>>>>
>>>>On the other hand it could be just a side effect. I attached the 
>>>>mentioned log files since midnight to this email in zipped format.
>>>>
>>>>I really hope somebody on this list can make any sense out of this and 
>>>>knows what's going on here and how I can fix it. Let me know if I should 
>>>>run postal or smtp-source first and I will do it.
>>>>
>>>>All help is greatly appreciated.
>>>>
>>>>Best regards,
>>>>Michael
>>>>
>>>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
>>>>bounced it saying "552 we don't accept email with executable content 
>>>>(#5.3.4)" ? I try it again as TGZ.
>>>>
>>>>
>>>>
>>>>Noel J. Bergman wrote:
>>>>
>>>>   
>>>>
>>>>
>>>>>Run netstat and check to see if they are real.  I have tested the connection
>>>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>>>amongst other versions.
>>>>>
>>>>>What version of James?
>>>>>
>>>>>	--- Noel
>>>>>
>>>>>     
>>>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>>For additional commands, e-mail: server-user-help@james.apache.org
>>
> 
> 



Re: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.

java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

And Linux 2.4


On Tue, 2004-02-17 at 17:02, Michael Nestler wrote:

> Phew - at least I am not the only one who has this problem ;-) Are you 
> using Linux 2.4 as well?
> 
> I also did a load test before deploying James into production and I was 
> able to deliver about 100,000 emails within 8 hours without any issue. 
> The issue seems to occur only sporadically - sometimes it takes only a 
> few hours, sometimes a few days. But once it hangs, it does not come 
> back in my case (at least not within 6 hours).
> 
> I also had the issue of James suddenly "partially" halting before I 
> upgraded from file to db persistence. I don't know though if this was 
> the same problem.
> 
> My connections log level was set to DEBUG when the log was made.
> 
> I did not have any ESTABLISHED smtp connection while James was hanging - 
> almost all had status TIME_WAIT.
> 
> Michael
> 
> 
> Cesar Bonadio wrote:
> 
> >I think you are not alone :) 
> >
> >In the last five days I have found several messages in the connection
> >log with the same
> >problem my server stops responding but after a while it comes back and
> >if I run a netstat when
> >I am receiving the error I really dont have more than 20 ESTABLISED
> >connections.
> >
> >This is interesting problem because in our test server running the same
> >version of James and java
> >a manage to deliver more than 20.000 messages with smtp-source and the
> >connections works just fine
> >if I go over it complains and releases the connections after use.
> >
> >I will put my connection log in debug mode to see if I get something
> >
> >
> >[]s
> >
> >
> >
> >
> >On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
> >
> >  
> >
> >>OK, James stopped again last night at about 1:50 am. I did not have the 
> >>time yet to run postal or smtp-source. But this time I have a log 
> >>showing smtp sockets with a 10 minute resolution. Since the last stop I 
> >>increased the maximum number of connections to 120 (thread number to 150).
> >>
> >>At 1:50 James reported in connections*.log:
> >>
> >>17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
> >>exceeded - refusing connection.  Current number of connections is 120
> >>
> >>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
> >>job "netstat | grep smtp") shows for the same time:
> >>
> >>Tue Feb 17 01:50:00 EST 2004
> >>tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
> >>SYN_RECV   
> >>tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
> >>CLOSE_WAIT 
> >>tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
> >>ESTABLISHED
> >>tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
> >>CLOSE_WAIT 
> >>tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
> >>CLOSE_WAIT 
> >>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
> >>ESTABLISHED
> >>tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
> >>ESTABLISHED
> >>tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
> >>CLOSE_WAIT 
> >>
> >>
> >>And at the next time the netstat snapshot was made:
> >>
> >>Tue Feb 17 02:00:00 EST 2004
> >>tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
> >>TIME_WAIT  
> >>tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
> >>TIME_WAIT  
> >>
> >>
> >> From there on the smtp sockets log file basically shows only TIME_WAIT 
> >>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
> >>Never netstat shows 120 open smtp connections (what James seems to 
> >>believe). As a strange aside the default*.log contains a database 
> >>connection exception (it contained the same exception at the last time 
> >>James stopped 6 days ago). I don't know if this means anything - could 
> >>this exception have caused the problem? spoolmanager*.log and 
> >>mailet*.log stopped recording at 1:32 am.
> >>
> >>To summarize the events from my point of view:
> >>o James was running for about 6 days without any problem, delivering 
> >>about 5,000-10,000 emails a day
> >>o some event at a few minutes before 1:50 am stopped James from 
> >>returning used smtp connection socket objects to the pool starting at 
> >>that time
> >>o as a consequence new connections were not accepted and the service 
> >>stopped effectively at 1:32 am
> >>o it never recovered from that situation until I relieved it by 
> >>restarting phoenix at 8:29 am
> >>
> >> From my point of view it seems to be likely that the mentioned 
> >>exception in default*.log caused the problem, since it happened so close 
> >>to the time James stopped:
> >>
> >>17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
> >>is way too old: 63408 > 60000
> >>17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
> >>   at 
> >>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
> >>   at 
> >>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 
> >>
> >>   at 
> >>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 
> >>
> >>   at 
> >>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 
> >>
> >>   at 
> >>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
> >>
> >>   at 
> >>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
> >>
> >>   at 
> >>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
> >>
> >>   at 
> >>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
> >>
> >>   at 
> >>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 
> >>
> >>   at 
> >>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
> >>
> >>
> >>On the other hand it could be just a side effect. I attached the 
> >>mentioned log files since midnight to this email in zipped format.
> >>
> >>I really hope somebody on this list can make any sense out of this and 
> >>knows what's going on here and how I can fix it. Let me know if I should 
> >>run postal or smtp-source first and I will do it.
> >>
> >>All help is greatly appreciated.
> >>
> >>Best regards,
> >>Michael
> >>
> >>P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
> >>bounced it saying "552 we don't accept email with executable content 
> >>(#5.3.4)" ? I try it again as TGZ.
> >>
> >>
> >>
> >>Noel J. Bergman wrote:
> >>
> >>    
> >>
> >>>Run netstat and check to see if they are real.  I have tested the connection
> >>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
> >>>amongst other versions.
> >>>
> >>>What version of James?
> >>>
> >>>	--- Noel
> >>>
> >>>      
> >>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
> 

Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
Phew - at least I am not the only one who has this problem ;-) Are you 
using Linux 2.4 as well?

I also did a load test before deploying James into production and I was 
able to deliver about 100,000 emails within 8 hours without any issue. 
The issue seems to occur only sporadically - sometimes it takes only a 
few hours, sometimes a few days. But once it hangs, it does not come 
back in my case (at least not within 6 hours).

I also had the issue of James suddenly "partially" halting before I 
upgraded from file to db persistence. I don't know though if this was 
the same problem.

My connections log level was set to DEBUG when the log was made.

I did not have any ESTABLISHED smtp connection while James was hanging - 
almost all had status TIME_WAIT.

Michael


Cesar Bonadio wrote:

>I think you are not alone :) 
>
>In the last five days I have found several messages in the connection
>log with the same
>problem my server stops responding but after a while it comes back and
>if I run a netstat when
>I am receiving the error I really dont have more than 20 ESTABLISED
>connections.
>
>This is interesting problem because in our test server running the same
>version of James and java
>a manage to deliver more than 20.000 messages with smtp-source and the
>connections works just fine
>if I go over it complains and releases the connections after use.
>
>I will put my connection log in debug mode to see if I get something
>
>
>[]s
>
>
>
>
>On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:
>
>  
>
>>OK, James stopped again last night at about 1:50 am. I did not have the 
>>time yet to run postal or smtp-source. But this time I have a log 
>>showing smtp sockets with a 10 minute resolution. Since the last stop I 
>>increased the maximum number of connections to 120 (thread number to 150).
>>
>>At 1:50 James reported in connections*.log:
>>
>>17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
>>exceeded - refusing connection.  Current number of connections is 120
>>
>>repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
>>job "netstat | grep smtp") shows for the same time:
>>
>>Tue Feb 17 01:50:00 EST 2004
>>tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
>>SYN_RECV   
>>tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
>>CLOSE_WAIT 
>>tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
>>ESTABLISHED
>>tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
>>CLOSE_WAIT 
>>tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
>>CLOSE_WAIT 
>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
>>ESTABLISHED
>>tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
>>ESTABLISHED
>>tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
>>CLOSE_WAIT 
>>
>>
>>And at the next time the netstat snapshot was made:
>>
>>Tue Feb 17 02:00:00 EST 2004
>>tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
>>TIME_WAIT  
>>tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
>>TIME_WAIT  
>>
>>
>> From there on the smtp sockets log file basically shows only TIME_WAIT 
>>sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
>>Never netstat shows 120 open smtp connections (what James seems to 
>>believe). As a strange aside the default*.log contains a database 
>>connection exception (it contained the same exception at the last time 
>>James stopped 6 days ago). I don't know if this means anything - could 
>>this exception have caused the problem? spoolmanager*.log and 
>>mailet*.log stopped recording at 1:32 am.
>>
>>To summarize the events from my point of view:
>>o James was running for about 6 days without any problem, delivering 
>>about 5,000-10,000 emails a day
>>o some event at a few minutes before 1:50 am stopped James from 
>>returning used smtp connection socket objects to the pool starting at 
>>that time
>>o as a consequence new connections were not accepted and the service 
>>stopped effectively at 1:32 am
>>o it never recovered from that situation until I relieved it by 
>>restarting phoenix at 8:29 am
>>
>> From my point of view it seems to be likely that the mentioned 
>>exception in default*.log caused the problem, since it happened so close 
>>to the time James stopped:
>>
>>17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
>>is way too old: 63408 > 60000
>>17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
>>   at 
>>org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>>   at 
>>org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 
>>
>>   at 
>>org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 
>>
>>   at 
>>org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 
>>
>>   at 
>>org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
>>
>>   at 
>>org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
>>
>>   at 
>>org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
>>
>>   at 
>>org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
>>
>>   at 
>>org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 
>>
>>   at 
>>org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
>>
>>
>>On the other hand it could be just a side effect. I attached the 
>>mentioned log files since midnight to this email in zipped format.
>>
>>I really hope somebody on this list can make any sense out of this and 
>>knows what's going on here and how I can fix it. Let me know if I should 
>>run postal or smtp-source first and I will do it.
>>
>>All help is greatly appreciated.
>>
>>Best regards,
>>Michael
>>
>>P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
>>bounced it saying "552 we don't accept email with executable content 
>>(#5.3.4)" ? I try it again as TGZ.
>>
>>
>>
>>Noel J. Bergman wrote:
>>
>>    
>>
>>>Run netstat and check to see if they are real.  I have tested the connection
>>>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>>>amongst other versions.
>>>
>>>What version of James?
>>>
>>>	--- Noel
>>>
>>>      
>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: James Running Out Of Connections

Posted by Cesar Bonadio <bo...@picture.com.br>.
I think you are not alone :) 

In the last five days I have found several messages in the connection
log with the same
problem my server stops responding but after a while it comes back and
if I run a netstat when
I am receiving the error I really dont have more than 20 ESTABLISED
connections.

This is interesting problem because in our test server running the same
version of James and java
a manage to deliver more than 20.000 messages with smtp-source and the
connections works just fine
if I go over it complains and releases the connections after use.

I will put my connection log in debug mode to see if I get something


[]s




On Tue, 2004-02-17 at 11:35, Michael Nestler wrote:

> OK, James stopped again last night at about 1:50 am. I did not have the 
> time yet to run postal or smtp-source. But this time I have a log 
> showing smtp sockets with a 10 minute resolution. Since the last stop I 
> increased the maximum number of connections to 120 (thread number to 150).
> 
> At 1:50 James reported in connections*.log:
> 
> 17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
> exceeded - refusing connection.  Current number of connections is 120
> 
> repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
> job "netstat | grep smtp") shows for the same time:
> 
> Tue Feb 17 01:50:00 EST 2004
> tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
> SYN_RECV   
> tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
> CLOSE_WAIT 
> tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
> ESTABLISHED
> tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
> CLOSE_WAIT 
> tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
> CLOSE_WAIT 
> tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
> ESTABLISHED
> tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
> ESTABLISHED
> tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
> CLOSE_WAIT 
> 
> 
> And at the next time the netstat snapshot was made:
> 
> Tue Feb 17 02:00:00 EST 2004
> tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
> TIME_WAIT  
> tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
> TIME_WAIT  
> 
> 
>  From there on the smtp sockets log file basically shows only TIME_WAIT 
> sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
> Never netstat shows 120 open smtp connections (what James seems to 
> believe). As a strange aside the default*.log contains a database 
> connection exception (it contained the same exception at the last time 
> James stopped 6 days ago). I don't know if this means anything - could 
> this exception have caused the problem? spoolmanager*.log and 
> mailet*.log stopped recording at 1:32 am.
> 
> To summarize the events from my point of view:
> o James was running for about 6 days without any problem, delivering 
> about 5,000-10,000 emails a day
> o some event at a few minutes before 1:50 am stopped James from 
> returning used smtp connection socket objects to the pool starting at 
> that time
> o as a consequence new connections were not accepted and the service 
> stopped effectively at 1:32 am
> o it never recovered from that situation until I relieved it by 
> restarting phoenix at 8:29 am
> 
>  From my point of view it seems to be likely that the mentioned 
> exception in default*.log caused the problem, since it happened so close 
> to the time James stopped:
> 
> 17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
> is way too old: 63408 > 60000
> 17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
>    at 
> org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
>    at 
> org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 
> 
>    at 
> org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 
> 
>    at 
> org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 
> 
>    at 
> org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 
> 
>    at 
> org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 
> 
>    at 
> org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 
> 
>    at 
> org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 
> 
>    at 
> org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 
> 
>    at 
> org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 
> 
> 
> On the other hand it could be just a side effect. I attached the 
> mentioned log files since midnight to this email in zipped format.
> 
> I really hope somebody on this list can make any sense out of this and 
> knows what's going on here and how I can fix it. Let me know if I should 
> run postal or smtp-source first and I will do it.
> 
> All help is greatly appreciated.
> 
> Best regards,
> Michael
> 
> P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
> bounced it saying "552 we don't accept email with executable content 
> (#5.3.4)" ? I try it again as TGZ.
> 
> 
> 
> Noel J. Bergman wrote:
> 
> >Run netstat and check to see if they are real.  I have tested the connection
> >handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
> >amongst other versions.
> >
> >What version of James?
> >
> >	--- Noel
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> >For additional commands, e-mail: server-user-help@james.apache.org
> >
> >
> >  
> >
> 
> 
> ______________________________________________________________________
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org

Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
OK, James stopped again last night at about 1:50 am. I did not have the 
time yet to run postal or smtp-source. But this time I have a log 
showing smtp sockets with a 10 minute resolution. Since the last stop I 
increased the maximum number of connections to 120 (thread number to 150).

At 1:50 James reported in connections*.log:

17/02/04 01:50:00 WARN  connections: Maximum number of open connections 
exceeded - refusing connection.  Current number of connections is 120

repeatedly until 8:29 am when I restarted James. My smtpconn.log (cron 
job "netstat | grep smtp") shows for the same time:

Tue Feb 17 01:50:00 EST 2004
tcp        0      0 email1.whoglue.net:smtp DSL217-1:fyre-messanger 
SYN_RECV   
tcp        0      0 192.168.168.5:smtp      vm211.vm:cruise-swroute 
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      220.127.103.78:idmaps   
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      mta.email.contine:44071 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      custd530.innerhost:4074 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      pro109.profontis.:59902 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      smtp01.1800contact:4220 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      mta.email.contine:42345 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      mta.email.contine:25694 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      stdout-mv-01.inbo:44652 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      128.121.26.2:31574      
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      ool-435798e8.dy:evb-elm 
CLOSE_WAIT 
tcp        1      0 192.168.168.5:smtp      lsanca1-ar5:notify_srvr 
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      mta113.cheetahmai:52853 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      66.42.142.98:49221      
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      server611.colonize:3926 
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      211-56-243-3.inter:linx 
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      wchicon1.qvc.com:4611   
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      vm125.vmadmin.com:8381  
ESTABLISHED
tcp        7      0 192.168.168.5:smtp      66.28.60.195:58491      
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      imo-r10.mx.aol.co:61761 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52001 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      bonaire.server1.o:52272 
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      202.Red-213-97-41:pipes 
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      n25.grp.scd.yahoo:47649 
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      218.98.97.183:4612      
CLOSE_WAIT 
tcp        1      0 192.168.168.5:smtp      mail.contactel.cz:43724 
CLOSE_WAIT 
tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43794 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      mail.contactel.cz:43801 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      212.143.236.10:hylafax  
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      cust54-224.netcabo:4416 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      ana216.anotherday:40710 
ESTABLISHED
tcp        1      0 192.168.168.5:smtp      218.232.228.130:vsixml  
CLOSE_WAIT 


And at the next time the netstat snapshot was made:

Tue Feb 17 02:00:00 EST 2004
tcp        0      0 192.168.168.5:smtp      mta.email.contine:13622 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      imo-m23.mx.aol.co:63699 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      modemcable:conclave-cpp 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      219.144.194.18:43349    
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      219.144.194.18:43281    
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      c-24-7-95-19.clien:4669 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30519 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      web60804.mail.yah:30518 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      relay.quickhire.c:34089 
TIME_WAIT  
tcp        0      0 192.168.168.5:smtp      ACC1B832.:ns-cfg-server 
TIME_WAIT  


 From there on the smtp sockets log file basically shows only TIME_WAIT 
sockets (there are a few SYN_RECV) until I restarted James at 8:29 am. 
Never netstat shows 120 open smtp connections (what James seems to 
believe). As a strange aside the default*.log contains a database 
connection exception (it contained the same exception at the last time 
James stopped 6 days ago). I don't know if this means anything - could 
this exception have caused the problem? spoolmanager*.log and 
mailet*.log stopped recording at 1:32 am.

To summarize the events from my point of view:
o James was running for about 6 days without any problem, delivering 
about 5,000-10,000 emails a day
o some event at a few minutes before 1:50 am stopped James from 
returning used smtp connection socket objects to the pool starting at 
that time
o as a consequence new connections were not accepted and the service 
stopped effectively at 1:32 am
o it never recovered from that situation until I relieved it by 
restarting phoenix at 8:29 am

 From my point of view it seems to be likely that the mentioned 
exception in default*.log caused the problem, since it happened so close 
to the time James stopped:

17/02/04 01:33:11 INFO  database-connections.maildb:  ***** connection 1 
is way too old: 63408 > 60000
17/02/04 01:33:11 INFO  database-connections.maildb: java.lang.Throwable
   at 
org.apache.james.util.mordred.PoolConnEntry.lock(PoolConnEntry.java:135)
   at 
org.apache.james.util.mordred.JdbcDataSource.getConnection(JdbcDataSource.java:178) 

   at 
org.apache.james.mailrepository.JDBCMailRepository.store(JDBCMailRepository.java:484) 

   at 
org.apache.james.mailrepository.JDBCSpoolRepository.store(JDBCSpoolRepository.java:243) 

   at 
org.apache.james.transport.mailets.RemoteDelivery.service(RemoteDelivery.java:649) 

   at 
org.apache.james.transport.LinearProcessor.service(LinearProcessor.java:413) 

   at 
org.apache.james.transport.JamesSpoolManager.process(JamesSpoolManager.java:436) 

   at 
org.apache.james.transport.JamesSpoolManager.run(JamesSpoolManager.java:366) 

   at 
org.apache.avalon.excalibur.thread.impl.ExecutableRunnable.execute(ExecutableRunnable.java:47) 

   at 
org.apache.avalon.excalibur.thread.impl.WorkerThread.run(WorkerThread.java:80) 


On the other hand it could be just a side effect. I attached the 
mentioned log files since midnight to this email in zipped format.

I really hope somebody on this list can make any sense out of this and 
knows what's going on here and how I can fix it. Let me know if I should 
run postal or smtp-source first and I will do it.

All help is greatly appreciated.

Best regards,
Michael

P.S. I was not able to attach a ZIP file to this email. mail.apache.org 
bounced it saying "552 we don't accept email with executable content 
(#5.3.4)" ? I try it again as TGZ.



Noel J. Bergman wrote:

>Run netstat and check to see if they are real.  I have tested the connection
>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>amongst other versions.
>
>What version of James?
>
>	--- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>  
>

Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
"netstat | grep smtp" reports now:

tcp        0      0 192.168.168.5:smtp      69-84-204-9.shlby:43729 
TIME_WAIT
tcp        0      0 192.168.168.5:smtp      mta1.primary.ddc.:29389 
TIME_WAIT
tcp        0      0 192.168.168.5:smtp      bhwash004.us.usai:33680 
TIME_WAIT
tcp        0      0 192.168.168.5:39345     nycmx02.mgw.rr.com:smtp 
ESTABLISHED
tcp        0      0 192.168.168.5:smtp      mail09.rm04.net:47300   
TIME_WAIT

I will forward the output again after James stopped for the next time.

It is James 2.1.3 with an additional forwarding mailet, nntp and pop3 off.

Michael



Noel J. Bergman wrote:

>Run netstat and check to see if they are real.  I have tested the connection
>handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
>amongst other versions.
>
>What version of James?
>
>	--- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
Run netstat and check to see if they are real.  I have tested the connection
handling with linux kernel 2.2.22, 2.4.20-18 and 2.4.20-24 without problem,
amongst other versions.

What version of James?

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


Re: James Running Out Of Connections

Posted by Michael Nestler <mn...@whoglue.com>.
I will send the netstat output if it should happen again. Yes, I had a 
connection limit of 30 (increased it now to 120) - but the connections 
were not release for 6 hours. James started logging

12/02/04 02:29:38 WARN  connections: Maximum number of open connections 
exceeded - refusing connection.  Current number of connections is 30

and continued doing this once every few seconds until I stopped it when 
I came to work in the morning:

12/02/04 08:11:19 WARN  connections: Maximum number of open connections 
exceeded - refusing connection.  Current number of connections is 30
12/02/04 08:11:21 DEBUG connections: Disposing server 
connection...org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
up clients - org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Cleaned up clients - 
org.apache.james.util.connection.ServerConnection@178655
12/02/04 08:11:21 DEBUG connections: Disposing server 
connection...org.apache.james.util.connection.ServerConnection@746ad0
12/02/04 08:11:21 DEBUG connections: Closed server connection - cleaning 
up clients - org.apache.james.util.connection.ServerConnection@746ad0

So, yes it used all connections in the pool - but they were never 
released. That is my problem.

Michael



Noel J. Bergman wrote:

>Michael,
>
>You can use netstat | grep smtp to check, but I suspect that James is
>working as designed.  It appears that you have a connection limit of 30, and
>really do have 30 simultaneously active connections.
>
>	--- Noel
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
>For additional commands, e-mail: server-user-help@james.apache.org
>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org


RE: James Running Out Of Connections

Posted by "Noel J. Bergman" <no...@devtech.com>.
Michael,

You can use netstat | grep smtp to check, but I suspect that James is
working as designed.  It appears that you have a connection limit of 30, and
really do have 30 simultaneously active connections.

	--- Noel


---------------------------------------------------------------------
To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
For additional commands, e-mail: server-user-help@james.apache.org