Posted to server-user@james.apache.org by Mahesh Sivarama Pillai <sr...@gmail.com> on 2015/03/27 14:16:43 UTC

URGENT HELP: James 2.3.2 not responding after few days of run

Hi,

I need urgent help. We have rolled out James 2.3.2 to production for
our email processing application. I see that James shuts down (with no
trace in phoenix.console) after a few days of running. It processes around
100K emails a day and sends a good number of notifications through
RemoteDelivery.

I have checked the logs, but I couldn't find any reason for this abnormal
shutdown. I have seen a couple of "Too Many Open Files" errors in the smtpserver
log and the spoolmanager log, but I don't think those would bring down the server.
Would they? I am not sure whether James is being killed by some other Linux process.
James runs under a user account (e.g. james) with sudo access so that it can run
on port 25. Since I don't have root access, which areas should I look at to
figure out what the problem is? If I talk to the sysadmin, what
information should I ask him/her to gather?
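To see when the "Too Many Open Files" errors started, and whether they cluster right before a shutdown, it can help to count them per log file. A minimal sketch, assuming the logs are plain-text *.log files in one directory passed on the command line; the class name OpenFilesErrorScan is hypothetical, and "Too many open files" is the usual Linux message text for this error (matched case-insensitively here). It reads each file fully into memory, which is fine for a one-off check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class OpenFilesErrorScan {

    /**
     * Counts, per *.log file in logDir, the lines that contain needle (case-insensitive).
     * Files without a match are left out of the result.
     */
    static Map<String, Integer> countMatches(Path logDir, String needle) throws IOException {
        String wanted = needle.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(logDir, "*.log")) {
            for (Path file : files) {
                // ISO-8859-1 maps every byte to a char, so unexpected bytes cannot abort the scan.
                List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
                int n = 0;
                for (String line : lines) {
                    if (line.toLowerCase(Locale.ROOT).contains(wanted)) {
                        n++;
                    }
                }
                if (n > 0) {
                    counts.put(file.getFileName().toString(), n);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the James log directory (its location depends on your installation)
        Map<String, Integer> counts = countMatches(Paths.get(args[0]), "Too many open files");
        counts.forEach((file, n) -> System.out.println(file + ": " + n));
    }
}
```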

James is running on a 4-CPU machine with 8GB of RAM. The James heap size is set
to 4GB.

I have configured James to run as a service on Linux. I am not sure whether our
sysadmin ran the chkconfig command. Is there any impact of not running
this command? Please provide your input as soon as possible.


Thanks
Mahesh

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Posted by Mahesh Sivarama Pillai <sr...@gmail.com>.
Thanks again Bernd... I couldn't find the hs_err files under the temp or
james directories. Considering we hit the Too Many Open Files issue, could that
have prevented the JVM from creating this file? I am clueless about this issue.
No process killed James, no one stopped James... No OOM in the logs... No core
dump :) :(

I will verify the file system. As far as I know, we have a NAS...

On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel <BW...@intarsys.de> wrote:

> Hi Mahesh,
>
> Don't misunderstand: running out of file handles COULD lead to a memory leak,
> consuming memory over time. But it does not NEED to.
>
> As far as I know, OOMs will normally be shown in the log, but we only saw
> this for heap memory.
> OOMs normally happen when heap memory reaches its limit, and yes, we
> saw this in the logs sometimes.
> Every time I got an OOM in the log, I restarted the server, just to be sure it
> keeps running.
> So I do not have long-running servers with a lot of OOM errors, and therefore
> no experience with that.
>
> But you could also run short of memory for the Java classes (native area,
> method area), and I am not sure whether this will show up in the log. I never had
> this with James. I had it when running JIRA long ago, but I can't
> remember what the log showed.
>
> The PID (process ID) is handled by the Linux system; it is
> outside James, and I don't think you will find it in the log.
> But the PID is created on startup (phoenix.sh), and it may be logged by the
> shell script somewhere, together with a timestamp.
> But not in the James logs.
>
> If your sysadmins use a monitoring tool (like Nagios or Icinga), they may
> monitor the memory.
> You could also monitor the memory inside the VM using JMX, but this is a
> little hard to set up.
>
> But anyway: memory may NOT be the problem, so do not spend too much
> time on that.
>
> If you can find an hs_err_pid*.log file, it will tell you the reason for
> the "crash".
>
>
> There is something else I remember, but with another piece of software.
> If the log file is stored on a file server (not a local directory) and
> the file server reboots, you will lose the log.
> We had a Java process that "died" because the file server was
> rebooted at midnight and the process lost all its mounted directories.
> After that we made sure that the log directory is always local, and the
> program directory too.
> You may want to check whether your server uses mounted file systems.
>
>
> Greetings
> Bernd
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-user-unsubscribe@james.apache.org
> For additional commands, e-mail: server-user-help@james.apache.org
>

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Posted by Bernd Waibel <BW...@intarsys.de>.
Hi Mahesh,

Don't misunderstand: running out of file handles COULD lead to a memory leak, consuming memory over time. But it does not NEED to.

As far as I know, OOMs will normally be shown in the log, but we only saw this for heap memory.
OOMs normally happen when heap memory reaches its limit, and yes, we saw this in the logs sometimes.
Every time I got an OOM in the log, I restarted the server, just to be sure it keeps running.
So I do not have long-running servers with a lot of OOM errors, and therefore no experience with that.

But you could also run short of memory for the Java classes (native area, method area), and I am not sure whether this will show up in the log. I never had this with James. I had it when running JIRA long ago, but I can't remember what the log showed.

The PID (process ID) is handled by the Linux system; it is outside James, and I don't think you will find it in the log.
But the PID is created on startup (phoenix.sh), and it may be logged by the shell script somewhere, together with a timestamp.
But not in the James logs.
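The idea of recording the PID at startup can be taken one step further: whatever starts James can also record how the process ended. A minimal sketch, assuming Java 9+ (for Process.pid()) and that James is started in the foreground by a command passed as arguments (the exact script and arguments depend on your installation). The class name JamesLauncher and the record-file path are hypothetical. An exit code above 128 usually means the process was killed by a signal (137 = 128 + 9, i.e. SIGKILL), which would help answer whether another process killed James. If the startup script forks java instead of exec'ing it, the recorded PID is the script's rather than the JVM's.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

public class JamesLauncher {

    /** Interprets an exit code using the Unix convention that 128 + N means "terminated by signal N". */
    static String describeExit(int code) {
        if (code > 128) {
            return "exit code " + code + " (probably killed by signal " + (code - 128) + ")";
        }
        return "exit code " + code;
    }

    /** Appends one timestamped line to the record file (opened and closed on every write). */
    static void record(Path file, String message) throws IOException {
        String line = Instant.now() + " " + message + System.lineSeparator();
        Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Starts the command, records its PID, waits for it to end and records how it ended. */
    static int runAndRecord(List<String> command, Path recordFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        long pid = p.pid(); // Process.pid() needs Java 9+
        record(recordFile, "started pid=" + pid + " cmd=" + command);
        int code = p.waitFor(); // on Unix, OpenJDK reports death by signal N as 128 + N
        record(recordFile, "pid=" + pid + " ended: " + describeExit(code));
        return code;
    }

    public static void main(String[] args) throws Exception {
        // Usage (hypothetical paths): java JamesLauncher /home/james/james-pids.log <start command...>
        Path recordFile = Paths.get(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        System.exit(runAndRecord(command, recordFile));
    }
}
```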

If your sysadmins use a monitoring tool (like Nagios or Icinga), they may monitor the memory.
You could also monitor the memory inside the VM using JMX, but this is a little hard to set up.
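For the JMX route, a small client can read every memory pool of the remote JVM, heap and non-heap alike, which also covers the class/method area mentioned above. A minimal sketch, assuming the James JVM was started with remote JMX enabled, e.g. with -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false added to its JVM options (port 9999 is an arbitrary example; disabling authentication and SSL is only reasonable on a trusted host). Pool names differ between JVM versions (e.g. PermGen on older JVMs, Metaspace on newer ones).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteMemoryPools {

    /** One line per memory pool (heap and non-heap) of the JVM behind the connection. */
    static List<String> describePools(MBeanServerConnection conn) throws Exception {
        List<String> lines = new ArrayList<>();
        ObjectName pattern = new ObjectName(ManagementFactory.MEMORY_POOL_MXBEAN_DOMAIN_TYPE + ",*");
        for (ObjectName name : conn.queryNames(pattern, null)) {
            MemoryPoolMXBean pool = ManagementFactory.newPlatformMXBeanProxy(
                    conn, name.toString(), MemoryPoolMXBean.class);
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() >> 20) + " MB";
            lines.add(pool.getType().name() + " " + pool.getName()
                    + ": used=" + (u.getUsed() >> 20) + " MB, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: host and port must match the JMX options James was started with.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            for (String line : describePools(connector.getMBeanServerConnection())) {
                System.out.println(line);
            }
        }
    }
}
```

Because describePools accepts any MBeanServerConnection, it can also be tried locally against ManagementFactory.getPlatformMBeanServer() before pointing it at James.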

But anyway: memory may NOT be the problem, so do not spend too much time on that.

If you can find an hs_err_pid*.log file, it will tell you the reason for the "crash".
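When HotSpot crashes, the fatal-error log is normally named hs_err_pid&lt;pid&gt;.log. It is written to the JVM's working directory, or to the operating system's temporary directory if that fails, unless -XX:ErrorFile points somewhere else. A minimal sketch that looks for such files in directories given on the command line plus java.io.tmpdir; which directories to pass (for example the directory James was started from) is an assumption about your installation. A JVM killed with SIGKILL (kill -9) does not get the chance to write this file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HsErrFinder {

    /** Returns every hs_err_pid*.log file found directly inside the given directories. */
    static List<Path> find(List<Path> dirs) {
        List<Path> found = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip paths that do not exist or are not directories
            }
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
                for (Path f : files) {
                    found.add(f);
                }
            } catch (IOException e) {
                System.err.println("cannot read " + dir + ": " + e.getMessage());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Path> dirs = new ArrayList<>();
        for (String arg : args) {
            dirs.add(Paths.get(arg)); // e.g. the directory James was started from
        }
        dirs.add(Paths.get(System.getProperty("java.io.tmpdir")));
        for (Path p : find(dirs)) {
            System.out.println(p);
        }
    }
}
```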


There is something else I remember, but with another piece of software.
If the log file is stored on a file server (not a local directory) and the file server reboots, you will lose the log.
We had a Java process that "died" because the file server was rebooted at midnight and the process lost all its mounted directories. After that we made sure that the log directory is always local, and the program directory too.
You may want to check whether your server uses mounted file systems.
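Whether a directory lives on a network file system can also be checked from Java. A minimal sketch: on Linux, FileStore.type() reports the mount's file system type (e.g. ext4 or nfs). The list of network types below is illustrative, not exhaustive, and the directories to check (James's log and program directories) are passed as arguments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class MountCheck {

    // Illustrative, not exhaustive: file system types that usually mean "network share".
    static final List<String> NETWORK_TYPES =
            Arrays.asList("nfs", "nfs4", "cifs", "smbfs", "smb3", "fuse.sshfs");

    /** File system type of the store that holds the given path, e.g. "ext4" or "nfs" on Linux. */
    static String fsType(Path p) throws IOException {
        return Files.getFileStore(p).type();
    }

    static boolean looksNetworked(String type) {
        return NETWORK_TYPES.contains(type.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            String type = fsType(p);
            System.out.println(p + " -> " + type + (looksNetworked(type) ? "  (network file system)" : ""));
        }
    }
}
```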


Greetings
Bernd

-----Original Message-----
From: Mahesh Sivarama Pillai [mailto:srmahe@gmail.com]
Sent: Friday, 27 March 2015 15:17
To: James Users List
Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run

Hi Bernd,

Thanks for the pointers. Let me ask the sysadmin about these details. By the way, will this memory leak show up in the logs? I couldn't find any OOM errors in any of the logs. When the issue happened, our team restarted the server. That will create a new PID, right? Is there a way we can see the old PIDs from the James logs?

Thanks
Mahesh

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Posted by Mahesh Sivarama Pillai <sr...@gmail.com>.
Hi Bernd,

Thanks for the pointers. Let me ask the sysadmin about these details. By the way,
will this memory leak show up in the logs? I couldn't find any OOM errors
in any of the logs. When the issue happened, our team restarted the
server. That will create a new PID, right? Is there a way we can see the old
PIDs from the James logs?

Thanks
Mahesh

On Fri, Mar 27, 2015 at 7:33 PM, Bernd Waibel <BW...@intarsys.de> wrote:

> Hi Mahesh
>
> Too many open files may result in a memory leak.
> Could the sysadmin monitor the memory?
>
> It is a Java process. Is there a file called hs_err_pid*.log? That is produced
> if the VM crashes.
>
> Ciao
> Bernd

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Posted by Bernd Waibel <BW...@intarsys.de>.
Hi Mahesh

Too many open files may result in a memory leak.
Could the sysadmin monitor the memory?
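Alongside memory, the open file descriptors of the running James JVM can be watched without root on Linux, as long as the check runs under the same account as James (e.g. james). A minimal sketch, assuming Linux's /proc filesystem: the number of entries in /proc/&lt;pid&gt;/fd is the number of open descriptors, and /proc/&lt;pid&gt;/limits shows the "Max open files" limit the process actually got. The class name ProcFdCount and the one-minute interval are illustrative choices.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

public class ProcFdCount {

    /** Number of entries in /proc/<pid>/fd, i.e. descriptors the process currently has open. */
    static int openFds(long pid) throws IOException {
        int count = 0;
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path ignored : fds) {
                count++;
            }
        }
        return count;
    }

    /** The "Max open files" row (soft and hard limit) from /proc/<pid>/limits, or null if absent. */
    static String maxOpenFiles(long pid) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc", Long.toString(pid), "limits"))) {
            if (line.startsWith("Max open files")) {
                return line.trim().replaceAll("\\s+", " ");
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]); // PID of the James JVM, e.g. taken from ps
        while (true) {
            System.out.println(Instant.now() + " open fds=" + openFds(pid) + " | " + maxOpenFiles(pid));
            Thread.sleep(60_000); // poll once a minute
        }
    }
}
```

A count that keeps climbing towards the soft limit between restarts would point to descriptors being leaked rather than a one-off burst.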

It is a Java process. Is there a file called hs_err_pid*.log? That is produced if the VM crashes.

Ciao
Bernd


-------- Original Message --------
From: Mahesh Sivarama Pillai <sr...@gmail.com>
Date: 27.03.2015 14:18 (GMT+01:00)
To: James Users List <se...@james.apache.org>
Subject: URGENT HELP: James 2.3.2 not responding after few days of run

Hi,

I need urgent help. We have rolled out James 2.3.2 to production for
our email processing application. I see that James shuts down (with no
trace in phoenix.console) after a few days of running. It processes around
100K emails a day and sends a good number of notifications through
RemoteDelivery.

I have checked the logs, but I couldn't find any reason for this abnormal
shutdown. I have seen a couple of "Too Many Open Files" errors in the smtpserver
log and the spoolmanager log, but I don't think those would bring down the server.
Would they? I am not sure whether James is being killed by some other Linux process.
James runs under a user account (e.g. james) with sudo access so that it can run
on port 25. Since I don't have root access, which areas should I look at to
figure out what the problem is? If I talk to the sysadmin, what
information should I ask him/her to gather?

James is running on a 4-CPU machine with 8GB of RAM. The James heap size is set
to 4GB.

I have configured James to run as a service on Linux. I am not sure whether our
sysadmin ran the chkconfig command. Is there any impact of not running
this command? Please provide your input as soon as possible.


Thanks
Mahesh