Posted to server-user@james.apache.org by Mahesh Sivarama Pillai <sr...@gmail.com> on 2015/04/01 08:59:20 UTC

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Thanks Bernd. Enjoy your holidays and please help whenever you get time...
:)

Did you get a chance to take a look at my latest email? Actually the
server is not dead. It's refusing connections. I have put the relevant
details in that email. Please take a look.

Thanks
Mahesh

On Wed, Apr 1, 2015 at 12:48 AM, Bernd Waibel <BW...@intarsys.de> wrote:

> Hi Mahesh
>
> I am currently on holiday, so I could not check on a Linux machine.
>
> The "chkconfig add" will add scripts for startup AND shutdown, with a
> defined order and in the defined runlevel.
> Not having this means you have to start and stop the service by hand.
>
> And the process may just be killed when rebooting. This MAY result in
> nothing being logged on shutdown.
> If you reboot the server, the log may just end and the process will die. It
> will not be started again.
>
> That sounds just like your description, doesn't it?
>
> Greetings
> Bernd
>
>
> -------- Original Message --------
> From: Mahesh Sivarama Pillai <sr...@gmail.com>
> Date: 31.03.2015 07:48 (GMT+01:00)
> To: James Users List <se...@james.apache.org>
> Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run
>
> Hi Bernd,
>
>  Our Sys Admin has NOT performed the following things while configuring
> james as a service.
>
> 1. Adding the below lines in phoenix.sh
>
> #chkconfig: 2345 80 05
> #description: James Mail Server
>
> 2. Chkconfig command
>
> chkconfig --add james
>
>
> They created only the link in /etc/init.d pointing to phoenix.sh. We can
> start and stop the service using the service command. Do you think not
> doing the above two steps will impact a running James in any manner? I am
> trying to understand the run levels as well.
>
> Thanks
> Mahesh
>
>
>
> On Mon, Mar 30, 2015 at 5:28 PM, Mahesh Sivarama Pillai <sr...@gmail.com>
> wrote:
>
> > If there is a clean shutdown through RemoteManager, it should be shown in
> > the log, right? The thing is, I don't see any entry in the console log
> > which says STOPPED. I am investigating and will keep you posted. Thanks
> > for the help so far.
> >
> > Thanks
> > Mahesh
> >
> > On Mon, Mar 30, 2015 at 2:48 AM, Bernd Waibel <BW...@intarsys.de> wrote:
> >
> >> Hi Mahesh
> >>
> >> Finding an hserr file would be a clear sign that something happened
> >> outside the VM.
> >> E.g. if you load a DLL or lib inside your Java code and the DLL produces
> >> a memory fault, then the VM may crash.
> >> If an hserr file is produced, the VM has crashed without writing a log or
> >> anything else. The log just ends.
> >> Not finding an hserr file means you need to look for something else.
> >> So I think it is not a crash.
> >>
> >> Another idea:
> >> In the config.xml you can configure a RemoteManager port and user.
> >> I am currently on holiday, so I could not look up the syntax.
> >> You can telnet to that port and send a shutdown command.
> >> Could something simple like that have happened?
> >>
> >> And about chkconfig:
> >> We had a system with James configured to run only in the runlevel "with
> >> GUI" (I think it was 5 or 6).
> >> And then a sysadmin switched the system to run "without GUI".
> >> So the switch to another runlevel just stopped James, with a clean
> >> shutdown.
> >> After that we paid careful attention to the runlevels.
> >> James needs to start after the network, and after the database if one is
> >> used.
> >> And it should stop accordingly as well.
> >>
> >> Greetings Bernd
> >>
> >>
> >> -------- Original Message --------
> >> From: Mahesh Sivarama Pillai <sr...@gmail.com>
> >> Date: 29.03.2015 07:58 (GMT+01:00)
> >> To: James Users List <se...@james.apache.org>
> >> Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run
> >>
> >> Thanks again Bernd... I couldn't find the hserr files under the temp or
> >> James directories. Considering we faced the Too Many Open Files issue,
> >> will it prevent the JVM from creating this file? I am clueless on this
> >> issue. No process killed James, no one stopped James, no OOM in the logs,
> >> no core dump :) :(
> >>
> >> Regarding the file system I will verify. As far as I know we have a NAS...
> >>
> >> On Sat, Mar 28, 2015 at 3:50 AM, Bernd Waibel <BW...@intarsys.de>
> >> wrote:
> >>
> >> > Hi Mahesh,
> >> >
> >> > Don't misunderstand: running out of file handles COULD lead to a memory
> >> > leak, consuming memory over time. But it does not NEED to.
> >> >
> >> > OOMs are normally shown in the log, as far as I know, but we got this
> >> > only for the heap memory.
> >> > OOMs normally happen when the heap memory reaches its limit, and yes,
> >> > we got this in the logs sometimes.
> >> > Every time I got an OOM in the log, I restarted the server, just to be
> >> > sure it keeps running.
> >> > So I do not have long-running servers with a lot of OOM errors, and
> >> > therefore no experience with that.
> >> >
> >> > But you could also run short of memory for the Java classes (native
> >> > area, method area), and I am not sure if this will show up in the log.
> >> > I never had this with James. I got this when running JIRA long ago, but
> >> > I cannot remember the log.
> >> >
> >> > The PID (process ID) is something handled by the Linux system. It is
> >> > outside James, and I think you won't find it in the log.
> >> > But the PID is created on startup (phoenix.sh), and may be logged by
> >> > the shell script somewhere, together with a time stamp.
> >> > But not in the James logs.
> >> >
> >> > If your sysadmins use a monitoring tool (like Nagios or Icinga), they
> >> > may monitor the memory.
> >> > You could also monitor the memory inside the VM using JMX, but this is
> >> > a little bit hard to set up.
> >> >
> >> > But anyway: the memory may NOT be the problem. So do not spend too much
> >> > time on that.
> >> >
> >> > If you can find a hserr*.pid file, the file will tell you the reason
> >> > for the "crash".
> >> >
> >> >
> >> > There is something else I remember, but with other software.
> >> > If the log file is stored on a file server (not a local directory) and
> >> > the file server reboots, you will lose the log.
> >> > We had a Java process which "died" because the file server was
> >> > rebooted at midnight and the Java process lost all mounted directories.
> >> > After that we made sure that the log directory is always local, and the
> >> > program directory too.
> >> > You may want to check whether your server uses mounted file systems.
> >> >
> >> >
> >> > Greetings
> >> > Bernd
> >> >
> >> > -----Original Message-----
> >> > From: Mahesh Sivarama Pillai [mailto:srmahe@gmail.com]
> >> > Sent: Friday, 27 March 2015 15:17
> >> > To: James Users List
> >> > Subject: Re: URGENT HELP: James 2.3.2 not responding after few days of run
> >> >
> >> > Hi Bernd,
> >> >
> >> > Thanks for the pointers. Let me ask the Sys Admin about these details.
> >> > By the way, will this memory leak be shown in the logs? I couldn't find
> >> > any OOM errors in any of the logs. When the issue happened, our team
> >> > restarted the server. It will create a new PID, right? Is there a way
> >> > we can see the old PIDs from the James logs?
> >> >
> >> > Thanks
> >> > Mahesh
> >> >
> >> > On Fri, Mar 27, 2015 at 7:33 PM, Bernd Waibel <BW...@intarsys.de> wrote:
> >> >
> >> > > Hi Mahesh
> >> > >
> >> > > Too many open files may result in a memory leak.
> >> > > Could the sysadmin monitor the memory?
> >> > >
> >> > > It is a Java process. Is there a file called hserr*.pid? That is
> >> > > produced if the VM crashes.
> >> > >
> >> > > Ciao
> >> > > Bernd
> >> > >
> >> > >
> >> > > -------- Original Message --------
> >> > > From: Mahesh Sivarama Pillai <sr...@gmail.com>
> >> > > Date: 27.03.2015 14:18 (GMT+01:00)
> >> > > To: James Users List <se...@james.apache.org>
> >> > > Subject: URGENT HELP: James 2.3.2 not responding after few days of run
> >> > >
> >> > > Hi,
> >> > >
> >> > > I need urgent help. We have rolled out James 2.3.2 to production
> >> > > for our email processing application. I see that James is getting
> >> > > shut down (no trace in the phoenix.console) after a few days of
> >> > > running. It processes around 100K emails a day and sends a good
> >> > > number of notifications through RemoteDelivery.
> >> > >
> >> > > I have checked the logs but I couldn't find any reason for this
> >> > > abnormal shutdown. I have seen a couple of "Too Many Open Files"
> >> > > errors in the smtpserver log and the spoolmanager log. But I think
> >> > > those will not bring down the server. Will they? I am not sure if
> >> > > James is killed by some other Linux process.
> >> > > James is running under a user (e.g. james) account with sudo access
> >> > > to run on port 25. Since I don't have root access, which areas
> >> > > should I look at to figure out what the problem is? If I want to
> >> > > talk to the Sys Admin, what information should I ask him/her to
> >> > > gather?
> >> > >
> >> > > James is running on a 4 CPU machine with 8GB RAM. The heap size of
> >> > > James is set to 4GB.
> >> > >
> >> > > I have configured James to run as a service in Linux. I am not sure
> >> > > if our Sys Admin ran the chkconfig command. Is there any impact of
> >> > > not running this command? Please provide your inputs as early as
> >> > > possible.
> >> > >
> >> > >
> >> > > Thanks
> >> > > Mahesh
> >> > >
> >> >
> >> >
> >>
> >
> >
>
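
The two chkconfig steps discussed in the quoted thread above (tag lines in
the init script, then "chkconfig --add") could be checked with something like
the sketch below. This is only an illustration: the helper name
has_chkconfig_header is hypothetical, /etc/init.d/james is assumed to be the
link mentioned in the thread, and chkconfig is only available on Red Hat style
systems.

```shell
#!/bin/sh
# Sketch only: has_chkconfig_header is a hypothetical helper, not part of James.
# It succeeds when FILE contains both comment tags that chkconfig reads,
# each at the start of its own line:
#   # chkconfig: <runlevels> <start priority> <stop priority>
#   # description: <text>
has_chkconfig_header() {
    grep -Eq '^#[[:space:]]*chkconfig:[[:space:]]*[0-9-]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+' "$1" &&
        grep -Eq '^#[[:space:]]*description:' "$1"
}

# Example use (assumed path; run as root on a system that has chkconfig):
#   has_chkconfig_header /etc/init.d/james && chkconfig --add james
#   chkconfig --list james    # shows the runlevels in which james is on or off
```

Note that the check fails when both tags are fused onto a single line, since
the description tag must begin its own comment line.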

Re: URGENT HELP: James 2.3.2 not responding after few days of run

Posted by Mahesh Sivarama Pillai <sr...@gmail.com>.
I will start another thread with a relevant subject..

Thanks
Mahesh
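
For the two checks raised earlier in this thread (a JVM crash file and the
"Too Many Open Files" errors), a minimal sketch might look like the following.
Both helper names are hypothetical. The sketch assumes a HotSpot JVM, which by
default names its fatal error log hs_err_pid<pid>.log, and a Linux system with
/proc. The directory and the process pattern in the usage comments are
assumptions as well.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# List HotSpot fatal error logs located directly inside DIR.
find_hs_err() {
    find "$1" -maxdepth 1 -type f -name 'hs_err_pid*.log'
}

# Print the number of file descriptors currently open in process PID.
# Relies on the Linux /proc file system.
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Example use (assumed locations; reading /proc/<pid>/fd needs the owning
# user or root):
#   find_hs_err /tmp
#   count_open_fds "$(pgrep -f phoenix | head -n 1)"
#   ulimit -n        # open-file limit that applies to the current shell
```

Comparing the descriptor count against the limit over a few days would show
whether the process is steadily approaching "Too Many Open Files".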
