Posted to dev@cloudstack.apache.org by Anirban Chakraborty <ab...@juniper.net> on 2014/03/08 08:15:30 UTC

system vm disk space issue in ACS 4.3

Hi All,

I am seeing that the system VM disk has no space left after running for a few days. The CloudStack UI shows the agent in v-2-VM in alert state, while the agent state of s-1-VM shows blank (a hyphen in the UI).
Both system VMs are running and ssh-able from the host. The log in s-1-VM shows the following errors:

root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
/var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.2:java.io.IOException: No space left on device

whereas the logs in v-1-VM show:
/var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available

It looks like the cloud agent is filling up the log, which leads to the disk-full state.

Is this a known issue? Thanks.

Anirban
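
To narrow down which files are actually consuming the space, something along these lines could be run inside the affected system VM. This is only a sketch: `largest_files` is a hypothetical helper name, and the default directory and count are assumptions rather than anything shipped with CloudStack.

```shell
#!/bin/sh
# Sketch: list the largest files under a directory, biggest first.
# largest_files is a hypothetical helper, not part of CloudStack.
largest_files() {
    dir=${1:-/var/log}   # directory to inspect (assumed default)
    count=${2:-10}       # how many entries to show (assumed default)
    # du -k prints the size of each file in KiB; sort numerically, largest first.
    # -xdev keeps find on the same filesystem as the starting directory.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `df -h /var` followed by `largest_files /var/log/cloud 5` would show how full the partition is and which of the cloud.out* files dominate it.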

Re: system vm disk space issue in ACS 4.3

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Would've been nice to raise a bug about it :)

On 3/8/14, 10:32 AM, "Marcus" <sh...@gmail.com> wrote:

>Yeah, I've just seen on busy systems where even with log rotation working
>properly the little space left in /var after OS files is barely enough, for
>example the conntrackd log on a busy VPC. We actually ended up rolling our
>own system vm; the existing image has plenty of space, it's just locked up
>in other partitions.
>On Mar 8, 2014 8:58 AM, "Rajesh Battala" <ra...@citrix.com>
>wrote:
>
>> Yes, only 435MB is available for /var. We can increase the space also.
>> But we need to find the root cause: which services are causing /var
>> to fill up.
>> Can you please find out and post which log files are taking up the most
>> space in /var?
>>
>> Thanks
>> Rajesh Battala
>>
>> -----Original Message-----
>> From: Marcus [mailto:shadowsor@gmail.com]
>> Sent: Saturday, March 8, 2014 8:19 PM
>> To: dev@cloudstack.apache.org
>> Subject: RE: system vm disk space issue in ACS 4.3
>>
>> Perhaps there's a new service. I know in the past we've seen issues with
>> this, specifically the conntrackd log. I think the cloud logs weren't
>> getting rolled either, but I thought it was all fixed.
>>
>> There's also simply not a ton of space on /var. I wish we would go back
>> to just having one partition, because the current layout orphans lots of
>> free space in other filesystems.
>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com>
>> wrote:
>>
>> > AFAIK, log rotation is enabled in the systemvm.
>> > Can you check whether the logs are getting zipped?


Re: system vm disk space issue in ACS 4.3

Posted by Saurav Lahiri <sa...@sungard.com>.
If you look at the code for run.sh, you will see that it is a while loop
that calls _run.sh. If _run.sh gets terminated while the java process is
still running, future runs of _run.sh will attempt to start the java
process and will fail. Each time it fails, a whole lot of log messages get
logged to cloud.out by the /etc/init.d/cloud script. I guess the fix I am
working on should also address this issue (by preventing the scripts from
logging to cloud.out).

Thanks
Saurav
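
As an illustration of what keeping the wrapper scripts from logging to cloud.out could look like, here is a minimal sketch. It is not the actual CLOUDSTACK-6258 patch: the variable `CLOUD_DEBUG` and the function `enable_trace_if_debug` are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Sketch only: CLOUD_DEBUG and enable_trace_if_debug are hypothetical names,
# not taken from the real run.sh/_run.sh or from the CLOUDSTACK-6258 patch.
enable_trace_if_debug() {
    if [ "${CLOUD_DEBUG:-false}" = "true" ]; then
        # Shell tracing writes every executed command to stderr; when the
        # caller redirects stderr to cloud.out, the trace lands in that file.
        set -x
    else
        # Tracing off: the wrapper script itself stays quiet.
        set +x
    fi
}
```

A wrapper script would call `enable_trace_if_debug` near the top instead of an unconditional `set -x`, so the trace output only appears when someone explicitly sets `CLOUD_DEBUG=true`.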


On Thu, Mar 20, 2014 at 10:29 PM, Sunil Bakhru <sb...@juniper.net> wrote:

> The fix for 6258 will contain the log size and prevent it from hogging the
> root partition.
>
> However, we do see the following 'ERROR' log which fills up our system VM
> log file.  Has anybody seen this?
> Any pointers on what could possibly be wrong?
>
>
> 2014-03-20 08:40:54,395 ERROR [cloud.agent.AgentShell] (main:null) Unable
> to start agent: Java process is being started twice.  If this is not true,
> remove /var/run/agent.SecStorage.pid
> 2014-03-20 08:41:05,359 INFO  [cloud.agent.AgentShell] (main:null) Agent
> started
> 2014-03-20 08:41:05,364 INFO  [cloud.agent.AgentShell] (main:null)
> Implementation Version is 4.3.0
> 2014-03-20 08:41:05,364 INFO  [cloud.agent.AgentShell] (main:null)
> agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties
> 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
> property: instance
> 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
> property: NfsSecondaryStorageResource.id
> 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
> property: resource
> 2014-03-20 08:41:05,370 INFO  [cloud.agent.AgentShell] (main:null)
> Defaulting to using properties file for storage
> 2014-03-20 08:41:05,371 INFO  [cloud.agent.AgentShell] (main:null)
> Defaulting to the constant time backoff algorithm
> 2014-03-20 08:41:05,380 INFO  [cloud.utils.LogUtils] (main:null) log4j
> configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml
> 2014-03-20 08:41:05,393 DEBUG [cloud.agent.AgentShell] (main:null)
> Checking to see if agent.SecStorage.pid exists.
> 2014-03-20 08:41:05,396 DEBUG [cloud.utils.ProcessUtil] (main:null)
> environment.properties could not be opened
> 2014-03-20 08:41:05,403 DEBUG [cloud.utils.ProcessUtil] (main:null)
> Executing: bash -c ps -p 3469
> 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null)
> Execution is successful.
> 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null)   PID
> TTY          TIME CMD
>  3469 ?        00:00:36 java
>
> 2014-03-20 08:41:05,411 ERROR [cloud.agent.AgentShell] (main:null) Unable
> to start agent: Java process is being started twice.  If this is not true,
> remove /var/run/agent.SecStorage.pid
>
>
>
> Appreciate the help.
>
> Thanks,
> Sunil

Re: system vm disk space issue in ACS 4.3

Posted by Sunil Bakhru <sb...@juniper.net>.
The fix for 6258 will contain the log size and prevent it from hogging the
root partition. 

However, we do see the following 'ERROR' log which fills up our system VM
log file.  Has anybody seen this?
Any pointers on what could possibly be wrong?


2014-03-20 08:40:54,395 ERROR [cloud.agent.AgentShell] (main:null) Unable
to start agent: Java process is being started twice.  If this is not true,
remove /var/run/agent.SecStorage.pid
2014-03-20 08:41:05,359 INFO  [cloud.agent.AgentShell] (main:null) Agent
started
2014-03-20 08:41:05,364 INFO  [cloud.agent.AgentShell] (main:null)
Implementation Version is 4.3.0
2014-03-20 08:41:05,364 INFO  [cloud.agent.AgentShell] (main:null)
agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties
2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
property: instance
2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
property: NfsSecondaryStorageResource.id
2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found
property: resource
2014-03-20 08:41:05,370 INFO  [cloud.agent.AgentShell] (main:null)
Defaulting to using properties file for storage
2014-03-20 08:41:05,371 INFO  [cloud.agent.AgentShell] (main:null)
Defaulting to the constant time backoff algorithm
2014-03-20 08:41:05,380 INFO  [cloud.utils.LogUtils] (main:null) log4j
configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml
2014-03-20 08:41:05,393 DEBUG [cloud.agent.AgentShell] (main:null)
Checking to see if agent.SecStorage.pid exists.
2014-03-20 08:41:05,396 DEBUG [cloud.utils.ProcessUtil] (main:null)
environment.properties could not be opened
2014-03-20 08:41:05,403 DEBUG [cloud.utils.ProcessUtil] (main:null)
Executing: bash -c ps -p 3469
2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null)
Execution is successful.
2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null)   PID
TTY          TIME CMD
 3469 ?        00:00:36 java

2014-03-20 08:41:05,411 ERROR [cloud.agent.AgentShell] (main:null) Unable
to start agent: Java process is being started twice.  If this is not true,
remove /var/run/agent.SecStorage.pid



Appreciate the help.

Thanks,
Sunil 
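
The log above shows the agent finding a pid file and confirming with `ps -p` that the recorded process is still alive. A rough way to repeat that check by hand is sketched below. `pidfile_status` is a hypothetical helper, and it uses `kill -0` instead of `ps -p`, which assumes it is run as root or as the user that owns the process.

```shell
#!/bin/sh
# Sketch: report whether a pid file points at a live process.
# pidfile_status is a hypothetical helper, not part of CloudStack.
pidfile_status() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only tests that the process exists
    # and that we are allowed to signal it.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}
```

Running `pidfile_status /var/run/agent.SecStorage.pid` would print `running` in the situation logged above, since PID 3469 is a live java process. In that case the pid file is not stale, and the repeated start attempts are the part that needs explaining.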






RE: system vm disk space issue in ACS 4.3

Posted by Rajesh Battala <ra...@citrix.com>.
Great. Post your patch at reviews.apache.org 

Thanks
Rajesh Battala

-----Original Message-----
From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com] 
Sent: Wednesday, March 19, 2014 8:49 PM
To: dev@cloudstack.apache.org
Subject: Re: system vm disk space issue in ACS 4.3

Thanks Rajesh. I have created a jira ticket for this https://issues.apache.org/jira/browse/CLOUDSTACK-6258. Will send in the fix for review in a couple of days.

Thanks
Saurav


On Wed, Mar 19, 2014 at 8:03 PM, Rajesh Battala
<ra...@citrix.com> wrote:

> Can you please file a bug and send your fix for review.
>
> Thanks
> Rajesh Battala
>
> -----Original Message-----
> From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com]
> Sent: Wednesday, March 19, 2014 7:20 PM
> To: dev@cloudstack.apache.org
> Subject: Re: system vm disk space issue in ACS 4.3
>
> The problem appears to be the start function in the /etc/init.d/cloud
> service for console proxy.
> More specifically, the following line also writes to
> /var/log/cloud/cloud.out:
>
>
> ----------------------------------------------------------------------
> (cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out 2>&1 & )
> ----------------------------------------------------------------------
>
> Since run.sh calls _run.sh and both have "set -x" enabled, in certain
> situations they can keep logging messages to cloud.out without being
> aware of the settings in log4j-cloud.xml.
>
>
> One way to fix that could be that run.sh and _run.sh would log to
> cloud.out only if a debug flag was set to true; otherwise only the
> java process would write to cloud.out and log4j would respect the
> settings in log4j-cloud.xml.
>
>
> Thanks
> Saurav
>
>
>
> On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri
> <saurav.lahiri@sungard.com> wrote:
>
> > Could it have something to do with the RollingFileAppender that is
> > being used?
> > The following link,
> > http://apache-logging.6191.n7.nabble.com/RollingFileAppender-not-working-consistently-td8582.html
> > appears to be a bit outdated, but it describes more or less the same
> > problem that we are seeing.
> >
> >
> > In our environment that is what we have been seeing for some time on
> > the console proxy. The root filesystem fills up, with the cloud.out.*
> > files occupying all the space. This happens pretty frequently and we
> > have to regularly recycle the console proxy to resolve this issue.
> >
> >
> > As seen below, cloud.out.2 should not have exceeded 10MB but it 
> > stands at 217MB now.
> >
> > root@v-zzzz-VM:/var/log/cloud# ls -alh
> > drwxr-xr-x 2 root root 4.0K Mar 17 14:57 .
> > drwxr-xr-x 8 root root 4.0K Mar 17 15:01 ..
> > -rw-r--r-- 1 root root    0 Mar 12 18:18 api-server.log
> > -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out
> > -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1
> > -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2
> >
> > root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out
> > sleep       649 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > sleep       649 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2312 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2312 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2339 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2339 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2786 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > bash       2786 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > java       2805 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > java       2805 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
> > java       2805 root  116w      REG      202,1    319382     181769 /var/log/cloud/cloud.out
> >
> > Thanks
> > Saurav
> >
> >
> > On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal < 
> > Chiradeep.Vittal@citrix.com> wrote:
> >
> >> Yes, it was deliberate. I can¹t find the discussion, but it 
> >> revolved around a security best practice of having separate 
> >> partitions for /, /swap, home directories
> >>
> >>
> >> On 3/10/14, 11:35 AM, "Marcus" <sh...@gmail.com> wrote:
> >>
> >> >There have been several raised, actually regarding /var/log.  As 
> >> >for the system vm partitioning, it was explicitly changed from 
> >> >single to multiple partitions last year. I have no idea why, but I 
> >> >generally don't file bugs without community discussion on things 
> >> >that seem deliberate.
> >> >
> >> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <sh...@gmail.com> wrote:
> >> >> Yeah, I've just seen on busy systems where even with log 
> >> >>rotation working  properly the little space left in var after OS 
> >> >>files is barely enough, for  example the conntrackd log on a busy 
> >> >>VPC. We actually ended up rolling our  own system vm, the 
> >> >>existing image has plenty of space, its just locked up in  other partitions.
> >> >>
> >> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala"
> >> >><ra...@citrix.com>
> >> >>wrote:
> >> >>>
> >> >>> Yes, only 435MB is available for /var . we can increase the 
> >> >>> space
> >> also.
> >> >>> But we need to find out the root cause which services are 
> >> >>>causing the /var  to fill up.
> >> >>> Can you please find out and post which log files are taking up 
> >> >>>more space  in /var
> >> >>>
> >> >>> Thanks
> >> >>> Rajesh Battala
> >> >>>
> >> >>> -----Original Message-----
> >> >>> From: Marcus [mailto:shadowsor@gmail.com]
> >> >>> Sent: Saturday, March 8, 2014 8:19 PM
> >> >>> To: dev@cloudstack.apache.org
> >> >>> Subject: RE: system vm disk space issue in ACS 4.3
> >> >>>
> >> >>> Perhaps there's a new service. I know in the past we've seen 
> >> >>>issues with  this , specifically the conntrackd log. I think the 
> >> >>>cloud logs weren't  getting rolled either, but I thought it was 
> >> >>>all fixed.
> >> >>>
> >> >>> There's also simply not a ton of space on /var, I wish we would 
> >> >>>go back to  just having one partition because it orphans lots of 
> >> >>>free space in other  filesystems.
> >> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala"
> >> >>><ra...@citrix.com>
> >> >>> wrote:
> >> >>>
> >> >>> > AFAIK, log roation is enabled in the systemvm.
> >> >>> > Can you check whether the logs are getting zipped .?
> >> >>> >
> >> >>> > -----Original Message-----
> >> >>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
> >> >>> > Sent: Saturday, March 8, 2014 12:46 PM
> >> >>> > To: dev@cloudstack.apache.org
> >> >>> > Subject: system vm disk space issue in ACS 4.3
> >> >>> >
> >> >>> > Hi All,
> >> >>> >
> >> >>> > I am seeing system vm disk has no space left after running 
> >> >>> > for few
> >> >>>days.
> >> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while 
> >> >>> > agent state of s-1-VM shows blank (hyphen in the UI).
> >> >>> > Both the system vms are running and ssh-able from the host. 
> >> >>> > The log
> >> >>>in
> >> >>> > s-1-Vm shows following errors:
> >> >>> >
> >> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left 
> >> >>> > on device
> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left 
> >> >>> > on device
> >> >>> >
> >> >>> > whereas logs in v-1-VM shows
> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left 
> >> >>> > on device
> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left 
> >> >>> > on device
> >> >>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO
> >> CSExceptionErrorCode:87
> >> >>> > - Could not find exception:
> >> >>> > com.cloud.exception.AgentControlChannelException
> >> >>> > in error code list for exceptions
> >> >>> >
> >> >>> >
> >>
> >> >>>/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChann
> >> >>>el
> >> >>>Except
> >> >>>ion:
> >> >>> > Unable to post agent control request as link is not available
> >> >>> >
> >> >>> > Looks like cloud agent is filling up the log, which is 
> >> >>> > leading to
> >> the
> >> >>> > disk full state.
> >> >>> >
> >> >>> > Is this a known issue? Thanks.
> >> >>> >
> >> >>> > Anirban
> >> >>> >
> >>
> >>
> >>
> >
>
>

Re: system vm disk space issue in ACS 4.3

Posted by Saurav Lahiri <sa...@sungard.com>.
Thanks, Rajesh. I have created a JIRA ticket for this issue:
https://issues.apache.org/jira/browse/CLOUDSTACK-6258
I will send the fix for review in a couple of days.

Thanks
Saurav


On Wed, Mar 19, 2014 at 8:03 PM, Rajesh Battala <ra...@citrix.com> wrote:

> Can you please file a bug and send your fix for review?
>
> Thanks
> Rajesh Battala

RE: system vm disk space issue in ACS 4.3

Posted by Rajesh Battala <ra...@citrix.com>.
Can you please file a bug and send your fix for review?

Thanks
Rajesh Battala

-----Original Message-----
From: Saurav Lahiri [mailto:saurav.lahiri@sungard.com] 
Sent: Wednesday, March 19, 2014 7:20 PM
To: dev@cloudstack.apache.org
Subject: Re: system vm disk space issue in ACS 4.3

The problem appears to be in the start function of the /etc/init.d/cloud service for the console proxy.
More specifically, the following line also writes to /var/log/cloud/cloud.out:

----------------------------------------------------------------------------------------------------------------------------------
(cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out 2>&1 & )
----------------------------------------------------------------------------------------------------------------------------------

Since run.sh calls _run.sh and both have "set -x" enabled, in certain situations they can keep logging messages to cloud.out, bypassing the settings in log4j-cloud.xml.


One way to fix this would be for run.sh and _run.sh to log to cloud.out only if a debug flag is set to true. Otherwise only the java process would write to cloud.out, and log4j would respect the settings in log4j-cloud.xml.


Thanks
Saurav




Re: system vm disk space issue in ACS 4.3

Posted by Saurav Lahiri <sa...@sungard.com>.
The problem appears to be in the start function of the /etc/init.d/cloud
service for the console proxy.
More specifically, the following line also writes to
/var/log/cloud/cloud.out:

----------------------------------------------------------------------------------------------------------------------------------
(cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out 2>&1 & )
----------------------------------------------------------------------------------------------------------------------------------

Since run.sh calls _run.sh and both have "set -x" enabled, in certain
situations they can keep logging messages to cloud.out, bypassing the
settings in log4j-cloud.xml.


One way to fix this would be for run.sh and _run.sh to log to cloud.out
only if a debug flag is set to true. Otherwise only the java process would
write to cloud.out, and log4j would respect the settings in
log4j-cloud.xml.
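
A rough sketch of the debug-flag idea above, not the actual patch. The
variable name CLOUD_DEBUG and the helper enable_trace_if_debug are made up
for illustration; they are not existing CloudStack settings. The point is
only that "set -x" would be switched on conditionally instead of always:

```shell
#!/bin/sh
# Hypothetical sketch: CLOUD_DEBUG and enable_trace_if_debug are assumed
# names for illustration, not part of the shipped run.sh/_run.sh.

# Turn on shell command tracing only when debugging was requested.
# With tracing off, the scripts no longer echo every command they run
# into the redirected cloud.out.
enable_trace_if_debug() {
    if [ "${CLOUD_DEBUG:-false}" = "true" ]; then
        set -x
    fi
}

# In run.sh and _run.sh the unconditional "set -x" would become:
#   enable_trace_if_debug
```

With CLOUD_DEBUG unset or false the scripts stay quiet; exporting
CLOUD_DEBUG=true before starting the service would bring the trace output
back when it is needed for troubleshooting.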


Thanks
Saurav




Re: system vm disk space issue in ACS 4.3

Posted by Saurav Lahiri <sa...@sungard.com>.
Could it have something to do with the RollingFileAppender that is being
used? The following rollingfileappender link
<http://apache-logging.6191.n7.nabble.com/RollingFileAppender-not-working-consistently-td8582.html>
appears to be a bit outdated, but it describes more or less the same
problem that we are seeing.


In our environment this is what we have been seeing for some time on the
console proxy. The root filesystem fills up, with the cloud.out.* files
occupying all the space. This happens pretty frequently and we have to
regularly recycle the console proxy to resolve the issue.


As seen below, cloud.out.2 should not have exceeded 10MB but it stands at
217MB now.

root@v-zzzz-VM:/var/log/cloud# ls -alh
drwxr-xr-x 2 root root 4.0K Mar 17 14:57 .
drwxr-xr-x 8 root root 4.0K Mar 17 15:01 ..
-rw-r--r-- 1 root root    0 Mar 12 18:18 api-server.log
-rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out
-rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1
-rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2

root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out
sleep       649 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
sleep       649 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2312 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2312 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2339 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2339 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2786 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
bash       2786 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
java       2805 root    1w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
java       2805 root    2w      REG      202,1 226122291     181737 /var/log/cloud/cloud.out.2
java       2805 root  116w      REG      202,1    319382     181769 /var/log/cloud/cloud.out
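
In the lsof output above, fds 1 and 2 of sleep, bash and java are all open
on cloud.out.2, while only java's fd 116 is open on cloud.out. One possible
explanation, assuming the rotation is done by renaming the file, is that a
descriptor opened before the rename keeps pointing at the same file under
its new name. A small stand-alone sketch of that behaviour (the file names
mirror the ones above, and the "rotation" is simulated by hand with mv):

```shell
#!/bin/sh
# Sketch only: simulates a rename-based rotation in a scratch directory.
demo_dir=$(mktemp -d)

# Open fd 3 on cloud.out, much as "> cloud.out 2>&1" does for fds 1 and 2.
exec 3> "$demo_dir/cloud.out"
echo "written before rotation" >&3

# What a rename-based rotation does: move the file aside, start a fresh one.
mv "$demo_dir/cloud.out" "$demo_dir/cloud.out.1"
: > "$demo_dir/cloud.out"

# fd 3 still refers to the renamed file, so this line lands in cloud.out.1.
echo "written after rotation" >&3
exec 3>&-

# $(( ... )) strips any padding that wc adds on some platforms.
rotated_lines=$(( $(wc -l < "$demo_dir/cloud.out.1") ))
fresh_bytes=$(( $(wc -c < "$demo_dir/cloud.out") ))
echo "cloud.out.1 lines: $rotated_lines, cloud.out bytes: $fresh_bytes"

rm -rf "$demo_dir"
```

Both lines end up in the renamed file and the new cloud.out stays empty,
which would match a rotated file growing well past its size limit.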

Thanks
Saurav


On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:

> Yes, it was deliberate. I can't find the discussion, but it revolved
> around a security best practice of having separate partitions for /,
> /swap, home directories
>
>
> On 3/10/14, 11:35 AM, "Marcus" <sh...@gmail.com> wrote:
>
> >There have been several raised, actually regarding /var/log.  As for
> >the system vm partitioning, it was explicitly changed from single to
> >multiple partitions last year. I have no idea why, but I generally
> >don't file bugs without community discussion on things that seem
> >deliberate.
> >
> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <sh...@gmail.com> wrote:
> >> Yeah, I've just seen on busy systems where even with log rotation
> >> working properly the little space left in var after OS files is barely
> >> enough, for example the conntrackd log on a busy VPC. We actually ended
> >> up rolling our own system vm, the existing image has plenty of space,
> >> it's just locked up in other partitions.
> >>
> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" <ra...@citrix.com>
> >>wrote:
> >>>
> >>> Yes, only 435MB is available for /var. We can increase the space also.
> >>> But we need to find out the root cause: which services are causing
> >>> /var to fill up.
> >>> Can you please find out and post which log files are taking up more
> >>> space in /var?
> >>>
> >>> Thanks
> >>> Rajesh Battala
> >>>
> >>> -----Original Message-----
> >>> From: Marcus [mailto:shadowsor@gmail.com]
> >>> Sent: Saturday, March 8, 2014 8:19 PM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: RE: system vm disk space issue in ACS 4.3
> >>>
> >>> Perhaps there's a new service. I know in the past we've seen issues
> >>>with
> >>> this , specifically the conntrackd log. I think the cloud logs weren't
> >>> getting rolled either, but I thought it was all fixed.
> >>>
> >>> There's also simply not a ton of space on /var, I wish we would go
> >>>back to
> >>> just having one partition because it orphans lots of free space in
> >>>other
> >>> filesystems.
> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com>
> >>> wrote:
> >>>
> >>> > AFAIK, log rotation is enabled in the systemvm.
> >>> > Can you check whether the logs are getting zipped?
> >>> >
> >>> > -----Original Message-----
> >>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
> >>> > Sent: Saturday, March 8, 2014 12:46 PM
> >>> > To: dev@cloudstack.apache.org
> >>> > Subject: system vm disk space issue in ACS 4.3
> >>> >
> >>> > Hi All,
> >>> >
> >>> > I am seeing system vm disk has no space left after running for few
> >>>days.
> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while agent
> >>> > state of s-1-VM shows blank (hyphen in the UI).
> >>> > Both the system vms are running and ssh-able from the host. The log
> >>>in
> >>> > s-1-Vm shows following errors:
> >>> >
> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
> >>> > device
> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
> >>> > device
> >>> >
> >>> > whereas logs in v-1-VM shows
> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
> >>> > device
> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
> >>> > device
> >>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87
> >>> > - Could not find exception:
> >>> > com.cloud.exception.AgentControlChannelException
> >>> > in error code list for exceptions
> >>> >
> >>> >
> >>> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
> >>> > Unable to post agent control request as link is not available
> >>> >
> >>> > Looks like cloud agent is filling up the log, which is leading to the
> >>> > disk full state.
> >>> >
> >>> > Is this a known issue? Thanks.
> >>> >
> >>> > Anirban
> >>> >
>
>
>

Re: system vm disk space issue in ACS 4.3

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Yes, it was deliberate. I can't find the discussion, but it revolved
around a security best practice of having separate partitions for /,
/swap, home directories


On 3/10/14, 11:35 AM, "Marcus" <sh...@gmail.com> wrote:

>There have been several raised, actually regarding /var/log.  As for
>the system vm partitioning, it was explicitly changed from single to
>multiple partitions last year. I have no idea why, but I generally
>don't file bugs without community discussion on things that seem
>deliberate.
>
>On Sat, Mar 8, 2014 at 11:32 AM, Marcus <sh...@gmail.com> wrote:
>> Yeah, I've just seen on busy systems where even with log rotation
>>working
>> properly the little space left in var after OS files is barely enough,
>>for
>> example the conntrackd log on a busy VPC. We actually ended up rolling
>>our
>> own system vm, the existing image has plenty of space, its just locked
>>up in
>> other partitions.
>>
>> On Mar 8, 2014 8:58 AM, "Rajesh Battala" <ra...@citrix.com>
>>wrote:
>>>
>>> Yes, only 435MB is available for /var . we can increase the space also.
>>> But we need to find out the root cause which services are causing the
>>>/var
>>> to fill up.
>>> Can you please find out and post which log files are taking up more
>>>space
>>> in /var
>>>
>>> Thanks
>>> Rajesh Battala
>>>
>>> -----Original Message-----
>>> From: Marcus [mailto:shadowsor@gmail.com]
>>> Sent: Saturday, March 8, 2014 8:19 PM
>>> To: dev@cloudstack.apache.org
>>> Subject: RE: system vm disk space issue in ACS 4.3
>>>
>>> Perhaps there's a new service. I know in the past we've seen issues
>>>with
>>> this , specifically the conntrackd log. I think the cloud logs weren't
>>> getting rolled either, but I thought it was all fixed.
>>>
>>> There's also simply not a ton of space on /var, I wish we would go
>>>back to
>>> just having one partition because it orphans lots of free space in
>>>other
>>> filesystems.
>>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com>
>>> wrote:
>>>
>>> > AFAIK, log rotation is enabled in the systemvm.
>>> > Can you check whether the logs are getting zipped?
>>> >
>>> > -----Original Message-----
>>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
>>> > Sent: Saturday, March 8, 2014 12:46 PM
>>> > To: dev@cloudstack.apache.org
>>> > Subject: system vm disk space issue in ACS 4.3
>>> >
>>> > Hi All,
>>> >
>>> > I am seeing system vm disk has no space left after running for few
>>>days.
>>> > Cloudstack UI shows the agent in v-2-VM in alert state, while agent
>>> > state of s-1-VM shows blank (hyphen in the UI).
>>> > Both the system vms are running and ssh-able from the host. The log
>>>in
>>> > s-1-Vm shows following errors:
>>> >
>>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>>> > device
>>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>>> > device
>>> >
>>> > whereas logs in v-1-VM shows
>>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>>> > device
>>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>>> > device
>>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87
>>> > - Could not find exception:
>>> > com.cloud.exception.AgentControlChannelException
>>> > in error code list for exceptions
>>> >
>>> > 
>>> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
>>> > Unable to post agent control request as link is not available
>>> >
>>> > Looks like cloud agent is filling up the log, which is leading to the
>>> > disk full state.
>>> >
>>> > Is this a known issue? Thanks.
>>> >
>>> > Anirban
>>> >


Re: system vm disk space issue in ACS 4.3

Posted by Marcus <sh...@gmail.com>.
There have been several raised, actually regarding /var/log.  As for
the system vm partitioning, it was explicitly changed from single to
multiple partitions last year. I have no idea why, but I generally
don't file bugs without community discussion on things that seem
deliberate.

On Sat, Mar 8, 2014 at 11:32 AM, Marcus <sh...@gmail.com> wrote:
> Yeah, I've just seen on busy systems where even with log rotation working
> properly the little space left in var after OS files is barely enough, for
> example the conntrackd log on a busy VPC. We actually ended up rolling our
> own system vm, the existing image has plenty of space, its just locked up in
> other partitions.
>
> On Mar 8, 2014 8:58 AM, "Rajesh Battala" <ra...@citrix.com> wrote:
>>
>> Yes, only 435MB is available for /var . we can increase the space also.
>> But we need to find out the root cause which services are causing the /var
>> to fill up.
>> Can you please find out and post which log files are taking up more space
>> in /var
>>
>> Thanks
>> Rajesh Battala
>>
>> -----Original Message-----
>> From: Marcus [mailto:shadowsor@gmail.com]
>> Sent: Saturday, March 8, 2014 8:19 PM
>> To: dev@cloudstack.apache.org
>> Subject: RE: system vm disk space issue in ACS 4.3
>>
>> Perhaps there's a new service. I know in the past we've seen issues with
>> this , specifically the conntrackd log. I think the cloud logs weren't
>> getting rolled either, but I thought it was all fixed.
>>
>> There's also simply not a ton of space on /var, I wish we would go back to
>> just having one partition because it orphans lots of free space in other
>> filesystems.
>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com>
>> wrote:
>>
>> > AFAIK, log rotation is enabled in the systemvm.
>> > Can you check whether the logs are getting zipped?
>> >
>> > -----Original Message-----
>> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
>> > Sent: Saturday, March 8, 2014 12:46 PM
>> > To: dev@cloudstack.apache.org
>> > Subject: system vm disk space issue in ACS 4.3
>> >
>> > Hi All,
>> >
>> > I am seeing system vm disk has no space left after running for few days.
>> > Cloudstack UI shows the agent in v-2-VM in alert state, while agent
>> > state of s-1-VM shows blank (hyphen in the UI).
>> > Both the system vms are running and ssh-able from the host. The log in
>> > s-1-Vm shows following errors:
>> >
>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>> > device
>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>> > device
>> >
>> > whereas logs in v-1-VM shows
>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>> > device
>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>> > device
>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87
>> > - Could not find exception:
>> > com.cloud.exception.AgentControlChannelException
>> > in error code list for exceptions
>> >
>> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
>> > Unable to post agent control request as link is not available
>> >
>> > Looks like cloud agent is filling up the log, which is leading to the
>> > disk full state.
>> >
>> > Is this a known issue? Thanks.
>> >
>> > Anirban
>> >

RE: system vm disk space issue in ACS 4.3

Posted by Marcus <sh...@gmail.com>.
Yeah, I've just seen busy systems where, even with log rotation working
properly, the little space left in /var after OS files is barely enough, for
example the conntrackd log on a busy VPC. We actually ended up rolling our
own system vm; the existing image has plenty of space, it's just locked up
in other partitions.
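
To see how much free space sits on each partition, a minimal sketch along these lines may help. The function name `free_by_mount` is made up for illustration, and it assumes mount points without spaces in their names:

```shell
# Hypothetical helper: print "<available-KB> <mount-point>" for every
# mounted filesystem, with the most free space first.
free_by_mount() {
    # -P: POSIX one-line-per-filesystem output; -k: sizes in 1024-byte blocks.
    # In that format column 4 is the available space and column 6 the mount point.
    df -Pk 2>/dev/null | awk 'NR > 1 { print $4, $6 }' | sort -rn
}

# Example:
#   free_by_mount
```

A small /var near the bottom of that list, with much larger numbers on the other mount points, would match the situation described above.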
On Mar 8, 2014 8:58 AM, "Rajesh Battala" <ra...@citrix.com> wrote:

> Yes, only 435MB is available for /var . we can increase the space also.
> But we need to find out the root cause which services are causing the /var
> to fill up.
> Can you please find out and post which log files are taking up more space
> in /var
>
> Thanks
> Rajesh Battala
>
> -----Original Message-----
> From: Marcus [mailto:shadowsor@gmail.com]
> Sent: Saturday, March 8, 2014 8:19 PM
> To: dev@cloudstack.apache.org
> Subject: RE: system vm disk space issue in ACS 4.3
>
> Perhaps there's a new service. I know in the past we've seen issues with
> this , specifically the conntrackd log. I think the cloud logs weren't
> getting rolled either, but I thought it was all fixed.
>
> There's also simply not a ton of space on /var, I wish we would go back to
> just having one partition because it orphans lots of free space in other
> filesystems.
> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com>
> wrote:
>
> > AFAIK, log rotation is enabled in the systemvm.
> > Can you check whether the logs are getting zipped?
> >
> > -----Original Message-----
> > From: Anirban Chakraborty [mailto:abchak@juniper.net]
> > Sent: Saturday, March 8, 2014 12:46 PM
> > To: dev@cloudstack.apache.org
> > Subject: system vm disk space issue in ACS 4.3
> >
> > Hi All,
> >
> > I am seeing system vm disk has no space left after running for few days.
> > Cloudstack UI shows the agent in v-2-VM in alert state, while agent
> > state of s-1-VM shows blank (hyphen in the UI).
> > Both the system vms are running and ssh-able from the host. The log in
> > s-1-Vm shows following errors:
> >
> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
> > device
> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
> > device
> >
> > whereas logs in v-1-VM shows
> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
> > device
> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
> > device
> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87
> > - Could not find exception:
> > com.cloud.exception.AgentControlChannelException
> > in error code list for exceptions
> >
> > /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
> > Unable to post agent control request as link is not available
> >
> > Looks like cloud agent is filling up the log, which is leading to the
> > disk full state.
> >
> > Is this a known issue? Thanks.
> >
> > Anirban
> >
>

RE: system vm disk space issue in ACS 4.3

Posted by Rajesh Battala <ra...@citrix.com>.
Yes, only 435MB is available for /var. We can increase the space as well, but we need to find the root cause: which services are causing /var to fill up.
Can you please find out and post which log files are taking up the most space in /var?
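
One possible way to collect that, as a rough sketch only: list the files that use the most disk space under a directory. The helper name `largest_files` is hypothetical, and sizes are whatever `du -k` reports (disk usage in KB, not apparent file length):

```shell
# Hypothetical helper: print the N files using the most disk space under a
# directory, largest first, as "<size-in-KB><TAB><path>".
largest_files() {
    dir="${1:-/var}"
    count="${2:-10}"
    # -xdev keeps find on the filesystem that holds "$dir", so other
    # partitions mounted below it are not counted.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Example:
#   largest_files /var 10
```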

Thanks
Rajesh Battala

-----Original Message-----
From: Marcus [mailto:shadowsor@gmail.com] 
Sent: Saturday, March 8, 2014 8:19 PM
To: dev@cloudstack.apache.org
Subject: RE: system vm disk space issue in ACS 4.3

Perhaps there's a new service. I know in the past we've seen issues with this , specifically the conntrackd log. I think the cloud logs weren't getting rolled either, but I thought it was all fixed.

There's also simply not a ton of space on /var, I wish we would go back to just having one partition because it orphans lots of free space in other filesystems.
On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com> wrote:

> AFAIK, log rotation is enabled in the systemvm.
> Can you check whether the logs are getting zipped?
>
> -----Original Message-----
> From: Anirban Chakraborty [mailto:abchak@juniper.net]
> Sent: Saturday, March 8, 2014 12:46 PM
> To: dev@cloudstack.apache.org
> Subject: system vm disk space issue in ACS 4.3
>
> Hi All,
>
> I am seeing system vm disk has no space left after running for few days.
> Cloudstack UI shows the agent in v-2-VM in alert state, while agent 
> state of s-1-VM shows blank (hyphen in the UI).
> Both the system vms are running and ssh-able from the host. The log in 
> s-1-Vm shows following errors:
>
> root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on 
> device
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on 
> device
>
> whereas logs in v-1-VM shows
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on 
> device
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on 
> device
> /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 
> - Could not find exception: 
> com.cloud.exception.AgentControlChannelException
> in error code list for exceptions
> /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
> Unable to post agent control request as link is not available
>
> Looks like cloud agent is filling up the log, which is leading to the 
> disk full state.
>
> Is this a known issue? Thanks.
>
> Anirban
>

RE: system vm disk space issue in ACS 4.3

Posted by Marcus <sh...@gmail.com>.
Perhaps there's a new service. I know in the past we've seen issues with
this, specifically the conntrackd log. I think the cloud logs weren't
getting rolled either, but I thought it was all fixed.

There's also simply not a ton of space on /var. I wish we would go back to
just having one partition, because the current layout orphans lots of free
space in other filesystems.
On Mar 8, 2014 12:37 AM, "Rajesh Battala" <ra...@citrix.com> wrote:

> AFAIK, log rotation is enabled in the systemvm.
> Can you check whether the logs are getting zipped?
>
> -----Original Message-----
> From: Anirban Chakraborty [mailto:abchak@juniper.net]
> Sent: Saturday, March 8, 2014 12:46 PM
> To: dev@cloudstack.apache.org
> Subject: system vm disk space issue in ACS 4.3
>
> Hi All,
>
> I am seeing system vm disk has no space left after running for few days.
> Cloudstack UI shows the agent in v-2-VM in alert state, while agent state
> of s-1-VM shows blank (hyphen in the UI).
> Both the system vms are running and ssh-able from the host. The log in
> s-1-Vm shows following errors:
>
> root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>
> whereas logs in v-1-VM shows
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 -
> Could not find exception: com.cloud.exception.AgentControlChannelException
> in error code list for exceptions
> /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException:
> Unable to post agent control request as link is not available
>
> Looks like cloud agent is filling up the log, which is leading to the disk
> full state.
>
> Is this a known issue? Thanks.
>
> Anirban
>

Re: system vm disk space issue in ACS 4.3

Posted by Marcus <sh...@gmail.com>.
We could maybe switch the system vms to an agent properties file that uses
a size-based roll for logging.
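
To illustrate the idea of a size-based roll (this is not the log4j mechanism itself, just a shell-level sketch of the same policy), something like the following could be used. The helper name `roll_if_larger` is hypothetical. It copies and then truncates in place, on the assumption that a running process may still hold the log open; lines written between the copy and the truncate would be lost, and truncation only frees space cleanly if the writer opened the file in append mode.

```shell
# Hypothetical helper: roll a log once it grows past a size limit.
# Usage: roll_if_larger <file> <max-bytes>
roll_if_larger() {
    file="$1"
    max_bytes="$2"
    [ -f "$file" ] || return 0
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -gt "$max_bytes" ]; then
        # Keep one compressed backup, then empty the live file in place so
        # that a process with the file open keeps writing to the same inode.
        gzip -c "$file" > "$file.1.gz" && : > "$file"
    fi
}

# Example (10 MB limit):
#   roll_if_larger /var/log/cloud/cloud.out 10485760
```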

On Mon, Mar 10, 2014 at 5:30 PM, Anirban Chakraborty <ab...@juniper.net> wrote:
> Thanks for all the responses. I do not see cloud.log and cloud.out logs are zipped in /var/log and /var/log/cloud respectively. Only file that was zipped was cron.log. The two largest files are:
> cloud.out.2 with following:
> --
> + keyvalues=' root=LABEL console=tty0 xencons=ttyS0,115200 console=hvc0 console=hvc0 template=domP type=secstorage host=10.84.58.252 port=8250 name=s-1-VM zone=1 pod=1 guid=s-1-VM resource=org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource instance=SecStorage sslcopy=true role=templateProcessor mtu=1500'
> + for i in '$CMDLINE'
> ++ cut -s -d= -f1
> ++ echo eth2ip=10.84.59.176
> + KEY=eth2ip
> ++ cut -s -d= -f2
> ++ echo eth2ip=10.84.59.176
> + VALUE=10.84.59.176
> + '[' eth2ip == '' ']'
> + case $KEY in
> + keyvalues=' root=LABEL console=tty0 xencons=ttyS0,115200 console=hvc0 console=hvc0 template=domP type=secstorage host=10.84.58.252 port=8250 name=s-1-VM zone=1 pod=1 guid=s-1-VM resource=org.apache.cloudstack.sJava HotSpot(TM) Client VM warning: Insufficient space for shared memory file:
>    /tmp/hsperfdata_root/8004
> Try using the -Djava.io.tmpdir= option to select an alternate temp location.
>
> log4j:WARN No appenders could be found for logger (com.cloud.agent.AgentShell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> log4j:WARN No such property [maxFileSize] in org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN Please set a rolling policy for the RollingFileAppender named 'FILE3'
> 08:31:55,320  INFO AgentShell:318 - Agent started
> log4j:ERROR Failed to flush writer,
> java.io.IOException: No space left on device
> --
> and, repetition of following in /var/log/cloud.log.1
> --
> 2014-03-01 07:05:08,607 DEBUG [cloud.utils.ProcessUtil] (main:null)   PID TTY          TIME CMD
>  3938 ?        00:00:36 java
>
> 2014-03-01 07:05:08,607 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Java process is being started twice.  If this is not true, remove /var/run/agent.SecStorage.pid
> 2014-03-01 07:05:19,028 INFO  [cloud.agent.AgentShell] (main:null) Agent started
> 2014-03-01 07:05:19,030 INFO  [cloud.agent.AgentShell] (main:null) Implementation Version is 4.3.0-SNAPSHOT
> 2014-03-01 07:05:19,030 INFO  [cloud.agent.AgentShell] (main:null) agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties
> 2014-03-01 07:05:19,038 DEBUG [cloud.agent.AgentShell] (main:null) Found property: instance
> 2014-03-01 07:05:19,038 DEBUG [cloud.agent.AgentShell] (main:null) Found property: resource
> 2014-03-01 07:05:19,038 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
> 2014-03-01 07:05:19,039 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
> 2014-03-01 07:05:19,048 INFO  [cloud.utils.LogUtils] (main:null) log4j configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml
> 2014-03-01 07:05:19,062 DEBUG [cloud.agent.AgentShell] (main:null) Checking to see if agent.SecStorage.pid exists.
> 2014-03-01 07:05:19,064 DEBUG [cloud.utils.ProcessUtil] (main:null) environment.properties could not be opened
> 2014-03-01 07:05:19,071 DEBUG [cloud.utils.ProcessUtil] (main:null) Executing: bash -c ps -p 3938
> 2014-03-01 07:05:19,077 DEBUG [cloud.utils.ProcessUtil] (main:null) Execution is successful.
>
> eventually the log gets full with,
> 2014-03-10 09:54:09,420 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.84.58.252:8250
> 2014-03-10 09:54:36,916 WARN  [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote: is there a server running on port 8250
> --
> Looks like the agent on s-1-vm could not connect to the management server at some point of time after the system vm startup and the log gets filled with above 'Unable to connect' messages.
>
> Anirban
>
> On Mar 7, 2014, at 11:37 PM, Rajesh Battala <ra...@citrix.com> wrote:
>
>> AFAIK, log rotation is enabled in the systemvm.
>> Can you check whether the logs are getting zipped?
>>
>> -----Original Message-----
>> From: Anirban Chakraborty [mailto:abchak@juniper.net]
>> Sent: Saturday, March 8, 2014 12:46 PM
>> To: dev@cloudstack.apache.org
>> Subject: system vm disk space issue in ACS 4.3
>>
>> Hi All,
>>
>> I am seeing system vm disk has no space left after running for few days. Cloudstack UI shows the agent in v-2-VM in alert state, while agent state of s-1-VM shows blank (hyphen in the UI).
>> Both the system vms are running and ssh-able from the host. The log in s-1-Vm shows following errors:
>>
>> root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
>>
>> whereas logs in v-1-VM shows
>> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
>> /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
>> /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available
>>
>> Looks like cloud agent is filling up the log, which is leading to the disk full state.
>>
>> Is this a known issue? Thanks.
>>
>> Anirban
>>
>>
>
>

Re: system vm disk space issue in ACS 4.3

Posted by Anirban Chakraborty <ab...@juniper.net>.
Thanks for all the responses. I do not see the cloud.log and cloud.out logs being zipped in /var/log and /var/log/cloud respectively. The only file that was zipped was cron.log. The two largest files are:
cloud.out.2, with the following:
--
+ keyvalues=' root=LABEL console=tty0 xencons=ttyS0,115200 console=hvc0 console=hvc0 template=domP type=secstorage host=10.84.58.252 port=8250 name=s-1-VM zone=1 pod=1 guid=s-1-VM resource=org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource instance=SecStorage sslcopy=true role=templateProcessor mtu=1500'
+ for i in '$CMDLINE'
++ cut -s -d= -f1
++ echo eth2ip=10.84.59.176
+ KEY=eth2ip
++ cut -s -d= -f2
++ echo eth2ip=10.84.59.176
+ VALUE=10.84.59.176
+ '[' eth2ip == '' ']'
+ case $KEY in
+ keyvalues=' root=LABEL console=tty0 xencons=ttyS0,115200 console=hvc0 console=hvc0 template=domP type=secstorage host=10.84.58.252 port=8250 name=s-1-VM zone=1 pod=1 guid=s-1-VM resource=org.apache.cloudstack.sJava HotSpot(TM) Client VM warning: Insufficient space for shared memory file:
   /tmp/hsperfdata_root/8004
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

log4j:WARN No appenders could be found for logger (com.cloud.agent.AgentShell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
log4j:WARN No such property [maxFileSize] in org.apache.log4j.rolling.RollingFileAppender.
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.rolling.RollingFileAppender.
log4j:WARN Please set a rolling policy for the RollingFileAppender named 'FILE3'
08:31:55,320  INFO AgentShell:318 - Agent started
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
--
and repetitions of the following in /var/log/cloud.log.1:
--
2014-03-01 07:05:08,607 DEBUG [cloud.utils.ProcessUtil] (main:null)   PID TTY          TIME CMD
 3938 ?        00:00:36 java

2014-03-01 07:05:08,607 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Java process is being started twice.  If this is not true, remove /var/run/agent.SecStorage.pid
2014-03-01 07:05:19,028 INFO  [cloud.agent.AgentShell] (main:null) Agent started
2014-03-01 07:05:19,030 INFO  [cloud.agent.AgentShell] (main:null) Implementation Version is 4.3.0-SNAPSHOT
2014-03-01 07:05:19,030 INFO  [cloud.agent.AgentShell] (main:null) agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties
2014-03-01 07:05:19,038 DEBUG [cloud.agent.AgentShell] (main:null) Found property: instance
2014-03-01 07:05:19,038 DEBUG [cloud.agent.AgentShell] (main:null) Found property: resource
2014-03-01 07:05:19,038 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
2014-03-01 07:05:19,039 INFO  [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
2014-03-01 07:05:19,048 INFO  [cloud.utils.LogUtils] (main:null) log4j configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml
2014-03-01 07:05:19,062 DEBUG [cloud.agent.AgentShell] (main:null) Checking to see if agent.SecStorage.pid exists.
2014-03-01 07:05:19,064 DEBUG [cloud.utils.ProcessUtil] (main:null) environment.properties could not be opened
2014-03-01 07:05:19,071 DEBUG [cloud.utils.ProcessUtil] (main:null) Executing: bash -c ps -p 3938 
2014-03-01 07:05:19,077 DEBUG [cloud.utils.ProcessUtil] (main:null) Execution is successful.

Eventually the log fills up with:
2014-03-10 09:54:09,420 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.84.58.252:8250
2014-03-10 09:54:36,916 WARN  [utils.nio.NioConnection] (Agent-Selector:null) Unable to connect to remote: is there a server running on port 8250 
--
It looks like the agent on s-1-VM could not connect to the management server at some point after the system vm started, and the log then filled up with the 'Unable to connect' messages above.
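
To put a number on how much of a log is that one message, a quick count like the sketch below could be used. The helper name `count_message` is made up for illustration:

```shell
# Hypothetical helper: count the lines in a log that contain a given message.
# Usage: count_message <fixed-string> <file>
count_message() {
    pattern="$1"
    file="$2"
    # -F matches the pattern as a fixed string rather than a regex;
    # -c prints the number of matching lines instead of the lines themselves.
    grep -F -c -- "$pattern" "$file"
}

# Example:
#   count_message 'Unable to connect to remote' /var/log/cloud.log.1
```

Comparing that count with the total from `wc -l` on the same file gives a rough idea of how much of the log is the reconnect loop.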

Anirban

On Mar 7, 2014, at 11:37 PM, Rajesh Battala <ra...@citrix.com> wrote:

> AFAIK, log rotation is enabled in the systemvm.
> Can you check whether the logs are getting zipped?
> 
> -----Original Message-----
> From: Anirban Chakraborty [mailto:abchak@juniper.net] 
> Sent: Saturday, March 8, 2014 12:46 PM
> To: dev@cloudstack.apache.org
> Subject: system vm disk space issue in ACS 4.3
> 
> Hi All,
> 
> I am seeing system vm disk has no space left after running for few days. Cloudstack UI shows the agent in v-2-VM in alert state, while agent state of s-1-VM shows blank (hyphen in the UI).
> Both the system vms are running and ssh-able from the host. The log in s-1-Vm shows following errors:
> 
> root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
> 
> whereas logs in v-1-VM shows
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
> /var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
> /var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available
> 
> Looks like cloud agent is filling up the log, which is leading to the disk full state.
> 
> Is this a known issue? Thanks.
> 
> Anirban
> 
> 



RE: system vm disk space issue in ACS 4.3

Posted by Rajesh Battala <ra...@citrix.com>.
AFAIK, log rotation is enabled in the systemvm.
Can you check whether the logs are getting zipped?
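
One rough way to check, assuming rotated logs carry a numeric suffix such as cloud.out.2 and compressed ones end in .gz (the helper name `uncompressed_rotations` is hypothetical):

```shell
# Hypothetical helper: list rotated logs (names like cloud.out.2) in a
# directory that have not been compressed.
uncompressed_rotations() {
    dir="${1:-/var/log/cloud}"
    # Match names with a ".<digit>" part and skip anything already gzipped.
    # -maxdepth is not POSIX, but GNU, BSD and busybox find all accept it.
    find "$dir" -maxdepth 1 -type f -name '*.[0-9]*' ! -name '*.gz' 2>/dev/null | sort
}

# Example:
#   uncompressed_rotations /var/log/cloud
```

If this prints nothing, every rotated file in that directory has been zipped; any path it does print is a rotated log still sitting there at full size.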

-----Original Message-----
From: Anirban Chakraborty [mailto:abchak@juniper.net] 
Sent: Saturday, March 8, 2014 12:46 PM
To: dev@cloudstack.apache.org
Subject: system vm disk space issue in ACS 4.3

Hi All,

I am seeing system vm disk has no space left after running for few days. Cloudstack UI shows the agent in v-2-VM in alert state, while agent state of s-1-VM shows blank (hyphen in the UI).
Both the system vms are running and ssh-able from the host. The log in s-1-Vm shows following errors:

root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
/var/log/cloud/cloud.out.2:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.2:java.io.IOException: No space left on device

whereas logs in v-1-VM shows
/var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.3:java.io.IOException: No space left on device
/var/log/cloud/cloud.out.3:07:18:00,547  INFO CSExceptionErrorCode:87 - Could not find exception: com.cloud.exception.AgentControlChannelException in error code list for exceptions
/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelException: Unable to post agent control request as link is not available

Looks like cloud agent is filling up the log, which is leading to the disk full state.

Is this a known issue? Thanks.

Anirban