Posted to dev@stratos.apache.org by "Martin Eppel (meppel)" <me...@cisco.com> on 2015/09/10 23:39:13 UTC

Stratos 4.1: "Too many open files" issue

Hi,

We are seeing an issue with Stratos running out of file handles when creating a number of applications and VM instances:

The scenario is as follows:

13 applications are deployed, each with a single cluster and a single member instance.

As the VMs spin up, Stratos becomes unresponsive, and in the logs we find the following exceptions (see below). I remember we saw similar issues (same exceptions) back in Stratos 4.0 in the context of longevity tests.

We are running Stratos 4.1 RC4 with the latest commit:

commit 0fd41840fb04d92ba921bf58c59c2c3fbad0c561
Author: Imesh Gunaratne <im...@apache.org>
Date:   Tue Jul 7 12:54:47 2015 +0530

Is this a known issue that might have been fixed in a later commit, or something new? Can we verify that the fixes for the previous issues are included in our system (jars, commits, etc.)?




org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-08-17 17:38:17,499] WARN {org.apache.thrift.server.TThreadPoolServer} - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:118)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:106)
at org.wso2.carbon.databridge.receiver.thrift.internal.ThriftDataReceiver$ServerThread.run(ThriftDataReceiver.java:199)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Too many open files
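
(A quick way to confirm descriptor exhaustion on the host, as a minimal sketch: this assumes a Linux environment, and the Carbon main-class pattern given to pgrep is an assumption.)

    # find the Stratos (Carbon) java process and count its open descriptors
    STRATOS_PID=$(pgrep -f 'org.wso2.carbon.bootstrap.Bootstrap' | head -1)
    ls /proc/"$STRATOS_PID"/fd | wc -l   # descriptors currently open
    # break the count down by type, e.g. how many are TCP sockets
    lsof -p "$STRATOS_PID" | awk '{print $5}' | sort | uniq -c | sort -rn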

// listing the applications, member instances and cartridge state:

[di-000-xxx] = application name

di-000-010: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-011: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Initialized 1)
cartridge-proxy: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-001: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-002: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
di-000-012: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Created 1)
di-000-003: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-004: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-006: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-005: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-008: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-007: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)
di-000-009: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Starting 1)


Re: Stratos 4.1: "Too many open files" issue

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

The CA publishes events every 15s by default. If you want to change the
interval, please add

-Dstats.notifier.interval=xx to the stratos.sh of the CA.

In order to add a thrift-agent-config.xml to the CA:

  - Create a <CA-HOME>/conf/data-bridge folder
  - Place the attached thrift-agent-config.xml, customized as you need,
    into <CA-HOME>/conf/data-bridge
  - Add the following to the stratos.sh of the CA, replacing <CA-HOME>
    with the absolute path:
           -Dcarbon.config.dir.path=<CA-HOME>/conf

If you perform the above steps, the data publisher of the CA will pick up
the updated thrift-agent-config.xml.
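
(A minimal sketch of those steps as shell commands; the CA_HOME path and the interval value are placeholders, and the assumption that stats.notifier.interval is given in seconds follows from the 15s default mentioned above.)

    # adjust to the actual cartridge agent installation path
    CA_HOME=/opt/apache-stratos-cartridge-agent
    mkdir -p "$CA_HOME/conf/data-bridge"
    # place the customized config where the data publisher will look for it
    cp thrift-agent-config.xml "$CA_HOME/conf/data-bridge/"
    # then add these JVM arguments to the java command in stratos.sh:
    #   -Dcarbon.config.dir.path=$CA_HOME/conf
    #   -Dstats.notifier.interval=30   # publishing interval; default 15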

Since the PCA is not using the Java thrift library for communication, we
might not need the exact same configuration there.

@Chamila, is it possible to configure anything similar to this in the PCA?

Thanks,
Reka


On Mon, Sep 14, 2015 at 11:38 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

> [snip: Martin's message of Sep 14 and the earlier thread, quoted in full]



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Stratos 4.1: "Too many open files" issue

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka, Akila

Thanks for the response:

On the system where we encountered the issue, we are still running the Java cartridge agent (JCA). I checked the commit as suggested by Akila and we do have it.

The secureEvictionTimePeriod is set to the default (5500 ms).

Question: STRATOS-739 states (see quote below) that the secureEvictionTimePeriod needs to be tuned according to the publishing period. How would I determine the publishing period? Also, I don't see the config file (thrift-agent-config.xml) in the JCA package (client side). Where should we set the time period, server side or client side?

Is this configuration (or something similar) expected in the PCA as well (we are migrating over, but it's taking some time)?

Thanks

Martin

Quote from doc:

(… If you are publishing events in a periodic interval as more than 5.5s, you need to tune the <secureEvictionTimePeriod> parameter accordingly…)
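
(For illustration, a sketch of what that tuning might look like; values are in ms, and everything here other than the <secureEvictionTimePeriod> element name, which STRATOS-739 confirms, is an assumption to be checked against the attached file.)

    # raise the eviction period above the publishing interval,
    # e.g. 30000 ms for a 20s publishing interval
    cat > "$CA_HOME/conf/data-bridge/thrift-agent-config.xml" <<'EOF'
    <agentConfiguration>
        <!-- assumption: actual root element name may differ -->
        <secureEvictionTimePeriod>30000</secureEvictionTimePeriod>
    </agentConfiguration>
    EOF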

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Sunday, September 13, 2015 10:49 PM
To: dev
Cc: Chamila De Alwis
Subject: Re: Stratos 4.1: "Too many open files" issue

[snip: Reka's message of Sep 13 and the earlier thread, quoted in full]


Re: Stratos 4.1: "Too many open files" issue

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

What cartridge agent are you using currently, Java or Python? This
problem was identified on the thrift data publisher side in Java. Since
the Python agent uses a different approach to connect to the data
receiver, we will need to verify whether this particular issue is fixed
in the Python agent. If you could explain which components are connecting
to Stratos using thrift, then we can check the thrift agent version as
Akila mentioned and find out the root cause of this issue.

Thanks,
Reka

On Sun, Sep 13, 2015 at 12:46 PM, Akila Ravihansa Perera <ravihansa@wso2.com
> wrote:

> [snip: Akila's message of Sep 13 and the earlier thread, quoted in full]



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Stratos 4.1: "Too many open files" issue

Posted by Akila Ravihansa Perera <ra...@wso2.com>.
Hi Martin,

I think we fixed this problem by uplifting the Thrift agent feature in [1].
The root cause of this issue was that, after periodic publishing, the
thrift agent fails to evict the old connection, as reported in [2]. The fix
is described in [3]. Your stack trace looks very similar to what has been
reported in the JIRA.

Can you check whether you have this fix applied?

[1]
https://github.com/apache/stratos/commit/8985d96eb811aa8e9ce2c114f1856b4c4e20517b
[2] https://issues.apache.org/jira/browse/STRATOS-723
[3] https://issues.apache.org/jira/browse/STRATOS-739
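
(One quick way to check, assuming you build from a git checkout of the source; a sketch, not the only way.)

    # exits 0 if the thrift agent uplift commit [1] is an ancestor of HEAD
    git merge-base --is-ancestor 8985d96eb811aa8e9ce2c114f1856b4c4e20517b HEAD \
        && echo "fix [1] is included" \
        || echo "fix [1] is missing"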

Thanks.

On Sun, Sep 13, 2015 at 10:56 AM, Imesh Gunaratne <im...@apache.org> wrote:

> [snip: Imesh's message of Sep 13 and the earlier thread, quoted in full]



-- 
Akila Ravihansa Perera
WSO2 Inc.;  http://wso2.com/

Blog: http://ravihansa3000.blogspot.com

Re: Stratos 4.1: "Too many open files" issue

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Martin,

I believe you are using 4.1.0-RC4 with some custom changes you have made
locally. Will you be able to test this on the latest commit of the
stratos-4.1.x branch (without any other changes)? I cannot recall a fix we
did for this after 4.1.0-RC4, but it would be better if you can verify
with the latest code in the stratos-4.1.x branch.

At the same time, will you be able to do the following (a sketch of both
checks follows the list):

   - Take a thread dump of the running Stratos and CEP instances once
     this happens
   - Check the file descriptor limits of the OS
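
(A sketch of both checks on a Linux host; jstack ships with the JDK, and the pgrep pattern for the Carbon main class is an assumption.)

    # take a thread dump of the running Stratos JVM (repeat for the CEP instance)
    STRATOS_PID=$(pgrep -f 'org.wso2.carbon.bootstrap.Bootstrap' | head -1)
    jstack "$STRATOS_PID" > stratos-thread-dump.txt
    # file descriptor limits: current shell, running process, system-wide
    ulimit -n
    grep 'open files' /proc/"$STRATOS_PID"/limits
    cat /proc/sys/fs/file-max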

Thanks

On Sat, Sep 12, 2015 at 10:56 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

> [snip: Martin's resent message and the original post, quoted in full]



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Stratos 4.1: "Too many open files" issue

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Resending in case it got lost,

Thanks

Martin

From: Martin Eppel (meppel)
Sent: Thursday, September 10, 2015 2:39 PM
To: dev@stratos.apache.org
Subject: Stratos 4.1: "Too many open files" issue

[snip: original message of Sep 10, repeated in full at the top of this page]