Posted to commits@activemq.apache.org by "Lionel Cons (JIRA)" <ji...@apache.org> on 2012/08/14 09:05:38 UTC

[jira] [Created] (APLO-241) Apollo becomes unresponsive under stress

Lionel Cons created APLO-241:
--------------------------------

             Summary: Apollo becomes unresponsive under stress
                 Key: APLO-241
                 URL: https://issues.apache.org/jira/browse/APLO-241
             Project: ActiveMQ Apollo
          Issue Type: Bug
         Environment: apollo-99-trunk-20120813.171747-82
            Reporter: Lionel Cons


When trying to reproduce APLO-238, I found another problem :-(

I ran stomp-benchmark with the attached scenario to simulate one topic consumer with many producers. As expected, stomp-benchmark reported many errors like:

java.net.ConnectException: Connection timed out
java.io.IOException: Connection reset by peer

However, according to netstat, more than 10k connections have been established. stomp-benchmark eventually stopped, with some results:

c_c1 samples: [ 1450,140,261,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
p_p1 samples: [ 75052,447310,430373,431496,406670,436637,455825,436305,451396,449173,408920,491221,527663,556973,605508,580931,616605,576667,589194,606230,567179,426264,256924,152997,82039,42061,23405,11416,5965,3874,2660,1834,1740,1298,1026,660,742,639,533,496,400,175,124,79,124,124,123,124,123,87,0,0,0,0,0,0,0,0,0,0 ]
e_p1 samples: [ 1832,76,0,0,0,3,664,1721,1,1,0,3,19,659,1704,0,5,8,10,670,910,785,4,10,8,26,654,1138,555,12,7,12,24,656,1674,13,11,8,17,20,655,1670,11,17,14,14,29,651,1664,15,16,16,16,669,901,772,19,16,15,25 ]
p_p2 samples: [ 68831,398643,391879,389710,365791,393488,406712,382222,398429,407385,370296,439194,465552,496801,507263,412671,328124,216992,123357,77492,43132,19195,10099,6064,4780,2969,1255,766,886,879,849,677,995,932,860,486,551,552,551,415,247,149,110,117,330,330,330,294,419,444,394,441,196,220,158,217,221,148,111,97 ]
e_p2 samples: [ 1704,82,0,0,0,1,767,1613,1,2,0,8,24,767,1601,0,1,1,12,786,782,814,7,3,8,28,770,984,604,7,10,8,24,773,1566,14,11,16,8,27,775,1557,17,8,15,10,32,775,1547,15,10,18,15,786,767,789,19,11,25,35 ]

However, after the end of the test, Apollo no longer responds. Its REST API cannot be contacted (read timeout) and it cannot be stopped via the service script; only kill -9 works. Strangely, it is only using 100% of CPU (on a multi-core machine) and 35% of memory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Hiram Chirino (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiram Chirino updated APLO-241:
-------------------------------

      Component/s: apollo-stomp
                   apollo-broker
    Fix Version/s: 1.5
         Assignee: Hiram Chirino
    

[jira] [Updated] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Lionel Cons (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lionel Cons updated APLO-241:
-----------------------------

    Attachment: APLO-241.stack

20 minutes after the end of stomp-benchmark, Apollo is still dead. Here is its stack trace.
                

[jira] [Commented] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Lionel Cons (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433982#comment-13433982 ] 

Lionel Cons commented on APLO-241:
----------------------------------

With 10 times fewer producers (800+800), stomp-benchmark no longer reports errors, but Apollo still becomes unresponsive after a few minutes.
                

[jira] [Commented] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Hiram Chirino (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435741#comment-13435741 ] 

Hiram Chirino commented on APLO-241:
------------------------------------

Apollo was not handling buffers optimally in the case where clients were sending small messages separated by small pauses. Each message in Apollo would hold on to a chunk of memory of about 64k. When there is no pause, multiple messages share that 64k chunk, so it was a non-issue.

The responsiveness issue was due to the broker hitting an out of memory condition.
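
To illustrate the retention pattern described above, here is a minimal sketch in Python. It is not Apollo's code (Apollo runs on the JVM); the function names and the use of memoryview are assumptions chosen only to show how a small message can keep a whole 64k chunk alive, and how copying the payload avoids that.

```python
CHUNK_SIZE = 64 * 1024  # the "about 64k" chunk size mentioned above


def read_message_as_view(payload: bytes) -> memoryview:
    """Simulate reading one small message into a fresh chunk.

    The returned view covers only the message bytes, but it references
    the whole chunk, so the chunk cannot be freed while the view lives.
    """
    chunk = bytearray(CHUNK_SIZE)
    chunk[:len(payload)] = payload
    return memoryview(chunk)[:len(payload)]


def read_message_as_copy(payload: bytes) -> bytes:
    """Same read, but the message is copied out of the chunk.

    Once this function returns, nothing references the chunk anymore.
    """
    chunk = bytearray(CHUNK_SIZE)
    chunk[:len(payload)] = payload
    return bytes(chunk[:len(payload)])


def retained_bytes(message) -> int:
    """Bytes of backing storage kept alive by one message."""
    if isinstance(message, memoryview):
        return len(message.obj)  # size of the underlying chunk
    return len(message)


def estimate_retained(messages_held: int, per_message: int = CHUNK_SIZE) -> int:
    """Worst case: every held message pins its own chunk."""
    return messages_held * per_message
```

With these hypothetical helpers, a 5-byte message read as a view retains 65536 bytes, while the copied one retains 5. As a rough worst-case figure, 10,000 small messages each pinning their own chunk would hold 10,000 x 64 KiB = 625 MiB, which shows how a modest message count could exhaust the heap. The count of 10,000 is only an illustration, not a measurement from this issue.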

This should now be fixed in the following build:
https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-apollo/99-trunk-SNAPSHOT/apache-apollo-99-trunk-20120816.033652-85-unix-distro.tar.gz

I've also updated stomp-benchmark to establish large numbers of connections against a broker more gracefully. I recommend you pull the new stomp-benchmark source and do an 'sbt update'.
                

[jira] [Resolved] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Hiram Chirino (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiram Chirino resolved APLO-241.
--------------------------------

    Resolution: Fixed
    

[jira] [Updated] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Lionel Cons (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lionel Cons updated APLO-241:
-----------------------------

    Attachment: APLO-241.xml
    

[jira] [Commented] (APLO-241) Apollo becomes unresponsive under stress

Posted by "Lionel Cons (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/APLO-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435941#comment-13435941 ] 

Lionel Cons commented on APLO-241:
----------------------------------

I've not been able to reproduce the problem with your code change. Good!

However, regarding "The responsiveness issue was due to the broker hitting an out of memory condition.", it would be good to log an exception when such memory problems occur...
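
As a rough sketch of what such logging could look like, the following Python wrapper logs an out-of-memory error before re-raising it. This only illustrates the pattern and makes no claim about how Apollo does or should implement it; on the JVM the corresponding error would be OutOfMemoryError, and the logger name and the run_logged helper are hypothetical.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("broker.memory")


def run_logged(task, *args):
    """Run a task and log a memory failure instead of failing silently.

    The error is re-raised after logging so the caller can still decide
    how to react (shed load, shut down, and so on).
    """
    try:
        return task(*args)
    except MemoryError:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "out of memory while running %s", getattr(task, "__name__", task)
        )
        raise
```

The point of the design is that the failure leaves a trace in the log even when the process itself can no longer respond to requests.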
                