Posted to dev@activemq.apache.org by "Arthur Naseef (JIRA)" <ji...@apache.org> on 2010/11/10 21:14:19 UTC

[jira] Created: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

ActiveMQ broker processing slows with consumption from large store
------------------------------------------------------------------

                 Key: AMQ-3028
                 URL: https://issues.apache.org/activemq/browse/AMQ-3028
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker
    Affects Versions: 5.4.1
         Environment: CentOS 5.5, Sun JDK 1.6.0_21-b06 64 bit, ActiveMQ 5.4.1, AMD Athlon(tm) II X2 B22, local disk
            Reporter: Arthur Naseef
            Priority: Critical


This problem occurred during scalability tests.  I have tested a workaround that appears to work.  I will gladly submit a fix, but would like some guidance on the most appropriate solution.

Here's the summary.  Many more details are available upon request.

Root cause:

   - Believed to be simultaneous access to LRUCache objects which are not thread-safe (PageFile's pageCache)

Workaround:

   - Synchronize the LRUCache on all access methods (get, put, remove)
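
The workaround can be sketched as a LinkedHashMap-based cache (the real org.apache.kahadb.util.LRUCache is built on LinkedHashMap) whose access methods are synchronized.  The class below is an illustrative sketch, not the actual ActiveMQ code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the workaround: an access-ordered LinkedHashMap with a size
// cap, with get/put/remove synchronized so concurrent access cannot
// corrupt the internal linked list (which is what lets entries leak past
// the configured limit).
class SynchronizedLRUCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    SynchronizedLRUCache(int maxEntries) {
        // accessOrder=true gives least-recently-used eviction order
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the eldest entry once the configured limit is exceeded
        return size() > maxEntries;
    }

    @Override
    public synchronized V get(Object key) {
        return super.get(key);
    }

    @Override
    public synchronized V put(K key, V value) {
        return super.put(key, value);
    }

    @Override
    public synchronized V remove(Object key) {
        return super.remove(key);
    }
}
```

Note that other Map methods (containsKey, iteration) remain unsynchronized in this sketch; the attached patch covers the three methods named above.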

The symptoms are as follows:

  1. Message rates run fairly constant until a point in time when they degrade rather quickly
  2. After a while (about 15 minutes), the message rates drop to the floor - with large numbers of seconds with 0 records passing
  3. Using VisualVM or JConsole, note that memory use grows continuously
  4. When message rates drop to the floor, the VM is spending the vast majority of its time performing garbage collection
  5. Heap dumps show that LRUCache objects (the pageCache members of PageFile instances) far exceed their configured limits.
      The default limit of 10,000 was used; a size of over 170,000 entries was reached.
  6. No producer flow control occurred (did not see the flow control log message)

Test scenario used to reproduce:

   - Fast producers (limited to <= 1000 msgs/sec)
      -- using transactions
      -- 10 msg per transaction
      -- message content size 177 bytes

   - Slow consumers (limited to <= 10 msg/sec)
      -- auto-acknowledge mode; not transacted

   - 10 Queues
      -- 1 producer per queue
      -- 1 consumer per queue

   - Producers, consumers, and broker were run both on separate systems and all on the same system (in different test runs).

Note that disk space was not an issue - there was always plenty of disk space available.

One other interesting note: once a large database of records had been stored in KahaDB, the problem still occurred with only consumers running.

This issue sounds like it may be related to AMQ-1764 and AMQ-2721.  The root cause sounds the same as AMQ-2290 - unsynchronized access to LRUCache.

The most straightforward solution is to modify all LRUCache classes (org.apache.kahadb.util.LRUCache, org.apache.activemq.util.LRUCache, ...) to be concurrent.  Another is to create concurrent versions (perhaps ConcurrentLRUCache) and make use of those at least in PageFile.pageCache.
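A rough sketch of the wrapper route, assuming callers use the cache through the plain Map interface (the factory class and method names are hypothetical):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class ConcurrentLRUCacheSketch {
    // Hypothetical factory: builds an access-ordered LinkedHashMap with a
    // size cap, then wraps it so every Map method synchronizes on a single
    // internal lock.
    static <K, V> Map<K, V> create(final int maxEntries) {
        return Collections.synchronizedMap(new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least-recently-used entry past the limit
                return size() > maxEntries;
            }
        });
    }
}
```

One trade-off of the wrapper: iteration over the returned map still requires the caller to synchronize on it externally, which is a reason a dedicated ConcurrentLRUCache class could be preferable.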


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Arthur Naseef (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63517#action_63517 ] 

Arthur Naseef commented on AMQ-3028:
------------------------------------

Oh hey Adam - different message thread.

I was just indicating that I need to run my tests to feel confident it's resolved.




[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Arthur Naseef (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63444#action_63444 ] 

Arthur Naseef commented on AMQ-3028:
------------------------------------

Is there something I can do to further assist with this issue?

Testing with the attached patch was successful - all of the problems were alleviated.

I have considered writing a JUnit test for it, but that is not trivial because of (a) the time needed to learn JUnit, (b) the impact of configuration on reproducing the problem in a timely manner (increasing JVM memory may delay detection of the issue), and (c) the fact that detecting the problem requires internal access to the LRUCache or some other method with which I am unfamiliar.




[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Arthur Naseef (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63519#action_63519 ] 

Arthur Naseef commented on AMQ-3028:
------------------------------------

My tests just finished and ran without a problem.  In addition to consistent performance throughout the test, a heap dump with VisualVM shows that the LRUCache objects all stayed within their limits.

Thank you!



[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Adam Sussman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63516#action_63516 ] 

Adam Sussman commented on AMQ-3028:
-----------------------------------


Are you saying their solution isn't good enough?






[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Arthur Naseef (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63515#action_63515 ] 

Arthur Naseef commented on AMQ-3028:
------------------------------------

I will test with the update and post the results when complete.  With any luck, it'll be done today.



[jira] Updated: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Arthur Naseef (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arthur Naseef updated AMQ-3028:
-------------------------------

    Attachment: LRUCache.patch

Patch which synchronizes org.apache.kahadb.util.LRUCache and org.apache.activemq.util.LRUCache on get(), put(), and remove() calls.




[jira] Resolved: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Dejan Bosanac (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dejan Bosanac resolved AMQ-3028.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 5.5.0
         Assignee: Dejan Bosanac

Fixed with svn revision 1038566

I didn't make the LRU cache synchronized in general; I just synchronized the usage of pageCache.  Let us know if it helps with your scenario.
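
Synchronizing at the call sites rather than inside the cache class could look like the following sketch; PageFileSketch and its methods are illustrative stand-ins, not the actual PageFile code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the call-site approach: the cache itself stays
// unsynchronized, and every access to the shared pageCache is guarded by
// one lock held by the owning object.
class PageFileSketch {
    private final Map<Long, String> pageCache =
        new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > 10000; // default page cache limit from the report
            }
        };

    String getPage(long pageId) {
        synchronized (pageCache) {
            return pageCache.get(pageId);
        }
    }

    void cachePage(long pageId, String page) {
        synchronized (pageCache) {
            pageCache.put(pageId, page);
        }
    }
}
```

This keeps the locking scoped to the one shared instance that exhibited the problem, instead of imposing synchronization overhead on every LRUCache user.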



[jira] Commented: (AMQ-3028) ActiveMQ broker processing slows with consumption from large store

Posted by "Dejan Bosanac (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/activemq/browse/AMQ-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63529#action_63529 ] 

Dejan Bosanac commented on AMQ-3028:
------------------------------------

Thanks for confirming!
