Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/21 02:59:59 UTC

[GitHub] [accumulo] keith-turner opened a new issue #2423: Seeing out of memory error in manager

keith-turner opened a new issue #2423:
URL: https://github.com/apache/accumulo/issues/2423


   **Describe the bug**
   
   When running Accumulo 2.1.0-SNAPSHOT for extended periods (24+ hours), I would see the manager process die with an out of memory error.
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 2.1.0-SNAPSHOT
    
   **To Reproduce**
   
   Run the Accumulo 2.1.0-SNAPSHOT Manager for an extended period of time with 512M of RAM. 512M is the default Manager setting in Muchos. It has been this size for a very long time, and I don't recall seeing this error before.
   
   **Expected behavior**
   
   No OutOfMemoryError (OOME).
   
   **Screenshots**
   
   After the first time this happened, I modified the manager process to dump the heap on OOME. Below is a screenshot of the heap dump analysis. I saw a lot of Micrometer objects in the heap dump, and, poking around at random, many of them seemed to have `executor` in the name. Based on this I looked in the manager code and found it was [periodically creating thread pools to get tserver info](https://github.com/apache/accumulo/blob/1f80b7445d1320efaa8db1fb58c7b67d73c9fdbc/server/manager/src/main/java/org/apache/accumulo/manager/Manager.java#L920) and this [may register the thread pool statically for monitoring](https://github.com/apache/accumulo/blob/1f80b7445d1320efaa8db1fb58c7b67d73c9fdbc/core/src/main/java/org/apache/accumulo/core/util/threads/ThreadPools.java#L287). I have yet to connect the objects in the heap dump to the linked Manager code. The objects seem to be statically referenced, so I am guessing the refs from Accumulo code are gone and only the static metric registry refs are left. I am not sure yet if the Manager code I linked is the culprit, though.
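A minimal, JDK-only sketch of the suspected pattern follows. `StaticRegistry` and `LeakSketch` are hypothetical stand-ins, not Accumulo or Micrometer classes, and the sketch assumes each new pool gets a distinct name so that its meters become new registry entries. It shows why shutting a short-lived pool down does not free the memory: the meters bound to it stay reachable from a static field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a process-wide metrics registry held in a static field.
// It only ever grows: nothing in this sketch removes entries.
final class StaticRegistry {
  private static final List<String> METER_IDS = new ArrayList<>();

  // Assumption: binding a pool registers a few meters tagged with the pool's name.
  static synchronized void bind(String poolName) {
    for (String meter : new String[] {"completed", "active", "queued"}) {
      METER_IDS.add("executor." + meter + "{name=" + poolName + "}");
    }
  }

  static synchronized int size() {
    return METER_IDS.size();
  }
}

final class LeakSketch {
  private static int poolCounter = 0;

  // Simulates a periodic task that builds a fresh, instrumented pool on every cycle.
  static void runCycles(int cycles) {
    for (int i = 0; i < cycles; i++) {
      String poolName = "tserver-info-" + (poolCounter++); // assumed unique per pool
      ExecutorService pool = Executors.newFixedThreadPool(2);
      StaticRegistry.bind(poolName);
      pool.submit(() -> { /* e.g. ask one tserver for its status */ });
      pool.shutdown();
      // The pool is shut down and dropped here, but its meters stay in the
      // static registry, so the registry grows by three entries per cycle.
    }
  }

  public static void main(String[] args) {
    runCycles(100);
    System.out.println("meters retained: " + StaticRegistry.size()); // meters retained: 300
  }
}
```

In a long-running Manager the cycle count has no upper bound, so under this model memory use keeps climbing until the heap is exhausted.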
   
   ![image](https://user-images.githubusercontent.com/1268739/150456982-6a35b443-80dc-4281-b162-442d29c5a10d.png)
   
   **Additional context**
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021545222


   @dlmarion another possible solution to this problem is an option on the utility method that creates a thread pool, letting the caller choose whether or not to instrument it.
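A hypothetical sketch of that option: the pool-creation utility takes a flag, and only pools created with the flag set are instrumented. `PoolFactory`, `newFixedPool(int, String, boolean)` and the `INSTRUMENTED` list are illustrative names, not the actual Accumulo `ThreadPools` API; adding a name to the list stands in for registering metrics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolFactory {
  // Hypothetical stand-in for "register metrics for this pool": records pool names.
  static final List<String> INSTRUMENTED = new ArrayList<>();

  // Hypothetical utility method: the caller decides whether the pool is instrumented.
  static ExecutorService newFixedPool(int threads, String name, boolean emitMetrics) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    if (emitMetrics) {
      synchronized (INSTRUMENTED) {
        INSTRUMENTED.add(name);
      }
    }
    return pool;
  }

  public static void main(String[] args) {
    // Long-lived pool: instrumented for the life of the process.
    ExecutorService longLived = newFixedPool(4, "scan", true);
    // Short-lived pool built for one periodic task: no metrics, nothing left behind.
    ExecutorService shortLived = newFixedPool(2, "tserver-info", false);
    shortLived.shutdown();
    longLived.shutdown();
    System.out.println(INSTRUMENTED); // [scan]
  }
}
```

Making the flag an explicit parameter (rather than a default) forces each call site to decide whether its pool is long-lived enough to be worth monitoring.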
   
   





[GitHub] [accumulo] keith-turner edited a comment on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021265001


   I realized I can work around this in the test I am running by periodically killing the manager process.  This bug is one of those that would probably be hidden by agitation during testing.  





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1022250670


   So, our ThreadPools class calls a method in our MetricsUtils class that creates a Micrometer ExecutorServiceMetrics object, which ultimately creates several Meters and registers them (https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/ExecutorServiceMetrics.java#L316). Each registered Meter carries a `name` tag. There is a way to remove Meters from the MeterRegistry (MeterRegistry.remove), but we don't have a direct reference to the Meters that ExecutorServiceMetrics creates. We would have to use the `Search` class to look them up and then remove them.
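With Micrometer, the lookup would presumably go through `MeterRegistry` search support (for example `Search.in(registry).tag("name", poolName).meters()`), followed by `MeterRegistry.remove` on each result. To keep the example runnable with only the JDK, the sketch below models that search-then-remove step with a hypothetical `ToyRegistry`; the meter names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical toy registry modeling meters identified by a name plus tags.
final class ToyRegistry {
  static final class Meter {
    final String name;
    final Map<String, String> tags;

    Meter(String name, Map<String, String> tags) {
      this.name = name;
      this.tags = tags;
    }
  }

  private final List<Meter> meters = new ArrayList<>();

  // Registers a meter tagged with the owning pool's name.
  synchronized void register(String meterName, String poolName) {
    meters.add(new Meter(meterName, Collections.singletonMap("name", poolName)));
  }

  // Step 1: search for every meter with the given tag value.
  // Step 2: remove each match. Returns how many were removed.
  synchronized int removeByTag(String key, String value) {
    List<Meter> found = new ArrayList<>();
    for (Meter m : meters) {
      if (value.equals(m.tags.get(key))) {
        found.add(m);
      }
    }
    meters.removeAll(found); // identity-based, since Meter does not override equals
    return found.size();
  }

  synchronized int size() {
    return meters.size();
  }

  public static void main(String[] args) {
    ToyRegistry registry = new ToyRegistry();
    registry.register("executor.active", "tserver-info-1");
    registry.register("executor.queued", "tserver-info-1");
    registry.register("executor.active", "scan");
    int removed = registry.removeByTag("name", "tserver-info-1");
    System.out.println(removed + " removed, " + registry.size() + " left"); // 2 removed, 1 left
  }
}
```

The catch the comment points out is that the caller must know, and keep consistent, the exact tag key and value used at registration time, since it holds no direct references to the Meters.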
   
   > I am not sure if it's useful to instrument and report metrics on short lived thread pools.
   
   I am going down this path not because metrics on short-lived thread pools aren't useful, but because there is no easy way to remove the Meters when the thread pool is shut down. My plan is to make metrics creation in ThreadPools conditional, and only enable it for thread pools that live for the duration of the VM.





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021283349


   > Based on this I looked in the manager code and found it was periodically creating thread pools to get tserver info and this may register the thread pool statically for monitoring.
   
   Seems like the Manager should re-use a thread pool for getting tserver info rather than recreating?
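A sketch of the reuse approach, assuming the Manager can hold a single pool for its whole lifetime. `TServerInfoGatherer` and `fetchStatus` are hypothetical names, and `fetchStatus` stands in for the real RPC to a tserver. Because the pool is created once, any metrics bound to it would be registered once rather than on every cycle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper owned by the Manager: one pool, created once, reused every cycle.
final class TServerInfoGatherer implements AutoCloseable {
  private final ExecutorService pool;

  TServerInfoGatherer(int threads) {
    // Any metrics bound to this pool would be registered exactly once, here.
    this.pool = Executors.newFixedThreadPool(threads);
  }

  // Hypothetical stand-in for the RPC that asks a tserver for its status.
  private static String fetchStatus(String tserver) {
    return tserver + ":ok";
  }

  // Called on every periodic cycle; submits work to the existing pool
  // instead of creating (and instrumenting) a new one.
  List<String> gather(List<String> tservers) {
    List<CompletableFuture<String>> futures = new ArrayList<>();
    for (String ts : tservers) {
      futures.add(CompletableFuture.supplyAsync(() -> fetchStatus(ts), pool));
    }
    List<String> results = new ArrayList<>();
    for (CompletableFuture<String> f : futures) {
      results.add(f.join()); // results come back in input order
    }
    return results;
  }

  @Override
  public void close() {
    pool.shutdown(); // only when the Manager itself shuts down
  }

  public static void main(String[] args) {
    try (TServerInfoGatherer gatherer = new TServerInfoGatherer(4)) {
      for (int cycle = 0; cycle < 3; cycle++) {
        System.out.println(gatherer.gather(Arrays.asList("ts1", "ts2")));
      }
    }
  }
}
```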





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1023158042


   > do you think an upstream issue/PR to micrometer about this would be useful?
   
   I had found https://github.com/micrometer-metrics/micrometer/issues/1929, which is not specific to their ExecutorServiceMetrics class, but is where I got my information on how to remove the Meters from the MeterRegistry. We could submit an issue, but I see two possible solutions:
   
     1. They provide a method in ExecutorServiceMetrics that performs the removal of the Meters for the user based on the ExecutorService name tag that they add.
     2. Return a reference to all of the Meters created by ExecutorServiceMetrics to the caller so that the caller can remove them.
     
     I think option 2 presents a problem for us, as we would need to keep a reference to all Meters created for all ExecutorServices so that they could be cleaned up. Given that #2432 only creates metrics for long-lived ExecutorServices, I'm of the opinion that we punt on this issue until someone submits a ticket asking us to provide metrics for all ExecutorServices.
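Option 2 could look roughly like the hypothetical sketch below, where binding metrics returns a handle that knows exactly which meters it created. `MeterHandleSketch`, `Binding` and `bindPoolMetrics` are illustrative names, not Micrometer API, and a plain list stands in for the MeterRegistry. It also shows the cost described above: every caller has to keep one handle per pool and close it when that pool shuts down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MeterHandleSketch {
  // Toy global registry of meter ids (hypothetical stand-in for a MeterRegistry).
  static final List<String> REGISTRY = Collections.synchronizedList(new ArrayList<>());

  // Hypothetical option 2: the binding returns a handle listing the meters it created.
  static final class Binding implements AutoCloseable {
    private final List<String> created;

    Binding(List<String> created) {
      this.created = created;
    }

    // Removes exactly the meters this binding registered.
    @Override
    public void close() {
      REGISTRY.removeAll(created);
    }
  }

  static Binding bindPoolMetrics(String poolName) {
    List<String> created = new ArrayList<>();
    for (String meter : new String[] {"completed", "active", "queued"}) {
      String id = "executor." + meter + "{name=" + poolName + "}";
      REGISTRY.add(id);
      created.add(id);
    }
    return new Binding(created);
  }

  public static void main(String[] args) {
    Binding binding = bindPoolMetrics("tserver-info-1");
    System.out.println(REGISTRY.size()); // 3
    binding.close(); // the caller must remember to do this for every pool
    System.out.println(REGISTRY.size()); // 0
  }
}
```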





[GitHub] [accumulo] keith-turner commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021547770


   Even if there were no memory issues, I am not sure it's useful to instrument and report metrics on short-lived thread pools. So making instrumentation optional may be useful for more than fixing the bug.





[GitHub] [accumulo] keith-turner commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1022669615


   >  There is a way to remove Meters from the MeterRegistry (MeterRegistry.remove), but we don't have a direct reference to the Meters that ExecutorServiceMetrics creates.
   
   @dlmarion do you think an upstream issue/PR to micrometer about this would be useful?





[GitHub] [accumulo] dlmarion closed issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion closed issue #2423:
URL: https://github.com/apache/accumulo/issues/2423


   





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021608330


   > @keith-turner - do you remember which thread pool in the Manager was short-lived?
   
   Never mind, I found it.





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021607465


   @keith-turner  - do you remember which thread pool in the Manager was short-lived?





[GitHub] [accumulo] dlmarion commented on issue #2423: Seeing out of memory error in manager

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2423:
URL: https://github.com/apache/accumulo/issues/2423#issuecomment-1021409204


   > If there is any other code in Accumulo that creates thread pools for short task, then those could also cause an OOME when running for extended periods.
   
   I think most are long-lived, IIRC.
   
   My guess is the problem is in the ThreadPools class where it's creating metrics for each ExecutorService that is created. Ref: https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/util/threads/ThreadPools.java#L213




