Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/03/07 19:38:19 UTC

[GitHub] [accumulo] EdColeman commented on issue #946: Consider adding critical thread metrics for monitoring

EdColeman commented on issue #946:
URL: https://github.com/apache/accumulo/issues/946#issuecomment-1061061376


   This was originally proposed as metrics / monitoring at a level where operators and app developers could gain insight into overall health and trends.  Having the threads throw exceptions is great, but this was directed more at monitoring and trending higher-level functions - things that could be using multiple threads.  @keith-turner provided some concrete examples. Knowing that the expected threads in the TabletGroupWatcher are running, and possibly timing how long each run takes, would allow alerting and trending on those metrics.
   
   This is speculation, and more a description of something desired than a concrete example that I know happens.  But assume that the thread handling user tablet assignments gets stuck or dies - if the manager keeps running, that is eventually going to be noticed only through secondary effects - maybe it's FATEs on table creates that hang and back up, or fail? Or it's splits that start failing...  Exposing that function as a reportable metric could allow intervention sooner - or it could be trended, and if the thread starts taking longer and longer to run, one could look at what has changed and fix something before it falls over.
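
   To make the idea concrete, here is a minimal sketch of the kind of per-thread heartbeat and run-duration tracking described above. It is an illustration only, not existing Accumulo code: the class `CriticalThreadMonitor` and all of its methods are hypothetical, and wiring the values into a real metrics system is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Accumulo): records when a critical loop
 * last completed an iteration and how long that iteration took.
 */
final class CriticalThreadMonitor {
  private final String name;
  // Initialized to creation time so a thread that never completes a run
  // eventually shows up as stale too.
  private final AtomicLong lastCompletedNanos = new AtomicLong(System.nanoTime());
  private final AtomicLong lastDurationNanos = new AtomicLong(0);
  private final AtomicLong runCount = new AtomicLong(0);

  CriticalThreadMonitor(String name) {
    this.name = name;
  }

  /** Wraps one iteration of the monitored loop. */
  void timeRun(Runnable iteration) {
    long start = System.nanoTime();
    // If this throws or hangs, no heartbeat is recorded and the monitor goes stale.
    iteration.run();
    long end = System.nanoTime();
    lastDurationNanos.set(end - start);
    lastCompletedNanos.set(end);
    runCount.incrementAndGet();
  }

  String getName() {
    return name;
  }

  long getRunCount() {
    return runCount.get();
  }

  /** Duration of the most recent completed run; a candidate for trending. */
  long getLastDurationNanos() {
    return lastDurationNanos.get();
  }

  long getLastCompletedNanos() {
    return lastCompletedNanos.get();
  }

  /** True if no run has completed within maxAgeNanos of nowNanos; a candidate for alerting. */
  boolean isStale(long maxAgeNanos, long nowNanos) {
    return nowNanos - lastCompletedNanos.get() > maxAgeNanos;
  }

  boolean isStale(long maxAgeNanos) {
    return isStale(maxAgeNanos, System.nanoTime());
  }
}
```

   In this sketch the watched loop would call `timeRun(...)` around each pass, and a separate reporter would periodically publish `getLastDurationNanos()` and `isStale(...)`, so a stuck or dead thread shows up directly rather than through secondary effects.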

