Posted to dev@storm.apache.org by zd-project <gi...@git.apache.org> on 2018/06/25 21:14:21 UTC
[GitHub] storm issue #2710: [WIP] STORM-3099: Extend metrics on supervisor and worker...
Github user zd-project commented on the issue:
https://github.com/apache/storm/pull/2710
New supervisor level metrics:
- [ ] Worker Kill/Restart Statistics
- [x] Kill Count by Category - assignment change/HB too old/Heap space (memory limit?)
- [x] blob change?
- [ ] Worker Suicide Cnt - category: internal error or Assignment Change
- [x] Implemented based on the running status of the container's main process. It does not actually reflect the suicide count, because it counts normal exits as well.
- [x] Worker idle period
- The metrics record how long a machine spends in each state (as a histogram) and how many times it transitions into or out of a given state.
- [x] Time to actually kill a worker (from the supervisor identifying the need to the actual change in the state of the worker). This is only an estimate; its accuracy is affected by SleepTime.
- [x] Time to start a worker for a topology, measured from reading the assignment for the first time.
- [x] Worker cleanup time
- [x] Supervisor Level Metrics:
- [x] Supervisor restart Count
- Simply reports every time it restarts.
- [x] Blobstore (Request to download time)
- [x] download time of an individual blob (inside localizer): from the localizer getting the request to actually download until the HDFS request finishes
- I assume this to be [the complete process], from initiating the download to committing it to the local blob cache and informing the relevant workers
- [x] download rate individual blob (inside localizer)
- This tracks the actual download rate of a blob retrieval, in MB/s
- [x] supervisor localizer thread blob download - how long (outside localizer)
- I put this inside the async localizer, as it turned out to be better suited for the purpose. This tracks the time for a topology blob download request to be completely processed.
- [x] Blob update is also considered.
- [x] Blobstore Update due to Version change Cnts
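For illustration only, here is a minimal sketch of two of the measurements described above: time spent per state with transition counts, and a blob download rate in MB/s. It is not the code in this PR. The class `WorkerStateMetrics`, the `State` values, and all method names are hypothetical, timestamps are passed in by the caller, and 1 MB is assumed to be 1024 * 1024 bytes. A real implementation would presumably feed a metrics library's histogram and counters instead of plain collections.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Storm code base. */
class WorkerStateMetrics {
    /** Illustrative states only; the real supervisor state machine defines its own. */
    enum State { EMPTY, RUNNING, KILL }

    // Raw duration samples per state; a histogram would summarize these.
    private final Map<State, List<Long>> durationsMs = new EnumMap<>(State.class);
    private final Map<State, Integer> transitionsInto = new EnumMap<>(State.class);
    private final Map<State, Integer> transitionsOutOf = new EnumMap<>(State.class);
    private State current;
    private long enteredAtMs;

    WorkerStateMetrics(State initial, long nowMs) {
        this.current = initial;
        this.enteredAtMs = nowMs;
    }

    /** Records the time spent in the current state, then moves to the next one. */
    void transition(State next, long nowMs) {
        durationsMs.computeIfAbsent(current, s -> new ArrayList<>()).add(nowMs - enteredAtMs);
        transitionsOutOf.merge(current, 1, Integer::sum);
        transitionsInto.merge(next, 1, Integer::sum);
        current = next;
        enteredAtMs = nowMs;
    }

    List<Long> durationsMs(State s) {
        return durationsMs.getOrDefault(s, Collections.<Long>emptyList());
    }

    int transitionsInto(State s) {
        return transitionsInto.getOrDefault(s, 0);
    }

    int transitionsOutOf(State s) {
        return transitionsOutOf.getOrDefault(s, 0);
    }

    /** Download rate in MB/s, assuming 1 MB = 1024 * 1024 bytes. */
    static double downloadRateMBps(long bytes, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 0.0; // avoid division by zero for instantaneous or invalid samples
        }
        return (bytes / (1024.0 * 1024.0)) / (elapsedMs / 1000.0);
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic. As noted for the kill-time metric, any duration measured this way is only as precise as the interval at which the caller polls the state.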
---