Posted to dev@storm.apache.org by zd-project <gi...@git.apache.org> on 2018/06/25 21:14:21 UTC

[GitHub] storm issue #2710: [WIP] STORM-3099: Extend metrics on supervisor and worker...

Github user zd-project commented on the issue:

    https://github.com/apache/storm/pull/2710
  
    New supervisor level metrics: 
    
    - [ ] Worker Kill/Restart Statistics
    	- [x] Kill Count by Category - assignment change/HB too old/Heap space (memory limit?)
    		- [x] blob change?
    	- [ ] Worker Suicide Cnt - category:  internal error or Assignment Change
    		- [x] - Implemented based on the running status of the container's main process. Does not actually reflect the suicide count because it counts normal exits as well.
    	- [x] Worker idle period
    		- The metric records the duration a worker spent in each state (as a histogram) and how many times it transitioned into or out of a given state.
    	- [x] Time to Actually Kill worker (from the supervisor identifying the need to the actual change in the state of the worker) - (This is only an estimate; its accuracy is affected by SleepTime)
    	- [x] Time to start worker for topology from reading assignment for the first time.
    	- [x] Worker cleanup time
    - [x] Supervisor Level Metrics:
    	- [x] Supervisor restart Count
    		- simply report every time it restarts.
    	- [x] Blobstore (Request to download time)
    		- [x] download time individual blob (inside localizer): from the localizer getting the request to the actual HDFS download request finishing
    			- I assume this to be [the complete process], from initiating the download to committing it to the local blob cache and informing the relevant workers
    		- [x] download rate individual blob (inside localizer)
    			- This tracks the actual download rate of a blob retrieval, in MB/s
    		- [x] supervisor localizer thread blob download - how long (outside localizer)
    			- I put this inside the async localizer as it turns out to be better suited for the purpose. This tracks the time for a topology blob download request to be completely processed.
    		- [x] Blob update is also considered.
    	- [x] Blobstore Update due to Version change Cnts
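
To illustrate two of the items above (time spent per state with transition counts, and blob download rate in MB/s), here is a minimal sketch in plain Java. It is not the code from this pull request: the class name, the `State` values, and all method names are hypothetical, and plain collections stand in for whatever metrics library the supervisor actually uses. Timestamps are passed in explicitly so the logic is easy to check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and states are assumptions, not Storm's API.
class WorkerStateMetricsSketch {
    // Illustrative states only; the real supervisor state machine may differ.
    enum State { EMPTY, WAITING_FOR_BLOBS, RUNNING, KILL }

    // Raw duration samples per state; a real histogram would bucket these.
    private final Map<State, List<Long>> durationsMs = new EnumMap<>(State.class);
    private final Map<State, Integer> transitionsInto = new EnumMap<>(State.class);
    private final Map<State, Integer> transitionsOutOf = new EnumMap<>(State.class);
    private State current;
    private long enteredAtMs;

    public WorkerStateMetricsSketch(State initial, long nowMs) {
        this.current = initial;
        this.enteredAtMs = nowMs;
    }

    // Record the time spent in the current state, then move to the next one.
    public void transition(State next, long nowMs) {
        durationsMs.computeIfAbsent(current, s -> new ArrayList<>())
                   .add(nowMs - enteredAtMs);
        transitionsOutOf.merge(current, 1, Integer::sum);
        transitionsInto.merge(next, 1, Integer::sum);
        current = next;
        enteredAtMs = nowMs;
    }

    public List<Long> durations(State s) {
        return durationsMs.getOrDefault(s, Collections.<Long>emptyList());
    }

    public int into(State s) {
        return transitionsInto.getOrDefault(s, 0);
    }

    public int outOf(State s) {
        return transitionsOutOf.getOrDefault(s, 0);
    }

    // Download rate in MB/s (1 MB taken as 1024 * 1024 bytes, an assumption).
    public static double downloadRateMBps(long bytes, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 0.0; // avoid dividing by zero for instantaneous downloads
        }
        return (bytes / (1024.0 * 1024.0)) / (elapsedMs / 1000.0);
    }
}
```

In this sketch a caller would invoke `transition` wherever the state changes and `downloadRateMBps` once a blob download completes; feeding the samples into a real histogram or meter is left out.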


---