You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by abhishekagarwal87 <gi...@git.apache.org> on 2016/05/08 16:48:51 UTC

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

GitHub user abhishekagarwal87 opened a pull request:

    https://github.com/apache/storm/pull/1406

    [STORM-433] [WIP] Executor queue backlog metric

    This is first set of changes to show the average queue population in UI. It is work in progress. Right now, only the population is shown only for the executors and not aggregated at component level. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekagarwal87/storm backlog-metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1406
    
----
commit 81f35aa0c93720da9a034378dac23b8efef2c25f
Author: Abhishek Agarwal <ab...@inmobi.com>
Date:   2016-04-25T07:30:19Z

    First cut implementation for backlog metrics

commit 73f608beab475a47af5665d1952ed3f488026bff
Author: Abhishek Agarwal <ab...@inmobi.com>
Date:   2016-05-07T07:58:29Z

    Merge branch '1.x-branch' of https://github.com/apache/storm into backlog-metrics

commit d00376ff75a0a535a7760ff1be8487f076e176a7
Author: Abhishek Agarwal <ab...@inmobi.com>
Date:   2016-05-08T06:23:45Z

    Changes in thrift java class

commit ab138bc49ed5e8fb4e7ae450d891670388b3a507
Author: Abhishek Agarwal <ab...@inmobi.com>
Date:   2016-05-08T16:28:32Z

    Add backlog metrics to spout component

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 
    Could I ask some questions regarding this: what points did you need to work further? It's needed for someone including me to take this up and reuse your work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-217765582
  
    @abhishekagarwal87 
    Thanks for the improvement. :)
    
    Btw, I've some opinions on this change.
    
    1. some concerns about adding payloads for task heartbeat
    
    As you may know, why Storm needs Pacemaker daemon with large cluster is that Storm includes task metrics into heartbeat message and store to ZK in a short interval (task.heartbeat.frequency.secs, its default value is 3) which is a big pressure for ZK.
    
    So we would like to have some discussions for expanding heartbeat message with current way, or change the way to send metrics to Nimbus (like JStorm). If we can make some more spaces for metrics, we can have ideations around metrics and add them to enrich. For example, spout tasks can have optional metrics, for example, partition information and lag for KafkaSpout.
    
    2. metrics for queue
    
    I guess sojourn time for the queue is one of most wanted feature of queue metrics, since many users said that they see very short latencies for execute/process latency for each task but also see very high complete latency.
    (@wangli1426 addresses sojourn time for disruptor queue but [as he stated to code comment](https://github.com/apache/storm/blob/1.x-branch/storm-core/src/jvm/org/apache/storm/utils/DisruptorQueue.java#L324), it's based on precondition which is sometimes not true for problematic task. If we can make it stable it would be really helpful.)
    
    STORM-1742 covers the accuracy of 'complete latency', but many parts of lifecycle of tuple are still hidden, for example, avg. of queue sojourn time, serde latency, transfer latency, etc. I think we don't want to address the things which can affect overall performance in order to measure, but they're meaningful information indeed so I would like to address if they don't hurt at all.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 Any updates here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-217732510
  
    ![screen shot 2016-05-08 at 9 48 27 pm](https://cloud.githubusercontent.com/assets/1477457/15099125/06b9583e-156b-11e6-918d-639822665079.png)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by erikdw <gi...@git.apache.org>.
Github user erikdw commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 & @HeartSaVioR : any update on this?  Getting visibility into queue depth in storm is a major reason for us to even consider upgrading off of the storm-0.9.x branch.  Do you need help?  We can potentially dedicate time in our next quarter towards helping.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1406


---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 Any updates on this? I'm supportive for this patch but you marked WIP so I'm waiting for you to finalize.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @HeartSaVioR - If we are ok to show these metrics only at the executor level (and not aggregate it at component level), it is almost done. Aggregation at component level can be taken up in separate jira as well. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @erikdw Yes I'd be really appreciated if you can help. It might depends on metrics renewal so if you want to address this later, please stay tuned for metrics renewal by @ptgoetz and @abellina.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 @knusbaum 
    Not at all. It shouldn't hurt much so let's apply this as it is. We can address broader considerations from another issues.
    One thing I'd like to see is the status of backpressure for each queue. We can show the percentage, or just show whether this queue meets condition to trigger backpressure or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-218339228
  
    @knusbaum Yeah, agreed. I was seeing larger view, but when we narrow the view, this PR would be add a tiny amount of load. Seeing is believing so I would like to see performance test result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-217775942
  
    Regarding A - That's a good point and glad to know that this issue is being solved for. The current executor stats takes up good amount of space (0.5 MB in my topologies) I earlier ran into an issue where I had packed too many executors into single worker and zookeeper writing failed. In fact, I was quite surprised to know that such a heavy write is being done into zookeeper :) This PR can wait if we want to solve that problem first. 
    
    Regarding B - Having queue depth can give user's very good visibility. Though I think it would have been more useful when backpressure was not available. Now if you look at the code, the (queue + overflow buffer) is actually unbounded. Previously, a slow bolt would stall the whole topology and queue depth would have helped in zeroing down on that bolt. 
    
    complete latency -  I haven't gone through the discussion closely so I can't comment right now. 
    sojournTime - Looks like a good metric. I missed it completely. Though it still remains to be shown on UI. 
    
    So to summarize, there are four major points - 
    1. A more efficient way to periodically update executor stats (Metric Producer)
    2. Zeroing on which queue metrics are useful
    3. Packing queue metrics in the executor stats (simple enough)
    4. Move nimbus to read the metrics for new metric store or some other place
    
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-217794597
  
    For addressing big picture of metrics improvement, yes we want to address four major points.
    
    But given that we already released major version of Storm 1.0.0, we would want to keep backward compatibility for 1.x. That's why I'm addressing small improvements first instead of working long-term huge improvements. It's also planned to phase 2 on JStorm merger in 2.0.0.
    
    But if we have many ideas which relies on that improvements, I think we can discuss how to address and work on that right now. (evaluation of many ways including porting metrics feature of JStorm)
    
    Btw, since we're having alternative (Pacemaker) now, we may feel OK to add small amount of payload to heartbeat message. 
    
    If it doesn't make sense, even we can separate heartbeat message and metrics message, and set frequency of latter thing to 10s or even longer, I guess we can reduce ZK write load greatly. (since it aggregates built-in topology metrics and writes to ZK every 3 seconds by default.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-218076239
  
    Two parts of this PR -
    *Metric collection* - From what I have seen, storm users are very much interested in queue length statistics. Current queue metrics do not have histogram or average of queue length etc. There must be a way to add them without touching the critical path.
    
    *Metric reporting* - Reporting additional metrics to UI adds zookeeper overhead but additional payload is not significant. Other alternative is to let users rely on external components such as graphite by using appropriate MetricConsumer. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 Yeah I think we can do that later, and anyone can do that once it's added to metrics. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by knusbaum <gi...@git.apache.org>.
Github user knusbaum commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-218289768
  
    I think it's fine if this goes in. It doesn't add much to the metrics load, and we already have a working solution for clusters whose zk instances get overloaded. Ultimately, we do want to move all the metrics somewhere else, but until then, we can continue doing them as we have been. 
    
    When we make the larger change, it won't me more difficult because of this than it would have been.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: [STORM-433] [WIP] Executor queue backlog metri...

Posted by hustfxj <gi...@git.apache.org>.
Github user hustfxj commented on the pull request:

    https://github.com/apache/storm/pull/1406#issuecomment-218035627
  
    In fact, JStorm can show alll queues population in UI. And we seperarate heartbeat message and metrics message. I agree with @HeartSaVioR . The work is long-term huge, So I hope we can plan to phase 2 in 2.0.0. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 No worry for performance benchmarking. I'm thinking it's tiny for now. If we want to have more values about queue we can measure then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @abhishekagarwal87 
    Do you have some more works in mind, or do you think we can remove [WIP]?
    (Input backlog and output backlog seems a good start. It would be better to have component level  aggregation, but I think I can help if you don't have time to go on.)
    
    If you think it's done I'll take a deep look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @HeartSaVioR - I  am not able to get time. I would like to do some performance benchmarking as well for these changes. Can you point me to any references for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm issue #1406: [STORM-433] [WIP] Executor queue backlog metric

Posted by abhishekagarwal87 <gi...@git.apache.org>.
Github user abhishekagarwal87 commented on the issue:

    https://github.com/apache/storm/pull/1406
  
    @heartsavior unfortunately I havent done any further work. This feature needs some gauranteed time which I am not able to give. Anyone else wants to take it up and reuse my work, please do so


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---