Posted to dev@storm.apache.org by revans2 <gi...@git.apache.org> on 2015/11/12 18:00:59 UTC

[GitHub] storm pull request: Added in feature diff

GitHub user revans2 opened a pull request:

    https://github.com/apache/storm/pull/877

    Added in feature diff 

    based off of work from Basti Liu

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/revans2/incubator-storm jstorm-import

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #877
    
----
commit 52494b5c1a0d3ddbefe13eea5157f4d41b68dea5
Author: Robert (Bobby) Evans <ev...@yahoo-inc.com>
Date:   2015-11-12T16:19:37Z

    Added in Empty File

commit 99442c4168ed727d4b608a4e39e7de5a0437c68c
Author: Robert (Bobby) Evans <re...@gmail.com>
Date:   2015-11-12T16:58:19Z

    Update STORM_JSTORM_FEATURE_DIFF.md

commit 1a065cd0164db5bf2419afb8987fadcd80a63411
Author: Robert (Bobby) Evans <re...@gmail.com>
Date:   2015-11-12T16:58:56Z

    Merge pull request #4 from revans2/revans2-add-feature-diff
    
    Update STORM_JSTORM_FEATURE_DIFF.md

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Added in feature diff

Posted by hustfxj <gi...@git.apache.org>.
Github user hustfxj commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44796241
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk. But JStorm daeons/connections need to be evaluated
    +Scheduler | Resource aware scheduling	(Work In Progress) | <ol><li>Make the tasks of same component distributed evenly across the nodes in cluster.<li>Make number of tasks in a worker is balanced.<li>Try to assign two tasks which are transferring message directly into a same worker to reduce network cost.<li>Support user defined assignment and  to use the result of last assignment	Different solution between Storm and JStorm.</ol> Actually, JStorm has ever used cpu/mem/disk/network demond for scheduling. But we faced the problem of low utilization of cluster. For example, because the configuration of topologys and resources of hardwares in a cluster are different. After running a period, it is possible that no topology will be assigned to the node with enough other resources but just lacking one resource. | Scheduler interface is pluggable so we should be able to support boht schedulers for the time being |
    +Nimbus HA	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | | 
    +Topology structure | worker/executor/task | worker/task | It might be the biggest diff between Storm and JStorm. So, not sure if there are some features related to "executor" can not be merged. (Rebalancing allows us to combine multiple tasks on a single executor)  This may go away in the future once RAS is stable |
    +Topology Master | (Heartbeat server is similar for scaling on a pull request) | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reports to nimbus. Besides task hb info, it also can be used to dispatch control messages in topology. Topology master significantly reduce the amout of read/write to zk.	Before this change, zk is the bottleneck to depoly big cluster and topology.
    --- End diff --
    
    TM is mainly intended for managing the topology later, and we hope users will be able to define control streams in the future.



[GitHub] storm pull request: Added in feature diff

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44794800
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk. But JStorm daeons/connections need to be evaluated
    +Scheduler | Resource aware scheduling	(Work In Progress) | <ol><li>Make the tasks of same component distributed evenly across the nodes in cluster.<li>Make number of tasks in a worker is balanced.<li>Try to assign two tasks which are transferring message directly into a same worker to reduce network cost.<li>Support user defined assignment and  to use the result of last assignment	Different solution between Storm and JStorm.</ol> Actually, JStorm has ever used cpu/mem/disk/network demond for scheduling. But we faced the problem of low utilization of cluster. For example, because the configuration of topologys and resources of hardwares in a cluster are different. After running a period, it is possible that no topology will be assigned to the node with enough other resources but just lacking one resource. | Scheduler interface is pluggable so we should be able to support boht schedulers for the time being |
    +Nimbus HA	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | | 
    +Topology structure | worker/executor/task | worker/task | It might be the biggest diff between Storm and JStorm. So, not sure if there are some features related to "executor" can not be merged. (Rebalancing allows us to combine multiple tasks on a single executor)  This may go away in the future once RAS is stable |
    +Topology Master | (Heartbeat server is similar for scaling on a pull request) | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reports to nimbus. Besides task hb info, it also can be used to dispatch control messages in topology. Topology master significantly reduce the amout of read/write to zk.	Before this change, zk is the bottleneck to depoly big cluster and topology.
    +Backpressure | Store backpressure status in zk which will trigger the source spout to start flow control. When flow control, relative souce spout will stop sending tuples. | <ol><li>Implement backpressure based on topology master. TM is responsible to process the trigger message and send flow control request to relative spouts. When flow control, spout will not stop to send tuples, but just slow down the sending.<li>Enable user to update the configuration  of backpressure dynamically without restarting topology, e.g. enable/disable backpressue, high/low water mark... | These two solution is totally different. We need to evaluate which one is better. |
    +Monitor of task execute thread | N/A | Monitor the status of the execute thread of task. It is effective to find the slow bolt in a topology, and potential dead lock. | 	
    +Message processing | Configurable multi receiving threads of worker (But likely to change to have deserialization happen in Netty) | <ol><li>Add receiving and transferring queue/thread for each task to make deserialization and serialization asynchronously<li>Remove receiving and transferring thread on worker level to avoid unnecessary locks and to shorten the message processing phase | Low risk for merging to JStorm (Both implementations are similar) |
    +Batch tuples | Batching in DisruptorQueue | Do batch before sending tuple to transfer queue and support to adjust the batch size dynamically according to samples of actual batch size sent out for past intervals. | Should evaluate both implementations, and see which is better, or if we can pull in some of the dynamic batching to the disruptor level |	
    +Grouping | Load aware balancing shuffle grouping | <ol><li>Add localfrist grouping that tuples are sent to the tasks in the same worker firstly. If the load of all local tasks is high, the tuples will be sent out to remote tasks.<li>Improve localOrShuffle grouping that tuples are only sent to the tasks in the same worker or in same node.	The shuffle of JStorm checks the load of network connection of local worker. | The risk for merging of  checking remote load which was implemented in "Load aware balancing shuffle grouping" is low.
    +Rest API | YES | NA |The porting of rest API to JStorm is in progress
    +web-UI | | <ol><li>Redesign Web-UI, the new Web-UI code is clear and clean.<li>Data display is much more beautiful and better user experience, especially topology graph and 30 minutes metric trend. url: http://storm.taobao.org/ 	
    +metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, will be sampled & calculated. Some metrics, e.g. ""tuple life cycle"", are very useful for debug and finding hot spot of a topology.<li>Support full metrics data. previous metric system can only display mean value of meters/histograms, the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.<li>Use new metrics windows, the min metric window is 1min, thus we can see the metircs data every single minute.<li>Supplies a metric uploader interface, thirty-party companies can easily build their own metric systems based on all historic metric data. | JStorm metric system is completely redesigned.
    +jar command | Support progress bar when submitting topology | Does not support progress bar | Low risk for merging to JStorm
    +rebalance	command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of worker, acker, spout & bolt dynamically without stopping topology | How is grouping consistency maintined when changing the number or bolt/spout? | 
    --- End diff --
    
    There are no plans to change exactly-once processing right now.  At Yahoo we have a decent number of Trident topologies running in production, and I don't see it ever going fully away.  But I do see the appeal of having something like Apex or Flink, where you can have exactly-once processing without requiring Kafka to do the replay.  It would also be nice to have the exact same code work for exactly-once, at-least-once, and best-effort processing.
    
    But that would be a long ways off if we do it.



[GitHub] storm pull request: Added in feature diff

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44742223
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk. But JStorm daeons/connections need to be evaluated
    +Scheduler | Resource aware scheduling	(Work In Progress) | <ol><li>Make the tasks of same component distributed evenly across the nodes in cluster.<li>Make number of tasks in a worker is balanced.<li>Try to assign two tasks which are transferring message directly into a same worker to reduce network cost.<li>Support user defined assignment and  to use the result of last assignment	Different solution between Storm and JStorm.</ol> Actually, JStorm has ever used cpu/mem/disk/network demond for scheduling. But we faced the problem of low utilization of cluster. For example, because the configuration of topologys and resources of hardwares in a cluster are different. After running a period, it is possible that no topology will be assigned to the node with enough other resources but just lacking one resource. | Scheduler interface is pluggable so we should be able to support boht schedulers for the time being |
    +Nimbus HA	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | | 
    --- End diff --
    
    It seems that the Nimbus HA entries for Storm and JStorm have been swapped.
    Maybe it should be `Nimbus HA  |	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | `



[GitHub] storm pull request: Added in feature diff

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44786316
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk. But JStorm daeons/connections need to be evaluated
    +Scheduler | Resource aware scheduling	(Work In Progress) | <ol><li>Make the tasks of same component distributed evenly across the nodes in cluster.<li>Make number of tasks in a worker is balanced.<li>Try to assign two tasks which are transferring message directly into a same worker to reduce network cost.<li>Support user defined assignment and  to use the result of last assignment	Different solution between Storm and JStorm.</ol> Actually, JStorm has ever used cpu/mem/disk/network demond for scheduling. But we faced the problem of low utilization of cluster. For example, because the configuration of topologys and resources of hardwares in a cluster are different. After running a period, it is possible that no topology will be assigned to the node with enough other resources but just lacking one resource. | Scheduler interface is pluggable so we should be able to support boht schedulers for the time being |
    +Nimbus HA	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | | 
    +Topology structure | worker/executor/task | worker/task | It might be the biggest diff between Storm and JStorm. So, not sure if there are some features related to "executor" can not be merged. (Rebalancing allows us to combine multiple tasks on a single executor)  This may go away in the future once RAS is stable |
    +Topology Master | (Heartbeat server is similar for scaling on a pull request) | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reports to nimbus. Besides task hb info, it also can be used to dispatch control messages in topology. Topology master significantly reduce the amout of read/write to zk.	Before this change, zk is the bottleneck to depoly big cluster and topology.
    +Backpressure | Store backpressure status in zk which will trigger the source spout to start flow control. When flow control, relative souce spout will stop sending tuples. | <ol><li>Implement backpressure based on topology master. TM is responsible to process the trigger message and send flow control request to relative spouts. When flow control, spout will not stop to send tuples, but just slow down the sending.<li>Enable user to update the configuration  of backpressure dynamically without restarting topology, e.g. enable/disable backpressue, high/low water mark... | These two solution is totally different. We need to evaluate which one is better. |
    +Monitor of task execute thread | N/A | Monitor the status of the execute thread of task. It is effective to find the slow bolt in a topology, and potential dead lock. | 	
    +Message processing | Configurable multi receiving threads of worker (But likely to change to have deserialization happen in Netty) | <ol><li>Add receiving and transferring queue/thread for each task to make deserialization and serialization asynchronously<li>Remove receiving and transferring thread on worker level to avoid unnecessary locks and to shorten the message processing phase | Low risk for merging to JStorm (Both implementations are similar) |
    +Batch tuples | Batching in DisruptorQueue | Do batch before sending tuple to transfer queue and support to adjust the batch size dynamically according to samples of actual batch size sent out for past intervals. | Should evaluate both implementations, and see which is better, or if we can pull in some of the dynamic batching to the disruptor level |	
    +Grouping | Load aware balancing shuffle grouping | <ol><li>Add localfrist grouping that tuples are sent to the tasks in the same worker firstly. If the load of all local tasks is high, the tuples will be sent out to remote tasks.<li>Improve localOrShuffle grouping that tuples are only sent to the tasks in the same worker or in same node.	The shuffle of JStorm checks the load of network connection of local worker. | The risk for merging of  checking remote load which was implemented in "Load aware balancing shuffle grouping" is low.
    +Rest API | YES | NA |The porting of rest API to JStorm is in progress
    +web-UI | | <ol><li>Redesign Web-UI, the new Web-UI code is clear and clean.<li>Data display is much more beautiful and better user experience, especially topology graph and 30 minutes metric trend. url: http://storm.taobao.org/ 	
    +metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, will be sampled & calculated. Some metrics, e.g. ""tuple life cycle"", are very useful for debug and finding hot spot of a topology.<li>Support full metrics data. previous metric system can only display mean value of meters/histograms, the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.<li>Use new metrics windows, the min metric window is 1min, thus we can see the metircs data every single minute.<li>Supplies a metric uploader interface, thirty-party companies can easily build their own metric systems based on all historic metric data. | JStorm metric system is completely redesigned.
    +jar command | Support progress bar when submitting topology | Does not support progress bar | Low risk for merging to JStorm
    +rebalance	command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of worker, acker, spout & bolt dynamically without stopping topology | How is grouping consistency maintined when changing the number or bolt/spout? | 
    --- End diff --
    
    OK, we will need some really good documentation explaining how this works, especially with state.  Also, if we ever want to move toward Flink- or Apex-like state checkpointing for exactly-once processing, it will likely not be compatible with this feature unless we allow everyone to checkpoint state in a lazy key/value-like way, which sounds a lot more complex.  This is why I like the task/executor split: you can set the number of tasks high while keeping the number of executors low, and if you need to scale up, the state does not need to be split or combined.
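
The task/executor argument in the comment above can be illustrated with a small, self-contained sketch. It is not Storm code: `task_for_key` and `assign_tasks` are hypothetical helpers, the hash-modulo routing is only a stand-in for a fields-style grouping, and the contiguous task ranges per executor are an assumption made for illustration. The point it shows is that when the task count stays fixed, changing the number of executors only moves whole tasks around, so the key-to-task mapping (and any per-task state) is untouched.

```python
import zlib
from typing import Dict, List


def task_for_key(key: str, num_tasks: int) -> int:
    """Hypothetical fields-grouping stand-in: stable hash of the key modulo the task count."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def assign_tasks(num_tasks: int, num_executors: int) -> Dict[int, List[int]]:
    """Spread task ids 0..num_tasks-1 over executors in contiguous ranges (an assumption)."""
    assignment: Dict[int, List[int]] = {e: [] for e in range(num_executors)}
    for task in range(num_tasks):
        assignment[task * num_executors // num_tasks].append(task)
    return assignment


NUM_TASKS = 8  # fixed for the lifetime of the topology in this sketch
keys = ["user-%d" % i for i in range(100)]

# Key routing depends only on the task count, never on the executor count.
before = {k: task_for_key(k, NUM_TASKS) for k in keys}
two_executors = assign_tasks(NUM_TASKS, 2)   # few threads, many tasks each
four_executors = assign_tasks(NUM_TASKS, 4)  # "scaled up": same tasks, more threads
after = {k: task_for_key(k, NUM_TASKS) for k in keys}
```

Going from two to four executors regroups the same eight tasks, while `before` and `after` are identical, which is why state keyed by task would not have to be split or merged in this model.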



[GitHub] storm pull request: Added in feature diff

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/877#issuecomment-159090195
  
    This is now captured at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61328109



[GitHub] storm pull request: Added in feature diff

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 closed the pull request at:

    https://github.com/apache/storm/pull/877



[GitHub] storm pull request: Added in feature diff

Posted by bastiliu <gi...@git.apache.org>.
Github user bastiliu commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44746119
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk. But JStorm daeons/connections need to be evaluated
    +Scheduler | Resource aware scheduling	(Work In Progress) | <ol><li>Make the tasks of same component distributed evenly across the nodes in cluster.<li>Make number of tasks in a worker is balanced.<li>Try to assign two tasks which are transferring message directly into a same worker to reduce network cost.<li>Support user defined assignment and  to use the result of last assignment	Different solution between Storm and JStorm.</ol> Actually, JStorm has ever used cpu/mem/disk/network demond for scheduling. But we faced the problem of low utilization of cluster. For example, because the configuration of topologys and resources of hardwares in a cluster are different. After running a period, it is possible that no topology will be assigned to the node with enough other resources but just lacking one resource. | Scheduler interface is pluggable so we should be able to support boht schedulers for the time being |
    +Nimbus HA	| Support to configure more than one spare nimbus. When the master nimbus is down, a most appropriate spare nimbus(topologys in disk mostly matches the recoreds in zk) will be chosen to be promoted. | Different solution. | | 
    +Topology structure | worker/executor/task | worker/task | It might be the biggest diff between Storm and JStorm. So, not sure if there are some features related to "executor" can not be merged. (Rebalancing allows us to combine multiple tasks on a single executor)  This may go away in the future once RAS is stable |
    +Topology Master | (Heartbeat server is similar for scaling on a pull request) | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reports to nimbus. Besides task hb info, it also can be used to dispatch control messages in topology. Topology master significantly reduce the amout of read/write to zk.	Before this change, zk is the bottleneck to depoly big cluster and topology.
    +Backpressure | Store backpressure status in zk which will trigger the source spout to start flow control. When flow control, relative souce spout will stop sending tuples. | <ol><li>Implement backpressure based on topology master. TM is responsible to process the trigger message and send flow control request to relative spouts. When flow control, spout will not stop to send tuples, but just slow down the sending.<li>Enable user to update the configuration  of backpressure dynamically without restarting topology, e.g. enable/disable backpressue, high/low water mark... | These two solution is totally different. We need to evaluate which one is better. |
    +Monitor of task execute thread | N/A | Monitor the status of the execute thread of task. It is effective to find the slow bolt in a topology, and potential dead lock. | 	
    +Message processing | Configurable multi receiving threads of worker (But likely to change to have deserialization happen in Netty) | <ol><li>Add receiving and transferring queue/thread for each task to make deserialization and serialization asynchronously<li>Remove receiving and transferring thread on worker level to avoid unnecessary locks and to shorten the message processing phase | Low risk for merging to JStorm (Both implementations are similar) |
    +Batch tuples | Batching in DisruptorQueue | Do batch before sending tuple to transfer queue and support to adjust the batch size dynamically according to samples of actual batch size sent out for past intervals. | Should evaluate both implementations, and see which is better, or if we can pull in some of the dynamic batching to the disruptor level |	
    +Grouping | Load aware balancing shuffle grouping | <ol><li>Add localfrist grouping that tuples are sent to the tasks in the same worker firstly. If the load of all local tasks is high, the tuples will be sent out to remote tasks.<li>Improve localOrShuffle grouping that tuples are only sent to the tasks in the same worker or in same node.	The shuffle of JStorm checks the load of network connection of local worker. | The risk for merging of  checking remote load which was implemented in "Load aware balancing shuffle grouping" is low.
    +Rest API | YES | NA |The porting of rest API to JStorm is in progress
    +web-UI | | <ol><li>Redesign Web-UI, the new Web-UI code is clear and clean.<li>Data display is much more beautiful and better user experience, especially topology graph and 30 minutes metric trend. url: http://storm.taobao.org/ 	
    +metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, will be sampled & calculated. Some metrics, e.g. ""tuple life cycle"", are very useful for debug and finding hot spot of a topology.<li>Support full metrics data. previous metric system can only display mean value of meters/histograms, the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.<li>Use new metrics windows, the min metric window is 1min, thus we can see the metircs data every single minute.<li>Supplies a metric uploader interface, thirty-party companies can easily build their own metric systems based on all historic metric data. | JStorm metric system is completely redesigned.
    +jar command | Support progress bar when submitting topology | Does not support progress bar | Low risk for merging to JStorm
    +rebalance	command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of worker, acker, spout & bolt dynamically without stopping topology | How is grouping consistency maintined when changing the number or bolt/spout? | 
    --- End diff --
    
    For grouping consistency, the out tasks will be updated accordingly when the connection is refreshed.
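
To make the question behind this reply concrete, here is a minimal sketch of what happens to key routing when the list of out tasks changes size. It is not JStorm's implementation: `task_for_key` is a hypothetical helper, integer key hashes stand in for real tuple fields, and plain hash-modulo routing is an assumption. It only shows why downstream state needs attention when the task count itself is changed.

```python
from typing import List


def task_for_key(key_hash: int, num_tasks: int) -> int:
    """Hypothetical hash-modulo routing over the current list of out tasks."""
    return key_hash % num_tasks


def moved_keys(key_hashes: List[int], old_tasks: int, new_tasks: int) -> List[int]:
    """Return the key hashes whose target task changes after the task count is updated."""
    return [
        h for h in key_hashes
        if task_for_key(h, old_tasks) != task_for_key(h, new_tasks)
    ]


key_hashes = list(range(100))           # stand-ins for hashed grouping fields
moved = moved_keys(key_hashes, 4, 6)    # scale a bolt from 4 tasks to 6
unchanged = len(key_hashes) - len(moved)
```

Under these assumptions, a key keeps its target only when its hash gives the same remainder for both task counts, so most keys are routed to a different task after the update, and any state held for them would have to follow.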



Re: [GitHub] storm pull request: Added in feature diff

Posted by Cody Innowhere <e....@gmail.com>.
@revans2,
Thanks for the compliment.
We've done a lot of work on the new metric system of JStorm, which we think is a
very important part of JStorm/Storm. We've been helping users within our
company troubleshoot almost every day, and we know which metrics are
necessary for troubleshooting and how to monitor them.

But I think it's not the metrics themselves that matter most. What really
matters is that users learn instantly of a sudden or sharp change in a metric
value, i.e., metrics help users discover and troubleshoot problems. I look
forward to working with the community to give Storm this ability.


On Fri, Nov 13, 2015 at 10:32 PM, revans2 <gi...@git.apache.org> wrote:

> Github user revans2 commented on the pull request:
>
>     https://github.com/apache/storm/pull/877#issuecomment-156448057
>
>     @wuchong and @bastiliu thanks for the feedback and comments.  I
> updated the pull request based off of them.  Any more feedback is
> definitely welcome.  By the way great work on jstorm.  My has spent way too
> much time playing around with http://storm.taobao.org/ being jealous of
> the features there.
>
>     Very excited to see what all of us can accomplish together.

[GitHub] storm pull request: Added in feature diff

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/877#issuecomment-156448057
  
    @wuchong and @bastiliu thanks for the feedback and comments.  I updated the pull request based off of them.  Any more feedback is definitely welcome.  By the way great work on jstorm.  My has spent way too much time playing around with http://storm.taobao.org/ being jealous of the features there.
    
    Very excited to see what all of us can accomplish together.



[GitHub] storm pull request: Added in feature diff

Posted by ptgoetz <gi...@git.apache.org>.
Github user ptgoetz commented on the pull request:

    https://github.com/apache/storm/pull/877#issuecomment-156184568
  
    +1
    
    I was just going to attach the spreadsheet to JIRA, but this is a much better solution in terms of enabling collaboration.


[GitHub] storm pull request: Added in feature diff

Posted by bastiliu <gi...@git.apache.org>.
Github user bastiliu commented on a diff in the pull request:

    https://github.com/apache/storm/pull/877#discussion_r44792426
  
    --- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
    @@ -0,0 +1,30 @@
    + feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
    +----------|-----------------------|--------------|-------|----|
    + Security | Basic Authentication/Authorization | N/A | The migration is not high risk, but JStorm daemons/connections need to be evaluated
    +Scheduler | Resource aware scheduling (Work In Progress) | <ol><li>Distribute the tasks of the same component evenly across the nodes in the cluster.<li>Balance the number of tasks per worker.<li>Try to assign two tasks that transfer messages directly to each other into the same worker to reduce network cost.<li>Support user-defined assignment and reuse of the result of the last assignment.</ol> Different solution between Storm and JStorm. Actually, JStorm previously used cpu/mem/disk/network demand for scheduling, but we faced the problem of low cluster utilization, because topology configurations and hardware resources differ across a cluster. After running for a period, it is possible that no topology can be assigned to a node that has plenty of other resources but lacks just one. | Scheduler interface is pluggable so we should be able to support both schedulers for the time being |
    +Nimbus HA | Supports configuring more than one spare nimbus. When the master nimbus is down, the most appropriate spare nimbus (the one whose topologies on disk most closely match the records in zk) is chosen to be promoted. | Different solution. | | 
    +Topology structure | worker/executor/task | worker/task | It might be the biggest diff between Storm and JStorm, so we are not sure whether some features related to "executor" cannot be merged. (Rebalancing allows us to combine multiple tasks on a single executor)  This may go away in the future once RAS is stable |
    +Topology Master | (Heartbeat server is similar for scaling on a pull request) | New system bolt "topology master" was added, which is responsible for collecting the heartbeat info of all tasks and reporting it to nimbus. Besides task hb info, it can also be used to dispatch control messages in the topology. Topology master significantly reduces the amount of reads/writes to zk. Before this change, zk was the bottleneck when deploying big clusters and topologies.
    +Backpressure | Store backpressure status in zk, which triggers the source spout to start flow control. Under flow control, the relevant source spout stops sending tuples. | <ol><li>Implement backpressure based on topology master. TM is responsible for processing the trigger message and sending flow control requests to the relevant spouts. Under flow control, a spout does not stop sending tuples but just slows down.<li>Enable users to update the backpressure configuration dynamically without restarting the topology, e.g. enable/disable backpressure, high/low water mark... | These two solutions are totally different. We need to evaluate which one is better. |
    +Monitor of task execute thread | N/A | Monitor the status of the execute thread of each task. It is effective for finding slow bolts in a topology and potential deadlocks. | 
    +Message processing | Configurable multi receiving threads of worker (But likely to change to have deserialization happen in Netty) | <ol><li>Add a receiving and transferring queue/thread for each task to make deserialization and serialization asynchronous<li>Remove the receiving and transferring threads on worker level to avoid unnecessary locks and to shorten the message processing phase | Low risk for merging to JStorm (Both implementations are similar) |
    +Batch tuples | Batching in DisruptorQueue | Batch tuples before sending them to the transfer queue, and adjust the batch size dynamically according to samples of the actual batch sizes sent out over past intervals. | Should evaluate both implementations, and see which is better, or if we can pull in some of the dynamic batching to the disruptor level | 
    +Grouping | Load aware balancing shuffle grouping | <ol><li>Add localFirst grouping, so that tuples are sent to tasks in the same worker first. If the load of all local tasks is high, the tuples are sent out to remote tasks.<li>Improve localOrShuffle grouping, so that tuples are only sent to tasks in the same worker or on the same node. The shuffle of JStorm checks the load of the network connections of the local worker. | The risk of merging the remote load checking that was implemented in "Load aware balancing shuffle grouping" is low.
    +Rest API | YES | NA |The porting of rest API to JStorm is in progress
    +web-UI | | <ol><li>Redesign Web-UI, the new Web-UI code is clear and clean.<li>Data display is much more beautiful and gives a better user experience, especially the topology graph and the 30 minutes metric trend. url: http://storm.taobao.org/ 
    +metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, will be sampled & calculated. Some metrics, e.g. "tuple life cycle", are very useful for debugging and finding hot spots of a topology.<li>Support full metrics data. The previous metric system can only display the mean value of meters/histograms; the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.<li>Use new metrics windows. The min metric window is 1min, thus we can see the metrics data every single minute.<li>Supply a metric uploader interface, so that third-party companies can easily build their own metric systems based on all historic metric data. | JStorm metric system is completely redesigned.
    +jar command | Support progress bar when submitting topology | Does not support progress bar | Low risk for merging to JStorm
    +rebalance command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of workers, ackers, spouts & bolts dynamically without stopping the topology | How is grouping consistency maintained when changing the number of bolts/spouts? | 
    --- End diff --
    
    Thanks for the explanation. We will try to prepare the documentation, such as a high-level design.
    BTW, does the community plan to redesign the exactly-once processing of Trident, or plan to provide a solution for both normal topologies and Trident?
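
The Backpressure row in the table above contrasts two reactions to flow control: the source spout stops sending tuples (Storm column), or the spout keeps sending but slows down (JStorm column), with high/low water marks as tunable settings. A minimal sketch of that difference follows. Everything in it is an assumption made for illustration: the class `WatermarkBackpressure`, its methods, the use of a queue fill ratio as the trigger signal, and the halving factor for the slow-down case are hypothetical and are not taken from Storm or JStorm.

```java
/** Hypothetical sketch; these names do not exist in Storm or JStorm. */
final class WatermarkBackpressure {
    /** STOP models "spout stops sending"; SLOW_DOWN models "spout just slows down". */
    enum Policy { STOP, SLOW_DOWN }

    private final double lowWaterMark;   // fill ratio at or below which throttling is released
    private final double highWaterMark;  // fill ratio at or above which throttling is triggered
    private final Policy policy;
    private boolean throttled = false;

    WatermarkBackpressure(double lowWaterMark, double highWaterMark, Policy policy) {
        if (!(0.0 <= lowWaterMark && lowWaterMark < highWaterMark && highWaterMark <= 1.0)) {
            throw new IllegalArgumentException("need 0 <= low < high <= 1");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
        this.policy = policy;
    }

    /** Updates the throttle state from the current queue depth and returns it. */
    boolean update(int queued, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        double fill = (double) queued / capacity;
        if (fill >= highWaterMark) {
            throttled = true;            // queue is nearly full: start flow control
        } else if (fill <= lowWaterMark) {
            throttled = false;           // queue has drained: release flow control
        }
        // Between the two marks the previous state is kept (hysteresis).
        return throttled;
    }

    /** Number of tuples the spout may emit this tick, given its unthrottled rate. */
    int allowedEmits(int normalRate) {
        if (!throttled) {
            return normalRate;
        }
        // Assumption: "slow down" is modelled as halving the rate.
        return policy == Policy.STOP ? 0 : normalRate / 2;
    }

    public static void main(String[] args) {
        WatermarkBackpressure bp = new WatermarkBackpressure(0.2, 0.8, Policy.SLOW_DOWN);
        int[] queueDepths = {50, 90, 50, 10};
        for (int depth : queueDepths) {
            boolean on = bp.update(depth, 100);
            System.out.println("queued=" + depth + " throttled=" + on
                    + " allowed=" + bp.allowedEmits(100));
        }
    }
}
```

Keeping the previous state between the two marks avoids flapping when the queue depth hovers near a single threshold, which is one plausible reason to expose both a high and a low water mark as configuration.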

