Posted to issues@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2018/04/02 17:52:00 UTC

[jira] [Commented] (STORM-2983) Some topologies not working properly

    [ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422861#comment-16422861 ] 

Robert Joseph Evans commented on STORM-2983:
--------------------------------------------

Happy to add my 2 cents on this.

 

There are multiple reasons for RAS. Yes, one of them is to allow the cluster to be better utilized through finer-grained resource scheduling.

 

When we added RAS we did stop honoring {{topology.workers}}.  There are only a handful of places where it is used outside of the scheduler, so it didn't turn out to be that big of a deal.  The biggest issue we ran into was around the default number of ackers.  In really old versions of Storm it was 1, but it was later updated to be the number of workers.  We worked around this by having RAS "guess" an approximate number of workers as an alternative default.

[https://github.com/apache/storm/blob/402a371ccdb39ccd7146fe9743e91ca36fee6d15/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2911-L2924]
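
As a rough illustration (this is a minimal sketch, not the code at the link above; the class and method names are made up, and it assumes the estimate is driven by the topology's requested on-heap memory versus {{topology.worker.max.heap.size.mb}}):

{code:java}
// Minimal sketch of the "guess": when topology.acker.executors is unset,
// approximate the worker count from requested on-heap memory and use that
// as the acker default, mirroring the old "one acker per worker" rule.
import java.util.Map;

public class AckerDefaultSketch {
    // hypothetical helper; the real estimate lives in Nimbus at the link above
    static int estimateWorkers(double totalRequestedOnHeapMb, double maxHeapPerWorkerMb) {
        return Math.max(1, (int) Math.ceil(totalRequestedOnHeapMb / maxHeapPerWorkerMb));
    }

    static int defaultAckerExecutors(Map<String, Object> topoConf, double totalRequestedOnHeapMb) {
        Number configured = (Number) topoConf.get("topology.acker.executors");
        if (configured != null) {
            return configured.intValue(); // an explicit setting always wins
        }
        double maxHeapMb = ((Number) topoConf
                .getOrDefault("topology.worker.max.heap.size.mb", 768.0)).doubleValue();
        return estimateWorkers(totalRequestedOnHeapMb, maxHeapMb);
    }
}
{code}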

 

For me there are two different things that need to be addressed.

 

1) We need to update the documentation for {{topology.workers}} to better reflect how it is actually used.

https://github.com/apache/storm/blob/402a371ccdb39ccd7146fe9743e91ca36fee6d15/storm-client/src/jvm/org/apache/storm/Config.java#L194-L199
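
Concretely, the doc comment on {{TOPOLOGY_WORKERS}} could gain a note along these lines (the first paragraph is roughly the existing description; the NOTE is only a suggested wording, and the field's validation annotations are elided here):

{code:java}
/**
 * How many processes should be spawned around the cluster to execute this
 * topology. Each process will execute some number of tasks as threads within
 * them. This parameter should be used in conjunction with the parallelism hints
 * on each component in the topology to tune the performance of a topology.
 *
 * NOTE (suggested addition): when the ResourceAwareScheduler is in use, the
 * number of workers is derived from the resources the topology requests, and
 * this setting is largely ignored by the scheduler. It may still influence a
 * few defaults, such as the default number of acker executors.
 */
public static final String TOPOLOGY_WORKERS = "topology.workers";
{code}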

 

2) We need to fix the current issue. 

We can do that in one of three ways: stop relying on the config and instead check whether all of the executors for the topology are in the current worker; turn it off like the current patch does; or have RAS make a similar "guess" about the actual number of workers, making sure that we always set it to at least 2, because we don't know how many there really will be. A rough sketch of the first and third options follows.
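
To make the first and third options concrete (names here are illustrative, not actual Storm APIs or the current patch):

{code:java}
// Rough sketch of two of the options above.
import java.util.List;
import java.util.Map;

public class WorkerCountOptionsSketch {

    // Option 1: instead of trusting topology.workers, check whether every executor
    // of this topology is assigned to the slot the current worker occupies.
    // (executorToSlot is a hypothetical executor -> "host:port" assignment map.)
    static boolean allExecutorsInThisWorker(Map<List<Long>, String> executorToSlot,
                                            String thisWorkerSlot) {
        return executorToSlot.values().stream().allMatch(thisWorkerSlot::equals);
    }

    // Option 3: keep the RAS-style "guess", but never report fewer than 2 workers,
    // since we don't know up front how many there will really be.
    static int guessedWorkerCount(int estimatedWorkers) {
        return Math.max(2, estimatedWorkers);
    }
}
{code}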

> Some topologies not working properly 
> -------------------------------------
>
>                 Key: STORM-2983
>                 URL: https://issues.apache.org/jira/browse/STORM-2983
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 --counters 1 -c topology.debug=true
> {code}
> does not work properly with the ResourceAwareScheduler.
> With the default cluster settings, there will be only one __acker-executor and it will be placed on a separate worker. It looks like the __acker-executor was not able to receive messages from the spouts and bolts, and the spouts and bolts kept retrying sending messages to the acker. That then led to another problem:
> STORM-2970
> I tried running on Storm right before [https://github.com/apache/storm/pull/2502] and right after, and confirmed that this bug appears to be related to it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)