You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by Ethanlm <gi...@git.apache.org> on 2017/10/10 13:58:31 UTC

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

GitHub user Ethanlm opened a pull request:

    https://github.com/apache/storm/pull/2366

    [STORM-2686] Add locality awareness to LoadAwareShuffleGrouping

    A redesign of https://github.com/apache/storm/pull/2270 . This adds the locality awareness to LoadAwareShuffleGrouping. It applies the concept of Bang–bang control (credit to @revans2 ).
    
    I didn't change the critical function like chooseTasks(). So the performance should be good. 
    
    I am doing some performance testing. Hope to get some feedback first. Thanks.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ethanlm/storm STORM-2686-redesign

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2366.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2366
    
----
commit 379ea4cf872160ad29744c5729476e860d40fc14
Author: Ethan Li <et...@gmail.com>
Date:   2017-10-09T23:41:39Z

    [STORM-2686] Add locality awareness to LoadAwareShuffleGrouping

----


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    I ran some other performance tests this weekend and here are some results supporting this patch. Below are average results among all the 27 tests ( 2/4/8 workers * 5000/10000/20000 rates * 3 repetition) for each experiment (ORIG/SHUFFLE/STORM-2686-H80L20/STORM-2686-H90L40/STORM-2686-H70L30).   Thanks @revans2  . I am going to fix the acker bug now.
    
    
    
    |Experiment  | rate(tuple/s) | mean(ms) | 99%ile(ms) | 99.9%ile(ms) | cores | mem(MB)
    -- | -- | -- | -- | -- | -- | --
    ORIG | 11668.1 | 11.0787 | 18.7359 | 31.2698 | 2.61209 | 351.185
    SHUFFLE | 10952.2 | 476.725 | 744.26 | 779.364 | 2.75163 | 511.239
    STORM-2686-H80L20 | 11660.9 | 10.3228 | 17.6535 | 32.0146 | 2.62504 | 374.756
    STORM-2686-H90L40 | 11659.6 | 10.6373 | 18.5732 | 32.7845 | 2.6264 | 416.69
    STORM-2686-H70L30 | 11734.4 | 45.3912 | 504.058 | 571.151 | 2.6467 | 450.725
    
    
    Note:
    SHUFFLE and STORM-2686-H70L30 has extreme high overall average latency because they have very high average latency in some single experiment (2vm8spout8splitter8counter20000r.20171014140308 and 2vm8spout8splitter8counter20000r.20171016023641 respectively).
    
    Results are uploaded at https://github.com/Ethanlm/STORM-2686-perf-study/commit/a790e780e8d3db5f0b68c87320af1aad5a03e0ae
    
    
    



---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @Ethanlm Everything looks good I am +1 on this change.  I would love to see more numbers to back this up, but as of right now it looks really good.


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/2366


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143837500
  
    --- Diff: storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
    @@ -50,10 +61,28 @@
         volatile int[] choices;
         private volatile int[] prepareChoices;
         private AtomicInteger current;
    +    private AtomicReference<Scope> currentScopeRef;
    --- End diff --
    
    I think you are right


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143828158
  
    --- Diff: storm-client/src/jvm/org/apache/storm/task/WorkerTopologyContext.java ---
    @@ -34,6 +36,8 @@
         private String _pidDir;
         Map<String, Object> _userResources;
         Map<String, Object> _defaultResources;
    +    private AtomicReference<Map<Integer, NodeInfo>> _taskToNodePort;
    +    private String _assignmentId;
    --- End diff --
    
    Could we avoid adding in new member variables with _ as the first character.  This is against the new guidelines.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    Just ran a very small group of tests:   https://github.com/Ethanlm/STORM-2686-perf-study/commit/50693b938de2321f53655e4039ac8811f963424a
    It seems to lead to similar conclusion for these specific tests.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @HeartSaVioR  It should be default to worker count. But it's not. There must be something wrong with the recent code change: https://github.com/apache/storm/pull/2325 .  
    I ran topologies with `-c topology.workers=N` but the acker is always 1


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    I ran some small performance tests and things are looking really good.  Have you run any tests?  How did you come up with the 80% config for growing in size, and 20% for shrinking?
    
    I would really be interested in seeing how well word count scales by adding in new machines under this patch vs what was there before. 


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @Ethanlm The numbers look really good for the normal use case, and when I ran with a slow splitter I saw acceptable performance too.  I am +1 on merging this in, but we need to be sure the default acker gets fixed.  Have you filed a bug for it yet?


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143865842
  
    --- Diff: storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
    @@ -50,10 +69,28 @@
         volatile int[] choices;
         private volatile int[] prepareChoices;
         private AtomicInteger current;
    +    private Scope currentScope;
    +    private NodeInfo sourceNodeInfo;
    +    private List<Integer> targetTasks;
    +    private AtomicReference<Map<Integer, NodeInfo>> taskToNodePort;
    +    private Map<String, Object> conf;
    +    private DNSToSwitchMapping dnsToSwitchMapping;
    +    private Map<Scope, List<Integer>> localityGroup;
    +    private double HIGHER_BOUND;
    --- End diff --
    
    Thanks. Fixed it


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    I ran some performance tests using ThroughputVsLatency and the raw results are available at https://github.com/Ethanlm/STORM-2686-perf-study
    
    The experiments include 
    
    | Experiment        | Description
    | ------------- |:-------------:| 
    | ORIGINAL     | the current storm LoadAwareShuffleGrouping| 
    | SHUFFLE     | the current storm with loadaware disabled |
    | STORM-2686    | this patch with higher bound 0.8 and lower bound 0.2
    | STORM-2686-H70L30| this patch with higher bound 0.7 and lower bound 0.3 |
    
    Every experiment runs on 2 VMs with 2/4/8 workers and 5000/10000/20000 rate separately. The number of spouts, splitters and counters are all equal to the number of workers for simplicity.  Repeat three times.
    The table below shows the average of all the experiments (except for the experiments with 8workers and 20000rate because they all failed)
    
    | Experiment | rate(tuple/s) | mean(ms) | 99%ile(ms) | 99.9%ile(ms) | cores | mem(MB) |
    | ------------- |:-------------:| -----:|------------- |:-------------:| -----:|-----:|
    | ORIGINAL | 10862.78696 | 11.12642261 | 16.76319565 | 25.67112174 | 2.1671 | 416.8698696 |
    |SHUFFLE | 10625.18 | 11.15462583 | 16.94839583 | 25.39703333 | 2.174539167 | 369.2407083 |
    |STORM-2686  | 10626.52542 | 10.40140958 | 15.68085 | 29.86079583 | 2.15822 | 378.7717542|
    |STORM-2686-H70L30| 10624.41 | 10.51196542 | 15.7350125 | 26.97664583 | 2.136774583 | 411.7942958|
    
    It seems that from the result this patch did bring some benefits especially about overall 6.6% less on mean latency.  If you look at the single experiment results, this patch seems to do a better job than the original storm. 
    
    Since I didn't change the critical functions, the performance of `choose a Task` should remain the same.
    



---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143862027
  
    --- Diff: storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
    @@ -50,10 +69,28 @@
         volatile int[] choices;
         private volatile int[] prepareChoices;
         private AtomicInteger current;
    +    private Scope currentScope;
    +    private NodeInfo sourceNodeInfo;
    +    private List<Integer> targetTasks;
    +    private AtomicReference<Map<Integer, NodeInfo>> taskToNodePort;
    +    private Map<String, Object> conf;
    +    private DNSToSwitchMapping dnsToSwitchMapping;
    +    private Map<Scope, List<Integer>> localityGroup;
    +    private double HIGHER_BOUND;
    --- End diff --
    
    This and below line are against style guide and check style will catch it. `higherBound` and `lowerBound`.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    Thanks! Will do


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143829089
  
    --- Diff: storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
    @@ -50,10 +61,28 @@
         volatile int[] choices;
         private volatile int[] prepareChoices;
         private AtomicInteger current;
    +    private AtomicReference<Scope> currentScopeRef;
    --- End diff --
    
    I don't think this needs to be atomic.  Only `chooseTasks` needs to be thread safe.  All the others are called from a single thread, and we can put a lock around if we really want to.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @Ethanlm 
    When you don't set topology.acker.executors anywhere it will take default value which is same as worker count. So the change shouldn't affect the performance, and if it influences the performance, that behavior itself should be investigated.


---

[GitHub] storm pull request #2366: [STORM-2686] Add locality awareness to LoadAwareSh...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2366#discussion_r143826667
  
    --- Diff: storm-client/src/jvm/org/apache/storm/grouping/LoadAwareShuffleGrouping.java ---
    @@ -20,14 +20,20 @@
     
     import com.google.common.annotations.VisibleForTesting;
     import java.io.Serializable;
    -import java.util.Arrays;
    -import java.util.HashMap;
    -import java.util.List;
    -import java.util.Map;
    -import java.util.Random;
    +import java.util.*;
    --- End diff --
    
    We avoid .* imports, it is against the checkstyle guidelines we have.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @revans2  I ran some tests (e.g. various number of workers on 1~2 VMs) to make sure it's working properly. But I haven't done any performance testing yet. I plan to compare the performances between this patch and the original one. 
    
    0.8 and 0.2 are random numbers (just because there are 80/20 rules everywhere in the world). We will need to do some testing on different configs.


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by revans2 <gi...@git.apache.org>.
Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @Ethanlm You have some NPEs in the LoadAwareShuffleGroupingTest


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @Ethanlm Ah OK. You talked about the bug you found. OK. Did you also see the bug without RAS enabled?


---

[GitHub] storm issue #2366: [STORM-2686] Add locality awareness to LoadAwareShuffleGr...

Posted by Ethanlm <gi...@git.apache.org>.
Github user Ethanlm commented on the issue:

    https://github.com/apache/storm/pull/2366
  
    @HeartSaVioR  Yea....
    
    ![image](https://user-images.githubusercontent.com/14900612/31571912-0b879fd2-b061-11e7-9fcb-f84a11eec039.png)
    
    Looking into it


---