Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/06/29 08:02:55 UTC

[jira] [Commented] (STORM-1934) Race condition between sync-supervisor and sync-processes raises several strange issues

    [ https://issues.apache.org/jira/browse/STORM-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354750#comment-15354750 ] 

ASF GitHub Bot commented on STORM-1934:
---------------------------------------

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1528

    STORM-1934 Fix race condition between sync-supervisor and sync-processes

    * sync-supervisor only downloads new topology code and writes the new local assignment
      * shutting down workers and removing topology code are moved to sync-processes
    * sync-processes performs all of the remaining work based on the local assignment and the allocated workers
    * remove unused / unneeded code
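    The split above can be illustrated with a minimal Java sketch (Java 16+ for records). It is a simplified, hypothetical model, not Storm's actual supervisor code: `SupervisorSplitSketch`, `LocalAssignment`, `syncSupervisor`, and `syncProcesses` are illustrative names. sync-supervisor only publishes the new local assignment. sync-processes is the single place that compares that assignment with the running workers and decides what to kill and launch, so the two loops no longer act on local state independently.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    // Hypothetical, simplified model of the sync-supervisor / sync-processes split; not Storm's code.
    public class SupervisorSplitSketch {

        // What a worker slot (port) should run: topology id plus its executors.
        record LocalAssignment(String stormId, Set<Integer> executors) {}

        // Stand-in for the supervisor's local assignment state; only syncSupervisor writes it.
        private volatile Map<Integer, LocalAssignment> localAssignments = Map.of();

        // sync-supervisor: after downloading topology code (omitted), publish the new
        // local assignment. It never kills workers or deletes topology code itself.
        void syncSupervisor(Map<Integer, LocalAssignment> assignmentFromZk) {
            localAssignments = Map.copyOf(assignmentFromZk);
        }

        // sync-processes: the only place that decides which workers to kill and which
        // ports to (re)launch. runningWorkers maps port -> what the running worker reports.
        List<String> syncProcesses(Map<Integer, LocalAssignment> runningWorkers) {
            // Read the shared state once so this pass works on one consistent snapshot.
            Map<Integer, LocalAssignment> snapshot = localAssignments;
            List<String> actions = new ArrayList<>();
            // Kill any worker whose port is unassigned or now assigned to something else.
            for (Map.Entry<Integer, LocalAssignment> w : new TreeMap<>(runningWorkers).entrySet()) {
                if (!w.getValue().equals(snapshot.get(w.getKey()))) {
                    actions.add("kill " + w.getKey());
                }
            }
            // Launch a worker for every assigned port that has no matching running worker.
            for (Map.Entry<Integer, LocalAssignment> a : new TreeMap<>(snapshot).entrySet()) {
                if (!a.getValue().equals(runningWorkers.get(a.getKey()))) {
                    actions.add("launch " + a.getKey());
                }
            }
            return actions;
        }
    }
    ```

    In this model, a port that is still assigned but runs the wrong executors produces both a kill and a launch for that port. That mirrors the kill-and-relaunch behavior described in the rebalance test.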
    
    Here are my test results for this patch:
    
    * ran `mvn clean install` 5 times: did not hit the intermittent supervisor failure (STORM-1933)
      * will run it more times
    * killed a worker via `kill`, `kill -9`, and `restart worker` from the UI: the worker restarted without issues
    * rebalanced the topology to change the number of workers (2 -> 3), to test a new assignment that keeps the same worker port but assigns different executors than the currently assigned workers
      * the worker is recognized as :disallowed, then killed and relaunched
    
    Rebalance test in detail:
    
    - Writing new assignment:
    ```
    6701 {:storm-id "test-topology2-4-1467185073", :executors ([7 7] [5 5] [3 3] [1 1]), :resources [0.0 0.0 0.0]}, 
    6702 {:storm-id "test-topology2-4-1467185073", :executors ([6 6] [4 4] [2 2]), :resources [0.0 0.0 0.0]}
    ```
    
    - Assigned executors:
    ```
    6701 {:storm-id "test-topology2-4-1467185073", :executors [[7 7] [5 5] [3 3] [1 1]], :resources #object[org.apache.storm.generated.WorkerResources 0x40c4d31c "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]}, 
    6702 {:storm-id "test-topology2-4-1467185073", :executors [[6 6] [4 4] [2 2]], :resources #object[org.apache.storm.generated.WorkerResources 0x4ba861f4 "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]}}
    ```
    
    - Allocated:
    ```
    "2e9bea10-02b7-4e55-88e7-b194b9917a63" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[3 3] [6 6] [-1 -1]], :port 6703}], 
    "4630c4bf-9786-47ff-9f3b-6b42d9781b9d" [:disallowed {:time-secs 1467185407, :storm-id "test-topology2-4-1467185073", :executors [[7 7] [1 1] [-1 -1] [4 4]], :port 6701}], 
    "b9a622d2-5e5b-4311-999c-8c8dd92da6b6" [:disallowed {:time-secs 1467185406, :storm-id "test-topology2-4-1467185073", :executors [[2 2] [-1 -1] [5 5]], :port 6702}]}
    ```
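    The :disallowed classification can be sketched in Java as follows. This is a hypothetical model with illustrative names, and it rests on an assumed matching rule: a running worker is allowed only if the assignment for its port has the same storm-id and the same executor set. The system executor [-1 -1], which appears in heartbeats but never in assignments, is dropped before comparing. Timeout and not-started states are left out. Applied to the new assignment and allocated workers above, every worker comes out disallowed. Port 6703 has no assignment, and the executor sets on 6701 and 6702 no longer match.

    ```java
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of classifying a running worker as valid or :disallowed.
    public class DisallowedCheckSketch {

        // An executor range such as [7 7]; the system executor is [-1 -1].
        record Executor(int start, int end) {}

        // Assumption: the system executor appears in worker heartbeats but never in assignments.
        static final Executor SYSTEM_EXECUTOR = new Executor(-1, -1);

        record Assignment(String stormId, Set<Executor> executors) {}

        record Heartbeat(String stormId, int port, Set<Executor> executors) {}

        // A worker matches if its port is assigned to the same topology with the same executors.
        static boolean matchesAssignment(Heartbeat hb, Map<Integer, Assignment> assigned) {
            Assignment a = assigned.get(hb.port());
            if (a == null) {
                return false; // the port is no longer assigned at all
            }
            Set<Executor> running = new HashSet<>(hb.executors());
            running.remove(SYSTEM_EXECUTOR);
            return a.stormId().equals(hb.stormId()) && running.equals(a.executors());
        }

        // Timeout and not-started states are left out of this sketch.
        static String state(Heartbeat hb, Map<Integer, Assignment> assigned) {
            return matchesAssignment(hb, assigned) ? "valid" : "disallowed";
        }

        // Helper: builds single-task executors, e.g. execs(7, 1) -> {[7 7], [1 1]}.
        static Set<Executor> execs(int... ids) {
            Set<Executor> result = new HashSet<>();
            for (int id : ids) {
                result.add(new Executor(id, id));
            }
            return result;
        }
    }
    ```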
    
    NOTE: Because of a forward reference, I had to move `sync-processes` to just before `mk-synchronize-supervisor`. Since most of the changes are in sync-processes, reviewers will need to compare the before and after versions manually. Sorry about that.
    
    Since supervisor.clj has already been ported to Java in the master branch, I will need some time to read the ported code and modify it to match.
    
    Please review and comment while I work on the master branch version. Thanks!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-1934-1.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1528.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1528
    
----
commit e5857e07838af888988691af39efbe415b9a2345
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2016-06-29T07:06:20Z

    STORM-1934 Fix race condition between sync-supervisor and sync-processes
    
    * sync-supervisor only downloads new topology code and writes the new local assignment
      * shutting down workers and removing topology code are moved to sync-processes
    * sync-processes performs all of the remaining work based on the local assignment and the allocated workers
    * remove unused / unneeded code

----


> Race condition between sync-supervisor and sync-processes raises several strange issues
> ---------------------------------------------------------------------------------------
>
>                 Key: STORM-1934
>                 URL: https://issues.apache.org/jira/browse/STORM-1934
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 2.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>
> There are some strange issues, including STORM-1933 and others (which I will file soon), that are related to a race condition in the supervisor.
> As I mentioned in STORM-1933, sync-supervisor basically relies on the ZooKeeper assignment, and sync-processes relies on the local assignment and the local workers directory. In fact, however, sync-supervisor also accesses local state and takes some actions that affect sync-processes. Satish also left a comment on STORM-1933 describing another issue related to the race condition, along with an idea to fix it, and we are on the same page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)