Posted to issues@storm.apache.org by "Ethan Li (Jira)" <ji...@apache.org> on 2020/03/19 14:00:12 UTC

[jira] [Resolved] (STORM-3602) loadaware shuffle can overload local worker

     [ https://issues.apache.org/jira/browse/STORM-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3602.
-----------------------------
    Fix Version/s: 2.1.1
                   2.2.0
       Resolution: Fixed

Thanks [~agresch]. I merged this to master and the 2.1.x-branch.

> loadaware shuffle can overload local worker
> -------------------------------------------
>
>                 Key: STORM-3602
>                 URL: https://issues.apache.org/jira/browse/STORM-3602
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Aaron Gresch
>            Assignee: Aaron Gresch
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.2.0, 2.1.1
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We were seeing a worker overloaded and tuples timing out with loadaware shuffle enabled. On investigating, we found that the code allows switching from host-local to worker-local if the load average across the host-local tasks is lower than the low water mark. It really should be checking the load on the worker instead.
>  
> What's happening is that the worker is overloaded while there are tons of idle host-local tasks, so it switches to HOST_LOCAL. The load average calculated across all the host tasks is then below the low water mark, so it immediately switches back to the overloaded worker-local task.
>  
>  
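The flapping described in the report can be illustrated with a small sketch. This is not Storm's actual code: the class name, method names, threshold values, and load numbers below are all assumptions made up for illustration. It models one overloaded worker-local task next to nine idle tasks on the same host, and compares a scope decision based on the host-wide average with one that checks the worker's own load before switching back.

```java
import java.util.List;

// Hypothetical sketch of the scope-switching behavior in the report.
// None of these names or thresholds are taken from the Storm code base.
class ScopeFlapSketch {
    enum Scope { WORKER_LOCAL, HOST_LOCAL }

    // Assumed water marks, chosen only for this illustration.
    static final double LOW_WATER_MARK = 0.2;
    static final double HIGH_WATER_MARK = 0.8;

    static double average(List<Double> loads) {
        return loads.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Behavior as reported: the switch back to WORKER_LOCAL is decided
    // from the average load over all host-local tasks.
    static Scope nextScopeAsReported(Scope current, List<Double> workerLoads, List<Double> hostLoads) {
        if (current == Scope.WORKER_LOCAL) {
            return average(workerLoads) > HIGH_WATER_MARK ? Scope.HOST_LOCAL : Scope.WORKER_LOCAL;
        }
        return average(hostLoads) < LOW_WATER_MARK ? Scope.WORKER_LOCAL : Scope.HOST_LOCAL;
    }

    // Behavior the report asks for: only switch back to WORKER_LOCAL
    // when the worker-local tasks themselves are below the low water mark.
    static Scope nextScopeCheckingWorker(Scope current, List<Double> workerLoads, List<Double> hostLoads) {
        if (current == Scope.WORKER_LOCAL) {
            return average(workerLoads) > HIGH_WATER_MARK ? Scope.HOST_LOCAL : Scope.WORKER_LOCAL;
        }
        return average(workerLoads) < LOW_WATER_MARK ? Scope.WORKER_LOCAL : Scope.HOST_LOCAL;
    }

    public static void main(String[] args) {
        // One overloaded worker-local task; the host list includes it plus nine idle tasks.
        List<Double> workerLoads = List.of(1.0);
        List<Double> hostLoads = List.of(1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0);

        Scope reported = Scope.WORKER_LOCAL;
        Scope checked = Scope.WORKER_LOCAL;
        for (int step = 1; step <= 4; step++) {
            reported = nextScopeAsReported(reported, workerLoads, hostLoads);
            checked = nextScopeCheckingWorker(checked, workerLoads, hostLoads);
            System.out.println("step " + step + ": reported=" + reported + " checked=" + checked);
        }
    }
}
```

With these made-up loads, the host-wide average is 0.1, so the first variant alternates between HOST_LOCAL and WORKER_LOCAL on every step, while the second stays at HOST_LOCAL as long as the worker-local task remains overloaded. The loads are held constant here for simplicity; in a real topology they would change as tuples are rerouted.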



--
This message was sent by Atlassian Jira
(v8.3.4#803005)