Posted to issues@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2017/09/14 15:33:00 UTC

[jira] [Resolved] (STORM-2733) Make Load Aware Shuffle much better at really bad situations

     [ https://issues.apache.org/jira/browse/STORM-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved STORM-2733.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

I pulled this into 2.x

> Make Load Aware Shuffle much better at really bad situations
> ------------------------------------------------------------
>
>                 Key: STORM-2733
>                 URL: https://issues.apache.org/jira/browse/STORM-2733
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-client
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We recently had an issue where some bolts got really backed up and started to die from OOMs.  The issue ended up being twofold.
> First, GC slowed the worker down so much that it could not keep up even with < 1% of the traffic that was still being sent to it, which made it almost impossible to recover.
> Second, serializing the tuples took a lot longer than processing them, so the send queue filled up much more quickly than the receive queue.
> To help fix this I plan to make two changes.  First, we need a better algorithm that can actually shut off the flow entirely to a very slow bolt.  Second, we need to take the send queue into account when shuffling.
> This is not the full set of changes needed by STORM-2686, but it is a step in that direction.  I am going to try to set it up so that the two algorithms work nicely together.
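
The two changes described above can be illustrated with a rough sketch. This is not the code that went into Storm; it is a minimal Python illustration, assuming each target task reports a send-queue load and a receive-queue load in the range 0 to 1. The names shuffle_weights and choose_task, and the cutoff value of 0.9, are hypothetical and chosen only for this example.

```python
import random


def shuffle_weights(loads, cutoff=0.9):
    """Map each task to a routing weight.

    loads: dict of task_id -> (send_queue_load, receive_queue_load),
    each load assumed to be in [0, 1]. The cutoff is an assumed value.
    """
    weights = {}
    for task, (send_load, recv_load) in loads.items():
        # Take the send queue into account: use the worse of the two queues.
        load = max(send_load, recv_load)
        # At or above the cutoff the weight is zero, so the flow to a
        # very slow task is shut off entirely rather than merely reduced.
        weights[task] = 0.0 if load >= cutoff else 1.0 - load
    if not any(weights.values()):
        # Every target is saturated: fall back to an even split so that
        # tuples still have somewhere to go.
        weights = {task: 1.0 for task in loads}
    return weights


def choose_task(loads, rng=random):
    """Pick one task at random, in proportion to its weight.

    Assumes loads is non-empty.
    """
    weights = shuffle_weights(loads)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]
```

In this sketch, a task whose send queue is nearly full receives no new tuples even if its receive queue looks empty, which is the situation the description says the earlier behaviour did not handle.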



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)