Posted to issues@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/09/14 15:34:00 UTC
[jira] [Updated] (STORM-2733) Make Load Aware Shuffle much better at really bad situations
[ https://issues.apache.org/jira/browse/STORM-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated STORM-2733:
----------------------------------
Labels: pull-request-available (was: )
> Make Load Aware Shuffle much better at really bad situations
> ------------------------------------------------------------
>
> Key: STORM-2733
> URL: https://issues.apache.org/jira/browse/STORM-2733
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-client
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We recently had an issue where some bolts got badly backed up and started to die from OOMs. The cause turned out to be twofold.
> First, GC slowed the worker down so much that it could not keep up even with the < 1% of traffic that was still being sent to it, which made recovery almost impossible.
> Second, serializing the tuples took a lot longer than processing them, so the send queue filled up much more quickly than the receive queue.
> I plan to address this in two ways. First, we need a better algorithm that can actually shut off the flow entirely to a very slow bolt. Second, we need to take the send queue into account when shuffling.
> This is not the full set of changes needed by STORM-2686, but it is a step in that direction. I am going to try to set it up so that the two algorithms work nicely together.
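
The two ideas in the description (cutting off a very slow target entirely, and counting the send queue as part of a target's load) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request and not Storm's actual grouping implementation: the class name, the method names, the 0-to-1 load scale, and the cutoff parameter are all hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch only; this is NOT Storm's load-aware shuffle grouping. */
final class LoadAwareShuffleSketch {
    private LoadAwareShuffleSketch() {}

    /**
     * Effective load of one target task: the worse of its receive-queue load
     * and the load of the send queue feeding it. Both are assumed to be in [0, 1].
     */
    static double effectiveLoad(double receiveLoad, double sendLoad) {
        return Math.max(receiveLoad, sendLoad);
    }

    /**
     * Selection weight per target. A target whose effective load is at or above
     * the (assumed) cutoff gets weight 0, which shuts off its flow entirely;
     * any other target is weighted by its remaining headroom.
     */
    static double[] weights(double[] receiveLoad, double[] sendLoad, double cutoff) {
        double[] w = new double[receiveLoad.length];
        for (int i = 0; i < w.length; i++) {
            double load = effectiveLoad(receiveLoad[i], sendLoad[i]);
            w[i] = load >= cutoff ? 0.0 : 1.0 - load;
        }
        return w;
    }

    /**
     * Picks an index with probability proportional to its weight.
     * If every weight is zero, falls back to a uniform pick so tuples still flow.
     */
    static int choose(double[] w, Random rnd) {
        double total = 0.0;
        for (double x : w) {
            total += x;
        }
        if (total <= 0.0) {
            return rnd.nextInt(w.length);
        }
        double r = rnd.nextDouble() * total;
        int last = -1;
        for (int i = 0; i < w.length; i++) {
            if (w[i] <= 0.0) {
                continue; // never select a target that has been shut off
            }
            last = i;
            r -= w[i];
            if (r < 0.0) {
                return i;
            }
        }
        return last; // guard against floating-point rounding
    }
}
```

With a cutoff of 0.9, a target whose receive queue is at 0.95 receives no tuples at all, and so does a target whose receive queue is nearly empty but whose send queue is at 0.95. The uniform fallback when every target is over the cutoff is a design choice of this sketch, made so that the stream does not stall completely.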
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)