Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2019/01/10 17:48:00 UTC

[jira] [Commented] (STORM-2359) Revising Message Timeouts

    [ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739639#comment-16739639 ] 

Stig Rohde Døssing commented on STORM-2359:
-------------------------------------------

I've been taking a look at the feasibility of automatically resetting timeouts for tuples that are still being processed, and I think we can do it without much overhead.

The idea is to track the anchor ids of each non-system message that enters an executor's inbound or outbound queue. For the inbound queue, an anchor is no longer in progress once the associated tuple is acked or failed. For the outbound queue (pendingEmits in the Executor), an anchor is no longer in progress once the associated tuple is flushed from pendingEmits.

Periodically, a thread checks the set of in-progress anchors for the worker and sends reset messages for them to the relevant ackers. To avoid sending too many messages, this thread snapshots the anchor set when it runs, and only sends reset messages for anchors that have been in progress in that worker for longer than a grace period.
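
To make the snapshot-and-filter step concrete, here is a minimal sketch in Java. It is not taken from the linked branch: the class ResetCandidates, its method names, and the millisecond timestamps are all hypothetical, and for brevity it keeps a single entry per anchor id. It only shows how a resetter thread could copy the in-progress set and pick the anchors that have been in progress for at least the grace period.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not part of Storm.
final class ResetCandidates {
    // anchor id -> time (ms) at which the anchor was first seen in this worker
    private final ConcurrentHashMap<Long, Long> firstSeenMs = new ConcurrentHashMap<>();

    // Called by executor threads when a tuple with this anchor enters a queue.
    void started(long anchorId, long nowMs) {
        firstSeenMs.putIfAbsent(anchorId, nowMs);
    }

    // Called when the anchor is no longer in progress in this worker.
    void finished(long anchorId) {
        firstSeenMs.remove(anchorId);
    }

    // Called by the resetter thread: copy the map, then keep only the anchors
    // that have been in progress for at least graceMs.
    List<Long> dueForReset(long nowMs, long graceMs) {
        Map<Long, Long> snapshot = new HashMap<>(firstSeenMs);
        List<Long> due = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : snapshot.entrySet()) {
            if (nowMs - entry.getValue() >= graceMs) {
                due.add(entry.getKey());
            }
        }
        return due;
    }
}
```

With graceMs set to 0, every in-progress anchor is returned on every run, which corresponds to the worst case where the grace period is disabled.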

Since there may be more than one tuple per anchor, anchors are tracked as a count in a multiset, rather than just presence in a set.
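
The counting can be sketched roughly as follows. This is an illustrative Java example, not the code from the branch: AnchorMultiset and its methods are hypothetical names, and a ConcurrentHashMap of counts stands in for whatever multiset structure the real change uses. An anchor stays in progress until every tuple that carries it has been removed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration only; not part of Storm.
final class AnchorMultiset {
    // anchor id -> number of tuples with that anchor currently in progress
    private final ConcurrentHashMap<Long, Integer> counts = new ConcurrentHashMap<>();

    // Called when a tuple anchored to anchorId enters a queue.
    void add(long anchorId) {
        counts.merge(anchorId, 1, Integer::sum);
    }

    // Called when such a tuple is acked, failed, or flushed from pendingEmits.
    // Returning null from the remapping function removes the entry entirely.
    void remove(long anchorId) {
        counts.computeIfPresent(anchorId,
            (id, n) -> n > 1 ? Integer.valueOf(n - 1) : null);
    }

    // Number of in-progress tuples for this anchor (0 if none).
    int count(long anchorId) {
        return counts.getOrDefault(anchorId, 0);
    }

    // Copy of the anchors that still have at least one tuple in progress.
    Set<Long> inProgress() {
        return new HashSet<>(counts.keySet());
    }
}
```

With a plain set, the first ack for an anchor would drop it even if other tuples with the same anchor were still queued; the count avoids that.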

I've updated the spreadsheet with benchmark numbers for TVL with this functionality enabled. For the 90k example I also did a run with the grace period disabled, to show the penalty for sending resets in the worst case, i.e. all in-progress tuples have their timeouts reset every time the resetter thread runs.

The code is available at https://github.com/srdo/storm/tree/auto-reset-timeout. Only the latest commit is new.

> Revising Message Timeouts
> -------------------------
>
>                 Key: STORM-2359
>                 URL: https://issues.apache.org/jira/browse/STORM-2359
>             Project: Apache Storm
>          Issue Type: Sub-task
>          Components: storm-core
>    Affects Versions: 2.0.0
>            Reporter: Roshan Naik
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>         Attachments: STORM-2359.ods, STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing


