Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2017/11/19 08:13:00 UTC

[jira] [Commented] (STORM-2824) Ability to configure topologies for exactly once processing

    [ https://issues.apache.org/jira/browse/STORM-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258382#comment-16258382 ] 

Stig Rohde Døssing commented on STORM-2824:
-------------------------------------------

There is already an option to ack tuples immediately before emitting them from the spout, which gives an at-most-once processing guarantee. That sounds like what you're describing. You can get this effect by setting topology.acker.executors to 0 in your topology config.
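
As a minimal sketch of that setting, the snippet below builds a topology config as a plain map with the key set to 0. The class name AtMostOnceConfig and its build() method are hypothetical and only for illustration. In a real topology you would normally use org.apache.storm.Config instead, which to my knowledge is a HashMap subclass and offers setNumAckers(0) for the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Storm.
final class AtMostOnceConfig {
    // The config key named in the comment above.
    static final String ACKER_EXECUTORS = "topology.acker.executors";

    // Builds a topology config map with acker executors disabled.
    static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        // 0 acker executors: tuples are acked right away, so a failure
        // downstream never triggers a replay (at-most-once).
        conf.put(ACKER_EXECUTORS, 0);
        return conf;
    }
}
```

The resulting map would be passed wherever the topology config is supplied at submit time. Note that this setting applies to the whole topology, not to individual bolts.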

Exactly-once processing is not supported by base Storm out of the box, but you can use Trident (https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-state.html and maybe https://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html) to get idempotent writes to your data stores. A tuple may still be processed more than once, but the data in your data store ends up being written only once regardless of how many times it was processed. As far as I know, no one has figured out how to guarantee exactly-once processing for streaming systems yet.
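
To illustrate the idempotent-write idea, here is a minimal sketch that stores the id of the last applied batch next to each value and skips a batch it has already seen. This is not the Trident API: TxCountStore, apply and get are hypothetical names, the store is an in-memory map, and the sketch assumes that a replayed batch carries the same transaction id and the same contents as the original, and that batches are applied in order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a data store; not a Trident class.
final class TxCountStore {
    // A count stored together with the id of the last batch that updated it.
    private static final class Entry {
        long count;
        long lastTxId;

        Entry(long count, long lastTxId) {
            this.count = count;
            this.lastTxId = lastTxId;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Adds delta to the count for key unless this batch (txId) was the last
    // one applied to that key. Returns true if the store changed.
    boolean apply(String key, long txId, long delta) {
        Entry e = entries.get(key);
        if (e == null) {
            entries.put(key, new Entry(delta, txId));
            return true;
        }
        if (e.lastTxId == txId) {
            return false; // replay of a batch that was already written
        }
        e.count += delta;
        e.lastTxId = txId;
        return true;
    }

    // Returns the current count for key, or 0 if the key is unknown.
    long get(String key) {
        Entry e = entries.get(key);
        return e == null ? 0L : e.count;
    }
}
```

With this scheme, applying batch 1 with a delta of 5 twice leaves the count at 5, and a later batch 2 with a delta of 3 brings it to 8. The processing happened more than once, but the write took effect only once.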

> Ability to configure topologies for exactly once processing
> -----------------------------------------------------------
>
>                 Key: STORM-2824
>                 URL: https://issues.apache.org/jira/browse/STORM-2824
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>    Affects Versions: 1.0.1
>         Environment: CentOS 7, Docker
>            Reporter: Anton Alfred
>            Priority: Minor
>
> The default implementation of a spout (Kafka) is to wait for acknowledgement; if an acknowledgement is not received, the tuple is replayed, leading to an at-least-once processing model.
> Can an option be provided to always acknowledge, even in the event of an error in any spout or bolt, so that the user can decide which mode the topology should be configured for?
> There are cases where multiple bolts (B) insert into persistent stores (PS), like B1-PS1, B2-PS2, B3-PS3. The fact that the B2-PS2 bolt fails doesn't mean that the tuple needs to be replayed, and replaying adds complexity to the logic of the bolts. It would be easier if this were configurable and the user of the topology decided which style to choose.


