Posted to dev@storm.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/01/09 02:10:34 UTC

[jira] [Commented] (STORM-618) Add spoutconfig option to make kafka spout process messages at most once.

    [ https://issues.apache.org/jira/browse/STORM-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270331#comment-14270331 ] 

ASF GitHub Bot commented on STORM-618:
--------------------------------------

GitHub user sweetest opened a pull request:

    https://github.com/apache/storm/pull/376

    STORM-618 :  Add spoutconfig option to make kafka spout process messages at most once.

    Closes [STORM-618](https://issues.apache.org/jira/browse/STORM-618)
    
    While it is useful for the Kafka spout to push a failed tuple back into a sorted set and try to process it again, this form of guaranteed message processing can make things much worse when a tuple fails repeatedly in downstream bolts, because the PartitionManager#fill method then keeps fetching from that same offset.
    
    The relevant code snippet is:
    
        private void fill() {
    ...
            if (had_failed) {
                offset = failed.first();
            } else {
                offset = _emittedToOffset;
            }
    ...
                msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
    ...
    
    So there should be an option that lets a developer decide whether to process a failed tuple again or simply skip it. One of the best things about Storm is that a spout, together with Trident, can be implemented to guarantee at-least-once, exactly-once, or at-most-once message processing.
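
As an illustration of the choice described above, the following is a minimal, self-contained sketch of how the fetch offset could be selected with and without replay of failed tuples. It is a simplified model and not the code from this pull request: the class name `OffsetSelectionSketch`, the `atMostOnce` flag, and the method names are hypothetical, and no Storm or Kafka classes are used.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified model of the offset choice made in PartitionManager#fill.
// All names here are hypothetical and exist only for illustration.
public class OffsetSelectionSketch {
    // Hypothetical stand-in for the proposed spout config option.
    private final boolean atMostOnce;
    // Offsets of failed tuples, kept sorted so the earliest one is retried first.
    private final SortedSet<Long> failed = new TreeSet<>();
    // Next offset that has not been emitted yet.
    private final long emittedToOffset;

    public OffsetSelectionSketch(boolean atMostOnce, long emittedToOffset) {
        this.atMostOnce = atMostOnce;
        this.emittedToOffset = emittedToOffset;
    }

    // Called when a downstream bolt fails the tuple at the given offset.
    public void fail(long offset) {
        if (!atMostOnce) {
            // Replay mode: remember the offset so it is fetched again.
            failed.add(offset);
        }
        // At-most-once mode: drop the failure, so the tuple is never replayed.
    }

    // Mirrors the if/else in the snippet: retry the earliest failed offset
    // if there is one, otherwise continue from the last emitted position.
    public long nextFetchOffset() {
        return failed.isEmpty() ? emittedToOffset : failed.first();
    }

    public static void main(String[] args) {
        OffsetSelectionSketch replay = new OffsetSelectionSketch(false, 10L);
        replay.fail(3L);
        System.out.println("replay mode fetches from " + replay.nextFetchOffset());

        OffsetSelectionSketch skip = new OffsetSelectionSketch(true, 10L);
        skip.fail(3L);
        System.out.println("at-most-once mode fetches from " + skip.nextFetchOffset());
    }
}
```

In this model, a tuple that keeps failing at offset 3 pins the replay-mode spout to offset 3 on every fetch, while the at-most-once variant ignores the failure and continues from offset 10.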

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sweetest/storm master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #376
    
----
commit 07a0106878896266c9f97f3622bcf708425ee15a
Author: SEUNGJIN LEE <sw...@navercorp.com>
Date:   2015-01-09T00:58:47Z

    STORM-618 :  Add spoutconfig option to make kafka spout process messages at most once.

----


>  Add spoutconfig option to make kafka spout process messages at most once.
> --------------------------------------------------------------------------
>
>                 Key: STORM-618
>                 URL: https://issues.apache.org/jira/browse/STORM-618
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka
>    Affects Versions: 0.9.3
>            Reporter: Adrian Seungjin Lee
>
> While it is useful for the Kafka spout to push a failed tuple back into a sorted set and try to process it again, this form of guaranteed message processing can make things much worse when a tuple fails repeatedly in downstream bolts, because the PartitionManager#fill method then keeps fetching from that same offset.
> The relevant code snippet is:
>     private void fill() {
> ...
>         if (had_failed) {
>             offset = failed.first();
>         } else {
>             offset = _emittedToOffset;
>         }
> ...
>             msgs = KafkaUtils.fetchMessages(_spoutConfig, _consumer, _partition, offset);
> ...
> So there should be an option that lets a developer decide whether to process a failed tuple again or simply skip it. One of the best things about Storm is that a spout, together with Trident, can be implemented to guarantee at-least-once, exactly-once, or at-most-once message processing.


