Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2018/02/05 03:04:00 UTC

[jira] [Commented] (STORM-2913) STORM-2844 made autocommit and at-most-once storm-kafka-client spouts log warnings on every emit, because those modes don't commit the right metadata to Kafka

    [ https://issues.apache.org/jira/browse/STORM-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352005#comment-16352005 ] 

Jungtaek Lim commented on STORM-2913:
-------------------------------------

Thanks [~Srdo], merged into master.

Since master and the 1.x-branch have diverged somewhat for storm-kafka-client, I'd like to wait for [~Srdo] to submit a pull request against the 1.x-branch. We can merge it afterwards.

> STORM-2844 made autocommit and at-most-once storm-kafka-client spouts log warnings on every emit, because those modes don't commit the right metadata to Kafka
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: STORM-2913
>                 URL: https://issues.apache.org/jira/browse/STORM-2913
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka-client
>    Affects Versions: 2.0.0, 1.2.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The mechanism added in https://issues.apache.org/jira/browse/STORM-2844 to allow us to check whether a committed offset was committed by the currently running topology requires that we commit some metadata along with the offset.
> We are using this metadata for two things: Only applying the FirstPollOffsetStrategy when the topology is deployed, rather than when the worker is restarted, and an (IMO fairly unimportant) runtime check that the spout offset tracking is not in a bad state.
> Autocommit spouts don't include this metadata, and we also don't include it when committing offsets in at-most-once mode. We can fix at-most-once by switching to committing a custom OffsetAndMetadata, rather than using the no-arg commitSync variant. 
> I'm not sure what we should do to fix the autocommit case. There doesn't seem to be a way to include metadata in autocommits, so I don't think we can support this mechanism for autocommits. 
> If we can't fix the autocommit case, I see two options for fixing this:
> * Make doSeek have the old behavior for autocommits only (i.e. apply the FirstPollOffsetStrategy on every worker restart), and keep the new behavior for at-least-once/at-most-once. I think this behavior could be a little confusing.
> * Revert doSeek to the old behavior in all cases, and throw out the runtime check that uses the metadata. This also isn't a great option, because the new seek behavior is more useful than reapplying the FirstPollOffsetStrategy on every worker restart.
> What do you think [~hmclouro]? I'm leaning toward the first option.
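
For readers following along, here is a rough sketch of the first option from the description above, not the actual storm-kafka-client code. All names in it (SeekDecisionSketch, CommitMode, SeekAction, Committed, commitWithMetadata, decide) are hypothetical stand-ins, and the committed metadata is simplified to just a topology id string, which is an assumption. It only models the decision: commits made with metadata let the spout recognize its own offsets after a worker restart, while autocommit falls back to applying the FirstPollOffsetStrategy every time.

```java
import java.util.Objects;

// Sketch only: every type here is a hypothetical stand-in, not a Kafka or Storm class.
class SeekDecisionSketch {

    // How the spout commits offsets (hypothetical enum for this sketch).
    enum CommitMode { AT_LEAST_ONCE, AT_MOST_ONCE, AUTOCOMMIT }

    // What the spout does with a partition when it is assigned.
    enum SeekAction { SEEK_TO_COMMITTED, APPLY_FIRST_POLL_OFFSET_STRATEGY }

    // Stand-in for a committed offset plus the metadata string stored with it.
    static final class Committed {
        final long offset;
        final String metadata; // empty when the commit carried no metadata

        Committed(long offset, String metadata) {
            this.offset = offset;
            this.metadata = Objects.requireNonNull(metadata);
        }
    }

    // Models the at-most-once fix: commit the offset together with metadata
    // (assumed here to be only the topology id) instead of a bare commit.
    static Committed commitWithMetadata(long offset, String topologyId) {
        return new Committed(offset, topologyId);
    }

    // Option 1: autocommit keeps the old behavior, the other modes check the metadata.
    static SeekAction decide(CommitMode mode, Committed committed, String currentTopologyId) {
        if (committed == null) {
            // Nothing has been committed yet, so only the strategy can pick a position.
            return SeekAction.APPLY_FIRST_POLL_OFFSET_STRATEGY;
        }
        if (mode == CommitMode.AUTOCOMMIT) {
            // Autocommits cannot carry metadata, so the origin of the offset is unknown.
            return SeekAction.APPLY_FIRST_POLL_OFFSET_STRATEGY;
        }
        // The offset was committed by this topology only if the metadata matches.
        return currentTopologyId.equals(committed.metadata)
                ? SeekAction.SEEK_TO_COMMITTED
                : SeekAction.APPLY_FIRST_POLL_OFFSET_STRATEGY;
    }
}
```

In this sketch, an at-most-once spout that commits through commitWithMetadata resumes from its committed offset after a worker restart, whereas an autocommit spout with the same committed offset applies the strategy again. That asymmetry is the potentially confusing part mentioned in the description.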



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)