Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2018/01/27 00:32:00 UTC

[jira] [Created] (STORM-2913) STORM-2844 made autocommit and at-most-once topologies log warnings on every emit

Stig Rohde Døssing created STORM-2913:
-----------------------------------------

Summary: STORM-2844 made autocommit and at-most-once topologies log warnings on every emit
Key: STORM-2913
URL: https://issues.apache.org/jira/browse/STORM-2913
Project: Apache Storm
Issue Type: Bug
Components: storm-kafka-client
Affects Versions: 2.0.0, 1.2.0
Reporter: Stig Rohde Døssing

The mechanism added in https://issues.apache.org/jira/browse/STORM-2844 to allow us to check whether a committed offset was committed by the currently running topology requires that we commit some metadata along with the offset.

We use this metadata for two things: applying the FirstPollOffsetStrategy only when the topology is deployed, rather than on every worker restart, and a runtime check (IMO fairly unimportant) that the spout's offset tracking is not in a bad state.

Autocommit spouts don't include this metadata, and we also don't include it when committing offsets in at-most-once mode. As a result, both kinds of topology now log a warning on every emit. We can fix at-most-once by committing a custom OffsetAndMetadata instead of using the no-arg commitSync variant.
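To make the at-most-once idea concrete, here is a minimal sketch with no Kafka dependency. The Committed class is a hypothetical stand-in for Kafka's OffsetAndMetadata, the String partition keys stand in for TopicPartition, and the metadata is assumed to be just the topology id (the real format may differ). The resulting map is roughly what would be handed to the commitSync variant that takes a Map of offsets, rather than calling the no-arg one.

```java
import java.util.HashMap;
import java.util.Map;

final class CommitSketch {
    /** Hypothetical stand-in for Kafka's OffsetAndMetadata. */
    static final class Committed {
        final long offset;
        final String metadata;

        Committed(long offset, String metadata) {
            this.offset = offset;
            this.metadata = metadata;
        }
    }

    /**
     * Builds an explicit per-partition commit that carries the topology id
     * as metadata (assumption: the metadata is the bare topology id).
     */
    static Map<String, Committed> buildCommit(Map<String, Long> positions, String topologyId) {
        Map<String, Committed> commit = new HashMap<>();
        for (Map.Entry<String, Long> e : positions.entrySet()) {
            commit.put(e.getKey(), new Committed(e.getValue(), topologyId));
        }
        return commit;
    }

    /**
     * True only if the commit carries this topology's id. A missing commit,
     * or one with null or empty metadata (assumed here to be what an
     * autocommit leaves behind), is treated as "not ours".
     */
    static boolean committedByThisTopology(Committed committed, String topologyId) {
        return committed != null
                && committed.metadata != null
                && !committed.metadata.isEmpty()
                && committed.metadata.equals(topologyId);
    }
}
```

With this shape, an at-most-once commit passes the check, while a commit without metadata never does, which is why the autocommit case needs separate handling.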

I'm not sure what we should do to fix the autocommit case. There doesn't seem to be a way to include metadata in autocommits, so I don't think we can support this mechanism for autocommits.

If we can't fix the autocommit case, I see two options for fixing this:
* Make doSeek have the old behavior for autocommits only (i.e. apply the FirstPollOffsetStrategy on every worker restart), and keep the new behavior for at-least-once/at-most-once. I think this behavior could be a little confusing.
* Revert doSeek to the old behavior in all cases, and throw out the runtime check that uses the metadata. This also isn't a great option, because the new seek behavior is more useful than re-applying the FirstPollOffsetStrategy on every worker restart.
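
As a rough illustration of the first option, the seek decision could look something like the sketch below. This is not the actual doSeek code: the Mode and SeekAction enums and the decide method are hypothetical names, and the metadata is again assumed to be the bare topology id.

```java
final class SeekSketch {
    /** Hypothetical commit modes; not the real storm-kafka-client enum. */
    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, AUTOCOMMIT }

    /** Hypothetical outcome of the seek decision. */
    enum SeekAction { APPLY_FIRST_POLL_STRATEGY, RESUME_FROM_COMMITTED }

    /**
     * Option 1: autocommit keeps the old behavior (apply the strategy on
     * every worker restart); the other modes consult the commit metadata.
     *
     * @param committedMetadata metadata of the committed offset, or null if
     *                          nothing has been committed for the partition
     */
    static SeekAction decide(Mode mode, String committedMetadata, String topologyId) {
        if (mode == Mode.AUTOCOMMIT) {
            // Autocommits cannot carry metadata, so skip the check entirely.
            return SeekAction.APPLY_FIRST_POLL_STRATEGY;
        }
        if (committedMetadata == null) {
            // No commit yet, so this is the first poll for the partition.
            return SeekAction.APPLY_FIRST_POLL_STRATEGY;
        }
        // Committed by this topology: a worker restarted, so resume.
        // Committed by another topology: fresh deployment, apply the strategy.
        return topologyId.equals(committedMetadata)
                ? SeekAction.RESUME_FROM_COMMITTED
                : SeekAction.APPLY_FIRST_POLL_STRATEGY;
    }
}
```

The confusing part mentioned above shows up in the first branch: an autocommit spout re-applies the strategy after a worker restart, while the other two modes resume from the committed offset.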

What do you think [~hmclouro]? I'm leaning toward the first option.
