Posted to dev@storm.apache.org by "Rick Kellogg (JIRA)" <ji...@apache.org> on 2015/09/26 04:41:04 UTC

[jira] [Updated] (STORM-399) Kafka Spout defaulting to latest offset when current offset is older than 100k

     [ https://issues.apache.org/jira/browse/STORM-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Kellogg updated STORM-399:
-------------------------------
    Component/s: storm-kafka

> Kafka Spout defaulting to latest offset when current offset is older than 100k
> ------------------------------------------------------------------------------
>
>                 Key: STORM-399
>                 URL: https://issues.apache.org/jira/browse/STORM-399
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka
>    Affects Versions: 0.9.2-incubating
>            Reporter: Curtis Allen
>            Assignee: Curtis Allen
>            Priority: Minor
>             Fix For: 0.9.3
>
>
> Using storm and storm-kafka 0.9.2-incubating
> In the storm-kafka spout, the default for maxOffsetBehind is 100000.
> See https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java#L38
> This default is too low: it causes the Kafka spout to start from the latest offset instead of the last committed offset, without warning.
> See https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java#L95
> This produces the following log output from the Storm worker processes:
> {code}
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id: 5747dba6-c947-4c4f-af4a-4f50a84817bf
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000 behind, resetting to startOffsetTime=-2
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4 from offset 22092614
> {code}
> To fix this problem, I ended up setting the spout config in my topology like so:
> {code}
> spoutConf.maxOffsetBehind = Long.MAX_VALUE;
> {code}
> Why does the Kafka spout, by default, skip to the latest offset when the current offset
> is more than 100000 behind?
> This seems like a bad default value: the spout literally skipped over
> months of data without any warning.
> Are the core contributors open to accepting a pull request that would set
> the default to Long.MAX_VALUE?
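
The reset described in the issue can be modeled with a minimal Java sketch. The class `OffsetResetCheck` and the method `shouldReset` are hypothetical names, and the comparison is an assumed simplification of the check in the linked `PartitionManager.java` (lag between the latest offset and the committed offset compared against `maxOffsetBehind`); it is not the storm-kafka source. Treating 22092614, the offset the spout restarted from, as the latest offset, the log implies a lag of 22092614 - 15266940 = 6825674, far above the 100000 default.

```java
// Hypothetical, simplified model of the maxOffsetBehind check; not the actual storm-kafka code.
final class OffsetResetCheck {

    // Default of maxOffsetBehind in storm-kafka 0.9.2-incubating, as stated in the issue.
    static final long DEFAULT_MAX_OFFSET_BEHIND = 100000L;

    // Returns true when the committed offset lags the latest offset by more than
    // maxOffsetBehind, i.e. when the spout would discard the committed offset.
    static boolean shouldReset(long latestOffset, long committedOffset, long maxOffsetBehind) {
        return latestOffset - committedOffset > maxOffsetBehind;
    }

    public static void main(String[] args) {
        // Offsets taken from the worker log in the issue.
        long committed = 15266940L;
        long latest = 22092614L;

        System.out.println("lag = " + (latest - committed));  // prints "lag = 6825674"
        System.out.println("reset with default: "
                + shouldReset(latest, committed, DEFAULT_MAX_OFFSET_BEHIND));  // prints "... true"
        System.out.println("reset with Long.MAX_VALUE: "
                + shouldReset(latest, committed, Long.MAX_VALUE));  // prints "... false"
    }
}
```

Under this model, setting `maxOffsetBehind` to `Long.MAX_VALUE`, as in the reporter's workaround, makes the comparison false for any pair of non-negative offsets, so the spout would resume from the committed offset.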



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)