You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@storm.apache.org by "Janith Kaiprath Valiyalappil (JIRA)" <ji...@apache.org> on 2018/11/04 21:03:00 UTC

[jira] [Created] (STORM-3279) Kafka trident spout could loose its position with EARLIEST or LATEST FirstPollOffsetStrategy

Janith Kaiprath Valiyalappil created STORM-3279:
---------------------------------------------------

             Summary: Kafka trident spout could loose its position with EARLIEST or LATEST FirstPollOffsetStrategy
                 Key: STORM-3279
                 URL: https://issues.apache.org/jira/browse/STORM-3279
             Project: Apache Storm
          Issue Type: Bug
          Components: trident
    Affects Versions: 2.0.1
            Reporter: Janith Kaiprath Valiyalappil


In KafkaTridentSpoutEmitter emitPartitionBatch() function, when kafkaConsumer.poll(pollTimeoutMs) returns 0 records for the very first transaction where FirstPollOffsetStrategy is set to EARLIEST or LATEST, the spout fails to move to EARLIEST or LATEST, and continues from the last metadata position.

 

The flow of events which would cause this bug :

 

1. FirstPollOffsetStrategy set to EARLIEST or LATEST

2. For first transaction after restart txid1 Based on [link L164|https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/trident/KafkaTridentSpoutEmitter.java#L164] ,

The currentBatch is initialized to lastBatchMeta (which need not be null);

3. Later in L171, the consumer seeks to "start" OR "end"

4. Then consumer.poll(pollTimeoutMs) is called.

5. If poll returns non 0 records , currentBatch is set to a new metadata . *If poll returns 0 records,*

*currentBatch is not reset ie, currentBatch is still lastBatchMeta (which need not be null)*

 

So now in transaction txid2 after txid1, isFirstPoll() returns false, and the spout continues from lastBatchMeta.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)