You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bhaskar E (JIRA)" <ji...@apache.org> on 2018/01/09 23:48:00 UTC

[jira] [Created] (SPARK-23017) Why would spark-kafka stream fail stating `Got wrong record for even after seeking to offset #` when using kafka API to commit offset

Bhaskar E created SPARK-23017:
---------------------------------

             Summary: Why would spark-kafka stream fail stating `Got wrong record for <groupid> <topic> <partition> even after seeking to offset #` when using kafka API to commit offset
                 Key: SPARK-23017
                 URL: https://issues.apache.org/jira/browse/SPARK-23017
             Project: Spark
          Issue Type: Question
          Components: Structured Streaming
    Affects Versions: 2.2.1
            Reporter: Bhaskar E
            Priority: Minor


My spark-kafka streaming job started failing after multiple messages stating - `Got wrong record for <groupid> <topic> <partition> even after seeking to offset # `.

I disabled `enable.auto.commit` and saving the commits (to kafka itself) manually using kafka API
{code}((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges.get());{code}

When I'm manually commit offsets to kafka and my job resumes requesting (kafka) (say after 1 hr recovering from some failure) for data then kafka should send the next available offsets (from last committed offset). 
So, when I'm using kafka itself to store my committed offsets then my spark job clearly doesn't know what's the next offset to request. But, here in the error message it states that it `Got wrong record ....  even after seeking to a particular offset #`. **So, how is this possible?**

If I assume that the spark-driver gets some offsets ahead from kafka (before initially reading the actual records) and then start requesting for the offsets even then it's confusing how could spark receive wrong offset when it is requesting for the offsets which it got from kafka itself in the first place?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org