You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Vasu Dasari (Jira)" <ji...@apache.org> on 2020/03/30 10:53:00 UTC

[jira] [Commented] (SPARK-29709) structured streaming The offset in the checkpoint is suddenly reset to the earliest

    [ https://issues.apache.org/jira/browse/SPARK-29709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070877#comment-17070877 ] 

Vasu Dasari commented on SPARK-29709:
-------------------------------------

More details about the issue:

Spark structured Streaming -Kafka, ListOffsetRequest to earliest(-2) for existing partition:

Issue identified as spark has been sending ListOffsetRequest for existing partition as "earliest(-2)" instead of latest(-1). This has been occuring all of sudden after couple of hours application got started. 
As per the documentaion, spark should request as earliest for only new partitions but in this request with earliest is being send to existing particular partition. Due to this, offsets are being set to LSO(Stable offset) and duplicate data is being consumed.

Below are the logs at consumer in DEBUG mode. Duplicates have flown from partition-20 and at this point, below 2 lines are bit different from normal logs. Also no error message in logs.This behaviour looks peculiar .

29/03/20 18:27:53 DEBUG internals.Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-bca38026-e65e-4e24-8d86-aaa-driver-0]
 Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps=\{Hid1328.gnpp-raw-20=-2}, isolationLevel=READ_UNCOMMITTED) to broker famescpolyfi5.teliacompany.net:9095 (id: 5 rack: null)

29/03/20 18:27:53 DEBUG internals.Fetcher: [Consumer clientId=consumer-1, groupId=spark-kafka-source-bca38026-e65e-4e24-8d86-9daf0d34b406--1757490130-driver-0] Sending ListOffsetRequest (type=ListOffsetRequest, replicaId=-1, partitionTimestamps=\{Hid1328.gnpp-raw-5=-1, Hid1328.gnpp-raw-20=-1, Hid1328.gnpp-raw-25=-1, Hid1328.gnpp-raw-10=-1, Hid1328.gnpp-raw-15=-1, Hid1328.gnpp-raw-0=-1}, isolationLevel=READ_UNCOMMITTED) to broker famescpolyfi5.teliacompany.net:9095 (id: 5 rack: null).

 

Could update further about the issue in clone ticket: @spark-31303

> structured streaming The offset in the checkpoint is suddenly reset to the earliest
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-29709
>                 URL: https://issues.apache.org/jira/browse/SPARK-29709
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: robert
>            Priority: Major
>
> structured streaming The offset in the checkpoint is suddenly reset to the earliest,
> One of the partitions in the kafka topic
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org