You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Richard Gilmore (Jira)" <ji...@apache.org> on 2020/03/04 15:48:00 UTC

[jira] [Created] (SPARK-31040) Offsets are only logged for partitions which had data this causes next batch to read the partitions that were not included from the beginning when using kafka

Richard Gilmore created SPARK-31040:
---------------------------------------

             Summary: Offsets are only logged for partitions which had data this causes next batch to read the partitions that were not included from the beginning when using kafka
                 Key: SPARK-31040
                 URL: https://issues.apache.org/jira/browse/SPARK-31040
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.4.5, 2.4.4, 2.4.0
            Reporter: Richard Gilmore


Each batch should either log all offsets for each partition or should scan back across commit logs.

[https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala]

offset log 23615

 
{code:java}
{"myTopic.myTopic.orders":{"2":27531503,"5":27562423,"4":27528794,"1":27514991,"3":27528899,"0":27504949}}%
{code}
 

 

offset log 23616

 
{code:java}
{"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}%Topic
{code}
 

 
{code:java}
/0/03/04 13:49:05 INFO MicroBatchExecution: Resuming at batch 26317 with committed offsets {KafkaV2[Subscribe[myTopic.myTopic.orders]]: {"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}} and available offsets {KafkaV2[Subscribe[myTopic.myTopic.orders]]: {"myTopic.myTopic.orders":{"2":27531625,"5":27562568,"4":27528990,"1":27515131,"3":27529075,"0":27505141}}}commit log: {"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}%0/03/04 13:50:24 INFO KafkaMicroBatchReader: Partitions added: Map(myTopic.myTopic.orders-3 -> 26533520, myTopic.myTopic.orders-2 -> 26533730, myTopic.myTopic.orders-4 -> 26533608, myTopic.myTopic.orders-5 -> 26533486)
20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-3 starts from 26533520 instead of 0. Some data may have been missed.
20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-2 starts from 26533730 instead of 0. Some data may have been missed.
20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-4 starts from 26533608 instead of 0. Some data may have been missed.
20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-5 starts from 26533486 instead of 0. Some data may have been missed.

{code}
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org