Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2018/07/11 19:46:00 UTC

[jira] [Resolved] (SPARK-24697) Fix the reported start offsets in streaming query progress

     [ https://issues.apache.org/jira/browse/SPARK-24697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-24697.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 21744
[https://github.com/apache/spark/pull/21744]

> Fix the reported start offsets in streaming query progress
> ----------------------------------------------------------
>
>                 Key: SPARK-24697
>                 URL: https://issues.apache.org/jira/browse/SPARK-24697
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Arun Mahadevan
>            Priority: Major
>             Fix For: 3.0.0
>
>
> A streaming query reports progress during each trigger (e.g. after runBatch in MicroBatchExecution). However, the reported progress has the wrong offsets, because the offsets are committed and committedOffsets is updated to availableOffsets before the progress is reported.
> This leads to misleading progress reports in which startOffset and endOffset are always the same.
> Sample output for a Kafka source is below. Here 11 rows are processed in the micro-batch, yet the start and end offsets are the same.
>  
> {code:java}
> {
>   "id" : "76bf5515-55be-46af-bc79-9fc92cc6d856",
>   "runId" : "b526f0f4-24bf-4ddc-b6e8-7b0cc83bdbe8",
>   ...
>   "sources" : [ {
>     "description" : "KafkaV2[Subscribe[topic2]]",
>     "startOffset" : {
>       "topic2" : {
>         "0" : 44
>       }
>     },
>     "endOffset" : {
>       "topic2" : {
>         "0" : 44
>       }
>     },
>     "numInputRows" : 11,
>     "inputRowsPerSecond" : 1.099670098970309,
>     "processedRowsPerSecond" : 1.8829168093118795
>   } ],
>   ...
> }
> {code}
>  
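The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Spark's code: the function run_trigger, the state dictionary, and the capture_start_before_commit flag are hypothetical names made up for illustration, and the starting offset of 33 is assumed so that the numbers line up with the 11 rows and offset 44 in the sample output. Capturing the start offset before the commit is one possible remedy and is not necessarily what pull request 21744 does.

```python
def run_trigger(state, new_available, capture_start_before_commit):
    """Toy model of one micro-batch trigger (hypothetical, not Spark's API).

    state["committed"] plays the role of committedOffsets and
    new_available plays the role of availableOffsets.
    """
    num_rows = new_available - state["committed"]

    # Possible remedy: remember where the batch started before committing.
    start_snapshot = state["committed"]

    # The batch is processed, then the offsets are committed.
    state["committed"] = new_available

    if capture_start_before_commit:
        start = start_snapshot
    else:
        # Problematic ordering: the start offset is read after the commit,
        # so it already equals the end offset.
        start = state["committed"]

    return {
        "startOffset": start,
        "endOffset": new_available,
        "numInputRows": num_rows,
    }


buggy = run_trigger({"committed": 33}, 44, capture_start_before_commit=False)
fixed = run_trigger({"committed": 33}, 44, capture_start_before_commit=True)
print(buggy)
print(fixed)
```

With the problematic ordering the report shows identical start and end offsets even though rows were processed; when the start offset is captured before the commit, the reported range covers the rows that were actually read.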



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
