Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2019/01/30 13:20:00 UTC

[jira] [Comment Edited] (SPARK-23685) Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

    [ https://issues.apache.org/jira/browse/SPARK-23685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756077#comment-16756077 ] 

Gabor Somogyi edited comment on SPARK-23685 at 1/30/19 1:19 PM:
----------------------------------------------------------------

Comment from [~sindiri] on the PR:
{quote}Originally this PR was created because "failOnDataLoss" doesn't have any impact when set in Structured Streaming. But it turned out that the variable that needs to be used is "failondataloss" (all lower case).
This is not properly documented in the Spark documentation. Hence, closing the PR. Thanks{quote}
Will file a PR to fix the upper/lowercase inconsistency.
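For reference, a minimal sketch of setting the option in lower case as described above (the broker address "localhost:9092" and topic "some-topic" are placeholders, not from this issue):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("failondataloss-example").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "some-topic")                    // placeholder topic
  .option("failondataloss", "false")                    // all lower case, per the comment above
  .load()
{code}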


was (Author: gsomogyi):
Comment from [~sindiri] on the PR:
{quote}Originally this PR was created because "failOnDataLoss" doesn't have any impact when set in Structured Streaming. But it turned out that the variable that needs to be used is "failondataloss" (all lower case).
This is not properly documented in the Spark documentation. Hence, closing the PR. Thanks{quote}
Closing the JIRA.

> Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23685
>                 URL: https://issues.apache.org/jira/browse/SPARK-23685
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: sirisha
>            Priority: Major
>
> When Kafka does log compaction, offsets often end up with gaps, meaning the next requested offset will frequently not be offset+1. The logic in KafkaSourceRDD & CachedKafkaConsumer assumes that the next offset will always be an increment of 1. If not, it throws the exception below:
>  
> "Cannot fetch records in [5589, 5693) (GroupId: XXX, TopicPartition:XXXX). Some data may have been lost because they are not available in Kafka any more; either the data was aged out by Kafka or the topic may have been deleted before all the data in the topic was processed. If you don't want your streaming query to fail on such cases, set the source option "failOnDataLoss" to "false". "
>  
> FYI: This bug is related to https://issues.apache.org/jira/browse/SPARK-17147
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
