Posted to issues@spark.apache.org by "Rado Buransky (JIRA)" <ji...@apache.org> on 2016/01/08 05:51:39 UTC

[jira] [Comment Edited] (SPARK-12693) OffsetOutOfRangeException caused by retention

    [ https://issues.apache.org/jira/browse/SPARK-12693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088704#comment-15088704 ] 

Rado Buransky edited comment on SPARK-12693 at 1/8/16 4:50 AM:
---------------------------------------------------------------

I wasn't actually right. This issue is not related to short retention as such; it depends on how often Kafka runs its log cleanup. So if you have a retention of 7 days, a segment roll interval of 100 milliseconds, and the retention check also runs every 100 milliseconds, then there is a very high chance this issue will occur. In fact, it is difficult to avoid it at all in this case.

Ok, let's not look at it as a bug on the Spark side. What would be the proposed way to create a direct Kafka stream that either avoids this issue or is able to handle it?
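One possible approach, sketched below in plain Scala with hypothetical names (this is not Spark's or Kafka's API), is to fetch the earliest offsets the broker still retains immediately before creating the stream, clamp the desired starting offsets against them, and pass the result as `fromOffsets` to `KafkaUtils.createDirectStream`. In a real application the earliest offsets would come from Kafka itself (e.g. an offset request for the earliest available offset per partition):

```scala
// Hypothetical sketch: clamp requested starting offsets to the earliest
// offsets the broker still retains, so the direct stream never asks for
// an offset that retention has already deleted.

case class TopicPartition(topic: String, partition: Int)

def clampToEarliest(
    requested: Map[TopicPartition, Long],
    earliestOnBroker: Map[TopicPartition, Long]): Map[TopicPartition, Long] =
  requested.map { case (tp, offset) =>
    // If retention already deleted the requested offset, fall back to the
    // earliest offset the broker still has for that partition.
    tp -> math.max(offset, earliestOnBroker.getOrElse(tp, offset))
  }

val tp = TopicPartition("events", 0)
// Requested offset 5 was deleted; earliest retained is 42, so start there.
val clamped = clampToEarliest(Map(tp -> 5L), Map(tp -> 42L))
```

Note that a race window remains: retention can still delete the clamped offset between the lookup and the first fetch, so the `OffsetOutOfRangeException` path would also need to be retried with freshly fetched earliest offsets.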



> OffsetOutOfRangeException caused by retention
> ---------------------------------------------
>
>                 Key: SPARK-12693
>                 URL: https://issues.apache.org/jira/browse/SPARK-12693
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.0
>         Environment: Ubuntu 64bit, Intel i7
>            Reporter: Rado Buransky
>            Priority: Minor
>              Labels: kafka
>         Attachments: kafka-log.txt, log.txt
>
>
> I am running a Kafka server locally with an extremely low retention of 3 seconds and 1 second segmentation. I create a direct Kafka stream with auto.offset.reset = smallest. 
> With bad luck (which actually happens quite often in my case), the smallest offset retrieved during stream initialization no longer exists by the time streaming actually starts.
> Complete source code of the Spark Streaming application is here:
> https://github.com/pygmalios/spark-checkpoint-experience/blob/cb27ab83b7a29e619386b56e68a755d7bd73fc46/src/main/scala/com/pygmalios/sparkCheckpointExperience/spark/SparkApp.scala
> The application ends up in an endless loop trying to fetch that non-existent offset and has to be killed. See the attached logs from Spark and from the Kafka server.
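For reference, a broker configuration along these lines could reproduce the scenario described above. The retention and segment values come from the report; the check interval is an assumption added to make cleanup fire as aggressively as possible:

```properties
# Aggressive settings to widen the race between retention and the
# consumer's starting offset (retention/segment values from the report):
log.retention.ms=3000                  # delete log segments older than 3 s
log.roll.ms=1000                       # roll a new segment every second
log.retention.check.interval.ms=1000   # assumed: run the retention check every second
```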



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
