Posted to issues@spark.apache.org by "etienne (JIRA)" <ji...@apache.org> on 2016/09/20 09:18:20 UTC

[jira] [Created] (SPARK-17606) New batches are not created when there are 1000 created after restarting streaming from checkpoint.

etienne created SPARK-17606:
-------------------------------

             Summary: New batches are not created when there are 1000 created after restarting streaming from checkpoint.
                 Key: SPARK-17606
                 URL: https://issues.apache.org/jira/browse/SPARK-17606
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: etienne


When Spark restarts from a checkpoint after being down for a while, it recreates the batches missed during the downtime.

When only a few batches are missing, Spark creates a new incoming batch every batchTime. But when the downtime is long enough that 1000 missing batches are recreated, no new batches are created afterwards.

So once all these recreated batches have completed, the stream sits idle.

I think there is a hard limit set somewhere.

I was expecting Spark to continue recreating the missed batches, perhaps not all at once (creating them all at once looks like it causes driver memory problems), and then resume creating a batch every batchTime.

Another solution would be to skip recreating the missing batches but still restart the direct input.

Right now, the only way I have found to restart a stream after a long break is to delete the checkpoint so that a new stream is created, but that loses all my state.

PS: I'm talking about the direct Kafka input, since that's the source I'm currently using; I don't know what happens with other sources.
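For context, the restart path in question is the standard checkpoint-recovery pattern for a direct Kafka stream on Spark 1.6. A minimal sketch is below; the checkpoint path, broker address, topic name, and batch interval are placeholders for illustration, not the reporter's actual values. On restart, `StreamingContext.getOrCreate` rebuilds the context from the checkpoint, which is where the missed batches get regenerated.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointRestartSketch {
  // Placeholder paths/endpoints; substitute real values.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-restart-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    // Direct (receiver-less) Kafka input, as used in the report.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.map(_._2).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // First run: calls createContext() and starts fresh.
    // Subsequent runs: restores the context (and pending batches)
    // from the checkpoint directory.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The reporter's workaround amounts to deleting `checkpointDir` before restarting, which forces the `createContext()` branch and discards all recovered state.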





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org