Posted to user@spark.apache.org by Tobias Pfeiffer <tg...@preferred.jp> on 2014/11/13 09:10:06 UTC

StreamingContext does not stop

Hi,

I am processing a bunch of HDFS data using a StreamingContext (Spark
1.1.0), which means that all files that exist in the directory at start()
time are processed in the first batch. Now when I try to stop this stream
processing using `streamingContext.stop(false, false)` (that is, even with
stopGracefully = false), it has no effect. The stop() call blocks and data
processing continues (it would probably stop once the batch completes, but
that takes too long since all my data is in that single batch).
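For reference, this is roughly how the job is set up (a minimal sketch;
the directory, batch interval and the way the file stream is created are
simplified placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("hdfs-batch-processing")
  val ssc = new StreamingContext(conf, Seconds(30))

  // simplified; in my job the stream is set up so that files already in
  // the directory are included in the first batch
  val lines = ssc.textFileStream("hdfs:///path/to/input")
  // ... processing pipeline on the DStream ...

  ssc.start()

  // later, from another thread:
  ssc.stop(false, false)  // stopSparkContext = false, stopGracefully = false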

I am not sure whether this is true in general or only for the first batch.
I have also observed that stopping the stream processing during the first
batch occasionally takes a very long time, even when no data is present at
all.

Has anyone experienced something similar? Do I have to do anything in
particular in my processing code (like checking the state of the
StreamingContext) to allow the interruption? It is quite important for me
that stopping the stream processing happens rather quickly.

Thanks
Tobias

Re: StreamingContext does not stop

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

I guess I found part of the issue: I had written
  dstream.transform(rdd => { rdd.foreachPartition(...); rdd })
instead of
  dstream.transform(rdd => { rdd.mapPartitions(...) }),
which is why stop() would not stop the processing.
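
In other words (a rough sketch of the two variants; processPartition is
just a stand-in for my actual per-partition processing):

  // problematic: foreachPartition is an action, so it runs a Spark job
  // immediately when the transform function is evaluated during job
  // generation, outside the streaming job that stop() can shut down
  dstream.transform { rdd =>
    rdd.foreachPartition(iter => processPartition(iter))
    rdd
  }

  // fixed: mapPartitions is a lazy transformation, so the work runs as
  // part of the scheduled streaming job and is shut down together with it
  dstream.transform { rdd =>
    rdd.mapPartitions(iter => processPartition(iter))
  }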

With the corrected version, a non-graceful shutdown works in the sense that
Spark no longer waits for my processing to complete; the job generator, job
scheduler, job executor etc. all seem to shut down fine. However, the
threads that do the actual processing do not, and even after
streamingContext.stop() has returned I still see logging output from my
processing tasks.

Is there any way to signal to my processing tasks that they should stop the
processing?
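
The only thing I can think of so far is to check some flag inside the
per-partition loop, roughly like this (just a sketch; how such a flag would
actually be flipped on the executors when stop() is called is precisely my
question):

  object ProcessingControl {
    // hypothetical per-JVM flag; setting it from the driver is the open question
    @volatile var stopRequested = false
  }

  def processPartition[T](iter: Iterator[T]): Iterator[T] =
    iter.takeWhile(_ => !ProcessingControl.stopRequested).map { record =>
      // ... actual per-record work ...
      record
    }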

Thanks
Tobias