Posted to user@spark.apache.org by Mohammad Tariq <do...@gmail.com> on 2016/05/31 17:33:03 UTC

Recommended way to close resources in a Spark streaming application

Dear fellow Spark users,

I have a streaming app which is reading data from Kafka, doing some
computations and storing the results into HBase. Since I am new to Spark
streaming I feel that there could still be scope of making my app better.

To begin with, I was wondering what's the best way to free up resources in
case of app shutdown (because of some exception, or some other cause). While
looking for help online I came across the Spark docs, which describe
*spark.streaming.stopGracefullyOnShutdown* as: *If true, Spark shuts
down the StreamingContext gracefully on JVM shutdown rather than
immediately*.
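For reference, that property can be passed at submit time rather than hard-coded. A minimal sketch (the class and jar names below are placeholders for your own app):

```shell
# Enable graceful shutdown of the StreamingContext on JVM exit.
spark-submit \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```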

Or, does it make more sense to add a *ShutdownHook* explicitly in my app
and call JavaStreamingContext.stop()?

One potential benefit I see with an explicit *ShutdownHook* is that I could
close any external resources inside the hook's *run()* method before the JVM dies.
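To make the idea concrete, here is a sketch of the hook pattern, independent of Spark. The HBase connection is stood in for by a generic *AutoCloseable*; in the real app the hook body would close the HBase connection and then call *jssc.stop(true, true)* (stop the SparkContext, and stop gracefully). All names here are illustrative, not from any Spark API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {

    // Registers a JVM shutdown hook that closes the given resource, and
    // returns the hook thread so callers (or tests) can inspect it.
    public static Thread registerCleanup(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();          // e.g. the HBase Connection
                // jssc.stop(true, true);  // then stop streaming gracefully
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        AtomicBoolean closed = new AtomicBoolean(false);
        Thread hook = registerCleanup(() -> closed.set(true));
        // The JVM runs the hook automatically on exit; we invoke it by
        // hand here only to show what it does.
        hook.run();
        System.out.println("resource closed: " + closed.get());
    }
}
```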

Thoughts/suggestions?

Also, I am banking on the fact that the Kafka direct stream takes care of
exactly-once data delivery, so it will resume consuming from the point where
the app crashed. Is there any way I can restart my streaming app
automatically in case of a failure?

I'm really sorry for the barrage of questions, but I could not find
satisfactory answers online. Thank you so much for your valuable time.
Really appreciate it!


Tariq, Mohammad
about.me/mti