You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Oz Ben Ami <oz...@doubleverify.com> on 2018/04/30 16:03:18 UTC

Re: [Kubernetes] structured-streaming driver restarts / roadmap

This would be useful to us, so I've created a JIRA ticket for this
discussion: https://issues.apache.org/jira/browse/SPARK-24122

On Wed, Mar 28, 2018 at 10:28 AM, Anirudh Ramanathan <
ramanathana@google.com.invalid> wrote:

> We discussed this early on in our fork and I think we should have this in
> a JIRA and discuss it further. It's something we want to address in the
> future.
>
> One proposed method is using a StatefulSet of size 1 for the driver. This
> ensures recovery but at the same time takes away from the completion
> semantics of a single pod.
>
>
> See history in https://github.com/apache-spark-on-k8s/spark/issues/288
>
> On Wed, Mar 28, 2018, 6:56 AM Lucas Kacher <lu...@vsco.co> wrote:
>
>> A carry-over from the apache-spark-on-k8s project, it would be useful to
>> have a configurable restart policy for submitted jobs with the Kubernetes
>> resource manager. See the following issues:
>>
>> https://github.com/apache-spark-on-k8s/spark/issues/133
>> https://github.com/apache-spark-on-k8s/spark/issues/288
>> https://github.com/apache-spark-on-k8s/spark/issues/546
>>
>> Use case: I have a structured streaming job that reads from Kafka,
>> aggregates, and writes back out to Kafka deployed via k8s and checkpointing
>> to a remote location. If the driver pod dies for a any number of reasons,
>> it will not restart.
>>
>> For us, as all data is stored via checkpoint and we are satisfied with
>> at-least-once semantics, it would be useful if the driver were to come back
>> on it's own and pick back up.
>>
>> Firstly, may we add this to JIRA? Secondly, Is there any insight as to
>> what the thought is around allowing that to be configurable in the future?
>> If it's not something that will happen natively, we will end up needing to
>> write something that polls or listens to k8s and has the ability to
>> re-submit any failed jobs.
>>
>> Thanks!
>>
>> --
>>
>> *Lucas Kacher*Senior Engineer
>> -
>> vsco.co <https://www.vsco.co/>
>> New York, NY
>> 818.512.5239
>>
>