Posted to user@spark.apache.org by S <sh...@gmail.com> on 2022/02/16 09:17:17 UTC

Implementing circuit breaker pattern in Spark

Hi,

We have a Spark job that calls a microservice from the lambda function of a
flatMap transformation: the lambda passes the inbound element to the
microservice and returns the transformed value, or "None", as the output of
the flatMap. The lambda also takes care of exceptions from the microservice,
etc. The question is: there are times when the microservice may be down, and
there is no point recording an exception and putting the message in the DLQ
for every element in our streaming pipeline for as long as the microservice
stays down. Instead, what we want to be able to do is retry the microservice
call for a given event a predefined number of times and, if the service is
still found to be down, terminate the Spark job so that the current
microbatch is terminated, there is no next microbatch, and the rest of the
messages therefore stay in the source Kafka topics, unpolled and
unprocessed, until the microservice is back up and the Spark job is
redeployed. In regular microservices we can implement this using the
circuit breaker pattern. In Spark jobs, however, this would mean being able
to somehow send a signal from an executor JVM to the driver JVM to
terminate the Spark job. Is there a way to do that in Spark?

P.S.:
- Having the circuit breaker functionality helps restrict the purpose of
the DLQ to data or schema issues only, instead of infra/network-related
issues.
- As far as the need for the Spark job to use microservices is concerned,
think of it as complex logic that is maintained in a microservice and does
not warrant duplication.
- Checkpointing is handled manually, not via Spark's default checkpointing
mechanism.
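
For concreteness, a rough sketch (in Scala) of the kind of flatMap lambda
described above. callMicroservice, sendToDlq, the endpoint URL and the Event
type are placeholders for illustration, not our actual code:

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}
    import scala.util.{Failure, Success, Try}

    case class Event(payload: String)

    // Placeholder endpoint and client; in a real Spark job the client would
    // typically be created per partition (e.g. inside mapPartitions) rather
    // than captured in the closure.
    val serviceUri = URI.create("http://transform-service.local/api/transform")
    val http = HttpClient.newHttpClient()

    // Call the microservice once; treat any 5XX as a failure.
    def callMicroservice(e: Event): String = {
      val req = HttpRequest.newBuilder(serviceUri)
        .POST(HttpRequest.BodyPublishers.ofString(e.payload))
        .build()
      val resp = http.send(req, HttpResponse.BodyHandlers.ofString())
      if (resp.statusCode() >= 500) sys.error(s"service returned ${resp.statusCode()}")
      resp.body()
    }

    // Stand-in for the real DLQ write.
    def sendToDlq(e: Event, err: Throwable): Unit =
      println(s"DLQ: ${e.payload} (${err.getMessage})")

    // The flatMap lambda: transformed value on success, None on failure.
    val transform: Event => Option[String] = { event =>
      Try(callMicroservice(event)) match {
        case Success(body) => Some(body)
        case Failure(err)  => sendToDlq(event, err); None
      }
    }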

Regards,
Sheel

Re: Implementing circuit breaker pattern in Spark

Posted by Mich Talebzadeh <mi...@gmail.com>.
Are you going to terminate the Spark process if the queue is empty?



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 16 Feb 2022 at 09:18, S <sh...@gmail.com> wrote:

> Hi,
>
> We have a spark job that calls a microservice in the lambda function of
> the flatmap transformation  -> passes to this microservice, the inbound
> element in the lambda function and returns the transformed value or "None"
> from the microservice as an output of this flatMap transform. Of course the
> lambda also takes care of exceptions from the microservice etc.. The
> question is: there are times when the microservice may be down and there is
> no point recording an exception and putting the message in the DLQ for
> every element in our streaming pipeline so long as the microservice stays
> down. Instead we want to be able to do is retry the microservice call for a
> given event for a predefined no. of times and if found to be down then
> terminate the spark job so that this current microbatch is terminated and
> there is no next microbatch and the rest of the messages continue therefore
> continue to be in the source kafka topics unpolled and therefore
> unprocesseed.  until the microservice is back up and the spark job is
> redeployed again. In regular microservices, we can implement this using the
> Circuit breaker pattern. In Spark jobs however this would mean, being able
> to somehow send a signal from an executor JVM to the driver JVM to
> terminate the Spark job. Is there a way to do that in Spark?
>
> P.S.:
> - Having the circuit breaker functionality helps specificize the purpose
> of the DLQ to data or schema issues only instead of infra/network related
> issues.
> - As far as the need for the Spark job to use microservices is concerned,
> think of it as a complex logic being maintained in a microservice that does
> not warrant duplication.
> - checkpointing is being taken care of manually and not using spark's
> default checkpointing mechanism.
>
> Regards,
> Sheel
>

Re: Implementing circuit breaker pattern in Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,
From a technical perspective I think we all agree; there are no arguments
there.

From a design/architecture point of view, though, big data was supposed to
solve design challenges around volume, velocity, veracity, and variety, and
companies investing in data solutions usually build them to run
economically, with security, cost, and other implications considered, for
at least 3 to 4 years.

There is an old saying: do not fit the solution to the problem. Maybe I do
not understand the problem, and am therefore saying all the wrong things :)


Regards,
Gourav Sengupta


On Wed, Feb 16, 2022 at 3:31 PM Sean Owen <sr...@gmail.com> wrote:

> There's nothing wrong with calling microservices this way. Something needs
> to call the service with all the data arriving, and Spark is fine for
> executing arbitrary logic including this kind of thing.
> Kafka does not change that?
>
> On Wed, Feb 16, 2022 at 9:24 AM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>> once again, just trying to understand the problem first.
>>
>> Why are we using SPARK to place calls to micro services? There are
>> several reasons why this should never happen, including costs/ security/
>> scalability concerns, etc.
>>
>> Is there a way that you can create a producer and put the data into Kafka
>> first?
>>
>> Sorry, I am not suggesting any solutions, just trying to understand the
>> problem first.
>>
>>
>> Regards,
>> Gourav
>>
>>
>>
>> On Wed, Feb 16, 2022 at 2:36 PM S <sh...@gmail.com> wrote:
>>
>>> No I want the job to stop and end once it discovers on repeated retries
>>> that the microservice is not responding. But I think I got where you were
>>> going right after sending my previous mail. Basically repeatedly failing of
>>> your tasks on retries ultimately fails your job anyway. So thats an
>>> in-built circuit breaker. So what that essentially means is we should not
>>> be catching those HTTP 5XX exceptions (which we currently do) and let the
>>> tasks fail on their own only for spark to retry them for finite number of
>>> times and then subsequently fail and thereby break the circuit. Thanks.
>>>
>>> On Wed, Feb 16, 2022 at 7:59 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> You stop the Spark job by tasks failing repeatedly, that's already how
>>>> it works. You can't kill the driver from the executor other ways, but
>>>> should not need to. I'm not clear, you're saying you want to stop the job,
>>>> but also continue processing?
>>>>
>>>> On Wed, Feb 16, 2022 at 7:58 AM S <sh...@gmail.com> wrote:
>>>>
>>>>> Retries have been already implemented. The question is how to stop the
>>>>> spark job by having an executor JVM send a signal to the driver JVM. e.g. I
>>>>> have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say
>>>>> while a partition of 10 messages was being processed, first 3 went through
>>>>> but then the microservice went down. Now when the 4th message in the
>>>>> partition is sent to the microservice it keeps receiving 5XX on every retry
>>>>> e.g. 5 retries. What I now want is to have that task from that executor JVM
>>>>> send a signal to the driver JVM to terminate the spark job on the failure
>>>>> of the 5th retry. Currently, what we have in place is retrying it 5 times
>>>>> and then upon failure i.e. 5XX catch the exception and move the message to
>>>>> a DLQ thereby having the flatmap produce a *None* and proceed to the
>>>>> next message in the partition of that microbatch. This approach keeps the
>>>>> pipeline alive and keeps pushing messages to DLQ microbatch after
>>>>> microbatch until the microservice is back up.
>>>>>
>>>>>
>>>>> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:
>>>>>
>>>>>> You could use the same pattern in your flatMap function. If you want
>>>>>> Spark to keep retrying though, you don't need any special logic, that is
>>>>>> what it would do already. You could increase the number of task retries
>>>>>> though; see the spark.excludeOnFailure.task.* configurations.
>>>>>>
>>>>>> You can just implement the circuit breaker pattern directly too,
>>>>>> nothing special there, though I don't think that's what you want? you
>>>>>> actually want to retry the failed attempts, not just avoid calling the
>>>>>> microservice.
>>>>>>
>>>>>> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a spark job that calls a microservice in the lambda function
>>>>>>> of the flatmap transformation  -> passes to this microservice, the inbound
>>>>>>> element in the lambda function and returns the transformed value or "None"
>>>>>>> from the microservice as an output of this flatMap transform. Of course the
>>>>>>> lambda also takes care of exceptions from the microservice etc.. The
>>>>>>> question is: there are times when the microservice may be down and there is
>>>>>>> no point recording an exception and putting the message in the DLQ for
>>>>>>> every element in our streaming pipeline so long as the microservice stays
>>>>>>> down. Instead we want to be able to do is retry the microservice call for a
>>>>>>> given event for a predefined no. of times and if found to be down then
>>>>>>> terminate the spark job so that this current microbatch is terminated and
>>>>>>> there is no next microbatch and the rest of the messages continue therefore
>>>>>>> continue to be in the source kafka topics unpolled and therefore
>>>>>>> unprocesseed.  until the microservice is back up and the spark job is
>>>>>>> redeployed again. In regular microservices, we can implement this using the
>>>>>>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>>>>>>> to somehow send a signal from an executor JVM to the driver JVM to
>>>>>>> terminate the Spark job. Is there a way to do that in Spark?
>>>>>>>
>>>>>>> P.S.:
>>>>>>> - Having the circuit breaker functionality helps specificize the
>>>>>>> purpose of the DLQ to data or schema issues only instead of infra/network
>>>>>>> related issues.
>>>>>>> - As far as the need for the Spark job to use microservices is
>>>>>>> concerned, think of it as a complex logic being maintained in a
>>>>>>> microservice that does not warrant duplication.
>>>>>>> - checkpointing is being taken care of manually and not using
>>>>>>> spark's default checkpointing mechanism.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sheel
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Sheel Pancholi
>>>>>
>>>>
>>>
>>> --
>>>
>>> Best Regards,
>>>
>>> Sheel Pancholi
>>>
>>

Re: Implementing circuit breaker pattern in Spark

Posted by Sean Owen <sr...@gmail.com>.
There's nothing wrong with calling microservices this way. Something needs
to call the service with all the data arriving, and Spark is fine for
executing arbitrary logic including this kind of thing.
Kafka does not change that?

On Wed, Feb 16, 2022 at 9:24 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
> once again, just trying to understand the problem first.
>
> Why are we using SPARK to place calls to micro services? There are several
> reasons why this should never happen, including costs/ security/
> scalability concerns, etc.
>
> Is there a way that you can create a producer and put the data into Kafka
> first?
>
> Sorry, I am not suggesting any solutions, just trying to understand the
> problem first.
>
>
> Regards,
> Gourav
>
>
>
> On Wed, Feb 16, 2022 at 2:36 PM S <sh...@gmail.com> wrote:
>
>> No I want the job to stop and end once it discovers on repeated retries
>> that the microservice is not responding. But I think I got where you were
>> going right after sending my previous mail. Basically repeatedly failing of
>> your tasks on retries ultimately fails your job anyway. So thats an
>> in-built circuit breaker. So what that essentially means is we should not
>> be catching those HTTP 5XX exceptions (which we currently do) and let the
>> tasks fail on their own only for spark to retry them for finite number of
>> times and then subsequently fail and thereby break the circuit. Thanks.
>>
>> On Wed, Feb 16, 2022 at 7:59 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> You stop the Spark job by tasks failing repeatedly, that's already how
>>> it works. You can't kill the driver from the executor other ways, but
>>> should not need to. I'm not clear, you're saying you want to stop the job,
>>> but also continue processing?
>>>
>>> On Wed, Feb 16, 2022 at 7:58 AM S <sh...@gmail.com> wrote:
>>>
>>>> Retries have been already implemented. The question is how to stop the
>>>> spark job by having an executor JVM send a signal to the driver JVM. e.g. I
>>>> have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say
>>>> while a partition of 10 messages was being processed, first 3 went through
>>>> but then the microservice went down. Now when the 4th message in the
>>>> partition is sent to the microservice it keeps receiving 5XX on every retry
>>>> e.g. 5 retries. What I now want is to have that task from that executor JVM
>>>> send a signal to the driver JVM to terminate the spark job on the failure
>>>> of the 5th retry. Currently, what we have in place is retrying it 5 times
>>>> and then upon failure i.e. 5XX catch the exception and move the message to
>>>> a DLQ thereby having the flatmap produce a *None* and proceed to the
>>>> next message in the partition of that microbatch. This approach keeps the
>>>> pipeline alive and keeps pushing messages to DLQ microbatch after
>>>> microbatch until the microservice is back up.
>>>>
>>>>
>>>> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> You could use the same pattern in your flatMap function. If you want
>>>>> Spark to keep retrying though, you don't need any special logic, that is
>>>>> what it would do already. You could increase the number of task retries
>>>>> though; see the spark.excludeOnFailure.task.* configurations.
>>>>>
>>>>> You can just implement the circuit breaker pattern directly too,
>>>>> nothing special there, though I don't think that's what you want? you
>>>>> actually want to retry the failed attempts, not just avoid calling the
>>>>> microservice.
>>>>>
>>>>> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a spark job that calls a microservice in the lambda function
>>>>>> of the flatmap transformation  -> passes to this microservice, the inbound
>>>>>> element in the lambda function and returns the transformed value or "None"
>>>>>> from the microservice as an output of this flatMap transform. Of course the
>>>>>> lambda also takes care of exceptions from the microservice etc.. The
>>>>>> question is: there are times when the microservice may be down and there is
>>>>>> no point recording an exception and putting the message in the DLQ for
>>>>>> every element in our streaming pipeline so long as the microservice stays
>>>>>> down. Instead we want to be able to do is retry the microservice call for a
>>>>>> given event for a predefined no. of times and if found to be down then
>>>>>> terminate the spark job so that this current microbatch is terminated and
>>>>>> there is no next microbatch and the rest of the messages continue therefore
>>>>>> continue to be in the source kafka topics unpolled and therefore
>>>>>> unprocesseed.  until the microservice is back up and the spark job is
>>>>>> redeployed again. In regular microservices, we can implement this using the
>>>>>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>>>>>> to somehow send a signal from an executor JVM to the driver JVM to
>>>>>> terminate the Spark job. Is there a way to do that in Spark?
>>>>>>
>>>>>> P.S.:
>>>>>> - Having the circuit breaker functionality helps specificize the
>>>>>> purpose of the DLQ to data or schema issues only instead of infra/network
>>>>>> related issues.
>>>>>> - As far as the need for the Spark job to use microservices is
>>>>>> concerned, think of it as a complex logic being maintained in a
>>>>>> microservice that does not warrant duplication.
>>>>>> - checkpointing is being taken care of manually and not using spark's
>>>>>> default checkpointing mechanism.
>>>>>>
>>>>>> Regards,
>>>>>> Sheel
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Best Regards,
>>>>
>>>> Sheel Pancholi
>>>>
>>>
>>
>> --
>>
>> Best Regards,
>>
>> Sheel Pancholi
>>
>

Re: Implementing circuit breaker pattern in Spark

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,
once again, just trying to understand the problem first.

Why are we using Spark to place calls to microservices? There are several
reasons why this should never happen, including cost, security, scalability
and other concerns.

Is there a way that you can create a producer and put the data into Kafka
first?

Sorry, I am not suggesting any solutions, just trying to understand the
problem first.
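
Purely as an illustration of the "put the data into Kafka first" idea (the
broker address, topic name and payload below are assumptions, not a
recommendation for this specific pipeline):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // The upstream system lands raw events on a topic; Spark (or anything
    // else) then consumes them from Kafka instead of calling a service
    // synchronously inside the job.
    producer.send(new ProducerRecord[String, String]("raw-events", "key-1", "payload-1"))
    producer.flush()
    producer.close()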


Regards,
Gourav



On Wed, Feb 16, 2022 at 2:36 PM S <sh...@gmail.com> wrote:

> No I want the job to stop and end once it discovers on repeated retries
> that the microservice is not responding. But I think I got where you were
> going right after sending my previous mail. Basically repeatedly failing of
> your tasks on retries ultimately fails your job anyway. So thats an
> in-built circuit breaker. So what that essentially means is we should not
> be catching those HTTP 5XX exceptions (which we currently do) and let the
> tasks fail on their own only for spark to retry them for finite number of
> times and then subsequently fail and thereby break the circuit. Thanks.
>
> On Wed, Feb 16, 2022 at 7:59 PM Sean Owen <sr...@gmail.com> wrote:
>
>> You stop the Spark job by tasks failing repeatedly, that's already how
>> it works. You can't kill the driver from the executor other ways, but
>> should not need to. I'm not clear, you're saying you want to stop the job,
>> but also continue processing?
>>
>> On Wed, Feb 16, 2022 at 7:58 AM S <sh...@gmail.com> wrote:
>>
>>> Retries have been already implemented. The question is how to stop the
>>> spark job by having an executor JVM send a signal to the driver JVM. e.g. I
>>> have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say
>>> while a partition of 10 messages was being processed, first 3 went through
>>> but then the microservice went down. Now when the 4th message in the
>>> partition is sent to the microservice it keeps receiving 5XX on every retry
>>> e.g. 5 retries. What I now want is to have that task from that executor JVM
>>> send a signal to the driver JVM to terminate the spark job on the failure
>>> of the 5th retry. Currently, what we have in place is retrying it 5 times
>>> and then upon failure i.e. 5XX catch the exception and move the message to
>>> a DLQ thereby having the flatmap produce a *None* and proceed to the
>>> next message in the partition of that microbatch. This approach keeps the
>>> pipeline alive and keeps pushing messages to DLQ microbatch after
>>> microbatch until the microservice is back up.
>>>
>>>
>>> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> You could use the same pattern in your flatMap function. If you want
>>>> Spark to keep retrying though, you don't need any special logic, that is
>>>> what it would do already. You could increase the number of task retries
>>>> though; see the spark.excludeOnFailure.task.* configurations.
>>>>
>>>> You can just implement the circuit breaker pattern directly too,
>>>> nothing special there, though I don't think that's what you want? you
>>>> actually want to retry the failed attempts, not just avoid calling the
>>>> microservice.
>>>>
>>>> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a spark job that calls a microservice in the lambda function
>>>>> of the flatmap transformation  -> passes to this microservice, the inbound
>>>>> element in the lambda function and returns the transformed value or "None"
>>>>> from the microservice as an output of this flatMap transform. Of course the
>>>>> lambda also takes care of exceptions from the microservice etc.. The
>>>>> question is: there are times when the microservice may be down and there is
>>>>> no point recording an exception and putting the message in the DLQ for
>>>>> every element in our streaming pipeline so long as the microservice stays
>>>>> down. Instead we want to be able to do is retry the microservice call for a
>>>>> given event for a predefined no. of times and if found to be down then
>>>>> terminate the spark job so that this current microbatch is terminated and
>>>>> there is no next microbatch and the rest of the messages continue therefore
>>>>> continue to be in the source kafka topics unpolled and therefore
>>>>> unprocesseed.  until the microservice is back up and the spark job is
>>>>> redeployed again. In regular microservices, we can implement this using the
>>>>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>>>>> to somehow send a signal from an executor JVM to the driver JVM to
>>>>> terminate the Spark job. Is there a way to do that in Spark?
>>>>>
>>>>> P.S.:
>>>>> - Having the circuit breaker functionality helps specificize the
>>>>> purpose of the DLQ to data or schema issues only instead of infra/network
>>>>> related issues.
>>>>> - As far as the need for the Spark job to use microservices is
>>>>> concerned, think of it as a complex logic being maintained in a
>>>>> microservice that does not warrant duplication.
>>>>> - checkpointing is being taken care of manually and not using spark's
>>>>> default checkpointing mechanism.
>>>>>
>>>>> Regards,
>>>>> Sheel
>>>>>
>>>>
>>>
>>> --
>>>
>>> Best Regards,
>>>
>>> Sheel Pancholi
>>>
>>
>
> --
>
> Best Regards,
>
> Sheel Pancholi
>

Re: Implementing circuit breaker pattern in Spark

Posted by S <sh...@gmail.com>.
No, I want the job to stop and end once it discovers, on repeated retries,
that the microservice is not responding. But I think I got where you were
going right after sending my previous mail. Basically, repeated failure of
the tasks on retries ultimately fails the job anyway, so that is an
in-built circuit breaker. What that essentially means is that we should not
be catching those HTTP 5XX exceptions (which we currently do), but should
instead let the tasks fail on their own, let Spark retry them a finite
number of times, and then fail, thereby breaking the circuit. Thanks.
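
In other words, something along the lines of the following sketch
(callMicroservice and the 5-attempt budget are placeholders, an
illustration rather than our production code):

    // Hypothetical call that throws on a 5XX response from the service.
    def callMicroservice(payload: String): String = ???

    // Retry a fixed number of times; on the last attempt the exception is
    // NOT caught, so it propagates, the task fails, and Spark's normal
    // task-retry / job-failure behaviour ends up breaking the circuit.
    def callWithRetries(payload: String, attemptsLeft: Int): String =
      try callMicroservice(payload)
      catch {
        case _: Exception if attemptsLeft > 1 =>
          Thread.sleep(1000) // crude fixed backoff between attempts
          callWithRetries(payload, attemptsLeft - 1)
      }

    // Used inside the flatMap lambda; no catch around the final failure.
    val transform: String => Option[String] =
      payload => Some(callWithRetries(payload, attemptsLeft = 5))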

On Wed, Feb 16, 2022 at 7:59 PM Sean Owen <sr...@gmail.com> wrote:

> You stop the Spark job by tasks failing repeatedly, that's already how
> it works. You can't kill the driver from the executor other ways, but
> should not need to. I'm not clear, you're saying you want to stop the job,
> but also continue processing?
>
> On Wed, Feb 16, 2022 at 7:58 AM S <sh...@gmail.com> wrote:
>
>> Retries have been already implemented. The question is how to stop the
>> spark job by having an executor JVM send a signal to the driver JVM. e.g. I
>> have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say
>> while a partition of 10 messages was being processed, first 3 went through
>> but then the microservice went down. Now when the 4th message in the
>> partition is sent to the microservice it keeps receiving 5XX on every retry
>> e.g. 5 retries. What I now want is to have that task from that executor JVM
>> send a signal to the driver JVM to terminate the spark job on the failure
>> of the 5th retry. Currently, what we have in place is retrying it 5 times
>> and then upon failure i.e. 5XX catch the exception and move the message to
>> a DLQ thereby having the flatmap produce a *None* and proceed to the
>> next message in the partition of that microbatch. This approach keeps the
>> pipeline alive and keeps pushing messages to DLQ microbatch after
>> microbatch until the microservice is back up.
>>
>>
>> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:
>>
>>> You could use the same pattern in your flatMap function. If you want
>>> Spark to keep retrying though, you don't need any special logic, that is
>>> what it would do already. You could increase the number of task retries
>>> though; see the spark.excludeOnFailure.task.* configurations.
>>>
>>> You can just implement the circuit breaker pattern directly too, nothing
>>> special there, though I don't think that's what you want? you actually want
>>> to retry the failed attempts, not just avoid calling the microservice.
>>>
>>> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a spark job that calls a microservice in the lambda function of
>>>> the flatmap transformation  -> passes to this microservice, the inbound
>>>> element in the lambda function and returns the transformed value or "None"
>>>> from the microservice as an output of this flatMap transform. Of course the
>>>> lambda also takes care of exceptions from the microservice etc.. The
>>>> question is: there are times when the microservice may be down and there is
>>>> no point recording an exception and putting the message in the DLQ for
>>>> every element in our streaming pipeline so long as the microservice stays
>>>> down. Instead we want to be able to do is retry the microservice call for a
>>>> given event for a predefined no. of times and if found to be down then
>>>> terminate the spark job so that this current microbatch is terminated and
>>>> there is no next microbatch and the rest of the messages continue therefore
>>>> continue to be in the source kafka topics unpolled and therefore
>>>> unprocesseed.  until the microservice is back up and the spark job is
>>>> redeployed again. In regular microservices, we can implement this using the
>>>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>>>> to somehow send a signal from an executor JVM to the driver JVM to
>>>> terminate the Spark job. Is there a way to do that in Spark?
>>>>
>>>> P.S.:
>>>> - Having the circuit breaker functionality helps specificize the
>>>> purpose of the DLQ to data or schema issues only instead of infra/network
>>>> related issues.
>>>> - As far as the need for the Spark job to use microservices is
>>>> concerned, think of it as a complex logic being maintained in a
>>>> microservice that does not warrant duplication.
>>>> - checkpointing is being taken care of manually and not using spark's
>>>> default checkpointing mechanism.
>>>>
>>>> Regards,
>>>> Sheel
>>>>
>>>
>>
>> --
>>
>> Best Regards,
>>
>> Sheel Pancholi
>>
>

-- 

Best Regards,

Sheel Pancholi

Re: Implementing circuit breaker pattern in Spark

Posted by Sean Owen <sr...@gmail.com>.
You stop the Spark job by letting tasks fail repeatedly; that's already how
it works. You can't kill the driver from the executor any other way, but
you should not need to. I'm not clear: are you saying you want to stop the
job, but also continue processing?
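
To sketch what that looks like from the driver side (the source, sink and
app name below are placeholders; the point is only that a task failing more
than spark.task.maxFailures times fails the microbatch and terminates the
streaming query, which awaitTermination() then surfaces):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryException

    object DriverSideSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("microservice-enrichment")     // placeholder name
          .config("spark.task.maxFailures", "4")  // attempts per task before the stage fails
          .getOrCreate()
        import spark.implicits._

        // Placeholder source/transform; the real job reads Kafka and calls the service.
        val events = spark.readStream.format("rate").load()
          .selectExpr("CAST(value AS STRING) AS value").as[String]
        val transformed = events.flatMap(v => Some(v)) // stand-in for the microservice lambda

        val query = transformed.writeStream.format("console").start()

        try query.awaitTermination()
        catch {
          case e: StreamingQueryException =>
            // A task that kept failing (e.g. persistent 5XX) lands here: the
            // query is stopped, no further microbatches run, and unread Kafka
            // offsets stay unprocessed until the job is redeployed.
            Console.err.println(s"query failed: ${e.getMessage}")
            spark.stop()
            sys.exit(1)
        }
      }
    }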

On Wed, Feb 16, 2022 at 7:58 AM S <sh...@gmail.com> wrote:

> Retries have been already implemented. The question is how to stop the
> spark job by having an executor JVM send a signal to the driver JVM. e.g. I
> have a microbatch of 30 messages; 10 in each of the 3 partitions. Let's say
> while a partition of 10 messages was being processed, first 3 went through
> but then the microservice went down. Now when the 4th message in the
> partition is sent to the microservice it keeps receiving 5XX on every retry
> e.g. 5 retries. What I now want is to have that task from that executor JVM
> send a signal to the driver JVM to terminate the spark job on the failure
> of the 5th retry. Currently, what we have in place is retrying it 5 times
> and then upon failure i.e. 5XX catch the exception and move the message to
> a DLQ thereby having the flatmap produce a *None* and proceed to the next
> message in the partition of that microbatch. This approach keeps the
> pipeline alive and keeps pushing messages to DLQ microbatch after
> microbatch until the microservice is back up.
>
>
> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:
>
>> You could use the same pattern in your flatMap function. If you want
>> Spark to keep retrying though, you don't need any special logic, that is
>> what it would do already. You could increase the number of task retries
>> though; see the spark.excludeOnFailure.task.* configurations.
>>
>> You can just implement the circuit breaker pattern directly too, nothing
>> special there, though I don't think that's what you want? you actually want
>> to retry the failed attempts, not just avoid calling the microservice.
>>
>> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a spark job that calls a microservice in the lambda function of
>>> the flatmap transformation  -> passes to this microservice, the inbound
>>> element in the lambda function and returns the transformed value or "None"
>>> from the microservice as an output of this flatMap transform. Of course the
>>> lambda also takes care of exceptions from the microservice etc.. The
>>> question is: there are times when the microservice may be down and there is
>>> no point recording an exception and putting the message in the DLQ for
>>> every element in our streaming pipeline so long as the microservice stays
>>> down. Instead we want to be able to do is retry the microservice call for a
>>> given event for a predefined no. of times and if found to be down then
>>> terminate the spark job so that this current microbatch is terminated and
>>> there is no next microbatch and the rest of the messages continue therefore
>>> continue to be in the source kafka topics unpolled and therefore
>>> unprocesseed.  until the microservice is back up and the spark job is
>>> redeployed again. In regular microservices, we can implement this using the
>>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>>> to somehow send a signal from an executor JVM to the driver JVM to
>>> terminate the Spark job. Is there a way to do that in Spark?
>>>
>>> P.S.:
>>> - Having the circuit breaker functionality helps specificize the purpose
>>> of the DLQ to data or schema issues only instead of infra/network related
>>> issues.
>>> - As far as the need for the Spark job to use microservices is
>>> concerned, think of it as a complex logic being maintained in a
>>> microservice that does not warrant duplication.
>>> - checkpointing is being taken care of manually and not using spark's
>>> default checkpointing mechanism.
>>>
>>> Regards,
>>> Sheel
>>>
>>
>
> --
>
> Best Regards,
>
> Sheel Pancholi
>

Re: Implementing circuit breaker pattern in Spark

Posted by S <sh...@gmail.com>.
Retries have already been implemented. The question is how to stop the
Spark job by having an executor JVM send a signal to the driver JVM. E.g. I
have a microbatch of 30 messages, 10 in each of 3 partitions. Let's say
that while a partition of 10 messages was being processed, the first 3 went
through, but then the microservice went down. Now, when the 4th message in
the partition is sent to the microservice, it keeps receiving a 5XX on
every retry, e.g. 5 retries. What I now want is for that task on that
executor JVM to send a signal to the driver JVM to terminate the Spark job
on the failure of the 5th retry. Currently, what we have in place is to
retry 5 times and then, upon failure (i.e. a 5XX), catch the exception and
move the message to a DLQ, thereby having the flatMap produce a *None* and
proceed to the next message in the partition of that microbatch. This
approach keeps the pipeline alive and keeps pushing messages to the DLQ,
microbatch after microbatch, until the microservice is back up.
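
For context, roughly what the current behaviour looks like (a sketch only;
callMicroservice, sendToDlq and the 5-attempt budget are placeholders):

    import scala.util.{Failure, Success, Try}

    // Hypothetical helpers: an HTTP call that throws on 5XX, and a DLQ writer.
    def callMicroservice(payload: String): String = ???
    def sendToDlq(payload: String, err: Throwable): Unit = ???

    // Retry up to maxAttempts times, returning the last result either way.
    def callWithRetries(payload: String, maxAttempts: Int = 5): Try[String] = {
      var result  = Try(callMicroservice(payload))
      var attempt = 1
      while (result.isFailure && attempt < maxAttempts) {
        Thread.sleep(1000) // fixed backoff between attempts
        result = Try(callMicroservice(payload))
        attempt += 1
      }
      result
    }

    // Current approach: after 5 failed attempts the message goes to the DLQ
    // and the flatMap emits None, so the pipeline stays alive and keeps
    // draining the topic even while the microservice is down.
    val transform: String => Option[String] = { payload =>
      callWithRetries(payload) match {
        case Success(body) => Some(body)
        case Failure(err)  => sendToDlq(payload, err); None
      }
    }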


On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sr...@gmail.com> wrote:

> You could use the same pattern in your flatMap function. If you want Spark
> to keep retrying though, you don't need any special logic, that is what it
> would do already. You could increase the number of task retries though; see
> the spark.excludeOnFailure.task.* configurations.
>
> You can just implement the circuit breaker pattern directly too, nothing
> special there, though I don't think that's what you want? you actually want
> to retry the failed attempts, not just avoid calling the microservice.
>
> On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:
>
>> Hi,
>>
>> We have a spark job that calls a microservice in the lambda function of
>> the flatmap transformation  -> passes to this microservice, the inbound
>> element in the lambda function and returns the transformed value or "None"
>> from the microservice as an output of this flatMap transform. Of course the
>> lambda also takes care of exceptions from the microservice etc.. The
>> question is: there are times when the microservice may be down and there is
>> no point recording an exception and putting the message in the DLQ for
>> every element in our streaming pipeline so long as the microservice stays
>> down. Instead we want to be able to do is retry the microservice call for a
>> given event for a predefined no. of times and if found to be down then
>> terminate the spark job so that this current microbatch is terminated and
>> there is no next microbatch and the rest of the messages continue therefore
>> continue to be in the source kafka topics unpolled and therefore
>> unprocesseed.  until the microservice is back up and the spark job is
>> redeployed again. In regular microservices, we can implement this using the
>> Circuit breaker pattern. In Spark jobs however this would mean, being able
>> to somehow send a signal from an executor JVM to the driver JVM to
>> terminate the Spark job. Is there a way to do that in Spark?
>>
>> P.S.:
>> - Having the circuit breaker functionality helps specificize the purpose
>> of the DLQ to data or schema issues only instead of infra/network related
>> issues.
>> - As far as the need for the Spark job to use microservices is concerned,
>> think of it as a complex logic being maintained in a microservice that does
>> not warrant duplication.
>> - checkpointing is being taken care of manually and not using spark's
>> default checkpointing mechanism.
>>
>> Regards,
>> Sheel
>>
>

-- 

Best Regards,

Sheel Pancholi

Re: Implementing circuit breaker pattern in Spark

Posted by Sean Owen <sr...@gmail.com>.
You could use the same pattern in your flatMap function. If you want Spark
to keep retrying, though, you don't need any special logic; that is what it
would do already. You could also increase the number of task retries; see
the spark.excludeOnFailure.task.* configurations.

You can implement the circuit breaker pattern directly too, nothing special
there, though I don't think that's what you want: you actually want to
retry the failed attempts, not just avoid calling the microservice.
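
As a hedged illustration (property names as in the Spark 3.x docs; the
values are arbitrary), those knobs can be set when the session is built, or
equivalently via spark-submit --conf:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("flatmap-microservice-job") // placeholder name
      // How many times a single task may fail before the stage (and job) fails.
      .config("spark.task.maxFailures", "8")
      // Optionally stop scheduling a repeatedly failing task on the same executor/node.
      .config("spark.excludeOnFailure.enabled", "true")
      .config("spark.excludeOnFailure.task.maxTaskAttemptsPerExecutor", "2")
      .config("spark.excludeOnFailure.task.maxTaskAttemptsPerNode", "4")
      .getOrCreate()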

On Wed, Feb 16, 2022 at 3:18 AM S <sh...@gmail.com> wrote:

> Hi,
>
> We have a spark job that calls a microservice in the lambda function of
> the flatmap transformation  -> passes to this microservice, the inbound
> element in the lambda function and returns the transformed value or "None"
> from the microservice as an output of this flatMap transform. Of course the
> lambda also takes care of exceptions from the microservice etc.. The
> question is: there are times when the microservice may be down and there is
> no point recording an exception and putting the message in the DLQ for
> every element in our streaming pipeline so long as the microservice stays
> down. Instead we want to be able to do is retry the microservice call for a
> given event for a predefined no. of times and if found to be down then
> terminate the spark job so that this current microbatch is terminated and
> there is no next microbatch and the rest of the messages continue therefore
> continue to be in the source kafka topics unpolled and therefore
> unprocesseed.  until the microservice is back up and the spark job is
> redeployed again. In regular microservices, we can implement this using the
> Circuit breaker pattern. In Spark jobs however this would mean, being able
> to somehow send a signal from an executor JVM to the driver JVM to
> terminate the Spark job. Is there a way to do that in Spark?
>
> P.S.:
> - Having the circuit breaker functionality helps specificize the purpose
> of the DLQ to data or schema issues only instead of infra/network related
> issues.
> - As far as the need for the Spark job to use microservices is concerned,
> think of it as a complex logic being maintained in a microservice that does
> not warrant duplication.
> - checkpointing is being taken care of manually and not using spark's
> default checkpointing mechanism.
>
> Regards,
> Sheel
>