Posted to users@kafka.apache.org by "Matthias J. Sax" <mj...@mailbox.org.INVALID> on 2022/03/05 18:41:34 UTC

Re: How to achieve high availability in a Kafka Streams app during deployment?

Hard to answer from a 10,000ft view.

In general, a rolling upgrade (i.e., bounce one instance at a time) is 
recommended. If you have state, you need to ensure that the state is not 
lost during a bounce. As you are using Kubernetes, StatefulSets that 
allow you to re-attach the disks should be the way to go.
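
The state-retention side of this can be sketched in the Streams configuration. The property names below are standard Kafka Streams config keys; the application id, broker address, and mount path are placeholders for illustration:

```java
import java.util.Properties;

public class StateConfig {
    // Sketch only: ids, addresses, and paths are placeholders.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // placeholder id
        props.put("bootstrap.servers", "kafka:9092");     // placeholder address
        // Point the local state stores at the StatefulSet's persistent
        // volume, so RocksDB state survives a pod bounce and does not need
        // to be restored from the changelog topics on restart.
        props.put("state.dir", "/var/lib/kafka-streams"); // assumed PV mount path
        // Keep a warm replica of each store on another instance, so a task
        // can fail over without a long restore if the disk re-attach is slow.
        props.put("num.standby.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps());
    }
}
```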

Rolling upgrades are only supported if the new program (i.e., Topology) is 
compatible with the old one. The alternative to a rolling upgrade would 
be to deploy the new version in parallel to the old one (using a 
different application-id) and, after the new version is running stably, 
shut down the old version.
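
The parallel-deployment alternative mostly comes down to isolating the new version under its own application id: each application id gets its own consumer group and its own internal (changelog/repartition) topics, so the new version rebuilds its state from the source topics without touching the old one. A minimal sketch, with placeholder names:

```java
import java.util.Properties;

public class ParallelDeployConfig {
    // Sketch only: the app name and broker address are placeholders.
    static Properties propsFor(String version) {
        Properties props = new Properties();
        // Embedding a version in the application.id keeps the old and new
        // deployments fully isolated (separate consumer group, separate
        // internal topics, separate local state directories).
        props.put("application.id", "my-streams-app-" + version);
        props.put("bootstrap.servers", "kafka:9092"); // placeholder address
        return props;
    }

    public static void main(String[] args) {
        System.out.println(propsFor("v2").getProperty("application.id"));
    }
}
```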

Hope this helps.

-Matthias

On 2/28/22 12:11, Ismar Slomic wrote:
> We run Kafka Streams (Java) apps on Kubernetes to *consume*, *process* and
> *produce* real-time data in our Kafka cluster (running Confluent Community
> Edition v7.0/Kafka v3.0). How can we deploy our apps in a way that limits
> downtime in record consumption? Our initial target was approximately *2
> sec* of downtime, once per task.
> 
> We are aiming for continuous deployment of changes to the production
> environment, but deployments are too disruptive: they cause downtime in
> record consumption in our apps, which adds latency to the produced
> real-time records.
> 
> Since this question has already been described in detail on Stack Overflow (
> https://stackoverflow.com/questions/71222496/how-to-achieve-high-availability-in-a-kafka-streams-app-during-deployment),
> but has not been answered yet, we would like to refer to it instead of
> copy/pasting the content in this mailing list.
> 
> Please let me know if you prefer to have the complete question in the
> mailing list instead.
> 

Re: How to achieve high availability in a Kafka Streams app during deployment?

Posted by Ismar Slomic <is...@slomic.no>.
Hi Matthias and thanks for replying to this thread.

> "Hard to answer from a 10,000ft view."
We tried hard to include a detailed explanation of our use case in the
Stack Overflow thread (
https://stackoverflow.com/questions/71222496/how-to-achieve-high-availability-in-a-kafka-streams-app-during-deployment),
while keeping it from becoming too complicated. We can of course
provide more details if needed. Have you read the Stack Overflow thread,
and are there any details you would like us to explain?

> "As you are using Kubernetes, using stateful sets that allow you to
re-attach disk should be the way to go"
We have considered this approach, and it may very well turn out to be the
way to go. However, it would require us to terminate one replica instance
before starting the next, which would not meet our initial target of
approximately 2 sec of downtime, once per task. We would also always be
limited by the time Kubernetes needs to start our JVM pod. We think this
is not in the spirit of what you are trying to achieve in KIP-429.
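
A configuration sketch of the settings that bear on this goal (property names are standard Kafka Streams/consumer config keys; the values and the instance id are illustrative, not a recommendation):

```java
import java.util.Properties;

public class RollingBounceConfig {
    // Sketch only: ids, addresses, and values are illustrative.
    static Properties propsFor(String podName) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app"); // placeholder id
        props.put("bootstrap.servers", "kafka:9092");  // placeholder address
        // Static membership (KIP-345): a stable group.instance.id per pod
        // (e.g. the StatefulSet pod name, via the "main.consumer." prefix)
        // lets a restarted pod rejoin within the session timeout without
        // triggering a rebalance at all.
        props.put("main.consumer.group.instance.id", podName);
        // The session timeout must cover the pod restart time.
        props.put("main.consumer.session.timeout.ms", "60000");
        // A warm standby plus a recovery-lag bound (KIP-441, Kafka 2.6+)
        // lets a slightly-behind instance take over a task immediately
        // instead of waiting for a full state restore.
        props.put("num.standby.replicas", "1");
        props.put("acceptable.recovery.lag", "10000"); // records; the default
        return props;
    }

    public static void main(String[] args) {
        System.out.println(propsFor("my-streams-app-0"));
    }
}
```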

> "Rolling upgrade are only supported if the new program (ie, Topology) is
compatible to the old one."
We have tested a rolling upgrade with an identical Topology, replacing the
Kafka Streams replicas/EKS pods one by one. Details about this are included
in the Stack Overflow thread.

We are really trying to understand how to achieve high availability and
make it work for our use case. We don't think our use case is unique, so a
solution should be useful to others as well. If we find a working solution,
we are willing to write it up in detail, both on Stack Overflow and in a
Medium article.

- Ismar
