Posted to user@spark.apache.org by nguyen duc Tuan <ne...@gmail.com> on 2017/03/06 02:51:45 UTC

Kafka failover with multiple data centers

Hi everyone,
We are deploying a Kafka cluster for ingesting streaming data. Sometimes
nodes in the cluster run into trouble (a node dies, the Kafka daemon is
killed...), and recovering data in Kafka can be very slow: it takes
several hours to recover from a disaster. I saw a slide deck suggesting
the use of multiple data centers (
https://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka).
But I wonder, how can we detect the problem and switch between data centers
in Spark Streaming? Since Kafka 0.10.1 supports a timestamp index, how can
we seek to the right offsets?
Are there any open-source libraries out there that handle this
problem on the fly?
Thanks.

Re: Kafka failover with multiple data centers

Posted by nguyen duc Tuan <ne...@gmail.com>.
Hi Soumitra,
We're working on that. The idea is to use Kafka to get the broker
information for the topic, then use the Kafka client to find the
corresponding offsets on the new cluster (
https://jeqo.github.io/post/2017-01-31-kafka-rewind-consumers-offset/). You
need Kafka >= 0.10.1.0 because it supports a timestamp-based index.
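
A minimal sketch of that offset lookup, not our actual code. It assumes a
consumer object with the kafka-python KafkaConsumer methods
partitions_for_topic, offsets_for_times and end_offsets, already connected to
the new cluster. The function name, the last_seen_ts_ms parameter and the
local TopicPartition stand-in are hypothetical, and it also assumes records on
the new cluster carry the same timestamps as on the old one.

```python
from collections import namedtuple

# Stand-in with the same fields as kafka-python's kafka.TopicPartition;
# with the real client, import that class instead (assumption for this sketch).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def offsets_after_failover(consumer, topic, last_seen_ts_ms):
    """Return {TopicPartition: offset} to resume from on the new cluster.

    For each partition, pick the earliest offset whose record timestamp is
    >= last_seen_ts_ms (milliseconds since the epoch). If a partition has no
    such record, fall back to its end offset.
    """
    partition_ids = consumer.partitions_for_topic(topic) or set()
    partitions = [TopicPartition(topic, p) for p in sorted(partition_ids)]
    if not partitions:
        return {}

    # Timestamp-index lookup; needs brokers >= 0.10.1.0.
    found = consumer.offsets_for_times({tp: last_seen_ts_ms for tp in partitions})
    end = consumer.end_offsets(partitions)

    resolved = {}
    for tp in partitions:
        hit = found.get(tp)  # None when nothing at/after the timestamp
        resolved[tp] = hit.offset if hit is not None else end[tp]
    return resolved
```

The resulting map could then be handed to the Spark Streaming Kafka direct
stream as starting offsets. Because the lookup is by timestamp, some records
around the switch-over point may be read twice, so downstream processing
should tolerate duplicates.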

2017-03-28 5:24 GMT+07:00 Soumitra Johri <so...@gmail.com>:

> Hi, did you guys figure it out?
>
> Thanks
> Soumitra

Re: Kafka failover with multiple data centers

Posted by Soumitra Johri <so...@gmail.com>.
Hi, did you guys figure it out?

Thanks
Soumitra