Posted to users@kafka.apache.org by Sachin Mittal <sj...@gmail.com> on 2016/12/18 06:42:53 UTC

Is running kafka streaming application advisable on high latency WAN setup

Hi folks,
I would like some feedback based on your experience running Kafka
Streams applications.

We have a replicated Kafka cluster running in a data center in one city.
We are running a Kafka Streams application in a second data center; it
reads from a source topic on that cluster and commits its output to a
local database in its own data center.

The two data centers are about 1000 miles apart, connected by a
high-latency (20-70 ms) 100 Mbps link.

Our source topic receives 10,000 messages per second, and each message is
around 4 KB.

The streaming application receives a lot of messages, aggregates them,
sends the aggregated messages to a changelog topic, and then reads them
back from the changelog topic to update its local store. This is a
continuous process, and changelog messages may grow to between 100 KB and
750 KB.

So, as you can see, a lot of data is exchanged back and forth between the
two data centers.
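
As a rough sanity check on the figures above, the required bandwidth can be compared with the link capacity. This is only a back-of-the-envelope sketch: it assumes "100 mbps" means 100 megabits per second, that 1 KB is 1000 bytes, that messages are uncompressed, and it ignores protocol overhead. The function names are made up for illustration.

```python
# Back-of-the-envelope check using the figures quoted in this post.
# Assumptions: 1 KB = 1000 bytes, no compression, no protocol overhead.
MSGS_PER_SEC = 10_000      # source topic rate
MSG_SIZE_KB = 4            # approximate source message size
LINK_MBPS = 100            # link capacity, assumed to be megabits per second
MAX_CHANGELOG_KB = 750     # upper end of the changelog message size


def required_mbps(msgs_per_sec, msg_size_kb):
    """Megabits per second needed to carry the given message rate."""
    bits_per_sec = msgs_per_sec * msg_size_kb * 1000 * 8
    return bits_per_sec / 1_000_000


def transfer_ms(size_kb, link_mbps):
    """Milliseconds to push one message through the link, excluding latency."""
    # size_kb * 8000 bits divided by link_mbps * 1e6 bits/s, expressed in ms
    return size_kb * 8 / link_mbps


source_mbps = required_mbps(MSGS_PER_SEC, MSG_SIZE_KB)
print(f"source topic alone needs about {source_mbps:.0f} Mbps; link offers {LINK_MBPS} Mbps")
print(f"one {MAX_CHANGELOG_KB} KB changelog message takes about "
      f"{transfer_ms(MAX_CHANGELOG_KB, LINK_MBPS):.0f} ms on the wire, before the 20-70 ms latency")
```

Under these assumptions the source topic alone needs roughly 320 Mbps, about three times what the link can carry, and that is before any changelog traffic. Compression or a lower real message rate would change the picture, so these numbers only indicate where to look.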

In such a scenario, is it advisable to run the streaming application
across a WAN, or is it better to move it into the same LAN as the Kafka
cluster?

We seem to be running into request timeout issues when running the
application over the WAN rather than the LAN, and would like to know
whether the network connection between the two could be the cause.


Please let me know your thoughts.

Thanks
Sachin

Re: Is running kafka streaming application advisable on high latency WAN setup

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Streams is designed to run in the same DC as the brokers. So you might want
to move the app, or replicate the topics you are interested in into the
second DC (using MirrorMaker or Confluent's proprietary replication tool).

Nevertheless, if you increase timeouts etc. you might still be able to
make it work -- but it is not recommended.
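
As an illustration of what "increase timeouts etc." could look like, here is a sketch of client settings that are commonly adjusted for a slow or high-latency link. The keys are standard Kafka producer/consumer configs that can be passed through the Streams configuration. The values are assumptions chosen only for illustration, not recommendations from this thread. Compare each one against the defaults of your client version first, so that no value is lowered by accident.

```properties
# Illustrative values only (assumptions); tune against measured latency and throughput.

# Allow more time before a request is considered failed.
request.timeout.ms=120000

# Retry failed producer sends, with a pause between attempts.
retries=10
retry.backoff.ms=1000

# Larger TCP buffers help keep a high-latency link busy.
send.buffer.bytes=1048576
receive.buffer.bytes=1048576
```

Settings like these can only hide latency. They do not add bandwidth, so they will not help if the link itself is saturated.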


-Matthias

