Posted to users@kafka.apache.org by Viktor Somogyi <vi...@gmail.com> on 2017/11/30 10:21:59 UTC

Kafka in virtualized environments

Hi folks,

Recently I bumped into an interesting question: using Kafka in virtualized
environments, such as VMware. I'm not really familiar with virtualization
in depth (how disk virtualization works, what OS-level support exists,
etc.), so I think this is an interesting discussion from Kafka's point of
view. As far as I know, Kafka is designed mainly for non-virtualized
environments (although I haven't seen this stated explicitly anywhere), but
given its heavy reliance on disk optimization I have always assumed so.

Does anyone have experience with virtualized Kafka? Are you aware of any pain
points (or performance issues) that people should consider?
Are there any publications on this topic?

Regards,
Viktor

Re: Kafka in virtualized environments

Posted by Thomas Crayford <tc...@salesforce.com>.
We run many thousands of clusters on EC2 without notable issues, and
achieve great performance there. What really matters is how good your
virtualization layer is and how much of a performance impact it has.
E.g. in modern EC2, the performance overhead of using virtualized IO is
around 1-5% at most, which isn't enough of an impact for Kafka to really
notice.


Re: Kafka in virtualized environments

Posted by Wim Van Leuven <wi...@highestpoint.biz>.
We are running Kafka on OpenStack for a testing/staging environment.

It runs well and is stable, but it is obviously much slower than bare metal.
The simple reason is the distance to the disk (as with any batch-oriented,
IO-heavy system under virtualisation) and the virtual network.

HTH
-wim



Re: Kafka in virtualized environments

Posted by Viktor Somogyi <vi...@gmail.com>.
@Girish, wow, that could be a nice issue to debug. I was thinking about
exactly these kinds of issues with virtualized environments.

@Wim, how did you overcome the problem?
Thinking about such issues, my first thoughts are to increase the VM's memory
so that the OS can use it for read/write caching, or to use smaller segments
so that it syncs many smaller chunks of data instead of one big chunk at
once (possibly by switching from async to synchronized
<https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/>).
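
The article linked above is about the Linux vm.dirty_background_ratio and
vm.dirty_ratio settings, which control how much unwritten ("dirty") page
cache may build up before the kernel starts flushing in the background and
before writers are blocked. As a rough illustration (not something from this
thread), the Python sketch below estimates those two thresholds. The function
name and the example numbers are hypothetical, and the kernel applies the
percentages to available memory (free plus reclaimable pages) rather than to
total RAM, so treat the results as approximations.

```python
def dirty_thresholds(available_bytes, background_ratio, dirty_ratio):
    """Estimate the dirty page cache thresholds, in bytes.

    available_bytes  -- memory the kernel counts as available (an assumption
                        supplied by the caller, not read from the system)
    background_ratio -- value of vm.dirty_background_ratio (percent)
    dirty_ratio      -- value of vm.dirty_ratio (percent)
    """
    # Background flushing starts once dirty data exceeds this amount.
    background = available_bytes * background_ratio // 100
    # Writing processes are made to flush themselves past this amount.
    blocking = available_bytes * dirty_ratio // 100
    return background, blocking


# Example: a VM with 8 GiB available and ratios of 10% and 20%.
background, blocking = dirty_thresholds(8 * 1024**3, 10, 20)
print(f"background flush starts at ~{background / 1024**2:.0f} MiB")
print(f"writers block at ~{blocking / 1024**2:.0f} MiB")
```

The larger these thresholds are relative to what the virtual disk can absorb,
the bigger the burst that a single flush can send to shared storage.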


Re: Kafka in virtualized environments

Posted by Girish Aher <gi...@gmail.com>.
I am no storage or ESX expert; what I was told by our storage folks is that
they essentially created a dedicated storage pool in the SAN for the ZooKeeper
VMs plus other VMs that did not have a lot of IO activity (non-DB VMs). I
assume that implies dedicated physical disks in the SAN for that pool.

I am not sure whether a dedicated datastore was created in ESX for this pool,
but I am guessing it was.
I have not seen the issue since then.

Of course, the best solution is to run ZooKeeper on its own physical machines
with dedicated disks, especially if you plan to use it for purposes in
addition to Kafka.

I also want to mention that a *temporary* workaround for this problem is to
increase the connection and session timeouts between Kafka and ZooKeeper.
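
As a rough illustration of that temporary workaround, the broker-side
settings involved are zookeeper.session.timeout.ms and
zookeeper.connection.timeout.ms in the broker's server.properties. The values
below are assumptions picked for the example, not numbers from this thread;
choose them based on the stalls you actually observe.

```properties
# server.properties (Kafka broker); values are illustrative assumptions
# How long ZooKeeper waits without a heartbeat before expiring the session.
zookeeper.session.timeout.ms=30000
# How long the broker waits when establishing its ZooKeeper connection.
zookeeper.connection.timeout.ms=30000
```

Note that the ZooKeeper server caps the negotiated session timeout at its
maxSessionTimeout, which defaults to 20 times tickTime, so a larger broker
value may need a matching change on the ZooKeeper side.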



Re: Kafka in virtualized environments

Posted by Sean Glover <se...@lightbend.com>.
Girish, I'm curious what your solution was.  Did you use locally attached
storage for your ZK ensemble?  Did you move it to static machines?




-- 
Senior Software Engineer, Lightbend, Inc.

<http://lightbend.com>

@seg1o <https://twitter.com/seg1o>

Re: Kafka in virtualized environments

Posted by John Yost <ho...@gmail.com>.
Great point by Girish--it's the delays in syncing with ZooKeeper that are
particularly problematic. Moreover, ZooKeeper sync delays and session
timeouts impact other systems as well, such as Storm.

--John


Re: Kafka in virtualized environments

Posted by Girish Aher <gi...@gmail.com>.
We did not face any problems with the Kafka application per se, but we have
faced problems with ZooKeeper in virtualized environments due to slow
fsyncs. We were using shared SAN storage, with pools shared with other
VMs. So every time there was some considerable storage activity, such as a
DB backup, our ZooKeeper fsyncs used to take tens of seconds, causing
Kafka-ZooKeeper sessions to time out.
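
For anyone who wants to check whether their storage shows the same symptom,
below is a minimal Python sketch that times fsync calls in a given directory.
It is not from this thread: the function name measure_fsync_latencies and its
parameters are made up for illustration, and it only approximates the write
pattern of ZooKeeper's transaction log.

```python
import os
import tempfile
import time


def measure_fsync_latencies(directory, samples=5, payload_size=4096):
    """Write small payloads to a temp file in `directory` and time each fsync.

    Returns a list of fsync durations in seconds, one per sample.
    """
    latencies = []
    payload = b"\0" * payload_size
    # Create the probe file on the filesystem we want to measure.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)  # block until the data reaches stable storage
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


# Example: probe the system temp directory; point this at the volume that
# holds the ZooKeeper data directory to measure the disk that matters.
results = measure_fsync_latencies(tempfile.gettempdir(), samples=5)
print(f"worst fsync: {max(results) * 1000:.1f} ms")
```

Running this while the shared storage is busy (for example during a backup)
and comparing the worst case against the ZooKeeper session timeout gives a
first hint of whether sessions are at risk.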
